rtweet get_retweets() tutorial

Last updated on Mar 29, 2020 10 min read Math485, Covid-19, data science

Be sure you have completed the rtweet setup tutorial first.

Install Necessary Packages (Brandon)

To retrieve Twitter data, first install the rtweet package. To do this we will use the install.packages() function in the console.

In the console, type:

install.packages("rtweet")

This may take a while because rtweet depends on a lot of other packages that must be installed as well.
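Since installation can be slow, it may be worth checking first which packages are actually missing. The following is a minimal sketch, not part of the original tutorial; missing_packages() is a hypothetical helper name, and the code only reports what is missing rather than installing anything.

```r
# Hypothetical helper: return the names of packages that are not installed yet.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# The two packages this tutorial loads.
to_install <- missing_packages(c("rtweet", "tidyverse"))

if (length(to_install) > 0) {
  message("Still to install with install.packages(): ",
          paste(to_install, collapse = ", "))
} else {
  message("Everything needed is already installed.")
}
```

Anything the helper reports can then be passed to install.packages() in the console.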

Now that the package is installed, call library() in your .Rmd file to make rtweet available in that file. The tidyverse is loaded too, because later steps use functions from it such as glimpse(), select(), and filter().

library(rtweet)
library(tidyverse)

## ── Attaching packages ──────────────────────────────────── tidyverse 1.3.0 ──

## ✓ ggplot2 3.2.1     ✓ purrr   0.3.3
## ✓ tibble  2.1.3     ✓ dplyr   0.8.4
## ✓ tidyr   1.0.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.4.0

## ── Conflicts ─────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter()  masks stats::filter()
## x purrr::flatten() masks rtweet::flatten()
## x dplyr::lag()     masks stats::lag()

Once the rtweet package is loaded into your environment, you are ready to use it to get data.
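The Conflicts section of the startup message means that some function names now refer to a different package than before: filter() refers to dplyr::filter() and flatten() refers to purrr::flatten(). A masked function is still reachable by writing the package name in front of it. The sketch below shows this with stats::filter(), using only base R so it runs without the tidyverse; the numbers are made up for illustration.

```r
# stats::filter() is the function that dplyr::filter() masks.
# Here it computes a 3-point moving average over a numeric vector.
x <- c(1, 2, 3, 4, 5)
moving_avg <- stats::filter(x, rep(1 / 3, 3))

# The first and last positions have no complete window, so they are NA.
as.numeric(moving_avg)
```

The same pattern applies to rtweet::flatten() if you need the rtweet version after loading the tidyverse.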

Using rtweet to Get Twitter Data

Now that the package is loaded, we will go over the get_retweets function in rtweet.

To get more info on this function, type the following code into the console.

?rtweet::get_retweets

Arguments of get_retweets Function

The arguments of the get_retweets function are as follows:

get_retweets(status_id, n = 100, parse = TRUE, token = NULL, ...)

status_id is the number at the end of the URL of the tweet you want retweet data on. This number needs to be put in quotes so that R treats it as a character string.

n is the number of retweets you want to get. THE MAXIMUM FOR n IS 100.

parse is a logical value (TRUE/FALSE) indicating whether the response from Twitter should be parsed into an R data structure. With the default, TRUE, the call used in this tutorial returns a data frame.

token is where you supply your own token if you have a Twitter developer account. The default is NULL, so you do not have to pass a token yourself; you may instead be asked to log in to Twitter when the function runs.
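As a rough sketch of preparing the first two arguments, the code below pulls the status ID out of the tweet URL used in this tutorial and caps n at 100. The helper name status_id_from_url() and the requested value of 250 are assumptions made for illustration, not part of rtweet.

```r
# Hypothetical helper: keep only the digits that follow "/status/" in a tweet URL.
status_id_from_url <- function(url) {
  sub("^.*/status/([0-9]+).*$", "\\1", url)
}

tweet_url <- "https://twitter.com/realdonaldtrump/status/1244056534583312384"
status_id <- status_id_from_url(tweet_url)
status_id                 # stays a character string
is.character(status_id)

# Why the quotes matter: as a number, an ID this long cannot be told apart
# from a neighbouring ID, because doubles run out of precision.
as.numeric("1244056534583312385") == as.numeric(status_id)

# Cap the requested number of retweets at the maximum of 100.
n_requested <- min(250L, 100L)
n_requested
```

Keeping the ID as a character string from the start avoids any rounding of this kind.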

Demonstrating the get_retweets Function

I want to get the retweet data for this tweet:

https://twitter.com/realdonaldtrump/status/1244056534583312384

The following code gets the data and stores it in the variable test_tweet.

get_retweets() is a function from the rtweet package. You may be asked for your Twitter login information to access the tweets.

test_tweet <- get_retweets("1244056534583312384", n = 100)
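A request like this can fail (for example, if the login step is not completed), and the retweets returned may differ from one run to the next. One possible way to handle both, sketched here under assumptions: fetch_or_null() and fake_fetch() are hypothetical names, and fake_fetch() is a stand-in that returns made-up rows so the example runs without contacting Twitter. In real use you might pass function() get_retweets("1244056534583312384", n = 100) instead.

```r
# Hypothetical wrapper: run a fetch function, returning NULL instead of stopping on error.
fetch_or_null <- function(fetch) {
  tryCatch(fetch(), error = function(e) {
    message("Request failed: ", conditionMessage(e))
    NULL
  })
}

# Stand-in for the real request (assumption: made-up data, no network access).
fake_fetch <- function() {
  data.frame(screen_name = c("user_a", "user_b"), stringsAsFactors = FALSE)
}

result <- fetch_or_null(fake_fetch)

# Save a copy so later analysis does not depend on re-running the request.
cache_file <- tempfile(fileext = ".rds")
if (!is.null(result)) saveRDS(result, cache_file)

restored <- readRDS(cache_file)
identical(result, restored)
```

Saving to a temporary file is only for the sketch; a file inside your project folder would be the more likely choice.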

Working with the Data Received from rtweet (Irf)

So we’ve used the get_retweets() function; it returned a data frame, which we saved to the test_tweet variable. The data frame should now be listed under the Environment tab on the top right side of RStudio, where you can open it to inspect it manually.

Inspect data from get_retweets()

Use the glimpse() function to view the data frame’s variable names, types, and sample values in a quick table.

## To quickly get an idea of the structure of the data, use the glimpse() function from the dplyr package in the tidyverse, passing in the first rows of the test_tweet data frame

glimpse(head(test_tweet))

## Observations: 6
## Variables: 90
## $ user_id                 <chr> "790317228323434496", "849685715323506688", "…
## $ status_id               <chr> "1259857754346455042", "1255324972387635202",…
## $ created_at              <dttm> 2020-05-11 14:47:50, 2020-04-29 02:36:10, 20…
## $ screen_name             <chr> "JuanGab29782989", "RcRegalstarfire", "Manzan…
## $ text                    <chr> "On the recommendation of the White House Cor…
## $ source                  <chr> "Twitter for Android", "Twitter for Android",…
## $ display_text_width      <int> NA, NA, NA, NA, NA, NA
## $ reply_to_status_id      <lgl> NA, NA, NA, NA, NA, NA
## $ reply_to_user_id        <lgl> NA, NA, NA, NA, NA, NA
## $ reply_to_screen_name    <lgl> NA, NA, NA, NA, NA, NA
## $ is_quote                <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE
## $ is_retweet              <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE
## $ favorite_count          <int> 0, 0, 0, 0, 0, 0
## $ retweet_count           <int> 23570, 23570, 23570, 23570, 23570, 23570
## $ quote_count             <int> NA, NA, NA, NA, NA, NA
## $ reply_count             <int> NA, NA, NA, NA, NA, NA
## $ hashtags                <list> [NA, NA, NA, NA, NA, NA]
## $ symbols                 <list> [NA, NA, NA, NA, NA, NA]
## $ urls_url                <list> [NA, NA, NA, NA, NA, NA]
## $ urls_t.co               <list> [NA, NA, NA, NA, NA, NA]
## $ urls_expanded_url       <list> [NA, NA, NA, NA, NA, NA]
## $ media_url               <list> [NA, NA, NA, NA, NA, NA]
## $ media_t.co              <list> [NA, NA, NA, NA, NA, NA]
## $ media_expanded_url      <list> [NA, NA, NA, NA, NA, NA]
## $ media_type              <list> [NA, NA, NA, NA, NA, NA]
## $ ext_media_url           <list> [NA, NA, NA, NA, NA, NA]
## $ ext_media_t.co          <list> [NA, NA, NA, NA, NA, NA]
## $ ext_media_expanded_url  <list> [NA, NA, NA, NA, NA, NA]
## $ ext_media_type          <chr> NA, NA, NA, NA, NA, NA
## $ mentions_user_id        <list> ["25073877", "25073877", "25073877", "250738…
## $ mentions_screen_name    <list> ["realDonaldTrump", "realDonaldTrump", "real…
## $ lang                    <chr> "en", "en", "en", "en", "en", "en"
## $ quoted_status_id        <chr> NA, NA, NA, NA, NA, NA
## $ quoted_text             <chr> NA, NA, NA, NA, NA, NA
## $ quoted_created_at       <dttm> NA, NA, NA, NA, NA, NA
## $ quoted_source           <chr> NA, NA, NA, NA, NA, NA
## $ quoted_favorite_count   <int> NA, NA, NA, NA, NA, NA
## $ quoted_retweet_count    <int> NA, NA, NA, NA, NA, NA
## $ quoted_user_id          <chr> NA, NA, NA, NA, NA, NA
## $ quoted_screen_name      <chr> NA, NA, NA, NA, NA, NA
## $ quoted_name             <chr> NA, NA, NA, NA, NA, NA
## $ quoted_followers_count  <int> NA, NA, NA, NA, NA, NA
## $ quoted_friends_count    <int> NA, NA, NA, NA, NA, NA
## $ quoted_statuses_count   <int> NA, NA, NA, NA, NA, NA
## $ quoted_location         <chr> NA, NA, NA, NA, NA, NA
## $ quoted_description      <chr> NA, NA, NA, NA, NA, NA
## $ quoted_verified         <lgl> NA, NA, NA, NA, NA, NA
## $ retweet_status_id       <chr> "1244056534583312384", "1244056534583312384",…
## $ retweet_text            <chr> "On the recommendation of the White House Cor…
## $ retweet_created_at      <dttm> 2020-03-29 00:19:25, 2020-03-29 00:19:25, 20…
## $ retweet_source          <chr> "Twitter for iPhone", "Twitter for iPhone", "…
## $ retweet_favorite_count  <int> 108668, 108668, 108668, 108668, 108668, 108668
## $ retweet_retweet_count   <int> 23570, 23570, 23570, 23570, 23570, 23570
## $ retweet_user_id         <chr> "25073877", "25073877", "25073877", "25073877…
## $ retweet_screen_name     <chr> "realDonaldTrump", "realDonaldTrump", "realDo…
## $ retweet_name            <chr> "Donald J. Trump", "Donald J. Trump", "Donald…
## $ retweet_followers_count <int> 80364981, 80364981, 80364981, 80364981, 80364…
## $ retweet_friends_count   <int> 46, 46, 46, 46, 46, 46
## $ retweet_statuses_count  <int> 52071, 52071, 52071, 52071, 52071, 52071
## $ retweet_location        <chr> "Washington, DC", "Washington, DC", "Washingt…
## $ retweet_description     <chr> "45th President of the United States of Ameri…
## $ retweet_verified        <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE
## $ place_url               <chr> NA, NA, NA, NA, NA, NA
## $ place_name              <chr> NA, NA, NA, NA, NA, NA
## $ place_full_name         <chr> NA, NA, NA, NA, NA, NA
## $ place_type              <chr> NA, NA, NA, NA, NA, NA
## $ country                 <chr> NA, NA, NA, NA, NA, NA
## $ country_code            <chr> NA, NA, NA, NA, NA, NA
## $ geo_coords              <list> [<NA, NA>, <NA, NA>, <NA, NA>, <NA, NA>, <NA…
## $ coords_coords           <list> [<NA, NA>, <NA, NA>, <NA, NA>, <NA, NA>, <NA…
## $ bbox_coords             <list> [<NA, NA, NA, NA, NA, NA, NA, NA>, <NA, NA, …
## $ status_url              <chr> "https://twitter.com/JuanGab29782989/status/1…
## $ name                    <chr> "OMEGA22", "Cheryl Marshall", "bye-bye", "San…
## $ location                <chr> "Florida, USA", "Blaine, WA", "United States"…
## $ description             <chr> "Soccer Sports NFL Basketball Tennis Premier …
## $ url                     <chr> NA, NA, NA, NA, NA, NA
## $ protected               <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE
## $ followers_count         <int> 759, 329, 14968, 21, 14480, 3067
## $ friends_count           <int> 524, 571, 16545, 26, 14360, 4636
## $ listed_count            <int> 0, 0, 2, 0, 14, 14
## $ statuses_count          <int> 41521, 13332, 350452, 11591, 220261, 46697
## $ favourites_count        <int> 38310, 8306, 307824, 18597, 197695, 44998
## $ account_created_at      <dttm> 2016-10-23 22:21:25, 2017-04-05 18:10:35, 20…
## $ verified                <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE
## $ profile_url             <chr> NA, NA, NA, NA, NA, NA
## $ profile_expanded_url    <chr> NA, NA, NA, NA, NA, NA
## $ account_lang            <lgl> NA, NA, NA, NA, NA, NA
## $ profile_banner_url      <chr> "https://pbs.twimg.com/profile_banners/790317…
## $ profile_background_url  <chr> NA, NA, NA, "http://abs.twimg.com/images/them…
## $ profile_image_url       <chr> "http://pbs.twimg.com/profile_images/81973753…

glimpse() lists each variable in the dataset along with its type and a sample of its values. The type abbreviations are:

<chr> character
<dttm> date and time
<int> integer
<lgl> logical (TRUE/FALSE)
<list> list

For the most part, the variable names are self-explanatory.
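If you want the column types as a plain vector rather than a printed table, base R can report them too. The sketch below builds a small made-up data frame (an assumption for illustration, not real rtweet output) with one column for each type that glimpse() reported and looks up the class of each column.

```r
# Toy data frame with one column per type seen in the glimpse() output.
toy <- data.frame(
  screen_name   = c("user_a", "user_b"),   # <chr>
  retweet_count = c(10L, 10L),             # <int>
  is_retweet    = c(TRUE, TRUE),           # <lgl>
  stringsAsFactors = FALSE
)
toy$created_at <- as.POSIXct(c("2020-03-29 00:19:25", "2020-03-30 08:00:00"),
                             tz = "UTC")   # <dttm>
toy$hashtags <- list(NA, c("a", "b"))      # <list>

# First class of every column, returned as a named character vector.
col_types <- vapply(toy, function(col) class(col)[1], character(1))
col_types
```

Here "POSIXct" is the class that glimpse() abbreviates as <dttm>.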

You can also manually inspect the data frame from the Environment tab on the top right side of RStudio.

Notice: Not every variable in the data frame contains actual data. Many variables are empty, which is denoted by the NA (not available) value.
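One way to see which variables are empty is to compute the share of NA values per column. This is a minimal sketch on a made-up data frame (the rows are invented for illustration); list columns such as hashtags would need extra handling that is not shown here.

```r
# Toy data frame: one complete column, one partly missing, one entirely NA.
toy <- data.frame(
  screen_name = c("user_a", "user_b", "user_c"),
  location    = c("Florida, USA", NA, "Blaine, WA"),
  quoted_text = c(NA_character_, NA_character_, NA_character_),
  stringsAsFactors = FALSE
)

# Share of missing values in each column (0 = complete, 1 = all NA).
na_share <- colMeans(is.na(toy))
na_share

# Keep only the columns that contain at least some data.
has_data <- toy[, na_share < 1, drop = FALSE]
names(has_data)
```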

Determine which variables are of interest to you

You can access individual variables in the data frame using the data_frame_name$variable_name notation. To keep several variables at once, use select().

## Create a new data frame with basic information about the user
vars_to_inspect <- test_tweet %>% select(screen_name, name, description, location)
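The two ways of picking out variables can be tried on a small made-up data frame. The sketch below uses base R only, so the column subsetting with [ ] stands in for select(); the rows are invented for illustration.

```r
# Toy data frame with the four user columns plus one extra column.
toy <- data.frame(
  screen_name     = c("user_a", "user_b"),
  name            = c("User A", "User B"),
  description     = c("Gardener", "Runner"),
  location        = c("Florida, USA", "Blaine, WA"),
  followers_count = c(759L, 329L),
  stringsAsFactors = FALSE
)

# A single variable, returned as a vector.
toy$screen_name
toy[["location"]]   # equivalent form that takes the name as a string

# Several variables at once, returned as a smaller data frame
# (the base R counterpart of select()).
user_info <- toy[, c("screen_name", "name", "description", "location")]
names(user_info)
```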

Working with the data

Searching for specific words, or ‘patterns’ within variables

So now we have a small data frame, vars_to_inspect, with the variables screen_name, name, description, and location. I noticed that many of the descriptions use the term “MAGA”, so I would like to find out how many of these users have #MAGA in their user description.

Use the grepl() function for text pattern matching on string variables.

Let’s create a subset of all the users who retweeted the tweet and have the ‘#MAGA’ term in their user description.

## Create a subset of the vars_to_inspect data frame using grepl(), which returns TRUE/FALSE for each row. Rows whose description contains "#MAGA" return TRUE and are kept in the subset. (grep() would return the positions of the matching rows instead.)

maga_users <- vars_to_inspect %>% filter(grepl("#MAGA", description))
maga_users

## # A tibble: 21 x 4
##    screen_name  name        description                              location   
##    <chr>        <chr>       <chr>                                    <chr>      
##  1 raybae689    RAY BAEZ    "🇺🇸 #KAG #REPUBLICAN #HAPPILY MARRIED,\… "Queens, N…
##  2 raybae689    RAY BAEZ    "🇺🇸 #KAG #REPUBLICAN #HAPPILY MARRIED,\… "Queens, N…
##  3 raybae689    RAY BAEZ    "🇺🇸 #KAG #REPUBLICAN #HAPPILY MARRIED,\… "Queens, N…
##  4 raybae689    RAY BAEZ    "🇺🇸 #KAG #REPUBLICAN #HAPPILY MARRIED,\… "Queens, N…
##  5 Nurse4Trump  Nurse4Trump "#MAGA @Foxnews #BlueLivesMatter"        "United St…
##  6 Brian439101… Brian Espi… "Liberty Car Service Rep.\nOsha30 licen… ""         
##  7 RealTyWebb   Ty Webb     "Twitter limiting tweets, n people I ca… "Tennessee…
##  8 RealTyWebb   Ty Webb     "Twitter limiting tweets, n people I ca… "Tennessee…
##  9 ldo_lawrence Lawrence M… "Father. Animal lover, 🐕 rescuer #maskO… "AZ. "     
## 10 Pacificnw777 Kay #MAGA   "Working to #MAGA for my children. Chri… ""         
## # … with 11 more rows
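The difference between grep() and grepl() is easiest to see on a short vector. The descriptions below are made up for illustration; the point of the sketch is that grepl() returns one TRUE/FALSE per element, which is the form filter() needs, while grep() returns the positions of the matches.

```r
# Made-up user descriptions, including a missing one.
descriptions <- c("Proud #MAGA voter", "Soil scientist", "#MAGA #KAG", NA)

# grep(): positions of the elements that match.
match_positions <- grep("#MAGA", descriptions)
match_positions

# grepl(): one logical value per element; a missing description counts as no match.
match_flags <- grepl("#MAGA", descriptions)
match_flags

# Either result can be used to pull out the matching descriptions.
descriptions[match_flags]
```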

WARNING: grepl() matches the pattern exactly as written. Let’s redo the search, but this time drop the hashtag (#) and see if there is a difference.

## Same subset as before, but searching for "MAGA" without the hashtag, so descriptions that contain MAGA with or without a leading # are kept

maga_users_nohashtag <- vars_to_inspect %>% filter(grepl("MAGA", description))

maga_users_nohashtag

## # A tibble: 23 x 4
##    screen_name  name               description                        location  
##    <chr>        <chr>              <chr>                              <chr>     
##  1 raybae689    RAY BAEZ           "🇺🇸 #KAG #REPUBLICAN #HAPPILY MAR… "Queens, …
##  2 raybae689    RAY BAEZ           "🇺🇸 #KAG #REPUBLICAN #HAPPILY MAR… "Queens, …
##  3 raybae689    RAY BAEZ           "🇺🇸 #KAG #REPUBLICAN #HAPPILY MAR… "Queens, …
##  4 raybae689    RAY BAEZ           "🇺🇸 #KAG #REPUBLICAN #HAPPILY MAR… "Queens, …
##  5 Nurse4Trump  Nurse4Trump        "#MAGA @Foxnews #BlueLivesMatter"  "United S…
##  6 Brian439101… Brian Espinal      "Liberty Car Service Rep.\nOsha30… ""        
##  7 RealTyWebb   Ty Webb            "Twitter limiting tweets, n peopl… "Tennesse…
##  8 RealTyWebb   Ty Webb            "Twitter limiting tweets, n peopl… "Tennesse…
##  9 ldo_lawrence Lawrence MacDonald "Father. Animal lover, 🐕 rescuer … "AZ. "    
## 10 HambySr      X chet hamby sr. … "SILENCE  IN THE FACE OF EVIL IS … "Provo, U…
## # … with 13 more rows

How many retweeters have #MAGA in their profile description?

## Count the matching rows, then calculate the percent of retweets whose user has #MAGA (or MAGA) in their description
count(maga_users)

## # A tibble: 1 x 1
##       n
##   <int>
## 1    21

count(maga_users_nohashtag)

## # A tibble: 1 x 1
##       n
##   <int>
## 1    23

count(maga_users) / count(test_tweet) * 100

##          n
## 1 23.86364

count(maga_users_nohashtag) / count(test_tweet) * 100

##          n
## 1 26.13636
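The percentages can also be computed directly from the logical vector that grepl() returns, because the mean of TRUE/FALSE values is the share of TRUE. The sketch below uses a made-up data frame; it also shows the same calculation after dropping repeated screen names, since the printed subsets list some users more than once. The figure of 88 rows in the last lines is inferred from the reported percentages and is not stated in the original.

```r
# Toy retweet data: user_a appears twice.
toy <- data.frame(
  screen_name = c("user_a", "user_a", "user_b", "user_c"),
  description = c("#MAGA fan", "#MAGA fan", "Gardener", "maga hats"),
  stringsAsFactors = FALSE
)

# Percent of rows whose description contains "#MAGA".
row_pct <- mean(grepl("#MAGA", toy$description)) * 100
row_pct

# Percent of distinct users: keep the first row for each screen name.
unique_users <- toy[!duplicated(toy$screen_name), ]
user_pct <- mean(grepl("#MAGA", unique_users$description)) * 100
user_pct

# Checking the reported figures: 21 and 23 matches out of an inferred 88 rows.
round(21 / 88 * 100, 5)
round(23 / 88 * 100, 5)
```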

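From the results we can see that searching for #MAGA and searching for MAGA give different results because of the hashtag. Note that the search without the hashtag also matches every description that contains the hashtagged term. Also note that the printed output lists some users more than once, so these counts and percentages are per row rather than per distinct user.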
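The relationship between the two searches can be checked on a few made-up descriptions. This sketch also adds a case-insensitive search, which is an extension not covered in the tutorial, to show that grepl() is case sensitive by default.

```r
# Made-up descriptions covering the different spellings.
descriptions <- c("#MAGA all day", "MAGA2020", "Proud #maga mom", "Soil scientist")

with_hash    <- grepl("#MAGA", descriptions, fixed = TRUE)
without_hash <- grepl("MAGA", descriptions, fixed = TRUE)
any_case     <- grepl("maga", descriptions, ignore.case = TRUE)

# Every description matched with the hashtag is also matched without it.
all(without_hash[with_hash])

# Number of matches for each search, from narrowest to broadest.
c(with_hash = sum(with_hash),
  without_hash = sum(without_hash),
  any_case = sum(any_case))
```

Which search is appropriate depends on whether lowercase or untagged mentions should count for the question being asked.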

Irfan Ainuddin

Graduate Student

My research interests include soil genesis, soil mapping, soil education and outreach, soil fertility and nitrogen management.