rtweet get_retweets() tutorial
Before starting, be sure you have completed the rtweet setup tutorial.
Install Necessary Packages (Brandon)
The first step in retrieving Twitter data is to install the rtweet package. To do this, we will use the install.packages() function in the console.
In the console, type:
install.packages("rtweet")
This may take a while, because the many other packages that rtweet depends on must be installed as well.
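If you come back to this tutorial later or on another machine, you may not know which packages are already installed. The sketch below uses a hypothetical helper, missing_packages(), written in base R (it is not part of rtweet), to list the packages from this tutorial that still need installing:

```r
# Hypothetical helper (not part of rtweet): return the names of packages
# that are not installed in the current R library paths.
missing_packages <- function(pkgs) {
  # system.file(package = p) returns "" when package p is not installed
  is_installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)), logical(1))
  pkgs[!is_installed]
}

# Packages used in this tutorial
to_install <- missing_packages(c("rtweet", "tidyverse"))
print(to_install)
```

If to_install is not empty, running install.packages(to_install) installs only those packages.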
Now that the package is installed, we will use the library() function in the .Rmd file to load rtweet so that its functions can be used in your file. We also load the tidyverse, which we will use to work with the data.
library(rtweet)
library(tidyverse)
## ── Attaching packages ──────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.2.1 ✓ purrr 0.3.3
## ✓ tibble 2.1.3 ✓ dplyr 0.8.4
## ✓ tidyr 1.0.2 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.4.0
## ── Conflicts ─────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x purrr::flatten() masks rtweet::flatten()
## x dplyr::lag() masks stats::lag()
Once the rtweet package is loaded into your session, you are ready to use it to get data.
Using rtweet to Get Twitter Data
Now that the package is loaded, we will go over the get_retweets() function in rtweet.
To get more info on this function, type the following code into the console.
?rtweet::get_retweets
Arguments of get_retweets Function
The arguments of the get_retweets() function are as follows:
get_retweets(status_id, n = 100, parse = TRUE, token = NULL, ...)
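status_id is the string of digits at the end of the URL of the tweet you want retweet data on. Put it in quotes so that it is passed as a character string; these IDs are too large to be stored exactly as R numbers.
n is the number of retweets you want to get. The maximum for n is 100.
parse is a logical value (TRUE/FALSE) indicating whether rtweet should parse the raw response from Twitter into R data structures. The default, TRUE, is what returns the data frame used below.
token is where you would supply your own OAuth token, for example one created with a Twitter developer account. The default, NULL, tells rtweet to look up a saved token or to ask you to authorize access with your Twitter login, so you do not need to supply one yourself.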
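Because status_id must be the digits from the tweet's URL as a character string, a small helper can pull it out for you. The sketch below uses a hypothetical function, status_id_from_url() (not part of rtweet), and also shows why the quotes matter: id_a is a status ID that appears in the glimpse() output later in this tutorial, and id_b is a made-up ID that differs only in its last digit.

```r
# Hypothetical helper (not part of rtweet): extract the status ID from a
# tweet URL and return it as a character string.
status_id_from_url <- function(url) {
  has_id <- grepl("/status/[0-9]+", url)
  ids <- sub(".*/status/([0-9]+).*", "\\1", url)
  # URLs without a /status/<digits> segment give NA instead of the full URL
  ifelse(has_id, ids, NA_character_)
}

tweet_id <- status_id_from_url("https://twitter.com/realdonaldtrump/status/1244056534583312384")
tweet_id  # "1244056534583312384", ready to pass as status_id

# Why the quotes matter: R numbers are doubles, which are exact only up to
# 2^53, so two different IDs can collapse to the same number.
id_a <- "1259857754346455042"
id_b <- "1259857754346455040"  # hypothetical ID, last digit changed
id_a == id_b                          # FALSE: the strings differ
as.numeric(id_a) == as.numeric(id_b)  # TRUE: the numbers are equal
```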
Demonstrating the get_retweets Function
I want to find the retweet data on this tweet:
https://twitter.com/realdonaldtrump/status/1244056534583312384
The following code gets the desired data and stores it in the variable test_tweet.
get_retweets() is a function from the rtweet package. You may be asked to log in with your Twitter account to authorize access to the tweets.
test_tweet <- get_retweets("1244056534583312384", n = 100)
Working with the Data Received from rtweet (Irf)
So far we've used the get_retweets() function, which returned a data frame that we saved to the test_tweet variable. The data frame should now appear in the Environment tab in the upper-right pane of RStudio, where you can click it to inspect test_tweet manually.
Inspect data from get_retweets()
Use the glimpse() function to view each variable's name, type, and sample values in a compact listing.
## To get a quick idea of the structure of the data, use the glimpse() function from the dplyr package (part of the tidyverse), passing in the first rows of the test_tweet data frame
glimpse(head(test_tweet))
## Observations: 6
## Variables: 90
## $ user_id <chr> "790317228323434496", "849685715323506688", "…
## $ status_id <chr> "1259857754346455042", "1255324972387635202",…
## $ created_at <dttm> 2020-05-11 14:47:50, 2020-04-29 02:36:10, 20…
## $ screen_name <chr> "JuanGab29782989", "RcRegalstarfire", "Manzan…
## $ text <chr> "On the recommendation of the White House Cor…
## $ source <chr> "Twitter for Android", "Twitter for Android",…
## $ display_text_width <int> NA, NA, NA, NA, NA, NA
## $ reply_to_status_id <lgl> NA, NA, NA, NA, NA, NA
## $ reply_to_user_id <lgl> NA, NA, NA, NA, NA, NA
## $ reply_to_screen_name <lgl> NA, NA, NA, NA, NA, NA
## $ is_quote <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE
## $ is_retweet <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE
## $ favorite_count <int> 0, 0, 0, 0, 0, 0
## $ retweet_count <int> 23570, 23570, 23570, 23570, 23570, 23570
## $ quote_count <int> NA, NA, NA, NA, NA, NA
## $ reply_count <int> NA, NA, NA, NA, NA, NA
## $ hashtags <list> [NA, NA, NA, NA, NA, NA]
## $ symbols <list> [NA, NA, NA, NA, NA, NA]
## $ urls_url <list> [NA, NA, NA, NA, NA, NA]
## $ urls_t.co <list> [NA, NA, NA, NA, NA, NA]
## $ urls_expanded_url <list> [NA, NA, NA, NA, NA, NA]
## $ media_url <list> [NA, NA, NA, NA, NA, NA]
## $ media_t.co <list> [NA, NA, NA, NA, NA, NA]
## $ media_expanded_url <list> [NA, NA, NA, NA, NA, NA]
## $ media_type <list> [NA, NA, NA, NA, NA, NA]
## $ ext_media_url <list> [NA, NA, NA, NA, NA, NA]
## $ ext_media_t.co <list> [NA, NA, NA, NA, NA, NA]
## $ ext_media_expanded_url <list> [NA, NA, NA, NA, NA, NA]
## $ ext_media_type <chr> NA, NA, NA, NA, NA, NA
## $ mentions_user_id <list> ["25073877", "25073877", "25073877", "250738…
## $ mentions_screen_name <list> ["realDonaldTrump", "realDonaldTrump", "real…
## $ lang <chr> "en", "en", "en", "en", "en", "en"
## $ quoted_status_id <chr> NA, NA, NA, NA, NA, NA
## $ quoted_text <chr> NA, NA, NA, NA, NA, NA
## $ quoted_created_at <dttm> NA, NA, NA, NA, NA, NA
## $ quoted_source <chr> NA, NA, NA, NA, NA, NA
## $ quoted_favorite_count <int> NA, NA, NA, NA, NA, NA
## $ quoted_retweet_count <int> NA, NA, NA, NA, NA, NA
## $ quoted_user_id <chr> NA, NA, NA, NA, NA, NA
## $ quoted_screen_name <chr> NA, NA, NA, NA, NA, NA
## $ quoted_name <chr> NA, NA, NA, NA, NA, NA
## $ quoted_followers_count <int> NA, NA, NA, NA, NA, NA
## $ quoted_friends_count <int> NA, NA, NA, NA, NA, NA
## $ quoted_statuses_count <int> NA, NA, NA, NA, NA, NA
## $ quoted_location <chr> NA, NA, NA, NA, NA, NA
## $ quoted_description <chr> NA, NA, NA, NA, NA, NA
## $ quoted_verified <lgl> NA, NA, NA, NA, NA, NA
## $ retweet_status_id <chr> "1244056534583312384", "1244056534583312384",…
## $ retweet_text <chr> "On the recommendation of the White House Cor…
## $ retweet_created_at <dttm> 2020-03-29 00:19:25, 2020-03-29 00:19:25, 20…
## $ retweet_source <chr> "Twitter for iPhone", "Twitter for iPhone", "…
## $ retweet_favorite_count <int> 108668, 108668, 108668, 108668, 108668, 108668
## $ retweet_retweet_count <int> 23570, 23570, 23570, 23570, 23570, 23570
## $ retweet_user_id <chr> "25073877", "25073877", "25073877", "25073877…
## $ retweet_screen_name <chr> "realDonaldTrump", "realDonaldTrump", "realDo…
## $ retweet_name <chr> "Donald J. Trump", "Donald J. Trump", "Donald…
## $ retweet_followers_count <int> 80364981, 80364981, 80364981, 80364981, 80364…
## $ retweet_friends_count <int> 46, 46, 46, 46, 46, 46
## $ retweet_statuses_count <int> 52071, 52071, 52071, 52071, 52071, 52071
## $ retweet_location <chr> "Washington, DC", "Washington, DC", "Washingt…
## $ retweet_description <chr> "45th President of the United States of Ameri…
## $ retweet_verified <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE
## $ place_url <chr> NA, NA, NA, NA, NA, NA
## $ place_name <chr> NA, NA, NA, NA, NA, NA
## $ place_full_name <chr> NA, NA, NA, NA, NA, NA
## $ place_type <chr> NA, NA, NA, NA, NA, NA
## $ country <chr> NA, NA, NA, NA, NA, NA
## $ country_code <chr> NA, NA, NA, NA, NA, NA
## $ geo_coords <list> [<NA, NA>, <NA, NA>, <NA, NA>, <NA, NA>, <NA…
## $ coords_coords <list> [<NA, NA>, <NA, NA>, <NA, NA>, <NA, NA>, <NA…
## $ bbox_coords <list> [<NA, NA, NA, NA, NA, NA, NA, NA>, <NA, NA, …
## $ status_url <chr> "https://twitter.com/JuanGab29782989/status/1…
## $ name <chr> "OMEGA22", "Cheryl Marshall", "bye-bye", "San…
## $ location <chr> "Florida, USA", "Blaine, WA", "United States"…
## $ description <chr> "Soccer Sports NFL Basketball Tennis Premier …
## $ url <chr> NA, NA, NA, NA, NA, NA
## $ protected <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE
## $ followers_count <int> 759, 329, 14968, 21, 14480, 3067
## $ friends_count <int> 524, 571, 16545, 26, 14360, 4636
## $ listed_count <int> 0, 0, 2, 0, 14, 14
## $ statuses_count <int> 41521, 13332, 350452, 11591, 220261, 46697
## $ favourites_count <int> 38310, 8306, 307824, 18597, 197695, 44998
## $ account_created_at <dttm> 2016-10-23 22:21:25, 2017-04-05 18:10:35, 20…
## $ verified <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE
## $ profile_url <chr> NA, NA, NA, NA, NA, NA
## $ profile_expanded_url <chr> NA, NA, NA, NA, NA, NA
## $ account_lang <lgl> NA, NA, NA, NA, NA, NA
## $ profile_banner_url <chr> "https://pbs.twimg.com/profile_banners/790317…
## $ profile_background_url <chr> NA, NA, NA, "http://abs.twimg.com/images/them…
## $ profile_image_url <chr> "http://pbs.twimg.com/profile_images/81973753…
glimpse() lists each variable in the dataset along with its type and a sample of its values. The type abbreviations mean:
<chr> character
<dttm> date and time
<int> integer
<lgl> logical (TRUE/FALSE)
<list> list
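The same type information is available in base R through class(). The sketch below builds a small hypothetical data frame (invented values, not real get_retweets() output) with one column of each type listed above; note that <dttm> corresponds to R's POSIXct class.

```r
# Hypothetical mock data with one column of each type glimpse() reported
retweets_demo <- data.frame(
  screen_name = c("user_a", "user_b"),
  created_at = as.POSIXct(c("2020-05-11 14:47:50", "2020-04-29 02:36:10"), tz = "UTC"),
  followers_count = c(759L, 329L),
  verified = c(FALSE, FALSE),
  stringsAsFactors = FALSE
)
# Assigning a list of length nrow() with $<- creates a list column
retweets_demo$hashtags <- list(NA, c("MAGA", "KAG"))

# Base-R type check: the first class of each column
column_types <- vapply(retweets_demo, function(col) class(col)[1], character(1))
column_types
```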
For the most part, the variable names are self-explanatory.
You can also inspect the data frame manually from the Environment tab in the upper-right pane of RStudio.
Notice: not every variable in the data frame contains actual data. Many variables are empty, which is shown by the value NA (not available).
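Because so many columns are empty, it can help to find them programmatically. A minimal base-R sketch on a small hypothetical stand-in for test_tweet (the column names follow the glimpse() output, but the values are invented) that lists the columns containing only NA and then drops them:

```r
# Hypothetical mock data standing in for test_tweet
retweets_demo <- data.frame(
  screen_name = c("user_a", "user_b", "user_c"),
  place_name = c(NA, NA, NA),
  quote_count = c(NA_integer_, NA_integer_, NA_integer_),
  lang = c("en", "en", NA),
  stringsAsFactors = FALSE
)
retweets_demo$media_url <- list(NA, NA, NA)

# TRUE for columns where every value is NA; on a list column, is.na()
# flags elements that are a single NA
all_na <- vapply(retweets_demo, function(col) all(is.na(col)), logical(1))
names(all_na)[all_na]

# Keep only the columns that contain at least one value
retweets_trimmed <- retweets_demo[, !all_na, drop = FALSE]
names(retweets_trimmed)
```

The same two steps would apply unchanged to test_tweet.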
Determine which variables are of interest to you
You can access an individual variable in the data frame using data_frame_name$variable_name notation, for example test_tweet$screen_name. To keep several variables at once, use the select() function from dplyr:
## Create a new data frame with basic information about the users
vars_to_inspect <- test_tweet %>% select(screen_name, name, description, location)
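For comparison, the same column access works in base R. This sketch uses a small hypothetical data frame, retweets_demo (invented values), in place of test_tweet so that it runs without a Twitter connection:

```r
# Hypothetical mock data standing in for test_tweet
retweets_demo <- data.frame(
  screen_name = c("user_a", "user_b"),
  name = c("User A", "User B"),
  description = c("#MAGA fan", "Dog lover"),
  location = c("Florida, USA", ""),
  followers_count = c(759L, 329L),
  stringsAsFactors = FALSE
)

# $ extracts a single column as a plain vector
retweets_demo$screen_name

# Base-R equivalent of select(screen_name, name, description, location):
# a data frame containing only the listed columns, in that order
vars_demo <- retweets_demo[, c("screen_name", "name", "description", "location")]
names(vars_demo)
```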
Working with the data
Searching for specific words, or ‘patterns’ within variables
So now we have a small data frame, vars_to_inspect, with the variables screen_name, name, description, and location. I noticed that many people used the term "MAGA" in their descriptions, so I would like to find out how many of these users have #MAGA in their user description.
Use the grepl() function for pattern matching on character (string) variables. It returns TRUE for each element that contains the pattern and FALSE otherwise.
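To see exactly what grepl() returns, and how it differs from grep(), here is a small sketch on a few invented description strings:

```r
# Hypothetical descriptions (invented for illustration)
descriptions <- c("#MAGA @Foxnews", "Working to #MAGA", "Father. Animal lover", "maga and proud")

grepl("#MAGA", descriptions)                    # one TRUE/FALSE per element
grep("#MAGA", descriptions)                     # integer positions of the matches
grep("#MAGA", descriptions, value = TRUE)       # the matching strings themselves
grepl("maga", descriptions)                     # case-sensitive by default
grepl("maga", descriptions, ignore.case = TRUE) # case-insensitive match
```

Because filter() keeps the rows where its condition is TRUE, grepl(), which returns one logical value per row, is the version to use inside filter().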
Let's create a subset of the users who retweeted the tweet and have the term '#MAGA' in their user description.
## Subset vars_to_inspect with grepl(), which returns TRUE/FALSE for each row; rows whose description contains "#MAGA" are kept. (grep() returns the positions of the matches instead of TRUE/FALSE.)
maga_users <- vars_to_inspect %>% filter(grepl("#MAGA", description))
maga_users
## # A tibble: 21 x 4
## screen_name name description location
## <chr> <chr> <chr> <chr>
## 1 raybae689 RAY BAEZ "🇺🇸 #KAG #REPUBLICAN #HAPPILY MARRIED,\… "Queens, N…
## 2 raybae689 RAY BAEZ "🇺🇸 #KAG #REPUBLICAN #HAPPILY MARRIED,\… "Queens, N…
## 3 raybae689 RAY BAEZ "🇺🇸 #KAG #REPUBLICAN #HAPPILY MARRIED,\… "Queens, N…
## 4 raybae689 RAY BAEZ "🇺🇸 #KAG #REPUBLICAN #HAPPILY MARRIED,\… "Queens, N…
## 5 Nurse4Trump Nurse4Trump "#MAGA @Foxnews #BlueLivesMatter" "United St…
## 6 Brian439101… Brian Espi… "Liberty Car Service Rep.\nOsha30 licen… ""
## 7 RealTyWebb Ty Webb "Twitter limiting tweets, n people I ca… "Tennessee…
## 8 RealTyWebb Ty Webb "Twitter limiting tweets, n people I ca… "Tennessee…
## 9 ldo_lawrence Lawrence M… "Father. Animal lover, 🐕 rescuer #maskO… "AZ. "
## 10 Pacificnw777 Kay #MAGA "Working to #MAGA for my children. Chri… ""
## # … with 11 more rows
WARNING: grepl() matches the pattern exactly as written, and it is case-sensitive by default. Let's redo the search, but this time drop the hashtag (#) and see if there is a difference.
## Same search as above, but the pattern is "MAGA" without the hashtag
maga_users_nohashtag <- vars_to_inspect %>% filter(grepl("MAGA", description))
maga_users_nohashtag
## # A tibble: 23 x 4
## screen_name name description location
## <chr> <chr> <chr> <chr>
## 1 raybae689 RAY BAEZ "🇺🇸 #KAG #REPUBLICAN #HAPPILY MAR… "Queens, …
## 2 raybae689 RAY BAEZ "🇺🇸 #KAG #REPUBLICAN #HAPPILY MAR… "Queens, …
## 3 raybae689 RAY BAEZ "🇺🇸 #KAG #REPUBLICAN #HAPPILY MAR… "Queens, …
## 4 raybae689 RAY BAEZ "🇺🇸 #KAG #REPUBLICAN #HAPPILY MAR… "Queens, …
## 5 Nurse4Trump Nurse4Trump "#MAGA @Foxnews #BlueLivesMatter" "United S…
## 6 Brian439101… Brian Espinal "Liberty Car Service Rep.\nOsha30… ""
## 7 RealTyWebb Ty Webb "Twitter limiting tweets, n peopl… "Tennesse…
## 8 RealTyWebb Ty Webb "Twitter limiting tweets, n peopl… "Tennesse…
## 9 ldo_lawrence Lawrence MacDonald "Father. Animal lover, 🐕 rescuer … "AZ. "
## 10 HambySr X chet hamby sr. … "SILENCE IN THE FACE OF EVIL IS … "Provo, U…
## # … with 13 more rows
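Dropping the hashtag makes the pattern broader: "MAGA" also matches inside longer words. If you only want MAGA as a whole word, one option is a word-boundary pattern (\\b, used here with perl = TRUE), optionally combined with ignore.case = TRUE. A sketch on invented strings:

```r
# Hypothetical descriptions (invented for illustration)
descriptions <- c("#MAGA", "MAGA2020", "KeepMAGA", "Proud MAGA!", "maga")

grepl("MAGA", descriptions)                     # plain substring match
grepl("\\bMAGA\\b", descriptions, perl = TRUE)  # MAGA as a whole word only
grepl("\\bMAGA\\b", descriptions, perl = TRUE, ignore.case = TRUE)
```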
How many retweeters have #MAGA in their profile description?
## Count the matching rows, then express them as a percentage of all retweets returned
count(maga_users)
## # A tibble: 1 x 1
## n
## <int>
## 1 21
count(maga_users_nohashtag)
## # A tibble: 1 x 1
## n
## <int>
## 1 23
count(maga_users) / count(test_tweet) * 100
## n
## 1 23.86364
count(maga_users_nohashtag) / count(test_tweet) * 100
## n
## 1 26.13636
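The same percentages can be computed with mean() on the logical vector from grepl(). (The percentages above imply that test_tweet holds 88 rows, fewer than the n = 100 requested.) The output earlier lists some screen names, for example raybae689, more than once, so a row-based percentage counts those accounts several times; counting distinct users is an alternative. A sketch on hypothetical mock data:

```r
# Hypothetical mock data: user_a appears twice, like the repeated
# screen names in the real output
retweets_demo <- data.frame(
  screen_name = c("user_a", "user_a", "user_b", "user_c"),
  description = c("#MAGA #KAG", "#MAGA #KAG", "MAGA fan", "Dog lover"),
  stringsAsFactors = FALSE
)

has_tag <- grepl("#MAGA", retweets_demo$description)

# Share of rows (retweets); mean() of a logical vector is the proportion of TRUEs
pct_rows <- 100 * mean(has_tag)

# Share of distinct users, so each account is counted once
tagged_users <- unique(retweets_demo$screen_name[has_tag])
pct_users <- 100 * length(tagged_users) / length(unique(retweets_demo$screen_name))

c(rows = pct_rows, users = pct_users)
```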
From the results we can see that searching for the term #MAGA versus MAGA gives different results (21 vs. 23 matches) because of the hashtag. Note that the search without the hashtag also returns every description that contains the hashtag, since "#MAGA" contains "MAGA"; its 23 matches include the 21 from the first search.
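The superset relationship can be checked directly: every description that matches "#MAGA" must also match "MAGA", and the difference between the two searches is the set of rows matched only by the broader pattern. A sketch on invented strings:

```r
# Hypothetical descriptions (invented for illustration)
descriptions <- c("#MAGA @Foxnews", "Working to #MAGA", "MAGA 2020", "Dog lover")

with_hashtag <- grepl("#MAGA", descriptions)
without_hashtag <- grepl("MAGA", descriptions)

# Every "#MAGA" match is also a "MAGA" match
all(without_hashtag[with_hashtag])

# Rows picked up only by the broader pattern
descriptions[without_hashtag & !with_hashtag]
```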