Getting Started with the Twitter API in R
March, 2022
Laura Moses
Overview
This is a walkthrough of getting started with Twitter's API (Application Programming Interface) to access tweets and Twitter user data. If you plan to use Twitter for multiple projects or your dissertation, consider applying for Academic Research access. There are numerous packages in many different programming languages for accessing Twitter's API, but this overview focuses on R, specifically the package rtweet.
install.packages("rtweet")
Please note that when you set up and use the API, you agree to Twitter's terms of service and data use policies. Please keep these in mind as you do research and work with Twitter data.
Connecting to the Twitter API: Obtaining and Using Access Tokens
rtweet makes it simple to access the Twitter API by connecting your Twitter user account to an application called rstats2twitter. The first time you use rtweet, your browser should automatically open a window asking you to authorize your personal Twitter account. This is fine if you're collecting a small amount of data for testing, homework, and term papers. However, if you are collecting a lot of data for a longer-term project, you should set up your own developer account, app, and associated set of keys. For more details, see the documentation or auth_setup_default().
library(rtweet)
To use the Twitter API, you need access tokens: a set of keys used to authenticate your Twitter app's access. To use the rtweet package with your own app, you need to create and obtain these tokens. rtweet's documentation contains a detailed step-by-step guide to creating a developer account and creating tokens in its vignette on API access tokens.
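Once you have a set of keys, one common way to keep them out of your scripts is to store them in environment variables (for example in your .Renviron file) and read them at run time. The sketch below only illustrates that idea: the helper read_twitter_keys() and the variable names TWITTER_API_KEY and so on are hypothetical and are not part of rtweet.

```r
# Hypothetical helper (not part of rtweet): read Twitter credentials from
# environment variables so keys never appear in a script or in version control.
# The environment variable names below are assumptions chosen for illustration.
read_twitter_keys <- function(
    vars = c(api_key       = "TWITTER_API_KEY",
             api_secret    = "TWITTER_API_SECRET",
             access_token  = "TWITTER_ACCESS_TOKEN",
             access_secret = "TWITTER_ACCESS_SECRET")) {
  # Look up every variable; unset variables come back as empty strings
  keys <- Sys.getenv(vars, unset = "")
  names(keys) <- names(vars)

  # Fail early with a clear message if any key is missing
  absent <- vars[keys == ""]
  if (length(absent) > 0) {
    stop("Missing environment variables: ", paste(absent, collapse = ", "))
  }

  # Return a named list, e.g. keys$api_key
  as.list(keys)
}
```

The resulting list can then be passed to whichever token-creation function the rtweet vignette describes for your installed version of the package.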
Searching for Tweets
Twitter imposes rate limits on the number of search results returned for many of its queries. For searching tweets, the cap is 18,000 every 15 minutes. To request more than that, set retryonratelimit = TRUE and rtweet will collect the number specified (more than 18,000) and wait for the rate limit resets for you.
Let's consider an example that searches for 5,000 tweets containing a specific hashtag, #polisci. To exclude retweets, we set the option include_rts = FALSE.
ps_tweets = search_tweets("#polisci", n=5000, include_rts=FALSE)
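To make the search rate limit concrete, the following sketch estimates how many 15-minute windows a request of a given size would need under the 18,000-tweet cap described above. The function search_wait_estimate() is a hypothetical helper written for this illustration, not part of rtweet, and it assumes that every window is used in full and that waiting only happens between windows, so it gives a lower bound rather than an exact running time.

```r
# Hypothetical helper (not part of rtweet): rough lower bound on the time
# spent waiting for rate limit resets when searching for n tweets.
# Assumes the cap from the text: 18,000 tweets per 15-minute window.
search_wait_estimate <- function(n, per_window = 18000, window_minutes = 15) {
  windows <- ceiling(n / per_window)   # number of rate limit windows needed
  resets  <- max(windows - 1, 0)       # pauses occur only between windows
  list(windows = windows, minutes_waiting = resets * window_minutes)
}

search_wait_estimate(5000)    # 1 window, no waiting
search_wait_estimate(50000)   # 3 windows, at least 30 minutes of waiting
```

A request for 5,000 tweets fits in a single window, which is why the example above does not need retryonratelimit = TRUE.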
Getting Data on a User
Another option for data collection is to get the tweets from a specific user. Note that you are limited to the most recent 3,200 tweets for any user. Below is an example that collects the 100 most recent tweets from APSA (@APSAtweets):
apsa_tweets = get_timelines("APSAtweets", n = 100)
You may also want to get the friends of a user (the accounts that they follow) and the followers (the accounts that follow the user). The following gets the user IDs for friends:
apsa_friends = get_friends("APSAtweets")
To get some basic account information on the users APSA follows:
apsa_friends_info = lookup_users(apsa_friends$user_id)
Similarly, you can get the followers for APSA:
apsa_followers = get_followers("APSAtweets", n=10000)
This returns 10,000 followers for the account. It is possible to collect all followers by setting n = Inf, retryonratelimit = TRUE, but for some accounts, like @POTUS, which has 19.3 million followers, this can take a very long time and in some cases may not be tractable.
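A rough calculation shows why collecting every follower of a very large account is slow. The sketch below assumes that the follower endpoint returns up to 5,000 user IDs per request and allows 15 requests per 15-minute window; these figures are assumptions not stated in this guide and should be checked against Twitter's current rate limit documentation. The helper followers_time_estimate() is hypothetical and not part of rtweet.

```r
# Hypothetical helper (not part of rtweet): lower bound on the hours spent
# waiting for rate limit resets when collecting all followers of an account.
# ASSUMPTIONS: 5,000 IDs per request and 15 requests per 15-minute window.
followers_time_estimate <- function(n_followers,
                                    ids_per_request = 5000,
                                    requests_per_window = 15,
                                    window_minutes = 15) {
  per_window <- ids_per_request * requests_per_window  # IDs per window
  windows    <- ceiling(n_followers / per_window)      # windows needed
  hours      <- max(windows - 1, 0) * window_minutes / 60
  c(windows = windows, hours_waiting = hours)
}

followers_time_estimate(10000)    # fits in a single window
followers_time_estimate(19.3e6)   # the @POTUS figure quoted above
```

Under these assumptions, the 19.3 million followers mentioned above would need 258 windows, or a little over 64 hours of waiting, before counting any time spent looking up account details for each ID.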
Other Resources
Twitter's Developer Platform Getting Started Guide: https://developer.twitter.com/en/docs/twitter-api/getting-started/about-twitter-api
This quick start guide is based wholly on the documentation for rtweet. More details and use case examples can be found in the documentation at: https://github.com/ropensci/rtweet
The vignette, "Intro to rtweet: Collecting Data" by Michael W. Kearney is also an excellent start guide: https://cran.r-project.org/web/packages/rtweet/vignettes/intro.html