How to search Twitter with R

InfoWorld | Jan 23, 2020

See how to search, filter, and sort tweets using R and the rtweet package. It’s a great way to follow conference hashtags. If you want to follow along with the optional code for creating clickable URL links, make sure to install the purrr package if it’s not already on your system.

Copyright © 2020 IDG Communications, Inc.

Hi. I’m Sharon Machlis at IDG, here with episode 41 of Do More With R: Track a Twitter Hashtag.

Twitter is a great source of R news – especially during conferences like useR and RStudio Conference. And thanks to R and the rtweet package, I can build my own tool to download tweets for easy searching, sorting, and filtering. Let’s take a look, step by step.

First you want to install and load any of this project’s packages you don’t already have: rtweet, reactable, glue, stringr, httpuv, and dplyr. Then to start, load rtweet and dplyr.
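A minimal sketch of the install step, using only base R. The package list comes from the transcript; the variable names `pkgs` and `missing_pkgs` are illustrative, and the `interactive()` guard is an assumption added so that a non-interactive script run does not download packages unexpectedly.

```r
# Packages named in this episode
pkgs <- c("rtweet", "reactable", "glue", "stringr", "httpuv", "dplyr")

# Which of them are not installed yet?
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))

# Install only what is missing, and only when running interactively
if (length(missing_pkgs) > 0 && interactive()) {
  install.packages(missing_pkgs)
}
```

Once everything is installed, `library(rtweet)` and `library(dplyr)` load the two packages needed first.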

To use rtweet, you need a Twitter account, because you need to authorize rtweet to use your specific account credentials. That’s because there’s a limit to how many tweets you can download in a 15-minute time period. Michael Kearney, who wrote rtweet, gives us two choices. The super easy way is to just request some tweets. If you don’t have any credentials stored, a browser window should pop up asking you to authorize the request. After that, an authorization token gets stored in your R environment file so you don’t have to re-authorize in the future! (You can go to rtweet.info to see the other way, which is to set up a developer project and get authorization credentials. If you’re going to use rtweet a lot, you’ll probably want to do that). But for now, the easy way! Which I’ll show in a minute.

To search for tweets with a specific hashtag, you use the (very intuitively named) search_tweets() function. It takes a few arguments. First is the query: something like #rstudioconf, #useR2020, or #rstats. Second is the number of tweets you want to get back. It defaults to 100. But here’s another important thing you need to know about searching tweets by keyword: unfortunately, the search only goes back around 6 to 9 days unless you pay for a premium Twitter API account. You can’t use rtweet to search today and find RStudio conference tweets from last year. You won’t be able to search two weeks after a conference to get those tweets. So you’ll want to make sure to save any tweets you might want in the future.

Another search_tweets() argument is whether you want to include retweets. For my purpose, I don’t, so I set that to FALSE.

There are more arguments you can use to customize your search, but let’s do a basic search now: 200 tweets with the #rstudioconf hashtag. See how easy that authorization was? And rtweet saved a token in my R environment file automatically, so I won’t need to authorize again in the future.
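The basic search described here might look like the sketch below. It assumes the rtweet version and Twitter API access available when this episode was recorded (early 2020); newer rtweet releases and later API changes may behave differently. The variable names `hashtag` and `n_tweets` are illustrative, and the `interactive()` guard is an assumption reflecting that the browser authorization step needs an interactive session.

```r
# Search settings from the transcript: 200 tweets, no retweets
hashtag <- "#rstudioconf"
n_tweets <- 200

if (interactive()) {
  # The first call should open a browser window asking for authorization
  tweets <- rtweet::search_tweets(
    q = hashtag,
    n = n_tweets,
    include_rts = FALSE  # leave out retweets
  )
  # Rows are tweets; columns are the fields returned for each tweet
  dim(tweets)
}
```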

Now let’s see the results.

A couple of things jump out at me. One is: I asked for two hundred tweets but got back fewer. There are a couple of possible reasons for that. One is that there may not be 200 tweets in the last 6-9 days, since I’m running this code before this year’s conference starts. Another is that Twitter may have initially extracted 200 tweets, but after filtering out retweets, fewer were left.

Another thing you might have noticed: There are 90 columns of data for each tweet! That’s a lot of metrics. Let’s take a look at the column names.

The ones I’m usually most interested in are status_id, created_at, screen_name, text, favorite_count, retweet_count, and urls_expanded_url. You might want some other columns for your analysis; but for this demo, I’ll select just those columns. That’ll make the data a bit easier to see on screen.
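Here is a sketch of inspecting the column names and keeping just those seven columns with dplyr. The one-row `tweets` tibble is a hypothetical stand-in so the example runs without calling Twitter; a real search_tweets() result has far more columns. The name `tweet_table_data` is also an assumption.

```r
library(dplyr)

# Hypothetical stand-in for a search_tweets() result
tweets <- tibble(
  user_id = "123",
  status_id = "1220000000000000000",
  created_at = as.POSIXct("2020-01-22 15:00:00", tz = "UTC"),
  screen_name = "example_user",
  text = "Looking forward to #rstudioconf",
  source = "Twitter Web App",
  favorite_count = 5L,
  retweet_count = 1L,
  urls_expanded_url = list("https://example.com/slides")
)

# See which columns are available
names(tweets)

# Keep only the columns of interest for the table
tweet_table_data <- tweets %>%
  select(status_id, created_at, screen_name, text,
         favorite_count, retweet_count, urls_expanded_url)
```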

So that’s part 1: Getting the first batch of tweets. Step 2 is doing something with them.

There are lots of interesting visualizations and analyses you can do with Twitter data and R. Some of them are built right into rtweet. But I’m doing this demo wearing my tech journalist hat. I want an easy way to see new and cool things I might not know about. So, I want to see some of the most-liked tweets from a conference. Thanks to R, I don’t have to rely on Twitter’s “popular” algorithm. I can do my own searches and set my own criteria for “popular”. Maybe search for top tweets just “today” while the conference is in progress. Or filter for a specific topic I’m interested in, like “shiny” or “purrr”, sorted by most likes or most retweets.
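Outside of a table, the same kind of "my own criteria" filtering can be sketched with dplyr and stringr. The three-row `tweets` tibble, the example date, and the result names `shiny_tweets` and `one_day` are all hypothetical; only the column names follow the rtweet columns mentioned in this episode.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for downloaded tweets
tweets <- tibble(
  screen_name = c("user_a", "user_b", "user_c"),
  text = c("New Shiny features announced",
           "Loving the purrr workshop",
           "shiny tips and tricks"),
  favorite_count = c(12L, 40L, 25L),
  retweet_count = c(3L, 10L, 8L),
  created_at = as.POSIXct(
    c("2020-01-27 09:00:00", "2020-01-28 10:30:00", "2020-01-28 14:15:00"),
    tz = "UTC"
  )
)

# Tweets mentioning "shiny" (any capitalization), most-liked first
shiny_tweets <- tweets %>%
  filter(str_detect(text, regex("shiny", ignore_case = TRUE))) %>%
  arrange(desc(favorite_count))

# Tweets from a single day, most-retweeted first
one_day <- tweets %>%
  filter(as.Date(created_at) == as.Date("2020-01-28")) %>%
  arrange(desc(retweet_count))
```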

One of the easiest ways to do these kinds of searches and sorts is with a sortable table. DT is one popular package for this. But lately I’ve been experimenting with another one: reactable.

The default reactable() is kind of blah. I’ve come up with a set of my own defaults I usually add to a table. Let me go over these.

Filterable adds search filters below each column header. Searchable adds an overall table search box that searches all the columns. Turning on bordered, striped, and highlight does what you might expect: Adds a table border, adds alternating-row color “stripes”, and highlights a row if you put a cursor on it. showSortable adds little arrow icons next to column names so users know they can click to sort. I set my defaultPageSize to 25. showPageSizeOptions lets me change the page length interactively, and then I define page size options that will show up in the drop-down menu. Finally, I set each defaultSortOrder to descending instead of ascending. If I click on the number of retweets or likes, I want to see that as “most to least”, not least to most.

Finally, there’s the columns argument. That’s a list containing a column definition for each column you want to customize. Look at the reactable help files for more details, but here I’m setting a couple of columns to have a default sort order of ascending. For the text column, I want it to display HTML as HTML so I can add clickable links; I want to set a minimum width for the column in pixels; and I’m making it resizable, so I can click and drag to make it wider or narrower. I turned off the filter boxes for favorite_count and retweet_count. That’s because unfortunately, reactable filters don’t understand that those columns are numbers, and will filter them as character strings. Don’t worry: reactable sorts number columns properly; it’s just those filter boxes that are a problem. That’s the major drawback to reactable compared with the DT package. But sorting numerically is enough for me for this purpose. So let’s go back and see what that looks like.
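The table settings described above could be written roughly as follows. This is a sketch, not necessarily the exact code shown in the video: the two-row data frame is a hypothetical stand-in, and the specific page size options, the 190-pixel minimum width, and the choice of created_at and screen_name as the ascending columns are assumptions.

```r
library(reactable)

# Hypothetical stand-in for the selected tweet columns
tweet_table_data <- data.frame(
  created_at = as.POSIXct(c("2020-01-27 09:00:00", "2020-01-28 10:30:00"),
                          tz = "UTC"),
  screen_name = c("user_a", "user_b"),
  text = c("First example tweet", "Second example tweet"),
  favorite_count = c(12L, 40L),
  retweet_count = c(3L, 10L),
  stringsAsFactors = FALSE
)

tweet_table <- reactable(
  tweet_table_data,
  filterable = TRUE,            # filter box under each column header
  searchable = TRUE,            # one search box for the whole table
  bordered = TRUE, striped = TRUE, highlight = TRUE,
  showSortable = TRUE,          # arrow icons on sortable columns
  defaultPageSize = 25,
  showPageSizeOptions = TRUE,
  pageSizeOptions = c(25, 50, 75, 100, 200),  # assumed values
  defaultSortOrder = "desc",    # most-to-least on first click
  columns = list(
    created_at = colDef(defaultSortOrder = "asc"),
    screen_name = colDef(defaultSortOrder = "asc"),
    text = colDef(html = TRUE, minWidth = 190, resizable = TRUE),
    favorite_count = colDef(filterable = FALSE),
    retweet_count = colDef(filterable = FALSE)
  )
)

tweet_table
```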

If I click on the favorite_count column, I can see the most favorites. I can click and drag on the tweet text column to make it wider or narrower.

A couple of things will make this more useful. I decided not to display images or videos in the tweet text field because my purpose here is to scan text. But sometimes it’s helpful to see the original tweet. So I created a separate column for URLs that are added to tweets. This way I can easily filter for tweets that include URLs. That’s handy if I’m trying to find tweets with links to presentations or resources. Right now, though, I can’t click to see anything. I need to create the HTML for clickable links in my R code.

So let me go back to my tweets data frame and do that. As a reminder, here’s code for choosing columns for my table. Now let me add columns with HTML. For tweet text, I want to add a small clickable something at the end where I can click to see the actual tweet on Twitter. I decided on “space greater-than-sign greater-than-sign”, although it could be any character or characters.

Look at the format of a tweet URL: it’s twitter dot com slash username slash status slash the tweet ID. Using glue, that would be the code here. If you haven’t used glue before, it’s a great package for pasting together text and variable values. You use one expression in quotation marks, and put any variable name that you want to be evaluated in braces.

The first line of code creates a link to the tweet from the status ID and user name. The next line, creating TweetLink, is the space greater-than greater-than link. And then finally I’m creating a Tweet column that includes that clickable link.
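Those three steps can be sketched with glue inside dplyr's mutate(). TweetLink and Tweet are the column names mentioned above; `Tweet_URL`, `tweets_with_links`, and the one-row sample data are hypothetical, and the exact HTML attributes are an assumption.

```r
library(dplyr)
library(glue)

# Hypothetical stand-in for one downloaded tweet
tweets <- tibble(
  status_id = "1220000000000000000",
  screen_name = "example_user",
  text = "Looking forward to #rstudioconf"
)

tweets_with_links <- tweets %>%
  mutate(
    # 1. URL of the tweet: twitter.com/username/status/tweet ID
    Tweet_URL = glue("https://twitter.com/{screen_name}/status/{status_id}"),
    # 2. A small clickable " >>" that points at that URL
    TweetLink = glue("<a href='{Tweet_URL}' target='_blank'> &gt;&gt;</a>"),
    # 3. Tweet text followed by the clickable link
    Tweet = glue("{text}{TweetLink}")
  )
```

For the link to be clickable in the table, the Tweet column needs `html = TRUE` in its reactable column definition.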

Finally, after that, I do a quick reactable with just the new Tweet column to check my links. It works! You can do something similar to make the url column clickable, although I won’t demo that to save a little time.

I’ll now consolidate my code so I’m not making three new columns for one tweet (I just did that to make it easier to explain and show the code). I’ll rename my columns to make them more user-friendly, and generate my final table.

To recap: Here I’m creating a new data frame from the original tweet data frame with the data I want. I’m selecting some columns, then adding the Tweet column with the clickable link at the end, then selecting and renaming some columns. This is my code to make clickable links from the URL column. I’m not going over it here, but you can pause the video to take a look at it. Or, head to the article associated with this video so you can copy and paste the code. It’s a bit more complex than you might expect because that column is a list column; some tweets include more than one URL.
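One way to handle a list column of URLs is sketched below with purrr. This is not the code from the video; the helper `make_url_html()` and the sample data are hypothetical, and the sketch only assumes that each element of the column is either NA or a character vector of one or more URLs.

```r
library(purrr)

# Hypothetical helper: turn the URLs of one tweet into HTML links
make_url_html <- function(urls) {
  urls <- urls[!is.na(urls)]
  if (length(urls) == 0) {
    return("")  # tweet has no URLs
  }
  links <- paste0("<a href='", urls, "' target='_blank'>", urls, "</a>")
  paste(links, collapse = ", ")  # several URLs become one string
}

# Hypothetical list column: no URL, one URL, two URLs
urls_expanded_url <- list(
  NA_character_,
  "https://example.com/slides",
  c("https://example.com/a", "https://example.com/b")
)

# One HTML string per tweet
URLs <- map_chr(urls_expanded_url, make_url_html)
```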

And here’s my formatted table with reactable. The data frame is my first argument. The rest is mostly formatting and table behavior.

Put all this code in a script and run it, and you’ve got yourself a searchable conference hashtag database!

One thing to remember: If you’re following a conference hashtag during a conference, you want to pull enough tweets to get the whole conference. So check the earliest date in your tweet data frame. If that date is after the conference started, request more tweets. If your conference hashtag has more than 18,000 tweets, as happened when I was tracking the Consumer Electronics Show, you’ll need to come up with some strategies to get the whole set. Check out the retryonratelimit argument for search_tweets() if you want to collect a whole set of conference hashtag tweets going back 6 days or less.
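The date check could be sketched like this. The sample data, the conference start date, the name `need_more`, and the request size of 30,000 are all hypothetical. The follow-up search assumes the rtweet interface from the time of this episode and only runs in an interactive session.

```r
# Hypothetical stand-in: creation times of the tweets collected so far
tweets <- data.frame(
  created_at = as.POSIXct(c("2020-01-28 08:00:00", "2020-01-29 17:45:00"),
                          tz = "UTC")
)

# Hypothetical conference start
conference_start <- as.POSIXct("2020-01-27 00:00:00", tz = "UTC")

# If the earliest tweet is later than the start, some tweets are missing
earliest <- min(tweets$created_at)
need_more <- earliest > conference_start

if (need_more && interactive()) {
  # Ask for more tweets; retryonratelimit = TRUE waits out the rate limit
  tweets <- rtweet::search_tweets(
    "#rstudioconf",
    n = 30000,
    include_rts = FALSE,
    retryonratelimit = TRUE
  )
}
```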

Finally, make sure to save your data to a local file when the conference ends. A week later and you’ll no longer have access to those tweets via search_tweets() and the Twitter API.
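A simple way to save the data is base R's saveRDS(), which keeps list columns such as urls_expanded_url intact. The sample data and file name below are hypothetical, and the sketch writes to a temporary directory; for real use, pick a permanent path.

```r
# Hypothetical stand-in for the downloaded tweets, including a list column
tweets <- data.frame(
  status_id = c("1220000000000000000", "1220000000000000001"),
  text = c("First example tweet", "Second example tweet"),
  stringsAsFactors = FALSE
)
tweets$urls_expanded_url <- list("https://example.com/slides", NA_character_)

# Save to a local file (tempdir() used here only so the sketch is harmless)
tweets_file <- file.path(tempdir(), "rstudioconf_tweets.rds")
saveRDS(tweets, tweets_file)

# Later: read the tweets back in without touching the Twitter API
restored <- readRDS(tweets_file)
```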

In a bonus article and video, I’ll demonstrate how to turn this into an interactive Shiny app.

That’s it for this episode, thanks for watching! For more R tips, head to the Do More With R page at bit-dot-ly slash do more with R, all lowercase except for the R. You can also find the Do More With R playlist on the YouTube IDG Tech Talk channel -- where you can subscribe so you never miss an episode. Hope to see you next time!