How to drive a Web browser with R (and RSelenium)

With just a few snippets of code, automate your Web scraping and app testing

RSelenium may be one of the least known of R's most helpful packages. Why is it useful? Just a few lines of code will drive a Web browser for tasks that might otherwise need tedious manual pointing and clicking. That's handy both for testing Web applications and for collecting data from multiple Web pages.

If you're already somewhat familiar with this package, you can scroll down to see a reference chart of various tasks and RSelenium code to accomplish them. If not, read on for a beginner's guide to RSelenium.

RSelenium is an R interface to Selenium 2.0 WebDriver, a project designed for automated testing of Web applications. (There are bindings for a number of other languages, such as Java and Python, if you prefer another platform.)

To use Selenium in R, you'll need R installed on your system; I also recommend using the RStudio IDE. (Need to learn R basics? Download our free Beginner's Guide to R PDF.) Then, you'll need to 1) install the RSelenium package, if it's not already on your system, with install.packages("RSelenium") and 2) load it into your current R session with library("RSelenium").
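The two setup steps can be combined so the package is installed only when it is missing. This is a minimal sketch; the repos URL (the CRAN cloud mirror) is my own choice so that the call also works in a non-interactive session, and you can drop it if your R session already has a mirror configured.

```r
# Install RSelenium only if it is not already available, then load it.
# The repos argument is an assumption: any CRAN mirror will do.
if (!requireNamespace("RSelenium", quietly = TRUE)) {
  install.packages("RSelenium", repos = "https://cloud.r-project.org")
}
library("RSelenium")
```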

The next step is to start a Selenium server with startServer(). If this is your first time running RSelenium after installation, it's possible you'll get this error: "No Selenium Server binary exists. Run checkForServer or start server manually." If so, simply follow that instruction and run checkForServer() to download and install the server software. Now, try running startServer() again.
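If you'd rather not handle that error by hand, the retry can be scripted. The sketch below assumes the RSelenium version described here, in which startServer() and checkForServer() are both available and a missing binary is reported as an ordinary R error. The helper name start_selenium is hypothetical, and note that it retries after any error, not only the missing-binary one.

```r
# Hypothetical helper: try to start the server; on failure, download the
# server binary and try once more. The defaults refer to the RSelenium
# functions named in the article and are only looked up when actually used.
start_selenium <- function(start = RSelenium::startServer,
                           fetch = RSelenium::checkForServer) {
  tryCatch(
    start(),
    error = function(e) {
      message("Server start failed: ", conditionMessage(e))
      fetch()  # download and install the server software
      start()  # second attempt; an error here is passed on to the caller
    }
  )
}
```

Calling start_selenium() with no arguments then takes the place of the manual startServer(), checkForServer(), startServer() sequence.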

You'll need to choose which browser to control with your R code. I generally use the default, Firefox, which is easiest to fire up (for Web scraping it doesn't matter; for application testing, you'll likely want to use multiple browsers one by one). You can name your browser object anything; I'll call it mybrowser and create it using remoteDriver:

mybrowser <- remoteDriver()

Aside: For those familiar with object-oriented programming, mybrowser is an object in the conventional sense -- it was instantiated from the remoteDriver class and as such has access to numerous methods, which are accessed using the somewhat-unusual-for-R format myobject$mymethod().
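To see the myobject$mymethod() style in isolation, here is a toy example built with R's reference classes from the methods package. The Counter class is purely illustrative and has nothing to do with RSelenium itself; it only shows how an object carries its own methods and state.

```r
library(methods)

# A toy reference class with one field and one method (illustrative only)
Counter <- setRefClass(
  "Counter",
  fields = list(count = "numeric"),
  methods = list(
    add = function(n = 1) {
      count <<- count + n   # modifies the object in place
      invisible(.self)
    }
  )
)

ctr <- Counter$new(count = 0)  # instantiate, as remoteDriver() does for a browser
ctr$add()                      # call a method with object$method()
ctr$add(2)
ctr$count                      # 3
```

Calls such as mybrowser$open() follow the same pattern: the method acts on the object it belongs to, so there is no need to pass the browser as an argument.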

Now it's time to use this browser object to actually do something. To launch Firefox on your computer with the mybrowser object, run mybrowser$open(). If you get an "Undefined error in RCurl call" error message, this StackOverflow thread has a couple of suggestions. On my Mac, it turned out to be a security issue: the Mac would not allow an R script to open an unapproved Java file that had been downloaded from the Internet. After I downloaded the standalone server into the same directory as my script and manually clicked to open it once, mybrowser$open() worked fine every other time I ran the script.

For a simple example of interacting with a form, head to the National Weather Service page by running mybrowser$navigate("http://www.weather.gov"). Entering text into an HTML form -- useful for logging into a website as well as checking the local forecast -- is a two-step process: 1) Create a variable that identifies the text-input box to the browser and 2) Send text to that variable.

To enter a ZIP code into the "Local forecast by 'City, St' or ZIP code" box, we need to know how to identify that box -- by name, CSS or XPath. SelectorGadget is a separate, great tool for this, and I've got more details on that tool in Web scraping with R and rvest (includes video and code).

It turns out that the weather.gov search box has a simple CSS ID of #inputstring. Step 1: Create a variable that holds information about that box (you can name it anything; I'll call it wxbox) by running the code:

wxbox <- mybrowser$findElement(using = 'css selector', "#inputstring")

Step 2 uses the sendKeysToElement method:

wxbox$sendKeysToElement(list("01701"))

(I've chosen the ZIP code for the Computerworld main office; feel free to substitute.) I earlier used SelectorGadget to discover the Go button has an ID of #btnSearch. So first create a variable that identifies the button to the browser object with wxbutton <- mybrowser$findElement(using = 'css selector', "#btnSearch") and then click it using clickElement: wxbutton$clickElement().

If you'd rather send the ZIP code and the enter/return key in one step, the code for enter is "\uE007". To try it, first use R to have the browser go back a page with mybrowser$goBack(). You'll need to re-define the search box, because the page has reloaded since you first located the box and the old wxbox no longer points at a live element. So run wxbox <- mybrowser$findElement(using = 'css selector', "#inputstring") and then wxbox$sendKeysToElement(list("01701", "\uE007")) (or whatever ZIP code you'd like). You can see more special-key codes on the Selenium site.
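RSelenium also accepts a named key in place of the raw code, in the form list("01701", key = "enter"). The sketch below wraps the find-and-send steps in a small function so the search box is located fresh on every call; the function name search_zip and its arguments are hypothetical, and the default selector is the weather.gov ID used above.

```r
# Hypothetical helper: locate the search box and submit a ZIP code in one go.
# `browser` is expected to be an opened remoteDriver object such as mybrowser.
search_zip <- function(browser, zip, selector = "#inputstring") {
  # Look the box up each time so a page reload cannot leave us with a stale one
  box <- browser$findElement(using = "css selector", selector)
  # key = "enter" is the named equivalent of sending "\uE007"
  box$sendKeysToElement(list(zip, key = "enter"))
  invisible(box)
}
```

With a browser already open on the weather.gov home page, search_zip(mybrowser, "01701") would replace the re-define and send steps.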

Here's the code in full:

# Load the package and start the Selenium server
library("RSelenium")
startServer()
# Create the browser object, launch Firefox and load the page
mybrowser <- remoteDriver()
mybrowser$open()
mybrowser$navigate("http://www.weather.gov")
# Find the search box, type a ZIP code, then find and click the Go button
wxbox <- mybrowser$findElement(using = 'css selector', "#inputstring")
wxbox$sendKeysToElement(list("01701"))
wxbutton <- mybrowser$findElement(using = 'css selector', "#btnSearch")
wxbutton$clickElement()
# Go back, find the search box again, then send the ZIP code plus the enter key
mybrowser$goBack()
wxbox <- mybrowser$findElement(using = 'css selector', "#inputstring")
wxbox$sendKeysToElement(list("01701", "\uE007"))

There are many more things you can do with RSelenium, including highlighting elements on the page, and viewing and deleting cookies. See the searchable chart below for a list of common tasks and code needed to do them.

To learn more about RSelenium, scroll down past the chart and watch a webinar that RSelenium creator John Harrison recorded last year for the Orange County R User Group. Or, once you've loaded RSelenium, run help(package="RSelenium") to view all the function help files or vignette('RSelenium-basics') to see the package's starter vignette. The Testing Shiny Apps vignette is also a useful guide for testing any kind of Web application using RSelenium and the testthat package. RSelenium's home page is at http://ropensci.github.io/RSelenium/, where there are some additional resources.

 

Web automation tasks and how to do them with RSelenium

Task | Function/method | Code format | Note
--- | --- | --- | ---
Run Selenium server | startServer | startServer() | Required before anything else if running an RSelenium session on your local machine. If you don't have the server on your machine, run checkForServer() first.
Create a browser object | remoteDriver | mybrowser <- remoteDriver(remoteServerAddr = "localhost", port = 4444, browserName = "firefox") | mybrowser <- remoteDriver() is often sufficient if you want to accept the defaults for Firefox. Using other browsers can require additional installations and setup; see details. Creating this mybrowser object is necessary before you can do any automated Web browsing.
Launch a browser window | open | mybrowser$open() | This is required before you can navigate to a URL.
Navigate to a URL | navigate | mybrowser$navigate("http://www.theurl.com") |
Back button equivalent | goBack | mybrowser$goBack() | Navigates to previous URL.
Forward button | goForward | mybrowser$goForward() | Navigates to next URL if/after browser has gone back in browsing history.
Refresh current page | refresh | mybrowser$refresh() |
View screenshot | screenshot | mybrowser$screenshot(display = TRUE) |
Save screenshot | screenshot | b64out <- mybrowser$screenshot(); writeBin(base64Decode(b64out, "raw"), 'nameoffile.png') | Run the two statements in order. This captures and saves an entire web page, not just the portion viewable in the open browser window. base64Decode comes from the RCurl package.
Find element on page by CSS id | findElement | webel <- mybrowser$findElement(using = 'id', value="myid") | myid is the specific ID you're seeking as a character string (without a #).
Find element(s) on page by CSS class | findElement or findElements | webels <- mybrowser$findElements(using = 'class name', "myclass") | myclass is the specific class you're seeking as a character string.
Find element on page by CSS selector | findElement or findElements | webels <- mybrowser$findElements(using = 'css selector', "myselector") | myselector is a CSS selector as a character string. Example: Search results from Google on a page could be found with links <- mybrowser$findElements(using = 'css selector', "li.g h3.r").
Highlight one element you selected on a page | highlightElement | webel$highlightElement() | Useful to see if you've selected what you think you did with findElement. For multiple items selected with findElements, use the format lapply(webels, function(x){x$highlightElement()}).
Get text of an element (after it is located on the page with findElement and stored in a variable) | getElementText | webel$getElementText() |
Find all links on page | findElements | links <- mybrowser$findElements(using = 'css selector', "a") | Unless you narrow down the CSS selector to something besides a, you'll likely get too much returned: navigation links and so on.
Get text of links after they're discovered on page with findElements | getElementText | linktext <- unlist(lapply(links, function(x){x$getElementText()})) |
Find element on page by name and store in a variable | findElement or findElements | webel <- mybrowser$findElement(using = 'name', "myname") | myname is a specific name as a character string, such as "q".
Find element on page using xpath and store in a variable | findElement or findElements | webel <- mybrowser$findElement(using = "xpath", "myxpath") | myxpath is an xpath selector as a character string.
Click an element after identifying and saving it | clickElement | webel$clickElement() |
Change text in an element | sendKeysToElement | webel$sendKeysToElement(list("Text I want to send")) | Special keys such as enter can be sent with webel$sendKeysToElement(list("My search term", key = "enter")). See a list of available special keys such as enter, return, alt and control by typing names(unlist(selKeys)) at the R command prompt.
View cookies | getAllCookies | mybrowser$getAllCookies() | Returns list.
Get names of all cookies | getAllCookies with sapply and name | sapply(mybrowser$getAllCookies(), "[[", 'name') |
Delete cookie by name | deleteCookieNamed | mybrowser$deleteCookieNamed("cookiename") |
Close browser window | close | mybrowser$close() |
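
Several rows in the chart can be combined for a typical scraping job: find links, pull out their text and collect the results in a data frame. The sketch below is one way to do that, not the only one. The function names link_table and first_or_na are hypothetical, and the call to getElementAttribute("href") uses a web-element method that does not appear in the chart above.

```r
# Hypothetical helper: return the first value of a result as a string,
# or NA if the element gave back nothing (for example, a link with no href)
first_or_na <- function(x) {
  x <- unlist(x)
  if (length(x) == 0) NA_character_ else as.character(x[[1]])
}

# Hypothetical helper: collect link text and targets into a data frame.
# `browser` is expected to be an opened remoteDriver object such as mybrowser.
link_table <- function(browser, selector = "a") {
  links <- browser$findElements(using = "css selector", selector)
  data.frame(
    text = vapply(links, function(x) first_or_na(x$getElementText()),
                  character(1)),
    href = vapply(links, function(x) first_or_na(x$getElementAttribute("href")),
                  character(1)),
    stringsAsFactors = FALSE
  )
}
```

Passing a narrower selector, as in link_table(mybrowser, "li.g h3.r a"), keeps navigation links out of the result; that particular selector is only an illustration based on the Google example in the chart.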

This story, "How to drive a Web browser with R (and RSelenium)" was originally published by Computerworld.
