StreamR – filterStream()

Why use filterStream() from the streamR package rather than the searchTwitter() function from the twitteR package? Well, because they use different APIs and therefore return different results.

Disclaimer: I’m miles from being an API connoisseur, so please let me know if and where I am wrong.

Featured package: streamR.

The twitteR package’s searchTwitter() function uses the GET search/tweets endpoint of the Search API – the documentation states the following: “Please note that Twitter’s search service and, by extension, the Search API is not meant to be an exhaustive source of Tweets. Not all Tweets will be indexed or made available via the search interface.”

Meanwhile, I do not see any such statement in the Streaming API documentation. I assume it returns more tweets, in near real time, rather than a sample of past tweets – I believe there is a 30-second cache on Twitter’s API.

It is tricky to properly leverage the Streaming API from R, as the stream has to remain open while the data is being processed. What I suggest below is to open the stream for X minutes, then close it and process the tweets.

#Initialise
libs <- c("streamR", "ROAuth")
lapply(libs, library, character.only=TRUE)

#OAuth
requestURL <- "https://api.twitter.com/oauth/request_token"
accessURL <- "https://api.twitter.com/oauth/access_token"
authURL <- "https://api.twitter.com/oauth/authorize"
my_oauth <- OAuthFactory$new(consumerKey = "your_consumer_key", consumerSecret = "your_consumer_secret",
    requestURL = requestURL, accessURL = accessURL, authURL = authURL)

#download cacert.pem
download.file(url="http://curl.haxx.se/ca/cacert.pem", destfile="cacert.pem")

#Register OAuth
my_oauth$handshake(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"))

#Save OAuth for future sessions (use load)
save(my_oauth, file = "my_oauth.Rdata")
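
In a later session you can skip the handshake and simply reload the saved credentials – a minimal sketch, assuming my_oauth.Rdata sits in the working directory:

#Load previously saved OAuth credentials (no new handshake needed)
load("my_oauth.Rdata")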

## date when loop will be stopped
end.date <- as.Date("2014-10-02")

## continue running until current date is end.date
while (Sys.Date() < end.date){
  current.time <- format(Sys.time(), "%Y_%m_%d_%H_%M")  # timestamp used in the file name
  file.name <- paste("keyword_", current.time, ".json", sep="")
  # locations are longitude/latitude pairs, south-west corner first (here: a rough UK bounding box)
  # note: the streaming API combines track and locations with OR, so tweets matching either are returned
  filterStream( file=file.name, track="keyword", locations = c(-11, 49, 1, 61),
                oauth=my_oauth, timeout=3600) ## capture tweets for 3600 seconds = 1 hour
}

If you then want to parse the tweets, simply use parseTweets() from the streamR package.

df <- parseTweets(your_json_file, simplify = TRUE)
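
Since the loop above writes one .json file per hour, you may end up with several files. A minimal sketch for parsing them all into a single data frame, assuming the keyword_*.json naming pattern used above and that every file actually contains tweets (see the discussion below about empty files):

#List all JSON files written by the loop
json.files <- list.files(pattern = "^keyword_.*\\.json$")

#Parse each file and bind the resulting data frames together
tweet.dfs <- lapply(json.files, parseTweets, simplify = TRUE)
df <- do.call(rbind, tweet.dfs)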

Now stream(ish) Tweets to Google Earth using this method


Discussion

2 thoughts on “StreamR – filterStream()”

  1. Nice tutorial, but when I parse the tweet file I get an error: “Error in parseTweets("bbcqt.json", simplify = TRUE) :
    "bbcqt.json" did not contain any tweets. See ?parseTweets for more details.”


    Posted by HanaAnber | May 15, 2015, 8:53 am
  2. Hi Hana, sorry for the late reply.

    This simply means the .json is empty. I reproduced it using a term very unlikely to return results (track="ujsdkfasd"). Though no tweets are returned, the file is still written.

    # trying to read the file with rjson returns the same error
    > library(rjson)
    > fromJSON(file="my_tweets.json")
    Error in fromJSON(file = "my_tweets.json") : no data to parse

    What do you see if you open the .json with a text editor? It is likely empty.
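
    You can also check the file from R before calling parseTweets() – a zero-byte file means the stream captured nothing:

    # size of the file in bytes; 0 (or NA if the file is missing) means nothing to parse
    file.info("my_tweets.json")$size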


    Posted by SocialFunction() | May 19, 2015, 3:10 pm
