Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
I would like to get a list of usernames and full names from a specific Twitter list using R. I could not find a function for this in any package, but this code works:

library(XML)
library(httr)


url.name <- "https://twitter.com/TwitterUK/lists/premier-league-players/members"
url.get <- GET(url.name)
url.content <- content(url.get, as = "text")
pagehtml <- htmlParse(url.content)

screenNames <- xpathSApply(pagehtml, '//*/span[@class="username js-action-profile-name"]', xmlValue)
realName <- xpathSApply(pagehtml, '//*/strong[@class="fullname js-action-profile-name"]', xmlValue)

However, it only returns the first 20 values (presumably what initially appears on screen), whilst the list is much longer.

If there is an rvest solution, that would also be welcome.

cheers


1 Answer

If you want to work with Twitter from R, take a look at the twitteR package. It doesn't have a function to retrieve the information you want, but we can use its internal OAuth functions to sign the request and then send the appropriate API call ourselves. The advantage of using the API is that you don't rely on parsing the HTML page; you use the interface that Twitter provides for developers.

The code below assumes you have already authenticated using setup_twitter_oauth(). Tutorials on this are easy to find, since it is one of the package basics. Once authenticated, let's load the packages we need:

library(rjson)
library(httr)
# library(twitteR)  # should already be loaded from the authentication step

Now, to make the API call, we'll use GET, since lists/members is a GET endpoint in version 1.1 of the Twitter API. The URL has a slug parameter, which is the name of the Twitter list, and an owner_screen_name parameter, which is the account that owns the list. We'll use the internal twitteR:::get_oauth_sig() to authenticate the call.

twlist <- "premier-league-players"
twowner <- "TwitterUK"
api.url <- paste0("https://api.twitter.com/1.1/lists/members.json?slug=",
                  twlist, "&owner_screen_name=", twowner, "&count=5000")
response <- GET(api.url, config(token = twitteR:::get_oauth_sig()))
# count = 5000 is the number of members returned per result page,
# which in this case fits the whole list on one page.

This returns a JSON response which we can read using fromJSON:

response.list <- fromJSON(content(response, as = "text", encoding = "UTF-8"))
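For lists with more members than fit on one result page, responses from this endpoint are cursored: each page carries a next_cursor_str field, and a value of "0" marks the last page. The following is only a sketch of how the pages could be collected. The names collect_members and fetch_page are hypothetical (they are not part of twitteR or httr), and the fake pages stand in for real API responses so that the loop can be tried offline.

```r
# Collect every member across cursored result pages.
# fetch_page is a hypothetical function: it takes a cursor string and
# returns the parsed JSON of one lists/members response as an R list.
collect_members <- function(fetch_page) {
  cursor <- "-1"  # -1 asks the API for the first page
  users <- list()
  repeat {
    page <- fetch_page(cursor)
    users <- c(users, page$users)
    cursor <- page$next_cursor_str
    # A next cursor of "0" (or a missing one) means there are no more pages.
    if (is.null(cursor) || cursor == "0") break
  }
  users
}

# Offline demonstration with two fake pages standing in for API responses.
fake.pages <- list(
  "-1" = list(users = list(list(screen_name = "a"), list(screen_name = "b")),
              next_cursor_str = "42"),
  "42" = list(users = list(list(screen_name = "c")),
              next_cursor_str = "0")
)
all.users <- collect_members(function(cursor) fake.pages[[cursor]])
length(all.users)
```

With the request from this answer, fetch_page could presumably append "&cursor=" and the cursor value to the request URL, send it the same way, and parse the result with fromJSON.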

Now we have a list whose users element contains one entry per member of the Twitter list, each holding that account's Twitter data. To extract their names and screen names:

users.names <- sapply(response.list$users, function(i) i$name)
users.screennames <- sapply(response.list$users, function(i) i$screen_name)
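If a single table is more convenient than two separate vectors, the same fields can be gathered into a data frame. This is a minimal sketch: sample.list is a hypothetical stand-in with the same shape as the parsed response (a users element holding one list per member), and members.df is an illustrative name. vapply is used instead of sapply so that a missing or non-character field raises an error rather than silently changing the result type.

```r
# Hypothetical stand-in with the same shape as the parsed API response.
sample.list <- list(users = list(
  list(name = "Peter Crouch", screen_name = "petercrouch"),
  list(name = "barry bannan", screen_name = "bazzabannan25")
))

# One row per list member, one column per extracted field.
members.df <- data.frame(
  name = vapply(sample.list$users, function(u) u$name, character(1)),
  screen_name = vapply(sample.list$users, function(u) u$screen_name, character(1)),
  stringsAsFactors = FALSE
)
members.df
```

The two vectors from the answer, users.names and users.screennames, are not affected by this.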

Which are:

> head(users.names)
[1] "Peter Crouch"         "barry bannan"         "Jose Leonardo Ulloa "
    "Paul McShane"         "nacho monreal"        "James Ward-Prowse"
> head(users.screennames)
[1] "petercrouch"   "bazzabannan25" "Ciclone1923"   "pmacca15"
    "_nachomonreal" "Prowsey16"

The best part of this approach is that it opens up pretty much the entire Twitter API from R through an already authenticated request. You can inspect the response list and its sublists to see all the information returned by each query.
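To reuse the same pattern for other endpoints, the request URL can be assembled by a small helper. This is only a sketch in base R: build_api_url is a hypothetical name, and it assumes the endpoint follows the same https://api.twitter.com/1.1/<endpoint>.json layout as the lists/members call in this answer.

```r
# Build a version 1.1 API URL from an endpoint path and named query parameters.
# build_api_url is a hypothetical helper, not part of twitteR or httr.
build_api_url <- function(endpoint, ...) {
  params <- list(...)
  # Percent-encode each value so that special characters survive in the URL.
  values <- vapply(params,
                   function(p) utils::URLencode(as.character(p), reserved = TRUE),
                   character(1))
  query <- paste0(names(params), "=", values, collapse = "&")
  paste0("https://api.twitter.com/1.1/", endpoint, ".json?", query)
}

members.url <- build_api_url("lists/members",
                             slug = "premier-league-players",
                             owner_screen_name = "TwitterUK",
                             count = 5000)
members.url
```

The resulting string can be passed to the request function in place of the hand-built api.url. Note that as.character() switches to scientific notation for some large numbers, so very large numeric parameters are safer passed as strings.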

