I chose to use this question as the foundation of my Twitter data mining project. Fair warning: the people who post and read on Twitter are not a random or representative sample of the population. Nevertheless, we can extract some interesting information about what this particular sample cares about.
First, I wrote a script to live-stream and capture tweets using the Twitter API and Tweepy's StreamListener class (you can find full documentation on my GitHub account). A short snippet of my script:
Personally, I am most interested in scientific studies linking our environment and behaviors to disease and its prevention. But was this what Twitter users were interested in? Searching the tweet text with Python's regular expressions module made it clear that the word "diet" was of much more interest than the topic "diabetes." A portion of the analysis is below:
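As a minimal sketch of this kind of term matching (not the original script), the example below checks each tweet's text for a whole-word, case-insensitive match and reports the share of tweets containing each term. The helper name word_in_text, the whole-word matching rule, and the four sample tweets are all assumptions made up for illustration; the only thing taken from the captured data is the idea that each tweet carries a "text" field.

```python
import re


def word_in_text(word, text):
    """Return True if `word` occurs as a whole word in `text`, ignoring case."""
    # re.escape guards against terms that contain regex metacharacters.
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


# Made-up sample tweets for illustration only (not the captured data).
sample_tweets = [
    {"text": "Your skin microbiome and your moisturizer"},
    {"text": "How DIET shapes the gut microbiome"},
    {"text": "Skin bacteria are friends, not foes"},
    {"text": "New microbiome study out today"},
]

terms = ["diabetes", "diet", "skin"]

# Number of tweets mentioning each term, then the same as a percentage.
counts = {
    term: sum(word_in_text(term, tweet["text"]) for tweet in sample_tweets)
    for term in terms
}
percents = {term: 100.0 * counts[term] / len(sample_tweets) for term in terms}

print(counts)
print(percents)
```

Because of the word boundaries, "diet" here does not match "dietary." Dropping the \b anchors would count such variants too, so the choice changes the percentages.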
If people were so concerned with diet, perhaps other appearance-oriented terms would win in popularity?
Using matplotlib, I plotted the percent of tweets that contained the terms 'diabetes,' 'diet,' and 'skin.' Diabetes was not present in a single tweet. However, 'skin' was the most popular term.
It turns out that was exactly the case. There were zero tweets containing the word "diabetes" (or "cancer," "arthritis," or "C. diff," for that matter). However, 4% of tweets contained the word "diet." But the word "skin" takes the cake: it appeared in over 26% of the tweets. It seems that people are more interested in the cosmetic implications of the microbiome.
What interested me next was what kind of reach these tweets had. Even if more tweets were about "skin," how many people were seeing these tweets? It was possible that more people tweeted about skin, but that the individuals who tweeted about "diet" had more influence. Quantity is not everything. So how many followers did the posters who wrote about each topic have?
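One rough way to put a number on reach is to add up the follower counts of the accounts that mentioned each term. The sketch below assumes each captured tweet keeps the standard "user" object with "screen_name" and "followers_count" fields; the function name reach_by_term, the choice to count each account only once, and the sample data are illustrative assumptions rather than the original analysis.

```python
import re


def mentions(term, text):
    """Whole-word, case-insensitive check for `term` in `text`."""
    return re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE) is not None


def reach_by_term(tweets, terms):
    """Sum the followers of the distinct accounts whose tweets mention each term."""
    followers = {term: {} for term in terms}
    for tweet in tweets:
        user = tweet["user"]
        for term in terms:
            if mentions(term, tweet["text"]):
                # Key by screen name so an account that tweets twice counts once.
                followers[term][user["screen_name"]] = user["followers_count"]
    return {term: sum(accounts.values()) for term, accounts in followers.items()}


# Made-up sample for illustration only (not the captured data).
sample = [
    {"text": "Skin care and the microbiome",
     "user": {"screen_name": "a", "followers_count": 100}},
    {"text": "More on skin bacteria",
     "user": {"screen_name": "a", "followers_count": 100}},
    {"text": "Diet and gut bugs",
     "user": {"screen_name": "b", "followers_count": 5000}},
    {"text": "skin deep",
     "user": {"screen_name": "c", "followers_count": 50}},
]

reach = reach_by_term(sample, ["diabetes", "diet", "skin"])
print(reach)
```

Summing followers gives an upper bound on audience size, since people who follow several of these accounts are counted more than once, and a single large account can dominate the total.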
Diabetes had zero tweets, so it follows that there wouldn't be an audience. However, it is clear that while far more tweets mentioned "skin," the word "diet" had much more reach. The majority of followers came from the account "MindBodyGreen," a green lifestyle blog pledging to "revitalize the way people eat, move and live."