Sunday, March 15, 2015

Popular opinion and the reception of the microbiome as an emerging field according to data mining on Twitter

The study of the human microbiome is a relatively new field of science, one that has changed our understanding of the prevailing obesity epidemic and unveiled clues to the origins of diabetes and rheumatoid arthritis. While scientists publish research on links between colon cancer and our native microbiota, how is public opinion evolving?

I chose this question as the foundation of my Twitter data mining project. Fair warning: the people who post and read on Twitter are not a random or representative sample of the population. Nevertheless, we can extract some interesting information about what this particular sample cares about.

First, I wrote a script to stream and capture live tweets using the Twitter API through Tweepy and its StreamListener class (you can find full documentation on my GitHub account). A short snippet of my script:
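A minimal sketch of what such a capture script could look like, assuming the Tweepy 3.x interface (`tweepy.StreamListener` and `tweepy.Stream`), which later Tweepy releases removed. The names `save_tweet`, `stream_to_file`, and `FileListener` are illustrative and not taken from the original script, and the output format (one JSON object per line) is an assumption.

```python
import json


def save_tweet(raw_json, path):
    """Append one raw stream message (a JSON string) to a line-delimited file.

    Returns True if the message parsed as JSON and was written, else False.
    """
    try:
        message = json.loads(raw_json)
    except ValueError:
        return False  # blank keep-alive lines and truncated payloads land here
    with open(path, "a", encoding="utf-8") as handle:
        # Re-serialize so every message occupies exactly one line.
        handle.write(json.dumps(message) + "\n")
    return True


def stream_to_file(auth, keywords, path):
    """Capture tweets matching `keywords` until interrupted (Tweepy 3.x only)."""
    import tweepy  # imported lazily so save_tweet works without Tweepy installed

    class FileListener(tweepy.StreamListener):
        def on_data(self, data):
            save_tweet(data, path)
            return True  # keep the stream open

        def on_error(self, status_code):
            # Returning False disconnects; 420 signals rate limiting.
            return status_code != 420

    tweepy.Stream(auth, FileListener()).filter(track=keywords)
```

Calling `stream_to_file(auth, ["microbiome"], "tweets.jsonl")` with a `tweepy.OAuthHandler` would start the capture; the tracked keyword and the file name here are assumptions.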


After 24 hours of streaming, I had enough data to parse. The data was formatted in JSON and looked like this:
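As a rough illustration of working with this kind of data, the sketch below parses line-delimited JSON and keeps only real tweets. It assumes the standard Twitter API v1.1 tweet fields `text`, `user.screen_name`, and `user.followers_count`. The sample record is hypothetical and heavily trimmed, and `load_tweets` is an illustrative name.

```python
import json


def load_tweets(lines):
    """Parse an iterable of JSON lines, keeping only tweet records."""
    tweets = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # the stream emits blank keep-alive lines
        try:
            record = json.loads(line)
        except ValueError:
            continue  # skip truncated or corrupt lines
        # Non-tweet messages (e.g. rate-limit notices) have no text or user.
        if "text" in record and "user" in record:
            tweets.append(record)
    return tweets


# Hypothetical record, trimmed to the fields used in this post.
sample = (
    '{"text": "Your skin microbiome matters", '
    '"user": {"screen_name": "example_user", "followers_count": 120}}'
)
tweets = load_tweets([sample, "", '{"limit": {"track": 5}}'])
print(tweets[0]["text"], tweets[0]["user"]["followers_count"])
```

A real tweet object carries many more fields (timestamps, entities, retweet information), which this analysis does not need.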



Personally, I am most interested in scientific studies linking our environment and behaviors to disease and its prevention. But was this what Twitter users were interested in? Using the regular expressions module, it was clear that the word "diet" drew much more interest than "diabetes." A portion of the analysis is below:
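A keyword check of this kind can be sketched with Python's `re` module. Whole-word, case-insensitive matching is an assumption here (the original analysis may have matched differently), and `mentions` and `count_mentions` are illustrative names.

```python
import re


def mentions(term, text):
    """Return True if `term` appears as a whole word in `text`, ignoring case."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def count_mentions(terms, texts):
    """Count how many texts mention each term at least once."""
    return {term: sum(mentions(term, text) for text in texts) for term in terms}


# Hypothetical tweet texts, for illustration only.
texts = ["Diet and the gut microbiome", "Skin microbiome", "dietary fiber"]
print(count_mentions(["diabetes", "diet", "skin"], texts))
```

With whole-word matching, "dietary" does not count as a mention of "diet"; dropping the `\b` anchors would make the match looser.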


If people were so concerned with diet, perhaps other appearance-oriented terms would be even more popular?

Using matplotlib, I plotted the percentage of tweets that contained the terms 'diabetes,' 'diet,' and 'skin.' 'Diabetes' did not appear in a single tweet, while 'skin' was the most popular term.

Turns out, that was exactly the case. There were zero tweets containing the word "diabetes" (or "cancer," "arthritis," or "c-diff," for that matter). However, 4% of tweets contained the word "diet." But "skin" takes the cake: it appeared in over 26% of the tweets. It seems that people are more interested in the cosmetic implications of the microbiome.
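Percentages like these can be computed from per-term counts and charted with matplotlib roughly as follows. This is a sketch, not the original plotting code: the function names are illustrative, and the counts in the example are hypothetical rather than the figures reported above.

```python
def to_percentages(counts, total):
    """Convert per-term tweet counts into percentages of all captured tweets."""
    if total == 0:
        return {term: 0.0 for term in counts}
    return {term: 100.0 * n / total for term, n in counts.items()}


def plot_percentages(percentages, path):
    """Save a bar chart of term percentages to `path`."""
    import matplotlib  # imported here so to_percentages has no dependencies

    matplotlib.use("Agg")  # non-interactive backend that writes to a file
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bar(list(percentages.keys()), list(percentages.values()))
    ax.set_xlabel("Term")
    ax.set_ylabel("Tweets containing term (%)")
    fig.savefig(path)
    plt.close(fig)


# Hypothetical counts, for illustration only.
counts = {"diabetes": 0, "diet": 1, "skin": 2}
print(to_percentages(counts, total=8))
```

Calling `plot_percentages(to_percentages(counts, 8), "terms.png")` would write the chart to disk.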

What interested me next was the reach of these tweets. Even if more tweets were about "skin," how many people were seeing them? It was possible that more people tweeted about skin, but that the individuals who tweeted about "diet" had more influence. Quantity is not everything. So how many followers did the posters who wrote about each topic have?

This bar chart represents the reach of each of the terms 'diabetes,' 'skin,' and 'diet.' I extracted the follower count for each Twitter user whose post contained the specific keyword. Each color within a single bar represents a user who tweeted the given term. The sum of all those followers gives the total audience for a term in the given time frame, which is the full height of its bar.
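The reach calculation described above could be sketched as below, assuming each tweet carries `text`, `user.screen_name`, and `user.followers_count` as in the Twitter API v1.1. Counting each user once per term is an assumption, and `reach_by_term`, `total_reach`, and `plot_reach` are illustrative names.

```python
import re


def reach_by_term(terms, tweets):
    """Map each term to {screen_name: followers_count} for users who tweeted it."""
    reach = {term: {} for term in terms}
    for tweet in tweets:
        user = tweet["user"]
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            if re.search(pattern, tweet["text"], flags=re.IGNORECASE):
                # Keyed by user, so someone who tweets twice is counted once.
                reach[term][user["screen_name"]] = user["followers_count"]
    return reach


def total_reach(reach):
    """Sum follower counts per term (the full height of each stacked bar)."""
    return {term: sum(users.values()) for term, users in reach.items()}


def plot_reach(reach, path):
    """Save a stacked bar chart with one segment per user to `path`."""
    import matplotlib  # imported here so the functions above need no extras

    matplotlib.use("Agg")  # non-interactive backend that writes to a file
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for term, users in reach.items():
        bottom = 0
        for followers in users.values():
            # Each call draws one user's segment on top of the previous ones.
            ax.bar(term, followers, bottom=bottom)
            bottom += followers
    ax.set_ylabel("Total followers")
    fig.savefig(path)
    plt.close(fig)
```

Summing follower counts treats every follower as a distinct reader, so accounts with overlapping audiences will overstate the true reach.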

Diabetes had zero tweets, so it follows that there wouldn't be an audience. However, it is clear that while far more tweets were about "skin," the word "diet" had much more reach. Most of those followers came from the account "MindBodyGreen," a green lifestyle blog pledging to "revitalize the way people eat, move and live."
