January 27, 2010

Shorty Awards Audit

After reading the sordid tale of ballot stuffing via Twitter over at Bad Astronomy, I wondered whether @mercola had the same "problem". I also wanted to know whether it was affecting @DrRachie.


To that end I wrote a Python script to "audit" Shorty Awards votes. Given a username, the script scrapes shortyawards.com for voters, then hits each voter's twitter.com profile to generate a file containing 2 columns: username and number of updates. Users with deleted accounts get -1 updates.


I have run the script for @mercola, and at the time of data collection (UTC 1100) this is the breakdown of where the votes came from:
  • deleted accounts: 348 (12.07%)
  • accounts with 1 tweet: 407 (14.11%)
  • accounts with 2 tweets: 288 (9.98%)
  • other accounts: 1838 (63.71%)
  • total: 2885


The discrepancy of 4 comes from users who somehow managed to have no tweets: I suspect the account was deleted, then recreated. These 4 users were: bugoff48, budsgirl54, tracyaustin, janesperr.
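For anyone wanting to reproduce a breakdown like the one above, here is a minimal sketch of the tally. It is not the original audit script: the function names and bucket labels are my own, and it assumes the data is already loaded as (username, updates) pairs, with -1 marking a deleted account as described earlier. Accounts with 0 tweets get their own bucket, which is where the discrepancy shows up.

```python
from collections import Counter


def categorise(updates):
    """Map an update count to a bucket (labels are illustrative, not official)."""
    if updates == -1:
        return "deleted accounts"
    if updates == 0:
        return "accounts with no tweets"
    if updates == 1:
        return "accounts with 1 tweet"
    if updates == 2:
        return "accounts with 2 tweets"
    return "other accounts"


def breakdown(rows):
    """Return {bucket: (count, percentage of all votes)} for (username, updates) rows."""
    counts = Counter(categorise(updates) for _, updates in rows)
    total = sum(counts.values())
    return {label: (n, round(100 * n / total, 2)) for label, n in counts.items()}


if __name__ == "__main__":
    # Made-up rows, purely to show the output shape.
    demo = [("a", -1), ("b", 0), ("c", 1), ("d", 2), ("e", 2), ("f", 40)]
    for label, (n, pct) in breakdown(demo).items():
        print(f"{label}: {n} ({pct}%)")
```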


You might wonder why I took exception to users with 2 tweets. The following screenshots should suffice as an explanation:





I checked 10 users at random from those with only 2 tweets, and they were all people who created a Twitter account for the express purpose of voting in the Shorty Awards, which is against the rules.
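If you want to repeat the spot check, a random sample can be drawn along these lines. This is only a sketch: `sample_suspects` and its parameters are hypothetical names, and it again assumes (username, updates) pairs. The sampled profiles still need to be looked at by hand.

```python
import random


def sample_suspects(rows, tweet_count=2, k=10, seed=None):
    """Pick up to k distinct usernames whose update count equals tweet_count."""
    rng = random.Random(seed)  # pass a seed to make the sample repeatable
    candidates = [name for name, updates in rows if updates == tweet_count]
    return rng.sample(candidates, min(k, len(candidates)))
```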


Personally, I would say that only 64% of votes for @mercola are valid. This puts him in the lead still, but only ~300 votes in front.


Feel free to do your own analysis of the data.
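A minimal sketch for loading the data, assuming each line holds a username and an update count separated by whitespace. The exact delimiter is an assumption, so adjust the `split()` call if the file uses something else. The file name in the comment is a placeholder.

```python
def parse_audit_lines(lines):
    """Turn 'username updates' lines into (username, int) pairs, skipping blanks."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        username, updates = line.split()  # assumed whitespace-delimited
        rows.append((username, int(updates)))
    return rows


# Usage (placeholder file name):
# with open("votes.txt") as handle:
#     rows = parse_audit_lines(handle)
```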


I am still running the script for @DrRachie, so I will update when it is done. In case you are wondering why it takes so long, that's because I am being nice and rate limiting my queries :)
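Rate limiting of this kind can be as simple as pausing between requests. The sketch below shows the idea only; it is not the actual script. `fetch` stands in for whatever function looks up one profile, and the one-second default delay is an arbitrary assumption.

```python
import time


def fetch_all(usernames, fetch, delay_seconds=1.0, sleep=time.sleep):
    """Call fetch(name) for each username, pausing between consecutive calls."""
    results = []
    for index, name in enumerate(usernames):
        if index > 0:
            sleep(delay_seconds)  # wait before every request except the first
        results.append((name, fetch(name)))
    return results
```

The `sleep` parameter exists so the pause can be swapped out when testing without actually waiting.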


Update 1: realised some users were showing up twice. Removed them, recalculated, re-linked data.
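Removing repeated voters can be sketched as follows. This assumes (username, updates) pairs and keeps the first occurrence of each name. Comparing in lower case is my own choice here, made because Twitter usernames are case-insensitive.

```python
def dedupe(rows):
    """Drop repeated usernames, keeping the first occurrence and original order."""
    seen = set()
    unique = []
    for username, updates in rows:
        key = username.lower()  # treat 'Alice' and 'alice' as the same voter
        if key in seen:
            continue
        seen.add(key)
        unique.append((username, updates))
    return unique
```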


Update 2: @DrRachie's data is available! See the following.


OK, here is a breakdown of where @DrRachie's votes came from:


  • deleted accounts: 113 (6.50%)
  • accounts with 1 tweet: 41 (2.35%)
  • accounts with 2 tweets: 47 (2.70%)
  • other accounts: 1542 (88.42%)
  • total: 1744


Again there is a discrepancy, this time of a single user, Superpositional.


Just as I did for @mercola, I checked random accounts with 2 tweets. They all broke the rule. These accounts contained only tweets voting in the shorty awards.


My personal opinion is that 88% of votes for @DrRachie are valid, a percentage much higher than @mercola's.


Again, the data is available for your own analysis.


What should be done about this, I hear you ask. Personally I am happy if @mercola and @DrRachie both have their vote count adjusted accordingly.
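As a rough illustration of what that adjustment would look like, the arithmetic below counts only the "other accounts" figures from the two breakdowns above as valid. That cut-off is my personal opinion, not an official rule, and the variable names are just for this example.

```python
# "Other accounts" counts taken from the two breakdowns in this post.
valid_votes = {"@mercola": 1838, "@DrRachie": 1542}

# Gap between the two adjusted totals.
lead = valid_votes["@mercola"] - valid_votes["@DrRachie"]
print(f"@mercola would lead by {lead} votes")
```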


Update 3: I am running the same analysis for 1st and 2nd place for #music, to see if the same pattern holds. Those results will be in a new post.


Update 4: I should point out that I am aware both @mercola and @DrRachie received votes in multiple categories. But seeing as the majority of votes are in #health, I feel it would be Too Much Effort to separate the votes out. Though if enough people complain, I will fix it.


Update 5: Part 2 has been posted. It explores the question of whether 64% valid votes is the exception or the rule.


Cheers,
Steve