Dumplings from this Panda!

Jan 24

Digg Staff Pick – A System with No Rationale Whatsoever

I have never been happy with the concept of Hot News/Breaking News, where a single Digg staff member picks a story and it stays on the front page forever, almost always making the top news. However, most of the time I do not see these picks entering “Top News in All Topics”, as the majority of the community seldom finds those stories super interesting.


Totally faulty and baseless selections for the staff picks have become very common. Several times, I have seen news already in top news being staff picked. I wanted to paste a few links to such stories where I had commented; however, my comment history does not show those comments linked to the stories (yes, I am surprised that those comments went missing).

This morning, this story entered staff picks: http://digg.com/news/technology/chengdu_j_20_china_used_downed_us_fighter_to_develop_first_stealth_jet . As bossm4n commented there, the very same story had already been a staff pick and did not get popular (for whatever reason). Now that a new submission from dailymail is available, it enters staff picks again. Seriously, how tough is it to keep an internal log of which stories have been on “Staff Picks” to prevent duplicates?


Leaving this particular example aside, my observation of the protocol on the Staff Picks is:
•    Look for links from partners and make sure big partners are given preference
•    Just so that people do not complain, occasionally (once or twice a day) – links from non-partners (example, YouTube) are picked, but from preferred submitters (Don’t ask me for definition of preferred submitters, but if you follow the pattern, you will know what I am talking about)

Nobody checks whether a story on the same topic is already active in the community, and they don’t give a sh$t if there is one. “One staff member liked this site or submitter” is the complete rationale, on a supposedly community-driven site.

And here is an example right now live on the front page:

 
See that the second story in the top news and the third story in the Staff Picks are the same.

Making big changes or introducing new features is one thing, but following proper protocols and paying attention to these details takes nothing extra except willingness from the Digg staff.

Update/Edit: I just read this post by JD.Rucker (oboy). In essence, what I am saying is no different from what he already said: http://socialnewswatch.com/where-digg-continues-to-lose-its-way-trying-to-impress-mainstream-media-that-doesnt-give-a-h17/


Nov 10

How do Digg ads attract such a high number of diggs?

No, this post is not criticizing digg, nor does it disclose anything startling. However, a misconception I had was cleared up today. I use adblock and thus rarely see ads (only when I use someone else’s computer or another browser for testing). One thing I always observed was that the diggable ads received a very high number of diggs. My guess was that, because sponsored stories could stay on the front page for as long as they wanted, they attracted a lot of diggs.

However, an email I received today from an advertising major student from UT Austin contained a more logical answer. Here is the exact explanation as received from the student:

"I just wanted to point something out that you may or may not already know, but goes to prove that Digg facilitates the promotion of their publishers.  My small discovery:  since I have not signed up for an account with Digg, I am not allowed to “digg” a non-sponsored ad— upon clicking “digg” I am asked to please sign up for an account.  However, Digg allows me to “digg” SPONSORED ads without an account and without asking me if I want to sign up for one.  Ain’t that a load of bull?!”

Given that more diggs means a lower cost to run the ads, maybe digg wants to help reduce the CPM/CPC cost incurred by its advertisers and thus encourage longer campaigns. One other explanation I could come up with is that a story with a very high number of diggs amidst stories with relatively few diggs would tempt users/visitors to click it. Whatever the reason, digg is a business and needs to do everything it can to bring in the highest income. As long as it is not deceiving its users, all is good!


Nov 4

Unofficial - Digg Bugs & Feature Requests

While we have all asked several times, there is no central place to make feature requests and bug reports on Digg. No one (users) knows which request is most asked for or which bug affects us all the most.

This is an attempt to aggregate this activity in a central and NEUTRAL location.

 Unofficial - Digg Bugs & Feature Requests


Nov 1

I am doing a Reddit AMA

Little did I expect this, but there was a request for a Reddit AMA (ask me anything) with me. Quite frankly, I was surprised to see such a request, and to see it up-voted by more than 20 people. I also see that the purpose of it was to learn/expose something about a digg “power user”. No matter what the intention was, I wanted to respond to the request and the up-votes, so here it is.


Oct 27

How Was The Traffic From Various Sources (Digg, Reddit etc.)

ltgenpanda.tumblr.com was created only around 2-3 pm on Oct 25th. The link was made public around 6:30 pm on Oct 25th. Since then, the site was on various social media sites. Some of the prominent places it was listed are:

  1. Front page of digg since 11 pm CST on Oct 25th, on the top news side bar on the digg front page from about 1-2 am CST on Oct 26th to about 5-6 pm CST on Oct 26th (most of the time at the 1st position).
  2. Several submissions on reddit, with at least one submission on the front page of reddit for most of the night of Oct 25th and the daytime of Oct 26th. However, it never reached the top; it stayed around the fifth position or lower.
  3. Front page of news.ycombinator.com
  4. Linked from several blogs including techcrunch.com

I have seen many discussions in the past about the traffic these sites can bring. Personally, I knew that gaining the #1 position in the top news section of digg can bring about 150K visits; I have experienced it several times. Here is a screen shot from my google analytics page showing how this site has done:

I never expected to write this – Reddit, you clearly have the present and the future!


Responses by Various Digg Users

Various digg users have made very nice arguments on digg. I should stress here that these are just the responses I personally liked. The link to comment on digg does not work well, so I am pasting a few here:

Response by davidtc:

"We’ve used test accounts since day one…"

Day 1 of what? Of the new algorithm on October 15th? Of V4? Day 1 of the last 4 years?

“Most importantly, we should have been forthright with our community about our testing efforts and we’ll certainly do so in the future.”

So lets hear it. Be forthright about what I am asking cause it doesn’t add up to what you are saying. Here is why.

I ask this cause on V3 we could see the “who dugg this” list (btw, bring it back). This would have been easy to spot with that list. These are obviously new accounts, so this leads me to believe this is a new thing you guys are doing which doesn’t help with your story. Why would we believe you have to make 100+ new accounts each time to test stuff?

Now that you are being “open” about it, why were the ones caught deleted? Going to make new accounts again? It doesn’t make sense to make new accounts to test a mundane thing like mass digging to see what breaks. The best reason to delete the accounts is so they can’t be monitored anymore. Now since you are going to “continue to use them” that must mean you are making more new accounts to do this. Again, deleting the caught accounts while saying you are going to continue to use test accounts is a big red flag that you don’t want them to be monitored. Why wouldn’t you want them to be monitored if you are going to be forthright about them?

It doesn’t makes sense when you look at what was found and what you said when Digg is going to be “forthright” about it now. Sorry, I’m not fully buying it.

Response by Rooper:

Okay, internal testing. Fair enough.

Why, then, did they appear to digg things only to propel them to the front page? Surely that means whether you label it as “testing” or not, you’re still gaming your own system for the benefit of a few advertisers. Why did you “test” with those domains? Why pick large publishers, rather than something more obscure?

If this is indeed legitimate testing, then there should be records (accessible from the API) that show similar patterns from previous “tests”. They’re a needle in a haystack for us to find, but *you* should be able to find them and show us.

And probably most importantly, these “tests” appear to have been performed using 150-ish dummy accounts. Is Digg really so easy to game that all you need to do is create 150 accounts and use them to digg stuff? Really?

Something isn’t right here.

Response by c_caliente:

Right. They think it’s the perfect excuse.
The program that generated the fake diggs is probably the same they use for internal testing on their dev/staging environments.
So I’m guessing they decided lately to use it on their production environment out of desperation, in order to promote articles and to create a phony sense of user activity. ( think of all the articles hitting the FP with only a handful of comments. Those are harder to fake).
No competent developer would consider running persistent tests on a production environment. Period.
Now Digg is acting as if it’s the most normal thing in the world, so they can deny their act of desperation.
They just hit a new low.

————————————————-

So, let me get this straight. You guys run tests on your production environment?
Wow.
What a convenient excuse to deny the fact that you are adding fake diggs to promote articles and to give a phony sense of user activity

Response by endersgame:

I am sorry but I just don’t quite buy it. Why would all these accounts have been created so recently? Why were so many accounts created and why was the digging activity so much and so consistent? Why was the digging enough to affect the promotion of articles to the front page on such a large scale? And finally, why were the “test” accounts deleted immediately after the digg staff was emailed about the discovery, before it was made public?

It sounds like a phony excuse to me. Why can’t we see who dugg a story? Why would you take that feature away? Why would you take away the bury option? How stupid do you think we are?

Response by bigkahunadaddy:

Yes, creating test accounts to ensure your algorithms are doing what you want them to be doing is critical. Even if that’s what you were actually doing, testing the algorithm on the primary web-facing site is pretty dumb.

I’m sure that some testing needs to be done on the primary web server, but I can’t imagine that what was being done couldn’t have been done on internal development servers. Instead of doing that, you chose to change the user experience by promoting stories to the front page that weren’t chosen by the users. Isn’t that the whole point of the site?

Either you guys did something shady or you did something stupid. Either way, it doesn’t endear you to the users that are already leaving in droves.

Response by vtbarrera:

I’d like to know more about how you guys chose which publishers to Digg. I know they’re just “test accounts”, but they seemed to have quite the impact given how much the Digg population has dwindled.

Response by dvsbastard:

These “test” accounts were the entire reason that these publisher articles made the front page (as suggested by ltgenpanda), and this has been going on for a while now. This suggests that the algorithm is completely vulnerable to such gaming (something you apparently learned)… yet you continued to run these “tests” knowing they were having a negative impact on the data making the front page of the site.

Long story short, your testing was dictating what was appearing on the live Digg front page…This hardly sounds like testing to me.

I will continue to add more as I find any.


In Response to The Digg Blog Post

Digg has responded publicly to the post I made yesterday and I appreciate their willingness to address this. I feel obligated to give my feedback/response – not in the expectation that digg will further address it.

  1. Defending my post yesterday: nothing I wrote yesterday has been proved wrong. Digg has given a “reason” for having done it.
  2. It is up to each individual whether they accept/trust the reason given.
  3. On the periphery – the reason of “testing” does seem mildly convincing; however there are numerous questions I have and I believe many of the community members will have:
    • What was the criterion on choosing the “domains” for testing?
    • Did digg believe that ONLY these big (publisher) sites like Guardian.co.uk, Dailymail.co.uk, Telegraph.co.uk etc. would game the system, and not little sites/blogs? That is simply the biggest insult (“spam vulnerabilities”) to Digg’s publishing partners.
    • Why did the testing have to include submissions by the founder and his girlfriend? Did digg seriously think someone might bot-vote for their submissions, so we need to test it?
    • The “testing” was happening for 10 days and clearly the testing was moving a lot of stories to the front page, 276 to be exact. Is it ok to move publisher stories to the front page in the name of testing? For example: The guardian.co.uk story with 1 real digg (the digg received when it was submitted) and rest of all “test” diggs made it to  top news. Is that appropriate?
    • Approx 27% of the 992 stories promoted in this “testing” period were promoted due to the “testing”
    • Within a few days of testing it was found that the site can be “exploited” easily. Should digg not have stopped the testing and returned to fixing the algo, rather than testing forever? If no one had found out, would this test have kept running indefinitely?
    • Would this “testing” have happened if people were able to see who is digging?
    • Why did digg’s response not say that they will bring back that feature, rather than writing about a few unrelated features?
  4. I have always supported digg and always will, but when I see that something is not right or appropriate I will bring it to the attention of whoever needs to know. This only shows how passionate I and several other users are about Digg.
  5. I will bring tools which make this and other types of monitoring easy for anyone interested.

Passionate Digg User,

LtGenPanda (Mohan)


Oct 26

Second Confirmation About Digg’s Involvement

While one reason to have contacted digg was to give them a chance to explain themselves, the other reason was to see if these diggs stop in the time period when ONLY digg knows about this post.

As written in my earlier post, I went public about this whole issue at exactly 6:34 PM CST.

Here is the first email I sent at 5:23 PM CST:
————————————————————————————————-
Hi XXXXXXX,

Sorry for bothering you.

Do you know how could I get in touch with you Communications Director – Michele Husak? I have a long article which makes several accusations on digg (after I found out something shocking) and would like to give the article to her to give a chance to comment before it is published.

If you could give me a phone number, it would be awesome.

Thanks

XXXXXXXXXX
——————————————————————————————————
And at 5:33 PM CST, I sent:
—————————————————————————————————-
Here is the link:

http://ltgenpanda.tumblr.com/post/1399805023/mystery-behind-the-diggs-algorithmic-mystery-tour

Runs several pages, I would appreciate a comment within about 30 mins, as I mention in the article – I fear data destruction will happen.

Thanks

XXXXXXXXXXXXXXX
————————————————————————————
And now I present you with details of the last digg activity by the fake IDs. Again, I stress here that I went public about this topic at 6:34 PM CST. From 5:33 PM to 6:34 PM no one but digg knew about this. My server time (the system used to compute the time values below) and the desktop I use to email are about 6 to 7 minutes out of sync.


So, once I told them that I have a major accusation with shocking findings, they already expected this and as soon as the link was received, within minutes this operation ceased. Thanks Digg! - for confirming.

On the technical side: Digg can only ban accounts; it cannot stop accounts from digging. So, if this had come from some outside group, digg could only have banned the accounts, not stopped them from digging.


Oct 25

Did Digg game its own system to benefit publisher partners?

Digg recently published a blog post titled “Digg’s Algorithmic Mystery Tour” on October 15th. While a Digg blog post is a normal thing, a post about the algorithm was very surprising to me. Why did Digg, which never bothers to blog about very visible changes, numerous bugs and issues, decide to blog about its secret “algorithm”? They never even verbally discussed it in public. Since the announcement, the front page stories have changed. A lot of sites that had rarely been seen on the front page of Digg since the v4 fiasco started to resurface frequently. Many diggers noticed this and there has been a lot of chatter about it. Many diggers wondered how such a small ‘tweak’ to the algorithm could cause so much change to the front page. I wondered this as well.

Because I found this so strange, I wanted to find an answer, at least for my own edification. The main reason for my curiosity was that most of these sites never got enough diggs prior to the ‘algo’ change to make it anywhere close to “top news.” I play all the time with the various social network APIs (Digg, Twitter, Google, Facebook etc). This is more of a hobby to me. What started as a casual search for an answer has now turned out to be what I think is a major revelation, big enough for me to go public about it. Essentially, what I think I have discovered is that someone has created dozens of accounts in order to make sure that Digg’s publishing partners get front pages on the site, so that those sites get Digg referrals. They certainly had not been getting many Digg referrals in the several weeks before the recent ‘algo’ change.


Some disclaimers & notes: I am no seasoned writer, so please pardon my poor language usage. All data I refer to here can be made available by tweeting me. The data I am using is current (as of 11pm CST, October 23rd); however, I will continue to pull and maintain the data from the API for a while. The API has a lot of limitations and I have tried my best to work around them. I will be writing in the order in which I drilled into the data, and any inferences or opinions I make will be clearly identified so that the difference between facts and opinions is obvious. Some data/graphs I present may be irrelevant to the crux of the matter under discussion, but interesting nevertheless. In this report’s original form, I divided it into several pages, not just because it was very long, but because I would like your comments on the various pages, as they each show a different set of data. I would really like to hear whether you agree with my viewpoints or not.

Did the new ‘algo’ really change anything?

I began with downloading the details about all “top news stories” in the month of October, to see how many stories have “popped” and to get an idea of the variety in the stories that have reached the front page during the course of the month.


There is nothing out of the ordinary to be noticed in the chart above. And here is a bit more information on the top news domains. Interactive Data

Now, what are the top 20 domains (by the number of stories in a given time period) in three different time ranges?

The items highlighted in yellow in the third column, are new entrants into the “top 20 league” since the ‘algo’ change.
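For anyone who wants to reproduce a ranking like this, here is a minimal sketch, not the exact script I used. It assumes each top news story has already been pulled from the API into a dict with a `url` and a `promoted` datetime; those field names, the helper names, the sample rows and the year are all hypothetical.

```python
from collections import Counter
from datetime import datetime
from urllib.parse import urlparse

def domain_of(url):
    """Return the host part of a story URL, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host

def top_domains(stories, start, end, n=20):
    """Rank domains by number of promoted stories with start <= promoted < end."""
    counts = Counter(
        domain_of(s["url"]) for s in stories if start <= s["promoted"] < end
    )
    return counts.most_common(n)

def new_entrants(before, after):
    """Domains in the 'after' ranking that are absent from the 'before' ranking."""
    seen = {domain for domain, _ in before}
    return [domain for domain, _ in after if domain not in seen]

# Hypothetical sample rows; real input would be the downloaded top news stories.
stories = [
    {"url": "http://www.old-regular.example/a", "promoted": datetime(2010, 10, 5)},
    {"url": "http://partner.example/b", "promoted": datetime(2010, 10, 16)},
]
change = datetime(2010, 10, 15)  # date of the 'algo' blog post (year assumed)
before = top_domains(stories, datetime(2010, 10, 1), change)
after = top_domains(stories, change, datetime(2010, 10, 24))
print(new_entrants(before, after))
```

Comparing the ranking before and after the change date is what produces the "new entrants" column.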

Did only the domains change or much more?

Now that it is obvious that the ‘algo’ change on Oct 15th has affected the chance of certain domains popping, I went forward to download ALL the UPCOMING diggs (227,936 diggs) made to any story (2390 stories) which eventually entered top news since Oct 1st. I should note here that, as explained in the Digg blog post, certain stories have had the “promote date” timestamp updated and thus, a few diggs made in the time from the 1st pop to the 2nd pop are included as well, as there is no means to exclude them.

As I mentioned before, out of curiosity to see who was digging the upcoming stories, I probed the API to find the top 100 people casting upcoming diggs on these top news stories. I also tried the same three time ranges as in step 1. Now is when interesting data started to pour in.


Much like the previous table, I have highlighted the new “entrants” in column 3. Suddenly, new, dedicated users have appeared. All those new domains in the previous table might not have been suspicious. But what we now see is that these new users are responsible ONLY for voting on the publisher domains. These users all have similar profiles, similar names, and much more (similarities to follow in further pages).
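One way to check the "ONLY voting on publisher domains" claim is to compute, for each user, what fraction of their upcoming diggs landed on publisher domains. This is a rough sketch under assumptions: the diggs are already flattened into `(username, domain)` pairs, and `publisher_domains` is a set you supply yourself. Both names are hypothetical.

```python
from collections import defaultdict

def publisher_share(diggs, publisher_domains):
    """Map each user to the fraction of their diggs that hit publisher domains."""
    total = defaultdict(int)
    on_publisher = defaultdict(int)
    for user, domain in diggs:
        total[user] += 1
        if domain in publisher_domains:
            on_publisher[user] += 1
    return {user: on_publisher[user] / total[user] for user in total}

# Hypothetical sample: one account that only diggs publishers, one that does not.
diggs = [
    ("dd1", "dailymail.co.uk"),
    ("dd1", "telegraph.co.uk"),
    ("someuser", "dailymail.co.uk"),
    ("someuser", "smallblog.example"),
]
print(publisher_share(diggs, {"dailymail.co.uk", "telegraph.co.uk"}))
```

A share of 1.0 across dozens of diggs is the kind of signal that stands out next to ordinary users.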

At this point, if you are still with me, I have a request for you. Go ahead and download some of the data from the Digg servers and store it, as I fear that the data might vanish. I am only listing a small sample of the data, which has the mystery buried in it. All you need to do is right-click each of the following links and save the (xml) files to your computer. Should the data vanish “mysteriously”, you will have proof that the data did exist. If you fear for the security of the links, note that they all point to Digg servers.
1
2
3
4
5
6
7
8
9
10

Who and how many are they?

Doing some pattern matching, 159 users are suspicious. And here they are with links to their profiles:

a1
a3
a5
d10
d11
d12
d13
d14
d15
d16
d17
d2
d4
d5
d6
d8
d9
dd1
dd13
dd14
dd15
dd16
dd17
dd18
dd19
dd2
dd20
dd21
dd23
dd26
dd27
dd28
dd3
dd30
dd33
dd34
dd35
dd36
dd37
dd38
dd39
dd4
dd41
dd42
dd43
dd45
dd46
dd47
dd5
dd6
dd7
dd8
dd9
diggerz10
diggerz11
diggerz13
diggerz14
diggerz16
diggerz17
diggerz18
diggerz19
diggerz20
diggerz21
diggerz22
diggerz23
diggerz24
diggerz25
diggerz26
diggerz27
diggerz29
diggerz30
diggerz31
diggerz32
diggerz33
diggerz34
diggerz35
diggerz36
diggerz37
diggerz38
diggerz39
diggerz40
diggerz41
diggerz42
diggerz43
diggerz44
diggerz45
diggerz46
diggerz47
diggerz5
diggerz55
diggerz6
diggerz7
diggerz8
diggerz9
s1
s10
s11
s12
s13
s14
s3
s4
s5
s6
s7
s9
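The name part of the pattern matching can be sketched as below. The regular expression is an assumption inferred only from the usernames listed above (a short fixed prefix followed by a number); it is not necessarily the full set of rules I applied. A name match alone would give false positives, so it should be combined with the profile checks (new account, no comments, no submissions).

```python
import re

# Assumed pattern, inferred from the listed names: prefix + number.
SUSPECT_NAME = re.compile(r"(a|d|dd|diggerz|s)\d+")

def looks_suspicious(username):
    """True if the whole username is one of the prefixes followed by digits."""
    return SUSPECT_NAME.fullmatch(username.lower()) is not None

for name in ["diggerz10", "dd13", "s9", "ltgenpanda"]:
    print(name, looks_suspicious(name))
```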

Now that 159 suspicious users have been found, note the similarities in their profiles. If you have not visited their profiles, please do now – to see that all of them are “new” and they do nothing but digg (no comments, submissions etc). Sample profile screen shots:

Look at the last profile, no followers, no followings, but digging a very select set of stories.
(From now, the 159 suspicious users will be called suspects)

So, what have they been digging? Maybe just spammers!

How are these suspects’ diggs spread across the various domains in the 2390 stories we are analyzing? The data used is from Oct 1st onward. This “operation” appears to have begun only after Oct 15th.

Domain(count)
newsfeed.time.com (644)
dailymail.co.uk (578)
boingboing.net (461)
techcrunch.com (440)
telegraph.co.uk (408)
youtube.com (395)
huffingtonpost.com (378)
collegehumor.com (331)
slate.com (331)
wired.com (311)
arstechnica.com (295)
cbsnews.com (280)
bbc.co.uk (235)
maximumpc.com (232)
rawstory.com (190)
space.com (174)
gawker.com (168)
theonion.com (165)
news.discovery.com (159)
washingtonpost.com (147)
voices.washingtonpost.com (143)
newsweek.com (128)
livescience.com (122)
physorg.com (120)
news.nationalgeographic.com (118)
tpmlivewire.talkingpointsmemo. (118)
motherjones.com (115)
businessinsider.com (114)
engadget.com (113)
alternet.org (112)
i.imgur.com (112)
torrentfreak.com (110)
news.yahoo.com (108)
gizmodo.com (102)
funnyordie.com (102)
thedailybeast.com (99)
xkcd.com (91)
jalopnik.com (89)
news.cnet.com (86)
bloomberg.com (78)
greencarreports.com (65)
teamcoco.com (65)
news.com.au (65)
blogs.techrepublic.com.com (65)
tech.fortune.cnn.com (65)
abcnews.go.com (65)
novafm.com.au (64)
foxnews.com (64)
aolnews.com (63)
tuaw.com (63)
businessweek.com (63)
ucbcomedy.com (63)
io9.com (62)
buzzfeed.com (62)
guardian.co.uk (62)
holytaco.com (62)
scientificamerican.com (62)
spacefellowship.com (61)
salon.com (61)
ktla.com (60)
thefoxnation.com (60)
life.com (60)
msnbc.msn.com (60)
symmetrymagazine.org (60)
boston.com (60)
upi.com (59)
psychologytoday.com (58)
muslimswearingthings.tumblr.co (58)
myfoxdc.com (58)
reuters.com (57)
thelocal.se (56)
newgrounds.com (56)
tpmdc.talkingpointsmemo.com (55)
readwriteweb.com (55)
popsci.com (55)
expressjetpilots.com (55)
flickr.com (55)
bits.blogs.nytimes.com (54)
blogs.forbes.com (54)
indiareport.com (53)
religion.blogs.cnn.com (53)
warlogs.wikileaks.org (51)
cnn.com (50)
theappleblog.com (50)
kottke.org (49)
breitbart.com (48)
tokeofthetown.com (48)
generic1.tumblr.com (47)
blogs.discovermagazine.com (47)
theatlanticwire.com (47)
jezebel.com (46)
examiner.com (45)
npr.org (43)
treehugger.com (43)
zdnet.com (42)
spiegel.de (40)
holykaw.alltop.com (37)
blastr.com (35)
howtogeek.com (27)
ccinsider.comedycentral.com (26)
thesmokingjacket.com (25)
edition.cnn.com (22)
hollywoodreporter.com (5)
buzzll.com (1)

As can be seen in the table above, this seems to be a very widespread attempt, not targeting any single domain. One thing they do have in common though is that they are Digg publishing partners. Did you notice one notable absentee on the domain list? Hint… it starts with a “mash” and ends with “able” ;) I know they caused controversy when they were all over the front page during the transition to Version 4. Maybe that is the reason why they are not included in this front page effort at this time.

These accounts are not “spammers.” What have they achieved?

How many pops did these domains get because of diggs from these suspect accounts? 229. However, a single digg from one of these IDs should not by itself make a story suspicious, so I am now going to list all of the 229 stories with the number of suspect and non-suspect diggs on each. There’s no real way to know whether Digg is responsible for this or not. Remember that due to the ‘promote_date confusion’ in Digg data, the total number of upcoming diggs of a few stories might not be accurate. Also remember that you are only seeing data as of 11pm CST on Oct 23rd, while this is still continuing to happen.

Link to interactive and detailed version of this data.


Now that each of the stories has been given an “ID”, we will use it for our reference. Did you notice that the story with ID 1 got only 1 actual digg? Yes, all it took was for guardian.co.uk to submit the story and the rest was taken care of (but by whom?). Any story with 100 or more upcoming diggs surely has the promote_date problem in it, so let’s leave those stories aside for now and crunch a few numbers. Stories 209, 219 and 221 were also excluded as they are clear outliers. For the rest of the stories (leaving out 31+3), 10016 suspect diggs were cast. They also had about 4055 non-suspect diggs, but this 4055 is far higher than the reality, due to the promote_date bug/feature. To get a more reasonable estimate of the problem, let’s now only consider stories which needed 60 or fewer upcoming diggs, as these stories are clearly not affected by the promote_date bug. In this case, 986 diggs out of 1257 were suspicious, that is, 78.44% of the diggs on these stories are suspicious.
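The percentage is simple arithmetic and can be checked in a couple of lines. The function name is just for illustration. The second figure is derived here from the 10016 and 4055 counts given above; it inherits the promote_date inflation of the non-suspect count, so treat it as a lower bound.

```python
def suspect_share(suspect_diggs, total_diggs):
    """Percentage of upcoming diggs that came from suspect accounts."""
    return 100.0 * suspect_diggs / total_diggs

# Stories needing 60 or fewer upcoming diggs: 986 suspect out of 1257 total.
print(f"{suspect_share(986, 1257):.2f}%")  # 78.44%

# Wider set: 10016 suspect and about 4055 non-suspect diggs.
print(f"{suspect_share(10016, 10016 + 4055):.2f}%")  # 71.18%
```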

There are a few interesting domains, submitters and stories to note here, which are discussed in a later page.

Is there a pattern to their digging?

So, is there a time pattern to these suspicious diggs? How would these stories compare to other regular stories? I am now showing some charts, with all of the stories in them needing 63 diggs to enter the top news. The 63 is just arbitrary, but useful in comparing the data. There are 8 suspicious stories with 63 upcoming diggs, so I am randomly picking 8 non-suspicious stories as well.
X Axis below is the number of diggs (until reaching front page) and Y axis is the number of minutes for each of those diggs — for the 16 selected stories.

As can be seen above, 5 out of the 8 suspect stories are obviously distinct from the other stories. Though it is not obvious from the chart, once the suspect users get into action, the respective story enters the top news section in about 100 minutes on average. This graph will be used for a later discussion.
The 16 stories used for this graph are:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
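The "about 100 minutes" figure comes from measuring, per story, the gap between the first suspect digg and promotion. A minimal sketch of that measurement follows, assuming each story's upcoming diggs are available as `(minutes_since_submission, username)` pairs and that the last digg is the one that promoted the story. The function name, the input shape and the sample numbers are hypothetical.

```python
def minutes_to_promotion(digg_times, is_suspect):
    """
    digg_times: list of (minutes_since_submission, username) for one story,
    where the latest digg is taken as the moment of promotion.
    Returns the minutes between the first suspect digg and promotion,
    or None if no suspect account dugg the story.
    """
    ordered = sorted(digg_times)
    promoted_at = ordered[-1][0]
    for minute, user in ordered:
        if is_suspect(user):
            return promoted_at - minute
    return None

# Hypothetical story: submitted at minute 0, suspects arrive at minute 300.
story = [(0, "submitter"), (300, "dd1"), (320, "dd2"), (400, "diggerz5")]
print(minutes_to_promotion(story, lambda user: user != "submitter"))
```

Averaging this value over the suspect stories gives the typical time from first suspect digg to front page.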

Who could have been doing it?

Without pointing fingers directly at anyone, let’s get into some analysis.

  1. As seen earlier, all of the domains are widely spread-out with no clear link between them all, except for the FACT that they are all “preferred publishing partners” of Digg.
  2. Why did Digg out of nowhere decide to do a “Digg’s Algorithmic Mystery Tour” blog post, when it never before spoke about it? How did it even expect that everyone will accept that post at its face value?
  3. Why is it that the blog post came on Oct 15th and this started immediately?
  4. Now see the list of 229 stories: the story with ID 72, from a tumblr blog, was submitted by “daryapino”. This is none other than the girlfriend of the ‘Founder & Seller of the Soul of Digg’, Kevinrose. Note that the story is heavily dugg and commented on by digg staff.
  5. Now see story 192, submitted by Kevinrose. Recently Kevin himself was getting only around 30 diggs per story (examples: http://digg.com/news/business/bezos_backed_doxo_launches_paperless_billing_service_2, http://digg.com/news/lifestyle/a_house_by_the_park ), but now ….. you know he is the founder.
  6. Why did digg decide to stop showing who dugg a story? It never responded to user feedback regarding this request. When you hide anything, people will think you are behind any problems connected to your act of hiding.


Clearly the above facts point the finger at digg, with no one else as a suspect. However, there is no concrete evidence to say that digg is 100% responsible, so I will only say that Digg is the one and only prime suspect here. This also fits well with their urgency in getting Diggable ads out … well digg, we just realized that most stories in the “top news” are ads, thanks!

So now what?

I can keep writing about this forever …. but nothing is going to change. This is happening even this minute (4:27 pm CST, Oct 25th), but I must end it.
I am going to split this into two pieces: the piece titled “By Digg” is meant to be read if you think Digg has a direct involvement in this (as I do), and the piece titled “Not by Digg” is meant to be read if you think digg is not involved in this.
And what a coincidence: I would like to point you to an audio quote (http://www.youtube.com/watch?v=Ay8_cKWrOqw#t=62m50s ) by none other than myself, on the SocialBlade show. Just wow to myself ;)

If you think Digg did this:

Am sure someone from Digg is going to read this. So, I will address my points here to digg itself.

Digg, putting it very simple, this is like the US treasury printing fake dollars, just exactly the same. You lost any iota of credibility users may have had on you. Good job!

You messed up V4, you failed to listen to your users, and after a long time you agreed that you messed it up. You promised to listen to your users and are pretending to listen. Except for minor changes here and there, there is so much to be done. Instead of really working on those changes and coming back in an integral way, why did you choose to use such a cheesy method? Did you assume that all of your “several million” users are idiots? You have now not only failed traffic wise, users wise etc., you have failed as a business.

Integrity is the first step towards success in a business. A few small tricks here and there to keep things running may be seen as “clever”, but cheating at the core of your business is an absolute crime. What caused this? VC pressure? The urge not to fail? Since I have spent hours and hours getting this out, be bold and give me a reply.

To the VC funders of Digg, I think you just lost your last hope!

Not by Digg:

So you think this has not all been done by Digg, and I see that you would give them the benefit of the doubt. But this has been happening daily since Oct 15th. Why did Digg not spot this? Can you answer that? The chart I showed earlier shows that the curves for the suspicious stories are clearly way off. Digg keeps boasting about its complicated “algo” and monitoring system. This is so widespread, and they could not catch or stop it. Now how would you trust their algo or monitoring systems? Why would you believe them? Answer for yourself, or post a comment here.

Giving a Fair Chance

Now that I am accusing Digg of something huge, I am going to give them a fair chance to explain their side before I publish this. However, I strongly suspect that data destruction might happen. So, I am going to record a video (don’t watch it unless you have nothing else to do - http://www.youtube.com/watch?v=aQH5oC-iVnc ) showing the data being downloaded from digg servers and stored (229 XML files). I will also upload the files to a public server ( http://www.megaupload.com/?d=LUP0WFJ4 ) as proof. That way, if they ever delete data, you can trust my copy.

If I hear back from Digg, one more page will be added.
Until then,
Passionate Digg User,
LtGenPanda

I have contacted digg ….

I asked for a phone number for the Communication Director, but was told that they could handle this by email. I sent the email below:
————————————————————
Here is the link:
http://ltgenpanda.tumblr.com/post/1399805023/mystery-behind-the-diggs-algorithmic-mystery-tour
Runs several pages, I would appreciate a comment within about 30 mins, as I mention in the article – I fear data destruction will happen.
————————————————————
I then got a reply in 20 mins:
————————————————————
That is a lot of information to assess in such a short period of time. Unfortunately, we’re not going to be able to get back to you with a comment within 30 minutes. 
————————————————————
Realizing that 30 mins might have been too short, I responded after 15 minutes with:
————————————————————
Is there a reasonable time you want me to wait for?
————————————————————
I am going to wait until 6:34 CST, that is, 1 hr from when digg first got a chance to read it. If by then they do not give me a reasonable time to wait, I will go ahead and make this link public.

New UPDATEs


One Page Version of - Mystery Behind the “Digg’s Algorithmic Mystery Tour”

Digg published a blog post titled “Digg’s Algorithmic Mystery Tour” on October 15th. While a digg blog post is nothing unusual, a post about the algorithm was very surprising to me. How come Digg, which never bothers to blog about very visible changes or its numerous bugs and issues, decided to blog about its most hidden “algorithm”? They had never even discussed it verbally in public. And since then the front page stories have changed. A lot of sites that had rarely been seen on the front page of digg since the v4 fiasco started to resurface frequently. Many diggers noticed this and there has been a lot of chatter about it. Many diggers wonder how a small tweak to the algo could change the front page so much, and so “did” I.

Finding this strange, I wanted to find an answer to this question, at least for myself. The main reason for my curiosity was that most of these sites did not get enough diggs prior to the algo change to make it anywhere close to the “top news”. I play a lot with the various APIs (Digg, Twitter, Google, Facebook etc.); it is more of a hobby to me. What I started as a casual search for an answer has now turned into a major revelation, big enough for me to go public about it.

Some disclaimers & notes: I am no seasoned writer, so pardon my poor language. All data I refer to here is available on request; just tweet me. The data I am using is current (as of 11pm CST, October 23rd); however, I will continue to pull and maintain the data from the API for a while. The API has a lot of limitations and I have tried my best to work around them. I will be writing in the same order in which I drilled into the data, and any inferences or opinions I make will be clearly identified, so that the facts remain facts. Some data/graphs I present may be irrelevant to the crux of the matter under discussion, but are interesting nevertheless. I have separated this into several pages, not just because it is going to be very long, but because I would like your comments on the various pages, as each shows a different set of data. I would really like to hear whether you agree with my viewpoints or not.


Did the new algo really change anything?

I began by downloading the details of all “top news stories” in the month of October, to see how many stories have “popped” and how the popped domains are spread across the month.


There is nothing in particular to notice in the chart above. Here is a bit more information on the top news domains: Interactive Data

Now, what are the top 20 domains (by the number of stories in a given time period) in three different time ranges?

The items highlighted in yellow in the third column are new entrants into the “top 20 league” since the algo change.
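
For anyone who wants to repeat this comparison on their own copy of the data, here is a minimal sketch in Python. It is not the exact query I ran. The record layout (a `domain` and a `promoted` timestamp per story), the example domains and the date boundaries are all assumptions made up for illustration.

```python
from collections import Counter
from datetime import datetime

def top_domains(stories, start, end, n=20):
    """Return the n domains with the most promoted stories in [start, end)."""
    counts = Counter(
        s["domain"] for s in stories if start <= s["promoted"] < end
    )
    return [domain for domain, _ in counts.most_common(n)]

def new_entrants(before, after):
    """Domains ranked in the 'after' period that were not ranked 'before'."""
    seen = set(before)
    return [d for d in after if d not in seen]

# Hypothetical sample records; real ones would come from the Digg API.
stories = [
    {"domain": "example-a.com", "promoted": datetime(2010, 10, 3)},
    {"domain": "example-a.com", "promoted": datetime(2010, 10, 9)},
    {"domain": "example-b.com", "promoted": datetime(2010, 10, 16)},
    {"domain": "example-a.com", "promoted": datetime(2010, 10, 17)},
]
change = datetime(2010, 10, 15)  # date of the algo blog post
before = top_domains(stories, datetime(2010, 10, 1), change)
after = top_domains(stories, change, datetime(2010, 10, 24))
print(new_entrants(before, after))
```

The same two functions work for usernames instead of domains, which is how the “top 100 diggers” comparison further down could be reproduced.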

Did only the domains change or much more?

Now that it is obvious that the algo change on Oct 15th has affected the chances of certain domains popping, I went on to download ALL the UPCOMING diggs (227,936 diggs) made to every story (2,390 stories) which eventually entered top news since Oct 1st. I should note here that, as explained in the digg blog post, certain stories have had the “promote date” timestamp updated, and thus a few diggs made between the 1st pop and the 2nd pop are included as well, as there is no means to exclude them.

Curious to see who is digging the upcoming stories, I queried for the top 100 people casting upcoming diggs on these top news stories. I used the same three time ranges as in step 1, and this is when interesting data started to pour in.


Much like in the previous table, I have highlighted the new “entrants” in column 3. But wait: new entrants in the previous table, about the domains with the most stories after an algo change, might make sense, but how do so many new users, from exactly the day after the “mysterious” blog post, make sense? If you can connect the dots and convince me that there is nothing suspicious in this FACT that the algo change has brought in a big new bunch of dedicated users, all with similar profiles, similar names, and much more (similarities to follow in further pages) …. I bow to you.

At this point, if you are still with me, I have a request for you. Go ahead and download some of this data from the digg servers and store it, as I fear that the data might vanish. I am only listing a small sample of the data which has the mystery buried in it. All you have to do is right-click each of the following links and save the (xml) files to your computer. Should the data vanish “mysteriously”, you will have proof that the data did exist. If you are worried about the safety of the links, note that all of them point to digg servers.

1
2
3
4
5
6
7
8
9
10

Who and how many are they?

Some pattern matching turned up 159 suspicious users. Here they are, with links to their profiles:

a1
a3
a5
d10
d11
d12
d13
d14
d15
d16
d17
d2
d4
d5
d6
d8
d9
dd1
dd13
dd14
dd15
dd16
dd17
dd18
dd19
dd2
dd20
dd21
dd23
dd26
dd27
dd28
dd3
dd30
dd33
dd34
dd35
dd36
dd37
dd38
dd39
dd4
dd41
dd42
dd43
dd45
dd46
dd47
dd5
dd6
dd7
dd8
dd9
diggerz10
diggerz11
diggerz13
diggerz14
diggerz16
diggerz17
diggerz18
diggerz19
diggerz20
diggerz21
diggerz22
diggerz23
diggerz24
diggerz25
diggerz26
diggerz27
diggerz29
diggerz30
diggerz31
diggerz32
diggerz33
diggerz34
diggerz35
diggerz36
diggerz37
diggerz38
diggerz39
diggerz40
diggerz41
diggerz42
diggerz43
diggerz44
diggerz45
diggerz46
diggerz47
diggerz5
diggerz55
diggerz6
diggerz7
diggerz8
diggerz9
s1
s10
s11
s12
s13
s14
s3
s4
s5
s6
s7
s9
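
To give an idea of what such pattern matching can look like, here is a rough sketch in Python. The regular expression is an assumption inferred only from the names listed above (a short fixed prefix followed by a number); it is not the exact filter behind the 159, and a name match alone proves nothing without the profile checks described next.

```python
import re

# Assumed pattern, inferred only from the usernames listed above:
# one of a few short prefixes followed by digits.
SUSPECT_NAME = re.compile(r"^(?:a|d|dd|diggerz|s)\d+$")

def looks_suspicious(username):
    """True if the username fits the assumed prefix-plus-number pattern."""
    return bool(SUSPECT_NAME.match(username))

def filter_suspects(usernames):
    """Return the usernames that fit the pattern, sorted alphabetically."""
    return sorted(u for u in usernames if looks_suspicious(u))

print(filter_suspects(["dd13", "LtGenPanda", "s4", "kevinrose", "diggerz10"]))
```

A filter like this will also catch innocent accounts that happen to have such names, so every hit still needs a look at the profile itself.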

Now that 159 suspicious users have been found, note the similarities in their profiles. If you have not visited their profiles, please do so now, to see that all of them are “new” and that they do nothing but digg (no comments, submissions etc.). Sample profile screenshots:

Look at the last profile: no followers, no following, but digging a very select set of stories.

(From now, the 159 suspicious users will be called suspects)

So, what have they been digging? Maybe just spammers!

How are these suspects’ diggs spread across the various domains in the 2,390 stories we are analyzing? The data used starts from Oct 1st; however, this “operation” only began after Oct 15th.

Domain (count)
newsfeed.time.com (644)
dailymail.co.uk (578)
boingboing.net (461)
techcrunch.com (440)
telegraph.co.uk (408)
youtube.com (395)
huffingtonpost.com (378)
collegehumor.com (331)
slate.com (331)
wired.com (311)
arstechnica.com (295)
cbsnews.com (280)
bbc.co.uk (235)
maximumpc.com (232)
rawstory.com (190)
space.com (174)
gawker.com (168)
theonion.com (165)
news.discovery.com (159)
washingtonpost.com (147)
voices.washingtonpost.com (143)
newsweek.com (128)
livescience.com (122)
physorg.com (120)
news.nationalgeographic.com (118)
tpmlivewire.talkingpointsmemo. (118)
motherjones.com (115)
businessinsider.com (114)
engadget.com (113)
alternet.org (112)
i.imgur.com (112)
torrentfreak.com (110)
news.yahoo.com (108)
gizmodo.com (102)
funnyordie.com (102)
thedailybeast.com (99)
xkcd.com (91)
jalopnik.com (89)
news.cnet.com (86)
bloomberg.com (78)
greencarreports.com (65)
teamcoco.com (65)
news.com.au (65)
blogs.techrepublic.com.com (65)
tech.fortune.cnn.com (65)
abcnews.go.com (65)
novafm.com.au (64)
foxnews.com (64)
aolnews.com (63)
tuaw.com (63)
businessweek.com (63)
ucbcomedy.com (63)
io9.com (62)
buzzfeed.com (62)
guardian.co.uk (62)
holytaco.com (62)
scientificamerican.com (62)
spacefellowship.com (61)
salon.com (61)
ktla.com (60)
thefoxnation.com (60)
life.com (60)
msnbc.msn.com (60)
symmetrymagazine.org (60)
boston.com (60)
upi.com (59)
psychologytoday.com (58)
muslimswearingthings.tumblr.co (58)
myfoxdc.com (58)
reuters.com (57)
thelocal.se (56)
newgrounds.com (56)
tpmdc.talkingpointsmemo.com (55)
readwriteweb.com (55)
popsci.com (55)
expressjetpilots.com (55)
flickr.com (55)
bits.blogs.nytimes.com (54)
blogs.forbes.com (54)
indiareport.com (53)
religion.blogs.cnn.com (53)
warlogs.wikileaks.org (51)
cnn.com (50)
theappleblog.com (50)
kottke.org (49)
breitbart.com (48)
tokeofthetown.com (48)
generic1.tumblr.com (47)
blogs.discovermagazine.com (47)
theatlanticwire.com (47)
jezebel.com (46)
examiner.com (45)
npr.org (43)
treehugger.com (43)
zdnet.com (42)
spiegel.de (40)
holykaw.alltop.com (37)
blastr.com (35)
howtogeek.com (27)
ccinsider.comedycentral.com (26)
thesmokingjacket.com (25)
edition.cnn.com (22)
hollywoodreporter.com (5)
buzzll.com (1)

As can be seen in the table above, this is very clearly a widespread attempt, not targeting any single domain. My only conclusion/inference here is that the diggs have mostly gone to “publishing partners”. Did you notice one notable absentee in this domain list? Hint: it starts with “mash” and ends with “able” ;) I know they did some “advice” posts to digg, and their being all over the front page was the trigger point for the recent fiasco. I am not sure why that particular blog is missing from this list, but its absence is obvious.

They are not spammers, what have they achieved?

How many pops did these domains get with diggs from these suspects? 229. However, just one digg from one of these IDs should not by itself make a story suspicious, so I am now going to list all of the 229 stories with their number of suspect diggs and non-suspect diggs. While it is clear whether a digg is suspect or non-suspect, remember that due to the promote_date confusion in digg data, the total number of upcoming diggs for a few stories might not be accurate. Also remember that you are only seeing data as of 11pm CST on Oct 23rd, while this is still continuing to happen.

Link to interactive and detailed version of this data.




Now that each story has been given an “ID”, we will use it for reference. Did you notice that the story with ID 1 got only 1 actual digg? Yes, all it took was for guardian.co.uk to submit the story, and the rest was taken care of (by whom?). Any story with 100 or more upcoming diggs surely has the promote_date problem in it, so let’s leave those stories aside for now and crunch a few numbers. Stories 209, 219 and 221 were also excluded, as they are clear outliers. For the rest of the stories (leaving out 31+3), 10016 suspect diggs were cast. They also had about 4055 non-suspect diggs, but this 4055 is far higher than reality, due to the promote_date bug/feature. To get a more reasonable estimate of the problem, let’s now consider only stories which needed 60 or fewer upcoming diggs, as these stories are clearly not affected by the promote_date bug. In this case, 986 out of 1257 diggs were suspicious; that is, 78.44% of the diggs on these stories are suspicious.
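
The percentage above is simple arithmetic, and here is a small Python sketch for anyone who wants to check it or redo it on a fresh pull of the data. The first call uses the figures quoted in the text. The per-story records and the field names `suspect` and `other` are hypothetical, there only to show the filtering step.

```python
def suspect_share(suspect_diggs, total_diggs):
    """Percentage of upcoming diggs that were cast by suspect accounts."""
    if total_diggs <= 0:
        raise ValueError("total_diggs must be positive")
    return 100.0 * suspect_diggs / total_diggs

def pooled_share(stories, max_upcoming=60):
    """Pool diggs over stories that needed at most max_upcoming diggs."""
    kept = [s for s in stories if s["suspect"] + s["other"] <= max_upcoming]
    suspect = sum(s["suspect"] for s in kept)
    total = sum(s["suspect"] + s["other"] for s in kept)
    return suspect_share(suspect, total)

# Figures quoted in the text: 986 suspect diggs out of 1257.
print(f"{suspect_share(986, 1257):.2f}%")  # prints 78.44%

# Hypothetical per-story counts, only to illustrate the cutoff.
sample = [
    {"suspect": 50, "other": 5},
    {"suspect": 40, "other": 15},
    {"suspect": 60, "other": 90},  # over the cutoff, so it is skipped
]
print(f"{pooled_share(sample):.2f}%")
```

Pooling the diggs before dividing, rather than averaging per-story percentages, keeps a story with very few diggs from swinging the result.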

There are a few interesting domains, submitters and stories to note here, which are discussed in a later page.

Is there a pattern to their digging?

So, is there any time pattern among these suspicious diggs? How do these stories compare to other regular stories? I am now showing some charts in which every story needed 63 diggs to enter the top news. The number 63 is arbitrary, but useful for comparing the data. There are 8 suspicious stories with 63 upcoming diggs, so I am randomly picking 8 non-suspicious stories as well.

The X axis below is the number of diggs (until reaching the front page) and the Y axis is the number of minutes for each of those diggs, for the 16 selected stories.



As can be seen above, 5 out of the 8 suspect stories are clearly distinct from the other stories. Though it is not obvious from the chart, once the suspect users get into action, the respective story enters top news in about 100 minutes on average. This graph will be used in a later discussion.
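
The curves in such a chart can be rebuilt from the digg timestamps alone. Below is a minimal Python sketch of one way to do it, assuming each digg is available as a (username, timestamp) pair. The usernames, the times and the helper names are made up for illustration and are not taken from the real data.

```python
from datetime import datetime, timedelta

def minutes_per_digg(digg_times):
    """Minutes elapsed from a story's first digg to each successive digg."""
    ordered = sorted(digg_times)
    if not ordered:
        return []
    first = ordered[0]
    return [(t - first).total_seconds() / 60 for t in ordered]

def minutes_after_first_suspect(diggs, suspects):
    """Minutes from the first suspect digg to the last upcoming digg.

    diggs is a list of (username, timestamp) pairs; returns None when
    no suspect dugg the story.
    """
    suspect_times = [t for user, t in diggs if user in suspects]
    if not suspect_times:
        return None
    last = max(t for _, t in diggs)
    return (last - min(suspect_times)).total_seconds() / 60

# Hypothetical story: two ordinary diggs, then a burst from suspect accounts.
start = datetime(2010, 10, 20, 12, 0)
diggs = [
    ("alice", start),
    ("bob", start + timedelta(minutes=90)),
    ("dd13", start + timedelta(minutes=200)),
    ("s4", start + timedelta(minutes=201)),
    ("diggerz10", start + timedelta(minutes=203)),
]
curve = minutes_per_digg([t for _, t in diggs])
print(curve)
print(minutes_after_first_suspect(diggs, {"dd13", "s4", "diggerz10"}))
```

Plotting the digg number against `curve` for each story gives one line per story; a story pushed by a burst of accounts shows a long flat stretch followed by a steep jump.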

The 16 stories used for this graph are:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16

Who could have been doing it?

Without pointing fingers directly at anyone, let’s get into some analysis.

  1. As seen earlier, the domains are widely spread out with no clear link between them all, except for the FACT that they are all “preferred publishing partners” of digg.
  2. Why did digg decide out of nowhere to do a “Digg’s Algorithmic Mystery Tour” blog post, when it had never spoken about the algorithm before? How did it even expect everyone to accept that post at face value?
  3. Why is it that the blog post came on Oct 15th and this started immediately?
  4. Now see the list of 229 stories: the story with ID 72, from a tumblr blog, was submitted by “daryapino”. This is none other than the girlfriend of the ‘Founder & Seller of the Soul of Digg’, Kevinrose. Note that the story is heavily dugg and commented on by digg staff.
  5. Now see story 192, submitted by Kevinrose. Recently Kevin himself was getting only around 30 diggs per story (examples: http://digg.com/news/business/bezos_backed_doxo_launches_paperless_billing_service_2, http://digg.com/news/lifestyle/a_house_by_the_park ), but now ….. you know he is the founder.
  6. Why did digg decide to stop showing who dugg a story? It never responded to user feedback about this. When you hide something, people will assume you are behind any problem connected to what you hid.

Clearly the above facts point the finger at digg, with no one else as a suspect. However, there is no concrete evidence that digg is 100% responsible, so I will only say that digg is the one and only prime suspect here. This also fits well with their urge to get Diggable ads out … well digg, we just realized that most stories in the “top news” are ads, thanks!

So now what?

I could keep writing about this forever …. but nothing is going to change. This is happening even this minute (4.27 pm CST, Oct 25th), but I have to conclude.

I am going to split this into two pieces – The piece titled “By Digg” is meant to be read if you think digg has a direct involvement in this (as I do) and the piece titled “Not by Digg” is meant to be read if you think digg is not involved in this.

And what a coincidence: I would like to point you to an audio quote (http://www.youtube.com/watch?v=Ay8_cKWrOqw#t=62m50s ) by none other than myself, on the SocialBlade show – just wow to myself ;)

By Digg:

I am sure someone from digg is going to read this. So, I will address my points here to digg itself.

Digg, to put it very simply, this is like the US Treasury printing fake dollars, exactly the same. You have lost every iota of credibility you may have had with your users. Good job!

You messed up V4, you failed to listen to your users, and after a long time you agreed that you messed it up. You promised to listen to your users, and you are only pretending to listen. Apart from minor changes here and there, there is still so much to be done. Instead of really working on those changes and coming back with integrity, why did you choose such a cheesy method? Did you assume that all of your “several million” users are idiots? You have now failed not only on traffic, on users, and so on; you have failed as a business.

Integrity is the first step towards success in a business. A few small tricks here and there to keep things running may be seen as “clever”, but cheating at the core of your business is an absolute crime. What caused this? VC pressure? The urge not to fail? Since I have spent hours and hours getting this out, be bold and give me a reply.

To the VC funders of Digg, I think you just lost your last hope!

Not By Digg:

So you think this is not by digg, and I see that you would give them the benefit of the doubt. But this has been happening daily since Oct 15th. Why did digg not spot this? Can you answer that? The chart I showed earlier shows that the curves for the suspicious stories are clearly way off. Digg keeps boasting about its complicated “algo” and monitoring system. This is so widespread, and they could not catch or stop it. Now how would you trust their algo or monitoring systems? Why would you believe them? Answer for yourself, or post a comment here.

Giving a Fair Chance


Now that I am accusing digg of something huge, I am going to give them a fair chance to explain their side before I publish this. However, I strongly suspect that data destruction might happen. So, I am going to record a video (don’t watch it unless you have nothing else to do - http://www.youtube.com/watch?v=aQH5oC-iVnc ) showing the data being downloaded from digg servers and stored (229 XML files). I will also upload the files to a public server ( http://www.megaupload.com/?d=LUP0WFJ4 ) as proof. That way, if they ever delete data, you can trust my copy.

If I hear back from digg, one more page will be added.

Until then,

Passionate Digg User,

LtGenPanda

I have contacted digg ….

I asked for a phone number for the Communication Director, but was told that they could handle this by email. I sent the email below:

————————————————————

Here is the link:

http://ltgenpanda.tumblr.com/post/1399805023/mystery-behind-the-diggs-algorithmic-mystery-tour

Runs several pages, I would appreciate a comment within about 30 mins, as I mention in the article – I fear data destruction will happen.

————————————————————

I then got a reply in 20 mins:

————————————————————

That is a lot of information to assess in such a short period of time. Unfortunately, we’re not going to be able to get back to you with a comment within 30 minutes. 

————————————————————

Realizing that 30 mins might have been too short, I responded after 15 minutes with:

————————————————————

Is there a reasonable time you want me to wait for?

————————————————————

I am going to wait until 6:34 CST, that is, 1 hr from when digg first got a chance to read it. If by then they do not give me a reasonable time to wait, I will go ahead and make this link public.

