Earlier today Craig Silverman of BuzzFeed broke the news that the business and marketing website Business2Community had deleted over 6,000 articles written by Shawn Rice (brother of the site's founder Brian Rice) after an investigation revealed a massive plagiarism problem with his articles. As Craig's story noted, Lead Stories was one of the victims of Shawn Rice's behaviour, and we are very happy that the little sting operation we set up to catch him in the act worked out much better than we expected.
You should definitely read Craig's story first for the broad outline; it does an excellent job of telling the important parts of what happened:
This Fact Checker Hatched An Elaborate Scheme To Catch A Site That Was Stealing His Stories
Until yesterday, Shawn Rice was one of the internet's most prolific debunkers of online hoaxes. Since at least November 2016, Rice has written thousands of articles about hoaxes for business2community.com, a business and marketing blog. His quick, formulaic debunks appeared high on the first page of Google search results and in Google News.
Done? Now let's dive into the dirty details...
In the past year or so, whenever Lead Stories found a hoax going viral on some obscure website (using our Trendolizer engine), we would debunk it and get an immediate traffic boost from Google searches by people wanting to know whether the story was true. But each time, only a few hours later, our traffic would drop sharply when Shawn came in and rewrote our story on business2community.com, because that site ranked much higher in Google.
There was no way this was a coincidence. Whether it was a hoax from an obscure African website or a two-year-old fake article that had suddenly gone viral again, no matter what we wrote about, Shawn would invariably write about the same thing.
At first he still had the decency to link back to our articles, but he would also lift entire paragraphs and present them as his own words. So around January 2018 we complained to the site's editors, who promised improvements and said proper attribution would be given from then on for any copied words. In practice this meant that from then on only the idea was copied and no attribution was given at all, because, after all, no actual words from our articles were used.
This was quite galling because Business2Community's own contributor guidelines specifically prohibit all plagiarism, including the theft of ideas.
6. All contributors are responsible for the originality and accuracy of their submissions. Any contributor found to be plagiarizing any percentage of his or her content will be subject to an investigation of his or her entire body of work. If found guilty of plagiarism, the offending author will be banned from the community with all content removed.
What counts as plagiarism (this list is not exhaustive):
- Copying another person's work and submitting it as your own, word-for-word.
- Copying another person's work and changing some words or phrases.
- Copying any part of another person's work, whether changing words or not.
- "Spinning" another person's work.
- Using another person's idea as your own, including the progression, flow, and main points of a post, examples, images, etc.
- Copying content from any website, whether an author is given or not.
So they wanted to play it the hard way. Fine by me.
At first I tried reporting the site to Google News for the many sponsored content articles they were running, which appeared in Google News even though the content guidelines in force at the time strictly prohibited this on pain of removal from Google News:
We highly encourage that you engage in the following best practices:
Stick to the news--we mean it! Google News is not a marketing service. We don't want to send users to articles created primarily for promoting a product or organization, or selling or monetizing links within an article (learn more). If your site mixes news content with other types of content, especially paid advertorials or promotional content, we strongly recommend that you separate non-news types of content. Otherwise, if we find non-news content mixed with news content, we may exclude your entire site from Google News.
Google News ignored these reports even though I submitted dozens of them (one per day for a while...). This was probably because Google News updated its content guidelines last month and now allows sponsored content to a certain extent, so apparently they weren't that bothered by it:
Sponsored content
We do not allow content that conceals or misrepresents sponsored content as independent, editorial content.
It was around that time that, to my great irritation, I started seeing the Business2Community logo in ads in the sidebar on Lead Stories. Because I had visited the B2C site so often, some ad network had decided I would probably be interested in these kinds of ads.
Turns out I actually was interested in those ads... They pointed to a site where people were offering to publish guest posts (including links) on Business2Community for money. That's link selling, which Google defines as a "Link Scheme" in their Webmaster Guidelines:
Link schemes
Any links intended to manipulate PageRank or a site's ranking in Google search results may be considered part of a link scheme and a violation of Google's Webmaster Guidelines. This includes any behavior that manipulates links to your site or outgoing links from your site. The following are examples of link schemes which can negatively impact a site's ranking in search results:
Buying or selling links that pass PageRank. This includes exchanging money for links, or posts that contain links; exchanging goods or services for links;
.....
If you see a site that is participating in link schemes intended to manipulate PageRank, let us know. We'll use your information to improve our algorithmic detection of such links.
Again I made several reports to Google with no discernible impact. And Shawn kept on copying.
That's when I decided to start monitoring my server logs to see if I could detect Shawn visiting my site, so I could simply block him. You can't copy what you can't see, after all. Spotting him wasn't that difficult: if two or three stories got copied within a short period of time, I simply checked which IP address had visited all of them, and there he was. I could literally see him come in via the main page, pick his articles and leave.
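The post doesn't say which tools were used for this log check. As a rough sketch, assuming an Apache/nginx-style "combined" access log and hypothetical article paths, the "which IP visited all the copied stories" test could look like this in Python (the IP addresses below are from documentation-only ranges, not real visitors):

```python
import re
from collections import defaultdict

# Assumes the common "combined" log format; captures client IP and request path.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|HEAD) (\S+) [^"]*"')


def ips_that_visited_all(log_lines, copied_paths):
    """Return the IP addresses whose requests cover every path in copied_paths."""
    wanted = set(copied_paths)
    seen = defaultdict(set)  # ip -> the wanted paths that ip requested
    for line in log_lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip malformed or non-GET/HEAD lines
        ip, path = m.groups()
        path = path.split("?", 1)[0]  # ignore query strings such as ?utm=...
        if path in wanted:
            seen[ip].add(path)
    return sorted(ip for ip, paths in seen.items() if paths == wanted)


# Hypothetical sample log lines.
sample = [
    '203.0.113.7 - - [10/Oct/2018:13:55:36 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [10/Oct/2018:13:56:01 +0000] "GET /hoax-a.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [10/Oct/2018:13:57:12 +0000] "GET /hoax-b.html?utm=x HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
    '198.51.100.4 - - [10/Oct/2018:14:02:44 +0000] "GET /hoax-a.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
]
print(ips_that_visited_all(sample, ["/hoax-a.html", "/hoax-b.html"]))  # ['203.0.113.7']
```

Intersecting per-IP visit sets works because a popular story has many readers, but very few of them will have opened every one of the copied pieces within the same short window.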
But then I thought blocking would be too obvious: he would notice eventually and might then try subscribing to our RSS feed, or simply steal our stories via our Facebook page. I needed a more subtle solution.
So I set up an alternative version of the front page at https://leadstories.com/index2.html and redirected all requests for the homepage coming from Shawn's IP addresses to that page (in such a way that the actual URL shown in his browser wouldn't change). This version of the front page would typically be one or more days old so he couldn't see the newest articles.
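The post doesn't say how this was configured. The key point is that it was an internal rewrite rather than an HTTP redirect: the server quietly serves different content for the same URL, so nothing changes in the visitor's address bar. A minimal sketch as Python WSGI middleware, assuming hypothetical flagged IPs and that "/index.html" is an alias for the homepage (a real site would more likely do this in its web server configuration):

```python
# Hypothetical flagged client addresses (documentation-only IP range).
FLAGGED_IPS = {"203.0.113.7"}


class StaleHomepageMiddleware:
    """WSGI middleware: serve an older front page to flagged IPs.

    Only PATH_INFO is rewritten before the wrapped app sees the request.
    No 3xx redirect is sent, so the browser's address bar still shows "/".
    """

    def __init__(self, app, flagged_ips, alt_path="/index2.html"):
        self.app = app
        self.flagged_ips = set(flagged_ips)
        self.alt_path = alt_path

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO") or "/"
        is_homepage = path in ("/", "/index.html")  # assumed homepage aliases
        if is_homepage and environ.get("REMOTE_ADDR") in self.flagged_ips:
            # Copy the environ dict instead of mutating the caller's copy.
            environ = dict(environ, PATH_INFO=self.alt_path)
        return self.app(environ, start_response)


def demo_app(environ, start_response):
    """Stand-in for the real site: echoes the path it was asked to serve."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [environ["PATH_INFO"].encode()]


app = StaleHomepageMiddleware(demo_app, FLAGGED_IPS)
```

Behind a reverse proxy or CDN, `REMOTE_ADDR` would be the proxy's address, so the real client IP would have to come from a trusted forwarding header instead.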
This worked for a while but sometimes he still managed to find out about a recent article (probably through the "Recent" section in the sidebar on the article pages).
So it was time for a little fun: as the BuzzFeed story noted, I set up a blog named The Honey Pot Times at https://thetrojanhoneypot.wordpress.com/ (yes, the name and the domain really were that obvious...). I would publish short death hoaxes there, debunk them in articles on Lead Stories and then show those debunks only to Shawn via the special homepage. And he happily copied them, because Shawn sure loved death hoax stories. I thought it was hilarious that I could now feed Shawn a fake death hoax every time I had a real story I didn't want him to copy.
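To illustrate the "only via the special homepage" part, here is a hypothetical sketch of how the two front-page listings could be built from one article list. The `Article` type, the `honeypot` flag and the one-day cutoff are all assumptions based on the description above, not how Lead Stories actually built its pages:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Article:
    url: str
    published: datetime
    honeypot: bool = False  # hypothetical flag: True for debunks of Honey Pot Times hoaxes


def newest_first(articles):
    return sorted(articles, key=lambda a: a.published, reverse=True)


def public_homepage(articles):
    """Regular visitors never see the honeypot debunks."""
    return newest_first(a for a in articles if not a.honeypot)


def decoy_homepage(articles, now, max_age=timedelta(days=1)):
    """Flagged visitors see real stories at least max_age old, plus every honeypot debunk."""
    return newest_first(a for a in articles if a.honeypot or now - a.published >= max_age)


now = datetime(2018, 10, 10, 12, 0)
articles = [
    Article("/fresh-real-story.html", now - timedelta(hours=2)),
    Article("/older-real-story.html", now - timedelta(days=2)),
    Article("/honeypot-debunk.html", now - timedelta(minutes=30), honeypot=True),
]
print([a.url for a in public_homepage(articles)])  # ['/fresh-real-story.html', '/older-real-story.html']
print([a.url for a in decoy_homepage(articles, now)])  # ['/honeypot-debunk.html', '/older-real-story.html']
```

The decoy listing deliberately puts the honeypot debunk at the top as the "newest" story, while the genuinely fresh article stays hidden until it is old enough not to matter.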
I even created a Twitter account to tweet out the hoaxes, because Shawn typically added some tweets to the end of his articles as filler. He happily embedded these tweets as well. My original plan was to do this a few times, then change the name on the Twitter account to something like "Shawn Rice Copies Things" or "This Article Was Stolen" and have a bit of fun with it at some later point.
It was around this time that I decided to show all this to Craig at BuzzFeed since busting websites doing shady stuff is kind of his wheelhouse. Craig had a good laugh at my stunt and said he would look into it. The rest, as they say, is history.
Epilogue:
To quote from Craig's article:
BuzzFeed News asked Schenk via Skype how he feels now that close to 6,000 of Rice's search-friendly posts are gone.
He responded by sending a photo of him holding up a large beer, and said, "I can now spend more time on finding and debunking fake news instead of thinking up new ways to stop that site from exploiting my hard work."
My only regret is that the last death hoax on The Honey Pot Times was published ten minutes after Craig contacted Shawn's editors. It would have been glorious if the last thing he stole from Lead Stories was a fake death hoax about a plagiarising intern tripping over a "copy" cat and getting crushed by a Xerox machine.
Fake News: Intern Who Wrote Plagiarised Melania Tweet Did NOT Die In Bizarre Accident | Lead Stories
Did an intern who wrote a plagiarised Melania tweet die in an accident involving a falling Xerox machine? No, that's not true: the story was made up by a relatively new fake news website which has published several death hoaxes in the past few days. It is not real, it did not happen.
I guess you can't have everything...