Content theft (scraping) is one of the biggest headaches for bloggers. It doesn't feel good to see your article on another blog. Most of the time, the blogs that scrape content are Made for AdSense sites with no original content.
Last week, we conducted a small poll about content scraping. The question was "Has Your Content Ever Been Stolen?" Here are the results:
Finding Scrapers
The first step in fighting scrapers is to find them. Here are some ways to find scraped content:
- Internal Linking: Internal linking is one of the best strategies for finding scrapers. Most scrapers copy the feed exactly, so the links to your own posts are retained. If you use WordPress, you will see the linkbacks on your dashboard, or you can use link:yourdomain to search Google for backlinks.
- RSS Footer Links: Since most scraping is done through RSS feeds, it's a good idea to add a copyright notice to your feeds. WordPress users can use the RSS Footer plugin for this.
- Copyscape: Copyscape lets you search the web for duplicate content and offers a warning banner that you can add to your blog.
- CopyGator: A better service than Copyscape that lets you find duplicate content around the blogosphere based on pages or feeds. You can get email alerts, enter a feed or blog URL to find duplicates, and ping it to find duplicates of your latest content.
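If you already have a suspect page in hand, you can roughly measure how much of your article it reproduces. The sketch below is only an illustration of one common approach (comparing overlapping runs of words, often called shingles). It is not how Copyscape or CopyGator work internally, and the function names and the five-word window are assumptions made for this example.

```python
import re

def shingles(text, size=5):
    """Return the set of overlapping word runs ("shingles") found in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Very short text: treat the whole thing as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def overlap(original, suspect, size=5):
    """Fraction of the original's shingles that also appear in the suspect text."""
    own = shingles(original, size)
    if not own:
        return 0.0
    return len(own & shingles(suspect, size)) / len(own)
```

A value close to 1.0 suggests the suspect page contains most of your article word for word, even if the scraper wrapped it in extra text. A low value does not prove innocence, since the text may have been reworded.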
Scrapers are getting smarter by the day and use new techniques to make sure you do not find the duplicate content. I have noticed that many splogs use scripts that rewrite links to your blog so that they point to the splog itself. One such scraper I noticed was copying feeds from Daily Blog Tips and replacing the dailyblogtips.com part of each link with his own domain. This kept him from appearing in Google's backlink results, but CopyGator still found him.
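Rewritten links of this kind leave a trace: the path of the link still matches one of your posts even though the domain has changed. The following is a minimal sketch of checking a suspect page for that pattern. The function name, the example domains, and the idea of passing in your own list of post paths are all assumptions for illustration, and it does not handle variations such as a www. prefix.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def rewritten_links(suspect_html, my_domain, my_paths):
    """Return links whose path matches one of my posts but whose host is not mine."""
    collector = LinkCollector()
    collector.feed(suspect_html)
    known = {path.rstrip("/") for path in my_paths}
    hits = []
    for href in collector.hrefs:
        parts = urlparse(href)
        host = parts.netloc.lower()
        # Same post path, different domain: a sign the link was rewritten.
        if host and host != my_domain.lower() and parts.path.rstrip("/") in known:
            hits.append(href)
    return hits
```

Pass in the paths of real posts (for example "/2009/05/my-post/"), not your home page, so that ordinary links on the suspect page are not flagged by accident.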
Knock Scrapers Out
Once you find a scraper, the next thing is to take action. Here are the steps you can take against scrapers:
- Secure Evidence: Search to see if Google has cached the page. Cached pages are snapshots taken by search robots as they crawl the blog. Google uses these as a backup, but you can select that version and copy it. You can also use WebCite to make a copy of cached pages.
- Legal Action: You can start by contacting the blog owner and asking them to take the duplicate content down. Here is a good, detailed post about the legal steps you can take: Six Steps to Prevent Content Theft.
- Report to AdSense: Most splogs run AdSense ads, so you can complain easily. Just click the Ads by Google logo. In the new tab that appears, click "Send Google your thoughts on the site or the ads you just saw" and report the violation.
- Report to Their Host: Go to whoishostingthis.com to find the splog's host and report the violation in detail.
- Report to the Domain Company: Such blogs often use free domain services like co.cc, and in that case you can notify the service about it.
How do you control scraping? Do tell us in the comments.
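When you secure evidence, it can help to keep a small record next to each saved copy, so you can later show which page you saved and when. The sketch below is one possible way to do that, assuming you have already downloaded the page yourself. The function name and the record fields are made up for this example, and a record like this is only a personal note, not a substitute for a cached copy or a WebCite archive.

```python
import hashlib
from datetime import datetime, timezone

def evidence_record(url, page_bytes, captured_at=None):
    """Describe a saved copy of a page: where it came from, when, and a checksum."""
    when = captured_at or datetime.now(timezone.utc)
    return {
        "url": url,
        "captured_at_utc": when.isoformat(),
        # The checksum lets you confirm later that the saved file is unchanged.
        "sha256": hashlib.sha256(page_bytes).hexdigest(),
        "size_bytes": len(page_bytes),
    }
```

You could store the returned dictionary alongside the saved HTML file and quote the URL and capture time when you report the splog.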
{ 13 comments }
My entire site was stolen a couple of months ago and I spent two days getting the site taken down by the host. Two months later, the same material, taken word for word and photo for photo from my site, was back up under a different domain name.
This can turn into a full-time job to track down the infringing parties and almost seems to be a waste of time!
Hi, I'm new to this blogging thing. I have a blog, and most of its content provides information about products that Amazon sells. Is that considered a splog? Most of the detailed product information will be the same as on the Amazon site. I don't steal content from other blogs, but because my blog is mostly about Amazon products, there is maybe 60% to 65% similarity with Amazon. What do you think about the WordPress plugin that automatically posts content from eBay (I think it's called BANS, if I'm not mistaken)? Is that considered a splog too? In my personal opinion that plugin is not a splog, because people who see a BANS page are probably looking for that product.
Though I agree with the bulk of what is in this post, and am very glad to see more bloggers taking up this issue, I think we should be clear that, if you are going to report a scraper/spam blogger to their host for a copyright violation, which is probably the best route in this case, you need to make sure your notice is DMCA compliant.
If you need help with that, I've got a stock letter on my site that anyone here is more than free to use. But if you don't meet the requirements, it is unlikely most hosts, at least in the U.S., will work with you on this matter.
Hope that helps and let me know if there is anything that I can do to assist!
Excellent detailed article you have at http://www.plagiarismtoday.com/stopping-internet-… for those who take that route. Your stock letter for 'DMCA Notice to Host' is a time saver and signals the blog owner is professional and serious. Thanks.
Glad I was able to help! Let me know if there is anything I can do to assist!
I haven't been too concerned with scrapers, and as far as I know my content has not been stolen, but on the other hand I've been fighting it unknowingly by heavy internal linking.
I hadn't run into CopyGator before, thanks for mentioning that. And I'll find other uses for the RSS Footer plugin as well, so this was most useful.
Internal linking is indeed a win-win for bloggers. First, it makes readers visit other articles that may interest them. Second, it is good for search engine optimization. Third, it helps combat content scraping.
Thanks for the compliments. Hope you found the post useful.
I have one question: if we link internally to our site's articles and don't use the nofollow tag, does Google punish us on the ranking side?
Well, I asked Ann Smarty (SEO expert) about this, and here is her reply: "There are rumors about this but it is not confirmed."
I have not yet heard of anyone getting a penalty for this, so it is pretty safe!
As the blogging phenomenon expands, copyright concerns become quite important. Technology makes it really easy to copy, modify and share information, whether we talk about text, images, audio or video. The problem is that the vast majority of people do not have a clear understanding of the Copyright Law, which might result in illegal and costly mistakes.
As a blogger it's important to understand that ignorance and misinformation about copyright law and fair use have escalated, and the numbers of content thieves and e-beggars have dramatically increased. Bloggers are expected to be able to sort fact from fiction, so if you are a newcomer, becoming familiar with copyright law is part of the territory.
As I am a paralegal I have created a collection of posts on these issues that include copyright basics for bloggers; fair use limitations; plagiarism versus copyright infringement; how to copyright your digital works; preventing content theft; how to spot a splog; tracing who has stolen your content; and what to do when your content is stolen.
Using both deep linking to earlier posts in your text and using static pages for some content can be helpful ways of preventing some theft. Reducing RSS feeds to "summary" and using anchor text in the first 50 words to link back to the same post can also be helpful as a preventive measure. When all else fails, learning how to make a DMCA complaint is a necessity for bloggers these days.
Your post is a good resource for bloggers and I thank you for creating it. The more informed we bloggers become, and the more we spread that correct information throughout the blogosphere to combat the ignorance and misinformation in circulation, the better things will become for all of us.
Your response to this post and the related poll adds so much value to the discussion. It's rewarding when you and readers like Ruchi answer (actual and unspoken) questions. This encourages others not to be shy. And we welcome the perspective and help! Thanks.
Hi,
thanks for this info. I just checked and I had a backlink waiting for approval. It was a link from http://themeswp.co.cc. This site seems to have numerous posts about themes and I deleted it.
I can understand someone referencing my site, but taking it from me without asking me is just not cool. I did notice that when I clicked on the link, it took me directly to my site, which opened in a new window.
It might be a trackback, in which only part of your post is taken and a link back to you is given.