Scraper sites – those sites that copy content others have created and post it on their own site or blog as their own – have been the bane of webmasters for many, many years.
Logically, the originating source should rank first for its own content, yet you'll often find scraper sites ranking above the content originator, usually in conjunction with other spam methods used to get the copied content ranking.
Even worse, sometimes the original source of the content vanishes from the search results entirely while a scraper site's version continues to rank well.
Google today released a new Scraper Report form where webmasters can report scraper sites that have copied their content. Webmasters provide Google with the source URL where the content was taken from, the URL of the scraper site where the content is being republished or repurposed, and the keywords for which the scraper site is ranking.
Google is also asking webmasters to confirm that their site follows the webmaster guidelines before submitting, although chances are pretty good that webmasters who find the Scraper Report form are already aware of the Google webmaster guidelines and know how to check for penalties in their Google Webmaster Tools accounts.
Does this mean scraper sites are becoming more of a problem now than they have been in the past? Not necessarily, though that could be part of the reason.
Sometimes scraper sites aren't ranking for the top money keywords, but they're prevalent enough to clutter up the search results beyond the top 10 or so, and they can be even more common in long-tail search results and beyond page 1 or 2 of Google (it can be difficult to find non-scraped content for some long-tail searches). Until now, the only way to get scraper content out of Google's search results was by filing a DMCA request.
Google isn't saying exactly what it's doing with this data. Is this an easy way for webmasters to get scraper sites out of the index without having to use the DMCA process? Is Google using it to improve its algorithms to better determine which version of the content is the original and which is the scraper's? Google doesn't say, although I suspect the data is being used to improve the algorithm by studying how and why scrapers are ranking.
This definitely has the mark of the kind of project Google's web spam team works on. Back in August, Matt Cutts also asked for examples of small websites that weren't ranking well despite being high quality, although that request specifically included a disclaimer that submissions wouldn't affect rankings.
It's great that Google is taking another look at scraper sites, because they have been a thorn in webmasters' sides for many years, even when they aren't ranking highly.
This isn't the first time Google has asked for help with scrapers, and Google also tried to reduce the number of scrapers algorithmically in 2011.
Hopefully we will see a refresh of how scrapers are handled in a future update to Google's search algorithm.