Google search overwhelmed by massive spam attack

Google search results have been hit by an onslaught of spam over the past few days that can only be described as completely out of control. Many domains rank for hundreds of thousands of keywords each, an indication that the scale of this attack could easily reach millions of keyword phrases.

Surprisingly, many of the domains were only registered in the last 24-48 hours.

This was recently brought to my attention by a series of posts from Bill Hartzer (LinkedIn profile), in which he published a link graph generated by the Majestic backlink tool that exposed the link networks of several of the spam sites.

The link graph he posted showed numerous websites that link closely to each other, which is a fairly typical pattern of spam link networks.

Screenshot of a tightly interconnected network

Image by Bill Hartzer via Majestic
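
One rough way to put a number on "tightly interconnected" is link density: the share of all possible site-to-site links that actually exist. The sketch below is only an illustration of that idea, not how Majestic builds its graph. The domain names and link sets are hypothetical.

```python
from itertools import permutations


def link_density(links):
    """Return the fraction of possible directed links between domains that exist.

    `links` is an iterable of (source_domain, target_domain) pairs.
    """
    domains = {domain for pair in links for domain in pair}
    n = len(domains)
    if n < 2:
        return 0.0
    possible = n * (n - 1)  # every domain could link to every other domain
    actual = len({(src, dst) for src, dst in links if src != dst})
    return actual / possible


# Hypothetical data: four domains that all link to each other,
# versus four domains connected by a simple chain of links.
names = ["a.example", "b.example", "c.example", "d.example"]
networked = set(permutations(names, 2))
chained = {(names[0], names[1]), (names[1], names[2]), (names[2], names[3])}

print(link_density(networked))
print(link_density(chained))
```

A density close to 1.0 means nearly every site links to nearly every other site, which is the kind of pattern the link graph showed.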

Bill and I discussed the spam sites on Facebook Messenger, and we both agreed that while the spammers worked hard to build a network of backlinks, the links weren’t really responsible for the high rankings.

Bill said:

“In my opinion, this is partly the fault of Google, which seems to be putting more emphasis on content than on links.”

I agree 100% that Google is putting more emphasis on content than links. But my view is that the spammy links are there so that Googlebot can discover the spammy pages and index them, even if only for a day or two.

Once indexed, spammy pages are likely to exploit what I consider two loopholes in Google’s algorithms, which I discuss below.

Out of control spam in Google SERPs

Several sites rank for long-tail phrases, which are somewhat easy to rank for, as well as for phrases with a local search component, which are also easy to rank for.

Long-tail phrases are keyword phrases that people do use, but only rarely. The long tail is a concept that was popularized nearly twenty years ago by the 2006 book The Long Tail: Why the Future of Business Is Selling Less of More.

Spammers can rank for these infrequently used phrases because there is little competition for them.

So if a spammer creates millions of pages targeting long-tail phrases, those pages can rank for hundreds of thousands of keywords every day, and do so within a short period of time.

Companies like Amazon use the long tail principle to sell hundreds of thousands of individual products a day, which is different from selling one product a hundred thousand times a day.

This is the first thing spammers are exploiting: how easy it is to rank for long-tail phrases.

The second thing spammers are exploiting is the loophole inherent in local search.

The local search algorithm is not the same as the algorithm for ranking non-local keywords.

The examples that have come to light are variations of Craigslist and related keywords.

They include phrases like craigslist auto parts, craigslist rooms for rent and craigslist for sale by owner, along with thousands of other keywords, most of which do not contain the word craigslist.

The scale of spam is huge and goes far beyond keywords with the word “Craigslist”.

What does the spam page look like?

Seeing what a spam page looks like is impossible if you visit it with a regular browser.

I tried to view the source code of the sites that rank in Google, but every spam site automatically redirected to another domain.

I then entered a spam URL into the W3C link checker to have it visit the site, but the W3C bot couldn’t see the page either.

So I changed my browser’s user agent to identify itself as Googlebot, but the spam site still redirected me.

This indicated that the site was not checking if the user agent was Googlebot.
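
The user agent test described above can also be reproduced outside the browser. This is a minimal sketch using Python's standard library. The user agent string is the one Googlebot publicly identifies itself with, while the function names are my own and the spam URL is a placeholder.

```python
import urllib.request

# User agent string that Googlebot publicly identifies itself with.
GOOGLEBOT_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)


def build_request(url, user_agent=GOOGLEBOT_UA):
    """Build a request that announces itself with the given user agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def final_url(url, user_agent=GOOGLEBOT_UA, timeout=10):
    """Fetch the URL and return the address we ended up on after HTTP redirects."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.geturl()


# Example (placeholder URL, requires network access):
# print(final_url("https://spam-site.example/some-page"))
```

If the returned address is on a different domain than the one requested, the site redirected the visit despite the Googlebot user agent. Note that `urlopen` only follows HTTP-level redirects, so a redirect performed by JavaScript would not show up this way.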

The spam site was looking for Googlebot IP addresses. If the visitor’s IP address matched Google’s, the spam page displayed content to Googlebot.

All other visitors were redirected to other domains that displayed incomplete content.
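
I have not seen the spammers' server code, so the following is only a sketch of how this kind of IP-based cloaking could work in principle. The single address range in the list is an illustrative sample of a block commonly associated with Googlebot, and the function names are hypothetical. A real check would rely on the crawler IP ranges that Google publishes.

```python
import ipaddress

# Illustrative sample only; not a complete or authoritative list.
SAMPLE_CRAWLER_RANGES = ["66.249.64.0/19"]


def ip_in_ranges(ip, cidr_ranges):
    """Return True if the IP address falls inside any of the CIDR ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in cidr_ranges)


def choose_response(ip, cidr_ranges=SAMPLE_CRAWLER_RANGES):
    """Decide what a cloaking site would do with a visitor from this IP."""
    if ip_in_ranges(ip, cidr_ranges):
        return "serve_content"  # crawler sees the keyword-stuffed page
    return "redirect"  # everyone else is sent to another domain


print(choose_response("66.249.66.1"))
print(choose_response("203.0.113.7"))
```

Because the decision depends on the visitor's IP address and ignores the user agent entirely, changing the browser's user agent makes no difference. That matches what I saw.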

To see the HTML of the website I had to visit with a Google IP address. So I used Google’s rich results tester to visit the spam site and record the HTML of the page.

I showed Bill Hartzer how to extract the HTML using the rich results tester and he immediately went tweeting about it, lol. Dang!

The Rich Results Tester has an option to display the HTML of a web page. So I copied the HTML, pasted it into a text file, and then saved it as an HTML file.

Screenshot of HTML provided by the rich results tool

I then edited the HTML file to remove any JavaScript and then resaved the file.

Now I was able to see what the web page looks like to Google:

Screenshot of the spam web page

Screenshot of a spam web page that ranks in Google
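
I removed the JavaScript by hand, but the same step could be scripted. The sketch below is one possible approach, assuming the redirect lives in `<script>` blocks. It does not touch inline event handlers such as `onclick`, and the file names in the usage comment are placeholders.

```python
import re

# Matches a <script ...> ... </script> block, including multi-line ones.
SCRIPT_BLOCK = re.compile(
    r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL
)


def strip_scripts(html):
    """Return the HTML with all <script> blocks removed."""
    return SCRIPT_BLOCK.sub("", html)


def clean_file(source_path, target_path):
    """Read a saved HTML file, remove its scripts, and write a clean copy."""
    with open(source_path, encoding="utf-8") as source:
        cleaned = strip_scripts(source.read())
    with open(target_path, "w", encoding="utf-8") as target:
        target.write(cleaned)


# Example (placeholder file names):
# clean_file("spam-page.html", "spam-page-no-js.html")
```

A regular expression is a blunt tool for HTML, but for a one-off look at a saved page it is usually enough to stop the redirect from firing when the file is opened locally.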

One domain ranks for more than 300,000 keywords

Bill sent me a spreadsheet listing the keyword phrases that a single spam site ranked for. That one site, just one of many, ranked for over 300,000 keyword phrases.

Screenshot showing keywords for a domain

Image showing a close-up of a spreadsheet with keyword phrases

There were a lot of Craigslist keyword phrases, but there were also other long-tail phrases, many of which contained a local search element. As I mentioned, it’s easy to rank for long-tail phrases, easy to rank for local search phrases, and very easy to rank for phrases that combine both.

Why does this spam technique work?

Local search uses a different algorithm than non-local search. For example, a local site generally doesn’t need a lot of links to rank for a query. A page only needs the right kinds of keywords to trigger the local search algorithm and rank for a geographic area.

So a search for “craigslist auto parts” triggers the local search algorithm, and because the phrase is long tail, it doesn’t take much for a page to rank for it.

This has been an ongoing problem for many years. Several years ago, a site was able to rank for “Rhinoplasty Plano, Texas” with pages that contained content in Latin under English headings. “Rhinoplasty Plano, Texas” is a long-tail local search phrase, and Plano, Texas is a relatively small town. The phrase was so easy to rank for that even a site written in Latin could do it.

Google has known about this spam problem since at least December 19, as acknowledged in a tweet by Danny Sullivan.

Yes, I have already passed it on to the search team. Here’s a look. And it’s being watched. pic.twitter.com/vJH3EisnXD

— Google SearchLiaison (@searchliaison) December 19, 2023

It will be interesting to see if Google finally, after all this time, figures out a way to combat this type of spam.

Featured image by Shutterstock/Kateryna Onyshchuk




About the Author: Ted Simmons

I follow and report the current news trends on Google news.
