In a Google SEO Office Hours session, Duy Nguyen of Google’s Search Quality team answered a question about links from spammy sites and how they relate to trust.
It was notable that the Googler said Google protects the anchor text signal, something that is not commonly discussed.
Building trust with Google is an important consideration for many publishers and SEOs.
There is the idea that “trust” will help a site get indexed and ranked correctly.
It is also known that there is no single “trust” metric, which confuses some in the search community.
How can the algorithm trust a site if trust isn’t something it measures?
Googlers don’t really answer that question, but there are patents and research papers that give an idea of how trust might work.
Google does not trust links from spammy sites
The person who submitted a question to SEO Office Hours asked:
“If a domain is penalized, does it affect the links coming from it?”
Googler Duy Nguyen replied:
“I assume by ‘penalized’ you mean the domain was demoted by our spam algorithms or manual actions.
In general, yes, we don’t trust links from sites we know are spamming.
This helps us maintain the quality of our anchor signals.”
Trust and Links
Googlers talk about trust, and it’s clear that they’re talking about their algorithms either trusting something or not trusting something.
In this case it’s not just about discounting links found on spammy sites in general; it’s specifically about discounting the anchor text signal.
The SEO community talks about “building trust,” but in this case, it’s really about not spamming.
How does Google determine that a site is spam?
Not all spam sites are penalized or receive a manual action. Some sites aren’t even indexed, and that’s the job of Google’s SpamBrain, an AI platform that analyzes web pages at multiple points, beginning at crawl time.
The SpamBrain platform works at two stages:
Indexing gatekeeper
SpamBrain blocks sites at crawl time, including content discovered through Search Console and sitemaps.
Catching indexed spam
SpamBrain also catches spam that has already been indexed, at the point when sites are considered for ranking.
The SpamBrain platform works by training an AI on Google’s knowledge of spam.
Google commented on how SpamBrain works:
“By combining our deep knowledge of spam with AI, last year we were able to create our own anti-spam AI that is incredibly effective at catching both known and new spam trends.”
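Google has published no details of SpamBrain’s design, so any code can only illustrate the general idea of training a classifier on labeled spam examples. The sketch below is a tiny naive Bayes text classifier; the approach, the example pages, and the labels are all invented for illustration and are not a description of SpamBrain:

```python
import math
from collections import Counter

# Illustrative only: a minimal naive Bayes classifier trained on labeled
# spam/ok examples, standing in for the general "train an AI on known spam"
# idea. SpamBrain's actual design is not public.

def train(examples):
    """examples: list of (text, label) pairs with label 'spam' or 'ok'."""
    counts = {"spam": Counter(), "ok": Counter()}
    totals = Counter()
    for text, label in examples:
        counts[label].update(text.lower().split())
        totals[label] += 1
    vocab = set(counts["spam"]) | set(counts["ok"])
    return counts, totals, vocab

def classify(model, text):
    """Return the label with the higher (log) posterior, Laplace-smoothed."""
    counts, totals, vocab = model
    n = sum(totals.values())
    best, best_score = None, float("-inf")
    for label in counts:
        score = math.log(totals[label] / n)  # class prior
        denom = sum(counts[label].values()) + len(vocab)
        for token in text.lower().split():
            score += math.log((counts[label][token] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

# Invented training data standing in for "knowledge of spam".
examples = [
    ("buy cheap links casino pills", "spam"),
    ("free casino bonus buy now", "spam"),
    ("guide to writing helpful articles", "ok"),
    ("research paper on ranking algorithms", "ok"),
]
model = train(examples)
```

A real system would of course use far richer signals than word counts; the point is only that labeled examples of known spam let a model generalize to new spam.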
We don’t know exactly what Google means by its “deep knowledge of spam,” but there are several patents and research papers on the topic.
Those who want to dig deeper into this topic might consider reading an article I wrote about the concept of link distance ranking algorithms, a method for ranking links.
I also published a full article on several research papers describing link-related algorithms that may explain how the Penguin algorithm works.
Although many of these patents and research papers are ten or more years old, nothing newer on the topic has been published by search engines or university researchers since then.
What makes these patents and research papers important is that they could be incorporated into Google’s algorithm in a different way, such as for training an AI like SpamBrain.
The patent discussed in the link distance ranking article describes how the method assigns ranking scores to pages based on the distances between a set of trusted “seed sites” and the pages they link to. The seed sites serve as starting points for calculating which sites are normal and which are not (i.e., spam).
The intuition is that the farther a site is from a seed site, the more likely it is to be spam. This part, determining spam via link distance, is discussed in the research papers cited in the Penguin article referenced above.
The patent (Producing a Ranking for Pages Using Distances in a Web-Link Graph) explains:
“The system then assigns lengths to the links based on the properties of the links and the properties of the pages attached to the links.
The system then calculates the shortest distances from the seed page set to each page in the page set based on the length of the links between the pages.
The system then determines a rank score for each page in the set of pages based on the calculated shortest distances.”
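The three steps quoted above amount to a multi-source shortest-path computation over a weighted link graph. The following is a minimal sketch of that idea, not Google’s actual implementation; the domain names, the link lengths, and the distance-to-score formula are all invented for the example:

```python
import heapq

# Hypothetical illustration of the patent's three steps: links are given
# lengths, shortest distances from the seed pages are computed, and each
# page gets a rank score derived from its distance (closer = higher).

def shortest_distances(graph, seeds):
    """Multi-source Dijkstra. graph maps page -> list of (target, link_length)."""
    dist = {seed: 0.0 for seed in seeds}
    heap = [(0.0, seed) for seed in seeds]
    heapq.heapify(heap)
    while heap:
        d, page = heapq.heappop(heap)
        if d > dist.get(page, float("inf")):
            continue  # stale queue entry
        for target, length in graph.get(page, []):
            nd = d + length
            if nd < dist.get(target, float("inf")):
                dist[target] = nd
                heapq.heappush(heap, (nd, target))
    return dist

def rank_scores(graph, seeds):
    """Invented scoring: pages far from every seed score close to zero."""
    return {page: 1.0 / (1.0 + d)
            for page, d in shortest_distances(graph, seeds).items()}

# Toy link graph: a trusted seed links to a normal page, which reaches a
# distant page only through a long (low-quality) link.
graph = {
    "seed.example": [("normal.example", 1.0)],
    "normal.example": [("far.example", 10.0)],
}
scores = rank_scores(graph, {"seed.example"})
```

Note that pages unreachable from any seed receive no distance at all, which matches the intuition that sites with no link path back to trusted seeds get no benefit.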
Reduced link graph
The same patent also mentions what is known as a reduced link graph.
But it isn’t just this patent that discusses reduced link graphs; they have also been investigated outside of Google.
A link graph is like a map of the Internet created by charting the links between pages.
In a reduced link graph, low-quality links and the sites associated with them are removed.
What remains is called a reduced link graph.
Here’s a quote from the Google patent cited above:
“A reduced link graph
Note that links participating in the k shortest paths from seeds to pages form a subgraph that includes all links that are classified as “flow” from seeds.
Although this subgraph contains many fewer links than the original link graph, the k shortest paths from the seeds to each page in this subgraph have the same length as the paths in the original graph.
… Also, the ranking flow on each page can be backtracked to the nearest k seeds through the paths of this subgraph.”
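The quoted idea, keeping only the links that participate in shortest paths from the seeds, can be sketched as follows. This is a toy illustration with k = 1; the graph, the page names, and the tolerance check are invented, and this is not Google’s implementation:

```python
import heapq

# Hypothetical sketch of a reduced link graph: after computing shortest
# distances from the seeds, keep only links that lie on a shortest path.
# Every other link is dropped, yet each page's shortest distance from the
# seeds is preserved, as the patent notes.

def shortest_distances(graph, seeds):
    """Multi-source Dijkstra. graph maps page -> list of (target, link_length)."""
    dist = {seed: 0.0 for seed in seeds}
    heap = [(0.0, seed) for seed in seeds]
    heapq.heapify(heap)
    while heap:
        d, page = heapq.heappop(heap)
        if d > dist.get(page, float("inf")):
            continue
        for target, length in graph.get(page, []):
            nd = d + length
            if nd < dist.get(target, float("inf")):
                dist[target] = nd
                heapq.heappush(heap, (nd, target))
    return dist

def reduced_link_graph(graph, seeds):
    dist = shortest_distances(graph, seeds)
    reduced = {}
    for page, links in graph.items():
        for target, length in links:
            # A link survives only if it participates in a shortest path.
            if (page in dist and target in dist
                    and abs(dist[page] + length - dist[target]) < 1e-9):
                reduced.setdefault(page, []).append((target, length))
    return reduced

# Toy graph: the direct long link seed -> b is redundant because the path
# through page a is shorter, so the reduced graph drops it.
graph = {
    "seed": [("a", 1.0), ("b", 5.0)],
    "a": [("b", 1.0)],
}
reduced = reduced_link_graph(graph, {"seed"})
```

The reduced graph here keeps seed → a and a → b but drops the longer seed → b link, while b’s shortest distance from the seed (2.0) is unchanged, which is the property the patent describes.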
Google does not trust links from penalized sites
It may seem obvious that Google doesn’t trust links from penalized websites.
But sometimes one doesn’t know whether a site has been penalized or flagged as spam by SpamBrain.
Researching whether a site can be trusted is a good idea before going to the trouble of trying to get a link from it.
In my opinion, third-party metrics should not be used to make business decisions like this because the calculations used to produce a score are hidden.
If a site already links out to potentially spammy sites that themselves have inbound links from likely paid sources such as PBNs (private blog networks), it is probably a spammy site itself.
Watch the SEO Office Hours session for the full answer.