Google explains how it chooses canonical web pages


In a Google Search Central video, Google’s Gary Illyes explained the part of web page indexing that involves selecting canonicals. He covered what a canonical means to Google, gave a thumbnail explanation of web page signals, mentioned the centerpiece of a page, and explained what Google does with duplicates, which involves a new way of thinking about them.

What is a canonical web page?

There are several ways to think about what canonical means: what it means to publishers and SEOs on our side of the search box, and what it means on Google’s side.

Publishers identify what they consider to be the “original” web page, while the SEO concept of a canonical is choosing the “strongest” version of a web page for ranking.

To Google, canonicalization means something quite different from what publishers and SEOs think it is, so it’s good to hear it explained by a Googler like Gary Illyes.

Official Google documentation on canonicalization uses the word deduplication to refer to the process of choosing a canonical and lists five typical reasons why a site might have duplicate pages.

Five reasons for duplicate pages

“Regional variants: for example, content for the US and the UK, accessible from different URLs, but essentially the same content in the same language.
Device variants: for example, a page with both a mobile and a desktop version.
Protocol variants: for example, the HTTP and HTTPS versions of a site.
Site functions: for example, the results of the sorting and filtering functions of a category page.
Accidental variants: for example, the demo version of a site is accidentally made accessible to crawlers.”

Canonicals can be considered in three different ways, and there are at least five reasons for duplicate pages.

Gary describes one more way of thinking about canonicals.

Signals are used to choose canonicals

Illyes shares one more definition of canonical, this time from an indexing point of view, and discusses the signals used to select canonicals.

Gary explains:

“Google determines if the page is a duplicate of another already known page and which version should be kept in the index, the canonical version.

“But in this context, the canonical version is the page in a group of duplicate pages that best represents the group based on the signals we’ve collected about each version.”

Gary pauses to explain duplicate grouping and then returns to the signals a little later.

He continued:

“For the most part, only canonical pages appear in search results. But how do we know which page is canonical?

So once Google has the content of your page, or more specifically the main content or centerpiece of a page, it will group it with one or more pages with similar content, if any. This is duplicate grouping.”

I want to pause here to point out that Gary refers to the main content as the “centerpiece of a page,” which is interesting because Google’s Martin Splitt introduced a concept called Centerpiece Annotation. Splitt didn’t really explain what centerpiece annotation is, but what Gary shared here helps.
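Google hasn’t published how duplicate grouping actually works, but the basic idea Gary describes (take each page’s main content and put pages with matching content in the same group) can be illustrated with a minimal Python sketch. It assumes the centerpiece has already been extracted from each page, and normalize_main_content() and group_duplicates() are hypothetical helpers. It also only groups exact matches after normalization, whereas Gary talks about pages with similar content, which a real system would presumably handle with near-duplicate matching.

```python
import hashlib
import re
from collections import defaultdict
from typing import Dict, List


def normalize_main_content(text: str) -> str:
    """Hypothetical stand-in for centerpiece cleanup: collapse whitespace and lowercase."""
    return re.sub(r"\s+", " ", text).strip().lower()


def group_duplicates(pages: Dict[str, str]) -> List[List[str]]:
    """Group URLs whose normalized main content is identical.

    `pages` maps a URL to the main content already extracted from it.
    Only exact matches (after normalization) land in the same group.
    """
    clusters: Dict[str, List[str]] = defaultdict(list)
    for url, main_content in pages.items():
        normalized = normalize_main_content(main_content)
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        clusters[key].append(url)
    return list(clusters.values())


if __name__ == "__main__":
    example_pages = {
        "http://example.com/shoes": "Red running shoes.  Lightweight.",
        "https://example.com/shoes": "red running shoes. lightweight.",
        "https://example.com/boots": "Leather boots.",
    }
    for group in group_duplicates(example_pages):
        print(group)
```

In this example the HTTP and HTTPS versions of the shoes page (a protocol variant, one of the five reasons for duplicates) end up in one group, and the boots page forms a group of its own.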

Next is the part of the video where Gary talks about what the signals actually are.

Illyes explains what “signals” are:

“It then compares a handful of signals it has already calculated for each page to select a canonical version.

Signals are pieces of information that the search engine collects about pages and websites, which are used for further processing.

Some signals are very simple, like the site owner’s HTML annotations such as rel="canonical", while others, such as the importance of an individual page on the internet, are less straightforward.”
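The rel="canonical" annotation is the simplest signal to see for yourself, because it sits right in a page’s HTML as a link element. Here is a minimal Python sketch, using the standard library’s html.parser, that pulls the declared canonical URL out of a page. The class and function names are hypothetical, and treating conflicting canonical declarations as “no usable hint” is an assumption made for this sketch, not documented Google behavior. Remember that, per Gary, this is only one of the signals Google compares.

```python
from html.parser import HTMLParser
from typing import List, Optional


class CanonicalFinder(HTMLParser):
    """Collects href values from <link rel="canonical"> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.canonicals: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated tokens; values may be None.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if "canonical" in rel_tokens and href:
            self.canonicals.append(href)


def find_canonical(html: str) -> Optional[str]:
    """Return the declared canonical URL, or None if absent or conflicting."""
    parser = CanonicalFinder()
    parser.feed(html)
    parser.close()
    unique = set(parser.canonicals)
    if len(unique) == 1:
        return unique.pop()
    # No declaration, or several different ones: this sketch treats both
    # cases as "no usable hint" (an assumption, not Google's documented rule).
    return None


if __name__ == "__main__":
    page = '<html><head><link rel="canonical" href="https://example.com/shoes"></head></html>'
    print(find_canonical(page))
```

Self-closing tags such as <link ... /> are handled too, because HTMLParser’s default handle_startendtag() calls handle_starttag().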

Duplicate clusters have a canonical

Gary then explains that for each group of duplicate pages, one page is chosen as the canonical that represents the group in the search results.

He continues:

“Each of the duplicate clusters will have a single version of the content selected as canonical.

This version will serve the content in the search results for all the other versions.

The other versions of the cluster become alternate versions that can be served in different contexts, such as if the user is searching for a very specific page in the cluster.”
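To make the canonical-and-alternates idea concrete, here is a minimal Python sketch of one duplicate cluster. Everything in it is an assumption for illustration: the PageSignals fields, the WEIGHTS values, and the choose_canonical() and page_to_show() helpers are hypothetical, and Google has not said which signals it uses or how it weighs them. The sketch only shows the shape of the process Gary describes: score each version, keep the top one as the canonical, keep the rest as alternates, and show an alternate when someone is looking for that specific page.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PageSignals:
    url: str
    declared_canonical: bool  # hypothetical: cluster members point rel="canonical" here
    uses_https: bool          # hypothetical protocol signal
    importance: float         # hypothetical 0..1 stand-in for page importance


# Hypothetical weights; Google has not published how it weighs its signals.
WEIGHTS = {"declared_canonical": 2.0, "uses_https": 1.0, "importance": 3.0}


def score(page: PageSignals) -> float:
    """Combine the hypothetical signals into one number (bools count as 0 or 1)."""
    return (
        WEIGHTS["declared_canonical"] * page.declared_canonical
        + WEIGHTS["uses_https"] * page.uses_https
        + WEIGHTS["importance"] * page.importance
    )


def choose_canonical(cluster: List[PageSignals]) -> Tuple[PageSignals, List[PageSignals]]:
    """Return (canonical, alternates) for one duplicate cluster."""
    ranked = sorted(cluster, key=score, reverse=True)  # stable, so ties keep input order
    return ranked[0], ranked[1:]


def page_to_show(
    canonical: PageSignals,
    alternates: List[PageSignals],
    specific_url: Optional[str] = None,
) -> PageSignals:
    """Show the canonical by default, or an alternate the searcher asked for specifically."""
    for alternate in alternates:
        if alternate.url == specific_url:
            return alternate
    return canonical


if __name__ == "__main__":
    cluster = [
        PageSignals("http://example.com/shoes", False, False, 0.4),
        PageSignals("https://example.com/shoes", True, True, 0.4),
    ]
    canonical, alternates = choose_canonical(cluster)
    print(canonical.url, [p.url for p in alternates])
```

A weighted sum is just the simplest way to “compare a handful of signals”. The point of the sketch is that the alternates are kept, not thrown away.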

Alternative versions of web pages

This last part is important to keep in mind because being able to rank for multiple variations of a keyword can be useful, especially for e-commerce websites.

Sometimes the content management system (CMS) creates duplicate web pages to account for variations in a product, such as a product’s size or color, which can then affect the description. Google may choose these variations to rank in search results when that variation page serves a closer match to a search query.

This is important to think about because it can be tempting to redirect or noindex variant web pages to keep them out of the search index for fear of the (non-existent) keyword cannibalization problem. Adding a noindex to pages that are variants of a page can backfire, because there are scenarios where those variant pages are the best match for a more nuanced search query that specifies a color, size, or version number different from that of the canonical page.

The best takeaways about canonicals (and more) to remember

There’s a lot of information in Gary’s discussion of canonicals, including some topics beyond the main subject.

Here are seven takeaways to consider:

The main content is called the centerpiece.
Google calculates a “handful of signals” for each page it discovers.
Signals are data that is used for “further processing” after web pages are discovered.
Some signals are under the publisher’s control, such as hints (and presumably directives). The hint Illyes mentioned is the rel=canonical link attribute.
Other signals are beyond the publisher’s control, such as the importance of the page in the context of the internet.
Some duplicate pages can serve as alternate versions.
Alternate versions of web pages can still rank and are useful to Google (and the publisher) for ranking.

Watch Search Central’s episode about indexing:

How Google Search indexes pages

Featured image from Google video/altered by author


About the Author: Ted Simmons

I follow and report the current news trends on Google news.
