Google apparently accidentally published a large batch of internal technical documents to GitHub, partially detailing how the search engine ranks web pages. For most of us, the question of search rankings comes down to "are my results good or bad," but the SEO community is happily peeking behind the curtain and getting up in arms, since the documents apparently contradict some of what Google has told them in the past. Most of the commentary on the leak comes from SEO experts Rand Fishkin and Mike King.
Google confirmed the authenticity of the documents to The Verge, saying, "We would caution against making inaccurate assumptions about Search based on out-of-context, out-of-date, or incomplete information. We've shared extensive information about how Search works and the types of factors that our systems weigh, while also working to protect the integrity of our results from manipulation."
The funny thing about Google accidentally publishing API documents on GitHub is that, even though these are sensitive internal files, Google technically released them under an Apache 2.0 license. That means anyone who came across the documents was granted a "perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license," so copies now circulate freely online.
One of the leaked documents.
The leak consists of a pile of API documentation for Google's "ContentWarehouse," which sounds a lot like the search index. As you might expect, even this incomplete glimpse at how Google ranks web pages is impossibly complex. King writes that there are "2,596 modules represented in the API documentation with 14,014 attributes (features)." These are documents written by programmers for programmers, and they lean on a lot of background information you would probably only have if you worked on the search team. The SEO community is still poring over the documents and using them to build assumptions about how Google Search works.
Both Fishkin and King accuse Google of "lying" to SEO experts in the past. One of the documents' revelations is that the click-through rate of a search result affects its ranking, something Google has denied feeds into the results "stew" on several occasions. The click-tracking system is called "Navboost," meaning it boosts the websites users navigate to. Naturally, a lot of this click data comes from Chrome, even after you leave Search. For example, some results can show a small set of "sitemap" results below the main listing, and apparently part of what drives those is the most popular subpages, as determined by Chrome's click tracking.
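The leaked material describes attributes, not the code that scores them, so any formula is guesswork. Purely for illustration, a toy sketch of how a Navboost-style click signal might nudge a ranking looks something like the following; every name, weight, and formula here is hypothetical and not taken from the documents.

```python
# Hypothetical sketch of a click-based re-ranking signal.
# Nothing here comes from the leaked documents; it only illustrates
# the general idea of boosting results that users actually click on.

def navboost_style_score(base_score: float, impressions: int, good_clicks: int) -> float:
    """Nudge a base relevance score using observed click behavior."""
    if impressions == 0:
        return base_score
    click_rate = good_clicks / impressions
    # A small multiplicative boost: pages users consistently navigate to
    # end up ranked a bit higher than their base score alone would put them.
    return base_score * (1.0 + 0.2 * click_rate)

results = [
    {"url": "https://example.com/a", "base": 0.71, "impressions": 1000, "good_clicks": 420},
    {"url": "https://example.com/b", "base": 0.74, "impressions": 1000, "good_clicks": 90},
]
ranked = sorted(
    results,
    key=lambda r: navboost_style_score(r["base"], r["impressions"], r["good_clicks"]),
    reverse=True,
)
for r in ranked:
    print(r["url"], round(navboost_style_score(r["base"], r["impressions"], r["good_clicks"]), 3))
```

In this toy version, the heavily clicked page overtakes the one with the slightly better base score, which is the kind of behavior the "Navboost" name suggests.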
The documents also suggest that Google has whitelists that will artificially boost certain websites for certain topics. The two mentioned were “isElectionAuthority” and “isCovidLocalAuthority”.
Much of the documentation describes exactly how you'd expect a search engine to work. Sites have a "SiteAuthority" value that ranks well-known sites higher than less well-known ones. Authors have rankings of their own, too, but as with everything here, it's impossible to know how any one signal interacts with the rest.
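Again, the leak names attributes like "SiteAuthority" and the whitelist flags above without saying how they are combined. A minimal sketch, assuming a simple weighted sum plus a topic-specific whitelist bonus (all structure and weights invented for illustration), might look like this:

```python
# Hypothetical illustration only: the leak names attributes such as
# "SiteAuthority" and "isElectionAuthority" but not how they are combined.
from dataclasses import dataclass


@dataclass
class PageSignals:
    relevance: float              # query/document match, 0..1
    site_authority: float         # site-level reputation, 0..1
    is_election_authority: bool   # topic-specific whitelist flag


def combined_score(p: PageSignals, query_is_election_related: bool) -> float:
    # Invented weighting: mostly relevance, partly site-level authority.
    score = 0.7 * p.relevance + 0.3 * p.site_authority
    # A whitelist flag might act as an extra boost only for matching topics.
    if query_is_election_related and p.is_election_authority:
        score += 0.15
    return score


print(combined_score(PageSignals(0.8, 0.6, True), query_is_election_related=True))
```

The real system almost certainly does nothing this simple; the point is only that the documented attributes look like inputs to some larger, unseen scoring pipeline.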
Both of our SEO experts' write-ups read as if they're offended that Google would ever deceive them, but shouldn't the company have at least a mildly adversarial relationship with people who try to manipulate its search results? A recent study found that "search engines seem to lose the cat-and-mouse game that is SEO spam" and observed "an inverse relationship between a page's optimization level and its perceived expertise, indicating that SEO may hurt at least subjective page quality." None of this additional documentation is likely to be good for users or for the quality of Google's results. For example, now that people know click-through rate affects search ranking, couldn't you boost a website's listing with a click farm?