A few SEO practitioners spotted on GitHub a large cache of documents describing how Google ranks its search results. They went to town claiming that the documents proved what Google had long denied about its ranking processes. Google initially remained silent on the issue but has now confirmed that the leaked documents are authentic.
The documents, which may have been inadvertently committed to GitHub in mid-March by Google’s own automated tools, contain data that Google tracks and possibly uses in the company’s secret-sauce ranking algorithm. They give users a peek under the hood of one of the most important systems shaping the Internet.
A report in The Verge quoted Google spokesperson Davis Thompson as saying that the company would caution against making inaccurate assumptions about Search “based on information that is out of context, out of date, or incomplete.” “We have shared extensive information about how Search works and the types of factors that our systems weigh, while working to protect the integrity of our results from manipulation,” the spokesperson said.
What was leaked and how was it leaked?
The leaked material was first found and described by search engine optimization experts Rand Fishkin (of SparkToro) and Mike King (of iPullRank), who published their analyses of the documents and their contents a couple of days ago. Reports also suggest that the material was first discovered by another SEO specialist, Erfan Azimi of EA Digital Eagle.
The researchers noted that the error occurred on Google’s end through an automated process on March 13, when the automation attached an Apache 2.0 open-source license to the commit, a standard practice for Google’s public documentation. Reports also noted that a follow-up commit on May 7 sought to undo the earlier one.
The leaked documents describe an earlier version of Google’s Content Warehouse API and provide insight into how search rankings are produced. They do not contain any code or technical material. Instead, they contain references to internal systems and projects, and appear to be internal documentation of the processes involved.
It should be noted here that Google has already put a Google Cloud API document with a similar name in the public domain, but the GitHub version seems to go much further. It contains references to what Google considers important when ranking web pages for relevance, something the SEO community is now salivating over.
There is still much to speculate about
There are over 2,500 pages of documentation (you can consult them here) containing more than 14,000 attributes accessible through or associated with the API. Of course, we can only speculate about how many of these signals Google actually uses, as there is no information on how much weight, if any, Google gives them in its ranking algorithm.
SEO consultants believe that the documents contain enough detail and mark a significant departure from the public documentation Google has released from time to time. In his post, Fishkin notes that the leak contradicts Googlers’ public statements over the years, “in particular the company’s repeated denial that click-centric user signals are used or that subdomains are considered separately in the rankings”.
But there are also things that bring clarity
In his post, King points to a statement from Google search advocate John Mueller, who notes in a video that the company does not have anything like a website authority score that measures whether Google considers a site authoritative and therefore deserving of higher search rankings.
However, the documents that appeared on GitHub reveal that, among the compressed quality signals Google stores for documents, there is a “siteAuthority” score that can be calculated. Another attribute relates to the importance of click-through rates as a ranking factor in web search, while yet another uses site views in Chrome as a quality signal; this appears in the leaked API as “ChromeInTotal”.
There are also references in the docs that confirm what we have known for some time: that Google takes into account factors such as content freshness, authorship, whether a page is related to the central focus of a site, the alignment between the page title and content, and the average weighted font size of terms in a document’s body.
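To make the attributes described above more concrete, here is a purely illustrative sketch in Python of how such site-level signals might be modeled. Only the names “siteAuthority” and “ChromeInTotal” echo attribute names seen in the leaked documentation; the types, ranges, weights, and the toy scoring function below are invented assumptions, not Google’s actual schema or algorithm.

```python
from dataclasses import dataclass


@dataclass
class CompressedQualitySignals:
    """Hypothetical illustration of site-level quality attributes.

    Only 'site_authority' and 'chrome_in_total' echo names from the leaked
    Content Warehouse documentation; types and ranges are assumptions.
    """
    site_authority: float      # assumed 0.0-1.0 site-level authority score
    chrome_in_total: int       # assumed count of site views observed in Chrome
    click_through_rate: float  # assumed fraction of impressions that were clicked


def toy_quality_estimate(s: CompressedQualitySignals) -> float:
    """A made-up combination of the signals above.

    Google publishes no weights, so this arithmetic is invented purely to show
    how such attributes *could* feed a ranking feature, not how they do.
    """
    popularity = min(s.chrome_in_total / 1_000_000, 1.0)  # cap a raw count at 1.0
    return 0.5 * s.site_authority + 0.3 * s.click_through_rate + 0.2 * popularity


if __name__ == "__main__":
    example = CompressedQualitySignals(
        site_authority=0.72, chrome_in_total=250_000, click_through_rate=0.18
    )
    print(f"toy quality estimate: {toy_quality_estimate(example):.3f}")
```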
How much can we believe?
Even if the interpretation of individual details can be questioned, there is no doubt that the cache is a treasure trove for the SEO, marketing and publishing industries, given the secrecy Google has maintained around its algorithms. Taken together with Google’s testimony in the US Department of Justice antitrust case, these leaks are significant.
The choices Google makes in search have a significant impact on how the Internet works for people and businesses of all stripes, not to mention the growing class of experts who claim to find ways to beat Google’s algorithm through SEO efforts. That Google has always been vague about its processes only adds to the value of these leaked documents.
Moreover, Google’s guarded response in confirming the leak further underscores that the industry is probably on to something here. How the company responds to this in the future is something we will have to wait and see.