Google's crawler documentation has a new IP list

Google has updated its Googlebot and crawler documentation to add a range of IPs for bots triggered by users of Google products. The list names have been switched, which matters for publishers who whitelist Google-controlled IP addresses. The change will be useful for publishers who want to block scrapers that use Google Cloud and other crawlers not directly associated with Google itself.

New list of IP addresses

Google says that the list contains IP ranges that have been in use for a long time, so they are not new IP address ranges.

There are two types of IP address ranges:

IP ranges that are initiated by users but controlled by Google and resolve to a google.com hostname.
These are tools like Google Site Verifier and presumably the Rich Results Tester.

IP ranges that are initiated by users but not controlled by Google and resolve to a gae.googleusercontent.com hostname.
These are apps hosted on Google Cloud or Apps Script scripts that are called from Google Sheets.

The lists corresponding to each category are now different.

Previously, the list corresponding to Google IP addresses was this: special-crawlers.json (resolves to gae.googleusercontent.com)

Now the “special crawlers” list corresponds to crawlers that are not controlled by Google.

“IPs in the user-triggered-fetchers.json object resolve to gae.googleusercontent.com hostnames. These IPs are used, for example, if a site running on Google Cloud (GCP) has a feature that requires fetching external RSS feeds on the request of the user of that site.”

The new list corresponding to crawlers controlled by Google is:

user-triggered-fetchers-google.json

“Tools and product functions where the end user triggers a fetch. For example, Google Site Verifier acts on the request of a user. Because the fetch was requested by a user, these fetchers ignore robots.txt rules.

Google-controlled fetchers originate from IPs in the user-triggered-fetchers-google.json object and resolve to a google.com hostname.”
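For publishers who allowlist or block by IP, a check against one of these lists could look roughly like the sketch below. It assumes the JSON object has a top-level `prefixes` array whose entries each carry an `ipv4Prefix` or `ipv6Prefix` CIDR string; that layout is an assumption, as it is not described in this article. The sample ranges are reserved documentation blocks rather than real Google ranges, and `load_networks` and `ip_in_list` are hypothetical helper names.

```python
import ipaddress
import json

# Placeholder data in the ASSUMED layout of the published JSON object.
# The CIDR blocks are reserved documentation ranges, not real Google ranges.
SAMPLE_LIST = """
{
  "creationTime": "2023-01-01T00:00:00.000000",
  "prefixes": [
    {"ipv4Prefix": "192.0.2.0/24"},
    {"ipv4Prefix": "198.51.100.0/28"},
    {"ipv6Prefix": "2001:db8::/64"}
  ]
}
"""


def load_networks(raw_json):
    """Parse the JSON text into a list of ip_network objects."""
    data = json.loads(raw_json)
    networks = []
    for entry in data.get("prefixes", []):
        # Each entry is assumed to hold exactly one of these two keys.
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            networks.append(ipaddress.ip_network(cidr))
    return networks


def ip_in_list(ip, networks):
    """Return True if the address falls inside any listed range."""
    address = ipaddress.ip_address(ip)
    # An IPv4 address tested against an IPv6 network simply yields False.
    return any(address in network for network in networks)


networks = load_networks(SAMPLE_LIST)
print(ip_in_list("192.0.2.77", networks))
print(ip_in_list("203.0.113.5", networks))
```

In practice the sample string would be replaced by the contents of user-triggered-fetchers.json or user-triggered-fetchers-google.json, depending on whether the goal is to block third-party fetchers or to allow Google-controlled ones.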

The list of Google Cloud and Apps Script crawler IPs that Google does not control is user-triggered-fetchers.json.

The list of Google IPs that are triggered by users and controlled by Google is user-triggered-fetchers-google.json.

New content section

There is a new content section that explains what the new list is all about.

“Fetchers controlled by Google originate from IPs in the user-triggered-fetchers-google.json object and resolve to a google.com hostname. IPs in the user-triggered-fetchers.json object resolve to gae.googleusercontent.com hostnames. These IPs are used, for example, if a site running on Google Cloud (GCP) has a feature that requires fetching external RSS feeds on the request of the user of that site.

***-***-***-***.gae.googleusercontent.com or google-proxy-***-***-***-***.google.com

user-triggered-fetchers.json and user-triggered-fetchers-google.json”
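The two reverse DNS masks quoted above can be turned into a simple hostname check. The sketch below assumes each `***` stands for one octet of the IP address written as one to three digits, which the excerpt does not spell out. `classify_fetcher_hostname` and its return values are hypothetical names chosen for illustration. A matching hostname alone is not proof of origin, because reverse DNS records are set by whoever controls the IP, so the result is best combined with a check against the corresponding IP list.

```python
import re

# ASSUMPTION: every *** in the quoted masks is a 1-3 digit octet.
OCTETS = r"\d{1,3}-\d{1,3}-\d{1,3}-\d{1,3}"

# Mask for fetchers that Google does not control (Google Cloud, Apps Script).
USER_CONTROLLED = re.compile(rf"^{OCTETS}\.gae\.googleusercontent\.com$")

# Mask for fetchers controlled by Google products.
GOOGLE_CONTROLLED = re.compile(rf"^google-proxy-{OCTETS}\.google\.com$")


def classify_fetcher_hostname(hostname):
    """Map a reverse DNS hostname to the IP list it should belong to."""
    # DNS names are case-insensitive and may carry a trailing dot.
    name = hostname.strip().rstrip(".").lower()
    if GOOGLE_CONTROLLED.match(name):
        return "user-triggered-fetchers-google.json"
    if USER_CONTROLLED.match(name):
        return "user-triggered-fetchers.json"
    return None


print(classify_fetcher_hostname("google-proxy-192-0-2-10.google.com"))
print(classify_fetcher_hostname("192-0-2-10.gae.googleusercontent.com"))
print(classify_fetcher_hostname("192-0-2-10.gae.googleusercontent.com.example.org"))
```

Anchoring both patterns at the start and end of the name keeps a lookalike host such as the third example from passing the check.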

Google changelog

Google's changelog explained the change like this:

“Exporting an additional range of Google fetcher IP addresses
What: Added an additional list of IP addresses for fetchers that are controlled by Google products, as opposed to, for example, a user-controlled Apps Script. The new list, user-triggered-fetchers-google.json, contains IP ranges that have been in use for a long time.

Why: It became technically possible to export the ranges.”

Read the updated documentation:
Verifying Googlebot and other Google crawlers

Read the old documentation:
Archive.org – Verifying Googlebot and other Google crawlers

Featured image by Shutterstock/JHVEPhoto
