Google has released a new installment of its “How Search Works” educational video series, explaining how its search engine discovers and accesses web pages through crawling.
Google Engineer Details Crawling Process
In the seven-minute episode hosted by Google analyst Gary Illyes, the company provides an in-depth look at the technical aspects of how Googlebot, the software Google uses to crawl the web, works.
Illyes describes the steps Googlebot takes to find new and updated content on the billions of web pages on the Internet and make them searchable by Google.
Illyes explains:
“Most of the new URLs that Google discovers come from other known pages that Google previously crawled.
You can think of a news site with different category pages that then link to individual news articles.
Google can discover most published articles by revisiting the category page from time to time and extracting the URLs that lead to the articles.”
How Googlebot crawls the web
Googlebot starts by following links from known web pages to discover new URLs, a process called URL discovery.
Googlebot avoids overloading sites by crawling each one at a custom speed based on server response times and content quality.
Googlebot renders pages using a current version of the Chrome browser so that JavaScript runs and dynamically loaded content displays properly. It also crawls only publicly available pages, not those behind logins.
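To make the link-following step concrete, here is a minimal Python sketch of URL discovery: revisit a known page (such as a category page) and collect the URLs it links to. The https://www.example.com/news/ address and the LinkExtractor helper are hypothetical, and this is an illustrative simplification, not a description of Googlebot's actual implementation.

```python
# Illustrative sketch of URL discovery: fetch a known page and extract its links.
# The example URL is a placeholder; this is not how Googlebot itself is built.
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags found on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def discover_urls(known_page):
    """Fetch a known page and return the absolute URLs it links to."""
    html = urlopen(known_page).read().decode("utf-8", errors="replace")
    parser = LinkExtractor()
    parser.feed(html)
    return {urljoin(known_page, href) for href in parser.links}


if __name__ == "__main__":
    for url in sorted(discover_urls("https://www.example.com/news/")):
        print(url)
```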
Improving discovery and crawlability
Illyes highlighted the usefulness of sitemaps (XML files that list a site’s URLs) in helping Google find and crawl new content.
He advised developers to have their content management systems generate sitemaps automatically.
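As an illustration of what that automation produces, here is a minimal Python sketch, using only the standard library, of the kind of sitemap file a CMS might generate. The URLs are placeholders; real platforms typically create this file through a built-in feature or plugin.

```python
# Minimal sketch of generating a sitemap.xml with placeholder URLs.
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical list of URLs a CMS would pull from its own database.
urls = [
    "https://www.example.com/",
    "https://www.example.com/news/article-1",
]

urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
for loc in urls:
    url_el = ET.SubElement(urlset, "url")
    ET.SubElement(url_el, "loc").text = loc
    ET.SubElement(url_el, "lastmod").text = date.today().isoformat()

# Write the file Googlebot would later fetch (e.g. via Search Console or robots.txt).
ET.ElementTree(urlset).write("sitemap.xml", encoding="utf-8", xml_declaration=True)
```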
Optimizing technical SEO factors like site architecture, speed, and crawl directives can also improve crawlability.
Here are some additional tactics to make your site more crawlable:
Avoid depleting your crawl budget – Websites that update frequently can overwhelm Googlebot’s crawl budget, preventing new content from being discovered. Careful CMS settings and rel="next" / rel="prev" tags can help.
Implement good internal linking – Linking to new content on category and hub pages allows Googlebot to discover new URLs. An effective internal link structure makes it easy to crawl.
Make sure pages load quickly – Sites that respond slowly to Googlebot may have their crawl rate reduced. Optimizing pages for performance allows Googlebot to crawl them faster.
Eliminate soft 404 errors – Fixing soft 404s caused by incorrect CMS settings ensures that URLs lead to valid pages, improving crawl success (a quick check is sketched after this list).
Consider robots.txt tweaks – An overly tight robots.txt can block useful pages. An SEO audit can uncover restrictions that can be safely removed (see the robots.txt sketch after this list).
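As a rough illustration of the soft-404 item above, this Python sketch flags URLs that answer HTTP 200 but whose body reads like an error page. The example URL and the phrases it checks are assumptions; in practice, Search Console’s indexing reports are the authoritative source for soft 404s.

```python
# Sketch of a soft-404 check: a page that says "not found" but returns HTTP 200.
from urllib.error import HTTPError
from urllib.request import urlopen

# Hypothetical phrases an error template might contain.
NOT_FOUND_PHRASES = ("page not found", "no results found", "nothing here")


def looks_like_soft_404(url):
    """Return True when a URL answers HTTP 200 but its body reads like an error page."""
    try:
        with urlopen(url) as response:
            body = response.read().decode("utf-8", errors="replace").lower()
            return response.status == 200 and any(p in body for p in NOT_FOUND_PHRASES)
    except HTTPError:
        # A genuine 4xx/5xx status is not a soft 404; the server reports the error correctly.
        return False


print(looks_like_soft_404("https://www.example.com/removed-article"))
```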
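And for the robots.txt item, here is a minimal sketch using Python’s standard urllib.robotparser to confirm whether key URLs are crawlable for Googlebot. The site and URL list are hypothetical; an audit would substitute the pages that matter for your own site.

```python
# Sketch of a robots.txt audit: check whether important URLs are open to Googlebot.
from urllib.robotparser import RobotFileParser

robots = RobotFileParser()
robots.set_url("https://www.example.com/robots.txt")
robots.read()

# Hypothetical URLs you want Googlebot to reach.
important_urls = [
    "https://www.example.com/news/article-1",
    "https://www.example.com/category/news/",
]

for url in important_urls:
    allowed = robots.can_fetch("Googlebot", url)
    print(f"{url} -> {'crawlable' if allowed else 'blocked by robots.txt'}")
```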
The latest in a series of educational videos
The latest video comes after Google launched the “How Search Works” educational series last week to shed light on the search and indexing processes.
The recently released episode on crawling provides insight into one of the most fundamental operations of the search engine.
In the coming months, Google will produce additional episodes that explore topics such as indexing, quality assessment, and search refinement.
The series is available on the Google Search Central YouTube channel.
Frequently asked questions
What is the crawling process described by Google?
Google’s crawling process, as described in its recent episode of the “How Search Works” series, includes the following key steps:
Googlebot discovers new URLs by following links from known pages it has previously crawled.
It crawls each site at a custom speed to avoid overloading servers, taking response times and content quality into account.
The crawler renders pages with the latest version of Chrome to correctly display content loaded by JavaScript, and it accesses only publicly available pages.
Optimizing technical SEO factors and using sitemaps can make it easier for Google to crawl new content.
How can marketers ensure their content is effectively discovered and crawled by Googlebot?
Marketers can adopt the following strategies to improve the discoverability and crawlability of their content for Googlebot:
Have your content management system generate sitemaps automatically.
Optimize technical SEO elements such as site architecture and load speed, and use crawl directives appropriately.
Make sure frequent content updates don’t drain your crawl budget by configuring your CMS efficiently and using pagination tags.
Create an effective internal linking structure that helps Googlebot discover new URLs.
Check your website’s robots.txt file to make sure it isn’t overly restrictive for Googlebot.