Google’s John Mueller answered whether removing pages from a site helps solve the problem of pages that Google discovers but doesn’t crawl. John provided general ideas on how to solve this problem.
Discovered – currently not indexed
Search Console is a service provided by Google that communicates search-related issues and feedback.
Indexing status is an important part of Search Console because it tells the publisher how much of a site is indexed and eligible for ranking.
The indexing status of web pages can be found in the Page Indexing report in Search Console.
A report that Google has discovered a page but not indexed it is often a sign that a problem needs to be addressed.
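Publishers who want to check this status programmatically can also use Search Console’s URL Inspection API. Below is a minimal Python sketch, assuming you already have an OAuth 2.0 access token with the Search Console scope; the access token and the example.com URLs are placeholders.

```python
# Minimal sketch: query Google's URL Inspection API for a page's index status.
# Assumes an OAuth 2.0 access token with the "webmasters.readonly" (or
# "webmasters") scope; ACCESS_TOKEN and the URLs below are placeholders.
import requests

ACCESS_TOKEN = "ya29.your-oauth-token"           # placeholder
SITE_URL = "https://www.example.com/"            # property as verified in Search Console
PAGE_URL = "https://www.example.com/some-page/"  # page to inspect

resp = requests.post(
    "https://searchconsole.googleapis.com/v1/urlInspection/index:inspect",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    json={"inspectionUrl": PAGE_URL, "siteUrl": SITE_URL},
    timeout=30,
)
resp.raise_for_status()

result = resp.json()["inspectionResult"]["indexStatusResult"]
# coverageState mirrors the Page Indexing report, e.g.
# "Discovered - currently not indexed" or "Submitted and indexed".
print(result.get("coverageState"), "| last crawl:", result.get("lastCrawlTime", "none"))
```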
There are several reasons why Google may discover a page but refuse to index it, even though Google’s official documentation lists only one reason.
“Discovered – currently not indexed
The page was found by Google, but not crawled yet.
Typically, Google wanted to crawl the URL, but this was expected to overload the site; therefore, Google rescheduled the crawl.
This is why the last crawl date is empty in the report.”
Google’s John Mueller offers more reasons why a page would be discovered but not indexed.
Removing unindexed pages to improve site-wide indexing?
There is the idea that removing certain pages will help Google crawl the rest of the site by giving it fewer pages to crawl.
There is a perception that Google has a limited crawl capacity (crawl budget) allocated to each site.
Googlers have repeatedly said that there is no such thing as a crawl budget as perceived by SEOs.
Google has a number of considerations for how many pages to crawl, including the ability of the website’s server to handle an extensive crawl.
An underlying reason why Google is picky about how much it crawls is that Google doesn’t have enough capacity to store every web page on the Internet.
This is why Google tends to index pages that have some value (if the server can handle it) and not index other pages.
To learn more about crawl budget, read: Google shares information about crawl budget
This is the question that was asked:
“Would de-indexing and aggregating 8 million used products into 2 million unique indexable product pages help improve crawlability and indexability (discovered, currently unindexed issue)?”
Google’s John Mueller first acknowledged that it was not possible to address the person’s specific problem and then offered general recommendations.
He answered:
“It is impossible to say.
I recommend checking out the large site guide to crawl budget in our documentation.
For large sites, more crawling is sometimes limited by how well your website can handle more crawling.
In most cases, though, it’s more about the overall quality of the website.
Are you significantly improving the overall quality of your website by going from 8 million pages to 2 million pages?
Unless you’re focused on improving actual quality, it’s easy to spend a lot of time reducing the number of indexable pages, but not actually improving the website, and that wouldn’t improve things for search.”
Mueller offers two reasons for the discovered-not-indexed problem
Google’s John Mueller offered two reasons why Google might discover a page but refuse to index it.
1. Server capacity
2. Overall website quality
1. Server capacity
Mueller said Google’s ability to crawl and index web pages may be “limited by how well your website can handle more crawling.”
The bigger a website is, the more bots are needed to crawl it. The problem is that Google isn’t the only bot crawling a big site.
There are other legitimate bots, for example from Microsoft and Apple, that are also trying to crawl the site. In addition, there are many other bots, some legitimate and some related to hacking and data scraping.
This means that for a large site, especially in the evening hours, there can be thousands of bots using server resources to crawl it.
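One way to gauge how much of that load comes from bots is to tally requests by user agent in the server’s access log. The following is a rough Python sketch; the log path and the user-agent substrings are assumptions, and any request claiming to be Googlebot should ideally be verified (for example via reverse DNS), since scrapers often spoof that user agent.

```python
# Rough sketch: tally requests per crawler from a combined-format access log,
# to see how much of the server's load is bot traffic. The log path and the
# user-agent substrings are assumptions - adjust them to your own setup.
# Note: user agents can be spoofed; verify "Googlebot" hits via reverse DNS.
from collections import Counter

BOT_SIGNATURES = ["Googlebot", "bingbot", "Applebot", "AhrefsBot", "SemrushBot"]
counts = Counter()

with open("/var/log/nginx/access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        for bot in BOT_SIGNATURES:
            if bot in line:
                counts[bot] += 1
                break
        else:
            counts["other / human"] += 1

for agent, hits in counts.most_common():
    print(f"{agent:15} {hits}")
```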
That’s why one of the first questions I ask a publisher with indexing problems is the status of their server.
Typically, a website with millions of pages, or even hundreds of thousands of pages, will need a dedicated server or a cloud host (because cloud servers offer scalable resources such as bandwidth, CPU, and RAM).
Sometimes a hosting environment needs more memory allocated to a process (for example, by raising the PHP memory limit) to help the server cope with high traffic and avoid 500 error responses.
Troubleshooting servers involves analyzing a server error log.
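As a concrete example of that kind of analysis, the sketch below counts 5xx responses per day in a combined-format access log. The log path and the regular expressions are assumptions about the log format and would need to be adapted to the actual server configuration.

```python
# Minimal sketch: count 5xx responses per day from a combined-format access log
# to spot times when the server buckled under crawl or traffic load.
# The log path and the log format assumptions below must match your server.
import re
from collections import Counter

STATUS_RE = re.compile(r'" (5\d{2}) ')          # e.g. '" 500 ', '" 503 ' after the request line
DATE_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4})")  # e.g. [21/Oct/2023

errors_per_day = Counter()
with open("/var/log/nginx/access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        status = STATUS_RE.search(line)
        date = DATE_RE.search(line)
        if status and date:
            errors_per_day[(date.group(1), status.group(1))] += 1

for (day, status), count in sorted(errors_per_day.items()):
    print(f"{day}  {status}: {count}")
```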
2. Overall quality of the website
This is an interesting reason for not indexing enough pages. Overall site quality is like a score or determination that Google assigns to a website.
Parts of a website can affect the overall quality of the site
John Mueller has said that one section of a website can affect the overall determination of the site’s quality.
Mueller said:
“…for some things, we look at the overall quality of the site.
And when we look at the quality of the site overall, if you have significant portions that are lower quality, we don’t care why they would be lower quality.
… if we see that there are significant parts that are of lower quality, we might think that overall this website is not as great as we thought.”
Definition of site quality
Google’s John Mueller offered a definition of site quality in another Office Hours video:
“When it comes to content quality, we don’t just mean the text of your articles.
It’s really the quality of your website overall.
And that includes everything from the layout to the design.
For example, how you have things presented on your pages, how you integrate images, how you work with speed, all these factors come into play.”
How long it takes to determine the overall quality of the site
Another fact about site quality is how long it takes Google to determine it, which can be months.
Mueller said:
“It takes a long time for us to understand how a website fits into the rest of the Internet.
… And this is something that can easily take, I don’t know, a couple of months, half a year, sometimes even more than half a year…”
Optimizing a site for crawling and indexing
Optimizing an entire site or a section of a site is kind of a general, high-level way of looking at the problem. It often comes down to optimizing individual pages at scale.
Particularly for e-commerce sites with millions of products, optimization can take many forms.
Things to consider:
Main menu
Make sure the main menu is optimized to take users to the important sections of the site that most users are interested in. The main menu can also link to the most popular pages.
Link to popular sections and pages
The most popular pages and sections can also be linked from a featured section on the home page.
This helps users get to the pages and sections that matter most to them, but also tells Google that these are important pages that should be indexed.
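A quick way to audit this is to list the internal links that the home page actually exposes and compare them against the sections that matter most. Here is an illustrative sketch using only the Python standard library; the example.com URL is a placeholder.

```python
# Illustrative sketch: list the internal links on a home page to audit whether
# the most important sections and pages are actually linked from it.
# The example.com URL is a placeholder.
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

HOME = "https://www.example.com/"

class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.add(urljoin(HOME, href))

with urlopen(HOME) as response:
    html = response.read().decode("utf-8", errors="replace")

parser = LinkCollector()
parser.feed(html)

home_host = urlparse(HOME).netloc
internal = sorted(l for l in parser.links if urlparse(l).netloc == home_host)
for link in internal:
    print(link)
```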
Improve thin content pages
Thin content is basically pages with little useful content or pages that are mostly duplicates of other pages (template content).
It is not enough to fill the pages with words. Words and phrases must have meaning and relevance to site visitors.
For products, this can include measurements, weight, available colors, suggestions for other products to pair with it, brands the product works best with, links to manuals, FAQs, ratings, and other information that users will find valuable.
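One crude but practical way to find thin pages at scale is to flag pages whose visible text falls below a word-count threshold and queue them for enrichment. The sketch below is purely illustrative; the urls.txt input file and the 150-word threshold are assumptions, not a rule from Google.

```python
# Hypothetical sketch: flag "thin" product pages by visible word count so they
# can be prioritized for enrichment (specs, FAQs, pairing suggestions, etc.).
# The urls.txt file, the 150-word threshold, and the crude tag stripping are
# all illustrative assumptions.
import re
from urllib.request import urlopen

THRESHOLD = 150  # arbitrary cut-off for this example

def visible_word_count(html: str) -> int:
    # Crude text extraction: drop scripts/styles, then strip remaining tags.
    html = re.sub(r"(?is)<(script|style).*?</\1>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", html)
    return len(text.split())

with open("urls.txt", encoding="utf-8") as f:
    for url in (line.strip() for line in f if line.strip()):
        with urlopen(url) as resp:
            words = visible_word_count(resp.read().decode("utf-8", errors="replace"))
        if words < THRESHOLD:
            print(f"THIN  ({words:4d} words)  {url}")
```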
Solving discovered but not indexed for more online sales
In a physical store, it might seem enough to simply put the products on the shelves.
But the reality is that it often takes savvy marketers to get those products off those shelves.
A web page can play the role of an expert marketer who can communicate to Google why the page should be indexed and help customers choose those products.
See the Google SEO Office Hours video at the 13:41 mark: