John Mueller of Google responded to a question on LinkedIn about an unsupported noindex directive in his own personal website’s robots.txt. He explained the pros and cons of supporting the directive in robots.txt and offered insight into Google’s internal discussions about it.
Robots.txt by John Mueller
Mueller’s robots.txt has been a topic of conversation for the past week because of the strange, non-standard directives he used within it.
It was almost inevitable that Mueller’s robots.txt would be scrutinized and go viral in the search marketing community.
Noindex directive
Everything in a robots.txt is called a directive. A directive is a request that a web crawler is expected to honor (assuming it obeys robots.txt directives at all).
There are standards for how to write a robots.txt directive, and anything that doesn’t conform to those standards is likely to be ignored. A non-standard directive in Mueller’s robots.txt caught the attention of someone who decided to post a question about it to John Mueller via LinkedIn, to find out if Google supported the non-standard directive.
It’s a good question because it’s easy to assume that if a Googler is using it, maybe Google supports it.
The non-standard directive was noindex. Noindex is part of the robots meta standard, but not the robots.txt standard. Mueller didn’t have just one instance of the noindex directive; he had 5,506 of them.
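For illustration, a robots.txt that mixes standard rules with the kind of non-standard noindex line in question might look something like this (the paths are hypothetical examples, not taken from Mueller’s actual file):

User-agent: *
Disallow: /private/
noindex: /example-page/

The first two lines follow the robots.txt standard; the noindex line does not, so a crawler that strictly follows the standard would simply ignore it.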
Mahek Giri, the SEO specialist who asked the question, wrote:
“In John Mueller’s robots.txt file,
there is an unusual command:
“noindex:”
This command is not part of the standard robots.txt format,
So do you think it will have any impact on how the search engine indexes your pages?
Curious to know about noindex: in robots.txt”
Why the Noindex directive in Robots.txt is not supported by Google
Google’s John Mueller replied that the directive is not supported:
“This is an unsupported directive, it does nothing.”
Mueller then explained that Google had at one point considered supporting a noindex directive in robots.txt because it would give publishers a way to block Google from both crawling and indexing content at the same time.
Right now it is possible to block crawling in robots.txt or block indexing with the robots noindex meta directive. But you cannot block indexing with the meta robots directive and block crawling in robots.txt at the same time because a crawl block will prevent the crawler from “seeing” the meta robots directive.
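As a hypothetical illustration (the URL path is made up), blocking crawling is done in robots.txt, while blocking indexing is done in the page’s own HTML:

In robots.txt (blocks crawling):
User-agent: *
Disallow: /example-page/

In the page’s HTML head (blocks indexing):
<meta name="robots" content="noindex">

With the Disallow rule in place, Googlebot never fetches /example-page/, so it never sees the noindex meta tag, which is why the two cannot be combined for the same URL.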
Mueller explained why Google decided not to move forward with the idea of honoring the noindex directive within robots.txt.
He wrote:
“There was a lot of discussion about whether it should be supported as part of the robots.txt standard. The thinking behind it was that it would be nice to block crawling and indexing at the same time. With robots.txt, you can block crawling or you can block indexing (with a robots meta tag, if you allow crawling.) The idea was that you could also have a “noindex” in robots.txt and block both.
Unfortunately, since many people copy and paste robots.txt files without looking at them in detail (few people look as far as you do!), it would be very, very easy for someone to accidentally remove critical parts of a website. And so it was decided that this should not be a supported directive, or part of the robots.txt standard…probably over 10 years ago at this point.”
Why was this Noindex in Mueller’s Robots.txt?
Mueller made it clear that Google is unlikely to ever support a noindex directive in robots.txt, and that this decision was made about ten years ago. The revelation about those internal discussions is interesting, but it also deepens the sense of strangeness about Mueller’s robots.txt.
See also: 8 common Robots.txt problems and how to fix them
Featured image by Shutterstock/Kues