the article is pretty much correct (although strangely worded at times), but the stuff about "communicating via robots.txt comments to google" is of course not true. the examples he gives are developer jokes, nothing more.
still, you should not use comments in the robots.txt, why?
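you can group user agents, i.e. put several User-agent lines (googlebot, bingbot, yandex) directly above a single "Disallow: /" rule.
congrats, you have just disallowed googlebot, bingbot and yandex from crawling (not indexing, just crawling).
ok, now comment out the bingbot line with a leading #.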
so well, you have definitely blocked yandex, and you do not care about bingbot (commented out), but what about googlebot? are googlebot and yandex part of one user-agent group? or is googlebot its own group and yandex its own group? if the commented line is interpreted as a blank line, then googlebot and yandex are different groups, and googlebot ends up in a group without any rules, so it would not be blocked. if it's interpreted as non-existent, they belong together and both are blocked.
the way i read the spec https://developers.google.com/webmasters/control-crawl-index..., this behaviour is undefined. (please correct me if i'm wrong)
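to make the two readings concrete, here is a minimal python sketch. it is not the parser of any real crawler: `blocked_agents`, the `comment_is_blank` switch and the rule that a blank line ends a group are all assumptions made up for this illustration, and the two robots.txt files are just the ones described above.

```python
GROUPED = """\
User-agent: googlebot
User-agent: bingbot
User-agent: yandex
Disallow: /
"""

# same file, but with the bingbot line commented out
COMMENTED = GROUPED.replace("User-agent: bingbot", "#User-agent: bingbot")


def blocked_agents(robots_txt: str, comment_is_blank: bool) -> set[str]:
    """Return the user agents that end up in a group with 'Disallow: /'.

    comment_is_blank=True  -> a comment-only line counts as a blank line
                              (assumed here to end the current group)
    comment_is_blank=False -> a comment-only line is treated as non-existent
    """
    groups = []            # finished groups as (agents, rules) pairs
    agents, rules = [], []

    def close_group():
        nonlocal agents, rules
        if agents and rules:   # a group without rules blocks nothing
            groups.append((agents, rules))
        agents, rules = [], []

    for raw in robots_txt.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            if not comment_is_blank:
                continue       # reading 2: the line simply is not there
            line = ""          # reading 1: the line becomes a blank line
        if not line:
            close_group()
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if rules:          # a user-agent after rules starts a new group
                close_group()
            agents.append(value.lower())
        elif field == "disallow":
            rules.append(value)
    close_group()
    return {a for group_agents, group_rules in groups
            for a in group_agents if "/" in group_rules}


print(sorted(blocked_agents(COMMENTED, comment_is_blank=True)))   # ['yandex']
print(sorted(blocked_agents(COMMENTED, comment_is_blank=False)))  # ['googlebot', 'yandex']
```

same file, two defensible readings, and googlebot is blocked in only one of them.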
simple solution: don't use comments in the robots.txt file.
also, please somebody fork and take over https://www.npmjs.org/package/robotstxt. it has this undefined behaviour, it does not follow HTTP 301 redirects (which was unspecified when i coded it), and it tries to do too much (fetching and analysing; it should only do one thing).
by the way, my recommendation is to always have an actual robots.txt file in place and return HTTP 200 for it.
why: if you do not have a file there, then at some point in the future you will suddenly return HTTP 500, or HTTP 200 with some other response, and that can be misleading. also, it's quite common that the staging robots.txt file spills over into the real world. this happens as soon as you forget that you have to care about your real robots.txt.
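if you want to keep an eye on this, a small check along these lines could run in monitoring. it's only a sketch: `robots_problems` is a made-up helper, it expects you to pass in the status code and body you fetched yourself, and the "Disallow: /" check is a crude heuristic for a leaked staging file (it will also fire if you block a single bot on purpose).

```python
def robots_problems(status: int, body: str) -> list[str]:
    """Return human-readable warnings for a fetched /robots.txt response."""
    problems = []
    if status != 200:
        problems.append(f"status is {status}, expected 200")
    if "<html" in body.lower():
        # e.g. an error page or the homepage served with HTTP 200
        problems.append("body looks like an HTML page, not a robots.txt file")
    for raw in body.splitlines():
        # drop trailing comments, then split "field: value"
        field, _, value = raw.split("#", 1)[0].partition(":")
        if field.strip().lower() == "disallow" and value.strip() == "/":
            problems.append("blanket 'Disallow: /' found, staging file in production?")
            break
    return problems


# an allow-everything file served with HTTP 200 raises no warnings
print(robots_problems(200, "User-agent: *\nDisallow:\n"))  # []
```

the allow-everything file in the usage line is just an assumed example of a harmless robots.txt, not necessarily the exact file recommended above.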
also read the spec https://developers.google.com/webmasters/control-crawl-index...