
> They’ve never claimed to index every word on every page.

Not in those words, but they do claim to aspire to “Organize the world’s information and make it universally accessible and useful,”[1] which ought to include old web pages. They've gone to the effort of finding out-of-print books and digitizing them to make them searchable, so indexing a ten-year-old web page shouldn't be such a stretch.

[1] https://www.google.com/intl/en/about/our-company/



You'd think it would at least come up in the Internet Archive, if not anywhere else.



That's unfortunate. But understandable in a way.

    # robots.txt web.archive.org 2013-10-02

    User-agent: *
    Disallow: /

    User-agent: ia_archiver
    Allow: /
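
For anyone wondering how a crawler would read those rules, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The rules are copied from the file quoted above; the example URL and the `Googlebot` agent string are illustrative assumptions, and real crawlers may interpret robots.txt somewhat differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Rules as quoted above (web.archive.org, 2013-10-02).
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: ia_archiver
Allow: /
"""


def can_fetch(agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the rules above."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Hypothetical URL, used only to exercise the rules.
    url = "https://web.archive.org/some/page"
    for agent in ("ia_archiver", "Googlebot"):
        print(agent, can_fetch(agent, url))
```

Under these rules only `ia_archiver` is allowed in; any other crawler falls through to the `*` entry and is disallowed everywhere, which would explain why archived copies don't show up in general search results.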


Touché. I don't suppose the old non-commercial websites mentioned in the article suffer from the same problem, though, right? Maybe a robots.txt file was accidentally left around?



