I had a client once who had something similar, although unintentionally. She approached me because her website "kept getting hacked" and she didn't trust the original developers to solve the security problems... And rightly so!
There were two factors that, together, made this happen. First, the admin login form was implemented in JS, and if you visited it with JS disabled, it wouldn't verify your credentials at all; it also submitted via a GET request. Second, once you were in the admin interface, you could delete content from the site by clicking an X in the CMS, which, as was the pattern, presented you with a JS alert() prompt before deleting the content... via a GET request.
Looking at the server logs around the time it got "hacked", you could see Googlebot happily following all the delete links in the admin interface.
> I had a client once who had something similar, although unintentionally.
I did that too. I was aware of the problem, but at the time (1996) I did not know how to fix it.
So I just documented it and warned that they should keep the site away from AltaVista.
This was back before cookies had wide support, so login state was in the URL. If a search spider ever learned that URL, it would have deleted the entire site just by spidering it.
I did eventually fix it by switching to forms, and strengthening the URL token to expire if unused for a while. And then eventually switching to cookies (at one point it supported both url tokens and cookies).
I have not thought about those days in such a long time.
Obviously that is the solution. I know that now, I didn't then. (As I wrote: "I did eventually fix it by switching to forms.")
The POST-vs-GET distinction that everyone knows today, read-only versus write, was not that well known back then.
Back then you used GET for things with a small number of variables, and POST when you expected enough data that it wouldn't fit in the URL. It was all about the URL, not about the effect of the request.
I guess there was no Wikipedia article on HTTP back then; Wikipedia has since been an invaluable resource for me in understanding some of the intricacies in my work.
This site ran on IIS 1.0 on Windows NT 3.51. For scripting we used a prerelease ColdFusion version (i.e., the version before 1.0, which was released as we were developing the site, partially based on feedback we provided as we tested it).
> How did you prevent any visitor from deleting the site?
A secret security token in the URL. The worry was that some admin would try to submit the site to AltaVista for indexing without removing the token from the URL first.
Probably not. This happens more than you might think. I got called in to consult on a project where something similar was happening. Client would add products to their web store and the next day the products were missing.
Unsecured access and 'GET' based deletes were everywhere.
I accidentally deleted about half of the database at a startup where I’d recently started working by approximately the same method. I was running a copy of the web interface on my laptop, connecting over the internet to our MySQL server, and also running ht://dig’s spider on localhost from cron. It started spidering the delete links. Fortunately, I’d also started running daily MySQL backups from cron (there were no backups before I started working there), so we only lost a few hours of everyone’s work. As you can imagine, though, they weren’t super happy with me that day.
Fundamentally, authentication when someone tries to delete a thing needs to happen in server-side logic, not on the client side. The rest is flavouring.
Authentication should happen server side, but it need not happen at the moment of the delete. At delete time you should be authorizing and validating; doing that client side as well is fine for responsiveness, but if you are doing something server side (e.g., a delete), you must also authorize and validate it server side.
Basically do the exact opposite of everything they did. Doing authentication in client-side JavaScript is an absolute no-no. Using GET requests for things that have side effects (like deleting content) is another.
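To make that concrete, here's a minimal sketch of a delete endpoint done the opposite way. Flask is my choice of framework here; delete_item() and the session key are illustrative stand-ins, not anything from the original story:

```python
# A delete that only accepts POST, with the auth check done server
# side on every request.
from flask import Flask, session, abort

app = Flask(__name__)
app.secret_key = 'change-me'  # required for signed session cookies

def delete_item(item_id):
    """Stand-in for the real data layer."""
    pass

@app.route('/admin/delete/<int:item_id>', methods=['POST'])
def delete(item_id):
    # GET is rejected automatically with a 405, so a link-following
    # crawler can never trigger this, and the session check means a
    # JS-disabled visitor can't waltz in either.
    if not session.get('is_admin'):
        abort(403)
    delete_item(item_id)
    return 'deleted'
```

A real app would add a CSRF token on top of this, since POST alone doesn't stop a malicious page from forging the request, but at least the Googlebot failure mode is gone.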
I agree that there are lots of reasons that someone would make a site like this, but I think people are curious as to the maker's specific reason. From the github:
Why would you do such a thing? My full explanation was in the content of the site. (edit: ...which is now gone)
I'm curious as to what the website said originally.
I think the word "hacker" has more and more lost its original meaning, at least in this community. If I were reading a similar story on a Tor hidden service, say, I would not be asking why, but here I do.
It's totally twisted, because "business types" got involved and everyone started confusing "hacking" with "working". This story is a cool but simple hack, in the original meaning of the word. I find that a good definition of a hack is "a project or a trick that you can tell to your tech friends over a beer and have some good laugh from it".
An alternative would be to check the user agent, delete the website right at that point, and return a 404 page to the Google crawler. Then Google won't have a static copy of the website.
Your approach is "a website that irrevocably deletes itself once indexed by Google".
What OP has done is "a website that irrevocably deletes itself once Google decided to publicly reveal the fact that it indexed said website".
OP's approach has no way of knowing when the site was indexed. It's conceivable that Google indexed it on the very first day and decided not to share it publicly until 21 days later.
If you really want to get "technical", the former is when the site is "crawled" and the latter is when it's "served". "Indexing" happens in between the two.
Even if a request that claims to be from Googlebot is actually from Googlebot (which it might not be), that doesn't guarantee the site is indexed. It's impossible to know when the site is indexed without direct access to Google's index.
Actually, you could do a reverse DNS lookup on the IP of any request claiming to be Googlebot, followed by a forward DNS lookup on the hostname you get back. Legitimate Googlebots will be in the *.googlebot.com space.
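That round trip is straightforward to script; here's a minimal Python sketch (the function name and the simplistic error handling are mine):

```python
# Reverse-then-forward DNS check for a claimed Googlebot IP.
import socket

def is_real_googlebot(ip):
    try:
        hostname = socket.gethostbyaddr(ip)[0]  # reverse lookup
    except socket.herror:
        return False
    # Google also publishes *.google.com names for some fetchers
    if not hostname.endswith(('.googlebot.com', '.google.com')):
        return False
    try:
        # forward lookup must round-trip to the original IP
        return ip in socket.gethostbyname_ex(hostname)[2]
    except socket.gaierror:
        return False
```

You'd want to cache the verdict per IP rather than do two DNS lookups on every request.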
That meta tag prevents Google from publicly showing their cached version of the page. In practice this means the "Cached" link, within the results, doesn't appear when a given page asks Google to NOARCHIVE -- which I believe can be 'asked for' via either the meta tag or via a special response header.
Edit:
Yeah, 'noarchive' can be specified via the meta tag or via the X-Robots-Tag HTTP header. Also available to you are a handful of other directives such as NoIndex, NoFollow, NoArchive, NoSnippet, NoTranslate, etc...
See these links for more in-depth info about the directives & which search engines support what:
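To make the two forms concrete, here's a minimal sketch using Python's built-in http.server (the page content and port are placeholders):

```python
# Serving the same 'noindex, noarchive' directives both ways: as a
# <meta> tag in the HTML and as an X-Robots-Tag response header (the
# header form also works for non-HTML resources like PDFs and images).
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"""<!doctype html>
<html><head>
  <meta name="robots" content="noindex, noarchive">
</head><body>Nothing to see here.</body></html>"""

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-Type', 'text/html')
        self.send_header('X-Robots-Tag', 'noindex, noarchive')
        self.end_headers()
        self.wfile.write(PAGE)

HTTPServer(('', 8000), Handler).serve_forever()
```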
What about the opposite? A website that is created when it is indexed? It starts with nothing, and content is added each time the site is visited by Googlebot, shared on Facebook, tweeted, posted on Reddit, etc. The website exists only so that it can be shared, and the act of sharing it defines what the website is.
This is an uber cool idea. Especially if, when this website is shared by someone, it would attempt to scan the sharer's public feed, last submissions, last comments, last tweets, etc. (depending on where it got shared), and generate additional content based on what it found.
Postmodernism is a lot more relevant to the digital age than anything, imo. It emphasizes pointing out ways of thinking and doing, which I think is especially relevant when we are actually automating most of our ways of thinking and doing.
I know it gets a bad rap because of the ridiculous examples, but the real point of it is to draw the viewer into a serious kind of contemplation of the massive infrastructure that exists and of how that infrastructure shapes our culture, thoughts, understanding, and action.
We have the expectation that the generations to come will accept this infrastructure and what it says about how the human mind functions. But much of it is founded on belief systems of how thought and action operate in the real world. Most of these systems are baseless, the idea of a base obfuscated only by the sheer complexity involved in understanding each layer.
I really look forward to when we, as academics, historically document and seriously examine the various phases of the internet, from a variety of alternative perspectives.
It's interesting while it's being built, but it's also interesting to look back and reflect on the bigger picture, outside of the buzzwords and technical terminology used to pull the creation through, and make it actualized.
I look forward to when critics and theorists start thinking about the goal of the internet from a social perspective, as a collective cultural subconscious directive. I look forward to the kinds of art-historical methodology used to explain the significance of Picasso or Manet in their respective periods being applied to reason about the relation between the internet and everything that is not the internet.
It's interesting when some information gets washed away and other information is retained through time, and it isn't always the stuff that is indexed that is retained. The idea that art critics can even agree to call the same collection of works "cubism" or "impressionism" fascinates me, and I look forward to the same kinds of invented vocabularies being used to describe various processes, movements, and patterns throughout internet culture (way beyond studying memes and tropes - there are so many layers to the collective psyche of the internet, it is dumbfounding).
I don't know what GeoCities represents. I'd have to define its 'kind' and compare and contrast it with other 'kinds' throughout time. I know this was meant to be a humorous comment, but I love to weave theories, and some of them even turn out to be descriptive of the nature of things.
The reason laconic wit is normally frowned upon is that it's almost disruptively lazy. In the event that almost everyone agrees with you, that's okay.
But should anyone disagree with you, they now have to do the heavy lifting for YOUR side. That saps someone's willingness even to converse with you, and if someone retorts with similarly laconic wit, the conversation breaks down really fast, because nobody is willing to put in the extra effort to flesh out someone else's opinion when there's no reciprocity or show of effort.
As far as I can tell, you just posted part of a random screengrab from your web browser for no obvious reason. Striking's response suggests that this is actually a reference to a site which, per the OP, is gone forever, along with any chance of getting your joke. So...I'm not really sure what you were expecting.
If anything, that's a much deeper comment than the website itself. No matter how hard you try, it's impossible to really destroy something once it's been on the web. Resistance is futile.
Of course, this won't prevent crawlers which do not honor these headers/meta tags from caching your site, but if you're not in Google's index you're likely not getting traffic from said crawlers.
I see some potential use for this: for example, as soon as Google's crawlers reach the site, I know that it's accessible from the outside, and I destroy it.
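A crude tripwire along those lines, sketched in Python; the log path and the destroy() hook are illustrative assumptions:

```python
# Poll the access log for the first hit claiming to be Googlebot,
# then fire whatever "destroy" action you have in mind.
import time

LOG = '/var/log/nginx/access.log'  # placeholder path

def destroy():
    print('Googlebot reached us: the site is visible from outside.')

while True:
    with open(LOG, errors='replace') as log:
        if any('Googlebot' in line for line in log):
            destroy()
            break
    time.sleep(60)
```

Per the sub-thread above, a raw user-agent match only tells you that someone *claiming* to be Googlebot visited; you'd want the reverse/forward DNS check before pulling the trigger.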