> maintaining history is expensive (content needs moderating, you are required to abide by the GDPR and DMCA, there may be disputes about content on the platform).

Things shouldn't be like this. The price per unit of storage and bandwidth falls fast (and, except for the sites dealing with user-generated videos, faster than the amount and size of content grows). Laws shouldn't apply retroactively.

The problem really is that our means of accessing information are services. When you have a physical letter, or an e-mail saved locally, or a text message from 15 years ago, you can just read them. Nobody will know or care. Nobody will come after you trying to apply GDPR or DMCA retroactively. And since storage is near-free, you won't ever lose it until you forget about it (or at least about doing regular backups). Whereas with modern webmail, forums, link aggregators, IMs - you don't have even your own messages, and viewing a conversation that happened 15 years ago is really being provided a service today. Services are ephemeral, they're also subject to ever-changing regulations and whims of the service providers.

Bottom line, while services are necessary for transferring conversations, we really shouldn't be relying on them for access to conversations that already happened.



If you are a company, GDPR does apply to data on physical letters and local emails. A large part of the preparation for the introduction of GDPR enforcement was companies getting a handle on what they had stored in various media.


Actually, email and letters are an area where the GDPR falls short in some countries, especially Germany. There, the constitution effectively takes precedence over the GDPR, so depending on the letter or email, its contents need not be acknowledged or disclosed (the GDPR also grants you access to your data) to the person who wants their data deleted, shown, or otherwise handled.


All true, but costs of hosting and serving aside, there is a non-zero legal cost with hosting and serving the content. Blame bureaucrats, parasite lawyers, and our litigious society.


Those costs reflect the actual social costs of that hosting. Prior to GDPR and similar legislation, those risks were externalised onto users and society at large. They're now being shifted, properly, to where they should have been borne in the first place, on the service providers themselves.

Blame risk-externalising business practices and willful ignorance.


What social cost is there to distributing content contributed by people who agreed to terms according to those terms? Users transmitted data about themselves to a party after reading that party's terms of service and agreeing to the things it promised to do with the data. To paraphrase a popular talking point, two consenting IP addresses should be able to send whatever data they want between each other.


1. Terms of use can change at any time.

2. Technical capabilities have expanded massively. When Yahoo Groups launched, enterprise storage of more than a few hundred GB was highly unusual. At about that time, I worked for a Very Impressive Service Agency which was lucky to claim two Sun Starfire servers for analytic use, only one of which was Large File capable (> 2 GB).

By the late 2000s, AOL were deploying massive-RAM based systems to be able to perform whole-dataset operations in memory.

For the past ~5-8 years, large-scale SSD drives have been A Thing, now available in the terabyte range, for a price. Again, the level of analysis and exploration possible has made tremendous leaps.

3. There is the concept of manifest vs. latent functions, and awareness. The full realm of possibilities of a technical system is rarely apparent to its creators, let alone to nontechnical users. See (very generally): https://en.wikipedia.org/wiki/Manifest_and_latent_functions_...

The marketing and disclosures of such services rarely include such disclaimers as "use of this system may subject you to a lifetime of personal and social profiling, grammar-based context analysis, GD ML AI based image content analysis, and imperil the global liberal social democratic experiment."

Hiding behind the figleaf of "you should have considered all possible future implications of your present actions and will have no future recourse" is grossly flawed, and quite frankly, professional malfeasance and malice aforethought given current understanding.

The awareness of risks has changed, and is unambiguous. Providers should foot the costs, or mitigate them accordingly.

(I suspect that, at least in part, the actions of Yahoo, Google, and others reflect this changed awareness, though I'm not aware that any providers have explicitly stated this.)

Again: the risks always existed. The previous state was made possible only by pretending they did not. They do. Practices must change.


Social cost would be at best very difficult to quantify, though, making it quite hard to handle. "Increased partisan tensions" due to social media, for instance, is not the sort of cost one can quantify and mitigate.

Your point that the things which can be done with collected information are constantly in flux is well taken, and I agree that the ability to retroactively change terms of service to cover previously-collected data is ridiculous and implies an illusory contract which is not legally valid. No one should be able to run data collected in the nineties through a neural net. However, it's also not reasonable to demand that old data be removed, as it's produced at least as much by the server as by the client (e.g. access logs are typically produced by server-side monitoring of server-side software). The most sensible option is for companies to require explicit agreement to TOS changes to continue using the service, and use new data only under that policy while using the old data under the old policy. It's additional compliance overhead, certainly, but it's no different from how a client contract would be treated.
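That scheme can be sketched in a few lines (a toy model with hypothetical field names, not anyone's real implementation): each record is stamped with the TOS version in force when it was collected, and any processing step declares the earliest TOS version whose terms permit it.

```python
from dataclasses import dataclass

@dataclass
class Record:
    user_id: str
    payload: str
    tos_version: int  # TOS version the user had agreed to at collection time

def usable_for(records, min_version):
    """Return only records collected under TOS version min_version or later.

    A use introduced in TOS v3 (say, ML training) may only touch data
    collected under v3 or later; older data stays under its old terms.
    """
    return [r for r in records if r.tos_version >= min_version]

data = [
    Record("alice", "message from 1999", tos_version=1),
    Record("bob", "message from 2019", tos_version=3),
]

# Serving the user's own archive was allowed from v1 onward: both records.
# A use first introduced in v3 may only see bob's record.
print(len(usable_for(data, 1)), len(usable_for(data, 3)))  # 2 1
```

The compliance overhead is visible here: every query path has to carry a policy version, which is exactly the bookkeeping a versioned client contract would require.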

> professional malfeasance and malice aforethought

You are not the arbiter of such things, but thank you for your opinion. There's also a site guideline about assuming good faith, so you're in violation of that.


My own thinking on this has evolved very considerably over the past five years or so. That's included a comprehensive and ongoing exploration of the fields of media, communications, epistemology, and several others, related to this. I'd long seen computers as technology, largely independent of social implications. I now see these as utterly inextricably linked, and with implications that are anything but predictably benign.

Costs being difficult to assess does not mean impossible, and the notions of probability and risk are central to all finance, investment, and insurance. Uncertainty is NOT an absolute lack of knowledge.
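As a toy illustration of that point (every figure below is invented for the example), even a crude probabilistic model turns "uncertain legal exposure" into an expected annual cost that can be weighed against the cost of mitigation, exactly as insurers and investors do:

```python
# Hypothetical figures, purely illustrative -- not real incident statistics.
p_incident_per_year = 0.02      # estimated probability of a reportable incident
expected_penalty = 5_000_000    # estimated fine/settlement if it occurs
mitigation_cost = 80_000        # annual cost of, e.g., purging stale archives

# Expected annual loss from doing nothing: probability times consequence.
expected_loss = p_incident_per_year * expected_penalty
print(expected_loss)                     # 100000.0
print(mitigation_cost < expected_loss)   # True: mitigation beats the risk
```

Uncertainty changes the numbers, not the structure of the calculation.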

Among the principles that become apparent is that changes in informational regimes have profound impacts upon societies, and that this is a pattern which can be traced back through history to the invention of writing itself, and, via indirect anthropological evidence, likely to the emergence of speech.

The principle transcends humans themselves -- a leading theory for the Cambrian Explosion is that it was, effectively, a consequence of structuring and communications mechanisms developing within organisms, allowing the creation of complex body plans rather than merely single-celled organisms or masses or colonies of cells.

For media, see especially Elizabeth Eisenstein's The Printing Press as an Agent of Change and Marshall McLuhan's The Gutenberg Galaxy. The link between mass media and totalitarian, fascist, authoritarian, and nationalist sentiments has long been observed (Hannah Arendt, Dwight MacDonald, the Frankfurt School, Edward Herman & Noam Chomsky, Adam Curtis).

I've been impressed by the insight, and occasionally the lack of awareness, shown by pioneers in the data field regarding the potential perils of comprehensive data archives.

Paul Baran, co-inventor of packet-based networking, wrote "On the Engineer's Responsibility in Protecting Privacy" (https://www.rand.org/pubs/papers/P3829.html) in 1968, some 51 years ago. In it he remarked on both the risks and industry attitudes:

"There are many amongst us who would not hesitate to build equipment to compromise the privacy of any given individual provided the price is right. These are the whores of industry. They would not hesitate building systems and devices contrary to the public interest; their only concern is the buck."

The full paper, and in fact, all of Baran's RAND publications, are online in full-text, following my request to RAND. I remain grateful to them for this.

Baran was also interviewed for a 1966 BBC documentary:

"Well, he who has access to information controls the game. This is very dangerous. I think both your country and mine have never trusted the government completely. We do so for good reason. Here we have a mechanism that could be abused. Here we have a mechanism that would allow the creation of a dictator. . .

I've yet to see an expression by anyone in Congress about this new type of danger. In fact, we see proposals for centralizing information, we see proposals for rushing ahead into new, more efficient computer information systems, and very little thought is being given to the dangers of the misuse of these systems. . . I ask a lot of people about privacy, why they valued it, and I was surprised by the number of people who said 'Well, I don't do anything wrong. Why should I worry about privacy?' And then, on the other hand, I think there's a more wise group that says, 'Privacy is really the right to be wrong, then go on and live the rest of your life, without having it mark you forever.' I tend to think this latter view is the view we should hold."

https://invidio.us/watch?v=FwaDvJYZTVk&t=29m31s

Another view was expressed by AI pioneer and Nobel Laureate (economics) Herbert Simon:

"The privacy issue has been raised most insistently with respect to the creation and maintenance of longitudinal data files that assemble information about persons from a multitude of sources. Files of this kind would be highly valuable for many kinds of economic and social research, but they are bought at too high a price if they endanger human freedom or seriously enhance the opportunities of blackmailers. While such dangers should not be ignored, it should be noted that the lack of comprehensive data files has never been the limiting barrier to the suppression of human freedom. The Watergate criminals made extensive, if unskillful, use of electronics, but no computer played a role in their conspiracy. The Nazis operated with horrifying effectiveness and thoroughness without the benefits of any kind of mechanized data processing."

https://pdfs.semanticscholar.org/a9e7/33e25ee8f67d5e670b3b7d....

There is, of course, one slight problem with Simon's argument: The Nazis did make heavy use of mechanised data processing, provided and supported by IBM. Edwin Black documents this meticulously in his book IBM and the Holocaust:

https://ibmandtheholocaust.com



