Throwaway account. My company created an analytics product around the ability to track which sites your visitors have visited. It used a different and (at the time) more reliable technique.
About a year after the product launch we were contacted by a powerful Washington-based lobby group, and they wanted to chat. They felt it violated a site visitor's "reasonable expectation of privacy". I agreed. So we pulled the feature and dodged a bullet, as this "browser bug" hit the mainstream press a few months later. The feature wasn't a major part of our product's value prop; few of our customers used it and none missed it.
So if you're thinking about basing a startup on this, don't. You will get a call very quickly from organizations much larger than you are asking awkward questions.
I'm sure it's the trick where you make a bunch of links to various sites, set a :visited color, and then interrogate all of the links with javascript to see what color they are. It's no longer possible on most browsers.
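For reference, a minimal sketch of that classic `:visited` sniff. The sentinel color and the split between a testable comparison function and the browser-only DOM part are my own choices; modern browsers defeat this by reporting the unvisited color from `getComputedStyle` regardless of history.

```javascript
// Classic :visited color sniff (now blocked by modern browsers).
// Assumes a stylesheet rule like `a:visited { color: red }`.
const VISITED_COLOR = "rgb(255, 0, 0)"; // illustrative sentinel color

function looksVisited(computedColor) {
  // Pure comparison, kept separate so the decision logic is testable.
  return computedColor === VISITED_COLOR;
}

function sniffHistory(urls) {
  // Browser-only part: create links and read back their rendered color.
  return urls.filter((url) => {
    const a = document.createElement("a");
    a.href = url;
    document.body.appendChild(a);
    const color = getComputedStyle(a).color;
    document.body.removeChild(a);
    return looksVisited(color);
  });
}
```

In a browser that hadn't patched this, `sniffHistory(["https://news.ycombinator.com", ...])` would return the subset of URLs in the user's history.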
I get what you're saying, but if the technique really worked, I'd expect it to list the sites I frequently visit on the first run and score 100% on the second run, but it only managed about 80% of that.
Same here. Funny thing is that all the correct matches were for sites actually open in another tab, and it even missed Facebook, which was also open in some tab.
Same here. Although I had visited multiple listed sites in the very session this browser has open, the tool only reported one as visited, and gave a "Whoops" for Google.
It should have reported at least 6 more!
Note that it doesn't need to be 100% accurate to be effective. If it guesses better than 50% (i.e. coin flip), then it could be used to give guesses with at least some confidence. No different than analyzing any other noisy dataset. Because this all works client-side, it can also be done quite invisibly.
Right, but those applications will be for things like online advertising, which is already tracking your visits and/or just assuming you use the popular sites anyway. Can this be used to violate privacy in a meaningful manner if it misses 50% or more of the time? You'd have complete plausible deniability if you were pinned to have accessed some site you don't want people to know you accessed. It can't be used as evidence in anything. What would the threat look like?
I oppose user tracking not necessarily because there's something to hide, but because a sense of privacy is a fundamental component of a sense of independence, and for me, independence is one of the critical components of happiness.
If the question is "Has this user visited site X?" then I'd hope that any kind of modelling is better than 50%, as simulating a coin toss would be at least as good.
I was under the impression that attempts had been made to hide any effects of :visited styles on the accessible DOM to stop this from working. There's a particularly good article on Mozilla's attempts [1], and the relevant bug on Bugzilla. [2]
Apparently the key to people not knowing where you visited is to use IE. It missed sites like Twitter and Facebook that are open for me all the time. It did get one site correct, HN ;<).
What would be a possible use of this attack? I can't think of anything useful you'd do with knowing that you've visited Facebook. And so many people use sites like Facebook you might get a better success rate just always returning "visited" rather than measuring this way!
Maybe you could use it to only show those social sharing widgets that are for services the visitor actually uses. Though I guess WebIntents will eventually be a better way to handle that.
Ad retargeting without going through one of the high-reach ad networks. Or any product site instantly knowing which competitors you've researched, and tailoring their pricing to that.
Again, this seems to be inaccurate for a large number of people. Can we take these two attempts as evidence that it is hard for malicious websites to discern our browser history?
It said I hadn't visited any of the sites, except HN (which is easy to guess, since news.ycombinator.com will be in the HTTP_REFERER field...).
If I re-run the test it still gets some sites wrong (says I haven't visited them when in fact I have). It even claims I haven't visited Amazon both times when in fact it's open in another tab.
I just tried it twice, once on a public wi-fi network. And then again when I got home. It worked very well on the public wifi, and had many false positives at home.
It seems to work better on slower internet connections. The script calls a site "visited" if the response time of the potentially cached image is less than 1/20 the time of the certainly uncached image.
On slow connections the cached fetch is much faster than the uncached one; on fast connections it's only slightly faster. However, the known-uncached images sometimes show a "10x increase in latency", so based on my (and others') experience, this is a major problem.
One could attempt to normalize this for the sites where appending a random query string causes higher latency. Simply precalculate the added latency from images with the random query string on a per-site basis, then subtract it from "uncachedTime."
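The timing test and the normalization step described above could be sketched roughly like this. The 1/20 ratio is taken from the comments; the helper names and the `queryStringPenalty` parameter are illustrative assumptions, not the actual script.

```javascript
// Sketch of the cache-timing test described above. Timings come from
// loading the same image with and without a cache-busting query string.
const RATIO = 1 / 20; // call it "visited" if < 1/20 of the uncached time

function timeImage(url) {
  // Browser-only: resolve with the load time of `url` in milliseconds.
  return new Promise((resolve) => {
    const img = new Image();
    const start = performance.now();
    img.onload = img.onerror = () => resolve(performance.now() - start);
    img.src = url;
  });
}

function classify(cachedTime, uncachedTime, queryStringPenalty = 0) {
  // Subtract the extra latency some sites add for unknown query strings
  // (the per-site precalculated penalty) before applying the ratio test.
  const normalized = uncachedTime - queryStringPenalty;
  return cachedTime < normalized * RATIO ? "visited" : "not visited";
}

async function testSite(imageUrl) {
  const uncached = await timeImage(imageUrl + "?r=" + Math.random());
  const maybeCached = await timeImage(imageUrl);
  return classify(maybeCached, uncached);
}
```

With the penalty applied, a site whose random-query fetches are artificially slow no longer inflates `uncachedTime` and produces a false "visited".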
Doesn't appear to guess correctly in Chrome 15 on OS X (10.7.2). I'm not sure exactly what the 'whoops' means for google - but I've obviously visited HN and have visited a few of the others as well.
Really interesting concept. This one wasn't as accurate for me as the original Firefox-specific proof of concept, though. It only picked up on YouTube and Wikipedia. What's with the "whoops" on Google?
I do use NoScript and Ghostery, though, and I could see how that might cause some false negatives.
When the script is run a second time, it shows every site as visited. After the first run, I guess everything is cached and it can no longer tell a hit from a miss :)
Running in Chrome's incognito mode is a bit different, though: only 7 show up as cached the first time it's run.
Same results here, and I'm not running any of the things you mentioned; also been to many of those sites, within the past few days in many cases. I'm basically on stock Chrome (though I do have Adblock).
In my case, the ones it got wrong were the images that returned a 304 (not changed) header since they returned significantly faster than fetching the full image.
It said no to sites I had visited. Unless that is what it was programmed to do, I'd say it did not work. You can message me for any other info about the test.
From the 5 sites I visited, it correctly flagged HN, WP and YT as visited, and gave a "whoops" for FB and Google (what does that mean?), which I both visited.
Yet another reason this test is useless. If site A uses it, it may get partially correct data, but when you browse to site B, it will return 100% positive, most of these being false positives.
I just don't see any practical application for this method with such high error rates. The methods mentioned above are only valuable if you can guarantee at least relative reliability. By and large the results have been seemingly random, with only one or two people reporting 100% correctness. So what's the difference between running a test with wildly unreliable results and just guessing randomly?
Even so, even without doing any work to ameliorate these flaws, it could still be (ab)used. Don't assume that it's only useful if everyone can scan which of the top 100 websites you've visited.
Any site could use this to check which competitors' sites have been visited. It's unlikely anyone else has an interest in checking that information, so the cache is not going to be poisoned by anyone else. With knowledge of which competitors a potential customer has checked out, you could do some effective price discrimination -- the guy looking at the $10 solutions sees your lowest price, while the guy looking at some competing Microsoft Dynamics package enters a more enterprisey sales funnel.
It's also useful for retargeting. Throw the code up on an ad network and you only test for cache hits against domains of current advertisers. If there's a hit, store it in a cookie so you don't need to check the (now filled) cache again. You can now show ads for companies a person has already had an interaction with, without having to cookie every visitor to the advertisers' sites first.
It doesn't take much to come up with (mostly nefarious) uses for this, even without perfect accuracy and even without the ability to have multiple parties check the same URLs.
It also doesn't take much to come up with ways to improve the process. You can ameliorate the problem of overlapping testers by having a large pool of URLs from each site to check. The average top 1000 site probably has dozens and dozens of images and other resources per page, each of which can be used for a cache test.
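That pooling idea could look something like the sketch below: probe several resources per site and take a majority vote, so one polluted or mistimed cache entry doesn't flip the verdict. The `probe` function (the single-resource cache test) is left abstract here; this is an assumed shape, not anyone's actual implementation.

```javascript
// Majority vote over several cache probes for one site.
// `probe(url)` should resolve to true if the resource looked cached.
async function siteVisited(resourceUrls, probe) {
  const results = await Promise.all(resourceUrls.map(probe));
  const hits = results.filter(Boolean).length;
  return hits > resourceUrls.length / 2;
}
```

With dozens of candidate resources per site, each tester can also draw a different random subset, which reduces the overlap problem between multiple parties probing the same cache.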
First of all, I was explaining my experience.
Second, I know it is a proof of concept. Still, I don't think it really proves the concept, since the results are not reliable at all. A proof of concept needs to produce reliable results; otherwise it hasn't proved anything.
Not even close. Instead of measuring load times, you can create "<a>" elements and verify that their rendered color is the one you defined for visited links. It's a trick from the old days...
As mentioned elsewhere in this thread, that loophole has been closed by all modern browsers. Not to say there aren't other ways to get at that information, but it's not as simple as checking the color of a link anymore.