Hacker News

After seeing some of their posts earlier and comparing them to live data I record at the colocations, I've concluded that they have clock issues that make these types of anomalies appear frequently. Or they have a bad data vendor.

Interestingly enough, even the regulators don't have good (only millisecond-resolution) trade data.



These are charted with CQS (official) timestamps. Try again.


CQS has had issues in the past. You should know this.

More importantly, why wouldn't you invest in colocations and collect the data yourself using direct feeds (with GPS clock synchronization etc to validate the data)?


We do. You are grossly misinformed. The charts correctly show the sequence and times of this event. I'm not going to engage this discussion further, though pmail is fine.


As a bystander who does not have the background necessary to evaluate either of your claims at face value, I would really like to hear your explanation for why you disagree with his explanation.


A simple analogy: you are an advertiser using Facebook. You tell them you want to get 1000 impressions, and Facebook gives you a report at the end of the day saying that they delivered the number of impressions you paid for.

So you look at yesterday's view count and today's view count and notice that somehow the view count only increased by 900. Something is afoot!

Nanex's argument is tantamount to saying "that means we must have lost 100 organic views today".

My argument is tantamount to saying "I actually bothered to look at our access logs (which we record on our servers) and only saw 900 that we could definitively attribute to real Facebook users. Is it possible that the report is incorrect or falsified?"

Now I bring up this example because this actually did happen with Facebook. A quick HN search reveals one such discussion: http://news.ycombinator.com/item?id=667308

Back to the current situation. There are many sources of market data. Each individual exchange generates its own feed, and with the major exchanges (NYSE, NASDAQ, BATS, DirectEDGE, ...) you can colocate in the exchange data centers (NYSE and ARCA are in Mahwah NJ, NASDAQ is in Carteret NJ, and various other exchanges are located in New Jersey and Chicago) and record the data yourself. There is a unified tape (CQS/CTS) which combines and disseminates a combined record (across all exchanges). This is used to determine the "national best bid/offer" -- the prices people are willing to buy/sell at.

The process of CQS generation is fraught with problems, but lots of older traders and academic types use CQS data because it's much cheaper than getting data from the individual exchanges directly. However, you are subject to the quirks of the combination process, including subtleties regarding timestamping (since this data includes trades and quotes from both Chicago and New Jersey, the sequence of events may appear different depending on whether you record from Chicago, New York, Philadelphia, or somewhere else; and if you rely on the exchanges' own timestamps, you have to worry about clock delay and skew between the exchanges' servers).
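To make the recording-location point concrete, here is a toy sketch (all delays and event times are made-up, illustrative numbers, not measured values): the same two events, one originating in Chicago and one in New Jersey, sort differently depending on where the recorder sits, purely because of propagation delay.

```python
# Toy model: how the recorder's location can reorder events.
# The one-way Chicago <-> New Jersey delay below is an assumed figure.

ONE_WAY_DELAY_MS = 4.0  # assumed one-way propagation delay between venues

# Two events 1 ms apart: a quote in Chicago, then a trade in New Jersey.
# Each entry is (name, origin_time_ms, origin_venue).
events = [
    ("CHI quote", 0.0, "Chicago"),
    ("NJ trade", 1.0, "New Jersey"),
]

def arrival_time_ms(origin_time_ms, origin, observer):
    """When a recorder located at `observer` sees an event from `origin`."""
    return origin_time_ms + (0.0 if origin == observer else ONE_WAY_DELAY_MS)

def observed_order(observer):
    """Event names in the order a recorder at `observer` sees them."""
    return [name for name, t, origin in
            sorted(events, key=lambda e: arrival_time_ms(e[1], e[2], observer))]
```

A Chicago recorder sees the quote first; a New Jersey recorder sees the trade first. Both are faithful recordings of the same reality, which is why the recording location matters.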

Nanex is saying that it is acceptable to depend on that data and that any anomalies must have occurred outside of the recording process. I am saying that the recording process itself can create the types of anomalies Nanex is showing, and that the only way to be sure is to record the data directly and carefully synchronize your recording machines. AND when you do that, you see that there really is no anomaly.

Just to emphasize how sloppy the exchanges are with regards to timing: the BATS exchange uses multiple servers to run trades and generate quotes, and every once in a while you see messages that appear to be out of time order because the individual machines weren't properly synchronized (although, if you filter for a single ticker, messages are always in chronological order).
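A minimal sketch of that symptom (message format and timestamps are made up): the interleaved stream from two unsynchronized matching-engine servers steps backwards in time globally, while each individual ticker's messages stay in order.

```python
from collections import defaultdict

# Made-up stream of (timestamp_us, ticker): one server handles AAPL, another
# handles MSFT, and the second server's clock runs a few microseconds ahead.
messages = [(100, "AAPL"), (108, "MSFT"), (103, "AAPL"), (110, "MSFT")]

def global_backsteps(stream):
    """Indices where a message's timestamp precedes the previous message's."""
    return [i for i in range(1, len(stream)) if stream[i][0] < stream[i - 1][0]]

def tickers_in_order(stream):
    """True iff each ticker's own timestamps are non-decreasing."""
    per_ticker = defaultdict(list)
    for ts, ticker in stream:
        per_ticker[ticker].append(ts)
    return all(a <= b for ts_list in per_ticker.values()
               for a, b in zip(ts_list, ts_list[1:]))
```

Here `global_backsteps(messages)` flags the AAPL message at index 2 (103 after 108), yet `tickers_in_order(messages)` is true, matching the per-ticker behavior described above.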


There are quote-feed solutions that colocate at the exchange data centers and deliver the quotes to you on individual channels (ARCA/BATS/EDGX-A, etc.) as well as the SIAC feeds.

http://www.limebrokerage.com/services/marketdata/citrius


That looks like a hardware product. You still need to purchase market data access from the relevant exchanges to get that data, and that is expensive (NASDAQ costs, for example, run upwards of $20K/mo for the lowest-latency data).

They should just spring for the data before making accusations -- the problem is that when they cry wolf all the time, no one will take them seriously when a real case comes around.


Hi, sorry for being off-topic, but I'd like to have a personal chat with you. My email is abtocool'at'gmail.

Please send me a line when you can. Cheers.


Honest question: How are timestamps different from any other user-supplied data? Is the timeline of events ever recorded using an unsynchronized clock for relative comparison?


Hmm, it's not only the activity just before the official announcement, but also just after it. There is no way people could make a trade 'after' hearing the report a mere 100-200 milliseconds after it came out.

The trading before the official announcement is concrete proof, if the data is official and bug/error-free. But then again, this is not my forte.


Interesting, so you are saying that Nanex is wrong about someone trading 400ms before the report?


To give an example of how this could happen (not saying this is what happened, but I've heard this happened before):

Suppose you left ntpd running and automatically adjusting the clock every hour.

If your clock is running faster than pool.ntp.org and you are synchronizing to it, you may end up adjusting in the middle of an event. Because your clock is running fast, the adjustment would jump you back in time, breaking the sequence of timestamps (somewhat like what you see during the daylight-saving shift if you aren't intelligent about handling the backwards hour).

In this case, if the adjustment was forward in time, there would be a gap.


My understanding is that ntpd corrects clock drift by replaying milliseconds consecutively, not by actually jumping back.

However I can't remember where I read that and could be totally wrong.


I'm sure there are lots of buggy NTP implementations out there that "adjust the clock every hour", but the way it's supposed to work is by continuously varying the speed of the clock (for example, using adjtime()) to correct any discrepancies. At no point should the clock jump backwards or forwards, or even have milliseconds that are more than X percent longer or shorter than usual.
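As a sketch of the difference (the offset and slew rate are illustrative numbers): a slewing clock absorbs a 0.5 s offset by briefly changing its rate, so its readings stay strictly monotonic and never jump backwards.

```python
# Toy model of slew-style correction: the offset is burned off gradually by
# running the clock slightly slow, never by stepping it.

def slewed_clock(true_time, offset=0.5, slew_rate=0.0005):
    """Clock that starts `offset` seconds fast and removes `slew_rate`
    seconds of error per second of true time (illustrative values)."""
    remaining = max(0.0, offset - slew_rate * true_time)
    return true_time + remaining

# Sample the clock every 100 s of true time while the offset is absorbed:
times = [slewed_clock(t) for t in range(0, 2001, 100)]
```

Each sampled reading is strictly greater than the last, and by t = 2000 s the clock agrees with true time again -- no backward jump at any point.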


400 ms is an eternity. I can't imagine they'd be off by that much.


"is plotted with official exchange timestamps" suggests that they are doing it wrong. You can't compare apples to oranges without knowing how the exchanges are timed.

For those who do latency tests, this is a very important point: you should always be on the lookout for what clock is recording the 'start' and the 'stop' and to be sure to consider clock skew.

To get a sense of how far timestamps can diverge: OATS -- the reports sent to the Financial Industry Regulatory Authority -- only requires that machines be synced to within 3 seconds of NIST time (7.5x the 400 ms quoted).


You do seem to know a lot about this topic, but I don't think you should draw any conclusions based on speculation.


There is no "speculation" here:

When you measure latency, you always have to be careful about clock issues at the point where you measure the start and the point where you measure the end. This is old-hat for sysadmins and others who deal with these types of issues. This is why round-trip latency numbers are easier to work with: both the start and the end times are taken on the same clock.
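The round-trip point can be sketched in a few lines (all figures illustrative): an unknown skew between the two hosts' clocks pollutes the one-way measurement but cancels out of the round-trip one, because both round-trip endpoints are read off the same clock.

```python
# Toy latency measurement between two hosts with skewed clocks.

TRUE_LATENCY_MS = 0.4  # actual one-way latency (illustrative)
SKEW_MS = 1.5          # host B's clock is 1.5 ms ahead of host A's (unknown
                       # to the person doing the measurement)

send_a = 100.0                               # send time, on A's clock
recv_b = send_a + TRUE_LATENCY_MS + SKEW_MS  # receive time, on B's clock
# B echoes back immediately (zero processing time, for simplicity):
recv_a = send_a + 2 * TRUE_LATENCY_MS        # reply arrives, on A's clock

one_way = recv_b - send_a     # 1.9 ms -- wrong, polluted by the skew
round_trip = recv_a - send_a  # 0.8 ms -- correct, same clock at both ends
```

The naive one-way figure is off by nearly 4x here, while halving the round trip recovers the true 0.4 ms.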

It's clear, given Nanex's responses, that they depended on someone else to supply the timestamps. Before concluding that someone had inside information ahead of time, they should check their own processes. It's like someone claiming they built a perpetual-motion machine because they confused power with energy (I want to say it was Paul Newman, but the name escapes me -- this actually happened).


Could be completely explained by someone having the right relay for this data in the right place at the right time (which would have required a considerable outlay of resources and compensation to establish, thus boosting economic output long before any potential killing(tm) was made here).

And I can't see any particularly efficient way to level the playing field with a government that barely understands Facebook and fair use. Just try to explain microsecond latency to them -- go ahead, really -- it should be even more fun than a series of tubes(tm).


Not disagreeing with you, but I wanted to mention that I recently learned that quartz clocks typically drift half a second PER DAY, which was shocking to me. This implies that computers are subject to the same drift unless they sync via NTP many times a day.
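Putting that half-second-per-day figure into more familiar units, and relating it to the 400 ms discussed upthread:

```python
# Express 0.5 s/day of drift as a frequency error, and compute how long it
# takes such a clock to drift past 400 ms.

drift_s_per_day = 0.5
seconds_per_day = 24 * 3600

frequency_error_ppm = drift_s_per_day / seconds_per_day * 1e6  # ~5.8 ppm
hours_to_drift_400ms = 0.4 / drift_s_per_day * 24              # ~19.2 hours
```

So half a second a day is only about 6 parts per million of frequency error, yet an unsynced clock blows through the 400 ms window in under a day.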


(I don't know a whole lot about the intricacies of trading) Is there a minimum standard "resolution" for trades, or is everything sort of a best effort based on the speed of the computing hardware involved?


Each exchange is implemented differently. Some have much lower round-trip latency than others, but current latency numbers are in the microsecond range.

The quotes themselves are provided at resolutions from 1 ms down to 1 ns (depending on the exchange and on the type of data you wish to receive).



