Hacker News

Accuracy is a useless metric for something like this. If you have binary data that is 97% zeros (i.e. most of the time there is no arrhythmia), you can use the sophisticated machine learning technique of:

  if(TRUE) return(0)
This will give you 97% accuracy.

EDIT:

I had only read the headline earlier. Now, after checking:

>"The study involved 6,158 participants recruited through the Cardiogram app on Apple Watch. Most of the participants in the UCSF Health eHeart study had normal EKG readings. However, 200 of them had been diagnosed with paroxysmal atrial fibrillation (an abnormal heartbeat). Engineers then trained a deep neural network to identify these abnormal heart rhythms from Apple Watch heart rate data."

So 1 - 200/6158 = 0.9675219. My method performs just as well as theirs if we round to the nearest percent. This is ridiculous.
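To make the arithmetic concrete, here is a minimal Python sketch of the "always predict negative" baseline, using the participant counts quoted from the study (6,158 total, 200 with paroxysmal AF):

```python
# Baseline classifier: predict "no arrhythmia" for every participant.
# Numbers are the study's quoted counts; this is a back-of-the-envelope check.
n_total = 6158
n_positive = 200

true_negatives = n_total - n_positive   # 5958 correctly labeled normal
false_negatives = n_positive            # 200 AF cases missed entirely

accuracy = true_negatives / n_total
sensitivity = 0 / n_positive            # it never flags anyone

print(f"baseline accuracy: {accuracy:.7f}")  # → 0.9675219
print(f"sensitivity: {sensitivity}")         # → 0.0
```

97% accuracy with 0% sensitivity, which is exactly why accuracy alone tells you nothing here.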



From the comments on the article itself:

Cardiogram engineer here. 97% accuracy refers to a c-statistic (area under the ROC curve) of 0.9740. An example operating point would be 98% sensitivity with 90% specificity.
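For readers unfamiliar with the c-statistic: it equals the probability that a randomly chosen positive case gets a higher score than a randomly chosen negative one, so it is independent of any single threshold. A small sketch with made-up scores (not the study's data):

```python
import itertools

# Hypothetical classifier scores, purely illustrative.
pos_scores = [0.9, 0.8, 0.75, 0.6]       # true AF cases
neg_scores = [0.7, 0.5, 0.4, 0.3, 0.2]   # normal cases

# c-statistic = P(random positive scores above random negative),
# counting ties as half a win.
wins = sum(
    1.0 if p > n else 0.5 if p == n else 0.0
    for p, n in itertools.product(pos_scores, neg_scores)
)
auc = wins / (len(pos_scores) * len(neg_scores))
print(f"c-statistic: {auc:.3f}")  # → 0.950
```

The trivial "always predict 0" classifier scores everyone identically, so its c-statistic is 0.5, not 0.97.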

These important details are often lost in the news. You can find some more details on our findings in our blog post:

https://blog.cardiogr.am/applying-artificial-intelligence-in...


Just wondering: with such an unbalanced dataset (5,958 negatives, 200 positives), wouldn't it have been fairer to use average precision (area under the precision-recall curve) instead of ROC-AUC?
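To illustrate why imbalance matters: at the operating point quoted above (98% sensitivity, 90% specificity) and the study's class balance, the expected precision is far lower than either headline number. A back-of-the-envelope sketch, assuming the study's counts:

```python
# Assumed inputs: 200 positives and 5,958 negatives (the study's counts),
# plus the 98% sensitivity / 90% specificity operating point quoted above.
positives, negatives = 200, 5958
sensitivity, specificity = 0.98, 0.90

tp = sensitivity * positives            # expected true positives  (196)
fp = (1 - specificity) * negatives      # expected false positives (~596)

precision = tp / (tp + fp)
print(f"expected precision: {precision:.3f}")  # → 0.248
```

So roughly three out of four flagged cases would be false alarms, which ROC-AUC alone doesn't surface but a precision-recall curve would.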


Thanks, the link should be changed to that.


I think the article is not precise with their wording, but the 97% is actually recall (i.e. detects 97% of the positives).


(Cardiogram co-founder here) 97% refers to c-statistic (area under the ROC curve).


Can you quote the part that leads you to think that? At first I was just commenting on the title, but see the edit. If my interpretation is right, this is the most ridiculous hype I have ever seen, so maybe I missed something.


No quote in particular, but it seems unlikely that they would miss such a blatant dataset bias and fail to instrument their ML models with metrics besides accuracy. But then again, welcome to Silicon Valley :)

1 - 200/6158 = 97% is indeed a pretty suspicious coincidence though. I would assume/hope that they've shuffled a big dataset of recorded heart events (like the image in the TC article) and that the 200 people diagnosed with paroxysmal atrial fibrillation only rarely experience AF, so the number of true positives is probably far smaller than 1% of the dataset.


See above, the value referred to AUC rather than accuracy.


Then it's missed 100% of the arrhythmic cases. But I appreciate the sentiment of what you're trying to say.



