
The "control" for the actual (machine) learning may be the test set derived from the available data, but the control for a medical study must be real life (new data), not the old data that was used to build the machine learning algorithm.


You don't need 100% new data to get an idea of performance. It's meaningful (indeed, common) to select some arbitrary percentage of the real-world data as a "training set", then measure the performance of the algorithm on the rest. It's not proof against overfitting, of course, but it helps.
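A minimal sketch of the hold-out procedure described above, assuming toy data and a deliberately trivial "learner". The helper names (`train_test_split`, `fit_threshold`, `accuracy`) and the data are hypothetical illustrations, not anything from the thread; this hand-rolled `train_test_split` is not scikit-learn's function of the same name.

```python
import random

def train_test_split(data, test_fraction=0.2, seed=0):
    """Shuffle `data` and hold out roughly `test_fraction` of it as a test set."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

def accuracy(predict, examples):
    """Fraction of (x, y) pairs for which predict(x) == y."""
    return sum(predict(x) == y for x, y in examples) / len(examples)

def fit_threshold(train):
    """Toy 'learner': choose the cut-off t (from training x values) that best fits x >= t."""
    candidates = sorted({x for x, _ in train})
    best_t = max(candidates, key=lambda t: accuracy(lambda x: x >= t, train))
    return lambda x: x >= best_t

# Hypothetical data: the label is True when x >= 50.
data = [(x, x >= 50) for x in range(100)]
train, test = train_test_split(data, test_fraction=0.2)
model = fit_threshold(train)

train_acc = accuracy(model, train)
test_acc = accuracy(model, test)  # only the held-out score estimates performance on unseen data
print(f"train accuracy: {train_acc:.2f}, held-out accuracy: {test_acc:.2f}")
```

In this toy setup the training score is always perfect, while the held-out score can drop if the shuffle happens to put points near the true boundary into the test set. That gap is exactly what the held-out portion is there to reveal.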


Right, but there's a reason we run clinical trials before accepting that something is superior to the standard of care. It is really, really hard to manipulate a preregistered clinical trial (not that people don't try) and incredibly easy to manipulate train/test/validate results.
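To make the "incredibly easy to manipulate" point concrete, here is a toy simulation under stated assumptions (not from the thread; every name is hypothetical). Models with no real skill are all scored on one fixed test set, and only the best one is reported. Because the test set was used to pick the winner, its reported score is optimistically biased; re-scoring the same model on fresh data should land near chance.

```python
import random
import statistics

def make_random_model(k):
    """A 'model' with no real skill: a fixed pseudo-random guess for each input x."""
    return lambda x: random.Random(k * 1_000_003 + x).random() < 0.5

def accuracy(model, examples):
    """Fraction of (x, y) pairs for which model(x) == y."""
    return sum(model(x) == y for x, y in examples) / len(examples)

rng = random.Random(42)
# Labels are pure coin flips, so no model can truly beat 50% on average.
test = [(x, rng.random() < 0.5) for x in range(50)]
fresh = [(x, rng.random() < 0.5) for x in range(50, 550)]

models = [make_random_model(k) for k in range(200)]
# The manipulation: try many models and keep whichever scores best on the test set.
test_scores = [accuracy(m, test) for m in models]
best = max(range(len(models)), key=lambda i: test_scores[i])
fresh_score = accuracy(models[best], fresh)

print(f"best test accuracy:  {test_scores[best]:.2f}")
print(f"same model on fresh: {fresh_score:.2f}")
print(f"mean test accuracy:  {statistics.mean(test_scores):.2f}")
```

No single step here is dishonest on its face, which is the point: selecting among many candidates with the evaluation data silently turns the "test" set into part of training. A preregistered trial rules that out by fixing the analysis before the data exist.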




