
As others mentioned, you want to combine semantic search with TF-IDF-like (keyword) search. The thing is that queries that carry no semantic weight (e.g. a part number), or whose terms have a special meaning in your domain compared to the corpus used to train your embedding model (e.g. the average Joe thinks of the building when seeing the word "bank", but if you work at a city planning firm you might only think of the bank of a river), fail spectacularly when using only semantic search.
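To make "combine both" concrete, here is a rough sketch of one common way to merge the two result lists: reciprocal rank fusion. The fusion method is my assumption, not something anyone in the thread prescribed, and the doc ids, the two hit lists and the constant k=60 are made up for illustration. In practice the lists would come from your vector index and your full-text index.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one list, best first.

    Each document gets 1 / (k + rank) from every list it appears in,
    so a document ranked well by either retriever floats to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical results for the same query from the two retrievers.
semantic_hits = ["doc_river", "doc_finance", "doc_part"]
keyword_hits = ["doc_part", "doc_river"]

fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
print(fused)
```

The nice property is that it only needs ranks, so you don't have to put cosine similarities and TF-IDF/BM25 scores on the same scale.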

Alternatively, you can fine-tune your embedding model so that it has been exposed to these words/meanings. However, the best approach (from personal experience) is to do both, and to use some sort of query rewriting on the full-text search side to keep only the "keywords".
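As a very minimal illustration of the query rewriting step, assuming a plain stopword filter stands in for whatever rewriting you actually use (it could just as well be an LLM prompt or a domain term list): the stopword set, the function name and the example query below are all hypothetical.

```python
import re

# Tiny illustrative stopword list; a real one would be tuned to your corpus.
STOPWORDS = {
    "a", "an", "the", "is", "are", "of", "for", "to", "in", "on",
    "and", "or", "i", "we", "you", "can", "do", "does",
    "where", "what", "how", "find", "please",
}

def keep_keywords(query):
    """Return the tokens worth sending to the full-text index.

    Tokens may contain '-', '_' and '/' so that identifiers such as
    part numbers survive as a single term.
    """
    tokens = re.findall(r"[A-Za-z0-9][A-Za-z0-9\-_/]*", query)
    return [t for t in tokens if t.lower() not in STOPWORDS]

keywords = keep_keywords("where can I find the datasheet for part AX-4471B?")
print(keywords)
```

The full natural-language query still goes to the embedding model unchanged; only the full-text side gets the trimmed version.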


