Real-time Search as a Service (algolia.com)
92 points by davidbarker on Feb 16, 2014 | 36 comments


To keep your users engaged, search results need to show up instantly and be relevant to them, even when they do typos.

To try this out, I went to the demo page for searching TV episodes (http://www.algolia.com/demo) and searched for "The Wire Season 2". Here are the four results given, with the highlighted portions bracketed:

[The Wire] Gag Reel [Season 2]

[The] Simple Life [Season 2] - Special - The Stuff We [Were]n't Allowed To Show You

[The] Farmer Wants a [Wife] (Australia) [Season] 6, Episode [2]

[The] Cosby Show [Season 2] DVD Extra: New Interview with [Dire]ctor Jay Sandrich

Rather than seeking "engagement", I'd put more emphasis on having high quality search results. Having 3 of the 4 results ignore the properly typed title of the show is a terrible interface. Correcting "Wire" to match "Director" is absurd.

The sad part is that these results might make you think the episodes for season 2 of "The Wire" aren't in the database. But they are, just not indexed in a way that lets them be found using the exact phrase "Season 2".

Trying to be more constructive, there is a typo in the first sentence of your Intro, where the name of your company is spelled wrong. Also, "Real-Time Search" usually means search against a database that is being constantly updated. Anyway, I need to get back to screaming at the kids on my lawn.


(I am the Co-founder of Algolia)

Thanks for your comment; you are right that the data are not perfectly indexed for this query. We took the data from one of our customers, whose use case is to search only for TV show names.

We will improve our demo to cover that case.

Thanks


Also try this:

"Futurama holiday" shows one result, which has the word "Episode" in the description. Try "Futurama holiday episode" and you get no results.


"Episode" is actually part of the HTML label used for display, not the description.


Yeah, but that information should be picked up by the search.


If you're wondering whether Algolia is right for you, just ask them. Within 5 minutes of opening the chat window, I had the CEO, Julien, guiding me through the process of converting my XML into JSON to see if it would work.

Then he asked me more about my use case and actually steered me towards an Elasticsearch solution since it sounded like a better fit.

All in all, we went back and forth for 3-4 days, and even though he lost me as a customer out of necessity, I already feel like a satisfied one.


just to be clear, this was about a week ago

obviously the CEO is watching this thread, so I imagine he'll be quick to grab people now as well


I don't understand what makes this particular service tout 'realtime' as its primary selling point.

Don't all search engines (and other hosted search services) aim for fast (100s of milliseconds) retrieval, show-as-you-type and realtime indexing?

Don't get me wrong. Getting all this right is very hard, and kudos for the great performance numbers (vs Elasticsearch), but 'realtime search' smacks of marketing copy.


You can try search-as-you-type on our Hacker News search to see the difference from other search engines: http://hn.algolia.com/

You get relevant results after each keystroke, even with typos. Classical engines use approximations to perform instant search, such as Elasticsearch's suggest module.
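A rough sketch of what "relevant results after each keystroke, even with typos" involves: the partially typed query has to be compared against word prefixes while tolerating a few edits. Here `prefix_edit_distance` is the Levenshtein distance between the query and the closest prefix of a word; the length thresholds in `allowed_typos` and all function names are assumptions for illustration, not Algolia's implementation.

```python
def prefix_edit_distance(query, word):
    """Smallest Levenshtein distance between query and any prefix of word."""
    q, w = query.lower(), word.lower()
    # Row for the empty query: distance to w[:j] is j insertions.
    prev = list(range(len(w) + 1))
    for i in range(1, len(q) + 1):
        cur = [i] + [0] * len(w)
        for j in range(1, len(w) + 1):
            cost = 0 if q[i - 1] == w[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # delete from query
                         cur[j - 1] + 1,      # insert into query
                         prev[j - 1] + cost)  # match or substitute
        prev = cur
    # prev[j] is now the distance to w[:j]; the best prefix wins.
    return min(prev)


def allowed_typos(query):
    # Assumed thresholds: no typo under 4 chars, 1 typo up to 7, 2 from 8.
    n = len(query)
    return 0 if n < 4 else 1 if n < 8 else 2


def instant_matches(query, words):
    """Words the partially typed query could be heading toward."""
    k = allowed_typos(query)
    return [w for w in words if prefix_edit_distance(query, w) <= k]
```

For example, `instant_matches("brek", ["breaking", "bread", "dexter", "director"])` keeps "breaking" and "bread". With these thresholds "wire" is also within one edit of the prefix "dire", so it matches "director", the same kind of match criticized at the top of the thread; typo-tolerant matches therefore need to rank below exact ones.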


It seems like for the HN search, your ranking function is the number of votes (or something very highly correlated with it). If this is true, it's not solving a problem as hard as 'classical' engines, which compute a lot more. It would be great to see this sort of performance demonstrated on comparable ranking functions. I don't know anything about Elasticsearch ranking, though; maybe it has a very simple ranking function too.


It is more than just a sort on number of points :) Our value is the ability to mix textual relevance with business data (in this case the number of points, but it can be the number of page views, number of followers, ...).

You can have a look at our blog post, which explains our ranking in detail: http://blog.algolia.com/search-ranking-algorithm-unveiled/
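A minimal sketch of the general idea of mixing textual relevance with business data, assuming a lexicographic tie-breaking over a few criteria, contrasted with a single blended score. The criteria names, their order, the weights in `blended_score`, and the sample hits are all hypothetical, not Algolia's actual configuration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    title: str
    typos: int      # typos needed to match the query (fewer is better)
    proximity: int  # distance between matched query words (smaller is better)
    exact: int      # query words matched exactly (more is better)
    points: int     # business data, e.g. HN points (more is better)


def tie_break_key(hit):
    # Lexicographic comparison: a later criterion only decides the order
    # when all earlier ones are tied. Criteria and order are assumptions.
    return (hit.typos, hit.proximity, -hit.exact, -hit.points)


def blended_score(hit):
    # Hypothetical single-score alternative with arbitrary weights, for contrast.
    return -100 * hit.typos - 10 * hit.proximity + 10 * hit.exact + hit.points / 10


hits = [
    Hit("Exact match, few points", typos=0, proximity=1, exact=2, points=10),
    Hit("Exact match, many points", typos=0, proximity=1, exact=2, points=500),
    Hit("Typo match, huge points", typos=1, proximity=1, exact=1, points=5000),
]

tie_broken = [h.title for h in sorted(hits, key=tie_break_key)]
blended = [h.title for h in sorted(hits, key=blended_score, reverse=True)]
```

With these made-up numbers, `tie_broken` puts both exact matches first and uses points only to order them among themselves, while `blended` lets the heavily upvoted typo match jump to the top unless the weights are tuned so that popularity can never outweigh a typo.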


No offense, but I hate your business model. Convincing devs to put their search db in the hands of a small hosted startup is a recipe for disaster (see indextank).

There must be a better way. ElasticSearch and MongoDB use open source business models that I think tend to work much better for smart devs picking technologies (irrespective of their actual products).


Hmm, from my own experience: yes, there are alternative open search engines, a lot of them actually. But have you ever tried one yourself? Most of them are a nightmare to set up and atrociously slow as soon as you get a few thousand entries... Sometimes it's definitely worth outsourcing some expertise. Search is definitely not easy to master.

(And I was a user of indextank when they shut down.)


Agreed, I think there's a good middle ground. Lots of hosted APIs use open technologies, and I'm paying for the convenience of someone handling everything for me, without being locked into a single provider who will likely get acquihired and shut down at some point.


What was wrong with indextank?


They got acquired by LinkedIn and shut down the service.


Found this because hnsearch.com is migrating to it. It's very fast. http://hn.algolia.com


Unfortunately, it does not seem to have the accuracy or breadth of the old hnsearch.com. Hopefully this will be fixed in time, but I have found it lacking relevant results and find myself switching back to hnsearch on most occasions.

I also wonder about all the other small applications in the "HN ecosystem", like karma tracker, that rely on the hnsearch API. I see that algolia has an API, but will those other projects just die too?


Feedback is always welcome. Do you have a concrete example of a query returning bad results and what would be the good results?


That explains why I noticed a couple of things off in my aggregator. I was using hnsearch.com/rss, which seems to have been altered recently and is now missing most of the data I was actually using.


How is this different from Swiftype? I ask because I am a current Swiftype user and am trying to understand what the case for switching might be.


Swiftype is a great tool to search for webpages.

Our focus is search within a database (for example, products on an e-commerce website, or people in a social network or CRM, ...).

We offer features dedicated to that use case; you can have a look at these two demos, for example:
- http://www.algolia.com/ecommerce
- http://demos.algolia.com/rapgenius/


All well and good to be very fast, but at what price? From their page, it costs $450 for 5 million records, and in the search world 5 million records is nothing. So I guess it's going to come down to whether your company is at the point where it needs to shave off one or two hundred milliseconds for hundreds of dollars a month.

Second, I would wait and see how their reliability pans out before relying on them for any production services.
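Taking the $450 figure as a monthly price, as the comment implies, the per-record cost works out as:

```latex
\frac{\$450}{5\,000\,000\ \text{records}} = \$0.00009\ \text{per record} = \$90\ \text{per million records}
```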


The search world is very big :) 5 million records is nothing if you index logs (which is not a typical Algolia use case), but it is big from an e-commerce perspective, for example.


In the e-commerce world, the difference between 2ms and 200ms isn't that big of a deal. Search relevance, however, might be something that is important. It looks like that is something they are focusing on heavily.


200ms is too slow to provide search-as-you-type, and our e-commerce customers see that performance is linked to conversion.

In terms of relevance, we have developed a ranking algorithm specialized for that case, which provides better results than the traditional approach: http://blog.algolia.com/search-ranking-algorithm-unveiled/


Your algorithm seems OK, but what was the "traditional approach" that you compared it to, and how did you compare them? It seems like you actually gain a lot from full document search (e.g. products with multi-paragraph descriptions). Otherwise, you might as well just do a SQL query to get your results.


By "traditional approach" we mean all engines that use a single score to rank documents (like all engines based on Lucene).

SQL queries are not suited to text queries; for example, they have no notion of tokenization or of proximity between words.

We plan to write a blog post on the limits of SQL-based search.
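To illustrate the tokenization and proximity point with Python's built-in `sqlite3` module: a plain `LIKE` pattern only matches the query as one contiguous substring, so reordering the words loses the hit, whereas tokenizing both sides and measuring the smallest window containing every query word still finds it. The titles, the helper names, and the window-based proximity measure are assumptions for illustration; this covers plain `LIKE` only, not the full-text extensions some databases offer.

```python
import re
import sqlite3


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def min_window(query, text):
    """Length in words of the smallest span of text containing every query word, or None."""
    wanted = set(tokenize(query))
    words = tokenize(text)
    best = None
    for start in range(len(words)):
        seen = set()
        for end in range(start, len(words)):
            if words[end] in wanted:
                seen.add(words[end])
            if seen == wanted:
                span = end - start + 1
                best = span if best is None else min(best, span)
                break
    return best


titles = ["The Wire Gag Reel Season 2", "The Cosby Show Season 2 DVD Extra"]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE episodes (title TEXT)")
conn.executemany("INSERT INTO episodes VALUES (?)", [(t,) for t in titles])

query = "season 2 wire"
# LIKE treats the whole query as one contiguous substring, so word order matters.
like_hits = [row[0] for row in conn.execute(
    "SELECT title FROM episodes WHERE title LIKE ?", (f"%{query}%",))]

# Token-based matching: keep titles containing all query words, tightest span first.
token_hits = []
for t in titles:
    span = min_window(query, t)
    if span is not None:
        token_hits.append((span, t))
token_hits.sort()
```

Here `like_hits` comes back empty, while `token_hits` finds "The Wire Gag Reel Season 2" with a five-word span and correctly skips the Cosby title, which lacks "wire".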


Ahaha, speed is a huge deal for e-commerce. Big merchants' tests have already proved it many times over. Search relevance is too! The key is to have both at the same time :)


I think improving relevance ranking configuration would be a big boost to this product, as would offering some ability to cross-search multiple indexes. Both are quite difficult problems to solve well in search, but a simple API service for them might be attractive to larger commercial customers.

The icing on the cake would be to have some support for relational (at least partially relational) data and multimedia / files. Good luck!


First of all, great job guys. The library support is fantastic (node.js, python, ruby, php, even a shell client). We are currently pushing our nginx logs to ElasticSearch and were going to use ES for some new features on https://commando.io, but instead we will use algolia.


Just in case anyone from algolia is reading: search is sub-optimal on mobile

https://news.ycombinator.com/item?id=7245140


Yep, you're right: filtering options are currently hidden on mobile devices, so only stories are shown. I'll try to improve that later today.


How does it work with API calls? How many calls are typically made by a real-time search for, say, a 10-letter keyword?


We usually recommend performing one query (one API call) per keystroke, starting from the first one. The actual number of calls depends a lot on the use case. Our ranking takes into account both relevance and popularity to suggest the best result first, which greatly reduces the number of letters you need to type. In use cases with a very strong popularity indicator, like the number of followers for TV shows, we usually get the correct result at the first keystroke (b -> breaking bad, d -> dexter). At the other extreme, you may need to type several words.
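To make the "one call per keystroke" arithmetic concrete: typing a 10-letter keyword sends at most 10 queries, and fewer if the user stops once the wanted result is on top. The sketch below simulates that with a tiny in-memory list; the show list, the follower counts, and the function names are hypothetical, and a simple prefix match sorted by popularity stands in for a real search API call.

```python
# Hypothetical toy data: (title, followers). Counts are made up for illustration.
SHOWS = [
    ("Breaking Bad", 900),
    ("Boardwalk Empire", 300),
    ("Dexter", 800),
    ("Doctor Who", 700),
    ("Downton Abbey", 500),
]


def search(prefix):
    """Stand-in for one API call: titles starting with prefix, most followed first."""
    p = prefix.lower()
    hits = [(followers, title) for title, followers in SHOWS
            if title.lower().startswith(p)]
    return [title for _, title in sorted(hits, reverse=True)]


def calls_until_top(target):
    """Send one query per keystroke; stop as soon as target ranks first."""
    calls = 0
    for i in range(1, len(target) + 1):
        calls += 1  # one API call for this keystroke
        results = search(target[:i])
        if results and results[0] == target:
            break
    return calls
```

With this toy data, "Breaking Bad" and "Dexter" each need a single call, while "Downton Abbey" needs three ("d", "do", "dow") because two more popular shows share its first letters.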


Got it, thanks.



