My personalized news site based on the book Programming Collective Intelligence

thorax · on March 12, 2008

I'm reading this book currently. A lot of it is covered in traditional AI courses in CS programs, but I like seeing the Pythonic representations of some of the concepts.

The site itself seems kind of neat, feels like you'd need to use it for a while to get it working well.

csmajorfive · on March 12, 2008

Cool site. I am working on the same thing (but for school/fun). You should look into Support Vector Machines, as they are much better at text classification than kNN.

hashbucket · on March 14, 2008

I have looked into SVMs, but I don't think they would work well in this case because: 1) A separate classifier would have to be trained for each user, and this would take too many resources. 2) I think an SVM would require too many training cases before it becomes useful.

If you know different, let me know.

hashbucket · on March 12, 2008

This is a personalized news site that I wrote in two months based on algorithms from the book Programming Collective Intelligence. Please tell me what you think.

It has two main features: the ability to identify related/similar links, and suggestions/recommendations that actually work.

The basis for all of the algorithms is a document similarity metric presented in Chapter 3: Discovering Groups. Basically, to compare document A with document B, we calculate the Pearson correlation coefficient between the word frequencies of document A and the word frequencies of document B. (You can imagine this as plotting a series of points on a graph, one point per word: each point's x coordinate is the word's frequency in document A and its y coordinate is the word's frequency in document B. The Pearson correlation coefficient is a measure of how well the line of best fit fits the points.)
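To make the metric concrete, here is a minimal sketch of a Pearson similarity over word counts. It is not the site's actual code: the function names, the simple regex tokenizer, and the choice to use the union of the two documents' words as the vocabulary are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Count lowercase word occurrences (hypothetical, very simple tokenizer)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def pearson_similarity(text_a, text_b):
    """Pearson correlation between the word counts of two documents.

    Assumption: the vocabulary is the union of the words in both documents.
    """
    counts_a, counts_b = word_counts(text_a), word_counts(text_b)
    vocab = sorted(set(counts_a) | set(counts_b))
    n = len(vocab)
    if n == 0:
        return 0.0
    xs = [counts_a[w] for w in vocab]  # x coordinate: count in document A
    ys = [counts_b[w] for w in vocab]  # y coordinate: count in document B
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined when one side is constant
    return cov / math.sqrt(var_x * var_y)
```

Two documents with the same word distribution score close to 1, while documents that share no words score below 0 under this vocabulary choice.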

Using this similarity metric, links can be clustered together using K-means clustering. This is what you get when you click on “related” at the bottom of each link. Clicking on “similar” gives the results of running K-NN. (“related” doesn't work as well as it could right now because there are too few links for any given link to be similar to, but this is an example of where it does work: http://fyynd.com/links/197/related/ For now, “similar” usually works better.)
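As a rough illustration of the K-NN side, the sketch below returns the k links most similar to a given one. The function name, the `vectors` mapping from link id to word-count vector, and the pluggable `similarity` callable are hypothetical; on the site the similarity would presumably be the Pearson metric described above.

```python
import heapq


def most_similar(target_id, vectors, similarity, k=5):
    """Return the ids of the k links most similar to target_id.

    vectors: dict mapping link id -> word-count vector (assumed layout).
    similarity: any function of two vectors where higher means more alike.
    """
    target = vectors[target_id]
    scored = (
        (similarity(target, vec), link_id)
        for link_id, vec in vectors.items()
        if link_id != target_id  # a link is not its own neighbour
    )
    # nlargest keeps only the k best (score, id) pairs, best first
    return [link_id for _, link_id in heapq.nlargest(k, scored)]
```

This is a brute-force scan over every link, which is fine for a small site but grows linearly with the number of links per query.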

There are two algorithms for giving recommendations, “Suggested” and “Recommended”. "Recommended" generally works better than "Suggested" when you haven't voted much yet, but "Suggested" should be more in tune with your preferences in the long run.

In layman's terms, the Recommendation algorithm works by "averaging" together the links that you liked and then finding links that are similar to that average, while the Suggestion algorithm tries to determine whether you will like a particular link by seeing whether it is similar to any page that you have already rated highly. As a result, "Recommended" will list pages in your general interest area, but will be insensitive to any "niche" interest that you might have. The "Suggested" page will be sensitive to "niche" interests but will require more votes to train. For example, if most of the links you rate highly are about computer science, with only a few links about biology, then when the recommendation algorithm averages them together, the biology links will count for very little. As a result, you won't see much on biology. On the other hand, the suggestion algorithm will not be hindered by this, though it will have trouble if you don't vote much.
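The difference between the two approaches can be sketched roughly as follows. This is only one reading of the description above, not the site's implementation: the function names and the dict-of-vectors layout are made up, and cosine similarity is swapped in for Pearson purely to keep the numbers easy to follow.

```python
import math


def cosine(a, b):
    """Cosine similarity (stand-in for the Pearson metric, an assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommended_scores(liked, candidates):
    """Score each candidate against the average of all liked vectors."""
    if not liked:
        return {}
    n = len(liked)
    centroid = [sum(column) / n for column in zip(*liked.values())]
    return {cid: cosine(centroid, vec) for cid, vec in candidates.items()}


def suggested_scores(liked, candidates):
    """Score each candidate by its best match against any single liked link."""
    return {
        cid: max((cosine(vec, liked_vec) for liked_vec in liked.values()),
                 default=0.0)
        for cid, vec in candidates.items()
    }
```

With three liked computer science links `[5, 0, 0]`, `[4, 0, 0]`, `[6, 0, 0]` and one liked biology link `[0, 5, 0]`, a new biology link `[0, 5, 0]` scores about 0.32 under the averaging approach but 1.0 under the best-match approach, which is the "niche interest" effect described above.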

Please note that because predictions are so computationally intensive, they are not updated in real time but on an hourly basis. Thus, you have to wait a bit before they come out. Please be patient!

Please check it out and tell me what you think! Any questions/comments/suggestions are more than welcome!

hashbucket · on March 12, 2008

P.S.: I forgot to mention: the voting system normalizes your ratings. Thus, if you vote all 5 stars, it is the same as not voting at all! You must tell it what you don't like as well as what you like.
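One simple way such a normalization could work is mean-centering, sketched below. The exact scheme the site uses is not stated here, so treat this as an assumption; the function name and the dict layout are hypothetical.

```python
def normalize_ratings(ratings):
    """Center one user's ratings on that user's own average.

    ratings: dict mapping link id -> star rating (assumed layout).
    Mean-centering is an assumed scheme, not necessarily the site's.
    """
    if not ratings:
        return {}
    mean = sum(ratings.values()) / len(ratings)
    return {link_id: rating - mean for link_id, rating in ratings.items()}
```

Under this scheme a user who rates everything 5 stars ends up with all zeros, which carries no preference signal, while a mix of high and low ratings produces positive and negative values.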

cstejerean · on March 12, 2008

I really like the interface. It has some features I wish HN had, like the ability to hide items from view. I've been meaning to write something like this for a while but never got around to it. Keep up the good work.

oh, please create a bookmarklet to let users submit stories while browsing, this is VERY IMPORTANT, and shouldn't take much effort (use the HN one as an example).

I'd like to feed the site with stories from here and create a Greasemonkey plugin to automatically rate items on your site when I vote them up here (if I can find a good way to vote up items programmatically on fyynd).

hashbucket · on March 12, 2008

Bookmarklets: done. See http://fyynd.com/bookmarklets/. As for rating links programmatically: it is a simple POST to "http://fyynd.com/links/[link_id]/rate/" with a parameter "rating". "rating" should be a float between 0 and 5. A rating of 0 will delete that vote.

Thanks for your interest.

cstejerean · on March 13, 2008

how do I tell the application which user I am when posting a ranking?

hashbucket · on March 13, 2008

You have to include the cookie.
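For anyone scripting this, here is a minimal sketch of building that request with Python's standard library. The URL pattern and the "rating" parameter come from the description above; the helper name and the cookie value are placeholders, and the real cookie has to be copied from a logged-in browser session.

```python
import urllib.parse
import urllib.request


def build_rating_request(link_id, rating, cookie_header):
    """Build (but do not send) a POST that rates a link.

    cookie_header: the raw Cookie header value from a logged-in session,
    for example "sessionid=..." (placeholder; the cookie name is assumed).
    """
    if not 0 <= rating <= 5:
        raise ValueError("rating must be between 0 and 5")
    url = "http://fyynd.com/links/%d/rate/" % link_id
    body = urllib.parse.urlencode({"rating": rating}).encode("ascii")
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("Cookie", cookie_header)
    return request
```

Sending it is then a matter of passing the result to `urllib.request.urlopen(...)`. Remember that a rating of 0 deletes the vote.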

aswanson · on March 13, 2008

How do you seed it with the search space? Does it go out on its own and crawl?

yters · on March 13, 2008

Does fyynd correlate the interests of different users to branch out more?

jrsims · on March 13, 2008

Thanks, this is interesting!

What languages/technologies did you use to build this?

hashbucket · on March 13, 2008

I used Django. Everything is written in Python, including the recommendation "backend".

hashbucket · on March 14, 2008

UPDATE: a major bug that prevented people from submitting new links was fixed.

ews · on March 13, 2008

Congrats on the site.

Regarding presentation, did you reuse any old reddit-like frontend? It looks a lot like links http://reddit.com/r/programming/info/61e7j/comments/c02j42c

gscott · on April 14, 2008

I viewed your site and I had a case of information overload. I would suggest having a feature where a person can type in what they like (cars, technology, etc.) so that you feed them what they want rather than dumping everything on them at once.