Sentiment Analysis – Improving Bayesian Methods (github.com/kennycason)
83 points by KennyCason on Aug 1, 2016 | hide | past | favorite | 16 comments


In the spirit of picking at nits, why not call the technique by its name, "bagging" or "bootstrapping"?

Also, I would like to see a better metric than simple accuracy. Maybe harmonic mean of precision and recall?
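The harmonic mean of precision and recall is the F1 score. A minimal sketch in Python, computed from raw true/false positive/negative counts (illustrative only, not from the repo):

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall, from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# e.g. 80 true positives, 20 false positives, 10 false negatives:
# precision = 0.8, recall = 8/9, F1 = 16/19
print(f1_score(80, 20, 10))
```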

Otherwise, it's great to see more stuff like this in more languages (Kotlin, here).

Edit: Oh, hang on, I missed this bit:

>> more developed tokenizers that understand language

"understand language"? Really? What about "can handle"?


>"understand language"? Really? What about "can handle"?

I would definitely qualify this as "picking at nits" haha :)


And thanks, "Bagging" was the term I was looking for. I'll update the post!


Would anybody explain to me what the takeaway from this approach is?

When I run the code I see something like: 38/100 = 42.0% - 38.0% accuracy, 90.47619% accuracy of rated data.

90% is nice, but it looks to me like a tradeoff between precision and recall, where we just label those samples with the highest probability of pos/neg...


Sorry for the confusion.

The primary takeaway was meant to be that a clustered Bayesian classifier (each model trained on a random sample of the training data) offers significant improvements over a traditional single Bayesian classifier. Nothing revolutionary; I just wanted to share a practical example that outperforms most Bayesian implementations I have come across. I also wanted to share the results of model pruning, and that tokenization matters.
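A hedged sketch of the clustered-classifier idea in Python (not the repo's Kotlin implementation): bootstrap-sample the training data, train one classifier per sample, and combine predictions by majority vote. `train_keyword_nb` below is a hypothetical toy stand-in for a real Naive Bayes classifier, just enough to exercise the bagging wrapper:

```python
import random
from collections import Counter

def train_keyword_nb(data):
    """Hypothetical toy stand-in for a Naive Bayes text classifier:
    scores a text by how often its words appeared under each label."""
    word_counts = {"pos": Counter(), "neg": Counter()}
    for text, label in data:
        word_counts[label].update(text.split())
    def classify(text):
        scores = {label: sum(counts[w] for w in text.split())
                  for label, counts in word_counts.items()}
        return max(scores, key=scores.get)
    return classify

def train_bagged(train_fn, data, n_models=10, seed=42):
    """Train n_models classifiers, each on its own bootstrap sample
    (sampled with replacement, same size as the original data)."""
    rng = random.Random(seed)
    def bootstrap():
        return [rng.choice(data) for _ in data]
    return [train_fn(bootstrap()) for _ in range(n_models)]

def predict_bagged(models, text):
    """Combine the ensemble by majority vote."""
    votes = Counter(model(text) for model in models)
    return votes.most_common(1)[0][0]

train = [("good great fun", "pos"), ("loved it great", "pos"),
         ("good clever fun", "pos"), ("great loved good", "pos"),
         ("bad awful boring", "neg"), ("hated it awful", "neg"),
         ("bad dull boring", "neg"), ("awful hated bad", "neg")]
models = train_bagged(train_keyword_nb, train, n_models=15)
```

Each model sees a slightly different view of the data, so the vote smooths out noise that any one bootstrap sample introduces, the same intuition behind Decision Tree -> Random Forest.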

I originally coded this without much intent on sharing, so the post write-up could probably use a lot of work. :)


This idea seems similar to local linear regression [1] in a way: approximate a non-linear model with multiple linear models that are fit to different data selections.

Neat idea wrt. using a Naive Bayes classifier here!

[1] https://en.wikipedia.org/wiki/Local_regression
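A rough one-dimensional sketch of local linear regression, assuming a Gaussian kernel for the local weights (the Wikipedia article uses tricube; either illustrates the point that each prediction comes from a line fit to nearby data):

```python
import math

def local_linear_predict(xs, ys, x0, bandwidth=1.0):
    """Fit a weighted least-squares line centered on x0 and evaluate
    it there; points near x0 get the most weight."""
    w = [math.exp(-((x - x0) / bandwidth) ** 2) for x in xs]
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, xs)) / sw  # weighted mean of x
    ym = sum(wi * yi for wi, yi in zip(w, ys)) / sw  # weighted mean of y
    num = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, xs, ys))
    den = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, xs))
    slope = num / den if den else 0.0
    return ym + slope * (x0 - xm)
```

On data that is already linear, the weighted fit recovers the exact line; on non-linear data, each query point gets its own local line.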


Interesting, I actually implemented Local Regression from that very Wiki page! https://github.com/kennycason/ml/tree/master/src/main/java/k...

This idea originally came to me after explaining Random Forests to someone as a way to improve their Decision Tree. On a whim, I built out a prototype and it offered improved results. :)


Thank you!

How does it compare to the state of the art methods that used those datasets? Is it better or comparable?


I guess I didn't completely answer. If you force the classifier to rate 100% of the data, I expect ~80%.

If you set a reasonable confidence threshold, I expect mid 80s to low 90s.

In both cases, the clustered Bayesian classifier outperforms the single classifier.


My best runs using our proprietary tokenizers and much more training data (tens of thousands of tweets, reviews, and open source training data) yield results of 87-93% accuracy while rating 50-70% of the data, i.e. if the classifier isn't "confident", it doesn't rate the data.
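A minimal sketch of that abstain-when-unconfident behavior, assuming the classifier exposes a posterior probability for the positive class (names are illustrative, not from the repo):

```python
def rate_with_threshold(prob_pos, threshold=0.8):
    """Return a label only when the classifier is confident enough;
    otherwise abstain (return None) so the text goes unrated."""
    if prob_pos >= threshold:
        return "pos"
    if prob_pos <= 1 - threshold:
        return "neg"
    return None  # abstain: posterior too close to 50/50
```

Raising the threshold trades percent-rated for accuracy on the rated subset, which is the 87-93% vs. 50-70% tradeoff described above.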

I'll run a quick test to demonstrate its absolute accuracy. It definitely performs more than well enough to use in production, though, and requires relatively little effort. It performs especially well on aggregate stats (>99% accuracy).


My favorite finding was that clustering Bayesian classifiers, each trained on random samples of the data, resulted in significant improvement. (Think Decision Tree -> Random Forest.)


Or you could think of each NB classifier as a convolutional filter, followed by maxout / selection by voting at the top.


Perhaps the most important thing I took away from this write-up is the motivation: even though state-of-the-art NNs can perform better, they are nearly impossible for humans to debug. Methods like Bayesian classifiers and decision trees allow humans to interpret the underlying model, and so are often more useful in practice.


Thanks. Being a huge NN fan, I have to say I have been surprised by the number of cases where Bayesian classification has proven useful. :)


I never know exactly what they're referring to when they talk about "accuracy" in these kinds of papers: what was the precision and recall across the various categories (sentiments?) into which we are classifying the data?


That is a good question. Typically accuracy for something like this is "what percentage of the test data does the model accurately predict". In this example, the model is not punished for "not rating" text, and that is captured in the "percent rated" column. This example only demonstrates two classes of sentiment, positive & negative.
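A minimal sketch of how those two numbers could be computed, using `None` to mark examples the model declined to rate (illustrative, not the repo's code):

```python
def rated_accuracy(predictions, labels):
    """Accuracy over only the examples the model chose to rate,
    plus the fraction of examples it rated at all."""
    rated = [(p, y) for p, y in zip(predictions, labels) if p is not None]
    percent_rated = len(rated) / len(predictions)
    accuracy = (sum(p == y for p, y in rated) / len(rated)) if rated else 0.0
    return accuracy, percent_rated
```

Because unrated examples are excluded from the accuracy numerator and denominator alike, the model isn't punished for abstaining; that cost shows up only in the percent-rated column.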

The main purpose is to demonstrate techniques to gain improvements over traditional/common Bayesian methods.



