"Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews," a 2002 paper by Peter D. Turney of Canada's National Research Council, was referenced by the Semantic Analysis paper I read a few weeks back.
You may recall that I quite enjoyed that paper, and I felt like I was getting a fair amount out of it. However, I feel that about all this paper really did for me was point me toward other papers that may contain more real information.
The proposed algorithm is interesting in that it looks at adjectives and adverbs to determine tone, but it tries to capture context by pairing each one with the word that follows it, since the context of an adjective can sometimes determine whether it reads as positive or negative. The measurement itself is a statistical ratio built from the probabilities that words co-occur.
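To make that ratio concrete, here is a minimal sketch, assuming the score is a pointwise-mutual-information-style log ratio of co-occurrence counts against one positive and one negative reference word. The reference words, the counts, the smoothing constant, and every function name below are illustrative assumptions on my part, not values taken from the paper, and the phrase extraction is simplified to "adjective or adverb plus the next word" using Penn Treebank style tags.

```python
import math

# Hypothetical co-occurrence counts: how often a phrase shows up near a
# positive or a negative reference word. All numbers are invented.
NEAR_COUNTS = {
    ("low fees", "excellent"): 40,
    ("low fees", "poor"): 10,
    ("unpredictable steering", "excellent"): 2,
    ("unpredictable steering", "poor"): 16,
}
# Hypothetical overall counts for the reference words themselves.
WORD_COUNTS = {"excellent": 1000, "poor": 1000}

SMOOTHING = 0.01  # keeps the ratio defined when a pair was never seen


def extract_phrases(tagged):
    """Pair each adjective (JJ*) or adverb (RB*) with the word after it.

    `tagged` is a list of (word, tag) tuples that has already been
    part-of-speech tagged by some other tool.
    """
    return [
        f"{word} {tagged[i + 1][0]}"
        for i, (word, tag) in enumerate(tagged[:-1])
        if tag.startswith(("JJ", "RB"))
    ]


def semantic_orientation(phrase, pos="excellent", neg="poor"):
    """Log ratio of how strongly the phrase co-occurs with pos versus neg."""
    near_pos = NEAR_COUNTS.get((phrase, pos), 0) + SMOOTHING
    near_neg = NEAR_COUNTS.get((phrase, neg), 0) + SMOOTHING
    return math.log2((near_pos * WORD_COUNTS[neg]) / (near_neg * WORD_COUNTS[pos]))


def classify_review(phrases):
    """Average the phrase scores; a positive average means a positive review."""
    if not phrases:
        raise ValueError("need at least one phrase to classify")
    average = sum(semantic_orientation(p) for p in phrases) / len(phrases)
    return "thumbs up" if average > 0 else "thumbs down"
```

With these toy counts, a review containing only "low fees" comes out positive, while adding "unpredictable steering" pulls the average below zero. The outcome depends entirely on the counts fed in, so this only shows the shape of the calculation.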
This idea of scoring the whole phrase, rather than the bare adverb or adjective, does appear to carry some success. On non-film reviews, the algorithm correctly guessed whether a review was positive or negative over 70% of the time. Even with film, it was right roughly 66% of the time, held back mainly by the fact that almost every movie has good and bad aspects that people discuss in the same review.
I'm not entirely sure how he seeded the values used to generate the Semantic Orientation scores, which is probably why I'm not terribly into this paper. Those scores appear to hold the majority of sway over the outcome of the algorithm, and I don't know where they came from.
The basic finding, that considering the word following the adjective or adverb gives a real boost to accuracy, is significant, but for some reason this paper failed to grab my attention with the possibilities it raised. Perhaps I just need to read more of the work it's based on.
Next Week: On sequential Monte Carlo sampling methods for Bayesian filtering