Text Mining

A word on sentiment analysis

One may say ‘The ending of this movie was unpredictable’; here ‘unpredictable’ is positive. However, I cannot include it in my rudimentary list of positive words because its meaning depends on the context: ‘unpredictable’ is negative in the sentence ‘This car is unpredictable’.

The simplest and most commonly used method to assess sentiment (and the one used in this blog) classifies texts into three general categories: positive, neutral and negative. However, it is probably the least accurate method of scoring sentiment there is. This is why I generally modify the lexicons of positive and negative words by adding or removing words, which increases the accuracy to a certain extent.
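To make the idea concrete, here is a minimal sketch of a lexicon-based scorer in Python. The two word lists are tiny, hypothetical stand-ins for real lexicons, and the function names (`tokenize`, `sentiment_score`, `classify`) are made up for illustration; this is not the exact code used in this blog.

```python
import re

# Hypothetical mini-lexicons, for illustration only.
# Real lexicons contain thousands of words.
POSITIVE = {"love", "like", "liked", "great", "good"}
NEGATIVE = {"hate", "disgusting", "bad", "awful"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text, positive=POSITIVE, negative=NEGATIVE):
    """Number of positive words minus number of negative words."""
    tokens = tokenize(text)
    n_pos = sum(token in positive for token in tokens)
    n_neg = sum(token in negative for token in tokens)
    return n_pos - n_neg


def classify(text, positive=POSITIVE, negative=NEGATIVE):
    """Map the score onto the three general categories."""
    score = sentiment_score(text, positive, negative)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify("I love this laptop"))     # positive
print(classify("I hate this lobster"))    # negative
print(classify("The laptop arrived today"))  # neutral
```

Passing the lexicons as arguments is a deliberate choice in this sketch: it makes it easy to swap in a modified word list for a particular context.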

It is inaccurate because it regards text as a ‘bag of words’: it does not take into account the order in which the words appear, and it ignores the context. This is why it is unable to deal with such things as sarcasm. For instance, the sentence ‘We all love to wait in line for hours …’ would be understood as positive by the algorithm because the keyword ‘love’ is listed in the lexicon of positive words. Moreover, the more complex the opinion, the less accurate the method is:

– ‘I hate this lobster’ – negative

– ‘I love this laptop’ – positive

– ‘I liked the lobster but the potato salad was disgusting’ – ???

The truth is that the last example cannot really be classified into any of the three categories (positive, negative and neutral). It includes multiple sentiments and concerns different parts of a meal (the lobster and the potato salad). It is a sophisticated piece of language which could perhaps be classified using algorithms that go ‘beyond polarity’ and classify text into emotional states such as ‘sad’ or ‘angry’. Most sentiment analysis carried out by businesses and software providers does not allow this. Even then, I do not think this example should be linked to any particular sentiment; it should be seen as objective. It is one of the many reasons why I mainly look at what is being said on Twitter. The 140-character limit makes tweets rather straightforward and allows this technique to be relatively accurate.
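The examples above can be run through a bag-of-words scorer to see where it breaks down. This is only a sketch: the lexicons are hypothetical and deliberately tiny, and `bag_of_words` and `polarity` are illustrative names. It shows that word order is thrown away, that the sarcastic sentence comes out positive, and that the mixed opinion cancels out to neutral.

```python
import re
from collections import Counter

# Hypothetical mini-lexicons, for illustration only.
POSITIVE = {"love", "liked"}
NEGATIVE = {"hate", "disgusting"}


def bag_of_words(text):
    """Reduce a text to word counts; the order of the words is lost."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def polarity(text):
    """Classify a text from its bag of words alone."""
    bag = bag_of_words(text)
    n_pos = sum(count for word, count in bag.items() if word in POSITIVE)
    n_neg = sum(count for word, count in bag.items() if word in NEGATIVE)
    if n_pos > n_neg:
        return "positive"
    if n_neg > n_pos:
        return "negative"
    return "neutral"


# Shuffling the words changes nothing as far as the method is concerned.
print(bag_of_words("I hate this lobster") == bag_of_words("lobster this hate I"))  # True

examples = [
    "I hate this lobster",
    "I love this laptop",
    "I liked the lobster but the potato salad was disgusting",
    "We all love to wait in line for hours",
]
for sentence in examples:
    print(f"{polarity(sentence):8} | {sentence}")
# negative | I hate this lobster
# positive | I love this laptop
# neutral  | I liked the lobster but the potato salad was disgusting
# positive | We all love to wait in line for hours
```

Note that ‘neutral’ for the third sentence is an artefact of one positive and one negative keyword cancelling each other out, not a judgement that the sentence carries no sentiment.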

Being based on keywords, this method requires the keywords to be present … obviously. But this is a greater obstacle than it looks. In the previous example I used the sentence ‘I hate this lobster’, but what if it read ‘I do not like this lobster’? The algorithm would probably classify it as neutral, because no word in particular appears to be negative. This is why the number of online comments classified as neutral appears to be so high.
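A short sketch of this keyword problem, again with hypothetical one-word lexicons and an illustrative `classify` function. The last call goes one step beyond the paragraph above and is my own assumption about a likely lexicon: if ‘like’ happens to be listed as a positive word, the same sentence is not merely neutral but flipped to positive, because the ‘not’ is ignored.

```python
def classify(text, positive, negative):
    """Keyword matching: positive hits minus negative hits."""
    words = text.lower().split()
    score = sum(w in positive for w in words) - sum(w in negative for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical lexicons, for illustration only.
positive = {"love"}
negative = {"hate"}

print(classify("I hate this lobster", positive, negative))         # negative
print(classify("I do not like this lobster", positive, negative))  # neutral

# Assumption: a lexicon that lists 'like' as positive.
print(classify("I do not like this lobster", positive | {"like"}, negative))  # positive
```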

I am not really selling this … but it is not as inaccurate as you might think. Modifying the positive and negative lexicons to suit the context better can greatly increase the accuracy and give a general idea of the sentiment expressed towards a particular topic. When this is done properly, very few comments are classified as being of the opposite polarity, in other words negative as positive and vice versa. They might, however, be classified as neutral. Still, the overall sentiment score will give a relatively accurate picture of the overall sentiment.
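As a rough illustration of tailoring a lexicon to a context, the sketch below adds ‘unpredictable’ to the positive words for a set of made-up movie comments and compares the overall score before and after. The comments, the base lexicons and the definition of the overall score (share of positive comments minus share of negative comments) are all assumptions chosen for the example.

```python
import re
from collections import Counter

# Hypothetical base lexicons, for illustration only.
BASE_POSITIVE = frozenset({"love", "great"})
BASE_NEGATIVE = frozenset({"hate", "boring"})


def classify(text, positive, negative):
    """Classify one comment by counting lexicon hits."""
    tokens = re.findall(r"[a-z]+", text.lower())
    score = sum(t in positive for t in tokens) - sum(t in negative for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def overall_sentiment(comments, positive, negative):
    """Return per-category counts and an assumed net score in [-1, 1]."""
    counts = Counter(classify(c, positive, negative) for c in comments)
    net = (counts["positive"] - counts["negative"]) / len(comments)
    return counts, net


# Made-up comments about a movie.
movie_comments = [
    "The ending of this movie was unpredictable",
    "I love the soundtrack",
    "The middle part was boring",
    "Saw it last night",
]

# Generic lexicon: 'unpredictable' is unknown, so that comment is neutral.
counts, net = overall_sentiment(movie_comments, BASE_POSITIVE, BASE_NEGATIVE)
print(dict(counts), net)

# Lexicon tailored to movies: 'unpredictable' counts as positive here.
movie_positive = BASE_POSITIVE | {"unpredictable"}
counts, net = overall_sentiment(movie_comments, movie_positive, BASE_NEGATIVE)
print(dict(counts), net)
```

With the generic lexicon the net score is 0.0 (one positive, one negative, two neutral); with the tailored one it rises to 0.25, because the comment about the ending moves from neutral to positive. The same word would have to go into the negative list for comments about cars.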

Another issue with classifying sentiment comes from human nature: it often depends on perspective. Take, for instance, conversations regarding ‘abortion’: depending on your point of view, ‘pro-life’ will be either positive or negative.

Some businesses, like WiseWindow, focused on increasing the accuracy of sentiment analysis in order to predict changes in the stock market, which WiseWindow apparently did with great success. The theory is that sentiment can be used as a leading indicator: you would expect it to drop before the stock price does, as consumers complain, etc.

The main problem I have with sentiment analysis is that it is too often misused. There is so much more information included in text, yet many rely solely on the sentiment analysis, without even knowing what the negative sentiment is about.

It is not so much how something is talked about that matters, but rather what is talked about.


