Quantifying Subjective Text with an Excel-Based Sentiment Analysis Tool

Sentiment Analysis is a growing field in the world of BI and Data Science. With social media serving up millions of data points by way of comments, tweets, and status updates, it’s becoming increasingly important to be able to take all that subjective text and turn it into quantitative data that can be integrated with other metrics. That is essentially what Sentiment Analysis is – a term that describes the exercise of quantifying subjective text by analyzing certain keywords deemed to be either positively or negatively charged.

.

How Sentiment Analysis Works

In its simplest form, a Sentiment Analysis algorithm can be nothing more than determining if a text string has more positive keywords than negative keywords. Text with more positive words is classified as positive content, while

text with more negative keywords it gets classified as negative content. Some Sentiment Analysis algorithms go far beyond the simple keyword-based methodology, delving into idioms, colloquialisms, language patterns, sentiment shifter words, and the list goes on.

.

Regardless of complexity, all Sentiment Analysis tools apply some sort of scoring system to findings. A typical scoring system looks something like this:

  • (+2) very positive
  • (+1) somewhat positive
  • (0) neutral
  • (-1) somewhat negative
  • (-2) very negative

If a given comment or tweet is evaluated and considered to be very positive in sentiment, then that text string will be given a +2. For a text string that is considered neutral, a 0 would be applied. The idea being that if 5,000 comments or tweets are classified, a picture would start forming about the general attitude about a particular topic.

.

You can do a search for “on-line sentiment analysis” and you’ll get a dozen websites that let you test drive their model. Most are trying to sell their wares (like I said, it’s becoming big business). But one educational site will let you play around with their Text2Sentiment API for free and without an API key. If you’re interested, go to www.datasciencetoolkit.org and scroll down to their Text to Sentiment tool.

.

.

An Excel-Based Sentiment Analysis Tool

After playing around with this for a while, I decided to try and roll my own Sentiment Analysis tool with Excel and VBA.

You can download it here and take a look as you follow along.

.

When you open the workbook, you’ll see some sample comments I pulled from a Hotel chain’s Facebook feed. The score next to the comment is generated by a Function called GetSentimentValue. This Function returns a score between -5 and 5. A Low score indicates that there were words associated with dissatisfaction. A high score shows that there were one or more ‘positive’ words that occur in the comment. A score of zero either means that no charged words were found, or that the positive and negative words balanced themselves out. I applied Conditional Formatting manually. You can imagine that if you had hundreds or thousands of comments, you could easily apply this function and do things like identify the negative comments, apply pivot tables, create histograms, build charts, etc. The point is that the text in your comments just went from being highly unusable to being a relatively useful data point.

.

.

Change any comment and the GetSentimentValue function will adjust the score to reflect the new sentiment of that comment.

.

.

The GetSentimentValue function applies a very basic algorithm (the same one used by the folks at datasciencetoolkit.org). It uses a word for word keyword approach, parsing each word in the text, determining if each word is a positive or negative keyword, and assigning a score to each word. It then averages all of the found keywords to get to a final score. For example, “Great trip. Thanks for the warm welcome” gets a score of +2 because “great” has a weight of +3, and “warm” gives +1, so the average is +2.

.

Understanding the Keyword List

The keyword list and scoring system used in the function comes from the AFINN-111 affective lexicon developed by Finn Arup Nielsen. That list is stored in the VBA module and exposed through a Collection. For ease of editing, I split the keyword list into several Private Subs. This way, if you want to add, edit, or remove a word, you can do so directly in the code.

.

The list of keywords and terms used in a Sentiment Analysis is quite important. No list will cover all needs, so most folks using Sentiment Analysis find that they frequently need to adjust their keywords to account for the types of comments they encounter. This is called “training the model”. It’s frankly the primary drawback to using a Sentiment Analysis API service instead of a local tool. With an API service, you have no way to adjust the model. For example, imagine you worked at the electronics superstore Best Buy. In most Sentiment Analysis models, the word “best” is positively charged. So you can imagine how many false positives you would get as a result of people including your company’s name into comments. You would definitely want a way to train the model to handle the word “best” in a particular way.

.

Scenarios to Be Aware of When Relying on This and Other Sentiment Analysis Tool

  • Keep in mind that the algorithm I chose is the simplest (laziest) one I could get away with. Each word is evaluated separate from other words. Many Sentiment Analysis tools found on-line use the same kind of algorithm. This means key sentiments made up of two or more words are missed. For example, these sentiments aren’t caught: “white wash”, “checked out”, “no go”, and “won’t work”. Also, there is no accounting for common phrases that represent sentiment. For instance, “I’m over the moon” or “customer for life”.
  • Most Sentiment Analysis tools have no way to capture context. A positive or negative sentiment word can have the opposite connotation depending on context (e.g. “This is a great hotel, if you like the smell of urine”)
  • Sentiment ambiguity is a real problem for most tools. Text with sentiment ambiguity is often classified as neutral. For example, “The rooms at this hotel are priced higher than others in the area” doesn’t contain any charged words, although it’s clearly a negative comment.
  • Sarcastic comments are often misclassified in most Sentiment Analysis tools. (e.g. “I called to move my prepaid reservation because my wife went into labor; I was told that is our problem. Wonderful. Such lovely customer service.”).
  • Slang needs to be taken into account when “training the model”. For example, “sick”, or “bananas” can have a positive or negative sentiment depending on the context.

..

Enjoy playing with the workbook. I’d love to share an enhanced version should any gurus out there want to take a shot at improving this thing.

10 thoughts on “Quantifying Subjective Text with an Excel-Based Sentiment Analysis Tool

  1. Vit

    An important issue to be aware of is industry-specific words, similar to the BestBuy example. E.g. the algorithm assigns 1 point to the word “warm”. I ran this on some Newegg motherboard reviews. Obviously, the word “warm” is not a desirable adjective for electronics, yet a review mentioning it gets a point. I imagine there may be other such examples. Ideally a dictionary should be context-specific. I imagine this could be customised a bit by having the ability to load/import a specific dictionary?

  2. datapig Post author

    Vit: You bring up another example why one would not want to blindly use an online API service.

    A dictionary import function is a great idea for this tool.

    In this version of the Excel tool, you would have to find the word “warm” in the code and change the scoring for that word.

  3. balthamossa2b

    Another point to take into account: typos and people lazily writing in a SMS-like fashion.

  4. Jeff Weir

    You turned a potentially traumatic subject into something that was not abhorrent. Far from feeling terrorized or wrathful I could not stop reading – even ignoring my worsening urge to go to the toilet. 😉

  5. Pedro Paulo

    Great post. I’ve been 2 weeks in your blog and love it. Now I’m a daily user. Congratulations for the work. Keep posting!

  6. Ghazanfar Abidi

    Fascinating. I like to picture the scene where you might have been brainstorming words to populate the PopulateExtremelyNegative() sub with.. for science!

  7. Rich

    So I felt compelled to score Jeff Weir’s comment for sentiment.

    Jeff, -2.25 – why so much hate! 🙂

    This is actually a very interesting piece, and as with most averages, if you look at the total, you are more likely to get a more accurate “general” sentiment. You just obviously have to take it with a pinch of salt.

  8. Jess

    This actually just saved my life for a school project. Credit given! Thank you so much, this was extremely informative.

  9. Aliyudin Muhamad

    Hello.

    I just tried these tool and look so awesome.

    Unfortunately, if the machine found word such as “not”, “don’t, “can’t” it can’t detect them.

    The result, if the machine found a phrase like ” I’m not satisfied “, resulting the score is 3 (Positive) instead of it should be (Negative) phrase.

    Any clue to solve those kind of problem ?

    I’m so amatuer on Programming, so I find the way to solve it ang got ZERO result.

Leave a Reply

Your email address will not be published. Required fields are marked *