7 NLP must-haves for customer feedback analysis
If you missed our presentation at the Sentiment Analysis Symposium in New York last July, read on to see it in full with accompanying slide notes.
Transcript
Slide 1
Today I want to talk about customer feedback analysis. Everyone here agrees that sentiment analysis plays an important role in understanding customer feedback. But I've found a disconnect between that view and what's actually happening in the industry.
Slide 2
If you google ‘Customer Feedback Analysis software’, what you find is an overview of tools that collect people’s scores and then present them as pretty dashboards. Or take the answers on Quora to ‘What’s the best customer analysis tool?’. Most focus on scores, not on people’s comments.
Slide 3
And sure, if you are a consumer, a quick summary of competitors by score may be all you need, for example to find the best restaurant. But as the owner of a poorly rated restaurant with a 3-star score, how would you know what to do? Would you rather have 100 scores, or 10 customer comments explaining why people gave you that score?
Slide 4
We found that comments are quite important to customer insight professionals and this is how they use them.
Slide 5
Comments tracked over time, alongside scores, are particularly valuable. They can explain why scores rise and drop, and when scores stay the same, they provide richer insight.
Slide 6
By looking deeper into the comments, you can find out who should be following up with the customer. Imagine for example capturing all people who want to cancel a service.
Slide 7
And if you have made any changes to your offering, for example used a new recipe, did that actually get noticed, and did it affect the score?
To summarize, applying NLP to people’s comments helps you gain deeper insight and get to the action of improving customer experience faster.
Slide 8
My background is in NLP, but over the past two years we’ve spent a lot of time talking to customer insight teams. I noticed that many current NLP solutions do not actually provide the functionality that matters to them. So today I would like to share the needs we discovered while building our own NLP solution at Thematic. We may not have cracked all of them yet, but we do believe they are Must-Haves.
If you own an NLP solution for CX, or plan to build one, feel free to use these Must-Haves as a guide.
If you are looking to buy a solution, or implement one using open-source, send me an email and I will share with you a report that we found valuable while evaluating different options.
Slide 9, Must-Have 1
The first Must-Have is about capturing many ways people may be referring to the same thing.
Slide 10
Imagine you have paid for a newspaper delivered to your door. It rained. As you are unsticking the wet pages you are frustrated that you cannot read it. How many ways do you think there are to complain about a wet newspaper?
Slide 11
There are dozens of possibilities! And if an NLP solution cannot capture them accurately, the importance of this issue will be misrepresented. Many solutions out there use industry dictionaries, or worse, WordNet. But customer comments are messy, and synonyms will be specific to your business. For example, ‘paper’ and ‘newspaper’ are rarely a synonym pair outside of publishing. And we found that ‘build’ and ‘buy’ can be either synonyms or antonyms depending on the context: real estate or software.
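One lightweight way to surface business-specific synonym candidates is to compare the contexts in which terms appear in your own comments, rather than relying on a generic dictionary. Here is a minimal sketch of that idea; the comments, window size, and overlap measure are illustrative assumptions, not Thematic’s actual method:

```python
from collections import Counter, defaultdict

def context_profiles(sentences, window=2):
    """Count the words appearing near each term: a crude distributional profile."""
    profiles = defaultdict(Counter)
    for sent in sentences:
        tokens = sent.lower().split()
        for i, tok in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    profiles[tok][tokens[j]] += 1
    return profiles

def context_overlap(profiles, a, b):
    """Jaccard overlap of two terms' context vocabularies; high overlap
    suggests a dataset-specific synonym candidate."""
    ca, cb = set(profiles[a]), set(profiles[b])
    if not ca or not cb:
        return 0.0
    return len(ca & cb) / len(ca | cb)

# Illustrative comments from a hypothetical newspaper-delivery dataset.
comments = [
    "my paper arrived wet again",
    "the newspaper arrived wet this morning",
    "my paper was soaked on the doorstep",
    "the newspaper was soaked on the doorstep",
]
profiles = context_profiles(comments)
print(context_overlap(profiles, "paper", "newspaper"))  # high: synonym candidate
print(context_overlap(profiles, "paper", "doorstep"))   # low: unrelated
```

In this toy dataset, ‘paper’ and ‘newspaper’ share most of their contexts, so they surface as a candidate pair, while unrelated words do not; a real system would of course need far more data and a smarter similarity measure.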
Slide 12
At Thematic, we learn synonyms from the data itself. Once, we came across an unusual and, at first glance, incorrect pair: ‘airpoint’ and ‘airport’. Airpoints is the frequent-flyer currency of AirNZ; an airport is usually a very different thing. After examining the results closely, we found that the system was right: autocorrect did not know about ‘airpoint’ and changed it to ‘airport’, which made this a dataset-specific synonym pair.
Slide 13
This is why one size will not fit all.
Slide 14, Must-Have 2
While you need to capture the many different ways people talk about the same thing, when it comes to attributes, e.g. good coffee vs. bad coffee, Customer Insight professionals often prefer that you capture them separately. This may be relatively easy when the attributes are clear antonyms, e.g. ‘fast service’ vs. ‘slow service’. But negation makes everything much harder.
Slide 15
Here is an actual example of manual categories chosen by a human tagger. An NLP system for customer feedback analysis should ideally be able to capture that the two sentences, while using the same nouns and adjectives, should actually be categorized differently.
Slide 16
Most NLP solutions do not deal with negation at all. Those that do simply reverse polarity: ‘did not like’ = ‘dislike’. But negation serves other purposes too, such as emphasis: ‘nothing I did not like’ means ‘loved it’. Or making a weaker claim: ‘not bad’ does not necessarily mean ‘good’; most likely it signals a rather neutral statement.
When dealing with negation, parsing will help determine its focus and scope. But the next step is to correctly merge negated statements with non-negated ones, and for this you’ll need some sort of antonym detection. Only then can a solution accurately determine how many people liked or disliked a certain aspect of the business.
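To illustrate why simple polarity reversal is not enough, here is a toy scorer that damps negated sentiment instead of flipping it, so ‘not bad’ lands near neutral rather than becoming ‘good’. The lexicon, negation window, and damping factor are all illustrative assumptions; a production system would use dependency parsing to find each negation’s focus and scope:

```python
# Toy polarity lexicon and negator list, purely for illustration.
LEXICON = {"like": 1.0, "love": 1.0, "good": 1.0, "bad": -1.0, "slow": -1.0}
NEGATORS = {"not", "no", "never", "didn't", "don't"}

def polarity(comment, damp=0.5):
    """Score a comment, damping rather than flipping negated sentiment."""
    tokens = comment.lower().split()
    score = 0.0
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        # Crude scope rule: a negator within the two preceding tokens.
        negated = any(t in NEGATORS for t in tokens[max(0, i - 2):i])
        score += -damp * LEXICON[tok] if negated else LEXICON[tok]
    return score

print(polarity("the food was bad"))           # clearly negative
print(polarity("the food was not bad"))       # mildly positive, near neutral
print(polarity("i did not like the coffee"))  # negative, but weaker than 'dislike'
```

The design choice worth noting is the damping: simple reversal would make ‘not bad’ score exactly like ‘good’, which is precisely the mistake described above.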
Slide 17
This is why one size will not fit all.
Slide 18, Must-Have 3
A common approach to summarizing feedback, even when done manually, is to use a static set of categories or themes. The first problem with this is that it reflects the bias of the person who created them. The second problem is that it is, well, static. It’s the nature of business that there are always changes: in the pricing structure, in the competition. If you want to capture people’s reactions to these changes, you need a solution where themes can emerge over time.
Slide 19
If you do not do this and, let’s say, use supervised categorization, over time you can end up with a very large ‘Other’ category, because comments no longer fit into any of the pre-defined ones. There will always be people commenting on things that are different from everyone else, but as a rule of thumb, your ‘Other’ category should not exceed 20%. This is an actual example from one company’s data we worked with, where we helped them reduce ‘Other’ to 8%, compared to the 54% produced by their home-grown code.
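As a rough illustration of themes emerging from data, the following sketch greedily clusters comments by token overlap: a comment joins the most similar existing theme if it is similar enough, otherwise it starts a new one, so new topics can appear at any time instead of piling up in ‘Other’. The overlap threshold and toy comments are illustrative assumptions, not a real clustering algorithm:

```python
def emergent_themes(comments, min_overlap=0.2):
    """Greedily cluster comments by token overlap so themes emerge
    from the data instead of coming from a fixed category list."""
    themes = []
    for comment in comments:
        tokens = set(comment.lower().split())
        best, best_sim = None, min_overlap
        for theme in themes:
            sim = len(tokens & theme["tokens"]) / len(tokens | theme["tokens"])
            if sim >= best_sim:
                best, best_sim = theme, sim
        if best is None:
            themes.append({"tokens": set(tokens), "comments": [comment]})
        else:
            best["tokens"] |= tokens
            best["comments"].append(comment)
    return themes

# Illustrative comments; a new complaint topic appears mid-stream.
feedback = [
    "wet newspaper again this morning",
    "newspaper was wet this morning",
    "delivery was late today",
    "delivery late again today",
    "love the crossword",
]
for theme in emergent_themes(feedback):
    print(len(theme["comments"]), theme["comments"][0])
```

Here the ‘late delivery’ theme did not exist until the third comment arrived; with a pre-defined category list, those comments would have landed in ‘Other’.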
Slide 20
This is why the ideal solution should allow for themes to emerge from data, instead of being pre-defined!
Slide 21, Must-Have 4
My next NLP Must-Have is the necessity of a clear link back to the original comment. Context is king, as they say, and without context it is hard to interpret, understand, and act upon the results. I have seen several NLP solutions that do not provide that option.
Slide 22
Verification can be painful. Thematic was once tested against a human coder, Kate. We identified that one of the key things students at a university wished were improved was the quality of food. Kate found the same issue, but at a much lower frequency. Because we could pull out all the comments on this topic, we verified them and found that Kate had tagged only the key issue in each comment, whereas we tagged all of them. As a result, the university could act on the problem and increase student satisfaction simply by improving the food.
Slide 23, Must-Have 5
Transparency in how the algorithm arrived at particular results is also important, because only then can we give somebody like Kate a chance to work with the algorithm and benefit from both of their strengths. Kate knows the domain: what’s important to track and what can be ignored.
Slide 24
Sometimes there is a right and a wrong answer. For example, in many countries ‘soccer world cup’ is the same as ‘football world cup’. But in other cases it depends on the customer’s priorities: do they want to track the rugby world cup separately from the soccer/football world cup, or as the same thing? And they need to be able to change how the system decided to do the grouping.
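One simple way to support such corrections is to let the domain expert overlay merge and split rules on top of the automatic grouping. A minimal sketch with hypothetical data; this is not Thematic’s actual interface:

```python
def apply_overrides(grouping, merge_pairs, keep_separate):
    """Overlay human corrections on an automatic phrase -> theme grouping:
    merge pairs the expert says mean the same thing, and split out
    phrases they want tracked on their own."""
    fixed = dict(grouping)
    for a, b in merge_pairs:
        fixed[b] = fixed[a]          # b now maps to a's theme
    for phrase in keep_separate:
        fixed[phrase] = phrase       # phrase becomes its own theme
    return fixed

# Hypothetical automatic grouping: the algorithm lumped rugby in with soccer.
auto = {
    "soccer world cup": "soccer world cup",
    "football world cup": "football world cup",
    "rugby world cup": "soccer world cup",
}
fixed = apply_overrides(
    auto,
    merge_pairs=[("soccer world cup", "football world cup")],
    keep_separate=["rugby world cup"],
)
print(fixed)
```

The point is not the data structure but the workflow: the algorithm proposes a grouping, and the expert can correct it without retraining anything.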
Slide 25, Must-Have 6
Small datasets are a big pain for data-driven algorithms. You can’t build a language model on Wikipedia or IMDB reviews, because words mean different things in different contexts. And a model built on a small dataset alone won’t work either.
Slide 26
The current workarounds are to create industry-specific rules, which are prone to ambiguity, or to use pre-defined static categories, which fail to capture emerging themes. The only real solutions are to repurpose data from other clients, or to get creative.
Slide 27
At Thematic, we get creative quite often. One of our customers is the DJ software company Serato. They have thousands of users, but only get a few hundred short comments per month. So to help them, we built a language model from their community forums, which turned out to contain millions of threads, and it learned about things like processors, controllers, playback, and so on.
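The general idea can be sketched with a toy model: learn word statistics from a large related corpus (the forums) so that domain terms are known even when the monthly feedback set is tiny. The bigram counts here are a minimal stand-in for a real language model, and the example posts are invented for illustration:

```python
from collections import Counter

def train_bigrams(corpus):
    """Bigram counts: a minimal stand-in for a domain language model."""
    counts = Counter()
    for doc in corpus:
        tokens = doc.lower().split()
        counts.update(zip(tokens, tokens[1:]))
    return counts

# Hypothetical forum posts standing in for millions of community threads.
forum_posts = [
    "my midi controller stopped syncing during playback",
    "which controller works best for scratching",
    "playback stutters when the processor is overloaded",
]
model = train_bigrams(forum_posts)
# The model now knows domain terms ('controller', 'playback', 'processor')
# that a single month of short feedback comments would be too small to teach it.
print(model[("midi", "controller")])
```

In practice you would train proper word embeddings or a neural language model on the forum corpus, but the principle is the same: borrow domain knowledge from a bigger, related dataset.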
Slide 28, Must-Have 7
Finally, the result of NLP analysis should provide information that is non-trivial and easy to act on. Let’s say an NLP system analyzed 500 comments for a software company and returned key categories like ‘product’, ‘customer service’, and the name of the company. This is not insightful. Similarly, knowing that customer service has poor sentiment is not actionable.
Slide 29
Keeping this in mind, NLP solutions can be evaluated according to this diagram. On one axis we have language knowledge, categorized by how actionable it is. On the other axis, we move from trivial knowledge, to knowledge that is suspected but needs verifying with data, and finally to new, insightful knowledge. For example, we can easily guess which words will repeat in customer comments; these words carry almost zero meaning. 90% of the NLP solutions I’ve seen in the market capture only the general aspects of what’s in the comments and do not return any actionable results. Ideally, an NLP solution should return a mixture of themes, some of which are insightful and actionable. Perhaps only customer insight managers can judge whether something is an insight to them or not, but in general, this is where we want to be.
Slide 30
Coming back to our example from the beginning of this talk, the correct answer is ‘New product feature’. If the NLP solution works correctly, as you move from one month to the next, you should see changes in the trending themes for that month. In this particular case, the trending keyword was ‘hard to read’, and the company fixed it by changing the font in the UI.
Slide 31
Here they are again. If I have missed something or you disagree, let’s discuss!
If you would like a report comparing different NLP methods against these Must-Haves, please send me an email.