Topic Modeling for Text Analysis: The Hype vs. Reality (Part 4/5)

Last updated: August 29, 2018

Alyona Medelyan PhD


TLDR

This is the 4th article in my series of Text Analytics posts explaining popular approaches to feedback analysis. Last week, we talked about text categorization, a Machine Learning approach that requires training data. We concluded that it can’t detect emerging themes in people’s feedback and that it’s only as accurate as the supplied training data.

Today, we’ll discuss topic modeling, also a Machine Learning approach, but an unsupervised one, which means that this approach learns from raw text. Sounds exciting, right?

Occasionally, I hear insights professionals refer to any Machine Learning approach as “topic modeling”, but data scientists usually mean a specific algorithm when they say topic modeling. It’s called LDA, an acronym for the tongue-twisting Latent Dirichlet Allocation. It’s an elegant mathematical model of language that captures topics (lists of similar words) and how they are distributed across texts.

📌

Here's a quick link to the entire series.

1. Word Spotting for Text Analytics: Quick & Dirty (But When Does It Fail?)

2. Manual Rules for Text Analytics: Why They Often Miss the Mark

3. Text Categorization Demystified: Does It Really Deliver on Advanced Analytics?

👉 4. Topic Modeling for Text Analysis: The Hype vs. Reality

5. Thematic Analysis for Feedback: The Secret Weapon Most Companies Miss

Example of topic modeling in action

Here is an example of applying topic modeling to beer reviews:

The input is a set of reviews of various beers

A topic is a collection of similar words like coffee, dark, chocolate, black, espresso

Each review is assigned a list of topics. In this example, The Kernel Export stout London has 4 topics assigned to it.

The topics can also be weighted. For example, a customer comment like “your customer support is awful, please get a phone number” could be assigned the following topics and weights:

40% support, service, staff

30% bad, poor, awful

28% number, phone, email, call
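A weighted assignment like this can be thought of as a list of topics, each shown by its top words, paired with its share of the comment. The sketch below is purely illustrative: `comment_topics`, `dominant_topic` and `unassigned_weight` are hypothetical names, and the weights are copied from the example above rather than produced by a real model.

```python
# Hypothetical representation of one comment's topic mixture.
# Each topic is shown by its top words; the weights are the example's percentages.
comment_topics = [
    (("support", "service", "staff"), 0.40),
    (("bad", "poor", "awful"), 0.30),
    (("number", "phone", "email", "call"), 0.28),
]


def dominant_topic(topics):
    """Return the (words, weight) pair with the largest weight."""
    return max(topics, key=lambda pair: pair[1])


def unassigned_weight(topics):
    """Share of the comment not covered by the listed topics."""
    return round(1.0 - sum(weight for _, weight in topics), 2)


words, weight = dominant_topic(comment_topics)
print(", ".join(words), f"{weight:.0%}")
print("remaining:", unassigned_weight(comment_topics))
```

The listed weights do not have to add up to 100%: whatever is left over is spread across topics that are too weak to be worth showing.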

What’s great about topic modeling

The best thing about topic modeling is that it needs no input other than the raw customer feedback. As mentioned, unlike text categorization, it’s unsupervised. Put simply, the learning happens by observing which words appear alongside which other words in which reviews, and capturing this information as probabilities. If you are into maths, you will love the concept, explained thoroughly in the corresponding Wikipedia article, and if those formulas are a bit too much, I recommend Joyce Xu’s explanation.
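To make the idea of learning from word co-occurrence more concrete, here is a minimal sketch of LDA fitted with collapsed Gibbs sampling, which is one common way to estimate the model and not necessarily what any particular product uses. The function name `fit_lda`, the hyperparameters `alpha` and `beta`, and the toy beer reviews are all assumptions made for illustration.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit a toy LDA model with collapsed Gibbs sampling.

    docs is a list of tokenised texts (lists of words).
    alpha and beta are smoothing hyperparameters (assumed values).
    """
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})

    doc_topic = [[0] * num_topics for _ in docs]          # words per topic in each doc
    topic_word = [Counter() for _ in range(num_topics)]   # word counts per topic
    topic_total = [0] * num_topics                        # total words per topic
    assignments = []

    # Start by giving every word occurrence a random topic.
    for d, doc in enumerate(docs):
        doc_assign = []
        for word in doc:
            k = rng.randrange(num_topics)
            doc_assign.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1
        assignments.append(doc_assign)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Take the word out of its current topic.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][word] -= 1
                topic_total[k] -= 1

                # A topic is likely if it is common in this document
                # and if this word is common in that topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][word] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]

                # Put the word back under the newly sampled topic.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][word] += 1
                topic_total[k] += 1

    # Smoothed topic proportions for each document.
    doc_mix = [
        [(count + alpha) / (len(doc) + alpha * num_topics) for count in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    # Up to three most frequent words per topic.
    top_words = [
        [word for word, count in counts.most_common(3) if count > 0]
        for counts in topic_word
    ]
    return doc_mix, top_words


# Made-up reviews, already split into words.
reviews = [
    "dark coffee chocolate espresso".split(),
    "coffee chocolate black dark".split(),
    "citrus hoppy bitter grapefruit".split(),
    "hoppy citrus grapefruit bitter".split(),
]
doc_mix, top_words = fit_lda(reviews, num_topics=2)
for topic_id, words in enumerate(top_words):
    print(topic_id, words)
for mix in doc_mix:
    print([round(share, 2) for share in mix])
```

Note that the model is only told how many topics to find. It never sees a label, and the topics come back as numbered word lists that a person still has to name.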

There are Text Analytics startups that use topic modeling to provide analysis of feedback and other text datasets. Other companies, like StitchFix for example, use topic modeling to drive product recommendations. They extended traditional topic modeling with a Deep Learning technique called word embeddings, which captures semantics more accurately (more on this in our Part 5).

Why topic modeling is an inadequate technique for feedback analysis

When used for feedback analysis, topic modeling has one main disadvantage:

The meaning of the topics is really difficult to interpret

Each topic does capture some aspect of language, but in a non-transparent algorithmic way, which is different from how people understand language.

Any data scientist can use public libraries to put together a solution that quickly spits out somewhat meaningful output. However, turning this output into charts and graphs that can underpin business decisions is hard. Monitoring how a particular topic changes over time to establish whether the actions taken are working is even harder.
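The mechanical part of such monitoring is simple, as the sketch below suggests; the hard part is knowing what the tracked topic actually means. Everything here is hypothetical: `scored_comments` stands in for the per-comment topic weights a model might return, and the months and numbers are made up.

```python
from collections import defaultdict

# Hypothetical model output: (month, weight of each topic) for every comment.
scored_comments = [
    ("2018-06", [0.70, 0.30]),
    ("2018-06", [0.50, 0.50]),
    ("2018-07", [0.20, 0.80]),
    ("2018-07", [0.40, 0.60]),
]


def monthly_topic_share(scored, topic_id):
    """Average weight of one topic across the comments of each month."""
    by_month = defaultdict(list)
    for month, weights in scored:
        by_month[month].append(weights[topic_id])
    return {
        month: sum(values) / len(values)
        for month, values in sorted(by_month.items())
    }


trend = monthly_topic_share(scored_comments, topic_id=0)
for month, share in trend.items():
    print(month, round(share, 2))
```

A topic id such as `0` is only meaningful within one trained model. If the model is retrained on new feedback, the same number may point to a different list of words, which is part of what makes trends hard to trust.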

To sum up, I don’t recommend topic modeling for analysing feedback: its results are hard to interpret, and it lacks transparency, just like text categorization algorithms do. However, I stand by the algorithm as one that can capture language properties fairly well, and one that works really well in other tasks that require Natural Language Understanding.
