Analyzing your customer feedback using ChatGPT, shown with an abstract cube surrounded by feedback elements: complex customer feedback analyzed by large language models

How to analyze your customer feedback using ChatGPT

23 Nov 2023

When my kids' school sent me a survey with 10 open-ended questions, I volunteered to help them analyze the data. I was curious what other parents said, but I also wanted to test ChatGPT on this feedback. In this blog post, I’ll share my approach and the prompts used for the analysis. At the end, I’ll summarize the pros and cons of using ChatGPT for analyzing customer feedback and our approach of using Large Language Models at Thematic.

Why I decided to use ChatGPT for analyzing parent feedback

The dataset had 100 responses to 35 survey questions, 10 of which were open-ended. The majority of parents answered at least two open-ended questions. Some were tied to a score. For example, a 5-point rating question:

“How strongly do you agree with this statement: Our School allows children to find their place and experience a sense of wellbeing” was followed by a “Why do you say that?” open-text field.

Other questions were independent: “What do you see as areas for improvement at our School?”.

Traditionally, this data is analyzed manually in huge spreadsheets by creating a code frame (a taxonomy of themes) and then categorizing each answer according to this code frame. At first I considered Thematic, an automated solution to discover themes from feedback. But it seemed like overkill for this school’s needs for two main reasons:

The dataset was small: 100 responses. Thematic becomes useful from 1000 comments or more where you can only review a subset.

The school only needed to analyze the data once and not track progress over weeks, months or quarters.

How to analyze customer feedback using ChatGPT

Enter ChatGPT!

I decided to work on a prompt to analyze this feedback to see how well it can create a code frame automatically and to understand any limitations. Below are the step-by-step instructions on how to do this task.

Step 1. Write a prompt to discover and group themes

Finding the best prompt takes a few iterations. The general idea is that the more context you provide, the more accurate the results. But you need to keep the prompt and the data within the same context window. Here are some helpful tips on how to write a feedback analysis prompt:

Write a short intro about your organization and why you run this survey.
Make sure to ask for counts against each theme to understand their importance.
Depending on the question, include the requirement to split feedback into positives or negatives.

Here is the main part of my prompt:

GPT interface of a prompt — I wrote a prompt for ChatGPT to do feedback analysis
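The prompt in the screenshot is not reproduced here. As a rough illustration of the three tips above, here is a minimal Python sketch that assembles a prompt from an intro, the survey question, and an optional positive/negative split. The function name `build_prompt` and the exact wording of each instruction are assumptions made for this example, not the prompt used in the analysis.

```python
def build_prompt(org_intro: str, question: str, split_sentiment: bool = True) -> str:
    """Assemble a theme-discovery prompt (illustrative wording, not the original prompt)."""
    lines = [
        # Tip 1: a short intro about the organization and why the survey was run.
        org_intro.strip(),
        f'Below are open-text answers to the survey question: "{question}"',
        "Group the answers into themes and give each theme a short name.",
        # Tip 2: ask for counts so the importance of each theme is visible.
        "For each theme, report how many answers mention it.",
    ]
    if split_sentiment:
        # Tip 3: only some questions benefit from a positive/negative split.
        lines.append("List positive themes and negative themes separately.")
    lines.append("Answers (one per line):")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt(
        "We are a primary school and ran this survey to hear from parents.",
        "What do you see as areas for improvement at our School?",
        split_sentiment=False,
    ))
```

The survey answers would then be pasted below the last line of the generated prompt.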

Step 2. Add your data below the prompt

Simply copy and paste open text data “as is” into the prompt. You don’t need to clean it. However, if it doesn’t fit into a single context / prompt window, you will need to do it in batches.

Obviously, make sure that there is no private data included.
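If the data does not fit into one prompt, the batching can be scripted. The sketch below is one simple way to do it, assuming a character budget as a crude stand-in for the model's real token limit. The name `split_into_batches` and the `max_chars` parameter are hypothetical, and the right budget depends on the model you use.

```python
def split_into_batches(comments: list[str], max_chars: int) -> list[list[str]]:
    """Group comments into batches whose combined length stays within max_chars.

    max_chars is an assumed character budget, used here as a rough proxy
    for the context window. A single comment longer than the budget still
    gets a batch of its own rather than being cut.
    """
    batches: list[list[str]] = []
    current: list[str] = []
    size = 0
    for comment in comments:
        cost = len(comment) + 1  # +1 for the newline that separates comments
        if current and size + cost > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append(comment)
        size += cost
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    for i, batch in enumerate(split_into_batches(["aaaa", "bbbb", "cc"], max_chars=10), 1):
        print(f"Batch {i}: {batch}")
```

Comments are never split in the middle, so each answer stays intact for the model.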

Step 3. Run ChatGPT and review the themes

You might get satisfactory results straight away! But more often than not, you will need to fix errors since, unfortunately, ChatGPT will create duplicates of the same themes.

If you have to split your data into batches, the themes will be named differently. You might want shorter names for your themes than the ones I received in my analysis. In this case, you can add that requirement into your prompt.

Here’s how I manually merged themes that mean the same thing for my final report:

Merging initial themes in a more concise table of themes — After ChatGPT provided themes, I reviewed them and made manual refinements

I also reviewed the data to make sure no themes were missed. This step is critical if the accuracy of the analysis matters to you. Most commonly, ChatGPT made the following mistakes:

It latches onto a theme it “understands well” and ignores other themes.
It is unable to combine themes because it lacks language knowledge.
It is unable to combine themes because it lacks context.

For example, in our school’s feedback, “Learner-led conferences” and “Parent-teacher interviews” were the same thing. In both cases, the child is updating the parent on their progress in front of a teacher. I don’t expect ChatGPT to know this, but somehow I need to teach it this knowledge. If someone from outside our school were analyzing the data, I would need to teach them this knowledge too.
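One way to record this kind of domain knowledge, and to merge themes that were named differently across batches, is a small alias table applied to the theme counts after ChatGPT returns them. The sketch below assumes you have typed each batch's themes and counts into a Python dictionary. The function name `merge_themes`, the alias table and the counts in the example are all made up for illustration.

```python
from collections import Counter


def merge_themes(batch_results: list[dict[str, int]], aliases: dict[str, str]) -> Counter:
    """Sum theme counts across batches, mapping known duplicates to one name.

    aliases maps a lower-cased theme name to the canonical name it should
    be merged into. Themes without an alias keep their own name.
    """
    merged: Counter = Counter()
    for counts in batch_results:
        for theme, n in counts.items():
            name = theme.strip()
            merged[aliases.get(name.lower(), name)] += n
    return merged


if __name__ == "__main__":
    # Hypothetical counts from two batches; not figures from the school survey.
    batches = [
        {"Learner-led conferences": 3, "Communication": 5},
        {"Parent-teacher interviews": 4, "communication": 2},
    ]
    aliases = {
        "learner-led conferences": "Parent-teacher interviews",
        "communication": "Communication",
    }
    for theme, n in merge_themes(batches, aliases).most_common():
        print(f"{theme}: {n}")
```

The alias table still has to be written by someone who knows the context, which is the same knowledge a human analyst from outside the school would need.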

My favorite part of using ChatGPT to analyze feedback was to verify if the themes were correct. Some themes seemed incorrect because I did not pick them up by scanning the data. So I wanted to see evidence! I simply asked ChatGPT to list relevant comments for that theme. ChatGPT obliged and I was made aware of my bias for missing these themes on my own.

Example of themes being verified with evidence from comments — I asked ChatGPT Assistant to give me relevant comments to give context to the themes 'extension work' and 'advanced levels of teaching'.
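When asking for evidence like this, it can help to check that each comment the model quotes really appears in the survey data, since a model can paraphrase or invent a quote. Here is a minimal sketch of such a check, assuming the quotes are meant to be verbatim excerpts. The function names and the sample comments are hypothetical.

```python
import re


def normalize(text: str) -> str:
    """Lower-case the text and collapse runs of whitespace to single spaces."""
    return re.sub(r"\s+", " ", text).strip().lower()


def unverified_quotes(quotes: list[str], comments: list[str]) -> list[str]:
    """Return the quotes that do not occur inside any original comment.

    Matching ignores case and spacing only, so a paraphrased quote is
    reported as unverified and needs a manual look.
    """
    haystack = [normalize(c) for c in comments]
    missing = []
    for quote in quotes:
        needle = normalize(quote)
        if not needle or not any(needle in c for c in haystack):
            missing.append(quote)
    return missing


if __name__ == "__main__":
    # Made-up comments for illustration only.
    comments = ["My son would love more extension work  in maths.", "Great teachers!"]
    quotes = ["more extension work in maths", "advanced physics lessons"]
    print(unverified_quotes(quotes, comments))
```

Anything the function returns is worth reading against the raw data before it goes into a report.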

But I also noticed occasional mistakes! Mostly it was the themes ChatGPT missed. Thankfully, I could verify the results by reading the data. That check would have been more difficult if the feedback had been split across multiple prompts. And the more data you have, the harder it is for ChatGPT to do the analysis!

The Benefits and Limitations of analyzing customer feedback using ChatGPT

Here’s a summary of the things I liked and didn’t like about using ChatGPT for analyzing feedback:

✔ ChatGPT was fast! It would have taken me 1 or 2 hours more to analyze the same data manually.

✔ I enjoyed most the fact that I did not need to clean the data. ChatGPT handled typos and spelling mistakes with grace.

✔ I could use the ChatGPT interface to work “with the AI” to validate themes.

However…

✖ ChatGPT did not create any charts. This was all manual work, and occupied the bulk of the 2 hours I spent preparing the report for the school. And if I wanted to change what I was reporting on, I would need to redo the whole process.

✖ I could not segment the results by other survey responses, or let’s say metadata about respondents (e.g. child’s gender, ethnicity, family income). For this survey it wasn’t needed, but for company feedback at scale, you often want to segment by customer value, location etc.

✖ Finally, when we experimented with larger datasets, we found that ChatGPT can handle a maximum of 20 themes and was less accurate the more themes we wanted to discover.
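The segmentation gap mentioned above can be partly closed outside ChatGPT if each response carries its own theme labels, for example by asking the model to tag every comment rather than only return totals. The sketch below assumes that per-response tagging has already been done and checked. The function name `themes_by_segment`, the `themes` field and the `year_level` segment are hypothetical.

```python
from collections import Counter, defaultdict


def themes_by_segment(rows: list[dict], segment_key: str) -> dict[str, dict[str, int]]:
    """Count how many responses in each segment mention each theme.

    Each row is assumed to hold a "themes" list plus metadata fields.
    A theme is counted once per response, even if it is listed twice.
    """
    table: defaultdict[str, Counter] = defaultdict(Counter)
    for row in rows:
        segment = row.get(segment_key, "unknown")
        for theme in set(row["themes"]):
            table[segment][theme] += 1
    return {segment: dict(counts) for segment, counts in table.items()}


if __name__ == "__main__":
    # Made-up rows for illustration; keep private data out of real runs.
    rows = [
        {"year_level": "junior", "themes": ["Communication"]},
        {"year_level": "junior", "themes": ["Communication", "Homework"]},
        {"year_level": "senior", "themes": ["Homework"]},
    ]
    for segment, counts in themes_by_segment(rows, "year_level").items():
        print(segment, counts)
```

This only produces counts. Charts and tracking over time would still be separate work.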

So! If you need to analyze 1000s of feedback comments or more, if you need consistent analysis over time, if you need segment-specific insights, and if you need to share the results with others in the company, ChatGPT will quickly fall short of your needs.

Conclusion: ChatGPT is a great tool for analyzing feedback in small one-off surveys, but it hits many limits for large-scale analysis and reporting.

Taking customer feedback analysis to the next level:
Six benefits to using Large Language Models with Thematic

At Thematic, we use large language models (LLMs) such as GPT-4 with our own algorithms to make it easier and faster to get specific and reliable answers. This way our customers get the combined benefits of our own AI, of LLMs, and our intuitive platform.

Thematic transforms feedback data from any channel into a consistent format, making it effortless to get a full view of the voice of the customer. You don’t need to do batch analysis. Depending on the setup, the feedback flows directly from your survey provider (or other feedback source) without you noticing.
You can guide our Thematic AI to tailor the themes to fit your organizational structure, bringing in your domain expertise, using our themes editor. Organize the themes any way you like, to drive informed decisions.
Data visualization charts are always provided, whether for drivers on NPS or CSAT (or any other rating questions) or trends over time. You can easily export the visualizations for reporting, saving countless hours of tedious and detailed work.
Thematic builds all the specific and relevant themes found across feedback data. You can use our analysis suite or Answers tool to dig deeper, for more specific insights. ChatGPT, on the other hand, delivers a limited view of issues as it can only handle a relatively small number of themes.
Thematic is GDPR and SOC2 Type II compliant, ensuring we protect your sensitive data and the privacy of your customers. Depending on how you use GPT, you might violate your organization’s data privacy rules.
With LLMs infused throughout our platform, Thematic is now even easier to use. Ask a question, read the summaries and request the verbatims. Whether you have a complex predictive analysis question or a simpler theme volume query, it’s seamless to get insights.

For a quick overview of how Thematic - now infused with LLMs - meets an organization’s needs when it comes to analyzing larger volumes of customer feedback, check out the table below:

Analysis needs	Why LLMs can't deliver	How Thematic delivers
Transparency is key. I need to trust the analysis quality.	Probabilistic black box.	Our AI and User Interface make it easy to view what makes up a theme’s quality.
Flexibility is key. I need to guide the AI on our company’s point of view and refine analysis.	The prompt engineering is complex and specific.	It's simple to fine-tune themes with a drag and drop UI.
Consistency is key. I need to deliver reliable and trusted results.	Delivers different output each time. As data increases, inaccuracies increase.	The results are reliable. You get consistent results, unless you request the AI to deliver a new lens. Sends only relevant feedback data into LLMs, for cost effective and accurate results.
Granularity is key. I need to manage 100s of themes.	Theme range limited to 20.	Theme range can go beyond the 1000s.
Analytics tools are key. I need to analyze the data to inform business decisions.	Analyzes text feedback to deliver a list of tags, describing the themes.	Analyzes text feedback, to deliver a taxonomy of relevant themes, along with sentiment analysis. Shows the feedback, themes and sentiment in context. Analyzes themes with other customer variables (such as customer region, product usage).

To summarize, there are many gotchas when implementing Generative AI for feedback analysis in-house. The biggest hurdles will be hallucinations and inconsistent results. Apart from leading to decisions based on incorrect data, this can undermine the value of unstructured feedback in your company.

Dive deeper into Thematic with LLMs

You’ve just read some of the key insights about using GPT to analyze feedback data, but we’ve just scratched the surface in this post.

We have more insights that can help you in your role, whether you want details on how to improve the insights for CX teams or more commentary on how Thematic uses LLMs.

FAQs

What is Generative AI?

Generative AI describes a broad type of algorithm that can generate new forms of creative content. The AI technology arises from Large Language Models.

What is a Large Language Model (LLM)?

An LLM is a powerful machine learning model that can process and identify complex relationships in natural language, understand user questions and generate text. These models rely on techniques like deep learning and neural networks. Defined as natural language-processing AI models, LLMs are trained on massive amounts of text data.

How is Thematic’s LLM output customized to handle a specific customer context?

We fine-tune the output of LLMs for our clients by passing only their high quality context into the prompt. This context data originates from Thematic's own AI after guidance from a human expert, often with input from the customer’s organization.

How can I verify the results in Thematic?

We prioritize responsible use of Generative AI with transparency, empowering users to verify results easily. Every summary includes links to the original data, where they can review how our foundational AI reached the result and get context. Users should check the original data for accuracy and relevancy before using the summary to influence a decision.

How can I make sure my data is not leaked to a third party by sending it to Generative AI models?

We use Azure’s service as it provides enterprise-grade security and data protection. In addition, we review agreements and dataflow to ensure that the provider does not use the data for training or logging. We also log and monitor all uses of Generative AI at Thematic.

Can I use Generative AI to analyze large volumes of feedback in-house?

You can easily analyze small subsets of feedback using Generative AI. Make sure that no private data is passed onto the model and that your use is compliant with your company’s policy.

For larger datasets and tracking progress over time, things become tricky. An AI specialist could design an in-house solution that sends all data to a Generative AI model in batches. In addition to compliance and data privacy, you’ll need to consider three aspects:

Can we trust these results? LLMs are prone to hallucinations and inconsistencies. At Thematic, we pre-analyze the data using our proprietary AI and themes verified by people. Our prompt sequences are designed to guarantee accuracy. Thematic also makes the underlying data easily accessible for verification.
Can we afford this? Naive applications lead to huge bills with LLM providers or hosting solutions. In addition, great AI experts are difficult to find and hire, and they command high annual salaries.
Is this a complete solution? Thematic’s visualizations and dashboards instantly show answers to common questions from customer feedback. We track it over time producing consistent reporting and creating alignment across teams. Creating this in-house is possible with time/effort, but even the most engineering-focused orgs like Atlassian and LinkedIn chose to buy vs. build.

Alyona Medelyan PhD

Alyona has a PhD in NLP and Machine Learning. Her peer-reviewed articles have been cited by over 2600 academics. Her love of writing comes from years of PhD research.
