Machine learning makes a goldmine of your survey comments

Author - Søren Smit. Director

Imagine if your company could systematically, and at lightning speed, distill the essence of thousands of comments written in surveys, on social media, Trustpilot, Glassdoor and so on. Imagine, too, that you could do it regularly, weekly or monthly for example, and across comments written in many different languages. What are people talking about? Are there any new issues that you need to address? And are people talking less about certain issues after you have set activities in motion?


If you work with surveys, this will certainly be familiar to you. You have received the results from the latest survey with a whole string of accompanying comments that busy people have taken the time to write.

And these comments are valuable – they breathe life into the many KPIs. They get directly to the point, and this is often where you will find out why things are going well or badly.

Hard to get a clear view of comments

The problem is that there are often thousands of comments, and some of them are very lengthy. You cannot read all of them – let alone identify their essence and common thread. And you might have comments that are written in a language that you do not understand.

Historically, most surveys have consisted of 99 per cent closed questions, where the respondent answers on a scale of e.g. 1-10. There may have been one open qualitative question – two at the most!

The reason for this is that we have had the statistical tools for processing large volumes of quantitative data, which has allowed us to report KPIs. Comparable techniques for processing large numbers of open comments have, until recently, been practically non-existent.

More people are discovering the value of qualitative insights

At Ennova, we see a trend towards more and more qualitative questions in surveys. In fact, some companies now primarily want open comment questions in their surveys.

Two factors in particular have driven this trend:

  1. Many are realizing the need for a more balanced understanding of their employees, with small/thick data on one side and big data on the other
  2. We are beginning to get technologies that can process large numbers of comments and produce acceptable results

The first is most welcome after a prolonged period where big data has been considered the answer to almost every problem. Time has shown that although large data volumes really can generate a vast number of new and exciting insights, we often lack an understanding of the causes. We have been able to answer what and how, but not why.

This is where the second factor, increasingly mature technology, comes in. The technologies available on the market, where ‘text analytics’ falls under hyped umbrella terms such as ‘machine learning’ and ‘artificial intelligence’, have come a long way. However, those of us who use and develop these technologies are still working hard to improve and refine them.

Some of the elements that influence the quality of the analyses are:

  • Spelling errors, humor, irony, sarcasm and the like
  • The wording of the questions
  • The ability to machine-translate the comments

In the following, I will elaborate on these three elements.

Dealing with spelling errors and humor

The comments often contain spelling mistakes, which make it difficult for the systems to process the texts. This is solved relatively easily by using spell checks before translation and analysis. However, it is much more difficult to deal with irony, sarcasm and humor. These elements usually vary across cultures and countries and require a fundamental understanding of the specific context of the respondent.

Because cultural understanding is a variable that lies outside the text itself, it becomes enormously complex. Just think about how quickly communication can become unclear on social media, where we cannot sense facial expression and tone of voice.

When it comes to machines decoding irony, sarcasm and humor, we are unfortunately still a long way off.
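The spell-check step mentioned above, run before translation and analysis, can be illustrated with a minimal Python sketch. It is an assumption for illustration only, not a description of any particular product: the tiny VOCABULARY list, the function names and the 0.8 similarity cutoff are all hypothetical, and a real system would use a full dictionary for each language. It relies only on the standard library's difflib module.

```python
import difflib
import re

# Hypothetical mini-vocabulary for illustration; a real system would load
# a full dictionary for the language of the comment.
VOCABULARY = [
    "the", "is", "too", "high", "my", "good", "poor",
    "manager", "salary", "communication", "workload",
]


def correct_word(word, vocabulary=VOCABULARY, cutoff=0.8):
    """Return the closest vocabulary word, or the word itself if none is close."""
    lower = word.lower()
    if lower in vocabulary:
        return lower
    # get_close_matches ranks candidates by similarity ratio (0.0 to 1.0)
    matches = difflib.get_close_matches(lower, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else lower


def correct_comment(comment):
    """Lowercase a comment and correct each word; punctuation is dropped."""
    words = re.findall(r"[a-zA-Z]+", comment)
    return " ".join(correct_word(w) for w in words)


if __name__ == "__main__":
    print(correct_comment("The worklaod is too high"))
```

With this small vocabulary, the misspelled "worklaod" is mapped to "workload", so the example prints "the workload is too high". Words with no sufficiently close match are left as they are, which is the cautious choice: a wrong correction is worse than none. Nothing in this step helps with irony or sarcasm.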

The quality of the question is decisive

The next challenge is the actual wording of the open question.

The quality of the question is essential, yet badly undervalued. After all, the question is the entire basis of the analysis that you subsequently perform on all the comments.

Most companies ask a question such as ‘If you have anything to add, you can write it here’ or ‘Please explain why you would recommend/not recommend the company’.

Questions like these are not particularly engaging, and they give a customer little motivation to spend time answering.

At Ennova, we have experimented with other types of qualitative questions, with a particular focus on giving them more edge.

One example that has produced a good response has been “If you became CEO of this company tomorrow, what are the first three things you would do?”

The advantage of this question is that the respondent can relate to a specific situation, even though it is of course imaginary. This makes it easier to come up with something to write. Another advantage is that the question is not reduced to a follow-up question. Instead, it stands in its own right. As a result, the question feels mandatory and the respondent will be more inclined to answer it.

The drawback is that we are restricting what the respondent can comment on and, as a result, we lose certain types of comments. However, the question is still open to many different comments.

If you really want to work with high-quality analysis of comments in surveys, you need to think carefully about the question you are asking – just as you (hopefully) already do with your scale-based questions.

Comments in different languages

The next issue is translating all the comments into the same language at a quality good enough to analyze them together.

At Ennova, we have developed an automated text analysis tool, currently in version 1.0. We have used it in global employee surveys, and we are starting to use it for selected customer satisfaction projects. Aided by a deep-learning algorithm, we are able to classify, on average, over 75 per cent of all comments by subject and to determine whether a comment is positive or negative in tone. This makes it possible to dig quickly into thousands of comments in different languages.
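To make the idea of classifying comments by subject and tone more concrete, here is a deliberately simple Python sketch. It is not the Ennova tool and does not use deep learning: it swaps in plain keyword matching purely for illustration, and every name, keyword list and example comment in it is a hypothetical assumption. What it shows is the shape of the output: the share of comments that could be assigned to at least one subject, and a count per subject and tone.

```python
import re
from collections import Counter

# Hypothetical keyword lists, for illustration only. A production system
# would typically use a trained model rather than fixed keywords.
TOPIC_KEYWORDS = {
    "management": {"manager", "leadership", "boss"},
    "pay": {"salary", "pay", "bonus"},
    "workload": {"workload", "overtime", "stress"},
}
POSITIVE = {"good", "great", "supportive"}
NEGATIVE = {"bad", "poor", "low", "terrible"}


def tokenize(comment):
    """Split a comment into a set of lowercase words."""
    return set(re.findall(r"[a-z]+", comment.lower()))


def classify(comment):
    """Return (list of matching subjects, tone) for one comment."""
    tokens = tokenize(comment)
    topics = sorted(t for t, kws in TOPIC_KEYWORDS.items() if tokens & kws)
    score = len(tokens & POSITIVE) - len(tokens & NEGATIVE)
    tone = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return topics, tone


def summarize(comments):
    """Return (share of comments with a subject, counts per (subject, tone))."""
    counts = Counter()
    classified = 0
    for comment in comments:
        topics, tone = classify(comment)
        if topics:
            classified += 1
        for topic in topics:
            counts[(topic, tone)] += 1
    coverage = classified / len(comments) if comments else 0.0
    return coverage, counts


if __name__ == "__main__":
    # Made-up example comments
    sample = [
        "My manager is great",
        "The salary is too low",
        "Too much overtime and stress",
        "Nice canteen",
    ]
    coverage, counts = summarize(sample)
    print(f"Classified: {coverage:.0%}")
    for (topic, tone), n in sorted(counts.items()):
        print(topic, tone, n)
```

In this made-up sample, three of the four comments match a subject and the fourth ("Nice canteen") stays unclassified. Some comments will always fall outside the known subjects, which is one reason a classification share below 100 per cent is to be expected.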

The product is in continuous development, and we will probably never get an answer as exact as the one we get from quantitatively collected KPIs. This is an automated approach, where the point is to reveal the broad outlines. Therefore, it still makes perfectly good sense to read a good part of the comments manually, as this gives you a true sense of the customer’s voice. You must not undervalue this.

Automated analysis does, however, provide a solid basis for understanding how the comments you are reading fit in with the recurring themes in your survey. This gives you an invaluable key for finding the treasure buried in the many comments.

Søren Smit. Director

Søren Smit is passionate about employee and customer experience. He is head of Ennova’s EX and CX business development. He has worked for over 10 years with customer experience as a director and as the person responsible for establishing a data- and analysis-driven CX culture as well as a transformation of the customer experience at TDC Group.