Text Mining The First Presidential Debate: An Overview Of Text Analysis
Text is a rich source of insight across multiple domains: employee survey comments, customer feedback, product reviews, and more. With the proliferation of online platforms for collecting such feedback, it has become even more important for businesses to make use of this unstructured data. While text mining has traditionally been a highly manual, laborious process, new natural language processing (NLP) tools built on artificial intelligence (AI) and machine learning (ML) have lightened the analytic load.
To demonstrate how far text analytics has come, we uploaded a transcript of the first U.S. presidential election debate of 2020 into the Perceptyx platform. Our customers have the benefit of using this platform each day to make sense of their people data and drive better business decisions.
Analyzing The 2020 Debate: Text Mining Using the Perceptyx Text Analysis Tool
We prepared the transcript data as if it were a survey, and set it up in our reporting system. Our platform uses sentiment analysis—AI/ML detection of sentiments in text data—along with theme detection, to efficiently analyze comments. Sentiment analysis uses deep learning algorithms to review text data and assign a sentiment to each comment (i.e. positive, neutral, or negative). Theme detection uses keywords and phrases to detect the theme(s) mentioned in each comment. Thus, rather than having to read every single comment in order to understand the issues, one can quickly get the big picture of the issues that come up, and how people feel about the issues.
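To make the two steps concrete, here is a minimal sketch of tagging a single comment with themes and a sentiment. It is an illustration only, not the Perceptyx implementation: the theme keyword lists are hypothetical, and a toy word-list scorer stands in for the deep learning sentiment model described above.

```python
import re

# Hypothetical theme keywords (assumption); a real system would use a curated list.
THEME_KEYWORDS = {
    "economy": ["economy", "jobs", "taxes"],
    "healthcare": ["healthcare", "insurance"],
    "covid-19": ["covid", "pandemic", "vaccine"],
}

# Toy sentiment lexicon (assumption) standing in for a trained model.
POSITIVE = {"great", "good", "strong", "best"}
NEGATIVE = {"bad", "worst", "disaster", "terrible"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def detect_themes(text):
    """Return every theme whose keywords appear in the comment."""
    tokens = set(tokenize(text))
    return sorted(
        theme for theme, keywords in THEME_KEYWORDS.items()
        if tokens & set(keywords)
    )


def score_sentiment(text):
    """Label a comment positive, neutral, or negative by counting lexicon hits."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def analyze(comment):
    """Attach detected themes and a sentiment label to one comment."""
    return {"themes": detect_themes(comment), "sentiment": score_sentiment(comment)}
```

Calling `analyze` on each comment yields the theme and sentiment tags that the reports described in the rest of this article aggregate.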
We now turn to analyzing the debate data using Perceptyx’s Comments Report tool. There are six tabs in the Comments Report:
- All Comments
- Thematic Analysis
- Word Cloud
- Word Frequency
- Hot Spot Report
- Crosstab Report
The text analysis tool also allows users to filter data by demographic variables, sentiments, themes, questions, and keywords/phrases.
We recommend first examining the themes present in the data. The Thematic Analysis tab offers a quick, bird’s eye view of the main issues, with results for each theme detected in the analysis. For each theme, the tab reports the total number of comments mentioning it, the percentage of comments in which it appears, and the distribution of sentiments across those comments. Users can also create and edit themes in this tab.
In the examples below, data was filtered by speaker to show the most frequent themes for Trump and Biden, respectively. Trump’s top themes (Figure 1) included mentions of the Democrats, Joe Biden, healthcare, and Covid-19; Biden’s top themes (Figure 2) included integrity, the people, the economy, and Covid-19. One can also uncover the most positive themes and negative themes. In this case, very few themes were mentioned in a positive manner, with the vast majority of themes being negative (Figures 3 & 4).
Figure 1. Trump Thematic Analysis
Figure 2. Biden Thematic Analysis
Figure 3. Debate Themes, Most Positive
Figure 4. Debate Themes, Most Negative
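The three figures reported per theme (comment count, share of all comments, and sentiment distribution) can be sketched as a small aggregation. This is an assumed data layout for illustration, not the platform's own: each comment is a dictionary with a hypothetical `themes` list and a `sentiment` label.

```python
from collections import Counter, defaultdict


def summarize_themes(comments):
    """Summarize tagged comments by theme.

    `comments` is a list of dicts with 'themes' (list of str) and
    'sentiment' (str); this layout is an assumption for the sketch.
    """
    total = len(comments)
    counts = Counter()
    sentiments = defaultdict(Counter)
    for comment in comments:
        # Count each theme at most once per comment.
        for theme in set(comment["themes"]):
            counts[theme] += 1
            sentiments[theme][comment["sentiment"]] += 1

    summary = []
    for theme, n in counts.most_common():  # most frequent themes first
        summary.append({
            "theme": theme,
            "comments": n,
            "pct_of_comments": round(100 * n / total, 1),
            "sentiment_pct": {
                label: round(100 * k / n, 1)
                for label, k in sentiments[theme].items()
            },
        })
    return summary
```

Filtering the input list by speaker before calling `summarize_themes` gives the kind of per-speaker view shown in Figures 1 and 2.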
Another quick way to see which issues are coming up is to review the word cloud of comment themes. The Word Cloud tab lets users visually examine the keywords, phrases, and themes used most often in the comments, along with the sentiments aligned to them.
In the present case, it can be seen that the major themes standing out in the Trump word cloud (Figure 5) included Democrats, Joe Biden, healthcare, and the election process. The main themes showing in the Biden word cloud (Figure 6) were integrity, the people, the economy, and broad appeal.
Figure 5. Trump Word Cloud
Figure 6. Biden Word Cloud
To dig further into these data, we will turn to the Comment Crosstab Report, which provides a crosstab analysis of theme and sentiment data. This tab lets a user look for differences across demographic variables by examining the sentiments and frequencies of comment themes, the sentiments for question items, and the sentiments for question items broken out by theme, with cells shaded on an absolute scale (e.g. 100%-80%).
In the present example, one can quickly see where the candidates and the moderator agreed and diverged in the sentiment of their responses on various issues. Overall, responses were overwhelmingly negative for all speakers (Figure 7). There were differences as well, however, such as Trump speaking more positively about the jobs theme (Figure 8).
Figure 7. Debate Crosstab, Question by Negative Sentiment
Figure 8. Debate Crosstab, Theme by Positive Sentiment
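A theme-by-speaker crosstab of this kind can be sketched as follows. The field names (`speaker`, `themes`, `sentiment`) and the placeholder speaker labels in the usage note are assumptions made for the example, and the sketch covers only the percentage calculation, not the shading.

```python
from collections import defaultdict


def crosstab_sentiment(comments, sentiment="negative"):
    """Return {theme: {speaker: pct}} for one sentiment label.

    Each cell is the percentage of a speaker's comments on a theme
    that carry the requested sentiment.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for comment in comments:
        for theme in set(comment["themes"]):
            key = (theme, comment["speaker"])
            totals[key] += 1
            if comment["sentiment"] == sentiment:
                hits[key] += 1

    table = defaultdict(dict)
    for (theme, speaker), n in totals.items():
        table[theme][speaker] = round(100 * hits[(theme, speaker)] / n, 1)
    return dict(table)
```

For instance, with made-up comments from two placeholder speakers "A" and "B", `crosstab_sentiment(comments, "positive")` returns one row per theme and one percentage per speaker, which is the shape of table shown in Figure 8.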
Hot Spot Report
Moving on to further topical analysis, we go to the Comment Hot Spot Report, which provides a quantitative hot spot analysis of theme and sentiment data. This allows users to check for hot spots across their demographic variables by examining sentiments and frequencies of comment themes, sentiments for question items, and sentiments for question items cut by theme.
In the present case, the Comment Hot Spot Report shows that the overall tone of the statements made by the candidates was negative (Figure 9). Both candidates spoke in negative terms on many of the specific issues (Figures 10 & 11), a surprising level of agreement. The economy was a strong exception to this pattern: Biden spoke of it in highly negative terms, while Trump spoke of it more positively or neutrally (Figure 12).
Figure 9. Hot Spots Overall Sentiment of Debate Comments
Figure 10. Hot Spots Sentiment on Covid-19 Response
Figure 11. Hot Spots Sentiment on Crime
Figure 12. Hot Spots Sentiment on the Economy
After investigating this data at a high level, we can dig further to analyze it at a more granular level. The All Comments tab gives users a way to dig in, reading the exact comments respondents made and seeing the sentiments aligned to those comments.
In the present case, we can see the exact comments from the candidates and moderator, filter the comments by speaker, and see the themes and sentiments aligned to each (Figures 13 & 14). Someone carrying out a systematic review can mark comments as read to keep track of where they are in the review process.
Finally, we can see if any particular words or phrases stand out in the data. The Word Frequency tab shows the number of occurrences of every keyword and phrase in tabular format, in contrast to the word cloud format. As can be seen here, one of the most frequent words or phrases was “crosstalk,” denoting when the candidates and/or moderator were talking over one another (Figure 15).
Figure 13. Trump, All Comments tab, Economy theme
Figure 14. Biden, All Comments tab, Economy theme
Figure 15. Debate Word Frequencies
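The tabular word counts described above can be approximated with a few lines of standard-library Python. This is a simplified sketch: it counts single words only (not phrases), and the stopword list is a small, hypothetical one chosen for the example.

```python
import re
from collections import Counter

# Small illustrative stopword list (assumption); real lists are much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "that", "it"}


def word_frequencies(comments, top_n=10):
    """Return the top_n (word, count) pairs across all comments."""
    counts = Counter()
    for text in comments:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top_n)
```

Transcript markers such as "[crosstalk]" are counted like any other word here, which is how a marker of that kind can end up near the top of a frequency table.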
What Insights Does Your Text Data Hold?
Text analysis has long been difficult for businesses to perform efficiently, because reading and reporting on comments was a manual, time-consuming process. With new advances in the Perceptyx platform, however, text mining for insights can now be done quickly and easily.
Perceptyx clients are using text analytics to mine their survey data, to help them meet both perennial and new challenges in improving the employee experience. Instead of simply asking if employees like their benefits, theme detection can flag comments related to that topic, along with sentiment, so that reviewers can understand why employees feel the way they do, and what can be done to make improvements. Comments analysis can also help organizations respond to pressing issues, as many companies found with the rise of COVID-19. Text analytics allowed clients to engage the full spectrum of issues related to the pandemic: uncovering areas of concern over employee safety and welfare, identifying emerging best practices for work from home (WFH), and understanding employee needs and concerns about the transition back to the workplace. Perceptyx tools are available in 32 languages. To learn more about survey text analysis and what it can do for your organization, schedule a demo today.