Unstructured text analysis in Relative Insight: how it works

Relative Insight, 26 Jul 2021

Text analytics is the future of business intelligence. Relative Insight’s unstructured text analysis platform helps organizations capture insights from qualitative data at scale. While numbers can tell you what is happening, our comparative approach helps unearth the how and why so you can take informed actions.

But we often get asked – how does it all work? In this article, we demystify the inner workings of Relative Insight, helping you understand what happens once data is uploaded to the platform.

Natural language processing

Natural language processing (NLP) is a branch of AI that helps computers make sense of text. When an unstructured data set is fed into Relative Insight, the text flows through a series of processes – the NLP pipeline. As the data passes through the pipeline, it is transformed into something a computer can understand.

In essence, Relative Insight’s algorithms ‘read’ the text and record the linguistic features to enable further analysis of the data.

Key processes in the NLP pipeline include:

Breaking the text down into sub-components (sentences, phrases, words)
Labelling parts of speech (noun, pronoun, adjective, determiner etc.)
Identifying named entities (people, locations, companies etc.)
Identifying topics, i.e. recording what the text is about (its meaning)
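
To make the first of these steps concrete, here is a minimal sketch of breaking text into sentences and words. It uses simple regular expressions purely for illustration; the function names `split_sentences` and `split_words` are hypothetical, and Relative Insight's actual pipeline is not described at this level of detail.

```python
import re

def split_sentences(text):
    """Split text after '.', '!' or '?' followed by whitespace (a rough heuristic)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def split_words(sentence):
    """Lowercase a sentence and return its word tokens."""
    return re.findall(r"[a-z0-9']+", sentence.lower())

text = "The spring water was cold. We visited in spring!"
sentences = split_sentences(text)
tokens = [split_words(s) for s in sentences]

print(sentences)
print(tokens)
```

Part-of-speech labelling and named entity recognition would normally be layered on top of tokens like these using a trained language model, which is beyond the scope of this sketch.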

The NLP pipeline also accounts for the fact that words can take on context-specific meanings. Take the word ‘spring’, which can be a body of water, a season, a mechanical component or a verb. Because of this inherent quality of language, semantic tagging has to consider the words used alongside a linguistic feature in order to determine its meaning.

Meaning is assessed by using knowledge graphs – databases that inform the algorithms about the relationships between different concepts. These knowledge graphs are continuously updated using machine learning to improve the accuracy of classification over time.
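
As a rough illustration of how related concepts can settle the meaning of an ambiguous word, the sketch below picks the sense of ‘spring’ whose related concepts overlap most with the surrounding words. The `SENSES` dictionary is a hypothetical, hand-written stand-in for a knowledge graph, and `pick_sense` is an assumed name; the verb sense is left out for brevity.

```python
# Hypothetical miniature "knowledge graph": each sense of a word is linked
# to a set of related concepts. Real knowledge graphs are far larger.
SENSES = {
    "spring": {
        "season": {"summer", "winter", "bloom", "april", "weather"},
        "water source": {"water", "mountain", "stream", "drink", "fresh"},
        "mechanical part": {"coil", "mattress", "metal", "tension", "steel"},
    }
}

def pick_sense(word, context_words):
    """Return the sense whose related concepts overlap most with the context.

    Returns None if the word is unknown or no sense shares a concept
    with the context.
    """
    context = set(context_words)
    best_sense, best_overlap = None, 0
    for sense, related in SENSES.get(word, {}).items():
        overlap = len(related & context)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

context = ["fresh", "water", "from", "the", "mountain", "spring"]
print(pick_sense("spring", context))  # prints "water source"
```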

A comparative approach to unstructured text analysis

Once the text has passed through the NLP pipeline, the platform stores a record of the frequency of each identified linguistic feature. At this point, the data is ready for comparison!

To ensure an ‘apples to apples’ comparison, the platform first calculates the relative frequency of each linguistic feature. Relative frequency is a normalized frequency value that allows you to compare unequally sized data sets without distorting the analysis.

For example, the word ‘beauty’ appearing 5 times in a 1,000-word data set has the same relative frequency (0.5%) as the same word appearing 10 times in a 2,000-word data set.
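
The ‘beauty’ example can be reproduced in a few lines. This is a minimal sketch that treats every word as a linguistic feature and divides its count by the total number of words; the function name `relative_frequencies` and the filler token ‘other’ are illustrative assumptions.

```python
from collections import Counter

def relative_frequencies(tokens):
    """Map each token to its count divided by the total number of tokens."""
    total = len(tokens)
    if total == 0:
        return {}
    counts = Counter(tokens)
    return {feature: count / total for feature, count in counts.items()}

# Two data sets of different sizes with the same proportion of 'beauty'.
small = ["beauty"] * 5 + ["other"] * 995      # 1,000 words
large = ["beauty"] * 10 + ["other"] * 1990    # 2,000 words

print(relative_frequencies(small)["beauty"])  # 0.005
print(relative_frequencies(large)["beauty"])  # 0.005
```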

Once this is done, the relative frequencies for each linguistic feature are compared to determine the relative difference. Relative difference is calculated for each data set being compared:

[Figure: relative difference calculation]

Based on this calculation, a particular linguistic feature can be classified as a difference, similarity or neither.

Differences

A relative difference of 1.0 indicates an equal prevalence of a particular linguistic feature. When the relative difference exceeds 1.0, the linguistic feature is more prevalent in the data set being examined than in the others. The higher the value, the bigger the difference.
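
One reading that is consistent with this description, where 1.0 means equal prevalence, is a simple ratio of relative frequencies. The sketch below assumes that reading; it is not confirmed as the platform's exact formula, and the function name `relative_difference` and its zero-frequency handling are placeholder choices.

```python
import math

def relative_difference(freq_target, freq_other):
    """Ratio of a feature's relative frequency in the target data set
    to its relative frequency in the comparison data set (assumed formula)."""
    if freq_other == 0:
        # Zero-frequency handling is not described in the article;
        # this is a placeholder choice for the sketch.
        return math.inf if freq_target > 0 else 1.0
    return freq_target / freq_other

print(relative_difference(0.010, 0.005))  # 2.0: twice as prevalent in the target
print(relative_difference(0.005, 0.005))  # 1.0: equally prevalent
```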

To ensure there is sufficient evidence that a difference is not just happening by chance, the platform calculates the probability that the relative difference would indicate a difference where one doesn’t truly exist (a false positive). This is done by looking at the probability distributions of the linguistic features in a data set, and testing the ‘goodness of fit’ between the models that power Relative Insight and the data being analyzed.
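
The article does not name the specific test used. Purely as an illustration, the sketch below applies a log-likelihood ratio (G) test to a 2x2 table of feature counts, a common way to compare word frequencies between two bodies of text. This is an assumption and not Relative Insight's actual method; the function name `g_test` and its parameters are hypothetical, and both data sets are assumed to be non-empty.

```python
import math

def g_test(count_a, size_a, count_b, size_b):
    """Log-likelihood ratio (G) test for one feature across two data sets.

    count_*: occurrences of the feature; size_*: total tokens in each data set.
    Returns (G statistic, p-value) using a chi-square distribution with
    one degree of freedom.
    """
    total = size_a + size_b
    feature_total = count_a + count_b
    cells = []  # (observed, expected) pairs for the four cells of the table
    for count, size in ((count_a, size_a), (count_b, size_b)):
        cells.append((count, size * feature_total / total))
        cells.append((size - count, size * (total - feature_total) / total))
    # Cells with an observed count of zero contribute nothing to the sum.
    g = 2.0 * sum(o * math.log(o / e) for o, e in cells if o > 0)
    g = max(g, 0.0)  # guard against tiny negative values from rounding
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(g / 2.0))
    return g, p_value

# Same proportion in both data sets: no evidence of a difference.
print(g_test(5, 1000, 10, 2000))
# Five times as frequent in the first data set: strong evidence.
print(g_test(50, 1000, 10, 1000))
```

A small p-value suggests the observed difference in frequency is unlikely to have arisen by chance alone.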

Similarities

When a linguistic feature returns a relative difference between 0.9 and 1.1 and does not meet the threshold for classification as a difference, this indicates a potential similarity.

Function words (e.g. if, the, and) are removed, as these words occur with a high frequency in any data set. Given that these words do not convey meaning by themselves, they are not typically insightful.

As with differences, a statistical test is conducted to check that a similarity hasn’t been identified where one doesn’t truly exist before the results are presented in the platform.
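
Putting the thresholds together, the decision rule described in these two sections can be sketched as follows. The two boolean inputs stand in for the outcomes of the statistical tests, whose details are not given; the function name `classify`, the parameter names and the inclusive 0.9 to 1.1 bounds are assumptions made for illustration.

```python
def classify(rel_diff, difference_is_significant, similarity_is_significant):
    """Classify a linguistic feature as a difference, similarity or neither.

    rel_diff: relative difference for the data set being examined.
    The two flags are hypothetical stand-ins for the platform's
    statistical tests.
    """
    if rel_diff > 1.0 and difference_is_significant:
        return "difference"
    if 0.9 <= rel_diff <= 1.1 and similarity_is_significant:
        return "similarity"
    return "neither"

print(classify(2.4, True, False))    # difference
print(classify(1.05, False, True))   # similarity
print(classify(1.05, False, False))  # neither
```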

Qualitative text analysis at scale

Through the combination of the NLP pipeline and comparison, Relative Insight’s unstructured text analysis engine surfaces the differences and similarities between data sets. This approach reveals the things that matter within the texts without having to labour through reading them manually. Once the algorithms have fired away, you can explore the results of the analysis, build insights cards and visualize groups of comparisons using Heatmaps.

Interested in learning more about unstructured text analysis? Watch our webinar with Relative Insight’s Head of Natural Language Processing Ryan Callihan.
