Understanding the relative difference metric

Relative Insight, 06 Apr 2021

Relative Insight’s comparative approach to text analytics surfaces statistically significant differences and similarities between language sets. In doing so, our software directs your attention to the things that actually matter in a body of text.

The relative difference metric is a measure of how much more prevalent a topic, phrase, word, emotion or grammar element is in one body of text than in the others it is compared against. The platform also displays frequency and similarity metrics.

How is relative difference calculated?

For each language set uploaded into the platform, Relative Insight conducts a detailed linguistic analysis. Our natural language processing algorithms ‘read’ the text, identifying topics, grammar, emotions, words and phrases. The frequency of each language element is then determined and normalized by the size of the language set to enable ‘apples to apples’ comparisons between language sets of different sizes.

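Relative difference is calculated by dividing the normalized frequency of a particular language element in one language set by the normalized frequency of the same element in the comparison language set(s). A relative difference of 2, for example, means the element is proportionally twice as prevalent in the first language set as in the comparison.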
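The calculation can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the language sets are plain lists of word tokens, and the function names and sample data are hypothetical rather than part of the platform.

```python
from collections import Counter


def normalized_frequency(tokens, element):
    """Occurrences of the element divided by the size of the language set."""
    if not tokens:
        return 0.0
    return Counter(tokens)[element] / len(tokens)


def relative_difference(target_tokens, comparison_tokens, element):
    """Normalized frequency in the target set divided by that in the comparison set."""
    target = normalized_frequency(target_tokens, element)
    comparison = normalized_frequency(comparison_tokens, element)
    if comparison == 0:
        # The ratio is undefined when the comparison set never uses the element.
        # Returning infinity here is an assumption made for this sketch.
        return float("inf") if target > 0 else float("nan")
    return target / comparison


# Hypothetical data: 'great' makes up 4% of set A and 1% of set B.
set_a = ["great"] * 4 + ["other"] * 96     # 100 tokens
set_b = ["great"] * 2 + ["other"] * 198    # 200 tokens

ratio = relative_difference(set_a, set_b, "great")
print(ratio)  # 4.0
```

Although ‘great’ appears only twice as often in set A in raw counts (4 versus 2), normalizing by set size shows it is four times as prevalent.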

Where the relative difference calculation reveals a difference, the platform applies an additional layer of statistical analysis to provide confidence that the difference has not surfaced by chance. Log-likelihood calculations are performed to assess this possibility, and the analysis shown in the platform only displays differences that meet a 99% confidence level. This means there is at most a 1% chance that a difference is identified where one does not truly exist.
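As a rough illustration of such a check, the sketch below uses the two-set log-likelihood (G²) statistic commonly applied in corpus linguistics. The article does not say which formulation the platform uses, so the formula, the function names and the sample counts here are all assumptions. The threshold of 6.63 is the chi-square critical value for one degree of freedom at the 1% level.

```python
import math

# Chi-square critical value, 1 degree of freedom, p < 0.01 (99% confidence).
CRITICAL_VALUE_99 = 6.63


def log_likelihood(count_a, size_a, count_b, size_b):
    """G2 statistic for one language element observed in two language sets."""
    total = count_a + count_b
    # Counts expected if the element were equally prevalent in both sets.
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    g2 = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def is_significant(count_a, size_a, count_b, size_b):
    """True when the difference clears the 99% threshold."""
    return log_likelihood(count_a, size_a, count_b, size_b) >= CRITICAL_VALUE_99


# Hypothetical counts in two language sets of 10,000 words each.
print(is_significant(40, 10_000, 10, 10_000))  # True
print(is_significant(6, 10_000, 4, 10_000))    # False
```

In the first case the element appears four times as often in one set and the statistic is well above the threshold. In the second, a 6 versus 4 split is too close to what chance alone would produce, so it would not be displayed.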

Why should I trust insights based on low frequencies?

This is one of the most common questions we get from new users of Relative Insight.

The frequency of word usage approximately follows what is called a Zipf distribution. Zipf’s law states that the frequency of a word is roughly inversely proportional to its rank in the frequency table. Put simply, the second most common word appears about half as often as the most common, the third about one third as often, and so on. Because of this, most words are expected to occur very infrequently, so even a few occurrences can amount to a statistically significant finding.
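To make the shape of this distribution concrete, the sketch below computes the counts an idealized Zipf distribution would predict. It assumes an exponent of exactly 1 and a hypothetical count of 6,000 for the most common word; real text only approximates this pattern.

```python
def zipf_expected_counts(top_count, ranks):
    """Expected counts for ranks 1..ranks when frequency is proportional to 1 / rank."""
    return [top_count / rank for rank in range(1, ranks + 1)]


# Hypothetical language set whose most common word occurs 6,000 times.
counts = zipf_expected_counts(6000, 1000)

print(counts[:5])   # [6000.0, 3000.0, 2000.0, 1500.0, 1200.0]
print(counts[999])  # 6.0
```

Under these assumptions the 1,000th most common word is expected only about six times, which is why low raw frequencies are the norm for most of the vocabulary.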

The nature of dealing with words

Words are less precise than numbers. This means that even the most advanced text analysis solution may surface findings that don’t make perfect sense. Relative Insight is no exception. For example, consider the word ‘spring’, which has context-specific meanings: it can be a verb, a season or a mechanical component. This can pose a challenge when it comes to topical classifications. Viewing verbatim examples from the text can help you overcome this and better understand the data you have analyzed when the meaning is not immediately clear.

If you ever need help making sense of something in the analysis, our team will be here to help!
