Text mining vs. NLP (natural language processing) – two big buzzwords in the world of analysis, and two terms that are often misunderstood.
Perhaps you’ve heard the terms thrown around and want to finally nail down the definitions.
Perhaps you’re well-versed in the language of analytics but want to brush up on your knowledge.
Perhaps you’re looking to wow your audience the next time you’re asked that familiar post-presentation question: “How do you know that?”
Wherever you are in your text analysis journey, chances are you’ve run into the same questions we’re regularly asked:
- What are the differences between text mining and NLP?
- How do they work?
- How are they related?
We sat down with Relative Insight’s Head of AI, Ryan Callihan, to get to the bottom of things.
With a background in computational linguistics (a discipline that blends data science and linguistics), Ryan sets the direction for research and development at Relative, helping us build high-impact features that give our customers the insights they need to make better decisions. In other words, he knows his stuff. And after reading this, you will too.
Let’s start with natural language processing
Natural language processing, or, in fewer syllables, NLP. While ‘natural’ might imply human or manual work, NLP is actually concerned with the computerized analysis of language as it is naturally spoken and written (i.e., the way people actually talk and write).
It is rooted in computational linguistics and uses either machine learning systems or rule-based systems. These approaches allow NLP to interpret linguistic data in a way that accounts for human sentiment and intent.
As Ryan explains, “when it comes to NLP, Google has a definition that I think summarizes it well. NLP is ‘the application of computational techniques to the analysis and synthesis of natural language and speech.’”
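Generally, natural language processing consists of two components: natural language understanding (analyzing text to work out what it means) and natural language generation (synthesizing new text or speech).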
In a nutshell, NLP is a way of organizing unstructured text data so it’s ready to be analyzed.
What can natural language processing do?
Great, now we know how NLP works. But what can it do? And what is it used for?
Most of us use NLP on a daily basis – from Amazon Alexa to iPhone autocorrect and Google Translate – all of these everyday tools utilize complex technology and algorithms to make your life easier.
Let’s break down some of the main things NLP technologies can help you do…
- Sentiment analysis is the practice of using computers to identify and categorize opinions and emotions expressed in text. This form of analysis reveals the writer’s sentiment as demonstrated by word choice. Natural language processing recognizes subtle nuances in language to classify the opinion as positive, negative or neutral.
- Text classification automatically sorts unstructured text into categories, streamlining what would otherwise be manual analysis. It can be used, for example, to categorize open-ended survey responses by what they say.
- Chatbots and virtual assistants use natural language processing to communicate with humans by understanding language and responding appropriately. Machine learning systems adapt over time, using previous interactions to inform responses to current and future queries.
- Information extraction sifts through text to recognize specific pieces of data or keywords. It can be used to pull out small pieces of information, like a name or address, from a larger body of text. Keyword extraction reveals how often certain words or phrases appear, surfacing trends and patterns.
- Machine translation breaks down language barriers, allowing us to communicate more effectively. Natural language processing has been the driving force behind progress in translation technology. Using machine learning and large amounts of data, tools like Google Translate are able to improve autonomously.
- Text summarization tools use NLP to read through longer texts and identify the important information. They typically either extract key sentences or keywords from the original text or paraphrase it to capture the relevant findings.
- NLP doesn’t just recognize words; it also analyzes grammar. Autocorrect and grammar-checking tools detect errors and provide revisions and suggestions to improve readability.
- Speech recognition uses natural language processing to receive and convert spoken language into a format readable by a computer. Virtual assistants, speech to text tools, voicemail transcription, audio translation technologies and more all utilize NLP.
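To make the keyword-extraction idea above a little more concrete, here is a minimal Python sketch that simply counts how often words appear across a few survey-style responses. The stopword list, the function name `top_keywords`, and the sample responses are all hypothetical; real tools use far richer tokenization and statistics than raw counts.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; real tools ship much larger ones.
STOPWORDS = {"the", "a", "an", "and", "is", "was", "to", "of", "it", "i", "my", "but", "too"}


def top_keywords(texts, n=2):
    """Count non-stopword tokens across all texts and return the n most frequent."""
    counts = Counter()
    for text in texts:
        # Lowercase, then keep runs of letters/apostrophes as tokens.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)


responses = [
    "Delivery was slow and the packaging was damaged.",
    "Great product, but delivery took too long.",
    "Slow delivery, friendly support.",
]
print(top_keywords(responses, n=2))  # → [('delivery', 3), ('slow', 2)]
```

Even this crude count hints at a pattern (delivery speed) that would take longer to spot by reading responses one by one.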
Then comes text mining…
Text analysis – or text mining – can be hard to understand, so we asked Ryan how he would define it in a sentence or two.
In his words, text analytics is “extracting information and insight from text using AI and NLP techniques. These techniques turn unstructured data into structured data to make it easier for data scientists and analysts to actually do their jobs.”
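As a rough illustration of what turning unstructured data into structured data can mean, the sketch below converts free-text comments into a simple bag-of-words table: one row per comment, one column per word, with counts in the cells. This is just one common and very basic representation, chosen here for clarity; the function name `bag_of_words` and the sample comments are hypothetical, and real platforms typically build much more on top of it.

```python
import re


def bag_of_words(texts):
    """Turn free text into a document-term matrix: one row per text, one column per word."""
    tokenized = [re.findall(r"[a-z']+", t.lower()) for t in texts]
    # The vocabulary (column headers) is every distinct token, sorted alphabetically.
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {word: i for i, word in enumerate(vocab)}
    matrix = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for tok in toks:
            row[index[tok]] += 1  # count each occurrence in this text's row
        matrix.append(row)
    return vocab, matrix


vocab, matrix = bag_of_words(["Love the app", "The app keeps crashing"])
print(vocab)   # → ['app', 'crashing', 'keeps', 'love', 'the']
print(matrix)  # → [[1, 0, 0, 1, 1], [1, 1, 1, 0, 1]]
```

Once text is in a table like this, analysts can sort, filter, and compare it with ordinary data tools.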
Wait, so are NLP and text mining the same?
Related? Yes. The same? No.
In simple terms, NLP is a technique used to prepare data for analysis. Because our brains do this automatically, it can be hard for us humans to see why NLP is needed: we grasp the meaning, sentiment, and structure of text without consciously thinking about it. But because computers are (thankfully) not humans, they need NLP to make sense of things.
As Ryan explains, “language is full of different layers which all work together. Humans combine all these layers with ease, but it is much more labor intensive for computers.” And since we’re talking about layers, it only makes sense to look to our favorite Scottish ogre for inspiration. “To take an analogy from Shrek”, observes Ryan, “language is truly like an onion. NLP picks apart that onion, identifying each layer.”
A little unconventional as an analogy, but we’ll bite. Ryan continues: “Now, you generally wouldn’t just eat all those layers of onion on their own, you would do something with them. Analysts can swoop in and make a nice French onion soup with all those layers. The recipes they use may be varied, but ultimately, their goal is the same: to make something beautiful – aka, extract meaning and insight.”
So what’s the difference between text mining and NLP? They’re not the same, but they are closely connected. In short, you can have NLP without text analytics, but it would be difficult to do text analytics without NLP.
What makes a good NLP tool?
Well, first, it’s important to understand that not all NLP tools are created equal. They often differ in the way they classify text, as some have a more nuanced understanding than others.
As Ryan warns, we shouldn’t always “press toward using whatever is new and flashy”. When it comes to NLP tools, it’s about using the right tool for the job at hand, whether that’s for sentiment analysis, topic modeling, or something else entirely.
Because ultimately, there’s so much you can do with NLP. As Ryan explains, “there are so many onion layers to peel back. But not every layer is useful for every situation. There are times that sentiment analysis or topic modeling might not be all that useful. Selecting the NLP layers you need and not just throwing the kitchen sink at the problem is a real skill.”
Language is an ever-evolving landscape, so it makes sense that NLP tools should evolve too. The best NLP tools keep updating and learning, especially since, as Ryan puts it, “every new tool is out of date the minute it is created.”
But how does NLP pick up on nuance in emotion or sentiment?
Sentiment (or emotion) analysis is one of the layers that NLP can provide. But it’s right to be skeptical about how well computers can pick up on sentiment that even humans sometimes struggle with.
Especially, as Ryan says, when you consider that “humans very often don’t literally say what they mean, and different things have different meanings in different contexts.”
Inevitably, there are different levels of sophistication in NLP tools, but the best are more intelligent than you might expect.
Ryan gives the following example to prove his point: “if you take the following sentence – ‘I’m really looking forward to the next prime minister 🙄’ – you might instantly understand this to be sarcastic with a negative sentiment. If you take the text in isolation, ‘…looking forward to…’ means anticipation of something and ‘…really looking forward to…’ is even stronger, so that intensifies it. However, there is a pesky ‘🙄’ at the end of the text, which serves as a gesture or facial expression that the text lacks. If I were to say this in real life, I might roll my eyes, just like this emoji. Hopefully, all the NLP layers would combine to correctly identify the actual sentiment of this sentence.”
As Ryan’s example shows, NLP can identify the right sentiment at a more sophisticated level than you might imagine.
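To show how separate layers can combine, here is a deliberately simplistic, rule-based Python sketch loosely modeled on Ryan’s example: one layer reads the phrase, a second applies an intensifier, and a third treats the 🙄 emoji as a sarcasm marker that flips the polarity. The lexicons and the `toy_sentiment` function are hypothetical; real sentiment models learn these signals from data rather than from hand-written rules.

```python
import re

# Hypothetical toy lexicons; real models learn such signals from training data.
POSITIVE_PHRASES = {"looking forward to": 1.0}
INTENSIFIERS = {"really": 1.5}
SARCASM_MARKERS = {"🙄"}


def toy_sentiment(text: str) -> float:
    """Score text by combining three simple 'layers': >0 positive, <0 negative."""
    lowered = text.lower()
    score = 0.0

    # Layer 1: lexical meaning of known phrases.
    for phrase, value in POSITIVE_PHRASES.items():
        if phrase in lowered:
            score += value

    # Layer 2: intensifiers strengthen whatever sentiment was found.
    for word, factor in INTENSIFIERS.items():
        if re.search(rf"\b{word}\b", lowered):
            score *= factor

    # Layer 3: an eye-roll emoji acts like a gesture signalling sarcasm, so flip polarity.
    if any(marker in text for marker in SARCASM_MARKERS):
        score = -score

    return score


print(toy_sentiment("I'm looking forward to the next prime minister"))            # → 1.0
print(toy_sentiment("I'm really looking forward to the next prime minister"))     # → 1.5
print(toy_sentiment("I'm really looking forward to the next prime minister 🙄"))  # → -1.5
```

Without the emoji, this sketch scores the sentence as positive. Recognizing the sarcasm from real-world knowledge alone is a much harder problem that simple rules like these cannot solve.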
But, as Ryan points out, one thing NLP does not currently do so well is “adding external knowledge or real-world context”. He adds that “humans in the UK would know that there is controversy around the new PM. Not everyone may actually be looking forward to this change, so saying that you are looking forward to it is likely sarcasm, with or without ‘🙄’. NLP misses this real-world understanding, but I am confident this will improve in the future.”
Controversy aside, the identification of nuance is certainly possible with NLP and, according to Ryan, it’s only going to grow over time.
And is there any real difference between text analysis and text mining?
Once your NLP tool has done its work and structured your data into coherent layers, the next step is to analyze that data. “Don’t you mean text mining?” some smart alec might pipe up, correcting your use of the term ‘text analytics’.
The truth is, you’d both be right, and you should feel confident using the terms interchangeably.
As Ryan tells us, “I think most people who assertively define either to be full of it! So, to throw my hat into the ring, I would like to define text analysis and text mining as having the same intention: to find meaning, insight, and answers in text data. There are a lot of ways to get to those answers so don’t think there is only one right way!”
Ryan has officially given you permission to use both terms at your leisure.
The bottom line: Text mining vs. NLP
So text mining vs. NLP, what’s the difference? Thanks to our data science expert Ryan, we’ve learned that NLP helps in text mining by preparing data for analysis. Or to use Ryan’s analogy, where language is the onion, NLP picks apart that onion, so that text mining can make a lovely onion soup that’s full of insights.
We hope this Q&A has given you a greater understanding of how text analytics platforms can generate surprisingly human insight. And if anyone wishes to ask you tricky questions about your methodology, you now have all the answers you need to respond with confidence.