Breaking down NLP by Areas of Interest

A few areas to start exploring for budding NLP developers

Interested in learning more about Natural Language Processing? Here are a few areas to start exploring to develop your niche.

Sentiment Analysis

The goal of sentiment analysis is to classify text by the sentiment it expresses. Online stores might use it to determine whether a review is favorable, while researchers might use it to gauge popular sentiment toward a subject such as an upcoming election.

For instance, sentiment analysis would classify this review as positive:

Everything was great. The store was clean, fresh and prices were reasonable. I would definitely come back.
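To make the idea concrete, here is a minimal sketch of a lexicon-based classifier. It is not how production systems work (those are usually trained models), and the two word lists are hypothetical, hand-picked for this example.

```python
import re

# Hypothetical, hand-picked word lists used only for illustration.
POSITIVE = {"great", "clean", "fresh", "reasonable", "friendly"}
NEGATIVE = {"dirty", "rude", "expensive", "terrible", "stale"}

def classify_sentiment(text):
    """Label text as positive, negative or neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

review = ("Everything was great. The store was clean, fresh and prices "
          "were reasonable. I would definitely come back.")
print(classify_sentiment(review))  # positive
```

Counting words ignores negation ("not great") and sarcasm, which is one reason trained classifiers tend to do better on real reviews.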

Intent and Slot Classification

Intent and Slot Classification is the process of extracting a user's command (the intent) and its related arguments (the slots) from an input. It is commonly used in AI assistants such as Siri, Alexa and Google Assistant.

If I tell my AI assistant:

Set my alarm to 7am tomorrow

A smart AI assistant would recognize that the command is to set an alarm. The relevant argument is the time: tomorrow at 7am.
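As a rough sketch of the alarm example, the snippet below uses a single regular expression to pick out one intent and its slots. Real assistants rely on trained models; the pattern, the intent name set_alarm and the slot names time and day are all assumptions made for this illustration.

```python
import re

# Hypothetical pattern covering a single "set_alarm" intent.
ALARM_PATTERN = re.compile(
    r"set (?:my|an|the) alarm (?:to|for|at) "
    r"(?P<time>\d{1,2}(?::\d{2})?\s?(?:am|pm))"
    r"(?:\s+(?P<day>today|tomorrow))?",
    re.IGNORECASE,
)

def parse_command(utterance):
    """Return the detected intent and any slots found in the utterance."""
    match = ALARM_PATTERN.search(utterance)
    if match is None:
        return {"intent": "unknown", "slots": {}}
    # Keep only the named groups that actually matched.
    slots = {name: value.lower() for name, value in match.groupdict().items() if value}
    return {"intent": "set_alarm", "slots": slots}

print(parse_command("Set my alarm to 7am tomorrow"))
# {'intent': 'set_alarm', 'slots': {'time': '7am', 'day': 'tomorrow'}}
```

A hand-written pattern like this breaks as soon as the phrasing changes ("wake me up at 7"), which is why the task is usually framed as classification (for the intent) plus sequence labeling (for the slots).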

Entity Extraction and Linking

Entity Extraction is better known as Named Entity Recognition (NER), where relevant entities are extracted from a corpus. For instance, if we are interested in locations and times, we can train an NER system to extract "tomorrow at noon", "Charles Station" and "5pm" from the following:

The train will leave tomorrow at noon from Charles Station. It will make three stops along the way before arriving at your destination at 5pm.
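A trained NER model is beyond the scope of a short snippet, but a rule-based sketch shows the shape of the output: a list of text spans, each with a label. The two patterns below are hypothetical and only cover times and station names like the ones in this example.

```python
import re

# Hypothetical hand-written patterns; a trained NER model would learn these from data.
PATTERNS = {
    "TIME": re.compile(
        r"\b(?:(?:today|tomorrow) at )?(?:noon|midnight|\d{1,2}(?::\d{2})?\s?(?:am|pm))\b"
    ),
    "LOCATION": re.compile(r"\b(?:[A-Z][a-z]+ )+Station\b"),
}

def extract_entities(text):
    """Return (span, label) pairs in the order they appear in the text."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), match.group(), label))
    return [(span, label) for _, span, label in sorted(found)]

text = ("The train will leave tomorrow at noon from Charles Station. It will make "
        "three stops along the way before arriving at your destination at 5pm.")
print(extract_entities(text))
# [('tomorrow at noon', 'TIME'), ('Charles Station', 'LOCATION'), ('5pm', 'TIME')]
```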

Entity Linking is the process of linking extracted entities to entries in a knowledge base. If we want to link the extracted entities from our earlier example to a directory of locations (our knowledge base), we can train an Entity Linking system to infer that:

Charles Station → Charles Street Terminal Station
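One simple way to approximate this without any training is to compare the words in the mention with the words in each knowledge base entry. The sketch below uses Jaccard similarity (shared words divided by total distinct words); the directory entries and the 0.3 threshold are made up for illustration.

```python
import re

# Hypothetical knowledge base: a small directory of location names.
KNOWLEDGE_BASE = [
    "Charles Street Terminal Station",
    "Central Avenue Station",
    "Harbor Point Ferry Terminal",
]

def tokens(name):
    """Lowercase a name and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", name.lower()))

def link_entity(mention, knowledge_base=KNOWLEDGE_BASE, threshold=0.3):
    """Return the entry with the highest word overlap, or None if nothing is close enough."""
    mention_tokens = tokens(mention)

    def jaccard(entry):
        entry_tokens = tokens(entry)
        return len(mention_tokens & entry_tokens) / len(mention_tokens | entry_tokens)

    best = max(knowledge_base, key=jaccard)
    return best if jaccard(best) >= threshold else None

print(link_entity("Charles Station"))  # Charles Street Terminal Station
```

Word overlap cannot handle aliases or abbreviations that share no words with the target entry, which is where a trained linking model earns its keep.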

Language Modeling

Language Modeling is the process of predicting the next word or phrase based on previous input. A popular application of language modeling is autocompletion on your device. For instance, if you frequently use the same phrase, your phone will start suggesting it after you type the first word.
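The autocomplete behavior described above can be sketched with a bigram model, which simply counts which word tends to follow which. This is a deliberately tiny example; the typing history is invented, and modern keyboards use far larger models.

```python
from collections import Counter, defaultdict

def train_bigram_model(sentences):
    """Count, for every word, how often each other word follows it."""
    model = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current_word, next_word in zip(words, words[1:]):
            model[current_word][next_word] += 1
    return model

def suggest_next(model, word, k=1):
    """Return up to k of the most frequent followers of the given word."""
    followers = model.get(word.lower(), Counter())
    return [w for w, _ in followers.most_common(k)]

# Hypothetical typing history.
history = ["on my way home", "on my way to work", "on the train now"]
model = train_bigram_model(history)
print(suggest_next(model, "on"))  # ['my']
```

Because "my" followed "on" twice and "the" only once, "my" is suggested first. A word the model has never seen simply yields no suggestions.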

Machine Translation

Machine Translation is the process of translating from one language to another. We've all used Google Translate at some point, whether for studying a second language or to cheat on our French homework. While Machine Translation has achieved near-human parity for several languages, there is still plenty of room for improvement for most others.

Question Answering

Question Answering takes a context document and a question, and returns the best answer found within the document. Another variation of Question Answering leverages a knowledge base and returns the answer from the knowledge base instead of the document. Several chatbots and dialogue agents leverage mechanisms from Question Answering for conversational AI. A fancy name for Question Answering is Machine Comprehension.

Any reading comprehension problem is a Question Answering problem.
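As a toy version of the document-based variant, the sketch below returns the sentence that shares the most words with the question. Real systems predict an exact answer span with a trained model; the stopword list and the sample document here are assumptions for the example.

```python
import re

# Hypothetical stopword list: common words that carry little meaning on their own.
STOPWORDS = {"the", "a", "an", "is", "at", "to", "of", "will", "what",
             "when", "where", "does", "it", "from"}

def tokenize(text):
    """Return the set of lowercase words in text, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}

def answer_question(document, question):
    """Return the sentence in the document that overlaps most with the question."""
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    question_tokens = tokenize(question)
    return max(sentences, key=lambda s: len(tokenize(s) & question_tokens))

document = ("The train will leave tomorrow at noon from Charles Station. It will make "
            "three stops along the way before arriving at your destination at 5pm.")
print(answer_question(document, "How many stops will the train make?"))
# It will make three stops along the way before arriving at your destination at 5pm.
```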

Text Summarization

As the name suggests, Text Summarization is about condensing a longer piece of text into a shorter one without removing any important information. Many approaches to text summarization do not actually rely on Machine Learning. Instead, they use simple heuristics to decide whether a sentence or phrase is worth keeping.

There is, however, a lot of research and development in using Machine Learning for text summarization.
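One common heuristic of the non-Machine-Learning kind is to score each sentence by how frequent its words are in the whole text, then keep the top scorers. The sketch below is one possible version of that idea; the stopword list and the sample text are assumptions.

```python
import re
from collections import Counter

# Hypothetical stopword list, kept short for the example.
STOPWORDS = {"the", "a", "an", "was", "were", "and", "to", "had", "is", "of"}

def content_words(text):
    """Return the lowercase words in text, minus stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]

def summarize(text, max_sentences=1):
    """Keep the sentences whose words are, on average, most frequent in the text."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    frequency = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        return sum(frequency[w] for w in words) / max(len(words), 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore the original sentence order
    return " ".join(sentences[i] for i in keep)

text = ("The store was clean. The store had fresh produce and the store prices "
        "were reasonable. Parking was hard to find.")
print(summarize(text))  # The store was clean.
```

Averaging the score by sentence length stops long sentences from winning just because they contain more words.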

For more information, this is a great resource for datasets, benchmarks and how to measure performance in these areas:

Check out my website for learning Data Science: https://www.dscrashcourse.com/
