In this blog post, we answer the most common questions beginners ask about NLP.

Question: What is Natural Language Processing?

Answer: Natural Language Processing (NLP) is a set of programming techniques that lets computers understand spoken or written language, including the intent behind it. It sounds trivial, but it is not! Understanding a language is very difficult because people can phrase the same conversation in many different ways, and tone and connotation can change its meaning.

Question: How do computers process language?

Answer: Computers process audio (spoken language) and text (written language) very differently from people. NLP makes it possible for a computer to interact with humans by mimicking a person’s ability to sense, comprehend, and act on available data. Historically, NLP relied on rule-based techniques such as stemming and hand-written grammars, but these had severe shortcomings with language that didn’t conform to the rules.

For example, a rule-based system falls flat when sarcasm is used. Sarcasm is usually conveyed through tone or context, both of which are difficult for a machine to discern. To overcome challenges like this, modern NLP systems leverage deep learning techniques.
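To make the sarcasm problem concrete, here is a minimal sketch of a keyword-counting sentiment classifier, the kind of rule a traditional system might use. The word lists and the function name `rule_based_sentiment` are hypothetical, chosen only for illustration. Because the rules only look at individual words, a sarcastic complaint that contains “great” and “love” is scored as positive.

```python
# Hypothetical keyword lists for a toy rule-based sentiment classifier.
POSITIVE_WORDS = {"great", "love", "wonderful", "perfect"}
NEGATIVE_WORDS = {"terrible", "hate", "awful", "broken"}


def rule_based_sentiment(text: str) -> str:
    """Label text by counting positive vs. negative keywords (no notion of tone or context)."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE_WORDS for w in words) - sum(w in NEGATIVE_WORDS for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(rule_based_sentiment("I love this phone, the battery is great!"))  # positive
# A sarcastic complaint: the rules only see "great" and "love", so it is also labeled positive.
print(rule_based_sentiment("Oh great, my phone died again. I just love that."))  # positive
```

A learned model can pick up on cues such as “died again” that no single keyword rule captures, which is why the field moved toward data-driven approaches.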

Question: What are the main NLP components?

Answer: There are three components in a machine’s natural language interaction with humans:

  • NLP (Natural Language Processing): the umbrella set of programming techniques by which computers process spoken or written language and infer its intent.
  • NLU (Natural Language Understanding): an AI-hard problem that deals with machine reading comprehension.
  • NLG (Natural Language Generation): a software process that transforms structured data into natural language.

All three components work together to turn language into information that machines can act on.
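The simplest form of NLG is template-based generation: fill a fixed sentence pattern with values from a structured record. The sketch below assumes a hypothetical record type `AlertSummary` and function `describe`; production NLG systems are far more flexible, but the input (structured fields) and output (an English sentence) are the same.

```python
from typing import TypedDict


# Hypothetical structured record, e.g. one row returned by a data query.
class AlertSummary(TypedDict):
    host: str
    failed_logins: int
    window_hours: int


def describe(record: AlertSummary) -> str:
    """Template-based NLG: map structured fields onto a fixed English sentence."""
    count = record["failed_logins"]
    # A small content decision: choose singular or plural wording based on the data.
    noun = "attempt" if count == 1 else "attempts"
    return (
        f"Host {record['host']} saw {count} failed login {noun} "
        f"in the last {record['window_hours']} hours."
    )


print(describe({"host": "web-01", "failed_logins": 3, "window_hours": 24}))
# Host web-01 saw 3 failed login attempts in the last 24 hours.
```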

Question: How do machines make sense of the language?

Answer: Data scientists start from a given dataset, create a model for how the machine should sift through the data, check the results, and fine-tune the model until they achieve the desired outcome. In this case, the data is conversations people have had. As you might have guessed, the dataset needs to be large and varied enough to represent the language as a whole, which is not easy. The resulting models become the program the machine uses to understand the language.
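The train, check, and fine-tune loop can be sketched with a tiny perceptron classifier, a simple learning algorithm chosen here only for brevity (it is not a claim about how any particular product works). The six labeled sentences, the labels, and the function names are all hypothetical. Each pass updates the model on the sentences it gets wrong, then measures accuracy and stops once a target is reached.

```python
from collections import defaultdict

# Hypothetical toy dataset: +1 = the user is asking for data, -1 = small talk.
TRAINING_DATA = [
    ("show me failed logins from yesterday", 1),
    ("list all alerts for this host", 1),
    ("find traffic to this ip", 1),
    ("hello how are you", -1),
    ("thanks that was helpful", -1),
    ("good morning team", -1),
]


def predict(weights, text):
    """Score a sentence by summing its word weights; a positive score means +1."""
    score = sum(weights.get(word, 0.0) for word in text.split())
    return 1 if score > 0 else -1


def accuracy(weights, data):
    """Fraction of examples the current model labels correctly (the "check the results" step)."""
    return sum(predict(weights, text) == label for text, label in data) / len(data)


def train(data, target=1.0, max_epochs=20):
    """Repeat: update the model on its mistakes, check accuracy, stop at the target."""
    weights = defaultdict(float)
    for epoch in range(1, max_epochs + 1):
        for text, label in data:
            if predict(weights, text) != label:
                # Perceptron update: nudge each word's weight toward the correct label.
                for word in text.split():
                    weights[word] += label
        if accuracy(weights, data) >= target:
            return weights, epoch
    return weights, max_epochs


weights, epochs = train(TRAINING_DATA)
print(epochs, accuracy(weights, TRAINING_DATA))  # 1 1.0
print(predict(weights, "show alerts for this host"))  # 1
```

In practice the results are checked on held-out data the model never trained on; scoring only the training set, as this toy does, says little about how well the model handles new conversations.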

Question: How do data scientists standardize the data?

Answer: The data used to train the models must go through a standard preprocessing workflow before the models can make sense of the language.

Some common preprocessing techniques are:

  • Removing HTML tags
  • Cleaning special characters
  • Stemming: a technique that reduces an inflected word to its stem by stripping affixes. For example, ‘work’ is obtained from the inflected form ‘working’ by removing the suffix ‘ing’. The resulting stem is not always a dictionary word. This technique helps standardize the data.
  • Lemmatization: similar to stemming, this technique reduces a word to its base form, but the result (the lemma) is a root word you would find in a dictionary. For example, ‘ran’ lemmatizes to ‘run’, which no suffix-stripping rule can produce.
  • Removing stopwords: stopwords are very common words, such as articles, conjunctions, and prepositions, that carry little or no meaning on their own.
  • Spell checking
  • Grammar checking
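A few of these steps (HTML removal, special-character cleaning, stopword removal, and stemming) can be chained into a small pipeline. This is a minimal sketch using only Python’s standard library; the stopword list and the suffix rules are hypothetical and deliberately crude. Real projects typically use a full stopword list and an established stemmer such as NLTK’s `PorterStemmer`, or a dictionary-based lemmatizer such as NLTK’s `WordNetLemmatizer`.

```python
import html
import re

# Hypothetical, deliberately small stopword list for illustration.
STOPWORDS = {"a", "an", "the", "and", "or", "but", "of", "in", "on", "to", "is", "are", "was"}
# Hypothetical, crude suffix rules for a naive stemmer (checked in this order).
SUFFIXES = ("ing", "ed", "s")


def strip_html(text: str) -> str:
    """Replace anything that looks like an HTML tag with a space, then decode entities like &amp;."""
    return html.unescape(re.sub(r"<[^>]+>", " ", text))


def clean_special_chars(text: str) -> str:
    """Lowercase the text and replace everything except letters, digits, and whitespace."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower())


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters of the word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(raw: str) -> list[str]:
    text = clean_special_chars(strip_html(raw))
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return [naive_stem(t) for t in tokens]


print(preprocess("<p>The analysts are working on the alerts!</p>"))
# ['analyst', 'work', 'alert']
```

The crude rules also show why stemming alone is risky: `naive_stem("news")` returns `"new"`, changing the meaning of the word, which is one reason dictionary-based lemmatization exists.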

The purpose of these techniques is to get to the core meaning of each sentence. If data scientists used data that was not “cleaned up,” the results would be skewed and sentences might take on different meanings.

Question: How do machines understand the context?

Answer: Once the data is cleaned, several techniques can be used to identify the components of sentences and start inferring meaning.

Part-of-Speech Tagging

Part-of-speech (POS) tagging is the process of classifying each word in a text and labeling it with its grammatical category, such as noun, verb, adjective, or adverb. This helps with grammatical analysis and word sense disambiguation. A commonly used set of universal tags is:

Tag     Meaning             Examples
ADJ     adjective           accurate, new, high
ADP     adposition          in, up, on
ADV     adverb              quickly, awkwardly, really
CONJ    conjunction         and, but, or
DET     determiner          some, the, which
NOUN    noun                Sam, Paris, cat
NUM     numeral             2020, two, 9:00
PRT     particle            on, that, with
PRON    pronoun             I, she, he, they
VERB    verb                told, playing, would
.       punctuation marks   . , ! ? :
X       other               LOL, g8
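As a rough illustration, here is a minimal lookup-based tagger that uses example words from the tag table plus a few fallback rules. The lexicon, the fallback choices (digits become NUM, punctuation becomes “.”, unknown words default to NOUN), and the function name `pos_tag` are all assumptions for this sketch. Real taggers, such as NLTK’s `nltk.pos_tag(tokens, tagset="universal")`, use statistical models that look at context, which a lookup table cannot do.

```python
import re

# Hypothetical lexicon built from example words in the universal tag table.
LEXICON = {
    "accurate": "ADJ", "new": "ADJ", "high": "ADJ",
    "in": "ADP", "up": "ADP",
    # "on" can also be a particle (PRT); a lookup table cannot tell the two apart.
    "on": "ADP",
    "quickly": "ADV", "awkwardly": "ADV", "really": "ADV",
    "and": "CONJ", "but": "CONJ", "or": "CONJ",
    "some": "DET", "the": "DET", "which": "DET",
    "sam": "NOUN", "paris": "NOUN", "cat": "NOUN",
    "two": "NUM",
    "i": "PRON", "she": "PRON", "he": "PRON", "they": "PRON",
    "told": "VERB", "playing": "VERB", "would": "VERB",
}


def pos_tag(sentence: str) -> list[tuple[str, str]]:
    """Tag each token by dictionary lookup, with simple fallbacks for unknown tokens."""
    tagged = []
    for token in re.findall(r"\w+|[^\w\s]", sentence):
        word = token.lower()
        if word in LEXICON:
            tag = LEXICON[word]
        elif word.isdigit():
            tag = "NUM"
        elif re.fullmatch(r"[^\w\s]", token):
            tag = "."
        else:
            tag = "NOUN"  # most-frequent-class guess for unknown words
        tagged.append((token, tag))
    return tagged


print(pos_tag("Sam told the cat two new stories."))
# [('Sam', 'NOUN'), ('told', 'VERB'), ('the', 'DET'), ('cat', 'NOUN'),
#  ('two', 'NUM'), ('new', 'ADJ'), ('stories', 'NOUN'), ('.', '.')]
```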

Shallow Parsing or Chunking

Chunking is an NLP technique that takes the individual words of a sentence, usually after POS tagging, and groups them into higher-level phrases. For example:

  • Noun Phrase: my toddler daughter
  • Verb Phrase: is writing
  • Adjective Phrase: not accurate
  • Adverb Phrase: very quickly
  • Prepositional Phrase: in the storefront window
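Noun-phrase chunking is often expressed as a pattern over POS tags. This minimal sketch implements the pattern “optional determiner, any number of adjectives, one or more nouns” (DET? ADJ* NOUN+) with a plain loop. The pre-tagged input sentence is hand-labeled and hypothetical, and real grammars cover many more cases (for example, NLTK’s `RegexpParser` accepts rules such as `NP: {<DT>?<JJ>*<NN>}`).

```python
def chunk_noun_phrases(tagged: list[tuple[str, str]]) -> list[str]:
    """Group runs of tokens matching DET? ADJ* NOUN+ into noun phrases."""
    phrases = []
    i = 0
    while i < len(tagged):
        j = i
        if tagged[j][1] == "DET":  # optional determiner
            j += 1
        while j < len(tagged) and tagged[j][1] == "ADJ":  # any number of adjectives
            j += 1
        k = j
        while k < len(tagged) and tagged[k][1] == "NOUN":  # one or more nouns
            k += 1
        if k > j:  # found at least one noun, so this is a complete noun phrase
            phrases.append(" ".join(word for word, _ in tagged[i:k]))
            i = k
        else:
            i += 1  # no phrase starts here; try the next token
    return phrases


# Hypothetical, hand-tagged input using the universal tags.
sentence = [
    ("I", "PRON"), ("saw", "VERB"), ("a", "DET"), ("new", "ADJ"), ("lamp", "NOUN"),
    ("in", "ADP"), ("the", "DET"), ("storefront", "NOUN"), ("window", "NOUN"),
]
print(chunk_noun_phrases(sentence))  # ['a new lamp', 'the storefront window']
```

Once noun phrases are found, a prepositional phrase such as “in the storefront window” is just an ADP token followed by a noun phrase, so chunkers are usually built up in layers like this.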

Named Entity Recognition

Named entity recognition (NER), also known as entity chunking or entity extraction, is a popular information extraction technique that finds named entities in text and classifies them into categories such as people, organizations, locations, and times.
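The simplest form of NER is a gazetteer lookup: match text against a list of known names and label each match with its type, plus a pattern for times. This sketch assumes a hypothetical gazetteer and a simple `H:MM` time pattern. Modern systems use trained models instead (for example, spaCy exposes recognized entities as `doc.ents`), which can recognize names that appear in no list.

```python
import re

# Hypothetical gazetteer: known entity names (lowercase) mapped to their types.
GAZETTEER = {
    "acme corp": "ORGANIZATION",
    "new york": "LOCATION",
    "paris": "LOCATION",
    "sam": "PERSON",
}
# Assumed time format for this sketch: H:MM or HH:MM.
TIME_PATTERN = re.compile(r"\b\d{1,2}:\d{2}\b")


def find_entities(text: str) -> list[tuple[str, str]]:
    """Return (entity text, type) pairs in the order they appear in the text."""
    lowered = text.lower()
    spans = []
    # Try longer names first so a multi-word name wins over any shorter overlapping entry.
    for name in sorted(GAZETTEER, key=len, reverse=True):
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", lowered):
            overlaps = any(s < m.end() and m.start() < e for s, e, _ in spans)
            if not overlaps:
                spans.append((m.start(), m.end(), GAZETTEER[name]))
    for m in TIME_PATTERN.finditer(text):
        spans.append((m.start(), m.end(), "TIME"))
    return [(text[s:e], label) for s, e, label in sorted(spans)]


print(find_entities("Sam flew from Paris to New York at 9:00."))
# [('Sam', 'PERSON'), ('Paris', 'LOCATION'), ('New York', 'LOCATION'), ('9:00', 'TIME')]
```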

Sentiment Analysis

Sentiment analysis examines a body of text to determine the opinion expressed in it. Typically, the text receives a score called its polarity, which indicates whether the opinion is positive, negative, or neutral. Sentiment analysis works best on subjective text, such as reviews and opinions, rather than on purely objective, factual text. It can be computed at several levels: for individual sentences, for paragraphs, or for the document as a whole.
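A lexicon-based scorer is one simple way to compute polarity at both the sentence and the document level. In this sketch, the word weights, the negation rule (flip the sign of a word that directly follows “not”, “never”, or “no”), and the function names are hypothetical. Established tools such as NLTK’s VADER `SentimentIntensityAnalyzer` use much larger lexicons and more rules, and deep learning models learn these cues from data.

```python
import re

# Hypothetical mini sentiment lexicon: word -> polarity weight.
SENTIMENT_LEXICON = {"good": 1.0, "great": 2.0, "love": 2.0, "bad": -1.0, "terrible": -2.0, "slow": -1.0}
NEGATORS = {"not", "never", "no"}


def polarity(text: str) -> float:
    """Sum word weights, flipping the sign of a word that directly follows a negator."""
    words = re.findall(r"[a-z']+", text.lower())
    score = 0.0
    for i, word in enumerate(words):
        weight = SENTIMENT_LEXICON.get(word, 0.0)
        if i > 0 and words[i - 1] in NEGATORS:
            weight = -weight
        score += weight
    return score


def label(score: float) -> str:
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def analyze(document: str):
    """Return per-sentence labels and an overall document label."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", document.strip()) if s]
    return [(s, label(polarity(s))) for s in sentences], label(polarity(document))


per_sentence, overall = analyze("The dashboard is great. The search is not good. It runs on Linux.")
print(per_sentence)
# [('The dashboard is great.', 'positive'), ('The search is not good.', 'negative'),
#  ('It runs on Linux.', 'neutral')]
print(overall)  # positive
```

The factual third sentence scores neutral, which matches the point above: polarity is most informative on subjective text.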

Question: What is the biggest challenge NLP faces?

Answer: As mentioned, Natural Language Understanding is an AI-hard problem. Because language is highly ambiguous and does not conform to a fixed set of rules, we are still at the beginning of Natural Language Understanding. Even so, we are clearly making strides in this field, and will one day soon be able to have machines understand us.

Query your data in plain English

At Query.AI, our NLP technology makes it easy for your security analysts to use plain English to query your data, removing the time and ramp-up required to learn the specific query language and processes for each of your data sources. Query.AI also helps new users learn their native platform’s pipeline syntax if they choose to.