## Introduction
Natural Language Processing (NLP) is growing quickly, driven by applications such as chatbots and machine translation. It is also a key part of progress in artificial intelligence, because it lets systems understand and interact with humans.
Reference materials: 
https://textblob.readthedocs.io/en/dev/quickstart.html#sentiment-analysis

https://textblob.readthedocs.io/en/dev/advanced_usage.html#advanced

## TextBlob
TextBlob aims to provide access to common text-processing operations through a familiar interface. You can treat TextBlob objects as if they were Python strings that learned how to do Natural Language Processing.

In [1]:
!pip install textblob # use this to install if you haven't done that yet



In [3]:
from textblob import TextBlob
wiki = TextBlob("Python is a high-level, general-purpose programming language.")

In [4]:
# Noun Phrase Extraction
wiki.noun_phrases

WordList(['python'])

## Sentiment Analysis
The `sentiment` property returns two scores. `polarity` lies in [-1, 1], where -1 is most negative and 1 is most positive. `subjectivity` lies in [0, 1], where 0 is objective and 1 is subjective.

In [6]:
# Sentiment Analysis
testimonial = TextBlob("Textblob is amazingly simple to use. What great fun!")

print(testimonial.sentiment)

testimonial.sentiment.polarity

Sentiment(polarity=0.39166666666666666, subjectivity=0.4357142857142857)


0.39166666666666666
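
A polarity score is often easier to read as a label. The helper below is a minimal sketch of one way to do that in plain Python. The function name `label_polarity` and the `threshold` of 0.1 are assumptions made for illustration; neither is part of TextBlob.

```python
def label_polarity(polarity, threshold=0.1):
    """Map a polarity score in [-1, 1] to a coarse label.

    `threshold` is a hypothetical cut-off, not a TextBlob setting:
    scores within +/- threshold of zero are treated as neutral.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must lie in [-1, 1]")
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


# The polarity reported for the testimonial above
print(label_polarity(0.39166666666666666))  # positive
```

A suitable threshold depends on the data, so it is worth checking a few hand-labelled examples before relying on any particular value.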

## Tokenization
You can easily split a TextBlob into words or sentences.

In [13]:
data233 = TextBlob("You can learn a lot of interesting things in DATA 233. It helps every student learning data science. They are beautiful. Most of the time, Beautiful is better than ugly.\nIt is simple\nSimple is better than complex.")

print(data233.words)

print(data233.sentences)


['You', 'can', 'learn', 'a', 'lot', 'of', 'interesting', 'things', 'in', 'DATA', '233', 'It', 'helps', 'every', 'student', 'learning', 'data', 'science', 'They', 'are', 'beautiful', 'Most', 'of', 'the', 'time', 'Beautiful', 'is', 'better', 'than', 'ugly', 'It', 'is', 'simple', 'Simple', 'is', 'better', 'than', 'complex']
[Sentence("You can learn a lot of interesting things in DATA 233."), Sentence("It helps every student learning data science."), Sentence("They are beautiful."), Sentence("Most of the time, Beautiful is better than ugly."), Sentence("It is simple
Simple is better than complex.")]
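
To give a feel for what tokenization involves, here is a rough sketch using only regular expressions from the standard library. It is an approximation for illustration and not TextBlob's tokenizer; the function names `simple_words` and `simple_sentences` are hypothetical.

```python
import re


def simple_words(text):
    """Return runs of letters, digits, and apostrophes; punctuation is dropped."""
    return re.findall(r"[A-Za-z0-9']+", text)


def simple_sentences(text):
    """Split after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]


sample = "Beautiful is better than ugly. Simple is better than complex."
print(simple_words(sample))
print(simple_sentences(sample))
```

As in the output above, a line break with no closing punctuation before it does not start a new sentence in this sketch, so "It is simple\nSimple is better than complex." stays in one piece.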


## Word Inflection and Lemmatization
Inflection is a process of word formation in which characters are added to the base form of a word to express grammatical meaning. Lemmatization goes the other way and reduces a word to its dictionary form. Word inflection in TextBlob is simple: the words tokenized from a TextBlob can easily be converted to singular or plural.

In [14]:
print(data233.sentences[1].words[1])
print(data233.sentences[1].words[1].singularize())

helps
help


In [16]:
# Pluralization and lemmatization on single words
from textblob import Word
w = Word('swim')
print(w.pluralize())

w = Word('running')
print(w.lemmatize("v"))  # "v" tells the lemmatizer to treat the word as a verb

swims
run
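
The idea of adding characters to a base form can be sketched with a few suffix rules. The function below is a deliberately naive illustration, not TextBlob's implementation: the name `naive_pluralize` is hypothetical, and the rules ignore irregular nouns such as "child" or "mouse".

```python
def naive_pluralize(word):
    """Pluralize a regular English noun with three simple suffix rules."""
    vowels = "aeiou"
    # box -> boxes, class -> classes, dish -> dishes
    if word.endswith(("s", "x", "z", "ch", "sh")):
        return word + "es"
    # city -> cities, but day -> days (vowel before the y)
    if len(word) > 1 and word.endswith("y") and word[-2] not in vowels:
        return word[:-1] + "ies"
    # default rule: student -> students
    return word + "s"


for noun in ["swim", "box", "city", "day"]:
    print(noun, "->", naive_pluralize(noun))
```

Rule-based approaches like this cover only the regular cases, which is one reason to prefer a library for real text.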


## N-grams
An N-gram is a contiguous sequence of N words from a text.

In [18]:
# Change the 2 to 3 below and check the output
for ngram in data233.ngrams(2):
    print(ngram)

['You', 'can']
['can', 'learn']
['learn', 'a']
['a', 'lot']
['lot', 'of']
['of', 'interesting']
['interesting', 'things']
['things', 'in']
['in', 'DATA']
['DATA', '233']
['233', 'It']
['It', 'helps']
['helps', 'every']
['every', 'student']
['student', 'learning']
['learning', 'data']
['data', 'science']
['science', 'They']
['They', 'are']
['are', 'beautiful']
['beautiful', 'Most']
['Most', 'of']
['of', 'the']
['the', 'time']
['time', 'Beautiful']
['Beautiful', 'is']
['is', 'better']
['better', 'than']
['than', 'ugly']
['ugly', 'It']
['It', 'is']
['is', 'simple']
['simple', 'Simple']
['Simple', 'is']
['is', 'better']
['better', 'than']
['than', 'complex']
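
The output above follows a sliding-window pattern, which can be sketched in a few lines of plain Python. This is an illustration of the idea under the assumption that the text is already split into words; the function name `ngrams` here is a standalone helper, not TextBlob's method.

```python
def ngrams(words, n=2):
    """Return every run of n consecutive words as a list of lists."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A list of k words has k - n + 1 windows of length n (none if k < n)
    return [words[i:i + n] for i in range(len(words) - n + 1)]


words = "Simple is better than complex".split()
for gram in ngrams(words, 3):
    print(gram)
```

The window count explains the length of the output above: the text has 38 words, so it yields 37 bigrams.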
