{ "cells": [ { "cell_type": "markdown", "id": "9f4cf91a", "metadata": {}, "source": [ "## Introduction\n", "Natural Language Processing (NLP) is growing quickly, driven by applications such as chatbots and machine translation. It is also one of the keys to progress in artificial intelligence, because it lets systems understand and interact with humans.\n", "\n", "Reference materials:\n", "\n", "https://textblob.readthedocs.io/en/dev/quickstart.html#sentiment-analysis\n", "\n", "https://textblob.readthedocs.io/en/dev/advanced_usage.html#advanced" ] }, { "cell_type": "markdown", "id": "7b6c0731", "metadata": {}, "source": [ "## TextBlob\n", "TextBlob aims to provide access to common text-processing operations through a familiar interface. You can treat TextBlob objects as if they were Python strings that learned how to do Natural Language Processing." ] }, { "cell_type": "code", "execution_count": 1, "id": "224e1455", "metadata": {}, "outputs": [], "source": [ "!pip install textblob # run this if TextBlob is not installed yet" ] }, { "cell_type": "code", "execution_count": 3, "id": "a5419171", "metadata": {}, "outputs": [], "source": [ "from textblob import TextBlob\n", "\n", "# Wrap the text in a TextBlob to access its NLP properties\n", "wiki = TextBlob(\"Python is a high-level, general-purpose programming language.\")" ] }, { "cell_type": "code", "execution_count": 4, "id": "91c83922", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "WordList(['python'])" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Noun Phrase Extraction\n", "wiki.noun_phrases" ] }, { "cell_type": "markdown", "id": "89ebcaf4", "metadata": {}, "source": [ "## Sentiment Analysis\n", "The `sentiment` property returns a polarity score and a subjectivity score. Polarity lies in [-1, 1], where -1 means negative and 1 means positive. Subjectivity lies in [0, 1], where 0 means objective and 1 means subjective." ] }, { "cell_type": "code", "execution_count": 6, "id": "1baeb67d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Sentiment(polarity=0.39166666666666666, subjectivity=0.4357142857142857)\n" ] }, { "data": { "text/plain": [ "0.39166666666666666" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Sentiment Analysis\n", "testimonial = TextBlob(\"Textblob is amazingly simple to use. 
What great fun!\")\n", "\n", "print(testimonial.sentiment)\n", "\n", "testimonial.sentiment.polarity" ] }, { "cell_type": "markdown", "id": "0aabbf3e", "metadata": {}, "source": [ "## Tokenization\n", "TextBlob makes it easy to split text into words and sentences." ] }, { "cell_type": "code", "execution_count": 13, "id": "87a04f4b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['You', 'can', 'learn', 'a', 'lot', 'of', 'interesting', 'things', 'in', 'DATA', '233', 'It', 'helps', 'every', 'student', 'learning', 'data', 'science', 'They', 'are', 'beautiful', 'Most', 'of', 'the', 'time', 'Beautiful', 'is', 'better', 'than', 'ugly', 'It', 'is', 'simple', 'Simple', 'is', 'better', 'than', 'complex']\n", "[Sentence(\"You can learn a lot of interesting things in DATA 233.\"), Sentence(\"It helps every student learning data science.\"), Sentence(\"They are beautiful.\"), Sentence(\"Most of the time, Beautiful is better than ugly.\"), Sentence(\"It is simple\n", "Simple is better than complex.\")]\n" ] } ], "source": [ "data233 = TextBlob(\"You can learn a lot of interesting things in DATA 233. It helps every student learning data science. They are beautiful. Most of the time, Beautiful is better than ugly.\\nIt is simple\\nSimple is better than complex.\")\n", "\n", "print(data233.words)\n", "\n", "print(data233.sentences)\n" ] }, { "cell_type": "markdown", "id": "6f26fa94", "metadata": {}, "source": [ "## Word Inflection and Lemmatization\n", "Inflection is a process of word formation in which characters are added to the base form of a word to express grammatical meaning. Word inflection in TextBlob is simple: the words tokenized from a TextBlob can easily be changed into their singular or plural forms."
 ] }, { "cell_type": "code", "execution_count": 14, "id": "d1402914", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "helps\n", "help\n" ] } ], "source": [ "# Second word of the second sentence, before and after singularizing\n", "print(data233.sentences[1].words[1])\n", "print(data233.sentences[1].words[1].singularize())" ] }, { "cell_type": "code", "execution_count": 16, "id": "e23661f1", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "swims\n", "run\n" ] } ], "source": [ "# Inflection and lemmatization on single words\n", "from textblob import Word\n", "w = Word('swim')\n", "print(w.pluralize())\n", "\n", "w = Word('running')\n", "print(w.lemmatize(\"v\"))  # \"v\" is the part-of-speech tag for verb" ] }, { "cell_type": "markdown", "id": "252c7ad0", "metadata": {}, "source": [ "## N-grams\n", "An N-gram is a sequence of N consecutive words taken from a text." ] }, { "cell_type": "code", "execution_count": 18, "id": "e285399f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['You', 'can']\n", "['can', 'learn']\n", "['learn', 'a']\n", "['a', 'lot']\n", "['lot', 'of']\n", "['of', 'interesting']\n", "['interesting', 'things']\n", "['things', 'in']\n", "['in', 'DATA']\n", "['DATA', '233']\n", "['233', 'It']\n", "['It', 'helps']\n", "['helps', 'every']\n", "['every', 'student']\n", "['student', 'learning']\n", "['learning', 'data']\n", "['data', 'science']\n", "['science', 'They']\n", "['They', 'are']\n", "['are', 'beautiful']\n", "['beautiful', 'Most']\n", "['Most', 'of']\n", "['of', 'the']\n", "['the', 'time']\n", "['time', 'Beautiful']\n", "['Beautiful', 'is']\n", "['is', 'better']\n", "['better', 'than']\n", "['than', 'ugly']\n", "['ugly', 'It']\n", "['It', 'is']\n", "['is', 'simple']\n", "['simple', 'Simple']\n", "['Simple', 'is']\n", "['is', 'better']\n", "['better', 'than']\n", "['than', 'complex']\n" ] } ], "source": [ "# Change N from 2 to 3 below and compare the output\n", "for ngram in data233.ngrams(2):\n", "    print(ngram)" ] }, { "cell_type": "code",
"execution_count": null, "id": "500eec42", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.9" } }, "nbformat": 4, "nbformat_minor": 5 }