Part-of-Speech Tagging

In this lesson, we’re going to learn about the textual analysis methods part-of-speech tagging and keyword extraction. These methods will help us computationally parse sentences and better understand words in context.


[Charles] Babbage, who called [Ada Lovelace] the “enchantress of numbers,” once wrote that she “has thrown her magical spell around the most abstract of Sciences and has grasped it with a force which few masculine intellects (in our own country at least) could have exerted over it.”

—Claire Cain Miller, “Ada Lovelace,” NYT Overlooked Obituaries


Why is Part-of-Speech Tagging Useful?

I don’t mean to go all Language Nerd on you, but parts of speech are important. Even if they seem kind of boring. Parts of speech are the grammatical units of language — such as (in English) nouns, verbs, adjectives, adverbs, pronouns, and prepositions. Each of these parts of speech plays a different role in a sentence.

https://imgs.xkcd.com/comics/language_nerd.png

By computationally identifying parts of speech, we can start computationally exploring syntax, the relationship between words — rather than only focusing on words in isolation, as we did with tf-idf. Though parts of speech may seem pedantic, they help computers (and us) crack at that ever-elusive abstract noun — meaning.

spaCy and Natural Language Processing (NLP)

To computationally identify parts of speech, we’re going to use the natural language processing library spaCy. For a more extensive introduction to NLP and spaCy, see the previous lesson.

To parse sentences, spaCy relies on machine learning models that were trained on large amounts of labeled text data. The English-language spaCy model that we’re going to use in this lesson was trained on an annotated corpus called “OntoNotes”: 2 million+ words drawn from “news, broadcast, talk shows, weblogs, usenet newsgroups, and conversational telephone speech,” which were meticulously tagged by a group of researchers and professionals for people’s names and places, for nouns and verbs, for subjects and objects, and much more.

Install spaCy

To use spaCy, we first need to install the library.

!pip install -U spacy

Import Libraries

Then we’re going to import spacy and displacy, a special spaCy module for visualization.

import spacy
from spacy import displacy
from collections import Counter
import pandas as pd
pd.set_option("display.max_rows", 400)
pd.set_option("display.max_colwidth", 400)

We’re also going to import the Counter module for counting nouns, verbs, adjectives, etc., and the pandas library for organizing and displaying data (we’re also changing pandas’s default display settings for maximum rows and column width).
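Before we use it on real texts, here’s a minimal sketch of how Counter works: it tallies the items in a list and reports the most frequent ones, the same pattern we’ll apply to POS-tagged words below.

```python
from collections import Counter

# Tally the items in a list, then ask for the single most frequent item
tally = Counter(['NOUN', 'VERB', 'NOUN', 'ADJ'])
print(tally.most_common(1))  # [('NOUN', 2)]
```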

Download Language Model

Next we need to download the English-language model (en_core_web_sm), which will be processing and making predictions about our texts. This is the model that was trained on the annotated “OntoNotes” corpus. You can download the en_core_web_sm model by running the cell below:

!python -m spacy download en_core_web_sm
Requirement already satisfied: en_core_web_sm==2.1.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.1.0/en_core_web_sm-2.1.0.tar.gz#egg=en_core_web_sm==2.1.0 in /Users/melaniewalsh/anaconda3/lib/python3.7/site-packages (2.1.0)
✔ Download and installation successful
You can now load the model via spacy.load('en_core_web_sm')

Note: spaCy offers models for other languages including German, French, Spanish, Portuguese, Italian, Dutch, Greek, Norwegian, and Lithuanian. Languages such as Russian, Ukrainian, Thai, Chinese, Japanese, Korean, and Vietnamese don’t currently have their own NLP models. However, spaCy offers language and tokenization support for many of these languages with external dependencies — such as KoNLPy for Korean or Jieba for Chinese.
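Even without a trained model, spaCy can tokenize text in many of these languages. As a quick sketch (assuming spacy.blank(), which builds a bare pipeline with tokenization rules only and no trained components):

```python
import spacy

# Create a blank Russian pipeline: tokenization rules only, no trained model
nlp_ru = spacy.blank("ru")
doc = nlp_ru("Добрый день, мир!")
print([token.text for token in doc])
```

A blank pipeline can’t tag parts of speech, but it correctly splits words and punctuation into tokens.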

Load Language Model

Once the model is downloaded, we need to load it with spacy.load() and assign it to the variable nlp.

nlp = spacy.load('en_core_web_sm')

Create a Processed spaCy Document

Whenever we use spaCy, our first step will be to create a processed spaCy document with the loaded NLP model nlp(). Most of the heavy NLP lifting is done in this line of code. After processing, the document object will contain tons of juicy language data — named entities, sentence boundaries, parts of speech — and the rest of our work will be devoted to accessing this information.

To test out spaCy’s part-of-speech tagging, we’ll begin by processing a sample sentence from Ada Lovelace’s obituary:

“[Charles] Babbage, who called [Ada Lovelace] the “enchantress of numbers,” once wrote that she “has thrown her magical spell around the most abstract of Sciences and has grasped it with a force which few masculine intellects (in our own country at least) could have exerted over it.”

This sentence makes for an interesting example because it is syntactically complex and because it contains ambiguous words such as “spell,” “abstract,” and “force.”

sample = """[Charles] Babbage, who called [Ada Lovelace] the “enchantress of numbers,” once wrote that she “has thrown her magical spell around the most abstract of Sciences and has grasped it with a force which few masculine intellects (in our own country at least) could have exerted over it."""
document = nlp(sample)

spaCy Part-of-Speech Tagging

| POS | Description | Examples |
| --- | --- | --- |
| ADJ | adjective | big, old, green, incomprehensible, first |
| ADP | adposition | in, to, during |
| ADV | adverb | very, tomorrow, down, where, there |
| AUX | auxiliary | is, has (done), will (do), should (do) |
| CONJ | conjunction | and, or, but |
| CCONJ | coordinating conjunction | and, or, but |
| DET | determiner | a, an, the |
| INTJ | interjection | psst, ouch, bravo, hello |
| NOUN | noun | girl, cat, tree, air, beauty |
| NUM | numeral | 1, 2017, one, seventy-seven, IV, MMXIV |
| PART | particle | ’s, not |
| PRON | pronoun | I, you, he, she, myself, themselves, somebody |
| PROPN | proper noun | Mary, John, London, NATO, HBO |
| PUNCT | punctuation | ., (, ), ? |
| SCONJ | subordinating conjunction | if, while, that |
| SYM | symbol | $, %, §, ©, +, −, ×, ÷, =, :), 😝 |
| VERB | verb | run, runs, running, eat, ate, eating |
| X | other | sfpksdpsxmsa |
| SPACE | space | |

Above is a POS chart taken from spaCy’s website, which shows the different parts of speech that spaCy can identify as well as their corresponding labels. To quickly see spaCy’s POS tagging in action, we can use the spaCy module displacy on our sample document with the style= parameter set to “dep” (short for dependency parsing):

#Set some display options for the visualizer
options = {"compact": True, "distance": 90, "color": "yellow", "bg": "black", "font": "Gill Sans"}

displacy.render(document, style="dep", options=options)
(displaCy renders the dependency parse as labeled arcs between tokens, e.g.: She/PRON/nsubj, has/AUX/aux, thrown/VERB/ROOT, her/DET/poss, magical/ADJ/amod, spell/NOUN/dobj, around/ADP/prep, the/DET/det, most/ADV/advmod, abstract/ADJ/pobj, of/ADP/prep, Sciences/PROPN/pobj)

As you can see, spaCy has correctly identified that “spell” and “force” are nouns in our sample sentence:

for token in document:
    if token.pos_ == "NOUN":
        print(token, token.pos_)
spell NOUN
force NOUN
intellects NOUN
country NOUN

But if we look at the same words in a different context — in a sentence that I made up — spaCy can identify when these words have changed grammatical roles and meanings.

You shouldn’t force someone to learn how to spell Babbage. They just need practice. You can’t abstract it.

document = nlp("You shouldn't force someone to learn how to spell Babbage. They just need practice. You can't abstract it.")
for token in document:
    if token.pos_ == "VERB":
        print(token, token.pos_)
force VERB
learn VERB
spell VERB
need VERB
abstract VERB

Where previously spaCy had identified “force” and “spell” as nouns, here spaCy correctly identifies the words “force,” “spell,” and “abstract” as verbs.

Get Part-Of-Speech Tags

To get part-of-speech tags for every word in a document, we have to iterate through all the tokens in the document and pull out the .pos_ attribute for each one. We can get even finer-grained dependency information with the attribute .dep_.

for token in document:
    print(token.text, token.pos_, token.dep_)
You PRON nsubj
should AUX aux
n't ADV neg
force VERB ROOT
someone NOUN dobj
to PART aux
learn VERB xcomp
how ADV advmod
to PART aux
spell VERB xcomp
Babbage PROPN dobj
. PUNCT punct
They PRON nsubj
just ADV advmod
need VERB ROOT
practice NOUN dobj
. PUNCT punct
You PRON nsubj
ca AUX aux
n't ADV neg
abstract VERB ROOT
it PRON dobj
. PUNCT punct

Practicing with Dracula

filepath = "../texts/literature/Dracula_Bram-Stoker.txt"
document = nlp(open(filepath, encoding="utf-8").read())

Get Adjectives

| POS | Description | Examples |
| --- | --- | --- |
| ADJ | adjective | big, old, green, incomprehensible, first |
To extract and count the adjectives in Dracula, we will follow the same model as above, except we’ll add an if statement that will pull out words only if their POS label matches “ADJ.”

Python Review!

While we demonstrate how to extract parts of speech in the sections below, we’re also going to reinforce some integral Python skills. Notice how we use for loops and if statements to .append() specific words to a list. Then we count the words in the list and make a pandas dataframe from the list.

Here we make a list of the adjectives identified in Dracula:

adjs = []
for token in document:
    if token.pos_ == 'ADJ':
        adjs.append(token.text)
adjs
['available',
 'DEAR',
 '1st',
 'next',
 'wonderful',
 'little',
 'correct',
 'possible',
 'western',
 'splendid',
 'noble',
 'Turkish',
 'good',
 'red',
 'good',
 'thirsty',
 'national',
 'able',
 'German',
 'useful',
 ...]

Then we count the unique adjectives in this list with the Counter() module:

adjs_tally = Counter(adjs)
adjs_tally.most_common()
[('good', 192),
 ('old', 187),
 ('more', 185),
 ('other', 185),
 ('own', 184),
 ('great', 171),
 ('poor', 171),
 ('little', 163),
 ('dear', 145),
 ('much', 132),
 ('such', 129),
 ('last', 116),
 ('same', 110),
 ('many', 100),
 ('terrible', 99),
 ('full', 97),
 ('white', 97),
 ('long', 93),
 ('few', 86),
 ('strange', 85),
 ...]
 ('squat', 1),
 ('protuberant', 1),
 ('confess', 1),
 ('costliest', 1),
 ('fabulous', 1),
 ('frayed', 1),
 ('Blue', 1),
 ('crowded', 1),
 ('flattering', 1),
 ('smallest', 1),
 ('bolder', 1),
 ('unchecked', 1),
 ('artificial', 1),
 ('triumphant', 1),
 ('undiscovered', 1),
 ('inscribe', 1),
 ('dilapidated', 1),
 ('sided', 1),
 ('cardinal', 1),
 ('gloomy', 1),
 ('mediæval', 1),
 ('habitable', 1),
 ('Transylvanian', 1),
 ('saturnine', 1),
 ('conceivable', 1),
 ('preternatural', 1),
 ('remiss', 1),
 ('prosaic', 1),
 ('wretched', 1),
 ('bauble', 1),
 ('annoying', 1),
 ('magnificent', 1),
 ('menial', 1),
 ('tangible', 1),
 ('European', 1),
 ('conquering', 1),
 ('victorious', 1),
 ('unworthy', 1),
 ('dishonourable', 1),
 ('meagre', 1),
 ('afield', 1),
 ('thinnest', 1),
 ('unsealed', 1),
 ('inaccessible', 1),
 ('nocturnal', 1),
 ('melted', 1),
 ('impregnable', 1),
 ('bygone', 1),
 ('curtainless', 1),
 ('unchanged', 1),
 ('dreamy', 1),
 ('musical', 1),
 ('Sweet', 1),
 ('deliberate', 1),
 ('thrilling', 1),
 ('scarlet', 1),
 ('Lower', 1),
 ('slender', 1),
 ('soulless', 1),
 ('smothered', 1),
 ('aghast', 1),
 ('unquestionable', 1),
 ('unwound', 1),
 ('suavest', 1),
 ('madness', 1),
 ('fearless', 1),
 ('smoothest', 1),
 ('surest', 1),
 ('sturdy', 1),
 ('hetman', 1),
 ('unloaded', 1),
 ('nebulous', 1),
 ('phantom', 1),
 ('materialised', 1),
 ('dishevelled', 1),
 ('metallic', 1),
 ('vaporous', 1),
 ('crawl', 1),
 ('slid', 1),
 ('Roman', 1),
 ('Austrian', 1),
 ('Greek', 1),
 ('circular', 1),
 ('heavier', 1),
 ('stony', 1),
 ('genuine', 1),
 ('ponderous', 1),
 ('unlocked', 1),
 ('welcome', 1),
 ('fuller', 1),
 ('ruby', 1),
 ('filthy', 1),
 ('nethermost', 1),
 ('gipsy', 1),
 ('strained', 1),
 ('cursed', 1),
 ('stenographic', 1),
 ('hurried', 1),
 ('imperturbable', 1),
 ('tough', 1),
 ('psychological', 1),
 ('extravagant', 1),
 ('engaged', 1),
 ('exquisite', 1),
 ('American', 1),
 ('humoured', 1),
 ('sloppy', 1),
 ('rarer', 1),
 ('ætat', 1),
 ('sanguine', 1),
 ('centripetal', 1),
 ('paramount', 1),
 ('noblest', 1),
 ('sweeter', 1),
 ('lovelier', 1),
 ('romantic', 1),
 ('nicest', 1),
 ('crooked', 1),
 ('mournful', 1),
 ('cheap', 1),
 ('bothered', 1),
 ('dictatorial', 1),
 ('sermon', 1),
 ('illsome', 1),
 ('ireful', 1),
 ('acant', 1),
 ('poorish', 1),
 ('balm', 1),
 ('aftest', 1),
 ('opposite', 1),
 ('pious', 1),
 ('pantin', 1),
 ('gladsome', 1),
 ('stubble', 1),
 ('coming', 1),
 ('paved', 1),
 ('back', 1),
 ('wholesome', 1),
 ('rid', 1),
 ('Whole', 1),
 ('rudimentary', 1),
 ('obliterated', 1),
 ('sleek', 1),
 ('unprepared', 1),
 ('tame', 1),
 ('undeveloped', 1),
 ('cumulative', 1),
 ('hopeless', 1),
 ('churchyard', 1),
 ('August._--Another', 1),
 ('threatening', 1),
 ('whettin', 1),
 ('bringin', 1),
 ('queerest', 1),
 ('suddenest', 1),
 ('unique', 1),
 ('uncommon', 1),
 ('barometrical', 1),
 ('foretold', 1),
 ('emphatic', 1),
 ('pink', 1),
 ('colossal', 1),
 ('cobble', 1),
 ('mule', 1),
 ('noticeable', 1),
 ('prolific', 1),
 ('French', 1),
 ('convulsed', 1),
 ('dank', 1),
 ('clammy', 1),
 ('immeasurable', 1),
 ('effective', 1),
 ('gunwale', 1),
 ('sheltering', 1),
 ('damp', 1),
 ('headlong', 1),
 ('unsteered', 1),
 ('aft', 1),
 ('awed', 1),
 ('Accurate', 1),
 ('civilian', 1),
 ('delegated', 1),
 ('honourable', 1),
 ('technical', 1),
 ('East', 1),
 ...]

Then we make a dataframe from this list:

df = pd.DataFrame(adjs_tally.most_common(), columns=['adj', 'count'])
df[:100]
adj count
0 good 192
1 old 187
2 more 185
3 other 185
4 own 184
5 great 171
6 poor 171
7 little 163
8 dear 145
9 much 132
10 such 129
11 last 116
12 same 110
13 many 100
14 terrible 99
15 full 97
16 white 97
17 long 93
18 few 86
19 strange 85
20 first 76
21 new 74
22 open 71
23 ready 71
24 dead 69
25 whole 66
26 sweet 65
27 red 62
28 dark 61
29 strong 58
30 very 54
31 true 54
32 heavy 53
33 young 53
34 right 49
35 able 47
36 happy 47
37 asleep 46
38 quick 46
39 big 44
40 sure 44
41 small 43
42 cold 41
43 wild 41
44 best 40
45 certain 40
46 better 40
47 free 40
48 afraid 39
49 pale 39
50 alone 39
51 high 37
52 low 37
53 silent 36
54 quiet 35
55 glad 35
56 close 34
57 usual 33
58 thin 33
59 sad 33
60 possible 32
61 least 32
62 hard 32
63 present 32
64 bad 32
65 beautiful 31
66 awful 31
67 Good 31
68 next 29
69 mad 29
70 brave 29
71 wide 28
72 anxious 28
73 wonderful 27
74 empty 27
75 electronic 27
76 deep 26
77 only 26
78 late 25
79 horrible 25
80 sharp 25
81 necessary 25
82 fair 25
83 safe 25
84 black 24
85 grim 24
86 bright 24
87 sudden 24
88 fresh 24
89 tired 24
90 well 24
91 different 23
92 awake 23
93 common 23
94 most 22
95 enough 22
96 short 21
97 bitter 21
98 weak 21
99 noble 20

Get Nouns

POS | Description | Examples
NOUN | noun | girl, cat, tree, air, beauty

To extract and count nouns, we can follow the same model as above, except we will change our if statement to check for POS labels that match “NOUN”.

nouns = []
for token in document:
    if token.pos_ == 'NOUN':
        nouns.append(token.text)

nouns_tally = Counter(nouns)

df = pd.DataFrame(nouns_tally.most_common(), columns=['noun', 'count'])
df[:100]
noun count
0 time 385
1 night 314
2 man 251
3 room 231
4 way 222
5 day 218
6 hand 202
7 face 199
8 door 198
9 eyes 188
10 things 171
11 friend 165
12 work 162
13 life 144
14 heart 140
15 men 138
16 place 133
17 house 131
18 sleep 121
19 window 116
20 blood 112
21 one 110
22 moment 106
23 head 104
24 hands 104
25 morning 97
26 thing 91
27 bed 90
28 mind 88
29 death 88
30 others 82
31 sort 80
32 fear 76
33 child 74
34 case 72
35 husband 72
36 light 70
37 side 68
38 dear 67
39 rest 66
40 word 66
41 soul 65
42 world 62
43 box 62
44 ship 62
45 part 61
46 days 61
47 end 60
48 water 59
49 lips 59
50 woman 57
51 diary 57
52 hour 56
53 horses 56
54 times 56
55 brain 56
56 body 55
57 air 54
58 sun 53
59 fellow 52
60 voice 52
61 look 51
62 CHAPTER 50
63 words 50
64 earth 50
65 boxes 50
66 mother 48
67 trouble 48
68 people 47
69 letter 46
70 strength 46
71 silence 46
72 feet 46
73 power 46
74 kind 45
75 women 45
76 wolves 45
77 cause 45
78 thought 44
79 o'clock 43
80 throat 43
81 snow 42
82 sunset 42
83 sea 42
84 morrow 42
85 teeth 42
86 knowledge 42
87 key 41
88 instant 41
89 friends 41
90 matter 41
91 duty 40
92 fire 40
93 patient 40
94 castle 39
95 sight 39
96 minutes 39
97 wind 39
98 none 39
99 pain 39
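Notice that the table counts “hand” and “hands” (and “thing” and “things”) separately. Every spaCy token also carries a .lemma_ attribute — its dictionary base form — so one option is to tally lemmas instead of surface forms, e.g. [token.lemma_ for token in document if token.pos_ == 'NOUN']. Here’s the idea in miniature, with hand-made (text, lemma) pairs standing in for spaCy tokens:

```python
from collections import Counter

# Hypothetical (text, lemma) pairs standing in for spaCy noun tokens
noun_tokens = [('hand', 'hand'), ('hands', 'hand'),
               ('eyes', 'eye'), ('thing', 'thing'), ('things', 'thing')]

# Tally the lemmas, merging singular and plural forms of the same noun
lemma_tally = Counter(lemma for text, lemma in noun_tokens)
lemma_tally.most_common()
# [('hand', 2), ('thing', 2), ('eye', 1)]
```

Whether you want this merging depends on your research question — sometimes the difference between “eye” and “eyes” is exactly what you’re after.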

Get Verbs

POS | Description | Examples
VERB | verb | run, runs, running, eat, ate, eating

To extract and count verbs, we can follow a similar model to the examples above. This time, however, we’re going to make our code even more economical and efficient (while still changing our if statement to match the POS label “VERB”).

Python Review!

We can use a list comprehension to get our list of verbs in a single line of code! Closely examine the first line of code below:

verbs = [token.text for token in document if token.pos_ == 'VERB']

verbs_tally = Counter(verbs)

df = pd.DataFrame(verbs_tally.most_common(), columns=['verb', 'count'])
df[:100]
verb count
0 could 504
1 said 461
2 can 459
3 must 447
4 would 441
5 will 431
6 shall 425
7 know 396
8 may 394
9 see 376
10 came 307
11 went 298
12 come 295
13 go 271
14 seemed 242
15 took 223
16 saw 216
17 think 216
18 should 197
19 made 196
20 looked 186
21 tell 177
22 make 164
23 might 158
24 got 157
25 found 154
26 told 144
27 say 141
28 asked 139
29 take 136
30 knew 130
31 done 128
32 find 114
33 let 113
34 want 112
35 thought 110
36 began 109
37 put 106
38 hear 101
39 coming 100
40 look 100
41 seen 95
42 keep 94
43 heard 91
44 looking 88
45 felt 87
46 left 84
47 turned 84
48 stood 80
49 opened 80
50 read 79
51 give 78
52 help 78
53 feel 77
54 lay 74
55 held 73
56 seems 72
57 gone 72
58 sleep 69
59 sat 69
60 ask 68
61 gave 67
62 set 66
63 seem 65
64 believe 65
65 going 64
66 spoke 64
67 try 64
68 speak 62
69 tried 62
70 be 62
71 had 61
72 write 61
73 fear 59
74 fell 57
75 kept 56
76 understand 55
77 passed 55
78 leave 55
79 suppose 53
80 love 51
81 ran 50
82 answered 50
83 grew 49
84 taken 47
85 used 47
86 lost 45
87 die 45
88 called 44
89 wanted 44
90 like 44
91 says 44
92 stopped 43
93 need 43
94 wish 43
95 wait 42
96 given 42
97 became 42
98 moved 42
99 Come 42
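A quick Python aside: the list comprehension we used to collect the verbs is exactly equivalent to an explicit for loop with an if statement. Here is the same filter-and-collect pattern written both ways, on a toy list of hand-made (word, POS) pairs:

```python
# Hypothetical (word, POS) pairs standing in for spaCy tokens
pairs = [('ran', 'VERB'), ('dog', 'NOUN'), ('barked', 'VERB'), ('old', 'ADJ')]

# Loop version
verbs_loop = []
for word, pos in pairs:
    if pos == 'VERB':
        verbs_loop.append(word)

# Equivalent one-line list comprehension
verbs_comp = [word for word, pos in pairs if pos == 'VERB']

verbs_loop == verbs_comp == ['ran', 'barked']
# True
```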

Keyword Extraction

Get Sentences with Keyword

spaCy can also identify sentences in a document. To access sentences, we can iterate through document.sents and pull out the .text of each sentence.

We can use spaCy’s sentence-parsing capabilities to extract sentences that contain particular keywords, such as in the function below.

With the function find_sentences_with_keyword(), we will iterate through document.sents and pull out any sentence that contains a particular “keyword.” Then we will display these sentences with the keyword bolded.

import re
from IPython.display import Markdown, display

def find_sentences_with_keyword(keyword, document):

    #Iterate through all the sentences in the document and pull out the text of each sentence
    for sentence in document.sents:
        sentence = sentence.text

        #Check to see if the keyword is in the sentence (and ignore capitalization by making both lowercase)
        if keyword.lower() in sentence.lower():

            #Use the regex library to replace linebreaks and to make the keyword bolded, again ignoring capitalization
            #(re.escape ensures the keyword is treated as literal text, not as a regex pattern)
            sentence = re.sub('\n', ' ', sentence)
            sentence = re.sub(re.escape(keyword), f"**{keyword}**", sentence, flags=re.IGNORECASE)

            display(Markdown(sentence))

find_sentences_with_keyword(keyword="telegram", document=document)

_telegram from Arthur Holmwood to Quincey P. Morris.

_telegram, Arthur Holmwood to Seward.

You must send to me the telegram every day; and if there be cause I shall come again.

_telegram, Seward, London, to Van Helsing, Amsterdam.

_telegram, Seward, London, to Van Helsing, Amsterdam.

_telegram, Seward, London, to Van Helsing, Amsterdam.

I hold over telegram to Holmwood till have seen you."

"I waited till I had seen you, as I said in my telegram.

A telegram came from Van Helsing at Amsterdam whilst

_telegram, Van Helsing, Antwerp, to Seward, Carfax.

Helsing's telegram filled me with dismay.

Did you not get my telegram?

I answered as quickly and coherently as I could that I had only got his telegram early in the morning, and had not lost a minute in coming here, and that I could not make any one in the house hear me.

He handed me a telegram:--

In the hall I met Quincey Morris, with a telegram for Arthur telling him that Mrs. Westenra was dead; that Lucy also had been ill, but was now going on better; and that Van Helsing and I were with her.

Later.--A sad home-coming in every way--the house empty of the dear soul who was so good to us; Jonathan still pale and dizzy under a slight relapse of his malady; and now a telegram from Van Helsing,

_telegram, Mrs. Harker to Van Helsing.

When we arrived at the Berkeley Hotel, Van Helsing found a telegram waiting for him:--

I have sent a telegram to Jonathan to come on here when he arrives in London from Whitby.

About half an hour after we had received Mrs. Harker's telegram, there came a quiet, resolute knock at the hall door.

Nota bene, in Madam's telegram he went south from Carfax,

Lord Godalming went to the Consulate to see if any telegram had arrived for him, whilst the rest of us came on to this

He had four telegrams, one each day since we started, and all to the same effect: that the Czarina Catherine had not been reported to Lloyd's from anywhere.

He had arranged before leaving London that his agent should send him every day a telegram saying if the ship had been reported.

Daily telegrams to Godalming, but only the same story: "Not yet reported.

_telegram, October 24th.

We were all wild with excitement yesterday when Godalming got his telegram from Lloyd's.

The telegrams from London have been the same: "no further report."

28 October.--telegram.

the telegram came announcing the arrival in Galatz
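The bolding inside find_sentences_with_keyword() comes from a single re.sub() call with the re.IGNORECASE flag. Run on its own with a made-up sentence, the substitution looks like this:

```python
import re

sentence = "Did you not get my Telegram?"  # hypothetical example sentence
keyword = "telegram"

# Wrap every case-insensitive match of the keyword in Markdown bold markers
re.sub(keyword, f"**{keyword}**", sentence, flags=re.IGNORECASE)
# 'Did you not get my **telegram**?'
```

Note that the replacement string is the lowercased keyword, so a capitalized match like “Telegram” comes back lowercased — the same behavior as the function above.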

Get Keyword in Context

We can also find out about a keyword’s more immediate context — its neighboring words to the left and right — and we can fine-tune our search with POS tagging.

To do so, we will first create a list of what’s called ngrams. “Ngrams” are any sequence of n tokens in a text. They’re an important concept in computational linguistics and NLP. (Have you ever played with Google’s Ngram Viewer?)
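For example, with a made-up five-word sentence, the bigrams are every overlapping pair of neighboring words:

```python
# Hypothetical word list, not drawn from the novel
words = ['the', 'castle', 'door', 'was', 'locked']

# Slide a window of size 2 across the list
bigrams = [words[i:i + 2] for i in range(len(words) - 1)]
bigrams
# [['the', 'castle'], ['castle', 'door'], ['door', 'was'], ['was', 'locked']]
```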

Below we’re going to make a list of bigrams, that is, all the two-word combinations from Dracula. We’re going to use these bigrams to find the neighboring words that appear alongside particular keywords.

#Make a list of tokens and POS labels from document if the token is a word 
tokens_and_labels = [(token.text, token.pos_) for token in document if token.is_alpha]
#Make a function to get all two-word combinations
def get_bigrams(word_list, number_consecutive_words=2):
    
    ngrams = []
    adj_length_of_word_list = len(word_list) - (number_consecutive_words - 1)
    
    #Loop through numbers from 0 to the (slightly adjusted) length of your word list
    for word_index in range(adj_length_of_word_list):
        
        #Index the list at each number, grabbing the word at that number index as well as N number of words after it
        ngram = word_list[word_index : word_index + number_consecutive_words]
        
        #Append this word combo to the master list "ngrams"
        ngrams.append(ngram)
        
    return ngrams
bigrams = get_bigrams(tokens_and_labels)

Let’s take a peek at the bigrams:

bigrams[5:20]
[[('by', 'ADP'), ('Bram', 'PROPN')],
 [('Bram', 'PROPN'), ('Stoker', 'PROPN')],
 [('Stoker', 'PROPN'), ('This', 'DET')],
 [('This', 'DET'), ('eBook', 'NOUN')],
 [('eBook', 'NOUN'), ('is', 'AUX')],
 [('is', 'AUX'), ('for', 'ADP')],
 [('for', 'ADP'), ('the', 'DET')],
 [('the', 'DET'), ('use', 'NOUN')],
 [('use', 'NOUN'), ('of', 'ADP')],
 [('of', 'ADP'), ('anyone', 'PRON')],
 [('anyone', 'PRON'), ('anywhere', 'ADV')],
 [('anywhere', 'ADV'), ('at', 'ADP')],
 [('at', 'ADP'), ('no', 'DET')],
 [('no', 'DET'), ('cost', 'NOUN')],
 [('cost', 'NOUN'), ('and', 'CCONJ')]]

Now that we have our list of bigrams, we’re going to make a function get_neighbor_words(). This function will return the most frequent words that appear next to a particular keyword. The function can also be fine-tuned to return neighbor words that match a certain part of speech by changing the pos_label parameter.

def get_neighbor_words(keyword, bigrams, pos_label=None):

    neighbor_words = []
    keyword = keyword.lower()

    for bigram in bigrams:

        #Extract just the lowercased words (not the labels) for each bigram
        words = [word.lower() for word, label in bigram]

        #Check to see if keyword is in the bigram
        if keyword in words:

            for word, label in bigram:

                #Now focus on the neighbor word, not the keyword
                if word.lower() != keyword:
                    #If no pos_label was given, or the neighbor word matches it, append the word to the master list
                    if pos_label is None or label == pos_label:
                        neighbor_words.append(word.lower())

    return Counter(neighbor_words).most_common()

get_neighbor_words("telegram", bigrams)
[('a', 6),
 ('from', 3),
 ('seward', 3),
 ('arthur', 2),
 ('the', 2),
 ('to', 2),
 ('my', 2),
 ('i', 2),
 ('came', 2),
 ('helsing', 2),
 ('his', 2),
 ('harker', 2),
 ('morris', 1),
 ('every', 1),
 ('see', 1),
 ('day', 1),
 ('back', 1),
 ('over', 1),
 ('it', 1),
 ('van', 1),
 ('filled', 1),
 ('early', 1),
 ('for', 1),
 ('waiting', 1),
 ('there', 1),
 ('madam', 1),
 ('he', 1),
 ('any', 1),
 ('had', 1),
 ('saying', 1),
 ('masts', 1),
 ('october', 1)]
get_neighbor_words("telegram", bigrams, pos_label='VERB')
[('came', 2), ('see', 1), ('filled', 1), ('waiting', 1), ('saying', 1)]
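To see what the pos_label filter is doing in miniature, here’s the same neighbor-word logic run by hand on a few made-up bigrams:

```python
from collections import Counter

# Hand-made (word, POS) bigrams standing in for the real list (hypothetical data)
toy_bigrams = [[('a', 'DET'), ('telegram', 'NOUN')],
               [('telegram', 'NOUN'), ('came', 'VERB')],
               [('telegram', 'NOUN'), ('arrived', 'VERB')]]

# Keep only the neighbors of "telegram" that are tagged VERB
verb_neighbors = [word.lower() for bigram in toy_bigrams
                  for word, label in bigram
                  if word.lower() != 'telegram' and label == 'VERB']

Counter(verb_neighbors).most_common()
# [('came', 1), ('arrived', 1)]
```

The determiner “a” is filtered out, leaving only the verbs that appear next to the keyword.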

Your Turn!

Try out find_sentences_with_keyword() and get_neighbor_words() with your own keywords of interest.

find_sentences_with_keyword(keyword="YOUR KEY WORD", document=document)
get_neighbor_words(keyword="YOUR KEY WORD", bigrams=bigrams, pos_label=None)