TF-IDF

In this lesson, we’re going to learn about a text analysis method called term frequency–inverse document frequency (tf–idf). This method will help us identify the most distinctive words in a document, relative to the other documents in a given corpus.

Simple Formula

Calculating the most frequent words in a text can be useful. But often the most frequent words in a text aren’t the most interesting words in a text.

Term frequency–inverse document frequency is a method that tries to help with this problem because it identifies words that are frequent in one text but rare across the other texts it is compared to.

tf_idf = term_frequency * inverse_document_frequency

term_frequency = number of times a given term appears in document

inverse_document_frequency = log(total number of documents / number of documents with term) + 1

We’re going to calculate and compare the tf–idf scores for the word “said” and the word “pigeons” in “The Girl Who Raised Pigeons,” the first short story in Lost in the City.

We need the log() function for our calculation, so we’re going to import it from Python’s built-in math module. Called with a single argument, log() returns the natural logarithm.

from math import log

“said”

total_number_of_documents = 14 ##total number of short stories in *Lost in the City*
number_of_documents_with_term = 13 ##number of short stories that contain the word "said"
term_frequency = 47 ##number of times "said" appears in "The Girl Who Raised Pigeons"
inverse_document_frequency = log(total_number_of_documents / number_of_documents_with_term) + 1
term_frequency * inverse_document_frequency
50.48307469122493

“pigeons”

total_number_of_documents = 14 ##total number of short stories in *Lost in the City*
number_of_documents_with_term = 2 ##number of short stories that contain the word "pigeons"
term_frequency = 30 ##number of times "pigeons" appears in "The Girl Who Raised Pigeons"
inverse_document_frequency = log(total_number_of_documents / number_of_documents_with_term) + 1
term_frequency * inverse_document_frequency
88.3773044716594

tf–idf Scores

“said” = 50.48
“pigeons” = 88.38

Though the word “said” appears 47 times in “The Girl Who Raised Pigeons” and the word “pigeons” only appears 30 times, “pigeons” has a higher tf–idf score than “said” because it’s a rarer word. The word “pigeons” appears in 2 of 14 stories, while “said” appears in 13 of 14 stories, almost all of them.
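
The two calculations above differ only in their inputs, so they can be wrapped in one small function. This is a minimal sketch of the simple formula from this lesson; the function name simple_tfidf is made up for illustration and is not part of any library.

```python
from math import log

def simple_tfidf(term_frequency, total_number_of_documents, number_of_documents_with_term):
    """Simple tf-idf from this lesson: tf * (ln(N / df) + 1)."""
    inverse_document_frequency = log(total_number_of_documents / number_of_documents_with_term) + 1
    return term_frequency * inverse_document_frequency

# Counts taken from the worked example for "The Girl Who Raised Pigeons"
said_score = simple_tfidf(47, 14, 13)
pigeons_score = simple_tfidf(30, 14, 2)

print(round(said_score, 2), round(pigeons_score, 2))  # 50.48 88.38
```

Writing the formula as a function makes it easy to try other words: only the term count and the number of stories containing the word change.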

tf–idf with scikit-learn

We could continue calculating tf–idf scores in this manner, doing all the math ourselves in Python, but conveniently there’s a Python library that can calculate tf–idf scores in just a few lines of code.

This library is called scikit-learn, imported as sklearn. It’s a popular Python library for machine learning approaches such as clustering, classification, and regression, among others. Though we’re not doing any machine learning in this lesson, we’re nevertheless going to use scikit-learn’s TfidfVectorizer and CountVectorizer.

!pip install scikit-learn
Requirement already satisfied: sklearn in /Users/melaniewalsh/anaconda3/lib/python3.7/site-packages (0.0)
Requirement already satisfied: scikit-learn in /Users/melaniewalsh/anaconda3/lib/python3.7/site-packages (from sklearn) (0.20.3)
Requirement already satisfied: numpy>=1.8.2 in /Users/melaniewalsh/anaconda3/lib/python3.7/site-packages (from scikit-learn->sklearn) (1.17.4)
Requirement already satisfied: scipy>=0.13.3 in /Users/melaniewalsh/anaconda3/lib/python3.7/site-packages (from scikit-learn->sklearn) (1.3.1)

Import Libraries

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_extraction.text import CountVectorizer
import pandas as pd
pd.set_option("display.max_rows", 600)
pd.set_option("display.max_columns", 200)
#pd.options.display.float_format = lambda value : '{:.0f}'.format(value) if round(value,0) == value else '{:,.3f}'.format(value)
from pathlib import Path  
import glob

We’re also going to import pandas and change two of its default display settings: we raise the maximum number of rows and the maximum number of columns that pandas will display. The commented-out line is an optional third setting that formats numbers in a special way: decimal numbers are shown to three decimal places, and whole numbers are shown without any decimal places.
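
The number-formatting rule is easier to see as a standalone function. The sketch below restates the logic of the commented-out lambda; the name format_number is an assumption made for illustration. To apply such a rule in pandas, it would be assigned to pd.options.display.float_format, as the commented-out line does.

```python
def format_number(value):
    """Whole numbers print without decimals; all other numbers print to three places."""
    if round(value, 0) == value:
        return '{:.0f}'.format(value)
    return '{:,.3f}'.format(value)

print(format_number(0.20712), format_number(13.0), format_number(0))  # 0.207 13 0
```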

Finally, we’re going to import two libraries that will help us work with files and the file system: pathlib and glob. These libraries will help us read in all the short story text files from Lost in the City.

Set Directory Path

Below we’re setting the directory filepath that contains all the short story text files that we want to analyze.

directory_path = "../texts/literature/Lost-in-the-City_Stories/"

Then we’re going to use glob and Path to make a list of all the short story filepaths in that directory and a list of all the short story titles.

text_files = glob.glob(f"{directory_path}/*.txt")
text_files
['../texts/literature/Lost-in-the-City_Stories/11-Gospel.txt',
 '../texts/literature/Lost-in-the-City_Stories/13-A-Dark-Night.txt',
 '../texts/literature/Lost-in-the-City_Stories/01-The-Girl-Who-Raised-Pigeons.txt',
 '../texts/literature/Lost-in-the-City_Stories/12-A-New-Man.txt',
 '../texts/literature/Lost-in-the-City_Stories/02-The-First-Day.txt',
 '../texts/literature/Lost-in-the-City_Stories/07-The-Sunday-Following-Mother’S-Day.txt',
 '../texts/literature/Lost-in-the-City_Stories/03-The-Night-Rhonda-Ferguson-Was-Killed.txt',
 '../texts/literature/Lost-in-the-City_Stories/05-The-Store.txt',
 '../texts/literature/Lost-in-the-City_Stories/08-Lost-In-The-City.txt',
 '../texts/literature/Lost-in-the-City_Stories/14-Marie.txt',
 '../texts/literature/Lost-in-the-City_Stories/09-His-Mother’S-House.txt',
 '../texts/literature/Lost-in-the-City_Stories/10-A-Butterfly-On-F-Street.txt',
 '../texts/literature/Lost-in-the-City_Stories/06-An-Orange-Line-Train-To-Ballston.txt',
 '../texts/literature/Lost-in-the-City_Stories/04-Young-Lions.txt']
text_titles = [Path(text).stem for text in text_files]
text_titles
['11-Gospel',
 '13-A-Dark-Night',
 '01-The-Girl-Who-Raised-Pigeons',
 '12-A-New-Man',
 '02-The-First-Day',
 '07-The-Sunday-Following-Mother’S-Day',
 '03-The-Night-Rhonda-Ferguson-Was-Killed',
 '05-The-Store',
 '08-Lost-In-The-City',
 '14-Marie',
 '09-His-Mother’S-House',
 '10-A-Butterfly-On-F-Street',
 '06-An-Orange-Line-Train-To-Ballston',
 '04-Young-Lions']
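
Notice that glob returns the files in no particular order, which is why the DataFrame is sorted by title later on. As an alternative sketch, the same two lists can be built already sorted with pathlib alone. The helper name list_text_files is hypothetical and not part of the original lesson.

```python
from pathlib import Path

def list_text_files(directory):
    """Return (filepaths, titles) for every .txt file in directory, sorted by filename."""
    paths = sorted(Path(directory).glob("*.txt"))
    titles = [path.stem for path in paths]  # filename without the .txt extension
    return [str(path) for path in paths], titles

# Example usage with the directory from this lesson:
# text_files, text_titles = list_text_files("../texts/literature/Lost-in-the-City_Stories/")
```

Because the filenames begin with two-digit story numbers, sorting by filename also puts the stories in their order within the collection.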

Calculate tf–idf

To calculate tf–idf scores for every word, we’re going to use scikit-learn’s TfidfVectorizer.

When you initialize TfidfVectorizer, you can choose to set it with different parameters. These parameters will change the way you calculate tf–idf.

The recommended way to run TfidfVectorizer is with smoothing (smooth_idf=True) and normalization (norm='l2') turned on. Smoothing adds one to the document counts, which prevents division by zero. L2 normalization rescales each story’s scores so that long and short stories can be compared. Overall, these settings produce more meaningful tf–idf scores, though the scores will no longer match the simple formula above.

Smoothing and L2 normalization are actually the default settings for TfidfVectorizer, so you don’t need to include any extra code to turn them on.
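
To make these two settings concrete, here is a rough pure-Python sketch of what they do. The smoothed formula, ln((1 + N) / (1 + df)) + 1, follows the scikit-learn documentation for smooth_idf=True. The function names and the two-word "document" are assumptions for illustration only. A real story’s vector contains every word in the vocabulary, so these numbers will not match the scores scikit-learn produces.

```python
from math import log, sqrt

def smoothed_idf(total_documents, documents_with_term):
    """Smoothed idf: add one to both counts, as if one extra document contained every term."""
    return log((1 + total_documents) / (1 + documents_with_term)) + 1

def l2_normalize(scores):
    """Divide every score by the vector's Euclidean length so the squares sum to 1."""
    length = sqrt(sum(score * score for score in scores))
    return [score / length for score in scores] if length else list(scores)

# Hypothetical two-word document: 47 uses of "said" and 30 uses of "pigeons"
raw_scores = [47 * smoothed_idf(14, 13), 30 * smoothed_idf(14, 2)]
normalized = l2_normalize(raw_scores)
print(normalized)
```

After normalization, every score falls between 0 and 1, which is why the scores in the tables below are small decimals rather than values like 50.48 or 88.38.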

Initialize TfidfVectorizer with desired parameters (default smoothing and normalization)

tfidf_vectorizer = TfidfVectorizer(input='filename', stop_words='english')

Plug in “text_files” which contains all our short stories

directory_path = "../texts/literature/Lost-in-the-City_Stories/"
text_files = glob.glob(f"{directory_path}/*.txt")
text_titles = [Path(text).stem for text in text_files]
tfidf_vector = tfidf_vectorizer.fit_transform(text_files)

Make a DataFrame out of the tf–idf vector and sort by title

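# get_feature_names_out() requires scikit-learn 1.0 or later; older versions use get_feature_names()
tfidf_df = pd.DataFrame(tfidf_vector.toarray(), index=text_titles, columns=tfidf_vectorizer.get_feature_names_out())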
tfidf_df = tfidf_df.sort_index()

Add a row for the number of documents that each word appears in

tfidf_df.loc['Document Frequency'] = (tfidf_df > 0).sum()
tfidf_slice = tfidf_df[['pigeons', 'school', 'said', 'church', 'gospelteers', 'thunder','girl', 'street', 'father', 'dreaming', 'car']]
tfidf_slice
                                         pigeons  school   said  church  gospelteers  thunder   girl  street  father  dreaming    car
01-The-Girl-Who-Raised-Pigeons             0.207   0.036  0.133   0.011            0        0  0.062   0.105   0.042         0      0
02-The-First-Day                               0   0.134      0   0.031            0        0  0.094   0.070   0.012         0      0
03-The-Night-Rhonda-Ferguson-Was-Killed        0   0.020  0.212   0.003            0        0  0.032   0.082   0.061         0  0.092
04-Young-Lions                                 0   0.015  0.186       0            0        0  0.005   0.065   0.073         0  0.012
05-The-Store                                   0   0.018  0.246   0.012            0        0  0.065   0.093   0.100         0  0.032
06-An-Orange-Line-Train-To-Ballston            0   0.036  0.286       0            0        0  0.022   0.036   0.022         0  0.020
07-The-Sunday-Following-Mother’S-Day           0   0.003  0.210   0.010            0        0  0.028   0.023   0.059         0  0.073
08-Lost-In-The-City                            0   0.007  0.292   0.025            0        0  0.019   0.051   0.051     0.090      0
09-His-Mother’S-House                          0   0.006  0.231       0            0        0  0.010   0.065   0.007         0  0.025
10-A-Butterfly-On-F-Street                     0       0  0.171       0            0        0      0   0.128   0.043         0  0.037
11-Gospel                                      0   0.003  0.233   0.104        0.128        0  0.009   0.045   0.021         0  0.059
12-A-New-Man                                   0   0.015  0.148   0.030            0        0  0.031   0.027   0.099         0  0.005
13-A-Dark-Night                                0       0  0.220   0.024            0    0.152  0.009       0       0         0  0.005
14-Marie                                   0.011       0  0.226       0            0        0  0.018   0.018   0.018         0  0.010
Document Frequency                             2      11     13       9            1        1     13      13      13         1     11
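
The “Document Frequency” row counts how many stories contain a word, not how many times the word appears overall. The sketch below repeats that count in plain Python for two columns copied from the table above. The function name document_frequency is an assumption for illustration.

```python
def document_frequency(scores):
    """Count the documents in which a word has a tf-idf score above zero."""
    return sum(1 for score in scores if score > 0)

# tf-idf scores for stories 01 through 14, copied from the table above
pigeons = [0.207, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.011]
said = [0.133, 0, 0.212, 0.186, 0.246, 0.286, 0.210, 0.292, 0.231, 0.171, 0.233, 0.148, 0.220, 0.226]

print(document_frequency(pigeons), document_frequency(said))  # 2 13
```

This is the same test that (tfidf_df > 0).sum() applies to every column at once.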

To find out the top 10 words with the highest tf–idf for every story, we’re going to make and run the following function: get_top_tfidf_scores()

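def get_top_tfidf_scores(series, top_n=10):
    # Despite its name, `series` is the tf–idf DataFrame: one row per story, one column per word
    # Reshape to one score per (story, word) pair, then keep the top_n highest scores in each story
    pretty_df = series.stack().groupby(level=0).nlargest(top_n).reset_index()
    # Give the automatically named columns readable names
    pretty_df = pretty_df.rename(columns={0:'tfidf_score', 'level_1': 'story', 'level_2': 'word'})
    # .groupby() adds a duplicate copy of the story label, so drop it
    pretty_df = pretty_df.drop(columns='level_0')
    # Rank the scores within each story (1 = highest); ties are ranked in the order they appear
    pretty_df['tfidf_rank'] = pretty_df.groupby('story')['tfidf_score'].rank(method='first', ascending=False)
    return pretty_df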

This function rearranges the DataFrame, groups it by short story with .groupby(), and keeps the 10 highest tf–idf scores in every story. Finally, it produces a DataFrame with a new column, tfidf_rank, which ranks those scores from 1 to 10.
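
For readers less familiar with pandas, the same top-n logic can be sketched with plain dictionaries and lists. This is an illustrative equivalent, not the lesson’s own method. The function name top_scores_per_story is hypothetical, and the handful of scores is copied from the results in this lesson.

```python
def top_scores_per_story(scores_by_story, top_n=10):
    """Return one row per (story, word) for the top_n highest scores in each story."""
    rows = []
    for story in sorted(scores_by_story):
        # Sort this story's words from highest to lowest score; ties keep their original order
        ranked = sorted(scores_by_story[story].items(), key=lambda item: item[1], reverse=True)
        for rank, (word, score) in enumerate(ranked[:top_n], start=1):
            rows.append({"story": story, "word": word, "tfidf_score": score, "tfidf_rank": rank})
    return rows

# A few scores copied from the lesson's results
scores_by_story = {
    "02-The-First-Day": {"school": 0.134, "mother": 0.492, "woman": 0.252},
    "01-The-Girl-Who-Raised-Pigeons": {"said": 0.133, "pigeons": 0.207, "betsy": 0.358, "jenny": 0.350},
}

for row in top_scores_per_story(scores_by_story, top_n=2):
    print(row)
```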

tfidf_df = tfidf_df.drop('Document Frequency', errors='ignore')
top_tfidf = get_top_tfidf_scores(tfidf_df)
top_tfidf
story word tfidf_score tfidf_rank
0 01-The-Girl-Who-Raised-Pigeons betsy 0.358 1
1 01-The-Girl-Who-Raised-Pigeons jenny 0.350 2
2 01-The-Girl-Who-Raised-Pigeons ann 0.310 3
3 01-The-Girl-Who-Raised-Pigeons robert 0.295 4
4 01-The-Girl-Who-Raised-Pigeons coop 0.223 5
5 01-The-Girl-Who-Raised-Pigeons pigeons 0.207 6
6 01-The-Girl-Who-Raised-Pigeons miss 0.163 7
7 01-The-Girl-Who-Raised-Pigeons birds 0.147 8
8 01-The-Girl-Who-Raised-Pigeons clara 0.143 9
9 01-The-Girl-Who-Raised-Pigeons said 0.133 10
10 02-The-First-Day mother 0.492 1
11 02-The-First-Day woman 0.252 2
12 02-The-First-Day takes 0.200 3
13 02-The-First-Day looks 0.172 4
14 02-The-First-Day says 0.168 5
15 02-The-First-Day form 0.138 6
16 02-The-First-Day school 0.134 7
17 02-The-First-Day seaton 0.132 8
18 02-The-First-Day jersey 0.127 9
19 02-The-First-Day tells 0.102 10
20 03-The-Night-Rhonda-Ferguson-Was-Killed cassandra 0.711 1
21 03-The-Night-Rhonda-Ferguson-Was-Killed melanie 0.355 2
22 03-The-Night-Rhonda-Ferguson-Was-Killed anita 0.317 3
23 03-The-Night-Rhonda-Ferguson-Was-Killed rhonda 0.232 4
24 03-The-Night-Rhonda-Ferguson-Was-Killed said 0.212 5
25 03-The-Night-Rhonda-Ferguson-Was-Killed gladys 0.182 6
26 03-The-Night-Rhonda-Ferguson-Was-Killed car 0.092 7
27 03-The-Night-Rhonda-Ferguson-Was-Killed street 0.082 8
28 03-The-Night-Rhonda-Ferguson-Was-Killed wesley 0.065 9
29 03-The-Night-Rhonda-Ferguson-Was-Killed girls 0.064 10
30 04-Young-Lions caesar 0.561 1
31 04-Young-Lions sherman 0.458 2
32 04-Young-Lions manny 0.332 3
33 04-Young-Lions carol 0.222 4
34 04-Young-Lions said 0.186 5
35 04-Young-Lions retarded 0.147 6
36 04-Young-Lions heh 0.126 7
37 04-Young-Lions woman 0.110 8
38 04-Young-Lions anna 0.103 9
39 04-Young-Lions man 0.089 10
40 05-The-Store penny 0.441 1
41 05-The-Store said 0.246 2
42 05-The-Store jenkins 0.185 3
43 05-The-Store store 0.182 4
44 05-The-Store kentucky 0.162 5
45 05-The-Store mrs 0.153 6
46 05-The-Store lonney 0.149 7
47 05-The-Store just 0.134 8
48 05-The-Store time 0.131 9
49 05-The-Store mother 0.131 10
50 06-An-Orange-Line-Train-To-Ballston marcus 0.479 1
51 06-An-Orange-Line-Train-To-Ballston avis 0.340 2
52 06-An-Orange-Line-Train-To-Ballston marvin 0.315 3
53 06-An-Orange-Line-Train-To-Ballston marvella 0.290 4
54 06-An-Orange-Line-Train-To-Ballston man 0.286 5
55 06-An-Orange-Line-Train-To-Ballston said 0.286 6
56 06-An-Orange-Line-Train-To-Ballston train 0.200 7
57 06-An-Orange-Line-Train-To-Ballston subway 0.189 8
58 06-An-Orange-Line-Train-To-Ballston dreadlocks 0.139 9
59 06-An-Orange-Line-Train-To-Ballston orange 0.105 10
60 07-The-Sunday-Following-Mother’S-Day madeleine 0.548 1
61 07-The-Sunday-Following-Mother’S-Day maddie 0.454 2
62 07-The-Sunday-Following-Mother’S-Day samuel 0.296 3
63 07-The-Sunday-Following-Mother’S-Day sam 0.260 4
64 07-The-Sunday-Following-Mother’S-Day said 0.210 5
65 07-The-Sunday-Following-Mother’S-Day pookie 0.159 6
66 07-The-Sunday-Following-Mother’S-Day curtis 0.106 7
67 07-The-Sunday-Following-Mother’S-Day arnisa 0.101 8
68 07-The-Sunday-Following-Mother’S-Day williams 0.094 9
69 07-The-Sunday-Following-Mother’S-Day day 0.091 10
70 08-Lost-In-The-City lydia 0.573 1
71 08-Lost-In-The-City said 0.292 2
72 08-Lost-In-The-City mother 0.279 3
73 08-Lost-In-The-City georgia 0.262 4
74 08-Lost-In-The-City cab 0.137 5
75 08-Lost-In-The-City man 0.095 6
76 08-Lost-In-The-City antibes 0.090 7
77 08-Lost-In-The-City dreaming 0.090 8
78 08-Lost-In-The-City walsh 0.090 9
79 08-Lost-In-The-City know 0.089 10
80 09-His-Mother’S-House joyce 0.499 1
81 09-His-Mother’S-House rickey 0.441 2
82 09-His-Mother’S-House santiago 0.373 3
83 09-His-Mother’S-House said 0.231 4
84 09-His-Mother’S-House humphrey 0.231 5
85 09-His-Mother’S-House pearl 0.129 6
86 09-His-Mother’S-House sandy 0.122 7
87 09-His-Mother’S-House smokey 0.109 8
88 09-His-Mother’S-House house 0.090 9
89 09-His-Mother’S-House like 0.086 10
90 10-A-Butterfly-On-F-Street mildred 0.703 1
91 10-A-Butterfly-On-F-Street woman 0.289 2
92 10-A-Butterfly-On-F-Street mansfield 0.180 3
93 10-A-Butterfly-On-F-Street said 0.171 4
94 10-A-Butterfly-On-F-Street butterfly 0.150 5
95 10-A-Butterfly-On-F-Street street 0.128 6
96 10-A-Butterfly-On-F-Street median 0.120 7
97 10-A-Butterfly-On-F-Street woolworth 0.120 8
98 10-A-Butterfly-On-F-Street say 0.096 9
99 10-A-Butterfly-On-F-Street morton 0.090 10
100 11-Gospel vivian 0.581 1
101 11-Gospel diane 0.359 2
102 11-Gospel maude 0.273 3
103 11-Gospel said 0.233 4
104 11-Gospel anita 0.185 5
105 11-Gospel reverend 0.133 6
106 11-Gospel gospelteers 0.128 7
107 11-Gospel mae 0.128 8
108 11-Gospel group 0.120 9
109 11-Gospel church 0.104 10
110 12-A-New-Man woodrow 0.736 1
111 12-A-New-Man rita 0.279 2
112 12-A-New-Man said 0.148 3
113 12-A-New-Man man 0.126 4
114 12-A-New-Man elaine 0.114 5
115 12-A-New-Man father 0.099 6
116 12-A-New-Man cunningham 0.089 7
117 12-A-New-Man daughter 0.088 8
118 12-A-New-Man old 0.085 9
119 12-A-New-Man read 0.082 10
120 13-A-Dark-Night garrett 0.481 1
121 13-A-Dark-Night beatrice 0.392 2
122 13-A-Dark-Night mrs 0.308 3
123 13-A-Dark-Night carmena 0.253 4
124 13-A-Dark-Night said 0.220 5
125 13-A-Dark-Night uncle 0.153 6
126 13-A-Dark-Night thunder 0.152 7
127 13-A-Dark-Night henry 0.127 8
128 13-A-Dark-Night door 0.113 9
129 13-A-Dark-Night daddy 0.108 10
130 14-Marie marie 0.530 1
131 14-Marie vernelle 0.375 2
132 14-Marie wilamena 0.250 3
133 14-Marie said 0.226 4
134 14-Marie man 0.142 5
135 14-Marie security 0.141 6
136 14-Marie woman 0.128 7
137 14-Marie receptionist 0.125 8
138 14-Marie told 0.111 9
139 14-Marie social 0.108 10

Write to a CSV File

filename = "tfidf_Lost-in-The-City.csv"
top_tfidf.to_csv(filename, encoding='UTF-8', index=False)
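
The same steps work on a different corpus. Below we point directory_path at a folder of U.S. inaugural addresses and recalculate the tf–idf scores.

directory_path = "../texts/history/US_Inaugural_Addresses/"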
text_files = glob.glob(f"{directory_path}/*.txt")
text_titles = [Path(text).stem for text in text_files]
tfidf_vector = tfidf_vectorizer.fit_transform(text_files)

Make a DataFrame out of the tf–idf vector and sort by title

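# get_feature_names_out() requires scikit-learn 1.0 or later; older versions use get_feature_names()
tfidf_df = pd.DataFrame(tfidf_vector.toarray(), index=text_titles, columns=tfidf_vectorizer.get_feature_names_out())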
tfidf_df = tfidf_df.sort_index()

Add a row for the number of documents that each word appears in

tfidf_df.loc['Document Frequency'] = (tfidf_df > 0).sum()

To find the top 10 words with the highest tf–idf for every inaugural address, we’re going to define and run the same function as before: get_top_tfidf_scores()

def get_top_tfidf_scores(series, top_n=10):
    pretty_df = series.stack().groupby(level=0).nlargest(top_n).reset_index()
    pretty_df = pretty_df.rename(columns={0:'tfidf_score', 'level_1': 'story', 'level_2': 'word'})
    pretty_df = pretty_df.drop(columns='level_0')
    pretty_df['tfidf_rank'] = pretty_df.groupby('story')['tfidf_score'].rank(method='first', ascending=False)
    return pretty_df

As before, this function will rearrange the DataFrame, group it by document with .groupby() (here, each inaugural address), and keep the 10 highest tf–idf scores in every address. Finally, it will produce a DataFrame with a new column, tfidf_rank, which ranks those scores from 1 to 10. Note that the output column holding the address titles is still labeled story.

tfidf_df = tfidf_df.drop('Document Frequency', errors='ignore')
top_tfidf = get_top_tfidf_scores(tfidf_df)
top_tfidf
story word tfidf_score tfidf_rank
0 01_washington_1789 government 0.114 1
1 01_washington_1789 immutable 0.104 2
2 01_washington_1789 impressions 0.104 3
3 01_washington_1789 providential 0.104 4
4 01_washington_1789 ought 0.104 5
5 01_washington_1789 public 0.103 6
6 01_washington_1789 present 0.098 7
7 01_washington_1789 qualifications 0.096 8
8 01_washington_1789 peculiarly 0.091 9
9 01_washington_1789 article 0.086 10
10 02_washington_1793 1793 0.229 1
11 02_washington_1793 arrive 0.229 2
12 02_washington_1793 upbraidings 0.229 3
13 02_washington_1793 incurring 0.208 4
14 02_washington_1793 violated 0.208 5
15 02_washington_1793 willingly 0.208 6
16 02_washington_1793 injunctions 0.193 7
17 02_washington_1793 knowingly 0.193 8
18 02_washington_1793 previous 0.193 9
19 02_washington_1793 witnesses 0.193 10
20 03_adams_john_1797 people 0.191 1
21 03_adams_john_1797 government 0.161 2
22 03_adams_john_1797 pleasing 0.147 3
23 03_adams_john_1797 foreign 0.117 4
24 03_adams_john_1797 nations 0.114 5
25 03_adams_john_1797 virtuous 0.111 6
26 03_adams_john_1797 houses 0.110 7
27 03_adams_john_1797 legislatures 0.110 8
28 03_adams_john_1797 constitution 0.105 9
29 03_adams_john_1797 honor 0.102 10
30 04_jefferson_1801 government 0.156 1
31 04_jefferson_1801 principle 0.130 2
32 04_jefferson_1801 let 0.118 3
33 04_jefferson_1801 safety 0.108 4
34 04_jefferson_1801 man 0.107 5
35 04_jefferson_1801 thousandth 0.105 6
36 04_jefferson_1801 honest 0.102 7
37 04_jefferson_1801 fellow 0.097 8
38 04_jefferson_1801 retire 0.095 9
39 04_jefferson_1801 opinion 0.093 10
40 05_jefferson_1805 public 0.180 1
41 05_jefferson_1805 false 0.136 2
42 05_jefferson_1805 state 0.122 3
43 05_jefferson_1805 whatsoever 0.117 4
44 05_jefferson_1805 limits 0.107 5
45 05_jefferson_1805 citizens 0.107 6
46 05_jefferson_1805 reason 0.104 7
47 05_jefferson_1805 comforts 0.102 8
48 05_jefferson_1805 press 0.102 9
49 05_jefferson_1805 expenses 0.097 10
50 06_madison_1809 improvements 0.153 1
51 06_madison_1809 belligerent 0.123 2
52 06_madison_1809 public 0.122 3
53 06_madison_1809 nations 0.105 4
54 06_madison_1809 rendered 0.102 5
55 06_madison_1809 authorities 0.089 6
56 06_madison_1809 avail 0.089 7
57 06_madison_1809 examples 0.089 8
58 06_madison_1809 councils 0.086 9
59 06_madison_1809 ones 0.086 10
60 07_madison_1813 war 0.254 1
61 07_madison_1813 british 0.223 2
62 07_madison_1813 massacre 0.119 3
63 07_madison_1813 captives 0.108 4
64 07_madison_1813 cruel 0.108 5
65 07_madison_1813 prisoners 0.108 6
66 07_madison_1813 savage 0.108 7
67 07_madison_1813 element 0.085 8
68 07_madison_1813 enemy 0.085 9
69 07_madison_1813 honorable 0.085 10
70 08_monroe_1817 states 0.184 1
71 08_monroe_1817 government 0.174 2
72 08_monroe_1817 great 0.161 3
73 08_monroe_1817 union 0.117 4
74 08_monroe_1817 people 0.113 5
75 08_monroe_1817 united 0.112 6
76 08_monroe_1817 dangers 0.109 7
77 08_monroe_1817 naval 0.105 8
78 08_monroe_1817 foreign 0.103 9
79 08_monroe_1817 principles 0.098 10
80 09_monroe_1821 great 0.174 1
81 09_monroe_1821 states 0.137 2
82 09_monroe_1821 revenue 0.115 3
83 09_monroe_1821 war 0.114 4
84 09_monroe_1821 parties 0.109 5
85 09_monroe_1821 united 0.108 6
86 09_monroe_1821 commerce 0.105 7
87 09_monroe_1821 force 0.103 8
88 09_monroe_1821 fortifications 0.099 9
89 09_monroe_1821 term 0.095 10
90 10_adams_john_quincy_1825 union 0.257 1
91 10_adams_john_quincy_1825 government 0.148 2
92 10_adams_john_quincy_1825 general 0.109 3
93 10_adams_john_quincy_1825 rights 0.096 4
94 10_adams_john_quincy_1825 dissensions 0.095 5
95 10_adams_john_quincy_1825 public 0.095 6
96 10_adams_john_quincy_1825 constitution 0.090 7
97 10_adams_john_quincy_1825 peace 0.088 8
98 10_adams_john_quincy_1825 country 0.087 9
99 10_adams_john_quincy_1825 performance 0.086 10
100 11_jackson_1829 public 0.161 1
101 11_jackson_1829 generally 0.123 2
102 11_jackson_1829 diffidence 0.113 3
103 11_jackson_1829 defending 0.106 4
104 11_jackson_1829 shall 0.105 5
105 11_jackson_1829 revenue 0.103 6
106 11_jackson_1829 worth 0.100 7
107 11_jackson_1829 government 0.100 8
108 11_jackson_1829 federal 0.093 9
109 11_jackson_1829 power 0.092 10
110 12_jackson_1833 union 0.213 1
111 12_jackson_1833 government 0.208 2
112 12_jackson_1833 states 0.142 3
113 12_jackson_1833 people 0.137 4
114 12_jackson_1833 preservation 0.128 5
115 12_jackson_1833 general 0.125 6
116 12_jackson_1833 exercise 0.119 7
117 12_jackson_1833 inculcate 0.117 8
118 12_jackson_1833 proportion 0.117 9
119 12_jackson_1833 powers 0.114 10
120 13_van_buren_1837 institutions 0.187 1
121 13_van_buren_1837 people 0.138 2
122 13_van_buren_1837 government 0.117 3
123 13_van_buren_1837 supposed 0.110 4
124 13_van_buren_1837 country 0.109 5
125 13_van_buren_1837 actual 0.096 6
126 13_van_buren_1837 experience 0.093 7
127 13_van_buren_1837 adherence 0.084 8
128 13_van_buren_1837 conduct 0.082 9
129 13_van_buren_1837 opinions 0.082 10
130 14_harrison_1841 power 0.204 1
131 14_harrison_1841 constitution 0.183 2
132 14_harrison_1841 executive 0.157 3
133 14_harrison_1841 people 0.142 4
134 14_harrison_1841 government 0.141 5
135 14_harrison_1841 roman 0.111 6
136 14_harrison_1841 states 0.109 7
137 14_harrison_1841 citizens 0.106 8
138 14_harrison_1841 character 0.103 9
139 14_harrison_1841 state 0.095 10
140 15_polk_1845 union 0.259 1
141 15_polk_1845 government 0.257 2
142 15_polk_1845 states 0.218 3
143 15_polk_1845 texas 0.200 4
144 15_polk_1845 revenue 0.147 5
145 15_polk_1845 powers 0.125 6
146 15_polk_1845 protection 0.107 7
147 15_polk_1845 constitution 0.107 8
148 15_polk_1845 interests 0.105 9
149 15_polk_1845 extended 0.090 10
150 16_taylor_1849 shall 0.266 1
151 16_taylor_1849 government 0.118 2
152 16_taylor_1849 duties 0.118 3
153 16_taylor_1849 object 0.104 4
154 16_taylor_1849 congress 0.104 5
155 16_taylor_1849 purity 0.102 6
156 16_taylor_1849 vested 0.102 7
157 16_taylor_1849 measures 0.102 8
158 16_taylor_1849 country 0.101 9
159 16_taylor_1849 affections 0.097 10
160 17_pierce_1853 hardly 0.114 1
161 17_pierce_1853 power 0.102 2
162 17_pierce_1853 position 0.087 3
163 17_pierce_1853 constitutional 0.086 4
164 17_pierce_1853 expect 0.084 5
165 17_pierce_1853 government 0.084 6
166 17_pierce_1853 apparent 0.080 7
167 17_pierce_1853 regarded 0.080 8
168 17_pierce_1853 shall 0.080 9
169 17_pierce_1853 like 0.079 10
170 18_buchanan_1857 states 0.208 1
171 18_buchanan_1857 constitution 0.189 2
172 18_buchanan_1857 shall 0.162 3
173 18_buchanan_1857 question 0.157 4
174 18_buchanan_1857 whilst 0.141 5
175 18_buchanan_1857 territory 0.141 6
176 18_buchanan_1857 union 0.126 7
177 18_buchanan_1857 government 0.120 8
178 18_buchanan_1857 congress 0.118 9
179 18_buchanan_1857 people 0.106 10
180 19_lincoln_1861 constitution 0.214 1
181 19_lincoln_1861 union 0.204 2
182 19_lincoln_1861 case 0.152 3
183 19_lincoln_1861 states 0.145 4
184 19_lincoln_1861 minority 0.132 5
185 19_lincoln_1861 people 0.131 6
186 19_lincoln_1861 clause 0.126 7
187 19_lincoln_1861 government 0.124 8
188 19_lincoln_1861 shall 0.123 9
189 19_lincoln_1861 law 0.123 10
190 20_lincoln_1865 war 0.267 1
191 20_lincoln_1865 offenses 0.235 2
192 20_lincoln_1865 woe 0.235 3
193 20_lincoln_1865 god 0.151 4
194 20_lincoln_1865 offense 0.142 5
195 20_lincoln_1865 wills 0.142 6
196 20_lincoln_1865 answered 0.132 7
197 20_lincoln_1865 slaves 0.124 8
198 20_lincoln_1865 union 0.115 9
199 20_lincoln_1865 altogether 0.112 10
200 21_grant_1869 dollar 0.270 1
201 21_grant_1869 paying 0.162 2
202 21_grant_1869 deal 0.152 3
203 21_grant_1869 specie 0.152 4
204 21_grant_1869 debt 0.135 5
205 21_grant_1869 country 0.128 6
206 21_grant_1869 advisable 0.117 7
207 21_grant_1869 laws 0.116 8
208 21_grant_1869 payments 0.108 9
209 21_grant_1869 pay 0.099 10
210 22_grant_1873 proposition 0.187 1
211 22_grant_1873 domingo 0.178 2
212 22_grant_1873 santo 0.178 3
213 22_grant_1873 transit 0.178 4
214 22_grant_1873 territory 0.121 5
215 22_grant_1873 extermination 0.118 6
216 22_grant_1873 steam 0.118 7
217 22_grant_1873 telegraph 0.118 8
218 22_grant_1873 country 0.118 9
219 22_grant_1873 extension 0.117 10
220 23_hayes_1877 country 0.186 1
221 23_hayes_1877 government 0.168 2
222 23_hayes_1877 behalf 0.128 3
223 23_hayes_1877 public 0.124 4
224 23_hayes_1877 political 0.121 5
225 23_hayes_1877 states 0.114 6
226 23_hayes_1877 party 0.113 7
227 23_hayes_1877 dispute 0.113 8
228 23_hayes_1877 parties 0.110 9
229 23_hayes_1877 reform 0.104 10
230 24_garfield_1881 government 0.187 1
231 24_garfield_1881 people 0.162 2
232 24_garfield_1881 constitution 0.158 3
233 24_garfield_1881 states 0.135 4
234 24_garfield_1881 union 0.132 5
235 24_garfield_1881 suffrage 0.120 6
236 24_garfield_1881 negro 0.119 7
237 24_garfield_1881 authority 0.117 8
238 24_garfield_1881 congress 0.113 9
239 24_garfield_1881 law 0.104 10
240 25_cleveland_1885 people 0.210 1
241 25_cleveland_1885 government 0.209 2
242 25_cleveland_1885 partisan 0.169 3
243 25_cleveland_1885 public 0.164 4
244 25_cleveland_1885 shall 0.129 5
245 25_cleveland_1885 constitution 0.128 6
246 25_cleveland_1885 interests 0.118 7
247 25_cleveland_1885 extravagance 0.111 8
248 25_cleveland_1885 citizen 0.103 9
249 25_cleveland_1885 strife 0.102 10
250 26_harrison_1889 people 0.172 1
251 26_harrison_1889 laws 0.154 2
252 26_harrison_1889 states 0.139 3
253 26_harrison_1889 ballot 0.137 4
254 26_harrison_1889 public 0.129 5
255 26_harrison_1889 methods 0.119 6
256 26_harrison_1889 shall 0.118 7
257 26_harrison_1889 friendly 0.104 8
258 26_harrison_1889 european 0.103 9
259 26_harrison_1889 constitution 0.089 10
260 27_cleveland_1893 people 0.222 1
261 27_cleveland_1893 government 0.148 2
262 27_cleveland_1893 frugality 0.128 3
263 27_cleveland_1893 public 0.103 4
264 27_cleveland_1893 service 0.102 5
265 27_cleveland_1893 support 0.100 6
266 27_cleveland_1893 american 0.097 7
267 27_cleveland_1893 activity 0.096 8
268 27_cleveland_1893 governmental 0.096 9
269 27_cleveland_1893 countrymen 0.089 10
270 28_mckinley_1897 congress 0.189 1
271 28_mckinley_1897 revenue 0.168 2
272 28_mckinley_1897 people 0.162 3
273 28_mckinley_1897 government 0.157 4
274 28_mckinley_1897 loans 0.149 5
275 28_mckinley_1897 legislation 0.126 6
276 28_mckinley_1897 public 0.107 7
277 28_mckinley_1897 business 0.107 8
278 28_mckinley_1897 great 0.105 9
279 28_mckinley_1897 revision 0.100 10
280 29_mckinley_1901 islands 0.216 1
281 29_mckinley_1901 cuba 0.206 2
282 29_mckinley_1901 government 0.154 3
283 29_mckinley_1901 executive 0.148 4
284 29_mckinley_1901 inhabitants 0.147 5
285 29_mckinley_1901 congress 0.142 6
286 29_mckinley_1901 people 0.117 7
287 29_mckinley_1901 states 0.102 8
288 29_mckinley_1901 united 0.100 9
289 29_mckinley_1901 preparation 0.098 10
290 30_roosevelt_theodore_1905 regards 0.199 1
291 30_roosevelt_theodore_1905 problems 0.182 2
292 30_roosevelt_theodore_1905 tasks 0.150 3
293 30_roosevelt_theodore_1905 aright 0.146 4
294 30_roosevelt_theodore_1905 republic 0.121 5
295 30_roosevelt_theodore_1905 life 0.119 6
296 30_roosevelt_theodore_1905 cause 0.116 7
297 30_roosevelt_theodore_1905 faced 0.116 8
298 30_roosevelt_theodore_1905 conditions 0.115 9
299 30_roosevelt_theodore_1905 wish 0.107 10
300 31_taft_1909 interstate 0.207 1
301 31_taft_1909 business 0.201 2
302 31_taft_1909 tariff 0.155 3
303 31_taft_1909 negro 0.154 4
304 31_taft_1909 south 0.129 5
305 31_taft_1909 government 0.121 6
306 31_taft_1909 proper 0.115 7
307 31_taft_1909 race 0.113 8
308 31_taft_1909 feeling 0.111 9
309 31_taft_1909 canal 0.111 10
310 32_wilson_1913 great 0.159 1
311 32_wilson_1913 men 0.143 2
312 32_wilson_1913 familiar 0.142 3
313 32_wilson_1913 stirred 0.142 4
314 32_wilson_1913 studied 0.142 5
315 32_wilson_1913 things 0.124 6
316 32_wilson_1913 justice 0.106 7
317 32_wilson_1913 government 0.106 8
318 32_wilson_1913 life 0.102 9
319 32_wilson_1913 look 0.100 10
320 33_wilson_1917 wished 0.229 1
321 33_wilson_1917 counsel 0.175 2
322 33_wilson_1917 purpose 0.153 3
323 33_wilson_1917 action 0.150 4
324 33_wilson_1917 shall 0.134 5
325 33_wilson_1917 thought 0.127 6
326 33_wilson_1917 stand 0.121 7
327 33_wilson_1917 set 0.111 8
328 33_wilson_1917 politics 0.109 9
329 33_wilson_1917 drawn 0.105 10
330 34_harding_1921 world 0.196 1
331 34_harding_1921 civilization 0.157 2
332 34_harding_1921 america 0.156 3
333 34_harding_1921 war 0.121 4
334 34_harding_1921 relationship 0.119 5
335 34_harding_1921 republic 0.117 6
336 34_harding_1921 order 0.110 7
337 34_harding_1921 understanding 0.110 8
338 34_harding_1921 new 0.098 9
339 34_harding_1921 amid 0.095 10
340 35_coolidge_1925 country 0.121 1
341 35_coolidge_1925 ought 0.117 2
342 35_coolidge_1925 represents 0.114 3
343 35_coolidge_1925 tax 0.113 4
344 35_coolidge_1925 great 0.110 5
345 35_coolidge_1925 property 0.108 6
346 35_coolidge_1925 party 0.107 7
347 35_coolidge_1925 stands 0.107 8
348 35_coolidge_1925 peace 0.104 9
349 35_coolidge_1925 people 0.101 10
350 36_hoover_1929 sup 0.297 1
351 36_hoover_1929 government 0.203 2
352 36_hoover_1929 enforcement 0.194 3
353 36_hoover_1929 18th 0.135 4
354 36_hoover_1929 progress 0.132 5
355 36_hoover_1929 federal 0.126 6
356 36_hoover_1929 ideals 0.113 7
357 36_hoover_1929 business 0.108 8
358 36_hoover_1929 laws 0.107 9
359 36_hoover_1929 peace 0.104 10
360 37_roosevelt_franklin_1933 helped 0.216 1
361 37_roosevelt_franklin_1933 leadership 0.191 2
362 37_roosevelt_franklin_1933 stricken 0.129 3
363 37_roosevelt_franklin_1933 emergency 0.123 4
364 37_roosevelt_franklin_1933 discipline 0.118 5
365 37_roosevelt_franklin_1933 respects 0.118 6
366 37_roosevelt_franklin_1933 money 0.113 7
367 37_roosevelt_franklin_1933 national 0.111 8
368 37_roosevelt_franklin_1933 recovery 0.102 9
369 37_roosevelt_franklin_1933 action 0.097 10
370 38_roosevelt_franklin_1937 democracy 0.178 1
371 38_roosevelt_franklin_1937 government 0.177 2
372 38_roosevelt_franklin_1937 millions 0.141 3
373 38_roosevelt_franklin_1937 paint 0.121 4
374 38_roosevelt_franklin_1937 people 0.116 5
375 38_roosevelt_franklin_1937 economic 0.114 6
376 38_roosevelt_franklin_1937 road 0.113 7
377 38_roosevelt_franklin_1937 progress 0.104 8
378 38_roosevelt_franklin_1937 despair 0.100 9
379 38_roosevelt_franklin_1937 nation 0.100 10
380 39_roosevelt_franklin_1941 democracy 0.244 1
381 39_roosevelt_franklin_1941 know 0.189 2
382 39_roosevelt_franklin_1941 speaks 0.183 3
383 39_roosevelt_franklin_1941 br 0.163 4
384 39_roosevelt_franklin_1941 nation 0.162 5
385 39_roosevelt_franklin_1941 america 0.140 6
386 39_roosevelt_franklin_1941 life 0.118 7
387 39_roosevelt_franklin_1941 spirit 0.114 8
388 39_roosevelt_franklin_1941 freedom 0.109 9
389 39_roosevelt_franklin_1941 1941 0.109 10
390 40_roosevelt_franklin_1945 learned 0.300 1
391 40_roosevelt_franklin_1945 test 0.195 2
392 40_roosevelt_franklin_1945 1945 0.190 3
393 40_roosevelt_franklin_1945 shall 0.174 4
394 40_roosevelt_franklin_1945 trend 0.172 5
395 40_roosevelt_franklin_1945 mistakes 0.160 6
396 40_roosevelt_franklin_1945 peace 0.159 7
397 40_roosevelt_franklin_1945 today 0.154 8
398 40_roosevelt_franklin_1945 upward 0.150 9
399 40_roosevelt_franklin_1945 gain 0.142 10
400 41_truman_1949 world 0.196 1
401 41_truman_1949 nations 0.194 2
402 41_truman_1949 program 0.172 3
403 41_truman_1949 peoples 0.167 4
404 41_truman_1949 democracy 0.154 5
405 41_truman_1949 freedom 0.149 6
406 41_truman_1949 communism 0.147 7
407 41_truman_1949 peace 0.144 8
408 41_truman_1949 countries 0.137 9
409 41_truman_1949 recovery 0.136 10
410 42_eisenhower_1953 free 0.206 1
411 42_eisenhower_1953 faith 0.155 2
412 42_eisenhower_1953 world 0.146 3
413 42_eisenhower_1953 peoples 0.139 4
414 42_eisenhower_1953 productivity 0.134 5
415 42_eisenhower_1953 strength 0.130 6
416 42_eisenhower_1953 peace 0.124 7
417 42_eisenhower_1953 freedom 0.123 8
418 42_eisenhower_1953 shall 0.106 9
419 42_eisenhower_1953 hold 0.105 10
420 43_eisenhower_1957 world 0.194 1
421 43_eisenhower_1957 freedom 0.180 2
422 43_eisenhower_1957 seek 0.176 3
423 43_eisenhower_1957 nations 0.176 4
424 43_eisenhower_1957 peoples 0.158 5
425 43_eisenhower_1957 strives 0.146 6
426 43_eisenhower_1957 peace 0.137 7
427 43_eisenhower_1957 help 0.133 8
428 43_eisenhower_1957 divided 0.115 9
429 43_eisenhower_1957 mr 0.112 10
430 44_kennedy_1961 let 0.268 1
431 44_kennedy_1961 sides 0.263 2
432 44_kennedy_1961 pledge 0.161 3
433 44_kennedy_1961 ask 0.108 4
434 44_kennedy_1961 begin 0.106 5
435 44_kennedy_1961 dare 0.106 6
436 44_kennedy_1961 world 0.103 7
437 44_kennedy_1961 final 0.102 8
438 44_kennedy_1961 new 0.097 9
439 44_kennedy_1961 explore 0.094 10
440 45_johnson_1965 change 0.276 1
441 45_johnson_1965 covenant 0.243 2
442 45_johnson_1965 man 0.174 3
443 45_johnson_1965 mastery 0.154 4
444 45_johnson_1965 nation 0.152 5
445 45_johnson_1965 union 0.151 6
446 45_johnson_1965 old 0.129 7
447 45_johnson_1965 trying 0.110 8
448 45_johnson_1965 people 0.109 9
449 45_johnson_1965 harvest 0.102 10
450 46_nixon_1969 voices 0.209 1
451 46_nixon_1969 peace 0.145 2
452 46_nixon_1969 let 0.141 3
453 46_nixon_1969 earth 0.140 4
454 46_nixon_1969 know 0.138 5
455 46_nixon_1969 man 0.135 6
456 46_nixon_1969 people 0.131 7
457 46_nixon_1969 world 0.128 8
458 46_nixon_1969 rhetoric 0.119 9
459 46_nixon_1969 forward 0.113 10
460 47_nixon_1973 america 0.307 1
461 47_nixon_1973 let 0.282 2
462 47_nixon_1973 peace 0.212 3
463 47_nixon_1973 role 0.190 4
464 47_nixon_1973 world 0.178 5
465 47_nixon_1973 policies 0.176 6
466 47_nixon_1973 responsibility 0.164 7
467 47_nixon_1973 new 0.159 8
468 47_nixon_1973 abroad 0.155 9
469 47_nixon_1973 home 0.127 10
470 48_carter_1977 br 0.223 1
471 48_carter_1977 nation 0.192 2
472 48_carter_1977 dream 0.182 3
473 48_carter_1977 strength 0.147 4
474 48_carter_1977 new 0.142 5
475 48_carter_1977 micah 0.119 6
476 48_carter_1977 thee 0.108 7
477 48_carter_1977 spirit 0.107 8
478 48_carter_1977 human 0.101 9
479 48_carter_1977 enhance 0.100 10
480 49_reagan_1981 government 0.162 1
481 49_reagan_1981 americans 0.157 2
482 49_reagan_1981 heroes 0.137 3
483 49_reagan_1981 believe 0.136 4
484 49_reagan_1981 ve 0.115 5
485 49_reagan_1981 productivity 0.105 6
486 49_reagan_1981 weapon 0.105 7
487 49_reagan_1981 freedom 0.103 8
488 49_reagan_1981 dreams 0.101 9
489 49_reagan_1981 today 0.094 10
490 50_reagan_1985 government 0.161 1
491 50_reagan_1985 freedom 0.160 2
492 50_reagan_1985 nuclear 0.154 3
493 50_reagan_1985 ve 0.154 4
494 50_reagan_1985 weapons 0.140 5
495 50_reagan_1985 people 0.137 6
496 50_reagan_1985 world 0.127 7
497 50_reagan_1985 history 0.105 8
498 50_reagan_1985 human 0.105 9
499 50_reagan_1985 senator 0.102 10
500 51_bush_george_h_w_1989 don 0.186 1
501 51_bush_george_h_w_1989 breeze 0.184 2
502 51_bush_george_h_w_1989 new 0.137 3
503 51_bush_george_h_w_1989 friends 0.137 4
504 51_bush_george_h_w_1989 door 0.134 5
505 51_bush_george_h_w_1989 word 0.132 6
506 51_bush_george_h_w_1989 mr 0.127 7
507 51_bush_george_h_w_1989 hand 0.125 8
508 51_bush_george_h_w_1989 blowing 0.111 9
509 51_bush_george_h_w_1989 things 0.111 10
510 52_clinton_1993 america 0.319 1
511 52_clinton_1993 world 0.227 2
512 52_clinton_1993 americans 0.207 3
513 52_clinton_1993 today 0.186 4
514 52_clinton_1993 change 0.171 5
515 52_clinton_1993 renewal 0.137 6
516 52_clinton_1993 season 0.137 7
517 52_clinton_1993 idea 0.135 8
518 52_clinton_1993 let 0.133 9
519 52_clinton_1993 people 0.129 10
520 53_clinton_1997 century 0.321 1
521 53_clinton_1997 new 0.280 2
522 53_clinton_1997 america 0.200 3
523 53_clinton_1997 promise 0.164 4
524 53_clinton_1997 world 0.135 5
525 53_clinton_1997 land 0.131 6
526 53_clinton_1997 nation 0.117 7
527 53_clinton_1997 americans 0.115 8
528 53_clinton_1997 time 0.108 9
529 53_clinton_1997 let 0.105 10
530 54_bush_george_w_2001 story 0.341 1
531 54_bush_george_w_2001 america 0.193 2
532 54_bush_george_w_2001 civility 0.161 3
533 54_bush_george_w_2001 nation 0.130 4
534 54_bush_george_w_2001 affirm 0.121 5
535 54_bush_george_w_2001 ideals 0.109 6
536 54_bush_george_w_2001 americans 0.108 7
537 54_bush_george_w_2001 promise 0.108 8
538 54_bush_george_w_2001 compassion 0.107 9
539 54_bush_george_w_2001 citizens 0.107 10
540 55_bush_george_w_2005 freedom 0.350 1
541 55_bush_george_w_2005 america 0.285 2
542 55_bush_george_w_2005 liberty 0.174 3
543 55_bush_george_w_2005 americans 0.140 4
544 55_bush_george_w_2005 tyranny 0.127 5
545 55_bush_george_w_2005 seen 0.110 6
546 55_bush_george_w_2005 nation 0.096 7
547 55_bush_george_w_2005 cause 0.093 8
548 55_bush_george_w_2005 history 0.092 9
549 55_bush_george_w_2005 came 0.092 10
550 56_obama_2009 america 0.148 1
551 56_obama_2009 nation 0.120 2
552 56_obama_2009 new 0.118 3
553 56_obama_2009 today 0.115 4
554 56_obama_2009 generation 0.101 5
555 56_obama_2009 let 0.091 6
556 56_obama_2009 jobs 0.091 7
557 56_obama_2009 crisis 0.087 8
558 56_obama_2009 hard 0.085 9
559 56_obama_2009 women 0.085 10
560 57_obama_2013 journey 0.168 1
561 57_obama_2013 creed 0.140 2
562 57_obama_2013 generation 0.127 3
563 57_obama_2013 america 0.125 4
564 57_obama_2013 complete 0.115 5
565 57_obama_2013 requires 0.115 6
566 57_obama_2013 people 0.110 7
567 57_obama_2013 time 0.106 8
568 57_obama_2013 today 0.104 9
569 57_obama_2013 evident 0.101 10
570 58_trump_2017 america 0.350 1
571 58_trump_2017 dreams 0.156 2
572 58_trump_2017 american 0.149 3
573 58_trump_2017 jobs 0.143 4
574 58_trump_2017 protected 0.132 5
575 58_trump_2017 obama 0.120 6
576 58_trump_2017 people 0.112 7
577 58_trump_2017 thank 0.109 8
578 58_trump_2017 borders 0.107 9
579 58_trump_2017 ve 0.107 10

Your Turn!

Take a few minutes to explore the dataframe below and then answer the following questions.

tfidf_compare
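If you're not sure where to start exploring, here is a minimal sketch of two things we might try: pulling out every row for a single term, and counting how many documents list each term. Because the exact columns of `tfidf_compare` aren't shown here, the sketch builds a small stand-in dataframe called `sample_df` from a handful of rows copied out of the top-10 table above. The name `sample_df` and the column names `document`, `term`, and `tfidf` are assumptions for illustration, so adjust them to match your own dataframe.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's dataframe: a few rows copied from the
# top-10 table above. The column names are assumptions, not the lesson's own.
rows = [
    ("36_hoover_1929", "peace", 0.104),
    ("40_roosevelt_franklin_1945", "peace", 0.159),
    ("41_truman_1949", "peace", 0.144),
    ("41_truman_1949", "world", 0.196),
    ("44_kennedy_1961", "world", 0.103),
    ("47_nixon_1973", "peace", 0.212),
]
sample_df = pd.DataFrame(rows, columns=["document", "term", "tfidf"])

# Every sampled row for one term, with the highest tf-idf score first
peace_rows = sample_df[sample_df["term"] == "peace"].sort_values("tfidf", ascending=False)
print(peace_rows)

# Number of distinct sampled documents that list each term
docs_per_term = sample_df.groupby("term")["document"].nunique()
print(docs_per_term)
```

The same two steps should carry over to the full dataframe once you swap in its real name and column names.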

1. What is the difference between a tf–idf score and raw word frequency?

Your answer here

2. Based on the dataframe above, what is one potential problem or limitation that you notice with tf–idf scores?

Your answer here

3. What’s another collection of texts that you think might be interesting to analyze with tf–idf scores? Why?

Your answer here