Data Types#

Note: You can explore the associated workbook for this chapter in the cloud.

There are four essential kinds of Python data with different powers and capabilities:

  • Strings (Text)

  • Integers (Whole Numbers)

  • Floats (Decimal Numbers)

  • Booleans (True/False)

They’re sort of like starter pack Pokémon!

https://hips.hearstapps.com/digitalspyuk.cdnds.net/16/08/1456483171-pokemon2.jpg?resize=768:*

Data Types#

Take a look at the variables filepath_of_text and number_of_desired_words in the word count code below.

What differences do you notice between these two variables and their corresponding values?

# Import Libraries and Modules

import re
from collections import Counter

# Define Functions

def split_into_words(any_chunk_of_text):
    lowercase_text = any_chunk_of_text.lower()
    split_words = re.split(r"\W+", lowercase_text)
    return split_words

# Define Filepaths and Assign Variables

filepath_of_text = "../texts/music/Beyonce-Lemonade.txt"
number_of_desired_words = 40

stopwords = ['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours',
'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', 'her', 'hers',
 'herself', 'it', 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves',
 'what', 'which', 'who', 'whom', 'this', 'that', 'these', 'those', 'am', 'is', 'are',
 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does',
 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until',
 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into',
 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down',
 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here',
 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more',
 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so',
 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', 'should', 'now', 've', 'll', 'amp']

# Read in File

full_text = open(filepath_of_text, encoding="utf-8").read()

# Manipulate and Analyze File

all_the_words = split_into_words(full_text)
meaningful_words = [word for word in all_the_words if word not in stopwords]
meaningful_words_tally = Counter(meaningful_words)
most_frequent_meaningful_words = meaningful_words_tally.most_common(number_of_desired_words)

# Output Results

most_frequent_meaningful_words

You might be wondering…

Why is "../texts/music/Beyonce-Lemonade.txt" colored in red and surrounded by quotation marks, while 40 is colored in green and not surrounded by quotation marks? Because they are two different “types” of Python data.

| Data Type | Explanation | Example |
|---|---|---|
| String | Text | "Beyonce-Lemonade.txt", "lemonade" |
| Integer | Whole Numbers | 40 |
| Float | Decimal Numbers | 40.2 |
| Boolean | True/False | False |

Check Data Types#

You can check the data type of any value by using the function type().

type("lemonade")
str
type(filepath_of_text)
str
type(40)
int
type(number_of_desired_words)
int
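
The same check works for the other two data types. As a small sketch (the list example_values is a name made up here for illustration, not part of the word count code), the snippet below collects the type name of one value of each kind:

```python
# One example value for each of the four data types (illustrative values)
example_values = ["lemonade", 40, 40.2, False]

# type(value).__name__ gives the short name of the type as a string
type_names = [type(value).__name__ for value in example_values]

print(type_names)
```

The printed list should contain str, int, float, and bool, in that order.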

Strings#

A string is a Python data type that is treated like text, even if it contains a number. Strings are always enclosed by either single quotation marks 'this is a string' or double quotation marks "this is a string".

'this is a string'
"this is also a string, even though it contains a number like 42"
this is not a string
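
To see why it matters that a number inside quotation marks is still text, here is a minimal sketch (the variable names are made up for illustration). Adding two strings joins them together, while adding two integers does arithmetic:

```python
text_number = "42"   # a string, because of the quotation marks
real_number = 42     # an integer

joined_text = text_number + "1"   # joins the two strings: '421'
added_number = real_number + 1    # adds the two numbers: 43

print(joined_text)
print(added_number)
```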

It doesn’t matter whether you use single or double quotation marks with strings, as long as you use the same kind on either side of the string.

If you need to include a single or double quotation mark inside of a string, then you need to either:

  • use the opposite kind of quotation mark inside the string

  • or “escape” the quotation mark by using a backslash \ before it

"She exclaimed, 'This is a quotation inside a string!'"
"She exclaimed, \"This is also a quotation inside a string!\""

String Methods#

Each data type has different properties and capabilities. So there are special things that only strings can do, and there are special ways of interacting with strings.

For example, you can index and slice strings, you can add strings together, and you can transform strings to uppercase or lowercase. We’re going to learn more about string methods in the next lesson, but here are a few examples using a snippet from Beyoncé’s song “Hold Up.”

lemonade_snippet = "Hold up, they don't love you like I love you"

Index#

lemonade_snippet[0]
'H'
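
Python starts counting positions at 0, which is why index 0 returns the first character. As a brief sketch (the snippet is re-defined here so the example runs on its own, and the new variable names are illustrative), negative indices count backward from the end of the string:

```python
lemonade_snippet = "Hold up, they don't love you like I love you"

first_character = lemonade_snippet[0]    # 'H', the first character
last_character = lemonade_snippet[-1]    # 'u', the last character
snippet_length = len(lemonade_snippet)   # number of characters, spaces included

print(first_character, last_character, snippet_length)
```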

Slice#

lemonade_snippet[0:20]
"Hold up, they don't "

Add#

lemonade_snippet + " // Slow down, they don't love you like I love you"
"Hold up, they don't love you like I love you // Slow down, they don't love you like I love you"

Make uppercase#

lemonade_snippet.upper()
"HOLD UP, THEY DON'T LOVE YOU LIKE I LOVE YOU"

f-Strings#

A special kind of string that we’re going to use in this class is called an f-string. An f-string, short for formatted string literal, allows you to insert a variable directly into a string. f-strings were introduced with Python version 3.6.

An f-string must begin with an f outside the quotation marks. Then, inside the quotation marks, the inserted variable must be placed within curly brackets {}.

print(f"Beyonce burst out of the building and sang: \n\n'{lemonade_snippet}'")
Beyonce burst out of the building and sang: 

'Hold up, they don't love you like I love you'
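
The curly brackets are not limited to string variables. As a rough sketch (song_title and track_number are illustrative names, not variables from the word count code), an f-string can also insert an integer or a small calculation:

```python
song_title = "Hold Up"   # string
track_number = 2         # integer

# Each pair of curly brackets is replaced by the value inside it
message = f"'{song_title}' is track {track_number}; the next one is track {track_number + 1}."

print(message)
```

Python converts each inserted value to text automatically, so there is no need to turn the integer into a string first.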

Integers & Floats#

An integer and a float (short for floating point number) are two Python data types for representing numbers. Integers represent whole numbers. Floats represent numbers with decimal points. They do not need to be placed in quotation marks.

type(40)
int
type(40.5)
float
type(40.555555)
float
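
Integers and floats can be mixed in the same calculation. The short sketch below (with made-up variable names) shows two behaviors worth knowing: combining an integer with a float produces a float, and the division operator / returns a float even when both numbers are integers:

```python
whole = 40       # integer
decimal = 40.5   # float

mixed_sum = whole + decimal    # int + float gives a float: 80.5
true_division = whole / 2      # / always gives a float: 20.0
floor_division = whole // 2    # // between two integers gives an integer: 20

print(type(mixed_sum), type(true_division), type(floor_division))
```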

You can do a large range of mathematical calculations and operations with integers and floats. The table below is taken from Python’s documentation about Numeric Types.

| Operation | Explanation |
|---|---|
| x + y | sum of x and y |
| x - y | difference of x and y |
| x * y | product of x and y |
| x / y | quotient of x and y |
| x // y | floored quotient of x and y |
| x % y | remainder of x / y |
| -x | x negated |
| +x | x unchanged |
| abs(x) | absolute value or magnitude of x |
| int(x) | x converted to integer |
| float(x) | x converted to floating point |
| pow(x, y) | x to the power y |
| x ** y | x to the power y |

Multiplication#

variable1 = 4
variable2 = 2
variable1 * variable2
8

Exponents#

variable1 ** variable2
16

Remainder#

72 % 10
2
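
A few of the other operations from the table work the same way. This is only a quick sketch with illustrative variable names; note in particular that int() drops the decimal part rather than rounding:

```python
floored = 72 // 10      # floored quotient: 7
magnitude = abs(-40)    # absolute value: 40
truncated = int(40.9)   # int() cuts off the decimal part: 40
as_float = float(40)    # integer converted to float: 40.0
power = pow(4, 2)       # same result as 4 ** 2: 16

print(floored, magnitude, truncated, as_float, power)
```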

Booleans#

Booleans are “truth” values. They report on whether things in your Python universe are True or False. There are only two options for a boolean: True or False.

For example, let’s assign the variable beyonce the value "Grammy award-winner":

beyonce = "Grammy award-winner"

Python Review

Remember the difference between a single equals sign `=` and a double equals sign `==`?

  • A single equals sign `=` is used for variable assignment
  • A double equals sign `==` is used as the equals operator

We can “test” whether the variable beyonce equals "Grammy award-winner" by using the equals operator ==. This will return a boolean.

beyonce == "Grammy award-winner"
True
type(beyonce == "Grammy award-winner")
bool

If we instead evaluate whether beyonce equals "Oscar award-winner", we will get the boolean False.

beyonce == "Oscar award-winner"
False
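
The equals operator is not the only test that returns a boolean. As a small sketch (the variables are re-defined so the example runs on its own, and the result names are made up), the "not equal" operator != and the "greater than" operator > also produce True or False:

```python
beyonce = "Grammy award-winner"
number_of_desired_words = 40

# != is True when the two values are different
is_not_oscar_winner = beyonce != "Oscar award-winner"

# > compares two numbers
wants_more_than_100_words = number_of_desired_words > 100

print(is_not_oscar_winner)
print(wants_more_than_100_words)
```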

TypeError#

If you don’t use the right data “type” for a particular method or function, you will get a TypeError.

Let’s look at what happens if we change the value of number_of_desired_words to the string "40" instead of the integer 40.

import re
from collections import Counter

def split_into_words(any_chunk_of_text):
    lowercase_text = any_chunk_of_text.lower()
    split_words = re.split(r"\W+", lowercase_text)
    return split_words

filepath_of_text = "../texts/music/Beyonce-Lemonade.txt"
number_of_desired_words = "40"

stopwords = ['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours',
 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', 'her', 'hers',
 'herself', 'it', 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves',
 'what', 'which', 'who', 'whom', 'this', 'that', 'these', 'those', 'am', 'is', 'are',
 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does',
 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until',
 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into',
 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down',
 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here',
 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more',
 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so',
 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', 'should', 'now', 've', 'll', 'amp']


full_text = open(filepath_of_text, encoding="utf-8").read()

all_the_words = split_into_words(full_text)
meaningful_words = [word for word in all_the_words if word not in stopwords]
meaningful_words_tally = Counter(meaningful_words)
most_frequent_meaningful_words = meaningful_words_tally.most_common(number_of_desired_words)

most_frequent_meaningful_words
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-a142b58e454a> in <module>
     29 meaningful_words = [word for word in all_the_words if word not in stopwords]
     30 meaningful_words_tally = Counter(meaningful_words)
---> 31 most_frequent_meaningful_words = meaningful_words_tally.most_common(number_of_desired_words)
     32 
     33 most_frequent_meaningful_words

~/opt/anaconda3/lib/python3.7/collections/__init__.py in most_common(self, n)
    584         if n is None:
    585             return sorted(self.items(), key=_itemgetter(1), reverse=True)
--> 586         return _heapq.nlargest(n, self.items(), key=_itemgetter(1))
    587 
    588     def elements(self):

~/opt/anaconda3/lib/python3.7/heapq.py in nlargest(n, iterable, key)
    544         pass
    545     else:
--> 546         if n >= size:
    547             return sorted(iterable, key=key, reverse=True)[:n]
    548 

TypeError: '>=' not supported between instances of 'str' and 'int'
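
One way to repair this kind of error is to convert the value to the type that the method expects. The sketch below uses a tiny made-up word list rather than the Lemonade text, and it assumes the string contains only digits so that int() can convert it:

```python
from collections import Counter

# A tiny illustrative tally, standing in for meaningful_words_tally
small_tally = Counter(["love", "love", "hold", "up", "love", "hold"])
number_of_desired_words = "40"   # a string, not an integer

# Passing the string directly raises a TypeError, as in the traceback above
try:
    small_tally.most_common(number_of_desired_words)
    raised_type_error = False
except TypeError:
    raised_type_error = True

# Converting the string to an integer first avoids the error
top_words = small_tally.most_common(int(number_of_desired_words))

print(raised_type_error)
print(top_words)
```

Because the tally holds fewer than 40 distinct words, most_common simply returns all of them, ordered from most to least frequent.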

Your Turn!#

Here’s an example of data types in action using some biographical information about me.

name = 'Prof. Walsh' #string
age = 1000 #integer
place = 'Chicago' #string 
favorite_food = 'tacos' #string
dog_years_age = age * 7.5 #float
student = False #boolean
print(f'✨This is...{name}!✨')

print(f"""{name} likes {favorite_food} and once lived in {place}.
{name} is {age} years old, which is {dog_years_age} in dog years.
The statement '{name} is a student' is {student}.""")
✨This is...Prof. Walsh!✨
Prof. Walsh likes tacos and once lived in Chicago.
Prof. Walsh is 1000 years old, which is 7500.0 in dog years.
The statement 'Prof. Walsh is a student' is False.
print(f"""
name = {type(name)}
age = {type(age)}
place = {type(place)}
favorite_food = {type(favorite_food)}
dog_years_age = {type(dog_years_age)}
student = {type(student)}
""")
name = <class 'str'>
age = <class 'int'>
place = <class 'str'>
favorite_food = <class 'str'>
dog_years_age = <class 'float'>
student = <class 'bool'>

Let’s do the same thing but with biographical info about you! Ask your partner a few questions and then fill in the variables below accordingly.

name = #Your code here
age = #Your code here
home_town = #Your code here
favorite_food = #Your code here
dog_years_age = #Your code here * 7.5
student = False #boolean
print(f'✨This is...{name}!✨')

print(f"""{name} likes {favorite_food} and once lived in {home_town}.
{name} is {age} years old, which is {dog_years_age} in dog years.
The statement "{name} is a student" is {student}.""")

Add a new variable called favorite_movie and update the f-string to include a new sentence about your partner’s favorite movie.

name = 
age = 
home_town = 
favorite_food = 
dog_years_age =
student = 
#favorite_movie = 
print(f'✨This is...{name}!✨')

print(f"""{name} likes {favorite_food} and once lived in {home_town}.
{name} is {age} years old, which is {dog_years_age} in dog years.
The statement "{name} is a student" is {student}.
# YOUR NEW SENTENCE HERE""")