Lists & Loops — Part 2
Note: You can explore the associated workbook for this chapter in the cloud.
In this lesson, we’re going to learn more about lists and loops by drawing on DH scholar Anelise Shrout’s Bellevue Almshouse Dataset.
Preview The Bellevue Almshouse Dataset
 | date_in | first_name | last_name | age | disease | profession | gender | children |
---|---|---|---|---|---|---|---|---|
0 | 1847-04-17 | Mary | Gallagher | 28.0 | recent emigrant | married | w | Child Alana 10 days |
1 | 1847-04-08 | John | Sanin (?) | 19.0 | recent emigrant | laborer | m | Catherine 2 mo |
2 | 1847-04-17 | Anthony | Clark | 60.0 | recent emigrant | laborer | m | Charles Riley afed 10 days |
3 | 1847-04-08 | Lawrence | Feeney | 32.0 | recent emigrant | laborer | m | Child |
4 | 1847-04-13 | Henry | Joyce | 21.0 | recent emigrant | NaN | m | Child 1 mo |
5 | 1847-04-14 | Bridget | Hart | 20.0 | recent emigrant | spinster | w | Child |
6 | 1847-04-14 | Mary | Green | 40.0 | recent emigrant | spinster | w | And child 2 months |
7 | 1847-04-19 | Daniel | Loftus | 27.0 | destitution | laborer | m | NaN |
8 | 1847-04-10 | James | Day | 35.0 | recent emigrant | laborer | m | NaN |
9 | 1847-04-10 | Margaret | Farrell | 30.0 | recent emigrant | widow | w | NaN |
10 | 1847-04-10 | Bridget | Day | 30.0 | recent emigrant | married | w | NaN |
11 | 1847-04-10 | Anthony | Day | 0.5 | recent emigrant | NaN | m | NaN |
12 | 1847-04-07 | James | Collins | 22.0 | recent emigrant | laborer | m | NaN |
13 | 1847-04-07 | Thomas | Collins | 21.0 | recent emigrant | laborer | m | NaN |
14 | 1847-04-07 | Pat | Whalen | 25.0 | recent emigrant | laborer | m | NaN |
15 | 1847-04-17 | Dan | Delany | 10.0 | typhus | NaN | m | NaN |
16 | 1847-04-09 | Catherine | O'Harra | 23.0 | recent emigrant | married | w | NaN |
17 | 1847-04-09 | Damiel | O'Harra | 25.0 | recent emigrant | laborer | m | NaN |
18 | 1847-04-12 | Margaret | Delaney | 26.0 | recent emigrant | married | w | NaN |
19 | 1847-04-12 | Michael | Delany | 3.0 | recent emigrant | NaN | m | NaN |
We’re using the Bellevue Almshouse Dataset to practice Python lists and loops because we want to think deeply about the consequences of reducing human life to data even at this early stage in our Python journey. This immigration data, as Shrout argues in her essay “(Re)Humanizing Data: Digitally Navigating the Bellevue Almshouse,” was “produced with the express purpose of reducing people to bodies; bodies to easily quantifiable aspects; and assigning value to those aspects which proved that the marginalized people to who they belonged were worth less than their elite counterparts.”
As we work through the lesson below, reflect on the categories that these Irish immigrants were slotted into by the NYC government. What should we make of the fact that Python, as a programming language, doesn’t understand the meaning or historical context of this data? How can we nevertheless use Python to better understand this history and to interrogate power?
In the previous lesson, we learned how to make, manipulate, and iterate through lists, an important Python collection type. In this lesson, we’re going to keep practicing and learn how to:
build lists with for loops
create a running index of items in a list with enumerate()
create one-line for loops with list comprehensions
zip lists together with zip()
easily count items in a list
These tools can help us:
identify how many times a certain value appears in the data (e.g., the so-called disease “recent emigrant”)
programmatically change all blank values in the data (e.g., from a blank to “no disease recorded”)
find the most and least common values in the data (e.g., most common “diseases” or professions)
Example Lists
Here are four lists with sample data from the Bellevue Almshouse dataset. Each item in each list corresponds to a single row from the dataset. You might imagine how the code in this lesson could apply to the entire dataset or to other large datasets.
first_names = ['Unity', 'Catherine', 'Thomas', 'William', 'Patrick', 'Mary Anne', 'Morris',
'Michael', 'Ellen', 'James', 'Michael', 'Hannah', 'Alexander', 'Mary A', 'Serena?',
'Margaret', 'Michael', 'Jane', 'Rosanna', 'James', 'Michael', 'John', 'John', 'Mary',
'Bantel', 'Marcella', 'Arthur', 'Michael', 'Mary', 'Martin']
last_names = ['Harkin', 'Doyle', 'McDonald', 'Jordan', 'Rouse', 'Keene', 'Brown',
'McLoughlin', 'Cassidy', 'Whittle', 'Coyle', 'Cullen', 'Cozens',
'Maly', 'McGuire', 'Laly', 'Bahan', 'Combs', 'McGovern', 'Gallagher',
'Crone', 'Brannon', 'McDonal', 'Atkins', 'Garragan', 'Wood', 'Kelly', 'Galeny', 'Welch', 'Kerly']
diseases = ['', 'recent emigrant', 'sickness', '', '', '', 'destitution', '', 'sickness', '',
'sickness', 'recent emigrant', '', 'insane', 'recent emigrant', 'insane', '', '',
'sickness', 'sickness', '', 'syphilis', 'sickness', '', 'recent emigrant', 'destitution',
'sickness', 'recent emigrant', 'sickness', 'sickness']
ages = ['22', '21', '23', '47', '45', '28', '23', '50', '26', '28', '30', '30', '65', '17', '35',
'27', '32', '40', '22', '30', '27', '40', '41', '37', '16', '20', '30', '30', '35', '9']
For Loop
As a refresher, we can use a for loop to iterate through a list and do something to each item in the list.
diseases = ['', 'recent emigrant', 'sickness', '', '', '', 'destitution', '', 'sickness', '',
'sickness', 'recent emigrant', '', 'insane', 'recent emigrant', 'insane', '', '',
'sickness', 'sickness', '', 'syphilis', 'sickness', '', 'recent emigrant', 'destitution',
'sickness', 'recent emigrant', 'sickness', 'sickness']
Below we are iterating through the list diseases and printing out every item in the list.
for disease in diseases:
    print(disease)
recent emigrant
sickness
destitution
sickness
sickness
recent emigrant
insane
recent emigrant
insane
sickness
sickness
syphilis
sickness
recent emigrant
destitution
sickness
recent emigrant
sickness
sickness
Remember that the variable that represents each item in the list doesn’t need to exist before the loop: the for statement creates it, and you can name it anything you want. Instead of disease, we could name the variable x.
for x in diseases:
    print(x)
recent emigrant
sickness
destitution
sickness
sickness
recent emigrant
insane
recent emigrant
insane
sickness
sickness
syphilis
sickness
recent emigrant
destitution
sickness
recent emigrant
sickness
sickness
As we’ve discussed before, however, it’s preferable to name your variables something clear that has human language significance.
Enumerate()
diseases = ['', 'recent emigrant', 'sickness', '', '', '', 'destitution', '', 'sickness', '',
'sickness', 'recent emigrant', '', 'insane', 'recent emigrant', 'insane', '', '',
'sickness', 'sickness', '', 'syphilis', 'sickness', '', 'recent emigrant', 'destitution',
'sickness', 'recent emigrant', 'sickness', 'sickness']
You might want to keep a numerical count or index of items in a list. To print out each item in the list with a corresponding number, you can use the built-in Python function enumerate().
To access the number that corresponds to each item, you need to unpack two variables instead of just one: number, disease
for number, disease in enumerate(diseases):
    print(number, disease)
0
1 recent emigrant
2 sickness
3
4
5
6 destitution
7
8 sickness
9
10 sickness
11 recent emigrant
12
13 insane
14 recent emigrant
15 insane
16
17
18 sickness
19 sickness
20
21 syphilis
22 sickness
23
24 recent emigrant
25 destitution
26 sickness
27 recent emigrant
28 sickness
29 sickness
for number, disease in enumerate(diseases):
    print(f"Person {number}: {disease}")
Person 0:
Person 1: recent emigrant
Person 2: sickness
Person 3:
Person 4:
Person 5:
Person 6: destitution
Person 7:
Person 8: sickness
Person 9:
Person 10: sickness
Person 11: recent emigrant
Person 12:
Person 13: insane
Person 14: recent emigrant
Person 15: insane
Person 16:
Person 17:
Person 18: sickness
Person 19: sickness
Person 20:
Person 21: syphilis
Person 22: sickness
Person 23:
Person 24: recent emigrant
Person 25: destitution
Person 26: sickness
Person 27: recent emigrant
Person 28: sickness
Person 29: sickness
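Notice that enumerate() starts counting at 0, so the first person is labeled “Person 0.” If you would rather start at 1, enumerate() accepts an optional start argument. The sketch below uses a short sample of the diseases list (an assumption made to keep the example small) rather than all 30 items.

```python
# A short sample of the diseases list from this lesson
diseases = ['', 'recent emigrant', 'sickness', '', 'destitution']

# start=1 makes the running count begin at 1 instead of 0
for number, disease in enumerate(diseases, start=1):
    print(f"Person {number}: {disease}")
```

The items themselves are unchanged; only the number paired with each item shifts by one.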
Build a List with a For Loop
We can also make lists with for loops. Let’s say we wanted to take this list collection and create a new list that only contains the items in the list that match "item we want".
collection = ['item', 'item we want', 'item', 'item', 'item we want']
To do so, we could make an empty list by assigning empty_list the value of [], that is, a list with nothing inside of it.
Then, we could use a for loop to iterate through collection. If an item equals "item we want", then we will .append() that item to our previously empty list.
empty_list = []
for item in collection:
    if item == "item we want":
        empty_list.append(item)
Check it out!
empty_list
['item we want', 'item we want']
To iterate through the list diseases and make a new list with only the items that match "recent emigrant", we could use the same template.
diseases = ['', 'recent emigrant', 'sickness', '', '', '', 'destitution', '', 'sickness', '',
'sickness', 'recent emigrant', '', 'insane', 'recent emigrant', 'insane', '', '',
'sickness', 'sickness', '', 'syphilis', 'sickness', '', 'recent emigrant', 'destitution',
'sickness', 'recent emigrant', 'sickness', 'sickness']
recent_emigrants = []
for disease in diseases:
    if disease == 'recent emigrant':
        recent_emigrants.append(disease)
recent_emigrants
['recent emigrant',
'recent emigrant',
'recent emigrant',
'recent emigrant',
'recent emigrant']
How many items are in this list? Remember that we can use the len() function to see how many items are in a list.
len(recent_emigrants)
5
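As a quick cross-check, Python lists also have a built-in .count() method that reports how many times a value appears. The sketch below runs both approaches on a short, illustrative sample list (diseases_sample is a name made up for this example) so that you can see they agree.

```python
# A short, illustrative sample rather than the full diseases list
diseases_sample = ['', 'recent emigrant', 'sickness', '', 'recent emigrant']

# Approach 1: build a new list with a for loop, then measure its length
recent_emigrants = []
for disease in diseases_sample:
    if disease == 'recent emigrant':
        recent_emigrants.append(disease)
print(len(recent_emigrants))

# Approach 2: ask the list directly how often the value appears
print(diseases_sample.count('recent emigrant'))
```

Building the list with a loop is still worth practicing, because the same pattern lets you transform items and not just count them.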
We could also create a new list that revises the old list and transforms blank values into more informative values.
Below we are iterating through the list diseases. If an item is blank, then we are adding "no disease recorded" to a new updated_diseases list. If an item is not blank, we are simply adding the disease to the new list.
updated_diseases = []
for disease in diseases:
    if disease == '':
        new_disease = 'no disease recorded'
        updated_diseases.append(new_disease)
    else:
        updated_diseases.append(disease)
updated_diseases
['no disease recorded',
'recent emigrant',
'sickness',
'no disease recorded',
'no disease recorded',
'no disease recorded',
'destitution',
'no disease recorded',
'sickness',
'no disease recorded',
'sickness',
'recent emigrant',
'no disease recorded',
'insane',
'recent emigrant',
'insane',
'no disease recorded',
'no disease recorded',
'sickness',
'sickness',
'no disease recorded',
'syphilis',
'sickness',
'no disease recorded',
'recent emigrant',
'destitution',
'sickness',
'recent emigrant',
'sickness',
'sickness']
List Comprehensions
There’s a slightly easier and more compact way to build a list with a for loop called a list comprehension.
Loop:
empty_list = []
for item in collection:
    if item == "item we want":
        empty_list.append(item)
List Comprehension:
empty_list = [item for item in collection if item == 'item we want']
Instead of creating an empty list, you can build the for loop inside of a list, and you can do it all in one line.
Rather than this…
empty_list = []
for item in collection:
    if item == "item we want":
        empty_list.append(item)
We can do this…
empty_list = [item for item in collection if item == 'item we want']
empty_list
['item we want', 'item we want']
You might think of a list comprehension unfolding in the following order: “the item we want to extract” followed by a flattened for loop.
To make the list recent_emigrants with a list comprehension, for example, we start with disease (that is, the “item we want to extract”) and follow it up with a flattened for loop.
recent_emigrants = [disease for disease in diseases if disease == 'recent emigrant']
recent_emigrants
['recent emigrant',
'recent emigrant',
'recent emigrant',
'recent emigrant',
'recent emigrant']
The 4-line for loop and the 1-line list comprehension produce the same list. The difference is that the list comprehension takes up less space and, once you get the hang of it, is easier to write.
Zip Lists Together
We can also iterate through multiple lists at the same time by using the zip() function, which basically “zips” the lists together.
first_names = ['Unity', 'Catherine', 'Thomas', 'William', 'Patrick', 'Mary Anne', 'Morris',
'Michael', 'Ellen', 'James', 'Michael', 'Hannah', 'Alexander', 'Mary A', 'Serena?',
'Margaret', 'Michael', 'Jane', 'Rosanna', 'James', 'Michael', 'John', 'John', 'Mary',
'Bantel', 'Marcella', 'Arthur', 'Michael', 'Mary', 'Martin']
last_names = ['Harkin', 'Doyle', 'McDonald', 'Jordan', 'Rouse', 'Keene', 'Brown',
'McLoughlin', 'Cassidy', 'Whittle', 'Coyle', 'Cullen', 'Cozens',
'Maly', 'McGuire', 'Laly', 'Bahan', 'Combs', 'McGovern', 'Gallagher',
'Crone', 'Brannon', 'McDonal', 'Atkins', 'Garragan', 'Wood', 'Kelly', 'Galeny', 'Welch', 'Kerly']
updated_diseases
['no disease recorded',
'recent emigrant',
'sickness',
'no disease recorded',
'no disease recorded',
'no disease recorded',
'destitution',
'no disease recorded',
'sickness',
'no disease recorded',
'sickness',
'recent emigrant',
'no disease recorded',
'insane',
'recent emigrant',
'insane',
'no disease recorded',
'no disease recorded',
'sickness',
'sickness',
'no disease recorded',
'syphilis',
'sickness',
'no disease recorded',
'recent emigrant',
'destitution',
'sickness',
'recent emigrant',
'sickness',
'sickness']
ages = ['22', '21', '23', '47', '45', '28', '23', '50', '26', '28', '30', '30', '65', '17', '35',
'27', '32', '40', '22', '30', '27', '40', '41', '37', '16', '20', '30', '30', '35', '9']
For example, if we wanted to print out each Bellevue Almshouse patient’s first name and their “disease” as recorded by the NYC government, we could zip() the lists together and unpack multiple variables.
for first_name, disease in zip(first_names, updated_diseases):
    print(f"{first_name} // {disease}")
Unity // no disease recorded
Catherine // recent emigrant
Thomas // sickness
William // no disease recorded
Patrick // no disease recorded
Mary Anne // no disease recorded
Morris // destitution
Michael // no disease recorded
Ellen // sickness
James // no disease recorded
Michael // sickness
Hannah // recent emigrant
Alexander // no disease recorded
Mary A // insane
Serena? // recent emigrant
Margaret // insane
Michael // no disease recorded
Jane // no disease recorded
Rosanna // sickness
James // sickness
Michael // no disease recorded
John // syphilis
John // sickness
Mary // no disease recorded
Bantel // recent emigrant
Marcella // destitution
Arthur // sickness
Michael // recent emigrant
Mary // sickness
Martin // sickness
for first_name, last_name, age, disease in zip(first_names, last_names, ages, updated_diseases):
    print(f"{first_name} {last_name} // Age {age} // {disease}")
Unity Harkin // Age 22 // no disease recorded
Catherine Doyle // Age 21 // recent emigrant
Thomas McDonald // Age 23 // sickness
William Jordan // Age 47 // no disease recorded
Patrick Rouse // Age 45 // no disease recorded
Mary Anne Keene // Age 28 // no disease recorded
Morris Brown // Age 23 // destitution
Michael McLoughlin // Age 50 // no disease recorded
Ellen Cassidy // Age 26 // sickness
James Whittle // Age 28 // no disease recorded
Michael Coyle // Age 30 // sickness
Hannah Cullen // Age 30 // recent emigrant
Alexander Cozens // Age 65 // no disease recorded
Mary A Maly // Age 17 // insane
Serena? McGuire // Age 35 // recent emigrant
Margaret Laly // Age 27 // insane
Michael Bahan // Age 32 // no disease recorded
Jane Combs // Age 40 // no disease recorded
Rosanna McGovern // Age 22 // sickness
James Gallagher // Age 30 // sickness
Michael Crone // Age 27 // no disease recorded
John Brannon // Age 40 // syphilis
John McDonal // Age 41 // sickness
Mary Atkins // Age 37 // no disease recorded
Bantel Garragan // Age 16 // recent emigrant
Marcella Wood // Age 20 // destitution
Arthur Kelly // Age 30 // sickness
Michael Galeny // Age 30 // recent emigrant
Mary Welch // Age 35 // sickness
Martin Kerly // Age 9 // sickness
If you try to zip() lists of different lengths, zip() stops at the end of the shortest list and silently leaves out the remaining items from the longer lists.
another_list = ['laborer']
for first_name, last_name, age, disease, list_item in zip(first_names, last_names, ages, updated_diseases, another_list):
    print(f"{first_name} {last_name} // Age {age} // {disease} // {list_item}")
Unity Harkin // Age 22 // no disease recorded // laborer
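Because zip() drops the extra items without any warning, it can be worth comparing list lengths before zipping. If you want to keep every row anyway, one option is zip_longest() from Python’s itertools module, which pads the shorter lists with a fill value. The sketch below uses short sample lists, and the fill value 'not recorded' is an assumption chosen for illustration; padding like this makes a claim about the records that the original ledger may not support.

```python
from itertools import zip_longest

# Short, illustrative samples; professions_sample is deliberately too short
first_names_sample = ['Unity', 'Catherine', 'Thomas']
professions_sample = ['laborer']

# Check the lengths first so a mismatch does not go unnoticed
if len(first_names_sample) != len(professions_sample):
    print("Warning: the lists have different lengths")

# zip_longest() keeps going until the LONGEST list is exhausted,
# filling in the missing values with fillvalue
paired = list(zip_longest(first_names_sample, professions_sample,
                          fillvalue='not recorded'))
for first_name, profession in paired:
    print(f"{first_name} // {profession}")
```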
Count Items In a List or Collection
If you want to count the items in a list or a collection, you can use Counter from the collections module, which is part of Python’s standard library.
To use this tool, you first need to import it. The import statement is used whenever you want to use code that lives outside of your own file, whether it ships with Python or was written by someone else and installed separately.
The from keyword allows us to import a specific name from a larger module, in this case Counter from collections.
from collections import Counter
Now that we have Counter imported, we can use it. To count the items in a collection, we simply need to insert a collection inside the Counter() function.
Counter(updated_diseases)
Counter({'no disease recorded': 11,
'recent emigrant': 5,
'sickness': 9,
'destitution': 2,
'insane': 2,
'syphilis': 1})
This gives us another kind of collection called a dictionary, which we will discuss in a later lesson. This dictionary includes every distinct item in the list and how many times it appears in the list.
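As a small preview of dictionaries, you can look up the count for a single item by putting it in square brackets after the Counter. The sketch below uses a short sample list (updated_sample is a name made up for this example). The value 'typhus' is included only to show what happens when an item never appears in the list.

```python
from collections import Counter

# A short, illustrative sample rather than the full updated_diseases list
updated_sample = ['no disease recorded', 'recent emigrant', 'sickness',
                  'no disease recorded', 'sickness', 'sickness']

disease_tally = Counter(updated_sample)

# Look up how many times one value appears
print(disease_tally['sickness'])

# A Counter reports 0 for a value it has never seen, instead of raising an error
print(disease_tally['typhus'])
```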
Most Common
To sort this Counter dictionary based on the most commonly occurring items, we can use the .most_common() method.
disease_tally = Counter(updated_diseases)
disease_tally.most_common()
[('no disease recorded', 11),
('sickness', 9),
('recent emigrant', 5),
('destitution', 2),
('insane', 2),
('syphilis', 1)]
We can also select a certain number of the top most common items by placing a number inside the .most_common() method.
disease_tally.most_common(2)
[('no disease recorded', 11), ('sickness', 9)]
Least Common
We can also select a certain number of the least common items by extracting a slice from the end of the list that .most_common() returns.
disease_tally.most_common()[-2:]
[('insane', 2), ('syphilis', 1)]
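One caveat: a slice has a fixed size, so when several items are tied for the lowest count, a slice like [-2:] may leave some of them out. A possible alternative, sketched below with a short sample list, is to find the lowest count first and then collect every item that has it. The .values() and .items() methods belong to dictionaries, which a later lesson covers; the names lowest_count and least_common are made up for this example.

```python
from collections import Counter

# A short, illustrative sample in which three values are tied for last place
updated_sample = ['sickness', 'sickness', 'sickness',
                  'insane', 'syphilis', 'destitution']
disease_tally = Counter(updated_sample)

# A fixed-size slice can only ever return two of the three tied items
print(disease_tally.most_common()[-2:])

# Find the smallest count, then keep every item that shares it
lowest_count = min(disease_tally.values())
least_common = [disease for disease, count in disease_tally.items()
                if count == lowest_count]
print(lowest_count, least_common)
```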
Exercises
shuffled_professions = ['married', 'married', 'laborer', 'laborer', 'widow', 'married', 'spinster',
'laborer', 'spinster', 'laborer', 'spinster', 'spinster', 'married', 'laborer',
'laborer', 'spinster', 'laborer', 'laborer', 'laborer', 'laborer', 'laborer', 'spinster',
'laborer', 'spinster', 'widow', 'spinster', 'painter', 'laborer', 'weaver', 'laborer']
Exercise 1
Make a new list that includes only the items in the list shuffled_professions that match spinster
#Your code here
#Your code here
#Your code here
#Your code here
Exercise 2
Print out each item in the list shuffled_professions next to an index number
#Your code here
#Your code here
Exercise 3
Find the most and least common professions in the list shuffled_professions
#Your code here