{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lists & Loops — Part 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Note: You can explore the [associated workbook](https://mybinder.org/v2/gh/melaniewalsh/Intro-Cultural-Analytics/master?urlpath=lab/tree/book/02-Python/Workbooks/10.5-Lists-Loops-Part1-WORKBOOK.ipynb) for this chapter in the cloud.*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this lesson, we're going to learn more about lists and loops by drawing on DH scholar Anelise Shrout's [Bellevue Almshouse Dataset](https://www.nyuirish.net/almshouse/the-almshouse-records/)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Preview The Bellevue Almshouse Dataset**" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "scrolled": true, "tags": [ "remove-input", "output_scroll" ] }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr><th></th><th>date_in</th><th>first_name</th><th>last_name</th><th>age</th><th>disease</th><th>profession</th><th>gender</th><th>children</th></tr>\n", " </thead>\n", " <tbody>\n",
" <tr><th>0</th><td>1847-04-17</td><td>Mary</td><td>Gallagher</td><td>28.0</td><td>recent emigrant</td><td>married</td><td>w</td><td>Child Alana 10 days</td></tr>\n",
" <tr><th>1</th><td>1847-04-08</td><td>John</td><td>Sanin (?)</td><td>19.0</td><td>recent emigrant</td><td>laborer</td><td>m</td><td>Catherine 2 mo</td></tr>\n",
" <tr><th>2</th><td>1847-04-17</td><td>Anthony</td><td>Clark</td><td>60.0</td><td>recent emigrant</td><td>laborer</td><td>m</td><td>Charles Riley afed 10 days</td></tr>\n",
" <tr><th>3</th><td>1847-04-08</td><td>Lawrence</td><td>Feeney</td><td>32.0</td><td>recent emigrant</td><td>laborer</td><td>m</td><td>Child</td></tr>\n",
" <tr><th>4</th><td>1847-04-13</td><td>Henry</td><td>Joyce</td><td>21.0</td><td>recent emigrant</td><td>NaN</td><td>m</td><td>Child 1 mo</td></tr>\n",
" <tr><th>5</th><td>1847-04-14</td><td>Bridget</td><td>Hart</td><td>20.0</td><td>recent emigrant</td><td>spinster</td><td>w</td><td>Child</td></tr>\n",
" <tr><th>6</th><td>1847-04-14</td><td>Mary</td><td>Green</td><td>40.0</td><td>recent emigrant</td><td>spinster</td><td>w</td><td>And child 2 months</td></tr>\n",
" <tr><th>7</th><td>1847-04-19</td><td>Daniel</td><td>Loftus</td><td>27.0</td><td>destitution</td><td>laborer</td><td>m</td><td>NaN</td></tr>\n",
" <tr><th>8</th><td>1847-04-10</td><td>James</td><td>Day</td><td>35.0</td><td>recent emigrant</td><td>laborer</td><td>m</td><td>NaN</td></tr>\n",
" <tr><th>9</th><td>1847-04-10</td><td>Margaret</td><td>Farrell</td><td>30.0</td><td>recent emigrant</td><td>widow</td><td>w</td><td>NaN</td></tr>\n",
" <tr><th>10</th><td>1847-04-10</td><td>Bridget</td><td>Day</td><td>30.0</td><td>recent emigrant</td><td>married</td><td>w</td><td>NaN</td></tr>\n",
" <tr><th>11</th><td>1847-04-10</td><td>Anthony</td><td>Day</td><td>0.5</td><td>recent emigrant</td><td>NaN</td><td>m</td><td>NaN</td></tr>\n",
" <tr><th>12</th><td>1847-04-07</td><td>James</td><td>Collins</td><td>22.0</td><td>recent emigrant</td><td>laborer</td><td>m</td><td>NaN</td></tr>\n",
" <tr><th>13</th><td>1847-04-07</td><td>Thomas</td><td>Collins</td><td>21.0</td><td>recent emigrant</td><td>laborer</td><td>m</td><td>NaN</td></tr>\n",
" <tr><th>14</th><td>1847-04-07</td><td>Pat</td><td>Whalen</td><td>25.0</td><td>recent emigrant</td><td>laborer</td><td>m</td><td>NaN</td></tr>\n",
" <tr><th>15</th><td>1847-04-17</td><td>Dan</td><td>Delany</td><td>10.0</td><td>typhus</td><td>NaN</td><td>m</td><td>NaN</td></tr>\n",
" <tr><th>16</th><td>1847-04-09</td><td>Catherine</td><td>O'Harra</td><td>23.0</td><td>recent emigrant</td><td>married</td><td>w</td><td>NaN</td></tr>\n",
" <tr><th>17</th><td>1847-04-09</td><td>Damiel</td><td>O'Harra</td><td>25.0</td><td>recent emigrant</td><td>laborer</td><td>m</td><td>NaN</td></tr>\n",
" <tr><th>18</th><td>1847-04-12</td><td>Margaret</td><td>Delaney</td><td>26.0</td><td>recent emigrant</td><td>married</td><td>w</td><td>NaN</td></tr>\n",
" <tr><th>19</th><td>1847-04-12</td><td>Michael</td><td>Delany</td><td>3.0</td><td>recent emigrant</td><td>NaN</td><td>m</td><td>NaN</td></tr>\n",
" </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " date_in first_name last_name age disease profession gender \\\n", "0 1847-04-17 Mary Gallagher 28.0 recent emigrant married w \n", "1 1847-04-08 John Sanin (?) 19.0 recent emigrant laborer m \n", "2 1847-04-17 Anthony Clark 60.0 recent emigrant laborer m \n", "3 1847-04-08 Lawrence Feeney 32.0 recent emigrant laborer m \n", "4 1847-04-13 Henry Joyce 21.0 recent emigrant NaN m \n", "5 1847-04-14 Bridget Hart 20.0 recent emigrant spinster w \n", "6 1847-04-14 Mary Green 40.0 recent emigrant spinster w \n", "7 1847-04-19 Daniel Loftus 27.0 destitution laborer m \n", "8 1847-04-10 James Day 35.0 recent emigrant laborer m \n", "9 1847-04-10 Margaret Farrell 30.0 recent emigrant widow w \n", "10 1847-04-10 Bridget Day 30.0 recent emigrant married w \n", "11 1847-04-10 Anthony Day 0.5 recent emigrant NaN m \n", "12 1847-04-07 James Collins 22.0 recent emigrant laborer m \n", "13 1847-04-07 Thomas Collins 21.0 recent emigrant laborer m \n", "14 1847-04-07 Pat Whalen 25.0 recent emigrant laborer m \n", "15 1847-04-17 Dan Delany 10.0 typhus NaN m \n", "16 1847-04-09 Catherine O'Harra 23.0 recent emigrant married w \n", "17 1847-04-09 Damiel O'Harra 25.0 recent emigrant laborer m \n", "18 1847-04-12 Margaret Delaney 26.0 recent emigrant married w \n", "19 1847-04-12 Michael Delany 3.0 recent emigrant NaN m \n", "\n", " children \n", "0 Child Alana 10 days \n", "1 Catherine 2 mo \n", "2 Charles Riley afed 10 days \n", "3 Child \n", "4 Child 1 mo \n", "5 Child \n", "6 And child 2 months \n", "7 NaN \n", "8 NaN \n", "9 NaN \n", "10 NaN \n", "11 NaN \n", "12 NaN \n", "13 NaN \n", "14 NaN \n", "15 NaN \n", "16 NaN \n", "17 NaN \n", "18 NaN \n", "19 NaN " ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas\n", "pandas.read_csv(\"../data/bellevue_almshouse_modified.csv\").head(20)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{margin} The Bellevue Almshouse Dataset \n", "The Bellevue 
Almshouse Dataset includes information about Irish-born immigrants who were admitted to the almshouse in the 1840s. The Bellevue Almshouse was part of New York City's public health system, a place where poor, sick, homeless, and otherwise marginalized people were sent — sometimes voluntarily and sometimes forcibly. This dataset was transcribed from the almshouse's own admissions records by Anelise Shrout.\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We're using the Bellevue Almshouse Dataset to practice Python lists and loops because we want to think deeply about the consequences of reducing human life to data even at this early stage in our Python journey. This immigration data, as Shrout argues in her essay [\"(Re)Humanizing Data: Digitally Navigating the Bellevue Almshouse,\"](https://crdh.rrchnm.org/essays/v01-10-(re)-humanizing-data/) was \"produced with the express purpose of reducing people to bodies; bodies to easily quantifiable aspects; and assigning value to those aspects which proved that the marginalized people to who they belonged were worth less than their elite counterparts.\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we work through the lesson below, reflect on the categories that these Irish immigrants were slotted into by the NYC government. What should we make of the fact that Python, as a programming language, doesn't understand the meaning or historical context of this data? How can we nevertheless use Python to better understand this history and to interrogate power?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the previous lesson, we learned how to make, manipulate, and iterate through lists, an important Python collection type. 
In this lesson, we're going to keep practicing and learn how to:\n", "\n", "- build lists with `for` loops\n", "- create a running index of items in a list with `enumerate()`\n", "- create one-line `for` loops with *list comprehensions*\n", "- zip lists together with `zip()`\n", "- easily count items in a list" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These tools can help us:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- identify how many times a certain value appears in the data (e.g., the so-called disease \"recent emigrant\")\n", "- programmatically change all blank values in the data (e.g., from a blank to \"no disease recorded\")\n", "- find the most and least common values in the data (e.g., most common \"diseases\" or professions)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Example Lists**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here are four lists with sample data from the Bellevue Almshouse dataset. Each item in each list corresponds to a single row from the dataset. You can imagine how the code in this lesson could apply to the entire dataset or to other large datasets." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "first_names = ['Unity', 'Catherine', 'Thomas', 'William', 'Patrick', 'Mary Anne', 'Morris',\n", " 'Michael', 'Ellen', 'James', 'Michael', 'Hannah', 'Alexander', 'Mary A', 'Serena?',\n", " 'Margaret', 'Michael', 'Jane', 'Rosanna', 'James', 'Michael', 'John', 'John', 'Mary',\n", " 'Bantel', 'Marcella', 'Arthur', 'Michael', 'Mary', 'Martin']" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "last_names = ['Harkin', 'Doyle', 'McDonald', 'Jordan', 'Rouse', 'Keene', 'Brown',\n", " 'McLoughlin', 'Cassidy', 'Whittle', 'Coyle', 'Cullen', 'Cozens', \n", " 'Maly', 'McGuire', 'Laly', 'Bahan', 'Combs', 'McGovern', 'Gallagher', \n", " 'Crone', 'Brannon', 'McDonal', 'Atkins', 'Garragan', 'Wood', 'Kelly', 'Galeny', 'Welch', 'Kerly']" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "diseases = ['', 'recent emigrant', 'sickness', '', '', '', 'destitution', '', 'sickness', '',\n", " 'sickness', 'recent emigrant', '', 'insane', 'recent emigrant', 'insane', '', '',\n", " 'sickness', 'sickness', '', 'syphilis', 'sickness', '', 'recent emigrant', 'destitution',\n", " 'sickness', 'recent emigrant', 'sickness', 'sickness']" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "ages = ['22', '21', '23', '47', '45', '28', '23', '50', '26', '28', '30', '30', '65', '17', '35',\n", " '27', '32', '40', '22', '30', '27', '40', '41', '37', '16', '20', '30', '30', '35', '9']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## For Loop" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a refresher, we can use a `for` loop to iterate through a list and do something to each item in the list.\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "diseases = ['', 'recent emigrant', 'sickness', '', '', '', 
'destitution', '', 'sickness', '',\n", " 'sickness', 'recent emigrant', '', 'insane', 'recent emigrant', 'insane', '', '',\n", " 'sickness', 'sickness', '', 'syphilis', 'sickness', '', 'recent emigrant', 'destitution',\n", " 'sickness', 'recent emigrant', 'sickness', 'sickness']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below we are iterating through the list `diseases` and printing out every item in the list." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "recent emigrant\n", "sickness\n", "\n", "\n", "\n", "destitution\n", "\n", "sickness\n", "\n", "sickness\n", "recent emigrant\n", "\n", "insane\n", "recent emigrant\n", "insane\n", "\n", "\n", "sickness\n", "sickness\n", "\n", "syphilis\n", "sickness\n", "\n", "recent emigrant\n", "destitution\n", "sickness\n", "recent emigrant\n", "sickness\n", "sickness\n" ] } ], "source": [ "for disease in diseases:\n", "    print(disease)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Remember that the variable that represents each item in the list doesn't need to exist beforehand: the `for` statement creates it, and it can be named anything you want. Instead of `disease`, we could name the variable `x`." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "recent emigrant\n", "sickness\n", "\n", "\n", "\n", "destitution\n", "\n", "sickness\n", "\n", "sickness\n", "recent emigrant\n", "\n", "insane\n", "recent emigrant\n", "insane\n", "\n", "\n", "sickness\n", "sickness\n", "\n", "syphilis\n", "sickness\n", "\n", "recent emigrant\n", "destitution\n", "sickness\n", "recent emigrant\n", "sickness\n", "sickness\n" ] } ], "source": [ "for x in diseases:\n", "    print(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we've discussed before, however, it's preferable to give your variables clear names that are meaningful to human readers." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Enumerate()" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "diseases = ['', 'recent emigrant', 'sickness', '', '', '', 'destitution', '', 'sickness', '',\n", " 'sickness', 'recent emigrant', '', 'insane', 'recent emigrant', 'insane', '', '',\n", " 'sickness', 'sickness', '', 'syphilis', 'sickness', '', 'recent emigrant', 'destitution',\n", " 'sickness', 'recent emigrant', 'sickness', 'sickness']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You might want to keep a numerical count or index of items in a list. To print out each item in the list with a corresponding number, you can use the built-in Python function `enumerate()`.\n", "\n", "To access the number that corresponds to each item, you need to unpack *two* variables instead of just one: `number` and `disease`. Note that the count starts at 0, not 1." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 \n", "1 recent emigrant\n", "2 sickness\n", "3 \n", "4 \n", "5 \n", "6 destitution\n", "7 \n", "8 sickness\n", "9 \n", "10 sickness\n", "11 recent emigrant\n", "12 \n", "13 insane\n", "14 recent emigrant\n", "15 insane\n", "16 \n", "17 \n", "18 sickness\n", "19 sickness\n", "20 \n", "21 syphilis\n", "22 sickness\n", "23 \n", "24 recent emigrant\n", "25 destitution\n", "26 sickness\n", "27 recent emigrant\n", "28 sickness\n", "29 sickness\n" ] } ], "source": [ "for number, disease in enumerate(diseases):\n", "    print(number, disease)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "tags": [ "output_scroll" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Person 0: \n", "Person 1: recent emigrant\n", "Person 2: sickness\n", "Person 3: \n", "Person 4: \n", "Person 5: \n", "Person 6: destitution\n", "Person 7: \n", "Person 8: sickness\n", "Person 9: \n", "Person 10: sickness\n", "Person 11: recent 
emigrant\n", "Person 12: \n", "Person 13: insane\n", "Person 14: recent emigrant\n", "Person 15: insane\n", "Person 16: \n", "Person 17: \n", "Person 18: sickness\n", "Person 19: sickness\n", "Person 20: \n", "Person 21: syphilis\n", "Person 22: sickness\n", "Person 23: \n", "Person 24: recent emigrant\n", "Person 25: destitution\n", "Person 26: sickness\n", "Person 27: recent emigrant\n", "Person 28: sickness\n", "Person 29: sickness\n" ] } ], "source": [ "for number, disease in enumerate(diseases):\n", "    print(f\"Person {number}: {disease}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Build a List with a `For` Loop" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also make lists with `for` loops. Let's say we wanted to take this list `collection` and create a new list that only contains the items in the list that match `\"item we want\"`." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "collection = ['item', 'item we want', 'item', 'item', 'item we want']" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "hide-output" ] }, "source": [ "To do so, we could make an empty list by assigning `empty_list` the value of `[]`—that is, a list with nothing inside of it. \n", "Then, we could use a `for` loop to iterate through `collection`. If an item equals `\"item we want\"`, then we will `.append()` that item to our previously empty list." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "empty_list = []\n", "for item in collection:\n", "    if item == \"item we want\":\n", "        empty_list.append(item)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check it out!" 
] }, { "cell_type": "code", "execution_count": 14, "metadata": { "tags": [ "hide-output" ] }, "outputs": [ { "data": { "text/plain": [ "['item we want', 'item we want']" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "empty_list" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To iterate through the list `diseases` and make a new list with only the items that match `\"recent emigrant\"`, we could use the same template." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "diseases = ['', 'recent emigrant', 'sickness', '', '', '', 'destitution', '', 'sickness', '',\n", " 'sickness', 'recent emigrant', '', 'insane', 'recent emigrant', 'insane', '', '',\n", " 'sickness', 'sickness', '', 'syphilis', 'sickness', '', 'recent emigrant', 'destitution',\n", " 'sickness', 'recent emigrant', 'sickness', 'sickness']" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "recent_emigrants = []\n", "for disease in diseases:\n", "    if disease == 'recent emigrant':\n", "        recent_emigrants.append(disease)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['recent emigrant',\n", " 'recent emigrant',\n", " 'recent emigrant',\n", " 'recent emigrant',\n", " 'recent emigrant']" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "recent_emigrants" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How many items are in this list? Remember that we can use the `len()` function to see how many items are in a list." 
] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(recent_emigrants)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We could also create a new list that revises the old list and transforms blank values into more informative values.\n", "\n", "Below we are iterating through the list `diseases`. If an item is blank, then we are adding `\"no disease recorded\"` to a new `updated_diseases` list. If an item is not blank, we are simply adding the `disease` to the new list." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "updated_diseases = []\n", "for disease in diseases:\n", "    if disease == '':\n", "        new_disease = 'no disease recorded'\n", "        updated_diseases.append(new_disease)\n", "    else:\n", "        updated_diseases.append(disease)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "tags": [ "output_scroll" ] }, "outputs": [ { "data": { "text/plain": [ "['no disease recorded',\n", " 'recent emigrant',\n", " 'sickness',\n", " 'no disease recorded',\n", " 'no disease recorded',\n", " 'no disease recorded',\n", " 'destitution',\n", " 'no disease recorded',\n", " 'sickness',\n", " 'no disease recorded',\n", " 'sickness',\n", " 'recent emigrant',\n", " 'no disease recorded',\n", " 'insane',\n", " 'recent emigrant',\n", " 'insane',\n", " 'no disease recorded',\n", " 'no disease recorded',\n", " 'sickness',\n", " 'sickness',\n", " 'no disease recorded',\n", " 'syphilis',\n", " 'sickness',\n", " 'no disease recorded',\n", " 'recent emigrant',\n", " 'destitution',\n", " 'sickness',\n", " 'recent emigrant',\n", " 'sickness',\n", " 'sickness']" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "updated_diseases" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## List Comprehensions" ] }, { 
"cell_type": "markdown", "metadata": {}, "source": [ "There's a more compact way to build a list with a `for` loop: a *list comprehension*.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "````{list-table}\n", ":header-rows: 1\n", "\n", "* - Loop\n", "  - List Comprehension\n", "* - ```\n", "    empty_list = []\n", "    for item in collection:\n", "        if item == \"item we want\":\n", "            empty_list.append(item)\n", "    ```\n", "  - ```\n", "    empty_list = [item for item in collection if item == 'item we want']\n", "    ```\n", "````" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Instead of creating an empty list, you can build the `for` loop inside of a list, and you can do it all in one line." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Rather than this..." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "empty_list = []\n", "for item in collection:\n", "    if item == \"item we want\":\n", "        empty_list.append(item)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can do this..." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "empty_list = [item for item in collection if item == 'item we want']" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['item we want', 'item we want']" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "empty_list" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You might think of a list comprehension unfolding in the following order: \"the item we want to extract\" followed by a flattened `for` loop. \n", "\n", "To make the list `recent_emigrants` with a list comprehension, for example, we start with `disease` (that is, the \"item we want to extract\") and follow it up with a flattened `for` loop that includes the `if` condition." 
] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "recent_emigrants = [disease for disease in diseases if disease == 'recent emigrant']" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['recent emigrant',\n", " 'recent emigrant',\n", " 'recent emigrant',\n", " 'recent emigrant',\n", " 'recent emigrant']" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "recent_emigrants" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The 4-line `for` loop and the 1-line list comprehension produce the same list. The list comprehension is simply quicker to write (once you get the hang of it) and takes up less space." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Zip Lists Together" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also iterate through multiple lists at the same time by using the `zip()` function, which pairs up the items at the same position in each list, \"zipping\" the lists together." 
] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "first_names = ['Unity', 'Catherine', 'Thomas', 'William', 'Patrick', 'Mary Anne', 'Morris',\n", " 'Michael', 'Ellen', 'James', 'Michael', 'Hannah', 'Alexander', 'Mary A', 'Serena?',\n", " 'Margaret', 'Michael', 'Jane', 'Rosanna', 'James', 'Michael', 'John', 'John', 'Mary',\n", " 'Bantel', 'Marcella', 'Arthur', 'Michael', 'Mary', 'Martin']" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "last_names = ['Harkin', 'Doyle', 'McDonald', 'Jordan', 'Rouse', 'Keene', 'Brown',\n", " 'McLoughlin', 'Cassidy', 'Whittle', 'Coyle', 'Cullen', 'Cozens', \n", " 'Maly', 'McGuire', 'Laly', 'Bahan', 'Combs', 'McGovern', 'Gallagher', \n", " 'Crone', 'Brannon', 'McDonal', 'Atkins', 'Garragan', 'Wood', 'Kelly', 'Galeny', 'Welch', 'Kerly']" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "scrolled": true, "tags": [ "output_scroll" ] }, "outputs": [ { "data": { "text/plain": [ "['no disease recorded',\n", " 'recent emigrant',\n", " 'sickness',\n", " 'no disease recorded',\n", " 'no disease recorded',\n", " 'no disease recorded',\n", " 'destitution',\n", " 'no disease recorded',\n", " 'sickness',\n", " 'no disease recorded',\n", " 'sickness',\n", " 'recent emigrant',\n", " 'no disease recorded',\n", " 'insane',\n", " 'recent emigrant',\n", " 'insane',\n", " 'no disease recorded',\n", " 'no disease recorded',\n", " 'sickness',\n", " 'sickness',\n", " 'no disease recorded',\n", " 'syphilis',\n", " 'sickness',\n", " 'no disease recorded',\n", " 'recent emigrant',\n", " 'destitution',\n", " 'sickness',\n", " 'recent emigrant',\n", " 'sickness',\n", " 'sickness']" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "updated_diseases" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ages = ['22', '21', '23', '47', '45', '28', '23', '50', '26', '28', 
'30', '30', '65', '17', '35',\n", " '27', '32', '40', '22', '30', '27', '40', '41', '37', '16', '20', '30', '30', '35', '9']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For example, if we wanted to print out each Bellevue Almshouse patient's first name and their \"disease\" as recorded by the NYC government, we could `zip()` the lists together and unpack multiple variables." ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "tags": [ "output_scroll" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Unity // no disease recorded\n", "Catherine // recent emigrant\n", "Thomas // sickness\n", "William // no disease recorded\n", "Patrick // no disease recorded\n", "Mary Anne // no disease recorded\n", "Morris // destitution\n", "Michael // no disease recorded\n", "Ellen // sickness\n", "James // no disease recorded\n", "Michael // sickness\n", "Hannah // recent emigrant\n", "Alexander // no disease recorded\n", "Mary A // insane\n", "Serena? 
// recent emigrant\n", "Margaret // insane\n", "Michael // no disease recorded\n", "Jane // no disease recorded\n", "Rosanna // sickness\n", "James // sickness\n", "Michael // no disease recorded\n", "John // syphilis\n", "John // sickness\n", "Mary // no disease recorded\n", "Bantel // recent emigrant\n", "Marcella // destitution\n", "Arthur // sickness\n", "Michael // recent emigrant\n", "Mary // sickness\n", "Martin // sickness\n" ] } ], "source": [ "for first_name, disease in zip(first_names, updated_diseases):\n", " print(f\"{first_name} // {disease}\")" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "tags": [ "output_scroll" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Unity Harkin // Age 22 // no disease recorded\n", "Catherine Doyle // Age 21 // recent emigrant\n", "Thomas McDonald // Age 23 // sickness\n", "William Jordan // Age 47 // no disease recorded\n", "Patrick Rouse // Age 45 // no disease recorded\n", "Mary Anne Keene // Age 28 // no disease recorded\n", "Morris Brown // Age 23 // destitution\n", "Michael McLoughlin // Age 50 // no disease recorded\n", "Ellen Cassidy // Age 26 // sickness\n", "James Whittle // Age 28 // no disease recorded\n", "Michael Coyle // Age 30 // sickness\n", "Hannah Cullen // Age 30 // recent emigrant\n", "Alexander Cozens // Age 65 // no disease recorded\n", "Mary A Maly // Age 17 // insane\n", "Serena? 
McGuire // Age 35 // recent emigrant\n", "Margaret Laly // Age 27 // insane\n", "Michael Bahan // Age 32 // no disease recorded\n", "Jane Combs // Age 40 // no disease recorded\n", "Rosanna McGovern // Age 22 // sickness\n", "James Gallagher // Age 30 // sickness\n", "Michael Crone // Age 27 // no disease recorded\n", "John Brannon // Age 40 // syphilis\n", "John McDonal // Age 41 // sickness\n", "Mary Atkins // Age 37 // no disease recorded\n", "Bantel Garragan // Age 16 // recent emigrant\n", "Marcella Wood // Age 20 // destitution\n", "Arthur Kelly // Age 30 // sickness\n", "Michael Galeny // Age 30 // recent emigrant\n", "Mary Welch // Age 35 // sickness\n", "Martin Kerly // Age 9 // sickness\n" ] } ], "source": [ "for first_name, last_name, age, disease in zip(first_names, last_names, ages, updated_diseases):\n", "    print(f\"{first_name} {last_name} // Age {age} // {disease}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you try to `zip()` a list that is a different length than the other lists, it will only zip to the length of the shortest list." ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [], "source": [ "another_list = ['laborer']" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Unity Harkin // Age 22 // no disease recorded // laborer\n" ] } ], "source": [ "for first_name, last_name, age, disease, list_item in zip(first_names, last_names, ages, updated_diseases, another_list):\n", "    print(f\"{first_name} {last_name} // Age {age} // {disease} // {list_item}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Count Items In a List or Collection" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you want to count how many times each item appears in a list or another collection, you can use the `Counter` class from Python's built-in `collections` module. 
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To use this tool, you first need to `import` it. The `import` statement is used whenever you want to use code that lives outside your own notebook, whether it ships with Python (like `collections`) or was written and published by someone else.\n", "\n", "The `from` keyword allows us to `import` a specific name from a larger module; in this case, the `Counter` class from the `collections` module." ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [], "source": [ "from collections import Counter" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we have `Counter` imported, we can use it. To count the items in a collection, we simply pass the collection to `Counter()`." ] }, { "cell_type": "code", "execution_count": 62, "metadata": { "tags": [ "output_scroll" ] }, "outputs": [ { "data": { "text/plain": [ "Counter({'no disease recorded': 11,\n", " 'recent emigrant': 5,\n", " 'sickness': 9,\n", " 'destitution': 2,\n", " 'insane': 2,\n", " 'syphilis': 1})" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Counter(updated_diseases)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This gives us a `Counter`, a special kind of *dictionary*, which is another collection type that we will discuss in a later lesson. It includes every distinct item in the list and how many times it appears in the list." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Most Common" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To sort this `Counter` by the most commonly occurring items, we can use the `.most_common()` method, which returns a list of (item, count) pairs ordered from most to least common." 
] }, { "cell_type": "code", "execution_count": 63, "metadata": { "tags": [ "output_scroll" ] }, "outputs": [ { "data": { "text/plain": [ "[('no disease recorded', 11),\n", " ('sickness', 9),\n", " ('recent emigrant', 5),\n", " ('destitution', 2),\n", " ('insane', 2),\n", " ('syphilis', 1)]" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "disease_tally = Counter(updated_diseases)\n", "disease_tally.most_common()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also select only the top few most common items by passing a number to the `.most_common()` method." ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('no disease recorded', 11), ('sickness', 9)]" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "disease_tally.most_common(2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Least Common" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also select a certain number of the *least* common items by extracting a slice from the end of the list that `.most_common()` returns." 
] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('insane', 2), ('syphilis', 1)]" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "disease_tally.most_common()[-2:]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{admonition}\n", ":class: pythonreview\n", "For a refresher on how to slice from the end of a list, see list slices in the previous lesson.\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercises" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "shuffled_professions = ['married', 'married', 'laborer', 'laborer', 'widow', 'married', 'spinster',\n", " 'laborer', 'spinster', 'laborer', 'spinster', 'spinster', 'married', 'laborer',\n", " 'laborer', 'spinster', 'laborer', 'laborer', 'laborer', 'laborer', 'laborer', 'spinster',\n", " 'laborer', 'spinster', 'widow', 'spinster', 'painter', 'laborer', 'weaver', 'laborer']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Make a new list that includes only the items in the list `shuffled_professions` that match `'spinster'`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Your code here\n", "#Your code here\n", "    #Your code here\n", "        #Your code here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Print out each item in the list `shuffled_professions` next to an index number." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Your code here\n", "    #Your code here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Find the most and least common professions in the 
list `shuffled_professions`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Your code here" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" } }, "nbformat": 4, "nbformat_minor": 4 }