{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Anatomy of a Python Script" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first few times that I tried to learn Python, it felt like learning a bunch of made-up rules about an imaginary universe. It turns out that Python is kind of like an imaginary universe with made-up rules. That's part of what makes Python and programming languages so much fun.\n", "\n", "But it can also make learning Python difficult if you don't really know what the imaginary universe looks like, or how it functions, or how it relates to your universe and your specific goals — such as doing text analysis or making a Twitter bot or creating a network visualization.\n", "\n", "In this lesson, we're going to demonstrate what Python looks like in action, so you can get a feel for its structure and flow. Don't get too bogged down in the details for now. Just try to get a sense — at an abstract level — of how Python works and how you might use it." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below is a chunk of Python code. These lines, when put together, do something simple yet important. They count and display the most frequent words in a text file. The example below specifically counts and displays the 40 most frequent words in Charlotte Perkins Gilman's short story \"The Yellow Wallpaper\" (1892)." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": { "tags": [ "output_scroll" ] }, "outputs": [ { "data": { "text/plain": [ "[('john', 45),\n", " ('one', 33),\n", " ('said', 30),\n", " ('would', 27),\n", " ('get', 24),\n", " ('see', 24),\n", " ('room', 24),\n", " ('pattern', 24),\n", " ('paper', 23),\n", " ('like', 21),\n", " ('little', 20),\n", " ('much', 16),\n", " ('good', 16),\n", " ('think', 16),\n", " ('well', 15),\n", " ('know', 15),\n", " ('go', 15),\n", " ('really', 14),\n", " ('thing', 14),\n", " ('wallpaper', 13),\n", " ('night', 13),\n", " ('long', 12),\n", " ('course', 12),\n", " ('things', 12),\n", " ('take', 12),\n", " ('always', 12),\n", " ('could', 12),\n", " ('jennie', 12),\n", " ('great', 11),\n", " ('says', 11),\n", " ('feel', 11),\n", " ('even', 11),\n", " ('used', 11),\n", " ('dear', 11),\n", " ('time', 11),\n", " ('enough', 11),\n", " ('away', 11),\n", " ('want', 11),\n", " ('never', 10),\n", " ('must', 10)]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import re\n", "from collections import Counter\n", "\n", "def split_into_words(any_chunk_of_text):\n", " lowercase_text = any_chunk_of_text.lower()\n", " split_words = re.split(\"\\W+\", lowercase_text)\n", " return split_words\n", "\n", "filepath_of_text = \"../texts/literature/The-Yellow-Wallpaper_Charlotte-Perkins-Gilman.txt\"\n", "number_of_desired_words = 40\n", "\n", "stopwords = ['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours',\n", " 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', 'her', 'hers',\n", " 'herself', 'it', 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves',\n", " 'what', 'which', 'who', 'whom', 'this', 'that', 'these', 'those', 'am', 'is', 'are',\n", " 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does',\n", " 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until',\n", " 'while', 'of', 'at', 
'by', 'for', 'with', 'about', 'against', 'between', 'into',\n", " 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down',\n", " 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here',\n", " 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more',\n", " 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so',\n", " 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', 'should', 'now', 've', 'll', 'amp']\n", "\n", "full_text = open(filepath_of_text, encoding=\"utf-8\").read()\n", "\n", "all_the_words = split_into_words(full_text)\n", "meaningful_words = [word for word in all_the_words if word not in stopwords]\n", "meaningful_words_tally = Counter(meaningful_words)\n", "most_frequent_meaningful_words = meaningful_words_tally.most_common(number_of_desired_words)\n", "\n", "most_frequent_meaningful_words" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Calculating word frequency is a very basic form of computational text analysis. Typically, it's not terribly interesting on its own, especially with a single short text. But calculating word frequency *is* important, and it's at the center of most text analysis approaches, even far more complicated ones." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
## Python Review
\n", "It's important to emphasize that the code above is just *one* way to count words in a text file with Python. This is not the one right way. There is no right way to count words in a text file or to do anything else in Python.
\n", "Rather than asking \"Is this code *right*?\", you want to ask yourself:\n", "- \"Is this code efficient?\"\n", "- \"Is this code readable?\"\n", "- \"Does this code help me accomplish my goal?\"\n", "
\n", "Sometimes you'll prioritize one of these concerns over another. Maybe your code isn't as efficient as humanly possible, but if it gets the job done, and you understand it, then you might not care about maximum efficiency. Our main goal for this class is to study and make arguments about culture, not (necessarily) to become the most efficient software developers.
\n", "