{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Topic Modeling — CSV Files" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In these lessons, we're learning about a text analysis method called *topic modeling*. This method will help us identify the main topics or discourses within a collection of texts or within a single text that has been separated into smaller text chunks." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Am I the Asshole?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "AITA for lying about my biggest fear on a quiz show and subsequently winning a car and making other contestants lose?\n", "\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this particular lesson, we're going to use [Little MALLET Wrapper](https://github.com/maria-antoniak/little-mallet-wrapper), a Python wrapper for [MALLET](http://mallet.cs.umass.edu/topics.php), to topic model a CSV file with 2,932 Reddit posts from the subreddit [r/AmItheAsshole](https://www.reddit.com/r/AmItheAsshole/), each with an upvote score of at least 2,000. This subreddit is an online forum where people share their personal conflicts and ask the community to judge who's the a**hole in the story. The data was collected with PSAW, a wrapper for the Pushshift API." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "___" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", " —Reddit user iwonacar, r/AmItheAsshole\n", "
\n", " \n", "
Attention
\n", " \n", "If you're working in this Jupyter notebook on your own computer, you'll need to have both the Java Development Kit and MALLET pre-installed. For setup instructions, please see the previous lesson.
\n", " \n", "If you're working in this Jupyter notebook in the cloud via Binder, then the Java Development Kit and MALLET will already be installed. You're good to go! \n", " \n", "Note
\n", "We're calling these text files our *training data*, because we're *training* our topic model with these texts. The topic model will be learning and extracting topics based on these texts.\n", " \n", "Pandas
\n", " Do you need a refresher or introduction to the Python data analysis library Pandas? Be sure to check out Pandas Basics (1-3) in this textbook!\n", " \n", "\n", "| | author | full_date | date | title | selftext | url | subreddit | upvote_score | num_comments | num_crossposts |\n",
"|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 0 | Additional-Pizza-805 | 2020-07-24 19:13:49+00:00 | 2020-07-24 | AITA for kicking my cousin off of my sister’s wedding Zoom call? | My [27M] older sister [30F] and her fiancé [31M] were planning for over a year for their wedding... | https://www.reddit.com/r/AmItheAsshole/comments/hx80wd/aita_for_kicking_my_cousin_off_of_my_sist... | AmItheAsshole | 11159 | 2209 | 4 |\n",
"| 1 | decadel8ter | 2020-07-24 14:37:13+00:00 | 2020-07-24 | AITA for resenting my family for something that happened over a decade ago? | when i was 15 i was in a car accident. i was riding my bike on new bike lanes that my city had i... | https://www.reddit.com/r/AmItheAsshole/comments/hx2vvl/aita_for_resenting_my_family_for_somethin... | AmItheAsshole | 2541 | 1143 | 0 |\n",
"| 2 | Snoo_66130 | 2020-07-24 12:35:35+00:00 | 2020-07-24 | AITA for telling my step dad to stop trying to be my dad? | I'm 35, and my mom who is 52 is dating a man who is is 27. This is fucking weird as hell and... | https://www.reddit.com/r/AmItheAsshole/comments/hx0zk7/aita_for_telling_my_step_dad_to_stop_tryi... | AmItheAsshole | 2809 | 1253 | 1 |\n",
"| 3 | ohnoihaveabluechair | 2020-07-24 10:56:56+00:00 | 2020-07-24 | AITA for confronting my SIL for wearing clothes that belonged to me? | Some info: A few years ago, my family didn’t have a lot of spare money to buy a lot of things (l... | https://www.reddit.com/r/AmItheAsshole/comments/hwzpbu/aita_for_confronting_my_sil_for_wearing_c... | AmItheAsshole | 7581 | 1550 | 1 |\n",
"| 4 | FormalLettuce3 | 2020-07-24 10:52:08+00:00 | 2020-07-24 | AITA for saying we'd only help with my ex's kid's party if we could tell people we're engaged? | This guy, \"Jack\", and I were together for about a year, and within a couple weeks of ending it I... | https://www.reddit.com/r/AmItheAsshole/comments/hwzncq/aita_for_saying_wed_only_help_with_my_exs... | AmItheAsshole | 2915 | 1214 | 0 |\n", "
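For readers who want to try out the shape of this dataset before running the full lesson, here is a minimal sketch of loading a table like the preview above with Pandas and turning it into a list of texts. It makes several assumptions: the lesson's actual CSV path is not shown in this section, so the sketch reads a tiny in-memory stand-in built from three rows of the preview (with fewer columns); the names `csv_text`, `reddit_df`, `popular_df`, `min_score`, and `training_data` are hypothetical; and the titles are used as the training texts purely for illustration.

```python
import io

import pandas as pd

# Stand-in for the lesson's CSV file, whose path is not shown in this section.
# The rows are copied from the preview table, keeping only a few columns.
csv_text = """author,date,title,upvote_score,num_comments
Additional-Pizza-805,2020-07-24,AITA for kicking my cousin off of my sister’s wedding Zoom call?,11159,2209
decadel8ter,2020-07-24,AITA for resenting my family for something that happened over a decade ago?,2541,1143
Snoo_66130,2020-07-24,AITA for telling my step dad to stop trying to be my dad?,2809,1253
"""

# Parse the CSV text into a DataFrame, reading the date column as datetimes.
reddit_df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Keep only posts at or above the upvote threshold described in the lesson.
min_score = 2000
popular_df = reddit_df[reddit_df["upvote_score"] >= min_score]

# Collect one string per post; the titles stand in for the training texts here.
training_data = popular_df["title"].astype(str).tolist()

print(len(training_data))
print(popular_df["upvote_score"].max())
```

With the real file, the `io.StringIO(csv_text)` argument would be replaced by the path to the CSV, and a longer text column such as `selftext` may be a better source of training texts than the titles.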