Final Project — Cultural Data Analysis

Due Monday, May 18th by 5pm

For your final project, you will analyze a cultural dataset of your choosing with one of the methods that we’ve discussed in class and write a short paper about your findings. In the end, you will submit two documents: 1) a Jupyter notebook that shows your computational analysis and 2) a three- to five-page (double-spaced) paper that describes your analysis and discusses your findings.

Possible Cultural Datasets:

A “cultural” dataset is any dataset broadly related to art, literature, music, history, politics, communities, and/or society. For your cultural dataset, you can choose:

  • a dataset that we’ve already used in class

  • a dataset that has been shared by someone else

  • a dataset that you yourself collect and/or compile

You can find a list of potential cultural datasets here. If you have questions about which option to choose or whether your dataset will make for a good final project, please reach out and discuss it with me.

Computational Analysis (Jupyter Notebook):

The computational analysis in your Jupyter notebook should be split into the following sections:

1. Prepare and Examine the Data

First, prepare and broadly examine your data with an eye toward general patterns, potential outliers, or problems. This might involve getting an overview of the data (checking the number of rows and names of columns, calculating basic statistics, etc.), cleaning the data, and/or transforming the data in some way.
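In pandas, the checks described above might look something like the following minimal sketch. The toy DataFrame and its column names (`title`, `author`, `year`) are hypothetical placeholders for your own data; in your notebook you would more likely load a file with `pd.read_csv`. The cleaning steps shown (dropping exact duplicates and rows with a missing title) and the derived `decade` column are only examples of what your dataset may need.

```python
import pandas as pd

# Hypothetical toy dataset standing in for your own cultural data
df = pd.DataFrame({
    "title": ["Novel A", "Novel B", "Novel B", "Novel C", None],
    "author": ["Author X", "Author Y", "Author Y", "Author X", "Author Z"],
    "year": [1987, 1992, 1992, 1973, 2008],
})

# Overview: (rows, columns) and the column names
print(df.shape)
print(list(df.columns))

# Basic statistics for a numeric column (count, mean, min, max, quartiles)
print(df["year"].describe())

# Cleaning: drop exact duplicate rows, then rows missing a title
clean = df.drop_duplicates().dropna(subset=["title"])

# Transforming: add a derived column with the decade of publication
clean = clean.assign(decade=(clean["year"] // 10) * 10)
print(clean)
```

Comparing `df.shape` with `clean.shape` is a quick way to see how many rows your cleaning removed, which is worth reporting in the Dataset section of your paper.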

2. Analyze the Data

Then analyze the data using at least one of the methods that we’ve discussed in class:

  • Pandas calculations

  • TF-IDF

  • Topic models

  • Named entity recognition

  • Network analysis

  • Mapping
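To illustrate how one of these methods works under the hood, here is a from-scratch sketch of TF-IDF in plain Python. It assumes simple lowercase whitespace tokenization and the basic weighting tf × log(N / df), where tf is a term's raw count in a document, N is the number of documents, and df is the number of documents containing the term. The function name `tf_idf` and the sample sentences are hypothetical. This is not necessarily the implementation used in class: libraries such as scikit-learn's `TfidfVectorizer` use a smoothed variant by default, so their scores will differ.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document, scored as tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts within this document
        scores.append(
            {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
        )
    return scores


docs = [
    "the river the city the night",
    "the city at dawn",
    "the river and the sea",
]
for doc_scores in tf_idf(docs):
    # Show the two highest-weighted terms for each document
    top = sorted(doc_scores.items(), key=lambda kv: kv[1], reverse=True)[:2]
    print(top)
```

Because "the" appears in every document, its idf is log(1) = 0 and it scores zero everywhere, while words that appear in only one document (such as "night" or "sea") score highest. That is the core idea of TF-IDF: a term matters for a document when it is frequent there but rare across the collection.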

3. Zoom In

Finally, zoom in on a few specific examples from your dataset—either grounding your analysis in these concrete examples or examining specific patterns in the new light of your analysis.

Report and Findings (3-5 page paper):

The paper that describes your analysis and reports on your findings should be split into the following sections:

1. Introduction

Introduce your research question: what it is, how it relates to culture more broadly, why you were motivated to answer it, and what you were hoping or expecting to find.

2. Dataset

Introduce your dataset (where did it come from, how was it collected, what are its limits, etc.) and explain why you chose this dataset to explore your research question.

3. Methodology

Explain the method that you chose to analyze the data (what is it? how does it work?) and why you chose this method.

4. Analysis & Interpretation

Describe your analysis and interpret your results. What, if anything, can you conclude about your cultural phenomenon based on your analysis? Make sure to incorporate at least one metric, calculation, or data visualization from your computational analysis.

5. Conclusions & Future Work

If you had unlimited time and resources, where would you go next with this analysis?