Datasets

Below is a list of datasets broadly related to culture and the humanities that may be useful for those interested in cultural analytics and the digital humanities.

Film 🎬

Literature 📚

  • Harry Potter series, books 1-7, randomly shuffled (Download)

  • txtLAB’s multilingual novels (Link)

  • Modernist journal data (1890s-1920s) (Link)

  • Seattle Public Library check-out data (2005-present) (Link)

Politics 🗳️ & History 📜

  • The New York Times obituaries (1852-2000) (Download)

  • U.S. Inaugural Addresses (1789-2017) (Download)

  • Nobel Prize winners (1901-2017) (Download)

  • Refugee arrivals to the U.S. (2005-2015) (Link)

  • Irish immigrants admitted to NYC’s Bellevue Almshouse (1840s) (Link)

Social Media 🕸️

  • Donald Trump’s tweets (2009-2020) (Download)

  • “Am I The Asshole?” Reddit posts (Download)

Food 🍔

  • The New York Public Library’s menu dataset (1840-present) (Link)


Other Dataset Compilations

Below are some other great compilations of cultural and humanities-related datasets: