# Word and Document Embeddings
Word and document embeddings are numerical representations of text that capture semantic meaning. They allow you to measure how similar words or documents are to one another, which is useful for tasks like finding thematically related texts in a large corpus.
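To make the idea of measuring similarity concrete, here is a minimal sketch of cosine similarity, the standard measure used to compare embedding vectors. The three-dimensional vectors below are toy values for illustration; real embedding models produce vectors with hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1.0 means
    the vectors point in nearly the same direction (similar meaning)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (purely illustrative values).
poem = [0.9, 0.1, 0.3]
sonnet = [0.8, 0.2, 0.25]
invoice = [0.05, 0.9, 0.1]

print(cosine_similarity(poem, sonnet))   # high: thematically related
print(cosine_similarity(poem, invoice))  # low: unrelated
```

The same function works regardless of where the vectors come from, which is why cosine similarity appears throughout the tutorials below.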
## Measuring Document Similarity with LLMs
This tutorial demonstrates how to use LLMs to find similar texts within a dataset. It covers comparing narrative versus non-narrative texts and analyzing poetry collections.
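The core pattern of such a pipeline can be sketched as follows: embed every document, embed a query, and rank documents by similarity. The `embed` function here is a stand-in (simple word counts) so the example is self-contained; the tutorial itself uses an LLM to produce the embeddings, and the document texts below are invented for illustration.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def embed(text):
    # Stand-in embedding: bag-of-words counts. A real pipeline would
    # call an LLM embedding model here and get a dense vector instead.
    return Counter(text.lower().split())

# Hypothetical mini-corpus for illustration.
corpus = {
    "tale":   "once upon a time a knight rode through the dark forest",
    "story":  "the knight rode through the forest once upon a time",
    "manual": "press the power button to turn on the device",
}

query = embed("a story about a knight in a forest")
ranked = sorted(corpus, key=lambda name: cosine(query, embed(corpus[name])),
                reverse=True)
print(ranked)  # most similar documents first
```

Swapping the toy `embed` for a real embedding model leaves the ranking logic unchanged, which is what makes this pattern reusable across corpora.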
## Measuring Word Similarity with BERT
This tutorial shows how to use a pre-trained BERT model to measure word similarity by finding semantically comparable words from poem collections.
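Once each word has an embedding vector, finding "semantically comparable" words reduces to a nearest-neighbor search. The sketch below uses toy three-dimensional vectors standing in for BERT embeddings (real BERT vectors are 768-dimensional and depend on the word's sentence context); the words and values are invented for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy vectors standing in for BERT word embeddings.
vectors = {
    "moon":  [0.9, 0.2, 0.1],
    "star":  [0.85, 0.3, 0.15],
    "night": [0.7, 0.5, 0.2],
    "plow":  [0.1, 0.1, 0.9],
}

def most_similar(word, k=2):
    """Rank the other words by cosine similarity to `word`."""
    target = vectors[word]
    others = [w for w in vectors if w != word]
    return sorted(others, key=lambda w: cosine(vectors[w], target),
                  reverse=True)[:k]

print(most_similar("moon"))  # nearest neighbors of "moon" in this toy space
```

With a real BERT model the `vectors` dictionary would be built from the model's output for words as they appear in the poems, but the search step is identical.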
## Measuring Word Similarity with BERT (Spanish)
This demo shows that the word-similarity approach carries over to a Spanish-language BERT model, illustrating that these techniques apply beyond English.
From AI for Humanists