Application Programming Interfaces (APIs)

In the previous lessons, we collected internet data by scraping the surface of web pages. But there’s another way of collecting internet data: through Application Programming Interfaces (APIs).

What is an API?

An API allows you to programmatically extract and interact with the data under the hood of websites like Genius, as well as social networks, applications, and other projects that make their data publicly available, such as Twitter and the Smithsonian museums.
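
To make "programmatically" a little more concrete, here is a minimal sketch of the two halves of a typical API exchange: building a request URL and reading the structured response that comes back. The endpoint `https://api.example.com/search`, the parameter names `q` and `limit`, and the sample JSON are all hypothetical, made up for illustration. A real API such as Genius or Twitter documents its own URLs, parameters, and authentication requirements.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only (not a real service).
BASE_URL = "https://api.example.com/search"

def build_request_url(query, limit=5):
    """Combine the base URL with URL-encoded query parameters."""
    params = {"q": query, "limit": limit}
    return f"{BASE_URL}?{urlencode(params)}"

def extract_titles(response_text):
    """Parse a JSON response body and return the title of each result."""
    data = json.loads(response_text)
    return [item["title"] for item in data["results"]]

# Made-up response body standing in for what a server might send back.
sample_response = (
    '{"results": ['
    '{"title": "Song A", "artist": "Artist X"}, '
    '{"title": "Song B", "artist": "Artist Y"}]}'
)

url = build_request_url("missy elliott")
titles = extract_titles(sample_response)
print(url)
print(titles)
```

In practice, you would send the URL to the server with an HTTP library and pass the text of the response to the parsing step, rather than using a hard-coded sample.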

An API is something that a project or company explicitly designs for data-sharing purposes. Why do companies or projects go to all this trouble? One reason is that it helps to promote the use and further development of an application and its data. For example, Twitter wants other developers to use, integrate, and build upon Twitter tools and data. The Twitter API is the main conduit by which these developers can do so.

Pros

Because APIs are explicitly designed for data-sharing purposes, working with an API is often a cleaner, more reliable, and more streamlined process than web scraping.
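
As a rough illustration of why this tends to be cleaner, the sketch below extracts the same song title two ways: from a scrap of HTML, where the code depends on the page's markup, and from a JSON response, where the title is a named field. Both pieces of sample data are invented for this example, and the scraping half uses Python's built-in `html.parser` module rather than any particular scraping library.

```python
import json
from html.parser import HTMLParser

# Invented sample data: the same fact delivered as a web page and as API output.
html_page = '<div class="song"><h1 class="title">Song A</h1><span>Artist X</span></div>'
api_response = '{"song": {"title": "Song A", "artist": "Artist X"}}'

class TitleScraper(HTMLParser):
    """Collect the text inside <h1 class="title">; breaks if the markup changes."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes.
        if tag == "h1" and ("class", "title") in attrs:
            self.in_title = True

    def handle_data(self, data):
        if self.in_title:
            self.title = data

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_title = False

# Scraping: walk the markup and hope the tag and class names stay the same.
scraper = TitleScraper()
scraper.feed(html_page)
scraped_title = scraper.title

# API: the title is a named field in structured data.
api_title = json.loads(api_response)["song"]["title"]

print(scraped_title, api_title)
```

The scraping half needs a custom class that knows about tags and class names, and it would silently stop working if the site redesigned its page. The API half is a single lookup on a documented field.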

Cons

One downside is that the companies and projects that design APIs get to decide exactly which kinds of data they want to share. *Spoiler alert:* they often choose not to share their most lucrative and desirable data. For example, the Genius API does not provide access to lyrics data, and the Twitter API does not provide free access to tweets more than 14 days in the past. To get that older Twitter data, you need to pay for special API access.