Unstructured

Unstructured data has no defined organization and, specifically, does not break down into strictly defined columns of specific types. It covers many kinds of information, such as photos and graphic images, videos, streaming sensor data, web pages, PDF files, PowerPoint presentations, emails, blog entries, wikis, and word processing documents.

While pandas does not manipulate unstructured data directly, it provides a number of facilities to extract structured data from unstructured sources. As a specific example that we will examine, pandas has tools to retrieve web pages and extract specific pieces of content into a DataFrame.
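The web page case depends on an HTML parser being installed, so as a smaller, self-contained illustration of the same idea, the following sketch pulls typed columns out of free-form text lines with Series.str.extract. The sample lines, the regular expression, and the column names (order_id, customer, date) are assumptions made up for this illustration and do not come from a real dataset.

```python
import pandas as pd

# Hypothetical free-text lines standing in for an unstructured source
lines = pd.Series([
    "Order 1001 shipped to Alice on 2024-01-05",
    "Order 1002 shipped to Bob on 2024-01-07",
    "No order information in this line",
])

# Named groups in the pattern become column names in the result
pattern = (
    r"Order (?P<order_id>\d+) shipped to (?P<customer>\w+) "
    r"on (?P<date>\d{4}-\d{2}-\d{2})"
)
orders = lines.str.extract(pattern)

# Lines that did not match yield missing values; drop them,
# then convert the extracted strings to specific column types
orders = orders.dropna().astype({"order_id": "int64"})
orders["date"] = pd.to_datetime(orders["date"])

print(orders)
```

The result is an ordinary DataFrame with one row per matching line, so everything that pandas offers for structured data applies from this point on.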