There's more...

Further filters could be applied. For example, you could discard all links that end in .pdf, since they point to PDF files:

# In get_links, inside the loop over the links
if link.endswith('.pdf'):
    # Skip PDF files; they can't be parsed as HTML
    continue
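A plain endswith check misses links such as report.PDF or guide.pdf?download=1. The following is a minimal sketch of a slightly sturdier filter. It checks only the path part of the URL, using urllib.parse.urlparse, and compares it case-insensitively. The names is_skipped, filter_links and SKIPPED_EXTENSIONS are hypothetical helpers for illustration, not part of the recipe's get_links:

```python
from urllib.parse import urlparse

# Hypothetical list of extensions to skip; add '.zip', '.csv', etc. as needed
SKIPPED_EXTENSIONS = ('.pdf',)


def is_skipped(link):
    """Return True if the link's path ends in one of SKIPPED_EXTENSIONS.

    Only the path component is checked, so a query string ('?page=2')
    or a fragment ('#p2') doesn't hide the extension. The comparison is
    case-insensitive, so '.PDF' is skipped too.
    """
    path = urlparse(link).path.lower()
    return path.endswith(SKIPPED_EXTENSIONS)


def filter_links(links):
    # Keep only the links that should be followed
    return [link for link in links if not is_skipped(link)]


links = [
    'https://example.com/index.html',
    'https://example.com/report.PDF',
    'https://example.com/files/guide.pdf?download=1',
    'https://example.com/about',
]
print(filter_links(links))
```

Because str.endswith accepts a tuple, adding more extensions only requires extending SKIPPED_EXTENSIONS.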

The Content-Type header of the response can also be used to decide how to parse the returned object. A PDF result (Content-Type: application/pdf) contains binary data, so its response.text won't be something that can be parsed as text, but it can be processed in other ways. The same applies to other types, such as a CSV file (Content-Type: text/csv) or a ZIP file that may need to be decompressed (Content-Type: application/zip). We'll see how to deal with those later.
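As a rough preview, a dispatch on Content-Type could look like the following sketch. It assumes the raw bytes of the body are available (with requests, that would be response.content, and the header would be response.headers['Content-Type']). The helpers media_type and parse_body are hypothetical and use only the standard library. PDF data is simply returned as bytes, because parsing it needs a dedicated PDF library:

```python
import csv
import io
import zipfile


def media_type(content_type):
    """Strip parameters such as charset from a Content-Type value.

    'text/csv; charset=utf-8' -> 'text/csv'
    """
    return content_type.split(';', 1)[0].strip().lower()


def parse_body(content_type, body, encoding='utf-8'):
    """Parse raw response bytes according to their Content-Type (a sketch)."""
    kind = media_type(content_type)
    if kind == 'text/csv':
        # CSV: decode the bytes and split them into rows
        text = body.decode(encoding)
        return list(csv.reader(io.StringIO(text)))
    if kind == 'application/zip':
        # ZIP: open the archive in memory and read every member as bytes
        with zipfile.ZipFile(io.BytesIO(body)) as archive:
            return {name: archive.read(name) for name in archive.namelist()}
    if kind == 'application/pdf':
        # PDF: binary data, to be handed to a PDF library; returned unchanged here
        return body
    # Anything else (HTML, plain text...) is decoded as text
    return body.decode(encoding)


rows = parse_body('text/csv; charset=utf-8', b'name,age\nAna,30\n')
print(rows)  # [['name', 'age'], ['Ana', '30']]
```

The encoding is assumed to be UTF-8 here. A fuller version would read it from the charset parameter that media_type discards.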