- Python Automation Cookbook
- Jaime Buelta
- 2025-04-04 16:17:47
There's more...
Further filters could be applied. For example, to discard all links that end in .pdf, which indicates that they point to PDF files:
# In get_links
if link.endswith('.pdf'):
    # Skip links that point to PDF files
    continue
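A link may also carry a query string or use upper-case letters in its extension, and a plain suffix check would miss both cases. As a minimal sketch, assuming a hypothetical helper named is_pdf_link (it is not part of the recipe) and made-up example URLs, a more tolerant filter could look like this:

```python
from urllib.parse import urlparse


def is_pdf_link(link):
    """Return True if the path part of the link ends in .pdf.

    Hypothetical helper for illustration; not part of the original recipe.
    """
    # Only look at the path, so query strings and fragments are ignored
    path = urlparse(link).path
    # Compare in lower case so .PDF and .Pdf are caught as well
    return path.lower().endswith('.pdf')


# Example URLs are invented for this sketch
links = [
    'http://localhost:8000/files/report.PDF?download=1',
    'http://localhost:8000/index.html',
]
# Keep only the links that do not point to PDF files
filtered = [link for link in links if not is_pdf_link(link)]
```

Inside get_links, the same check would replace the endswith test, followed by continue as before.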
The Content-Type header can also be used to decide how to parse the returned object. A PDF result (Content-Type: application/pdf) won't have a meaningful response.text to parse, but its content can be processed in other ways. The same applies to other types, such as a CSV file (Content-Type: text/csv) or a ZIP file that may need to be decompressed (Content-Type: application/zip). We'll see how to deal with those later.
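To make the idea of choosing a parser from Content-Type concrete, here is a rough sketch that uses only the standard library. The function name parse_body and the fallback choices (UTF-8 decoding, returning raw bytes for PDFs) are assumptions made for illustration, not the approach the book takes later:

```python
import csv
import io
import zipfile


def parse_body(content_type, content):
    """Pick a parsing strategy from the Content-Type header value.

    content is the raw body as bytes. Hypothetical helper for illustration.
    """
    # Drop parameters such as "; charset=utf-8" and normalise case
    media_type = content_type.split(';')[0].strip().lower()

    if media_type == 'text/csv':
        # Assumes UTF-8; a real response may declare another charset
        text = content.decode('utf-8')
        return list(csv.reader(io.StringIO(text)))

    if media_type == 'application/zip':
        # List the archive members without extracting them to disk
        with zipfile.ZipFile(io.BytesIO(content)) as archive:
            return archive.namelist()

    if media_type == 'application/pdf':
        # PDF parsing needs a dedicated library; return the raw bytes
        return content

    # Fall back to text for HTML and other textual types
    return content.decode('utf-8', errors='replace')
```

With requests, this could be called as parse_body(response.headers.get('Content-Type', ''), response.content), since response.content holds the undecoded bytes of the body.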