- Python Automation Cookbook
- Jaime Buelta
- 2025-04-04 16:17:47
How it works...
The parsed feed object contains the entries, as well as general information about the feed itself, such as when it was updated. The feed-level information can be found in the feed attribute:
>>> rss.feed.title
'NYT > Home Page'
Each of the entries works like a dictionary, so the fields are easy to retrieve. They can also be accessed as attributes, but treating them as keys allows us to list all the available fields:
>>> entries[5].keys()
dict_keys(['title', 'title_detail', 'links', 'link', 'id', 'guidislink', 'media_content', 'summary', 'summary_detail', 'media_credit', 'credit', 'content', 'authors', 'author', 'author_detail', 'published', 'published_parsed', 'tags'])
The basic strategy when dealing with feeds is to parse them and go through the entries, quickly checking whether each one is interesting, for example, by looking at its description or summary. If it is, download the whole page using the link field. Then, to avoid rechecking entries, store the latest publication date and, next time, only check newer entries.
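The strategy above could be sketched roughly as follows. This is a minimal illustration, not the recipe's own code: the function name interesting_new_entries, the keyword test on the summary, and the choice to skip undated entries are all assumptions made for the example. The sample entries are hand-built dictionaries standing in for parsed entries, using the title, summary, link, and published_parsed fields listed earlier.

```python
import calendar
import time


def interesting_new_entries(entries, keyword, last_seen=0):
    """Return (matches, newest), where matches are the entries published
    after last_seen (seconds since the epoch, UTC) whose summary contains
    keyword, and newest is the latest publication time seen.

    Hypothetical helper for illustration only.
    """
    matches = []
    newest = last_seen
    for entry in entries:
        parsed = entry.get('published_parsed')
        if parsed is None:
            # Assumption: entries without a date are skipped
            continue
        # published_parsed is a UTC time tuple; convert it to epoch seconds
        published = calendar.timegm(parsed)
        if published <= last_seen:
            # Already checked in a previous run
            continue
        newest = max(newest, published)
        # Quick interest check on the summary
        if keyword.lower() in entry.get('summary', '').lower():
            matches.append(entry)
    return matches, newest


# Hand-built stand-ins for parsed feed entries (not real feed data)
sample_entries = [
    {'title': 'Old story', 'summary': 'Python tips',
     'link': 'https://example.com/old',
     'published_parsed': time.gmtime(1000)},
    {'title': 'New Python release', 'summary': 'A new Python version is out',
     'link': 'https://example.com/release',
     'published_parsed': time.gmtime(2000)},
    {'title': 'Sports', 'summary': 'Football results',
     'link': 'https://example.com/sports',
     'published_parsed': time.gmtime(3000)},
    {'title': 'Undated', 'summary': 'python',
     'link': 'https://example.com/undated'},
]

matches, newest = interesting_new_entries(sample_entries, 'python',
                                          last_seen=1500)
for entry in matches:
    # A real script would download the full page from entry['link'] here
    print(entry['title'], entry['link'])
```

With a real feed, the entries of the parsed object would be passed in place of sample_entries, and the returned newest value would be stored (for example, in a file) and passed back as last_seen on the next run.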