ExtractorHTML: Treat 'cite' attribute as navlink instead of embed
The cite attribute identifies the source document of a blockquote, but ExtractorHTML was treating it as an embed, which can cause out-of-scope pages to be incorrectly included in a crawl. Browsers don't currently use the cite attribute, so there is an argument for ignoring it entirely, but let's at least not treat it as an embed.
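
To illustrate the distinction described above, here is a minimal sketch of classifying URI-bearing attributes as navlinks or embeds. It is not the actual ExtractorHTML code: the class `CiteHopSketch`, the `Hop` enum, the `classify` method, and the list of embed attributes are all hypothetical names and values chosen for the example. The only point it makes is that `cite` gets the same treatment as `href` rather than `src`.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch for illustration; not Heritrix's actual ExtractorHTML code.
final class CiteHopSketch {

    /** How a discovered link relates to the page it was found on. */
    enum Hop { NAVLINK, EMBED }

    // Attributes assumed, for this example only, to load resources needed to render the page.
    private static final Set<String> EMBED_ATTRIBUTES = Set.of("src", "background", "data");

    /** Classify a URI-bearing attribute by name, case-insensitively. */
    static Hop classify(String attributeName) {
        String name = attributeName.toLowerCase(Locale.ROOT);
        if (EMBED_ATTRIBUTES.contains(name)) {
            return Hop.EMBED;
        }
        // "cite" only points at a source document, so it falls through to NAVLINK, like "href".
        return Hop.NAVLINK;
    }

    public static void main(String[] args) {
        // Print the classification of a few sample attributes.
        for (String attr : new String[] {"src", "href", "cite"}) {
            System.out.println(attr + " -> " + classify(attr));
        }
    }
}
```

Because a navlink is subject to the normal scope rules, a `cite` URL pointing off-site would no longer be pulled into the crawl the way an embedded resource might be.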