Update time extraction code for all configured news sites #5
Open: paultcochrane wants to merge 16 commits into denny:main from paultcochrane:update-time-extraction
... because BBC uses bbc.com outside of the UK.
The publication date on BBC News is now found either in the `datePublished` property of a JSON object stored in a `<script>` element or, if that doesn't exist, in the `datetime` attribute of a `<time>` element. With this change the stale news warning works for the BBC again.
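The lookup order described above could be sketched roughly as follows. This is only an illustration, not the code from the commit: the function name `findBbcPublishedDate` is hypothetical, and it assumes `datePublished` sits at the top level of the JSON object, which may not hold on every page.

```javascript
// Hypothetical helper; `doc` is expected to behave like the page's `document`.
function findBbcPublishedDate(doc) {
  // First choice: a <script> element whose text is a JSON object
  // with a string `datePublished` property (assumed to be top-level).
  for (const script of doc.querySelectorAll("script")) {
    let data;
    try {
      data = JSON.parse(script.textContent);
    } catch (e) {
      continue; // ordinary JavaScript or empty script, not JSON
    }
    if (data && typeof data === "object" && typeof data.datePublished === "string") {
      return data.datePublished;
    }
  }
  // Fallback: the `datetime` attribute of a <time> element.
  const timeEl = doc.querySelector("time[datetime]");
  return timeEl ? timeEl.getAttribute("datetime") : null;
}
```

In a content script this would be called as `findBbcPublishedDate(document)`; it returns the raw date string, or `null` if neither source is present.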
Although CNN uses what seems to have become a standard across many news sites for specifying the publication date/time (i.e. the `content` attribute of the `<meta>` element with the `"article:published_time"` property), this element doesn't seem to be available at the time the extension is loaded. However, CNN also provides a `<meta>` element with the name `pubdate`, and the date information is stored in this element's `content` attribute. This change gets the stale news warning working again on CNN.
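As a rough sketch of the idea (not the extension's actual code), one could try the `article:published_time` property first and fall back to the `pubdate` name. The function name `readMetaPublishedDate` is made up for this example.

```javascript
// Hypothetical helper; `doc` is expected to behave like the page's `document`.
function readMetaPublishedDate(doc) {
  // Selectors are tried in order; the second is the CNN fallback described above.
  const selectors = [
    'meta[property="article:published_time"]',
    'meta[name="pubdate"]',
  ];
  for (const selector of selectors) {
    const el = doc.querySelector(selector);
    const content = el ? el.getAttribute("content") : null;
    if (content) {
      const date = new Date(content);
      // Skip values that do not parse as a date.
      if (!Number.isNaN(date.getTime())) return date;
    }
  }
  return null;
}
```

Calling `readMetaPublishedDate(document)` returns a `Date`, or `null` when neither `<meta>` element yields a parseable value.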
The DailyMail uses what seems to have become a standard on news sites: the `<meta>` element with the `article:published_time` property contains the publication date/time data.
As with other news agencies, The Guardian is now using the `article:published_time` meta property to store the publication date/time.
... because they also use huffpost.com now.
As with many other news sites, the publication date/time is stored in the `article:published_time` meta property. This change allows the publication date to be extracted; however, the extension still won't run because the site has disallowed `alert()`s from running.
As with many other news sites, the publication date/time is stored in the `article:published_time` meta property.
As with many other news sites, the publication date/time is stored in the `article:published_time` meta property. However, sometimes this isn't seen by the stale news warning extension, hence the `datePublished` element of the page's JSON metadata (stored in a `<script>` element) is used as a fall-back. The weird thing with the India Times is that when loading a page the first time, the extension doesn't pick up any publication date/time information, but on reload it *does*. Odd.
The Times of India uses a similar technique to the BBC: the date is embedded in the `datePublished` element of a JSON object, which is in turn embedded in a `<script>` element on the page.
As with many other news sites, the publication date/time is stored within the page's `article:published_time` meta property.
... which uses the `datePublished` property of a JSON object provided via a `<script>` in the page. This is only a slight change over what this extension's code used to do; the JSON is no longer in an element with a well-defined id: one has to search through all `<script>` tags to find the one which contains the relevant information.
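Searching through all the `<script>` contents might look something like the sketch below. It is an illustration under assumptions rather than the commit's code: the name `datePublishedFromScripts` is hypothetical, the function takes the script texts as plain strings, and it only looks at the top level of the JSON (or at the items of a top-level array).

```javascript
// Hypothetical helper: takes the text of every <script> on the page
// and returns the first `datePublished` string it can find, or null.
function datePublishedFromScripts(scriptTexts) {
  for (const text of scriptTexts) {
    // Cheap pre-filter so that most scripts are never parsed.
    if (!text.includes('"datePublished"')) continue;
    try {
      const parsed = JSON.parse(text);
      const items = Array.isArray(parsed) ? parsed : [parsed];
      for (const item of items) {
        if (item && typeof item.datePublished === "string") {
          return item.datePublished;
        }
      }
    } catch (e) {
      // Mentions the key but is not valid JSON; keep looking.
    }
  }
  return null;
}
```

In the page this could be fed with `Array.from(document.querySelectorAll("script"), (s) => s.textContent)`.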
... which now use a `<time>` element with the `itemprop` attribute of `dateCreated` (which is the publication date being looked for by the stale news warning extension). The date/time data is stored in the `datetime` attribute.
... which puts the publication date/time info in the `<meta>` tag with the name `DCSext.articleFirstPublished`.
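Since several sites differ only in which element and attribute hold the date, the lookups could be expressed as data. The sketch below is hypothetical and not how the extension is known to be organised: `EXAMPLE_RULES` and `extractWithRules` are invented names, and reading the `DCSext.articleFirstPublished` value from the `content` attribute is an assumption.

```javascript
// Hypothetical rule table: one selector/attribute pair per markup style.
const EXAMPLE_RULES = [
  { selector: 'time[itemprop="dateCreated"]', attribute: "datetime" },
  // Assumption: the value lives in the <meta> element's `content` attribute.
  { selector: 'meta[name="DCSext.articleFirstPublished"]', attribute: "content" },
];

// Returns the first non-empty attribute value matched by any rule, or null.
function extractWithRules(doc, rules) {
  for (const { selector, attribute } of rules) {
    const el = doc.querySelector(selector);
    const value = el ? el.getAttribute(attribute) : null;
    if (value) return value;
  }
  return null;
}
```

With a table like this, supporting a site that changes its markup would mean editing one entry, e.g. `extractWithRules(document, EXAMPLE_RULES)`.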
... which puts the publication date/time in the `datePublished` element of a JSON object embedded within a `<script>` element. Unfortunately, even though the publication date/time is extracted correctly, the extension can't show it because `alert()`s are forbidden on Yahoo News.
... which puts the publication date/time in the `datePublished` element of a JSON object embedded within a `<script>` element. Unfortunately, even though the publication date/time is extracted correctly, the extension can't show it because `alert()`s are forbidden on Yahoo News.
This PR updates the publication date/time extraction code for all sites defined in the extension's manifest file. In some cases it was necessary to extend the matches in the manifest because the news sites have changed their URLs slightly. In each commit I've tried to document the changes I made and why, so that they can be cherry-picked if so desired.
This PR is submitted in the hope that it is useful; if you want anything changed, I'll be more than happy to update and resubmit as necessary.