GitHub - publicdigital/covid19-response-tracking: Screenshots and automated analysis of public sector Covid19 sites from around the world

Capturing the evolution of public sector Covid19 response sites from around the world. Lots of governments have responded quickly and used the web to provide essential information, and analysing how they have done so will be a useful indicator of how well they're set up for internet-era ways of working.

This project started with a list from the Public Digital blog.

Reports generated from this data can be found at https://covidsites.public.digital

Please send any feedback to [email protected]

What we capture

For each site I'm capturing:

A screenshot
A report from Lighthouse in both HTML and JSON
The HTML of the site and some basic text analysis

The screenshots will show how the sites evolve in terms of the content they prioritise and the visual design that supports it. The Lighthouse scores give us a picture of speed and of how well each site covers the basics of accessibility.

They're stored in a way that was convenient to me when capturing them, which may not be the easiest structure for analysis. If you want to use this data and there's another structure that would make it easier for you, let me know.
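If you want to explore the data as it is stored today, a sketch like the following may help as a starting point. It relies only on the fact that the repository's top-level folders are named by capture date (2020-03-25, 2020-03-31, and so on). The function names are hypothetical, and the layout inside each dated folder is not assumed here.

```python
import re
from datetime import date, timedelta
from pathlib import Path

# Top-level capture folders are named like 2020-03-25.
DATE_DIR = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def capture_dates(repo_root):
    """Return the capture dates found as top-level folder names, oldest first."""
    found = []
    for entry in Path(repo_root).iterdir():
        if entry.is_dir() and DATE_DIR.match(entry.name):
            try:
                found.append(date.fromisoformat(entry.name))
            except ValueError:
                continue  # looks like a date but is not a valid one
    return sorted(found)


def missing_days(dates):
    """List the calendar days between the first and last capture that have no folder."""
    if not dates:
        return []
    have = set(dates)
    span = (dates[-1] - dates[0]).days
    every_day = (dates[0] + timedelta(days=i) for i in range(span + 1))
    return [day for day in every_day if day not in have]
```

Calling `missing_days(capture_dates("."))` from a local clone would list the days on which nothing was captured, which is worth knowing before computing averages over time.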

How to add a site

To recommend a site for us to include please either:

Create a pull request to add it to the list.csv file
or email [email protected] with the URL and the government it represents

How it works.

Every night we do four things:

Use an open source script to take a screenshot. This was where we started.
Take a copy of the HTML of the site.
Use the Lighthouse tool to run an analysis of the site.
Query Google's PageSpeed Insights API to capture their analysis of the site.

These scripts currently run on a server provided by DigitalOcean, located in the UK. Most of the scoring is based on objective analysis, but the score is partly influenced by the network connection. We're considering making more use of Google's data to get a less geographically-biased result, but expect that the impact would be very small.
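As a rough illustration of the last two nightly steps, the sketch below builds a Lighthouse command line and a PageSpeed Insights request URL for one site. It is not the project's actual script. The function names, the example URL and the choice of categories are assumptions. The endpoint is the public v5 `runPagespeed` URL, and the Lighthouse flags (`--output`, `--output-path`, `--chrome-flags`) are standard CLI options.

```python
from urllib.parse import urlencode

# Public endpoint of the PageSpeed Insights API (version 5).
PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"


def pagespeed_url(site_url, strategy="MOBILE", api_key=None):
    """Build a PageSpeed Insights request URL for one site (hypothetical helper)."""
    params = [
        ("url", site_url),
        ("strategy", strategy),
        # Repeating the parameter requests more than one category.
        ("category", "PERFORMANCE"),
        ("category", "ACCESSIBILITY"),
    ]
    if api_key:
        params.append(("key", api_key))
    return PSI_ENDPOINT + "?" + urlencode(params)


def lighthouse_command(site_url, output_base):
    """Build a Lighthouse CLI call that writes both JSON and HTML reports."""
    return [
        "lighthouse",
        site_url,
        "--output=json",
        "--output=html",
        "--output-path=" + output_base,
        "--chrome-flags=--headless",
    ]
```

The URL could then be fetched with `urllib.request.urlopen` and the command passed to `subprocess.run`. Because the PageSpeed Insights request runs on Google's infrastructure rather than on the UK server, comparing the two sets of scores is one way to gauge the geographic bias mentioned above.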

Each day we manually generate the report. This pulls in all the data we've captured and does four things:

Extracts the current and average accessibility scores from the Lighthouse data.
Extracts the current and average speed scores from the Lighthouse data.
Generates a timelapse animated GIF for each site.
Produces a report as a website.
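The first two steps above can be sketched as follows. This is a minimal example rather than the project's own report code. It assumes each night's Lighthouse JSON report has already been loaded into a dict, and the function names are hypothetical. Lighthouse stores category scores under `categories` as a number between 0 and 1, or null when an audit run failed, with speed under the `performance` category.

```python
import json
from statistics import mean


def load_report(path):
    """Load one Lighthouse JSON report from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def category_score(report, category):
    """Return a 0-100 score for one Lighthouse category, or None if it is missing."""
    score = report.get("categories", {}).get(category, {}).get("score")
    return None if score is None else round(score * 100)


def current_and_average(reports, category):
    """Given reports ordered oldest first, return (latest score, average score).

    Reports without a usable score are skipped, so "latest" means the most
    recent night that produced a score.
    """
    scores = [s for s in (category_score(r, category) for r in reports) if s is not None]
    if not scores:
        return None, None
    return scores[-1], round(mean(scores), 1)
```

For example, `current_and_average(reports, "accessibility")` and `current_and_average(reports, "performance")` would give the two pairs of figures described above.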

Every now and then we also manually run a script to generate the videos. We’re using a private copy of the excellent WebPageTest software (running on Amazon Web Services) with some hacky scripts that manage the queueing and check when the videos are done. As we currently have it set up, it’s really slow. If this service turned into a reasonable tool we’d love to get some expert help to make it faster and more reliable, and to really tune the settings to make the simulation of phones reliable.

Reading age and clarity

There is some controversy over whether or not it's useful to attempt to automatically determine the complexity of a piece of text. Because of that we're not currently showing that measure in the reports, but we do still capture some analysis to allow further work.

The first step is to extract text from the HTML. That’s never been a straightforward process unless you do it manually (which we’ve not had time to do). Instead we're relying on Trafilatura for that.

We then use textstat to extract a range of scores for text complexity/reading age.

You can see the day-by-day results of that in the language analysis folder.
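To give a feel for what a text complexity score measures, here is a standard-library sketch of one well-known measure, Flesch Reading Ease, which textstat also offers. It is not textstat's implementation. The syllable counter is a deliberately crude assumption (it counts vowel groups), so the numbers will differ from those in the language analysis folder.

```python
import re


def count_syllables(word):
    """Rough syllable estimate: count groups of consecutive vowels."""
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    if lowered.endswith("e") and count > 1:
        count -= 1  # treat a trailing 'e' as silent
    return max(count, 1)


def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores mean easier text. None if the text is empty."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return None
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
```

Short sentences made of short words score high, and long sentences made of long words score low. The formula sees only sentence and word length, not whether the advice is actually understandable, which is part of why the usefulness of such measures is contested.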

Results

Reports generated from this data can be found at https://covidsites.public.digital

Please send any feedback to [email protected]

