Research

Tooling

Wayback Machine

www.kcna.kp Wayback Sitemap

New Save Page Now Uses Brozzler

Batch archive URLs from Google Sheets

Official Save Page Now API

StackExchange Answer, Save Page Now 2 Change Log, Save Page Now 2 Public API Docs Draft

Unofficial Wayback Save API, Unofficial Wayback Save Gem
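For reference, here is a minimal sketch of what a Save Page Now 2 capture request might look like. It only builds the request and does not send it. The endpoint, the `LOW accesskey:secret` authorization scheme, and the status path are assumptions based on the public SPN2 API docs draft and should be checked against it; `build_spn2_request` and `spn2_status_url` are hypothetical helper names.

```python
import urllib.parse
import urllib.request

# Assumed endpoint, per the SPN2 public API docs draft; verify before use.
SPN2_ENDPOINT = "https://web.archive.org/save"


def build_spn2_request(target_url, access_key, secret_key):
    """Build (but do not send) a POST request asking SPN2 to capture target_url."""
    body = urllib.parse.urlencode({"url": target_url}).encode("ascii")
    return urllib.request.Request(
        SPN2_ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Accept": "application/json",
            # Assumed auth scheme: S3-style keys from an archive.org account.
            "Authorization": f"LOW {access_key}:{secret_key}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def spn2_status_url(job_id):
    """Return the (assumed) URL for polling the status of a capture job."""
    return f"{SPN2_ENDPOINT}/status/{job_id}"
```

Sending the request would be a matter of passing it to `urllib.request.urlopen` and reading the JSON response, which in the docs draft carries a job id that can be polled through the status URL.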

Mark Graham

Director of the Wayback Machine

Mark Graham Presentation Video

[01:25] "There are many of our partners that are using us, specifically, to archive the news. This is one particular crawl that I set up, archiving North Korean news. I am capturing something like forty-some North Korean sources every day."

IA Reddit AMA

I am especially passionate about this archive of web content from and about North Korea: https://archive-it.org/collections/6777

Mark

Archive-It North Korea Collection

Other IA Wayback Collections

kcna.kp collections: GDELT appears to be prominent.

IA Whole Earth Web Archiving

WEWA NK Page, IA Webservices

Archive Team

Archive Bot

ArchiveBot

Archive Bot Job 8veu3

North Korea, Governments/North Korea

We could set up an Archive Team project by following this guide.

DNS Leak

NK DNS Leak

Behavior

Wayback Save Page Now (SPN) appears to be broken for kcna.kp and other sites.

Example: SPN2 not working. SPN also crashes for rodong.rep.kp.

The example site from the link above about SPN2 not working now appears on Wayback and mostly works (Link). Perhaps SPN captures take a few days to show up on Wayback.

Some images are not being saved properly. Some return 403 Access Denied, even though I can access them from my browser. One example of this strange behavior is this photo, which was captured successfully but failed with a 403 a few days later here.
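One way to track this kind of drift is to list every capture of an image along with its HTTP status. The sketch below builds a Wayback CDX query URL and turns the JSON response into a chronological status history. The CDX endpoint and the `url`, `output`, and `fl` parameters are used as I understand the Wayback CDX server API; the image URL and the sample response in the usage note are hypothetical.

```python
import json
import urllib.parse

# Wayback CDX server endpoint (assumed; verify against the CDX API docs).
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(target_url):
    """Build a CDX query asking for the timestamp and status of each capture."""
    params = {"url": target_url, "output": "json", "fl": "timestamp,statuscode"}
    return CDX_ENDPOINT + "?" + urllib.parse.urlencode(params)


def status_history(cdx_json_text):
    """Parse a CDX JSON response into (timestamp, statuscode) pairs, oldest first.

    With output=json the first row is a header naming the columns.
    """
    rows = json.loads(cdx_json_text)
    if not rows:
        return []
    header, *captures = rows
    ts = header.index("timestamp")
    sc = header.index("statuscode")
    return sorted((row[ts], row[sc]) for row in captures)
```

For a hypothetical response such as `[["timestamp","statuscode"],["20230105000000","403"],["20230101000000","200"]]`, `status_history` returns the 200 capture first and the 403 capture second, which matches the pattern described above.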

The User-Agent probably doesn't matter. Most of these sites appear to have extreme rate limiting.
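If rate limiting is the cause, spacing out retries may help. The following is a minimal sketch of retrying with exponential backoff. It is not tuned to any of these sites: the delay values are placeholder assumptions, and `backoff_delays` and `fetch_with_backoff` are hypothetical helper names. The `fetch` argument is any callable that takes a URL and raises `OSError` (which includes `urllib.error.URLError`) on failure.

```python
import time


def backoff_delays(base_seconds=5.0, factor=2.0, attempts=4, cap_seconds=60.0):
    """Yield one wait time per attempt, growing geometrically up to a cap."""
    delay = base_seconds
    for _ in range(attempts):
        yield min(delay, cap_seconds)
        delay *= factor


def fetch_with_backoff(fetch, url, sleep=time.sleep, **backoff):
    """Call fetch(url), waiting progressively longer after each failure."""
    last_error = None
    for delay in backoff_delays(**backoff):
        try:
            return fetch(url)
        except OSError as err:
            last_error = err
            sleep(delay)  # wait before the next attempt
    # All attempts failed (or none were made): surface the last error.
    raise last_error or RuntimeError("no attempts were made")
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting.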

I'm also able to capture pages well using Webrecorder's ArchiveWeb.page extension.

How to connect and other resources

Research Projects #467, Webscraping README, War Dialing README
