Merge pull request InternetHealthReport#127 from m-appel/acknowledgment-file

Add acknowledgments
romain-fontugne authored Feb 19, 2024
2 parents 236f736 + e4cb984 commit de1f514
Showing 1 changed file with 185 additions and 0 deletions.
185 changes: 185 additions & 0 deletions ACKNOWLEDGMENTS.md
@@ -0,0 +1,185 @@
# Acknowledgments

The Internet Yellow Pages could not exist without all the awesome prior research and
data sources. We list all of them here, if possible with their corresponding licenses,
to which you will need to conform if you use the public instance or create a dump that
includes these data sources.

Please refer to the READMEs in the respective crawler directories for more information.

## Alice-LG

We retrieve route server looking glass snapshots from the following IXPs.

| Name | URL |
|----------|----------------------------|
| AMS-IX | https://lg.ams-ix.net/ |
| BCIX | https://lg.bcix.de/ |
| DE-CIX | https://lg.de-cix.net/ |
| IX.br | https://lg.ix.br/ |
| LINX | https://alice-rs.linx.net/ |
| Megaport | https://lg.megaport.com/ |
| Netnod | https://lg.netnod.se/ |

## APNIC

We use [APNIC](https://labs.apnic.net/)'s [AS population
estimate](https://labs.apnic.net/index.php/2014/10/02/how-big-is-that-network/).

## BGPKIT

We use the as2rel, peer-stats, and pfx2as [datasets](https://data.bgpkit.com/) from
[BGPKIT](https://bgpkit.com/).

Use of this data is authorized under their [Acceptable Use
Agreement](https://bgpkit.com/aua).

## BGP.Tools

We use [AS names, AS tags](https://bgp.tools/kb/api), and [anycast prefix
tags](https://github.com/bgptools/anycast-prefixes) provided by
[BGP.Tools](https://bgp.tools/).

## CAIDA

We use two datasets from [CAIDA](https://www.caida.org/), whose use is authorized
under their [Acceptable Use Agreement](https://www.caida.org/about/legal/aua/).

> CAIDA AS Rank https://doi.org/10.21986/CAIDA.DATA.AS-RANK.

and

> The CAIDA UCSD IXPs Dataset,
> https://www.caida.org/catalog/datasets/ixps

## Cisco

We use the [Cisco Umbrella Popularity
List](https://s3-us-west-1.amazonaws.com/umbrella-static/index.html).

## Citizen Lab

We use URL testing lists from [The Citizen Lab](https://citizenlab.ca/).

> Citizen Lab and Others. 2014. URL Testing Lists Intended for Discovering Website
> Censorship. https://github.com/citizenlab/test-lists.

This data is licensed under [CC BY-NC-SA
4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/). No changes were made to the data.

## Cloudflare

We use the `radar/dns/top/ases`, `radar/dns/top/locations`, `radar/ranking/top`, and
`radar/datasets` endpoints of the [Cloudflare Radar](https://radar.cloudflare.com/) API.

This data is licensed under [CC BY-NC
4.0](https://creativecommons.org/licenses/by-nc/4.0/). No changes were made to the data.

## Emile Aben

We use [AS names](https://github.com/emileaben/asnames) provided by Emile Aben and
others with permission (Hi Emile!).

## Internet Health Report

We use three datasets from the [Internet Health Report](https://ihr.iijlab.net/) (that's
us!): Country Dependency, AS Hegemony, and Route Origin Validation.

This data is licensed under [CC BY-NC-SA
4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/). No changes were made to the
data.

## Internet Intelligence Lab

We use the AS to organization mapping from the [Internet Intelligence Lab at Georgia
Tech](https://inetintel.notion.site/Internet-Intelligence-Research-Lab-d186184563d345bab51901129d812ed6).

> Z. Chen, Z. Bischof, C. Testart, A. Dainotti, "AS to Organization Mapping",
> Internet Intelligence Lab at Georgia Tech,
> https://github.com/InetIntel/Dataset-AS-to-Organization-Mapping

Use of this data is authorized under their [Acceptable Use
Agreement](https://raw.githubusercontent.com/InetIntel/Dataset-AS-to-Organization-Mapping/master/LICENSE).

## Number Resource Organization

We use the [extended allocation and assignment
reports](https://www.nro.net/about/rirs/statistics/) provided by the [Number Resource
Organization](https://www.nro.net/).

## OpenINTEL

We use several datasets from [OpenINTEL](https://www.openintel.nl/), a joint project of
the University of Twente, SURF, SIDN Labs and NLnet Labs.

The `tranco1m` and `umbrella1m` [datasets](https://data.openintel.nl/data/) are licensed
under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/). No changes
were made to the data. In addition, there are [Terms of
Use](https://data.openintel.nl/data/README.txt) for this data.

The [DNS Dependency Graph tool](https://dnsgraph.dacs.utwente.nl/) is a joint project of
the University of Twente and IIJ Research Laboratory.

Other datasets are used with permission from OpenINTEL.

## Packet Clearing House

We use the [daily routing snapshots](https://www.pch.net/resources/Routing_Data/) from
[Packet Clearing House](https://www.pch.net/).

This data is licensed under [CC BY-NC-SA
3.0](https://creativecommons.org/licenses/by-nc-sa/3.0/). No changes were made to the
data.

## PeeringDB

We use the `fac`, `ix`, `ixlan`, `netfac`, and `org` endpoints of the
[PeeringDB](https://www.peeringdb.com/) API.

Use of this data is authorized under their [Acceptable Use
Policy](https://www.peeringdb.com/aup).

## RIPE NCC

We use AS names, Atlas measurement information, and RPKI data from the [RIPE
NCC](https://www.ripe.net/) and [RIPE Atlas](https://atlas.ripe.net/).

## Stanford

We use the [Stanford ASdb dataset](https://asdb.stanford.edu/) provided by the [Stanford
Empirical Security Research Group](https://esrg.stanford.edu/).

> [ASdb: A System for Classifying Owners of Autonomous
> Systems](https://zakird.com/papers/asdb.pdf).
> Maya Ziv, Liz Izhikevich, Kimberly Ruth, Katherine Izhikevich, and Zakir Durumeric.
> ACM Internet Measurement Conference (IMC), November 2021.

## Tranco

We use the [Tranco list](https://tranco-list.eu/) provided by the [DistriNet Research
Unit KU Leuven](https://distrinet.cs.kuleuven.be/), [TU Delft](https://www.tudelft.nl/),
and [LIG](https://www.liglab.fr/).

The Tranco list combines lists from five providers:

1. [Cisco
Umbrella](https://umbrella-static.s3-us-west-1.amazonaws.com/index.html)
1. [Majestic](https://majestic.com/reports/majestic-million) (available under a [CC BY
3.0](https://creativecommons.org/licenses/by/3.0/) license)
1. [Farsight](https://www.domaintools.com/resources/blog/mirror-mirror-on-the-wall-whos-the-fairest-website-of-them-all)
1. [Chrome User Experience Report (CrUX)](https://developer.chrome.com/docs/crux/)
([available](https://research.google/resources/datasets/chrome-user-experience-report/)
under a [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/) license)
1. [Cloudflare Radar](https://radar.cloudflare.com/domains)
([available](https://radar.cloudflare.com/about) under a [CC BY-NC
4.0](https://creativecommons.org/licenses/by-nc/4.0/) license).

## Virginia Tech

We use the [RoVista](https://rovista.netsecurelab.org/) dataset provided by the
NetSecLab group at Virginia Tech.

> RoVista: Measuring and Understanding the Route Origin Validation (ROV) in RPKI.
> Weitong Li, Zhexiao Lin, Md. Ishtiaq Ashiq, Emile Aben, Romain Fontugne,
> Amreesh Phokeer, and Taejoong Chung.
> ACM Internet Measurement Conference (IMC), October 2023.
