From e4cb984e552fd988dbde7227ee04c2301241e3a5 Mon Sep 17 00:00:00 2001 From: Malte Tashiro Date: Fri, 16 Feb 2024 09:04:29 +0000 Subject: [PATCH] Add an acknowledgments file --- ACKNOWLEDGMENTS.md | 185 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 185 insertions(+) create mode 100644 ACKNOWLEDGMENTS.md diff --git a/ACKNOWLEDGMENTS.md b/ACKNOWLEDGMENTS.md new file mode 100644 index 0000000..9aae2ef --- /dev/null +++ b/ACKNOWLEDGMENTS.md @@ -0,0 +1,185 @@ +# Acknowledgments + +The Internet Yellow Pages could not exist without all the awesome prior research and +data sources. We list all of them here, if possible with their corresponding licenses, +to which you will need to conform if you use the public instance or create a dump that +includes these data sources. + +Please refer to the READMEs in the respective crawler directories for more information. + +## Alice-LG + +We retrieve route server looking glass snapshots from the following IXPs. + +| Name | URL | +|----------|----------------------------| +| AMS-IX | https://lg.ams-ix.net/ | +| BCIX | https://lg.bcix.de/ | +| DE-CIX | https://lg.de-cix.net/ | +| IX.br | https://lg.ix.br/ | +| LINX | https://alice-rs.linx.net/ | +| Megaport | https://lg.megaport.com/ | +| Netnod | https://lg.netnod.se/ | + +## APNIC + +We use [APNIC](https://labs.apnic.net/)'s [AS population +estimate](https://labs.apnic.net/index.php/2014/10/02/how-big-is-that-network/). + +## BGPKIT + +We use the as2rel, peer-stats, and pfx2as [datasets](https://data.bgpkit.com/) from +[BGPKIT](https://bgpkit.com/). + +Use of this data is authorized under their [Acceptable Use +Agreement](https://bgpkit.com/aua). + +## BGP.Tools + +We use [AS names, AS tags](https://bgp.tools/kb/api), and [anycast prefix +tags](https://github.com/bgptools/anycast-prefixes) provided by +[BGP.Tools](https://bgp.tools/). + +## CAIDA + +We use two datasets from [CAIDA](https://www.caida.org/) which use is authorized +under their [Acceptable Use Agreement](https://www.caida.org/about/legal/aua/). + +> CAIDA AS Rank https://doi.org/10.21986/CAIDA.DATA.AS-RANK. + +and + +> The CAIDA UCSD IXPs Dataset, +> https://www.caida.org/catalog/datasets/ixps + +## Cisco + +We use the [Cisco Umbrella Popularity +List](https://s3-us-west-1.amazonaws.com/umbrella-static/index.html). + +## Citizen Lab + +We use URL testing lists from [The Citizen Lab](https://citizenlab.ca/). + +> Citizen Lab and Others. 2014. URL Testing Lists Intended for Discovering Website +> Censorship. https://github.com/citizenlab/test-lists. + +This data is licensed under [CC BY-NC-SA +4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/). No changes were made to the data. + +## Cloudflare + +We use the `radar/dns/top/ases`, `radar/dns/top/locations`, `radar/ranking/top`, and +`radar/datasets` endpoints of the [Clouflare Radar](https://radar.cloudflare.com/) API. + +This data is licensed under [CC BY-NC +4.0](https://creativecommons.org/licenses/by-nc/4.0/). No changes were made to the data. + +## Emile Aben + +We use [AS names](https://github.com/emileaben/asnames) provided by Emile Aben and +others with permission (Hi Emile!). + +## Internet Health Report + +We use three datasets from the [Internet Health Report](https://ihr.iijlab.net/) (that's +us!): Country Dependency, AS Hegemony, and Route Origin Validation. + +This data is licensed under [CC BY-NC-SA +4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/). No changes were made to the +data. + +## Internet Intelligence Lab + +We use the AS to organization mapping from the [Internet Intelligence Lab at Georgia +Tech](https://inetintel.notion.site/Internet-Intelligence-Research-Lab-d186184563d345bab51901129d812ed6). + +> Z. Chen, Z. Bischof, C. Testart, A. Dainotti, "AS to Organization Mapping", +> Internet Intelligence Lab at Georgia Tech, +> https://github.com/InetIntel/Dataset-AS-to-Organization-Mapping + +Use of this data is authorized under their [Acceptable Use +Agreement](https://raw.githubusercontent.com/InetIntel/Dataset-AS-to-Organization-Mapping/master/LICENSE). + +## Number Resource Organization + +We use the [extended allocation and assignment +reports](https://www.nro.net/about/rirs/statistics/) provided by the [Number Resource +Organization](https://www.nro.net/). + +## OpenINTEL + +We use several datasets from [OpenINTEL](https://www.openintel.nl/), a joint project of +the University of Twente, SURF, SIDN Labs and NLnet Labs. + +The `tranco1m` and `umbrella1m` [datasets](https://data.openintel.nl/data/) are licensed +under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/). No changes +were made to the data. In addition, there are [Terms of +Use](https://data.openintel.nl/data/README.txt) for this data. + +The [DNS Dependency Graph tool](https://dnsgraph.dacs.utwente.nl/) is a joint project of +the University of Twente and IIJ Research Laboratory. + +Other datasets are used with permission from OpenINTEL. + +## Packet Clearing House + +We use the [daily routing snapshots](https://www.pch.net/resources/Routing_Data/) from +[Packet Clearing House](https://www.pch.net/). + +This data is licensed under [CC BY-NC-SA +3.0](https://creativecommons.org/licenses/by-nc-sa/3.0/). No changes were made to the +data. + +## PeeringDB + +We use the `fac`, `ix`, `ixlan`, `netfac`, and `org` endpoints of the +[PeeringDB](https://www.peeringdb.com/) API. + +Use of this data is authorized under their [Acceptable Use +Policy](https://www.peeringdb.com/aup). + +## RIPE NCC + +We use AS names, Atlas measurement information, and RPKI data from the [RIPE +NCC](https://www.ripe.net/) and [RIPE Atlas](https://atlas.ripe.net/). + +## Stanford + +We use the [Stanford ASdb dataset](https://asdb.stanford.edu/) provided by the [Stanford +Empirical Security Research Group](https://esrg.stanford.edu/). + +> [ASdb: A System for Classifying Owners of Autonomous +> Systems](https://zakird.com/papers/asdb.pdf). +> Maya Ziv, Liz Izhikevich, Kimberly Ruth, Katherine Izhikevich, and Zakir Durumeric. +> ACM Internet Measurement Conference (IMC), November 2021. + +## Tranco + +We use the [Tranco list](https://tranco-list.eu/) provided by the [DistriNet Research +Unit KU Leuven](https://distrinet.cs.kuleuven.be/), [TU Delft](https://www.tudelft.nl/), +and [LIG](https://www.liglab.fr/). + +The Tranco list combines lists from five providers: + +1. [Cisco +Umbrella](https://umbrella-static.s3-us-west-1.amazonaws.com/index.html) +1. [Majestic](https://majestic.com/reports/majestic-million) (available under a [CC BY + 3.0](https://creativecommons.org/licenses/by/3.0/) license) +1. [Farsight](https://www.domaintools.com/resources/blog/mirror-mirror-on-the-wall-whos-the-fairest-website-of-them-all) +1. [Chrome User Experience Report (CrUX)](https://developer.chrome.com/docs/crux/) + ([available](https://research.google/resources/datasets/chrome-user-experience-report/) + under a [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/) license) +1. [Cloudflare Radar](https://radar.cloudflare.com/domains) + ([available](https://radar.cloudflare.com/about) under a [CC BY-NC + 4.0](https://creativecommons.org/licenses/by-nc/4.0/) license). + +## Virginia Tech + +We use the [RoVista](https://rovista.netsecurelab.org/) dataset provided by the +NetSecLab group at Virginia Tech. + +> RoVista: Measuring and Understanding the Route Origin Validation (ROV) in RPKI. +> Weitong Li, Zhexiao Lin, Md. Ishtiaq Ashiq, Emile Aben, Romain Fontugne, +> Amreesh Phokeer, and Taejoong Chung. +> ACM Internet Measurement Conference (IMC), October 2023.