Use case: BERD by Mannheim University Library (UBM)

Context

Motivation. German company data are spread across many providers, registers and time periods. German company identifiers are notorious for their lack of uniqueness, inconsistent representations and multiple registrations per legal entity (see OpenCorporates blog). The modern data were scraped and processed by OpenCorporates; the main historical datasets were digitized and processed by Mannheim University Library.

Goal. Create a knowledge graph-based research infrastructure for German company datasets in order to improve access to German Business, Economic and Related Data (BERD).

Software. We chose Wikibase to create and maintain the knowledge graph.

Challenges

  1. Data integration
  2. Data quality: non-unique identifiers & OCR-ed data
  3. Scaling a Wikibase-based knowledge graph
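
To make the second challenge more concrete, here is a minimal Python sketch of how inconsistently written register numbers might be normalized before records are merged. It is not the project's actual pipeline. The identifier pattern (a register type such as HRB, a number, an optional suffix), the helper names `normalize_register_number` and `group_records`, and the sample records are all assumptions made for illustration. The sketch also assumes that a register number only becomes meaningful together with the registering court, so the court is part of the grouping key.

```python
import re
from collections import defaultdict

# Assumed pattern: register type, optional separator, number with optional
# leading zeros, optional one- or two-letter suffix. Real data may differ.
_REGISTER_RE = re.compile(
    r"^\s*(HRA|HRB|GNR|PR|VR)\s*[-.]?\s*0*(\d+)\s*([A-Z]{0,2})\s*$",
    re.IGNORECASE,
)


def normalize_register_number(raw):
    """Return a canonical form such as 'HRB 12345', or None if unparseable."""
    match = _REGISTER_RE.match(raw)
    if match is None:
        return None
    kind, number, suffix = match.groups()
    canonical = f"{kind.upper()} {number}"
    if suffix:
        canonical += f" {suffix.upper()}"
    return canonical


def group_records(records):
    """Group (court, raw_id, name) tuples by (court, normalized id).

    Records whose identifier cannot be parsed are skipped here; a real
    pipeline would probably route them to manual review instead.
    """
    groups = defaultdict(list)
    for court, raw_id, name in records:
        normalized = normalize_register_number(raw_id)
        if normalized is not None:
            groups[(court.strip().lower(), normalized)].append(name)
    return dict(groups)


# Hypothetical sample records, not taken from any of the datasets.
records = [
    ("Mannheim", "HRB 12345", "Example GmbH"),
    ("mannheim ", "hrb012345", "Example GmbH (OCR variant)"),
    ("Berlin", "HRB 12345", "Other AG"),
]
groups = group_records(records)
```

In this sketch the two Mannheim records end up in the same group, while the Berlin record with the same number stays separate. OCR errors inside the digits themselves are not handled and would need fuzzy matching on names or other attributes.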

Resources

  • Websites:
  • Data:
    • Historical - unstructured and semi-structured datasets
    • Modern - OpenCorporates dataset
  • Tools:
    • bbw - semantic annotator for tabular data
    • RaiseWikibase - a tool for speeding up data integration and knowledge graph construction using Wikibase