Structured Data Import

Jan Ehmueller edited this page Jul 27, 2017 · 11 revisions

Steps

The import of a structured data source consists of three distinct steps.

  1. Normalization
  2. Duplicate Detection
  3. Data Merging

The result of running all three steps in succession is then written to the 'subject' table.
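
As a rough illustration of how the three steps fit together, here is a minimal Python sketch. It is not the project's actual implementation: the `Subject` class, the function names, the exact-name matching rule, and the "existing values win" merge rule are all assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Subject:
    """Hypothetical stand-in for a row of the 'subject' table."""
    name: str
    properties: dict = field(default_factory=dict)


def normalize(record):
    """Step 1 (Normalization): bring a raw source record into a common shape."""
    return Subject(
        # Collapse repeated whitespace in the name.
        name=" ".join(record["name"].split()),
        # Trim all other values and drop the ones that end up empty.
        properties={
            key: value.strip()
            for key, value in record.items()
            if key != "name" and value.strip()
        },
    )


def find_duplicate(subject, existing):
    """Step 2 (Duplicate Detection): index of a subject with the same name, or None.

    Assumption: names are compared case-insensitively and must match exactly.
    """
    for index, candidate in enumerate(existing):
        if candidate.name.lower() == subject.name.lower():
            return index
    return None


def merge(existing, incoming):
    """Step 3 (Data Merging): keep existing values, add properties that are new."""
    merged = dict(incoming.properties)
    merged.update(existing.properties)  # existing values win on conflict
    return Subject(name=existing.name, properties=merged)


def import_source(records, subjects):
    """Run the three steps in order and return the new subject list."""
    result = list(subjects)
    for record in records:
        subject = normalize(record)
        index = find_duplicate(subject, result)
        if index is None:
            result.append(subject)
        else:
            result[index] = merge(result[index], subject)
    return result
```

For example, importing a record named `"  acme   gmbh "` into a list that already holds a subject named `"Acme GmbH"` would update that subject instead of adding a second one, while a record with an unseen name would be appended as a new subject.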

Data Sources

The pipeline for each data source can be found here:

  1. Implisense
  2. Wikidata
  3. DBpedia
  4. Kompass

companies.jar

This jar is needed for compilation. It is used to transform new data sources into subjects (legal form extraction). It can be downloaded from /home/bp2016n1/jars/companies.jar on sopedu.


Unstructured Data Import is documented here