Specification for crosswalks

Knut Wenzig edited this page Sep 24, 2019 · 6 revisions

A crosswalk is always unidirectional, from a source classification to a target classification. It has six columns, optionally followed by auxiliary variables:

  • sourceConceptScheme: The identifier of the source classification, as defined in the corresponding classification (column ConceptScheme).
  • sourceConcept: The identifier of the source category, as defined in the corresponding classification (column Concept).
  • targetConceptScheme: The identifier of the target classification, as defined in the corresponding classification (column ConceptScheme).
  • targetConcept: The identifier of the target category, as defined in the corresponding classification (column Concept).
  • prob: The probability (numeric) of the relation. For a 1:1 or m:1 relation, prob should be 1. If one source category is related to multiple target categories, the prob values of the related rows should sum to 1. By default, the row with the maximum prob is used.
  • comment: Free text for any information about this relation.
  • aux1 ... auxn: Optionally, one or more auxiliary variables, e.g. information on self-employment and number of employees as in https://github.com/dirtyhawk/stata-derivescores/blob/master/tables/ISCO-88_Ganzeboom--EGP_Ganzeboom.csv
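The prob rules above can be checked mechanically. The following Python sketch is an illustration only, not part of this specification: it assumes the rows have already been read into (sourceConcept, targetConcept, prob) tuples with None standing for an empty cell, and the sample rows and function names are hypothetical. It flags source categories whose prob values do not sum to one and picks the default target (maximum prob) for each source category.

```python
import math
from collections import defaultdict

# Hypothetical sample rows: (sourceConcept, targetConcept, prob).
# None stands for an empty cell.
rows = [
    ("1000", "A", 1.0),    # 1:1 relation
    ("2000", "B", 0.7),    # one source category, two target categories
    ("2000", "C", 0.3),
    ("3000", None, None),  # no corresponding target category
]


def check_prob_sums(rows, tol=1e-9):
    """Return the source concepts whose prob values do not sum to one."""
    sums = defaultdict(float)
    for source, target, prob in rows:
        if target is None:
            continue  # rows without a target carry an empty prob
        sums[source] += prob
    return sorted(
        source for source, total in sums.items()
        if not math.isclose(total, 1.0, abs_tol=tol)
    )


def default_targets(rows):
    """Map each source concept to the target with the maximum prob.

    Ties keep the first row encountered (an assumption; the rule does
    not say how ties are resolved). Unmatched sources map to None.
    """
    best = {}  # source -> (target, prob)
    for source, target, prob in rows:
        if target is None:
            best.setdefault(source, (None, None))
            continue
        current_prob = best.get(source, (None, None))[1]
        if current_prob is None or prob > current_prob:
            best[source] = (target, prob)
    return {source: target for source, (target, _) in best.items()}


print(check_prob_sums(rows))  # []
print(default_targets(rows))  # {'1000': 'A', '2000': 'B', '3000': None}
```

A floating-point tolerance is used for the sum because shares such as 0.7 and 0.3 are not exactly representable as binary floats.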

Try to include a row for each source category, even if it has no corresponding target category. In that case, leave the cells targetConceptScheme, targetConcept and prob empty, and use comment to make this property of the crosswalk explicit.
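Completing a crosswalk in this way can be automated by comparing it with the categories of the source classification. A minimal Python sketch, assuming rows are dicts keyed by the column names of this specification and that source_concepts was taken from the Concept column of the source classification; the scheme codes "SRC" and "TGT", the default comment text and the function name are hypothetical:

```python
def add_unmatched_rows(crosswalk, source_scheme, source_concepts,
                       comment="no corresponding target category"):
    """Return the crosswalk plus one placeholder row per uncovered source concept."""
    covered = {row["sourceConcept"] for row in crosswalk}
    placeholders = [
        {
            "sourceConceptScheme": source_scheme,
            "sourceConcept": concept,
            "targetConceptScheme": None,  # left empty
            "targetConcept": None,        # left empty
            "prob": None,                 # left empty
            "comment": comment,           # makes the missing target explicit
        }
        for concept in source_concepts
        if concept not in covered
    ]
    return crosswalk + placeholders


# Hypothetical data: concepts of the source classification and a partial crosswalk.
source_concepts = ["1000", "2000", "3000"]
crosswalk = [
    {"sourceConceptScheme": "SRC", "sourceConcept": "1000",
     "targetConceptScheme": "TGT", "targetConcept": "A",
     "prob": 1.0, "comment": ""},
]

completed = add_unmatched_rows(crosswalk, "SRC", source_concepts)
print([row["sourceConcept"] for row in completed])  # ['1000', '2000', '3000']
```

The function returns a new list rather than modifying its input, and running it again on its own output adds no further rows.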

Only prob and the auxiliary variables are numeric. All other columns contain strings and must be enclosed in quotes.
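Writing a row that follows these quoting rules could look like the following Python sketch. It is an illustration under assumptions, not a reference implementation: the comma separator, the scheme codes "SRC" and "TGT" and the function names are hypothetical, missing values are represented by None, and embedded quotes are escaped by doubling them, as is common CSV practice.

```python
def format_field(value):
    """Render one cell: numbers bare, strings quoted, missing values empty."""
    if value is None:
        return ""  # empty cell, e.g. targetConcept of an unmatched row
    if isinstance(value, bool):
        raise TypeError("booleans are not valid crosswalk values")
    if isinstance(value, (int, float)):
        return str(value)  # prob and auxiliary variables stay unquoted
    # Everything else is written as a quoted string; embedded quotes are doubled.
    return '"' + str(value).replace('"', '""') + '"'


def format_row(values, sep=","):
    """Join the cells of one crosswalk row (sep="," is an assumption)."""
    return sep.join(format_field(value) for value in values)


print(format_row(["SRC", "1000", "TGT", "A", 1.0, ""]))
# "SRC","1000","TGT","A",1.0,""
print(format_row(["SRC", "3000", None, None, None, 'no "target"']))
# "SRC","3000",,,,"no ""target"""
```

Formatting the cells by hand keeps the three cases (quoted string, bare number, empty cell) explicit.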

The variables sourceConceptScheme, sourceConcept, targetConceptScheme and targetConcept must be the first four columns of the table; the order of all other variables (auxiliary variables, prob, comment) is arbitrary.
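A header row can be checked against these ordering rules with a few lines of Python. In this sketch (function and constant names are hypothetical), the first four columns must match exactly, while prob and comment only need to be present somewhere after them; auxiliary columns are not constrained.

```python
# Column names fixed by this specification.
LEADING_COLUMNS = ["sourceConceptScheme", "sourceConcept",
                   "targetConceptScheme", "targetConcept"]
OTHER_REQUIRED = {"prob", "comment"}


def check_header(header):
    """Return a list of problems with a crosswalk header row (empty if valid)."""
    problems = []
    if header[:4] != LEADING_COLUMNS:
        problems.append("first four columns must be " + ", ".join(LEADING_COLUMNS))
    missing = OTHER_REQUIRED - set(header[4:])
    if missing:
        problems.append("missing columns: " + ", ".join(sorted(missing)))
    return problems


# prob, comment and auxiliary columns may appear in any order after the first four.
print(check_header(LEADING_COLUMNS + ["aux1", "comment", "prob"]))  # []
print(check_header(["sourceConcept", "sourceConceptScheme",
                    "targetConceptScheme", "targetConcept", "prob"]))
# two problems: wrong order of the first four columns, missing comment
```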

The file name must consist of the sourceConceptScheme, a double underscore and the targetConceptScheme, with the file name extension "csv".
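A minimal Python sketch of the naming rule, with hypothetical function names. The example file linked in the column list above (ISCO-88_Ganzeboom--EGP_Ganzeboom.csv) uses a double hyphen rather than a double underscore, so the separator is left as a parameter here. Parsing splits at the first separator and therefore assumes that the source scheme identifier does not contain it.

```python
def crosswalk_filename(source_scheme, target_scheme, sep="__"):
    """Build the file name <sourceConceptScheme><sep><targetConceptScheme>.csv."""
    return f"{source_scheme}{sep}{target_scheme}.csv"


def parse_crosswalk_filename(name, sep="__"):
    """Split a crosswalk file name into (sourceConceptScheme, targetConceptScheme).

    Splits at the first occurrence of sep, so this assumes the source scheme
    identifier itself does not contain sep.
    """
    if not name.endswith(".csv"):
        raise ValueError(f"not a .csv file: {name}")
    source, found, target = name[:-len(".csv")].partition(sep)
    if not found or not source or not target:
        raise ValueError(f"expected <source>{sep}<target>.csv, got {name}")
    return source, target


print(crosswalk_filename("SRC", "TGT"))          # SRC__TGT.csv
print(parse_crosswalk_filename("SRC__TGT.csv"))  # ('SRC', 'TGT')
print(parse_crosswalk_filename("ISCO-88_Ganzeboom--EGP_Ganzeboom.csv", sep="--"))
# ('ISCO-88_Ganzeboom', 'EGP_Ganzeboom')
```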

The file must also comply with the Specification for all tables.

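Crosswalk CSV files can be imported into Stata with

    import delimited <file>, encoding(utf8) stringcols(1 2 3 4 6) numericcols(5) case(preserve) clear

This command assumes that prob is the fifth and comment the sixth column; if they are placed elsewhere, adjust the column numbers in stringcols() and numericcols().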
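Outside Stata, the same files can be read with any CSV parser. The following Python sketch is an illustration under assumptions (UTF-8 encoding as in the Stata command, comma as separator, hypothetical sample data and function name): it looks up columns by name rather than by position, so it does not depend on the order of the columns after the first four, and it keeps concept codes as strings.

```python
import csv
import io

# Columns that hold strings; every other column (prob, auxiliary variables) is numeric.
STRING_COLUMNS = {"sourceConceptScheme", "sourceConcept",
                  "targetConceptScheme", "targetConcept", "comment"}


def read_crosswalk(lines):
    """Parse crosswalk CSV lines into dicts, converting numeric columns by name."""
    rows = []
    for record in csv.DictReader(lines):
        row = {}
        for column, value in record.items():
            if column in STRING_COLUMNS:
                row[column] = value  # kept as text, e.g. to preserve leading zeros
            else:
                # Empty numeric cells (e.g. prob of an unmatched row) become None.
                row[column] = float(value) if value not in ("", None) else None
        rows.append(row)
    return rows


sample = '''"sourceConceptScheme","sourceConcept","targetConceptScheme","targetConcept","prob","comment"
"SRC","0110","TGT","A",1,""
"SRC","3000",,,,"no corresponding target category"
'''

rows = read_crosswalk(io.StringIO(sample))
print(rows[0]["sourceConcept"], rows[0]["prob"])        # 0110 1.0
print(rows[1]["targetConcept"] == "", rows[1]["prob"])  # True None

# Reading a file on disk (the path is hypothetical):
# with open("SRC__TGT.csv", encoding="utf-8", newline="") as f:
#     rows = read_crosswalk(f)
```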