
Link individual small graphs with Wikidata using LIMES #10

Open · sshivam95 opened this issue Jun 18, 2024 · 6 comments

@sshivam95 (Collaborator)

Next step after #9.

@sshivam95 (Collaborator, Author)

For linking the full WDC dataset, we need to take care of the `<RESTRICTION>` tag. The class identifiers in Wikidata are not human-readable, e.g.:

  • Q5: Represents all human beings.
  • Q515: Represents cities.
  • Q571: Represents books.

etc. These identifiers do not make sense to humans on their own, therefore we are interested in their `rdfs:label` property (e.g. Q5 carries the English label "human").

Working pipeline:

| Step | Description |
| --- | --- |
| Gather Wikidata Classes | Gather all the Wikidata classes with `rdfs:label` using a SPARQL query |
| Gather WDC Dataset Classes | Gather all the classes from each WDC dataset |
| Link Dataset Classes | Link the dataset classes with the Wikidata classes |
| Store Linked Classes | Keep these linked classes for further checking |
| Automate Config Creation | Automate the creation of config files for LIMES linking, based on the KG with only triples (see the sketch below) |
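
For the last step (Automate Config Creation), a minimal sketch of what a config generator could look like is below. The XML layout (SOURCE/TARGET with RESTRICTION, METRIC, ACCEPTANCE/REVIEW) follows my reading of the usual LIMES config format and should be checked against the LIMES DTD of the release we use; the restriction placeholders, endpoint/file names, metric, thresholds, output format, and the `write_config` helper are all illustrative assumptions.

```python
# Sketch only: generate one LIMES config per WDC graph (triples-only KG).
# Element names follow the common LIMES XML layout; verify against the
# LIMES DTD before use. All concrete values below are placeholders.
from pathlib import Path
from string import Template

CONFIG_TEMPLATE = Template("""<?xml version="1.0" encoding="UTF-8"?>
<LIMES>
  <PREFIX><NAMESPACE>http://www.w3.org/2000/01/rdf-schema#</NAMESPACE><LABEL>rdfs</LABEL></PREFIX>
  <PREFIX><NAMESPACE>http://www.w3.org/2002/07/owl#</NAMESPACE><LABEL>owl</LABEL></PREFIX>
  <SOURCE>
    <ID>wikidata</ID>
    <ENDPOINT>https://query.wikidata.org/sparql</ENDPOINT>
    <VAR>?x</VAR>
    <PAGESIZE>1000</PAGESIZE>
    <RESTRICTION>$source_restriction</RESTRICTION>
    <PROPERTY>rdfs:label</PROPERTY>
  </SOURCE>
  <TARGET>
    <ID>wdc</ID>
    <ENDPOINT>$target_file</ENDPOINT>
    <VAR>?y</VAR>
    <PAGESIZE>-1</PAGESIZE>
    <RESTRICTION>$target_restriction</RESTRICTION>
    <PROPERTY>rdfs:label</PROPERTY>
  </TARGET>
  <METRIC>levenshtein(x.rdfs:label, y.rdfs:label)</METRIC>
  <ACCEPTANCE><THRESHOLD>0.95</THRESHOLD><FILE>accepted.nt</FILE><RELATION>owl:equivalentClass</RELATION></ACCEPTANCE>
  <REVIEW><THRESHOLD>0.80</THRESHOLD><FILE>review.nt</FILE><RELATION>owl:equivalentClass</RELATION></REVIEW>
  <OUTPUT>NT</OUTPUT>
</LIMES>
""")

def write_config(target_file: str, source_restriction: str,
                 target_restriction: str, out_dir: str = "limes_configs") -> Path:
    """Fill the template for one KG file and write the config to disk."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    config_path = out / f"{Path(target_file).stem}_config.xml"
    config_path.write_text(CONFIG_TEMPLATE.substitute(
        target_file=target_file,
        source_restriction=source_restriction,
        target_restriction=target_restriction))
    return config_path
```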

@sshivam95 (Collaborator, Author)

For the step Gather Wikidata Classes:

  • Extract the languages from the class labels in the WDC dataset.
  • Use this language list to extract labels from Wikidata via the wikibase:label service.
    Query to use:
SELECT ?class ?classLabel
WHERE
{
  SERVICE wikibase:label { bd:serviceParam wikibase:language {list of languages}. }
  {SELECT DISTINCT ?class WHERE {
    ?s wdt:P31 ?class .
  } OFFSET 1000 LIMIT 100}
}
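
One way to run this query over the extracted language list and page through the classes could look like the sketch below; SPARQLWrapper is just one client option, and the helper name, page size, and user-agent string are made up here.

```python
# Sketch: page through Wikidata classes and fetch their labels for a given
# language list, using SPARQLWrapper against the public Wikidata endpoint.
from SPARQLWrapper import SPARQLWrapper, JSON

WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"

QUERY_TEMPLATE = """
SELECT ?class ?classLabel
WHERE {{
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{languages}". }}
  {{ SELECT DISTINCT ?class WHERE {{
       ?s wdt:P31 ?class .
     }} OFFSET {offset} LIMIT {limit} }}
}}
"""

def fetch_wikidata_class_labels(languages, limit=1000, max_pages=100):
    """Yield (class IRI, label) pairs, paging through DISTINCT ?class values.

    Note: for fully stable paging the subquery would need an ORDER BY ?class.
    """
    sparql = SPARQLWrapper(WIKIDATA_ENDPOINT, agent="wdc-linking-experiments/0.1")
    sparql.setReturnFormat(JSON)
    for page in range(max_pages):
        sparql.setQuery(QUERY_TEMPLATE.format(
            languages=",".join(languages), offset=page * limit, limit=limit))
        bindings = sparql.query().convert()["results"]["bindings"]
        if not bindings:
            break
        for b in bindings:
            yield b["class"]["value"], b.get("classLabel", {}).get("value")

# Example: labels in English and German
# for iri, label in fetch_wikidata_class_labels(["en", "de"]):
#     print(iri, label)
```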

@sshivam95 (Collaborator, Author) commented Jun 27, 2024

  • Languages for the WDC labels are extracted.

  • Job for getting Wikidata classes for those languages

@sshivam95 (Collaborator, Author) commented Jul 30, 2024

Update:

  • Clean the WDC class labels.
  • Get the languages the labels are in (see the sketch below).
  • Use these languages to get the Wikidata classes.
  • Link the Wikidata classes with the WDC classes for reference.
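
For the second point, a small sketch of how the language tags could be collected from the label literals; the file name, N-Quads format, and helper name are assumptions, and rdflib is just one possible parser.

```python
# Sketch: collect the set of language tags used on rdfs:label literals in a
# WDC class-label graph, parsed with rdflib. File name/format are assumed.
from rdflib import ConjunctiveGraph, Literal
from rdflib.namespace import RDFS

def extract_label_languages(path: str, rdf_format: str = "nquads") -> set[str]:
    g = ConjunctiveGraph()
    g.parse(path, format=rdf_format)
    languages = set()
    for label in g.objects(predicate=RDFS.label):
        if isinstance(label, Literal):
            # Literals without an explicit tag fall back to "und" (undetermined)
            languages.add(label.language or "und")
    return languages

# Example:
# print(extract_label_languages("wdc_classes_sample.nq"))
```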

@sshivam95 (Collaborator, Author) commented Jul 30, 2024

For linking, to avoid the complexity caused by the large number of files, 99% of the files in each format were combined (the combination is used only for linking).
This reduces the number of datasets LIMES has to link, e.g. from 265 datasets down to only 15.

  • Create a checker file containing `wiki_class owl:equivalentClass KG_class .` triples (see the sketch below).
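
A minimal sketch of writing that checker file, assuming the linked class pairs are already available as (Wikidata class IRI, KG class IRI) tuples; the function and file names are made up.

```python
# Sketch: write the checker file as N-Triples, one
# <wiki_class> owl:equivalentClass <KG_class> statement per linked pair.
OWL_EQUIVALENT_CLASS = "http://www.w3.org/2002/07/owl#equivalentClass"

def write_checker_file(pairs, path="class_links_checker.nt"):
    """pairs: iterable of (wikidata_class_iri, kg_class_iri) tuples."""
    with open(path, "w", encoding="utf-8") as f:
        for wiki_class, kg_class in pairs:
            f.write(f"<{wiki_class}> <{OWL_EQUIVALENT_CLASS}> <{kg_class}> .\n")

# Example:
# write_checker_file([("http://www.wikidata.org/entity/Q515",
#                      "http://schema.org/City")])
```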

@sshivam95 (Collaborator, Author) commented Jul 30, 2024

Combining 99% of the named KGs in each dataset to avoid creating a huge number of LIMES configs, same as in #9 (comment). File counts per dataset (a sketch of the merge step follows the list):

  • species_dataset: 265 files
  • hresume_dataset: 91 files
  • hrecipe_dataset: 2563 files
  • hlisting_dataset: 7100 files
  • hcalendar_dataset: 19446 files
  • hreview_dataset: 16157 files
  • geo_dataset: 27272 files
  • adr_dataset: 128132 files
  • xfn_dataset: 371903 files
  • rdfa_dataset: 605989 files
  • hcard_dataset: 3977271 files
  • microdata_dataset: 7982306 files
  • jsonld_dataset: 7506522 files
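
A sketch of the combination step, under the assumption that "combining 99%" means concatenating the first 99% of each format's files by count into a single file; the directory layout, file extension, and helper name are illustrative only.

```python
# Sketch: merge 99% of the files of one WDC format into a single file, so
# LIMES only needs one config for the combined graph. Selecting "the first
# 99% by file count" and the *.nq extension are assumptions for illustration.
from pathlib import Path

def combine_dataset_files(dataset_dir: str, out_file: str, fraction: float = 0.99):
    files = sorted(Path(dataset_dir).glob("*.nq"))
    keep = files[: int(len(files) * fraction)]
    with open(out_file, "w", encoding="utf-8") as out:
        for path in keep:
            # Stream line by line to avoid loading large graphs into memory
            with open(path, encoding="utf-8") as f:
                for line in f:
                    out.write(line)
    return keep

# Example:
# combine_dataset_files("species_dataset", "species_combined.nq")
```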
