Support for Google Dataset Search

Ondřej Košarko edited this page Aug 5, 2020 · 1 revision

Introduced in #916 and #924, based on https://developers.google.com/search/docs/data-types/dataset (28.02.2020). The code injects a script element with type="application/ld+json" into item-view pages, based on a predefined mapping and only under certain conditions.

Once deployed, test the output with https://search.google.com/test/rich-results.

Disable

This feature can be disabled in dspace.cfg by setting google-dataset.enable to false.

Mapping

The default mapping, which can be overridden or extended in dspace/config/crosswalks/google-metadata.properties:

name = dc.title
description = dc.description
keywords = dc.subject
license = dc.rights.uri
url = dc.identifier.uri
citation = dc.relation.isreferencedby
identifier = dc.identifier.uri
creator = dc.contributor.author

name and description get special treatment because they are mandatory; the description must be between 50 and 5000 characters long. creator (if present) gets special treatment too, because it must be converted to an object. To extend the mapping with a property that does not need to become an object, just add another mapping line. DataDownload is left out on purpose, so that everyone has to go through the landing page. Moreover, many of our datasets are split into multiple files, and the documentation seems unclear on how to handle that.
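The mapping and the special cases above can be sketched in Python as follows. This is only an illustration, not the actual DSpace code: the function names, the shape of the metadata dict, the choice to skip an item whose description is shorter than 50 characters, the truncation at 5000 characters, and the use of a schema.org Person object for creator are all assumptions made for this example.

```python
import json

# Default mapping from this page: schema.org property -> DSpace metadata field.
DEFAULT_MAPPING = {
    "name": "dc.title",
    "description": "dc.description",
    "keywords": "dc.subject",
    "license": "dc.rights.uri",
    "url": "dc.identifier.uri",
    "citation": "dc.relation.isreferencedby",
    "identifier": "dc.identifier.uri",
    "creator": "dc.contributor.author",
}

MIN_DESCRIPTION = 50    # lower bound stated on this page
MAX_DESCRIPTION = 5000  # upper bound stated on this page


def build_dataset_jsonld(metadata, mapping=DEFAULT_MAPPING):
    """Return a schema.org Dataset dict, or None if a mandatory field is unusable.

    `metadata` maps a field name to a list of string values (hypothetical shape).
    """
    names = metadata.get(mapping["name"], [])
    descriptions = metadata.get(mapping["description"], [])
    if not names or not descriptions:
        return None  # assumption: nothing is emitted without name and description

    description = descriptions[0]
    if len(description) < MIN_DESCRIPTION:
        return None  # assumption: too-short descriptions are skipped
    description = description[:MAX_DESCRIPTION]  # assumption: long ones are cut

    doc = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": names[0],
        "description": description,
    }
    for prop, field in mapping.items():
        if prop in ("name", "description"):
            continue  # already handled above
        values = metadata.get(field, [])
        if not values:
            continue
        if prop == "creator":
            # assumption: each author becomes a schema.org Person object
            doc[prop] = [{"@type": "Person", "name": v} for v in values]
        else:
            doc[prop] = values[0] if len(values) == 1 else list(values)
    return doc


def to_script_tag(doc):
    """Serialize the dict into the script element injected into the page."""
    payload = json.dumps(doc, ensure_ascii=False).replace("</", "<\\/")
    return '<script type="application/ld+json">' + payload + "</script>"
```

Under these assumptions, a property that needs no object conversion is handled by the generic branch of the loop, which mirrors the "just add another mapping line" rule.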

Conditions

  1. The code looks for google-dataset.blacklistedTypes (a comma-separated list of type values) in dspace.cfg. If an item has a blacklisted dc.type, the Google dataset metadata is not injected. E.g. we blacklist the toolService type.
  2. By default, the metadata is injected only when the item has bitstreams; this can be overridden in dspace.cfg by setting google-dataset.onlyItemsWithBitstreams to false.
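
The two conditions, together with the google-dataset.enable switch, can be sketched as a single check. This is a rough Python illustration rather than the real implementation: the function names are hypothetical, a plain dict stands in for dspace.cfg, and exact, case-sensitive matching of dc.type values is an assumption.

```python
def parse_blacklist(raw):
    """Split a comma-separated config value into a set of trimmed type names."""
    return {part.strip() for part in raw.split(",") if part.strip()}


def should_inject(item_types, bitstream_count, config):
    """Decide whether the dataset metadata should be emitted for an item.

    `item_types` holds the item's dc.type values; `config` is a plain dict
    standing in for dspace.cfg (hypothetical stand-in).
    """
    # Feature switch: on unless explicitly set to false.
    if config.get("google-dataset.enable", "true").strip().lower() == "false":
        return False

    # Condition 1: skip items with a blacklisted dc.type.
    # Assumption: values are compared exactly and case-sensitively.
    blacklisted = parse_blacklist(config.get("google-dataset.blacklistedTypes", ""))
    if any(t in blacklisted for t in item_types):
        return False

    # Condition 2: by default, require at least one bitstream.
    flag = config.get("google-dataset.onlyItemsWithBitstreams", "true")
    only_with_bitstreams = flag.strip().lower() != "false"
    if only_with_bitstreams and bitstream_count == 0:
        return False

    return True
```

For example, with google-dataset.blacklistedTypes set to toolService, an item typed toolService is skipped regardless of how many bitstreams it has.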