HERD name change (#933)
* HERD name change

* schema

* schema

* Update plot_external_resources.py

* Update docs/gallery/plot_external_resources.py

Co-authored-by: Oliver Ruebel <[email protected]>

* missed changes of externalresource to herd

---------

Co-authored-by: Oliver Ruebel <[email protected]>
mavaylon1 and oruebel authored Aug 9, 2023
1 parent 3f3586a commit dd39b38
Showing 13 changed files with 304 additions and 301 deletions.
100 changes: 50 additions & 50 deletions docs/gallery/plot_external_resources.py
@@ -1,34 +1,34 @@
"""
ExternalResources
=================
HERD: HDMF External Resources Data Structure
==============================================
This is a user guide to interacting with the
:py:class:`~hdmf.common.resources.ExternalResources` class. The ExternalResources type
:py:class:`~hdmf.common.resources.HERD` class. The HERD type
is experimental and is subject to change in future releases. If you use this type,
please provide feedback to the HDMF team so that we can improve the structure and
access of data stored with this type for your use cases.
Introduction
-------------
The :py:class:`~hdmf.common.resources.ExternalResources` class provides a way
The :py:class:`~hdmf.common.resources.HERD` class provides a way
to organize and map user terms from their data (keys) to multiple entities
from the external resources. A typical use case for external resources is to link data
stored in datasets or attributes to ontologies. For example, you may have a
dataset ``country`` storing locations. Using
:py:class:`~hdmf.common.resources.ExternalResources` allows us to link the
:py:class:`~hdmf.common.resources.HERD` allows us to link the
country names stored in the dataset to an ontology of all countries, enabling
more rigid standardization of the data and facilitating data query and
introspection.
From a user's perspective, one can think of the
:py:class:`~hdmf.common.resources.ExternalResources` as a simple table, in which each
:py:class:`~hdmf.common.resources.HERD` as a simple table, in which each
row associates a particular ``key`` stored in a particular ``object`` (i.e., Attribute
or Dataset in a file) with a particular ``entity`` (i.e, a term of an online
resource). That is, ``(object, key)`` refer to parts inside a
file and ``entity`` refers to an external resource outside the file, and
:py:class:`~hdmf.common.resources.ExternalResources` allows us to link the two. To
:py:class:`~hdmf.common.resources.HERD` allows us to link the two. To
reduce data redundancy and improve data integrity,
:py:class:`~hdmf.common.resources.ExternalResources` stores this data internally in a
:py:class:`~hdmf.common.resources.HERD` stores this data internally in a
collection of interlinked tables.
* :py:class:`~hdmf.common.resources.KeyTable` where each row describes a
@@ -45,21 +45,21 @@
:py:class:`~hdmf.common.resources.ObjectKey` pair identifying which keys
are used by which objects.
The :py:class:`~hdmf.common.resources.ExternalResources` class then provides
The :py:class:`~hdmf.common.resources.HERD` class then provides
convenience functions to simplify interaction with these tables, allowing users
to treat :py:class:`~hdmf.common.resources.ExternalResources` as a single large table as
to treat :py:class:`~hdmf.common.resources.HERD` as a single large table as
much as possible.
Rules to ExternalResources
Rules to HERD
---------------------------
When using the :py:class:`~hdmf.common.resources.ExternalResources` class, there
When using the :py:class:`~hdmf.common.resources.HERD` class, there
are rules to how users store information in the interlinked tables.
1. Multiple :py:class:`~hdmf.common.resources.Key` objects can have the same name.
They are disambiguated by the :py:class:`~hdmf.common.resources.Object` associated
with each, meaning we may have keys with the same name in different objects, but for a particular object
all keys must be unique.
2. In order to query specific records, the :py:class:`~hdmf.common.resources.ExternalResources` class
2. In order to query specific records, the :py:class:`~hdmf.common.resources.HERD` class
uses '(file, object_id, relative_path, field, key)' as the unique identifier.
3. :py:class:`~hdmf.common.resources.Object` can have multiple :py:class:`~hdmf.common.resources.Key`
objects.
@@ -74,7 +74,7 @@
Use the format provided by the resource. For example, Identifiers.org uses the ID ``ncbigene:22353``
but the NCBI Gene uses the ID ``22353`` for the same term.
8. In a majority of cases, :py:class:`~hdmf.common.resources.Object` objects will have an empty string
for 'field'. The :py:class:`~hdmf.common.resources.ExternalResources` class supports compound data_types.
for 'field'. The :py:class:`~hdmf.common.resources.HERD` class supports compound data_types.
In that case, 'field' would be the field of the compound data_type that has an external reference.
9. In some cases, the attribute that needs an external reference is not a object with a 'data_type'.
The user must then use the nearest object that has a data type to be used as the parent object. When
@@ -85,41 +85,41 @@
has :py:class:`~hdmf.common.resources.File` along the parent hierarchy.
"""
######################################################
# Creating an instance of the ExternalResources class
# Creating an instance of the HERD class
# ----------------------------------------------------

# sphinx_gallery_thumbnail_path = 'figures/gallery_thumbnail_externalresources.png'
from hdmf.common import ExternalResources
from hdmf.common import HERD
from hdmf.common import DynamicTable, VectorData
from hdmf import Container, ExternalResourcesManager
from hdmf import Container, HERDManager
from hdmf import Data
import numpy as np
import os
# Ignore experimental feature warnings in the tutorial to improve rendering
import warnings
warnings.filterwarnings("ignore", category=UserWarning, message="ExternalResources is experimental*")
warnings.filterwarnings("ignore", category=UserWarning, message="HERD is experimental*")


# Class to represent a file
class ExternalResourcesManagerContainer(Container, ExternalResourcesManager):
class HERDManagerContainer(Container, HERDManager):
def __init__(self, **kwargs):
kwargs['name'] = 'ExternalResourcesManagerContainer'
kwargs['name'] = 'HERDManagerContainer'
super().__init__(**kwargs)


er = ExternalResources()
file = ExternalResourcesManagerContainer(name='file')
er = HERD()
file = HERDManagerContainer(name='file')


###############################################################################
# Using the add_ref method
# ------------------------------------------------------
# :py:func:`~hdmf.common.resources.ExternalResources.add_ref`
# :py:func:`~hdmf.common.resources.HERD.add_ref`
# is a wrapper function provided by the
# :py:class:`~hdmf.common.resources.ExternalResources` class that simplifies adding
# data. Using :py:func:`~hdmf.common.resources.ExternalResources.add_ref` allows us to
# :py:class:`~hdmf.common.resources.HERD` class that simplifies adding
# data. Using :py:func:`~hdmf.common.resources.HERD.add_ref` allows us to
# treat new entries similar to adding a new row to a flat table, with
# :py:func:`~hdmf.common.resources.ExternalResources.add_ref` taking care of populating
# :py:func:`~hdmf.common.resources.HERD.add_ref` taking care of populating
# the underlying data structures accordingly.

data = Data(name="species", data=['Homo sapiens', 'Mus musculus'])
@@ -165,7 +165,7 @@ def __init__(self, **kwargs):
entity_uri='http://www.informatics.jax.org/marker/MGI:1343464'
)

# Note: :py:func:`~hdmf.common.resources.ExternalResources.add_ref` internally resolves the object
# Note: :py:func:`~hdmf.common.resources.HERD.add_ref` internally resolves the object
# to the closest parent, so that ``er.add_ref(container=genotypes, attribute='genotype_name')`` and
# ``er.add_ref(container=genotypes.genotype_name, attribute=None)`` will ultimately both use the ``object_id``
# of the ``genotypes.genotype_name`` :py:class:`~hdmf.common.table.VectorData` column and
@@ -197,12 +197,12 @@ def __init__(self, **kwargs):
)

###############################################################################
# Visualize ExternalResources
# Visualize HERD
# ------------------------------------------------------
# Users can visualize `~hdmf.common.resources.ExternalResources` as a flattened table or
# Users can visualize `~hdmf.common.resources.HERD` as a flattened table or
# as separate tables.

# `~hdmf.common.resources.ExternalResources` as a flattened table
# `~hdmf.common.resources.HERD` as a flattened table
er.to_dataframe()

# The individual interlinked tables:
@@ -216,13 +216,13 @@ def __init__(self, **kwargs):
###############################################################################
# Using the get_key method
# ------------------------------------------------------
# The :py:func:`~hdmf.common.resources.ExternalResources.get_key`
# The :py:func:`~hdmf.common.resources.HERD.get_key`
# method will return a :py:class:`~hdmf.common.resources.Key` object. In the current version of
# :py:class:`~hdmf.common.resources.ExternalResources`, duplicate keys are allowed; however, each key needs a unique
# :py:class:`~hdmf.common.resources.HERD`, duplicate keys are allowed; however, each key needs a unique
# linking Object. In other words, each combination of (file, container, relative_path, field, key)
# can exist only once in :py:class:`~hdmf.common.resources.ExternalResources`.
# can exist only once in :py:class:`~hdmf.common.resources.HERD`.

# The :py:func:`~hdmf.common.resources.ExternalResources.get_key` method will be able to return the
# The :py:func:`~hdmf.common.resources.HERD.get_key` method will be able to return the
# :py:class:`~hdmf.common.resources.Key` object if the :py:class:`~hdmf.common.resources.Key` object is unique.
genotype_key_object = er.get_key(key_name='Rorb')

@@ -232,18 +232,18 @@ def __init__(self, **kwargs):
container=species['Species_Data'],
key_name='Ursus arctos horribilis')

# The :py:func:`~hdmf.common.resources.ExternalResources.get_key` also will check the
# The :py:func:`~hdmf.common.resources.HERD.get_key` also will check the
# :py:class:`~hdmf.common.resources.Object` for a :py:class:`~hdmf.common.resources.File` along the parent hierarchy
# if the file is not provided as in :py:func:`~hdmf.common.resources.ExternalResources.add_ref`
# if the file is not provided as in :py:func:`~hdmf.common.resources.HERD.add_ref`

###############################################################################
# Using the add_ref method with a key_object
# ------------------------------------------------------
# Multiple :py:class:`~hdmf.common.resources.Object` objects can use the same
# :py:class:`~hdmf.common.resources.Key`. To use an existing key when adding
# new entries into :py:class:`~hdmf.common.resources.ExternalResources`, pass the
# new entries into :py:class:`~hdmf.common.resources.HERD`, pass the
# :py:class:`~hdmf.common.resources.Key` object instead of the 'key_name' to the
# :py:func:`~hdmf.common.resources.ExternalResources.add_ref` method. If a 'key_name'
# :py:func:`~hdmf.common.resources.HERD.add_ref` method. If a 'key_name'
# is used, a new :py:class:`~hdmf.common.resources.Key` will be created.

er.add_ref(
@@ -258,7 +258,7 @@ def __init__(self, **kwargs):
###############################################################################
# Using the get_object_entities
# ------------------------------------------------------
# The :py:class:`~hdmf.common.resources.ExternalResources.get_object_entities` method
# The :py:class:`~hdmf.common.resources.HERD.get_object_entities` method
# allows the user to retrieve all entities and key information associated with an `Object` in
# the form of a pandas DataFrame.

@@ -269,7 +269,7 @@ def __init__(self, **kwargs):
###############################################################################
# Using the get_object_type
# ------------------------------------------------------
# The :py:class:`~hdmf.common.resources.ExternalResources.get_object_entities` method
# The :py:class:`~hdmf.common.resources.HERD.get_object_entities` method
# allows the user to retrieve all entities and key information associated with an `Object` in
# the form of a pandas DataFrame.

@@ -285,9 +285,9 @@ def __init__(self, **kwargs):
# column/field is associated with different ontologies, then use field='x' to denote that
# 'x' is using the external reference.

# Let's create a new instance of :py:class:`~hdmf.common.resources.ExternalResources`.
er = ExternalResources()
file = ExternalResourcesManagerContainer(name='file')
# Let's create a new instance of :py:class:`~hdmf.common.resources.HERD`.
er = HERD()
file = HERDManagerContainer(name='file')

data = Data(
name='data_name',
@@ -307,28 +307,28 @@ def __init__(self, **kwargs):
)

###############################################################################
# Write ExternalResources
# Write HERD
# ------------------------------------------------------
# :py:class:`~hdmf.common.resources.ExternalResources` is written as a zip file of
# :py:class:`~hdmf.common.resources.HERD` is written as a zip file of
# the individual tables written to tsv.
# The user provides the path, which contains the name of the directory.

er.to_norm_tsv(path='./')

###############################################################################
# Read ExternalResources
# Read HERD
# ------------------------------------------------------
# Users can read :py:class:`~hdmf.common.resources.ExternalResources` from the tsv format
# Users can read :py:class:`~hdmf.common.resources.HERD` from the tsv format
# by providing the path to the directory.

er_read = ExternalResources.from_norm_tsv(path='./')
er_read = HERD.from_norm_tsv(path='./')
os.remove('./er.zip')

###############################################################################
# Using TermSet with ExternalResources
# Using TermSet with HERD
# ------------------------------------------------
# :py:class:`~hdmf.term_set.TermSet` allows for an easier way to add references to
# :py:class:`~hdmf.common.resources.ExternalResources`. These enumerations take place of the
# :py:class:`~hdmf.common.resources.HERD`. These enumerations take place of the
# entity_id and entity_uri parameters. :py:class:`~hdmf.common.resources.Key` values will have
# to match the name of the term in the :py:class:`~hdmf.term_set.TermSet`.
from hdmf.term_set import TermSet
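
The tutorial docstring above describes HERD as a flat table of `(object, key) -> entity` rows that is stored internally as a collection of interlinked tables. A minimal sketch of that idea in plain Python follows. It is not the hdmf implementation: the `MiniHERD` class, its attribute names, and the `ONT:0001` identifier and `example.org` URI are hypothetical placeholders chosen for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class MiniHERD:
    """Toy stand-in for the interlinked tables; NOT the hdmf implementation."""
    objects: list = field(default_factory=list)      # (file_id, object_id, relative_path, field)
    keys: list = field(default_factory=list)         # key names, one row per (object, key) pair
    entities: list = field(default_factory=list)     # (entity_id, entity_uri)
    object_keys: list = field(default_factory=list)  # (object_idx, key_idx) links
    entity_keys: list = field(default_factory=list)  # (entity_idx, key_idx) links

    def add_ref(self, file_id, object_id, key, entity_id, entity_uri,
                relative_path="", field_name=""):
        """Add one flat row and update the normalized tables behind it."""
        obj = (file_id, object_id, relative_path, field_name)
        if obj not in self.objects:
            self.objects.append(obj)
        obj_idx = self.objects.index(obj)

        # Reuse the key row if this object already has a key with this name;
        # the same name in a different object gets its own key row.
        key_idx = None
        for o_idx, k_idx in self.object_keys:
            if o_idx == obj_idx and self.keys[k_idx] == key:
                key_idx = k_idx
                break
        if key_idx is None:
            self.keys.append(key)
            key_idx = len(self.keys) - 1
            self.object_keys.append((obj_idx, key_idx))

        # Entities are stored once and linked to every key that uses them.
        ent = (entity_id, entity_uri)
        if ent not in self.entities:
            self.entities.append(ent)
        ent_idx = self.entities.index(ent)
        if (ent_idx, key_idx) not in self.entity_keys:
            self.entity_keys.append((ent_idx, key_idx))

    def to_rows(self):
        """Join the tables back into flat (object..., key, entity...) rows."""
        rows = []
        for obj_idx, key_idx in self.object_keys:
            for ent_idx, k_idx in self.entity_keys:
                if k_idx == key_idx:
                    rows.append(self.objects[obj_idx]
                                + (self.keys[key_idx],)
                                + self.entities[ent_idx])
        return rows


herd = MiniHERD()
# Placeholder identifiers; a real entry would use the resource's own ID format.
herd.add_ref("file-1", "obj-1", "Homo sapiens", "ONT:0001", "https://example.org/ONT_0001")
herd.add_ref("file-1", "obj-2", "Homo sapiens", "ONT:0001", "https://example.org/ONT_0001")
for row in herd.to_rows():
    print(row)
```

In this sketch the same key name used by two objects yields two key rows but a single entity row, which mirrors rule 1 of the docstring (keys are disambiguated by their object) and the stated goal of reducing redundancy.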
2 changes: 1 addition & 1 deletion src/hdmf/__init__.py
@@ -1,6 +1,6 @@
from . import query
from .backends.hdf5.h5_utils import H5Dataset, H5RegionSlicer
from .container import Container, Data, DataRegion, ExternalResourcesManager
from .container import Container, Data, DataRegion, HERDManager
from .region import ListSlicer
from .utils import docval, getargs
from .term_set import TermSet
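
Because the public import name changes here, downstream code that still imports `ExternalResourcesManager` or `ExternalResources` would need updating. The sketch below is a hypothetical helper, not part of this commit or of hdmf, that rewrites the three renames visible in this diff in a string of source code. The names `RENAMES` and `migrate_source` are illustrative, and the mapping covers only identifiers shown on this page; the commit touches 13 files and may rename others.

```python
import re

# Old name -> new name, taken from the renames visible in this commit.
RENAMES = {
    "ExternalResourcesManager": "HERDManager",
    "ExternalResources": "HERD",
    "external_resources_path": "herd_path",
}

# Longest names first, with word boundaries on both sides, so that longer
# user-defined identifiers (e.g. ExternalResourcesManagerContainer) are left alone.
_PATTERN = re.compile(
    r"\b(" + "|".join(sorted(map(re.escape, RENAMES), key=len, reverse=True)) + r")\b"
)


def migrate_source(text: str) -> str:
    """Return `text` with each old identifier replaced by its new name."""
    return _PATTERN.sub(lambda m: RENAMES[m.group(1)], text)


print(migrate_source("from hdmf import Container, ExternalResourcesManager"))
print(migrate_source("er = ExternalResources()"))
```

Note that the replacement also applies inside string literals, such as a `warnings.filterwarnings` message, so the output should be reviewed before it is committed.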
10 changes: 5 additions & 5 deletions src/hdmf/backends/hdf5/h5tools.py
@@ -61,15 +61,15 @@ def can_read(path):
'doc': 'the MPI communicator to use for parallel I/O', 'default': None},
{'name': 'file', 'type': [File, "S3File"], 'doc': 'a pre-existing h5py.File object', 'default': None},
{'name': 'driver', 'type': str, 'doc': 'driver for h5py to use when opening HDF5 file', 'default': None},
{'name': 'external_resources_path', 'type': str,
'doc': 'The path to the ExternalResources', 'default': None},)
{'name': 'herd_path', 'type': str,
'doc': 'The path to the HERD', 'default': None},)
def __init__(self, **kwargs):
"""Open an HDF5 file for IO.
"""
self.logger = logging.getLogger('%s.%s' % (self.__class__.__module__, self.__class__.__qualname__))
path, manager, mode, comm, file_obj, driver, external_resources_path = popargs('path', 'manager', 'mode',
path, manager, mode, comm, file_obj, driver, herd_path = popargs('path', 'manager', 'mode',
'comm', 'file', 'driver',
'external_resources_path',
'herd_path',
kwargs)

self.__open_links = [] # keep track of other files opened from links in this file
@@ -93,7 +93,7 @@ def __init__(self, **kwargs):
self.__comm = comm
self.__mode = mode
self.__file = file_obj
super().__init__(manager, source=path, external_resources_path=external_resources_path)
super().__init__(manager, source=path, herd_path=herd_path)
# NOTE: source is not set if path is None and file_obj is passed
self.__built = dict() # keep track of each builder for each dataset/group/link for each file
self.__read = dict() # keep track of which files have been read. Key is the filename value is the builder
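
The hunks above rename the `external_resources_path` keyword to `herd_path`, and nothing shown in this diff adds a fallback for the old keyword. For callers that build keyword dictionaries programmatically, one option is a small caller-side translation step, sketched below. The function `translate_legacy_kwargs` is hypothetical and lives in user code, not in hdmf.

```python
import warnings


def translate_legacy_kwargs(kwargs: dict) -> dict:
    """Return a copy of `kwargs` with the pre-rename keyword mapped to the new one."""
    out = dict(kwargs)
    if "external_resources_path" in out:
        if "herd_path" in out:
            # Refuse to guess which of the two values the caller meant.
            raise TypeError(
                "pass either 'herd_path' or 'external_resources_path', not both"
            )
        warnings.warn(
            "'external_resources_path' was renamed to 'herd_path'",
            DeprecationWarning,
            stacklevel=2,
        )
        out["herd_path"] = out.pop("external_resources_path")
    return out


# The returned dict can then be unpacked into the constructor call.
new_kwargs = translate_legacy_kwargs({"path": "data.h5", "external_resources_path": "./"})
print(new_kwargs)
```

Returning a copy rather than mutating the input keeps the caller's original dictionary intact, which makes the helper safe to apply to shared configuration objects.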