2.0.0 Customizing Export Formats
With Vireo 1.8, a new export system was introduced that expands the number of export formats. The developers hope that the new export/deposit format system is easy for each institution to tweak or customize to their local needs and easy enough to create completely new formats.
There are 9 pre-defined export formats that are packaged with the default install of Vireo:
-
Vireo Export
This is an internally defined export that is designed to be the highest fidelity of all the export formats. It exports every piece of information about the submission using Vireo's internal data model.
-
DSpace METS
This is the DSpace METS SIP format, which is the best format to use when depositing a submission into a DSpace-based repository via the SWORD protocol.
-
DSpace Simple Archive
This is the DSpace Simple Archive format, which is the best format to use when importing items into a DSpace-based repository manually using the import command.
-
Generic Qualified Dublin Core
This is a very generic Dublin Core format. This format is unlikely to be useful in practice but does demonstrate that Vireo is repository agnostic.
-
File Export
This is a very simple export format that contains no metadata. It will include a directory for each submission which contains the primary document along with any supplemental documents that are associated with the submission.
-
MARC21
This is a MARC 21 export format intended to export records to a library OPAC. The fields exported by this format follow a different profile than the XML version: the profile desired by Texas A&M University rather than the pseudo ETD-MS standard used by the XML version. You may need to modify this format for your local institution.
- grantorLocation: This is used in field 260a to identify where the ETD was published. The default value for this field is "College Station, Tex".
- leader: You may customize the MARC leader field. The default leader is "RLXXXnam a22BADXXKa 4500", where RLXXX and BADXX are replaced with the record length and base address.
-
MARC21 XML
This is a MARC 21 export following a pseudo ETD-MS standard with an XML encoding.
-
MODS
This is a MODS export format following the profile created by the Texas Digital Library metadata working group.
-
ProQuest UMI
This is the ProQuest UMI export format. There are many options that you can use to customize the generated metadata, ranging from embargo settings and publication identifiers to whether to release contact information for the student.
There is a known deficiency with this export format. ProQuest would like a field they call "DISS_category," which contains a subject hierarchy using a controlled vocabulary from UMI. Vireo does not currently collect this field. It is also unclear what licensing restrictions ProQuest places on using this controlled vocabulary and whether we can include the list within Vireo.
All formats are located in the [vireo]/conf/formats directory. Each format is a Play! Framework-based Groovy template which is dynamically compiled for each export. This is the same template engine that is used within Vireo for generating the HTML views. There are several reasons why this approach was chosen: 1) it allows for code to be executed, meaning that any transformation is possible; 2) for simple XML-based formats, it is easy for non-programmers to follow; 3) because it is dynamic, it is simple to debug.
Each format must be configured in the Spring-based configuration file: [vireo]/conf/application-context.xml. This configuration is a bit complicated, but when adding a new format we can ignore most of it. For each format available in Vireo there is a corresponding <bean> definition.
All export formats are defined in [vireo]/conf/application-context.xml. Near the bottom of the file there is an XML comment, <!-- Package Formats -->. Below this line there is a <bean> definition for each export format available. Each bean is given a unique id (alphanumeric with no spaces). The id is only used internally to identify the format. Along with the id parameter, each bean has an implementation class and a scope. The scope should always be "prototype," meaning that Spring creates a new instance of the format each time it is requested instead of sharing a single instance. There are multiple implementation classes to choose from:
-
org.tdl.vireo.export.impl.TemplatePackagerImpl
This is the basic packager implementation that uses the template engine to generate an export format. The one limitation this implementation has is that it can only produce one "manifest" or metadata output file. Typically this is fine because most formats are an XML file along with the PDF and supplemental files.
Here are the available options for this packager:
- displayName: This is the name of the export format shown in the dropdown list.
- format: This is an identifier of the format; typically it is the URL to the XML schema defining the format's syntax.
- manifestTemplatePath: This identifies which Groovy-based template to use when generating the format.
- manifestName: This identifies the name of the metadata manifest file that is generated by the template above.
- manifestTemplateArguments: This is a list of variables that will be available when the Groovy-based template is executed.
- attachmentTypeNames: A list of the attachment types that should be included in this export.
-
org.tdl.vireo.export.impl.MultipleTemplatePackagerImpl
This is a more advanced packager implementation that uses the template engine to generate multiple "manifests" or metadata output files. At the time of writing, only one format uses this implementation -- the DSpace Simple Archive format -- because it has a separate metadata file for each schema.
Here are the available options for this packager:
- displayName: This is the name of the export format shown in the dropdown list.
- format: This is an identifier of the format; typically it is the URL to the XML schema defining the format's syntax.
- templatePaths: This identifies the templates that will be executed to generate the manifest files. The key is the name of the output file, and the value is the template that generates that file.
- manifestTemplateArguments: This is a list of variables that will be available when a Groovy-based template is executed.
- attachmentTypeNames: A list of the attachment types that should be included in this export.
-
org.tdl.vireo.export.impl.FilePackagerImpl
This is a specialized packager that has only one purpose: to generate an export with no metadata. It is used by the File Export. It is very basic. Here are the available options for this packager:
- displayName: This is the name of the export format shown in the dropdown list.
- attachmentTypeNames: A list of the attachment types that should be included in this export. It would be possible to create a new export format that only included the primary document without the supplemental documents.
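As a rough illustration of the three packagers described above, the following fragment sketches what bean definitions placed under the <!-- Package Formats --> comment might look like. The class names, scope, and property names come from the descriptions above, with the single-manifest options paired with TemplatePackagerImpl and the templatePaths map paired with MultipleTemplatePackagerImpl to match each packager's description. Everything else is an assumption made for illustration: the bean ids, display names, schema URL, file paths, the argument key, the attachment type names PRIMARY and SUPPLEMENTAL, and the use of <map> and <list> elements for the collection properties. Compare against an existing bean in your own application-context.xml before relying on any of it.

```xml
<!-- Sketch only: ids, names, URLs and paths below are hypothetical. -->

<!-- Single manifest: one template produces one metadata file. -->
<bean id="MyLocalXml"
      class="org.tdl.vireo.export.impl.TemplatePackagerImpl"
      scope="prototype">
    <property name="displayName" value="My Local XML"/>
    <property name="format" value="http://example.org/schemas/my-local.xsd"/>
    <property name="manifestTemplatePath" value="conf/formats/my_local.xml"/>
    <property name="manifestName" value="metadata.xml"/>
    <!-- Assumed to be name/value pairs made available to the template. -->
    <property name="manifestTemplateArguments">
        <map>
            <entry key="institutionName" value="Example University"/>
        </map>
    </property>
    <!-- Type names are assumptions; use the values your install accepts. -->
    <property name="attachmentTypeNames">
        <list>
            <value>PRIMARY</value>
            <value>SUPPLEMENTAL</value>
        </list>
    </property>
</bean>

<!-- Multiple manifests: key = output file name, value = template. -->
<bean id="MyLocalMultiFile"
      class="org.tdl.vireo.export.impl.MultipleTemplatePackagerImpl"
      scope="prototype">
    <property name="displayName" value="My Local Multi-File"/>
    <property name="format" value="http://example.org/schemas/my-local.xsd"/>
    <property name="templatePaths">
        <map>
            <entry key="dublin_core.xml" value="conf/formats/my_local_dc.xml"/>
            <entry key="metadata_local.xml" value="conf/formats/my_local_extra.xml"/>
        </map>
    </property>
    <property name="attachmentTypeNames">
        <list>
            <value>PRIMARY</value>
            <value>SUPPLEMENTAL</value>
        </list>
    </property>
</bean>

<!-- No metadata: only the primary document is exported. -->
<bean id="MyPrimaryOnly"
      class="org.tdl.vireo.export.impl.FilePackagerImpl"
      scope="prototype">
    <property name="displayName" value="Primary Document Only"/>
    <property name="attachmentTypeNames">
        <list>
            <value>PRIMARY</value>
        </list>
    </property>
</bean>
```

The last bean corresponds to the idea mentioned above of a file export that includes the primary document without the supplemental documents.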
The export templates are used to define the format of the metadata exported from Vireo. From these templates you have access to all the internal Java-based APIs to access data from the database and the ability to format that data in any output format. The templates are dynamic, meaning they are recompiled each time they are run, so you can change them, run an export, change them again, and re-run an export without having to restart the server each time to deploy new code. Here are a few very helpful resources:
When export templates are executed, several objects are programmatically inserted into the template's namespace, which makes them easy to reference. For instance, you can obtain the student's first name by using ${ sub.getStudentFirstName() } because the sub (short for submission) object is always available. Here are the variables that are always available:
-
play
This is a reference to the Play framework object. From this object you can obtain configuration parameters from application.conf, for example play.configuration.getProperty("myProperty").
-
sub
This is a reference to the Vireo submission that is being exported. From this object you can obtain almost all the information needed for most export formats.
-
manifestName
This is the name of the manifest file that is generated by this template. Inside each folder for the export a file will be created with this name, and the output of the template will be stored in that file. You cannot change the name of the generated file from within the template; for that you will need to modify the Spring-based configuration (see above).
-
mimeType
This is the mimeType (e.g. text/xml) of the export format. If the export format contains multiple files then the mimeType will be null because the resulting directory of files will be archived together into a single zip file. The resulting mimeType of the package in this case will always be application/zip. You cannot change the mimeType of the package from within the template; for that you will need to modify the Spring-based configuration (see above).
-
attachmentTypes
This is a Java list of AttachmentType objects that will be included in the export format. All attachments associated with the submission that are of a type contained within the list will also be included in the export package. You cannot change the type of attachments included from within the template; for that you will need to modify the Spring-based configuration (see above).
-
personRepo
This is a reference to the PersonRepository object within Vireo. From this repository object you can look up additional person objects. It is very rare that you will need to use this object.
-
subRepo
This is a reference to the SubmissionRepository object within Vireo. From this repository object you can look up additional submissions other than the current one being exported. It is very rare that you will need to use this object.
-
settingRepo
This is a reference to the SettingsRepository object within Vireo. From this repository object you can look up dynamic configuration parameters such as the current grantor, whether submissions are open or closed, email templates, configuration lists of colleges, departments, etc.
-
proquestRepo
This is a reference to the ProQuest Vocabulary Repository object within Vireo. This repository is likely only useful for the ProQuest export, but it is available to all export formats. Some fields, such as subject, are defined by a controlled vocabulary from ProQuest. Using this repository you can identify all the possible subject terms defined in the vocabulary.
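To show how the variables listed above might be used together, here is a minimal sketch of an export template that emits a small XML manifest. The element names are hypothetical and do not correspond to any real schema, and "myProperty" is a placeholder key. The only calls used are the two mentioned above (sub.getStudentFirstName() and play.configuration.getProperty(...)); the #{list} tag and the *{ ... }* comments are Play 1.x template syntax, which this sketch assumes matches the Play version your Vireo install runs on.

```xml
<?xml version="1.0" encoding="UTF-8"?>
*{ Play template comments like this one are removed from the output. }*
*{ Element names below are hypothetical, not part of any real schema. }*
<record>
    *{ sub is the submission currently being exported. }*
    <studentFirstName>${sub.getStudentFirstName()}</studentFirstName>

    *{ play gives access to application.conf; myProperty is a placeholder. }*
    <localSetting>${play.configuration.getProperty("myProperty")}</localSetting>

    *{ manifestName is the file this template's output is written to. }*
    <generatedFile>${manifestName}</generatedFile>

    *{ Loop over the attachment types configured for this format. }*
    <attachmentTypes>
    #{list items:attachmentTypes, as:'type'}
        <type>${type}</type>
    #{/list}
    </attachmentTypes>
</record>
```

Because templates are recompiled on each run, a file like this can be edited and re-exported repeatedly to check the output without restarting the server.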