Customizing Vireo Export Formats
Vireo is a turnkey solution for Electronic Thesis & Dissertation (ETD) management from the initial point of a student’s submission, through the approval workflow, and to publication. This post gives an overview of how to export data from Vireo into other systems.
The export system, introduced with Vireo 1.8, is a powerful and flexible mechanism for allowing other systems to use data from Vireo. The number of built-in formats supported by Vireo is greatly expanded, while also allowing for customizations at each institution. I hope that the new export system will prove easy for other developers to customize without investing lots of time in learning the intricacies of the platform or Java, so that they can focus on reformatting the data. If you’re a developer working on customizing an export format, please drop me a line and let me know how the experience went (or, if you’re not finished, I may be able to answer a question or two).
There are nine pre-defined export formats that are packaged with the default install of Vireo:
Vireo Export
This is an internally defined export that is designed to be the highest fidelity of all the export formats. It exports every piece of information about the submission using Vireo’s internal data model.
DSpace METS
This is the DSpace METS SIP format, which is the best format to use when depositing a submission into a DSpace-based repository via the SWORD protocol.
DSpace Simple Archive
This is the DSpace Simple Archive format, which is the best format to use when importing items into a DSpace-based repository manually using the import command.
Generic Qualified Dublin Core
This is a very generic Dublin Core format. This format is unlikely to be useful in practice but does demonstrate that Vireo is repository agnostic.
File Export
This is a very simple export format that contains no metadata. It will include a directory for each submission which contains the primary document along with any supplemental documents that are associated with the submission.
MARC21
This is a MARC 21 export format intended to export records to a library OPAC. The fields exported by this format follow a different profile than the XML version: the profile desired by Texas A&M University rather than the pseudo ETD-MS standard used by the XML version. You may need to modify this format for your local institution.
- grantorLocation: This is used in field 260a to identify where the ETD was published. The default value for this field is “College Station, Tex”.
- leader: You may customize the MARC leader field. The default leader is “RLXXXnam a22BADXXKa 4500”, where RLXXX and BADXX are replaced with the record length and base address.
MARC21 XML
This is a MARC 21 export following a pseudo ETD-MS standard with an XML encoding.
MODS
This is a MODS export format following the profile created by the Texas Digital Library metadata working group.
ProQuest UMI
This is the ProQuest UMI export format. This format has several options that can be configured. Because of ProQuest’s status within the ETD community, the most popular of these options have been made available within the UI. You can access them under the Application Settings tab.
- ProQuest Institution Code: A unique code assigned by ProQuest.
- Open Access Publishing: Whether ETDs should be published as open access by ProQuest. This may require an additional fee.
- Indexed by Search Engines: Whether ETDs published in ProQuest should be findable in Google, Yahoo, Bing, and so on.
What Exactly is an “Export Format”?
At its simplest, an export format is a template that generates an output file for each export. Each format is a Play! Framework-based Groovy template which is dynamically compiled each time it is run. This is the same template engine that is used within Vireo for generating the HTML views. There are several reasons why this approach was chosen: 1) it allows code to be executed, meaning that any transformation is possible; 2) for simple XML-based formats, it is easy for non-programmers to follow; 3) because it is dynamic, it is simple to debug.
Each format must be configured in the Spring-based configuration file: [vireo]/conf/application-context.xml. This configuration is a bit complicated, but when adding a new format we can ignore most of it. For each format available in Vireo there is a corresponding <bean> definition. All template files for formats are located in the [vireo]/conf/formats directory.
Configuring an Export Format
All export formats are defined in [vireo]/conf/application-context.xml. Near the bottom of the file there is an XML comment, <!-- Package Formats -->. Below this line there is a <bean> definition for each export format available. Each bean is given a unique id (alphanumeric with no spaces), which is only used internally to identify the format. Along with the id, each bean has an implementation class and a scope. The scope should always be “prototype”, meaning that a new instance of the format is created each time it is requested. There are multiple implementation classes to choose from:
org.tdl.vireo.export.impl.TemplatePackagerImpl
This is the basic packager implementation that uses the template engine to generate an export format. The one limitation this implementation has is that it can only produce one “manifest” or metadata output file. Typically this is fine because most formats are an XML file along with the PDF and supplemental files. Here are the available options for this packager:
displayName: This is the name of the export format shown in the select list.
format: This is an identifier of the format, typically it is the URL to the XML schema defining the format’s syntax.
manifestName: This identifies the name of the metadata manifest file that is generated by the template below. If none is provided then the default value is “mets.xml”.
mimeType: The MIME type of the resulting export format. This should rarely be set because most of the time the format will be zipped together with associated file attachments, with the MIME type set to “application/zip”. This parameter is only needed if the format consists of only one file generated by the manifest template.
manifestTemplatePath: This identifies which Groovy-based template to use when generating an export format.
manifestTemplateArguments: This is a list of variables that will be available when a Groovy-based template is executed.
attachmentTypeNames: A list of the attachment types that should be included in this export.
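As a rough sketch of how these options fit together, a bean definition for a custom single-manifest format might look like the following. The property names and the implementation class come from the list above; the bean id, display name, schema URL, template path, the use of a map for the template arguments, and the attachment type names are all assumptions made for illustration, so compare them against the beans already present in your application-context.xml.

```xml
<!-- Hypothetical example: the id, names, paths, and values are placeholders. -->
<bean id="MyCustomFormat"
      class="org.tdl.vireo.export.impl.TemplatePackagerImpl"
      scope="prototype">
    <!-- Name shown in the export format select list -->
    <property name="displayName" value="My Custom Format"/>
    <!-- Identifier for the format, typically the URL of its XML schema -->
    <property name="format" value="http://example.org/schemas/my-format.xsd"/>
    <!-- Name of the generated metadata file (mets.xml is used when omitted) -->
    <property name="manifestName" value="metadata.xml"/>
    <!-- Template that generates the manifest; the path style is an assumption -->
    <property name="manifestTemplatePath" value="conf/formats/my_format.xml"/>
    <!-- Extra variables exposed to the template; a map is assumed here -->
    <property name="manifestTemplateArguments">
        <map>
            <entry key="myArgument" value="some value"/>
        </map>
    </property>
    <!-- Attachment types bundled with the export; the type names are assumptions -->
    <property name="attachmentTypeNames">
        <list>
            <value>PRIMARY</value>
            <value>SUPPLEMENTAL</value>
        </list>
    </property>
</bean>
```

The mimeType property is left out on purpose, since the package is normally zipped together with its attachments.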
org.tdl.vireo.export.impl.MultipleTemplatePackagerImpl
This is a more advanced packager implementation that uses the template engine to generate multiple “manifests” or metadata output files. At the time of writing, only one format uses this implementation – the DSpace Simple Archive format – because it has a separate metadata file for each schema. Here are the available options for this packager:
displayName: This is the name of the export format shown in the dropdown list.
format: This is an identifier of the format, typically it is the URL to the XML schema defining the format’s syntax.
templatePaths: This identifies multiple templates which will be executed to generate the multiple manifest files. The key is the name of the output file, and the value is the template which generates the file.
templateArguments: This is a list of variables that will be available when the Groovy-based template is executed.
attachmentTypeNames: A list of the attachment types that should be included in this export.
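A multi-manifest format could be sketched along the same lines. Here the templatePaths map follows the description above (key is the output file name, value is the template that generates it); the bean id, file names, template paths, and attachment type names are hypothetical placeholders rather than values taken from Vireo.

```xml
<!-- Hypothetical example: the id, file names, paths, and values are placeholders. -->
<bean id="MyMultiFileFormat"
      class="org.tdl.vireo.export.impl.MultipleTemplatePackagerImpl"
      scope="prototype">
    <!-- Name shown in the export format dropdown list -->
    <property name="displayName" value="My Multi-File Format"/>
    <!-- Identifier for the format -->
    <property name="format" value="http://example.org/schemas/my-multi-format"/>
    <!-- One entry per generated file: key = output file name, value = template -->
    <property name="templatePaths">
        <map>
            <entry key="dublin_core.xml" value="conf/formats/my_dublin_core.xml"/>
            <entry key="metadata_local.xml" value="conf/formats/my_local_metadata.xml"/>
        </map>
    </property>
    <!-- Extra variables exposed to every template; a map is assumed here -->
    <property name="templateArguments">
        <map>
            <entry key="myArgument" value="some value"/>
        </map>
    </property>
    <!-- Attachment types bundled with the export; the type names are assumptions -->
    <property name="attachmentTypeNames">
        <list>
            <value>PRIMARY</value>
            <value>SUPPLEMENTAL</value>
        </list>
    </property>
</bean>
```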
org.tdl.vireo.export.impl.FilePackagerImpl
This is a specialized packager that has only one purpose: to generate an export with no metadata. It is used by the File Export and is very basic. Here are the available options for this packager:
displayName: This is the name of the export format shown in the dropdown list.
attachmentTypeNames: A list of the attachment types that should be included in this export. It would be possible to create a new export format that only included the primary document without the supplemental documents.
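The primary-document-only variation mentioned above might be configured roughly as follows. The bean id and display name are invented for this sketch, and PRIMARY is an assumed attachment type name; check the existing File Export bean for the names your install actually uses.

```xml
<!-- Hypothetical example: a file-only export restricted to the primary document. -->
<bean id="PrimaryDocumentOnly"
      class="org.tdl.vireo.export.impl.FilePackagerImpl"
      scope="prototype">
    <!-- Name shown in the export format dropdown list -->
    <property name="displayName" value="Primary Document Only"/>
    <!-- Only the primary document is included; the type name is an assumption -->
    <property name="attachmentTypeNames">
        <list>
            <value>PRIMARY</value>
        </list>
    </property>
</bean>
```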
Export Format Templates
The export templates are used to define the format of the metadata exported from Vireo. From these templates you have access to all the internal Java-based APIs to access data from the database and the ability to format that data in any output format. The templates are dynamic, meaning they are recompiled each time they are run, so you can change them, run an export, change them again, and re-run an export without having to restart the server each time to deploy new code. Here are a few very helpful resources:
- Template syntax overview
- Built-in template tags
- Java extensions
- Template cheat sheet
When export templates are executed, there are several objects that are programmatically inserted into the template’s namespace. This makes them easy to reference. For instance, you can obtain the student’s first name by using ${ sub.getStudentFirstName() } because the sub (short for submission) object is always available. Here are the variables that are always available:
play
This is a reference to the Play framework object. From this object you can obtain configuration parameters from application.conf, for example:
play.configuration.getProperty("myProperty")
sub
This is a reference to the Vireo submission which is being exported. From this object you can obtain almost all the information needed for most export formats.
manifestName
This is the name of the manifest file that is generated by this template. Inside each folder for the export a file will be created with this name, and the output of the template will be stored inside the file. You cannot change the name of the file being generated from within the template; for that you will need to modify the Spring-based configuration. (see above)
mimeType
This is the mimeType (e.g. text/xml) of the export format. If the export format contains multiple files then the mimeType will be null because the resulting directory of files will be archived together into a single zip file. The resulting mimeType of the package in this case will always be application/zip. You cannot change the mimeType of the package being generated from within the template; for that you will need to modify the Spring-based configuration. (see above)
attachmentTypes
This is a Java list of AttachmentType objects that will be included in the export format. All attachments associated with the submission that are of a type contained within the list will also be included in the export package. You cannot change the type of attachments included from within the template; for that you will need to modify the Spring-based configuration. (see above)
personRepo
This is a reference to the PersonRepository object within Vireo. From this repository object you can look up additional person objects. It is very rare that you will need to use this object.
subRepo
This is a reference to the SubmissionRepository object within Vireo. From this repository object you can look up additional submissions other than the current one being exported. It is very rare that you will need to use this object.
settingRepo
This is a reference to the SettingsRepository object within Vireo. From this repository object you can look up dynamic configuration parameters such as the current grantor, whether submissions are open or closed, email templates, configuration lists of colleges, departments, etc.
proquestRepo
This is a reference to the ProQuest Vocabulary Repository object within Vireo. This repository is likely only useful for the ProQuest export, but it is available to all export formats. Some fields, such as subject, are defined by a controlled vocabulary from ProQuest. Using this repository you can identify all the possible subject terms defined in the vocabulary.
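To tie the variables above together, here is a minimal sketch of what a manifest template could look like. The element names are invented and do not follow any real schema. The only calls taken from this post are sub.getStudentFirstName() and play.configuration.getProperty("myProperty"); escapeXml() is one of Play's Java extensions for strings. Treat it as a starting point under those assumptions rather than a working Vireo format.

```xml
<?xml version="1.0" encoding="UTF-8"?>
*{ Hypothetical template: element names are placeholders, not a real schema. }*
<record manifest="${manifestName}">
    *{ sub is the submission being exported; ?. guards against a missing name }*
    <firstName>${sub.getStudentFirstName()?.escapeXml()}</firstName>
    *{ Read a value from application.conf; "myProperty" is a placeholder key }*
    <source>${play.configuration.getProperty("myProperty")}</source>
    *{ List each attachment type that will be included in this export }*
    <attachments>
    #{list items:attachmentTypes, as:'type'}
        <type>${type}</type>
    #{/list}
    </attachments>
</record>
```

Because templates are recompiled on every run, a file like this can be edited and re-exported repeatedly without restarting the server.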
Update 2/26/2013: Fixed a mistake reported by a reader in the parameters between the template packager and the multiple template packager.