Exporting from Vireo into DSpace
The first version of the Vireo Electronic Thesis and Submission system was built as an add-on to DSpace. It used the same technology stack, reused the underlying database and file storage, and operated within the same UI. The original idea was that Vireo would deeply integrate with the repository. Because of these decisions there was no separation between how Vireo stored its metadata and its Dublin Core encoding of that metadata. There was only one way, the Vireo way. If you wanted something else you could not do it with DSpace. For example, if you wanted to store the author’s information in contributor.author you could not. Vireo demanded that you use the creator field instead.
Vireo 2.0 removed this requirement, bringing more flexibility. The project is no longer deeply integrated with DSpace, or any other repository. Internally, Vireo stores its metadata in relational tables, in whatever format is easiest for it to work with, and that format does not conform to any particular metadata encoding. When a submission is ready to be deposited, the SWORD protocol is used to deposit the content into the destination repository. During the SWORD deposit Vireo generates a metadata package in a particular encoding format. These “export formats” are designed to be flexible so that different repositories can use different encodings. I’ve previously written a blog post on the technology behind these export formats if you are interested in customizing them.
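To make the deposit step a little more concrete, here is a minimal sketch of what a SWORD version 1 deposit request looks like at the HTTP level. This is not Vireo’s own code. The function name, the collection URL, and the file name are made up for illustration, and the packaging identifier shown is the one commonly used for DSpace METS packages, which I am assuming your repository is configured to accept. The request is only built, never sent, and authentication is left out.

```python
import urllib.request

# Packaging identifier commonly used for DSpace METS packages over SWORD v1.
# Assumption: the destination repository accepts this packaging.
METS_PACKAGING = "http://purl.org/net/sword-types/METSDSpaceSIP"


def build_sword_deposit(collection_url, package_bytes, filename="submission.zip"):
    """Build (but do not send) a SWORD v1 deposit request for a zipped package."""
    return urllib.request.Request(
        collection_url,
        data=package_bytes,
        method="POST",
        headers={
            "Content-Type": "application/zip",
            "Content-Disposition": f"filename={filename}",
            "X-Packaging": METS_PACKAGING,
        },
    )


# Hypothetical deposit URL; substitute your own collection's endpoint.
request = build_sword_deposit(
    "https://repository.example.edu/sword/deposit/123456789/1",
    b"...zip bytes...",
)
```

Switching to a different export format would mean sending a different package body and a matching packaging value; the rest of the request stays the same.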
Vireo ships with several built-in export formats. The two formats for DSpace are the METS-based package and the simple archive format. Both of these formats use the same metadata profile that was hard coded into prior versions of Vireo. Here is a recap of that profile:
Qualified Dublin Core Fields:
dc.creator
The Student’s name in last, first format.
dc.title
The document’s title.
dc.description.abstract
The document’s abstract as plain text.
dc.subject
The document’s keywords each as a separate element.
dc.contributor.advisor
The name of the student’s Committee Chair, Co-Chair, Supervisor, Co-Supervisor, or Advisor in last, first format.
dc.contributor.committeeMember
The student’s committee members who have a role other than one associated with the advisor field. The name is also in the last, first format.
dc.date.created
The student’s graduation date in the format “YYYY-MM”.
dc.date.submitted
The date the student completed their submission (the first time) for review.
dc.date.issued
The date when the submission transitioned into the Approved state.
dc.format.mimetype
The mimetype of the primary document. This is always “application/pdf” because primary documents are required to be PDFs.
dc.language.iso
The document’s language in ISO-639-2 format.
dc.type.material
This field is statically defined as “text”.
dc.type
This field is statically defined as “Thesis”.
dc.identifier.uri
The handle or other repository-defined identifier as assigned by the repository. This is not assigned until after the item has been submitted into a repository. Therefore, upon first submission it would not be present, but on subsequent submissions it would be defined. However, version 1 of the SWORD protocol does not support re-submissions of the same item; a re-submission would create duplicate copies in the repository.
ETD-MS Fields:
thesis.degree.name
The full degree name selected by the student.
thesis.degree.level
The degree level (Doctoral, Masters, Undergraduate) selected by the student.
thesis.degree.discipline
The major selected by the student.
thesis.degree.grantor
The degree-granting institution. This field is statically set for all submissions under the Application’s Settings tab.
thesis.degree.department
The department selected by the student.
DSpace Embargo Fields:
local.embargo.terms
The type of embargo term, in this case a specific date calculated based upon the graduation date.
local.embargo.lift
The date when this item should be released from embargo. This is calculated based upon the graduation date, and yes, this is the same date as the other field. That’s just the way they made DSpace work. It doesn’t make sense from this perspective.
Dublin Core Provenance Fields:
dc.description.provenance
An English description of the embargo selected.
dc.description.provenance
An English description of when the student agreed to the license.
dc.description.provenance
An English description of the submission date and document type.
dc.description.provenance
An English description of when the committee approved the submission.
dc.description.provenance
An English description of when the submission was approved.
dc.description.provenance
An English description of when the submission was deposited.
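As a rough illustration of how the profile above turns into an actual metadata file, here is a small sketch that writes a few of these fields in the layout the DSpace simple archive format uses, where each value is a dcvalue entry carrying an element and a qualifier. This is not Vireo’s export code. The function name and every sample value are made up, and only a handful of fields are shown.

```python
import xml.etree.ElementTree as ET


def build_dublin_core(fields, schema="dc"):
    """Serialize (element, qualifier, value) triples as a simple archive format metadata file."""
    root = ET.Element("dublin_core", {"schema": schema})
    for element, qualifier, value in fields:
        # Unqualified fields are written with the literal qualifier "none".
        node = ET.SubElement(
            root, "dcvalue", {"element": element, "qualifier": qualifier or "none"}
        )
        node.text = value
    return ET.tostring(root, encoding="unicode")


# Made-up sample values for a few of the fields listed above.
sample = [
    ("creator", None, "Doe, Jane"),        # last, first format
    ("title", None, "An Example Thesis"),
    ("subject", None, "metadata"),         # one element per keyword
    ("subject", None, "repositories"),
    ("date", "created", "2013-05"),        # graduation date as YYYY-MM
    ("type", "material", "text"),
]
dc_xml = build_dublin_core(sample)
thesis_xml = build_dublin_core([("degree", "level", "Doctoral")], schema="thesis")
```

In a simple archive format item, the dc values go in dublin_core.xml, while fields from another schema, such as the thesis fields, go in their own file named after the schema (metadata_thesis.xml).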
If you are trying to get Vireo to deposit items into DSpace, the number one problem you will run into is that DSpace does not have the above fields defined in its metadata registry. When you try to deposit an item that uses a field unknown to DSpace, a log message is generated in DSpace, but the error returned via SWORD to Vireo is only a generic error message. One of the best debugging techniques I have found is to tail the DSpace logs while doing a submission to identify the error. Most of the time it’s a simple fix: just add the field to the metadata registry.
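If you would rather check up front than wait for a failed deposit, a short script can compare the profile against whatever fields your registry already has. The sketch below is only that, a sketch: the helper names are my own, and the registered set is a made-up example that you would replace with the fields actually listed in your DSpace metadata registry.

```python
# Fields used by the profile described above. dc.description.provenance is
# listed once because the registry only needs to define it once.
VIREO_FIELDS = [
    "dc.creator", "dc.title", "dc.description.abstract", "dc.subject",
    "dc.contributor.advisor", "dc.contributor.committeeMember",
    "dc.date.created", "dc.date.submitted", "dc.date.issued",
    "dc.format.mimetype", "dc.language.iso", "dc.type.material", "dc.type",
    "dc.identifier.uri", "dc.description.provenance",
    "thesis.degree.name", "thesis.degree.level", "thesis.degree.discipline",
    "thesis.degree.grantor", "thesis.degree.department",
    "local.embargo.terms", "local.embargo.lift",
]


def split_field(name):
    """Split 'schema.element.qualifier' into its parts; qualifier may be None."""
    schema, element, *qualifier = name.split(".")
    return schema, element, (qualifier[0] if qualifier else None)


def missing_fields(required, registered):
    """Return the required fields that are absent from the registry, in order."""
    registered = set(registered)
    return [field for field in required if field not in registered]


# Made-up example: pretend only these three fields are registered.
registered = {"dc.creator", "dc.title", "dc.subject"}
for field in missing_fields(VIREO_FIELDS, registered):
    schema, element, qualifier = split_field(field)
    print(f"add to registry: schema={schema} element={element} qualifier={qualifier}")
```

Each line it prints corresponds to one entry to add to the metadata registry, including any schema (such as thesis or local) that does not exist yet.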