Exporting from Vireo into DSpace

The first version of the Vireo Electronic Thesis and Submission system was built as an add-on to DSpace. It used the same technology stack, reused the underlying database and file storage, and operated within the same UI. The original idea was that Vireo would integrate deeply with the repository. Because of these decisions, there was no separation between how Vireo stored its metadata and its Dublin Core encoding of that metadata. There was only one way, the Vireo way, and if you wanted something else, DSpace gave you no way to do it. For example, if you wanted to store the author’s information in contributor.author, you could not; Vireo demanded that you use the creator field instead.

Vireo 2.0 broke this requirement to bring more flexibility. The project is no longer deeply integrated with DSpace or any other repository. Internally, Vireo stores its metadata in relational tables in whatever format is easiest for it to work with, without conforming to any particular metadata encoding. Then, when the data is ready to be deposited, Vireo uses the SWORD protocol to deposit the content into the destination repository. During the SWORD deposit, Vireo generates a metadata package in a particular encoding format. These “export formats” are designed to be flexible so that different repositories can use different encodings. I’ve previously written a blog post on the technology behind these export formats if you are interested in customizing them.
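As a rough illustration of why separating internal storage from the export encoding helps, here is a minimal Python sketch in which an export format is simply a function from an internal record to (field, value) pairs. The `Submission` class, the function names, and the format keys are all hypothetical; Vireo’s real export formats are built with the technology described in that other post, not with this code. The point is only that the same stored data can be emitted as dc.creator for one repository and dc.contributor.author for another.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]


@dataclass
class Submission:
    # Hypothetical, simplified stand-in for Vireo's internal relational record.
    student_last: str
    student_first: str
    title: str
    keywords: List[str] = field(default_factory=list)


# In this sketch an export format is a function from the internal record to
# (metadata field, value) pairs in some target encoding.
ExportFormat = Callable[[Submission], List[Pair]]


def legacy_dspace_format(sub: Submission) -> List[Pair]:
    """Mirrors the legacy profile: the student goes in dc.creator."""
    pairs = [("dc.creator", f"{sub.student_last}, {sub.student_first}"),
             ("dc.title", sub.title)]
    # Each keyword becomes its own dc.subject element.
    pairs += [("dc.subject", kw) for kw in sub.keywords]
    return pairs


def contributor_author_format(sub: Submission) -> List[Pair]:
    """Same values, but the student goes in dc.contributor.author instead."""
    return [("dc.contributor.author" if name == "dc.creator" else name, value)
            for name, value in legacy_dspace_format(sub)]


# Hypothetical registry of formats, selectable per destination repository.
FORMATS: Dict[str, ExportFormat] = {
    "dspace-legacy": legacy_dspace_format,
    "dspace-contributor-author": contributor_author_format,
}

if __name__ == "__main__":
    sub = Submission("Doe", "Jane", "Nesting Habits of Urban Birds", ["birds", "nests"])
    for key, export in FORMATS.items():
        print(key, export(sub))
```

Adding a new encoding in this model means adding one function, without touching how the record is stored.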

Vireo ships with several built-in export formats. The two formats for DSpace are the METS-based package and the Simple Archive Format. Both of these formats use the same metadata profile that was hard-coded into prior versions of Vireo. Here is a recap of that profile:

Qualified Dublin Core Fields:

  • dc.creator

    The student’s name in last, first format.

  • dc.title

    The document’s title.

  • dc.description.abstract

    The document’s abstract as plain text.

  • dc.subject

    The document’s keywords each as a separate element.

  • dc.contributor.advisor

    The name of the student’s Committee Chair, Co-Chair, Supervisor, Co-Supervisor, or Advisor in last, first format.

  • dc.contributor.committeeMember

    Members of the student’s committee whose role is not one associated with the advisor field. Names are also in last, first format.

  • dc.date.created

    The student’s graduation date in the format “YYYY-MM”.

  • dc.date.submitted

    The date the student first completed their submission for review.

  • dc.date.issued

    The date when the submission transitioned into the Approved state.

  • dc.format.mimetype

    The mimetype of the primary document. This is always “application/pdf” because primary documents are required to be PDFs.

  • dc.language.iso

    The document’s language in ISO-639-2 format.

  • dc.type.material

    This field is statically defined as “text”.

  • dc.type

    This field is statically defined as “Thesis”.

  • dc.identifier.uri

    The handle or other identifier assigned by the repository. It is not assigned until the item has been deposited into a repository, so it is absent on the first submission and present on subsequent submissions. However, version 1 of the SWORD protocol does not support resubmitting the same item; doing so would create a duplicate copy in the repository.
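Several of these values are derived rather than copied verbatim. A minimal sketch of three such derivations (names in last, first form, the YYYY-MM graduation date, and routing committee members to the advisor or committeeMember field by role) might look like this. The role strings in `ADVISOR_ROLES` and the function names are assumptions for illustration, not Vireo’s internal values.

```python
from datetime import date
from typing import List, Tuple

# Assumed role labels routed to dc.contributor.advisor; every other role
# becomes dc.contributor.committeeMember.
ADVISOR_ROLES = {"Chair", "Co-Chair", "Supervisor", "Co-Supervisor", "Advisor"}


def last_first(first: str, last: str) -> str:
    """Render a name in the "last, first" form used throughout the profile."""
    return f"{last}, {first}"


def date_created(graduation: date) -> str:
    """dc.date.created: the graduation date as zero-padded YYYY-MM."""
    return f"{graduation.year:04d}-{graduation.month:02d}"


def committee_fields(members: List[Tuple[str, str, str]]) -> List[Tuple[str, str]]:
    """Map (first, last, role) triples to advisor or committeeMember fields."""
    fields = []
    for first, last, role in members:
        name = ("dc.contributor.advisor" if role in ADVISOR_ROLES
                else "dc.contributor.committeeMember")
        fields.append((name, last_first(first, last)))
    return fields


if __name__ == "__main__":
    print(date_created(date(2012, 5, 1)))
    print(committee_fields([("Ann", "Lee", "Chair"), ("Bo", "Kim", "Member")]))
```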

ETD-MS Fields:

  • thesis.degree.name

    The full degree name selected by the student.

  • thesis.degree.level

    The degree level (Doctoral, Masters, Undergraduate) selected by the student.

  • thesis.degree.discipline

    The major selected by the student.

  • thesis.degree.grantor

    The degree-granting institution. This field is set statically for all submissions under the Application’s Settings tab.

  • thesis.degree.department

    The department selected by the student.

DSpace Embargo Fields:

  • local.embargo.terms

    The type of embargo term; in this case, a specific date calculated from the graduation date.

  • local.embargo.lift

    The date when this item should be released from embargo. It is calculated from the graduation date, and yes, this is the same date as in the other field. That’s just the way DSpace was designed to work, even if it doesn’t make much sense from this perspective.
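The embargo date arithmetic can be sketched as follows, assuming the graduation date is available as a full date and the embargo length is expressed in whole months. Both of those assumptions, the function names, and the ISO 8601 output format are illustrative only; check what your DSpace embargo configuration actually expects.

```python
import calendar
from datetime import date
from typing import Dict


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the target month's length."""
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def embargo_fields(graduation: date, embargo_months: int) -> Dict[str, str]:
    """Hypothetical helper: both embargo fields carry the same lift date."""
    lift = add_months(graduation, embargo_months).isoformat()
    return {"local.embargo.terms": lift, "local.embargo.lift": lift}


if __name__ == "__main__":
    print(embargo_fields(date(2012, 5, 1), 24))
```

Clamping the day keeps a date such as January 31 plus one month valid (it becomes the last day of February) instead of raising an error.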

Dublin Core Provenance Fields:

  • dc.description.provenance

    An English description of the embargo selected.

  • dc.description.provenance

    An English description of when the student agreed to the license.

  • dc.description.provenance

    An English description of the submission date and document type.

  • dc.description.provenance

    An English description of when the committee approved the submission.

  • dc.description.provenance

    An English description of when the submission was approved.

  • dc.description.provenance

    An English description of when the submission was deposited.
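To make the profile concrete, here is a minimal sketch of how these fields could be laid out in DSpace’s Simple Archive Format. In that format, dc fields go in dublin_core.xml and each other schema (thesis, local) gets its own metadata_<schema>.xml file with a schema attribute. This is not Vireo’s actual export template; the function name and the (field, value) input shape are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List, Tuple


def saf_metadata_files(fields: List[Tuple[str, str]]) -> Dict[str, str]:
    """Group schema.element[.qualifier] pairs into Simple Archive Format XML files."""
    by_schema = defaultdict(list)
    for name, value in fields:
        schema, element, *rest = name.split(".")
        # Unqualified fields such as dc.title use qualifier="none".
        qualifier = rest[0] if rest else "none"
        by_schema[schema].append((element, qualifier, value))

    files = {}
    for schema, values in by_schema.items():
        root = ET.Element("dublin_core")
        if schema != "dc":
            # Non-dc schemas are declared on the root element.
            root.set("schema", schema)
        for element, qualifier, value in values:
            dcvalue = ET.SubElement(root, "dcvalue", element=element, qualifier=qualifier)
            dcvalue.text = value
        filename = "dublin_core.xml" if schema == "dc" else f"metadata_{schema}.xml"
        files[filename] = ET.tostring(root, encoding="unicode")
    return files


if __name__ == "__main__":
    sample = [("dc.title", "Nesting Habits of Urban Birds"),
              ("dc.type", "Thesis"),
              ("thesis.degree.level", "Masters"),
              ("local.embargo.lift", "2014-05-01")]
    for name, xml in saf_metadata_files(sample).items():
        print(name, xml)
```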

If you are trying to get Vireo to deposit items into DSpace, the number one problem is that DSpace does not have the above fields defined in its metadata registry. When you deposit an item into a DSpace repository and the item uses a field that is unknown to DSpace, DSpace logs the problem, but the error returned to Vireo via SWORD is a generic one. One of the best debugging techniques I have found is to tail the DSpace logs while doing a submission to identify the error. Most of the time the fix is simply to add the missing field to the metadata registry.
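Before depositing, it can save a round of log-tailing to compare the profile against the registry. The sketch below assumes you have the registry as XML in the `<dc-type>` layout (schema, element, optional qualifier) that DSpace’s bundled registry files use. How you obtain that XML for your instance, along with the function names here, are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List, Set

# Every field named in the profile above.
PROFILE_FIELDS = [
    "dc.creator", "dc.title", "dc.description.abstract", "dc.subject",
    "dc.contributor.advisor", "dc.contributor.committeeMember",
    "dc.date.created", "dc.date.submitted", "dc.date.issued",
    "dc.format.mimetype", "dc.language.iso", "dc.type.material", "dc.type",
    "dc.identifier.uri", "dc.description.provenance",
    "thesis.degree.name", "thesis.degree.level", "thesis.degree.discipline",
    "thesis.degree.grantor", "thesis.degree.department",
    "local.embargo.terms", "local.embargo.lift",
]


def registered_fields(registry_xml: str) -> Set[str]:
    """Collect schema.element[.qualifier] names from <dc-type> entries."""
    names = set()
    for dc_type in ET.fromstring(registry_xml).iter("dc-type"):
        parts = [(dc_type.findtext(tag) or "").strip()
                 for tag in ("schema", "element", "qualifier")]
        # Skip the empty qualifier so unqualified fields read as schema.element.
        names.add(".".join(p for p in parts if p))
    return names


def missing_fields(profile: Iterable[str], registry: Set[str]) -> List[str]:
    """Fields the profile uses that the registry does not define."""
    return sorted(set(profile) - registry)
```

Any field this reports is a candidate for the “unknown field” failure described above.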
