Exporting from Vireo into DSpace

The first version of the Vireo Electronic Thesis and Submission system was built as an add-on to DSpace. It used the same technology stack, reused the underlying database and file storage, and operated within the same UI. The original idea was that Vireo would integrate deeply with the repository. Because of these decisions, there was no separation between how Vireo stored its metadata and its Dublin Core encoding of that metadata. There was only one way: the Vireo way. If you wanted something else, you couldn’t do it with DSpace. For example, if you wanted to store the author’s information in contributor.author, you couldn’t; Vireo demanded that you use the creator field instead.

Vireo 2.0 broke this requirement, bringing more flexibility. The project is no longer deeply integrated with DSpace, or any other repository. Internally, Vireo stores its metadata in relational tables in whatever format is easiest for it to work with, and that format does not conform to any particular metadata encoding. When an item is ready to be deposited, Vireo uses the SWORD protocol to send the content to the destination repository. During the SWORD deposit, Vireo generates a metadata package in a particular encoding format. These “export formats” are designed to be flexible so that different repositories can use different encodings. I’ve previously written a blog post on the technology behind these export formats if you are interested in customizing them.

Vireo ships with several built-in export formats. The two formats for DSpace are the METS-based package and the simple archive format. Both use the same metadata profile that was hard-coded into prior versions of Vireo. Here is a recap of that profile:

Qualified Dublin Core Fields:

  • dc.creator
    The Student’s name in last, first format.
  • dc.title
    The document’s title.
  • dc.description.abstract
    The document’s abstract as plain text.
  • dc.subject
    The document’s keywords each as a separate element.
  • dc.contributor.advisor
    The student’s Committee Chair, Co-Chair, Supervisor, Co-Supervisor, or Advisor’s name in last, first format.
  • dc.contributor.committeeMember
    The student’s committee if they have a role other than one associated with the advisor field. The name is also in the last, first format.
  • dc.date.created
    The student’s graduation date in the format “YYYY-MM”.
  • dc.date.submitted
    The date the student completed their submission (the first time) for review.
  • dc.date.issued
    The date when the submission transitioned into the Approved state.
  • dc.format.mimetype
    The mimetype of the primary document. This is always “application/pdf” because primary documents are required to be PDFs.
  • dc.language.iso
    The document’s language in ISO-639-2 format.
  • dc.type.material
    This field is statically defined as “text”.
  • dc.type
    This field is statically defined as “Thesis”.
  • dc.identifier.uri
    The handle or other repository defined identifier as assigned by the repository. This would not have been assigned until after it has been submitted into a repository. Therefore, upon first submission it would not be present, but on subsequent submissions it would be defined. However, the SWORD protocol, version 1, does not support re-submissions of the same item – it would create duplicate copies in the repository.

ETD-MS Fields:

  • thesis.degree.name
    The full degree name selected by the student.
  • thesis.degree.level
    The degree level (Doctoral, Masters, Undergraduate) selected by the Student.
  • thesis.degree.discipline
    The major selected by the student.
  • thesis.degree.grantor
    The degree-granting institution. This field is statically set for all submissions under the Application’s Settings tab.
  • thesis.degree.department
    The department selected by the student.

DSpace Embargo Fields:

  • local.embargo.terms
    The type of embargo term, in this case a specific date calculated based upon the graduation date.
  • local.embargo.lift
    The date when this item should be released from embargo. This is calculated from the graduation date and, yes, it is the same date as the other field. That’s just the way DSpace was made to work; it doesn’t make sense from this perspective.
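As a rough illustration of the date arithmetic, here is a minimal Python sketch. The post only says the date is calculated from the graduation date, so the `embargo_months` parameter, the choice of the first day of the month, and the function name `embargo_fields` are all assumptions made for this example, not Vireo’s actual logic.

```python
from datetime import date


def embargo_fields(graduation: str, embargo_months: int) -> dict:
    """Return both embargo fields for a graduation date given as "YYYY-MM".

    Hypothetical helper: the embargo length in months and the use of the
    first day of the month are assumptions, not Vireo's documented behaviour.
    """
    year, month = (int(part) for part in graduation.split("-"))
    # Count months from year 0, add the embargo length, then split back
    # into a year and a 1-based month so that overflow carries correctly.
    total = year * 12 + (month - 1) + embargo_months
    lift = date(total // 12, total % 12 + 1, 1)
    value = lift.isoformat()
    # Both fields carry the same date, as described in the profile above.
    return {"local.embargo.terms": value, "local.embargo.lift": value}


fields = embargo_fields("2013-05", 24)
print(fields["local.embargo.lift"])
```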

Dublin Core Provenance Fields:

  • dc.description.provenance
    An English description of the embargo selected.
  • dc.description.provenance
    An English description of when the student agreed to the license.
  • dc.description.provenance
    An English description of the submission date and document type.
  • dc.description.provenance
    An English description of when the committee approved the submission.
  • dc.description.provenance
    An English description of when the submission was approved.
  • dc.description.provenance
    An English description of when the submission was deposited.
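To make the profile concrete, here is a minimal Python sketch of how fields like these could be written out as DSpace simple archive format metadata files. It is not Vireo’s export code. It assumes the usual simple archive layout as I understand it: a `dublin_core.xml` file for the dc schema, a `metadata_<schema>.xml` file for each other schema, and `dcvalue` elements carrying `element` and `qualifier` attributes. The function name `saf_metadata` and the sample values are made up for illustration.

```python
import xml.etree.ElementTree as ET


def saf_metadata(fields):
    """Group (field, value) pairs by schema and return {filename: xml_string}.

    Hypothetical helper: field names are split as schema.element[.qualifier].
    """
    roots = {}
    for name, value in fields:
        schema, element, *rest = name.split(".")
        qualifier = rest[0] if rest else "none"
        if schema not in roots:
            roots[schema] = ET.Element("dublin_core", schema=schema)
        dcvalue = ET.SubElement(
            roots[schema], "dcvalue", element=element, qualifier=qualifier
        )
        dcvalue.text = value

    files = {}
    for schema, root in roots.items():
        # Assumption: dc goes in dublin_core.xml, other schemas in metadata_<schema>.xml
        filename = "dublin_core.xml" if schema == "dc" else f"metadata_{schema}.xml"
        files[filename] = ET.tostring(root, encoding="unicode")
    return files


# Sample values are invented for the example.
files = saf_metadata([
    ("dc.creator", "Doe, Jane"),
    ("dc.title", "An Example Thesis"),
    ("dc.date.created", "2013-05"),
    ("thesis.degree.level", "Doctoral"),
    ("local.embargo.lift", "2015-05-01"),
])
for filename, xml in sorted(files.items()):
    print(filename, xml)
```

Splitting on the schema prefix is what puts the thesis and local fields in their own files, which is also why those schemas have to exist in the destination repository.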

If you are trying to get Vireo to deposit items into DSpace, the number one problem is that DSpace does not have the above fields defined in its metadata registry. When you deposit an item that uses a field unknown to DSpace, DSpace writes a message to its log, but the error returned to Vireo via SWORD is only a generic error message. One of the best debugging techniques I have found is to tail the DSpace logs while doing a submission to identify the error. Most of the time the fix is simply to add the field to the metadata registry.
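If several fields are missing, it can be easier to add them in one batch than one at a time. The Python sketch below compares a list of profile fields against the fields you already know are registered and produces a small registry XML document for the rest. The element names (`dspace-dc-types`, `dc-type`, `schema`, `element`, `qualifier`) follow the layout of the registry files that ship with DSpace as I understand it, so check them against the files in your own installation before importing. The helper names and the `registered` set are hypothetical, and the sketch assumes the thesis and local schemas themselves have already been registered.

```python
import xml.etree.ElementTree as ET


def missing_fields(profile, registered):
    """Return the profile fields that are not in the registered set, in order."""
    return [name for name in profile if name not in registered]


def registry_xml(fields):
    """Build a registry document with one dc-type entry per field.

    Assumption: the element names mirror DSpace's shipped registry files.
    """
    root = ET.Element("dspace-dc-types")
    for name in fields:
        schema, element, *rest = name.split(".")
        dc_type = ET.SubElement(root, "dc-type")
        ET.SubElement(dc_type, "schema").text = schema
        ET.SubElement(dc_type, "element").text = element
        if rest:  # unqualified fields such as dc.title have no qualifier
            ET.SubElement(dc_type, "qualifier").text = rest[0]
    return ET.tostring(root, encoding="unicode")


profile = [
    "dc.creator",
    "dc.contributor.committeeMember",
    "thesis.degree.name",
    "thesis.degree.grantor",
    "local.embargo.terms",
    "local.embargo.lift",
]
# Hypothetical: fields already present in the destination registry.
registered = {"dc.creator"}

missing = missing_fields(profile, registered)
xml_doc = registry_xml(missing)
print(xml_doc)
```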


One Response to “Exporting from Vireo into DSpace”

  1. Drew Heles says:

    Very helpful post, Scott. With this information, I was able to register the thesis and local schemas (via the UI in DSpace) and then create and import a simple xml file into DSpace to add all the missing fields. With that done, I can successfully deposit into DSpace from Vireo without difficulty. Absent your help, I’d likely still be troubleshooting. Thanks!
