Archive for July, 2011

Preserving Character Encodings of a DSpace Metadata Export using MS Excel 2011 on OS X

Wednesday, July 20th, 2011

Stencil Alphabet The problem I recently ran into was updating the metadata for a particular collection that was being moved from TDL’s repository into A&M’s repository. I able to quickly move the collection into the new repository using OAI-PMH harvesting with ORE support. However, the metadata needed a bit of cleaning up for it’s new repository home, such as changing dc.contributor.author to dc.author and inconsistent formats used in other fields. This is a perfect task for Stuart’s Bulk Metadata Export tool. This DSpace feature allows an administrator to download a Comma Separate Values (CSV) file of all the metadata in a particular collection, then open it up in MS Excel and edit the metadata naturally. Finally once the metadata is ready to go you can upload it back to the repository and all the fields will be updated correctly. It is a very nice feature that can save a ton of time.

The Problem

When I opened the file in Excel some of the characters were not showing up correctly. Specifically characters in titles and names which used non-English marks, in this case there were all from the extended Latin character set. If you ignore these problems, later when you try to upload the CSV file DSpace will pick up on these changes and cause the garbled characters to be introduced into the repository.


(more…)