Posts Tagged ‘DSpace’

Introducing Piper

Saturday, May 19th, 2012

Piper is an internal project we have been working on at Texas A&M University Libraries. The project is in its initial stages, little more than the first kernel of an idea, and I expect that we will expand its capabilities in the future. Right now Piper is basically a repository batch import tool; in the future it could grow into an internal repository workflow tool.

How does Piper fit into the repository ecosystem? It is a behind-the-scenes tool that repository administrators use to curate collections. For our initial phase we focused solely on making it simple to ingest content into DSpace. In the future, it may add workflow capabilities for quality control, integration with other workflow tools such as Vireo, and, of course, support for additional repositories.
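Piper's internals aren't described in this excerpt. For context, DSpace's standard batch item importer reads directories in the Simple Archive Format: one directory per item containing a `dublin_core.xml` metadata file, a `contents` file listing the bitstream filenames one per line, and the bitstream files themselves. The following is a minimal Python sketch of writing one such item directory, offered only as an illustration of what batch ingest into DSpace involves. It is not Piper's code, and the function names and example values are hypothetical.

```python
import os
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

# (element, qualifier, value); the Simple Archive Format uses qualifier="none"
# for unqualified Dublin Core fields.
DcValue = Tuple[str, str, str]


def dublin_core_xml(values: Iterable[DcValue]) -> str:
    """Build the dublin_core.xml body for one item; values are XML-escaped."""
    root = ET.Element("dublin_core")
    for element, qualifier, value in values:
        dcvalue = ET.SubElement(root, "dcvalue", element=element, qualifier=qualifier)
        dcvalue.text = value
    return ET.tostring(root, encoding="unicode")


def write_saf_item(item_dir: str, values: Iterable[DcValue],
                   bitstreams: Iterable[Tuple[str, bytes]]) -> None:
    """Write one Simple Archive Format item directory (hypothetical helper)."""
    os.makedirs(item_dir, exist_ok=True)
    with open(os.path.join(item_dir, "dublin_core.xml"), "w", encoding="utf-8") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write(dublin_core_xml(values))
    names = []
    for name, data in bitstreams:
        with open(os.path.join(item_dir, name), "wb") as f:
            f.write(data)
        names.append(name)
    # The contents file lists each bitstream filename on its own line.
    with open(os.path.join(item_dir, "contents"), "w", encoding="utf-8") as f:
        f.writelines(name + "\n" for name in names)
```

For example, calling `write_saf_item("import/item_000", [("title", "none", "Sample thesis"), ("date", "issued", "2012")], [("thesis.pdf", pdf_bytes)])` once per item produces a directory tree that the DSpace item importer can load; check your DSpace version's documentation for the exact import command and options.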

(more…)

Merging two DSpace Solr based data sets together

Tuesday, October 11th, 2011

Have you ever messed up a DSpace upgrade and somehow ended up resetting your DSpace statistics? I did. When we upgrade DSpace at A&M, we perform a fresh install each time and then restore the data from the old instance into the new one. This involves connecting the existing database, linking the asset store, and copying the DSpace log directory. We like to do it this way so that our configs are fresh each time. Our documented installation procedures list the exact settings (about 5) that need to be touched for each production install; all other parameters in dspace.cfg are maintained in our local SVN copy. This avoids the problem of never knowing exactly how your DSpace is configured, which is what happens if you follow the recommended upgrade procedure of adding each release's new parameters to your existing dspace.cfg.
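Because everything except the documented per-install settings lives in SVN, a deployed dspace.cfg should differ from the versioned copy only in those few keys. A quick way to confirm that after an install is to compare the two files key by key. The sketch below is a hypothetical Python helper, not something the procedure above describes; its parser is deliberately simplified (only `key = value` lines, `#` or `!` comments, and trailing-backslash continuations), and the allowed-key names in the usage note are illustrative, since the post doesn't list the actual settings.

```python
from typing import Dict, Iterable, List


def parse_cfg(text: str) -> Dict[str, str]:
    """Parse 'key = value' lines from a dspace.cfg-style file (simplified).

    Only '=' is treated as the separator, lines starting with '#' or '!' are
    comments, and a trailing backslash continues the value on the next line.
    """
    props: Dict[str, str] = {}

    def store(line: str) -> None:
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without a key/value separator
            props[key.strip()] = value.strip()

    pending = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not pending and (not line or line[0] in "#!"):
            continue
        if line.endswith("\\"):
            pending += line[:-1]
            continue
        store(pending + line)
        pending = ""
    if pending:  # file ended in the middle of a continued value
        store(pending)
    return props


def unexpected_changes(reference: str, deployed: str,
                       allowed: Iterable[str]) -> List[str]:
    """Keys whose values differ between the versioned and the deployed config,
    ignoring the per-install settings that are expected to differ."""
    ref, dep = parse_cfg(reference), parse_cfg(deployed)
    allowed_keys = set(allowed)
    return sorted(
        key for key in set(ref) | set(dep)
        if key not in allowed_keys and ref.get(key) != dep.get(key)
    )
```

In use, `reference` would be the text of the SVN copy and `deployed` the text of the installed `[dspace]/config/dspace.cfg`, with `allowed` set to the documented per-install keys (for example, hypothetically, `dspace.dir`, `dspace.url`, and `db.url`). An empty result means the install matches the versioned configuration apart from those settings.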

(more…)

Find All Restricted Items Within DSpace

Friday, August 19th, 2011

Here is an SQL query you can copy and paste into your DSpace database to find all items that have restricted access or contain bundles or bitstreams that are restricted. Restricted means that the object does not have an authorization policy granting anonymous READ access.

It’s actually quite hard to find the absence of something with SQL. After trying various methods, the approach I settled on is a subselect that counts how many anonymous access policies exist for each object and reports the objects that have none. The query is broken into three distinct parts, one for each object type (item, bundle, and bitstream). The results are then combined using PostgreSQL set operators and subselects (again!). This means that if your repository has a huge number of restricted items, the query might fail or take an obscene amount of time and memory to run. I tried using a left outer join but couldn’t get it to handle both the case where no access policies exist and the case where only non-anonymous policies exist.

The approach used here is inelegant and has some serious performance problems. However, it served my immediate purpose: we had no idea how many or which items were restricted in our repository (answer: just under 300). This task is a good candidate for a DSpace curation task that finds all items in a collection that have restricted access, or the opposite, all items that are NOT restricted.
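The left-outer-join problem mentioned above has a standard fix: put the anonymous-READ conditions in the join's ON clause rather than the WHERE clause, then keep only the rows where the policy side is NULL. A NOT EXISTS subquery expresses the same anti-join. Both forms treat "no policies at all" and "only non-anonymous policies" identically, and databases can usually plan them without materializing per-object counts. The sketch below runs both forms against a toy in-memory SQLite database using Python's `sqlite3` module; it is not the query from this post. Table and column names follow the DSpace 1.x `resourcepolicy` table, but the numeric constants (object type IDs, READ = 0, Anonymous group = 0) are assumptions to verify against your installation. The sketch checks each object type separately and does not roll restricted bundles or bitstreams up to their parent items.

```python
import sqlite3
from typing import List

# Assumed constants modeled on DSpace 1.x; verify against your installation.
BITSTREAM, BUNDLE, ITEM = 0, 1, 2   # resourcepolicy.resource_type_id
READ = 0                            # resourcepolicy.action_id
ANONYMOUS_GROUP = 0                 # resourcepolicy.epersongroup_id

ANTI_JOIN_NOT_EXISTS = """
SELECT o.{id_col} FROM {table} o
WHERE NOT EXISTS (
    SELECT 1 FROM resourcepolicy rp
    WHERE rp.resource_type_id = ? AND rp.resource_id = o.{id_col}
      AND rp.action_id = ? AND rp.epersongroup_id = ?)
ORDER BY o.{id_col}
"""

# The anonymous-READ conditions live in the ON clause, so an object with no
# policies, or with only non-anonymous policies, still yields one row whose
# rp columns are NULL; the WHERE clause keeps exactly those rows.
ANTI_JOIN_LEFT_OUTER = """
SELECT o.{id_col} FROM {table} o
LEFT OUTER JOIN resourcepolicy rp
  ON rp.resource_type_id = ? AND rp.resource_id = o.{id_col}
 AND rp.action_id = ? AND rp.epersongroup_id = ?
WHERE rp.policy_id IS NULL
ORDER BY o.{id_col}
"""


def restricted_ids(conn: sqlite3.Connection, table: str, id_col: str,
                   type_id: int, template: str = ANTI_JOIN_NOT_EXISTS) -> List[int]:
    """IDs of objects in `table` with no anonymous READ policy.

    `table` and `id_col` are trusted constants here, never user input.
    """
    sql = template.format(table=table, id_col=id_col)
    return [row[0] for row in conn.execute(sql, (type_id, READ, ANONYMOUS_GROUP))]


def demo_connection() -> sqlite3.Connection:
    """Toy database: items 2 and 3 and bitstream 7 are restricted."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE item (item_id INTEGER PRIMARY KEY);
        CREATE TABLE bitstream (bitstream_id INTEGER PRIMARY KEY);
        CREATE TABLE resourcepolicy (
            policy_id INTEGER PRIMARY KEY,
            resource_type_id INTEGER, resource_id INTEGER,
            action_id INTEGER, epersongroup_id INTEGER);
    """)
    conn.executemany("INSERT INTO item VALUES (?)", [(1,), (2,), (3,), (4,)])
    conn.executemany("INSERT INTO bitstream VALUES (?)", [(3,), (7,)])
    conn.executemany(
        "INSERT INTO resourcepolicy (resource_type_id, resource_id, action_id,"
        " epersongroup_id) VALUES (?, ?, ?, ?)",
        [
            (ITEM, 1, READ, ANONYMOUS_GROUP),       # public item
            (ITEM, 3, READ, 5),                     # readable only by group 5
            (ITEM, 4, READ, ANONYMOUS_GROUP),       # public, plus a group policy
            (ITEM, 4, READ, 5),
            (BITSTREAM, 3, READ, ANONYMOUS_GROUP),  # bitstream 3 public, item 3 not
        ],
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    for table, id_col, type_id in [("item", "item_id", ITEM),
                                   ("bitstream", "bitstream_id", BITSTREAM)]:
        for template in (ANTI_JOIN_NOT_EXISTS, ANTI_JOIN_LEFT_OUTER):
            # Both forms print "item [2, 3]" and "bitstream [7]".
            print(table, restricted_ids(conn, table, id_col, type_id, template))
```

The SQL itself uses only standard constructs, so it should carry over to PostgreSQL with the driver's placeholder syntax; a bundle query would follow the same pattern with `BUNDLE` and the `bundle` table.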

(more…)

Preserving Character Encodings of a DSpace Metadata Export using MS Excel 2011 on OS X

Wednesday, July 20th, 2011

The problem I recently ran into was updating the metadata for a particular collection that was being moved from TDL’s repository into A&M’s repository. I was able to quickly move the collection into the new repository using OAI-PMH harvesting with ORE support. However, the metadata needed some cleaning up for its new repository home, such as changing dc.contributor.author to dc.author and fixing inconsistent formats in other fields. This is a perfect task for Stuart’s Bulk Metadata Export tool. This DSpace feature lets an administrator download a Comma-Separated Values (CSV) file of all the metadata in a particular collection, open it in MS Excel, and edit the metadata naturally. Once the metadata is ready, you upload it back to the repository and all the fields are updated correctly. It is a very nice feature that can save a ton of time.

The Problem

When I opened the file in Excel, some of the characters did not display correctly. Specifically, characters with non-English diacritical marks in titles and names were garbled; in this case they were all from the extended Latin character set. If you ignore these problems, DSpace will treat the garbled characters as changes when you upload the CSV file and write them into the repository.
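A common cause of this kind of garbling on OS X is UTF-8 text being decoded as Mac Roman, so that "é" shows up as "√é"-style pairs such as "√©". Assuming that is what happened (this excerpt doesn't say), the damage can be reversed by re-encoding the text as Mac Roman and decoding it as UTF-8. The following Python sketch flags cells in a metadata CSV that look garbled this way and suggests a repair, so they can be reviewed before the file is uploaded. It is a heuristic with hypothetical function names, not the fix described in the full post.

```python
import csv
import io
from typing import List, Tuple


def repair_cell(text: str) -> str:
    """Reverse UTF-8 text that was mis-decoded as Mac Roman ('é' shown as '√©').

    Heuristic: re-encode as Mac Roman, then decode as UTF-8. If either step
    fails, the cell is assumed to be fine and is returned unchanged.
    """
    try:
        return text.encode("mac_roman").decode("utf-8")
    except UnicodeError:  # covers UnicodeEncodeError and UnicodeDecodeError
        return text


def suspect_cells(csv_text: str) -> List[Tuple[int, str, str, str]]:
    """Return (record number, column header, current value, suggested repair).

    Record numbers count the header row as 1. Nothing is modified; the
    suggestions are meant for review before re-uploading the CSV to DSpace.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return []
    header, suspects = rows[0], []
    for record_num, row in enumerate(rows[1:], start=2):
        for column, cell in zip(header, row):
            suggestion = repair_cell(cell)
            if suggestion != cell:
                suspects.append((record_num, column, cell, suggestion))
    return suspects


def suspect_cells_in_file(path: str) -> List[Tuple[int, str, str, str]]:
    """Read a metadata CSV as UTF-8 (an assumption) and report suspect cells."""
    with open(path, encoding="utf-8", newline="") as f:
        return suspect_cells(f.read())
```

Reporting rather than silently rewriting is deliberate: a legitimate value that happens to survive the round trip would otherwise be changed without anyone noticing.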


(more…)

DSpace Functional Tests?

Sunday, April 4th, 2010

The Texas Digital Library has been focusing on testability for our projects. Since DSpace is related to, or part of, most of our projects, we’ve been looking for a way to increase DSpace’s testability. Traditionally this would mean adding unit tests and integration tests. However, as DSpace currently stands, it is hard to break it up into individual components that can be tested in isolation. You’ll quickly find that writing tests for DSpace pulls in the entire system, plus databases and a file system. To address this problem we’ve created a simple framework for adding both integration tests and functional tests, which improves the reliability of our projects. I’m interested to see whether this is something the greater DSpace community would find useful.

The goal of our project was to create a mechanism for running complete functional tests. Functional tests evaluate the entire system as the end user would use it, so think of it as opening a web browser and evaluating the output, but completely automated. They test everything together. Ideally it would be better to test each component individually, but this is impractical for DSpace for two reasons: 1) DSpace is highly integrated and nearly impossible to separate from the database and file system, and 2) creating unit tests for all of DSpace is very time-consuming, so it is simpler to write a few functional tests that cover a wide set of features across the whole application. This gets you to a point where you can reliably verify the software more quickly. If you’re working on unit tests for DSpace, please do not let this stand in your way.
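Our framework itself isn't shown in this excerpt. The core idea of a functional test, requesting pages from the running application the way an end user's browser would and checking what comes back, can be sketched with nothing but Python's standard library. This is a hypothetical illustration, not the TDL framework: `PageCheck`, `run_checks`, and the injectable `fetch` parameter are names invented here, and browser-level behavior (JavaScript, form submission, logins) would need a real browser-automation tool.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A fetcher takes a URL and returns (HTTP status code, response body).
Fetcher = Callable[[str], Tuple[int, str]]


def http_fetch(url: str, timeout: float = 10.0) -> Tuple[int, str]:
    """Request a page over HTTP, as a browser would (no JavaScript, no cookies)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still have a status and a body worth checking.
        return err.code, err.read().decode("utf-8", errors="replace")


@dataclass
class PageCheck:
    path: str             # path under the base URL, e.g. "/"
    expected_status: int  # e.g. 200, or 404 for a page that must not exist
    expected_text: str    # text an end user should see on the page


def run_checks(base_url: str, checks: List[PageCheck],
               fetch: Fetcher = http_fetch) -> List[str]:
    """Run each check against the running application; return failure messages."""
    failures: List[str] = []
    for check in checks:
        url = base_url.rstrip("/") + check.path
        try:
            status, body = fetch(url)
        except OSError as err:  # connection refused, DNS failure, timeout
            failures.append(f"{url}: request failed ({err})")
            continue
        if status != check.expected_status:
            failures.append(f"{url}: expected HTTP {check.expected_status}, got {status}")
        elif check.expected_text not in body:
            failures.append(f"{url}: page does not contain {check.expected_text!r}")
    return failures
```

Pointing `run_checks` at a test instance (for example, a hypothetical `http://localhost:8080/xmlui`) with a list of pages and expected text gives a quick pass/fail over the whole stack, database and asset store included. Passing a fake `fetch` lets the checker itself be exercised without a running server.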

(more…)