Posts Tagged ‘Tips’

Find All Restricted Items Within DSpace

Friday, August 19th, 2011

Here is an SQL query you can copy and paste to run against your DSpace database to find all items which have restricted access or contain bundles / bitstreams which are restricted. Restricted means that the object does not have an authorization policy enabling anonymous read.

It’s actually quite hard to find the absence of something with SQL. After trying various methods, the approach I settled on is a sub-select that counts how many anonymous access policies exist for each object and reports the objects where that count is zero. The query is broken down into three distinct parts, one for each object type. The results are then combined via PostgreSQL set operators and sub-selects (again!). This means that if you have a huge number of restricted items in your repository, the query might fail or take an obscene amount of time or memory to run. I tried using a left outer join but couldn’t get it to handle both the case where no access policies exist and the case where only non-anonymous access policies exist.
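
To illustrate the counting sub-select idea on a small scale, here is a minimal Python sketch using an in-memory SQLite database. It is not the DSpace query itself: the `item` and `policy` tables, their columns, and the use of group id 0 for "anonymous" are all invented for this example and are far simpler than the real DSpace schema.

```python
import sqlite3

# Toy schema, NOT the real DSpace tables: names and ids are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (item_id INTEGER PRIMARY KEY);
    CREATE TABLE policy (
        item_id  INTEGER,   -- object the policy applies to
        group_id INTEGER,   -- 0 stands in for the anonymous group
        action   TEXT       -- e.g. 'READ'
    );
    INSERT INTO item VALUES (1), (2), (3);
    INSERT INTO policy VALUES (1, 0, 'READ');  -- item 1: anonymous read
    INSERT INTO policy VALUES (2, 5, 'READ');  -- item 2: non-anonymous group only
    -- item 3 has no policies at all
""")

# Count the anonymous READ policies per item and keep the items with none.
restricted = [row[0] for row in conn.execute("""
    SELECT i.item_id
    FROM item i
    WHERE (SELECT COUNT(*)
           FROM policy p
           WHERE p.item_id = i.item_id
             AND p.group_id = 0
             AND p.action = 'READ') = 0
    ORDER BY i.item_id
""")]

print(restricted)  # [2, 3]
```

Both awkward cases are caught by the same test: item 2 has only a non-anonymous policy and item 3 has no policies at all, and in each case the count of anonymous read policies is zero.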

The approach used here is inelegant and has some serious performance problems. However, it worked for my immediate purpose: we had no idea how many or which items were restricted in our repository (answer: just under 300). This is a good candidate for a DSpace curation task that finds all items in a collection which have restricted access or, conversely, all items which are NOT restricted.

Preserving Character Encodings of a DSpace Metadata Export using MS Excel 2011 on OS X

Wednesday, July 20th, 2011

The problem I recently ran into was updating the metadata for a particular collection that was being moved from TDL’s repository into A&M’s repository. I was able to quickly move the collection into the new repository using OAI-PMH harvesting with ORE support. However, the metadata needed a bit of cleaning up for its new repository home, such as changing dc.contributor.author to dc.author and fixing inconsistent formats used in other fields. This is a perfect task for Stuart’s Bulk Metadata Export tool. This DSpace feature allows an administrator to download a Comma Separated Values (CSV) file of all the metadata in a particular collection, then open it in MS Excel and edit the metadata naturally. Finally, once the metadata is ready to go, you can upload it back to the repository and all the fields will be updated correctly. It is a very nice feature that can save a ton of time.

The Problem

When I opened the file in Excel, some of the characters were not showing up correctly. Specifically, the affected characters were those in titles and names that carry non-English diacritical marks; in this case they were all from the extended Latin character set. If you ignore these problems, then when you later upload the CSV file DSpace will treat the garbled characters as changes and introduce them into the repository.
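
One common way this kind of garbling arises is when a file saved as UTF-8 is read as if it were in a legacy single-byte encoding. The short Python sketch below reproduces that effect. It assumes the export is UTF-8 and the reader guesses Mac Roman; both are assumptions for illustration, and the author name is hypothetical.

```python
# Hypothetical metadata value containing extended Latin characters.
original = "José Muñoz"

# Assumption: the file is UTF-8 on disk but is decoded as Mac Roman.
garbled = original.encode("utf-8").decode("mac_roman")

# Reversing the wrong decoding recovers the text, as long as the
# garbled version has not been edited or re-saved in between.
repaired = garbled.encode("mac_roman").decode("utf-8")

print(garbled)
print(repaired == original)
```

Each accented letter takes two bytes in UTF-8, so it shows up as two unrelated characters when the bytes are read one at a time.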


Mixed XML and Property files?

Tuesday, February 1st, 2011

Have you ever wanted the simplicity of a plain old Java properties file but with just a little bit of grouping provided by XML? I’ve been working on a small side-project recently and it requires a simple configuration file of a dozen items or so. The project needed a repeatable set of configuration parameters, so that it could connect to several SVN servers. Each connection needed a URL, username, password, and a few other ancillary properties. This is a pain to do in a plain old properties file. You have to do something with the naming of the properties to relate them together, such as:

property.1.url = http://...
property.1.username = Bob
property.1.password = Bob’s secret

property.2.url = http://...
property.2.username = Joe
property.2.password = Joe’s secret  

This works, but it’s sort of annoying and can be confusing for someone else trying to understand what’s going on. They would likely need to read the documentation, especially if the file is more complex, with multiple types of repeating parameters. There are several alternatives. You could try encoding all the parameters into one property, but that’s even harder for a user to figure out. A slightly better alternative is to use something hierarchical like XML, thus:

<properties>
	<repeatable>
		<url>http://...</url>
		<username>Bob</username>
		<password>Bob’s secret</password>
	</repeatable>
	<repeatable>
		<url>http://...</url>
		<username>Joe</username>
		<password>Joe’s secret</password>
	</repeatable>
</properties>

This is easier to understand, but it’s very verbose. Each property is labeled twice, once to open the tag and again to close the tag. XML is good for complex things like HTML or specific file formats with a dedicated reader. However, XML is not great for humans to read, let alone edit quickly.

A better solution: combine both!

Instead of choosing either XML or a properties file, we can munge the two together to create something that is easier for users to manage.

property.one = value1
property.two = value2

<repeatable>
	url = http://...
	username = Bob
	password = Bob’s secret
</repeatable>

<repeatable>
	url = http://...
	username = Joe
	password = Joe’s secret
</repeatable>

The combined format is similar to Apache’s httpd configuration format where name/value pairs are also mixed with nestable elements. It’s very close to the simplicity of a plain old properties file, but has just enough expressivity to handle grouping of elements. It’s a win-win.
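
To give a feel for how little machinery the combined format needs, here is a minimal Python sketch of a reader for it. This is not the parser from the side-project: the function name `parse_mixed_config` and its behavior (one level of grouping, `#` comments, no escaping) are assumptions made for this example.

```python
def parse_mixed_config(text):
    """Read top-level 'name = value' pairs plus <tag>...</tag> groups."""
    top_level = {}
    groups = {}            # tag name -> list of dicts, one per block
    current_tag = None     # name of the group being read, if any
    current = None         # properties collected for that group
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue       # skip blank lines and comments
        if line.startswith("</") and line.endswith(">"):
            tag = line[2:-1].strip()
            if tag != current_tag:
                raise ValueError(f"unexpected closing tag: {line}")
            groups.setdefault(tag, []).append(current)
            current_tag, current = None, None
        elif line.startswith("<") and line.endswith(">"):
            if current_tag is not None:
                raise ValueError("nested groups are not handled in this sketch")
            current_tag, current = line[1:-1].strip(), {}
        else:
            name, sep, value = line.partition("=")
            if not sep:
                raise ValueError(f"expected 'name = value': {line}")
            target = top_level if current is None else current
            target[name.strip()] = value.strip()
    if current_tag is not None:
        raise ValueError(f"unclosed group: <{current_tag}>")
    return top_level, groups


SAMPLE = """
property.one = value1
property.two = value2

<repeatable>
    url = http://...
    username = Bob
</repeatable>

<repeatable>
    url = http://...
    username = Joe
</repeatable>
"""

top_level, groups = parse_mixed_config(SAMPLE)
print(top_level)
print([g["username"] for g in groups["repeatable"]])
```

Each `<repeatable>` block becomes its own dictionary, so the related properties stay together without any numbering scheme in the property names.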

In-place SVN Import

Monday, September 6th, 2010

I discovered an SVN trick today: how to do an in-place import into SVN. Normally when you run “svn import” it leaves your local files alone and creates a copy in the repository. Then you have to do an “svn checkout” to pull the files back down under version control.

The import/checkout process is normally a pain. However, there are a few instances where it’s a really big pain, such as Unix’s etc/ directory. You can’t just delete etc/ and check it out again from version control, or lots of stuff will break. The other place I’ve found this useful is Xcode, when starting new projects. Use the in-place import instead of Apple’s suggestion of creating two projects.

The process is quite simple. First, create an empty directory in the repository, then check out that empty directory into your existing location. Finally, add the new files and commit them to the repository.

svn mkdir https://your-svn-repo.com/new/directory/

svn checkout https://your-svn-repo.com/new/directory/ .

svn add *

svn commit -m "Initial in-place import of directory"