Find All Restricted Items Within DSpace

Here is an SQL query you can copy-and-paste into DSpace to find all items which have restricted access or contain bundles / bitstreams which are restricted. Restricted means that the object does not have an authorization policy enabling anonymous read.

It’s actually quite hard to find the absence of something with SQL. After trying various methods the way I came up with to solve this problem is a sub select that counts how many anonymous access policies exist for each object and if there are none then report those. The query is broken down into three distant parts one for each object time. Then all the objects are combined via PostgreSQL set operators and sub selects (again!). This means that if you have a huge number of restricted items in your repository the query might fail or take an obscene amount of time/memory to run. I tried using a left outer join but couldn’t get it to handle the case where both no access policies exists and only non anonymous access policies exist.

The approach used here is inelegant and has some serious performance problems. However it worked my immediate purpose. We had no idea how many or which items are restricted in our repository (answer: just under 300). This task is a good candidate for a DSpace curation task, to find all items in a collection which are have restricted access. Or the opposite, find all items which are NOT restricted.

Here is the code for you to cut and paste:

SELECT DISTINCT ON (handle) handle 
FROM handle 
WHERE 


handle IN 
(
-- ---------------------------------------
-- Select Items which are restricted
-- ---------------------------------------
SELECT DISTINCT ON (handle) handle 
FROM item 
INNER JOIN handle 
ON item.item_id = handle.resource_id AND handle.resource_type_id = 2 
WHERE 
( 
-- Count how many anonymous access policies exist for each item
SELECT count(*) 
FROM item AS item2anonymous, resourcepolicy AS rp 
WHERE 
item2anonymous.item_id = item.item_id AND 
item2anonymous.item_id = rp.resource_id AND 
rp.resource_type_id = 2 AND -- Type   = Item
rp.action_id = 0 AND        -- Action = Read
rp.epersongroup_id = 0      -- Group  = Anonymous
) 
< 1 
) 


OR handle IN 
( 
-- ---------------------------------------
-- Select Bundles which are restricted
-- ---------------------------------------
SELECT DISTINCT ON (handle) handle 
FROM item2bundle 
INNER JOIN bundle AS bun 
ON bun.bundle_id = item2bundle.bundle_id 
INNER JOIN handle 
ON item2bundle.item_id = handle.resource_id AND handle.resource_type_id = 2 
WHERE 
(
-- Count how many anonymous access policies exist for each bundle
SELECT count(*) 
FROM bundle AS bun2anonymous, resourcepolicy AS rp 
WHERE 
bun2anonymous.bundle_id = bun.bundle_id AND 
bun2anonymous.bundle_id = rp.resource_id AND 
rp.resource_type_id = 1 AND -- Type   = Bundle
rp.action_id = 0 AND        -- Action = Read
rp.epersongroup_id = 0      -- Group  = Anonymous
) 
< 1
) 


OR handle IN 
(
-- ---------------------------------------
-- Select Bitstreams which are restricted
-- ---------------------------------------
SELECT DISTINCT ON (handle) handle 
FROM bitstream AS bit 
INNER JOIN bundle2bitstream 
ON bit.bitstream_id = bundle2bitstream.bitstream_id 
INNER JOIN item2bundle 
ON bundle2bitstream.bundle_id = item2bundle.bundle_id 
INNER JOIN handle 
ON item2bundle.item_id = handle.resource_id AND handle.resource_type_id = 2 
WHERE 
(
-- Count how many anonymous access policies exist for each bitstream
SELECT count(*) 
FROM bitstream AS bit2anonymous, resourcepolicy AS rp 
WHERE 
bit2anonymous.bitstream_id = bit.bitstream_id AND 
bit2anonymous.bitstream_id = rp.resource_id AND 
rp.resource_type_id = 0 AND -- Type = Item
rp.action_id = 0 AND        -- Action = Read
rp.epersongroup_id = 0      -- Group = Anonymous
) 
< 1


)

comments powered by Disqus