Find All Restricted Items Within DSpace
Here is an SQL query you can copy-and-paste into DSpace to find all items which have restricted access or contain bundles / bitstreams which are restricted. Restricted means that the object does not have an authorization policy enabling anonymous read.
It’s actually quite hard to find the absence of something with SQL. After trying various methods the way I came up with to solve this problem is a sub select that counts how many anonymous access policies exist for each object and if there are none then report those. The query is broken down into three distant parts one for each object time. Then all the objects are combined via PostgreSQL set operators and sub selects (again!). This means that if you have a huge number of restricted items in your repository the query might fail or take an obscene amount of time/memory to run. I tried using a left outer join but couldn’t get it to handle the case where both no access policies exists and only non anonymous access policies exist.
The approach used here is inelegant and has some serious performance problems. However it worked my immediate purpose. We had no idea how many or which items are restricted in our repository (answer: just under 300). This task is a good candidate for a DSpace curation task, to find all items in a collection which are have restricted access. Or the opposite, find all items which are NOT restricted.
Here is the code for you to cut and paste:
SELECT DISTINCT ON (handle) handle
FROM handle
WHERE
handle IN
(
-- ---------------------------------------
-- Select Items which are restricted
-- ---------------------------------------
SELECT DISTINCT ON (handle) handle
FROM item
INNER JOIN handle
ON item.item_id = handle.resource_id AND handle.resource_type_id = 2
WHERE
(
-- Count how many anonymous access policies exist for each item
SELECT count(*)
FROM item AS item2anonymous, resourcepolicy AS rp
WHERE
item2anonymous.item_id = item.item_id AND
item2anonymous.item_id = rp.resource_id AND
rp.resource_type_id = 2 AND -- Type = Item
rp.action_id = 0 AND -- Action = Read
rp.epersongroup_id = 0 -- Group = Anonymous
)
< 1
)
OR handle IN
(
-- ---------------------------------------
-- Select Bundles which are restricted
-- ---------------------------------------
SELECT DISTINCT ON (handle) handle
FROM item2bundle
INNER JOIN bundle AS bun
ON bun.bundle_id = item2bundle.bundle_id
INNER JOIN handle
ON item2bundle.item_id = handle.resource_id AND handle.resource_type_id = 2
WHERE
(
-- Count how many anonymous access policies exist for each bundle
SELECT count(*)
FROM bundle AS bun2anonymous, resourcepolicy AS rp
WHERE
bun2anonymous.bundle_id = bun.bundle_id AND
bun2anonymous.bundle_id = rp.resource_id AND
rp.resource_type_id = 1 AND -- Type = Bundle
rp.action_id = 0 AND -- Action = Read
rp.epersongroup_id = 0 -- Group = Anonymous
)
< 1
)
OR handle IN
(
-- ---------------------------------------
-- Select Bitstreams which are restricted
-- ---------------------------------------
SELECT DISTINCT ON (handle) handle
FROM bitstream AS bit
INNER JOIN bundle2bitstream
ON bit.bitstream_id = bundle2bitstream.bitstream_id
INNER JOIN item2bundle
ON bundle2bitstream.bundle_id = item2bundle.bundle_id
INNER JOIN handle
ON item2bundle.item_id = handle.resource_id AND handle.resource_type_id = 2
WHERE
(
-- Count how many anonymous access policies exist for each bitstream
SELECT count(*)
FROM bitstream AS bit2anonymous, resourcepolicy AS rp
WHERE
bit2anonymous.bitstream_id = bit.bitstream_id AND
bit2anonymous.bitstream_id = rp.resource_id AND
rp.resource_type_id = 0 AND -- Type = Item
rp.action_id = 0 AND -- Action = Read
rp.epersongroup_id = 0 -- Group = Anonymous
)
< 1
)