Hello,
We at PAHMA have developed a tool that I'd like to let the community know
about, in case other institutions out there would like to use it or would
like to work with us to improve it. The tool in question produces nightly
emailed reports that detail the changes that have been made to CSpace data
over the past day.
We've had a couple of situations over the past year where we learned
belatedly that some users were working from incorrect assumptions and
making improper and/or inaccurate changes to data. I didn't notice these
systematic errors until the problem had grown large enough to affect the
collections-wide statistics I also gather every night (another tool I'd
be happy to discuss or share, if you're interested).
In an effort to discover these mistakes before they escalate to such a
large scale, we've designed and implemented a Python tool that is triggered
by a nightly cron job. This tool (which we're provisionally calling the
"CSpace change monitor") runs a series of simple, user-defined queries that
each return a key-value pair for every record of a particular type (we've
started with just collection objects, but will expand to other record
types).
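As a rough illustration of the query step (a sketch only, not the tool's actual code), each user-defined query can be treated as SQL that returns two columns, a record key and a field value, with the results saved to a dated file. Everything named here is an assumption made for the example: the `objects` table, its columns, the `QUERIES` dictionary, the CSV snapshot format, and the in-memory SQLite database standing in for the real CSpace database.

```python
import csv
import sqlite3
import tempfile
from datetime import date
from pathlib import Path

# Hypothetical query definitions: report name -> SQL returning (key, value).
QUERIES = {
    "object_name": "SELECT object_number, object_name FROM objects",
}


def run_query(conn, sql):
    """Run one query and return {record key: field value} for every row."""
    return {
        str(key): "" if value is None else str(value)
        for key, value in conn.execute(sql)
    }


def write_snapshot(path, pairs):
    """Write the key-value pairs to a CSV file, sorted by key."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        for key in sorted(pairs):
            writer.writerow([key, pairs[key]])


def read_snapshot(path):
    """Load a snapshot file back into a {key: value} dictionary."""
    with open(path, newline="", encoding="utf-8") as handle:
        return {row[0]: row[1] for row in csv.reader(handle)}


# Stand-in database with two made-up object records.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (object_number TEXT, object_name TEXT)")
conn.executemany(
    "INSERT INTO objects VALUES (?, ?)",
    [("1-100", "Basket"), ("1-101", "bowl")],
)

with tempfile.TemporaryDirectory() as folder:
    for name, sql in QUERIES.items():
        pairs = run_query(conn, sql)
        path = Path(folder) / f"{name}_{date.today().isoformat()}.csv"
        write_snapshot(path, pairs)
        restored = read_snapshot(path)
```

Keeping one file per query per day means that the comparison step only ever needs two small files, and adding a new field to monitor is just one more entry in the query dictionary.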
The results of each of these queries are written to files each night, and
then the current day's results are compared to the previous day's results.
Any differences found are written to another file, and then these
differences are run through some simple logic to filter out differences
that have been deemed to be unremarkable (e.g., capitalization differences
in Object name).
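The comparison and filtering can be sketched roughly as follows, assuming each day's results have been loaded into a dictionary keyed by record identifier. The function names and sample records are invented for illustration, and the only filter rule shown is the capitalization case mentioned above; the tool's real rules may differ.

```python
def diff_snapshots(previous, current):
    """Return (key, old, new) for every record whose value differs.

    A value of None means the record was absent on that day.
    """
    changes = []
    for key in sorted(previous.keys() | current.keys()):
        old = previous.get(key)
        new = current.get(key)
        if old != new:
            changes.append((key, old, new))
    return changes


def is_unremarkable(old, new):
    """Treat a change as unremarkable if only the capitalization differs."""
    if old is None or new is None:
        return False  # added or removed records are always reported
    return old.casefold() == new.casefold()


# Made-up sample data for two consecutive nights.
yesterday = {"1-100": "Basket", "1-101": "bowl", "1-102": "Mask"}
today = {"1-100": "basket", "1-101": "Bowl, wooden", "1-103": "Spoon"}

changes = diff_snapshots(yesterday, today)
remaining = [c for c in changes if not is_unremarkable(c[1], c[2])]
```

In this sample, the change to record 1-100 is dropped because it differs only in case, while the edit to 1-101, the disappearance of 1-102, and the new record 1-103 are all kept for the report.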
The remaining differences are then formatted nicely and emailed off to the
indicated recipient(s) for examination.
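One possible shape for the reporting step, using only Python's standard library, is shown below. The report layout, the addresses, and the assumption of a mail server on localhost are all placeholders for illustration rather than details of the actual tool.

```python
import smtplib
from email.message import EmailMessage


def format_report(field, changes):
    """Render one field's changes as plain text, one line per record."""
    lines = [f"Changes to {field}: {len(changes)}", ""]
    for key, old, new in changes:
        lines.append(f"{key}: {old!r} -> {new!r}")
    return "\n".join(lines)


def build_message(sender, recipients, subject, body):
    """Build the email without sending it."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_message(msg, host="localhost"):
    """Send via an SMTP server; the host here is an assumption."""
    with smtplib.SMTP(host) as server:
        server.send_message(msg)


# Made-up differences that survived filtering.
changes = [("1-101", "bowl", "Bowl, wooden"), ("1-103", None, "Spoon")]
body = format_report("Object name", changes)
msg = build_message(
    "monitor@example.org",
    ["registrar@example.org", "collections@example.org"],
    "CSpace change monitor: nightly report",
    body,
)
# send_message(msg)  # left commented out; requires a reachable mail server
```

Separating message construction from sending makes it easy to inspect or log the report before anything is emailed.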
Looking over the results takes less than a minute, and any suspicious edits
are quickly spotted so that they can be dealt with.
Some planned or desired improvements:
• make the report web-based and password-protected, with links to the
records in question
• include data about who made the changes (rather tricky, given CSpace's
current audit capabilities)
• increase the number of fields being checked (each field being checked adds
about 20 seconds to the overall execution time, in a database with 650,000
object records)
• enhance the logic for filtering out unremarkable changes
Let me know if you're interested in getting more details about the tool and
its code, or if you're interested in partnering up to work on improving
this tool.
Thanks,
Michael Black
Head, Research & Information Systems
Phoebe A. Hearst Museum of Anthropology