Thank you, Susan, Aron, and Chris. In particular, a shout-out to Aron for writing up a great list of things to look at for improving performance. This certainly generated more of a discussion than I anticipated -- my inquiries seem to do that -- and I appreciate everyone's time in writing out their thoughts. In particular, I thought PostgreSQL's AutoVacuum process would take care of things for me, but during these large record loads it may make sense to run VACUUM more often than the automatic process would dictate. Knowing about the anchors, as Chris called out, is particularly useful as well.
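For the list's benefit, here is roughly what that manual pass might look like -- a minimal Python/psycopg2 sketch, where the database name, user, and table name are placeholders rather than anything CollectionSpace-specific:

    # Minimal sketch; connection details and table name are placeholders.
    import psycopg2

    conn = psycopg2.connect(dbname="cspace", user="postgres")
    conn.autocommit = True  # VACUUM cannot run inside a transaction block
    with conn.cursor() as cur:
        # Reclaim dead rows and refresh planner statistics after a bulk load.
        cur.execute("VACUUM ANALYZE;")
        # Optionally make autovacuum more aggressive on a heavily loaded
        # table ("collectionobjects_common" is a hypothetical name here).
        cur.execute(
            "ALTER TABLE collectionobjects_common "
            "SET (autovacuum_vacuum_scale_factor = 0.05);"
        )
    conn.close()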
I am intrigued by Susan's comment about using external vocabularies. There seems to be foresight in the data model to be able to do that, and hooking into something like the Getty's Linked Open Data SPARQL endpoint would certainly be nice to do. As with most open source, though, if only someone had the time to scratch that itch...
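Should anyone ever find that time, I'd guess the lookup would start out something like this (untested; the endpoint is Getty's public SPARQL service, and the label path follows their SKOS-XL ontology, so the details may need adjusting):

    # Untested sketch: look up a term in Getty's AAT via their SPARQL endpoint.
    import requests

    ENDPOINT = "http://vocab.getty.edu/sparql"
    QUERY = """
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    PREFIX gvp:  <http://vocab.getty.edu/ontology#>
    PREFIX xl:   <http://www.w3.org/2008/05/skos-xl#>
    SELECT ?concept ?label WHERE {
      ?concept skos:inScheme <http://vocab.getty.edu/aat/> ;
               gvp:prefLabelGVP/xl:literalForm ?label .
      FILTER(CONTAINS(LCASE(STR(?label)), "watercolor"))
    }
    LIMIT 10
    """

    resp = requests.get(
        ENDPOINT,
        params={"query": QUERY},
        headers={"Accept": "application/sparql-results+json"},
    )
    resp.raise_for_status()
    for row in resp.json()["results"]["bindings"]:
        print(row["concept"]["value"], row["label"]["value"])

(Getty's documentation suggests their luc:term full-text extension for label searches, which would be much faster than a FILTER over every label, but the shape above is the general idea.)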
Peter
> On Nov 16, 2015, at 6:29 PM, Aron Roberts <aron@socrates.berkeley.edu> wrote:
>
> This will likely repeat a lot of what John, Ray, and Richard have said, but my two cents as well ... there's also a note here from a discussion with Chris about search training.
>
> A summary / tl;dr:
>
> 1. Elasticsearch integration may be a future help in speeding up large searches. Until then (and likely even after then ...)
> 2. Training users on doing effective searching can significantly improve search speed.
> 3. Using just a subset of term lists can help (if it's amenable to that; it may not be here ...).
> 4. Throwing hardware at the problem - particularly fast disk - can help quite a lot.
> 5. Be sure that the database is cared for/tuned a bit:
> 5.1 Run VACUUM/ANALYZE
> 5.2 Look at other tuning tips
>
> Aron
--
Peter Murray
Dev/Ops Lead and Project Manager
Cherry Hill Company