talk@lists.collectionspace.org

WE HAVE SUNSET THIS LISTSERV - Join us at collectionspace@lyrasislists.org


Limit to the number of authorities in a CSpace database?

Peter Murray
Mon, Nov 16, 2015 2:17 AM

Is there a limit to the number of authorities in a CollectionSpace database?  On behalf of SDMoM, I'm loading extracts of the Bureau of Geographic Names (211,330 headings) and the National Geospatial-Intelligence Agency (191,549 headings).  After doing this, the system ground to a halt on operations such as browsing the headings.  I could see that there were no errors logged in either PostgreSQL's or Tomcat's logs (or the CSpace application logs).  The 5-minute load average on the system was just below 2.0, and both PostgreSQL and Tomcat were consuming a lot of CPU.  I'm not an expert in database tuning, but I did follow the steps offered in the installation guide (https://wiki.collectionspace.org/display/DOC/PostgreSQL+Installation+under+Linux#PostgreSQLInstallationunderLinux-Tuning).

Anyone else running a system with 400,000 authority headings?

Peter

Peter Murray
Dev/Ops Lead and Project Manager
Cherry Hill Company

John B Lowe
Mon, Nov 16, 2015 9:07 PM

Peter,

I just checked, and the largest authority in the UCB deployments appears
to be the Taxon authority used in the UCJEPS Herbaria deployment.  The
taxon_common table has 309,352 non-deleted rows at the moment. I think
this corresponds pretty closely to "nodes in the tree", and perhaps
therefore to "authority terms".
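[Editor's note: the non-deleted-row count mentioned above can be sketched as a query. The snippet below mocks a Nuxeo-style schema in SQLite (a per-doctype table such as taxon_common joined to a misc table holding lifecyclestate); the table and column names are assumptions for illustration only, not verified against any particular CollectionSpace release.]

```python
import sqlite3

# Mock of a Nuxeo-style schema: taxon_common holds the authority terms,
# misc holds the lifecycle state ('project' = active, 'deleted' = soft-deleted).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE taxon_common (id TEXT PRIMARY KEY, taxonfullname TEXT);
    CREATE TABLE misc (id TEXT PRIMARY KEY, lifecyclestate TEXT);
""")
rows = [("1", "Quercus agrifolia", "project"),
        ("2", "Quercus lobata", "project"),
        ("3", "Quercus dumosa", "deleted")]
for rid, name, state in rows:
    conn.execute("INSERT INTO taxon_common VALUES (?, ?)", (rid, name))
    conn.execute("INSERT INTO misc VALUES (?, ?)", (rid, state))

# Count only the non-deleted terms, i.e. the kind of figure quoted above.
(count,) = conn.execute("""
    SELECT COUNT(*)
    FROM taxon_common t
    JOIN misc m ON m.id = t.id
    WHERE m.lifecyclestate <> 'deleted'
""").fetchone()
print(count)  # 2
```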

I just tried an advanced search on this authority on our Dev instance and
after about 6 minutes I did get a result page.  Redoing the search took about
30 seconds -- of course much caching had been done by that point. (Tomcat
was not busy on that server, but I can't see the load on Postgres as we
don't have direct access to that server).

I'll try this on UCJEPS Production after hours (I'm interested in the
outcome...).

John


Ray Lee
Mon, Nov 16, 2015 9:25 PM

Hi Peter,
I don't think there are hard limits, but once you get that many records of
any type (it could be authority items, procedures, or collection objects),
performance takes a dive. We have to train users to avoid browsing all
objects, taxon names, or any other type that has a lot of records, because
the results won't show up in a reasonable time.
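[Editor's note: the "avoid browsing everything" advice comes down to how list pages are fetched. As a sketch only (SQLite stands in for PostgreSQL, and this is not CollectionSpace's actual list query), keyset pagination keeps each page cheap at any depth, whereas OFFSET-based paging must walk past every skipped row.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE headings (id INTEGER PRIMARY KEY, term TEXT)")
conn.executemany("INSERT INTO headings VALUES (?, ?)",
                 [(i, f"term-{i:06d}") for i in range(1, 100_001)])
conn.execute("CREATE INDEX ix_term ON headings(term)")

def page_after(last_term, size=40):
    """Keyset pagination: seek directly to the next page via the index,
    instead of OFFSET, which scans and discards all preceding rows."""
    return conn.execute(
        "SELECT term FROM headings WHERE term > ? ORDER BY term LIMIT ?",
        (last_term, size)).fetchall()

first = page_after("")              # first page
second = page_after(first[-1][0])   # resume from the last term seen
print(first[0][0], second[0][0])    # term-000001 term-000041
```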

Ray


Richard Millet
Mon, Nov 16, 2015 9:41 PM

Peter,

Ray and John have the most experience with large data sets, so they might be able to provide some guidance on whether or not allocating more resources (e.g., RAM, CPU power, faster disks such as SSDs) to Postgres and/or Tomcat will help.

Also, from the functional side, I'm wondering whether the full set of terms from the Bureau of Geographic Names (211,330 headings) and the National Geospatial-Intelligence Agency (191,549 headings) is required.  Would a subset suffice?

Richard



Peter Murray
Mon, Nov 16, 2015 10:04 PM

Thanks for posting your observations, John.  It was definitely upwards of 5-6 minutes to get any sort of response from the API endpoint.  The web GUI just timed out after a while.  I'm curious to hear what happens on your production server, and maybe we could then compare notes on JVM and Postgres configurations.

Peter


Peter Murray
Mon, Nov 16, 2015 10:26 PM

Hmm -- that does reflect what I'm seeing, so it is comforting to know that I'm not too far out in left field.

Peter


Peter Murray
Mon, Nov 16, 2015 10:31 PM

That is a useful functional question.  The issue is that the datasets themselves do not offer many knobs for fine-tuning how they can be sliced.  BGN has a column for 'population' but it is no longer, well, populated -- so BGN would need to be cross-referenced with something else (probably census data?) to remove a lot of the clutter.  NGA provides a slightly more helpful 'display' field, with values between 1 and 9 representing the map scale at which an entry should appear.  In practice, though, it doesn't cut out much of the cruft.
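[Editor's note: a subset along the lines Peter describes could be cut with a small filter script. This sketch invents a toy tab-separated layout; only the 'display' field (values 1-9, coarser map scales first) comes from the message above, and the real NGA extract's columns will differ.]

```python
import csv
import io

# Toy stand-in for an NGA extract; the real file has many more columns.
sample = """name\tdisplay
Springfield\t3
Small Creek\t8
Capital City\t1
"""

def subset(text, max_display=5):
    """Keep only entries meant to appear at coarser map scales
    (display <= max_display), dropping fine-grained clutter."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [row["name"] for row in reader if int(row["display"]) <= max_display]

print(subset(sample))  # ['Springfield', 'Capital City']
```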

If there are other (free) sources of place names, I'm open to trying them out...

Peter


SS
Susan STONE
Mon, Nov 16, 2015 11:13 PM

We used to think that by now we would have
progressed to the point where we wouldn't
have to load whole "standard" vocabularies from outside sources.
If I remember correctly, that was the origin of the refname
syntax that we love so well: we could point to terms in
external vocabularies. But no one has done anything to
hook any of them up to cspace.

Susan

On Mon, Nov 16, 2015 at 2:31 PM, Peter Murray pmurray@chillco.com wrote:

That is a useful functional question.  The issue is that the datasets by
themselves do not offer many knobs to fine-tune how they can be sliced.  BGN
has a column for 'population' but it is no longer, well, populated -- so BGN
would need to get cross-referenced with something else (probably census
data?) to remove a lot of the clutter.  NGA provides a slightly more helpful
'display' field with values between 1 and 9 that represent the map scale at
which the entry should appear.  In practice, though, it doesn't cut out much
of the cruft.
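[Editor's note: a rough sketch of that kind of scale-based trimming. The field name 'display' comes from Peter's description above, but the row layout and the assumption that a lower rank marks a more prominent place are illustrative guesses, not the documented NGA extract format.]

```python
# Hypothetical sketch: trim an NGA extract by its 'display' field
# (values 1-9). Rows whose rank is missing or above the cutoff are dropped.
def filter_by_display(rows, max_rank=5):
    """Keep rows whose display rank parses as an int <= max_rank."""
    kept = []
    for row in rows:
        raw = str(row.get("display", "")).strip()
        if raw.isdigit() and int(raw) <= max_rank:
            kept.append(row)
    return kept

sample = [
    {"name": "San Diego", "display": "1"},
    {"name": "Backcountry Spring", "display": "9"},
    {"name": "Unranked Place", "display": ""},
]
print([r["name"] for r in filter_by_display(sample)])  # -> ['San Diego']
```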

If there are other (free) sources of place names, I'm open to trying them
out...

Peter

On Nov 16, 2015, at 4:41 PM, Richard Millet richard.millet@lyrasis.org
wrote:

Peter,

Ray and John, who have the most experience with large data sets, might be
able to provide some guidance on whether allocating more resources (e.g.,
RAM, CPU power, faster disks (SSD)) to Postgres and/or Tomcat will help.

Also, from the functional side, I'm wondering if the full set of terms from
the Bureau of Geographic Names (211,330 headings) and the National
Geospatial-Intelligence Agency (191,549 headings) are required?  Would a
subset suffice?

Richard


From: Talk talk-bounces@lists.collectionspace.org on behalf of Ray Lee
rhlee@berkeley.edu
Sent: Monday, November 16, 2015 1:25 PM
To: John B Lowe
Cc: CollectionSpace Talk List
Subject: Re: [Talk] Limit to the number of authorities in a CSpace database?

Hi Peter,
I don't think there are hard limits, but once you get that many records of
any type (it could be authority items, procedures, or collection objects),
performance takes a dive. We have to train users to avoid browsing all
objects, taxon names, or any other type that has a lot of records, because
the results won't show up in a reasonable time.

Ray

On Mon, Nov 16, 2015 at 1:07 PM, John B Lowe jblowe@berkeley.edu wrote:

Peter,

I just checked, and the largest authority in the UCB deployments appears
to be the Taxon authority used in the UCJEPS Herbaria deployment.  The
taxon_common table has 309,352 non-deleted rows at the moment. I think
this corresponds pretty closely to "nodes in the tree", and perhaps therefore
to "authority terms".

I just tried an advanced search on this authority on our Dev instance and
after about 6 minutes I did get a result page.  Redoing the search took about
30 seconds -- of course much caching had been done by that point. (Tomcat
was not busy on that server, but I can't see the load on Postgres as we
don't have direct access to that server).

I'll try this on UCJEPS Production after hours (I'm interested in the
outcome...).

John

On Sun, Nov 15, 2015 at 6:17 PM, Peter Murray pmurray@chillco.com wrote:

Is there a limit to the number of authorities in a CollectionSpace
database?  On behalf of SDMoM, I'm loading extracts of the Bureau of
Geographic Names (211,330 headings) and the National Geospatial-Intelligence
Agency (191,549 headings).  After doing this, the system ground to a halt on
operations such as browsing the headings.  I could see that there were no
errors logged in either PostgreSQL's or Tomcat's logs (or the CSpace
application logs).  The 5-minute load average on the system was just below
2.0, and both PostgreSQL and Tomcat were consuming a lot of CPU.  I'm not an
expert in database tuning, but I did follow the steps offered in the
installation guide
(https://wiki.collectionspace.org/display/DOC/PostgreSQL+Installation+under+Linux#PostgreSQLInstallationunderLinux-Tuning).

Anyone else running a system with 400,000 authority headings?

Peter

--
Peter Murray
Dev/Ops Lead and Project Manager
Cherry Hill Company


Talk mailing list
Talk@lists.collectionspace.org
http://lists.collectionspace.org/mailman/listinfo/talk_lists.collectionspace.org

AR
Aron Roberts
Mon, Nov 16, 2015 11:29 PM

This will likely repeat a lot of what John, Ray, and Richard have said,
but my two cents as well ... there's also a note here from a discussion
with Chris about search training.

A summary / tl;dr:

  1. Elasticsearch integration may be a future help in speeding up large
    searches. Until then (and likely even after then), the items below still apply.
  2. Training users on doing effective searching can significantly improve
    search speed.
  3. Using just a subset of term lists can help (if it's amenable to that;
    it may not be here ...).
  4. Throwing hardware at the problem - particularly fast disk - can help
    quite a lot.
  5. Be sure that the database is cared for/tuned a bit:
    5.1 Run VACUUM/ANALYZE
    5.2 Look at other tuning tips

Aron

--

  1. Elasticsearch

We're hoping that when CollectionSpace offers the possibility to direct
certain searches to use Elasticsearch indexes (
https://issues.collectionspace.org/browse/CSPACE-6760), rather than
querying the database, some of its slowest searches can be significantly
accelerated.

That remains to be seen, but our experiences (pioneered by John Lowe)
with using Solr (roughly comparable to Elasticsearch, and both based on
Apache Lucene) to handle searches in some of the CollectionSpace webapps
give significant hope here.

  2. Effective searching

Until then, as Ray noted, it's desirable to avoid searching/browsing in
the most problematic "all records" cases. More broadly, some thoughts from
Chris and Ray from another context ...

Chris wrote:
"Definitely don’t underestimate the importance of educating your users
about ways to improve searches (e.g., usually by including as many
characters as possible, though admittedly sometimes you just have to search
for something having few characters)."

And Ray noted:
"As Chris said, we've handled this with training to avoid slow searches,
and designing workflows that avoid them."

Chris just came by my cube and offered a potentially important suggestion
regarding this: if these authority terms will mostly be searched via
autocomplete / term completion fields, using anchor characters in searches
(rather than relying on default behavior) can significantly speed up
searching. Worth a try with your large term lists ... will be interested to
know of your experience here.

Here's a quick look at using anchor and/or wildcard characters in
autocomplete, a feature which has been around for several CollectionSpace
releases now:

https://wiki.collectionspace.org/display/collectionspace/Proposed+autocomplete+enhancements+for+Release+3.3
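[Editor's note: as an illustration only -- this is not CollectionSpace's actual code -- the speed difference comes down to what SQL pattern the anchors produce. A start-anchored term can become a prefix pattern that a btree index (with text_pattern_ops) can serve, while an unanchored term becomes a '%term%' pattern that generally forces a scan. A hypothetical translation:]

```python
# Illustrative sketch of turning a user's autocomplete entry, with
# optional '^' anchors at the start and/or end, into a SQL LIKE pattern.
def to_like_pattern(user_input: str) -> str:
    anchored_start = user_input.startswith("^")
    anchored_end = user_input.endswith("^") and len(user_input) > 1
    pattern = user_input.strip("^")
    if not anchored_start:
        pattern = "%" + pattern   # unanchored start: match anywhere
    if not anchored_end:
        pattern = pattern + "%"   # unanchored end: match any suffix
    return pattern

print(to_like_pattern("San Di"))   # -> '%San Di%' (index-unfriendly)
print(to_like_pattern("^San Di"))  # -> 'San Di%'  (prefix, index-friendly)
```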

To understand better whether this would be helpful, when you wrote,
"After doing this, the system ground to a halt on operations such as
browsing the headings ...", what operation(s) were you performing, exactly?

  3. Subsets of terms

Consider subsets of your full term list (as per Richard), if feasible.
(Just read your follow-up here, about how these two particular term lists,
BGN and NGA, don't lend themselves well to this ...)

  4. Hardware upgrades (especially fast disk)

Strongly consider hardware upgrades, particularly fast disk (e.g. SSD).

AIUI, CollectionSpace's search performance is database-bound, and that,
in turn, is generally disk-bound. (Adding RAM to your database server can
also help out with its caching of indexes and in helping the OS in caching
files. But starting out with fast disk, rather than additional memory, as
the first incremental upgrade, seems from my colleagues' findings to be the
way to approach this.)

Per a long and fruitful email discussion - unfortunately not on the Talk
list, sigh - that a number of us had with Al and Becky of OMCA, Susan
Stone, et al. on May 20-21, 2015 (and from which the quotes above came),
some snippets/conclusions around this:

Ray:
"I just did a wildcard object number search on pahma that returned 14,000
records, and it took about 4 minutes. That's on our current, non-SSD
database. On the SSD that query runs in 30 seconds."

John:
"We have 15GB RAM each for our production and dev postgres servers. It was
recently noted that one custom query was taking 6.8 hours on Dev, but
only 25 minutes on Prod. The difference is SSD."

  5. Look at database cleanup/tuning

5.1. Run ANALYZE or VACUUM ANALYZE

Run ANALYZE or VACUUM ANALYZE to help PostgreSQL better understand its
data and improve its query planning:

http://www.postgresql.org/docs/current/static/sql-analyze.html
and
http://www.postgresql.org/docs/current/static/sql-vacuum.html
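[Editor's note: for example, after a bulk import these are standard PostgreSQL commands, run from psql as a superuser or the database owner; the second targets the taxon_common table John mentioned above.]

```sql
-- Refresh planner statistics (and reclaim dead rows) database-wide
VACUUM ANALYZE;

-- Or target a single large table
ANALYZE VERBOSE taxon_common;
```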

5.2 Tuning

You might also take a quick look at the database tuning discussions in
this JIRA comment, and the ones that follow:

https://issues.collectionspace.org/browse/PAHMA-708?focusedCommentId=35282&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-35282

These are a bit more heuristic, and provide more explanation, than the
rote instructions in the Tuning section of the current docs.
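[Editor's note: by way of illustration, the knobs those discussions revolve around live in postgresql.conf; the values below are placeholders to show the shape, not recommendations for any particular server.]

```
# postgresql.conf -- illustrative values only; size these to your own RAM
shared_buffers = 2GB            # Postgres's own buffer cache
effective_cache_size = 6GB      # planner's estimate of OS + PG caching
work_mem = 32MB                 # per-sort / per-hash-join memory
maintenance_work_mem = 512MB    # used by VACUUM, CREATE INDEX, etc.
```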

Ray's wrapup of much of this, from that discussion with the OMCA folks:

"I would say that searches can still be very slow for us, and we haven't
had time to look into it further. We've just tried to train users to avoid
searches that return a lot of results. I've mostly been waiting for the
Nuxeo/ElasticSearch integration to arrive, which didn't make it to CSpace
4.2, but hopefully will make it into the following version. That will help
keyword search, but maybe not term completion or field search.

"Throwing hardware at the problem definitely helps. Add as much RAM to the
db server as you can, and configure pg_buffercache to take advantage of it.
Put the database on the fastest disk you can. We're going to put it on SSD
soon, which will help a lot. Also make sure to run ANALYZE after importing
data, so your stats are up to date."

Aron

On Mon, Nov 16, 2015 at 2:26 PM, Peter Murray pmurray@chillco.com wrote:

Hmm -- that does reflect what I'm seeing, so it is comforting to know that
I'm not too far out in left field.

Peter


CH
Chris Hoffman
Mon, Nov 16, 2015 11:45 PM

Thanks, Aron!

Peter, I just want to highlight this from Aron’s note:

Chris just came by my cube and offered a potentially important suggestion regarding this: if these authority terms will mostly be searched via autocomplete / term completion fields, using anchor characters in searches (rather than relying on default behavior) can significantly speed up searching. Worth a try with your large term lists ... will be interested to know of your experience here.

Here's a quick look at using anchor and/or wildcard characters in autocomplete, a feature which has been around for several CollectionSpace releases now:

https://wiki.collectionspace.org/display/collectionspace/Proposed+autocomplete+enhancements+for+Release+3.3

We've found with our deployments that anchoring the term completion search, using the caret (^) to stand for the beginning or end of the string, is an absolute necessity. This helps search speed AND it addresses another critical problem: by default, term completion returns only 40 matches, in the effectively random order of a keyword search.  If your search term matches more than 40 results (which is likely in a large vocabulary), you might need to anchor the search in order to retrieve the item of interest.  We have had people create multiple versions of the same term because they could never retrieve the one stored in the system.

Regards,
Chris

On Nov 16, 2015, at 3:29 PM, Aron Roberts aron@socrates.berkeley.edu wrote:

This will likely repeat a lot of what John, Ray, and Richard have said, but my two cents as well ... there's also a note here from a discussion with Chris about search training.

A summary / tl;dr:

  1. Elasticsearch integration may be a future help in speeding up large searches. Until then (and likely even after then ...
  2. Training users on doing effective searching can significantly improve search speed.
  3. Using just a subset of term lists can help (if it's amenable to that; it may not be here ...).
  4. Throwing hardware at the problem - particularly fast disk - can help quite a lot.
  5. Be sure that the database is cared for/tuned a bit:
    5.1 Run VACUUM/ANALYZE
    5.2 Look at other tuning tips

Aron

--

  1. Elasticsearch

We're hoping that when CollectionSpace offers the possibility to direct certain searches to use Elasticsearch indexes (https://issues.collectionspace.org/browse/CSPACE-6760 https://issues.collectionspace.org/browse/CSPACE-6760), rather than querying the database, some of its slowest searches can be significantly accelerated.

That remains to be seen, but our experiences (pioneered by John Lowe) with using Solr (roughly comparable to Elasticsearch, and both based on Apache Lucene) to handle searches in some of the CollectionSpace webapps, gives significant hope here.

  1. Effective searching

Until then, as Ray noted, it's desirable to avoid searching/browsing in the most problematic "all records" cases. More broadly, some thoughts from Chris and Ray from another context ...

Chris wrote:
"Definitely don’t underestimate the importance of educating your users about ways to improve searches (e.g., usually by including as many characters as possible, though admittedly sometimes you just have to search for something having few characters)."

And Ray noted:
"As Chris said, we've handled this with training to avoid slow searches, and designing workflows that avoid them."

Chris just came by my cube and offered a potentially important suggestion regarding this: if these authority terms will mostly be searched via autocomplete / term completion fields, using anchor characters in searches (rather than relying on default behavior) can significantly speed up searching. Worth a try with your large term lists ... will be interested to know of your experience here.

Here's a quick look at using anchor and/or wildcard characters in autocomplete, a feature which has been around for several CollectionSpace releases now:

https://wiki.collectionspace.org/display/collectionspace/Proposed+autocomplete+enhancements+for+Release+3.3 https://wiki.collectionspace.org/display/collectionspace/Proposed+autocomplete+enhancements+for+Release+3.3

To understand better whether this would be helpful, when you wrote, "After doing this, the system ground to a halt on operations such as browsing the headings ...", what operation(s) were you performing, exactly?

  3. Subsets of terms

Consider subsets of your full term list (as per Richard), if feasible. (Just read your follow-up here, about how these two particular term lists, BGN and NGA, don't lend themselves well to this ...)

  4. Hardware upgrades (especially fast disk)

Strongly consider hardware upgrades, particularly fast disk (e.g. SSD).

AIUI, CollectionSpace's search performance is database-bound, and that, in turn, is generally disk-bound. (Adding RAM to your database server can also help with its caching of indexes and with the OS's caching of files. But starting with fast disk, rather than additional memory, as the first incremental upgrade seems, from my colleagues' findings, to be the way to approach this.)

Per a long and fruitful email discussion - unfortunately not on the Talk list, sigh - that a number of us had with Al and Becky of OMCA, Susan Stone, et al. on May 20-21, 2015 (and from which the quotes above came), some snippets/conclusions around this:

Ray:
"I just did a wildcard object number search on pahma that returned 14,000 records, and it took about 4 minutes. That's on our current, non-SSD database. On the SSD that query runs in 30 seconds."

John:
"We have 15GB RAM each for our production and dev postgres servers. It was recently noted that one custom query was taking 6.8 hours on Dev, but only 25 minutes on Prod. The difference is SSD."

  5. Look at database cleanup/tuning

5.1. Run ANALYZE or VACUUM ANALYZE

Run ANALYZE or VACUUM ANALYZE to help PostgreSQL better understand its data and improve its query planning:

http://www.postgresql.org/docs/current/static/sql-analyze.html
and
http://www.postgresql.org/docs/current/static/sql-vacuum.html
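As a sketch of how one might script this after a bulk load (the table names, DSN, and psycopg2 usage below are illustrative assumptions, not anything CollectionSpace-specific):

```python
# Hypothetical example table names; substitute your deployment's own.
TABLES = ["placeauthorities_common", "places_common"]

def vacuum_statements(tables):
    """Build the maintenance statements to run after a large import."""
    return ["VACUUM ANALYZE {};".format(t) for t in tables]

def run_maintenance(dsn, tables=TABLES):
    """Connect and run VACUUM ANALYZE on each table.

    Defined here as a sketch; requires psycopg2 (pip install
    psycopg2-binary) and a valid DSN, e.g. "dbname=nuxeo user=postgres".
    """
    import psycopg2
    conn = psycopg2.connect(dsn)
    conn.autocommit = True  # VACUUM cannot run inside a transaction block
    try:
        with conn.cursor() as cur:
            for stmt in vacuum_statements(tables):
                cur.execute(stmt)
    finally:
        conn.close()
```

Note the autocommit setting: PostgreSQL refuses to run VACUUM inside a transaction block, which is an easy thing to trip over when scripting this.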

5.2 Tuning

You might also take a quick look at the database tuning discussions in this JIRA comment, and the ones that follow:

https://issues.collectionspace.org/browse/PAHMA-708?focusedCommentId=35282&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-35282

These are a bit more heuristic, and provide more explanation, than the rote instructions in the Tuning section of the current docs.

Ray's wrapup of much of this, from that discussion with the OMCA folks:

"I would say that searches can still be very slow for us, and we haven't had time to look into it further. We've just tried to train users to avoid searches that return a lot of results. I've mostly been waiting for the Nuxeo/ElasticSearch integration to arrive, which didn't make it to CSpace 4.2, but hopefully will make it into the following version. That will help keyword search, but maybe not term completion or field search.

"Throwing hardware at the problem definitely helps. Add as much RAM to the db server as you can, and configure pg_buffercache to take advantage of it. Put the database on the fastest disk you can. We're going to put it on SSD soon, which will help a lot. Also make sure to run ANALYZE after importing data, so your stats are up to date."

Aron

On Mon, Nov 16, 2015 at 2:26 PM, Peter Murray <pmurray@chillco.com> wrote:
Hmm -- that does reflect what I'm seeing, so it is comforting to know that I'm not too far out in left field.

Peter

On Nov 16, 2015, at 4:25 PM, Ray Lee <rhlee@berkeley.edu> wrote:

Hi Peter,
I don't think there are hard limits, but once you get that many records of any type (it could be authority items, procedures, or collection objects), performance takes a dive. We have to train users to avoid browsing all objects, taxon names, or any other type that has a lot of records, because the results won't show up in a reasonable time.

Ray

On Mon, Nov 16, 2015 at 1:07 PM, John B Lowe <jblowe@berkeley.edu> wrote:
Peter,

I just checked, and the largest authority in the UCB deployments appears to be the Taxon authority used in the UCJEPS Herbaria deployment.  The taxon_common table has 309,352 non-deleted rows at the moment. I think this corresponds pretty closely to "nodes in the tree", and perhaps therefore to "authority terms".

I just tried an advanced search on this authority on our Dev instance, and after about 6 minutes I did get a result page.  Redoing the search took about 30 seconds -- of course there was much caching done at that point. (Tomcat was not busy on that server, but I can't see the load on Postgres, as we don't have direct access to that server.)

I'll try this on UCJEPS Production after hours (I'm interested in the outcome...).

John

On Sun, Nov 15, 2015 at 6:17 PM, Peter Murray <pmurray@chillco.com> wrote:
Is there a limit to the number of authorities in a CollectionSpace database?  On behalf of SDMoM, I'm loading extracts of the Bureau of Geographic Names (211,330 headings) and the National Geospatial-Intelligence Agency (191,549 headings).  After doing this, the system ground to a halt on operations such as browsing the headings.  I could see that there were no errors logged in either PostgreSQL's or Tomcat's logs (or the CSpace application logs).  The 5-minute load average on the system was just below 2.0, and both PostgreSQL and Tomcat were consuming a lot of CPU.  I'm not an expert in database tuning, but I did follow the steps offered in the installation guide (https://wiki.collectionspace.org/display/DOC/PostgreSQL+Installation+under+Linux#PostgreSQLInstallationunderLinux-Tuning).

Anyone else running a system with 400,000 authority headings?

Peter

Thanks, Aron! Peter, I just want to highlight this from Aron's note:

> Chris just came by my cube and offered a potentially important suggestion regarding this: if these authority terms will mostly be searched via autocomplete / term completion fields, using anchor characters in searches (rather than relying on default behavior) can significantly speed up searching. Worth a try with your large term lists ... will be interested to know of your experience here.
>
> Here's a quick look at using anchor and/or wildcard characters in autocomplete, a feature which has been around for several CollectionSpace releases now:
>
> https://wiki.collectionspace.org/display/collectionspace/Proposed+autocomplete+enhancements+for+Release+3.3

We've found with our deployments that anchoring the term completion search, using the ^ caret to stand for the beginning or end of the string, is an absolute necessity. This helps with speed of search AND it addresses another critical problem: by default, term completion will only return 40 matches, based on the random order of a keyword search. If your search term matches more than 40 results (which is likely in a large vocabulary), then you might need to anchor the search in order to retrieve the item of interest. We have had people create multiple versions of the same term because they could never retrieve the one stored in the system.

Regards,
Chris
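Chris's point about the 40-match cap is easy to see in a small simulation (pure Python; the cap of 40 comes from Chris's note, while the vocabulary, function names, and randomized truncation order are illustrative stand-ins for the real behavior): when an unanchored search matches more terms than the cap, the term you actually want may never appear in the truncated result set, whereas an anchored search narrows the candidates below the cap.

```python
import random

CAP = 40  # term completion returns at most this many matches

# Stand-in vocabulary: 200 headings containing "Creek", plus the one we want.
headings = ["{} Creek".format(i) for i in range(200)] + ["Sandy Creek"]

def complete(query, terms, anchored=False, cap=CAP):
    """Simulate term completion: collect matches, then truncate to `cap`
    in effectively arbitrary order (randomized here for the simulation)."""
    if anchored:
        matches = [t for t in terms if t.startswith(query)]
    else:
        matches = [t for t in terms if query in t]
    random.shuffle(matches)
    return matches[:cap]

# Unanchored "Creek": 201 matches truncated to 40, so "Sandy Creek" is
# frequently absent from the list the user sees.
# Anchored "^Sandy": only one candidate, so it is always retrievable.
assert "Sandy Creek" in complete("Sandy", headings, anchored=True)
```

This is exactly the failure mode Chris describes: users who can never surface an existing term end up creating duplicates of it.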