Empathy List Archives


Sébastien Brossard

Wed, Oct 15, 2014 11:14 AM

Dear CS community,

Here at the SMK we've developed an adapter that links CS' reports to Solr. It has been in production since the beginning of September and works well.
The main idea is that instead of extracting data with an SQL query, the reports fetch their data with a simple call to the Solr server (a call to a single URL).
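As a rough illustration of what "a simple call to a single URL" can look like, here is a minimal Java sketch that builds a Solr select URL for one object. It is not the SMK adapter's code. The core name is borrowed from later in this thread. The field name csid, the host and port, and the class name SolrReportUrl are assumptions. q, wt and rows are standard Solr query parameters.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class SolrReportUrl {
    // Builds one Solr "select" URL that a report could call to fetch the
    // pre-formatted fields of a single object.
    // The field name "csid" and the core name are assumptions for illustration.
    static String build(String solrBase, String core, String csid) {
        // Quote the id so special characters in it are not read as query operators
        String query = "csid:\"" + csid + "\"";
        return solrBase + "/" + core + "/select"
                + "?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&wt=json"   // ask Solr for a JSON response
                + "&rows=1";   // one object per report run
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8983/solr",
                "Prod_collectionspace_reports", "1234-abcd"));
        // prints http://localhost:8983/solr/Prod_collectionspace_reports/select?q=csid%3A%221234-abcd%22&wt=json&rows=1
    }
}
```

Every report then reads the same kind of response. Only the export into Solr has to know about SQL.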

The advantage is that you don't have to write a new SQL query in iReport each time you create a new report. Even better, you don't have to modify the SQL queries in each and every report when you add or modify a field. If you must change the SQL query for whatever reason, you only have to do it once, in your export from Nuxeo to Solr. All you have to do in the reports is read the already formatted fields you get from Solr.

If any of you are interested in this adapter, I'll be happy to contribute it and add an article to the CS wiki about how to install it (it's quite straightforward).

Note that since we're currently running Collectionspace v3.2, this adapter is not suited to v4.x (or maybe it is, but we haven't tested it).

Best regards,
Sébastien

Sébastien Brossard
IT Developer
sebastien.brossard@smk.dk
T +45 2552 7112

Statens Museum for Kunst
Sølvgade 48-50
DK-1307 København K

T +45 3374 8494
F +45 3374 8404
smk.dk (http://smk.dk/)




Chris Hoffman

Wed, Oct 15, 2014 4:57 PM

Hi Sébastien,

We at Berkeley would definitely love to hear more about this approach! We are using Solr as the data source for the public portals we are developing (see https://ucjeps.berkeley.edu/specimens for one example). And we've wondered what would be involved in pointing iReport at this source. Performance would be much better.

By the way, it would probably be interesting to share information about how we are creating our Solr data sources from the underlying tables. We've learned a lot but I suspect we could learn much from your approach.

I'll ask others on our team to send some links to our github code and (evolving) documentation.

Thanks,
Chris
UC Berkeley




John B. LOWE

Wed, Oct 15, 2014 11:14 PM

Sébastien, et al.,

The code for the UCB "solr datasources" may be found at:

https://github.com/cspace-deployment/Tools/tree/master/datasources

The files for the Solr multicore configuration are at:

https://github.com/cspace-deployment/Tools/tree/master/datasources/ucb

and the deployment-specific ETL code (sql queries, solr load scripts) may
be found in the institution-specific directories, e.g. for PAHMA:

https://github.com/cspace-deployment/Tools/tree/master/datasources/pahma

I'm wondering how you populate your Solr core(s) from CSpace? It seems you
must have some SQL to start with...

Cheers, great stuff, looking forward to sharing,

John




Sébastien Brossard

Fri, Oct 17, 2014 8:22 AM

Hi John and Chris,

I’m currently revamping our SQL requests and Solr cores.
I'll put material online on our GitHub sometime next week and write back to you.

Cheers,
Sébastien

From: John B. LOWE [mailto:jblowe@berkeley.edu]
Sent: 16 October 2014 01:14
To: Chris Hoffman
Cc: Sébastien Brossard; talk@lists.collectionspace.org
Subject: Re: [Talk] Contribution - CS' reports and Solr - Collectionspace v 3.2

Sébastian, et al.,

The code for the UCB "solr datasources" may be found at:

https://github.com/cspace-deployment/Tools/tree/master/datasources

The files for the solr multicore configuration is at:

https://github.com/cspace-deployment/Tools/tree/master/datasources/ucb

and the deployment-specific ETL code (sql queries, solr load scripts) may be found in the institution-specific directories, e.g. for PAHMA:

https://github.com/cspace-deployment/Tools/tree/master/datasources/pahma

I'm wondering how you populate your Solr core(s) from CSpace? It seems you must have some SQL to start with...

Cheers, great stuff, looking forward to sharing,

John

On Wed, Oct 15, 2014 at 9:57 AM, Chris Hoffman <chris_h@berkeley.edu mailto:chris_h@berkeley.edu> wrote:
Hi Sébastien,

We at Berkeley would definitely love to hear more about this approach! We are using Solr as the data source for the public portals we are developing (see https://ucjeps.berkeley.edu/specimens for one example). And we've wondered what would be involved in pointing iReport at this source. Performance would be much better.

By the way, it would probably be interesting to share information about how we are creating our Solr data sources from the underlying tables. We've learned a lot but I suspect we could learn much from your approach.

I'll ask others on our team to send some links to our github code and (evolving) documentation.

Thanks,
Chris
UC Berkeley

On Oct 15, 2014, at 4:14 AM, Sébastien Brossard <Sebastien.Brossard@smk.dk mailto:Sebastien.Brossard@smk.dk> wrote:

Dear CS community,

We’ve developed here at the SMK an adapter that make the link between CS’ reports and Solr. It has been in production since the beginning of September and works well.
The main idea is that instead of extracting data via SQL request, the reports fetch data from a simple call to Solr server (call to an unique URL).

The advantage is that you don’t have to create and write a new SQL request in iReport each time you create a new report – and even better, you don’t have to modify the SQL requests in every and each report when you have to add/modify a field. If you must change the SQL request for whatever reason, you’ll have to do it only one time, in your export from Nuxeo to Solr. And all you’ll have to do in the reports is to read the already formatted fields you get from Solr.

If some of you think they may be interested in this adapter, I’ll be happy to contribute and add an article about how to install it in the CS’ wiki (it’s quite straightforward).

It’s worth noting that as we’re currently running Collectionspace v3.2, this adapter is not suited to v4.xx (or maybe?, but we don’t have tested it)

Best regards,
Sébastien

Sébastien Brossard
IT-Udvikler
sebastien.brossard@smk.dk mailto:3sebastien.brossard@smk.dkT
Tmailto:3sebastien.brossard@smk.dkT +45 2552 7112

Statens Museum for Kunst
Sølvgade 48-50
DK—1307 København K

T +45 3374 8494
F +45 3374 8404
smk.dkhttp://smk.dk/

<image003.jpg>

Talk mailing list
Talk@lists.collectionspace.org mailto:Talk@lists.collectionspace.org
http://lists.collectionspace.org/mailman/listinfo/talk_lists.collectionspace.org



Sébastien Brossard

Thu, Oct 30, 2014 3:16 PM

Dear all,

Thank you for your email, and sorry for the delayed response; I had to revamp our Solr cores and it took longer than expected.
I can see from your configuration files that we have a very different approach; let's say ours is a bit more handmade…

¤¤

As John guessed, every night we run a huge SQL query (https://github.com/SebSMK/prod_SQL_full_export/blob/master/sql/CS2Solr_DIH_full_import.sql, 1,500 lines) to populate our master Solr core (called Prod_SQL_full_export in the attached schema).
At first we used an SQL export to a .csv file that we then imported into Solr: too heavy, no flexibility, too slow. We dropped it and used Solr's DataImportHandler with SQL instead (DIH, http://wiki.apache.org/solr/DataImportHandler#Usage_with_RDBMS).
I can't say it's lightweight, but it's very flexible and reasonably fast: the export takes approx. 5 hours each night for 65,000 objects with 90 fields each (note: as you can see, only a selected subset of CS' data is exported).
This approach works because the set of objects is relatively limited; I can't swear it would work with a collection of 1,000,000 pieces.
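A back-of-the-envelope check of that concern. It assumes, optimistically, that export time grows linearly with the number of objects, which may not hold for a 1,500-line query full of joins:

```latex
\begin{aligned}
t_{\text{object}} &\approx \frac{5\ \text{h} \times 3600\ \text{s/h}}{65\,000\ \text{objects}}
  = \frac{18\,000\ \text{s}}{65\,000} \approx 0.28\ \text{s per object} \\
t_{1\,000\,000} &\approx 5\ \text{h} \times \frac{1\,000\,000}{65\,000} \approx 76.9\ \text{h} \approx 3.2\ \text{days}
\end{aligned}
```

So even under linear scaling, a full export of a million objects would take about three days. At that size a complete nightly rebuild would likely no longer fit in one night.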
Here is the data-config.xml (https://github.com/SebSMK/prod_SQL_full_export/blob/master/solr%20conf/conf/data-config.xml). It makes extensive use of DIH's ScriptTransformer (http://wiki.apache.org/solr/DataImportHandler#ScriptTransformer), a great and very flexible tool.
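The ScriptTransformer runs a small JavaScript function on each imported row. The kind of per-row clean-up it is typically used for can also be written in Java. DIH accepts custom Java transformers: a class that extends org.apache.solr.handler.dataimport.Transformer and overrides transformRow(Map<String, Object> row, Context context). That is one way to move such logic into a .jar. The sketch below is stdlib-only and drops the Context argument. It assumes a hypothetical column artistRefName holding a CollectionSpace refName of the form urn:cspace:...:item:name(shortId)'Display Name'. It is not taken from SMK's data-config.xml.

```java
import java.util.HashMap;
import java.util.Map;

class RefNameRowTransform {
    // Extracts the display name from a refName such as
    // urn:cspace:...:item:name(shortId)'Display Name'
    // (text between the first and last single quote).
    // Returns the input unchanged if there is no quoted part.
    static String displayName(String refName) {
        if (refName == null) return null;
        int start = refName.indexOf('\'');
        int end = refName.lastIndexOf('\'');
        if (start < 0 || end <= start) return refName;
        return refName.substring(start + 1, end);
    }

    // Same shape as DIH's Transformer.transformRow(row, context),
    // minus the Solr-specific Context argument.
    static Map<String, Object> transformRow(Map<String, Object> row) {
        Object ref = row.get("artistRefName"); // hypothetical column name
        if (ref instanceof String) {
            row.put("artistDisplayName", displayName((String) ref));
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("artistRefName",
                "urn:cspace:core.collectionspace.org:personauthorities:name(person):item:name(JohnDoe1234)'John Doe'");
        System.out.println(transformRow(row).get("artistDisplayName")); // prints John Doe
    }
}
```

Taking the first and last quote keeps apostrophes inside the name intact, for example "Georgia O'Keeffe".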

Once the master core is built, building the other (slave) cores is a piece of cake.
This time we use DIH's SolrEntityProcessor (http://wiki.apache.org/solr/DataImportHandler#SolrEntityProcessor) to transfer data from the master core to the slave cores. Again, this requires some ScriptTransformer code, which we use to modify, add or delete fields from the master depending on each core's needs.
Each export to slaves takes less than 10 minutes.
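The modify/add/delete step for a slave core boils down to a per-document projection. Here is a minimal Java sketch of that idea. It assumes documents are plain maps and uses made-up field names (id, title_dk, internal_note, core). In the setup described above, this logic would live in the ScriptTransformer attached to the SolrEntityProcessor entity, not in a separate Java class.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SlaveCoreProjection {
    // Builds a slave-core document from a master-core document.
    // 'mapping' is masterField -> slaveField: mapped fields are copied (renamed
    // when the names differ); every other master field is dropped.
    // 'extra' holds fields to add that do not exist in the master.
    static Map<String, Object> project(Map<String, Object> masterDoc,
                                       Map<String, String> mapping,
                                       Map<String, Object> extra) {
        Map<String, Object> slaveDoc = new LinkedHashMap<>();
        for (Map.Entry<String, String> m : mapping.entrySet()) {
            Object value = masterDoc.get(m.getKey());
            if (value != null) { // skip fields missing from this document
                slaveDoc.put(m.getValue(), value);
            }
        }
        slaveDoc.putAll(extra);
        return slaveDoc;
    }

    public static void main(String[] args) {
        Map<String, Object> master = new LinkedHashMap<>();
        master.put("id", "KMS1");
        master.put("title_dk", "Landskab");
        master.put("internal_note", "not for reports");

        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("id", "id");
        mapping.put("title_dk", "title");

        System.out.println(project(master, mapping, Map.of("core", "reports")));
        // prints {id=KMS1, title=Landskab, core=reports}
    }
}
```

Because each slave only reads from the already-built master, these exports stay cheap compared with the SQL import.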

¤¤

One of the slave Solr cores is dedicated to CS' reports (Prod_collectionspace_reports in the attached schema).
Note: this core also uses some SQL import to perform live updates when data is changed in Collectionspace; that's a special case.
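The thread doesn't say how this live update is wired up. One common DIH pattern is a delta-import. The handler runs a deltaQuery that selects rows changed since ${dataimporter.last_index_time}, then re-indexes only those rows. The sketch below shows two things: the request URL such a job could call, and the selection logic a deltaQuery expresses in SQL. The class name and inputs are hypothetical, and it assumes DIH is registered at /dataimport.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class DeltaImport {
    // DataImportHandler request that imports only changed rows and keeps the
    // rest of the index (clean=false). Assumes the handler is mounted at
    // /dataimport in the core's solrconfig.xml.
    static String deltaImportUrl(String solrBase, String core) {
        return solrBase + "/" + core
                + "/dataimport?command=delta-import&clean=false&commit=true";
    }

    // What a deltaQuery conceptually selects: ids of records updated after
    // the last import. 'updatedAt' maps record id -> last update time
    // (hypothetical input; in DIH this comparison is done in SQL).
    static List<String> changedSince(Map<String, Instant> updatedAt, Instant lastIndexTime) {
        List<String> changed = new ArrayList<>();
        for (Map.Entry<String, Instant> e : updatedAt.entrySet()) {
            if (e.getValue().isAfter(lastIndexTime)) {
                changed.add(e.getKey());
            }
        }
        return changed;
    }
}
```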

The idea was, with some minor modifications to the CS Jasper report files, to use Solr as a single data source for all reports in CS instead of writing an SQL query for each and every one of them.
To do so, we developed a Solr-JasperReports adapter (compiled jar: https://bitbucket.org/christopher_pott/corpus_reports/src/c2105271d4bca9457e36a17afe497e9f502a0fb5/solr/jar/jasperCollectionSpaceDataSource.jar?at=master; source code: https://github.com/SebSMK/JRCollectionSpaceQuery). Sensitive, gifted developers, be careful: it's raw and quite dirty code, with no Maven, no optimization, and some parts that are no longer used, but it works. Information on the subject is scarce; the iReport Ultimate Guide (http://community.jaspersoft.com/documentation/ireport-ultimate-guide) helped a lot.
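For readers wondering what such an adapter ultimately has to provide: JasperReports reads rows through the JRDataSource interface, which is essentially boolean next() plus Object getFieldValue(JRField). The stdlib-only sketch below mimics that contract over Solr documents that have already been fetched and parsed into maps. It uses a field-name string instead of JRField, and it is not the code of JRCollectionSpaceQuery.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Stdlib-only sketch of the JRDataSource cursor contract, backed by
// Solr documents already parsed into maps.
class SolrDocsDataSource {
    private final Iterator<Map<String, Object>> docs;
    private Map<String, Object> current;

    SolrDocsDataSource(List<Map<String, Object>> docs) {
        this.docs = docs.iterator();
    }

    // Moves to the next document; returns false when there are no more rows.
    boolean next() {
        if (!docs.hasNext()) {
            current = null;
            return false;
        }
        current = docs.next();
        return true;
    }

    // In the real interface the argument is a JRField; here it is its name.
    Object getFieldValue(String fieldName) {
        if (current == null) {
            throw new IllegalStateException("call next() first");
        }
        return current.get(fieldName);
    }
}
```

Since every Solr document is already a flat set of formatted fields, the report only maps its field names to document keys.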

· To use this adapter in iReport (iReport v4.5.1; we are currently running Collectionspace 3.2.1 and haven't tested this adapter on other versions):

Tools > Options > iReport > Classpath tab
Add the path to the .jar file, then click OK

Tools > Options > iReport > Query Executers tab
Add a query executer with the following properties:
Language: collectionspace
Factory class: org.jasper.collectionspace.smk.executer.CollectionSpaceQueryExecuterFactory
Fields provider class: com.jaspersoft.ireport.designer.data.fieldsproviders.SQLFieldsProvider

· To use this adapter with Collectionspace, here's a copy of our internal wiki page; you'll also find some examples of JasperReports files running with Solr.

It may look a bit heavy, but it is in fact quite straightforward:

Collectionspace server setup
1- Stop Collectionspace server
2- Copy (or replace) all .jasper files + subdirectories
from https://bitbucket.org/christopher_pott/corpus_reports/solr/jasper (https://bitbucket.org/christopher_pott/corpus_reports/src/c2105271d4bca9457e36a17afe497e9f502a0fb5/solr/jasper/?at=master)
to $CATALINA_HOME/webapps/cspace/reports/.
You'll have to repeat the steps below each time Collectionspace's services are recompiled.

¤¤¤¤¤¤¤¤¤¤¤¤¤¤

Even though it has been in production without any problems for 3 months now, I still call this development a beta version.
There's room for a lot of improvements, e.g. SQL optimization, moving the ScriptTransformer code into a .jar, revamping the adapter's code, a better update system between the different Solr cores and Nuxeo, etc.

But you guys seem to know a lot about Solr (and of course CSpace!), so if you're interested in this solution and want to rework it, I'm quite sure you can take advantage of it.

By the way, I've got a question: do you know how to send a parameter to Jasper from Collectionspace? We're currently sending csid and tenant, and I would also like to send Solr's address to our reports. That would avoid writing the Solr address in each report… I had a look at the code and could implement a homemade solution again, but it would be even better to do it the right way.
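One possible approach, sketched under assumptions: read the Solr address from a JVM system property and add it to the same parameter map that carries csid and tenant. The property could be set, for example, with -D in Tomcat's CATALINA_OPTS. The property name smk.solr.url, the default URL and the parameter keys are hypothetical. Whether Collectionspace lets you extend its parameter map without patching the services code is exactly the open question. On the report side, the .jrxml would declare <parameter name="solrUrl" class="java.lang.String"/> and reference it as $P{solrUrl}. The map itself is what JasperReports receives as the parameters argument of JasperFillManager.fillReport.

```java
import java.util.HashMap;
import java.util.Map;

class ReportParams {
    // Hypothetical names: a JVM system property holding the Solr address,
    // and a fallback used when it is not set.
    static final String SOLR_URL_PROPERTY = "smk.solr.url";
    static final String DEFAULT_SOLR_URL = "http://localhost:8983/solr";

    // Builds the parameter map handed to the report fill step.
    // "csid" and "tenant" stand for the values already being sent;
    // "solrUrl" is the new, hypothetical parameter.
    static Map<String, Object> build(String csid, String tenant) {
        Map<String, Object> params = new HashMap<>();
        params.put("csid", csid);
        params.put("tenant", tenant);
        params.put("solrUrl", System.getProperty(SOLR_URL_PROPERTY, DEFAULT_SOLR_URL));
        return params;
    }
}
```

Keeping the address in one server-side setting means moving Solr only requires a restart with a new property value, not editing every report.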

Thank you in advance for your help!
Feel free to ask if you need further information about Solr, CS, the adapter and so on.

Cheers,
Sébastien


It may look a bit heavy but is in fact quite straightforward : Collectionspace server setup 1- Stop Collectionspace server 2- Copy (or replace) all .jasper files + subdirectories from https://bitbucket.org/christopher_pott/corpus_reports/solr/jasper<https://bitbucket.org/christopher_pott/corpus_reports/src/c2105271d4bca9457e36a17afe497e9f502a0fb5/solr/jasper/?at=master> to $CATALINA_HOME/webapps/cspace/reports/. [http://corpuswiki.smk.dk/confluence/images/icons/emoticons/forbidden.gif] You'll have to repeat the steps below each time Collectionspace's services will be recompiled 3- Copy jasperCollectionSpaceDataSource.jar (or compil code from https://github.com/SebSMK/JRCollectionSpaceQuery) from https://bitbucket.org/christopher_pott/corpus_reports/solr/jar<https://bitbucket.org/christopher_pott/corpus_reports/src/c2105271d4bca9457e36a17afe497e9f502a0fb5/solr/jar/?at=master>) to $CATALINA_HOME/webapps/cspace-services/WEB-INF/lib 4- Rename commons-beanutils-1.6.1.jar in $CATALINA_HOME/webapps/cspace-services/WEB-INF/lib 5- Copy commons-beanutils-1.9.1.jar (backward compatibility stated here: http://commons.apache.org/proper/commons-beanutils/index.html and tested) from https://bitbucket.org/christopher_pott/corpus_reports/solr/jar<https://bitbucket.org/christopher_pott/corpus_reports/src/c2105271d4bca9457e36a17afe497e9f502a0fb5/solr/jar/?at=master> to $CATALINA_HOME/webapps/cspace-services/WEB-INF/lib 6- Copy jasperreports-extensions-3.5.3.jar from https://bitbucket.org/christopher_pott/corpus_reports/solr/jar<https://bitbucket.org/christopher_pott/corpus_reports/src/c2105271d4bca9457e36a17afe497e9f502a0fb5/solr/jar/?at=master> to $CATALINA_HOME/webapps/cspace-services/WEB-INF/lib 7- Copy jackson-all-1.9.11.jar from https://bitbucket.org/christopher_pott/corpus_reports/solr/jar<https://bitbucket.org/christopher_pott/corpus_reports/src/c2105271d4bca9457e36a17afe497e9f502a0fb5/solr/jar/?at=master> to $CATALINA_HOME/lib 8- Create or modify jasperreports.properties in 
$CATALINA_HOME/webapps/cspace-services/WEB-INF/classes and insert this line: net.sf.jasperreports.query.executer.factory.collectionspace=org.jasper.collectionspace.smk.executer.CollectionSpaceQueryExecuterFactory 9- Restart Collectionspace server Fonts setup 1- Copy all .ttf files from https://bitbucket.org/christopher_pott/corpus_reports/solr/fonts<https://bitbucket.org/christopher_pott/corpus_reports/src/c2105271d4bca9457e36a17afe497e9f502a0fb5/solr/fonts/?at=master> to $JAVA_HOME/jre/lib/fonts/. 2- Copy all jasper_font*.jar from https://bitbucket.org/christopher_pott/corpus_reports/solr/jar<https://bitbucket.org/christopher_pott/corpus_reports/src/c2105271d4bca9457e36a17afe497e9f502a0fb5/solr/jar/?at=master> into $CATALINA_HOME/webapps/cspace-services/WEB-INF/lib ¤¤¤¤¤¤¤¤¤¤¤¤¤¤ Even if it has been in production without any problems for 3 months now, I still call this development a beta-version. There’s room for a lot of improvements, e.g. SQL optimization, ScriptTransformer in .jar, adapter’s code revamping, a better update system between the different Solr cores and nuxeo etc. But it seems like you guys know a lot about Solr (and of course CSpace!), so if you’re interested in this solution and rework it, I’m quite sure you can take advantage of it. By the way, I’ve got a question: do you know how to send a parameter to jasper from Collectionspace? We’re currently sending csid and tenant, I would also like to send Solr’s address to our reports. It will prevent to write Solr address in each report… I had a look to the code and I could implement a homemade solution again, but it would be even better to do it the right way. Thank you by advance for your help! Feel free to ask if you need further information about Solr/Cs/adapter and so on. Cheers, Sébastien ----------------------------------------------------------------------------------------------------------------- Fra: John B. LOWE [mailto:jblowe@berkeley.edu] Sendt: 16. 
oktober 2014 01:14 Til: Chris Hoffman Cc: Sébastien Brossard; talk@lists.collectionspace.org<mailto:talk@lists.collectionspace.org> Emne: Re: [Talk] Contribution - CS' reports and Solr - Collectionspace v 3.2 Sébastian, et al., The code for the UCB "solr datasources" may be found at: https://github.com/cspace-deployment/Tools/tree/master/datasources The files for the solr multicore configuration is at: https://github.com/cspace-deployment/Tools/tree/master/datasources/ucb and the deployment-specific ETL code (sql queries, solr load scripts) may be found in the institution-specific directories, e.g. for PAHMA: https://github.com/cspace-deployment/Tools/tree/master/datasources/pahma I'm wondering how you populate your Solr core(s) from CSpace? It seems you must have *some* SQL to start with... Cheers, great stuff, looking forward to sharing, John On Wed, Oct 15, 2014 at 9:57 AM, Chris Hoffman <chris_h@berkeley.edu<mailto:chris_h@berkeley.edu>> wrote: Hi Sébastien, We at Berkeley would definitely love to hear more about this approach! We are using Solr as the data source for the public portals we are developing (see https://ucjeps.berkeley.edu/specimens for one example). And we've wondered what would be involved in pointing iReport at this source. Performance would be much better. By the way, it would probably be interesting to share information about how we are creating our Solr data sources from the underlying tables. We've learned a lot but I suspect we could learn much from your approach. I'll ask others on our team to send some links to our github code and (evolving) documentation. Thanks, Chris UC Berkeley On Oct 15, 2014, at 4:14 AM, Sébastien Brossard <Sebastien.Brossard@smk.dk<mailto:Sebastien.Brossard@smk.dk>> wrote: Dear CS community, We’ve developed here at the SMK an adapter that make the link between CS’ reports and Solr. It has been in production since the beginning of September and works well. 
The main idea is that instead of extracting data via SQL request, the reports fetch data from a simple call to Solr server (call to an unique URL). The advantage is that you don’t have to create and write a new SQL request in iReport each time you create a new report – and even better, you don’t have to modify the SQL requests in every and each report when you have to add/modify a field. If you must change the SQL request for whatever reason, you’ll have to do it only one time, in your export from Nuxeo to Solr. And all you’ll have to do in the reports is to read the already formatted fields you get from Solr. If some of you think they may be interested in this adapter, I’ll be happy to contribute and add an article about how to install it in the CS’ wiki (it’s quite straightforward). It’s worth noting that as we’re currently running Collectionspace v3.2, this adapter is not suited to v4.xx (or maybe?, but we don’t have tested it) Best regards, Sébastien Sébastien Brossard IT-Udvikler sebastien.brossard@smk.dk<mailto:3sebastien.brossard@smk.dkT> T<mailto:3sebastien.brossard@smk.dkT> +45 2552 7112 Statens Museum for Kunst Sølvgade 48-50 DK—1307 København K T +45 3374 8494 F +45 3374 8404 smk.dk<http://smk.dk/> <image003.jpg> _______________________________________________ Talk mailing list Talk@lists.collectionspace.org<mailto:Talk@lists.collectionspace.org> http://lists.collectionspace.org/mailman/listinfo/talk_lists.collectionspace.org _______________________________________________ Talk mailing list Talk@lists.collectionspace.org<mailto:Talk@lists.collectionspace.org> http://lists.collectionspace.org/mailman/listinfo/talk_lists.collectionspace.org
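Step 8 of the server setup says to create or modify jasperreports.properties and add the query executer factory line. You could script this step with nothing but `java.util.Properties`, as in the sketch below. The property key and factory class are copied from the setup notes. The class name, and the idea of scripting the step at all, are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

public class RegisterCollectionSpaceExecuter {

    // Key and factory class copied from step 8 of the setup notes.
    static final String KEY = "net.sf.jasperreports.query.executer.factory.collectionspace";
    static final String FACTORY =
            "org.jasper.collectionspace.smk.executer.CollectionSpaceQueryExecuterFactory";

    /** Creates the file if missing; otherwise keeps existing entries and adds or overwrites KEY. */
    static void register(Path propsFile) throws IOException {
        Properties props = new Properties();
        if (Files.exists(propsFile)) {
            // Byte-stream load/store use ISO-8859-1 with escape sequences, the classic .properties encoding.
            try (InputStream in = Files.newInputStream(propsFile)) {
                props.load(in);
            }
        }
        props.setProperty(KEY, FACTORY);
        try (OutputStream out = Files.newOutputStream(propsFile)) { // truncates and rewrites
            props.store(out, "CollectionSpace Solr query executer");
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: RegisterCollectionSpaceExecuter <path/to/jasperreports.properties>");
            System.exit(1);
        }
        // e.g. $CATALINA_HOME/webapps/cspace-services/WEB-INF/classes/jasperreports.properties
        register(Path.of(args[0]));
    }
}
```

`Properties.store` rewrites the whole file, so existing comments and key order are lost. If the file is maintained by hand, adding the single line manually, as the notes describe, may be preferable.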


Aron Roberts

Thu, Oct 30, 2014 4:31 PM

On Thu, Oct 30, 2014 at 8:16 AM, Sébastien Brossard <Sebastien.Brossard@smk.dk> wrote:

By the way, I’ve got a question: do you know how to send a parameter to
Jasper from CollectionSpace? We’re currently sending csid and tenant, and I
would also like to send Solr’s address to our reports. That would save us
from writing the Solr address in each report… I had a look at the code and
could implement a homemade solution again, but it would be even better to
do it the right way.

From a quick glance at JIRA, this looks like "work to be done" - a place
where a contribution would be invited:
http://issues.collectionspace.org/browse/CSPACE-4207

In UC Berkeley implementations, it seems that John worked around that via
webapps; e.g.:
http://issues.collectionspace.org/browse/PAHMA-818

Aron
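One possible workaround for the question above is to resolve the Solr address once, in the adapter, rather than in every report. You would do this until something like CSPACE-4207 makes a proper mechanism available. The adapter uses a report parameter if the caller supplies one. Otherwise it falls back to a JVM system property set once on the server. This sketch is hypothetical throughout. The parameter name `solrUrl`, the system property `smk.solr.url` and the default address are invented here. They are not part of CollectionSpace or JasperReports, which per the thread pass only csid and tenant.

```java
import java.util.HashMap;
import java.util.Map;

public class SolrAddressResolver {

    // Hypothetical names, chosen for this sketch only.
    static final String PARAM_NAME = "solrUrl";
    static final String SYSTEM_PROPERTY = "smk.solr.url";
    static final String DEFAULT_URL = "http://localhost:8983/solr";

    /**
     * Returns the Solr base URL for a report run.
     * Order: non-blank report parameter, then JVM system property, then a fixed default.
     */
    static String resolve(Map<String, Object> reportParams) {
        Object fromParam = reportParams.get(PARAM_NAME);
        if (fromParam instanceof String && !((String) fromParam).isBlank()) {
            return (String) fromParam;
        }
        return System.getProperty(SYSTEM_PROPERTY, DEFAULT_URL);
    }

    public static void main(String[] args) {
        // The parameters CollectionSpace already passes, according to the thread.
        Map<String, Object> params = new HashMap<>();
        params.put("csid", "1a2b3c");
        params.put("tenant", "1");
        // Prints the system property if set (e.g. -Dsmk.solr.url=...), else the default.
        System.out.println(resolve(params));
    }
}
```

Setting the property once, for example in Tomcat's JVM options, would keep the address out of the individual .jasper files.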



Sébastien Brossard

Fri, Oct 31, 2014 2:00 PM

Hi Aron,

Thank you very much for your answer.

Sébastien

From: Aron Roberts [mailto:aronroberts@gmail.com]
Sent: 30 October 2014 17:31
To: Sébastien Brossard
Cc: John B. LOWE; Chris Hoffman; talk@lists.collectionspace.org
Subject: Re: [Talk] Contribution - CS' reports and Solr - Collectionspace v 3.2

On Thu, Oct 30, 2014 at 8:16 AM, Sébastien Brossard <Sebastien.Brossard@smk.dk> wrote:

By the way, I’ve got a question: do you know how to send a parameter to Jasper from CollectionSpace? We’re currently sending csid and tenant, and I would also like to send Solr’s address to our reports. That would save us from writing the Solr address in each report… I had a look at the code and could implement a homemade solution again, but it would be even better to do it the right way.

From a quick glance at JIRA, this looks like "work to be done" - a place where a contribution would be invited:
http://issues.collectionspace.org/browse/CSPACE-4207

In UC Berkeley implementations, it seems that John worked around that via webapps; e.g.:
http://issues.collectionspace.org/browse/PAHMA-818

Aron
