WE HAVE SUNSET THIS LISTSERV - Join us at collectionspace@lyrasislists.org
View all threadsHi Everyone,
I've been working on a means of converting metadata from a provided
excel spreadsheet into xml code to be sent off via curl statements for
importing new records. I've been looking to see if there's any
established programs or tools that streamline this process to work with
excel spreadsheets, but the only one I found a hint of this was here:
https://wiki.collectionspace.org/display/deploy/Data+import
It mentions that "They add bar code numbers, accession numbers, and
other required data to the spreadsheet, reformat the information to a
required format and send the file to a programmer who can import the
data using data loading programs." I'm guessing for UCJEPS they have an
internal program they've made to parse the excel metadata.
So what I've done is with the spreadsheet of data (each column
representing a schema field, like the title or ID for cataloging
records), I've created a separate sheet, where it basically sets up a
template of XML code (each column spits out basically a line of XML code
you would see from a typical import XML file), and references the
metadata sheet to populate the blanks in the XML template code. Like if
I want to import a new cataloging record, I would have a column in my
new sheet called "objectNumber", which would generate the xml line of
code for
<collectionobjects_common:objectNumber>123456</collectionobjects_common:objectNumber>
in a cell below that column. I then accumulate all of the spreadsheet
cells that contains a line of XML code, and put them all together into
one cell in another seperate sheet, so it has all of the XML import code
there. From there you export that sheet with the lone cell of XML code
as an XML document. However the problem I run into this is that this is
still a tremendous amount of work to manually setup the XML template
code, and this is highly dependent that the metadata sheet format needs
to stay the same as the XML code generation is highly dependent on which
cells the correct data lies in. There is also topic of importing records
alongside related records. Like lets say you want to import a cataloging
record, but also want to upload a related Acquisition record. This then
involves having to setup another sheet to generate XML code for the
relations record, and then having to make sure that you send the curl
statements of importing the cataloging record and the acquisition
record, before doing the curl statement for the relation one.
This is all very doable, but it also just made me wonder if there is a
simpler means or tool that already exist (that I may not be using) that
would help translate excel metadata to XML import code. Or is what I
listed pretty much the right track on doing this? Just was wondering on
any input.
Thank you,
Peter Tucker
Hi Peter,
Thanks very much for sharing your thoughts, and a description of your
experiments, regarding importing Excel-based data into CollectionSpace.
There may well be others doing this, and I'm hoping they respond. To
start things off, last June, John Lowe shared this prototype project which,
during one point in its workflow, creates XML import records from CSV via a
template:
https://github.com/cspace-deployment/Tools/tree/master/devops/helpers
At a quick glance, some code relevant to that part of the workflow might
include:
https://github.com/cspace-deployment/Tools/blob/master/devops/helpers/loadCspace.py#L84-L90
and
https://github.com/cspace-deployment/Tools/blob/master/devops/helpers/loadCspace.py#L133-L161
as well as the template file itself:
https://github.com/cspace-deployment/Tools/blob/master/devops/helpers/collectionobject.xml
In case this might help generate approaches/inspiration ...
Aron
On Thu, Apr 28, 2016 at 5:45 PM, Peter Tucker <
peter.tucker@granitehorizon.com> wrote:
Hi Everyone,
I've been working on a means of converting metadata from a provided excel
spreadsheet into xml code to be sent off via curl statements for importing
new records. I've been looking to see if there's any established programs
or tools that streamline this process to work with excel spreadsheets, but
the only one I found a hint of this was here:
https://wiki.collectionspace.org/display/deploy/Data+import
It mentions that "They add bar code numbers, accession numbers, and other
required data to the spreadsheet, reformat the information to a required
format and send the file to a programmer who can import the data using data
loading programs." I'm guessing for UCJEPS they have an internal program
they've made to parse the excel metadata.
So what I've done is with the spreadsheet of data (each column
representing a schema field, like the title or ID for cataloging records),
I've created a separate sheet, where it basically sets up a template of XML
code (each column spits out basically a line of XML code you would see from
a typical import XML file), and references the metadata sheet to populate
the blanks in the XML template code. Like if I want to import a new
cataloging record, I would have a column in my new sheet called
"objectNumber", which would generate the xml line of code for
<collectionobjects_common:objectNumber>123456</collectionobjects_common:objectNumber>
in a cell below that column. I then accumulate all of the spreadsheet cells
that contains a line of XML code, and put them all together into one cell
in another seperate sheet, so it has all of the XML import code there. From
there you export that sheet with the lone cell of XML code as an XML
document. However the problem I run into this is that this is still a
tremendous amount of work to manually setup the XML template code, and this
is highly dependent that the metadata sheet format needs to stay the same
as the XML code generation is highly dependent on which cells the correct
data lies in. There is also topic of importing records alongside related
records. Like lets say you want to import a cataloging record, but also
want to upload a related Acquisition record. This then involves having to
setup another sheet to generate XML code for the relations record, and then
having to make sure that you send the curl statements of importing the
cataloging record and the acquisition record, before doing the curl
statement for the relation one.
This is all very doable, but it also just made me wonder if there is a
simpler means or tool that already exist (that I may not be using) that
would help translate excel metadata to XML import code. Or is what I listed
pretty much the right track on doing this? Just was wondering on any input.
Thank you,
Peter Tucker
Talk mailing list
Talk@lists.collectionspace.org
http://lists.collectionspace.org/mailman/listinfo/talk_lists.collectionspace.org
Peter,
Also, Mark Cooper of LYRASIS has graciously shared this tool he developed for working with CollectionSpace:
https://github.com/lyrasis/csible
With that tool, there's a single command to get a list of all records and a set of their properties returned by the API output to CSV:
rake cs:get:list[collectionobjects,"wf_deleted=false"]
-Richard
From: Talk talk-bounces@lists.collectionspace.org on behalf of Aron Roberts aron@socrates.berkeley.edu
Sent: Friday, April 29, 2016 11:23 AM
To: Peter Tucker
Cc: talk@lists.collectionspace.org
Subject: Re: [Talk] Importing Data via spreadsheets
Hi Peter,
Thanks very much for sharing your thoughts, and a description of your experiments, regarding importing Excel-based data into CollectionSpace.
There may well be others doing this, and I'm hoping they respond. To start things off, last June, John Lowe shared this prototype project which, during one point in its workflow, creates XML import records from CSV via a template:
https://github.com/cspace-deployment/Tools/tree/master/devops/helpers
cspace-deployment/Toolshttps://github.com/cspace-deployment/Tools/tree/master/devops/helpers
github.com
Tools - Additional software, outside of the "core" CSpace code, but useful for deployment, testing, reporting, and so on.
At a quick glance, some code relevant to that part of the workflow might include:
https://github.com/cspace-deployment/Tools/blob/master/devops/helpers/loadCspace.py#L84-L90
and
https://github.com/cspace-deployment/Tools/blob/master/devops/helpers/loadCspace.py#L133-L161
as well as the template file itself:
https://github.com/cspace-deployment/Tools/blob/master/devops/helpers/collectionobject.xml
In case this might help generate approaches/inspiration ...
Aron
On Thu, Apr 28, 2016 at 5:45 PM, Peter Tucker <peter.tucker@granitehorizon.commailto:peter.tucker@granitehorizon.com> wrote:
Hi Everyone,
I've been working on a means of converting metadata from a provided excel spreadsheet into xml code to be sent off via curl statements for importing new records. I've been looking to see if there's any established programs or tools that streamline this process to work with excel spreadsheets, but the only one I found a hint of this was here: https://wiki.collectionspace.org/display/deploy/Data+import
It mentions that "They add bar code numbers, accession numbers, and other required data to the spreadsheet, reformat the information to a required format and send the file to a programmer who can import the data using data loading programs." I'm guessing for UCJEPS they have an internal program they've made to parse the excel metadata.
So what I've done is with the spreadsheet of data (each column representing a schema field, like the title or ID for cataloging records), I've created a separate sheet, where it basically sets up a template of XML code (each column spits out basically a line of XML code you would see from a typical import XML file), and references the metadata sheet to populate the blanks in the XML template code. Like if I want to import a new cataloging record, I would have a column in my new sheet called "objectNumber", which would generate the xml line of code for <collectionobjects_common:objectNumber>123456</collectionobjects_common:objectNumber> in a cell below that column. I then accumulate all of the spreadsheet cells that contains a line of XML code, and put them all together into one cell in another seperate sheet, so it has all of the XML import code there. From there you export that sheet with the lone cell of XML code as an XML document. However the problem I run into this is that this is still a tremendous amount of work to manually setup the XML template code, and this is highly dependent that the metadata sheet format needs to stay the same as the XML code generation is highly dependent on which cells the correct data lies in. There is also topic of importing records alongside related records. Like lets say you want to import a cataloging record, but also want to upload a related Acquisition record. This then involves having to setup another sheet to generate XML code for the relations record, and then having to make sure that you send the curl statements of importing the cataloging record and the acquisition record, before doing the curl statement for the relation one.
This is all very doable, but it also just made me wonder if there is a simpler means or tool that already exist (that I may not be using) that would help translate excel metadata to XML import code. Or is what I listed pretty much the right track on doing this? Just was wondering on any input.
Thank you,
Peter Tucker
Talk mailing list
Talk@lists.collectionspace.orgmailto:Talk@lists.collectionspace.org
http://lists.collectionspace.org/mailman/listinfo/talk_lists.collectionspace.org
Hi, Peter. In my migration efforts, I've found the need to do significant manipulation of data coming into CSpace -- particularly resolving text strings into entities in the CSpace database. So my approach has been to use Python for the migration logic and Jinja2 templates for the XML. The Jinga2 code library is sophisticated enough to loop over lists of values to create repeated elements in the XML uploaded to CSpace. With Python I can look up entity values from a small sqlite3 database. I've found the combination of the two to be powerful enough for what I need without having to create a massive structure of XML-generating code.
Peter
On Thu, Apr 28, 2016 at 8:45 PM, Peter Tucker <peter.tucker@granitehorizon.com mailto:peter.tucker@granitehorizon.com> wrote:
Hi Everyone,
I've been working on a means of converting metadata from a provided excel spreadsheet into xml code to be sent off via curl statements for importing new records. I've been looking to see if there's any established programs or tools that streamline this process to work with excel spreadsheets, but the only one I found a hint of this was here: https://wiki.collectionspace.org/display/deploy/Data+import https://wiki.collectionspace.org/display/deploy/Data+import
It mentions that "They add bar code numbers, accession numbers, and other required data to the spreadsheet, reformat the information to a required format and send the file to a programmer who can import the data using data loading programs." I'm guessing for UCJEPS they have an internal program they've made to parse the excel metadata.
So what I've done is with the spreadsheet of data (each column representing a schema field, like the title or ID for cataloging records), I've created a separate sheet, where it basically sets up a template of XML code (each column spits out basically a line of XML code you would see from a typical import XML file), and references the metadata sheet to populate the blanks in the XML template code. Like if I want to import a new cataloging record, I would have a column in my new sheet called "objectNumber", which would generate the xml line of code for <collectionobjects_common:objectNumber>123456</collectionobjects_common:objectNumber> in a cell below that column. I then accumulate all of the spreadsheet cells that contains a line of XML code, and put them all together into one cell in another seperate sheet, so it has all of the XML import code there. From there you export that sheet with the lone cell of XML code as an XML document. However the problem I run into this is that this is still a tremendous amount of work to manually setup the XML template code, and this is highly dependent that the metadata sheet format needs to stay the same as the XML code generation is highly dependent on which cells the correct data lies in. There is also topic of importing records alongside related records. Like lets say you want to import a cataloging record, but also want to upload a related Acquisition record. This then involves having to setup another sheet to generate XML code for the relations record, and then having to make sure that you send the curl statements of importing the cataloging record and the acquisition record, before doing the curl statement for the relation one.
This is all very doable, but it also just made me wonder if there is a simpler means or tool that already exist (that I may not be using) that would help translate excel metadata to XML import code. Or is what I listed pretty much the right track on doing this? Just was wondering on any input.
Thank you,
Peter Tucker
Talk mailing list
Talk@lists.collectionspace.org mailto:Talk@lists.collectionspace.org
http://lists.collectionspace.org/mailman/listinfo/talk_lists.collectionspace.org http://lists.collectionspace.org/mailman/listinfo/talk_lists.collectionspace.org
--
Peter Murray
Dev/Ops Lead and Project Manager
Cherry Hill Company