talk@lists.collectionspace.org

WE HAVE SUNSET THIS LISTSERV - Join us at collectionspace@lyrasislists.org

View all threads

Importing Data via spreadsheets

PT
Peter Tucker
Fri, Apr 29, 2016 12:45 AM

Hi Everyone,

I've been working on a means of converting metadata from a provided
excel spreadsheet into xml code to be sent off via curl statements for
importing new records. I've been looking to see if there's any
established programs or tools that streamline this process to work with
excel spreadsheets, but the only one I found a hint of this was here:
https://wiki.collectionspace.org/display/deploy/Data+import
It mentions that "They add bar code numbers, accession numbers, and
other required data to the spreadsheet, reformat the information to a
required format and send the file to a programmer who can import the
data using data loading programs." I'm guessing for UCJEPS they have an
internal program they've made to parse the excel metadata.

So what I've done is with the spreadsheet of data (each column
representing a schema field, like the title or ID for cataloging
records), I've created a separate sheet, where it basically sets up a
template of XML code (each column spits out basically a line of XML code
you would see from a typical import XML file), and references the
metadata sheet to populate the blanks in the XML template code. Like if
I want to import a new cataloging record, I would have a column in my
new sheet called "objectNumber", which would generate the xml line of
code for
<collectionobjects_common:objectNumber>123456</collectionobjects_common:objectNumber>
in a cell below that column. I then accumulate all of the spreadsheet
cells that contains a line of XML code, and put them all together into
one cell in another seperate sheet, so it has all of the XML import code
there. From there you export that sheet with the lone cell of XML code
as an XML document. However the problem I run into this is that this is
still a tremendous amount of work to manually setup the XML template
code, and this is highly dependent that the metadata sheet format needs
to stay the same as the XML code generation is highly dependent on which
cells the correct data lies in. There is also topic of importing records
alongside related records. Like lets say you want to import a cataloging
record, but also want to upload a related Acquisition record. This then
involves having to setup another sheet to generate XML code for the
relations record, and then having to make sure that you send the curl
statements of importing the cataloging record and the acquisition
record, before doing the curl statement for the relation one.

This is all very doable, but it also just made me wonder if there is a
simpler means or tool that already exist (that I may not be using) that
would help translate excel metadata to XML import code. Or is what I
listed pretty much the right track on doing this? Just was wondering on
any input.

Thank you,
Peter Tucker

Hi Everyone, I've been working on a means of converting metadata from a provided excel spreadsheet into xml code to be sent off via curl statements for importing new records. I've been looking to see if there's any established programs or tools that streamline this process to work with excel spreadsheets, but the only one I found a hint of this was here: https://wiki.collectionspace.org/display/deploy/Data+import It mentions that "They add bar code numbers, accession numbers, and other required data to the spreadsheet, reformat the information to a required format and send the file to a programmer who can import the data using data loading programs." I'm guessing for UCJEPS they have an internal program they've made to parse the excel metadata. So what I've done is with the spreadsheet of data (each column representing a schema field, like the title or ID for cataloging records), I've created a separate sheet, where it basically sets up a template of XML code (each column spits out basically a line of XML code you would see from a typical import XML file), and references the metadata sheet to populate the blanks in the XML template code. Like if I want to import a new cataloging record, I would have a column in my new sheet called "objectNumber", which would generate the xml line of code for <collectionobjects_common:objectNumber>123456</collectionobjects_common:objectNumber> in a cell below that column. I then accumulate all of the spreadsheet cells that contains a line of XML code, and put them all together into one cell in another seperate sheet, so it has all of the XML import code there. From there you export that sheet with the lone cell of XML code as an XML document. However the problem I run into this is that this is still a tremendous amount of work to manually setup the XML template code, and this is highly dependent that the metadata sheet format needs to stay the same as the XML code generation is highly dependent on which cells the correct data lies in. There is also topic of importing records alongside related records. Like lets say you want to import a cataloging record, but also want to upload a related Acquisition record. This then involves having to setup another sheet to generate XML code for the relations record, and then having to make sure that you send the curl statements of importing the cataloging record and the acquisition record, before doing the curl statement for the relation one. This is all very doable, but it also just made me wonder if there is a simpler means or tool that already exist (that I may not be using) that would help translate excel metadata to XML import code. Or is what I listed pretty much the right track on doing this? Just was wondering on any input. Thank you, Peter Tucker
AR
Aron Roberts
Fri, Apr 29, 2016 6:23 PM

Hi Peter,

Thanks very much for sharing your thoughts, and a description of your
experiments, regarding importing Excel-based data into CollectionSpace.

There may well be others doing this, and I'm hoping they respond. To
start things off, last June, John Lowe shared this prototype project which,
during one point in its workflow, creates XML import records from CSV via a
template:

https://github.com/cspace-deployment/Tools/tree/master/devops/helpers

At a quick glance, some code relevant to that part of the workflow might
include:

https://github.com/cspace-deployment/Tools/blob/master/devops/helpers/loadCspace.py#L84-L90
and

https://github.com/cspace-deployment/Tools/blob/master/devops/helpers/loadCspace.py#L133-L161

as well as the template file itself:

https://github.com/cspace-deployment/Tools/blob/master/devops/helpers/collectionobject.xml

In case this might help generate approaches/inspiration ...

Aron

On Thu, Apr 28, 2016 at 5:45 PM, Peter Tucker <
peter.tucker@granitehorizon.com> wrote:

Hi Everyone,

I've been working on a means of converting metadata from a provided excel
spreadsheet into xml code to be sent off via curl statements for importing
new records. I've been looking to see if there's any established programs
or tools that streamline this process to work with excel spreadsheets, but
the only one I found a hint of this was here:
https://wiki.collectionspace.org/display/deploy/Data+import
It mentions that "They add bar code numbers, accession numbers, and other
required data to the spreadsheet, reformat the information to a required
format and send the file to a programmer who can import the data using data
loading programs." I'm guessing for UCJEPS they have an internal program
they've made to parse the excel metadata.

So what I've done is with the spreadsheet of data (each column
representing a schema field, like the title or ID for cataloging records),
I've created a separate sheet, where it basically sets up a template of XML
code (each column spits out basically a line of XML code you would see from
a typical import XML file), and references the metadata sheet to populate
the blanks in the XML template code. Like if I want to import a new
cataloging record, I would have a column in my new sheet called
"objectNumber", which would generate the xml line of code for
<collectionobjects_common:objectNumber>123456</collectionobjects_common:objectNumber>
in a cell below that column. I then accumulate all of the spreadsheet cells
that contains a line of XML code, and put them all together into one cell
in another seperate sheet, so it has all of the XML import code there. From
there you export that sheet with the lone cell of XML code as an XML
document. However the problem I run into this is that this is still a
tremendous amount of work to manually setup the XML template code, and this
is highly dependent that the metadata sheet format needs to stay the same
as the XML code generation is highly dependent on which cells the correct
data lies in. There is also topic of importing records alongside related
records. Like lets say you want to import a cataloging record, but also
want to upload a related Acquisition record. This then involves having to
setup another sheet to generate XML code for the relations record, and then
having to make sure that you send the curl statements of importing the
cataloging record and the acquisition record, before doing the curl
statement for the relation one.

This is all very doable, but it also just made me wonder if there is a
simpler means or tool that already exist (that I may not be using) that
would help translate excel metadata to XML import code. Or is what I listed
pretty much the right track on doing this? Just was wondering on any input.

Thank you,
Peter Tucker


Talk mailing list
Talk@lists.collectionspace.org

http://lists.collectionspace.org/mailman/listinfo/talk_lists.collectionspace.org

Hi Peter, Thanks very much for sharing your thoughts, and a description of your experiments, regarding importing Excel-based data into CollectionSpace. There may well be others doing this, and I'm hoping they respond. To start things off, last June, John Lowe shared this prototype project which, during one point in its workflow, creates XML import records from CSV via a template: https://github.com/cspace-deployment/Tools/tree/master/devops/helpers At a quick glance, some code relevant to that part of the workflow might include: https://github.com/cspace-deployment/Tools/blob/master/devops/helpers/loadCspace.py#L84-L90 and https://github.com/cspace-deployment/Tools/blob/master/devops/helpers/loadCspace.py#L133-L161 as well as the template file itself: https://github.com/cspace-deployment/Tools/blob/master/devops/helpers/collectionobject.xml In case this might help generate approaches/inspiration ... Aron On Thu, Apr 28, 2016 at 5:45 PM, Peter Tucker < peter.tucker@granitehorizon.com> wrote: > Hi Everyone, > > I've been working on a means of converting metadata from a provided excel > spreadsheet into xml code to be sent off via curl statements for importing > new records. I've been looking to see if there's any established programs > or tools that streamline this process to work with excel spreadsheets, but > the only one I found a hint of this was here: > https://wiki.collectionspace.org/display/deploy/Data+import > It mentions that "They add bar code numbers, accession numbers, and other > required data to the spreadsheet, reformat the information to a required > format and send the file to a programmer who can import the data using data > loading programs." I'm guessing for UCJEPS they have an internal program > they've made to parse the excel metadata. > > So what I've done is with the spreadsheet of data (each column > representing a schema field, like the title or ID for cataloging records), > I've created a separate sheet, where it basically sets up a template of XML > code (each column spits out basically a line of XML code you would see from > a typical import XML file), and references the metadata sheet to populate > the blanks in the XML template code. Like if I want to import a new > cataloging record, I would have a column in my new sheet called > "objectNumber", which would generate the xml line of code for > <collectionobjects_common:objectNumber>123456</collectionobjects_common:objectNumber> > in a cell below that column. I then accumulate all of the spreadsheet cells > that contains a line of XML code, and put them all together into one cell > in another seperate sheet, so it has all of the XML import code there. From > there you export that sheet with the lone cell of XML code as an XML > document. However the problem I run into this is that this is still a > tremendous amount of work to manually setup the XML template code, and this > is highly dependent that the metadata sheet format needs to stay the same > as the XML code generation is highly dependent on which cells the correct > data lies in. There is also topic of importing records alongside related > records. Like lets say you want to import a cataloging record, but also > want to upload a related Acquisition record. This then involves having to > setup another sheet to generate XML code for the relations record, and then > having to make sure that you send the curl statements of importing the > cataloging record and the acquisition record, before doing the curl > statement for the relation one. > > This is all very doable, but it also just made me wonder if there is a > simpler means or tool that already exist (that I may not be using) that > would help translate excel metadata to XML import code. Or is what I listed > pretty much the right track on doing this? Just was wondering on any input. > > Thank you, > Peter Tucker > > _______________________________________________ > Talk mailing list > Talk@lists.collectionspace.org > > http://lists.collectionspace.org/mailman/listinfo/talk_lists.collectionspace.org >
RM
Richard Millet
Fri, Apr 29, 2016 6:53 PM

Peter,

Also, Mark Cooper of LYRASIS has graciously shared this tool he developed for working with CollectionSpace:

https://github.com/lyrasis/csible

With that tool, there's a single command to get a list of all records and a set of their properties returned by the API output to CSV:

rake cs:get:list[collectionobjects,"wf_deleted=false"]

-Richard


From: Talk talk-bounces@lists.collectionspace.org on behalf of Aron Roberts aron@socrates.berkeley.edu
Sent: Friday, April 29, 2016 11:23 AM
To: Peter Tucker
Cc: talk@lists.collectionspace.org
Subject: Re: [Talk] Importing Data via spreadsheets

Hi Peter,

Thanks very much for sharing your thoughts, and a description of your experiments, regarding importing Excel-based data into CollectionSpace.

There may well be others doing this, and I'm hoping they respond. To start things off, last June, John Lowe shared this prototype project which, during one point in its workflow, creates XML import records from CSV via a template:

https://github.com/cspace-deployment/Tools/tree/master/devops/helpers

cspace-deployment/Toolshttps://github.com/cspace-deployment/Tools/tree/master/devops/helpers
github.com
Tools - Additional software, outside of the "core" CSpace code, but useful for deployment, testing, reporting, and so on.

At a quick glance, some code relevant to that part of the workflow might include:

https://github.com/cspace-deployment/Tools/blob/master/devops/helpers/loadCspace.py#L84-L90
and
https://github.com/cspace-deployment/Tools/blob/master/devops/helpers/loadCspace.py#L133-L161

as well as the template file itself:

https://github.com/cspace-deployment/Tools/blob/master/devops/helpers/collectionobject.xml

In case this might help generate approaches/inspiration ...

Aron

On Thu, Apr 28, 2016 at 5:45 PM, Peter Tucker <peter.tucker@granitehorizon.commailto:peter.tucker@granitehorizon.com> wrote:
Hi Everyone,

I've been working on a means of converting metadata from a provided excel spreadsheet into xml code to be sent off via curl statements for importing new records. I've been looking to see if there's any established programs or tools that streamline this process to work with excel spreadsheets, but the only one I found a hint of this was here: https://wiki.collectionspace.org/display/deploy/Data+import
It mentions that "They add bar code numbers, accession numbers, and other required data to the spreadsheet, reformat the information to a required format and send the file to a programmer who can import the data using data loading programs." I'm guessing for UCJEPS they have an internal program they've made to parse the excel metadata.

So what I've done is with the spreadsheet of data (each column representing a schema field, like the title or ID for cataloging records), I've created a separate sheet, where it basically sets up a template of XML code (each column spits out basically a line of XML code you would see from a typical import XML file), and references the metadata sheet to populate the blanks in the XML template code. Like if I want to import a new cataloging record, I would have a column in my new sheet called "objectNumber", which would generate the xml line of code for <collectionobjects_common:objectNumber>123456</collectionobjects_common:objectNumber> in a cell below that column. I then accumulate all of the spreadsheet cells that contains a line of XML code, and put them all together into one cell in another seperate sheet, so it has all of the XML import code there. From there you export that sheet with the lone cell of XML code as an XML document. However the problem I run into this is that this is still a tremendous amount of work to manually setup the XML template code, and this is highly dependent that the metadata sheet format needs to stay the same as the XML code generation is highly dependent on which cells the correct data lies in. There is also topic of importing records alongside related records. Like lets say you want to import a cataloging record, but also want to upload a related Acquisition record. This then involves having to setup another sheet to generate XML code for the relations record, and then having to make sure that you send the curl statements of importing the cataloging record and the acquisition record, before doing the curl statement for the relation one.

This is all very doable, but it also just made me wonder if there is a simpler means or tool that already exist (that I may not be using) that would help translate excel metadata to XML import code. Or is what I listed pretty much the right track on doing this? Just was wondering on any input.

Thank you,
Peter Tucker


Talk mailing list
Talk@lists.collectionspace.orgmailto:Talk@lists.collectionspace.org
http://lists.collectionspace.org/mailman/listinfo/talk_lists.collectionspace.org

Peter, Also, Mark Cooper of LYRASIS has graciously shared this tool he developed for working with CollectionSpace: https://github.com/lyrasis/csible With that tool, there's a single command to get a list of all records and a set of their properties returned by the API output to CSV: rake cs:get:list[collectionobjects,"wf_deleted=false"] -Richard ________________________________ From: Talk <talk-bounces@lists.collectionspace.org> on behalf of Aron Roberts <aron@socrates.berkeley.edu> Sent: Friday, April 29, 2016 11:23 AM To: Peter Tucker Cc: talk@lists.collectionspace.org Subject: Re: [Talk] Importing Data via spreadsheets Hi Peter, Thanks very much for sharing your thoughts, and a description of your experiments, regarding importing Excel-based data into CollectionSpace. There may well be others doing this, and I'm hoping they respond. To start things off, last June, John Lowe shared this prototype project which, during one point in its workflow, creates XML import records from CSV via a template: https://github.com/cspace-deployment/Tools/tree/master/devops/helpers cspace-deployment/Tools<https://github.com/cspace-deployment/Tools/tree/master/devops/helpers> github.com Tools - Additional software, outside of the "core" CSpace code, but useful for deployment, testing, reporting, and so on. At a quick glance, some code relevant to that part of the workflow might include: https://github.com/cspace-deployment/Tools/blob/master/devops/helpers/loadCspace.py#L84-L90 and https://github.com/cspace-deployment/Tools/blob/master/devops/helpers/loadCspace.py#L133-L161 as well as the template file itself: https://github.com/cspace-deployment/Tools/blob/master/devops/helpers/collectionobject.xml In case this might help generate approaches/inspiration ... Aron On Thu, Apr 28, 2016 at 5:45 PM, Peter Tucker <peter.tucker@granitehorizon.com<mailto:peter.tucker@granitehorizon.com>> wrote: Hi Everyone, I've been working on a means of converting metadata from a provided excel spreadsheet into xml code to be sent off via curl statements for importing new records. I've been looking to see if there's any established programs or tools that streamline this process to work with excel spreadsheets, but the only one I found a hint of this was here: https://wiki.collectionspace.org/display/deploy/Data+import It mentions that "They add bar code numbers, accession numbers, and other required data to the spreadsheet, reformat the information to a required format and send the file to a programmer who can import the data using data loading programs." I'm guessing for UCJEPS they have an internal program they've made to parse the excel metadata. So what I've done is with the spreadsheet of data (each column representing a schema field, like the title or ID for cataloging records), I've created a separate sheet, where it basically sets up a template of XML code (each column spits out basically a line of XML code you would see from a typical import XML file), and references the metadata sheet to populate the blanks in the XML template code. Like if I want to import a new cataloging record, I would have a column in my new sheet called "objectNumber", which would generate the xml line of code for <collectionobjects_common:objectNumber>123456</collectionobjects_common:objectNumber> in a cell below that column. I then accumulate all of the spreadsheet cells that contains a line of XML code, and put them all together into one cell in another seperate sheet, so it has all of the XML import code there. From there you export that sheet with the lone cell of XML code as an XML document. However the problem I run into this is that this is still a tremendous amount of work to manually setup the XML template code, and this is highly dependent that the metadata sheet format needs to stay the same as the XML code generation is highly dependent on which cells the correct data lies in. There is also topic of importing records alongside related records. Like lets say you want to import a cataloging record, but also want to upload a related Acquisition record. This then involves having to setup another sheet to generate XML code for the relations record, and then having to make sure that you send the curl statements of importing the cataloging record and the acquisition record, before doing the curl statement for the relation one. This is all very doable, but it also just made me wonder if there is a simpler means or tool that already exist (that I may not be using) that would help translate excel metadata to XML import code. Or is what I listed pretty much the right track on doing this? Just was wondering on any input. Thank you, Peter Tucker _______________________________________________ Talk mailing list Talk@lists.collectionspace.org<mailto:Talk@lists.collectionspace.org> http://lists.collectionspace.org/mailman/listinfo/talk_lists.collectionspace.org
PM
Peter Murray
Fri, Apr 29, 2016 8:49 PM

Hi, Peter.  In my migration efforts, I've found the need to do significant manipulation of data coming into CSpace -- particularly resolving text strings into entities in the CSpace database.  So my approach has been to use Python for the migration logic and Jinja2 templates for the XML.  The Jinga2 code library is sophisticated enough to loop over lists of values to create repeated elements in the XML uploaded to CSpace.  With Python I can look up entity values from a small sqlite3 database.  I've found the combination of the two to be powerful enough for what I need without having to create a massive structure of XML-generating code.

Peter

On Thu, Apr 28, 2016 at 8:45 PM, Peter Tucker <peter.tucker@granitehorizon.com mailto:peter.tucker@granitehorizon.com> wrote:
Hi Everyone,

I've been working on a means of converting metadata from a provided excel spreadsheet into xml code to be sent off via curl statements for importing new records. I've been looking to see if there's any established programs or tools that streamline this process to work with excel spreadsheets, but the only one I found a hint of this was here: https://wiki.collectionspace.org/display/deploy/Data+import https://wiki.collectionspace.org/display/deploy/Data+import
It mentions that "They add bar code numbers, accession numbers, and other required data to the spreadsheet, reformat the information to a required format and send the file to a programmer who can import the data using data loading programs." I'm guessing for UCJEPS they have an internal program they've made to parse the excel metadata.

So what I've done is with the spreadsheet of data (each column representing a schema field, like the title or ID for cataloging records), I've created a separate sheet, where it basically sets up a template of XML code (each column spits out basically a line of XML code you would see from a typical import XML file), and references the metadata sheet to populate the blanks in the XML template code. Like if I want to import a new cataloging record, I would have a column in my new sheet called "objectNumber", which would generate the xml line of code for <collectionobjects_common:objectNumber>123456</collectionobjects_common:objectNumber> in a cell below that column. I then accumulate all of the spreadsheet cells that contains a line of XML code, and put them all together into one cell in another seperate sheet, so it has all of the XML import code there. From there you export that sheet with the lone cell of XML code as an XML document. However the problem I run into this is that this is still a tremendous amount of work to manually setup the XML template code, and this is highly dependent that the metadata sheet format needs to stay the same as the XML code generation is highly dependent on which cells the correct data lies in. There is also topic of importing records alongside related records. Like lets say you want to import a cataloging record, but also want to upload a related Acquisition record. This then involves having to setup another sheet to generate XML code for the relations record, and then having to make sure that you send the curl statements of importing the cataloging record and the acquisition record, before doing the curl statement for the relation one.

This is all very doable, but it also just made me wonder if there is a simpler means or tool that already exist (that I may not be using) that would help translate excel metadata to XML import code. Or is what I listed pretty much the right track on doing this? Just was wondering on any input.

Thank you,
Peter Tucker


Talk mailing list
Talk@lists.collectionspace.org mailto:Talk@lists.collectionspace.org
http://lists.collectionspace.org/mailman/listinfo/talk_lists.collectionspace.org http://lists.collectionspace.org/mailman/listinfo/talk_lists.collectionspace.org

--
Peter Murray
Dev/Ops Lead and Project Manager
Cherry Hill Company

Hi, Peter. In my migration efforts, I've found the need to do significant manipulation of data coming into CSpace -- particularly resolving text strings into entities in the CSpace database. So my approach has been to use Python for the migration logic and Jinja2 templates for the XML. The Jinga2 code library is sophisticated enough to loop over lists of values to create repeated elements in the XML uploaded to CSpace. With Python I can look up entity values from a small sqlite3 database. I've found the combination of the two to be powerful enough for what I need without having to create a massive structure of XML-generating code. Peter On Thu, Apr 28, 2016 at 8:45 PM, Peter Tucker <peter.tucker@granitehorizon.com <mailto:peter.tucker@granitehorizon.com>> wrote: Hi Everyone, I've been working on a means of converting metadata from a provided excel spreadsheet into xml code to be sent off via curl statements for importing new records. I've been looking to see if there's any established programs or tools that streamline this process to work with excel spreadsheets, but the only one I found a hint of this was here: https://wiki.collectionspace.org/display/deploy/Data+import <https://wiki.collectionspace.org/display/deploy/Data+import> It mentions that "They add bar code numbers, accession numbers, and other required data to the spreadsheet, reformat the information to a required format and send the file to a programmer who can import the data using data loading programs." I'm guessing for UCJEPS they have an internal program they've made to parse the excel metadata. So what I've done is with the spreadsheet of data (each column representing a schema field, like the title or ID for cataloging records), I've created a separate sheet, where it basically sets up a template of XML code (each column spits out basically a line of XML code you would see from a typical import XML file), and references the metadata sheet to populate the blanks in the XML template code. Like if I want to import a new cataloging record, I would have a column in my new sheet called "objectNumber", which would generate the xml line of code for <collectionobjects_common:objectNumber>123456</collectionobjects_common:objectNumber> in a cell below that column. I then accumulate all of the spreadsheet cells that contains a line of XML code, and put them all together into one cell in another seperate sheet, so it has all of the XML import code there. From there you export that sheet with the lone cell of XML code as an XML document. However the problem I run into this is that this is still a tremendous amount of work to manually setup the XML template code, and this is highly dependent that the metadata sheet format needs to stay the same as the XML code generation is highly dependent on which cells the correct data lies in. There is also topic of importing records alongside related records. Like lets say you want to import a cataloging record, but also want to upload a related Acquisition record. This then involves having to setup another sheet to generate XML code for the relations record, and then having to make sure that you send the curl statements of importing the cataloging record and the acquisition record, before doing the curl statement for the relation one. This is all very doable, but it also just made me wonder if there is a simpler means or tool that already exist (that I may not be using) that would help translate excel metadata to XML import code. Or is what I listed pretty much the right track on doing this? Just was wondering on any input. Thank you, Peter Tucker _______________________________________________ Talk mailing list Talk@lists.collectionspace.org <mailto:Talk@lists.collectionspace.org> http://lists.collectionspace.org/mailman/listinfo/talk_lists.collectionspace.org <http://lists.collectionspace.org/mailman/listinfo/talk_lists.collectionspace.org> -- Peter Murray Dev/Ops Lead and Project Manager Cherry Hill Company