talk@lists.collectionspace.org

WE HAVE SUNSET THIS LISTSERV - Join us at collectionspace@lyrasislists.org


Examples of Imports on Version 2.0

Austin Zumbro
Tue, Jun 5, 2012 9:27 PM

Hello,

We would like to do a batch import from an Excel spreadsheet into CSpace
2.0, and we're getting a bit muddled poring over the wiki.  If anyone could
provide any more detailed support, that would be wonderful.  Specifically:

  1. Does anyone have, or know where I can find, real-world sample import
    XML files with image file references and custom field data?

  2. In the case of partial import failures, how do I flush/erase all CS
    data?

  3. Is there a real world step-by-step for importing?  I think I saw one
    once, but I can no longer find it.

Thank you!

Austin Zumbro
Mediatrope LLC.

Aron Roberts
Tue, Jun 5, 2012 9:36 PM

Hi Austin,

At least a partial response to your questions 2 and 3 can be found
here; this might be the 'real world step-by-step' guide you were
asking about:

http://wiki.collectionspace.org/display/CSPACE20/How+to+Import+Data

The latest (v2.4 / early v2.5) version is here, in case anything has
been added or corrected that may be helpful:

http://wiki.collectionspace.org/display/UNRELEASED/How+to+Import+Data

(There haven't, to my knowledge, been major changes in the Imports
service between versions 2.0 and v2.5, other than internal bugfixes.)

Aron Roberts
UC Berkeley

P.S. Other useful documents are linked from the See also section at
the end of the above wiki page(s).

Chris Hoffman
Tue, Jun 5, 2012 9:41 PM

Hi Austin,

The world of import is an especially fun one, no doubt about it.  I have a couple questions.  From your message, it sounds like you need to import records for collection objects, for images (media handling procedure), and the relationships between collection objects and media handling records.  Do you have other data (e.g., loans, acquisition records, and so on)?

Also, do you have customizations to your schema, or are you using the core tenant as delivered by CollectionSpace?

By the way, there are enough improvements in the more recent versions (e.g., 2.4 is about to be released), that you might consider using something after 2.0.  However, that's been a challenge all along -- deciding which version to use.

Thanks,
Chris

Austin Zumbro
Tue, Jun 5, 2012 10:39 PM

Hi Aron and Chris,

Thank you!  This is all very helpful.  I'm actually leaving the project
shortly, and I'm trying to set up my successor with as much information as
I can.

We are using the core tenant as delivered by CollectionSpace.  There are a
number of acquisition records - along with the collection objects, images,
etc. as you mentioned.  There is also a smattering of other information
scattered throughout the spreadsheet we're pulling from.  There are 1,000+
relevant collection objects, and they occasionally come with specific notes
and data points.

I've been not-so-closely following the 2.4 development, and I agree that we
should discuss moving to a later version.  However, that's going to involve
a number of discussions and decisions, so for practical purposes, we should
probably hold the discussion to Version 2.0 for now.

Thanks again!

-Austin

Chris Hoffman
Wed, Jun 6, 2012 4:38 AM

Thanks for the additional information, Austin.  I completely understand needing to keep this conversation about 2.0 for now.

So your list of import jobs will be:

Collection Objects
Media Handling records (and blobs which are the images themselves)
Relationships between Collection Object records and Media Handling records
Acquisition records
Relationships between Collection Object records and Acquisition records

In addition, you will probably have data that need to go in

  • vocabularies (such as  persons and organizations)
  • controlled lists (dropdowns) of which there are two types
    • static lists (the values for which are stored in the app layer config files)
    • dynamic lists (which can be managed via the Term List tab in the Administration screens in CSpace)
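As a planning aid, the jobs above can be put into a load order in which background data comes before the records that refer to it, and records come before the relationships that link them. The sketch below is only an illustration: the job names are made-up labels rather than CollectionSpace identifiers, and the dependency edges (including the object-to-acquisition relation job) are assumptions drawn from the list, not from the Import Service documentation.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each job maps to the set of jobs that should be loaded before it.
# All names here are hypothetical labels chosen for this example.
IMPORT_DEPENDENCIES = {
    "vocabularies": set(),
    "controlled_lists": set(),
    "collection_objects": {"vocabularies", "controlled_lists"},
    "media_handling": {"vocabularies", "controlled_lists"},
    "acquisitions": {"vocabularies", "controlled_lists"},
    "object_media_relations": {"collection_objects", "media_handling"},
    "object_acquisition_relations": {"collection_objects", "acquisitions"},
}


def import_order(dependencies):
    """Return the job names in an order where prerequisites come first."""
    return list(TopologicalSorter(dependencies).static_order())


for step, job in enumerate(import_order(IMPORT_DEPENDENCIES), start=1):
    print(step, job)
```

Any order the sorter returns is valid as long as each job appears after everything it depends on; jobs with no mutual dependency (for example the two record types) may come out in either order.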

If you have persons or organizations, then that involves creating full records in those vocabularies.  Those records include the refname format for the record, and it is this refname format that gets imported wherever you need a person or an organization.  For example, if the Production Person for a collection object is Jackson Pollock, you might think that the collection object record stores a value of "Jackson Pollock" in the data field for Production Person. In fact, that's not the case. You would be storing something that looks more like this in that field:

urn:cspace:mediatrope.com:personauthorities:name(person):item:name(person1234)'Jackson Pollock'

Maybe you're already working with refnames and know all about this, but the point is that you need these background pieces of information (persons, organizations, controlled lists) in order to populate the collection object, media handling, and acquisition records.  More information about refnames can be found at
http://wiki.collectionspace.org/display/collectionspace/RefName
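When preparing spreadsheet data, it can help to generate and check these strings in a script rather than by hand. The following is a minimal sketch that assumes the refname layout is exactly the one in the example above; the function names and the regular expression are hypothetical helpers, and the exact format for a given CollectionSpace version should be checked against the RefName wiki page.

```python
import re

# Pattern modelled only on the single example refname shown in this thread.
REFNAME_PATTERN = re.compile(
    r"^urn:cspace:(?P<tenant>[^:]+):(?P<authority>[^:]+)"
    r":name\((?P<vocabulary>[^)]+)\)"
    r":item:name\((?P<short_id>[^)]+)\)"
    r"'(?P<display_name>.*)'$"
)


def build_refname(tenant, authority, vocabulary, short_id, display_name):
    """Assemble a refname string from its parts (hypothetical helper)."""
    return (
        f"urn:cspace:{tenant}:{authority}:name({vocabulary})"
        f":item:name({short_id})'{display_name}'"
    )


def parse_refname(refname):
    """Split a refname into its parts, or raise ValueError if it does not match."""
    match = REFNAME_PATTERN.match(refname)
    if match is None:
        raise ValueError(f"not a recognised refname: {refname!r}")
    return match.groupdict()


refname = build_refname(
    "mediatrope.com", "personauthorities", "person", "person1234", "Jackson Pollock"
)
print(refname)
print(parse_refname(refname)["display_name"])
```

Keeping a lookup from display name to refname, built while the person and organization records are created, lets the later collection object import substitute the full refname wherever the spreadsheet only has a name.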

The other gotcha I'd share is that if you have any pieces of information on your records that take advantage of any of the repeating fields or field groups, then you have some extra work and a couple options.  For example, on collection objects, if you need to import multiple Production Person values (multiple artists), then there's an extra step.  Or if an object has multiple Responsible Departments that you need to populate at import, then there is some additional work.  We can talk more about that if you need to.
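One plausible form of that extra work, when the source is a spreadsheet, is turning a single cell that holds several values into a repeated XML element. The sketch below assumes a semicolon-separated cell and uses placeholder element names (`productionPersons`, `productionPerson`); these are not taken from the CollectionSpace schema, so the real element names for your version need to be looked up before use.

```python
import xml.etree.ElementTree as ET


def split_multi_value(cell, separator=";"):
    """Split one spreadsheet cell into a list of trimmed, non-empty values."""
    return [part.strip() for part in cell.split(separator) if part.strip()]


def repeating_element(list_tag, item_tag, values):
    """Build <list_tag> containing one <item_tag> child per value."""
    parent = ET.Element(list_tag)
    for value in values:
        ET.SubElement(parent, item_tag).text = value
    return parent


# Hypothetical cell contents; in a real import each value would be a refname.
cell = "Jackson Pollock; Lee Krasner"
element = repeating_element(
    "productionPersons", "productionPerson", split_multi_value(cell)
)
print(ET.tostring(element, encoding="unicode"))
```

The same two helpers would apply to any other repeating field, such as multiple Responsible Departments, by changing the tag names and the source column.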

So, I'm going to stop there even though this is still just covering some of the planning work. With 1000+ records, you have some other options, and I'd like to see if anyone else chimes in.  At UC Berkeley, we just launched a system for the University and Jepson Herbaria that has over 500,000 collection objects, 100,000 organizations, 100,000 persons, 250,000 scientific taxonomy names, 5000 loans in and 3000 loans out.  So the scale of what we had to do forced us to do some pretty heavy lifting with tools such as Talend Open Studio.  We are using the Import Service, but you could use the REST API, supplemented, frankly, by some additional data entry by hand.
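For the scripted route, each record (or import file) ends up as an XML payload sent in an authenticated HTTP POST. The sketch below only constructs such a request with the Python standard library and does not send it. The `/cspace-services/` path prefix, the service names, the port, and the use of HTTP Basic authentication are assumptions that should be verified against the How to Import Data wiki page for your version; `build_create_request` is a hypothetical helper, not part of any CollectionSpace client.

```python
import base64
import urllib.request


def build_create_request(base_url, service, username, password, xml_payload):
    """Build (but do not send) a POST request carrying an XML payload."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        # Path layout is an assumption; confirm it for your installation.
        url=f"{base_url.rstrip('/')}/cspace-services/{service}",
        data=xml_payload.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/xml",
            "Authorization": f"Basic {token}",
        },
    )


# Placeholder host, credentials, and payload for illustration only.
request = build_create_request(
    "http://localhost:8180/", "imports", "user", "pass", "<imports/>"
)
print(request.get_method(), request.full_url)
```

Sending would be a matter of passing the request to `urllib.request.urlopen` and checking the response status, ideally against a test instance first so that a partial failure does not leave stray records in production data.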

Thanks,
Chris
