talk@lists.collectionspace.org

WE HAVE SUNSET THIS LISTSERV - Join us at collectionspace@lyrasislists.org


Re: [Talk] Using RESTful interface, create a record with a particular CSID

Richard Millet
Wed, Oct 21, 2015 4:26 PM

Thanks Susan.  My last comment was partly tongue-in-cheek.  As for JIRA issue status changes, everyone in the community should speak up and challenge any issue status changes they disagree with.  Please!


From: Susan STONE sstone@berkeley.edu
Sent: Tuesday, October 20, 2015 8:04 PM
To: Richard Millet
Subject: Re: [Talk] Using RESTful interface, create a record with a particular CSID

I'll keep that in mind. I do remember I had a bad experience with
JIRAs in the past well enough not to want to repeat it: they all went
from major to minor to will not fix.

Susan

On Tue, Oct 20, 2015 at 7:56 PM, Richard Millet
richard.millet@lyrasis.org wrote:

Susan,

"Those who cannot remember the past (by documenting log file findings) are
condemned to repeat it." - George Santayana

-Richard

On Oct 20, 2015, at 4:15 PM, Susan STONE sstone@berkeley.edu wrote:

Aron,

I definitely find stuff in the server-side logs that helps me find
errors in the XML. It can be a painful process, so I haven't saved any
cherished examples.

Susan

On Tue, Oct 20, 2015 at 4:05 PM, Aron Roberts
aron@socrates.berkeley.edu wrote:

Thanks, Susan!

In my experience, it is usually all or nothing (as with a database timeout
when the imports are too large or backed up) ...

Interesting. Have you been able to capture any log output on the server
side when those issues occurred? And are there CSpace JIRA issues for those?
I'd be happy to create one (or more) if you have any raw material around
this.

Aron

On Tue, Oct 20, 2015 at 3:57 PM, Susan STONE sstone@berkeley.edu wrote:

Aron,

In my experience, it is usually all or nothing (as with a database timeout
when the imports are too large or backed up), and I just check the total
for each batch. I usually work out the XML issues in testing.

In the rare cases where there is a problem in some individual records
and the totals don't match, I have been comparing the CSIDs manually-ish,
but we are working to automate that process and log the particular records
missed so they can be checked and resubmitted.

Susan

On Tue, Oct 20, 2015 at 3:47 PM, Aron Roberts

aron@socrates.berkeley.edu wrote:

Peter wrote:

I think I'll take another look at the Import service, albeit in a
one-at-a-time mode so I can have a better handle on error reporting.

From a trivial test just now, I'm wondering whether the Imports service
might give us just enough information to do a multi-record import, and be
able to tell which records were successfully imported and which were not?

Specifically, if we're providing CSIDs for each record at import time,
perhaps we can tell which were successfully imported, and which failed to be
imported - and thus need to be fixed and re-submitted in a follow-up import?

Example POST to the Imports service, of five CollectionObject records to
be imported into the 'core' tenant:

curl -X POST http://yourhostnamehere:8180/cspace-services/imports -i \
  -u "admin@core.collectionspace.org:Administrator" \
  -H "Content-Type: application/xml" \
  -T mixed-objects-some-invalid.xml

Where the file 'mixed-objects-some-invalid.xml' is a payload consisting of
five CollectionObject records to be imported, and where the fourth such
record includes a non-existent element (i.e. one not present in the
collectionobjects_common schema):

<?xml version="1.0" encoding="UTF-8"?>
<imports>
  <import service="CollectionObjects" type="CollectionObject"
      CSID="e9a3e850-2776-44f4-b068-4ab1a0c8c046">
    <schema
        xmlns:collectionobjects_common="http://collectionspace.org/services/collectionobject"
        name="collectionobjects_common">
      <collectionobjects_common:objectNumber>UC1</collectionobjects_common:objectNumber>
    </schema>
  </import>
  <import service="CollectionObjects" type="CollectionObject"
      CSID="c730a597-3229-476a-9e22-4ce89c003925">
    <schema
        xmlns:collectionobjects_common="http://collectionspace.org/services/collectionobject"
        name="collectionobjects_common">
      <collectionobjects_common:objectNumber>UC2</collectionobjects_common:objectNumber>
    </schema>
  </import>
  <import service="CollectionObjects" type="CollectionObject"
      CSID="d7358564-6a08-4dc2-a07d-9708471daa02">
    <schema
        xmlns:collectionobjects_common="http://collectionspace.org/services/collectionobject"
        name="collectionobjects_common">
      <collectionobjects_common:objectNumber>UC3</collectionobjects_common:objectNumber>
    </schema>
  </import>
  <import service="CollectionObjects" type="CollectionObject"
      CSID="6feb15c3-4e1e-4230-bb88-fa81467f6cbd">
    <schema
        xmlns:collectionobjects_common="http://collectionspace.org/services/collectionobject"
        name="collectionobjects_common">
      <collectionobjects_common:objectNumber>UC4</collectionobjects_common:objectNumber>
      <collectionobjects_common:foo>THIS ELEMENT DOESN'T EXIST IN THE SCHEMA</collectionobjects_common:foo>
    </schema>
  </import>
  <import service="CollectionObjects" type="CollectionObject"
      CSID="a5839b2c-b229-4a55-8ee3-71b2440658a3">
    <schema
        xmlns:collectionobjects_common="http://collectionspace.org/services/collectionobject"
        name="collectionobjects_common">
      <collectionobjects_common:objectNumber>UC5</collectionobjects_common:objectNumber>
    </schema>
  </import>
</imports>

This import generates the following console output (pretty printed after the
fact for clarity, with hand-editing of the <report> content for further
readability):

<?xml version="1.0" encoding="utf-16"?>
<import>
  <msg>SUCCESS</msg>
  <importedRecords>
    <importedRecord>
      <doctype>CollectionObject</doctype>
      <csid>d7358564-6a08-4dc2-a07d-9708471daa02</csid>
    </importedRecord>
    <importedRecord>
      <doctype>CollectionObject</doctype>
      <csid>c730a597-3229-476a-9e22-4ce89c003925</csid>
    </importedRecord>
    <importedRecord>
      <doctype>CollectionObject</doctype>
      <csid>e9a3e850-2776-44f4-b068-4ab1a0c8c046</csid>
    </importedRecord>
    <importedRecord>
      <doctype>CollectionObject</doctype>
      <csid>a5839b2c-b229-4a55-8ee3-71b2440658a3</csid>
    </importedRecord>
  </importedRecords>
  <status>Success</status>
  <totalRecordsImported>4</totalRecordsImported>
  <numRecordsImportedByDocType>
    <numRecordsImported>
      <docType>CollectionObject</docType>
      <numRecords>4</numRecords>
    </numRecordsImported>
  </numRecordsImportedByDocType>
  <report>
    READ: /usr/local/share/apache-tomcat-7.0.57/temp/imports-6243858932268618966/CollectionObjects/6feb15c3-4e1e-4230-bb88-fa81467f6cbd/document.xml/CollectionObjects/6feb15c3-4e1e-4230-bb88-fa81467f6cbd
    READ: /usr/local/share/apache-tomcat-7.0.57/temp/imports-6243858932268618966/CollectionObjects/d7358564-6a08-4dc2-a07d-9708471daa02/document.xml/CollectionObjects/d7358564-6a08-4dc2-a07d-9708471daa02
    READ: /usr/local/share/apache-tomcat-7.0.57/temp/imports-6243858932268618966/CollectionObjects/a5839b2c-b229-4a55-8ee3-71b2440658a3/document.xml/CollectionObjects/a5839b2c-b229-4a55-8ee3-71b2440658a3
    READ: /usr/local/share/apache-tomcat-7.0.57/temp/imports-6243858932268618966/CollectionObjects/c730a597-3229-476a-9e22-4ce89c003925/document.xml/CollectionObjects/c730a597-3229-476a-9e22-4ce89c003925
    READ: /usr/local/share/apache-tomcat-7.0.57/temp/imports-6243858932268618966/CollectionObjects/e9a3e850-2776-44f4-b068-4ab1a0c8c046/document.xml/CollectionObjects/e9a3e850-2776-44f4-b068-4ab1a0c8c046</report>
</import>

Note that <totalRecordsImported> identifies that only 4 records were
successfully imported.

And by checking the CSIDs that were imported successfully against the
entire list of CSIDs, perhaps the 'missing' records (that failed to import)
could be identified? (In the list above, note that CSID
'6feb15c3-4e1e-4230-bb88-fa81467f6cbd' - the CSID for the problematic fourth
record - doesn't appear in the list of <importedRecords>.) If this test is
any indication, you might need to sort both lists of CSIDs - those submitted
and those successfully imported - as the ordering in the import payload
might not match the order returned in the output from that POST ... Anyway,
a thought.
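
One way that comparison might be sketched, in Python with only the standard library. It assumes the payload and the response are shaped like the examples in this thread: submitted CSIDs in the CSID attribute of each <import>, imported ones in <csid> elements. The function names and the trimmed two-record sample strings are made up for illustration and are not part of CollectionSpace. Comparing sets rather than lists sidesteps the ordering question.

```python
import xml.etree.ElementTree as ET


def submitted_csids(payload_xml):
    """Collect the CSID attribute of every <import> in an import payload."""
    root = ET.fromstring(payload_xml)
    return {imp.get("CSID") for imp in root.iter("import") if imp.get("CSID")}


def imported_csids(response_xml):
    """Collect the text of every <csid> element in the service response."""
    root = ET.fromstring(response_xml)
    return {el.text.strip() for el in root.iter("csid") if el.text}


def failed_csids(payload_xml, response_xml):
    """CSIDs that were submitted but are absent from the response, sorted."""
    return sorted(submitted_csids(payload_xml) - imported_csids(response_xml))


# Trimmed samples (two of the five records above), for illustration only.
SAMPLE_PAYLOAD = """<imports>
  <import service="CollectionObjects" type="CollectionObject"
      CSID="d7358564-6a08-4dc2-a07d-9708471daa02">
    <schema xmlns:collectionobjects_common="http://collectionspace.org/services/collectionobject"
        name="collectionobjects_common">
      <collectionobjects_common:objectNumber>UC3</collectionobjects_common:objectNumber>
    </schema>
  </import>
  <import service="CollectionObjects" type="CollectionObject"
      CSID="6feb15c3-4e1e-4230-bb88-fa81467f6cbd">
    <schema xmlns:collectionobjects_common="http://collectionspace.org/services/collectionobject"
        name="collectionobjects_common">
      <collectionobjects_common:objectNumber>UC4</collectionobjects_common:objectNumber>
      <collectionobjects_common:foo>not in the schema</collectionobjects_common:foo>
    </schema>
  </import>
</imports>"""

SAMPLE_RESPONSE = """<import>
  <importedRecords>
    <importedRecord>
      <doctype>CollectionObject</doctype>
      <csid>d7358564-6a08-4dc2-a07d-9708471daa02</csid>
    </importedRecord>
  </importedRecords>
  <totalRecordsImported>1</totalRecordsImported>
</import>"""

if __name__ == "__main__":
    # prints ['6feb15c3-4e1e-4230-bb88-fa81467f6cbd']
    print(failed_csids(SAMPLE_PAYLOAD, SAMPLE_RESPONSE))
```

The resulting list could then drive a follow-up import containing only the records that need fixing.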

Also: there are others on this list who are extremely experienced at doing
imports, and who might be able to share their own tips/tricks/scripts for
making it easier to identify records that failed to import, and
re-submitting those ...

Aron

On Tue, Oct 20, 2015 at 2:24 PM, Peter Murray pmurray@chillco.com

wrote:

Thanks, Aron and Richard.  I'm working with Acquisition records at the
moment, so I would need to add the 'other number' field to it and the other
record types in order to store that PastPerfect identifier.  I think I'll
take another look at the Import service, albeit in a one-at-a-time mode so I
can have a better handle on error reporting.

Peter

On Oct 20, 2015, at 5:05 PM, Richard Millet

richard.millet@lyrasis.org

wrote:

Peter,

I agree with Aron.  If you decide you can't (or would rather not) use the
Import service to create the cataloging records, then using the "Other
Number" field is probably your best choice.

Keep in mind that using a combination of data insertion methods (RESTful
API, Import Service, SQL) to get data into CollectionSpace is perfectly OK.
So perhaps you could create all the cataloging records using the Import
service and then make additional changes with RESTful PUT and other API
calls.

-Richard


From: Talk talk-bounces@lists.collectionspace.org on behalf of Aron

Roberts aron@socrates.berkeley.edu

Sent: Tuesday, October 20, 2015 1:00 PM

To: Peter Murray

Cc: CollectionSpace Talk List

Subject: Re: [Talk] Using RESTful interface, create a record with a

particular CSID

I wrote:

One possible way to do this - if this were supported, say, as a future
enhancement - might be to supply the CSID in the <uri> value in a
<collectionspace_core> record part, in POSTs ...

And, of course, that's exactly what you suggested, Peter! :) Serves me
right for too-quickly skimming!

Just thinking out loud here: the services would need to check that URI
for at least: format, record type matching, and identifier uniqueness (even
with the improbability of duplicate Type 4 UUIDs), and presumably reject
records that didn't pass those validation checks, returning a '400 Bad
Request' or similar status.
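
A rough sketch of what those three checks could look like, written in Python purely for illustration. This is not how the CollectionSpace services work today (the thread establishes that the <uri> value is currently ignored on POST); the function name, the `existing_csids` argument, and the choice to signal rejection with ValueError are all assumptions. A real service would presumably map such a failure to the '400 Bad Request' response mentioned above.

```python
import uuid


def check_requested_uri(uri, expected_service, existing_csids):
    """Return the CSID from a URI like '/acquisitions/<uuid>' or raise ValueError.

    Hypothetical helper: checks format, record type, and uniqueness.
    """
    parts = uri.strip("/").split("/")
    if len(parts) != 2:
        raise ValueError("URI must look like /<service>/<csid>")
    service, csid = parts

    # Record type matching: the path segment must name the service posted to.
    if service != expected_service:
        raise ValueError("URI names %r, expected %r" % (service, expected_service))

    # Format: uuid.UUID raises ValueError for malformed strings; also insist
    # on the canonical hyphenated form and on version 4.
    parsed = uuid.UUID(csid)
    if str(parsed) != csid.lower() or parsed.version != 4:
        raise ValueError("CSID is not a canonical Type 4 UUID")

    # Uniqueness: reject identifiers that are already in use.
    if str(parsed) in existing_csids:
        raise ValueError("CSID already exists")
    return str(parsed)


if __name__ == "__main__":
    ok = check_requested_uri(
        "/acquisitions/204b95db-1557-4c8d-ba28-42e5578e53d3", "acquisitions", set()
    )
    print(ok)
    try:
        # Too short to be a UUID, so the format check fails.
        check_requested_uri("/acquisitions/1234-abcd", "acquisitions", set())
    except ValueError as err:
        print("rejected:", err)
```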

And for certain record types, the services might also need to check
and/or synthesize the <refName> value. (For object or procedural records
with hierarchy, such as Cataloging records, the CSID is part of that
refName.)

Aron

On Tue, Oct 20, 2015 at 12:46 PM, Aron Roberts

aron@socrates.berkeley.edu wrote:

As a possible workaround, the Imports service will allow you to specify
a CSID for a newly imported record.

As an off-the-cuff, not-researched response: I don't recall if you can
specify a CSID on a POST, when interacting with the services for various
record types (i.e. outside of an import context), but my recollection is
that's not possible.

One possible way to do this - if this were supported, say, as a future
enhancement - might be to supply the CSID in the <uri> value in a
<collectionspace_core> record part, in POSTs; e.g.

<document name="collectionobjects">
  <ns2:collectionspace_core>
    ...
    <uri>/collectionobjects/90c0a0e6-eeca-46dd-add6</uri>
  </ns2:collectionspace_core>
  <ns2:collectionobjects_common>
    ...

it seems to be a really handy thing to have the CSID match PastPerfect
ID (especially in the migration process when I am iterating through loading
templates and linking records together).

Would the 'other number' multivalued field in
Cataloging/CollectionObject records work for this purpose? Out of the box,
there's a 'previous' type for that field. (See attached and below.)

<cspace-other-number-field-example.png>

<otherNumberList>
  <otherNumber>
    <numberValue>0001</numberValue>
    <numberType>serial</numberType>
  </otherNumber>
  <otherNumber>
    <numberValue>204b95db-1557-4c8d-ba28-42e5578e53d3</numberValue>
    <numberType>previous</numberType>
  </otherNumber>
</otherNumberList>

AFAIK, this is the provided/intended way to stash away formerly-used
museum numbers or identifiers that you'd like to continue to have associated
with a record in CollectionSpace, although this clearly isn't as clean/easy
to work with as having matching UUIDs in both one's old and new systems.
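
For a migration script, the fragment above could be generated rather than hand-written. A small sketch in Python using the standard library's ElementTree; the element names are taken from the example above, while the function name and the (value, type) tuple input are assumptions for illustration. The output is a bare fragment, not a complete record payload.

```python
import xml.etree.ElementTree as ET


def other_number_list(numbers):
    """Build an <otherNumberList> fragment from (value, type) pairs."""
    root = ET.Element("otherNumberList")
    for value, number_type in numbers:
        item = ET.SubElement(root, "otherNumber")
        ET.SubElement(item, "numberValue").text = value
        ET.SubElement(item, "numberType").text = number_type
    # encoding="unicode" returns a str rather than bytes
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Store the former (e.g. PastPerfect) identifier with the 'previous' type.
    print(other_number_list([
        ("0001", "serial"),
        ("204b95db-1557-4c8d-ba28-42e5578e53d3", "previous"),
    ]))
```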

Aron

On Tue, Oct 20, 2015 at 12:26 PM, Peter Murray pmurray@chillco.com

wrote:

As it happens, PastPerfect also uses Type-4 UUIDs as internal record
numbers, and it seems to be a really handy thing to have the CSID match
PastPerfect ID (especially in the migration process when I am iterating
through loading templates and linking records together).  The problem is
that the RESTful service interface doesn't seem to let me specify a CSID.

If I PUT to /cspace-services/acquisitions/{{UUID}} and that record
doesn't already exist, I get back a 404.[1]  If I POST to
/cspace-services/acquisitions and include this in the document:

<?xml version="1.0" encoding="UTF-8"?>
<document name="acquisitions">
  <ns2:collectionspace_core
      xmlns:ns2="http://collectionspace.org/collectionspace_core/"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <updatedBy>PastPerfect: {{ UPDATEDBY }}</updatedBy>
    <createdBy>PastPerfect Migration</createdBy>
    <workflowState>project</workflowState>
    <tenantId>11</tenantId>
    <updatedAt>{{ __updatedAt }}</updatedAt>
    <uri>/acquisitions/{{ PPID }}</uri>
  </ns2:collectionspace_core>

...the service then doesn't honor the identifier in the <uri> element
and it assigns the record a new CSID.  (The above, by the way, is part of
the Jinja2 template I'm using to create records, so the {{ PPID }} is a
replaced placeholder.)

Thoughts?

Peter

[1] This is what I expect a RESTful interface to do...

--

Peter Murray

Dev/Ops Lead and Project Manager

Cherry Hill Company


Talk mailing list

Talk@lists.collectionspace.org

http://lists.collectionspace.org/mailman/listinfo/talk_lists.collectionspace.org


Talk mailing list

Talk@lists.collectionspace.org

http://lists.collectionspace.org/mailman/listinfo/talk_lists.collectionspace.org


Talk mailing list
Talk@lists.collectionspace.org
http://lists.collectionspace.org/mailman/listinfo/talk_lists.collectionspace.org

John B Lowe
Wed, Oct 21, 2015 5:06 PM

Talkers,

Speaking of "remembering history"...

There is a very long and highly ramified discussion about the behavior of
the IMPORT service and migrating large amounts of data in the following
JIRA:

https://issues.collectionspace.org/browse/PAHMA-378

It discusses, among other things:

  • The "all-or-nothing" behavior that Susan and I previously mentioned, and
    shows how to exploit it for one's own benefit.
  • The "furball effect", a slightly studied, perhaps not very general,
    behavior of CSpace where successive IMPORTs get slower and slower until one
    fails, whereupon the system is again able to process IMPORTs efficiently.
  • The "Magic Bus" approach to scheduling large numbers of imports, in order
    to automate to some degree the migration process.
  • The methodology used in debugging IMPORT issues, many by means of
    specific examples.

HTH!

John

On Wed, Oct 21, 2015 at 9:26 AM, Richard Millet richard.millet@lyrasis.org
wrote:

Thanks Susan.  My last comment was partly tongue-in-cheek.  As for JIRA
issue status changes, everyone in the community should speak up and
challenge any issue status changes they disagree with.  Please!


From: Susan STONE sstone@berkeley.edu
Sent: Tuesday, October 20, 2015 8:04 PM
To: Richard Millet
Subject: Re: [Talk] Using RESTful interface, create a record with a
particular CSID

I'll keep that in mind. I do remember I had a bad experience with
JIRAs in the past well enough not to want to repeat it: they all went
from major to minor to will not fix.

Susan

On Tue, Oct 20, 2015 at 7:56 PM, Richard Millet
richard.millet@lyrasis.org wrote:

Susan,

"Those who cannot remember the past (by documenting log file findings) are
condemned to repeat it." - George Santayana

-Richard

On Oct 20, 2015, at 4:15 PM, Susan STONE sstone@berkeley.edu wrote:

Aron,

I definitely find stuff in the server-side logs that helps me find
errors in the XML. It can be a painful process, so I haven't saved any
cherished examples.

Susan

On Tue, Oct 20, 2015 at 4:05 PM, Aron Roberts
aron@socrates.berkeley.edu wrote:

Thanks, Susan!

In my experience, it is usually all or nothing (as with a database timeout
when the imports are too large or backed up) ...

Interesting. Have you been able to capture any log output on the server
side when those issues occurred? And are there CSpace JIRA issues for those?

I'd be happy to create one (or more) if you have any raw material around
this.

Aron

On Tue, Oct 20, 2015 at 3:57 PM, Susan STONE sstone@berkeley.edu wrote:

Aron,

In my experience, it is usually all or nothing (as with a database timeout
when the imports are too large or backed up),
and I just check the total for each batch.
I usually work out the XML issues in testing.

In the rare cases where there is a problem in some individual records
and the totals don't match, I have been comparing the
CSIDs manually-ish, but we are working to
automate that process and log the particular records
missed so they can be checked and resubmitted.

Susan

On Tue, Oct 20, 2015 at 3:47 PM, Aron Roberts
aron@socrates.berkeley.edu wrote:

Peter wrote:

I think I'll take another look at the Import service, albeit in a
one-at-a-time mode so I can have a better handle on error reporting.

From a trivial test just now, I'm wondering whether the Imports service
might give us just enough information to do a multi-record import, and be
able to tell which records were successfully imported and which were not?

Specifically, if we're providing CSIDs for each record at import time,
perhaps we can tell which were successfully imported, and which failed to be
imported - and thus need to be fixed and re-submitted in a follow-up import?

Example POST to the Imports service, of five CollectionObject records to
be imported into the 'core' tenant:

curl -X POST http://yourhostnamehere:8180/cspace-services/imports -i -u "admin@core.collectionspace.org:Administrator" -H "Content-Type: application/xml" -T mixed-objects-some-invalid.xml

Where the file 'mixed-objects-some-invalid.xml' is a payload consisting of
five CollectionObject records to be imported, and where the fourth such
record includes a non-existent element (i.e. one not present in the
collectionobjects_common schema):

<?xml version="1.0" encoding="UTF-8"?>
<imports>
  <import service="CollectionObjects" type="CollectionObject"
          CSID="e9a3e850-2776-44f4-b068-4ab1a0c8c046">
    <schema xmlns:collectionobjects_common="http://collectionspace.org/services/collectionobject"
            name="collectionobjects_common">
      <collectionobjects_common:objectNumber>UC1</collectionobjects_common:objectNumber>
    </schema>
  </import>
  <import service="CollectionObjects" type="CollectionObject"
          CSID="c730a597-3229-476a-9e22-4ce89c003925">
    <schema xmlns:collectionobjects_common="http://collectionspace.org/services/collectionobject"
            name="collectionobjects_common">
      <collectionobjects_common:objectNumber>UC2</collectionobjects_common:objectNumber>
    </schema>
  </import>
  <import service="CollectionObjects" type="CollectionObject"
          CSID="d7358564-6a08-4dc2-a07d-9708471daa02">
    <schema xmlns:collectionobjects_common="http://collectionspace.org/services/collectionobject"
            name="collectionobjects_common">
      <collectionobjects_common:objectNumber>UC3</collectionobjects_common:objectNumber>
    </schema>
  </import>
  <import service="CollectionObjects" type="CollectionObject"
          CSID="6feb15c3-4e1e-4230-bb88-fa81467f6cbd">
    <schema xmlns:collectionobjects_common="http://collectionspace.org/services/collectionobject"
            name="collectionobjects_common">
      <collectionobjects_common:objectNumber>UC4</collectionobjects_common:objectNumber>
      <collectionobjects_common:foo>THIS ELEMENT DOESN'T EXIST IN THE SCHEMA</collectionobjects_common:foo>
    </schema>
  </import>
  <import service="CollectionObjects" type="CollectionObject"
          CSID="a5839b2c-b229-4a55-8ee3-71b2440658a3">
    <schema xmlns:collectionobjects_common="http://collectionspace.org/services/collectionobject"
            name="collectionobjects_common">
      <collectionobjects_common:objectNumber>UC5</collectionobjects_common:objectNumber>
    </schema>
  </import>
</imports>

This import generates the following console output (pretty printed after the
fact for clarity, with hand-editing of the <report> content for further
readability):

<?xml version="1.0" encoding="utf-16"?>
<import>
  <msg>SUCCESS</msg>
  <importedRecords>
    <importedRecord>
      <doctype>CollectionObject</doctype>
      <csid>d7358564-6a08-4dc2-a07d-9708471daa02</csid>
    </importedRecord>
    <importedRecord>
      <doctype>CollectionObject</doctype>
      <csid>c730a597-3229-476a-9e22-4ce89c003925</csid>
    </importedRecord>
    <importedRecord>
      <doctype>CollectionObject</doctype>
      <csid>e9a3e850-2776-44f4-b068-4ab1a0c8c046</csid>
    </importedRecord>
    <importedRecord>
      <doctype>CollectionObject</doctype>
      <csid>a5839b2c-b229-4a55-8ee3-71b2440658a3</csid>
    </importedRecord>
  </importedRecords>
  <status>Success</status>
  <totalRecordsImported>4</totalRecordsImported>
  <numRecordsImportedByDocType>
    <numRecordsImported>
      <docType>CollectionObject</docType>
      <numRecords>4</numRecords>
    </numRecordsImported>
  </numRecordsImportedByDocType>
  <report>
    READ: /usr/local/share/apache-tomcat-7.0.57/temp/imports-6243858932268618966/CollectionObjects/6feb15c3-4e1e-4230-bb88-fa81467f6cbd/document.xml/CollectionObjects/6feb15c3-4e1e-4230-bb88-fa81467f6cbd
    READ: /usr/local/share/apache-tomcat-7.0.57/temp/imports-6243858932268618966/CollectionObjects/d7358564-6a08-4dc2-a07d-9708471daa02/document.xml/CollectionObjects/d7358564-6a08-4dc2-a07d-9708471daa02
    READ: /usr/local/share/apache-tomcat-7.0.57/temp/imports-6243858932268618966/CollectionObjects/a5839b2c-b229-4a55-8ee3-71b2440658a3/document.xml/CollectionObjects/a5839b2c-b229-4a55-8ee3-71b2440658a3
    READ: /usr/local/share/apache-tomcat-7.0.57/temp/imports-6243858932268618966/CollectionObjects/c730a597-3229-476a-9e22-4ce89c003925/document.xml/CollectionObjects/c730a597-3229-476a-9e22-4ce89c003925
    READ: /usr/local/share/apache-tomcat-7.0.57/temp/imports-6243858932268618966/CollectionObjects/e9a3e850-2776-44f4-b068-4ab1a0c8c046/document.xml/CollectionObjects/e9a3e850-2776-44f4-b068-4ab1a0c8c046</report>
</import>

Note that <totalRecordsImported> identifies that only 4 records were
successfully imported.

And by checking the CSIDs that were imported successfully against the
entire list of CSIDs, perhaps the 'missing' records (that failed to import)
could be identified? (In the list above, note that CSID
'6feb15c3-4e1e-4230-bb88-fa81467f6cbd' - the CSID for the problematic fourth
record - doesn't appear in the list of <importedRecords>.) If this test is
any indication, you might need to sort both lists of CSIDs - those submitted
and those successfully imported - as the ordering in the import payload
might not match the order returned in the output from that POST ... Anyway,
a thought.
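
The comparison described above can be sketched in Python, assuming the payload and the response have the shapes shown in this message. The function names are made up for illustration and are not part of CollectionSpace. Using sets sidesteps the ordering difference between the two lists; the result is sorted only to make it easier to read.

```python
import xml.etree.ElementTree as ET


def submitted_csids(payload_xml):
    """Collect the CSID attribute of every <import> element in the payload."""
    root = ET.fromstring(payload_xml)
    return {imp.get("CSID") for imp in root.iter("import") if imp.get("CSID")}


def imported_csids(response_xml):
    """Collect the text of every <csid> element in the Imports response."""
    root = ET.fromstring(response_xml)
    return {el.text.strip() for el in root.iter("csid") if el.text}


def failed_csids(payload_xml, response_xml):
    """CSIDs that were submitted but are absent from <importedRecords>."""
    return sorted(submitted_csids(payload_xml) - imported_csids(response_xml))
```

Given the five-record payload and the response shown above, the only CSID left over would be the one for the fourth record. Whether a missing CSID always means a failed import is an assumption based on this single test.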

Also: there are others on this list who are extremely experienced at doing
imports, and who might be able to share their own tips/tricks/scripts for
making it easier to identify records that failed to import, and
re-submitting those ...

Aron

On Tue, Oct 20, 2015 at 2:24 PM, Peter Murray pmurray@chillco.com wrote:

Thanks, Aron and Richard.  I'm working with Acquisition records at the
moment, so I would need to add the 'other number' field to it and the other
record types in order to store that PastPerfect identifier.  I think I'll
take another look at the Import service, albeit in a one-at-a-time mode so I
can have a better handle on error reporting.

Peter

On Oct 20, 2015, at 5:05 PM, Richard Millet richard.millet@lyrasis.org
wrote:

Peter,

I agree with Aron.  If you decide you can't (or would rather not) use the
Import service to create the cataloging records, then using the "Other
Number" field is probably your best choice.

Keep in mind that using a combination of data insertion methods (RESTful
API, Import Service, SQL) to get data into CollectionSpace is perfectly ok.
So perhaps you could create all the cataloging records using the Import
service and then make additional changes with RESTful PUT and other API
calls.

-Richard


From: Talk talk-bounces@lists.collectionspace.org on behalf of Aron
Roberts aron@socrates.berkeley.edu
Sent: Tuesday, October 20, 2015 1:00 PM
To: Peter Murray
Cc: CollectionSpace Talk List
Subject: Re: [Talk] Using RESTful interface, create a record with a
particular CSID

I wrote:

One possible way to do this - if this were supported, say, as a future
enhancement - might be to supply the CSID in the <uri> value in a
<collectionspace_core> record part, in POSTs ...

And, of course, that's exactly what you suggested, Peter! :) Serves me
right for too-quickly skimming!

Just thinking out loud here: the services would need to check that URI
for at least: format, record type matching, and identifier uniqueness (even
with the improbability of duplicate Type 4 UUIDs), and presumably reject
records that didn't pass those validation checks, returning a '400 Bad
Request' or similar status.
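
To make those three checks concrete, here is a rough Python sketch. It is purely hypothetical: the function name, the messages, and the idea of passing in a set of existing CSIDs are all assumptions for illustration, and nothing here reflects how the CollectionSpace services are actually implemented.

```python
import uuid


def check_requested_uri(uri, expected_type, existing_csids):
    """Return a list of problems with a requested <uri> value.

    An empty list means the value passed all three checks. A service
    would presumably answer '400 Bad Request' when the list is non-empty.
    `existing_csids` is assumed to hold lowercase CSID strings.
    """
    parts = uri.strip("/").split("/")
    if len(parts) != 2:
        return ["uri must look like /<record-type>/<csid>"]
    record_type, csid = parts

    problems = []
    # Check 1: the record type in the URI matches the service being POSTed to.
    if record_type != expected_type:
        problems.append(
            "record type %r does not match %r" % (record_type, expected_type)
        )
    # Check 2: the identifier is a well-formed, canonical Type 4 UUID.
    try:
        parsed = uuid.UUID(csid)
    except ValueError:
        problems.append("csid is not a well-formed UUID")
    else:
        if parsed.version != 4 or str(parsed) != csid.lower():
            problems.append("csid is not a canonical Type 4 UUID")
        # Check 3: the identifier is not already taken.
        elif str(parsed) in existing_csids:
            problems.append("csid is already in use")
    return problems
```

A real uniqueness check would have to query the repository rather than an in-memory set; the set only stands in for that lookup here.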

And for certain record types, the services might also need to check
and/or synthesize the <refName> value. (For object or procedural records
with hierarchy, such as Cataloging records, the CSID is part of that
refName.)

Aron

On Tue, Oct 20, 2015 at 12:46 PM, Aron Roberts
aron@socrates.berkeley.edu wrote:

As a possible workaround, the Imports service will allow you to specify
a CSID for a newly imported record.

As an off-the-cuff, not-researched response: I don't recall if you can
specify a CSID on a POST, when interacting with the services for various
record types (i.e. outside of an import context), but my recollection is
that's not possible.

One possible way to do this - if this were supported, say, as a future
enhancement - might be to supply the CSID in the <uri> value in a
<collectionspace_core> record part, in POSTs; e.g.

<document name="collectionobjects">
  <ns2:collectionspace_core>
    ...
    <uri>/collectionobjects/90c0a0e6-eeca-46dd-add6</uri>
  </ns2:collectionspace_core>
  <ns2:collectionobjects_common>
    ...

it seems to be a really handy thing to have the CSID match PastPerfect
ID (especially in the migration process when I am iterating through loading
templates and linking records together).

Would the 'other number' multivalued field in
Cataloging/CollectionObject records work for this purpose? Out of the box,
there's a 'previous' type for that field. (See attached and below.)

<cspace-other-number-field-example.png>

<otherNumberList>
  <otherNumber>
    <numberValue>0001</numberValue>
    <numberType>serial</numberType>
  </otherNumber>
  <otherNumber>
    <numberValue>204b95db-1557-4c8d-ba28-42e5578e53d3</numberValue>
    <numberType>previous</numberType>
  </otherNumber>
</otherNumberList>

AFAIK, this is the provided/intended way to stash away formerly-used
museum numbers or identifiers that you'd like to continue to have associated
with a record in CollectionSpace, although this clearly isn't as clean/easy
to work with as having matching UUIDs in both one's old and new systems.

Aron

On Tue, Oct 20, 2015 at 12:26 PM, Peter Murray pmurray@chillco.com wrote:

As it happens, PastPerfect also uses Type-4 UUIDs as internal record
numbers, and it seems to be a really handy thing to have the CSID match
PastPerfect ID (especially in the migration process when I am iterating
through loading templates and linking records together).  The problem is
that the RESTful service interface doesn't seem to let me specify a CSID.

If I PUT to /cspace-services/acquisitions/{{UUID}} and that record
doesn't already exist, I get back a 404.[1]  If I POST to
/cspace-services/acquisitions and include this in the document:

<?xml version="1.0" encoding="UTF-8"?>
<document name="acquisitions">
  <ns2:collectionspace_core
      xmlns:ns2="http://collectionspace.org/collectionspace_core/"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <updatedBy>PastPerfect: {{ UPDATEDBY }}</updatedBy>
    <createdBy>PastPerfect Migration</createdBy>
    <workflowState>project</workflowState>
    <tenantId>11</tenantId>
    <updatedAt>{{ __updatedAt }}</updatedAt>
    <uri>/acquisitions/{{ PPID }}</uri>
  </ns2:collectionspace_core>

...the service then doesn't honor the identifier in the <uri> element
and it assigns the record a new CSID.  (The above, by the way, is part of
the Jinja2 template I'm using to create records, so the {{ PPID }} is a
replaced placeholder.)

Thoughts?

Peter

[1] This is what I expect a RESTful interface to do...

--

Peter Murray

Dev/Ops Lead and Project Manager

Cherry Hill Company


Talk mailing list

Talk@lists.collectionspace.org



PM
Peter Murray
Thu, Oct 22, 2015 1:47 PM

Thanks, John.  I'm fortunate in one respect: I'm not dealing with record numbers at that scale, so the one-at-a-time ImportService calls don't add too much overhead, and they have the benefit of letting me deal with each record's error messages as it is being processed.  So far this pattern is working for me...
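
For anyone wanting to try the same pattern, here is a minimal Python sketch of it. It reuses the CollectionObject payload shape from Aron's example earlier in the thread (Acquisition records would need their own schema part, which is not shown in this thread). The names `build_single_import`, `import_one_at_a_time` and the `post` callable are hypothetical; `post` stands in for whatever HTTP client sends the payload to /cspace-services/imports and returns the response body.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Payload layout copied from the example earlier in this thread.
TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<imports>
  <import service="CollectionObjects" type="CollectionObject" CSID={csid}>
    <schema xmlns:collectionobjects_common="http://collectionspace.org/services/collectionobject"
            name="collectionobjects_common">
      <collectionobjects_common:objectNumber>{number}</collectionobjects_common:objectNumber>
    </schema>
  </import>
</imports>
"""


def build_single_import(csid, object_number):
    """Build an Imports payload holding exactly one record."""
    # quoteattr adds the surrounding quotes; escape protects <, > and &.
    return TEMPLATE.format(csid=quoteattr(csid), number=escape(object_number))


def import_one_at_a_time(records, post):
    """Send each (csid, object_number) pair separately.

    Returns the CSIDs that did not come back in a <csid> element,
    so they can be fixed and resubmitted.
    """
    failures = []
    for csid, object_number in records:
        response_xml = post(build_single_import(csid, object_number))
        root = ET.fromstring(response_xml)
        imported = {el.text.strip() for el in root.iter("csid") if el.text}
        if csid not in imported:
            failures.append(csid)
    return failures
```

One request per record costs more round trips, but each response then refers to exactly one record, which is the error-handling benefit described above.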

Peter

On Oct 21, 2015, at 1:06 PM, John B Lowe jblowe@berkeley.edu wrote:

Talkers,

Speaking of "remembering history"...

There is a very long and highly ramified discussion about the behavior of the IMPORT service and migrating large amounts of data in the following JIRA:

https://issues.collectionspace.org/browse/PAHMA-378

it discusses, among other things:

  • The "all-or-nothing" behavior that Susan and I previously mentioned, and shows how to exploit it for one's own benefit.
  • The "furball effect", a slightly studied, perhaps not very general, behavior of CSpace where successive IMPORTs get slower and slower until one fails, whereupon the system is again able to process IMPORTs efficiently.
  • The "Magic Bus" approach to scheduling large numbers of imports, in order to automate to some degree the migration process.
  • The methodology used in debugging IMPORT issues, many by means of specific examples.

HTH!

John

On Wed, Oct 21, 2015 at 9:26 AM, Richard Millet <richard.millet@lyrasis.org> wrote:
Thanks Susan.  My last comment was partly tongue-in-cheek.  As for JIRA issue status changes, everyone in the community should speak up and challenge any issue status changes they disagree with.  Please!


From: Susan STONE <sstone@berkeley.edu>
Sent: Tuesday, October 20, 2015 8:04 PM
To: Richard Millet
Subject: Re: [Talk] Using RESTful interface, create a record with a particular CSID

I'll keep that in mind. I do remember I had a bad experience with
JIRAs in the past well enough not to want to repeat it: they all went
from major to minor to will not fix.

Susan

On Tue, Oct 20, 2015 at 7:56 PM, Richard Millet <richard.millet@lyrasis.org> wrote:

Susan,

"Those who cannot remember the past (by documenting log file findings) are
condemned to repeat it." George Santayana

-Richard

On Oct 20, 2015, at 4:15 PM, Susan STONE <sstone@berkeley.edu> wrote:

Aron,

I definitely find stuff in the server-side logs that helps me find
errors in the XML. It can be a painful process, so I haven't saved any
cherished examples.

Susan

On Tue, Oct 20, 2015 at 4:05 PM, Aron Roberts <aron@socrates.berkeley.edu> wrote:

Thanks, Susan!

In my experience, it is usually all or nothing (as with a database timeout when the imports are too large or backed up) ...

Interesting. Have you been able to capture any log output on the server side when those issues occurred? And are there CSpace JIRA issues for those? I'd be happy to create one (or more) if you have any raw material around this.

Aron

On Tue, Oct 20, 2015 at 3:57 PM, Susan STONE <sstone@berkeley.edu> wrote:

Aron,

In my experience, it is usually all or nothing (as with a database timeout when the imports are too large or backed up), and I just check the total for each batch. I usually work out the XML issues in testing.

In the rare cases where there is a problem in some individual records and the totals don't match, I have been comparing the CSIDs manually-ish, but we are working to automate that process and log the particular records missed so they can be checked and resubmitted.

Susan

On Tue, Oct 20, 2015 at 3:47 PM, Aron Roberts <aron@socrates.berkeley.edu> wrote:

Peter wrote:

I think I'll take another look at the Import service, albeit in a one-at-a-time mode so I can have a better handle on error reporting.

From a trivial test just now, I'm wondering whether the Imports service might give us just enough information to do a multi-record import, and be able to tell which records were successfully imported and which were not?

Specifically, if we're providing CSIDs for each record at import time, perhaps we can tell which were successfully imported, and which failed to be imported - and thus need to be fixed and re-submitted in a follow-up import?

Example POST to the Imports service, of five CollectionObject records to be imported into the 'core' tenant:

curl -X POST http://yourhostnamehere:8180/cspace-services/imports -i -u "admin@core.collectionspace.org:Administrator" -H "Content-Type: application/xml" -T mixed-objects-some-invalid.xml

Where the file 'mixed-objects-some-invalid.xml' is a payload consisting of five CollectionObject records to be imported, and where the fourth such record includes a non-existent element (i.e. one not present in the collectionobjects_common schema):

<?xml version="1.0" encoding="UTF-8"?>
<imports>
  <import service="CollectionObjects" type="CollectionObject" CSID="e9a3e850-2776-44f4-b068-4ab1a0c8c046">
    <schema xmlns:collectionobjects_common="http://collectionspace.org/services/collectionobject" name="collectionobjects_common">
      <collectionobjects_common:objectNumber>UC1</collectionobjects_common:objectNumber>
    </schema>
  </import>
  <import service="CollectionObjects" type="CollectionObject" CSID="c730a597-3229-476a-9e22-4ce89c003925">
    <schema xmlns:collectionobjects_common="http://collectionspace.org/services/collectionobject" name="collectionobjects_common">
      <collectionobjects_common:objectNumber>UC2</collectionobjects_common:objectNumber>
    </schema>
  </import>
  <import service="CollectionObjects" type="CollectionObject" CSID="d7358564-6a08-4dc2-a07d-9708471daa02">
    <schema xmlns:collectionobjects_common="http://collectionspace.org/services/collectionobject" name="collectionobjects_common">
      <collectionobjects_common:objectNumber>UC3</collectionobjects_common:objectNumber>
    </schema>
  </import>
  <import service="CollectionObjects" type="CollectionObject" CSID="6feb15c3-4e1e-4230-bb88-fa81467f6cbd">
    <schema xmlns:collectionobjects_common="http://collectionspace.org/services/collectionobject" name="collectionobjects_common">
      <collectionobjects_common:objectNumber>UC4</collectionobjects_common:objectNumber>
      <collectionobjects_common:foo>THIS ELEMENT DOESN'T EXIST IN THE SCHEMA</collectionobjects_common:foo>
    </schema>
  </import>
  <import service="CollectionObjects" type="CollectionObject" CSID="a5839b2c-b229-4a55-8ee3-71b2440658a3">
    <schema xmlns:collectionobjects_common="http://collectionspace.org/services/collectionobject" name="collectionobjects_common">
      <collectionobjects_common:objectNumber>UC5</collectionobjects_common:objectNumber>
    </schema>
  </import>
</imports>

This import generates the following console output (pretty printed after the fact for clarity, with hand-editing of the <report> content for further readability):

<?xml version="1.0" encoding="utf-16"?>
<import>
  <msg>SUCCESS</msg>
  <importedRecords>
    <importedRecord>
      <doctype>CollectionObject</doctype>
      <csid>d7358564-6a08-4dc2-a07d-9708471daa02</csid>
    </importedRecord>
    <importedRecord>
      <doctype>CollectionObject</doctype>
      <csid>c730a597-3229-476a-9e22-4ce89c003925</csid>
    </importedRecord>
    <importedRecord>
      <doctype>CollectionObject</doctype>
      <csid>e9a3e850-2776-44f4-b068-4ab1a0c8c046</csid>
    </importedRecord>
    <importedRecord>
      <doctype>CollectionObject</doctype>
      <csid>a5839b2c-b229-4a55-8ee3-71b2440658a3</csid>
    </importedRecord>
  </importedRecords>
  <status>Success</status>
  <totalRecordsImported>4</totalRecordsImported>
  <numRecordsImportedByDocType>
    <numRecordsImported>
      <docType>CollectionObject</docType>
      <numRecords>4</numRecords>
    </numRecordsImported>
  </numRecordsImportedByDocType>
  <report>
    READ: /usr/local/share/apache-tomcat-7.0.57/temp/imports-6243858932268618966/CollectionObjects/6feb15c3-4e1e-4230-bb88-fa81467f6cbd/document.xml/CollectionObjects/6feb15c3-4e1e-4230-bb88-fa81467f6cbd
    READ: /usr/local/share/apache-tomcat-7.0.57/temp/imports-6243858932268618966/CollectionObjects/d7358564-6a08-4dc2-a07d-9708471daa02/document.xml/CollectionObjects/d7358564-6a08-4dc2-a07d-9708471daa02
    READ: /usr/local/share/apache-tomcat-7.0.57/temp/imports-6243858932268618966/CollectionObjects/a5839b2c-b229-4a55-8ee3-71b2440658a3/document.xml/CollectionObjects/a5839b2c-b229-4a55-8ee3-71b2440658a3
    READ: /usr/local/share/apache-tomcat-7.0.57/temp/imports-6243858932268618966/CollectionObjects/c730a597-3229-476a-9e22-4ce89c003925/document.xml/CollectionObjects/c730a597-3229-476a-9e22-4ce89c003925
    READ: /usr/local/share/apache-tomcat-7.0.57/temp/imports-6243858932268618966/CollectionObjects/e9a3e850-2776-44f4-b068-4ab1a0c8c046/document.xml/CollectionObjects/e9a3e850-2776-44f4-b068-4ab1a0c8c046</report>
</import>

Note that <totalRecordsImported> identifies that only 4 records were successfully imported.

And by checking the CSIDs that were imported successfully against the entire list of CSIDs, perhaps the 'missing' records (that failed to import) could be identified? (In the list above, note that CSID '6feb15c3-4e1e-4230-bb88-fa81467f6cbd' - the CSID for the problematic fourth record - doesn't appear in the list of <importedRecords>.) If this test is any indication, you might need to sort both lists of CSIDs - those submitted and those successfully imported - as the ordering in the import payload might not match the order returned in the output from that POST ... Anyway, a thought.
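
The comparison described above could be sketched roughly as follows, assuming the payload and response have the shapes shown in this message. The function names are hypothetical. Using sets rather than sorted lists sidesteps the ordering mismatch between payload and response; the result is sorted only to give stable output.

```python
import xml.etree.ElementTree as ET


def submitted_csids(payload_xml):
    """Collect the CSID attribute of every <import> in an <imports> payload."""
    root = ET.fromstring(payload_xml)
    return {rec.get("CSID") for rec in root.findall("import") if rec.get("CSID")}


def imported_csids(response_xml):
    """Collect every <csid> listed under <importedRecords> in the response."""
    root = ET.fromstring(response_xml)
    found = root.findall("importedRecords/importedRecord/csid")
    return {el.text.strip() for el in found if el.text}


def failed_csids(payload_xml, response_xml):
    """Return submitted CSIDs that are absent from the response, sorted."""
    return sorted(submitted_csids(payload_xml) - imported_csids(response_xml))
```

Run against the five-record payload and the response above, the only CSID this should report is the one for the fourth record, which could then be fixed and re-submitted in a follow-up import.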

Also: there are others on this list who are extremely experienced at doing imports, and who might be able to share their own tips/tricks/scripts for making it easier to identify records that failed to import, and re-submitting those ...

Aron

On Tue, Oct 20, 2015 at 2:24 PM, Peter Murray <pmurray@chillco.com> wrote:

Thanks, Aron and Richard.  I'm working with Acquisition records at the moment, so I would need to add the 'other number' field to it and the other record types in order to store that PastPerfect identifier.  I think I'll take another look at the Import service, albeit in a one-at-a-time mode so I can have a better handle on error reporting.

Peter

On Oct 20, 2015, at 5:05 PM, Richard Millet <richard.millet@lyrasis.org> wrote:

Peter,

I agree with Aron.  If you decide you can't (or would rather not) use the Import service to create the cataloging records, then using the "Other Number" field is probably your best choice.

Keep in mind that using a combination of data insertion methods (RESTful API, Import Service, SQL) to get data into CollectionSpace is perfectly ok. So perhaps you could create all the cataloging records using the Import service and then make additional changes with RESTful PUT and other API calls.

-Richard


From: Talk <talk-bounces@lists.collectionspace.org> on behalf of Aron Roberts <aron@socrates.berkeley.edu>
Sent: Tuesday, October 20, 2015 1:00 PM
To: Peter Murray
Cc: CollectionSpace Talk List
Subject: Re: [Talk] Using RESTful interface, create a record with a particular CSID

I wrote:

One possible way to do this - if this were supported, say, as a future enhancement - might be to supply the CSID in the <uri> value in a <collectionspace_core> record part, in POSTs ...

And, of course, that's exactly what you suggested, Peter! :) Serves me right for too-quickly skimming!

Just thinking out loud here: the services would need to check that URI for at least: format, record type matching, and identifier uniqueness (even with the improbability of duplicate Type 4 UUIDs), and presumably reject records that didn't pass those validation checks, returning a '400 Bad Request' or similar status.
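
Purely as an illustration of those three checks, and not how the services work today, a validation routine might look something like the sketch below. The name `check_supplied_uri`, its arguments, and the `/<record type>/<csid>` URI layout are assumptions drawn from the examples in this thread.

```python
import uuid


def check_supplied_uri(uri, record_type, existing_csids):
    """Return a list of problems with a client-supplied <uri> value.

    Hypothetical sketch: an empty list means the URI passed all checks.
    `existing_csids` is assumed to be a set of lower-case CSID strings.
    """
    parts = uri.strip("/").split("/")
    if len(parts) != 2:
        return ["format: expected /<record type>/<csid>"]
    uri_type, csid = parts
    problems = []
    # Record type matching: the URI must name the service being POSTed to.
    if uri_type != record_type:
        problems.append("record type: %r does not match %r" % (uri_type, record_type))
    # Format: accept only canonical, hyphenated Type 4 UUIDs.
    try:
        parsed = uuid.UUID(csid)
        well_formed = str(parsed) == csid.lower() and parsed.version == 4
    except ValueError:
        well_formed = False
    if not well_formed:
        problems.append("format: %r is not a Type 4 UUID" % csid)
    # Uniqueness: reject identifiers that are already in use.
    elif csid.lower() in existing_csids:
        problems.append("uniqueness: %r is already in use" % csid)
    return problems
```

A service built along these lines would presumably map any non-empty result to the '400 Bad Request' (or similar) response mentioned above.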

And for certain record types, the services might also need to check and/or synthesize the <refName> value. (For object or procedural records with hierarchy, such as Cataloging records, the CSID is part of that refName.)

Aron

On Tue, Oct 20, 2015 at 12:46 PM, Aron Roberts <aron@socrates.berkeley.edu> wrote:

As a possible workaround, the Imports service will allow you to specify a CSID for a newly imported record.

As an off-the-cuff, not-researched response: I don't recall if you can specify a CSID on a POST, when interacting with the services for various record types (i.e. outside of an import context), but my recollection is that's not possible.

One possible way to do this - if this were supported, say, as a future enhancement - might be to supply the CSID in the <uri> value in a <collectionspace_core> record part, in POSTs; e.g.

<document name="collectionobjects">
  <ns2:collectionspace_core>
    ...
    <uri>/collectionobjects/90c0a0e6-eeca-46dd-add6</uri>
  </ns2:collectionspace_core>
  <ns2:collectionobjects_common>
    ...

it seems to be a really handy thing to have the CSID match PastPerfect ID (especially in the migration process when I am iterating through loading templates and linking records together).

Would the 'other number' multivalued field in Cataloging/CollectionObject records work for this purpose? Out of the box, there's a 'previous' type for that field. (See attached and below.)

<cspace-other-number-field-example.png>

<otherNumberList>
  <otherNumber>
    <numberValue>0001</numberValue>
    <numberType>serial</numberType>
  </otherNumber>
  <otherNumber>
    <numberValue>204b95db-1557-4c8d-ba28-42e5578e53d3</numberValue>
    <numberType>previous</numberType>
  </otherNumber>
</otherNumberList>

AFAIK, this is the provided/intended way to stash away formerly-used museum numbers or identifiers that you'd like to continue to have associated with a record in CollectionSpace, although this clearly isn't as clean/easy to work with as having matching UUIDs in both one's old and new systems.
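
For a migration script, a fragment like the one above could be generated along these lines. This is only a sketch: the helper name `other_number_list` is made up, and the element names are simply those from the example fragment, without whatever namespace prefix the enclosing record part requires.

```python
from xml.sax.saxutils import escape


def other_number_list(numbers):
    """Render (value, type) pairs as an <otherNumberList> fragment.

    Hypothetical helper; element names follow the example in this message.
    """
    lines = ["<otherNumberList>"]
    for value, number_type in numbers:
        lines.append("  <otherNumber>")
        lines.append("    <numberValue>%s</numberValue>" % escape(value))
        lines.append("    <numberType>%s</numberType>" % escape(number_type))
        lines.append("  </otherNumber>")
    lines.append("</otherNumberList>")
    return "\n".join(lines)
```

Passing a legacy identifier with the 'previous' type, e.g. `other_number_list([(old_id, "previous")])`, would record the old system's UUID alongside the CollectionSpace record.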

Aron

On Tue, Oct 20, 2015 at 12:26 PM, Peter Murray <pmurray@chillco.com> wrote:

As it happens, PastPerfect also uses Type-4 UUIDs as internal record numbers, and it seems to be a really handy thing to have the CSID match PastPerfect ID (especially in the migration process when I am iterating through loading templates and linking records together).  The problem is that the RESTful service interface doesn't seem to let me specify a CSID.

If I PUT to /cspace-services/acquisitions/{{UUID}} and that record doesn't already exist, I get back a 404.[1]  If I POST to /cspace-services/acquisitions and include this in the document:

<?xml version="1.0" encoding="UTF-8"?>
<document name="acquisitions">
  <ns2:collectionspace_core
      xmlns:ns2="http://collectionspace.org/collectionspace_core/"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <updatedBy>PastPerfect: {{ UPDATEDBY }}</updatedBy>
    <createdBy>PastPerfect Migration</createdBy>
    <workflowState>project</workflowState>
    <tenantId>11</tenantId>
    <updatedAt>{{ __updatedAt }}</updatedAt>
    <uri>/acquisitions/{{ PPID }}</uri>
  </ns2:collectionspace_core>

...the service then doesn't honor the identifier in the <uri> element and it assigns the record a new CSID.  (The above, by the way, is part of the Jinja2 template I'm using to create records, so the {{ PPID }} is a replaced placeholder.)

Thoughts?

Peter

[1] This is what I expect a RESTful interface to do...

--
Peter Murray
Dev/Ops Lead and Project Manager
Cherry Hill Company
