Peter Murray
Wed, Oct 21, 2015 9:49 PM
The server logs are definitely crucial in figuring out what went wrong. The XML reply to an import will tell you that an error happened (or sometimes not), but the WHY of the problem with the import can only be found in the server logs.
A side benefit that I've noticed is that I can force data into the CSpace record that I wouldn't otherwise be able to, such as the last PastPerfect user to edit the record and when that edit happened. I'm also able to put a "PastPerfect Migration" string in for "creator" so later viewers will know that this record started in the legacy system. So, on the whole, using ImportsService probably makes sense.
One thing I have noticed is that ImportsService does not clean up after itself by deleting the temp files. I should probably file a ticket for that, but the leftover files are useful for debugging.
As always, thanks for the discussion and ideas,
Peter
On Oct 20, 2015, at 7:14 PM, Susan STONE sstone@berkeley.edu wrote:
Aron,
I definitely find stuff in the server-side logs that helps me find
errors in the XML. It can be a painful process, so I haven't saved any
cherished examples.
Susan
On Tue, Oct 20, 2015 at 4:05 PM, Aron Roberts aron@socrates.berkeley.edu wrote:
Thanks, Susan!
Susan wrote:
In my experience, it is usually all or nothing (as with a database timeout when the imports are too large or backed up) ...
Interesting. Have you been able to capture any log output on the server
side when those issues occurred? And are there CSpace JIRA issues for those?
I'd be happy to create one (or more) if you have any raw material around
this.
Aron
On Tue, Oct 20, 2015 at 3:57 PM, Susan STONE sstone@berkeley.edu wrote:
Aron,
In my experience, it is usually all or nothing (as with a database timeout
when the imports are too large or backed up),
and I just check the total for each batch.
I usually work out the XML issues in testing.
In the rare cases where there is a problem in some individual records
and the totals don't match, I have been comparing the
CSIDs manually-ish, but we are working to
automate that process and log the particular records
missed so they can be checked and resubmitted.
Susan
On Tue, Oct 20, 2015 at 3:47 PM, Aron Roberts aron@socrates.berkeley.edu wrote:
Peter wrote:
I think I'll take another look at the Import service, albeit in a one-at-a-time mode so I can have a better handle on error reporting.
From a trivial test just now, I'm wondering whether the Imports service might give us just enough information to do a multi-record import, and be able to tell which records were successfully imported and which were not?
Specifically, if we're providing CSIDs for each record at import time, perhaps we can tell which were successfully imported, and which failed to be imported - and thus need to be fixed and re-submitted in a follow-up import?
Example POST to the Imports service, of five CollectionObject records to be imported into the 'core' tenant:
curl -X POST http://yourhostnamehere:8180/cspace-services/imports -i -u "admin@core.collectionspace.org:Administrator" -H "Content-Type: application/xml" -T mixed-objects-some-invalid.xml
Where the file 'mixed-objects-some-invalid.xml' is a payload consisting of five CollectionObject records to be imported, and where the fourth such record includes a non-existent element (i.e. one not present in the collectionobjects_common schema):
<?xml version="1.0" encoding="UTF-8"?>
<imports>
<import service="CollectionObjects" type="CollectionObject"
CSID="e9a3e850-2776-44f4-b068-4ab1a0c8c046">
<schema
xmlns:collectionobjects_common="http://collectionspace.org/services/collectionobject"
name="collectionobjects_common">
<collectionobjects_common:objectNumber>UC1</collectionobjects_common:objectNumber>
</schema>
</import>
<import service="CollectionObjects" type="CollectionObject"
CSID="c730a597-3229-476a-9e22-4ce89c003925">
<schema
xmlns:collectionobjects_common="http://collectionspace.org/services/collectionobject"
name="collectionobjects_common">
<collectionobjects_common:objectNumber>UC2</collectionobjects_common:objectNumber>
</schema>
</import>
<import service="CollectionObjects" type="CollectionObject"
CSID="d7358564-6a08-4dc2-a07d-9708471daa02">
<schema
xmlns:collectionobjects_common="http://collectionspace.org/services/collectionobject"
name="collectionobjects_common">
<collectionobjects_common:objectNumber>UC3</collectionobjects_common:objectNumber>
</schema>
</import>
<import service="CollectionObjects" type="CollectionObject"
CSID="6feb15c3-4e1e-4230-bb88-fa81467f6cbd">
<schema
xmlns:collectionobjects_common="http://collectionspace.org/services/collectionobject"
name="collectionobjects_common">
<collectionobjects_common:objectNumber>UC4</collectionobjects_common:objectNumber>
<collectionobjects_common:foo>THIS ELEMENT DOESN'T EXIST IN THE SCHEMA</collectionobjects_common:foo>
</schema>
</import>
<import service="CollectionObjects" type="CollectionObject"
CSID="a5839b2c-b229-4a55-8ee3-71b2440658a3">
<schema
xmlns:collectionobjects_common="http://collectionspace.org/services/collectionobject"
name="collectionobjects_common">
<collectionobjects_common:objectNumber>UC5</collectionobjects_common:objectNumber>
</schema>
</import>
</imports>
This import generates the following console output (pretty printed after the fact for clarity, with hand-editing of the <report> content for further readability):
<?xml version="1.0" encoding="utf-16"?>
<import>
<msg>SUCCESS</msg>
<importedRecords>
<importedRecord>
<doctype>CollectionObject</doctype>
<csid>d7358564-6a08-4dc2-a07d-9708471daa02</csid>
</importedRecord>
<importedRecord>
<doctype>CollectionObject</doctype>
<csid>c730a597-3229-476a-9e22-4ce89c003925</csid>
</importedRecord>
<importedRecord>
<doctype>CollectionObject</doctype>
<csid>e9a3e850-2776-44f4-b068-4ab1a0c8c046</csid>
</importedRecord>
<importedRecord>
<doctype>CollectionObject</doctype>
<csid>a5839b2c-b229-4a55-8ee3-71b2440658a3</csid>
</importedRecord>
</importedRecords>
<status>Success</status>
<totalRecordsImported>4</totalRecordsImported>
<numRecordsImportedByDocType>
<numRecordsImported>
<docType>CollectionObject</docType>
<numRecords>4</numRecords>
</numRecordsImported>
</numRecordsImportedByDocType>
<report>
READ:
/usr/local/share/apache-tomcat-7.0.57/temp/imports-6243858932268618966/CollectionObjects/6feb15c3-4e1e-4230-bb88-fa81467f6cbd/document.xml/CollectionObjects/6feb15c3-4e1e-4230-bb88-fa81467f6cbd
READ:
/usr/local/share/apache-tomcat-7.0.57/temp/imports-6243858932268618966/CollectionObjects/d7358564-6a08-4dc2-a07d-9708471daa02/document.xml/CollectionObjects/d7358564-6a08-4dc2-a07d-9708471daa02
READ:
/usr/local/share/apache-tomcat-7.0.57/temp/imports-6243858932268618966/CollectionObjects/a5839b2c-b229-4a55-8ee3-71b2440658a3/document.xml/CollectionObjects/a5839b2c-b229-4a55-8ee3-71b2440658a3
READ:
/usr/local/share/apache-tomcat-7.0.57/temp/imports-6243858932268618966/CollectionObjects/c730a597-3229-476a-9e22-4ce89c003925/document.xml/CollectionObjects/c730a597-3229-476a-9e22-4ce89c003925
READ:
/usr/local/share/apache-tomcat-7.0.57/temp/imports-6243858932268618966/CollectionObjects/e9a3e850-2776-44f4-b068-4ab1a0c8c046/document.xml/CollectionObjects/e9a3e850-2776-44f4-b068-4ab1a0c8c046</report>
</import>
Note that <totalRecordsImported> shows that only 4 records were successfully imported.
And by checking the CSIDs that were imported successfully against the entire list of CSIDs, perhaps the 'missing' records (that failed to import) could be identified? (In the list above, note that CSID '6feb15c3-4e1e-4230-bb88-fa81467f6cbd' - the CSID for the problematic fourth record - doesn't appear in the list of <importedRecords>.) If this test is any indication, you might need to sort both lists of CSIDs - those submitted and those successfully imported - as the ordering in the import payload might not match the order returned in the output from that POST ... Anyway, a thought.
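The comparison described above could be scripted roughly as follows. This is only a sketch in Python, not anything the Imports service provides: the function names are made up for illustration, and it assumes the payload carries a CSID attribute on each <import> element and that the response lists each imported record in a <csid> element, as in the example above. Using sets sidesteps the ordering problem, and the result is sorted only to make it easy to read.

```python
import xml.etree.ElementTree as ET


def submitted_csids(payload_xml):
    """Collect the CSID attribute of every <import> element in the payload."""
    root = ET.fromstring(payload_xml)
    return {elem.get("CSID") for elem in root.iter("import") if elem.get("CSID")}


def imported_csids(response_xml):
    """Collect the text of every <csid> element in the service's response."""
    root = ET.fromstring(response_xml)
    return {elem.text.strip() for elem in root.iter("csid") if elem.text}


def failed_csids(payload_xml, response_xml):
    """Return, sorted, the CSIDs that were submitted but not reported as imported."""
    return sorted(submitted_csids(payload_xml) - imported_csids(response_xml))
```

Run against the five-record payload and the response shown above, this should report only the CSID of the fourth record, which could then be fixed and re-submitted.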
Also: there are others on this list who are extremely experienced at doing imports, and who might be able to share their own tips/tricks/scripts for making it easier to identify records that failed to import, and re-submitting those ...
Aron
On Tue, Oct 20, 2015 at 2:24 PM, Peter Murray pmurray@chillco.com wrote:
Thanks, Aron and Richard. I'm working with Acquisition records at the moment, so I would need to add the 'other number' field to it and the other record types in order to store that PastPerfect identifier. I think I'll take another look at the Import service, albeit in a one-at-a-time mode so I can have a better handle on error reporting.
Peter
On Oct 20, 2015, at 5:05 PM, Richard Millet richard.millet@lyrasis.org wrote:
Peter,
I agree with Aron. If you decide you can't (or would rather not) use the Import service to create the cataloging records, then using the "Other Number" field is probably your best choice.
Keep in mind that using a combination of data insertion methods (RESTful API, Import Service, SQL) to get data into CollectionSpace is perfectly OK. So perhaps you could create all the cataloging records using the Import service and then make additional changes with RESTful PUT and other API calls.
-Richard
From: Talk talk-bounces@lists.collectionspace.org on behalf of Aron Roberts aron@socrates.berkeley.edu
Sent: Tuesday, October 20, 2015 1:00 PM
To: Peter Murray
Cc: CollectionSpace Talk List
Subject: Re: [Talk] Using RESTful interface, create a record with a particular CSID
I wrote:
One possible way to do this - if this were supported, say, as a future enhancement - might be to supply the CSID in the <uri> value in a <collectionspace_core> record part, in POSTs ...
And, of course, that's exactly what you suggested, Peter! :) Serves me right for too-quickly skimming!
Just thinking out loud here: the services would need to check that URI for at least: format, record type matching, and identifier uniqueness (even with the improbability of duplicate Type 4 UUIDs), and presumably reject records that didn't pass those validation checks, returning a '400 Bad Request' or similar status.
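To make those three checks concrete, here is a rough sketch in Python. It is purely illustrative and is not how the CollectionSpace services are implemented: the function name, the URI pattern, and the idea of passing in a set of already-used CSIDs are all assumptions made for the example. It returns a reason string for a URI that should be rejected (which a caller might map to a '400 Bad Request'), or None if the URI passes.

```python
import re
import uuid

# Hypothetical pattern: "/<service>/<36-character identifier>".
URI_PATTERN = re.compile(r"^/(?P<service>[a-z]+)/(?P<csid>[0-9a-fA-F-]{36})$")


def uri_problem(uri, expected_service, existing_csids):
    """Return a reason to reject a client-supplied <uri>, or None if it passes."""
    match = URI_PATTERN.match(uri)
    if match is None:
        return "malformed URI"
    # Record type matching: the URI must name the service being POSTed to.
    if match.group("service") != expected_service:
        return "record type in URI does not match the service"
    # Format: the identifier must parse as a UUID, and as a Type 4 one.
    try:
        csid = uuid.UUID(match.group("csid"))
    except ValueError:
        return "identifier is not a UUID"
    if csid.version != 4:
        return "identifier is not a Type 4 UUID"
    # Uniqueness: existing_csids is assumed to hold lowercase UUID strings.
    if str(csid) in existing_csids:
        return "identifier is already in use"
    return None
```

With this sketch, a truncated identifier such as the one in the <uri> example quoted further down would be rejected as malformed, while a full Type 4 UUID under the matching service name would pass.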
And for certain record types, the services might also need to check and/or synthesize the <refName> value. (For object or procedural records with hierarchy, such as Cataloging records, the CSID is part of that refName.)
Aron
On Tue, Oct 20, 2015 at 12:46 PM, Aron Roberts aron@socrates.berkeley.edu wrote:
As a possible workaround, the Imports service will allow you to specify a CSID for a newly imported record.
As an off-the-cuff, not-researched response: I don't recall if you can specify a CSID on a POST, when interacting with the services for various record types (i.e. outside of an import context), but my recollection is that's not possible.
One possible way to do this - if this were supported, say, as a future enhancement - might be to supply the CSID in the <uri> value in a <collectionspace_core> record part, in POSTs; e.g.
<document name="collectionobjects">
<ns2:collectionspace_core>
...
<uri>/collectionobjects/90c0a0e6-eeca-46dd-add6</uri>
</ns2:collectionspace_core>
<ns2:collectionobjects_common>
...
Peter wrote:
it seems to be a really handy thing to have the CSID match PastPerfect ID (especially in the migration process when I am iterating through loading templates and linking records together).
Would the 'other number' multivalued field in Cataloging/CollectionObject records work for this purpose? Out of the box, there's a 'previous' type for that field. (See attached and below.)
<cspace-other-number-field-example.png>
<otherNumberList>
<otherNumber>
<numberValue>0001</numberValue>
<numberType>serial</numberType>
</otherNumber>
<otherNumber>
<numberValue>204b95db-1557-4c8d-ba28-42e5578e53d3</numberValue>
<numberType>previous</numberType>
</otherNumber>
</otherNumberList>
AFAIK, this is the provided/intended way to stash away formerly-used museum numbers or identifiers that you'd like to continue to have associated with a record in CollectionSpace, although this clearly isn't as clean/easy to work with as having matching UUIDs in both one's old and new systems.
Aron
On Tue, Oct 20, 2015 at 12:26 PM, Peter Murray pmurray@chillco.com wrote:
As it happens, PastPerfect also uses Type-4 UUIDs as internal record numbers, and it seems to be a really handy thing to have the CSID match PastPerfect ID (especially in the migration process when I am iterating through loading templates and linking records together). The problem is that the RESTful service interface doesn't seem to let me specify a CSID.
If I PUT to /cspace-services/acquisitions/{{UUID}} and that record doesn't already exist, I get back a 404.[1] If I POST to /cspace-services/acquisitions and include this in the document:
<?xml version="1.0" encoding="UTF-8"?>
<document name="acquisitions">
<ns2:collectionspace_core
xmlns:ns2="http://collectionspace.org/collectionspace_core/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<updatedBy>PastPerfect: {{ UPDATEDBY }}</updatedBy>
<createdBy>PastPerfect Migration</createdBy>
<workflowState>project</workflowState>
<tenantId>11</tenantId>
<updatedAt>{{ __updatedAt }}</updatedAt>
<uri>/acquisitions/{{ PPID }}</uri>
</ns2:collectionspace_core>
...the service then doesn't honor the identifier in the <uri> element and it assigns the record a new CSID. (The above, by the way, is part of the Jinja2 template I'm using to create records, so the {{ PPID }} is a replaced placeholder.)
Thoughts?
Peter
[1] This is what I expect a RESTful interface to do...
--
Peter Murray
Dev/Ops Lead and Project Manager
Cherry Hill Company
Susan STONE
Wed, Oct 21, 2015 10:01 PM
Peter,
Since they are not cleaned up, it is useful to create an alias or run
a cron job to delete the import files in the temp directory that are
older than a certain amount of time so they don't build up.
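A cleanup along those lines might look like the following Python sketch, which could be run from cron. It is only an illustration under some assumptions: the 'imports-*' directory naming is taken from the paths shown in the <report> output earlier in this thread, the function name and the seven-day default are made up, and age is judged by each directory's modification time. It defaults to a dry run so nothing is deleted until the list has been checked.

```python
import shutil
import time
from pathlib import Path


def remove_old_import_dirs(temp_dir, max_age_days=7, dry_run=True):
    """Remove 'imports-*' directories under temp_dir older than max_age_days.

    Returns the directories that were (or, in a dry run, would be) removed.
    """
    cutoff = time.time() - max_age_days * 24 * 60 * 60
    removed = []
    for path in sorted(Path(temp_dir).glob("imports-*")):
        # Only touch directories whose last modification predates the cutoff.
        if path.is_dir() and path.stat().st_mtime < cutoff:
            if not dry_run:
                shutil.rmtree(path)
            removed.append(path)
    return removed
```

For example, remove_old_import_dirs("/usr/local/share/apache-tomcat-7.0.57/temp") would list the candidates, and passing dry_run=False would actually delete them.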
Susan
On Wed, Oct 21, 2015 at 2:49 PM, Peter Murray pmurray@chillco.com wrote:
The server logs are definitely crucial in figuring out what went wrong. The XML reply to an import will tell you that an error happened (or sometimes not), but the WHY of the problem with the import can only be found in the server logs.
A side benefit that I've noticed is that I can force data into the CSpace record that I wouldn't otherwise be able to, such as the last PastPerfect user to edit the record and when that edit happened. I'm also able to put in a "Past Perfect Migration" string in for "creator" so later viewers will know that this record started in the legacy system. So, on the whole, using ImportsService probably makes sense.
One thing I have noticed is that ImportsService does not clean up after itself by deleting the temp files. Probably should file a ticket for that, but the left over files are useful for debugging.
As always, thanks for the discussion and ideas,
Peter
On Oct 20, 2015, at 7:14 PM, Susan STONE sstone@berkeley.edu wrote:
Aron,
I definitely find stuff in the server-side logs that helps me find
errors in the XML. It can be a painful process, so I haven't saved any
cherished examples.
Susan
On Tue, Oct 20, 2015 at 4:05 PM, Aron Roberts
aron@socrates.berkeley.edu wrote:
In my experience, it is usually all or nothing (as with a database timeout
when the imports are too large or backed up) ...
Interesting. Have you been able to capture any log output on the server
side when those issues occurred? And are there CSpace JIRA issues for those?
I'd be happy to create one (or more) if you have any raw material around
this.
Aron
On Tue, Oct 20, 2015 at 3:57 PM, Susan STONE sstone@berkeley.edu wrote:
Aron,
In my experience, it is usually all or nothing (as with a database timeout
when the imports are too large or backed up),
and I just check the total for each batch.
I usually work out the XML issues in testing.
In the rare cases where there is a problem in some individual records
and the totals don't match, I have been comparing the
CSIDs manually-ish, but we are working to
automate that process and log the particular records
missed so they can be checked and resubmitted.
Susan
On Tue, Oct 20, 2015 at 3:47 PM, Aron Roberts
aron@socrates.berkeley.edu wrote:
Peter wrote:
> I think I'll take another look at the Import service, albeit in a
> one-at-a-time mode so I can have a better handle on error reporting.
From a trivial test just now, I'm wondering whether the Imports service
might give us just enough information to do a multi-record import, and be
able to tell which records were successfully imported and which were not?
Specifically, if we're providing CSIDs for each record at import time,
perhaps we can tell which were successfully imported, and which failed to be
imported - and thus need to be fixed and re-submitted in a follow-up import?
Example POST to the Imports service, of five CollectionObject records to
be imported into the 'core' tenant:
curl -X POST http://yourhostnamehere:8180/cspace-services/imports -i \
  -u "admin@core.collectionspace.org:Administrator" \
  -H "Content-Type: application/xml" \
  -T mixed-objects-some-invalid.xml
Where the file 'mixed-objects-some-invalid.xml' is a payload consisting of
five CollectionObject records to be imported, and where the fourth such
record includes a non-existent element (i.e. one not present in the
collectionobjects_common schema):
<?xml version="1.0" encoding="UTF-8"?>
<imports>
  <import service="CollectionObjects" type="CollectionObject"
          CSID="e9a3e850-2776-44f4-b068-4ab1a0c8c046">
    <schema
        xmlns:collectionobjects_common="http://collectionspace.org/services/collectionobject"
        name="collectionobjects_common">
      <collectionobjects_common:objectNumber>UC1</collectionobjects_common:objectNumber>
    </schema>
  </import>
  <import service="CollectionObjects" type="CollectionObject"
          CSID="c730a597-3229-476a-9e22-4ce89c003925">
    <schema
        xmlns:collectionobjects_common="http://collectionspace.org/services/collectionobject"
        name="collectionobjects_common">
      <collectionobjects_common:objectNumber>UC2</collectionobjects_common:objectNumber>
    </schema>
  </import>
  <import service="CollectionObjects" type="CollectionObject"
          CSID="d7358564-6a08-4dc2-a07d-9708471daa02">
    <schema
        xmlns:collectionobjects_common="http://collectionspace.org/services/collectionobject"
        name="collectionobjects_common">
      <collectionobjects_common:objectNumber>UC3</collectionobjects_common:objectNumber>
    </schema>
  </import>
  <import service="CollectionObjects" type="CollectionObject"
          CSID="6feb15c3-4e1e-4230-bb88-fa81467f6cbd">
    <schema
        xmlns:collectionobjects_common="http://collectionspace.org/services/collectionobject"
        name="collectionobjects_common">
      <collectionobjects_common:objectNumber>UC4</collectionobjects_common:objectNumber>
      <collectionobjects_common:foo>THIS ELEMENT DOESN'T EXIST IN THE SCHEMA</collectionobjects_common:foo>
    </schema>
  </import>
  <import service="CollectionObjects" type="CollectionObject"
          CSID="a5839b2c-b229-4a55-8ee3-71b2440658a3">
    <schema
        xmlns:collectionobjects_common="http://collectionspace.org/services/collectionobject"
        name="collectionobjects_common">
      <collectionobjects_common:objectNumber>UC5</collectionobjects_common:objectNumber>
    </schema>
  </import>
</imports>
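For anyone generating a payload like the one above from a script rather than by hand, here is a rough Python sketch. It only mirrors the element and attribute layout shown in this example (one `objectNumber` per CollectionObject); the function name and the `(csid, object_number)` input shape are made up for illustration, and nothing here is taken from the Imports service code itself.

```python
from xml.sax.saxutils import escape, quoteattr

# Namespace URI copied from the example payload in this thread.
NS = "http://collectionspace.org/services/collectionobject"


def build_imports_payload(records):
    """Build an <imports> document from (csid, object_number) pairs."""
    parts = ['<?xml version="1.0" encoding="UTF-8"?>', "<imports>"]
    for csid, object_number in records:
        # quoteattr() adds the surrounding quotes and escapes the value.
        parts.append(
            '  <import service="CollectionObjects" type="CollectionObject" '
            f"CSID={quoteattr(csid)}>"
        )
        parts.append(
            f'    <schema xmlns:collectionobjects_common="{NS}" '
            'name="collectionobjects_common">'
        )
        # escape() protects &, < and > in the field value.
        parts.append(
            "      <collectionobjects_common:objectNumber>"
            f"{escape(object_number)}"
            "</collectionobjects_common:objectNumber>"
        )
        parts.append("    </schema>")
        parts.append("  </import>")
    parts.append("</imports>")
    return "\n".join(parts) + "\n"
```

The string-building approach is deliberate: it keeps the `xmlns` declaration on each `<schema>` element exactly as in the hand-written example, rather than wherever an XML library would choose to put it.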
This import generates the following console output (pretty printed after the
fact for clarity, with hand-editing of the <report> content for further
readability):
<?xml version="1.0" encoding="utf-16"?>
<import>
  <msg>SUCCESS</msg>
  <importedRecords>
    <importedRecord>
      <doctype>CollectionObject</doctype>
      <csid>d7358564-6a08-4dc2-a07d-9708471daa02</csid>
    </importedRecord>
    <importedRecord>
      <doctype>CollectionObject</doctype>
      <csid>c730a597-3229-476a-9e22-4ce89c003925</csid>
    </importedRecord>
    <importedRecord>
      <doctype>CollectionObject</doctype>
      <csid>e9a3e850-2776-44f4-b068-4ab1a0c8c046</csid>
    </importedRecord>
    <importedRecord>
      <doctype>CollectionObject</doctype>
      <csid>a5839b2c-b229-4a55-8ee3-71b2440658a3</csid>
    </importedRecord>
  </importedRecords>
  <status>Success</status>
  <totalRecordsImported>4</totalRecordsImported>
  <numRecordsImportedByDocType>
    <numRecordsImported>
      <docType>CollectionObject</docType>
      <numRecords>4</numRecords>
    </numRecordsImported>
  </numRecordsImportedByDocType>
  <report>
    READ: /usr/local/share/apache-tomcat-7.0.57/temp/imports-6243858932268618966/CollectionObjects/6feb15c3-4e1e-4230-bb88-fa81467f6cbd/document.xml/CollectionObjects/6feb15c3-4e1e-4230-bb88-fa81467f6cbd
    READ: /usr/local/share/apache-tomcat-7.0.57/temp/imports-6243858932268618966/CollectionObjects/d7358564-6a08-4dc2-a07d-9708471daa02/document.xml/CollectionObjects/d7358564-6a08-4dc2-a07d-9708471daa02
    READ: /usr/local/share/apache-tomcat-7.0.57/temp/imports-6243858932268618966/CollectionObjects/a5839b2c-b229-4a55-8ee3-71b2440658a3/document.xml/CollectionObjects/a5839b2c-b229-4a55-8ee3-71b2440658a3
    READ: /usr/local/share/apache-tomcat-7.0.57/temp/imports-6243858932268618966/CollectionObjects/c730a597-3229-476a-9e22-4ce89c003925/document.xml/CollectionObjects/c730a597-3229-476a-9e22-4ce89c003925
    READ: /usr/local/share/apache-tomcat-7.0.57/temp/imports-6243858932268618966/CollectionObjects/e9a3e850-2776-44f4-b068-4ab1a0c8c046/document.xml/CollectionObjects/e9a3e850-2776-44f4-b068-4ab1a0c8c046
  </report>
</import>
Note that <totalRecordsImported> identifies that only 4 records were
successfully imported.
And by checking the CSIDs that were imported successfully against the
entire list of CSIDs, perhaps the 'missing' records (that failed to import)
could be identified? (In the list above, note that CSID
'6feb15c3-4e1e-4230-bb88-fa81467f6cbd' - the CSID for the problematic fourth
record - doesn't appear in the list of <importedRecords>.) If this test is
any indication, you might need to sort both lists of CSIDs - those submitted
and those successfully imported - as the ordering in the import payload
might not match the order returned in the output from that POST ... Anyway,
a thought.
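The comparison described above could be sketched in Python roughly as follows. It assumes the payload and the reply are available as text and have the shapes shown in this message; the function names are made up for illustration. Using sets sidesteps the ordering problem, since no sorting is needed to take a difference.

```python
import re
import xml.etree.ElementTree as ET

_XML_DECL = re.compile(r"^\s*<\?xml[^>]*\?>")


def _parse(text):
    # Drop any XML declaration first, so parsing already-decoded text does not
    # depend on the encoding it names (the sample reply declares utf-16).
    return ET.fromstring(_XML_DECL.sub("", text, count=1))


def submitted_csids(payload_xml):
    """CSID attributes of every <import> element in the submitted payload."""
    return {
        el.get("CSID")
        for el in _parse(payload_xml).iter("import")
        if el.get("CSID")
    }


def imported_csids(response_xml):
    """Text of every <csid> element reported back by the import."""
    return {
        el.text.strip()
        for el in _parse(response_xml).iter("csid")
        if el.text
    }


def failed_csids(payload_xml, response_xml):
    """CSIDs that were submitted but are missing from the reply."""
    return sorted(submitted_csids(payload_xml) - imported_csids(response_xml))
```

Run against the five-record example, `failed_csids` would be expected to return just the CSID of the fourth record, which could then be fixed and re-submitted in a follow-up import.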
Also: there are others on this list who are extremely experienced at doing
imports, and who might be able to share their own tips/tricks/scripts for
making it easier to identify records that failed to import, and
re-submitting those ...
Aron
On Tue, Oct 20, 2015 at 2:24 PM, Peter Murray pmurray@chillco.com wrote:
Thanks, Aron and Richard. I'm working with Acquisition records at the
moment, so I would need to add the 'other number' field to it and the other
record types in order to store that PastPerfect identifier. I think I'll
take another look at the Import service, albeit in a one-at-a-time mode so I
can have a better handle on error reporting.
Peter
On Oct 20, 2015, at 5:05 PM, Richard Millet richard.millet@lyrasis.org wrote:
Peter,
I agree with Aron. If you decide you can't (or would rather not) use the
Import service to create the cataloging records, then using the "Other
Number" field is probably your best choice.
Keep in mind that using a combination of data insertion methods (RESTful
API, Import Service, SQL) to get data into CollectionSpace is perfectly OK.
So perhaps you could create all the cataloging records using the Import
service and then make additional changes with RESTful PUT and other API
calls.
-Richard
From: Talk talk-bounces@lists.collectionspace.org on behalf of Aron
Roberts aron@socrates.berkeley.edu
Sent: Tuesday, October 20, 2015 1:00 PM
To: Peter Murray
Cc: CollectionSpace Talk List
Subject: Re: [Talk] Using RESTful interface, create a record with a
particular CSID
I wrote:
> One possible way to do this - if this were supported, say, as a future
> enhancement - might be to supply the CSID in the <uri> value in a
> <collectionspace_core> record part, in POSTs ...
And, of course, that's exactly what you suggested, Peter! :) Serves me
right for too-quickly skimming!
Just thinking out loud here: the services would need to check that URI
for at least: format, record type matching, and identifier uniqueness (even
with the improbability of duplicate Type 4 UUIDs), and presumably reject
records that didn't pass those validation checks, returning a '400 Bad
Request' or similar status.
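To make those three checks concrete, here is a small Python sketch. It is purely hypothetical: CollectionSpace does not do this today (that is the point of the thread), and the function name, the `/recordtype/csid` pattern, and the in-memory set of existing CSIDs are all assumptions for illustration.

```python
import re
import uuid

# Assumed shape of the <uri> value: /<recordtype>/<csid>
_URI = re.compile(r"^/(?P<record_type>[a-z]+)/(?P<csid>[0-9a-fA-F-]+)$")


def check_uri(uri, expected_type, existing_csids):
    """Return (status, reason) for a client-supplied <uri> value."""
    m = _URI.match(uri)
    if not m:
        return 400, "malformed uri"
    if m["record_type"] != expected_type:
        return 400, "record type mismatch"
    csid = m["csid"].lower()
    try:
        parsed = uuid.UUID(csid)
    except ValueError:
        return 400, "csid is not a UUID"
    # Require the canonical hyphenated form and a version-4 UUID.
    if str(parsed) != csid or parsed.version != 4:
        return 400, "csid is not a canonical Type 4 UUID"
    if csid in existing_csids:
        return 400, "csid already in use"
    return 200, "ok"
```

A real service would check uniqueness against its repository rather than a set, and might prefer a different status (409, say) for the duplicate case.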
And for certain record types, the services might also need to check
and/or synthesize the <refName> value. (For object or procedural records
with hierarchy, such as Cataloging records, the CSID is part of that
refName.)
Aron
On Tue, Oct 20, 2015 at 12:46 PM, Aron Roberts
aron@socrates.berkeley.edu wrote:
As a possible workaround, the Imports service will allow you to specify
a CSID for a newly imported record.
As an off-the-cuff, not-researched response: I don't recall if you can
specify a CSID on a POST, when interacting with the services for various
record types (i.e. outside of an import context), but my recollection is
that's not possible.
One possible way to do this - if this were supported, say, as a future
enhancement - might be to supply the CSID in the <uri> value in a
<collectionspace_core> record part, in POSTs; e.g.
<document name="collectionobjects">
  <ns2:collectionspace_core>
    ...
    <uri>/collectionobjects/90c0a0e6-eeca-46dd-add6</uri>
  </ns2:collectionspace_core>
  <ns2:collectionobjects_common>
    ...
> it seems to be a really handy thing to have the CSID match PastPerfect
> ID (especially in the migration process when I am iterating through loading
> templates and linking records together).
Would the 'other number' multivalued field in
Cataloging/CollectionObject records work for this purpose? Out of the box,
there's a 'previous' type for that field. (See attached and below.)
<cspace-other-number-field-example.png>
<otherNumberList>
  <otherNumber>
    <numberValue>0001</numberValue>
    <numberType>serial</numberType>
  </otherNumber>
  <otherNumber>
    <numberValue>204b95db-1557-4c8d-ba28-42e5578e53d3</numberValue>
    <numberType>previous</numberType>
  </otherNumber>
</otherNumberList>
AFAIK, this is the provided/intended way to stash away formerly-used
museum numbers or identifiers that you'd like to continue to have associated
with a record in CollectionSpace, although this clearly isn't as clean/easy
to work with as having matching UUIDs in both one's old and new systems.
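If the legacy identifiers are being written out by a migration script, the fragment above could be generated with something like this Python sketch. The function name and the `(value, type)` input shape are invented for illustration; only the element names come from the example.

```python
import xml.etree.ElementTree as ET


def other_number_list(numbers):
    """Build an <otherNumberList> fragment from (value, number_type) pairs."""
    root = ET.Element("otherNumberList")
    for value, number_type in numbers:
        item = ET.SubElement(root, "otherNumber")
        ET.SubElement(item, "numberValue").text = value
        ET.SubElement(item, "numberType").text = number_type
    # encoding="unicode" returns a str with no XML declaration, so the
    # fragment can be dropped into a larger record template.
    return ET.tostring(root, encoding="unicode")
```

For example, `other_number_list([("204b95db-1557-4c8d-ba28-42e5578e53d3", "previous")])` yields a one-entry list holding the old system's identifier.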
Aron
On Tue, Oct 20, 2015 at 12:26 PM, Peter Murray pmurray@chillco.com wrote:
As it happens, PastPerfect also uses Type-4 UUIDs as internal record
numbers, and it seems to be a really handy thing to have the CSID match
PastPerfect ID (especially in the migration process when I am iterating
through loading templates and linking records together). The problem is
that the RESTful service interface doesn't seem to let me specify a CSID.
If I PUT to /cspace-services/acquisitions/{{UUID}} and that record
doesn't already exist, I get back a 404.[1] If I POST to
/cspace-services/acquisitions and include this in the document:
<?xml version="1.0" encoding="UTF-8"?>
<document name="acquisitions">
  <ns2:collectionspace_core
      xmlns:ns2="http://collectionspace.org/collectionspace_core/"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <updatedBy>PastPerfect: {{ UPDATEDBY }}</updatedBy>
    <createdBy>PastPerfect Migration</createdBy>
    <workflowState>project</workflowState>
    <tenantId>11</tenantId>
    <updatedAt>{{ __updatedAt }}</updatedAt>
    <uri>/acquisitions/{{ PPID }}</uri>
  </ns2:collectionspace_core>
...the service then doesn't honor the identifier in the <uri> element
and it assigns the record a new CSID. (The above, by the way, is part of
the Jinja2 template I'm using to create records, so the {{ PPID }} is a
replaced placeholder.)
Thoughts?
Peter
[1] This is what I expect a RESTful interface to do...
RL
Ray Lee
Wed, Oct 21, 2015 10:59 PM
The server logs are definitely crucial in figuring out what went wrong.
The XML reply to an import will tell you that an error happened (or
sometimes not), but the WHY of the problem with the import can only be
found in the server logs.
A side benefit that I've noticed is that I can force data into the CSpace
record that I wouldn't otherwise be able to, such as the last PastPerfect
user to edit the record and when that edit happened. I'm also able to put
in a "Past Perfect Migration" string in for "creator" so later viewers will
know that this record started in the legacy system. So, on the whole,
using ImportsService probably makes sense.
One thing I have noticed is that ImportsService does not clean up after
itself by deleting the temp files. Probably should file a ticket for that,
but the left over files are useful for debugging.
As always, thanks for the discussion and ideas,
Peter
On Oct 20, 2015, at 7:14 PM, Susan STONE sstone@berkeley.edu wrote:
Aron,
I definitely find stuff in the server-side logs that helps me find
errors in the XML. It can be a painful process, so I haven't saved any
cherished examples.
Susan
On Tue, Oct 20, 2015 at 4:05 PM, Aron Roberts
aron@socrates.berkeley.edu wrote:
In my experience, it is usually all or nothing (as with a database timeout
when the imports are too large or backed up) ...
Interesting. Have you been able to capture any log output on the server
side when those issues occurred? And are there CSpace JIRA issues for those?
I'd be happy to create one (or more) if you have any raw material around
this.
Aron
On Tue, Oct 20, 2015 at 3:57 PM, Susan STONE sstone@berkeley.edu wrote:
Aron,
In my experience, it is usually all or nothing (as with a database timeout
when the imports are too large or backed up),
and I just check the total for each batch.
I usually work out the XML issues in testing.
In the rare cases where there is a problem in some individual records
and the totals don't match, I have been comparing the
CSIDs manually-ish, but we are working to
automate that process and log the particular records
missed so they can be checked and resubmitted.
Susan
On Tue, Oct 20, 2015 at 3:47 PM, Aron Roberts
aron@socrates.berkeley.edu wrote:
I think I'll take another look at the Import service, albeit in a
one-at-a-time mode so I can have a better handle on error reporting.
From a trivial test just now, I'm wondering whether the Imports service
might give us just enough information to do a multi-record import, and be
able to tell which records were successfully imported and which were not?
Specifically, if we're providing CSIDs for each record at import time,
perhaps we can tell which were successfully imported, and which failed to be
imported - and thus need to be fixed and re-submitted in a follow-up import?
Example POST to the Imports service, of five CollectionObject records to
be imported into the 'core' tenant:
curl -X POST http://yourhostnamehere:8180/cspace-services/imports -i \
  -u "admin@core.collectionspace.org:Administrator" \
  -H "Content-Type: application/xml" \
  -T mixed-objects-some-invalid.xml
Where the file 'mixed-objects-some-invalid.xml' is a payload consisting of
five CollectionObject records to be imported, and where the fourth such
record includes a non-existent element (i.e. one not present in the
collectionobjects_common schema):
<?xml version="1.0" encoding="UTF-8"?>
<imports>
  <import service="CollectionObjects" type="CollectionObject"
          CSID="e9a3e850-2776-44f4-b068-4ab1a0c8c046">
    <schema xmlns:collectionobjects_common="http://collectionspace.org/services/collectionobject"
            name="collectionobjects_common">
      <collectionobjects_common:objectNumber>UC1</collectionobjects_common:objectNumber>
    </schema>
  </import>
  <import service="CollectionObjects" type="CollectionObject"
          CSID="c730a597-3229-476a-9e22-4ce89c003925">
    <schema xmlns:collectionobjects_common="http://collectionspace.org/services/collectionobject"
            name="collectionobjects_common">
      <collectionobjects_common:objectNumber>UC2</collectionobjects_common:objectNumber>
    </schema>
  </import>
  <import service="CollectionObjects" type="CollectionObject"
          CSID="d7358564-6a08-4dc2-a07d-9708471daa02">
    <schema xmlns:collectionobjects_common="http://collectionspace.org/services/collectionobject"
            name="collectionobjects_common">
      <collectionobjects_common:objectNumber>UC3</collectionobjects_common:objectNumber>
    </schema>
  </import>
  <import service="CollectionObjects" type="CollectionObject"
          CSID="6feb15c3-4e1e-4230-bb88-fa81467f6cbd">
    <schema xmlns:collectionobjects_common="http://collectionspace.org/services/collectionobject"
            name="collectionobjects_common">
      <collectionobjects_common:objectNumber>UC4</collectionobjects_common:objectNumber>
      <collectionobjects_common:foo>THIS ELEMENT DOESN'T EXIST IN THE SCHEMA</collectionobjects_common:foo>
    </schema>
  </import>
  <import service="CollectionObjects" type="CollectionObject"
          CSID="a5839b2c-b229-4a55-8ee3-71b2440658a3">
    <schema xmlns:collectionobjects_common="http://collectionspace.org/services/collectionobject"
            name="collectionobjects_common">
      <collectionobjects_common:objectNumber>UC5</collectionobjects_common:objectNumber>
    </schema>
  </import>
</imports>
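For larger batches, a payload of this shape could be generated rather than written by hand. The following is only a minimal Python sketch: the function name `build_imports_payload` and the `RECORD_TEMPLATE` constant are hypothetical, it fills in only the objectNumber field, and it simply mirrors the element and attribute names used in the example payload. Whether the Imports service accepts any other layout has not been checked here.

```python
from xml.sax.saxutils import escape, quoteattr

# Namespace URI as used in the example payload.
NS = "http://collectionspace.org/services/collectionobject"

# One <import> element per record; layout copied from the example payload.
RECORD_TEMPLATE = (
    '  <import service="CollectionObjects" type="CollectionObject" CSID={csid}>\n'
    '    <schema xmlns:collectionobjects_common="{ns}"\n'
    '            name="collectionobjects_common">\n'
    '      <collectionobjects_common:objectNumber>{number}'
    '</collectionobjects_common:objectNumber>\n'
    '    </schema>\n'
    '  </import>\n'
)


def build_imports_payload(records):
    """Build an Imports payload from (csid, object_number) pairs.

    Hypothetical helper: values are XML-escaped, nothing else is validated.
    """
    parts = ['<?xml version="1.0" encoding="UTF-8"?>\n<imports>\n']
    for csid, object_number in records:
        parts.append(
            RECORD_TEMPLATE.format(
                csid=quoteattr(csid),          # adds the surrounding quotes
                ns=NS,
                number=escape(object_number),  # escapes &, <, >
            )
        )
    parts.append("</imports>\n")
    return "".join(parts)
```

Because the caller supplies every CSID up front, the same list can later be compared against whatever the service reports back.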
This import generates the following console output (pretty printed after
the fact for clarity, with hand-editing of the <report> content for further
readability):
<?xml version="1.0" encoding="utf-16"?>
<import>
  <msg>SUCCESS</msg>
  <importedRecords>
    <importedRecord>
      <doctype>CollectionObject</doctype>
      <csid>d7358564-6a08-4dc2-a07d-9708471daa02</csid>
    </importedRecord>
    <importedRecord>
      <doctype>CollectionObject</doctype>
      <csid>c730a597-3229-476a-9e22-4ce89c003925</csid>
    </importedRecord>
    <importedRecord>
      <doctype>CollectionObject</doctype>
      <csid>e9a3e850-2776-44f4-b068-4ab1a0c8c046</csid>
    </importedRecord>
    <importedRecord>
      <doctype>CollectionObject</doctype>
      <csid>a5839b2c-b229-4a55-8ee3-71b2440658a3</csid>
    </importedRecord>
  </importedRecords>
  <status>Success</status>
  <totalRecordsImported>4</totalRecordsImported>
  <numRecordsImportedByDocType>
    <numRecordsImported>
      <docType>CollectionObject</docType>
      <numRecords>4</numRecords>
    </numRecordsImported>
  </numRecordsImportedByDocType>
  <report>
READ: /usr/local/share/apache-tomcat-7.0.57/temp/imports-6243858932268618966/CollectionObjects/6feb15c3-4e1e-4230-bb88-fa81467f6cbd/document.xml/CollectionObjects/6feb15c3-4e1e-4230-bb88-fa81467f6cbd
READ: /usr/local/share/apache-tomcat-7.0.57/temp/imports-6243858932268618966/CollectionObjects/d7358564-6a08-4dc2-a07d-9708471daa02/document.xml/CollectionObjects/d7358564-6a08-4dc2-a07d-9708471daa02
READ: /usr/local/share/apache-tomcat-7.0.57/temp/imports-6243858932268618966/CollectionObjects/a5839b2c-b229-4a55-8ee3-71b2440658a3/document.xml/CollectionObjects/a5839b2c-b229-4a55-8ee3-71b2440658a3
READ: /usr/local/share/apache-tomcat-7.0.57/temp/imports-6243858932268618966/CollectionObjects/c730a597-3229-476a-9e22-4ce89c003925/document.xml/CollectionObjects/c730a597-3229-476a-9e22-4ce89c003925
READ: /usr/local/share/apache-tomcat-7.0.57/temp/imports-6243858932268618966/CollectionObjects/e9a3e850-2776-44f4-b068-4ab1a0c8c046/document.xml/CollectionObjects/e9a3e850-2776-44f4-b068-4ab1a0c8c046</report>
</import>
Note that <totalRecordsImported> identifies that only 4 records were
successfully imported.
And by checking the CSIDs that were imported successfully against the
entire list of CSIDs, perhaps the 'missing' records (that failed to import)
could be identified? (In the list above, note that CSID
'6feb15c3-4e1e-4230-bb88-fa81467f6cbd' - the CSID for the problematic fourth
record - doesn't appear in the list of <importedRecords>.) If this test is
any indication, you might need to sort both lists of CSIDs - those submitted
and those successfully imported - as the ordering in the import payload
might not match the order returned in the output from that POST ... Anyway,
a thought.
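The comparison described above could be scripted along these lines. This is only a sketch under stated assumptions: the function names are hypothetical, the response is assumed to have the same shape as the console output shown earlier (a <csid> element per imported record), and both documents are assumed to be available as text. Using sets sidesteps the ordering difference between payload and response.

```python
import re
import xml.etree.ElementTree as ET

# The response declares encoding="utf-16" even when handled as text, so the
# XML declaration is dropped before parsing to avoid encoding mismatches.
_XML_DECL = re.compile(r"^\s*<\?xml[^>]*\?>")


def _parse(xml_text):
    return ET.fromstring(_XML_DECL.sub("", xml_text, count=1))


def submitted_csids(payload_xml):
    """CSID attributes of every <import> element in the submitted payload."""
    root = _parse(payload_xml)
    return {imp.get("CSID") for imp in root.iter("import") if imp.get("CSID")}


def imported_csids(response_xml):
    """Text of every <csid> element in the Imports service response."""
    root = _parse(response_xml)
    return {el.text.strip() for el in root.iter("csid") if el.text}


def failed_csids(payload_xml, response_xml):
    """CSIDs that were submitted but not reported as imported, sorted."""
    return sorted(submitted_csids(payload_xml) - imported_csids(response_xml))
```

Applied to the five-record payload and the response above, the only CSID left over would be the one for the fourth record, which can then be fixed and re-submitted.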
Also: there are others on this list who are extremely experienced at doing
imports, and who might be able to share their own tips/tricks/scripts for
making it easier to identify records that failed to import, and
re-submitting those ...
Aron
On Tue, Oct 20, 2015 at 2:24 PM, Peter Murray pmurray@chillco.com wrote:
Thanks, Aron and Richard. I'm working with Acquisition records at the
moment, so I would need to add the 'other number' field to it and the other
record types in order to store that PastPerfect identifier. I think I'll
take another look at the Import service, albeit in a one-at-a-time mode so I
can have a better handle on error reporting.
Peter
On Oct 20, 2015, at 5:05 PM, Richard Millet richard.millet@lyrasis.org wrote:
Peter,
I agree with Aron. If you decide you can't (or would rather not) use the
Import service to create the cataloging records, then using the "Other
Number" field is probably your best choice.
Keep in mind that using a combination of data insertion methods (RESTful
API, Import Service, SQL) to get data into CollectionSpace is perfectly ok.
So perhaps you could create all the cataloging records using the Import
service and then make additional changes with RESTful PUT and other API
calls.
-Richard
From: Talk talk-bounces@lists.collectionspace.org on behalf of Aron
Roberts aron@socrates.berkeley.edu
Sent: Tuesday, October 20, 2015 1:00 PM
To: Peter Murray
Cc: CollectionSpace Talk List
Subject: Re: [Talk] Using RESTful interface, create a record with a
particular CSID
I wrote:
One possible way to do this - if this were supported, say, as a future
enhancement - might be to supply the CSID in the <uri> value in a
<collectionspace_core> record part, in POSTs ...
And, of course, that's exactly what you suggested, Peter! :) Serves me
right for too-quickly skimming!
Just thinking out loud here: the services would need to check that URI
for at least: format, record type matching, and identifier uniqueness (even
with the improbability of duplicate Type 4 UUIDs), and presumably reject
records that didn't pass those validation checks, returning a '400 Bad
Request' or similar status.
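To make those three checks concrete, here is a rough Python sketch of what such validation might look like. It is not CollectionSpace code: the function name, the URI pattern, and the idea of passing in a set of existing CSIDs are all assumptions made for illustration, and a failed check would correspond to the '400 Bad Request' case mentioned above.

```python
import re
import uuid

# Assumed URI shape: /<service>/<36-character identifier>
URI_PATTERN = re.compile(r"^/(?P<service>[a-z]+)/(?P<csid>[0-9a-fA-F-]{36})$")


def validate_requested_uri(uri, expected_service, existing_csids):
    """Return (ok, reason) for a client-supplied <uri> value.

    Hypothetical checks: format, record type match, identifier uniqueness.
    """
    match = URI_PATTERN.match(uri)
    if match is None:
        return False, "malformed URI"
    if match.group("service") != expected_service:
        return False, "record type mismatch"
    try:
        parsed = uuid.UUID(match.group("csid"))
    except ValueError:
        return False, "not a UUID"
    if parsed.version != 4:
        return False, "not a version 4 UUID"
    if str(parsed) in existing_csids:
        return False, "CSID already in use"
    return True, "ok"
```

For instance, `/acquisitions/204b95db-1557-4c8d-ba28-42e5578e53d3` posted to the acquisitions service would pass as long as that identifier is not already present, while the same URI posted to a different service would be rejected.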
And for certain record types, the services might also need to check
and/or synthesize the <refName> value. (For object or procedural records
with hierarchy, such as Cataloging records, the CSID is part of that
refName.)
Aron
On Tue, Oct 20, 2015 at 12:46 PM, Aron Roberts
aron@socrates.berkeley.edu wrote:
As a possible workaround, the Imports service will allow you to specify
a CSID for a newly imported record.
As an off-the-cuff, not-researched response: I don't recall if you can
specify a CSID on a POST, when interacting with the services for various
record types (i.e. outside of an import context), but my recollection is
that's not possible.
One possible way to do this - if this were supported, say, as a future
enhancement - might be to supply the CSID in the <uri> value in a
<collectionspace_core> record part, in POSTs; e.g.
<document name="collectionobjects">
<ns2:collectionspace_core>
...
<uri>/collectionobjects/90c0a0e6-eeca-46dd-add6</uri>
</ns2:collectionspace_core>
<ns2:collectionobjects_common>
...
it seems to be a really handy thing to have the CSID match PastPerfect
ID (especially in the migration process when I am iterating through loading
templates and linking records together).
Would the 'other number' multivalued field in
Cataloging/CollectionObject records work for this purpose? Out of the box,
there's a 'previous' type for that field. (See attached and below.)
<cspace-other-number-field-example.png>
<otherNumberList>
<otherNumber>
<numberValue>0001</numberValue>
<numberType>serial</numberType>
</otherNumber>
<otherNumber>
<numberValue>204b95db-1557-4c8d-ba28-42e5578e53d3</numberValue>
<numberType>previous</numberType>
</otherNumber>
</otherNumberList>
AFAIK, this is the provided/intended way to stash away formerly-used
museum numbers or identifiers that you'd like to continue to have associated
with a record in CollectionSpace, although this clearly isn't as clean/easy
to work with as having matching UUIDs in both one's old and new systems.
Aron
On Tue, Oct 20, 2015 at 12:26 PM, Peter Murray pmurray@chillco.com wrote:
As it happens, PastPerfect also uses Type-4 UUIDs as internal record
numbers, and it seems to be a really handy thing to have the CSID match
PastPerfect ID (especially in the migration process when I am iterating
through loading templates and linking records together). The problem is
that the RESTful service interface doesn't seem to let me specify a CSID.
If I PUT to /cspace-services/acquisitions/{{UUID}} and that record
doesn't already exist, I get back a 404.[1] If I POST to
/cspace-services/acquisitions and include this in the document:
<?xml version="1.0" encoding="UTF-8"?>
<document name="acquisitions">
<ns2:collectionspace_core
xmlns:ns2="http://collectionspace.org/collectionspace_core/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<updatedBy>PastPerfect: {{ UPDATEDBY }}</updatedBy>
<createdBy>PastPerfect Migration</createdBy>
<workflowState>project</workflowState>
<tenantId>11</tenantId>
<updatedAt>{{ __updatedAt }}</updatedAt>
<uri>/acquisitions/{{ PPID }}</uri>
</ns2:collectionspace_core>
...the service then doesn't honor the identifier in the <uri> element
and it assigns the record a new CSID. (The above, by the way, is part of
the Jinja2 template I'm using to create records, so the {{ PPID }} is a
replaced placeholder.)
Thoughts?
Peter
[1] This is what I expect a RESTful interface to do...
There is a JIRA already for the files being left in temp:
https://issues.collectionspace.org/browse/CSPACE-6814
Ray
PM
Peter Murray
Thu, Oct 22, 2015 1:45 PM
Yes, and it is an easy one to write since the files aren't touched after the ImportsService is done with them. Still, it might catch people off guard...
Peter
On Oct 21, 2015, at 6:01 PM, Susan STONE sstone@berkeley.edu wrote:
Peter,
Since they are not cleaned up, it is useful to create an alias or run
a cron job to delete the import files in the temp directory that are
older than a certain amount of time so they don't build up.
Susan
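A cleanup job of the kind Susan describes might look roughly like the Python sketch below, which could be run from cron. The function name and the seven-day default are assumptions made for illustration; the only detail taken from this thread is that the leftover directories are named `imports-<number>` under Tomcat's temp directory. The `dry_run` flag is there so the list of candidates can be reviewed before anything is deleted.

```python
import shutil
import time
from pathlib import Path


def remove_stale_import_dirs(temp_dir, max_age_days=7, now=None, dry_run=False):
    """Delete imports-* directories older than max_age_days.

    Hypothetical helper; returns the names of the directories selected.
    Age is judged by each directory's modification time.
    """
    reference = time.time() if now is None else now
    cutoff = reference - max_age_days * 86400
    removed = []
    for entry in sorted(Path(temp_dir).glob("imports-*")):
        if entry.is_dir() and entry.stat().st_mtime < cutoff:
            if not dry_run:
                shutil.rmtree(entry)
            removed.append(entry.name)
    return removed
```

Since the leftover files are useful for debugging, a generous age threshold keeps recent imports around while still preventing unbounded build-up.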
On Wed, Oct 21, 2015 at 2:49 PM, Peter Murray pmurray@chillco.com wrote:
The server logs are definitely crucial in figuring out what went wrong. The XML reply to an import will tell you that an error happened (or sometimes not), but the WHY of the problem with the import can only be found in the server logs.
A side benefit that I've noticed is that I can force data into the CSpace record that I wouldn't otherwise be able to, such as the last PastPerfect user to edit the record and when that edit happened. I'm also able to put in a "Past Perfect Migration" string in for "creator" so later viewers will know that this record started in the legacy system. So, on the whole, using ImportsService probably makes sense.
One thing I have noticed is that ImportsService does not clean up after itself by deleting the temp files. Probably should file a ticket for that, but the left over files are useful for debugging.
As always, thanks for the discussion and ideas,
Peter
On Oct 20, 2015, at 7:14 PM, Susan STONE sstone@berkeley.edu wrote:
Aron,
I definitely find stuff in the server-side logs that helps me find
errors in the XML. It can be a painful process, so I haven't saved any
cherished examples.
Susan
On Tue, Oct 20, 2015 at 4:05 PM, Aron Roberts
aron@socrates.berkeley.edu wrote:
In my experience, it is usually all or nothing (as with a database timeout
when the imports are too large or backed up) ...
Interesting. Have you been able to capture any log output on the server
side when those issues occurred? And are there CSpace JIRA issues for those?
I'd be happy to create one (or more) if you have any raw material around
this.
Aron
On Tue, Oct 20, 2015 at 3:57 PM, Susan STONE sstone@berkeley.edu wrote:
Aron,
In my experience, it is usually all or nothing (as with a database timeout
when the imports are too large or backed up),
and I just check the total for each batch.
I usually work out the XML issues in testing.
In the rare cases where there is a problem in some individual records
and the totals don't match, I have been comparing the
CSIDs manually-ish, but we are working to
automate that process and log the particular records
missed so they can be checked and resubmitted.
Susan
On Tue, Oct 20, 2015 at 3:47 PM, Aron Roberts
aron@socrates.berkeley.edu wrote:
Peter wrote:
I think I'll take another look at the Import service, albeit in a
one-at-a-time mode so I can have a better handle on error reporting.
From a trivial test just now, I'm wondering whether the Imports service
might give us just enough information to do a multi-record import, and be
able to tell which records were successfully imported and which were not?
Specifically, if we're providing CSIDs for each record at import time,
perhaps we can tell which were successfully imported, and which failed to be
imported - and thus need to be fixed and re-submitted in a follow-up import?
Example POST to the Imports service, of five CollectionObject records to
be imported into the 'core' tenant:
curl -X POST http://yourhostnamehere:8180/cspace-services/imports -i \
  -u "admin@core.collectionspace.org:Administrator" \
  -H "Content-Type: application/xml" \
  -T mixed-objects-some-invalid.xml
Where the file 'mixed-objects-some-invalid.xml' is a payload consisting of
five CollectionObject records to be imported, and where the fourth such
record includes a non-existent element (i.e. one not present in the
collectionobjects_common schema):
<?xml version="1.0" encoding="UTF-8"?>
<imports>
<import service="CollectionObjects" type="CollectionObject"
CSID="e9a3e850-2776-44f4-b068-4ab1a0c8c046">
<schema
xmlns:collectionobjects_common="http://collectionspace.org/services/collectionobject"
name="collectionobjects_common">
<collectionobjects_common:objectNumber>UC1</collectionobjects_common:objectNumber>
</schema>
</import>
<import service="CollectionObjects" type="CollectionObject"
CSID="c730a597-3229-476a-9e22-4ce89c003925">
<schema
xmlns:collectionobjects_common="http://collectionspace.org/services/collectionobject"
name="collectionobjects_common">
<collectionobjects_common:objectNumber>UC2</collectionobjects_common:objectNumber>
</schema>
</import>
<import service="CollectionObjects" type="CollectionObject"
CSID="d7358564-6a08-4dc2-a07d-9708471daa02">
<schema
xmlns:collectionobjects_common="http://collectionspace.org/services/collectionobject"
name="collectionobjects_common">
<collectionobjects_common:objectNumber>UC3</collectionobjects_common:objectNumber>
</schema>
</import>
<import service="CollectionObjects" type="CollectionObject"
CSID="6feb15c3-4e1e-4230-bb88-fa81467f6cbd">
<schema
xmlns:collectionobjects_common="http://collectionspace.org/services/collectionobject"
name="collectionobjects_common">
<collectionobjects_common:objectNumber>UC4</collectionobjects_common:objectNumber>
<collectionobjects_common:foo>THIS ELEMENT DOESN'T EXIST IN THE SCHEMA</collectionobjects_common:foo>
</schema>
</import>
<import service="CollectionObjects" type="CollectionObject"
CSID="a5839b2c-b229-4a55-8ee3-71b2440658a3">
<schema
xmlns:collectionobjects_common="http://collectionspace.org/services/collectionobject"
name="collectionobjects_common">
<collectionobjects_common:objectNumber>UC5</collectionobjects_common:objectNumber>
</schema>
</import>
</imports>
This import generates the following console output (pretty printed after the
fact for clarity, with hand-editing of the <report> content for further
readability):
<?xml version="1.0" encoding="utf-16"?>
<import>
<msg>SUCCESS</msg>
<importedRecords>
<importedRecord>
<doctype>CollectionObject</doctype>
<csid>d7358564-6a08-4dc2-a07d-9708471daa02</csid>
</importedRecord>
<importedRecord>
<doctype>CollectionObject</doctype>
<csid>c730a597-3229-476a-9e22-4ce89c003925</csid>
</importedRecord>
<importedRecord>
<doctype>CollectionObject</doctype>
<csid>e9a3e850-2776-44f4-b068-4ab1a0c8c046</csid>
</importedRecord>
<importedRecord>
<doctype>CollectionObject</doctype>
<csid>a5839b2c-b229-4a55-8ee3-71b2440658a3</csid>
</importedRecord>
</importedRecords>
<status>Success</status>
<totalRecordsImported>4</totalRecordsImported>
<numRecordsImportedByDocType>
<numRecordsImported>
<docType>CollectionObject</docType>
<numRecords>4</numRecords>
</numRecordsImported>
</numRecordsImportedByDocType>
<report>
READ:
/usr/local/share/apache-tomcat-7.0.57/temp/imports-6243858932268618966/CollectionObjects/6feb15c3-4e1e-4230-bb88-fa81467f6cbd/document.xml/CollectionObjects/6feb15c3-4e1e-4230-bb88-fa81467f6cbd
READ:
/usr/local/share/apache-tomcat-7.0.57/temp/imports-6243858932268618966/CollectionObjects/d7358564-6a08-4dc2-a07d-9708471daa02/document.xml/CollectionObjects/d7358564-6a08-4dc2-a07d-9708471daa02
READ:
/usr/local/share/apache-tomcat-7.0.57/temp/imports-6243858932268618966/CollectionObjects/a5839b2c-b229-4a55-8ee3-71b2440658a3/document.xml/CollectionObjects/a5839b2c-b229-4a55-8ee3-71b2440658a3
READ:
/usr/local/share/apache-tomcat-7.0.57/temp/imports-6243858932268618966/CollectionObjects/c730a597-3229-476a-9e22-4ce89c003925/document.xml/CollectionObjects/c730a597-3229-476a-9e22-4ce89c003925
READ:
/usr/local/share/apache-tomcat-7.0.57/temp/imports-6243858932268618966/CollectionObjects/e9a3e850-2776-44f4-b068-4ab1a0c8c046/document.xml/CollectionObjects/e9a3e850-2776-44f4-b068-4ab1a0c8c046</report>
</import>
Note that <totalRecordsImported> identifies that only 4 records were
successfully imported.
And by checking the CSIDs that were imported successfully against the
entire list of CSIDs, perhaps the 'missing' records (that failed to import)
could be identified? (In the list above, note that CSID
'6feb15c3-4e1e-4230-bb88-fa81467f6cbd' - the CSID for the problematic fourth
record - doesn't appear in the list of <importedRecords>.) If this test is
any indication, you might need to sort both lists of CSIDs - those submitted
and those successfully imported - as the ordering in the import payload
might not match the order returned in the output from that POST ...
Anyway, a thought.
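The comparison described above could be sketched in Python along these lines. This is only an illustration, not part of the Imports service: the function names are made up, and it assumes the payload and the response look like the two XML documents shown earlier in this message (a CSID attribute on each <import>, and <csid> elements under <importedRecords>). Using sets sidesteps the ordering issue, since neither list has to be sorted before comparing.

```python
import xml.etree.ElementTree as ET


def submitted_csids(payload_xml):
    """CSIDs supplied in the import payload (CSID attribute of each <import>)."""
    root = ET.fromstring(payload_xml)
    return {imp.get("CSID") for imp in root.iter("import") if imp.get("CSID")}


def imported_csids(response_xml):
    """CSIDs the Imports service reported back in <csid> elements."""
    root = ET.fromstring(response_xml)
    return {el.text.strip() for el in root.iter("csid") if el.text}


def failed_csids(payload_xml, response_xml):
    """CSIDs that were submitted but are absent from the response, sorted for stable output."""
    return sorted(submitted_csids(payload_xml) - imported_csids(response_xml))
```

For the payload and response above, the only CSID in the first set and not in the second is 6feb15c3-4e1e-4230-bb88-fa81467f6cbd, the fourth record. The reason it failed would still have to come from the server logs.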
Also: there are others on this list who are extremely experienced at doing
imports, and who might be able to share their own tips/tricks/scripts for
making it easier to identify records that failed to import, and
re-submitting those ...
Aron
On Tue, Oct 20, 2015 at 2:24 PM, Peter Murray pmurray@chillco.com
wrote:
Thanks, Aron and Richard. I'm working with Acquisition records at the
moment, so I would need to add the 'other number' field to it and the other
record types in order to store that PastPerfect identifier. I think I'll
take another look at the Import service, albeit in a one-at-a-time mode so I
can have a better handle on error reporting.
Peter
On Oct 20, 2015, at 5:05 PM, Richard Millet richard.millet@lyrasis.org wrote:
Peter,
I agree with Aron. If you decide you can't (or would rather not) use the
Import service to create the cataloging records, then using the "Other
Number" field is probably your best choice.
Keep in mind that using a combination of data insertion methods (RESTful
API, Import Service, SQL) to get data into CollectionSpace is perfectly OK.
So perhaps you could create all the cataloging records using the Import
service and then make additional changes with RESTful PUT and other API
calls.
-Richard
From: Talk talk-bounces@lists.collectionspace.org on behalf of Aron
Roberts aron@socrates.berkeley.edu
Sent: Tuesday, October 20, 2015 1:00 PM
To: Peter Murray
Cc: CollectionSpace Talk List
Subject: Re: [Talk] Using RESTful interface, create a record with a
particular CSID
I wrote:
> One possible way to do this - if this were supported, say, as a future
> enhancement - might be to supply the CSID in the <uri> value in a
> <collectionspace_core> record part, in POSTs ...
And, of course, that's exactly what you suggested, Peter! :) Serves me
right for too-quickly skimming!
Just thinking out loud here: the services would need to check that URI
for at least: format, record type matching, and identifier uniqueness (even
with the improbability of duplicate Type 4 UUIDs), and presumably reject
records that didn't pass those validation checks, returning a '400 Bad
Request' or similar status.
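To make those three checks concrete, here is a rough Python sketch. It is purely hypothetical: nothing like this exists in the services today as far as this thread establishes, and the function name, the reason strings and the idea of passing in a set of already-used CSIDs are all invented for illustration. A real implementation would look up uniqueness in the repository and map any failure to a '400 Bad Request' or similar.

```python
import uuid


def validate_requested_uri(uri, expected_service, existing_csids):
    """Check a client-supplied <uri> value such as '/acquisitions/<csid>'.

    Returns (ok, reason). 'existing_csids' stands in for a repository lookup.
    """
    parts = uri.strip().split("/")
    # Format: exactly '/<service>/<csid>'
    if len(parts) != 3 or parts[0] != "":
        return False, "malformed uri"
    service, csid = parts[1], parts[2]
    # Record type matching: the path must name the service being POSTed to
    if service != expected_service:
        return False, "record type mismatch"
    try:
        parsed = uuid.UUID(csid)
    except ValueError:
        return False, "csid is not a UUID"
    # uuid.UUID also accepts braces, 'urn:uuid:' prefixes and unhyphenated hex,
    # so compare against the canonical form as well as checking the version
    if parsed.version != 4 or str(parsed) != csid.lower():
        return False, "csid is not a canonical Type 4 UUID"
    # Identifier uniqueness
    if str(parsed) in existing_csids:
        return False, "csid already in use"
    return True, "ok"
```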
And for certain record types, the services might also need to check
and/or synthesize the <refName> value. (For object or procedural records
with hierarchy, such as Cataloging records, the CSID is part of that
refName.)
Aron
On Tue, Oct 20, 2015 at 12:46 PM, Aron Roberts
aron@socrates.berkeley.edu wrote:
As a possible workaround, the Imports service will allow you to specify
a CSID for a newly imported record.
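As a rough illustration of that workaround, the Python sketch below builds an Imports payload in which each record carries a caller-supplied CSID. It copies the element layout of the CollectionObject payload shown elsewhere in this thread and only fills in objectNumber; the function names are made up, and other record types (Acquisitions, say) would need their own service, type and schema values, which are not shown here.

```python
from xml.sax.saxutils import escape, quoteattr

# Namespace as it appears in the CollectionObject import example in this thread
COMMON_NS = "http://collectionspace.org/services/collectionobject"


def import_element(csid, object_number):
    """One <import> element for a CollectionObject with a caller-supplied CSID."""
    return (
        '<import service="CollectionObjects" type="CollectionObject" '
        f"CSID={quoteattr(csid)}>\n"
        f'  <schema xmlns:collectionobjects_common="{COMMON_NS}" '
        'name="collectionobjects_common">\n'
        "    <collectionobjects_common:objectNumber>"
        f"{escape(object_number)}"
        "</collectionobjects_common:objectNumber>\n"
        "  </schema>\n"
        "</import>"
    )


def imports_payload(records):
    """Wrap (csid, object_number) pairs in a single <imports> document."""
    body = "\n".join(import_element(csid, number) for csid, number in records)
    return f'<?xml version="1.0" encoding="UTF-8"?>\n<imports>\n{body}\n</imports>\n'
```

The resulting string can be written to a file and sent with the same kind of curl POST to /cspace-services/imports used in the example elsewhere in this thread.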
As an off-the-cuff, not-researched response: I don't recall if you can
specify a CSID on a POST, when interacting with the services for various
record types (i.e. outside of an import context), but my recollection is
that's not possible.
One possible way to do this - if this were supported, say, as a future
enhancement - might be to supply the CSID in the <uri> value in a
<collectionspace_core> record part, in POSTs; e.g.
<document name="collectionobjects">
<ns2:collectionspace_core>
...
<uri>/collectionobjects/90c0a0e6-eeca-46dd-add6</uri>
</ns2:collectionspace_core>
<ns2:collectionobjects_common>
...
> it seems to be a really handy thing to have the CSID match PastPerfect
> ID (especially in the migration process when I am iterating through loading
> templates and linking records together).
Would the 'other number' multivalued field in
Cataloging/CollectionObject records work for this purpose? Out of the box,
there's a 'previous' type for that field. (See attached and below.)
<cspace-other-number-field-example.png>
<otherNumberList>
<otherNumber>
<numberValue>0001</numberValue>
<numberType>serial</numberType>
</otherNumber>
<otherNumber>
<numberValue>204b95db-1557-4c8d-ba28-42e5578e53d3</numberValue>
<numberType>previous</numberType>
</otherNumber>
</otherNumberList>
AFAIK, this is the provided/intended way to stash away formerly-used
museum numbers or identifiers that you'd like to continue to have associated
with a record in CollectionSpace, although this clearly isn't as clean/easy
to work with as having matching UUIDs in both one's old and new systems.
Aron
On Tue, Oct 20, 2015 at 12:26 PM, Peter Murray pmurray@chillco.com
wrote:
As it happens, PastPerfect also uses Type-4 UUIDs as internal record
numbers, and it seems to be a really handy thing to have the CSID match
PastPerfect ID (especially in the migration process when I am iterating
through loading templates and linking records together). The problem is
that the RESTful service interface doesn't seem to let me specify a CSID.
If I PUT to /cspace-services/acquisitions/{{UUID}} and that record
doesn't already exist, I get back a 404.[1] If I POST to
/cspace-services/acquisitions and include this in the document:
<?xml version="1.0" encoding="UTF-8"?>
<document name="acquisitions">
<ns2:collectionspace_core
xmlns:ns2="http://collectionspace.org/collectionspace_core/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<updatedBy>PastPerfect: {{ UPDATEDBY }}</updatedBy>
<createdBy>PastPerfect Migration</createdBy>
<workflowState>project</workflowState>
<tenantId>11</tenantId>
<updatedAt>{{ __updatedAt }}</updatedAt>
<uri>/acquisitions/{{ PPID }}</uri>
</ns2:collectionspace_core>
...the service then doesn't honor the identifier in the <uri> element
and it assigns the record a new CSID. (The above, by the way, is part of
the Jinja2 template I'm using to create records, so the {{ PPID }} is a
replaced placeholder.)
Thoughts?
Peter
[1] This is what I expect a RESTful interface to do...
--
Peter Murray
Dev/Ops Lead and Project Manager
Cherry Hill Company
Yes, and it is an easy one to write since the files aren't touched after the ImportService is done with them. Still, it might catch people off guard...
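A minimal sketch of such a cleanup, in Python rather than a shell alias. It assumes the leftover directories are named imports-<number> directly under Tomcat's temp directory, as in the paths in the <report> output earlier in the thread. The function names, the age threshold and the dry-run default are illustrative choices, not anything ImportsService provides.

```python
import shutil
import time
from pathlib import Path


def stale_import_dirs(temp_dir, max_age_days, now=None):
    """Directories named imports-* in temp_dir last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    return sorted(
        p for p in Path(temp_dir).glob("imports-*")
        if p.is_dir() and p.stat().st_mtime < cutoff
    )


def remove_stale_import_dirs(temp_dir, max_age_days, dry_run=True):
    """Delete stale import directories; with dry_run=True only report them."""
    stale = stale_import_dirs(temp_dir, max_age_days)
    for path in stale:
        if not dry_run:
            shutil.rmtree(path)
    return stale
```

Run from cron, this might be called as remove_stale_import_dirs("/usr/local/share/apache-tomcat-7.0.57/temp", 7, dry_run=False), keeping the most recent week of files around for debugging.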
Peter
> On Oct 21, 2015, at 6:01 PM, Susan STONE <sstone@berkeley.edu> wrote:
>
> Peter,
>
> Since they are not cleaned up, it is useful to create an alias or run
> a cron job to delete the import files in the temp directory that are
> older than a certain amount of time so they don't build up.
>
> Susan
--
Peter Murray
Dev/Ops Lead and Project Manager
Cherry Hill Company