talk@lists.collectionspace.org

WE HAVE SUNSET THIS LISTSERV - Join us at collectionspace@lyrasislists.org


Short identifier uniqueness requirements question?

Chris Hoffman
Fri, Aug 31, 2012 7:58 PM

Hi,
I need to make sure I understand the scope at which uniqueness is required for short identifiers in vocabularies.  Must short identifiers be unique across all vocabularies within one authority, or only within a single vocabulary?  I ask because our UCJEPS installation has 3 taxonomy vocabularies.  Short identifiers are unique within each vocabulary but not across the entire set.  When I tried to display a record that has a "duplicate" short identifier in a demo 2.4 system, CSpace took 5 minutes to display it.  The GET request in Firebug was only reporting:
http://ucjeps.collectionspace.org:8180/collectionspace/tenant/ucjeps/vocabularies/taxon/urn:cspace:name(0)

although the referer within the request headers is
http://ucjeps.collectionspace.org:8180/collectionspace/ui/ucjeps/html/taxon.html?csid=urn:cspace:name(0)&vocab=taxon

Thanks,
Chris

John Deck
Fri, Aug 31, 2012 8:34 PM

I'm not sure I understand the requirements here entirely, but
urn:cspace:name(0) does seem very brief!

It would be easy to construct identifiers that are globally unique
(e.g. attaching a UUID at the end of urn:cspace:) so why not use those
so other systems can work directly with the concepts that you are
identifying?

John

--
John Deck
(541) 321-0689

sstone@socrates.berkeley.edu
Fri, Aug 31, 2012 8:48 PM

I thought short identifiers could not begin with a number?

Susan

Patrick Schmitz
Fri, Aug 31, 2012 9:00 PM

Short identifiers must be unique within a vocab, or you will get duplicates, and
things will be unhappy. Having the same ID in several vocabs (even within the
same authority) is fine.
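
A minimal Python sketch of the scoping rule described above: the same short identifier may appear in several vocabularies, but a repeat inside one vocabulary is flagged. The vocabulary names and IDs are made up for illustration, and this is only a check one might run over legacy data before import, not something CollectionSpace itself provides.

```python
from collections import Counter


def duplicate_short_ids(terms):
    """Return (vocabulary, short_id) pairs that occur more than once.

    `terms` is an iterable of (vocabulary, short_id) tuples. The same
    short_id used in two different vocabularies is not reported.
    """
    counts = Counter(terms)
    return sorted(pair for pair, n in counts.items() if n > 1)


# Hypothetical data: two vocabularies under one authority.
terms = [
    ("taxon_a", "quercus1234"),
    ("taxon_b", "quercus1234"),  # same ID, different vocabulary: fine
    ("taxon_a", "pinus5678"),
    ("taxon_a", "pinus5678"),    # repeated within one vocabulary: a problem
]
print(duplicate_short_ids(terms))  # [('taxon_a', 'pinus5678')]
```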

ShortIdentifiers are the basis for fetching, so having a short ID that is a
reasonable word length is good (indexes better). Having one that has a
numeric pseudo-random tail (e.g., "fooBar2345", as we create for auto-create
on termCompletion) balances ease of reading/debugging with uniqueness that
will behave well in the DB. The DB will try to stem the shortId when it
prepares keyword indexes (which we use in certain cases), and so the suffix
keeps things behaving better, I think.
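
As a rough illustration of the "readable base plus numeric tail" shape (e.g., "fooBar2345"), here is a Python sketch. It is not the code CollectionSpace uses for auto-create; the function name, the four-digit default, and the "term" fallback are all assumptions made for the example.

```python
import random
import re


def make_short_id(display_name, digits=4, rng=random):
    """Build a short ID: word characters from the name plus a numeric tail."""
    # Keep only ASCII letters, digits and underscore.
    base = re.sub(r"\W+", "", display_name, flags=re.ASCII)
    if not base:
        base = "term"  # hypothetical fallback when nothing usable remains
    # Pseudo-random digits keep the ID readable while reducing collisions.
    tail = "".join(rng.choice("0123456789") for _ in range(digits))
    return base + tail


print(make_short_id("foo Bar"))  # "fooBar" followed by four digits
```

A random tail does not guarantee uniqueness, so a real import would still need to check each generated ID against the ones already in the target vocabulary.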

UUIDs will work fine as well, but are harder to make sense of when
debugging, etc.

Patrick

Patrick Schmitz
Fri, Aug 31, 2012 9:01 PM

I think we just require all word chars (alnum and _, IIRC).
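
If that recollection is right, the rule could be sketched in Python as below. The pattern reflects only the "alnum and _" description in this message and has not been checked against the CollectionSpace source; the function name is hypothetical.

```python
import re

# Assumed rule: one or more ASCII letters, digits or underscores.
WORD_CHARS = re.compile(r"[A-Za-z0-9_]+")


def is_valid_short_id(short_id):
    """Return True if the whole string consists of word characters."""
    return WORD_CHARS.fullmatch(short_id) is not None


for candidate in ["fooBar2345", "0", "foo bar", ""]:
    print(repr(candidate), is_valid_short_id(candidate))
```

Under this reading a leading digit is allowed, so a one-character ID such as "0" passes the check; validity says nothing about how well such an ID behaves when fetching.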

Chris Hoffman
Fri, Aug 31, 2012 10:13 PM

Thanks, I'm pretty sure my problem is that the legacy identifier I used in this case was simply a one-digit string "0", and fetching is therefore slow. Sigh.
Chris

Chris Hoffman
Fri, Aug 31, 2012 10:16 PM

Thanks, John.  You're right that the string urn:cspace:name(0) wouldn't be a very good identifier!  Fortunately, this one is just used internally, and even then only under specific circumstances.  There are UUIDs and such attached to this record that will allow us to construct more appropriate globally unique identifiers for data sharing purposes.
Chris
