Empathy List Archives

Peter Murray

Mon, Aug 10, 2015 9:04 PM

Hello everyone. I have a question about how diacritic marks are indexed and searched in CollectionSpace. From what I can tell, the handling of text strings themselves is UTF-8 clean throughout the application and the database. If a user searches for terms using the character without the diacritic mark, will the search engine retrieve records with the diacritics? Also, if a user types in a controlled vocabulary field, will a keyed entry without a diacritic mark offer the term with the diacritic?

Thanks in advance,

Peter

Peter Murray
Cherry Hill Company

Aron Roberts

Mon, Aug 10, 2015 9:28 PM

Hi Peter,

These are both terrific questions, and ones that some of us have also had.

There are some notes about keyword search (e.g. in both the upper right
search box and the keyword search box in Advanced Search) and diacritics in

https://issues.collectionspace.org/browse/BAMPFA-197

The most generalizable way may be to use the PostgreSQL 'unaccent'
module. Ray Lee has written extensive notes on using this, here:

https://issues.collectionspace.org/browse/BAMPFA-199
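
To illustrate what accent-insensitive matching means in practice, here is a minimal Python sketch. It is not the code from BAMPFA-199 and not the 'unaccent' module itself; it only approximates the idea by stripping combining marks with the standard library. The function names `fold_accents` and `matches` are hypothetical, chosen for this example.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Return text with combining diacritical marks removed (illustrative only)."""
    # NFKD decomposition splits a precomposed character such as "é"
    # into a base letter "e" followed by the combining mark U+0301.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep every character that is not a combining mark.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches(query: str, indexed_text: str) -> bool:
    """Accent- and case-insensitive whole-word match.

    Both the query and the indexed text are folded the same way,
    so "cafe" and "café" compare equal.
    """
    wanted = fold_accents(query).casefold()
    words = fold_accents(indexed_text).casefold().split()
    return wanted in words
```

With this sketch, `matches("cafe", "Café Müller")` is true because both sides are folded before comparison. Characters that have no Unicode decomposition (for example "ø") are left unchanged here, so a production setup would likely need a fuller mapping.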

IIRC, searches in controlled vocabulary fields, and in field-specific
searches on the Advanced Search page, use different underlying search
mechanisms which may not be able to take advantage of fulltext search
configuration for finding text using both diacritical and non-diacritical
variations of characters.
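
For the controlled vocabulary case, one possible approach (an assumption for illustration, not a description of how CollectionSpace term completion works) is to fold both the typed prefix and each candidate term before comparing. The names `fold` and `suggest_terms` and the sample vocabulary below are hypothetical.

```python
import unicodedata
from typing import List


def fold(text: str) -> str:
    """Strip combining marks and normalize case for comparison."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def suggest_terms(typed: str, vocabulary: List[str]) -> List[str]:
    """Return vocabulary terms whose folded form starts with the folded input.

    The original terms, diacritics included, are returned unchanged,
    so a user typing "dvor" is still offered "Dvořák, Antonín".
    """
    prefix = fold(typed)
    return [term for term in vocabulary if fold(term).startswith(prefix)]


# Hypothetical sample data for demonstration only.
sample_vocabulary = ["Dvořák, Antonín", "Dvorak keyboard", "Debussy, Claude"]
suggestions = suggest_terms("dvor", sample_vocabulary)
```

Folding at comparison time keeps the stored terms intact; the trade-off is that a plain database index on the stored column would not help such a lookup unless the folded form is also indexed.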

If you have use case(s) for either/both of the latter, please feel free
to file new JIRA issue(s) for them at https://issues.collectionspace.org
(Or just ask us to do so, if you prefer!)

Aron Roberts
UC Berkeley

Talk mailing list
Talk@lists.collectionspace.org

http://lists.collectionspace.org/mailman/listinfo/talk_lists.collectionspace.org

Peter Murray

Tue, Aug 11, 2015 12:27 AM

Excellent! Thank you for the reply, Aron. Through your tickets I was able to find the corresponding issues in the core code:

[CSPACE-6394] Fix search on characters with diacritical marks to allow greater discovery - CollectionSpace
https://issues.collectionspace.org/browse/CSPACE-6394

In reading through the comments of BAMPFA-197, I can't quite tell if the changes (https://issues.collectionspace.org/browse/BAMPFA-197?focusedCommentId=45866&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-45866) were made to CSpace core or to the Berkeley webapp. Do you know? If not, I can reach out to Glen directly.

Peter

Ray Lee

Tue, Aug 11, 2015 12:35 AM

Hi Peter,
In BAMPFA-197 Glen was fixing the source code for a web app we have here
that is distinct from CollectionSpace (although it does operate on data
extracted from CollectionSpace). For the changes we made to make full text
searches in CollectionSpace accent-insensitive, see BAMPFA-199.

Ray

Peter Murray

Tue, Aug 11, 2015 12:46 AM

Thank you, Ray; that helped sort things through. I've taken the liberty of
relating BAMPFA-199 to CSPACE-6394.

Peter

--
Peter Murray
Dev/Ops Lead and Project Manager
Cherry Hill Company

Aron Roberts

Tue, Aug 11, 2015 1:05 AM

Thanks, Peter - appreciate your linking these issues and your comment! I've
annotated your annotation :-) on CSPACE-6394 to include the pertinent issue
number in that comment, as well.

Ray's correct, and my apologies for the confusion: Glen's work, AFAIK, was
on a webapp for the BAM/PFA Cinefiles collection that pre-dates both
CollectionSpace and the current set of Django-based,
CollectionSpace-focused webapps we use here at Berkeley.

Peter Murray

Tue, Aug 11, 2015 2:45 PM

Great! I may be taking this one on, so the work in BAMPFA-199 will be a helpful guide. Ray: If y'all have the code in a form that can be easily pulled into core and are looking for an independent eye on it, let me know please.

Peter

On Aug 10, 2015, at 9:05 PM, Aron Roberts aron@socrates.berkeley.edu wrote:

Thanks, Peter - appreciate your linking these issues and your comment! I've annotated your annotation :-) on CSPACE-6394 to include the pertinent issue number in that comment, as well.

Ray's correct, and my apologies for the confusion: Glen's work, AFAIK, was on a webapp for the BAM/PFA Cinefiles collection that pre-dates both CollectionSpace and the current set of Django-based, CollectionSpace-focused webapps we use here at Berkeley.

On Mon, Aug 10, 2015 at 5:46 PM, Peter Murray <pmurray@chillco.com mailto:pmurray@chillco.com> wrote:
Thank you, Ray; that helped sort things through. I've taken the liberty of relating BAMPFA-199 to CSPACE-6394.

Peter

On Mon, Aug 10, 2015 at 8:35 PM, Ray Lee <rhlee@berkeley.edu mailto:rhlee@berkeley.edu> wrote:
Hi Peter,
In BAMFA-197 Glen was fixing the source code for a web app we have here that is distinct from CollectionSpace (although it does operate on data extracted from CollectionSpace). For the changes we made to make full text searches in CollectionSpace accent-insensitive, see BAMPFA-199.

Ray

On Mon, Aug 10, 2015 at 5:27 PM, Peter Murray <pmurray@chillco.com> wrote:
Excellent! Thank you for the reply, Aron. Through your tickets I was able to find the corresponding issues in the core code:

[CSPACE-6394] Fix search on characters with diacritical marks to allow greater discovery - CollectionSpace https://issues.collectionspace.org/browse/CSPACE-6394

In reading through the comments of BAMPFA-197, I can't quite tell if the changes (https://issues.collectionspace.org/browse/BAMPFA-197?focusedCommentId=45866&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-45866) were made to CSpace core or to the Berkeley webapp. Do you know? If not, I can reach out to Glen directly.

Peter

On Aug 10, 2015, at 5:28 PM, Aron Roberts <aron@socrates.berkeley.edu> wrote:

Hi Peter,

These are both terrific questions, and ones that some of us have also had.

There are some notes about keyword search (e.g. in both the upper right search box and the keyword search box in Advanced Search) and diacritics in

https://issues.collectionspace.org/browse/BAMPFA-197

The most generalizable way may be to use the PostgreSQL 'unaccent' module. Ray Lee has written extensive notes on using this, here:

https://issues.collectionspace.org/browse/BAMPFA-199

IIRC, searches in controlled vocabulary fields, and in field-specific searches on the Advanced Search page, use different underlying search mechanisms which may not be able to take advantage of fulltext search configuration for finding text using both diacritical and non-diacritical variations of characters.

If you have use case(s) for either/both of the latter, please feel free to file new JIRA issue(s) for them at https://issues.collectionspace.org (Or just ask us to do so, if you prefer!)

Aron Roberts
UC Berkeley

On Mon, Aug 10, 2015 at 2:04 PM, Peter Murray <pmurray@chillco.com> wrote:
Hello everyone. I have a question about how diacritic marks are indexed and searched in CollectionSpace. From what I can tell, the handling of text strings themselves is UTF-8 clean throughout the application and the database. If a user searches for terms using the character without the diacritic mark, will the search engine retrieve records with the diacritics? Also, if a user types in a controlled vocabulary field, will a keyed entry without a diacritic mark offer the term with the diacritic?

Thanks in advance,

Peter

Talk mailing list
Talk@lists.collectionspace.org
http://lists.collectionspace.org/mailman/listinfo/talk_lists.collectionspace.org

--
Peter Murray
Dev/Ops Lead and Project Manager
Cherry Hill Company


talk@lists.collectionspace.org

Diacritic marks in searching and type-ahead lists
