Tags with accented Latin characters seen as duplicates
tempg
Joined: 2005-12-17
Posts: 1633
Posted: Fri, 2012-06-01 16:16
Apologies if this has already been mentioned. SCENARIO: Is this a Gallery-specific issue or would it be solved by some localization installs on the server? (Also, I'm not using the latest pull, so this may have already been addressed.)

Posts: 175
I've also recently noticed that accented tags aren't handled very consistently. I started experimenting with them after I made the new HTML5 tag cloud, since it allows accents (and Arabic, Kanji, etc.). Here's what I quickly learned:
- in the admin|content|tags menu, accents are treated as different letters (E and É have different categories)
- in autocompleted tag fields, they all look the same (E calls all tags that start with E, É, È...)
- in the tag module, when it spits out the tags for use in a sidebar, it spits them out with accents at the end (É goes after Z)
... so it needs a bit of work to say the least
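To make the three behaviors above concrete, here is a minimal sketch in Python (Gallery itself is PHP, so this only illustrates the idea and is not Gallery's code). The helper name `strip_accents` and the sample tags are made up for the example. It shows why a plain code-point sort puts É after Z, and how folding accents makes E and É compare as equal.

```python
import unicodedata


def strip_accents(text):
    """Hypothetical helper: decompose to NFD and drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Sample tags (assumed for illustration); "\u00c9cole" is "École".
tags = ["Zebra", "\u00c9cole", "Eagle", "Apple"]

# A plain sort compares code points: 'É' is U+00C9, 'Z' is U+005A,
# so the accented tag lands after every unaccented one.
codepoint_order = sorted(tags)

# Exact comparison treats the letters as different (admin menu behavior);
# folded comparison treats them as the same (autocomplete behavior).
exact_same = "\u00c9" == "E"
folded_same = strip_accents("\u00c9") == strip_accents("E")
```

Here `codepoint_order` ends with "École", which matches the "É goes after Z" behavior seen in the sidebar output.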
Take care,
Shad
Posts: 25945
Please file tickets for these, and if you have any patches they will go a long way, as dev time is scarce.
I think the auto-complete JS code is a separate library, so there could be a setting there, but I have not looked.
Dave
_____________________________________________
Blog & G2 || floridave - Gallery Team
Posts: 175
Hey Dave,
I can start taking a look at tickets/patches, but before that, I have a question: what is the *correct* desired behavior?
From what tempg says, I think he'd like to see each instance as a separate letter (e.g. öçean and ocean are different). But personally, I'd like to see them as identical. Perhaps this arises from our intended languages? My site is English and French.
Or perhaps the correct approach is to have *both* options available? Certainly this is the more complicated approach, but... <shrug>
Take care,
Shad
Posts: 25945
Don't know. I don't use accented characters so would not know the desired behaviour/behavior. I guess looking at how others do it would be one way.
How does Flickr or Picasa handle it?
Are öçean and ocean the same word? I'm sure there are words that would have a different meaning with an accented character.
So in this case it has two different meanings and that should be considered different tags.
Dave
Posts: 1633
@shadlaws: I agree that we should decide an overall "desired behavior" before doing patch suggestions.
(1) Given that most Gallery users are English speakers, I'd expect that Gallery would treat accented characters and non-accented characters as the same for selected purposes (like what @shadlaws indicated above "in autocompleted tag fields").
(2) But, I'd expect that "ocean" and "öçean" would ALWAYS be identified differently for other, non-sorting purposes (e.g. deciding whether or not the tag is unique, as in my scenario above).
(3) I don't think it matters much whether e is sorted with or separately from é (separate is likely better for native speakers of other languages), and I don't think it matters a ton which is considered alphabetically first (though non-accented generally comes before accented). BUT, putting accented characters after z is definitely a no-no.
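Points (1) and (2) can coexist if matching and uniqueness use different comparisons. A rough sketch in Python, for illustration only (this is not how Gallery's tag module is written): the function names `fold`, `autocomplete`, and `is_new_tag` are hypothetical, and normalizing to NFC before the uniqueness check is an added assumption so that two encodings of the same accented letter are not mistaken for different tags.

```python
import unicodedata


def fold(text):
    """Hypothetical helper: strip accents, then casefold for loose matching."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def autocomplete(prefix, tags):
    """Point (1): match on the folded form, so 'e' also finds 'é...' tags."""
    wanted = fold(prefix)
    return [tag for tag in tags if fold(tag).startswith(wanted)]


def is_new_tag(candidate, tags):
    """Point (2): uniqueness compares exact spelling (NFC is an assumption)."""
    nfc = unicodedata.normalize("NFC", candidate)
    return all(unicodedata.normalize("NFC", tag) != nfc for tag in tags)


# Sample data (assumed): "\u00e9t\u00e9" is "été".
existing = ["ocean", "\u00e9t\u00e9", "eagle"]

suggestions = autocomplete("e", existing)
accented_is_new = is_new_tag("\u00f6\u00e7ean", existing)  # "öçean"
plain_is_new = is_new_tag("ocean", existing)
```

With this split, typing "e" suggests both "été" and "eagle", while "öçean" is still accepted as a tag distinct from "ocean".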
Posts: 1633
As this whole thing could quickly get complicated, I've been trying to think through what the average (super majority) user would want to see.
Honestly, I think most of this is a non-issue for most users. I can think of several sets of words in different languages differentiated only by accented characters; however, almost none of them are nouns (meaning that I don't think they'd be used as tags by most people, even if they are using accented characters elsewhere). People not used to accented characters would still be fine if "in autocompleted tag fields, they all look the same (E calls all tags that start with E, É, È...)."
Having said that, I'm just going to change the tags I assigned (or develop a custom solution).
Worthy of our time: In my opinion, the only thing that should be changed is issue (3), with accented characters being placed after z.
Maybe just sort accented and non-accented together under their non-accented version. I'm not sure how to do that immediately, but I'll look into it later next week, maybe the week after.
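One way to sort accented and non-accented tags together under the non-accented letter is to sort on a folded key and break ties with the original spelling. The sketch below is in Python purely for illustration (Gallery is PHP), the names `fold`, `sort_key`, and `group_by_letter` are hypothetical, and it is simple accent stripping rather than true locale-aware collation.

```python
import unicodedata
from itertools import groupby


def fold(text):
    """Hypothetical helper: strip accents, then casefold."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def sort_key(tag):
    # Primary key: the folded form. Tie-break: the original spelling,
    # which places the unaccented variant before the accented one.
    return (fold(tag), tag)


def first_letter(tag):
    # Unaccented, uppercase first letter used as the group heading.
    return fold(tag)[:1].upper()


def group_by_letter(tags):
    """Group tags under their unaccented first letter, in sorted order."""
    ordered = sorted(tags, key=sort_key)
    return {letter: list(group) for letter, group in groupby(ordered, key=first_letter)}


# Sample tags (assumed): "\u00e9cole" is "école", "\u00c9mile" is "Émile".
tags = ["zebra", "\u00e9cole", "eagle", "apple", "\u00c9mile", "echo"]
grouped = group_by_letter(tags)
```

In this sample, "école" and "Émile" end up under E between "echo" and "zebra" instead of after Z.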
Posts: 25945
Sorting in PHP has, for me at least, always been a pain.
That said, there is some help in PHP 5.3 that we could use:
http://docs.php.net/manual/en/class.collator.php
Dave