[Tag module hack] IPTC and Russian Keywords

fosco-maestro

Joined: 2010-08-11
Posts: 2
Posted: Wed, 2010-08-11 13:02

I think other people may run into the same problem with Russian keywords, which is why I am writing this up.

I spent a day trying to get Gallery 3 to work with Russian keywords in photos.

| G3 always output something like "Êóçüìåíêî"
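
For illustration, output of this shape is what appears when Windows-1251 bytes are read as ISO-8859-1 and re-encoded as UTF-8, which is how utf8_encode() treats its input. A minimal sketch, assuming the keyword was "Кузьменко" stored as Windows-1251 (an assumption inferred from the garbled output, not confirmed in the thread) and that the mbstring extension is available:

```php
<?php
// Assumed input: the bytes of "Кузьменко" in Windows-1251 (hypothetical sample).
$raw = "\xCA\xF3\xE7\xFC\xEC\xE5\xED\xEA\xEE";

// Reading the bytes as ISO-8859-1 (what utf8_encode() does) produces mojibake.
$wrong = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');

// Reading the same bytes as Windows-1251 recovers the Cyrillic text.
$right = mb_convert_encoding($raw, 'UTF-8', 'Windows-1251');

echo $wrong, "\n"; // Êóçüìåíêî
echo $right, "\n"; // Кузьменко
```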

As I discovered later, the problem was in PHP, in the function "mb_detect_encoding()".

That PHP function doesn't work well with Cyrillic: it has trouble telling CP1252 from CP1251, so you can't get the real character encoding you need to convert your word to UTF-8. I ran a lot of tests that prove it, but I don't think the details matter for this post. If you are interested in what happened, you can write me an e-mail.

Thank God, "gcog" wrote a module for IPTC (http://codex.gallery2.org/Gallery3:Modules:iptc), from which I took the code for parsing an image's IPTC/XMP data (I had never worked with image metadata before).

I changed the function "item_created()" in "modules/tag/helpers/tag_event.php" and copied the "lib" folder from the IPTC module into "modules/tag/".

Everything is in the zip file that I attached.

Pavel Voznenko

Attachment: module_tag_[hack].zip (3.18 KB)
 
bharat

Joined: 2002-05-21
Posts: 7994
Posted: Thu, 2010-08-12 04:12

Sounds like this is related to https://sourceforge.net/apps/trac/gallery/ticket/1254
---
Problems? Check gallery3/var/logs
bugs/feature req's | upgrade to the latest code | use git

 
fosco-maestro

Joined: 2010-08-11
Posts: 2
Posted: Thu, 2010-08-12 08:33

Not really,

The keyword arrives as "Win-1252": when I take ord() of every character in the string, the values match the Win-1252 table (http://en.wikipedia.org/wiki/Windows-1252), and Lebedev's decoder says the same (http://www.artlebedev.ru/tools/decoder/). But "mb_detect_encoding()" returned "UTF-8", and with "strict == TRUE" it returned "FALSE".

The default code "mb_detect_encoding($value, "ISO-8859-1, UTF-8")" in the function "item_created()" returned "FALSE" too, so of course the code went on to encode the value as UTF-8. After that the string became completely unreadable, and "mb_detect_encoding()" reported the encoded string as "ASCII" 0_o
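
One way to see why detection is unreliable here: single-byte encodings give a detector almost nothing to reject. The sketch below assumes a sample Windows-1251 byte string (hypothetical, not taken from the attached photo) and uses mb_check_encoding() to show that the same bytes are legal in several candidate encodings, so only UTF-8 can be ruled out:

```php
<?php
// Hypothetical sample: "Кузьменко" encoded as Windows-1251.
$raw = "\xCA\xF3\xE7\xFC\xEC\xE5\xED\xEA\xEE";

$candidates = ['UTF-8', 'Windows-1251', 'Windows-1252', 'ISO-8859-1'];
$valid = [];
foreach ($candidates as $encoding) {
    // mb_check_encoding() only reports whether the bytes are legal in the
    // encoding, not whether the decoded text makes sense to a reader.
    $valid[$encoding] = mb_check_encoding($raw, $encoding);
}

var_export($valid);
```

Because the validity check cannot separate the single-byte candidates, the choice between them has to come from outside the bytes, for example from a configured default.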

Then I hardcoded "iconv('Windows-1252', 'UTF-8', $value)". It returned "FALSE", so of course with code like "$value = iconv('Windows-1252', 'UTF-8', $value);" you end up with $value == ''

The function "mb_convert_encoding()" did the same thing to the string as "utf8_encode()".

I think the problem happened because our photographers used some strange tagging tools. Anyway, I still have no idea what really happened 0_o

Then I found the module that "gcog" created. That module works with the XMP data model, which doesn't care about encoding.

Here is one of the photos with Russian tags, if you are interested: http://coffeecard.com.ua/share/SVV_2897.JPG.zip

PS I'm still a junior PHP developer, so maybe I just reinvented the wheel :)

 
sigma_shig

Joined: 2010-11-18
Posts: 18
Posted: Thu, 2010-11-18 08:08

I have the same problem with Russian (non-Latin) tags in files. I use PicaJet as an IPTC editor under Windows, so all tags are encoded in Windows-1251 and show up as unreadable characters. I reviewed the code and found that a small change in modules\tags\helper\tag_event.php (line 39):
if (function_exists("mb_detect_encoding")) {
  // Detect once and reuse the result; mb_detect_encoding() returns false on failure.
  $coding = mb_detect_encoding($word, "Windows-1251,Windows-1252,ISO-8859-1,UTF-8");
  if ($coding !== false && $coding != "UTF-8") {
    /* $word = utf8_encode($word); */
    $word = iconv($coding, 'UTF-8', $word);
  }
}

can resolve my problem. So I replaced utf8_encode with iconv. I'm not an experienced PHP developer, though, and this code can probably be optimized. I also think it should be generalized and the list of supported encodings expanded. Alternatively, the list could be an optional parameter that is editable in the settings.
So I ask the author of the tags module: could you please think about non-English users and add support for different encodings? :)
I have attached my changed file to this message. If you have this encoding problem, unzip the file into the modules directory.
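
Since choosing among single-byte encodings is a guess, one alternative is to test only for valid UTF-8 and otherwise convert from a configured encoding. This is a sketch, not the tag module's code: the function name tag_to_utf8 and the default of Windows-1251 are assumptions for illustration, and it needs PHP 7 or later with mbstring.

```php
<?php
// Hypothetical helper; not part of Gallery 3.
function tag_to_utf8(string $word, string $fallback = 'Windows-1251'): string
{
    // Valid UTF-8 (which includes plain ASCII) is passed through untouched.
    if (mb_check_encoding($word, 'UTF-8')) {
        return $word;
    }
    // Otherwise assume the configured single-byte encoding.
    return mb_convert_encoding($word, 'UTF-8', $fallback);
}

echo tag_to_utf8("\xCA\xF3\xE7\xFC\xEC\xE5\xED\xEA\xEE"), "\n"; // Windows-1251 bytes
echo tag_to_utf8("landscape"), "\n";                            // ASCII stays as is
```

The trade-off is that the fallback has to match how the site's photos were actually tagged, which is why exposing it as a setting seems reasonable.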

 
bharat

Joined: 2002-05-21
Posts: 7994
Posted: Fri, 2010-11-19 20:47

Please file a ticket for this. I think it's reasonable to put the encoding list into a module setting so that you can at least change it via Admin > Settings > Advanced.