Metadata encoding problems - accented characters

abarroche

Joined: 2007-12-05
Posts: 6
Posted: Thu, 2007-12-06 18:18

Hello,

I have a problem with metadata containing accented characters (French é à ï ù...) in Gallery 2.2.

What I want to do is:
1. Write metadata into the image file with Photoshop or another program (Windows and OS X)
2. Upload the images to Gallery (latest official version)
3. Display the images and their metadata in Gallery

I've run lots of tests with different programs for writing metadata (Windows and OS X), such as Photoshop, Pixvue, Microsoft Photo Info, IrfanView, ACDSee, PictureSync... but there are always problems with accented characters.

According to some posts in this forum, I understand that:
Gallery 2.2 can import IPTC metadata (but not XMP metadata), and Gallery's encoding is UTF-8.
So if I write the IPTC metadata in UTF-8, it should be OK? ... I thought Photoshop CS2 wrote UTF-8, but:

-> Here is an example: the same image, "CS2_origine.jpg", with metadata written in Photoshop CS2 (Win), shown in the two Gallery installations I have:
- Gallery A (professional work) 2.2.3: all accented characters are wrong (only "Copyright" and "Description" are right). See "GalleryA.jpg".
- Gallery B (personal site) 2.2: all accented characters are right. See "GalleryB.jpg".

I can't see where the problem lies: the metadata encoding in Photoshop? The Exif/IPTC module? The database encoding? The import by Gallery? The PHP and MySQL versions? The type of server? A charset setting somewhere?

*****
And still on accented characters... a user of Gallery A who uses iPhoto has the problem described here: http://gallery.menalto.com/node/34870 Is there any solution for this?
*****

Thank you for your help.

Adr.


Gallery A:
Gallery version = 2.2.3 core 1.2.0.5
PHP version = 5.2.5 apache2handler
Webserver = Apache/2.2.6 (FreeBSD) mod_ssl/2.2.6 OpenSSL/0.9.7e-p1 DAV/2 PHP/5.2.5 with Suhosin-Patch
Database = mysql 5.0.45, lock.system=flock
Toolkits = Exif, ImageMagick, NetPBM, ArchiveUpload, Gd, Dcraw, Ffmpeg, Thumbnail, Getid3, LinkItemToolkit
Acceleration = none, none
Operating system = FreeBSD icono 6.1-RELEASE-p11 FreeBSD 6.1-RELEASE-p11 #5: Thu Dec 7 12:43:14 CET 2006 root@primergy:/usr/obj/usr/src/sys/PRIMERGY i386
Default theme = classic
gettext = enabled
Locale = fr_FR
Browser = Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.1.10) Gecko/20071115 Firefox/2.0.0.10


Gallery B:
Gallery version = 2.2 core 1.2.0
PHP version = 4.3.9 apache2handler
Webserver = Apache/2.0.52 (CentOS)
Database = mysqlt 4.1.12, lock.system=flock
Toolkits = NetPBM, Gd, Exif
Acceleration = none, none
Operating system = Linux ns3.securenetim.net 2.6.9-023stab043.1-smp #1 SMP Mon Mar 5 16:38:22 MSK 2007 i686
Default theme = matrix
gettext = enabled
Locale = fr_FR
Browser = Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.1.10) Gecko/20071115 Firefox/2.0.0.10

Attachment         Size
CS2_origine.jpg    27.76 KB
GalleryA.jpg       93.93 KB
GalleryB.jpg       95.09 KB
 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Sun, 2007-12-09 23:49

maybe one of the php setups is missing the necessary multibyte string extension?
btw: i've added the test image to my test gallery and the exif data works fine there. my test g2 is using the latest svn code of gallery (nightly snapshot).

btw: it would simplify things a tiny bit if both gallery installations were using the latest stable g2 code (including the latest stable exif module).

--------------
Documentation: Support / Troubleshooting | Installation, Upgrade, Configuration and Usage

 
abarroche

Joined: 2007-12-05
Posts: 6
Posted: Tue, 2007-12-11 17:42

Thanks Valiant !

I am in contact with the administrator of the Gallery A (professional) server about the PHP setup. I'm waiting for his answer.

For this Gallery A (which has to work correctly), Gallery 2.2.3 core 1.2.0.5 is the latest stable version, isn't it? The Exif module is the latest too (1.1.0).

- Because this Gallery is part of an official (university) website, the administrators seem reluctant to run an svn version of Gallery and prefer an official release... But if that resolves this accented-character problem, I'm sure we will find a solution -

In your test, Valiant, are all fields correct with é à ù... ?

Are there users of Gallery (French, German, Spanish...) who don't have this problem? Could they test my image (CS2_origine.jpg in the first post)?

Thank you in advance for your help.

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Tue, 2007-12-11 18:46

> In your test, Valiant, are all fields correct with é à ù... ?

yes.


 
bauchi

Joined: 2007-11-11
Posts: 70
Posted: Tue, 2007-12-11 19:25
abarroche wrote:

Are there users of Gallery (French, German, Spanish...) who don't have this problem? Could they test my image (CS2_origine.jpg in the first post)?

Tested your image with Gallery-Version 2.2.3 Core 1.2.0.5, PHP-Version 5.1.6.
All characters are displayed correctly.

 
abarroche

Joined: 2007-12-05
Posts: 6
Posted: Wed, 2007-12-12 14:22

-> Thanks Bauchi for your test with the same Gallery version.

As you can see in GalleryA.jpg, some fields in the image properties show accents (é à ..) correctly:
- "Artist / Auteur" (field 2. in Photoshop)
- "Copyright / Copyright" (field 7.)
- "Image description / Description image" (field 4.)

But the same values (fields 2., 7., 4.) are wrong in other entries of the same image properties:
- "IPTC: Byline / IPTC: Auteur"
- "IPTC: Copyright Notice / IPTC: Description du copyright"
- "IPTC: Caption / IPTC: Description"

"Artist", "Copyright" and "Image Description" are not handled the same way as "IPTC: Byline", "IPTC: Copyright Notice" and "IPTC: Caption".
Why are they interpreted differently?

 
abarroche

Joined: 2007-12-05
Posts: 6
Posted: Thu, 2008-02-14 14:00

So... I'm back with my problem...

No progress for the moment (our IT specialist is going to do a fresh installation of Gallery).

But I noticed one more thing about metadata in Gallery.

Gallery doesn't interpret metadata written by Adobe Photoshop (CS3-Win) the same way as metadata written by Adobe Bridge (CS3-Win). Why?

Just look at the screenshots.

...I'm lost among IPTC Core, IPTC IIM, XMP... ISO 8859, UTF-8.


Gallery version = 2.2.3 core 1.2.0.5
PHP version = 5.2.5 apache2handler
Webserver = Apache/2.2.6 (FreeBSD) mod_ssl/2.2.6 OpenSSL/0.9.7e-p1 DAV/2 PHP/5.2.5 with Suhosin-Patch
Database = mysql 5.0.45, lock.system=flock
Toolkits = Exif, ImageMagick, NetPBM, ArchiveUpload, Gd, Dcraw, Ffmpeg, Thumbnail, Getid3, LinkItemToolkit
Acceleration = none, none
Operating system = FreeBSD icono 6.1-RELEASE-p11 FreeBSD 6.1-RELEASE-p11 #5: Thu Dec 7 12:43:14 CET 2006 root@primergy:/usr/obj/usr/src/sys/PRIMERGY i386
Default theme = classic
gettext = enabled
Locale = fr_FR
Browser = Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.1.10) Gecko/20071115 Firefox/2.0.0.10

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Fri, 2008-02-15 13:14

> Gallery doesn't interpret metadata written by Adobe Photoshop (CS3-Win) the same way as metadata written by Adobe Bridge (CS3-Win). Why?

correct. gallery's handling of exif/iptc metadata isn't great concerning character-encoding. i guess we still assume it's all ascii. maybe we allow for utf-8 too, surely not for other encodings.


 
Mahonni

Joined: 2007-05-13
Posts: 6
Posted: Mon, 2008-04-21 22:36

I think I have the same problem. Did you solve it, abarroche?

Basically what happens to me is that when I add a photo that has keywords like "Abcdèfg" and "hijk", it only imports until it finds the accent. So, in this example, it'd import only the 1st word, but as "Abcd". Everything that comes after the accent disappears.
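One possible explanation for both symptoms reported in this thread (text cut off at the accent, or the accent shown as a placeholder) is that the keyword bytes are Latin-1 but get treated as UTF-8. In that case the accented byte is invalid, and the result depends on how the converter handles errors. The sketch below uses Python rather than Gallery's PHP, and it is an assumption about the mechanism, not a trace of what Gallery actually does; the helper name decode_until_invalid is hypothetical.

```python
# Latin-1 bytes for the keyword "Abcdèfg": è is the single byte 0xE8.
raw = "Abcdèfg".encode("latin-1")


def decode_until_invalid(data: bytes) -> str:
    """Decode as UTF-8, keeping only the text before the first invalid byte."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as err:
        return data[:err.start].decode("utf-8")


truncated = decode_until_invalid(raw)             # text after the accent is lost
replaced = raw.decode("utf-8", errors="replace")  # accent becomes a placeholder
correct = raw.decode("latin-1")                   # decoded with the right charset

print(repr(truncated), repr(replaced), repr(correct))
```

If this is what happens, the fix belongs in the import step (decode with the charset the file was really written in), not in the database or the theme.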

Is this gallery's fault? Is it that my hosting has to install another library or something like that?

BTW, this is my gallery2 info:

Gallery version = 2.2.2 core 1.2.0.4
PHP version = 5.2.5 cgi
Webserver = Apache/2.2.8 (Unix) mod_ssl/2.2.8 OpenSSL/0.9.8b mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635
Database = mysqli 5.0.45-community, lock.system=flock
Toolkits = ArchiveUpload, Exif, Gd, Thumbnail, Getid3, ImageMagick
Acceleration = none, none
Operating system = Linux zeus.kwix.info 2.6.18-53.1.14.el5 #1 SMP Wed Mar 5 11:36:49 EST 2008 i686
Default theme = matrix
gettext = disabled
Locale = ca_ES
Browser = Mozilla/5.0 (Windows; U; Windows NT 5.1; ca; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14
Rows in GalleryAccessMap table = 29
Rows in GalleryAccessSubscriberMap table = 1938
Rows in GalleryUser table = 25
Rows in GalleryItem table = 1936
Rows in GalleryAlbumItem table = 45
Rows in GalleryCacheMap table = 0

Thank you.

 
abarroche

Joined: 2007-12-05
Posts: 6
Posted: Tue, 2008-04-22 10:17

No, Mahonni, I didn't solve this problem... :-(

In my Gallery, each é à ù... is replaced by a "?".

I remember that at one point "everything that comes after the accent disappears" happened in my Gallery too, but I don't know how it was solved (perhaps by converting the database to UTF-8 under "Maintenance" in the administration panel?)

 
Mahonni

Joined: 2007-05-13
Posts: 6
Posted: Sat, 2008-04-26 21:22

Thank you, abarroche, I may do that as soon as I have some free time. It's way better to have a letter replaced by a "?" than losing everything that comes after the special letter ;)

However, are we the only ones who have this problem? Should I submit it as a bug?

 
kurg

Joined: 2008-05-04
Posts: 1
Posted: Sun, 2008-05-04 19:35
Quote:
I thought Photoshop CS2 wrote UTF-8

My IPTC tests:
-Photoshop CS2 always writes IPTC fields in Latin1/ISO-8859-1.
-Normally, Photoshop CS3 writes IPTC fields in Latin1/ISO-8859-1.
-If the original file loaded into Photoshop CS3 is in UTF-8, then it writes the IPTC fields in UTF-8.
-The IPTC:CodedCharacterSet tag records the character set used (UTF-8). If the IPTC:CodedCharacterSet tag is missing, the character set is taken to be Latin1/ISO-8859-1 (this is actually wrong, but many (or all) programs work this way: http://www.cpanforum.com/threads/5088 (second post)).
-These handle character encoding correctly (UTF-8 and Latin1/ISO-8859-1; they read and use the IPTC:CodedCharacterSet tag):
*PhotoME
*Exiftool
*Photoshop CS3
-These don't (they treat everything as Latin1/ISO-8859-1 and don't read or use the IPTC:CodedCharacterSet tag):
*Irfanview
*Exifer
*Gallery2 (EXIF/IPTC-plugin 1.1.0)
*Photoshop CS2
-You can check whether the IPTC:CodedCharacterSet tag is present with Exiftool or PhotoME.
-Convert Latin1/ISO-8859-1 to UTF-8 with Exiftool:
exiftool a.jpg -tagsfromfile a.jpg -iptc:all -codedcharacterset=UTF8
-Convert UTF-8 to Latin1/ISO-8859-1 with Exiftool:
exiftool a.jpg -tagsfromfile a.jpg -iptc:all -codedcharacterset=
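
The rule in the list above (use the IPTC:CodedCharacterSet tag when it is present, otherwise fall back to Latin-1) can be sketched roughly as follows, in Python rather than PHP. The sketch assumes the raw tag value has already been read from the file by some other tool. The bytes ESC % G are the escape sequence that announces UTF-8 in that tag; the function name decode_iptc_field is hypothetical, and the Latin-1 fallback is only the convention described above, not something the IPTC specification guarantees.

```python
from typing import Optional

UTF8_ESCAPE = b"\x1b%G"  # ISO 2022 escape sequence announcing UTF-8


def decode_iptc_field(value: bytes, coded_character_set: Optional[bytes]) -> str:
    """Decode one IPTC text field using the CodedCharacterSet value, if any."""
    if coded_character_set == UTF8_ESCAPE:
        return value.decode("utf-8")
    # Tag missing (or some other value): follow the common Latin-1 convention.
    return value.decode("latin-1")


tagged = decode_iptc_field("é à ù".encode("utf-8"), UTF8_ESCAPE)
untagged = decode_iptc_field("é à ù".encode("latin-1"), None)
# UTF-8 data read by a program that ignores the tag:
mismatch = decode_iptc_field("é".encode("utf-8"), None)

print(tagged, untagged, mismatch)
```

The last case shows what a reader that ignores the tag makes of UTF-8 data: each accented letter turns into two wrong characters.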

In my Gallery2, Latin1/ISO-8859-1 IPTC fields work, but I would like to use UTF-8 encoding in IPTC fields. I haven't found a solution to this problem yet.

Edit1:
I forgot something important, which is the answer to my (and maybe your) problem :):
modules/exif/classes/ExifHelper.class (getIptcValue-function):

// function getIptcValue(&$object, $keyPath, $sourceEncoding=null) {
function getIptcValue(&$object, $keyPath, $sourceEncoding="ISO-8859-1") {

The upper line is the original and the lower line is my "fix" from when I was trying to get Latin1/ISO-8859-1 IPTC fields to work.

Edit2:
I think I got my new "fix" to work (don't apply my first "fix" from Edit1).
modules/exif/classes/ExifHelper.class (getIptcValue-function):

return mb_convert_encoding($result, "UTF-8", mb_detect_encoding($result, "UTF-8, ISO-8859-1", true));
// $result = GalleryCoreApi::convertToUtf8($result, $sourceEncoding);
// return $result;

This doesn't read the IPTC:CodedCharacterSet tag; instead it tries to detect the encoding automatically. Now both UTF-8 and Latin1/ISO-8859-1 IPTC fields work.
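
The idea behind this detection approach, sketched in Python rather than PHP: try a strict UTF-8 decode first, and only if that fails treat the bytes as the fallback charset. This is an analogue of the patch, not a port of it; mb_detect_encoding may behave differently in edge cases, and the function name to_utf8_text is hypothetical.

```python
def to_utf8_text(raw: bytes, fallback: str = "iso-8859-1") -> str:
    """Return text from bytes that are either UTF-8 or the fallback charset."""
    try:
        return raw.decode("utf-8")  # strict: raises on invalid sequences
    except UnicodeDecodeError:
        return raw.decode(fallback)


from_utf8 = to_utf8_text("Société".encode("utf-8"))
from_latin1 = to_utf8_text("Société".encode("iso-8859-1"))
# Latin-1 text whose bytes happen to form valid UTF-8 is misread:
ambiguous = to_utf8_text("Ã©".encode("iso-8859-1"))

print(from_utf8, from_latin1, ambiguous)
```

The last line shows the limit of guessing: a Latin-1 string whose bytes happen to be valid UTF-8 is taken as UTF-8. For ordinary accented text this is rare, which is presumably why the fix works well in practice.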

 
Ondra Kolar

Joined: 2008-10-22
Posts: 2
Posted: Wed, 2008-10-22 09:26

I upgraded to 2.3 yesterday, but the problem still occurs.
The solution in my case (working on Windows, so the IPTC tags are in the Czech encoding, Windows-1250):

modules/exif/classes/ExifHelper.class (getIptcValue function), line 467:
return iconv("CP1250", "UTF-8", $result);
// $result = GalleryCoreApi::convertToUtf8($result, $sourceEncoding);
// return $result;
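
Why the source charset has to be named explicitly here, as a small Python sketch (not Gallery code): Windows-1250 and Latin-1 give different letters to the same byte, so decoding Czech text as Latin-1 does not fail, it just produces the wrong characters. The sample name is only an illustration.

```python
# "ř" is the single byte 0xF8 in Windows-1250; in Latin-1 that byte is "ø".
raw = "Dvořák".encode("cp1250")

as_cp1250 = raw.decode("cp1250")   # the charset the file was written in
as_latin1 = raw.decode("latin-1")  # what a Latin-1-only reader would show

print(as_cp1250, as_latin1)
```

Note that hard-coding one code page, as in the patch above, assumes every uploaded file uses it; files that already contain UTF-8 would then be converted twice.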

 
tooli

Joined: 2008-04-23
Posts: 2
Posted: Fri, 2009-05-15 14:31

Thank you very much.
This helped me a lot.

 
ozgurkalan

Joined: 2009-06-14
Posts: 4
Posted: Sun, 2009-07-19 14:30

Hi,

I had the same problem. The uploaded images had weird characters in the keywords section. I used the workaround described above, with ISO-8859-9 for the Turkish characters:

modules/exif/classes/ExifHelper.class (getIptcValue-function):
return mb_convert_encoding($result, "UTF-8", mb_detect_encoding($result, "UTF-8, ISO-8859-9", true));
// $result = GalleryCoreApi::convertToUtf8($result, $sourceEncoding);
// return $result;
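
A note on what detection can and cannot do in this variant, sketched in Python (not Gallery code): a strict check can tell UTF-8 apart from a single-byte charset, but it cannot tell one single-byte charset from another, because the same bytes decode without error in both. So the candidate listed after UTF-8 is effectively a site-wide choice. The sample word is only an illustration.

```python
raw = "Kuşadası".encode("iso-8859-9")

# Strict UTF-8 decoding rejects these bytes, so UTF-8 can be ruled out.
try:
    raw.decode("utf-8")
    is_utf8 = True
except UnicodeDecodeError:
    is_utf8 = False

# Both single-byte charsets accept the bytes; only one gives Turkish text.
decoded = {name: raw.decode(name) for name in ("iso-8859-1", "iso-8859-9")}

print(is_utf8, decoded)
```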

Thanks Kurg and Ondra.

Now I need to solve the crash when the Turkish language is selected....

 
Cansys

Joined: 2010-03-25
Posts: 5
Posted: Thu, 2010-03-25 17:15

Cheers to kurg, we've struggled with this bug for months and your fix worked flawlessly.