[SOLVED] New Language

sagemaniac

Joined: 2003-03-26
Posts: 47
Posted: Sun, 2010-09-05 18:13

Hi all,
I just came back from a wedding in Botswana. Since I have a well-educated local on hand, I'd like to contribute a new language for G3: Setswana.

How do I go about this?
So far I've only found docs on contributing changes to existing languages.

- Richard

 
floridave

Joined: 2003-12-22
Posts: 25957
Posted: Sun, 2010-09-05 20:29

http://codex.gallery2.org/Gallery3:Localization

Dave
_____________________________________________
Blog & G2 || floridave - Gallery Team

 
bharat

Joined: 2002-05-21
Posts: 7985
Posted: Mon, 2010-09-06 05:01

I just added the tn_ZA locale and labelled it "Setswana". Let me know if the label is wrong. Thanks a ton for your localization help!
---
Problems? Check gallery3/var/logs
bugs/feature req's | upgrade to the latest code | use git

 
sagemaniac

Joined: 2003-03-26
Posts: 47
Posted: Mon, 2010-09-06 06:46

Thank you Bharat, that's what I needed.
Setswana is how all the locals I've met call it, so we'll keep the name.
It will take a while though ;).

 
sagemaniac

Joined: 2003-03-26
Posts: 47
Posted: Mon, 2010-09-06 20:11

Bharat,
since Setswana is more widespread in Botswana (90% of the people speak it there), I think the locale should be tn_BW rather than tn_ZA.
In South Africa, only a 7% minority speaks Setswana.

- Richard

 
bharat

Joined: 2002-05-21
Posts: 7985
Posted: Mon, 2010-09-06 21:06

Fair enough. I don't know too much about this so I'll take your word for it. I just pushed a change to do that (and I see that you've been uploading localizations to tn_BW anyway :-) ).

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Tue, 2010-09-07 01:41

Hi Richard,

What follows is a long explanation why we'd rather use "tn_ZA" instead of "tn_BW" as the language tag for Tswana.

I maintain most of the internationalization / localization code and data in Gallery 3.
The way we assign locale codes (choosing a language code + region code) is generally by following standards. And when it comes to locale data, the CLDR project (closely related to the Unicode project) is defining best practices and providing great data. See: http://cldr.unicode.org/ .

In Gallery 3, we specify the region code in all of our language tags rather than just having a language subtag. A lot of languages have regional variants, and by just always including the region code in the language tag we make it clear what variant we use for that specific localization in Gallery 3.

When we add a language to Gallery 3, we start by adding just the most common variant. E.g. for now, we have only "de_DE" rather than the full set of "de_DE", "de_CH", "de_AT" and other variants. The need to add more variants hasn't arisen yet.
Portuguese is an example where we added 2 variants almost from the beginning since there are two very popular variants: Portuguese (Brazil), Portuguese (Portugal).

CLDR has the concept of "most likely subtags", and it's based on real-world statistics. E.g. when we see the language code "de" (German) without any context, it usually refers to the most likely variant of this language, which is "de_Latn_DE", meaning German as spoken / written in Germany, and written in Latin script (rather than Chinese Han characters :) ).

Similarly, for "en" (English) the most likely subtags are "Latn" for script and "US" for region, since the USA has a larger population with English as their first language than any other region (e.g. England, or Australia, or ...).
And for "pt" (Portuguese) this happens to be "pt_Latn_BR" (Brazil) rather than Portugal.
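
To make the lookup concrete, here is a minimal Python sketch of the "most likely subtags" idea. It is not Gallery 3 code: the table is a hand-copied subset covering only the languages mentioned in this thread, the "Latn" script for "tn" is my assumption, and the function name default_locale is hypothetical. The real data lives in CLDR.

```python
# Hand-copied subset of likely-subtag mappings mentioned in this thread.
# Illustrative only; the authoritative table is CLDR's likely-subtags data.
# The "Latn" script for "tn" is an assumption, not stated in the thread.
LIKELY_SUBTAGS = {
    "de": ("de", "Latn", "DE"),
    "en": ("en", "Latn", "US"),
    "pt": ("pt", "Latn", "BR"),
    "tn": ("tn", "Latn", "ZA"),
}


def default_locale(language):
    """Return a language_REGION code for a bare language code.

    Hypothetical helper: it drops the script subtag and keeps the
    most likely region, matching the "xx_YY" style used in this thread.
    """
    lang, _script, region = LIKELY_SUBTAGS[language]
    return f"{lang}_{region}"


print(default_locale("tn"))  # tn_ZA
print(default_locale("pt"))  # pt_BR
```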

So when it comes to Tswana ("tn"), CLDR says the most likely region is South Africa and not Botswana, as seen at:
http://unicode.org/repos/cldr-tmp/trunk/diff/supplemental/likely_subtags.html

Why South Africa? Because there are more South African than Botswanan people speaking / writing Tswana. See:
http://unicode.org/repos/cldr-tmp/trunk/diff/supplemental/territory_language_information.html
(For Tswana we have 49.1 * 0.86 * 0.072 = 3 million people in South Africa vs. 2.0 * 0.812 * 0.62 = 1 million people in Botswana)
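
The arithmetic above can be reproduced with a short Python sketch. The three factors are taken verbatim from the line above; reading them as population in millions, literacy rate, and share of Tswana speakers is my assumption, and the function and parameter names are hypothetical.

```python
def estimated_users(population_millions, literacy_rate, language_share):
    """Multiply the three factors quoted above.

    The parameter names are an assumed interpretation of the factors;
    the thread gives only the numbers.
    """
    return population_millions * literacy_rate * language_share


south_africa = estimated_users(49.1, 0.86, 0.072)
botswana = estimated_users(2.0, 0.812, 0.62)

# Rounded to two decimals, in millions of people.
print(round(south_africa, 2))  # 3.04
print(round(botswana, 2))      # 1.01
```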

--

But does it matter for you whether we declare tn_ZA or tn_BW as Gallery 3's language tag for Tswana?
Do the two regional variants vary much?

As long as G3 just has 1 variant for Tswana, then the display name of this language doesn't include any information about which variant it is. We'd just display "Setswana".
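
A rough Python sketch of that display rule, assuming hypothetical lookup tables and a hypothetical function name (this is not how Gallery 3 actually implements it): the region is appended to the display name only when more than one variant of the same language is present.

```python
from collections import Counter

# Hypothetical name tables, for illustration only.
LANGUAGE_NAMES = {"tn": "Setswana", "pt": "Portuguese"}
REGION_NAMES = {"ZA": "South Africa", "BR": "Brazil", "PT": "Portugal"}


def display_names(locales):
    """Map each language_REGION code to a display name.

    The region is shown only when the language has several variants
    in the given list of locale codes.
    """
    variants = Counter(code.split("_")[0] for code in locales)
    names = {}
    for code in locales:
        lang, region = code.split("_")
        name = LANGUAGE_NAMES[lang]
        if variants[lang] > 1:
            name = f"{name} ({REGION_NAMES[region]})"
        names[code] = name
    return names


print(display_names(["tn_ZA", "pt_BR", "pt_PT"]))
```

With only tn_ZA present, Tswana is shown as plain "Setswana", while the two Portuguese variants get their region appended.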

 
bharat

Joined: 2002-05-21
Posts: 7985
Posted: Tue, 2010-09-07 01:50

Ok, I've set it back to tn_ZA.

 
sagemaniac

Joined: 2003-03-26
Posts: 47
Posted: Tue, 2010-09-07 06:43

@valiant: thank you for taking the time for this detailed explanation, it makes sense!
The people helping me with this are from the north of Botswana.
When flags are used as a language selector, tn_ZA makes the South African flag show up, and I am not sure they'll like that. Still, I will change my local language.php back to tn_ZA and work on that. I guess I can easily trick a single gallery into showing a Botswana flag ;).

@bharat: sorry for the back and forth :D

 
bromide

Joined: 2010-08-20
Posts: 28
Posted: Tue, 2010-09-07 16:21

By the way, another standard that is an element of what valiant is talking about is BCP 47, the current standard for language tags, which builds on the ISO 639 language codes rather than replacing them. Nice explanation of it here:

http://www.w3.org/International/articles/language-tags/

(edit) And check out what I just found: a tool that will validate whether a language tag is well-formed!

http://people.w3.org/rishida/utils/subtags/

This is good because it checks to make sure you aren't explicitly using the "most likely subtag" defaults, which are supposed to be implied and are difficult to check manually.

(edit again) And actually, when I put "tn-ZA" or "tn-BW" in it basically says that you should just leave off the geographic region and go with "tn" alone if you can.


 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Tue, 2010-09-07 16:35

Glad to see people refer to BCP47. I'm very familiar with it and the IANA language subtag registry. :)
It was a conscious decision to include the region subtag in Gallery 3 language tags. It avoids confusion as to which regional variant we're using. And it makes things a bit easier should we add more regional variants for a language at a later point.