Locales

cmb
Posts: 14225
Joined: Tue Jun 21, 2011 11:04 am
Location: Bingen, RLP, DE
Contact:

Locales

Post by cmb » Sun Jul 08, 2012 5:46 pm

Hello Community,

Until now I have avoided all locale-specific PHP routines in my plugins (as CMSimple_XH's core does), because I'm afraid they might be a problem on shared hosts, particularly for multilingual sites: the required locales (particularly the UTF-8 locales) might not be installed on the server. This might not be a big problem for formatting dates and times (if necessary, the names of the 12 months and the 7 days could be localized in the language files). But it is definitely a problem for sorting strings, as different languages have totally different collation sequences. So using locales seems to be unavoidable for several tasks.

Some plugins already use several locale-specific routines and call setlocale() to change the current locale to some user-defined setting in the language files. This might lead to unexpected behavior if two plugins set different locales. So probably setting up the desired locale should be done by CMSimple_XH's core. This way it can even be detected ("system check"), if the chosen locale is available on the server. Unfortunately, the names of the locales are not standardized, and there is no universal way to list the installed locales, so users would have to find the right one by some trial and error.
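
A minimal sketch of such a "system check", assuming the core is handed a list of candidate names for one language. The function name findAvailableLocale() and the candidate list are hypothetical, not part of CMSimple_XH; only setlocale() itself is a real PHP function.

```php
<?php
/**
 * Hypothetical helper (not part of CMSimple_XH): tries each candidate locale
 * name in order and returns the first one the server accepts, or null.
 */
function findAvailableLocale(array $candidates, $category = LC_COLLATE)
{
    // Passing '0' only queries the current setting, so it can be restored.
    $previous = setlocale($category, '0');
    $found = null;
    foreach ($candidates as $name) {
        // setlocale() returns false if the requested locale is unavailable.
        if (setlocale($category, $name) !== false) {
            $found = $name;
            break;
        }
    }
    // Undo the probe so the check itself has no lasting side effect.
    if ($previous !== false) {
        setlocale($category, $previous);
    }
    return $found;
}

// The names are guesses; they differ between *nix flavours and Windows.
$candidates = array('de_DE.UTF-8', 'de_DE.utf8', 'de.UTF-8', 'German_Germany.65001');
$locale = findAvailableLocale($candidates);
echo $locale === null ? "No suitable locale installed\n" : "Using $locale\n";
```

This only tells whether a name is accepted, not whether the locale then behaves as expected.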

And it seems that setlocale doesn't work on Windows for UTF-8 locales:
http://msdn.microsoft.com/en-us/library/x99tb11d.aspx wrote:If you provide a code page like UTF-7 or UTF-8, setlocale will fail, returning NULL.
Additionally, one has to consider the locale used by JS in the browser. E.g. a date formatted on an English site that is viewed in a browser set to a German locale will show the German version.

Any ideas, suggestions or insights on this topic are very welcome.

Christoph
Christoph M. Becker – Plugins for CMSimple_XH

svasti
Posts: 1651
Joined: Wed Dec 17, 2008 5:08 pm

Re: Locales

Post by svasti » Mon Jul 09, 2012 8:28 am

Lots of dirt under the (PHP-)carpet
cmb wrote:So probably setting up the desired locale should be done by CMSimple_XH's core.
+1
cmb wrote:This way it can even be detected ("system check"), if the chosen locale is available on the server. ... by some trial and error.
What if the locale is not installed? What about some kind of fallback? The core should check and do the trial and error, and then the user could choose among the desired/available settings, i.e. choose fallback Klingon?


Re: Locales

Post by cmb » Mon Jul 09, 2012 11:09 am

svasti wrote:The core should check and do the trial and error,
The core could check, but IMO it's not feasible to let the core try to detect a fitting locale. For one thing, CMSimple simply doesn't know the desired country, only the language. For German it's fairly simple: language "de" maps to country "DE". But even in this case the user might prefer country "AT". It's even harder for English: there's no country "EN", but instead "US", "GB" etc. And the counterpart of "de_DE.UTF-8" on *nix seems to be "German_Germany.65001" on Windows. But the following gives me a headache on Windows (XP):

Code:

<?php
echo setlocale(LC_ALL, 'German_Germany.65001'),"\n";
echo strcoll('Birnen','Äpfel') == strcoll('Birnen','Äpfel');
?>
Outputs:

Code:

LC_COLLATE=German_Germany.65001;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.65001;LC_NUMERIC=German_Germany.65001;LC_TIME=German_Germany.65001
1
So choosing a UTF-8 locale on Windows (XP, PHP 5.3) is possible (although LC_CTYPE apparently falls back to code page 1252), but testing strcoll() seems to be impossible with such encodings!
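
One way to probe whether strcoll() gives meaningful answers is to check a property every usable collation must have: swapping the arguments has to flip the sign of the result. The following is only a sketch under that assumption. collationLooksUsable() is a hypothetical name, and passing the check is necessary but not sufficient for a correct collation.

```php
<?php
/**
 * Hypothetical sanity check: a usable collation must be antisymmetric,
 * i.e. swapping the arguments of strcoll() must flip the sign of the result.
 * Pass two different strings; equal strings trivially pass.
 */
function collationLooksUsable($a, $b)
{
    $forward = strcoll($a, $b);
    $backward = strcoll($b, $a);
    // Reduce a result to -1, 0 or 1 so that only the sign is compared.
    $sign = function ($n) {
        return ($n > 0) - ($n < 0);
    };
    return $sign($forward) === -$sign($backward);
}

// "\xC3\x84" is the UTF-8 encoding of "Ä".
var_dump(collationLooksUsable('Birnen', "\xC3\x84pfel"));
```

If this reports false under the chosen locale, sorting with strcoll() cannot be trusted there.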

And we have to consider this:
http://de3.php.net/manual/en/function.setlocale.php wrote:The locale information is maintained per process, not per thread. If you are running PHP on a multithreaded server API like IIS or Apache on Windows, you may experience sudden changes in locale settings while a script is running, though the script itself never called setlocale(). This happens due to other scripts running in different threads of the same process at the same time, changing the process-wide locale using setlocale().
:shock:


Re: Locales

Post by svasti » Mon Jul 09, 2012 7:26 pm

What about the user choosing something out of different possibilities offered by the core?


Re: Locales

Post by cmb » Tue Jul 10, 2012 11:25 pm

svasti wrote:What about the user choosing something out of different possibilities offered by the core?
It seems that it's possible to specify a locale without a country specification on *nix, e.g. 'en.UTF-8' might reasonably work. :) But AFAIK it's not possible to get a list of installed locales in a portable fashion. popen('locale -a', 'r') might fail on many servers. And there's the problem of how to present the possibilities: if the setting goes to LANG.php, all the back-end normally offers is a simple <input type="text">. :?
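
A defensive sketch of the `locale -a` approach, assuming a *nix shell is available. listInstalledLocales() is a hypothetical helper; it returns null whenever the list cannot be obtained, which is the case the caller has to handle anyway.

```php
<?php
/**
 * Hypothetical helper: returns the locale names reported by `locale -a`,
 * or null if the command cannot be run (Windows, disabled functions, ...).
 */
function listInstalledLocales()
{
    $disabled = array_map('trim', explode(',', (string) ini_get('disable_functions')));
    if (!function_exists('popen') || in_array('popen', $disabled, true)) {
        return null;
    }
    // Discard error output so a missing command does not clutter the page.
    $handle = @popen('locale -a 2>/dev/null', 'r');
    if (!is_resource($handle)) {
        return null;
    }
    $output = stream_get_contents($handle);
    pclose($handle);
    // One locale name per line; drop empty lines.
    $names = preg_split('/\R/', trim((string) $output), -1, PREG_SPLIT_NO_EMPTY);
    return count($names) > 0 ? $names : null;
}

$locales = listInstalledLocales();
if ($locales === null) {
    echo "Cannot list locales on this server\n";
} else {
    // Keep only the names that look like UTF-8 locales.
    echo implode("\n", preg_grep('/utf-?8$/i', $locales)), "\n";
}
```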

So the simplest solution would be to set the default to "$sl.UTF-8". But I'm still not convinced of using the locales at all -- that seems to be a rough edge for portable PHP... The big shocker is this:
http://de3.php.net/manual/en/function.setlocale.php wrote:The locale information is maintained per process, not per thread. If you are running PHP on a multithreaded server API like IIS or Apache on Windows, you may experience sudden changes in locale settings while a script is running, though the script itself never called setlocale(). This happens due to other scripts running in different threads of the same process at the same time, changing the process-wide locale using setlocale().
Consider shared hosting with several domains (quite a common scenario) in such a multithreaded environment! CMSimple_XH switches to de.UTF-8 in one thread, then a totally different system switches to some ANSI based locale before the sorting is done. This might result in:

Code:

Apfel
Birnen
Äpfel
:roll:
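
For comparison, this is also the order a plain byte-wise comparison produces, because the first byte of a UTF-8 encoded "Ä" (0xC3) is greater than any ASCII letter. A small sketch with PHP's locale-independent SORT_STRING flag:

```php
<?php
// "\xC3\x84" is the UTF-8 encoding of "Ä"; written as bytes so the example
// does not depend on the encoding of this file.
$words = array("\xC3\x84pfel", 'Birnen', 'Apfel');

// SORT_STRING compares the raw bytes and ignores the current locale.
sort($words, SORT_STRING);

// Prints Apfel, Birnen, Äpfel (one per line).
echo implode("\n", $words), "\n";
```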


Re: Locales

Post by svasti » Wed Jul 11, 2012 10:11 am

cmb wrote:then a totally different system switches to some ANSI based locale before the sorting is done.
This locale solution seems to be pre-Unicode. For us this probably means not using locale-dependent functions. Ha, ha, so you can forget quite a bit of PHP, nice, less memorisation :lol: . Too bad, PHP 6.0 is years overdue.
What you're saying means that if a system should be portable, it shouldn't depend on locales, because there are too many ways locales are implemented and named, and sometimes they aren't there at all.
Another item in CMSimple's style guide: don't use locale dependent stuff.


Re: Locales

Post by cmb » Wed Jul 11, 2012 12:01 pm

svasti wrote:Another item in CMSimple's style guide: don't use locale dependent stuff.
Unfortunately it's hardly possible to avoid locale dependent functionality in all cases. But the problems are not particularly caused by PHP, which just uses the underlying implementation of the OS. Setting the locale per process is very reasonable for most cases -- it just doesn't work for multi-threaded web servers. And well, some PHP extensions might not be thread safe, so running a multi-process Apache is probably preferable (see e.g. http://www.zerigo.com/article/apache_mu ... re-forked/), or even better NGINX or lighttpd, which are using a very lightweight event model. But there's still IIS, which AFAIK is always multi-threaded.

Since PHP 5.3 the "intl" extension is bundled with PHP (it is available as a PECL extension for PHP 5.2). It offers the Collator class, which AFAIK is thread-safe. But we can't use it without raising the requirements for CMSimple_XH to a point that might not be supported by very many hosts.
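
A sketch of how Collator could be used where it is available, with a byte-order fallback where it is not. The locale 'de_DE' and the word list are just examples; the point is that the collation belongs to the object and does not touch the process-wide locale.

```php
<?php
// "\xC3\x84" is the UTF-8 encoding of "Ä".
$words = array('Birnen', "\xC3\x84pfel", 'Apfel');

if (class_exists('Collator')) {
    // The collation rules are configured per object, not per process.
    $collator = new Collator('de_DE');
    $collator->sort($words);
} else {
    // Fallback without intl: locale-independent byte order.
    sort($words, SORT_STRING);
}

echo implode("\n", $words), "\n";
```

With intl loaded, "Äpfel" should end up right after "Apfel"; without it, the fallback yields the byte order.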

I'm not aware of a native PHP implementation for string collation. But even if there is one available, this would probably be very slow.

So, what shall we do? What do others think about the locales issue?


Re: Locales

Post by cmb » Wed Jul 18, 2012 9:08 pm

http://www.phpwact.org/php/i18n/utf-8 wrote:But this is no use if you’re writing applications which will be installed by third parties (like these for example) because it’s system specific (it’s not even just OS specific). If the default system locale does not support UTF-8, in theory your application could change the locale “on the fly” using setlocale but in practice that requires two things; that there is a locale available on the system which supports UTF-8 (not guaranteed) and that the correct locale identifier string can be found (there a definately differences between Windows and *Nix locale identifiers and even amongst the Unixes believe there are variations e.g. FreeBSD). What’s more, you can’t rely on users to be able to change the locale correctly to suit your applications needs - on a shared host they probably won’t be able to change the locale for the user that Apache is running with. Bottom line - locales are not the way to go for applications intended to be “write once, run anywhere”.

Update: You can downgrade your character type locale to the POSIX (C) locale via setlocale(), like

Code:

setlocale ( LC_CTYPE, 'C' ); 
This should work on all platforms. It would mean functions like strtolower() are only considering characters in the ASCII range - that opens the way for some significant performance optimizations.
Something to consider too...
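
A small sketch of what that means for UTF-8 strings: with LC_CTYPE set to 'C', strtolower() folds only the ASCII letters and leaves the bytes of multi-byte characters alone, so the string stays valid UTF-8 (but "Ä" is not lowercased).

```php
<?php
setlocale(LC_CTYPE, 'C');

// "ÄPFEL" in UTF-8; "\xC3\x84" is the encoding of "Ä".
$word = "\xC3\x84PFEL";
$lower = strtolower($word);

// Only the ASCII letters are folded; the two bytes of "Ä" stay untouched.
var_dump($lower === "\xC3\x84pfel");
```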

manu
Posts: 1086
Joined: Wed Jun 04, 2008 12:05 pm
Location: St. Gallen - Schweiz
Contact:

Re: Locales

Post by manu » Sun Dec 30, 2012 2:08 pm

In 99% of the cases where we use locale-related functions (time/date, numeric, currency) with the locale properly set in the host config, we don't even notice. Probably we should have the possibility to override the host-given locale setting in the langconfig. And just note in the documentation that it is not bulletproof for IIS and/or Windows systems. This is not a CMSimple-specific issue.


Re: Locales

Post by cmb » Sun Dec 30, 2012 6:21 pm

manu wrote:In 99% of the cases where we use locale-related functions (time/date, numeric, currency) with the locale properly set in the host config, we don't even notice.
The core doesn't use any date/time, numeric or monetary locale-specific stuff, if I'm not mistaken. The latter two are simply not required, and the former is handled with date() instead of strftime(). The only place where the core uses locale-specific routines is in some string functions that don't use the UTF-8 aware functions from Utf8_XH, as they're working on file names, which are probably (though not necessarily) locale-specific.
manu wrote:And just note in the documentation that it is not bulletproof for IIS and/or Windows systems.
I assume that on Apache with mod_php it is either not allowed to change the locale at all, or all other threads/processes are affected as well, which often will be PHP scripts from other domains for shared hosting. So when using locale-specific functionality, you often don't know what you'll get. I suppose that's the reason why PHP has included the "intl" extension since 5.3.

I suggest avoiding all locale-specific stuff as much as possible. One thing that cannot be worked around easily is collation, so it might be a good idea to add a language-specific setting for the locale (and to document that it might not work as expected).
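
A rough sketch of how such a setting might be applied just for the duration of a sort. sortWithLocale() is a hypothetical helper, and 'C' merely stands in for the value of the (assumed) language setting. Restoring the previous locale limits the side effects on other code in the same request, but it does not solve the multi-threading problem discussed above.

```php
<?php
/**
 * Hypothetical helper: sorts $strings with strcoll() under $locale and
 * restores the previous LC_COLLATE setting afterwards.
 * Returns false (leaving $strings untouched) if the locale is unavailable.
 */
function sortWithLocale(array &$strings, $locale)
{
    // Passing '0' only queries the current setting.
    $previous = setlocale(LC_COLLATE, '0');
    if (setlocale(LC_COLLATE, $locale) === false) {
        return false;
    }
    usort($strings, 'strcoll');
    if ($previous !== false) {
        setlocale(LC_COLLATE, $previous);
    }
    return true;
}

// 'C' stands in for the value read from the language file.
$pages = array('Birnen', 'Apfel');
if (!sortWithLocale($pages, 'C')) {
    // Documented fallback: locale-independent byte order.
    sort($pages, SORT_STRING);
}
echo implode("\n", $pages), "\n";
```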
