PHP and UTF-8


Re: PHP and UTF-8

Post by cmb » Fri Aug 17, 2012 9:17 pm

cmb wrote: only the 3 mentioned entities would have to be converted in the search function, which might be done by:
(the given function call was buggy; I've fixed that in the post above)

But instead of replacing the remaining entities in the content, it's more efficient to entity-encode the 3 characters in the search string (r255).
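
A minimal sketch of that idea, not the actual r255 code. It assumes the 3 characters are &, < and >, stored in the content as &amp;, &lt; and &gt;; the function names encode_search_string() and content_matches() are made up for illustration.

```php
<?php
// Hypothetical helpers, for illustration only (not the r255 change).
// Assumption: the 3 characters are &, < and >, and the stored content
// contains them as &amp;, &lt; and &gt;.

function encode_search_string($search)
{
    // ENT_NOQUOTES leaves " and ' untouched, so only &, < and > are encoded.
    return htmlspecialchars($search, ENT_NOQUOTES, 'UTF-8');
}

function content_matches($content, $search)
{
    // Encode the short search string once instead of decoding the whole content.
    return strpos($content, encode_search_string($search)) !== false;
}

$content = '<p>Tom &amp; Jerry: 1 &lt; 2</p>';
var_dump(content_matches($content, 'Tom & Jerry')); // bool(true)
var_dump(content_matches($content, '1 < 2'));       // bool(true)
```

The point of the design is that the search string is short, while the content may be large, so encoding the former is cheaper than decoding the latter for every page.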
Christoph M. Becker – Plugins for CMSimple_XH


Re: PHP and UTF-8

Post by cmb » Sun Aug 26, 2012 11:22 pm

Hello Community,

It seems the small problems with UTF-8 never end. :shock: On comp.lang.javascript a user noted:
Thomas Lang wrote: When discussing character encoding, it should also be mentioned that not even choosing a specific Unicode encoding such as UTF-8 ensures compatibility. The Macintosh operating system is known to use the more versatile, but in terms of memory footprint more expensive, character composition feature of Unicode, while other systems tend not to; as a result, the same character may be encoded using the same Unicode encoding in a different way and may need to be normalized in order to be compatible [3]. This applies in particular to a Content Management System where some authors might use a Macintosh computer, while other authors and most visitors would not.
...
[3] Wikipedia contributors (2012-06-12). Wikipedia: Combining character. Retrieved 2012-08-26. <http://en.wikipedia.org/wiki/Combining_character>
This means, for example, that a search for accented characters might fail when the content was edited on a Mac and the visitor uses Windows.

Can anybody test this? Has anybody already run into this? Should we apply Unicode normalization to all input?
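
To make the problem concrete, here is a rough sketch of what normalizing input could look like. It assumes the intl extension with its Normalizer class is available, and normalize_input() is a made-up name, not an existing CMSimple_XH function.

```php
<?php
// Hypothetical helper; assumes the intl extension (Normalizer class).
function normalize_input($text)
{
    if (!class_exists('Normalizer')) {
        return $text; // intl not loaded: leave the input untouched
    }
    // NFC composes a base letter plus combining mark into one code point
    // wherever a precomposed form exists.
    $normalized = Normalizer::normalize($text, Normalizer::FORM_C);
    // normalize() returns false on failure, e.g. for invalid UTF-8.
    return $normalized === false ? $text : $normalized;
}

$composed   = "\xC3\xA9";   // U+00E9, "é" as a single code point
$decomposed = "e\xCC\x81";  // "e" followed by U+0301 COMBINING ACUTE ACCENT

// Both render as "é", but the byte sequences differ, so a plain search fails.
var_dump($composed === $decomposed); // bool(false)

// After normalization both compare equal (when intl is loaded).
var_dump(normalize_input($composed) === normalize_input($decomposed));
```

If something like this were applied, it would have to be applied to both the stored content and the search string, otherwise the mismatch just moves to a different place.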

Christoph


Re: PHP and UTF-8

Post by cmb » Tue Oct 14, 2014 10:25 pm

I just stumbled upon an interesting article: http://mortoray.com/2013/11/27/the-stri ... is-broken/.
