
uri_seperator and url encoding

Posted: Sun Nov 27, 2011 5:40 pm
by cmb
Hello Community,

a user pointed out a problem with URLs to his CMSimple site posted on Facebook. If he wants to post e.g. the URL http://www.example.com/?page/subpage, Facebook replaces it with http://www.example.com/?page%2Fsubpage, but this link gives a "404: Not found".

Wikipedia explains that:
Reserved characters that have no reserved purpose in a particular context may also be percent-encoded but are not semantically different from those that are not.
This holds for the commonly used uri_seperators ":" and "/", and probably also for all other reasonable uri_seperators (as otherwise clashes with page titles could be expected).
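
To make the equivalence concrete, here is a small sketch using only PHP's built-in urlencode() and rawurldecode(). The sample strings are taken from the example URL above; nothing here is CMSimple code.

```php
<?php
// Illustration only: the encoded and the literal separator decode to the same string.
$literal = 'page/subpage';
$encoded = 'page%2Fsubpage'; // the form Facebook produces, per the report above

var_dump(urlencode('/'));                      // string(3) "%2F"
var_dump(urlencode(':'));                      // string(3) "%3A"
var_dump(rawurldecode($encoded) === $literal); // bool(true)
var_dump($encoded === $literal);               // bool(false)
```

The last line is the crux: the two forms mean the same, but a plain string comparison against the stored page URL does not match.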

So IMO it's up to CMSimple to cater for URLs in which the uri_seperator has been url-encoded. AFAIK this could technically be done by a rewrite rule (e.g. with mod_rewrite) or in CMSimple's source code; the latter is more generally useful and more convenient, as the actual uri_seperator is known there. It can be done in /cmsimple/cms.php around line 130 (this might depend on the actual version) by inserting:

Code:

$su = substr($su, 0, $cf['uri']['length']); // after this line
$su = preg_replace('/'.preg_quote(urlencode($cf['uri']['seperator']), '/').'/i', $cf['uri']['seperator'], $su); // insert this one; preg_quote() keeps a separator such as "." literal
 
I've chosen preg_replace() here instead of the faster str_ireplace(), as the latter is only available as of PHP 5. This substitution should work with any uri_seperator that is an ASCII character, as all bytes of multi-byte characters in a UTF-8 byte stream have their highest bit set (see e.g. Wikipedia).
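
For comparison, the same substitution can be sketched without a regular expression and without str_ireplace(). The helper name decode_separator() is made up for this illustration and is not part of CMSimple. It assumes a single ASCII character as separator, so the encoded form contains at most one hex letter, and an upper-case and a lower-case variant cover every spelling.

```php
<?php
// Hypothetical helper, not part of CMSimple.
// Assumes $separator is a single ASCII character.
function decode_separator($su, $separator)
{
    $upper = urlencode($separator);  // e.g. "%2F"; urlencode() emits upper-case hex
    if ($upper === $separator) {
        return $su;                  // "-", "_" and "." are not encoded at all
    }
    $lower = strtolower($upper);     // e.g. "%2f"
    // Replace both spellings of the encoded separator with the literal one.
    return str_replace(array($upper, $lower), $separator, $su);
}

echo decode_separator('page%2Fsubpage', '/'), "\n"; // page/subpage
echo decode_separator('page%2fsubpage', '/'), "\n"; // page/subpage
echo decode_separator('page%3Asubpage', ':'), "\n"; // page:subpage
echo decode_separator('a.b', '.'), "\n";            // a.b
```

str_replace() with array arguments is available in PHP 4, so this variant does not raise the version requirement.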

If anybody sees problems with this approach, or even has a better way to solve the issue, I'm looking forward to reading about it.

Christoph

Re: uri_seperator and url encoding

Posted: Sun Nov 27, 2011 9:52 pm
by Holger
cmb wrote: If anybody sees problems
No problem! That change is more than welcome, because I had the same problem in some cases with other social media stuff.

KR
Holger

Re: uri_seperator and url encoding

Posted: Mon Nov 28, 2011 11:30 am
by johnjdoe
Really nice and helpful modification! Should IMHO be in the core of the next release.

Re: uri_seperator and url encoding

Posted: Sat Jan 21, 2012 11:55 pm
by cmb
Hi Community,

a big problem with this solution was found: headings must not contain the chosen uri_seperator anymore, otherwise navigation to the page is not possible. This is because CMSimple urlencode()s the uri_seperator if it's contained in a heading, so that it's not interpreted as a separator between the headings of different menu levels. But with my proposed change it can no longer be distinguished whether an encoded uri_seperator was a separator in the first place or part of a heading. Currently I see the following possibilities:
  1. Removing the change from CMSimple_XH 1.5.1 and dropping support for Facebook links, which might not be a good idea, as posting links to CMSimple_XH sites might help to increase their popularity. (Well, that may change if SOPA/PIPA get passed ;))
  2. Avoiding the uri_seperator in headings, which could be done through urichar_org/urichar_new, or perhaps even automatically. But that will change the URLs of already existing pages.
  3. Using the HTTP_REFERER to determine how to interpret the encoded uri_seperator. Definitely no option: the information given by the browser might be incorrect, all sites that encode the uri_seperator would have to be known (and that list will probably change over time), and there could be ambiguities.
  4. Trying to interpret the encoded uri_seperator both ways and seeing whether a corresponding page can be found. IMO no option, as more than one page could fit.
  5. Submitting a petition to Facebook to change the way they encode URLs. ;)
So it seems to come down to choosing the lesser evil between (1) and (2).
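
The collision can be reproduced in a few lines. This is a simplified model, not CMSimple's actual URL-building code: the headings are made up, and it only assumes what is described above, namely that a separator inside a heading is stored urlencoded while the separator between menu levels is stored literally.

```php
<?php
// Simplified model with made-up headings; not CMSimple's real code.
$separator = '/';

// A top-level page whose heading itself contains the separator ...
$pageOne = urlencode('News/Events');                             // "News%2FEvents"
// ... and a second-level page "Events" below a page "News".
$pageTwo = urlencode('News') . $separator . urlencode('Events'); // "News/Events"

// The proposed substitution, applied to a request for the first page:
$su = str_replace(urlencode($separator), $separator, $pageOne);

var_dump($pageOne === $pageTwo); // bool(false): the stored URLs are distinct
var_dump($su === $pageTwo);      // bool(true): the request now points at page two
```

After the substitution the request for the first page is indistinguishable from a request for the second one, so the first page can no longer be reached.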

Please note that it makes no difference whether the decoding of encoded uri_seperators happens through mod_rewrite or anywhere in CMSimple_XH.

Does anybody see a better solution? What should we do?

Christoph

Re: uri_seperator and url encoding

Posted: Sun Jan 22, 2012 10:20 am
by Martin
Hi Christoph,

maybe there is a 6th possibility: leave $su as it was before, but accept a urlencoded version from outside when trying to figure out the page index in rfc(), around line 624:

Code:

if ($su == $u[$i] || $su == urlencode($u[$i])) {
    $s = $i;
}
:?:
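
A stand-alone sketch of this lookup, for trying it outside CMSimple. The function find_page_index(), the return value -1 for "not found" and the sample page list are assumptions made for the illustration; only the condition inside the loop is taken from the snippet above.

```php
<?php
// Hypothetical stand-alone version of the lookup; not the real rfc().
// $u is assumed to be the list of page URLs as CMSimple stores them.
function find_page_index($su, $u)
{
    $s = -1; // assumption for this sketch: -1 means "no page found"
    for ($i = 0; $i < count($u); $i++) {
        // Accept the URL as stored, or a fully urlencoded copy of it.
        if ($su == $u[$i] || $su == urlencode($u[$i])) {
            $s = $i;
        }
    }
    return $s;
}

$u = array('Home', 'News', 'News/Events');

var_dump(find_page_index('News/Events', $u));   // int(2): the normal link
var_dump(find_page_index('News%2FEvents', $u)); // int(2): the link as rewritten by Facebook
var_dump(find_page_index('Missing', $u));       // int(-1)
```

$su itself is never modified, so URLs that already work keep working unchanged.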

Martin

Re: uri_seperator and url encoding

Posted: Sun Jan 22, 2012 12:44 pm
by cmb
Hi Martin,

indeed, that's smart and simple! Existing URLs don't have to be changed, no ambiguities are possible, and links from Facebook will work as long as the URL doesn't contain any already urlencoded characters (i.e. %XX); avoiding those might be best practice anyway. So if users care about backlinks from Facebook, they can use urichar_org/urichar_new to keep their URLs "clean".
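
The %XX restriction can be seen in a short sketch. The page URLs are made up, and it is an assumption that Facebook encodes only the separator and leaves existing %XX sequences alone; the thread does not confirm that detail.

```php
<?php
// Made-up page URLs; the Facebook behaviour is an assumption (see lead-in).
$stored   = 'Caf%C3%A9/Menu';    // page URL that already contains %XX sequences
$facebook = 'Caf%C3%A9%2FMenu';  // assumed: only the separator gets encoded

// urlencode() encodes the "%" signs again, so the comparison fails.
var_dump(urlencode($stored));               // string(20) "Caf%25C3%25A9%2FMenu"
var_dump($facebook === urlencode($stored)); // bool(false)

// With a "clean" URL there is nothing to encode twice.
$clean = 'Cafe/Menu';
var_dump('Cafe%2FMenu' === urlencode($clean)); // bool(true)
```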

To be honest: I haven't tested it yet, but I'm quite convinced that this should have been the solution for the given problem in the first place.

Christoph

Re: uri_seperator and url encoding

Posted: Tue Jan 24, 2012 11:29 am
by johnjdoe
cmb wrote: Hi Martin,

indeed, that's smart and simple! Existing URLs don't have to be changed, no ambiguities are possible, and links from Facebook will work as long as the URL doesn't contain any already urlencoded characters (i.e. %XX); avoiding those might be best practice anyway. So if users care about backlinks from Facebook, they can use urichar_org/urichar_new to keep their URLs "clean".

To be honest: I haven't tested it yet, but I'm quite convinced that this should have been the solution for the given problem in the first place.

Christoph
Do you already know in which version this will be implemented?

Re: uri_seperator and url encoding

Posted: Tue Jan 24, 2012 12:09 pm
by cmb
Hi Gerd,

I've already put it on the roadmap for 1.5.2, but it is not approved (yet).

Christoph

Re: uri_seperator and url encoding

Posted: Wed Jan 25, 2012 6:08 am
by johnjdoe
cmb wrote: Hi Gerd,

I've already put it on the roadmap for 1.5.2, but it is not approved (yet).

Christoph
Thanks, hope it will be approved soon.

Re: uri_seperator and url encoding

Posted: Fri Feb 10, 2012 5:50 pm
by manu
cmb wrote: Hi Community,

a big problem with this solution was found: headings must not contain the chosen uri_seperator anymore, otherwise navigation to the page is not possible. This is because CMSimple urlencode()s the uri_seperator if it's contained in a heading, so that it's not interpreted as a separator between the headings of different menu levels. But with my proposed change it can no longer be distinguished whether an encoded uri_seperator was a separator in the first place or part of a heading. ...//...
Does anybody see a better solution? What should we do?

Christoph
To prevent problems with upgrades, it would be nice to have this noted in the 1.5.1 release notes.
Regards
manu