eeeno wrote:IMO it is relevant to discuss this matter in this section.
It is the right place
eeeno wrote:I find it wasteful to allocate memory for all of the site content. Only necessary content should be loaded.
Yes, it is wasteful. But that's for sure not the bottleneck for
typical CMSimple sites. For example, go to
http://www.cmsimple-xh.org/ and have a look with the developer tools of a decent browser. Particularly interesting is the timeline: the actual request takes about 250ms, but until all related (sub-)requests are finished it takes over 700ms. The problem is not so much the actual PHP processing, but the 8 images and 8 stylesheets, which trigger additional requests to the server. Even for
http://3-magi.net/demo/large/ the actual PHP processing makes up only about half of the time it takes until the complete page has been loaded, even though no additional plugins are installed on this site. On
http://simplesolutions.dk/ several plugins are used for demonstration purposes. The actual PHP request takes only 50ms, but until everything is there it takes 4 seconds. In the last case, the seemingly wasteful reading of the complete content accounts for less than 1% of the complete processing.
So IMO we should concentrate on those bottlenecks first. Some improvement is planned for CMSimple_XH 1.6 (see
http://www.cmsimpleforum.com/viewtopic.php?f=29&t=4813), but there's more to do.
eeeno wrote:And from rfc() function in cms.php the content seems to get duplicated in another array while processing it (content[] copied in c[] -array in line cms.php:645), so if the content.htm was 5MB of size, it allocates more than 10MB. I have debugged it and it seems to do so. I'd call that a performance issue at least...
Well, PHP does assign and call by value (except for objects since PHP 5). But this is implemented with a technique called copy-on-write (or copy-on-demand). So after an assignment, the value is not copied right away, but only when one of the two variables is modified. So in this case there's no duplication of the content.
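To see copy-on-write in action, here is a small standalone snippet (not XH code, just an illustration with a large string) where the peak memory only grows once the copy is actually modified:
Code: Select all
// Standalone demo of PHP's copy-on-write; not part of CMSimple.
$content = str_repeat('x', 5 * 1024 * 1024);  // a ~5 MB string
echo memory_get_peak_usage(true), " bytes after creating \$content\n";

$copy = $content;                             // plain assignment: no duplication yet
echo memory_get_peak_usage(true), " bytes after assignment\n";

$copy .= 'y';                                 // modification forces the actual copy
echo memory_get_peak_usage(true), " bytes after modifying the copy\n";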
But instead of assuming, we may actually do some benchmarks. The following is rfc() with some benchmark output added (the rfc() is from XH 1.5.3, but it should work for XH 1.5.6, too):
Code: Select all
function rfc() {
    global $c, $cl, $h, $u, $l, $su, $s, $pth, $tx, $edit, $adm, $cf;

    $time0 = microtime(true);
    $c = array();
    $h = array();
    $u = array();
    $l = array();
    $empty = 0;
    $duplicate = 0;
    $time1 = microtime(true);
    echo sprintf('%fs, %dbytes AT START<br>', $time1 - $time0, memory_get_peak_usage(true));
    $content = file_get_contents($pth['file']['content']);
    $time2 = microtime(true);
    echo sprintf('%fs, %dbytes AFTER READING OF CONTENT<br>', $time2 - $time1, memory_get_peak_usage(true));
    $stop = $cf['menu']['levels'];
    $split_token = '#@CMSIMPLE_SPLIT@#';
    $content = preg_split('~</body>~i', $content);
    $content = preg_replace('~<h[1-' . $stop . ']~i', $split_token . '$0', $content[0]);
    $content = explode($split_token, $content);
    array_shift($content);
    $time3 = microtime(true);
    echo sprintf('%fs, %dbytes AFTER SPLITTING OF CONTENT<br>', $time3 - $time2, memory_get_peak_usage(true));
    foreach ($content as $page) {
        $c[] = $page;
        preg_match('~<h([1-' . $stop . ']).*>(.*)</h~isU', $page, $temp);
        $l[] = $temp[1];
        $temp_h[] = preg_replace('/[ \f\n\r\t\xa0]+/isu', ' ', trim(strip_tags($temp[2])));
    }
    $time4 = microtime(true);
    echo sprintf('%fs, %dbytes AFTER COPYING TO $c<br>', $time4 - $time3, memory_get_peak_usage(true));
    $cl = count($c);
    $s = -1;
    if ($cl == 0) {
        $c[] = '<h1>' . $tx['toc']['newpage'] . '</h1>';
        $h[] = trim(strip_tags($tx['toc']['newpage']));
        $u[] = uenc($h[0]);
        $l[] = 1;
        $s = 0;
        return;
    }
    $ancestors = array(); /* just a helper for the "url" construction:
                           * will be filled like this: [0] => "Page"
                           *                           [1] => "Subpage"
                           *                           [2] => "Sub_Subpage" etc.
                           */
    foreach ($temp_h as $i => $heading) {
        $temp = trim(strip_tags($heading));
        if ($temp == '') {
            $empty++;
            $temp = $tx['toc']['empty'] . ' ' . $empty;
        }
        $h[] = $temp;
        $ancestors[$l[$i] - 1] = uenc($temp);
        $ancestors = array_slice($ancestors, 0, $l[$i]);
        $url = implode($cf['uri']['seperator'], $ancestors);
        $u[] = substr($url, 0, $cf['uri']['length']);
    }
    foreach ($u as $i => $url) {
        if ($su == $u[$i] || $su == urlencode($u[$i])) {
            $s = $i; // get index of selected page
        }
        for ($j = $i + 1; $j < $cl; $j++) { // check for duplicate "urls"
            if ($u[$j] == $u[$i]) {
                $duplicate++;
                $h[$j] = $tx['toc']['dupl'] . ' ' . $duplicate;
                $u[$j] = uenc($h[$j]);
            }
        }
    }
    if (!($edit && $adm)) {
        foreach ($c as $i => $j) {
            if (cmscript('remove', $j)) {
                $c[$i] = '#CMSimple hide#';
            }
        }
    }
    $time5 = microtime(true);
    echo sprintf('%fs, %dbytes WHEN READY<br>', $time5 - $time4, memory_get_peak_usage(true));
}
The times are differences to the previous step; the memory values are absolute (peak usage).
Results for the default content (19 pages, 15KB) on my local machine:
Code: Select all
0.000018s, 524288bytes AT START
0.001092s, 524288bytes AFTER READING OF CONTENT
0.000769s, 786432bytes AFTER SPLITTING OF CONTENT
0.001263s, 786432bytes AFTER COPYING TO $c
0.003374s, 786432bytes WHEN READY
Results for a content file with 399 pages (1.7 MB):
Code: Select all
0.000022s, 524288bytes AT START
0.009717s, 2359296bytes AFTER READING OF CONTENT
0.044081s, 7864320bytes AFTER SPLITTING OF CONTENT
0.033905s, 7864320bytes AFTER COPYING TO $c
0.238963s, 7864320bytes WHEN READY
Results for a content file with 1110 pages (6.7 MB):
Code: Select all
0.000021s, 524288bytes AT START
0.040865s, 7602176bytes AFTER READING OF CONTENT
0.158541s, 28835840bytes AFTER SPLITTING OF CONTENT
0.081218s, 28835840bytes AFTER COPYING TO $c
1.392920s, 28835840bytes WHEN READY
This shows that actually reading the content isn't so much the problem; the further processing is. The splitting of the content and the copying to $c take much longer, particularly for a large content. But the real problem is the processing afterwards to fill $h and $u: $h contains the page headings, $u contains the page "URLs". Both arrays have to be filled completely for every page request, as otherwise the current page couldn't be identified and the menu couldn't be built.
Improving this was the reason for the modified rfc() I've posted in the other thread. A comparison for the content with 1110 pages:
Code: Select all
0.000693s, 786432bytes AT START
0.062194s, 15466496bytes AFTER FILLING THE ARRAYS
0.089619s, 15466496bytes WHEN READY
The consumed memory is nearly halved, and the time for the processing is only about 10% of the original rfc(). That's already a nice improvement, which doesn't break compatibility with existing extensions.
Okay, let's compare that to a content file with 1110
empty pages (containing only the page headings, which have to be read in anyway):
Code: Select all
0.000856s, 786432bytes AT START
0.020404s, 1310720bytes AFTER FILLING THE ARRAYS
0.055142s, 1310720bytes WHEN READY
The time for processing is halved and the required memory is less than 10%. That's a great improvement. (But please note that we have not yet read in the necessary pages.)
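Just to sketch what "reading in the necessary pages" on demand could look like (purely hypothetical; the pages/ subfolder and the function rpc() don't exist in CMSimple_XH, and this is not the code from the other thread):
Code: Select all
// Hypothetical sketch: load a single page body on demand from a per-page
// file instead of keeping the whole content in $c. Neither the pages/
// subfolder nor rpc() exist in CMSimple_XH.
function rpc($index) {
    global $pth;

    $fn = $pth['folder']['content'] . 'pages/' . $index . '.htm';
    if (!is_file($fn)) {
        return false;
    }
    return file_get_contents($fn);
}

// e.g. fetch only the body of the currently selected page:
// $body = rpc($s);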
Now let's consider the overall picture: with such improvements we may be able to use CMSimple for medium-sized websites. For large websites, the improvement will probably not suffice (at least there are limits). But IMO those medium-sized websites are usually not administered by a single user, but by a team of several editors (plus some other personnel). However, there are no provisions for any user management, and, anyway, editing a CMSimple website in parallel is not possible (one editor will overwrite the content others have created). So the improvements don't seem that necessary after all.
Let's consider the drawbacks: storing the content in multiple files is always a bit of a problem, as there is no transaction handling for file systems available yet (at least not in practice). So it could happen that the files get out of sync. And of course there's the problem of breaking compatibility with existing extensions:
eeeno wrote:There could be legacy mode- for all the old plugins that need to be updated so basically it'd populate the global variable with the content as usual but disabling it would process contents more efficiently. So this could be a kind of soft way to implement it.
An optional legacy mode would actually increase the complexity of the CMSimple_XH code (I'm not sure how much, however). But anyway, this legacy mode would have to be enabled as soon as a single plugin that requires it is in use. I guess that will make the new mode an option for only very few sites, as I doubt that many plugin developers will rewrite their plugins.
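Just to illustrate what such a switch would mean (the config option and the alternative loader are completely made up; nothing like this exists in CMSimple_XH):
Code: Select all
// Purely hypothetical: a global switch between the old and a new content loader.
// $cf['site']['legacy_content'] and rfc_lazy() are invented names.
if ($cf['site']['legacy_content'] == 'true') {
    rfc();       // legacy mode: read the complete content and fill $c, $h, $u, $l
} else {
    rfc_lazy();  // new mode: read only headings/URLs, load page bodies on demand
}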
eeeno wrote:within download()- function it may be unnecessary to read and output a file with php as it of course reads them to the memory again... it could just redirect to the file.
If one wants to offer the file directly, one can simply link to it directly. &download actually serves two purposes: for one, it's possible to force the download of a file that would otherwise be shown directly in the browser (e.g. PDFs and images). Additionally, it's possible to make the file inaccessible for direct access and to offer it only on a CMSimple page (which may be hidden or protected).
Of course the way the file is processed for downloading should be improved. Instead of reading and echoing the file, readfile() should be used, which streams the file to the client without loading it into memory completely.
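Something along these lines; a simplified sketch only, the real download() in cms.php does more (and the headers may need adjusting):
Code: Select all
// Simplified sketch of a download() that streams the file with readfile()
// instead of reading it into a string and echoing it.
function download($fl) {
    if (!is_file($fl)) {
        header('HTTP/1.1 404 Not Found');
        return;
    }
    header('Content-Type: application/octet-stream');
    header('Content-Disposition: attachment; filename="' . basename($fl) . '"');
    header('Content-Length: ' . filesize($fl));
    readfile($fl); // streams the file without loading it into memory completely
    exit;
}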
eeeno wrote:pagemanager looks great, but is it necessary to have the plugin read the content file directly as it was already handled by the cms itself?
It might look great, but actually it's badly implemented. There are several issues I'm aware of, but unfortunately I haven't had the time to fix them yet. Reading the content a second time is actually necessary, as the core doesn't offer the unmodified page headings (it might have replaced them with EMPTY HEADING or DUPLICATE HEADING). But the bottleneck with Pagemanager is for sure not the duplicate reading of the content, but the huge amount of JavaScript that has to be parsed and executed.
eeeno wrote:General API for plugins for easy access to the cms if needed
ACK. Of course a bit is already there, and some more is planned for XH 1.6. The problem is that it will hardly be used, as plugin developers have to cater for older versions of CMSimple_XH and for other variants of CMSimple, particularly CMSimple 4.
eeeno wrote:Contents (one page at a time) in a class object "model", and functions to retrieve content, modify it and search. Another model could be initialized with contents of another page when needed. Search results cache.
I agree that CMSimple_XH should be refactored to an MVC architecture. But IMO it's not possible to do this in a single step: there's simply too little manpower, and there are several obstacles regarding compatibility. So it may be better to do it in several steps; for XH 1.6 the first step is restructuring and procedural breakdown, which has already happened in the SVN's 1.6 branch; that was quite some work and will probably introduce a lot of new bugs. After we have that stable again, we can go on. At least I prefer to eat the elephant a burger at a time.
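To give an idea of the kind of page "model" eeeno describes, a rough sketch (class and method names are made up for illustration; this is not existing or planned XH API):
Code: Select all
// Rough illustration of a page model; not existing or planned CMSimple_XH code.
class Page
{
    private $heading;
    private $level;
    private $body;

    public function __construct($heading, $level, $body)
    {
        $this->heading = $heading;
        $this->level = $level;
        $this->body = $body;
    }

    public function getHeading()
    {
        return $this->heading;
    }

    public function getBody()
    {
        return $this->body;
    }

    public function setBody($body)
    {
        $this->body = $body;
    }

    public function contains($search)
    {
        // naive full-text search on the page body, ignoring markup
        return stripos(strip_tags($this->body), $search) !== false;
    }
}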
eeeno wrote:I have now made some quite succesful experiments. [...]
Sounds interesting. If you'd like to present the modifications, you could offer a download, so that I and others can have a look at them.
Christoph