More memory efficient handling of content

Discussions and requests related to new CMSimple features, plugins, templates etc. and how to develop.
Please don't ask for support in this forum!
eeeno
Posts: 12
Joined: Sat Mar 16, 2013 1:38 pm

More memory efficient handling of content

Post by eeeno » Sat Mar 16, 2013 7:20 pm

IMO it is relevant to discuss this matter in this section.

http://www.cmsimpleforum.com/viewtopic. ... 539#p34537

So basically, the way CMSimple is designed, all of the content needs to be accessible from a global variable. I don't think that's good for use in a production environment if the site gets more popular: the content file is loaded and processed on each and every request. I find it wasteful to allocate memory for all of the site content; only the necessary content should be loaded.

I do understand that some may consider it less of an issue, but I don't think it's a good idea to read even several megabytes of data for each request on a limited shared hosting environment. It's hardly optimized.
EDIT: In the rfc() function in cms.php the content seems to get duplicated into another array while processing it ($content[] is copied into the $c[] array at cms.php:645), so if content.htm is 5 MB in size, more than 10 MB is allocated. :!: :!: I have debugged it and it seems to do so. I'd call that a performance issue at least...

I thought of some ways to fix it and at the same time break the whole CMS ;-)

There could be a legacy mode for all the old plugins that still need to be updated: it would populate the global variable with the content as usual, while disabling it would process the content more efficiently. So this could be a kind of soft way to implement it.
I have checked how it works, and if I'm not completely wrong, the function at adm.php:728 shows how the content is handled on each request.

file: adm.php: line: 728
function read_content_file($path)

My suggestions:
- Generate a "content map" with start and end byte information; it could be saved within content/pagedata.php???
- The content map would be regenerated each time the content is edited
- On each page load, only the necessary page content would be loaded:
offset = start reading at this byte
maxlen = maximum number of bytes to read
file_get_contents($path . '/content/content.htm', false, null, $offset, $maxlen)


This would eliminate the need to read all of the content into an array and to find every page break on every single page load. Huge difference!

- content.htm should only be read in chunks, to make it possible to process bigger file sizes. The data could be read in chunks of e.g. 500 KB, one chunk at a time, when generating "the map" or when searching. No separate files.
EDIT2: Within the download() function it may be unnecessary to read and output a file with PHP, as that of course reads it into memory again... it could just redirect to the file. :idea:
EDIT3: Pagemanager looks great, but is it necessary for the plugin to read the content file directly when it has already been handled by the CMS itself? :idea:
EDIT4: A general API for plugins for easy access to the CMS where needed.
EDIT5: Content (one page at a time) in a class object "model", with functions to retrieve, modify and search content. Another model could be initialized with the content of another page when needed. A search result cache.

EDIT6: I have now made some quite successful experiments. Viewing site content works fine, with only the requested content populated in $c[] along with the newsboxes. Reading content.htm partially works and returns the correct page information, but editing is broken, as it just dumps all the content there each time the page is saved; I will take a look at that. The Pagemanager plugin is now broken, like many others, but the good thing is that I didn't manage to break the whole CMS, so there's hope for me to get the changes working. There is a content.map file, which is a serialized array containing header, link, startbyte, length and level information. I did all the handling in an external class. I don't know whether the code is acceptable or not; it may just become my very own fork of the CMS.
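
To illustrate the idea, here is a very rough sketch of how such a content.map could be read and used. The field names follow the description above; the file layout and the helper function are purely hypothetical, not working CMSimple code.

Code: Select all

// Hypothetical sketch only: read the serialized "content map" and fetch a single
// page from content.htm by its byte offset and length.
function read_page_from_map($contentFolder, $pageIndex)
{
    // each map entry is assumed to look like:
    // array('header' => ..., 'link' => ..., 'startbyte' => ..., 'length' => ..., 'level' => ...)
    $map = unserialize(file_get_contents($contentFolder . 'content.map'));
    $entry = $map[$pageIndex];
    // read only the bytes of the requested page
    return file_get_contents(
        $contentFolder . 'content.htm',
        false,                // don't use the include path
        null,                 // no stream context
        $entry['startbyte'],  // start reading at this byte
        $entry['length']      // read at most this many bytes
    );
}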

cmb
Posts: 14225
Joined: Tue Jun 21, 2011 11:04 am
Location: Bingen, RLP, DE
Contact:

Re: More memory efficient handling of content

Post by cmb » Sun Mar 17, 2013 4:28 pm

eeeno wrote:IMO it is relevant to discuss this matter in this section.
It is the right place :)
eeeno wrote:I find it wasteful to allocate memory for all of the site content. Only necessary content should be loaded.
Yes, it is wasteful. But that's certainly not the bottleneck for typical CMSimple sites. For example, go to http://www.cmsimple-xh.org/ and have a look with the developer tools of a decent browser. Particularly interesting is the timeline: the actual request takes about 250ms, but until all related (sub-)requests are finished it takes over 700ms. The problem is not so much the actual PHP processing, but the 8 images and 8 stylesheets, which trigger additional requests to the server. Even for http://3-magi.net/demo/large/ the actual PHP processing makes up only about half of the time it takes until the complete page is loaded, even though no additional plugins are installed on this site. On http://simplesolutions.dk/ several plugins are used for demonstration purposes. The actual PHP request takes only 50ms, but until everything is there it takes 4 seconds. In the last case, the seemingly wasteful reading of the complete content accounts for less than 1% of the complete processing.

So IMO we should concentrate on those bottlenecks first. Some improvement is planned for CMSimple_XH 1.6 (see http://www.cmsimpleforum.com/viewtopic.php?f=29&t=4813), but there's more to do.
eeeno wrote:In the rfc() function in cms.php the content seems to get duplicated into another array while processing it ($content[] is copied into the $c[] array at cms.php:645), so if content.htm is 5 MB in size, more than 10 MB is allocated. I have debugged it and it seems to do so. I'd call that a performance issue at least...
Well, PHP does assign and call by value (except for objects since PHP 5). But this is implemented with a technique called copy-on-write or copy-on-demand: after an assignment the value is not copied right away, but only when one of the two variables is modified. So in this case there's no duplication of the content.
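
A quick way to see this copy-on-write behaviour for yourself (a generic PHP snippet, not CMSimple code):

Code: Select all

$a = str_repeat('x', 5 * 1024 * 1024);  // a ~5 MB string
echo memory_get_usage(true), '<br>';    // baseline with $a allocated
$b = $a;                                // assignment: no copy yet
echo memory_get_usage(true), '<br>';    // roughly unchanged
$b .= 'y';                              // write access: now $b is actually copied
echo memory_get_usage(true), '<br>';    // about 5 MB more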

But instead of assuming, we may actually do some benchmarks. The following is an rfc() with some benchmark output added (the rfc() is from XH 1.5.3, but it should work for XH 1.5.6 also):

Code: Select all

function rfc() {
    global $c, $cl, $h, $u, $l, $su, $s, $pth, $tx, $edit, $adm, $cf;
$time0 = microtime(true);
    $c = array();
    $h = array();
    $u = array();
    $l = array();
    $empty = 0;
    $duplicate = 0;
$time1 = microtime(true);
echo sprintf('%fs, %dbytes AT START<br>', $time1-$time0, memory_get_peak_usage(true));
    $content = file_get_contents($pth['file']['content']);
$time2 = microtime(true);
echo sprintf('%fs, %dbytes AFTER READING OF CONTENT<br>', $time2-$time1, memory_get_peak_usage(true));
    $stop = $cf['menu']['levels'];
    $split_token = '#@CMSIMPLE_SPLIT@#';


    $content = preg_split('~</body>~i', $content);
    $content = preg_replace('~<h[1-' . $stop . ']~i', $split_token . '$0', $content[0]);
    $content = explode($split_token, $content);
    array_shift($content);
$time3 = microtime(true);
echo sprintf('%fs, %dbytes AFTER SPLITTING OF CONTENT<br>', $time3-$time2, memory_get_peak_usage(true));
    foreach ($content as $page) {
        $c[] = $page;
        preg_match('~<h([1-' . $stop . ']).*>(.*)</h~isU', $page, $temp);
        $l[] = $temp[1];
        $temp_h[] = preg_replace('/[ \f\n\r\t\xa0]+/isu', ' ', trim(strip_tags($temp[2])));
    }
$time4 = microtime(true);
echo sprintf('%fs, %dbytes AFTER COPYING TO $c<br>', $time4-$time3, memory_get_peak_usage(true));
    $cl = count($c);
    $s = -1;

    if ($cl == 0) {
        $c[] = '<h1>' . $tx['toc']['newpage'] . '</h1>';
        $h[] = trim(strip_tags($tx['toc']['newpage']));
        $u[] = uenc($h[0]);
        $l[] = 1;
        $s = 0;
        return;
    }

    $ancestors = array();  /* just a helper for the "url" construction:
     * will be filled like this [0] => "Page"
     *                          [1] => "Subpage"
     *                          [2] => "Sub_Subpage" etc.
     */

    foreach ($temp_h as $i => $heading) {
        $temp = trim(strip_tags($heading));
        if ($temp == '') {
            $empty++;
            $temp = $tx['toc']['empty'] . ' ' . $empty;
        }
        $h[] = $temp;
        $ancestors[$l[$i] - 1] = uenc($temp);
        $ancestors = array_slice($ancestors, 0, $l[$i]);
        $url = implode($cf['uri']['seperator'], $ancestors);
        $u[] = substr($url, 0, $cf['uri']['length']);
    }

    foreach ($u as $i => $url) {
        if ($su == $u[$i] || $su == urlencode($u[$i])) {
            $s = $i;
        } // get index of selected page

        for ($j = $i + 1; $j < $cl; $j++) {   //check for duplicate "urls"
            if ($u[$j] == $u[$i]) {
                $duplicate++;
                $h[$j] = $tx['toc']['dupl'] . ' ' . $duplicate;
                $u[$j] = uenc($h[$j]);
            }
        }
    }
    if (!($edit && $adm)) {
        foreach ($c as $i => $j) {
            if (cmscript('remove', $j)) {
                $c[$i] = '#CMSimple hide#';
            }
        }
    }
$time5 = microtime(true);
echo sprintf('%fs, %dbytes WHEN READY<br>', $time5-$time4, memory_get_peak_usage(true));
} 
The times are differences from the previous step; the memory allocations are absolute values.

Results for the default content (19 pages, 15KB) on my local machine:

Code: Select all

0.000018s, 524288bytes AT START
0.001092s, 524288bytes AFTER READING OF CONTENT
0.000769s, 786432bytes AFTER SPLITTING OF CONTENT
0.001263s, 786432bytes AFTER COPYING TO $c
0.003374s, 786432bytes WHEN READY
Results for a content file with 399 pages (1.7 MB):

Code: Select all

0.000022s, 524288bytes AT START
0.009717s, 2359296bytes AFTER READING OF CONTENT
0.044081s, 7864320bytes AFTER SPLITTING OF CONTENT
0.033905s, 7864320bytes AFTER COPYING TO $c
0.238963s, 7864320bytes WHEN READY
Results for a content file with 1110 pages (6.7 MB):

Code: Select all

0.000021s, 524288bytes AT START
0.040865s, 7602176bytes AFTER READING OF CONTENT
0.158541s, 28835840bytes AFTER SPLITTING OF CONTENT
0.081218s, 28835840bytes AFTER COPYING TO $c
1.392920s, 28835840bytes WHEN READY
This shows that actually reading the content isn't so much the problem, but rather the further processing. The splitting of the content and the copying to $c take much longer, particularly for a large content file. But the real problem is the processing afterwards to fill $h and $u: $h contains the page headings, $u contains the page "URLs". Both arrays have to be filled completely for every page request, as otherwise the current page couldn't be identified and the menu couldn't be built.

Improving this was the reason for the modified rfc() I've posted in the other thread. A comparison for the content with 1110 pages:

Code: Select all

0.000693s, 786432bytes AT START
0.062194s, 15466496bytes AFTER FILLING THE ARRAYS
0.089619s, 15466496bytes WHEN READY
The consumed memory is nearly halved, and the time for the processing is only about 10% of the original rfc(). That's already a nice improvement, which doesn't break compatibility with existing extensions.

Okay, let's compare that to a content file with 1110 empty pages (containing only the page headings, which have to be read in anyway):

Code: Select all

0.000856s, 786432bytes AT START
0.020404s, 1310720bytes AFTER FILLING THE ARRAYS
0.055142s, 1310720bytes WHEN READY
The time for processing is halved and the required memory is less than 10%. That's a great improvement. (But please note that we have not yet read in the necessary pages.)

Now let's consider the overall picture: we may be able to use CMSimple for medium-sized websites. For large websites the improvement will probably not suffice (at least there are limits). But IMO those medium-sized websites are usually not administrated by a single user, but "administrated" by a team of several editors (plus some other personnel). However, there are no provisions for any user management, and anyway, editing a CMSimple website in parallel is not possible (one editor will overwrite the content others have created). So the improvements don't seem all that necessary.

Let's consider the drawbacks: storing the content in multiple files is always a bit of a problem, as there is no transaction handling for file systems available yet (at least not in practice). So it could happen that the files get out of sync. And of course there's the problem of breaking compatibility with existing extensions:
eeeno wrote:There could be a legacy mode for all the old plugins that still need to be updated: it would populate the global variable with the content as usual, while disabling it would process the content more efficiently. So this could be a kind of soft way to implement it.
An optional legacy mode would actually increase the complexity of the CMSimple_XH code (I'm not sure how much, though). But anyway, this legacy mode has to be enabled if even a single plugin that requires it is in use. I guess that will make the new mode an option for only very few sites, as I doubt that many plugin developers will rewrite their plugins.
eeeno wrote:Within the download() function it may be unnecessary to read and output a file with PHP, as that of course reads it into memory again... it could just redirect to the file.
If one wants to offer the file directly, one can simply link to it. &download actually serves two purposes: for one, it's possible to force the download of a file that would otherwise be shown directly in the browser (e.g. PDFs and images). Additionally, it's possible to make the file inaccessible to direct access and only offer it on a CMSimple page (which may be hidden or protected).

Of course the way the file is processed for downloading should be improved. Instead of reading and echoing the file, the function readfile() should be used:
http://www.php.net/manual/en/function.readfile.php wrote:readfile() will not present any memory issues, even when sending large files, on its own.
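A minimal sketch of what a readfile() based download handler could look like (the path and headers are illustrative only, this is not the actual &download code):

Code: Select all

$file = '/path/to/userfiles/example.pdf';  // hypothetical file path
if (is_readable($file)) {
    header('Content-Type: application/octet-stream');
    header('Content-Disposition: attachment; filename="' . basename($file) . '"');
    header('Content-Length: ' . filesize($file));
    readfile($file);  // streams the file to the client without loading it into PHP's memory
    exit;
}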
eeeno wrote:Pagemanager looks great, but is it necessary for the plugin to read the content file directly when it has already been handled by the CMS itself?
It might look great, but actually it's badly implemented. There are several issues I'm aware of, but unfortunately I haven't had the time to fix them yet. Reading the content a second time is actually necessary, as the core doesn't offer the unmodified page headings (it might replace them with EMPTY HEADING or DUPLICATE HEADING). But the bottleneck in Pagemanager is certainly not the duplicate reading of the content, but the huge amount of JavaScript that has to be parsed and executed.
eeeno wrote:A general API for plugins for easy access to the CMS where needed
ACK. Of course a bit is already there, and some more is planned for XH 1.6. The problem is that it will hardly be used, as plugin developers have to cater to older versions of CMSimple_XH and to other variants of CMSimple, particularly CMSimple 4.
eeeno wrote:Content (one page at a time) in a class object "model", with functions to retrieve, modify and search content. Another model could be initialized with the content of another page when needed. A search result cache.
I agree that CMSimple_XH should be refactored to use an MVC architecture. But IMO it's not possible to do this in a single step: there's simply too little manpower, and there are several obstacles regarding compatibility. So it may be better to do it in several steps; for XH 1.6 the first step is restructuring and procedural breakdown, which has already happened in the SVN's 1.6 branch, which was quite some work and will probably introduce a lot of new bugs. After we have that stable again, we can go on. At least I prefer to eat the elephant one burger at a time.
eeeno wrote:I have now made some quite successful experiments. [...]
Sounds interesting. If you would like to present the modifications, you could offer a download, so that I and others can have a look at them.

Christoph
Christoph M. Becker – Plugins for CMSimple_XH

eeeno
Posts: 12
Joined: Sat Mar 16, 2013 1:38 pm

Re: More memory efficient handling of content

Post by eeeno » Sun Mar 17, 2013 7:23 pm

I don't know if it is usable, so I have to at least get the editing working before putting it anywhere.

My idea was to have a cache file (pagemap) of those headings, URLs, levels, offsets and content lengths for more efficient access to content.htm.
The pagemap would be regenerated when the content is changed and saved within the editor. Of course there is a risk of it getting out of sync, so a recovery method could try to revert the content file to the latest backup.

So if that is implemented, there's no need for all that heavy processing of the content on every single request; it would only be updated when the content is changed.
So basically, when there is a request:
1.) the pagemap is read
2.) the necessary global variables are populated with the headings, levels and so on. That information comes from the pagemap cache; no processing of content.htm is necessary.
3.) the requested URL is looked up and, if found, the content is read with the specified length, starting from the specified offset, voilà, and put into $c[index]

When page content is saved with the editor (as sketched below):
1.) the pagemap is read and the heading is looked up
2.) fopen, fseek to the offset, the specified length (the old page) is truncated, fseek back to the offset
3.) fwrite the new page content from that offset
4.) the pagemap is regenerated
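
A rough sketch of steps 2.) and 3.) (a purely hypothetical helper, not tested against CMSimple): since the new page content may be shorter or longer than the old one, the rest of the file is kept and written back after the new content.

Code: Select all

// replace one page inside content.htm, given its offset and length from the pagemap
function replace_page($file, $offset, $length, $newContent)
{
    $fp = fopen($file, 'r+');
    if ($fp === false) {
        return false;
    }
    fseek($fp, $offset + $length);     // jump behind the old page
    $tail = stream_get_contents($fp);  // keep everything that follows it
    ftruncate($fp, $offset);           // cut the file at the start of the old page
    fseek($fp, $offset);
    fwrite($fp, $newContent);          // write the new page content
    fwrite($fp, $tail);                // re-append the rest of the file
    fclose($fp);
    // afterwards the pagemap must be regenerated, as all following offsets have shifted
    return true;
}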

I agree that having a legacy mode would make the project more complicated. It is not certain that any of my ideas would make it more efficient, but I just assume it must, if instead of 1 MB of data it only reads the content that is needed, 5 kB or 10 kB :lol:

Of course it doesn't make much of a difference with decent hardware and resources... but by design it could aim at better handling of the data.

The thing is that I don't completely understand the purpose of all those global variables, so I have to do some more debugging.

CMSimple probably doesn't need to be fully MVC, but the "data sources" could be a model.

cmb
Posts: 14225
Joined: Tue Jun 21, 2011 11:04 am
Location: Bingen, RLP, DE
Contact:

Re: More memory efficient handling of content

Post by cmb » Sun Mar 17, 2013 7:44 pm

eeeno wrote:My idea was to have a cache file (pagemap) of those headings, URLs, levels, offsets and content lengths for more efficient access to content.htm.
Well, that's a bit similar to my modified rfc(), except that your solution doesn't store the content, but the offsets and lengths instead.
eeeno wrote:When page content is saved with the editor
1.) the pagemap is read and the heading is looked up
2.) fopen, fseek to the offset, the specified length (the old page) is truncated, fseek back to the offset
3.) fwrite the new page content from that offset
4.) the pagemap is regenerated
Here it gets tricky. For one, I'm not sure whether you have considered that the new page content may be larger than the old one. Anyway, the trickiest part is the handling of inserted pages: one can add a page after the current one simply by adding a new heading (<h1>-<h3>) below the content. On saving from the editor, the new page will be inserted into content.htm and pagedata.php. This has to be catered for as well.

Considering both points, a viable solution might be to combine both ideas: the pagemap and its content file are created the same way as with my modified rfc(). The benefits: it's not necessary to change anything regarding the saving of the content, and if the files were out of sync, they could simply be deleted, as the original content.htm is still there. At least it's an option to consider.
eeeno wrote:The thing is that I don't completely understand the purpose of all those global variables, so I have to do some more debugging.
The list on http://www.cmsimple-xh.org/wiki/doku.php/core_variables might help a little bit. It's far from being good developer documentation, but it's currently all that's available.
Christoph M. Becker – Plugins for CMSimple_XH

cmb
Posts: 14225
Joined: Tue Jun 21, 2011 11:04 am
Location: Bingen, RLP, DE
Contact:

Re: More memory efficient handling of content

Post by cmb » Sun Mar 17, 2013 8:48 pm

cmb wrote:Considering both points, a viable solution might be to combine both ideas: the pagemap and its content file are created the same way as with my modified rfc().
I've made the following rfc() (I had to split it into two separate code sections, as the board software doesn't seem to accept it as a single one):

Code: Select all

function rfc() {
    global $c, $cl, $h, $u, $l, $su, $s, $pth, $tx, $edit, $adm, $cf, $offsets, $lengths;

    if (($edit && $adm) || filemtime($pth['file']['content']) > filemtime($pth['folder']['content'] . 'pagemap')) {
        $c = array();
        $h = array();
        $u = array();
        $l = array();
        $empty = 0;
        $duplicate = 0;

        $content = file_get_contents($pth['file']['content']);

        $stop = $cf['menu']['levels'];
        $split_token = '#@CMSIMPLE_SPLIT@#';


        $content = preg_split('~</body>~i', $content);
        $content = preg_replace('~<h[1-' . $stop . ']~i', $split_token . '$0', $content[0]);
        $content = explode($split_token, $content);
        array_shift($content);

        foreach ($content as $page) {
            $c[] = $page;
            preg_match('~<h([1-' . $stop . ']).*>(.*)</h~isU', $page, $temp);
            $l[] = $temp[1];
            $temp_h[] = preg_replace('/[ \f\n\r\t\xa0]+/isu', ' ', trim(strip_tags($temp[2])));
        }

        $cl = count($c);
        $s = -1;

        if ($cl == 0) {
            $c[] = '<h1>' . $tx['toc']['newpage'] . '</h1>';
            $h[] = trim(strip_tags($tx['toc']['newpage']));
            $u[] = uenc($h[0]);
            $l[] = 1;
            $s = 0;
            return;
        }

        $ancestors = array();  /* just a helper for the "url" construction:
         * will be filled like this [0] => "Page"
         *                          [1] => "Subpage"
         *                          [2] => "Sub_Subpage" etc.
         */

        foreach ($temp_h as $i => $heading) {
            $temp = trim(strip_tags($heading));
            if ($temp == '') {
                $empty++;
                $temp = $tx['toc']['empty'] . ' ' . $empty;
            }
            $h[] = $temp;
            $ancestors[$l[$i] - 1] = uenc($temp);
            $ancestors = array_slice($ancestors, 0, $l[$i]);
            $url = implode($cf['uri']['seperator'], $ancestors);
            $u[] = substr($url, 0, $cf['uri']['length']);
        }

        foreach ($u as $i => $url) {
            if ($su == $u[$i] || $su == urlencode($u[$i])) {
                $s = $i;
            } // get index of selected page

            for ($j = $i + 1; $j < $cl; $j++) {   //check for duplicate "urls"
                if ($u[$j] == $u[$i]) {
                    $duplicate++;
                    $h[$j] = $tx['toc']['dupl'] . ' ' . $duplicate;
                    $u[$j] = uenc($h[$j]);
                }
            }
        }
 

Code: Select all

        // create the pagemap and the pagecontent files
        $offsets = array();
        $lengths = array();
        for ($i = 0; $i < $cl; $i++) {
            $offsets[$i] = $i > 0 ? $offsets[$i - 1] + $lengths[$i - 1] : 0;
            $lengths[$i] = strlen($c[$i]);
        }
        $cache = array($h, $u, $l, $offsets, $lengths);
        $fp = fopen($pth['folder']['content'] . 'pagemap', 'w');
        fwrite($fp, serialize($cache));
        fclose($fp);
        $fp = fopen($pth['folder']['content'] . 'pagecontent', 'w');
        for ($i = 0; $i < $cl; $i++) {
            fwrite($fp, $c[$i]);
        }
        fclose($fp);
    } else {
        $cache = unserialize(file_get_contents($pth['folder']['content'] . 'pagemap'));
        list($h, $u, $l, $offsets, $lengths) = $cache;
        $cl = count($h);
        $s = 0;
        foreach ($u as $i => $url) {
            if ($su == $u[$i] || $su == urlencode($u[$i])) {
                $s = $i;
            } // get index of selected page
        }
        $c = array();
        $c[$s] = file_get_contents($pth['folder']['content'] . 'pagecontent',
                                   false, null, $offsets[$s], $lengths[$s]);
    }
    if (!($edit && $adm)) {
        foreach ($c as $i => $j) {
            if (cmscript('remove', $j)) {
                $c[$i] = '#CMSimple hide#';
            }
        }
    }
} 
That superficially seems to work fine. But when one enters admin mode with debug mode enabled, one can imagine the many problems arising. It all comes down to this: having the full content available in $c is intrinsically expected by CMSimple and its plugins. One of the problems is hidden pages. Those are marked either by a field in pagedata.php (which is processed by the page_params plugin, which writes a #CMSimple hide# into the page content) or by #CMSimple hide# directly. But this information is not available when only the content of the current page is read. This could be remedied by using the pagedata flag directly in the core and ignoring any #CMSimple hide#, but that's neither good practice (the core would rely on a setting of a plugin) nor expected by CMSimple users (many of them are accustomed to hiding a page by writing #CMSimple hide#). And that's only the tip of the iceberg.

ISTM a prerequisite for such improvements is a clean procedural, or probably better, object-oriented API. If $c weren't accessed directly, but instead through getter and setter functions/methods, it would be much less of a problem to implement a read-on-demand solution for page content. (Actually it would even be possible to use any DB for content storage.) But simply changing the core to require this would break many existing extensions. A transitional solution could be a plugin framework which wraps all the needed accesses to the global variables in getter and setter functions/methods, so a plugin author would be able to write a plugin without directly accessing any global variable right now, and by simply changing the plugin framework, the plugin would work with a future version of CMSimple which doesn't offer any global variables. But I see three problems with that approach:
  • Somebody has to develop and maintain this plugin framework
  • The plugin developers had to use it for new plugins
  • Old plugins had to be rewritten to use it
Sigh--probably a long way to go...
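
Just to make the getter/setter idea concrete, wrappers like the following could hide the global (the function names are invented for illustration; they are not part of any existing CMSimple_XH API):

Code: Select all

// minimal sketch of getter/setter wrappers around the global $c
function getPageContent($index)
{
    global $c;
    return isset($c[$index]) ? $c[$index] : null;
}

function setPageContent($index, $html)
{
    global $c;
    $c[$index] = $html;
    // a future implementation could load/store the page on demand here
    // instead of relying on a fully populated global array
}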
Christoph M. Becker – Plugins for CMSimple_XH

eeeno
Posts: 12
Joined: Sat Mar 16, 2013 1:38 pm

Re: More memory efficient handling of content

Post by eeeno » Sun Mar 17, 2013 9:19 pm

The combination of your caching idea and the pagemap is what I was aiming at, just without the page content; it'd be more permanent and updated only when the content is changed.

In fact I had copied parts of the code from rfc(), and all those loops are indeed unnecessary if those global variables can be populated directly from the pagemap. It sort of does that already, but the code is a little messy: $c[] is filled with empty values, and the requested content is added at its corresponding index. That's just to make the menus show up, as they seem to rely on the count of the indexes.

Saving of the data could be left as it is; just the pagemap needs to be regenerated each time. The editor view doesn't seem to need all the content available, so the full content would only be loaded when the content is saved. That's not an issue at all, as it doesn't happen on every request.

It might of course break plugins, but... well, let them break. :twisted: It probably won't take a major change to get them working again.

Well, you came up with that code pretty quickly. It seems to do just what I was thinking of, but I haven't tested/tried it yet.

There should be getter/setter functions for plugins: the getter could just load the requested content into its corresponding index in $c[] and the plugin would continue as usual, and the setter would save and trigger the update of the pagemap.

Maybe the solution is simpler than one might expect... hmmmm :lol: It just takes time to find "the right" way. Maybe this can't be accomplished without some drastic redesign of the way the data is handled, but it is worth it in the long run.

Well call it 2.0 alpha without functional plugins...

cmb
Posts: 14225
Joined: Tue Jun 21, 2011 11:04 am
Location: Bingen, RLP, DE
Contact:

Re: More memory efficient handling of content

Post by cmb » Sun Mar 17, 2013 9:47 pm

eeeno wrote:The combination of your caching idea and the pagemap is what I was aiming at, just without the page content; it'd be more permanent and updated only when the content is changed.
Indeed, the additional pagecontent file is not necessary, as the content can be read from content.htm directly. Updating the pagemap from rfc() happens only when content.htm has changed.
eeeno wrote:It might of course break plugins, but... well, let them break. :twisted: It probably won't take a major change to get them working again.
Breaking plugins is a bit of a problem. There are many plugins out there which still work fine but are currently unmaintained. Even if the fixes required only minor changes, that might be a problem due to license restrictions.

And CMSimple without plugins is basically useless, IMO. The system is so minimalistic that nearly every website needs some additional functionality. Having this available as a plugin is fine. It's not so fine if you have to write your own plugin(s)...
eeeno wrote:There should be getter/setter functions for plugins: the getter could just load the requested content into its corresponding index in $c[] and the plugin would continue as usual, and the setter would save and trigger the update of the pagemap.
ACK. But we should be careful to avoid introducing an API which will have to be changed again in the near future because of inherent limitations. Changing a plugin once may be okay for the active plugin developers; having to change it for each new CMSimple_XH version may simply be too much (consider that a plugin should be usable with several versions and variants).
eeeno wrote:Well call it 2.0 alpha without functional plugins...
I'd rather avoid working on a 2.0 alpha while 1.6 is still in the development stage. ;)

What do others think about this topic? I'm looking forward to hearing your opinions!
Christoph M. Becker – Plugins for CMSimple_XH

eeeno
Posts: 12
Joined: Sat Mar 16, 2013 1:38 pm

Re: More memory efficient handling of content

Post by eeeno » Sun Mar 17, 2013 10:43 pm

Well then this optimization could be enabled or disabled within the settings.

"Cache cms globals" default: enabled
if enabled would make use of pagemap for faster processing of globals, still populating c[] with all of the content and maintaining plugin compatibility.

"Optimized content retrieval" default: disabled (atm)
if enabled, would make use of pagemap and populate only requested content in $c[ ], breaking the support of unmaintained plugins that depend on fully populated $c [ ].
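
In terms of the configuration, that might boil down to two settings like the following (the names are invented here; they don't exist in CMSimple_XH):

Code: Select all

// hypothetical configuration entries for the two proposed modes
$cf['cache']['globals'] = 'true';      // use the pagemap to fill $h, $u, $l etc. quickly
$cf['cache']['partial_content'] = '';  // empty = disabled: keep $c fully populated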

IMO that is then the only way to implement this. The plugin development guidelines should be updated with information on how to access the content.

There would be a need for basic getter and setter functions that just work and don't need much changing. That could be implemented as a separate plugin or class, named something like pageAccess.

Btw, the content search function could read content.htm directly in smaller chunks, and when it finds an occurrence, it could go through the pagemap data to find the actual page it corresponds to, simply by checking whether the offset falls between the page offset and page offset + length (a rough sketch follows below). :idea:
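
A rough sketch of that search idea (hypothetical helpers, not existing CMSimple code): content.htm is scanned in chunks with a small overlap, so matches across chunk borders aren't lost, and the found byte offset is then mapped back to a page via the pagemap.

Code: Select all

// scan content.htm in chunks and return the absolute byte offset of the first match
function find_in_content($file, $needle, $chunkSize = 512000)
{
    $fp = fopen($file, 'r');
    $overlap = strlen($needle) - 1;  // keep a tail so matches on chunk borders are found
    $buffer = '';
    $bufferStart = 0;                // absolute offset of $buffer[0] in the file
    while (!feof($fp)) {
        $buffer .= fread($fp, $chunkSize);
        $pos = strpos($buffer, $needle);
        if ($pos !== false) {
            fclose($fp);
            return $bufferStart + $pos;
        }
        if (strlen($buffer) > $overlap) {  // drop everything but the overlap tail
            $bufferStart += strlen($buffer) - $overlap;
            $buffer = $overlap > 0 ? substr($buffer, -$overlap) : '';
        }
    }
    fclose($fp);
    return false;
}

// map a byte offset back to the page it belongs to, using the pagemap offsets/lengths
function page_index_for_offset($offset, array $offsets, array $lengths)
{
    foreach ($offsets as $i => $start) {
        if ($offset >= $start && $offset < $start + $lengths[$i]) {
            return $i;
        }
    }
    return -1;  // offset lies outside all known pages
}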

snafu
Posts: 352
Joined: Sun Dec 26, 2010 5:18 pm

Re: More memory efficient handling of content

Post by snafu » Mon Mar 18, 2013 9:02 am

Well then, I've watched long enough.
What is CMSimple actually supposed to become?
So far it has been a small, sleek CMS with a size restriction, a limit that most users will rarely reach.
It's a sleek station wagon, so to speak: limited payload, limited top speed, and if you increase its load rating and fit a trailer hitch, more will fit in, but it not only gets slower, you will also never turn it into a turbocharged semi-trailer truck.
Improving that state (a bit more payload, a few millimetres more ground clearance, a few more horsepower) will reach its limit at some point; the chassis won't keep up any more and the body won't fit over it anyway... in reality you would have to design a new car.

What you are attempting now is patchwork: a new data model that will probably blow up most plugins anyway, while still desperately clinging to the file-based storage? That doesn't really make sense! You could (and should) switch consistently to MySQL; then questions about the limits would come to an end, and side issues such as multi-user capability would be dealt with as well.
If the "outside" handling via the CMSimple variables and the scripting didn't change, nothing would change for users in everyday use either, a real added value.

99% of the users would have no problem at all if the documentation said, for example: the size of the content file should be limited to 5 MB for reasons of access and processing speed... followed by a list of the few reasons why.
An example of how much TEXT fits into 5 MB wouldn't be bad either. The complete Bible, Old and New Testament, doesn't even reach 1.5 MB.

Sure, there are a few sites that easily exceed the limits of CMSimple, so what? In that case the user simply picked the wrong solution for their problem. It's like the topic of multi-user capability: it doesn't match the technical concept of CMSimple and could only be half-way realized via subsites/coauthor and other tricks. That's why I still advise anyone who needs a working multi-user system within a single installation to use WordPress.
Not to mention the various end-user tricks for getting around the limits (a club newsletter simply uses one subsite per publication year, aggregated on the main page; trivial, but it works).

And yes, the majority will still reach for WordPress, but not because there is no content limit there, but simply because of: more templates, more plugins, a more comfortable backend, more documentation, updates of plugins/core via the backend. Simply more comfort.
And the fact that WordPress doesn't really run smoothly on a server with a 64M memory_limit (no working pings/trackbacks), nor on one with 128M, goes unnoticed by most people; they don't even use such features. That a basic WP installation without plugins (one page, one post, one comment, default template) is, in terms of request and access times, in the same range that CMSimple needs with a 5 MB content file (+1 MB pagedata) doesn't bother anyone either. Nor does the fact that a WP installation with a German language file occupies a good 50 MB of server RAM right away. Managing pages (not posts!) in WP is no fun either once you reach a three-digit page count, but the majority of WP users are in a similar position to the majority of CMSimple users: they simply never reach the data volume at which they would notice any of this.

Or put differently: what should I tell an interested person when it comes to deciding between CMSimple and WordPress?
I'm flexible in my arguments; depending on requirements and needs, it's no problem at all to guide someone to CMSimple. If someone wants to do things themselves because they understand a bit of CSS and HTML, it's enough to show them a template directory of WordPress and then one of CMSimple :-)

The concept of CMSimple was brilliant back then: pack everything into one HTML content file and build the structure via H1-H3. That this would at some point hit limits for a minority of users was also foreseeable.
Long story short: so, where should CMSimple go? What is the vision? Chasing every minority wish and patching the existing data model, or concentrating on its strengths and orienting itself towards the things that are interesting for the 99%? ;-}
Or indeed throw all the legacy baggage overboard, design a new DB-oriented data model, and keep the simple elegance of integration via a manageable set of template tags and "scripting" statements. If compatibility is thrown overboard anyway, it should at least pay off in the long run.

My five cents :mrgreen:
(Whew, this text looks creepy in Google Translate.)
Regards,
winni

Viewed through a viewfinder, everything becomes a subject.
my gallery; my blog; my CMSimple Template Tutorial

eeeno
Posts: 12
Joined: Sat Mar 16, 2013 1:38 pm

Re: More memory efficient handling of content

Post by eeeno » Mon Mar 18, 2013 10:19 am

It's not like anyone would be fitting the New Testament into CMSimple. :lol: But you know, with HTML markup a site may become bigger than expected even if it doesn't contain that much content. CMSimple is flawed by its "read everything into one array on every single request" design. So this enhancement addresses that issue, and at the same time compromises compatibility. I don't care much about the plugins, as they may contain many more surprises than the CMS itself; of course I could make my own plugins. Not everyone can do that, I agree. :ugeek: So it is a little biased from my side. :lol:

WP is such bloatware, but it never reads entire DB tables into memory by default - or maybe that is the case after all, since it is just so slow. It wouldn't surprise me at all.

Google Translate makes you sound like you didn't like the idea because of "if it isn't broken, don't fix it". The basic site structure with heading tags isn't compromised by this alternative approach; it's just far more efficient.

I wouldn't say that I'm just in the 1%, because many users are technical enough to understand the difference and would of course seek better DB-less alternatives. Well, and as for me, if this never makes it into the official version, I'm just better off using the modified, slicker core.
