I'm not sure whether this is relevant for Elk at this point (all the more so because, IIRC, the intention is to drop the file-based cache from 1.1), but I recently got some reports that SMF 2.0.9 still shows the "old" cache corruption problem that leads to a crash of the forum.
SMF 2.0.9 incorporates basically the same "safety mechanisms" that Elk has in order to try to avoid the race condition when writing the modSettings cache to the disk, see:
http://custom.simplemachines.org/upgrades/index.php?action=upgrade;file=smf_patch_2.0.7.tar.gz;smf_version=2.0.6
// Write the file.
if (function_exists('file_put_contents'))
{
	$cache_bytes = @file_put_contents($cachedir . '/data_' . $key . '.php', $cache_data, LOCK_EX);
	if ($cache_bytes != strlen($cache_data))
		@unlink($cachedir . '/data_' . $key . '.php');
}
else
{
	$fh = @fopen($cachedir . '/data_' . $key . '.php', 'w');
	if ($fh)
	{
		// Write the file.
		set_file_buffer($fh, 0);
		flock($fh, LOCK_EX);
		$cache_bytes = fwrite($fh, $cache_data);
		flock($fh, LOCK_UN);
		fclose($fh);
		// Check that the cache write was successful; all the data should be written
		// If it fails due to low diskspace, remove the cache file
		if ($cache_bytes != strlen($cache_data))
			@unlink($cachedir . '/data_' . $key . '.php');
	}
}
while Elk's code is:
if (@file_put_contents(CACHEDIR . '/data_' . $key . '.php', $cache_data, LOCK_EX) !== strlen($cache_data))
	@unlink(CACHEDIR . '/data_' . $key . '.php');
The only differences (looking at the file_put_contents branch) are that in Elk no variable holds the result of file_put_contents, and that Elk compares it with !== where SMF uses !=.
So, apparently all this is not enough to avoid the issue. It still appears from time to time.
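For what it's worth, one approach often used against this kind of race is to write the data to a temporary file and then rename it over the real one, since LOCK_EX is only advisory and a reader that includes the file without taking the lock could in principle see it half written. This is only a minimal sketch of that idea, not what SMF or Elk currently do; cache_write_atomic and its parameters are made-up names, and it assumes the temporary file ends up on the same filesystem as the cache directory.

```php
<?php
// Hypothetical helper, not part of SMF or Elk.
// Writes $cache_data to a temporary file and renames it into place, so a
// reader sees either the old file or the complete new one.
function cache_write_atomic($dir, $key, $cache_data)
{
	$target = $dir . '/data_' . $key . '.php';

	// Assumption: $dir exists and is writable. If it is not, tempnam() may
	// fall back to the system temp directory, and a rename across
	// filesystems is no longer atomic.
	$tmp = tempnam($dir, 'tmp_');
	if ($tmp === false)
		return false;

	// Same check as the existing code: every byte must have been written.
	if (file_put_contents($tmp, $cache_data) !== strlen($cache_data))
	{
		@unlink($tmp);
		return false;
	}

	// rename() replaces the target in one step on POSIX filesystems.
	if (!@rename($tmp, $target))
	{
		@unlink($tmp);
		return false;
	}

	return true;
}
```

Whether this would actually cure the corruption reported here is untested; it only removes the window in which a partially written file is visible under its final name.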
While writing this post, I decided to try to find a way to do some testing, and I think I discovered something odd.
I found JMeter:
http://jakarta.apache.org/jmeter/
a tool that can be used to simulate load on the server.
I think I set it up to simulate 200 users requesting the home page in a loop "for a while".
Then I started monitoring in particular the modSettings cache file.
In my tests, everything seems fine for about 50 seconds, then the cache file is regenerated, even though it should last 90 seconds. From that point on, the intervals between regenerations look mostly random: anywhere from 1 to 30 seconds, but never the expected 90.
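The intervals could also be measured with a small script instead of by eye. The following is only a sketch: poll_mtimes and mtime_intervals are made-up names, the path passed in is whatever the cache file is called on the test install, and since filemtime() has one-second resolution, two regenerations within the same second count as one.

```php
<?php
// Hypothetical helpers for watching how often a file is rewritten.

// Samples the modification time of $path $polls times, sleeping
// $poll_usec microseconds between samples.
function poll_mtimes($path, $polls, $poll_usec = 500000)
{
	$observed = array();
	for ($i = 0; $i < $polls; $i++)
	{
		// PHP caches stat results, so clear them before each sample.
		clearstatcache(true, $path);
		$observed[] = @filemtime($path); // false if the file is missing
		usleep($poll_usec);
	}
	return $observed;
}

// Turns a list of sampled mtimes into the number of seconds between
// consecutive changes; samples where the file was missing are skipped.
function mtime_intervals(array $observed)
{
	$intervals = array();
	$last = null;
	foreach ($observed as $mtime)
	{
		if ($mtime === false)
			continue;
		if ($last !== null && $mtime !== $last)
			$intervals[] = $mtime - $last;
		$last = $mtime;
	}
	return $intervals;
}
```

Run from the command line while the load test is going, something like print_r(mtime_intervals(poll_mtimes('/path/to/the/cache/file.php', 600))); would list the seconds between rewrites over roughly five minutes.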
I'm not sure what is happening.
I'm pretty sure Load.php is not changed/touched during that period (otherwise the key would change and the cache file would be deleted, and anyway ls -l confirms it was last changed before the experiments).
And if there are no concurrent page loads (i.e. a simple ctrl+r every few seconds), the file is replaced exactly when expected.
Soooo.... no idea.
Should I continue debugging the issue?
Do you have any suggestions?
Should I just give up and think about something more important?