cache

Reading Kays' and Arantor's discussion on caching the menu, and following my own doubts about the cache, I have a couple of questions:
Are we caching enough?
Are we caching meaningful things?
The menu does not appear to be the most important thing to cache.

This is the list of things we are currently caching:
sources/Load.php:if (($modSettings = cache_get_data('modSettings', 90)) == null)
sources/Load.php:if (($modSettings['load_average'] = cache_get_data('loadavg', 90)) == null)
sources/Load.php:if (empty($modSettings['cache_enable']) || $modSettings['cache_enable'] < 2 || ($user_settings = cache_get_data('user_settings-' . $id_member, 60)) == null)
sources/Load.php:if (ELKARTE != 'SSI' && !isset($_REQUEST['xml']) && (!isset($_REQUEST['action']) || $_REQUEST['action'] != '.xml') && empty($_SESSION['id_msg_last_visit']) && (empty($modSettings['cache_enable']) || ($_SESSION['id_msg_last_visit'] = cache_get_data('user_last_visit-' . $id_member, 5 * 3600)) === null))
sources/Load.php:if (($topic = cache_get_data('msg_topic-' . $_REQUEST['msg'], 120)) === NULL)
sources/Load.php:$temp = cache_get_data('topic_board-' . $topic, 120);
sources/Load.php:$temp = cache_get_data('board-' . $board, 120);
sources/Load.php:if ($modSettings['cache_enable'] >= 2 && !empty($board) && ($temp = cache_get_data('permissions:' . $cache_groups . ':' . $board, 240)) != null && time() - 240 > $modSettings['settings_updated'])
sources/Load.php:elseif (($temp = cache_get_data('permissions:' . $cache_groups, 240)) != null && time() - 240 > $modSettings['settings_updated'])
sources/Load.php:$data = cache_get_data('member_data-' . $set . '-' . $users[$i], 240);
sources/Load.php:if (($row = cache_get_data('moderator_group_info', 480)) == null)
sources/Load.php:if (!empty($modSettings['cache_enable']) && $modSettings['cache_enable'] >= 2 && ($temp = cache_get_data('theme_settings-' . $id_theme . ':' . $member, 60)) != null && time() - 60 > $modSettings['settings_updated'])
sources/Load.php:elseif (($temp = cache_get_data('theme_settings-' . $id_theme, 90)) != null && time() - 60 > $modSettings['settings_updated'])
sources/Load.php:if (($temp = cache_get_data($cache_name, 600)) !== null)
sources/Load.php:if (($temp = cache_get_data($cache_name, 600)) !== null)
sources/Load.php:if (($boards = cache_get_data('board_parents-' . $id_parent, 480)) === null)
sources/Load.php:if (!$use_cache || ($languages = cache_get_data('known_languages', !empty($modSettings['cache_enable']) && $modSettings['cache_enable'] < 1 ? 86400 : 3600)) == null)
sources/ext/bad-behavior/badbehavior-plugin.php:if (($bb2_blocked = cache_get_data('bb2_blocked', 900)) === null)
sources/controllers/Stats.controller.php:if (($context['gender'] = cache_get_data('stats_gender', 240)) == null)
sources/controllers/Stats.controller.php:if (($members = cache_get_data('stats_top_starters', 360)) == null)
sources/controllers/Stats.controller.php:$temp = cache_get_data('stats_total_time_members', 600);
sources/controllers/ModerationCenter.controller.php:if (($watched_users = cache_get_data('recent_user_watches', 240)) === null)
sources/controllers/ModerationCenter.controller.php:if (($moderator_notes_total = cache_get_data('moderator_notes_total', 240)) === null)
sources/controllers/ModerationCenter.controller.php:if ($offset != 0 || ($moderator_notes = cache_get_data('moderator_notes', 240)) === null)
sources/controllers/ModerationCenter.controller.php:if (($reported_posts = cache_get_data('reported_posts_' . $cachekey, 90)) === null)
sources/controllers/MoveTopic.controller.php:if (!isset($context['response_prefix']) && !($context['response_prefix'] = cache_get_data('response_prefix')))
sources/controllers/Display.controller.php:if (!isset($context['response_prefix']) && !($context['response_prefix'] = cache_get_data('response_prefix', 600)))
sources/controllers/Who.controller.php:if (($mods = cache_get_data('mods_credits', 86400)) === null)
sources/controllers/Post.controller.php:if (!isset($context['response_prefix']) && !($context['response_prefix'] = cache_get_data('response_prefix')))
sources/controllers/Post.controller.php:if (!isset($context['response_prefix']) && !($context['response_prefix'] = cache_get_data('response_prefix')))
sources/controllers/MergeTopics.controller.php:if (!isset($context['response_prefix']) && !($context['response_prefix'] = cache_get_data('response_prefix')))
sources/controllers/Recent.controller.php:if (empty($modSettings['cache_enable']) || ($messages = cache_get_data($key, 120)) == null)
sources/controllers/PersonalMessage.controller.php:if ($user_settings['new_pm'] || ($context['labels'] = cache_get_data('labelCounts:' . $user_info['id'], 720)) === null)
sources/controllers/PersonalMessage.controller.php:if (!isset($context['response_prefix']) && !($context['response_prefix'] = cache_get_data('response_prefix')))
sources/controllers/News.controller.php:$xml = cache_get_data('xmlfeed-' . $xml_format . ':' . ($user_info['is_guest'] ? '' : $user_info['id'] . '-') . $cachekey, 240);
sources/Subs.php:$postgroups = cache_get_data('updateStats:postgroups', 360);
sources/Subs.php:if (($temp = cache_get_data($cache_key, 240)) != null)
sources/Subs.php:if (($temp = cache_get_data('parsing_smileys', 480)) == null)
sources/Subs.php:if (($host = cache_get_data('hostlookup-' . $ip, 600)) !== null)
sources/Subs.php:if (($menu_buttons = cache_get_data('menu_buttons-' . implode('_', $user_info['groups']) . '-' . $user_info['language'], $cacheTime)) === null || time() - $cacheTime <= $modSettings['settings_updated'])
sources/Logging.php:$do_delete = cache_get_data('log_online-update', 30) < time() - 30;
sources/Errors.php:if (($temp = cache_get_data('db_last_error', 600)) !== null)
sources/Errors.php:if (($temp = cache_get_data('db_last_error', 600)) === null)
sources/database/Db-mysql.subs.php:if (function_exists('cache_get_data') && (!isset($modSettings['autoFixDatabase']) || $modSettings['autoFixDatabase'] == '1'))
sources/database/Db-mysql.subs.php:if (($temp = cache_get_data('db_last_error', 600)) !== null)
sources/database/Db-mysql.subs.php:if (($temp = cache_get_data('db_last_error', 600)) === null)
sources/subs/MembersOnline.subs.php:if (($temp = cache_get_data('membersOnlineStats-' . $membersOnlineOptions['sort'], 240)) !== null)
sources/subs/SearchAPI-Sphinxql.class.php:if (($cached_results = cache_get_data('searchql_results_' . md5($user_info['query_see_board'] . '_' . $context['params']))) === null)
sources/subs/SearchEngines.subs.php:if (($spider_data = cache_get_data('spider_search', 300)) === null)
sources/subs/Attachments.subs.php:if (($cache = cache_get_data('getAvatar_id-' . $id_attach)) !== null)
sources/subs/Attachments.subs.php:if (($temp = cache_get_data('url_image_size-' . md5($url), 240)) !== null)
sources/subs/Cache.subs.php:if (empty($modSettings['cache_enable']) || $modSettings['cache_enable'] < $level || !is_array($cache_block = cache_get_data($key, 3600)) || (!empty($cache_block['refresh_eval']) && eval($cache_block['refresh_eval'])) || (!empty($cache_block['expires']) && $cache_block['expires'] < time()))
sources/subs/Cache.subs.php:call_integration_hook('cache_get_data', array($key, $ttl, $value));
sources/subs/Sound.subs.php:if (($ip = cache_get_data('wave_file/' . $user_info['ip'], 20)) > 2 || ($ip2 = cache_get_data('wave_file/' . $user_info['ip2'], 20)) > 2)
sources/subs/Moderation.subs.php:$temp = cache_get_data('num_menu_errors', 900);
sources/subs/SearchAPI-Sphinx.class.php:if (($cached_results = cache_get_data('search_results_' . md5($user_info['query_see_board'] . '_' . $context['params']))) === null)
sources/subs/Editor.subs.php:if (($temp = cache_get_data('posting_icons-' . $board_id, 480)) == null)
sources/subs/Editor.subs.php:if (($temp = cache_get_data('posting_smileys', 480)) == null)
sources/subs/Editor.subs.php:if (($modSettings['question_id_cache'] = cache_get_data('verificationQuestionIds', 300)) == null)
sources/subs/PersonalMessage.subs.php:elseif (($context['message_limit'] = cache_get_data('msgLimit:' . $user_info['id'], 360)) === null)
For the moment that's all, lunch time...almost. :P
Bugs creator.
Features destroyer.
Template killer.

Re: cache

Reply #1

Absolutely not caching enough. I think a big issue is what gets cached and for how long. Caching needs a good overhaul.

Caching shouldn't just be to cache an entire query; it should be to cache the objects/resources that the query fetches. For instance, each message should be cached as a message instead of just the entire topic (or similar). Then use multi-gets (or emulate them for caches that don't have them). More fine-grained controls, so an admin could determine how long to cache certain groups of items, would help them tune for staleness and performance. I have been thinking about this a lot lately but it would require a lot of changes. We need groups, not levels.
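A minimal sketch of per-object caching with an emulated multi-get. The class `DemoObjectCache` and the `message-{id}` key shape are assumptions made for illustration; this is an in-memory stand-in, not ElkArte's cache API.

```php
<?php
// Hypothetical in-memory stand-in for a cache backend (not ElkArte's API).
class DemoObjectCache
{
	private $items = array();

	public function put($key, $value)
	{
		$this->items[$key] = $value;
	}

	// Emulated multi-get: one call, returns the hits and the keys that missed.
	public function getMulti(array $keys)
	{
		$found = array();
		$missing = array();
		foreach ($keys as $key)
		{
			if (array_key_exists($key, $this->items))
				$found[$key] = $this->items[$key];
			else
				$missing[] = $key;
		}
		return array($found, $missing);
	}
}

// Each message is cached under its own key rather than inside a topic blob.
$cache = new DemoObjectCache();
$cache->put('message-1', array('id' => 1, 'body' => 'first'));
$cache->put('message-3', array('id' => 3, 'body' => 'third'));

list ($found, $missing) = $cache->getMulti(array('message-1', 'message-2', 'message-3'));
// Only the keys listed in $missing would need a database query.
```

With this shape, a page that needs twenty messages costs one cache round trip plus one query for whatever missed, instead of one query for the whole topic.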

There are a bunch of topics about this in the dev (private) board, but here is one in particular: http://www.simplemachines.org/community/index.php?topic=268579

Re: cache

Reply #2

Quote: Caching shouldn't just be to cache an entire query

There are very few places in SMF where it caches just the query's output.

Quote: For instance, each message should be cached as a message instead of just the entire topic (or similar).

Last I checked it was pretty much done that way.

Quote: I think a big issue is what gets cached and for how long. Caching needs a good overhaul.

True enough. However, caching everything isn't really an answer, partly because you introduce lots of stale data points and partly because, if you try to cache too much, you can end up pushing stuff that's actually performance-critical out of the cache.

Re: cache

Reply #3

One thing I have always wondered (not having much experience of system administration) is what the average amount of cache available is.

The file cache is obviously bound to the space available.
But the other caches?
There is a limit (obviously, I would say), but are we talking about 1 MB? 10 MB? 100 MB? 1 GB?

Re: cache

Reply #4

The others are what they are configured to have. But remember that if you're using something like memcached, it's typically an entire server, and that server will be shared with other applications.

Re: cache

Reply #5

Arantor, stale data is in large part due to the lack of invalidation. Not everything needs to be invalidated constantly, but definitely major things. When you make a post, invalidate all of the keys that it relates to. Invalidating a cache is a lot less work than it is to validate a post or insert it in the database. Then, on top of that, you make the cached objects a lot more tunable via "groups" of cached items. Groups could be like settings, users, events, topics, posts, boards. I think every query to the database should be named. That way you allow plugins to rewrite, change, or just cache them.
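A rough sketch of invalidating on write, assuming the keys a post relates to can be derived from the post itself. The key names (`topic-`, `board-`, `member-`) and every function and class name here are hypothetical.

```php
<?php
// Hypothetical in-memory stand-in for a cache backend.
class DemoInvalidatingCache
{
	private $items = array();

	public function put($key, $value) { $this->items[$key] = $value; }
	public function get($key) { return isset($this->items[$key]) ? $this->items[$key] : null; }
	public function delete($key) { unset($this->items[$key]); }
}

// Assumption: these are the only keys a new post affects.
function demo_keys_for_post(array $post)
{
	return array(
		'topic-' . $post['id_topic'],
		'board-' . $post['id_board'],
		'member-' . $post['id_member'],
	);
}

// Delete every related key so the next read rebuilds it from the database.
function demo_invalidate_post(DemoInvalidatingCache $cache, array $post)
{
	foreach (demo_keys_for_post($post) as $key)
		$cache->delete($key);
}

$cache = new DemoInvalidatingCache();
$cache->put('topic-322', 'cached topic');
$cache->put('board-1', 'cached board');
$cache->put('board-2', 'another board');

demo_invalidate_post($cache, array('id_topic' => 322, 'id_board' => 1, 'id_member' => 33));
```

The sketch only works when the list of affected keys is computable without extra queries, which is exactly the part that depends on how the keys are laid out.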

Oh, and about that caching... https://github.com/elkarte/Elkarte/blob/master/sources/subs/Messages.subs.php#L90 https://github.com/elkarte/Elkarte/blob/master/sources/subs/Messages.subs.php#L36 https://github.com/elkarte/Elkarte/blob/master/sources/Load.php#L326 https://github.com/elkarte/Elkarte/blob/master/sources/controllers/Display.controller.php#L149 https://github.com/elkarte/Elkarte/blob/master/sources/controllers/Display.controller.php#L644 and I could keep going on and on.

Emanuele, the size is really irrelevant. If you have 100MB or 100GB, you fill it up and then evict items when it gets full. If you are constantly evicting items before you get to use them, that is obviously a bad thing. If that happens, you should probably either not be using caching or be caching less. If you're going with the cache-less scenario, an architecture that allows you to turn off what to cache would be the easiest solution.
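To make "fill it up and then evict" concrete, here is a small sketch that assumes a least-recently-used eviction policy (the post does not name one). `DemoLruCache` is a hypothetical class, and it counts entries rather than bytes.

```php
<?php
class DemoLruCache
{
	private $capacity;
	private $items = array();
	public $evictions = 0;

	public function __construct($capacity)
	{
		$this->capacity = $capacity;
	}

	public function get($key)
	{
		if (!array_key_exists($key, $this->items))
			return null;

		// Re-insert at the end so this key counts as most recently used.
		$value = $this->items[$key];
		unset($this->items[$key]);
		$this->items[$key] = $value;
		return $value;
	}

	public function put($key, $value)
	{
		unset($this->items[$key]);
		$this->items[$key] = $value;

		if (count($this->items) > $this->capacity)
		{
			// PHP arrays keep insertion order, so the first key is the least recently used.
			reset($this->items);
			unset($this->items[key($this->items)]);
			$this->evictions++;
		}
	}
}

$cache = new DemoLruCache(2);
$cache->put('a', 1);
$cache->put('b', 2);
$cache->get('a');     // 'a' is now the most recently used entry
$cache->put('c', 3);  // the cache is full, so 'b' is evicted
```

A counter like `$evictions` growing quickly relative to hits is the "evicting before you get to use them" situation described above.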

Just brainstorming on ways to do groups. Add another parameter (a lot of work), make the keys arrays (might make it harder to understand multi-get), create a list of groups with keys in them (a lot of work and slow), prefix keys with a group name and tokenize on that (a lot of work, but might be the safest and fastest). With the prefix idea, you might already have some of the groups laid out for you "member", "topic", "msg".
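A sketch of the prefix idea, assuming keys look like `{group}-{rest}` and that an admin-tunable table maps each group to a TTL. The table contents and function names are hypothetical.

```php
<?php
// Hypothetical per-group settings: TTL in seconds, 0 means "do not cache this group".
$demo_group_ttl = array('member' => 240, 'topic' => 120, 'msg' => 0);

// Tokenize the key on the first '-' to find its group.
function demo_group_of($key)
{
	$pos = strpos($key, '-');
	return $pos === false ? $key : substr($key, 0, $pos);
}

// Returns the TTL to use, or null when the group is disabled or unknown.
function demo_ttl_for($key, array $group_ttl)
{
	$group = demo_group_of($key);
	return !empty($group_ttl[$group]) ? $group_ttl[$group] : null;
}

$member_ttl = demo_ttl_for('member-33', $demo_group_ttl);
$msg_ttl = demo_ttl_for('msg-2107', $demo_group_ttl);
```

Existing keys such as `member_data-...` or `msg_topic-...` would tokenize to `member_data` and `msg_topic` under this rule, so the group names would have to be agreed on first.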

Re: cache

Reply #6

Oh, and for the size of the cache - if you are really interested in caching and you really need it for your forum to actually work, you should have a decent amount of RAM anyway. RAM is cheap compared to everything else that improves performance. So, if you can cache more without making everything look stale, cache more... cache a LOT more.

Re: cache

Reply #7

Quote: When you make a post, invalidate all of the keys that it relates to.

Not possible because in the current setup you actually can't know all the keys it will relate to without multiple database queries.

Also note that depending on what you're doing, that might almost be better off not being cached in the first place. Invalidating a cache is not free, especially on the higher end caches.

Quote: Then, on top of that, you make the cached objects a lot more tunable via "groups" of cached items.

That would certainly work better given the above.

Quote: Oh, and for the size of the cache - if you are really interested in caching and you really need it for your forum to actually work, you should have a decent amount of RAM anyway.

That's fine, but are the typical users of Elkarte actually running on VPSes with a ton of RAM and external caching facilities? The answer is no, they are likely not. In which case you may actually be hurting performance rather than helping it by forcing a disk cache for some things, simply because some stuff will be helped by the query cache, which shared users will have zero control over.

It's a tough one to call but a blanket statement of 'just cache as much as possible all the time' is actually somewhat naive. Mind you, it's a hell of a lot better than some of the optimisations I saw you suggest at sm.org (like optimising everything to pass everything around by reference)

I'm not saying it shouldn't be done, but it should be done carefully rather than enthusiastically.

Re: cache

Reply #8

I wouldn't say it isn't possible, but right now it might be more difficult. With refactoring and getting the keys that get set closer together (in the code), it will be much easier. The refactoring that these guys are doing right now with moving disjointed queries to functions is going to help with that tremendously. Now, just to put those queries in the functions that are being called. All of the reorganization makes it much easier to work with though.

In the bigger picture of what is going to take the longest, invalidating a cache is the least of the concerns when adding content. If we're talking about the number of views on a topic, "who's viewing", or "who's online", well, that is different and doesn't really matter (or shouldn't). I'm talking about adding content - events, posts, topics, members. I think the most complained about stale cache issue is when a person views a topic and then it is still marked as not read. That causes issues with the user so it should be invalidated. Instead of recaching the entire topic, you would only cache if the topic has been read (key: "topic-read:{id_member}.{id_topic}.{date-time}") or you could keep that as a set of the most recently read topics.

Forcing a disk cache? If they are on shared hosting and don't have a caching application, how are they going to do caching? I am not talking about caching to disk. If they are on shared hosting and have a caching mechanism (even a MEMORY table in MySQL) they are counted in this.

No idea what you're talking about the reference thing. I would be interested to read back on whatever I was talking about. I have changed a lot in what I consider optimizations. First thing I optimize these days is time to develop. My time is much more valuable than the cost of hardware in most cases.

Cache aggressively if you want a performant system... or don't if you don't want to. That's why I think groups is the best way forward.

Re: cache

Reply #9

Quote: Cache aggressively if you want a performant system... or don't if you don't want to. That's why I think groups is the best way forward.

I would generally agree with you. But if you're aggressively caching, you don't trigger cache expiry; the entry reaches its TTL and expires on its own. The whole point of caching as much as possible is to minimise direct queries, and implicitly stale data is part of that deal.
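A small sketch of TTL-only expiry, showing why stale reads are part of the deal when nothing invalidates explicitly. `DemoTtlCache` is hypothetical, and the clock is passed in as a parameter so the behaviour is easy to follow.

```php
<?php
class DemoTtlCache
{
	private $items = array();

	public function put($key, $value, $ttl, $now)
	{
		$this->items[$key] = array('value' => $value, 'expires' => $now + $ttl);
	}

	// Returns null once the entry has reached its TTL.
	public function get($key, $now)
	{
		if (!isset($this->items[$key]) || $this->items[$key]['expires'] <= $now)
			return null;

		return $this->items[$key]['value'];
	}
}

$cache = new DemoTtlCache();
$cache->put('board-1', 'topic list v1', 120, 1000);

// Even if the board changed at t=1030, the old list is still served at t=1060.
$stale = $cache->get('board-1', 1060);

// At t=1120 the TTL is reached and the next read has to go to the database.
$gone = $cache->get('board-1', 1120);
```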

Quote: I think the most complained about stale cache issue is when a person views a topic and then it is still marked as not read. That causes issues with the user so it should be invalidated. Instead of recaching the entire topic, you would only cache if the topic has been read (key: "topic-read:{id_member}.{id_topic}.{date-time}") or you could keep that as a set of the most recently read topics.

Doesn't work like that. In fact it's never worked like that. If it's still being marked read, it's because the entire list of topics is still being cached. Not the topic itself.

In the worst complained about cases, it's the list of topics on the board index. Not the message index or anywhere else. Though they have their own caching issues to contend with if they're cached because you have absolutely no way to ensure you invalidate caching properly, because in some cases you're going to need to be invalidating keys based on user preferences. Which means unless you load the user preferences...

Quote: No idea what you're talking about the reference thing. I would be interested to read back on whatever I was talking about. I have changed a lot in what I consider optimizations. First thing I optimize these days is time to develop. My time is much more valuable than the cost of hardware in most cases.

You even posted it as a 600-odd KB patch to Mantis.

Re: cache

Reply #10

Sidebar: the hovering message controls don't always show up. Anyone else notice that? Also, I really wish I could have a highlighter that quotes stuff.

Okay, board notifications would be another place where the cache would need to be updated. Say we had a group named "topic-view-notifications", you could enable or disable it for that. You could quickly and easily delete/set/get all of these keys with delete/set/get multi functions in memcached. Say they are using APC, then the possible network overhead becomes irrelevant and all it is doing is checking RAM locally (10ns). I am a little confused what you mean by loading user preferences. If the user is viewing the page, their preferences are loaded, right?

I am not saying you are wrong. I agree that getting stale data is bad. I agree that caching without using the cache is bad. The hardest part about the entire idea of caching more is making it tunable. With an interface to make fine-grained tuning, you have the ability to weigh what you can consider stale and for how long. The hardest part about the tuning interface is deciding what is too much control and what is too little.

Re: cache

Reply #11

The thing is, you're presuming that users have a decent cache mechanism, but studying the sm.org support boards would show that the majority of users do not have such luxuries to play with. That's one of the advantages of the level system: level 2 and up shouldn't be handled if the file cache is in play.

As far as user preferences, sure, the current user's preferences are loaded. But that's actually irrelevant. The very best example I have for you on this subject is something in Wedge - we need to get the likes per post, and currently they're not cached anywhere because it would really suck if you were to hit the like button and it be cached so you wouldn't see it.

Here's the preferences issue coming into play: the number of posts per page is a user preference as is the order of posts. Thus you cannot safely cache (or flush the cache) of that. You can't even say 'flush the cache of likes for topic 1, starting from 0 posts' because in some cases that's going to flush 10 posts' worth, sometimes it'll be 15 posts. Unless everyone has the same values, you can't guarantee that what values you have are the same values that everyone else has which means you're not going to be able to clear the cache properly, though in that case it's not so bad because we can show the current user the right values, but there's going to be all kinds of edge cases for that stale data.
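The problem described above can be sketched like this, assuming a cache that stores likes per rendered page. The key scheme and the function `demo_likes_page_key()` are hypothetical, not taken from Wedge or ElkArte.

```php
<?php
// Hypothetical key scheme: one cache entry per (topic, start offset, posts per page).
function demo_likes_page_key($id_topic, $start, $per_page)
{
	return 'likes-' . $id_topic . '-' . $start . '-' . $per_page;
}

$key_user_a = demo_likes_page_key(1, 0, 10);  // user A shows 10 posts per page
$key_user_b = demo_likes_page_key(1, 0, 15);  // user B shows 15 posts per page

$cache = array(
	$key_user_a => 'likes for posts 1-10',
	$key_user_b => 'likes for posts 1-15',
);

// User A likes a post. Only A's own page key is known at that point,
// so only that entry gets flushed.
unset($cache[$key_user_a]);

// User B's entry survives and now holds stale like counts.
$b_still_cached = isset($cache[$key_user_b]);
```

Flushing every possible combination of offset and page size would fix the staleness, but the set of combinations in use is not known without loading other users' preferences.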

Re: cache

Reply #12

The level system can somewhat remain. Though going too far with that makes the entire thing complex. So, maybe not even a level system but "enable advanced cache controls (not recommended for your forum)". Dunno how to handle users that refuse to read or do things and don't realize the consequences. If you enable caching and see less performance, you should disable it. That kind of makes sense to me but users don't always think.

"the number of posts per page is a user preference as is the order of posts. Thus you cannot safely cache (or flush the cache) of that" What? Easy key: "member-pref-23". Why would you need to invalidate their preferences without them visiting the site? If you want to flush the cache, flush the entire thing or invalidate all of the keys by changing the salt (which could be done by group).
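A sketch of invalidating by changing the salt, per group. `DemoSaltedCache` is a hypothetical in-memory class; with a real backend the salt itself would have to live somewhere shared (the cache or the settings table), which is an assumption this sketch glosses over.

```php
<?php
class DemoSaltedCache
{
	private $items = array();
	private $salts = array();

	// The salt is baked into the real key, so a new salt means a new key space.
	private function realKey($group, $key)
	{
		$salt = isset($this->salts[$group]) ? $this->salts[$group] : 0;
		return $group . ':' . $salt . ':' . $key;
	}

	public function put($group, $key, $value)
	{
		$this->items[$this->realKey($group, $key)] = $value;
	}

	public function get($group, $key)
	{
		$real = $this->realKey($group, $key);
		return isset($this->items[$real]) ? $this->items[$real] : null;
	}

	// Changing the salt makes every old key in the group unreachable.
	public function invalidateGroup($group)
	{
		$this->salts[$group] = (isset($this->salts[$group]) ? $this->salts[$group] : 0) + 1;
	}
}

$cache = new DemoSaltedCache();
$cache->put('member', 'pref-23', array('per_page' => 15));
$cache->put('topic', '322', 'cached topic');

$cache->invalidateGroup('member');
```

The old entries are not deleted; they simply stop being addressed and are left to expire or be evicted, which is why this is cheap even for a large group.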

Maybe you are thinking that you would create a list of messages for the user before they viewed the page? No. The messages get cached individually. If we were to cache the topic we'd cache the list of message ids (not the message) in one key "topic-1235" => {id, info, msg_count, messages: [1,3,66,33]}. Then use array_splice() to get the page and use cache->get($spliced_array) to get all of the messages on the page. When you loaded ?topic=322 you would load (not all inclusive): settings, member-33, membergroups, topic-322, boards, board-1, message-[2107,2106,2105,2103,2102,etc] (that is a list of message-{id}), member-[26,2], topic-seen-33-322, board-seen-33-1 and some more.
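A sketch of that lookup, using the key shapes from the post. The helper `demo_page_of_messages()` and the plain array standing in for the cache are assumptions; the sketch uses `array_slice()` rather than `array_splice()` so the cached id list is left unmodified.

```php
<?php
// Hypothetical cached data: one id list per topic, one entry per message.
$cache = array(
	'topic-1235' => array('id' => 1235, 'msg_count' => 4, 'messages' => array(1, 3, 66, 33)),
	'message-1' => 'first',
	'message-3' => 'second',
	'message-66' => 'third',
	'message-33' => 'fourth',
);

function demo_page_of_messages(array $cache, $id_topic, $start, $per_page)
{
	$topic = $cache['topic-' . $id_topic];

	// The user's preferences only decide which slice of the shared id list is read.
	$ids = array_slice($topic['messages'], $start, $per_page);

	$page = array();
	foreach ($ids as $id)  // stands in for a single multi-get call
		$page[$id] = isset($cache['message-' . $id]) ? $cache['message-' . $id] : null;

	return $page;
}

$page_a = demo_page_of_messages($cache, 1235, 0, 2);  // a user showing 2 posts per page
$page_b = demo_page_of_messages($cache, 1235, 0, 3);  // a user showing 3 posts per page
```

Both users read from the same `topic-1235` and `message-{id}` entries, so nothing in the cache is keyed by page size.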

Re: cache

Reply #13

Quote: Why would you need to invalidate their preferences without them visiting the site? If you want to flush the cache, flush the entire thing or invalidate all of the keys by changing the salt (which could be done by group).

Irrelevant to what I was saying.

Quote: Maybe you are thinking that you would create a list of messages for the user before they viewed the page?

Half way there.

Quote: message-[2107,2106,2105,2103,2102,etc] (that is a list of message-{id})

How do you invalidate that cache? This is my point: you CAN'T.

The list of message ids that a user will see will be determined by their preferences. So if you update something on one of those, you can only invalidate the cache on that list of messages because that's all you *have*.

So either you nuke the entire topic's worth of message data, or you're guaranteed to have stale data.

One user might only see messages 1, 2, 3, 4 and 5. Another might see 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. If you do an operation as the first user, the second user still has stale data! That's what I mean about member preferences being a factor: it means there's a reasonable chance of not having the complete (or correct) list of ids to be able to invalidate.

Re: cache

Reply #14

Okay, I see you are missing what I am saying. You don't cache message-{mem}-{msg}. You cache the whole list of messages for a topic. What the user is seeing is irrelevant to what gets cached. When the topic is changed, the topic gets updated. If a message is changed the message gets updated. So, the only time you are changing that list is when you add to or take away messages from that topic. Even if there are a lot of messages being added, creating that list isn't a big operation. An added benefit is that databases are hard to scale out. PHP & memcached are really easy to scale out. So you push more processing (but not by much) to PHP and get more from memcached.
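To show which keys change on which operation, here is a minimal sketch with a plain array standing in for the cache. The function names are hypothetical and the key shapes follow the earlier posts.

```php
<?php
$cache = array(
	'topic-322' => array('messages' => array(2102, 2103)),
	'message-2102' => 'old text',
	'message-2103' => 'reply',
);

// Editing a message touches only that message's key.
function demo_edit_message(array &$cache, $id_msg, $body)
{
	$cache['message-' . $id_msg] = $body;
}

// Adding a message touches the new message key and the topic's id list.
function demo_add_message(array &$cache, $id_topic, $id_msg, $body)
{
	$cache['message-' . $id_msg] = $body;
	$cache['topic-' . $id_topic]['messages'][] = $id_msg;
}

demo_edit_message($cache, 2102, 'new text');
demo_add_message($cache, 322, 2105, 'another reply');
```

Neither operation needs to know how many posts per page any user displays, because no cached entry is keyed by a user's view of the topic.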

You don't need to create anything before the page is viewed. If your settings are to go back to the topic after posting, you might want to cache that data on the post page. Otherwise, it will get cached when you visit the page. Not a big deal either way though. You aren't caching to save 1 query, you cache to save many operations.

Do you understand invalidating a key now? If not, I will break it down to code level but that will probably be much later. Or, go on IRC and we can talk through it.