ElkArte Community

Elk Development => Feature Discussion => Topic started by: emanuele on April 22, 2013, 05:29:28 am

Title: cache
Post by: emanuele on April 22, 2013, 05:29:28 am
Reading Kays' and Arantor's (http://www.simplemachines.org/community/index.php?topic=498146.msg3530698#msg3530698) discussion on caching the menu, and following up on my own doubts about the cache, I have a couple of questions:
are we caching enough?
are we caching meaningful things?
The menu doesn't appear to be the most important thing to cache.

This is the list of things we are currently caching:
sources/Load.php:if (($modSettings = cache_get_data('modSettings', 90)) == null)
sources/Load.php:if (($modSettings['load_average'] = cache_get_data('loadavg', 90)) == null)
sources/Load.php:if (empty($modSettings['cache_enable']) || $modSettings['cache_enable'] < 2 || ($user_settings = cache_get_data('user_settings-' . $id_member, 60)) == null)
sources/Load.php:if (ELKARTE != 'SSI' && !isset($_REQUEST['xml']) && (!isset($_REQUEST['action']) || $_REQUEST['action'] != '.xml') && empty($_SESSION['id_msg_last_visit']) && (empty($modSettings['cache_enable']) || ($_SESSION['id_msg_last_visit'] = cache_get_data('user_last_visit-' . $id_member, 5 * 3600)) === null))
sources/Load.php:if (($topic = cache_get_data('msg_topic-' . $_REQUEST['msg'], 120)) === NULL)
sources/Load.php:$temp = cache_get_data('topic_board-' . $topic, 120);
sources/Load.php:$temp = cache_get_data('board-' . $board, 120);
sources/Load.php:if ($modSettings['cache_enable'] >= 2 && !empty($board) && ($temp = cache_get_data('permissions:' . $cache_groups . ':' . $board, 240)) != null && time() - 240 > $modSettings['settings_updated'])
sources/Load.php:elseif (($temp = cache_get_data('permissions:' . $cache_groups, 240)) != null && time() - 240 > $modSettings['settings_updated'])
sources/Load.php:$data = cache_get_data('member_data-' . $set . '-' . $users[$i], 240);
sources/Load.php:if (($row = cache_get_data('moderator_group_info', 480)) == null)
sources/Load.php:if (!empty($modSettings['cache_enable']) && $modSettings['cache_enable'] >= 2 && ($temp = cache_get_data('theme_settings-' . $id_theme . ':' . $member, 60)) != null && time() - 60 > $modSettings['settings_updated'])
sources/Load.php:elseif (($temp = cache_get_data('theme_settings-' . $id_theme, 90)) != null && time() - 60 > $modSettings['settings_updated'])
sources/Load.php:if (($temp = cache_get_data($cache_name, 600)) !== null)
sources/Load.php:if (($temp = cache_get_data($cache_name, 600)) !== null)
sources/Load.php:if (($boards = cache_get_data('board_parents-' . $id_parent, 480)) === null)
sources/Load.php:if (!$use_cache || ($languages = cache_get_data('known_languages', !empty($modSettings['cache_enable']) && $modSettings['cache_enable'] < 1 ? 86400 : 3600)) == null)
sources/ext/bad-behavior/badbehavior-plugin.php:if (($bb2_blocked = cache_get_data('bb2_blocked', 900)) === null)
sources/controllers/Stats.controller.php:if (($context['gender'] = cache_get_data('stats_gender', 240)) == null)
sources/controllers/Stats.controller.php:if (($members = cache_get_data('stats_top_starters', 360)) == null)
sources/controllers/Stats.controller.php:$temp = cache_get_data('stats_total_time_members', 600);
sources/controllers/ModerationCenter.controller.php:if (($watched_users = cache_get_data('recent_user_watches', 240)) === null)
sources/controllers/ModerationCenter.controller.php:if (($moderator_notes_total = cache_get_data('moderator_notes_total', 240)) === null)
sources/controllers/ModerationCenter.controller.php:if ($offset != 0 || ($moderator_notes = cache_get_data('moderator_notes', 240)) === null)
sources/controllers/ModerationCenter.controller.php:if (($reported_posts = cache_get_data('reported_posts_' . $cachekey, 90)) === null)
sources/controllers/MoveTopic.controller.php:if (!isset($context['response_prefix']) && !($context['response_prefix'] = cache_get_data('response_prefix')))
sources/controllers/Display.controller.php:if (!isset($context['response_prefix']) && !($context['response_prefix'] = cache_get_data('response_prefix', 600)))
sources/controllers/Who.controller.php:if (($mods = cache_get_data('mods_credits', 86400)) === null)
sources/controllers/Post.controller.php:if (!isset($context['response_prefix']) && !($context['response_prefix'] = cache_get_data('response_prefix')))
sources/controllers/Post.controller.php:if (!isset($context['response_prefix']) && !($context['response_prefix'] = cache_get_data('response_prefix')))
sources/controllers/MergeTopics.controller.php:if (!isset($context['response_prefix']) && !($context['response_prefix'] = cache_get_data('response_prefix')))
sources/controllers/Recent.controller.php:if (empty($modSettings['cache_enable']) || ($messages = cache_get_data($key, 120)) == null)
sources/controllers/PersonalMessage.controller.php:if ($user_settings['new_pm'] || ($context['labels'] = cache_get_data('labelCounts:' . $user_info['id'], 720)) === null)
sources/controllers/PersonalMessage.controller.php:if (!isset($context['response_prefix']) && !($context['response_prefix'] = cache_get_data('response_prefix')))
sources/controllers/News.controller.php:$xml = cache_get_data('xmlfeed-' . $xml_format . ':' . ($user_info['is_guest'] ? '' : $user_info['id'] . '-') . $cachekey, 240);
sources/Subs.php:$postgroups = cache_get_data('updateStats:postgroups', 360);
sources/Subs.php:if (($temp = cache_get_data($cache_key, 240)) != null)
sources/Subs.php:if (($temp = cache_get_data('parsing_smileys', 480)) == null)
sources/Subs.php:if (($host = cache_get_data('hostlookup-' . $ip, 600)) !== null)
sources/Subs.php:if (($menu_buttons = cache_get_data('menu_buttons-' . implode('_', $user_info['groups']) . '-' . $user_info['language'], $cacheTime)) === null || time() - $cacheTime <= $modSettings['settings_updated'])
sources/Logging.php:$do_delete = cache_get_data('log_online-update', 30) < time() - 30;
sources/Errors.php:if (($temp = cache_get_data('db_last_error', 600)) !== null)
sources/Errors.php:if (($temp = cache_get_data('db_last_error', 600)) === null)
sources/database/Db-mysql.subs.php:if (function_exists('cache_get_data') && (!isset($modSettings['autoFixDatabase']) || $modSettings['autoFixDatabase'] == '1'))
sources/database/Db-mysql.subs.php:if (($temp = cache_get_data('db_last_error', 600)) !== null)
sources/database/Db-mysql.subs.php:if (($temp = cache_get_data('db_last_error', 600)) === null)
sources/subs/MembersOnline.subs.php:if (($temp = cache_get_data('membersOnlineStats-' . $membersOnlineOptions['sort'], 240)) !== null)
sources/subs/SearchAPI-Sphinxql.class.php:if (($cached_results = cache_get_data('searchql_results_' . md5($user_info['query_see_board'] . '_' . $context['params']))) === null)
sources/subs/SearchEngines.subs.php:if (($spider_data = cache_get_data('spider_search', 300)) === null)
sources/subs/Attachments.subs.php:if (($cache = cache_get_data('getAvatar_id-' . $id_attach)) !== null)
sources/subs/Attachments.subs.php:if (($temp = cache_get_data('url_image_size-' . md5($url), 240)) !== null)
sources/subs/Cache.subs.php:if (empty($modSettings['cache_enable']) || $modSettings['cache_enable'] < $level || !is_array($cache_block = cache_get_data($key, 3600)) || (!empty($cache_block['refresh_eval']) && eval($cache_block['refresh_eval'])) || (!empty($cache_block['expires']) && $cache_block['expires'] < time()))
sources/subs/Cache.subs.php:call_integration_hook('cache_get_data', array($key, $ttl, $value));
sources/subs/Sound.subs.php:if (($ip = cache_get_data('wave_file/' . $user_info['ip'], 20)) > 2 || ($ip2 = cache_get_data('wave_file/' . $user_info['ip2'], 20)) > 2)
sources/subs/Moderation.subs.php:$temp = cache_get_data('num_menu_errors', 900);
sources/subs/SearchAPI-Sphinx.class.php:if (($cached_results = cache_get_data('search_results_' . md5($user_info['query_see_board'] . '_' . $context['params']))) === null)
sources/subs/Editor.subs.php:if (($temp = cache_get_data('posting_icons-' . $board_id, 480)) == null)
sources/subs/Editor.subs.php:if (($temp = cache_get_data('posting_smileys', 480)) == null)
sources/subs/Editor.subs.php:if (($modSettings['question_id_cache'] = cache_get_data('verificationQuestionIds', 300)) == null)
sources/subs/PersonalMessage.subs.php:elseif (($context['message_limit'] = cache_get_data('msgLimit:' . $user_info['id'], 360)) === null)
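Almost all of them follow the same read-through pattern, something like this (just a sketch; loadBoardData() here is a made-up placeholder for whatever query sits behind the cache):
[php]
// Read-through pattern: try the cache, fall back to the source on a miss,
// then store the fresh value for the next request.
if (($board_info = cache_get_data('board-' . $board, 120)) === null)
{
	// Made-up placeholder for the real query that builds the data.
	$board_info = loadBoardData($board);
	cache_put_data('board-' . $board, $board_info, 120);
}
[/php]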
For the moment that's all, lunch time...almost. :P
Title: Re: cache
Post by: Joshua Dickerson on April 24, 2013, 02:20:26 pm
Absolutely not caching enough. I think a big issue is what gets cached and for how long. Caching needs a good overhaul.

Caching shouldn't just be to cache an entire query; it should be to cache the objects/resources that the query fetches. For instance, each message should be cached as a message instead of just the entire topic (or similar). Then use multi-gets (or emulate them for caches that don't have them). More fine-grained controls, so an admin could determine how long to cache certain groups of items, would help them tune for staleness and performance. I have been thinking about this a lot lately, but it would require a lot of changes. We need groups, not levels.
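Roughly what I have in mind (the helper names here are made up; the multi-get is emulated on top of the existing cache_get_data() for backends that lack a native one):
[php]
// Hypothetical emulation of a multi-get for cache backends without one.
function cache_get_multi(array $keys, $ttl = 120)
{
	$results = array();
	foreach ($keys as $key)
		$results[$key] = cache_get_data($key, $ttl);

	return $results;
}

// Fetch each message under its own key instead of one big per-topic blob.
$keys = array();
foreach ($message_ids as $id)
	$keys[] = 'msg-' . $id;

$messages = cache_get_multi($keys, 240);
[/php]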

There are a bunch of topics about this in the dev (private) board, but here is one in particular: http://www.simplemachines.org/community/index.php?topic=268579
Title: Re: cache
Post by: Arantor on April 24, 2013, 03:23:01 pm
Quote: Caching shouldn't just be to cache an entire query

There are very few places in SMF where it caches just the query's output.

Quote: For instance, each message should be cached as a message instead of just the entire topic (or similar).

Last I checked it was pretty much done that way.

Quote: I think a big issue is what gets cached and for how long. Caching needs a good overhaul.

True enough. However, caching everything isn't really an answer: partly because you introduce lots of stale data points, and partly because if you try to cache too much you can end up pushing stuff that's actually performance-critical out of the cache.
Title: Re: cache
Post by: emanuele on April 24, 2013, 06:16:12 pm
One thing I've always wondered (not having much experience with system administration) is what the average amount of cache available actually is.

The file cache is obviously bound to the space available.
But the other caches?
There is a limit (obviously, I would say), but are we talking about 1 MB? 10 MB? 100 MB? 1 GB?
Title: Re: cache
Post by: Arantor on April 24, 2013, 08:45:03 pm
The others are what they are configured to have. But remember that if you're using something like memcached, it's typically an entire server, and that server will be shared with other applications.
Title: Re: cache
Post by: Joshua Dickerson on April 26, 2013, 06:09:55 pm
Arantor, stale data is largely down to the lack of invalidation. Not everything needs to be invalidated constantly, but major things definitely do. When you make a post, invalidate all of the keys that it relates to. Invalidating a cache is a lot less work than validating a post or inserting it into the database. Then, on top of that, you make the cached objects a lot more tunable via "groups" of cached items. Groups could be things like settings, users, events, topics, posts, and boards. I think every query to the database should be named. That way you allow plugins to rewrite, change, or just cache them.

Oh, and about that caching... https://github.com/elkarte/Elkarte/blob/master/sources/subs/Messages.subs.php#L90 https://github.com/elkarte/Elkarte/blob/master/sources/subs/Messages.subs.php#L36 https://github.com/elkarte/Elkarte/blob/master/sources/Load.php#L326 https://github.com/elkarte/Elkarte/blob/master/sources/controllers/Display.controller.php#L149 https://github.com/elkarte/Elkarte/blob/master/sources/controllers/Display.controller.php#L644 and I could keep going on and on.

Emanuele, the size is really irrelevant. Whether you have 100 MB or 100 GB, you fill it up and then evict items when it gets full. If you are constantly evicting items before you get to use them, that is obviously a bad thing; if that happens, you should probably not be using caching, or cache less. If you're going with the 'cache less' scenario, an architecture that allows you to turn off what to cache would be the easiest solution.

Just brainstorming ways to do groups: add another parameter (a lot of work); make the keys arrays (might make multi-get harder to understand); create a list of groups with keys in them (a lot of work, and slow); or prefix keys with a group name and tokenize on that (a lot of work, but probably the safest and fastest). With the prefix idea, you might already have some of the groups laid out for you: "member", "topic", "msg".
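To illustrate the prefix idea, a thin wrapper over the existing functions could look something like this (group names and TTLs are only examples):
[php]
// Hypothetical wrapper: every key carries its group as a prefix, and each
// group gets its own admin-tunable lifetime.
function cache_get_grouped($group, $key)
{
	// Example per-group TTLs an admin could tune; the values are made up.
	$group_ttl = array(
		'member' => 240,
		'topic' => 120,
		'msg' => 600,
	);

	$ttl = isset($group_ttl[$group]) ? $group_ttl[$group] : 120;

	return cache_get_data($group . ':' . $key, $ttl);
}

// e.g. cache_get_grouped('msg', 2107) reads the key 'msg:2107'.
[/php]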
Title: Re: cache
Post by: Joshua Dickerson on April 26, 2013, 06:12:20 pm
Oh, and for the size of the cache - if you are really interested in caching and you really need it for your forum to actually work, you should have a decent amount of RAM anyway. RAM is cheap compared to everything else that improves performance. So, if you can cache more without making everything look stale, cache more... cache a LOT more.
Title: Re: cache
Post by: Arantor on April 26, 2013, 06:46:41 pm
Quote: When you make a post, invalidate all of the keys that it relates to.

Not possible because in the current setup you actually can't know all the keys it will relate to without multiple database queries.

Also note that depending on what you're doing, that might almost be better off not being cached in the first place. Invalidating a cache is not free, especially on the higher end caches.

Quote: Then, on top of that, you make the cached objects a lot more tunable via "groups" of cached items.

That would certainly work better given the above.

Quote: Oh, and for the size of the cache - if you are really interested in caching and you really need it for your forum to actually work, you should have a decent amount of RAM anyway.

That's fine, but are the typical users of Elkarte actually running on VPSes with a ton of RAM and external caching facilities? The answer is no, they likely are not. In which case you may actually be hurting performance rather than helping it by forcing a disk cache for some things, simply because some of that stuff would be helped by the query cache, which shared users have zero control over.

It's a tough one to call, but a blanket statement of 'just cache as much as possible all the time' is actually somewhat naive. Mind you, it's a hell of a lot better than some of the optimisations I saw you suggest at sm.org (like optimising everything to pass everything around by reference).

I'm not saying it shouldn't be done, but it should be done carefully rather than enthusiastically.
Title: Re: cache
Post by: Joshua Dickerson on April 27, 2013, 05:44:13 am
I wouldn't say it isn't possible, but right now it might be more difficult. With refactoring, and getting the keys that get set closer together in the code, it will be much easier. The refactoring these guys are doing right now, moving disjointed queries into functions, is going to help with that tremendously. Now it's just a matter of putting those queries in the functions that are being called. All of the reorganization makes it much easier to work with, though.

In the bigger picture of what is going to take the longest, invalidating a cache is the least of the concerns when adding content. If we're talking about the number of views on a topic, "who's viewing", or "who's online", well, that is different and doesn't really matter (or shouldn't). I'm talking about adding content - events, posts, topics, members. I think the most complained about stale cache issue is when a person views a topic and then it is still marked as not read. That causes issues with the user so it should be invalidated. Instead of recaching the entire topic, you would only cache if the topic has been read (key: "topic-read:{id_member}.{id_topic}.{date-time}") or you could keep that as a set of the most recently read topics.
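A tiny sketch of that read marker, simplified to member + topic (the {date-time} from the key above becomes the cached value here):
[php]
// Record "member X has read topic Y up to now" as its own small cache entry
// instead of recaching the whole topic. Simplified variant of the key above.
cache_put_data('topic-read:' . $user_info['id'] . '.' . $topic, time(), 720);

// Later, when deciding whether to show the unread indicator
// ($last_post_time is assumed to hold the topic's latest post time):
$last_read = cache_get_data('topic-read:' . $user_info['id'] . '.' . $topic, 720);
$is_unread = ($last_read === null || $last_read < $last_post_time);
[/php]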

Forcing a disk cache? If they are on shared hosting and don't have a caching application, how are they going to do caching? I am not talking about caching to disk. If they are on shared hosting and have a caching mechanism (even a MEMORY table in MySQL) they are counted in this.

No idea what you're talking about the reference thing. I would be interested to read back on whatever I was talking about. I have changed a lot in what I consider optimizations. First thing I optimize these days is time to develop. My time is much more valuable than the cost of hardware in most cases.

Cache aggressively if you want a performant system... or don't if you don't want to. That's why I think groups is the best way forward.
Title: Re: cache
Post by: Arantor on April 27, 2013, 11:48:40 am
Quote: Cache aggressively if you want a performant system... or don't if you don't want to. That's why I think groups is the best way forward.

I would generally agree with you. But if you're aggressively caching, you don't trigger cache expiry; the data reaches its TTL and expires naturally. The whole point of caching as much as possible is to minimise direct queries, and implicitly, stale data is part of that deal.

Quote: I think the most complained about stale cache issue is when a person views a topic and then it is still marked as not read. That causes issues with the user so it should be invalidated. Instead of recaching the entire topic, you would only cache if the topic has been read (key: "topic-read:{id_member}.{id_topic}.{date-time}") or you could keep that as a set of the most recently read topics.

Doesn't work like that. In fact it's never worked like that. If it's still being marked read, it's because the entire list of topics is still being cached. Not the topic itself.

In the worst complained about cases, it's the list of topics on the board index, not the message index or anywhere else. Though those have their own caching issues to contend with if they're cached, because you have absolutely no way to ensure you invalidate the cache properly: in some cases you're going to need to invalidate keys based on user preferences. Which means unless you load the user preferences...

Quote: No idea what you're talking about the reference thing. I would be interested to read back on whatever I was talking about. I have changed a lot in what I consider optimizations. First thing I optimize these days is time to develop. My time is much more valuable than the cost of hardware in most cases.

You even posted it as a 600-odd KB patch to Mantis.
Title: Re: cache
Post by: Joshua Dickerson on April 27, 2013, 03:36:44 pm
Sidebar: the hovering message controls don't always show up. Anyone else notice that? Also, I really wish I could have a highlighter that quotes stuff.

Okay, board notifications would be another place where the cache would need to be updated. Say we had a group named "topic-view-notifications"; you could enable or disable caching for just that. You could quickly and easily delete/set/get all of these keys with the delete/set/get multi functions in memcached. If they are using APC, the possible network overhead becomes irrelevant and all it is doing is checking RAM locally (~10ns). I am a little confused about what you mean by loading user preferences. If the user is viewing the page, their preferences are loaded, right?
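For reference, with the PHP Memcached extension that multi-key fetch is a single call (illustrative keys only):
[php]
// A whole page's worth of keys comes back in one round trip
// instead of one request per key.
$memcached = new Memcached();
$memcached->addServer('localhost', 11211);
$values = $memcached->getMulti(array('msg-2107', 'msg-2106', 'msg-2105'));
[/php]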

I am not saying you are wrong. I agree that getting stale data is bad. I agree that caching without using the cache is bad. The hardest part about the entire idea of caching more is making it tunable. With an interface for fine-grained tuning, you have the ability to weigh what you consider acceptably stale and for how long. The hardest part about the tuning interface is deciding what is too much control and what is too little.
Title: Re: cache
Post by: Arantor on April 27, 2013, 03:50:51 pm
The thing is, you're presuming that users have a decent cache mechanism, but studying the sm.org support boards would show that the majority of users do not have such luxuries to play with. That's one of the advantages of the level system: level 2 and up shouldn't be handled if the file cache is in play.

As far as user preferences go, sure, the current user's preferences are loaded. But that's actually irrelevant. The very best example I have for you on this subject is something in Wedge: we need to get the likes per post, and currently they're not cached anywhere because it would really suck if you were to hit the like button and have the old value stay cached so you wouldn't see it.

Here's the preferences issue coming into play: the number of posts per page is a user preference as is the order of posts. Thus you cannot safely cache (or flush the cache) of that. You can't even say 'flush the cache of likes for topic 1, starting from 0 posts', because in some cases that's going to flush 10 posts' worth and sometimes it'll be 15 posts. Unless everyone has the same values, you can't guarantee that the values you have are the same values that everyone else has, which means you're not going to be able to clear the cache properly. In that case it's not so bad, because we can show the current user the right values, but there are going to be all kinds of edge cases for that stale data.
Title: Re: cache
Post by: Joshua Dickerson on April 27, 2013, 04:25:40 pm
The level system can somewhat remain, though going too far with that makes the entire thing complex. So maybe not even a level system, but an "enable advanced cache controls (not recommended for your forum)" option. Dunno how to handle users who refuse to read or do things and don't realize the consequences. If you enable caching and see less performance, you should disable it. That kind of makes sense to me, but users don't always think.

"the number of posts per page is a user preference as is the order of posts. Thus you cannot safely cache (or flush the cache) of that" What? Easy key: "member-pref-23". Why would you need to invalidate their preferences without them visiting the site? If you want to flush the cache, flush the entire thing or invalidate all of the keys by changing the salt (which could be done by group).

Maybe you are thinking that you would create a list of messages for the user before they viewed the page? No. The messages get cached individually. If we were to cache the topic we'd cache the list of message ids (not the message) in one key "topic-1235" => {id, info, msg_count, messages: [1,3,66,33]}. Then use array_splice() to get the page and use cache->get($spliced_array) to get all of the topics on the page. When you loaded ?topic=322 you would load (not all inclusive): settings, member-33, membergroups, topic-322, boards, board-1, message-[2107,2106,2105,2103,2102,etc] (that is a list of message-{id}), member-[26,2], topic-seen-33-322, board-seen-33-1 and some more.
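Roughly, in code (using array_slice() so the cached list itself isn't modified, plus the cache_get_multi() emulation sketched earlier in the thread):
[php]
// One key per topic holding meta info plus the ordered list of message ids.
$topic_data = cache_get_data('topic-' . $topic, 120);

// Pull out just the ids for the requested page...
$page_ids = array_slice($topic_data['messages'], $start, $messages_per_page);

// ...then multi-get the individual message objects for that page.
$keys = array();
foreach ($page_ids as $id)
	$keys[] = 'message-' . $id;

$page_messages = cache_get_multi($keys, 240);
[/php]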
Title: Re: cache
Post by: Arantor on April 27, 2013, 04:56:55 pm
Quote: Why would you need to invalidate their preferences without them visiting the site? If you want to flush the cache, flush the entire thing or invalidate all of the keys by changing the salt (which could be done by group).

Irrelevant to what I was saying.

Quote: Maybe you are thinking that you would create a list of messages for the user before they viewed the page?

Halfway there.

Quote: message-[2107,2106,2105,2103,2102,etc] (that is a list of message-{id})

How do you invalidate that cache? This is my point: you CAN'T.

The list of message ids that a user will see will be determined by their preferences. So if you update something on one of those, you can only invalidate the cache on that list of messages because that's all you *have*.

So either you nuke the entire topic's worth of message data, or you're guaranteed to have stale data.

One user might only see messages 1, 2, 3, 4 and 5. Another might see 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. If you do an operation as the first user, the second user still has stale data! That's what I mean about member preferences being a factor: it means there's a reasonable chance of not having the complete (or correct) list of ids to be able to invalidate.
Title: Re: cache
Post by: Joshua Dickerson on April 27, 2013, 05:36:31 pm
Okay, I see you are missing what I am saying. You don't cache message-{mem}-{msg}. You cache the whole list of messages for a topic. What the user is seeing is irrelevant to what gets cached. When the topic is changed, the topic gets updated. If a message is changed the message gets updated. So, the only time you are changing that list is when you add to or take away messages from that topic. Even if there are a lot of messages being added, creating that list isn't a big operation. An added benefit is that databases are hard to scale out. PHP & memcached are really easy to scale out. So you push more processing (but not by much) to PHP and get more from memcached.

You don't need to create anything before the page is viewed. If your settings are to go back to the topic after posting, you might want to cache that data on the post page. Otherwise, it will get cached when you visit the page. Not a big deal either way, though. You aren't caching to save one query; you cache to save many operations.

Do you understand invalidating a key now? If not, I will break it down to code level but that will probably be much later. Or, go on IRC and we can talk through it.
Title: Re: cache
Post by: Joshua Dickerson on April 27, 2013, 05:49:30 pm
If you wanted to sort from oldest to newest, you would reverse the array. If you wanted to sort by another method, you would have to create a different key to cache by. I don't know if you can sort by another method on a topic though.

The same goes for a list on a board. Most big boards disable sorting by other methods (or they should). So, you can store the first X number of topics in a list. The key would be something like "board-topics-{board}-{sort-method}". Same as with a topic, you do an array_splice() and get the topic and read status for each (as a multi-get). You compile a list of members on the entire page and at the end you do a multi-get of members.

The board index is easier. You store a single key called "board-tree" with relationships to each board. We do this because it is easy and can be used in many places. It would be an array of array( [id_cat] => array(name, icon, boards => array(id => array(name, icon, boards => (recursive))))). Then, on the board index, you decide how many levels to show. You get all of the boards that you are going to show and then you do a multi-get for board-{id}, board-{id}-last_topics-{id}, board-{id}-{mem id}-read_status, and maybe some board-member options.
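In PHP terms, that single cached tree might look roughly like this (names and ids are purely illustrative):
[php]
// One key, 'board-tree', holding the whole category/board hierarchy.
$board_tree = array(
	1 => array(
		'name' => 'General Category',
		'icon' => '',
		'boards' => array(
			4 => array(
				'name' => 'Announcements',
				'icon' => '',
				'boards' => array(), // child boards nest recursively here
			),
		),
	),
);
cache_put_data('board-tree', $board_tree, 480);
[/php]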
Title: Re: cache
Post by: emanuele on April 27, 2013, 06:28:57 pm
But wouldn't caching the entire topic require quite a bit of memory the moment it is loaded? (And even the moment it is cached: you'd have to grab the entire topic with a single query.)
For each and every topic page you would have to load the entire discussion into memory (i.e. potentially thousands of messages) just for... 20-25 entries...
Title: Re: cache
Post by: Joshua Dickerson on April 27, 2013, 06:40:56 pm
You aren't loading the entire topic. You are loading the message ids for the topic, extracting the messages you want, and then loading those messages. The memory is ridiculously small. Try it with 10k ids in an array: [php]<?php $mem = memory_get_usage(); $array = array_fill(0, 10000, rand(0, 100000)); var_dump(memory_get_usage() - $mem);[/php]

The topic's message list could be cached in one or more ways: 1) update it on insert/delete of a message in the topic; 2) rebuild it with SELECT id_msg FROM messages WHERE id_topic = {id_topic}; 3) get the total number of messages in a topic, use the reply # as the key, and if the page is before or after what is in the cache, load that page. I would prefer doing 1 & 2. #3 would be the most work and have the biggest potential for failure, but wouldn't require additional queries.
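A sketch of option 2 as the fallback when the list isn't cached yet (assuming the usual $db = database() wrapper and that the ids come from the messages table):
[php]
// Option 2: rebuild the topic's message id list from the database on a miss.
if (($msg_ids = cache_get_data('topic-msgs-' . $topic, 600)) === null)
{
	$db = database();
	$request = $db->query('', '
		SELECT id_msg
		FROM {db_prefix}messages
		WHERE id_topic = {int:current_topic}
		ORDER BY id_msg',
		array(
			'current_topic' => $topic,
		)
	);
	$msg_ids = array();
	while ($row = $db->fetch_assoc($request))
		$msg_ids[] = (int) $row['id_msg'];
	$db->free_result($request);

	cache_put_data('topic-msgs-' . $topic, $msg_ids, 600);
}

// Option 1 would simply add to / remove from this cached list whenever a
// message is inserted into or deleted from the topic.
[/php]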
Title: Re: cache
Post by: emanuele on April 27, 2013, 06:43:21 pm
Ahh... okay, you were talking only about message ids... sorry, I'm already in bed and apparently I'm already sleeping... :P
Title: Re: cache
Post by: Joshua Dickerson on April 27, 2013, 07:12:24 pm
Yeah, lists of ids (index) and then you use those lists as a means to get the "full version". You still cache the full version.

I think Arantor has an issue with flushing the cache. You don't need to flush it, but you can quickly invalidate it. You would have two salts: one for the forum and the other for the group. I would keep them small - like <5 characters each - and separate them with a simple character. For the topics group, 'a7z.yuP.topics.' would be the key prefix. Notice I used a delimiter instead of a fixed width; it could just as well be a fixed width and that would be fine.
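A rough sketch of how those salts could be applied (the setting names here are made up; rotating a group's salt orphans every key built with the old prefix):
[php]
// Hypothetical: one forum-wide salt plus one salt per group, kept in settings.
function grouped_key($group, $key)
{
	global $modSettings;

	// e.g. 'a7z' . '.' . 'yuP' . '.' . 'topics' . '.' . $key
	return $modSettings['cache_salt'] . '.' . $modSettings['cache_salt_' . $group] . '.' . $group . '.' . $key;
}

// "Flush" the whole topics group by rotating its salt; the old keys simply
// stop being read and age out on their own.
updateSettings(array('cache_salt_topics' => substr(md5(mt_rand()), 0, 3)));
[/php]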

Sometimes it takes a while to parse a message; the parsed results then get cached. In the message object, store the last time it was parsed. If that is empty, the message doesn't need to be parsed because it runs quickly. If it isn't empty, we check when the BBC or any settings that would affect the topic were last changed. If the message was parsed before then, we reparse it and cache it; otherwise, that's what we want. You might want to store a copy of it unparsed in another object, or maybe vice versa.
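As a sketch of that check (parsed_at and body_parsed are made-up field names; settings_updated is the marker the code already uses elsewhere for staleness):
[php]
// Hypothetical cached message object carrying the time it was last parsed.
$message = cache_get_data('msg-' . $id_msg, 600);

// Reparse only if the BBC/settings changed after the cached copy was built.
if (!empty($message['parsed_at']) && $message['parsed_at'] < $modSettings['settings_updated'])
{
	$message['body_parsed'] = parse_bbc($message['body']);
	$message['parsed_at'] = time();
	cache_put_data('msg-' . $id_msg, $message, 600);
}
[/php]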
Title: Re: cache
Post by: Arantor on April 27, 2013, 07:17:00 pm
No, you have an issue understanding what I'm trying to say (based on practical implementation issues rather than some theoretical notions).

I'm quite capable of invalidating a cache (hint: cache_put_data with a value of null). The problem is that in some of the cases I can't invalidate the cache that will actually be called upon. And caching the entire list of message ids is fine but I want to cache data about the messages.

For example, in the case I gave you, I want to cache the likes for each post and the names for them (so I can figure out if the current user is in the list of likers). I can't invalidate the entire topic's worth of likes just because I update one of them. Nor can I invalidate a 'page's worth' of likes because different users have different numbers of posts per page and therefore what one user sees is not the same subset of messages for the other.

The only thing that I could do in that situation is cache every post's likes individually but that has issues in terms of putting so much more in the cache than is practical.

But in this situation, you're convinced you know best - and were the entire user base on VPSes and systems that have decent caches, I'd agree with some of the points you're making. But sadly most of the users aren't. And I still think you're talking theoretically rather than practically (as in actually benchmarking the differences, etc.).
Title: Re: cache
Post by: Joshua Dickerson on April 27, 2013, 10:50:34 pm
Is the purpose of caching the likes to show the user whether they've liked it, or to show a list of people that have liked it? For the first I would store it as "topic-like-{id}-{mem}". That way you are only storing whether a person liked it when they visit the topic or when they see it. For the other I would keep a simple list of ids: topic-likes-{id}. The only time you'd need to invalidate it is when someone likes or unlikes it, and views are going to vastly outnumber likes/unlikes on any forum. You are going to have to do queries at some point; the point isn't to get rid of them, just to minimize them.

You don't cache every post. You cache the ones that are being viewed. Chances are the ones that are being viewed are going to be viewed again sooner than one that isn't.

Why are you stuck on shared hosting for caching? How many shared hosts allow the admin to have a RAM based variable cache? I already said the file cache mechanism doesn't count. If you don't have a real cache, this doesn't apply and the cache->get() call returns null immediately and doesn't affect performance. If your cache is too small and you are having issues with too many evictions and you can't change that, use the tuning mechanism to adjust what gets cached and for how long. What do you see as needing benchmarking? Do you really want to see benchmarks of caching objects in RAM as opposed to database queries?

Mind you, I'm not arguing at all. I am just throwing out ideas and supporting them. I am working on something for work right now that is consuming too much of my time to do other development... Solr is a pain.
Title: Re: cache
Post by: Joshua Dickerson on April 27, 2013, 11:01:29 pm
To follow up on your likes issue: if it is the message that is getting liked, you do it on the message, not the topic. So the key would be msg-likes-{id} => array(member-ids). When you load that topic and get all of the messages that you are going to display, you also look up all of the likes for those messages. Using multi-get is exactly how I took a script at my last employer from 5 minutes (and growing) down to under 30 seconds (and pretty steady). Although implementing it with Elkarte/SMF is theoretical at this point, the idea itself is well proven.
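A sketch of that lookup for one page of a topic (hypothetical keys, reusing the cache_get_multi() emulation from earlier in the thread):
[php]
// One key per message holding just the ids of the members who liked it.
$keys = array();
foreach ($page_msg_ids as $id)
	$keys[] = 'msg-likes-' . $id;

// Fetch every like list for the page in one go.
$likes = cache_get_multi($keys, 240);

// Is the current user among the likers of a given message ($id_msg)?
$user_liked = in_array($user_info['id'], (array) $likes['msg-likes-' . $id_msg]);
[/php]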

I do it with as many keys in one array as possible to minimize the amount of network IO I have, but you can split that up by msg and then msg-likes making two arrays.

Maybe you want to gist the like code and I will show you how I would do it with caching?
Title: Re: cache
Post by: TestMonkey on April 28, 2013, 12:03:46 pm
(I didn't read the whole topic, sorry; just a quick note FYI.)
Re: cache with "groups". groundup, are you referring to tagged cache systems? If we are to add tags to cache entries, we need to redesign the cache to support tagging.

On a flexible design of the cache system, I draw your attention to the following:
https://github.com/tedivm/fig-standards/tree/033e4bf15a4adf92fea09e469bd5655a626ccbfc/proposed

While I do not see the (too heavily OO) design in the proposal as suitable for our codebase, its main features may be worth a look. Note that I linked to an older version, which also had an extensions folder with stackable and taggable cache interfaces.

Same goes for the cache implementation library:
https://github.com/tedivm/Stash

Feel free to grab either, try to adapt it to our needs, simplify it (IMO), benchmark and test at your ease, and propose what of it we can integrate and how.
Title: Re: cache
Post by: Joshua Dickerson on April 28, 2013, 12:32:05 pm
Yes, tagging or grouping, whatever you want to call it. You could do it in the get/set functions without breaking anything.