Topic: cache

Re: cache

Reply #15

If you wanted to sort from oldest to newest, you would reverse the array. If you wanted to sort by another method, you would have to create a different key to cache by. I don't know if you can sort by another method on a topic though.

Same goes for a topic list on a board. Most big boards disable sorting by other methods (or they should). So you can store the first X topics in the list under a key like "board-topics-{board}-{sort-method}". As with a topic, you do an array_splice() and get the topic and read status for each (as a multi-get). You compile a list of members across the entire page and at the end do one multi-get for the members.
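To make the id-list + multi-get pattern concrete, here is a minimal sketch. The cache_get()/cache_get_multi() helpers and the key names are hypothetical stand-ins for a real RAM cache (memcached/APCu); here they are backed by a plain array so the flow is runnable:

```php
<?php
// Array-backed stand-in for a real cache; keys follow the scheme above.
$cache = [
    'topic-msgs-42' => [101, 102, 103, 104, 105, 106],
    'msg-103'       => ['id' => 103, 'body' => 'third reply'],
    'msg-104'       => ['id' => 104, 'body' => 'fourth reply'],
];

function cache_get(array $cache, string $key) {
    return $cache[$key] ?? null;
}

function cache_get_multi(array $cache, array $keys): array {
    // A real client would fetch all of these in one network round trip.
    $out = [];
    foreach ($keys as $key) {
        if (isset($cache[$key])) {
            $out[$key] = $cache[$key];
        }
    }
    return $out;
}

// 1. Load only the id list for the topic.
$ids = cache_get($cache, 'topic-msgs-42');

// 2. Slice out just the page being viewed (e.g. offset 2, 2 per page).
$page = array_splice($ids, 2, 2);   // [103, 104]

// 3. Multi-get the full messages for that slice only.
$keys = array_map(fn($id) => "msg-$id", $page);
$messages = cache_get_multi($cache, $keys);
```

The point of the sketch is step 3: only the handful of messages actually on screen are fetched, and they come back in a single multi-get rather than one lookup per message.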

The board index is easier. You store a single key called "board-tree" with the relationships between boards. We do this because it is easy and can be reused in many places. It would be an array like array([id_cat] => array(name, icon, boards => array(id => array(name, icon, boards => (recursive))))). Then, on the board index, you decide how many levels to show, gather all of the boards you are going to display, and do one multi-get for board-{id}, board-{id}-last_topics-{id}, board-{id}-{mem id}-read_status, and maybe some board-member options.
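A sketch of that "board-tree" key and the depth-limited walk over it (the exact fields are hypothetical; the real structure would carry whatever each page needs):

```php
<?php
// One cached "board-tree" key: categories containing nested boards.
$board_tree = [
    1 => [ // id_cat
        'name'   => 'General',
        'icon'   => 'cat.png',
        'boards' => [
            10 => [
                'name'   => 'News',
                'icon'   => 'board.png',
                'boards' => [
                    11 => ['name' => 'Archive', 'icon' => 'board.png', 'boards' => []],
                ],
            ],
        ],
    ],
];

// On the board index, walk only as many levels as you want to show,
// collecting board ids for a single multi-get afterwards.
function collect_board_ids(array $boards, int $depth): array {
    if ($depth === 0) {
        return [];
    }
    $ids = [];
    foreach ($boards as $id => $board) {
        $ids[] = $id;
        $ids = array_merge($ids, collect_board_ids($board['boards'], $depth - 1));
    }
    return $ids;
}

$shown = collect_board_ids($board_tree[1]['boards'], 2); // [10, 11]
// Then one multi-get for keys like "board-10", "board-10-{mem}-read_status", etc.
```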

Re: cache

Reply #16

But wouldn't caching the entire topic require quite a bit of memory the moment it is loaded? (And even the moment it is cached: you'd have to grab the entire topic with a single query.)
For each and every topic page you would load the entire discussion into memory (i.e. potentially thousands of messages) just for... 20-25 entries...
Bugs creator.
Features destroyer.
Template killer.

Re: cache

Reply #17

You aren't loading the entire topic. You are loading the message ids for the topic, extracting the messages you want, and then loading those messages. The memory is ridiculously small. Try it with 10k ids in an array:
[php]<?php
// Measure how much memory an array of 10,000 integer ids actually takes.
$mem = memory_get_usage();
$array = array_fill(0, 10000, rand(0, 100000));
var_dump(memory_get_usage() - $mem);
[/php]

The topic's message ids could be cached one of several ways: 1) update the list on insert/delete of a message in the topic; 2) rebuild it with SELECT id_msg FROM messages WHERE id_topic = {id_topic}; 3) get the total number of messages in the topic, use the reply # as the key, and if the requested page is before or after what is in the cache, load that page. I would prefer doing 1 & 2 together. #3 would be the most work to do and have the biggest potential for failure, but wouldn't require additional queries.
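Strategies 1 & 2 combined could look like the following sketch (function names and the DB stand-in are hypothetical — the real code would run the SELECT above and write through the forum's cache API):

```php
<?php
// Strategy 2: rebuild the id list from the database on a cache miss.
function fetch_msg_ids_from_db(int $id_topic): array {
    // Stand-in for: SELECT id_msg FROM messages WHERE id_topic = {id_topic}
    static $db = [42 => [101, 102, 103]];
    return $db[$id_topic] ?? [];
}

function get_topic_msg_ids(array &$cache, int $id_topic): array {
    $key = "topic-msgs-$id_topic";
    if (!isset($cache[$key])) {
        $cache[$key] = fetch_msg_ids_from_db($id_topic);
    }
    return $cache[$key];
}

// Strategy 1: keep the cached list fresh on insert (delete is analogous).
function on_message_insert(array &$cache, int $id_topic, int $id_msg): void {
    $key = "topic-msgs-$id_topic";
    if (isset($cache[$key])) {
        $cache[$key][] = $id_msg;
    }
}

$cache = [];
get_topic_msg_ids($cache, 42);       // primes the list from the DB
on_message_insert($cache, 42, 104);  // new reply keeps the list fresh
```

Doing both means the list self-heals after an eviction (strategy 2) while staying current between rebuilds without extra queries (strategy 1).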

Re: cache

Reply #18

Ahh... okay, you were talking only about message ids... sorry, I'm already in bed and apparently already sleeping... :P

Re: cache

Reply #19

Yeah, lists of ids (index) and then you use those lists as a means to get the "full version". You still cache the full version.

I think Arantor has an issue with flushing the cache. You don't need to flush it, but you can quickly invalidate it. You would have two salts: one for the forum and one for the group. I would keep them small - like <5 characters each - and separate them with a simple character. For the topics group, 'a7z.yuP.topics.' would be the key prefix. Notice I used a delimiter instead of a fixed width. It could just as well be a fixed width and that would be fine.
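A minimal sketch of that two-salt scheme (the helper and the salt values are just for illustration): rotating a group's salt makes every old key in that group unreachable, which invalidates the whole group without ever flushing the cache.

```php
<?php
// Build a key from a forum-wide salt, a per-group salt, the group name,
// and the rest of the key, joined with a simple delimiter.
function make_key(string $forum_salt, string $group_salt, string $group, string $rest): string {
    return "$forum_salt.$group_salt.$group.$rest";
}

$forum_salt  = 'a7z';
$group_salts = ['topics' => 'yuP'];

$key = make_key($forum_salt, $group_salts['topics'], 'topics', 'msgs-42');
// "a7z.yuP.topics.msgs-42"

// To "invalidate" every topics key at once, rotate that group's salt.
// Old entries are never touched; they just sit unused until evicted.
$group_salts['topics'] = 'yuQ';
$new_key = make_key($forum_salt, $group_salts['topics'], 'topics', 'msgs-42');
```

Rotating the forum salt instead would orphan every group at once, which is the closest this scheme gets to a full flush.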

Sometimes it takes a while to parse a message, and the result then gets cached. In the message object, store the last time it was parsed. If that is empty, the message was never cached because it parses quickly. If it isn't empty, check when the BBC or any settings that would affect the topic were last changed. If the message was parsed before then, reparse it and recache it; otherwise the cached copy is what we want. You might want to store an unparsed copy in another object, or maybe vice versa.
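That freshness check could be sketched like this (field names, the $bbc_changed_at timestamp source, and the stand-in parser are all hypothetical):

```php
<?php
// Reparse a message only when its cached parse predates the last
// BBC/settings change; empty parsed_at means "cheap, parse on the fly".
function get_parsed_body(array $msg, int $bbc_changed_at, callable $parse): array {
    if (empty($msg['parsed_at'])) {
        // Never cached: parses quickly, so just do it each time.
        $msg['display'] = $parse($msg['body']);
    } elseif ($msg['parsed_at'] < $bbc_changed_at) {
        // Cached copy is stale relative to the settings: reparse, recache.
        $msg['display']   = $parse($msg['body']);
        $msg['parsed_at'] = time();
    }
    // Otherwise the cached parse is still good; return it untouched.
    return $msg;
}

$parse = fn(string $body) => strtoupper($body); // stand-in for BBC parsing
$msg = ['body' => 'hello', 'parsed_at' => 100, 'display' => 'stale'];
$msg = get_parsed_body($msg, 200, $parse); // 100 < 200, so it reparses
```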

Re: cache

Reply #20

No, you have an issue understanding what I'm trying to say (based on practical implementation issues rather than some theoretical notions).

I'm quite capable of invalidating a cache (hint: cache_put_data with a value of null). The problem is that in some of the cases I can't invalidate the cache that will actually be called upon. And caching the entire list of message ids is fine but I want to cache data about the messages.

For example, in the case I gave you, I want to cache the likes for each post and the names for them (so I can figure out if the current user is in the list of likers). I can't invalidate the entire topic's worth of likes just because I update one of them. Nor can I invalidate a 'page's worth' of likes because different users have different numbers of posts per page and therefore what one user sees is not the same subset of messages for the other.

The only thing that I could do in that situation is cache every post's likes individually but that has issues in terms of putting so much more in the cache than is practical.

But in this situation, you're convinced you know best - and were the entire user base on VPSes and systems with decent caches, I'd agree with some of the points you're making. But sadly most users aren't. And I still think you're talking theoretically rather than practically (as in actually benchmarking the differences, etc.).

Re: cache

Reply #21

Is the purpose of caching the likes to show the user whether they've liked a post, or to show a list of people who have liked it? In the first instance I would store it as "topic-like-{id}-{mem}". That way you are only storing whether a person liked it when they visit the topic or when they see it. For the other I would keep a simple list of ids: topic-likes-{id}. The only time you'd need to invalidate it is when someone likes or unlikes it, and the ratio of views to likes/unlikes is going to be much higher on any forum. You are going to have to do queries at some point; the point isn't to get rid of them, just to minimize them.

You don't cache every post. You cache the ones that are being viewed. Chances are the ones that are being viewed are going to be viewed again sooner than one that isn't.

Why are you stuck on shared hosting for caching? How many shared hosts allow the admin to have a RAM based variable cache? I already said the file cache mechanism doesn't count. If you don't have a real cache, this doesn't apply and the cache->get() call returns null immediately and doesn't affect performance. If your cache is too small and you are having issues with too many evictions and you can't change that, use the tuning mechanism to adjust what gets cached and for how long. What do you see as needing benchmarking? Do you really want to see benchmarks of caching objects in RAM as opposed to database queries?

Mind you, I'm not arguing at all. I am just throwing out ideas and supporting them. I am working on something for work right now that is consuming too much of my time to do other development... Solr is a pain.

Re: cache

Reply #22

To follow on from your like issue: if it is the message that is getting liked, you key it on the message, not the topic. So the key would be msg-likes-{id} => array(member-ids). When you load the topic and get all of the messages you are going to display, you also look up the likes for those messages in the same pass. Using multi-get is exactly how I took a script at my last employer from 5 minutes (and growing) to under 30 seconds (and pretty steady). Although implementing it in Elkarte/SMF is theoretical at this point, the idea itself is well proven.

I do it with as many keys in one array as possible to minimize the amount of network IO, but you could split that into two arrays, one for msg and one for msg-likes.
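A sketch of that msg-likes-{id} scheme (array-backed stand-in for the real cache; member ids are made up): one multi-get fetches the liker lists for every message on the page, and a like/unlike only invalidates that one message's key.

```php
<?php
// Cached liker lists, one key per message.
$cache = [
    'msg-likes-103' => [7, 9, 12],
    'msg-likes-104' => [7],
];

function cache_get_multi(array $cache, array $keys): array {
    // One round trip in a real client; missing keys come back empty.
    $out = [];
    foreach ($keys as $key) {
        $out[$key] = $cache[$key] ?? [];
    }
    return $out;
}

// Likes for every message on the current page, in one multi-get.
$page_msg_ids = [103, 104];
$likes = cache_get_multi(
    $cache,
    array_map(fn($id) => "msg-likes-$id", $page_msg_ids)
);

// "Has the current user liked this?" is just an in_array() check.
$current_member = 9;
$liked_103 = in_array($current_member, $likes['msg-likes-103'], true);

// A like/unlike of msg 104 invalidates only that one small key.
unset($cache['msg-likes-104']);
```

Because the liker lists are keyed per message rather than per page, they are shared by every user regardless of their posts-per-page setting, which sidesteps the "different subsets per user" problem raised above.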

Maybe you want to gist the like code and I will show you how I would do it with caching?

Re: cache

Reply #23

(I didn't read all topic, sorry, just to quick note FYI)
Re: cache with "groups": groundup, are you referring to tagged cache systems? If we are to add tags to cache entries, we need to redesign the cache to support tagging.

On a flexible design of the cache system, I draw your attention to the following:
https://github.com/tedivm/fig-standards/tree/033e4bf15a4adf92fea09e469bd5655a626ccbfc/proposed

While I do not see the (too heavily OO) design in the proposal as suitable for our codebase, its main features may be worth a look. Note that I linked to an older version, which also had an extensions folder with stackable and taggable cache interfaces.

Same goes for the cache implementation library:
https://github.com/tedivm/Stash

Feel free to grab one, try to adapt it to our needs, simplify it (IMO), benchmark and test at your ease, and propose which parts of it we can integrate, and how.
The best moment for testing your PR is right after you merge it. Can't miss with that one.

Re: cache

Reply #24

Yes, tagging or grouping, whatever you want to call it. You could do it in the get/set functions without breaking anything.