ElkArte Community

Elk Development => Feature Discussion => Exterminated Features => Topic started by: emanuele on December 11, 2013, 03:27:43 pm

Title: BoardIndex query
Post by: emanuele on December 11, 2013, 03:27:43 pm
Code: [Select]
		WHERE {query_see_board}' . (empty($boardIndexOptions['countChildPosts']) ? (empty($boardIndexOptions['base_level']) ? '' : '
AND b.child_level >= {int:child_level}') : '
AND b.child_level BETWEEN ' . $boardIndexOptions['base_level'] . ' AND ' . ($boardIndexOptions['base_level'] + 1)),

So:
It's only me or that sounds odd?
Title: Re: BoardIndex query
Post by: Spuds on December 16, 2013, 09:45:05 am
Which part of it  :P   I think we only count the post one child level deep on purpose if that is the part that sound odd to you.

Title: Re: BoardIndex query
Post by: emanuele on December 20, 2013, 06:57:10 pm
The oddest part I think is the last: I can understand we grab everything when in BI because it avoids filters (and potentially filesorts or alike) because statistically the highest number of boards should be on the BI, though following that line, it doesn't make much sense to limit with a BETWEEN when we are on BI and counting the posts in the child boards.

I think something simpler like this would make more sense:
Code: [Select]
		WHERE {query_see_board}' . (empty($boardIndexOptions['base_level']) ? '' : '
AND b.child_level >= {int:child_level}'),

Anyway I did some tests, but I have everything on another computer, I'll post them later.

In the meantime, something related: as reported somewhere at SMF, the order of the boards and of the smiley rely on a table ALTER, though that "trick" doesn't work reliably in several conditions (non-MySQL, MySQL innodb and apparently myisam too after a certain version).

ORDER BY is apparently the only solution. The drawback is a filesort.

When I did the tests I included also the order, I don't remember exactly the results, but I seem to remember that there wasn't a big difference between the different versions (including the ORDER BY). Will post more later, including the code I used for the benchmarking. ;)
Title: Re: BoardIndex query
Post by: Spuds on December 20, 2013, 07:58:53 pm
Sounds good, looking forward to what you found out !

I knew we had that "alter" trick in a couple of places (and you know how I feel about some tricks :P ) ... interesting they don't work everywhere and that they are even breaking now in places they used to work.
Title: Re: BoardIndex query
Post by: emanuele on December 30, 2013, 11:27:17 am
First of all the code:
https://gist.github.com/emanuele45/8182929

Attached my results.

Now let's try to explain the conditions and all the combination.

Forum setup:

Conditions taken in consideration:

Variants taken in consideration:

All the possible combinations are tested.for 100 times.

An example of a result set:
Code: [Select]
~=~=~ Uncollapsed ~=~=~
admin
bi_count
0: 000 - 2.33
1: 001 - 2.33
2: 010 - 2.16
3: 011 - 2.21
4: 100 - 3.39
5: 101 - 3.43
6: 110 - 3.38
7: 111 - 3.56

That means we are looking at uncollapsed categories (Uncollapsed), from an admin point of view (admin), counting the posts of the child boards into their parent (bi_count).
Then we have the 8 set of results for the combinations of the three pair of variants:
000 => no moderators | no ordering | no table collapsed_categories
001 => no moderators | no ordering | table collapsed_categories included
010 => no moderators | ordering | no table collapsed_categories
011 => no moderators | ordering | table collapsed_categories included
100 => moderators | no ordering | no table collapsed_categories
etc. I hope you get the idea :P 0 means disabled, 1 means enabled, the first is moderators, the second is ordering  and third is collapsed_categories.
111 is basically the current situations: everything "on".

Some considerations.
The most interesting "variant" is the ordering/not-ordering (the central one). Comparing 5 with 7 (when the other two variants are the same) I can't notice any real difference, actually in some cases having the ORDER is slightly faster, but I think it's just an oddity or something like that.
From these results I think we can add the ORDER BY where needed without problems (I also think that keep the ALTER should not be a bad thing, but up to you).

Another "variant" that was interesting for me was the one about collapsed categories, because a possibility was to grab and output everything (just to avoid the whole collapsing code and reduce some complexity), but apparently, at least for registered users, the query/function slows down quite a bit (it takes about twice the time), so, better keep it like it is now.

Finally, the first variant (moderators): well... it looks like a no-go, yes the setup of the testing forum is a bit unusual, though unless I did a completely broken query, the times to retrieve the data are doubled in almost any case (for admins are almost 4 times, compare 0 to 3 with 4 to 7 in each set of results).

Anyway, the test is of course a bit biased by the fact that repeating the same query 100 times one after the other some mysql cache comes into play so it cannot be used to judge microseconds, though it should give a good general idea.

I did also some tests with the WHERE clause I was thinking is less odd, but in fact is it from not to very slower (from basically the same up to 1 second slower per 100 rounds).

Okay, that seems all. Do you see any hole in what I wrote? (Any bug in the code or methodology?) Do you think there is more to cover?
Title: Re: BoardIndex query
Post by: Spuds on December 30, 2013, 12:13:18 pm
WOW ... thanks for all that testing!

I'm still going over the results so I can get it straight in my mind, also trying to run your "bigelk" test file on my test site to see what results I get so we can compare :D
Title: Re: BoardIndex query
Post by: emanuele on December 30, 2013, 12:25:30 pm
Big Elk is big! LOL

I forgot to remove the path from the gist. :P
Title: Re: BoardIndex query
Post by: Spuds on December 30, 2013, 12:33:02 pm
Ran the test (boy does that take a while to run!), Don't have a translation on what all the numbers mean, hopefully @emanuele  can translate O:-)

The test site that I ran it on has:
3 Categories
13 Boards total
2 child boards (both in same category)
1 Admin
4 Global Moderators
110K posts

Did the run twice one as is and one with SQL_NO_CACHE set just to test if it made a difference.
Title: Re: BoardIndex query
Post by: emanuele on December 30, 2013, 02:33:21 pm
Quote from: Spuds – Ran the test (boy does that take a while to run!),
The more the better! ;D ... no wait in this case it's the opposite! :P

Quote from: Spuds – Don't have a translation on what all the numbers mean, hopefully @emanuele  can translate O:-)
[...]
Did the run twice one as is and one with SQL_NO_CACHE set just to test if it made a difference.
Cool the no cache!

About the data, the trend is similar to what I got: having a different query for local moderators is slower than do everything into that big thing, apply the ORDER BY doesn't make almost any difference, grab boards from uncollapsed categories brings in a significant overhead.

I just realised I didn't consider we could cache the board moderators.
I did a quick test (see the updated gist for the code): the values are comparable (some are slightly better, some are worse) to the ones I got keeping the moderators in the query itself, so apparently is not really an advantage because it means that we are not gaining anything in terms of performances and just a bit of decoupling. The net balance doesn't seem to be worth...
Title: Re: BoardIndex query
Post by: Spuds on December 30, 2013, 06:55:24 pm
So sounds like we can safely add the ORDER BY with no penalty, and in fact its needed for some DB setups.

Looking at things I did that no cache but only on one of the querys, so I should rerun it, but then again the results say the same thing anyway!
Title: Re: BoardIndex query
Post by: emanuele on May 17, 2014, 09:16:47 am
I updated the gist in a less lame way, to grab the moderators only in a single query at the end of the loop.
Unless I did something wrong, that way of doing looks quite faster (about 30%) than the single query with joins.
I also tried to cache the results of the moderators, though there is not a lot of difference, mainly because if there is the cache an additional loop is needed (see the two files in the gist): one to get the moderators and cache them, one to read the array cached and put the moderators into the boards/categories array.
Title: Re: BoardIndex query
Post by: Spuds on May 17, 2014, 09:46:49 am
QuoteUnless I did something wrong, that way of doing looks quite faster (about 30%) than the single query with joins.

Thats a nice improvement for sure, great work!