Skip to main content
Topic: BoardIndex query (Read 7023 times) previous topic - next topic
0 Members and 1 Guest are viewing this topic.

BoardIndex query

Code: [Select]
		WHERE {query_see_board}' . (empty($boardIndexOptions['countChildPosts']) ? (empty($boardIndexOptions['base_level']) ? '' : '
AND b.child_level >= {int:child_level}') : '
AND b.child_level BETWEEN ' . $boardIndexOptions['base_level'] . ' AND ' . ($boardIndexOptions['base_level'] + 1)),

So:
  • if we are
    • not counting messages in childboards
    • and base_level is 0 (i.e. we are in the BordIndex)
    we retrieve all the boards
  • instead if we are
    • not counting messages in childboards
    • and base_level is not 0 (i.e. into a board)
    we grab everything with child_lever higher than what we are looking for
  • finally if we are
    • counting posts in child boards (and some them up to the parents)
    we grab only the boards between base_level and base_level + 1

It's only me or that sounds odd?
Bugs creator.
Features destroyer.
Template killer.

Re: BoardIndex query

Reply #1

Which part of it  :P   I think we only count the post one child level deep on purpose if that is the part that sound odd to you.


Re: BoardIndex query

Reply #2

The oddest part I think is the last: I can understand we grab everything when in BI because it avoids filters (and potentially filesorts or alike) because statistically the highest number of boards should be on the BI, though following that line, it doesn't make much sense to limit with a BETWEEN when we are on BI and counting the posts in the child boards.

I think something simpler like this would make more sense:
Code: [Select]
		WHERE {query_see_board}' . (empty($boardIndexOptions['base_level']) ? '' : '
AND b.child_level >= {int:child_level}'),

Anyway I did some tests, but I have everything on another computer, I'll post them later.

In the meantime, something related: as reported somewhere at SMF, the order of the boards and of the smiley rely on a table ALTER, though that "trick" doesn't work reliably in several conditions (non-MySQL, MySQL innodb and apparently myisam too after a certain version).

ORDER BY is apparently the only solution. The drawback is a filesort.

When I did the tests I included also the order, I don't remember exactly the results, but I seem to remember that there wasn't a big difference between the different versions (including the ORDER BY). Will post more later, including the code I used for the benchmarking. ;)
Bugs creator.
Features destroyer.
Template killer.

Re: BoardIndex query

Reply #3

Sounds good, looking forward to what you found out !

I knew we had that "alter" trick in a couple of places (and you know how I feel about some tricks :P ) ... interesting they don't work everywhere and that they are even breaking now in places they used to work.

Re: BoardIndex query

Reply #4

First of all the code:
https://gist.github.com/emanuele45/8182929

Attached my results.

Now let's try to explain the conditions and all the combination.

Forum setup:
  • 10 categories
  • 100 boards
  • random nesting
  • the first 10 boards have 10 local moderators each

Conditions taken in consideration:
  • User is admin
  • User is guest
  • User is normal user
  • Posts of children not counted in parent boards
  • Posts of children counted in parent boards
  • Board Index
  • Message Index
  • No categories collapsed
  • First 5 categories collapsed

Variants taken in consideration:
  • Moderators retrieved or not with the BI query
  • Using ORDER BY and not using it
  • Consider of ignore the {db_prefix}collapsed_categories table

All the possible combinations are tested.for 100 times.

An example of a result set:
Code: [Select]
~=~=~ Uncollapsed ~=~=~
admin
bi_count
0: 000 - 2.33
1: 001 - 2.33
2: 010 - 2.16
3: 011 - 2.21
4: 100 - 3.39
5: 101 - 3.43
6: 110 - 3.38
7: 111 - 3.56

That means we are looking at uncollapsed categories (Uncollapsed), from an admin point of view (admin), counting the posts of the child boards into their parent (bi_count).
Then we have the 8 set of results for the combinations of the three pair of variants:
000 => no moderators | no ordering | no table collapsed_categories
001 => no moderators | no ordering | table collapsed_categories included
010 => no moderators | ordering | no table collapsed_categories
011 => no moderators | ordering | table collapsed_categories included
100 => moderators | no ordering | no table collapsed_categories
etc. I hope you get the idea :P 0 means disabled, 1 means enabled, the first is moderators, the second is ordering  and third is collapsed_categories.
111 is basically the current situations: everything "on".

Some considerations.
The most interesting "variant" is the ordering/not-ordering (the central one). Comparing 5 with 7 (when the other two variants are the same) I can't notice any real difference, actually in some cases having the ORDER is slightly faster, but I think it's just an oddity or something like that.
From these results I think we can add the ORDER BY where needed without problems (I also think that keep the ALTER should not be a bad thing, but up to you).

Another "variant" that was interesting for me was the one about collapsed categories, because a possibility was to grab and output everything (just to avoid the whole collapsing code and reduce some complexity), but apparently, at least for registered users, the query/function slows down quite a bit (it takes about twice the time), so, better keep it like it is now.

Finally, the first variant (moderators): well... it looks like a no-go, yes the setup of the testing forum is a bit unusual, though unless I did a completely broken query, the times to retrieve the data are doubled in almost any case (for admins are almost 4 times, compare 0 to 3 with 4 to 7 in each set of results).

Anyway, the test is of course a bit biased by the fact that repeating the same query 100 times one after the other some mysql cache comes into play so it cannot be used to judge microseconds, though it should give a good general idea.

I did also some tests with the WHERE clause I was thinking is less odd, but in fact is it from not to very slower (from basically the same up to 1 second slower per 100 rounds).

Okay, that seems all. Do you see any hole in what I wrote? (Any bug in the code or methodology?) Do you think there is more to cover?
Bugs creator.
Features destroyer.
Template killer.

Re: BoardIndex query

Reply #5

WOW ... thanks for all that testing!

I'm still going over the results so I can get it straight in my mind, also trying to run your "bigelk" test file on my test site to see what results I get so we can compare :D

Re: BoardIndex query

Reply #6

Big Elk is big! LOL

I forgot to remove the path from the gist. :P
Bugs creator.
Features destroyer.
Template killer.

Re: BoardIndex query

Reply #7

Ran the test (boy does that take a while to run!), Don't have a translation on what all the numbers mean, hopefully @emanuele  can translate O:-)

The test site that I ran it on has:
3 Categories
13 Boards total
2 child boards (both in same category)
1 Admin
4 Global Moderators
110K posts

Did the run twice one as is and one with SQL_NO_CACHE set just to test if it made a difference.

Re: BoardIndex query

Reply #8

Quote from: Spuds – Ran the test (boy does that take a while to run!),
The more the better! ;D ... no wait in this case it's the opposite! :P

Quote from: Spuds – Don't have a translation on what all the numbers mean, hopefully @emanuele  can translate O:-)
[...]
Did the run twice one as is and one with SQL_NO_CACHE set just to test if it made a difference.
Cool the no cache!

About the data, the trend is similar to what I got: having a different query for local moderators is slower than do everything into that big thing, apply the ORDER BY doesn't make almost any difference, grab boards from uncollapsed categories brings in a significant overhead.

I just realised I didn't consider we could cache the board moderators.
I did a quick test (see the updated gist for the code): the values are comparable (some are slightly better, some are worse) to the ones I got keeping the moderators in the query itself, so apparently is not really an advantage because it means that we are not gaining anything in terms of performances and just a bit of decoupling. The net balance doesn't seem to be worth...
Bugs creator.
Features destroyer.
Template killer.

Re: BoardIndex query

Reply #9

So sounds like we can safely add the ORDER BY with no penalty, and in fact its needed for some DB setups.

Looking at things I did that no cache but only on one of the querys, so I should rerun it, but then again the results say the same thing anyway!

Re: BoardIndex query

Reply #10

I updated the gist in a less lame way, to grab the moderators only in a single query at the end of the loop.
Unless I did something wrong, that way of doing looks quite faster (about 30%) than the single query with joins.
I also tried to cache the results of the moderators, though there is not a lot of difference, mainly because if there is the cache an additional loop is needed (see the two files in the gist): one to get the moderators and cache them, one to read the array cached and put the moderators into the boards/categories array.
Bugs creator.
Features destroyer.
Template killer.

Re: BoardIndex query

Reply #11

QuoteUnless I did something wrong, that way of doing looks quite faster (about 30%) than the single query with joins.

Thats a nice improvement for sure, great work!