ElkArte Community

Project Support => General ElkArte discussions => Topic started by: Nifty on January 08, 2016, 02:31:52 am

Title: Delayed Statistic Counters
Post by: Nifty on January 08, 2016, 02:31:52 am
Hi @all,
are there any plans to delay the update of Statistics like "Views" on Topics / Boards to relax the io on serverside during the busy times ?
Some other boards do this using a cron Job to minimize the write operation.
Title: Re: Delayed Statistic Counters
Post by: Joshua Dickerson on January 08, 2016, 03:55:37 am
Heh, I was just looking at that thinking there should be a setting.

Code: [Select]
		// Add 1 to the number of views of this topic (except for robots).
if (!$user_info['possibly_robot'] && (empty($_SESSION['last_read_topic']) || $_SESSION['last_read_topic'] != $topic))
{
increaseViewCounter($topic);
$_SESSION['last_read_topic'] = $topic;
}

First off, it shouldn't update for every view that you've made. It should only update once per session (no matter how many other topics you look at) and it should only update when there is a new post or edit in the topic (not sure about edits). Also, the circuit breaker should kick in and disable this completely when the load is too high. That should handle most issues, but a hook in the function could do it via caching or a topic_views table which would be in memory. That's another idea.

Anyway, are you having issues with load? I wouldn't start trying to change things until you start seeing issues. Especially with numbers like that. Users seem to be picky about their counts (downloads, views, stats, etc).
Title: Re: Delayed Statistic Counters
Post by: Nifty on January 08, 2016, 04:24:20 am
It will update every time you go into that Message - only thing it will not update twice if you call it repeatedly without viewing another Thread in the meantime.
-
I don't know how the circuit breaker works, but in Topic.subs.php is no sign of any exceptions, it just make the update into the Database.
-
I don't know if this will be a problem for me. I delayed the View Update on my MyBB Board to avoid running into trouble if the load gets high. But MyBB is using a different Datastructure where i don't know if i can compare it to Elkarte. I think Elkarte is putting a bit more load onto the Server, but this is just guessing at the moment. I need to make a few Load Testings as soon as my Importer is ready.

btw: i don't won't to loose Counters - i just like to have the updates delayed.
Title: Re: Delayed Statistic Counters
Post by: emanuele on January 18, 2016, 12:38:33 pm
Where do you store them "in the meantime" waiting for the db update?
Title: Re: Delayed Statistic Counters
Post by: Nifty on January 19, 2016, 01:54:50 am
memcached ? (i know may be lost on powerfailure and so on.)
MyBB is inserting it into a table and updates the stats only once in a hour.
Title: Re: Delayed Statistic Counters
Post by: Spuds on January 21, 2016, 09:23:38 pm
So it not really delaying the writing of the "counts" just the calculation of the stats?   I'd have to look to see how often that occurs, it might be on stats page load, not sure.

For a really busy sight, there are certainly my.cnf changes that you would make, as well as MyISAM -> InnoDB changes in the default install as well.
Title: Re: Delayed Statistic Counters
Post by: Nifty on January 23, 2016, 04:53:08 am
I thing we talk about different things.
i talk about elkarte_topics.views .
Title: Re: Delayed Statistic Counters
Post by: Spuds on January 23, 2016, 11:05:28 am
I guess so

If you are having performance issues due to
Code: [Select]
UPDATE {db_prefix}topics  SET num_views = num_views + 1
Then I really don't know what to say, in terms of performance bottlenecks that would not even be in my top 1000 TBH

If you are talking about not updating the ?action=stats page but once every 30 mins or so, then I could potentially see a need for that as I don't think that currently uses the cache and it could.
Title: Re: Delayed Statistic Counters
Post by: emanuele on January 23, 2016, 12:24:30 pm
Judging by a quick inspection of the code, MyBB and ElkArte work basically the same in that respect (i.e. UPDATE view = view + 1).
Relevant index (i.e. the topic id) is the same, so it shouldn't make any difference.
MyBB has a "delayed" option that instead of UPDATE'ing the topics table, INSERTs a row in another table.

Reading a bit around (e.g. http://use-the-index-luke.com/sql/dml ), it seems INSERTs may be faster than UPDATEs, but usually only if the table doesn't have indexes (in that case INSERT is just an adding stuff at the end of the table).
In the specific case, MyBB has a KEY(tid) on the threadviews table as well, so the two operations (UPDATE/INSERT) should really be mostly the same.

It may give an advantage (to be quantified[1]) using a table without indexes for the insert, even better if MEMORY, or a caching system (not file-based though).

It may be worth doing some tests.
It's my localhost with a 1.5M random posts/145k random topics testing forum, the update takes between 0.00038099 (on a "recent" topic) and 0.01108718 (on a "very old" topic the first time it's accessed, the second it becomes similar to the "recent" topic, probably because of some mysql caching) seconds
Title: Re: Delayed Statistic Counters
Post by: Spuds on January 23, 2016, 12:42:19 pm
Is that a InnoDB or MyISAM table ?  I would have suggest using insert delay but thats been depreciated.  I also considered using a temporary memory table that you flush to the main table from time to time, but then I was like, its just an update :P

If its MyISAM table, could you try  (if you don't already have this)
Code: [Select]
concurrent_insert = 2
in my.cnf and see if that improves things,  it would help the insert but maybe not the update, not sure. 

If InnoDB do you have
Code: [Select]
innodb_flush_log_at_trx_commit  = 2
or even set that to = 0 to be more aggressive.  Neither 0 or 2 is  ACID compliance. so you could have a minor data loss (like a +1 miss) less so with 2, but you get a performance boost for that minor risk.

Title: Re: Delayed Statistic Counters
Post by: Joshua Dickerson on January 23, 2016, 02:26:27 pm
What I was talking about is the view count on the topics. It updates every time you visit the topic in your session so long as you go to another topic. That doesn't really make sense to me unless there's a new post. I would keep a "last_view_time" in the session with an array of $topic => time(). Then have a threshold of 5 minutes (tunable) for how often you'd update the topic. It would check the last post time of the topic as well so if $last_view_time < $last_post_time ? increaseViewCounter().

INSERT is almost always faster than UPDATE. Appending to a file is a hell of a lot faster than finding that data point in the file and then changing the value. The biggest issue is that it has to lock the row to run the update. INSERT has no locks, even on MyISAM.

Unless you're running one of the largest forums, you really shouldn't be worried about this kind of stuff though. Take a bit of advice from me, who has spent my entire dev life trying to work on performance, reliability, and scalability, it's not worth it.
Title: Re: Delayed Statistic Counters
Post by: Nifty on January 23, 2016, 02:48:03 pm
Quote from: Joshua Dickerson – Unless you're running one of the largest forums, you really shouldn't be worried about this kind of stuff though.
I have a large Forum. There is a different if every move of a user makes an Update or an Insert if you have 500 or 1500 Users concurrently running around. And i had a few times Peaks of 1500 Users.  But 400-500 Users are always active.

According to myisam <-> innodb: its right innodb is better while updating, bit myisam / aria ( i run on Mariadb) is better in terms of read ops ( and would support full text ( but i use sphinx, its just 1Mio times faster) ).
Title: Re: Delayed Statistic Counters
Post by: Nifty on January 23, 2016, 02:51:52 pm
Quote from: Spuds –
Code: [Select]
innodb_flush_log_at_trx_commit  = 2
or even set that to = 0 to be more aggressive.  Neither 0 or 2 is  ACID compliance. so you could have a minor data loss (like a +1 miss) less so with 2, but you get a performance boost for that minor risk.
I delayed the writes once and ended up having corrupt databases which i had to rebuild to get back online.  That takes hours of manual work - if you around. Very bad if you are currently on vacation .... ;) )

Data needs to be flushed to Disk.
Title: Re: Delayed Statistic Counters
Post by: Joshua Dickerson on January 23, 2016, 03:20:20 pm
There's a big difference between 1500 users looking at the ElkArte stats and looking at the database connections. I guarantee you've never had 1500 concurrent connections to the database. The ElkArte stats default to keeping someone "online" if they've visited the site (anywhere) in the past 15 minutes. Consider that once you open a page, it takes a second for it to load (from click to full page load). Then you take time to read it. Then you take time to find your next link. So, it isn't 1500 concurrent connections or even 1500 views per second. To know the true usage, you need to use a better tool.

There are much better ways to see how many concurrent users you have. You can look at your Apache/Nginx access log with a script that reads it or just use Google Analytics which uses a Javascript connection to monitor live stats.

Think about it like this, every page load takes a second, it takes 28 seconds (to make the math easy) to read the page, and another second to click a link or click back, that's 30 seconds per user per page. At most, the user can get 2 pages per minute. If you're like me, you open a bunch of tabs from the messageindex or boardindex and then read them by tab, but it still averages out to the same. So, 1500 users, 2 pages per minute over 15 minutes. That should be nothing for any decently run server.

Now, obviously everything is done in bursts. That's where caching comes in. That's what you're talking about here as well (along with row-level locking because you should be using InnoDB). How many seeks do you think happen on any given topic in the 1/10 of a second it takes for someone to run an update on that row? Also consider that the lock gets returned in that 1/10 of a second.

Like @Spuds said you're worrying about the wrong thing. I will make a commit to make the changes I said (at some point), but delaying the update for that row is probably not going to make any difference for your site or 99.9999% of the sites out there. Removing the query makes a difference because the SQL analyzer is what's taking the most time.

I hope you don't take this as me telling you you're wrong. You're not. You're just worried about something that is less important to your performance than some of the bigger things. I've gone down this rabbit hole for years working on YaBB SE, SMF, and for a little while with ElkArte before I realized I should be picking the low hanging fruit.
Title: Re: Delayed Statistic Counters
Post by: Joshua Dickerson on January 23, 2016, 03:22:13 pm
Wait, are you running MyISAM?
Title: Re: Delayed Statistic Counters
Post by: Nifty on January 23, 2016, 03:40:17 pm
Only on a few, most are innodb.
Title: Re: Delayed Statistic Counters
Post by: Nifty on January 23, 2016, 03:50:52 pm
Quote from: Joshua Dickerson – There's a big difference between 1500 users looking at the ElkArte stats and looking at the database connections. I guarantee you've never had 1500 concurrent connections to the database.
Thats right of course. i have a max of 15 concurent db connections ( hard set in my.cnf). And nginx / php-fpm is set to 12 childs.
Still - if 500 Users running around including spiders you will do everything to keep writes as fast as possible to avoid queuing up the requests.

Writes are fsync'd to disk - cache is only for reads.

Quote from: Joshua Dickerson – Removing the query makes a difference because the SQL analyzer is what's taking the most time.
Nope. Writes to Disk take more time than any Analyzer ( and Querys are cached and reused ) will ever use. I don't run on a ssd, i use standard hard disks.
Title: Re: Delayed Statistic Counters
Post by: Spuds on January 23, 2016, 04:24:05 pm
Quote from: Nifty – I delayed the writes once and ended up having corrupt databases which i had to rebuild to get back online.  That takes hours of manual work - if you around. Very bad if you are currently on vacation .... ;) )

Data needs to be flushed to Disk.
I'd like to understand the case where that setting to 2 can cause database corruption, its a potential data loss issue, yes, but corruption?

You will get better performance by setting the value to 2, but then you can lose up to one second worth of transactions should the system crash.  With a value of 2, only an operating system crash or a power outage can loose the last second of transactions.  Using a value of 0 you would add in that loss should mysql crashing, mysql crashing with a value of 2 is not different that the default value of 1. 

Quote i have a max of 15 concurent db connections
You are saying you have "max_connections" set to 15?  I really feel like you are hobbling your system doing that given the number of users you have.  Obviously I don't know your site situation, but 15 just seems really low.
Title: Re: Delayed Statistic Counters
Post by: Nifty on January 23, 2016, 04:39:55 pm
@Spuds
i used the "delay write" for MyIsam tables.
I have currently not need to delay anything since i switched the search to sphinx.
But - i am still running myBB. elkarte is what i am currently testing.
-
QuoteYou are saying you have "max_connections" set to 15?  I really feel like you are hobbling your system doing that given the number of users you have.  Obviously I don't know your site situation, but 15 just seems really low.
Current Page times are ~ 200-400ms. all reads come from cache. i only care about the writes going to disk. Thats my bottle neck.
Title: Re: Delayed Statistic Counters
Post by: Spuds on January 23, 2016, 05:09:19 pm
If you are now using Innodb now, then innodb_flush_log_at_trx_commit = 2 would be your friend. 

I have no idea on "delay write" for MyIsam.  For MyIsam using concurrent_insert = 2 simply means that insert will not search for an empty row in the table to insert into, but instead it simply stuffs it at the end.  This saves time at the expense of some table size.   Its not right for every application, but for forum querys it should be just dandy.  Also since there is a scheduled optimize table task those empty rows will get cleaned up as part of the standard maintenance so any size inefficiency is short lived.

I'm surprised that 15 connections could cause a disk I/O bound situation, I'm just having a hard time wrapping my head around that.  What are you using 5.25" floppy's :D :D  Is there something else that has bottle necked disk I/O other than the db?
Title: Re: Delayed Statistic Counters
Post by: Nifty on January 23, 2016, 05:24:26 pm
Quote from: Spuds – I'm surprised that 15 connections could cause a disk I/O bound situation, I'm just having a hard time wrapping my head around that. 
They cant. The 15 is just to make shure it will not happen. I prefer a "proxy did not respond in time" (max php childs -> 12 ;) ) to a not responding server.
But the server can currently  handle 1500 users at a time without any problems.
The trick was switching the search to sphinx & using nginx. Mariadb handles most situations better than mysql as well. And to many  sql connections can kill you easily.

But it seems you are more skilled at mysql internals than me. I would love to come back to you after running my performance test for elkarte if you don't mind. i will not change the settings for my current live system.
Title: Re: Delayed Statistic Counters
Post by: Joshua Dickerson on January 23, 2016, 07:28:33 pm
Which tables? I think I can help you out with your performance a lot more without code changes ;)

You already see that it doesn't matter for spiders. With 15 concurrent database connections, some of those are probably just open threads that are cached, right? Even still, what's the likelihood that 15 people browsing your forum are looking at the same topic or accessing its row (read/write) at the exact same moment? That's the only thing that's going to cause your UPDATE vs INSERT to really matter. Do me a favor and benchmark the time it takes to run an INSERT and an UPDATE on your topics table. That 1/100 or 1/1000 of a second going to matter in the grander scheme? Meh, I'd say not.

Caching doesn't just matter for reads. It matters for writes where you have common rows, or in your case - because you're using MyISAM - tables, that have to wait on write locks to clear. That's not the point though. Do you care how fast a query is or do you care about how fast your page loads? First comes the page as a whole, then comes the care about what's causing it to take longer than desired.

You're right, writes to disk take the most amount of time, but in the difference between insert and update (with a fixed length row), I'd think that it is everything else involved in the query and not the write that makes the update take longer.
Title: Re: Delayed Statistic Counters
Post by: SpeedFreak on January 23, 2016, 08:01:46 pm
What about stats - > a stats file of some sort.
Then a cron or task to update the db from the stats file every x seconds/minutes.

If the stats file was in the same location as the cache then folks that toss the cache into a tmpfs would be offloading even more io.
Title: Re: Delayed Statistic Counters
Post by: Joshua Dickerson on January 23, 2016, 10:27:24 pm
Quote from: Spuds – Also since there is a scheduled optimize table task those empty rows will get cleaned up as part of the standard maintenance so any size inefficiency is short lived.
On a forum with 1500 users, I would hope you'd disable the optimize table. I once worked for a company where we had scripts that did 1000s of INSERT/UPDATE a minute (more like 100k) adding about 10MB to the size of the table every hour. I used concurrent_insert=2 and never once ran an OPTIMIZE as I would kill the server. A couple of times in the 3 years I worked there I did a server swap - swapping the master and slave while I did a full server dump so I could upgrade the servers. Hard to benchmark the difference on that though because I was radically changing the hardware each time.

The way I see OPTIMIZE is that if you can use it without hours/days of locked tables or running out of free RAM/disk, your tables are too small/low IO to matter. As a "scheduled task" it's a feel good thing IMO
Title: Re: Delayed Statistic Counters
Post by: Nifty on January 23, 2016, 10:44:10 pm
Quote from: Joshua Dickerson – On a forum with 1500 users, I would hope you'd disable the optimize table.
For shure - i have no idea how long that would take:
QuoteOur members have made a total of 831.946 posts in 112.897 threads
Title: Re: Delayed Statistic Counters
Post by: Spuds on January 23, 2016, 11:17:38 pm
I give up, really I do.
Title: Re: Delayed Statistic Counters
Post by: Nifty on January 24, 2016, 03:20:54 am
@Spuds
Sorry if i made you upset, i did not mean it so. I am not a native speaker, my English is lacking the polished part.
Title: Re: Delayed Statistic Counters
Post by: Nifty on January 24, 2016, 04:00:33 am
Quote from: SpeedFreak – What about stats - > a stats file of some sort.
That means moving hot spots from the database engine to the filesystem and handle it from there. I thing this will course more work without a real benefit after all. Inserting into a separate table is a very fast operation.
Title: Re: Delayed Statistic Counters
Post by: Nifty on January 24, 2016, 04:12:29 am
Quote from: Joshua Dickerson – You already see that it doesn't matter for spiders. With 15 concurrent database connections, some of those are probably just open threads that are cached, right?
Currently
Quote449 users active in the past 30 minutes (172 members, 23 of whom are invisible, and 274 guests).
-
Table_locks_immediate   30609821
Table_locks_waited      13717
Tc_log_max_pages_used   0
Tc_log_page_size        0
Tc_log_page_waits       0
Threadpool_idle_threads 0
Threadpool_threads      0
Threads_cached  5
Threads_connected       1
Threads_created 6
Threads_running 1
Uptime  685523
Uptime_since_flush_status       685523
-
Generated in 166 ms (32.09% PHP / 67.91% MySQL)
SQL Queries: 17 / Server Load: 0.39 / Memory Usage: 10,75 MB
Stats from my mybb forum.

btw: i use innodb on all tables which are updated frequently.