Topic: BBC Parsing

Re: BBC Parsing

Reply #15
@Spuds I updated it.

If anyone is looking at it, index.php is the entrance for testing. The controls window is screwed up but it isn't a concern for me because I know what the inputs are.

There are currently two types of tests: bench and test (index.php?type=bench or index.php?type=test).

You can select individual messages to test by clicking the checkbox next to the message and clicking the submit button at the bottom of the page. If you don't select any messages and hit submit, it will do all of the messages.

There are no checkboxes for the benchmark test, but you can put the message(s) you want in the address bar and it will run only those messages (for instance ?type=bench&iterations=1000&msg=20). It will always run the "code" and "all" tests because selecting messages was an afterthought. As you can see from the example, iterations=# sets the number of iterations it will run.

In index.php there is a constant called SAVE_TOP_RESULTS. Right now this is set up to save the top 5 results, ranked by percentage difference between the A and B tests, to a CSV file. You can view those results from TopResults.php. This helps me determine where the biggest slowdowns are. I could change this to the top 5 by time, but compare two cases. If message 20 takes 5 seconds on A and 10 seconds on B, that's a difference of 5 seconds, which is a 100% increase in time. If message 10 takes 2.5 seconds on A and 7.5 seconds on B, that's also a difference of 5 seconds, but it is a 200% increase, which tells me there's a bigger difference in the code. At least, that's my logic. I'm also going to look at the absolute time differences to try to pick off any low-hanging fruit.
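
To make that comparison concrete, here is a minimal sketch that ranks messages by percentage increase rather than by raw seconds. The timings are the two example cases from this post, and rankByIncrease() is a hypothetical name, not something taken from index.php or TopResults.php.

```php
<?php
// Example timings from the post: message id => array(A seconds, B seconds).
$timings = array(
    20 => array(5.0, 10.0),
    10 => array(2.5, 7.5),
);

// Hypothetical helper: computes the absolute and percentage increase of B over A
// for each message and sorts the rows by percentage, largest first.
function rankByIncrease(array $timings)
{
    $rows = array();
    foreach ($timings as $id => $pair) {
        list($a, $b) = $pair;
        $rows[] = array(
            'msg' => $id,
            'diff' => $b - $a,
            // Guard against a zero baseline (assumption: report 0 in that case).
            'percent' => $a > 0 ? ($b - $a) / $a * 100 : 0.0,
        );
    }

    usort($rows, function ($x, $y) {
        if ($x['percent'] == $y['percent']) {
            return 0;
        }
        return $x['percent'] > $y['percent'] ? -1 : 1;
    });

    return $rows;
}

$ranked = rankByIncrease($timings);
foreach ($ranked as $row) {
    echo "msg {$row['msg']}: +{$row['diff']}s, +{$row['percent']}%\n";
}
```

Both messages lose the same 5 seconds, but message 10 sorts first because its increase is 200% against 100% for message 20.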
I am currently doing freelance work. If you need coding help, forum help, or server maintenance, shoot me a PM and let me know.

Re: BBC Parsing

Reply #16
Okay, cloning the parser isn't cheap, but it certainly doesn't make up much of the difference in time. The only test that even calls that function is (currently) test 36:
Code:
[quote=Edsger Dijkstra]If debugging is the process of removing software bugs, then programming must be the process of putting them in[/quote]
It only calls it once. On my last run of 1000 it only accounted for 0.12114 seconds difference out of 27.86 seconds for the entire test. 34, 40, 42, 31, and 39 make up the top 5 in terms of causing the most extra time. I think the pattern is nested tags are taking longer. I am going to add more messages to test that theory (that's usually how more messages have been added).

Still need to add more messages though. I know there are a bunch of BBC issues from SMF and Elk that aren't being captured in these messages. There are no UTF-8 characters, and I'm missing a lot of nesting scenarios.

Re: BBC Parsing

Reply #17
One thing I really don't like about the benchmark is that it can't actually tell me the memory loads because it is using the same process. I am thinking about getting rid of that altogether.
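
A small sketch of why same-process memory numbers mislead, assuming the benchmark reads PHP's memory_get_peak_usage(). The peak is a high-water mark for the whole process, so whichever test runs second inherits the peak left behind by the first. peakDelta() is a hypothetical helper for illustration only.

```php
<?php
// Hypothetical helper: reports how far a callable pushed the process-wide peak.
function peakDelta($callable)
{
    $before = memory_get_peak_usage();
    call_user_func($callable);
    return memory_get_peak_usage() - $before;
}

// The first "test" builds a 5 MB string, the second builds a 1 MB string.
$first = peakDelta(function () {
    $s = str_repeat('x', 5 * 1024 * 1024);
    return strlen($s);
});
$second = peakDelta(function () {
    $s = str_repeat('x', 1024 * 1024);
    return strlen($s);
});

echo "first: $first bytes, second: $second bytes\n";
```

The second figure comes out as 0 even though that test allocated 1 MB, because the earlier 5 MB peak was never exceeded. Running each side in its own process would avoid that.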

I am going to post this here because I don't have a real git repo set up (not on my computer yet), so I can't branch this. This is an in-situ example of the parser. I don't see any difference between it and the previous version. In theory it should be using less RAM, but I honestly don't know, especially with PHP. I know that in other languages it wouldn't make a copy and would use the reference.

Then again, there are a LOT of functions that create copies of the message. I don't think PHP has any in-situ string methods. I'm actually thinking about making a patch to add some. It would give me time working with C, something I haven't done in over 10 years. What we should do is make any internal functions ElkArte uses that change a string take it by reference. I don't think it will save much time, but on long messages I think it will save memory.
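
For illustration, the by-reference idea looks roughly like this. The function names are hypothetical, not ElkArte functions. Whether it actually lowers memory use would need measuring, since PHP values are copy-on-write (passing by value alone does not duplicate the string until it is modified) and built-ins such as str_replace() hand back their result as a return value either way.

```php
<?php
// Hypothetical helpers for illustration; these are not ElkArte functions.

// By value: the caller gets the cleaned string back as a return value.
function normalizeNewlines($message)
{
    return str_replace(array("\r\n", "\r"), "\n", $message);
}

// By reference: the caller's own variable is updated and nothing is returned.
function normalizeNewlinesInPlace(&$message)
{
    $message = str_replace(array("\r\n", "\r"), "\n", $message);
}

$copy = normalizeNewlines("a\r\nb\rc");

$message = "a\r\nb\rc";
normalizeNewlinesInPlace($message);

var_dump($copy === $message); // both are "a\nb\nc"
```

The by-reference version mainly saves the caller from holding the old and new strings in two separate variables. Any real saving is an assumption to check with measurements.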


Re: BBC Parsing

Reply #19
Finally figured out what the recursive parsing is for and added a new message for it. You are right, @Spuds, that takes a lot of time. It's still not enough to account for the massive difference, especially since it's so rare to have BBC in the author attribute of a quote.

Re: BBC Parsing

Reply #20
Feel free to upgrade to 5.4 in the admin panel if you want to. It's the maximum they currently have :/ But AFAIK there's quite a big difference between the two.
~ SimplePortal Support Team ~

Re: BBC Parsing

Reply #21
I am also working with 5.3 which is 6 years old. I want to test 5.4 and up. I imagine HHVM and PHP 7 are much faster, especially dealing with objects.
If the goal is Elk 1.1, then PHP 5.3 is the reference...
Bugs creator.
Features destroyer.
Template killer.

Re: BBC Parsing

Reply #22
Hmm, could I argue for targeting 5.4 but keeping it 5.3 compatible? Joshua still has a point there, 5.3 is old.
~ SimplePortal Support Team ~

Re: BBC Parsing

Reply #23
I don't know if I broke something or if 5.4 really is that much better but I saw it go from nearly twice as slow to almost the same speed using 5.4.

PHP 5.3
Code:
Messages: 73
Iterations: 500
Total Time In Tests: 14.77
Total Old Time: 5.72
Total New Time: 9.05
Diff Total Time: 3.33
Diff Total Time %: 8.42

PHP 5.4
Code:
Messages: 73
Iterations: 500
Total Time In Tests: 10.39
Total Old Time: 5.57
Total New Time: 4.81
Diff Total Time: 0.76
Diff Total Time %: 4.71

Re: BBC Parsing

Reply #24
That makes me feel a hell of a lot better. I was about to say screw using objects. An 84% increase in time isn't going to work. If you are still using 5.3, you deserve to have it take a little more time. Maybe it will push people to upgrade to >= 5.4. Especially since 5.3 EOL'd a year ago (last week).

Re: BBC Parsing

Reply #25
Wait!!! I was so used to old always beating new, I completely missed that new is faster now! :D :D :D I don't know why or how but this has made me so happy.

Consistently too! Just needed to use PHP 5.4 I guess.

Code:
Messages: 73
Iterations: 1000
Total Time In Tests: 19.85
Total Old Time: 10.6
Total New Time: 9.25
Diff Total Time: 1.35
Diff Total Time %: 9.73

Code:
Messages: 73
Iterations: 1000
Total Time In Tests: 20.27
Total Old Time: 10.78
Total New Time: 9.49
Diff Total Time: 1.28
Diff Total Time %: 9.9

Code:
Messages: 73
Iterations: 1000
Total Time In Tests: 20.22
Total Old Time: 10.74
Total New Time: 9.48
Diff Total Time: 1.26
Diff Total Time %: 9.86

There are still some changes that I can make and it needs more messages to test, but right now my code looks close to production-ready, at least for a mod. EDIT: forgot that it doesn't test for or handle disabled tags properly. Need to do that before anything.

Now I have to edit the tests so that they show me the difference between old and new and not just max/min.

Re: BBC Parsing

Reply #26
No idea how you are doing it, but I feel you are doing an amazing job, and I am glad my test site is being useful for such an important task!
~ SimplePortal Support Team ~

Re: BBC Parsing

Reply #27
Just downloaded the latest to play around with :D Josh, you are some night owl. I crap out by 9:30-10, but I do get up at 5:15ish.

I added a few tests to the messages array, nothing fancy but just a few more (and added the needed funcs in the helpers file). Running on 5.4 (on my Windows machine; I should try it on my VPS as well, but that's another day) I'm getting:

Messages: 100
Iterations: 100
Total Time In Tests: 18.4
Total Old Time: 8.48
Total New Time: 9.91
Diff Total Time: 1.43
Diff Total Time %: -16.84

Not as fast on my system but doing pretty damn well, may be due to the extra tests I placed in the file. 

Also, I changed the way you were calculating the percentages. You can't do max( , ) / min( , ) for the percents; you need to keep one value as the baseline and then calculate the delta and percent off of that. So I used the old / current value as the baseline. It's the ((known - experimental) / known) calculation that you want, with a divide-by-zero check.

You can see the problem in your examples ... Total Old Time: 10.6, Total New Time: 9.25, Diff Total Time: 1.35, Diff Total Time %: 9.73  ... The % difference is either 12.7% or -14.6% depending on which you used as the baseline, no idea what 9.73 is.
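
A minimal sketch of that baseline calculation, checked against the numbers quoted above. percentDiff() is a hypothetical name rather than the function in the actual test files, and returning 0 when the baseline is zero is an assumption about how the check should behave.

```php
<?php
// Hypothetical helper: percent change relative to a fixed baseline ("known") value.
// Positive means the experimental run was faster, negative means it was slower.
function percentDiff($known, $experimental)
{
    // Divide-by-zero check: with no baseline time there is no meaningful percentage.
    // Returning 0.0 here is an assumption.
    if ($known == 0) {
        return 0.0;
    }

    return (($known - $experimental) / $known) * 100;
}

echo round(percentDiff(10.6, 9.25), 1), "\n"; // 12.7 (old time as the baseline)
echo round(percentDiff(9.25, 10.6), 1), "\n"; // -14.6 (new time as the baseline)
```

Keeping the old time as the first argument everywhere means the sign always reads the same way: positive when new is faster, negative when it is slower.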
Squish squish. squish, squish, squish.
Find a bug,
Make a wish.

Re: BBC Parsing

Reply #28
Don't bother, I am just moral support :D You can do it guys!
~ SimplePortal Support Team ~

Re: BBC Parsing

Reply #29
Yeah, I realized it was screwed up as soon as I wrote it but it served its purpose in helping me.

Can you upload your changes? I want to work them in.