BBC Parsing

Working on a new parser, but not starting entirely from scratch. Just taking what is there now and "iteratively" changing it.

https://gist.github.com/joshuaadickerson/c528aae77d1cf0de029d

The goal is to make the code nicer and easier to change. The biggest thing is taking the code generation out of the parser. Also, taking it out of Subs.php. As a side effect, hopefully it is also faster and less resource intensive.

Everything works except itemcodes. They really are complex so I'm still trying to figure out where I broke them.

I would appreciate help coming up with more test messages, especially ones that might break it.

Tester.php is the entry point but it really doesn't do much yet. I am going to make it much nicer shortly.

Re: BBC Parsing

Reply #1

Sounds awesome Josh  :D 

I've looked at, and implemented, two different third-party parsers with good results. Good in that they seemed to parse whatever I threw at them. The downside was that they were both too slow compared to the current function (which got some micro-optimization love to squeeze a bit more out of it).

I think they just ended up with too many objects or whatever floating around. The biggest disappointment was that they were also much slower for a post with no BBC, so plain text posts lagged behind by a double-digit percentage.

Anyway, IMO it's fine for a replacement to be a bit slower than what we have now if it gives us more readability, maintainability, extensibility, and any other -ibilities that I forgot :D

I have a few test cases that I can toss at it when you get a bit further along, looking forward to it !!!

Re: BBC Parsing

Reply #2

Post (or gist) them and I'll add them. Besides itemcodes, which I know are totally screwed right now, I want to see what it takes to break it. I think the issue with itemcodes is outside of the itemcodes function itself. So, if I can find a message that breaks it, I might be able to fix itemcodes lol.

Re: BBC Parsing

Reply #3

I hate Bootstrap so much right now. I just want code to look nice with overflow and highlighting. Once I get that part figured out, I will have a nice GUI for this thing. Right now, it looks okay, but is not that nice.

Oh, and as for benchmarks - old is killing new. Trying to figure that out now.

Re: BBC Parsing

Reply #4

Just throwing out something that might be dumb, but wouldn't it be possible to render those codes on the client side instead of exclusively server-side? I know that PHP cannot be executed in a regular browser and needs to run on a server, but what if there was a way to have the browser help somehow?

Re: BBC Parsing

Reply #5

SMF has always worked the same with or without JS on (save a couple of things in admin). So, I guess Elkarte would be in the same boat.

Re: BBC Parsing

Reply #6

All tests are passing. Which is pretty awesome. Even added a bunch more messages to test.

Now, to figure out what is making it take nearly double the time.

Tester.php is no longer the entry point. It is now the index.php file.

It could certainly stand to look better but I am not in the mood to wrestle with Bootstrap's handling of <pre> and <code> anymore.

Re: BBC Parsing

Reply #7

Sounds like great progress, hope you can determine what's causing the speed issue.

Re: BBC Parsing

Reply #8

I set it up to show me what has the largest percentage time difference. Consistently, this string is the top of the list. Thought it was ironic.

Code:
[quote="Edsger Dijkstra"]If debugging is the process of removing software bugs, then programming must be the process of putting them in[/quote]
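
A ranking like this can be produced with a small harness. What follows is only a sketch of one way to do it, not the code from the gist: percentDifference(), timeParser() and rankByDifference() are hypothetical names, and the two parsers are passed in as plain callables.

```php
<?php
// Hypothetical helper: how much slower (positive) or faster (negative)
// the new time is relative to the old one, in percent.
function percentDifference($oldTime, $newTime)
{
    if ($oldTime <= 0) {
        return 0.0; // timing too small to measure; avoid dividing by zero
    }
    return (($newTime - $oldTime) / $oldTime) * 100;
}

// Hypothetical helper: total time for running one parser over one message.
function timeParser($parser, $message, $iterations = 100)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        call_user_func($parser, $message);
    }
    return microtime(true) - $start;
}

// Rank the test messages by percentage time difference, largest first.
function rankByDifference(array $messages, $oldParser, $newParser, $iterations = 100)
{
    $results = array();
    foreach ($messages as $key => $message) {
        $old = timeParser($oldParser, $message, $iterations);
        $new = timeParser($newParser, $message, $iterations);
        $results[$key] = percentDifference($old, $new);
    }
    arsort($results); // sorts descending and keeps the message keys
    return $results;
}
```

The first key of the returned array is the message with the largest relative slowdown. Timings this short are noisy, so a higher iteration count gives a steadier ranking.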

Re: BBC Parsing

Reply #9

Code:
foreach ($this->bbc as $k => $v)
{
	if ($tag !== $v['tag'])
	{
		unset($this->bbc[$k]);
	}
}

is 50% faster than

Code:
$this->bbc = array_filter($this->bbc, function ($ele) use ($tag) {
	return $ele['tag'] !== $tag;
});
On PHP 5.3.
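
A comparison like this can be reproduced with a small timing script. The sketch below is not from the gist: the sample array and the function names are made up for illustration. One thing to watch: as posted, the loop keeps the entries whose tag equals $tag, while the array_filter() callback keeps the entries whose tag differs. The sketch assumes the intent is to keep only the matching entries, so its callback uses ===.

```php
<?php
// Hypothetical sample data; the real BBC definition array is much larger.
function sampleBbc()
{
    return array(
        array('tag' => 'b'),
        array('tag' => 'quote'),
        array('tag' => 'i'),
        array('tag' => 'quote'),
    );
}

// Loop version: drop every definition whose tag does not match.
function filterWithLoop(array $bbc, $tag)
{
    foreach ($bbc as $k => $v) {
        if ($tag !== $v['tag']) {
            unset($bbc[$k]);
        }
    }
    return $bbc;
}

// array_filter() version: keep only the definitions whose tag matches.
function filterWithArrayFilter(array $bbc, $tag)
{
    return array_filter($bbc, function ($ele) use ($tag) {
        return $ele['tag'] === $tag;
    });
}

// Time one of the two functions over many iterations.
function benchmark($callable, array $bbc, $tag, $iterations = 10000)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        call_user_func($callable, $bbc, $tag);
    }
    return microtime(true) - $start;
}

$bbc = sampleBbc();
printf("loop:         %.4fs\n", benchmark('filterWithLoop', $bbc, 'quote'));
printf("array_filter: %.4fs\n", benchmark('filterWithArrayFilter', $bbc, 'quote'));
```

Both functions preserve the original array keys, so their results can be compared directly. The actual ratio will depend on the PHP version and the size of the array.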

Re: BBC Parsing

Reply #10

Been poking through and playing with the code. That's a very nice refactor so far :D Awesome job, Josh!

In terms of speed, it looks like the biggest time delta is in the recursive call of handleParsedEquals. Not sure how to improve that, TBH.

Re: BBC Parsing

Reply #11

Oh, I have made a ton of changes so far. I will upload them shortly. I am also, at the same time, working on a complete rewrite using preg_split(). I could probably use preg_match() but I want to keep it fairly simple.

Re: BBC Parsing

Reply #12

Cool ... look forward to what you have done. 

I may take a couple of the simple optimizations (substr_xxx) and stuff them back into 1.0.5 as well. That way it's as fast as possible and then you really have to work to beat it :P (well, in terms of speed). The new structure is already leaps better!

Re: BBC Parsing

Reply #13

Yeah, since I'm working on the optimizations anyway, I could just throw them back into it. I could definitely do this without using objects, but I really think objects are the way to write good code. I had some other stuff to do so I got sidetracked, and I also broke it last night before I went to sleep, so I haven't uploaded the changes yet. Just got my computer back, so I should (in theory) be able to work much faster now. @Flavio93Zena let me use his test site with his online control panel so I could upload things to it. Awesome, but I have to upload the changes any time I want to test something. I am also working with 5.3, which is 6 years old. I want to test 5.4 and up. I imagine HHVM and PHP 7 are much faster, especially dealing with objects.

Re: BBC Parsing

Reply #14

With the preg_split() parser, we could store the messages in their split form. Right now it is splitting by (\[/?$tag) and {\]) ($tag is the tag, not the regular expression). I guess if it splits by (\[/?.*)[\s\=\]]) (wrote that in this window so I don't know if it even makes sense) then it makes it so that any word or itemcode going forward can be added or removed.
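
To make the idea concrete, here is a minimal sketch of splitting a message on tag boundaries with preg_split() while keeping the delimiters, so the pieces could be stored and rejoined later. The pattern is an illustrative assumption, not the one from the gist, and splitMessage() is a hypothetical name.

```php
<?php
// Hypothetical helper: split a message into plain-text and tag pieces.
// The pattern matches an opening or closing tag such as [b], [/b] or [quote="x"].
function splitMessage($message)
{
    return preg_split(
        '~(\[/?[a-z]+(?:=[^\]]*)?\])~i',
        $message,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );
}

$parts = splitMessage('Hello [b]world[/b]!');
// $parts is array('Hello ', '[b]', 'world', '[/b]', '!')

// Because the delimiters are kept, joining the pieces restores the message.
$restored = implode('', $parts);
```

Since this pattern matches any alphabetic tag name rather than a fixed list, the stored pieces would not need to be split again when a tag is added or removed. Deciding which pieces are real tags would then happen at render time.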