Re: BBC Parsing
Reply #46 –
I did some initial testing on the RegexParser and preg_split() is taking WAY too much time. RegexParser is taking 5x longer and I'm not even doing a fraction of the amount of things the original one does since it's just a proof of concept.
Re: BBC Parsing
Reply #48 –
I think thats how the other parsers I played with worked. The benefit was it was extremely flexible and robust, the downside was the speed was subpar compared to what we have.
I'm still impressed with what you have achieved with the re-write, its only 5% slower then the procedural version and of course its significantly better in terms of structure and looks far less frail to updates..
Re: BBC Parsing
Reply #49 –
I'm not happy. I was really hoping the preg_split() parser would be faster. I might try some other regular expressions to see if they work better but I doubt it. I think just creating the regex makes it much slower. A regular expression parser makes the parser so much prettier. One way would be to cache the tokenized version. Then again, why wouldn't you just cache the parsed version at that point?
Which... I think I'll make possible. I'll add an attribute to the BBC "ATTR_NO_CACHE" which, if the message contains a BBC with this attribute, it cannot be cached. The only time this would be set is if you create a BBC that depends on the user's permissions. Then the pre-parser would parse the message and cache it. The parser would have a method Parser::canCache() which would do the same thing.
Still going to do some more things. Going to make the smiley parser, HTML parser, and autolink parser separate classes so you can throw in your own. To reduce the complexity, if they aren't in the constructor it will just create them from default. Going to test and see how making each code an object works. Thinking I'll wind up sticking with an array though.
Re: BBC Parsing
Reply #50 –
A lot of the parser deals with issues that the preparser should pick up. I wonder how much and I wonder if removing that will make it any faster. If posts become mangled, we can always run the preparser on them again to fix it. It can be made part of the upgrade or a separate admin action.
Re: BBC Parsing
Reply #51 –
What things in the parser do you feel should be part of pre-parse? Its funny how much that function has grown over time.
Re: BBC Parsing
Reply #53 –
Yup, speed is not the primary concern with preparse since its generally only called during a save. Of course since it does alter a message you need to be sure it does its job very well or it becomes an oatmeal cartoon.
Just to note what you have in your repo is not the preparse from elkarte, I guess its smf 2.? but its not what Elk currently uses. I think you used parse_bbc from elkare (which again is not the same as 2.?)
Re: BBC Parsing
Reply #54 –
Yeah, I messed up and picked the wrong preparsecode(). I changed it locally but haven't pushed it up yet.
Still trying to figure out how to make stupid PhpStorm not save the CRLF
Re: BBC Parsing
Reply #55 –
Just thinking, be sure to grab the one from 1.0.5, there was an error in the table code and font codes that were fixed.
Posted in the other thread re lfcr vs lf
Re: BBC Parsing
Reply #56 –
Quick update. The changes I just pushed are resulting in a regular 8-12% speedup in the new parser. Pretty much just trying to get the autolink method to be called a lot less. I think in a real site with regular messages the new parser would result in much faster parsing overall. The messages that we have try to get a lot of worst case scenarios.
Re: BBC Parsing
Reply #59 –
If someone posted a message like that, I'd just wind up deleting their message. Is that a message for the preparser to check? I don't see people posting anything like that all that often so I wouldn't say we need to check for anything like that.
It wouldn't be hard to add a mod in the preparser to do parsebbc() and check the time. If the time is greater than a limit (0.25 seconds?) it should kick it back with an error that there is too much BBC. Though, I'm not looking to waste my own time developing it because I don't see it happen... ever.