Skip to main content
Topic: Number of BBCode params and permutation (Read 11426 times) previous topic - next topic
0 Members and 1 Guest are viewing this topic.

Re: Number of BBCode params and permutation

Reply #30

How does his improved function work?

I wonder what exactly is taking so long in the pc_next_permutation function. I like your solution to max it.

Re: Number of BBCode params and permutation

Reply #31

No idea on that function, but feel free  :D

TBH I'm not sure how would we verify 10 years of undocumented bbc cases plus whatever addons may have done.  I know some of pregs in the parameters are not really stand alone, they need the other parameters to limit greediness etc.

The other thing I keep getting stuck on is that for 99.99 of the cases what was in place simply worked*.  I'm comfortable changing from a recursive permute to the lexicographic version since its really the same functionality, just a memory "smart" version.  This leaves the overall parse function unaltered.

I'd imagine the slower is simply the do/while plus comparison overhead.  There is some of the same overhead in the old function, its just *after
the permutations are computed.  So those numbers are a bit unfair but so is life :P

Re: Number of BBCode params and permutation

Reply #32

Oh, wait, the comparison includes the preg_match in the pc_next_permutation test and not in the other?

Ugh, I really need to get my damn computer back. This is killing me. I know that the proposed permute function works but now I want to test it against dougiefresh's. If his works and does so with much less resources and a limitless number of parameters, seems like his would be the way to go.

Re: Number of BBCode params and permutation

Reply #33

I really want to know, looking at a large number of posts with BBC, how often would the preg condition not match within X tries. Where X is whatever would add any measurable amount of time. Would be nice to have a couple of forums to be able to do some data analysis on.

Re: Number of BBCode params and permutation

Reply #34

I only did a simple test of performance on that, using quote blocks. 

Had a message with 10 quotes, some normal, some nested, with a couple of them using all 3 parameters out of order.  Ran 3 loops of 100.

Old permute function: Avg 3.73421 seconds in parse bbc and .19034 seconds in pre parse
New permute function: Avg 3.7125 seconds in parse bbc and  .18801 seconds in pre parse

Net: no difference, should have checked memory but for 3 parameters they should be about equal per previous postings based on 3 parameters.

Re: Number of BBCode params and permutation

Reply #35

I finally am able to test this out.

I am getting issues with the first quote code. Since there is only 1 param, the size results in 0.

At
Code: [Select]
for ($i = $size - 1; $p[$i] >= $p[$i + 1]; --$i);
it makes $i = -1. Then it loops through $p and stops when the value of $p is >= the last one. Problem is, there is no $p[-1]. So, it is an endless loop with 2 warnings.

Re: Number of BBCode params and permutation

Reply #36

I think I committed / tested that as
Code: [Select]
for ($i = $size - 1; isset($p[$i]) && $p[$i] >= $p[$i + 1]; --$i)
 


Re: Number of BBCode params and permutation

Reply #38

Personally I like what we have, and its likely faster anyway.

Re: Number of BBCode params and permutation

Reply #39

The only thing I think might make sense to pull is the sorting of the parameters so there is no need to put the regular expression in a loop.

Re: Number of BBCode params and permutation

Reply #40

Finally got to looking at this after getting nudged again, and using a test case that fails :P I think there are a couple of issues here, at least in 1.1, the 1.0 parser is what it is at this point.

First by the time we search for parameters, all \n's have been replaced by <br>. The issue is that without any \n's in the string the regex's tend to act as though the ~~s modifier is enabled.  This may cause the regex to capture past its parameter group.  We can fix this behavior by swapping out the  <br> with \n in the $message_stub variable but there is another "issue".

Second $message_stub starts at the current parameter marker and runs to the end of the message.  Doing the \n would fix most cases of regex greediness, but you can craft a message to cause problems.  I'd suggest clipping the $message_stub at the closing $possible[Codes::ATTR_TAG] value instead of doing any \n swap.  You can still craft a message to make the parser fail but its narrow.

Third the IMG code is defined with all optional parameters.  So if you supply an IMG  tag with all the available parameters, only (1) has to be in the right spot and the regex match is made, and very likely incomplete.  We need to define the code first as all required, which will properly parse a full loaded tag, and then have the current optional one, which will properly match a N-1 parameter list.  I added a very basic check to skip full parameter searching when the supplied tag does not contain the necessary DNA to warrant the search.
Last Edit: January 26, 2017, 09:46:28 pm by Spuds

Re: Number of BBCode params and permutation

Reply #41

After some more staring at my navel, Omphaloskepsis if you must know, the general issue is the use of the "optional" value on BBC parameters (as outlined above) ... The current way the search regex is built, it always adds the ? parameter to the regex which is not the right thing to do.

For example you use "width= alt= height=" in your img tag, all of those are optional ... if you build the regex as (width=)?(height=)?(alt=)? it will match, but only return width, alt and height are missed.   My first thought was to fix it by adding in a non optional code definition.  This does work, but it get progressively worse the more optional tags you create ( ie >3 )

My current plan is two fold. 

First is still trim down $message_stub so it just contains the tag in question (open to close) ... this helps prevent any overrun with nested tags etc, makes the regex nicer.  Currently its from the open tag to end of message and that can, in some cases, cause issues.

Second is to set the optional (?) modifier only on params that are optional AND not being used.  I added a simple (which can be made more robust if we find its needed) to check what parameters are being used on a current tag.  Then we add the optional modifier based on that condition.  So in the above example the user used the width= height= alt= parameters on a given tag.  Even though those are all optional, since they were used, they become required in the regex match.  The rest of the permutation stuff is the same and now runs to catch the "any" order.

https://github.com/elkarte/Elkarte/compare/development...Spuds:parseUpdates?expand=1
Last Edit: January 26, 2017, 09:46:12 pm by Spuds

Re: Number of BBCode params and permutation

Reply #42

I never did like matching to the end of the message. I think adding another regular expression in there to check for that will add overhead though. I think I tried it out and that's what it did.

The preparser could add flags in there from the start of a possible tag to the end of a possible tag. Then we could either use explode() to create an array (I think that's actually slower) or use strpos() to find the next value (should be faster). All it would do is search for [a-Z to find the start and then find the next ] that's not in a quoted string. If no closer is found, it's not possible to be a tag and the opener flag should not be added. Obviously, that would be an upgrading headache.

I don't quite understand the whole thing with the optional stuff. I will have to read it again.

I don't know the minimum version of PHP, but the main loop is a perfect place to use a generator.

I wonder if combining the regular expressions instead of putting them in a loop will be faster. http://nikic.github.io/2014/02/18/Fast-request-routing-using-regular-expressions.html explains what I'm talking about.

In the end, if your solutions fixes the bug, I think that's the way to go. Then we can worry about optimizing it. Really, we just need to create an AST and not have to worry about performance anymore.

Re: Number of BBCode params and permutation

Reply #43

Have you tried ordering the parameters? Just wondering if that had any positive effect.

Re: Number of BBCode params and permutation

Reply #44

I wonder if we couldn't also apply the flagging technique to parameters. Consider this:

  • first step is to flag all of the possible tags
  • after that, we go through all of the possible tags and find possible parameters. To do that, we find " [a-Z]=" that isn't inside of a quoted string (my brain doesn't want to do the work to write that regex out right now). We use a different flag to mark the start and end of those.
  • when we load the BBC, we create a list of required parameters and a list of optional. We sort both by their keys.
  • when we parse it, we find the tags, then we find all of the parameters. We order those parameters by their key. We loop through the required first and make sure they are all there. Skip to the next BBC like we do now if we hit one that isn't there or the test doesn't return true. Then we run through the optionals and do their tests.

I think I'm missing something huge here, but this essentially removes the need to use any regular expressions to find the parameters at read-time.

This would be the start of an AST which is a plus IMO.

PS: https://github.com/Spuds/Elkarte/blob/1be23253fa5c6a1d8e7cc16336b04e6579d336d3/sources/subs/BBC/Codes.php#L267 :( sorry