Skip to main content
Topic: Number of BBCode params and permutation (Read 11427 times) previous topic - next topic
0 Members and 1 Guest are viewing this topic.

Number of BBCode params and permutation

Reported at sm.org https://github.com/SimpleMachines/SMF2.1/pull/2956
Basically parse_bbc creates all the possible permutations of the parameters of a bbocde in order to catch any possible order of them. In case a bbcode has more than 12 parameters the parser starts to create an array with more than 10k entries and after more than 14 the memory consumption goes above the 90MB.

The proposed solution is to dynamically re-order the parameters of a bbcode in order not to have to guess the order at all.

A couple of considerations.
1) Provide more than 5 parameters for a bbcode is insane and pretty much useless. Why? For example because nobody is going to use them anyway (people don't even remember the bbcode tags, let alone 15 parameters of a single bbcode).
2) The solution may actually be rather nice, though. But do it in parse_bbc looks kind of overkill to me: to fix a one in a 10 years problem, you are going to reorder each and every bbcode tag parameters out there forever. If there is any interest in applying this solution, it would be cool do it before save the text in the database so that there will be no need to do anything at parsing-time, though this solution may require some more work at "preparsecode-time" (like parse the text and replace it back with bbcode instead of HTML)... :-\
Bugs creator.
Features destroyer.
Template killer.

Re: Number of BBCode params and permutation

Reply #1

I get a white page if I apply those edits so you might review them in case you are getting inspiration.
Fatal error: Call-time pass-by-reference has been removed in Subs.php on line 887 but thinking about it it might be due to another edit I had done to the parser - http://custom.simplemachines.org/mods/index.php?mod=2191
~ SimplePortal Support Team ~

Re: Number of BBCode params and permutation

Reply #2

I heard about this one as well, @dcmouser let me know that another ILA mod (not mine, I'm  O:-) ) was causing things to explode in that repetition without duplication function. 

Obviously its a corner case, it takes a lot of parameters to cause the issue, and I think its a better, or a more natural, UI to wrap BBcodes for certain effects since you can do that with the just the editor buttons.  Like if you want bold italic underline, you don't use [bold typeface=italic style=underline]bla etc.

That said, its still bad code, and the docblock for the function notes that fact.  Its a ticking time bomb for someone willing to add lots of bbc parameters. 

Having spent more than just a little "quality" time with parse_bbc, I also know you don't want to make changes unless you have a least a decent test bed of input vs expected results to verify changes,  it may be the single most important function.  You also want to benchmark a normal post code before and after any changel,  I've seen changes that fix an edge case at the expense of the other 99% in terms of speed / memory.  Not indicating thats the case here since I have not looked at the solution nor bench marked it, just saying its something that would have to be done.

Re: Number of BBCode params and permutation

Reply #3

Quote from: Spuds – let me know that another ILA mod (not mine, I'm  O:-) )
You have been just lucky this time! :P

Quote from: Spuds – That said, its still bad code, and the docblock for the function notes that fact.  Its a ticking time bomb for someone willing to add lots of bbc parameters. 
 emanuele likes bombs!! :D KABOOOMM!!

Quote from: Spuds – Having spent more than just a little "quality" time with parse_bbc, I also know you don't want to make changes unless you have a least a decent test bed of input vs expected results to verify changes,  [...]
BIG NODS to everything.
I also think the proposed fix will not cover many edge cases: the reason (or at least one of the reasons) the "parsing" of the parameters is done with a regexp on all the permutations is because rely on spaces, square brackets and equal signs is not reliable (just an example, the proposed code will probably fail straight away quoting a message from @[SiNaN] [1], see the very first strpos).

@Flavio93Zena the reason it fails is because there is a typo:
Code: [Select]
function_param_order($message, &$parameters, &$out)
should be:
Code: [Select]
function fix_param_order($message, &$parameters, &$out)
Yeah, it's always your fault!! :P
Bugs creator.
Features destroyer.
Template killer.

Re: Number of BBCode params and permutation

Reply #4

LOL .. yes kaboooom indeed!

Of interest as I've been looking at that function, I think there is actually a bug when you get > 3 parameters as well .. I need to do some more testing to be sure so I'm doing some more digging.

Re: Number of BBCode params and permutation

Reply #5

Well, seeing your reply I'd like a fix by someone with more experience (you :P). I don't think you changed the code too much as regards the BBCode stuff, otherwise this topic wouldn't have been here. So basically the fix you are going to apply could potentially be either similar or identical on Elk and SMF. But I think it also depends on which way you are going to choose.
 Flavio93Zena crosses fingers hoping he got it right ;D
~ SimplePortal Support Team ~

Re: Number of BBCode params and permutation

Reply #6

So looked at this some more this AM, and the issue is not so much that the current function "choked" when given 14 elements, but that it appears to return wrong results for anything > 3 elements, at least if I understand what it is supposed to do.

- Given 3 params like array(1, 2, 3) it should return 6 combinations or 3! or 3*2*1 which it does.
- Given 4 params like array(1, 2, 3, 4) it should return 24 combinations or 4! or 4*3*2*1, instead it returns 18 combinations and some of them are duplicate sets, so not exactly what it should be doing. 
- Given 8 params it returns 770 combinations not the 40,320 that it should.

So the fact that it even got to 14 before it choked the system is an error, it should have done so much sooner  :P   (14 = 87,178,291,200 possible orders that 14 items could be arranged),  I did fix the algorithm to return the correct, non repeating, sets but the fixed function just exasperates the out of memory issue.

We could be smarter about the check as well.  Say a code has 14 potential parameters, but they only use 3.  In this case there would only be 2184 combinations to check, not the 14Billion, thats a huge reduction.  Of course nothing would stop the user from entering all 14 so they could a nice DoS

TLDR: Its fubar

Re: Number of BBCode params and permutation

Reply #7

Good job! Will you apply such a fix in 1.0.5 (hopefully to me xD) or 1.1?
~ SimplePortal Support Team ~

Re: Number of BBCode params and permutation

Reply #8

I think before starting, this needs tests.

Re: Number of BBCode params and permutation

Reply #9

I don't think we have any solution to test at least I don't, and TBH I've not come up with anything. 

All I've found is that the permute function is broken and has been form shugs who knows how long.  I can fix that but it means even fewer parameters could be iterated on before you have some wsod.

To me the major stumbling block is parameters that contain ] characters, like author names.  Given that condition, its not easy to break down the regex to find / extract individual patterns, instead you need the full pattern containing all parameters to avoid missing the correct ]

Re: Number of BBCode params and permutation

Reply #10

Quote from: Spuds – So the fact that it even got to 14 before it choked the system is an error, it should have done so much sooner  :P
Cool!...

Quote from: Spuds – We could be smarter about the check as well.  Say a code has 14 potential parameters, but they only use 3.  In this case there would only be 2184 combinations to check, not the 14Billion, thats a huge reduction.  Of course nothing would stop the user from entering all 14 so they could a nice DoS
Though, we don't know in advance which are used in the body, because it should be that permutation giving use the result of which params are used...

Code: [Select]
(author)=(.*?)(?:author|link|date)|(link)=(.*?)(?:author|link|date)|(date)=(.*?)(?:author|link|date|\])
 emanuele runs!

LOL nods
Bugs creator.
Features destroyer.
Template killer.


Re: Number of BBCode params and permutation

Reply #12

I did fix the permute code and verified that the new code is returning the correct number of sets.  Past say a set of 5 (120 combinations) it gets to be a massive chore to validate the validity of the returned arrays, but I had that data available to me.

The problem is even with the fixed permute function, the system is badly broken, actually more broken.  Permutation without repetitions is N! so 14 is 14*13*12*11*10*9*8*7*6*5*4*3*2*1 = 87,178,291,200 possible combinations ... Obviously we can't have the system trying to (potentially) examine 87Billion combinations .... that is if it could even get to the point of being able to calculate the 87Billion combinations before running out of memory. 

Its honestly more of an addon issue, however the potential for the error is in the code for sure.   Even if we were to use a smarter approach based on the actual number of passed parameters (not easy to do either) it will still leave sites exposed to some miscreant entering all 14 or 10 or whatever the number, and it would choke the system to death.

I'm really not sure what to do to fix the potential issue.  Could put a simple "max" check on the permute function to prevent the runaway as it will only ever occur if a addon author adds a large enough number of parameters to cause an issue, number be > 6  (6=720 checks, 7=5,040 8=40,320 9=362,880 10=3,628,880  I think once past 6 you are going to notice the effect, certainly by 10 you are in to seconds. 

I did not look at the posted solution, but trying to order the parameters to match a "standard" may be the way to go, if you can actually extract the various parameters with success.
Last Edit: August 01, 2015, 08:46:53 pm by Spuds

Re: Number of BBCode params and permutation

Reply #13

As a quick fix I would do exactly that: 1) fix the permutation code, 2) hard-limit to 5 or 6 the number of allowed parameter.

Past that, ordering needs IMO some benchmarking, I feel it's counter-productive with a reasonable amount of parameter (less than 5), but it's just a feeling.
Bugs creator.
Features destroyer.
Template killer.

Re: Number of BBCode params and permutation

Reply #14

http://www.simplemachines.org/community/index.php?topic=538611.msg3827920#msg3827920

Without testing it, I think this saves a lot of RAM without losing any functionality. It will allow you to get a lot more parameters but not infinite.

In Subs.php, find:
Code: [Select]
			// This is long, but it makes things much easier and cleaner.
if (!empty($possible['parameters']))
{
$preg = array();
foreach ($possible['parameters'] as $p => $info)
$preg[] = '(\s+' . $p . '=' . (empty($info['quoted']) ? '' : '"') . (isset($info['match']) ? $info['match'] : '(.+?)') . (empty($info['quoted']) ? '' : '"') . ')' . (empty($info['optional']) ? '' : '?');

// Okay, this may look ugly and it is, but it's not going to happen much and it is the best way
// of allowing any order of parameters but still parsing them right.
$match = false;
$orders = permute($preg);
foreach ($orders as $p)
if (preg_match('~^' . implode('', $p) . '\]~i', substr($message, $pos1 - 1), $matches) != 0)
{
$match = true;
break;
}

// Didn't match our parameter list, try the next possible.
if (!$match)
continue;

Replace
Code: [Select]
			// This is long, but it makes things much easier and cleaner.
if (!empty($possible['parameters']))
{
// Didn't match our parameter list, try the next possible.
if (!find_bbc_parameters($possible['parameters'], substr($message, $pos1 - 1), $matches))
continue;

Add this to Subs.php
Code: [Select]
/**
 * Find the correct order for the BBC parameters in a string
 *
 * @param array $parameters
 * @param string $message_part
 * @param array &$matches
 *
 * @return bool
 */
function find_bbc_parameters(array $parameters, $message_part, &$matches)
{
$preg = array();
foreach ($parameters as $p => $info)
{
$preg[] = '(\s+' . $p . '=' . (empty($info['quoted']) ? '' : '"') . (isset($info['match']) ? $info['match'] : '(.+?)') . (empty($info['quoted']) ? '' : '"') . ')' . (empty($info['optional']) ? '' : '?');
}

// Okay, this may look ugly and it is, but it's not going to happen much and it is the best way
// of allowing any order of parameters but still parsing them right.
$match = false;
$orders = permute(array_keys($preg));

foreach ($orders as $keys)
{
$match_string = '~^';
foreach ($keys as $key)
{
$match_string .= $preg[$key];
}
$match_string .= '\]~i';

if (preg_match($match_string, $message_part), $matches) != 0)
{
$match = true;
break;
}
}

return $match;
}

As two added benefits, it gets rid of the $orders array and doesn't do so much string copying with the substr() in the loop.
Last Edit: August 02, 2015, 03:00:04 pm by Joshua Dickerson
Come work with me at Promenade Group