BBC Parsing

Started by Joshua Dickerson · August 14, 2015, 05:56:08 pm

Re: BBC Parsing

Reply #75 – September 04, 2015, 01:13:47 pm

Code:

$possible[Codes::ATTR_BEFORE] = isset($possible[Codes::ATTR_DISALLOW_BEFORE]) ? $tag[Codes::ATTR_DISALLOW_BEFORE] : $possible[Codes::ATTR_BEFORE];
$possible[Codes::ATTR_AFTER] = isset($possible[Codes::ATTR_DISALLOW_AFTER]) ? $tag[Codes::ATTR_DISALLOW_AFTER] : $possible[Codes::ATTR_AFTER];

The isset() there is never true in any of the test messages. Not sure what message I would need to make it true. I am trying to figure out what $tag does there.

Re: BBC Parsing

Reply #76 – September 04, 2015, 03:04:33 pm

I am thinking about adding another BBC type for footnotes. It would be TYPE_PARSED_CONTENT_COUNTED. Footnotes would be the only type with this and I can't see another type having it, but it would make footnotes not so hacky.

For each tag it would add an element to $counted_bbc with the tag name as the key. For each tag found, it would save that code. So, instead of doing some kludgery with %fn%, you would just save the message stub in the code there and retrieve that later. It would save the count which would get reset for each message and save a total which wouldn't get reset at all. I guess I need to code it out to show you. Maybe "counted" isn't the best term.

hmm... I guess it could also be used for images to make a gallery at the end of the message.
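A rough sketch of how the counted idea might look, purely for discussion. The class name `CountedBbc` and its methods are made up here and nothing like this exists in the parser; it only shows saved stubs keyed by tag name, a per-message count that resets, and a running total that doesn't.

```php
<?php
// Hypothetical helper: none of these names come from the actual parser.
class CountedBbc
{
	/** @var array<string, string[]> saved stubs for the current message, keyed by tag name */
	private $stubs = array();

	/** @var array<string, int> running totals across all messages, never reset */
	private $totals = array();

	// Save the content of a counted tag and return its number within the current message.
	public function add(string $tag, string $content): int
	{
		$this->stubs[$tag][] = $content;
		$this->totals[$tag] = ($this->totals[$tag] ?? 0) + 1;

		return count($this->stubs[$tag]);
	}

	// Stubs saved for this tag in the current message, in the order they were found.
	public function stubs(string $tag): array
	{
		return $this->stubs[$tag] ?? array();
	}

	// Total number of times the tag was seen across every message so far.
	public function total(string $tag): int
	{
		return $this->totals[$tag] ?? 0;
	}

	// Call between messages: clears the stubs (and so the per-message count), keeps the totals.
	public function resetMessage(): void
	{
		$this->stubs = array();
	}
}

$counted = new CountedBbc();
$counted->add('footnote', 'First note');
$counted->add('footnote', 'Second note');
print_r($counted->stubs('footnote'));

$counted->resetMessage();
$counted->add('footnote', 'Note in the next message');
echo $counted->total('footnote'), PHP_EOL;
```

The same structure would probably cover the image gallery case by keying on the image tag instead of the footnote tag.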

Re: BBC Parsing

Reply #77 – September 04, 2015, 03:34:24 pm

Quote from: Joshua Dickerson – September 04, 2015, 01:13:47 pm
Code:
$possible[Codes::ATTR_BEFORE] = isset($possible[Codes::ATTR_DISALLOW_BEFORE]) ? $tag[Codes::ATTR_DISALLOW_BEFORE] : $possible[Codes::ATTR_BEFORE];
$possible[Codes::ATTR_AFTER] = isset($possible[Codes::ATTR_DISALLOW_AFTER]) ? $tag[Codes::ATTR_DISALLOW_AFTER] : $possible[Codes::ATTR_AFTER];
The isset() there is never true in any of the test messages. Not sure what message I would need to make it true. I am trying to figure out what $tag does there.

I think those are only used when you trigger the disallow parents ... so nested codes that would render bad HTML. I don't have an example off the top of my head, but they are out there.

On footnotes, well, you have the opportunity to try that with the class, but if you can find a less kludgy way in the old procedural version, have at it as well!

Re: BBC Parsing

Reply #78 – September 04, 2015, 04:00:09 pm

@Spuds, I think I know why it's not working. It is only for disallowed tags and I haven't tested that yet. I think the parser is very broken with that. I am going to add some stuff to allow us to check that more easily.

Re: BBC Parsing

Reply #79 – September 04, 2015, 06:41:05 pm

Quote from: Joshua Dickerson – September 04, 2015, 04:00:09 pm
@Spuds, I think I know why it's not working. It is only for disallowed tags and I haven't tested that yet. I think the parser is very broken with that. I am going to add some stuff to allow us to check that more easily.

Nope, that's not it either. Doesn't result in true for any of the messages with this: index.php?type=test&a=Old+parse_bbc&b=Parser&disabled_tags=code,quote,size,color,url,list,li,table,tr,td,th,b,s,i,u,

Re: BBC Parsing

Reply #80 – September 04, 2015, 06:59:50 pm

Bah! I'm an idiot. I was reading it as disabled. It's disallow. Stupid me. Anyway, just pushed a message that checks it. Checked it on both new and old and it's broken on both.

The fix is here: https://github.com/joshuaadickerson/BBC-Parser/commit/2fbeeca132973844530d89af563d257850410b0e

Re: BBC Parsing

Reply #81 – September 04, 2015, 08:09:30 pm

I am also thinking that the BBC parser shouldn't handle smileys at all. It should just add the markers for where the smiley parser should/shouldn't parse and then return it. This way you don't even have to include the BBC parser if you don't want to, and it makes it a lot easier to change the smiley parser and leaves less for the BBC parser to do.

Re: BBC Parsing

Reply #82 – September 04, 2015, 08:22:08 pm

Quote from: Joshua Dickerson – September 04, 2015, 06:59:50 pm
Bah! I'm an idiot. I was reading it as disabled. It's disallow. Stupid me. Anyway, just pushed a message that checks it. Checked it on both new and old and it's broken on both.

The fix is here: https://github.com/joshuaadickerson/BBC-Parser/commit/2fbeeca132973844530d89af563d257850410b0e

Damn ... what does that make me then ... https://github.com/elkarte/Elkarte/commit/651d137647ed2d4a2752d3f607142e0bca28d95c

... great catch BTW. Now if only I had any idea what I was thinking 2 years ago.

Quote from: Joshua Dickerson – September 04, 2015, 08:09:30 pm
I am also thinking that the BBC parser shouldn't handle smileys at all. It should just add the markers for where the smiley parser should/shouldn't parse and then return it. This way you don't even have to include the BBC parser if you don't want to, and it makes it a lot easier to change the smiley parser and leaves less for the BBC parser to do.

Possibly .. yes .. hummm I feel I'm missing something though, about the smileys.

Re: BBC Parsing

Reply #83 – September 04, 2015, 09:33:22 pm

You just had a typo. I was looking at it for a while.

You probably think you're missing something because the old parser was so much more complex. I thought that it was so much more complicated than it is. The BBC parser adds \n around text that shouldn't be parsed. Then it explode()s the string on those and parses every other element in that array. If parsing is disabled, the smiley parser does the entire message.
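The marker approach described above can be sketched roughly like this. It is a minimal illustration, not the parser's actual code: `parse_smileys_stub()` is a hypothetical stand-in for the real smiley parser, and dropping the markers when the pieces are glued back together is an assumption.

```php
<?php
// Hypothetical stand-in for the real smiley parser: swaps ":)" for an image placeholder.
function parse_smileys_stub(string $text): string
{
	return str_replace(':)', '<img alt=":)">', $text);
}

// Parse smileys only in the segments outside the "\n" markers.
// Assumes markers come in pairs, so the even-indexed pieces are the parseable ones.
function parse_smileys_between_markers(string $message): string
{
	$parts = explode("\n", $message);

	// Every other element (0, 2, 4, ...) is text the smiley parser may touch.
	for ($i = 0, $n = count($parts); $i < $n; $i += 2)
	{
		$parts[$i] = parse_smileys_stub($parts[$i]);
	}

	// Assumption: the markers are dropped when the message is reassembled.
	return implode('', $parts);
}

$marked = "hi :) \n<code>:)</code>\n bye :)";
echo parse_smileys_between_markers($marked), PHP_EOL;
```

A message with no markers at all is a single even-indexed piece, so the whole thing goes through the smiley parser, which matches the "parsing is disabled" case.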

Re: BBC Parsing

Reply #84 – September 04, 2015, 09:52:03 pm

For historical reference:

I tried this out but it was much slower:

Code:

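	protected function newParseSmileys()
	{
		$message = '';
		$offset = -1;
		// Segments alternate: smileys allowed, then protected, and so on.
		$smiley_block = true;
		do
		{
			// Step past the previous "\n" marker (or to 0 on the first pass).
			$offset++;
			$next_marker = strpos($this->message, "\n", $offset);
			// When no marker is left, $next_marker is false and the stub is empty;
			// the remaining tail is handled after the loop.
			$length = max(0, $next_marker - $offset);
			$stub = substr($this->message, $offset, $length);

			// Only every other segment goes through the smiley parser.
			if ($smiley_block)
			{
				parsesmileys($stub);
			}
			$message .= $stub;

			$offset = $next_marker === false ? $offset : $next_marker;
			$smiley_block = !$smiley_block;
		} while ($next_marker !== false);

		// Whatever follows the last marker is treated as a smiley block.
		if ($offset !== strlen($this->message))
		{
			$stub = substr($this->message, $offset);
			parsesmileys($stub);
			$message .= $stub;
		}

		$this->message = $message;
	}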

Re: BBC Parsing

Reply #85 – September 05, 2015, 12:43:51 am

I added the license stuff. Let me know if I missed anywhere. Can I license this project as

Changing out the smiley parser was really easy.

Now there are 4 separate classes for parsing: BBC, HTML, smileys, and links. There will need to be at least one new global. I hope it will be a DIC (dependency injection container).

Re: BBC Parsing

Reply #86 – September 06, 2015, 02:37:33 am

The latest commit (at the time of me writing this) is the base for a converter. I have had a bit much to drink tonight so I'm not going to continue with it, but I want to make it possible to convert the old format and create new bulletin board codes with a GUI. I realized, quickly, that an exporter will be very limited because I can't (or don't want to do all of the work to) parse the PHP out of the BBC array. What you see now is when I figured that it would be too complicated and needed to change my focus.

Re: BBC Parsing

Reply #87 – September 06, 2015, 07:14:34 am

Quote
The latest commit (at the time of me writing this) is the base for a converter. I have had a bit much to drink tonight so I'm not going to continue with it

ROFL ... I've made some of my best commits under those conditions

So the converter was meant to convert the old BBC array to the new config, to help convert addons or something else? That may be pretty difficult. Maybe just a basic wiki page would be enough to help addon authors etc?

All of the license headers look correct to me as well, good job !

Re: BBC Parsing

Reply #88 – September 06, 2015, 07:35:03 am

The converter would work fine if I didn't have to worry about anonymous functions and inline conditions. Actually, the converter will still work. I just can't export. So you can add a middleware for the BBC but that's not the same as exporting a new package.

Re: BBC Parsing

Reply #89 – September 08, 2015, 09:29:16 pm

This idea is about performance. What if, instead of doing autolink() on run, we do autolink() on parsing? We could either add another tag ([autolink]) or add another parameter to url so we know it's an autolink. Then, when we unparse(), we just remove the tag. Autolink accounts for 7.2% of the total time to parse. That would mean just adding another tag to a message instead of running a regular expression (regular expressions are killer on this parser). I think the preferred method is to use another tag because parameters, tests, and regular expressions in general are the slowest part of the entire parser.
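To make the idea concrete, here is a minimal sketch of tagging links once and stripping the tag again on unparse. The `[autolink]` tag is the one proposed above; the function names and the deliberately crude regular expression are assumptions for illustration, not the existing autolink() code.

```php
<?php
// Wrap bare URLs in [autolink] once, up front, instead of on every run.
// Deliberately crude: the lookbehind only skips URLs that directly follow a word
// character, "]", "=" or "/", so it is not a complete check for [url] or [code] content.
function add_autolink_tags(string $message): string
{
	$pattern = '~(?<![\w\]=/])((?:https?://|www\.)[^\s\[\]<>]+)~i';

	// preg_replace() returns null on error; fall back to the untouched message.
	return preg_replace($pattern, '[autolink]$1[/autolink]', $message) ?? $message;
}

// Undo it for editing: removing the tag needs no regular expression at all.
function remove_autolink_tags(string $message): string
{
	return str_replace(array('[autolink]', '[/autolink]'), '', $message);
}

$saved = add_autolink_tags('see www.example.com now');
echo $saved, PHP_EOL;
echo remove_autolink_tags($saved), PHP_EOL;
```

With this split, the regular expression cost is paid once per edit, and rendering only has to handle one more plain tag.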

The major problem with that is that it requires running the regex on all messages when you upgrade. I guess you can do an instr() and look for www. or :// or @ and then return the ID to do the regular expression in PHP.
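The cheap pre-check could look something like this. It is only a sketch done on the PHP side with made-up rows; the function name is hypothetical, and in a real upgrade the same substring test would more likely sit in the query so that only the matching IDs come back.

```php
<?php
// Cheap substring test to decide whether a message is worth running the autolink regex on.
function might_contain_link(string $body): bool
{
	foreach (array('www.', '://', '@') as $needle)
	{
		if (stripos($body, $needle) !== false)
		{
			return true;
		}
	}

	return false;
}

// Hypothetical rows (message ID => body) as they might come back from the messages table.
$rows = array(
	1 => 'Nothing to link here',
	2 => 'See www.example.com',
	3 => 'Mail me: someone@example.com',
);

// Only these IDs would need the expensive regular expression.
$candidates = array_keys(array_filter($rows, 'might_contain_link'));
print_r($candidates);
```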