Re: BBC Parsing

Reply #180

Having some brain farts...
Code: [Select]
<?php

namespace Elkarte\Messages\Formatters;

class ForumML
{
    const KNOWN_NODE_TYPES = [
        'TEXT'          => 1,
        'NEW_LINE'      => 2,
        'EMPTY_LINE'    => 3,
        'TAG'           => 4,
        'EMOJI'         => 5,
        'LINK'          => 6,
    ];

    public function format($message)
    {
        foreach ($message as $node) {
            if (!$this->isKnownType($node->getType())) {
                throw new InvalidNodeTypeException;
            }

            $this->formatChildren($node);
        }
    }

    public function isKnownType($type)
    {
        $knownTypes = self::KNOWN_NODE_TYPES;
        return isset($knownTypes[$type]);
    }

    protected function formatChildren($node)
    {
        if ($node->hasChildren()) {
            // format() iterates over a list of nodes, so pass the whole child list
            $this->format($node->getChildren());
        }
    }
}

class NodeFactory
{
    public function newNode($node, AbstractNode $parent = null)
    {
        // The known types live on ForumML; NodeFactory has no $nodeFactory property
        if (!isset($node['type']) || !array_key_exists($node['type'], ForumML::KNOWN_NODE_TYPES)) {
            throw new InvalidNodeTypeException;
        }

        $newNode = null;

        switch ($node['type']) {
            // NodeFactory defines no TEXT constant; match the key used in KNOWN_NODE_TYPES
            case 'TEXT':
                $newNode = new TextNode($node, $this, $parent);
                break;
        }

        return $newNode;
    }
}

abstract class AbstractNode
{
    protected $type;
    protected $factory;
    protected $parent;
    protected $value;
    protected $children = [];
    protected $attributes = [];

    abstract public function render();
    abstract public function getType();

    /**
     * AbstractNode constructor.
     * @param array $node the unserialized array
     * @param NodeFactory $nodeFactory
     * @param AbstractNode|null $parent
     */
    public function __construct(array $node, NodeFactory $nodeFactory, AbstractNode $parent = null)
    {
        // The declared property is $factory, which setChildren() relies on
        $this->factory = $nodeFactory;
        $this->parent = $parent;
        $this->type = $this->getType();

        if (isset($node['attributes'])) {
            $this->attributes = $node['attributes'];
        }

        if (isset($node['children'])) {
            $this->setChildren($node['children']);
        }

        if (isset($node['value'])) {
            $this->value = $node['value'];
        }
    }

    public function __toString()
    {
        // __toString() must return a string
        return (string) $this->render();
    }

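    // Note: PHP's serialize() expects __sleep() to return property names.
    // This returns the node as a plain array, so call it directly
    // (or rename it, e.g. to toArray()).
    public function __sleep()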
    {
        return array_filter([
            'type'          => $this->type,
            'value'         => $this->value,
            'attributes'    => $this->attributes,
            'children'      => array_reduce($this->children, function ($children, $child) {
                $children[] = $child->__sleep();
                return $children;
            }, []),
        ]);
    }

    public function hasParent()
    {
        return $this->parent instanceof AbstractNode;
    }

    public function getParent()
    {
        return $this->parent;
    }

    public function hasParentOfType($type)
    {
        return $this->hasParent() && (
            $this->getParent()->getType() === $type
            || $this->getParent()->hasParentOfType($type));
    }

    public function hasChildren()
    {
        return !empty($this->children);
    }

    public function getChildren()
    {
        return $this->children;
    }

    public function hasAttributes()
    {
        return !empty($this->attributes);
    }

    public function getAttributes()
    {
        return $this->attributes;
    }

    protected function setParent(AbstractNode $parent)
    {
        $this->parent = $parent;
        return $this;
    }

    protected function setChildren(array $nodes)
    {
        $this->children = [];

        foreach ($nodes as $node) {
            $this->children[] = $this->factory->newNode($node, $this);
        }

        return $this;
    }
}

// namespace Nodes;
class Tag extends AbstractNode
{
    const TYPE = 'TAG';

    public function getType()
    {
        return self::TYPE;
    }
}

 

Re: BBC Parsing

Reply #181

Let me know your thoughts on this.

As it is now, the BBC is parsed every time a message is accessed, so the output always reflects the most up-to-date BBC settings. There's nothing to check to make sure that's still true.

There are two ways we can change that to make it better. Neither is perfect.

1) Check the tags on each interpretation.
This means we don't have to know what the BBC is beforehand. We just check for a simple regular expression such as (\[[[:alnum:]]+). The alphanumeric characters it captures are checked against the BBC that we have to see if such a code exists. If so, then we check to see if the attributes/arguments (whatever you want to call them) match the "code" we're testing. This is pretty close to what we're doing now. It saves a lot of string copying and some of the regular expressions that take up a lot of parsing time.

2) Cache since the last time the BBC settings were changed.
This way allows a lot more optimization: you could essentially just cache the entire parsed message, barring any user or environmental variables. The cache would need to be invalidated after every change of the BBC settings. It wouldn't need to run any validation or testing for the BBC because we'd define which class/method is called for each "node" in the AST. When a message is requested, it would check whether the time the BBC settings were updated (settingsTime) is >= the time the message was parsed (parseTime); if so, it reparses the message and saves the result to the database. In pseudocode: if (settingsTime >= parseTime) msg.parse().save();
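A rough sketch of that staleness check, to make the second option concrete. Everything here is made up for illustration: ParsedMessageCache, getParsed(), and the in-memory array standing in for the database are assumptions, not ElkArte code.

```php
<?php

// Hypothetical cache of parsed messages keyed by message id.
// The $rows array stands in for a database table.
final class ParsedMessageCache
{
    private $rows = [];
    private $parser;

    public function __construct(callable $parser)
    {
        $this->parser = $parser;
    }

    public function getParsed(int $id, string $body, int $settingsTime, int $now): string
    {
        $row = $this->rows[$id] ?? null;

        // Reparse when nothing is stored yet, or when the BBC settings changed
        // at or after the time this message was last parsed.
        if ($row === null || $settingsTime >= $row['parseTime']) {
            $row = ['html' => ($this->parser)($body), 'parseTime' => $now];
            $this->rows[$id] = $row;
        }

        return $row['html'];
    }
}
```

The parser is passed in as a callable so the cache doesn't care which interpreter produced the output.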

Perhaps it matters what the interpreters are going to be. I can see a need for 3: web, email, and print. You take the same syntax tree and can display it in different ways depending on the settings.
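Something like this is what I mean by one tree, several interpreters. It's only a sketch: the Interpreter interface, the two classes, and the handling of a single [b] tag are assumptions for illustration, and the node arrays just follow the type/value/children shape from the earlier code.

```php
<?php

// Hypothetical interface: each interpreter walks the same node list.
interface Interpreter
{
    public function render(array $nodes): string;
}

final class WebInterpreter implements Interpreter
{
    public function render(array $nodes): string
    {
        $out = '';
        foreach ($nodes as $node) {
            $children = $node['children'] ?? [];
            if ($node['type'] === 'TEXT') {
                // Escape text for HTML output
                $out .= htmlspecialchars($node['value'], ENT_QUOTES, 'UTF-8');
            } elseif ($node['type'] === 'TAG' && $node['value'] === 'b') {
                $out .= '<strong>' . $this->render($children) . '</strong>';
            } else {
                // Unknown node: render only its children
                $out .= $this->render($children);
            }
        }
        return $out;
    }
}

final class EmailInterpreter implements Interpreter
{
    public function render(array $nodes): string
    {
        $out = '';
        foreach ($nodes as $node) {
            $children = $node['children'] ?? [];
            if ($node['type'] === 'TEXT') {
                // Plain-text email: no escaping needed
                $out .= $node['value'];
            } elseif ($node['type'] === 'TAG' && $node['value'] === 'b') {
                $out .= '*' . $this->render($children) . '*';
            } else {
                $out .= $this->render($children);
            }
        }
        return $out;
    }
}

$tree = [
    ['type' => 'TEXT', 'value' => 'Hello '],
    ['type' => 'TAG', 'value' => 'b', 'children' => [
        ['type' => 'TEXT', 'value' => 'world'],
    ]],
];
```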

Re: BBC Parsing

Reply #182

I want to change how filters and validators work. I want to define a stack of validators like 'maxlen[25],minlen[1],alpha' and do the same thing with filters: 'truncate[25],alpha'.

Validators have constraints that are checked. If any constraint returns false, it doesn't match the tag. It could have a warning that could be passed back to the user when they are posting. Validators are checked to see which code matches.

Filters are run after a code matches. They transform (maybe they should be called transformers?) and can be run on any input (attribute or content).

Creating a new code would then be simple: you stack together validators and filters/transformers by naming the ones you want. Creating a new validator or filter would require a new class with a method that matches, which you then register (probably through a hook). This would make them DRY and make codes much more composable.
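A minimal sketch of how such stacks could be parsed and applied. The function names (parseRuleStack, passesValidators, applyFilters) and the closure-based registries are assumptions for illustration only; registration through a hook is left out, and arguments containing commas are not handled.

```php
<?php

// Hypothetical helper: 'maxlen[25],alpha' becomes [['maxlen', '25'], ['alpha', null]].
function parseRuleStack(string $stack): array
{
    $rules = [];
    foreach (explode(',', $stack) as $rule) {
        if (!preg_match('/^([a-z_]+)(?:\[([^\]]*)\])?$/', trim($rule), $m)) {
            throw new InvalidArgumentException("Bad rule: $rule");
        }
        $rules[] = [$m[1], $m[2] ?? null];
    }
    return $rules;
}

// Validators return true/false; any false means the code does not match.
$validators = [
    'maxlen' => function (string $v, $arg): bool { return strlen($v) <= (int) $arg; },
    'minlen' => function (string $v, $arg): bool { return strlen($v) >= (int) $arg; },
    'alpha'  => function (string $v, $arg): bool { return preg_match('/^[a-zA-Z]+$/', $v) === 1; },
];

// Filters transform the value and run only after a code has matched.
$filters = [
    'truncate' => function (string $v, $arg): string { return substr($v, 0, (int) $arg); },
    'alpha'    => function (string $v, $arg): string { return preg_replace('/[^a-zA-Z]/', '', $v); },
];

function passesValidators(string $value, string $stack, array $validators): bool
{
    foreach (parseRuleStack($stack) as [$name, $arg]) {
        if (!$validators[$name]($value, $arg)) {
            return false;
        }
    }
    return true;
}

function applyFilters(string $value, string $stack, array $filters): string
{
    foreach (parseRuleStack($stack) as [$name, $arg]) {
        $value = $filters[$name]($value, $arg);
    }
    return $value;
}
```

Filters run in the order they are listed, so 'truncate[5],alpha' cuts first and strips non-letters second.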

Re: BBC Parsing

Reply #183

Here's the problem with the first one from http://www.elkarte.net/community/index.php?topic=2833.msg30829#msg30829 "Check the tags on each interpretation."

Let's say my message is like this:
Code: [Select]
Hello [bbc I am "[i]writing[/i]" you a message to "[test"]

The AST would be something like:
Code: [Select]
[
    {type: 'text', value: 'Hello '},
    {type: 'tag', value: 'bbc', attr: [
        {type: 'I'},
        {type: 'am', value: '[i]writing[/i]', quoted: 1},
        {type: 'you'},
        {type: 'a'},
        {type: 'message'},
        {type: 'to'},
        {type: '"[test"', quoted: 1},
    ]}
]

It would get to the second node and find out that no tag named bbc exists, so it has to go back and reparse all of its children as text. It will then get to the [i]writing[/i] part and have to parse that as a tag node. I suspect that's actually worse than what we have now.
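To show where the backtracking comes from, here is a sketch that runs the candidate scan from the first option over that same message. The function name and the list of known tags are assumptions for illustration; the real BBC list is much longer.

```php
<?php

// Hypothetical scan: find every "[name" candidate and flag whether a code
// with that name exists.
function findTagCandidates(string $message, array $knownTags): array
{
    preg_match_all('/\[([[:alnum:]]+)/', $message, $matches, PREG_OFFSET_CAPTURE);

    $found = [];
    foreach ($matches[1] as [$name, $offset]) {
        $found[] = [
            'name'   => $name,
            'offset' => $offset,
            'known'  => in_array(strtolower($name), $knownTags, true),
        ];
    }
    return $found;
}

$message = 'Hello [bbc I am "[i]writing[/i]" you a message to "[test"]';
$candidates = findTagCandidates($message, ['b', 'i', 'u']);
```

The scan reports bbc, i, and test as candidates, and only i is a known code. Anything already collected as attributes of bbc has to be treated as text again, which is the reparse described above.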

Re: BBC Parsing

Reply #184

I think this idea makes sense as a plugin and I think only the second option will work. It would be a huge task to keep it up-to-date, but I like it better that way.

Parsed messages would be in their own table. This way when you update BBC settings, you would just truncate that table. It could be cool to see which messages were seen since the last time you changed your BBC and what they were. Though, that's not the aim at all.

Obviously, this means a lot of changes to queries. Any query that requests the message would need to join this new table and select the parsed message to go with it.

Re: BBC Parsing

Reply #185

I don't have my BBC repo handy so I'm going off memory. One of the most resource-intensive operations is handling itemcodes. I think the preparser should handle the creation of the list.

The preparser is exactly where the lexer should come in. I feel like it would be a lot easier to work with if it were tokens.
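A tiny sketch of what I mean by tokens. The token names (OPEN_START, CLOSE_START, TAG_END, NEW_LINE, TEXT) and the lex() function are assumptions made up here, not what the scratch code uses.

```php
<?php

// Hypothetical lexer: split the message on brackets and newlines, keeping
// the delimiters, then label each piece with a token type.
function lex(string $message): array
{
    $pieces = preg_split(
        '/(\[\/|\[|\]|\n)/',
        $message,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    $types = [
        '[/' => 'CLOSE_START',
        '['  => 'OPEN_START',
        ']'  => 'TAG_END',
        "\n" => 'NEW_LINE',
    ];

    $tokens = [];
    foreach ($pieces as $piece) {
        // Anything that is not a delimiter is plain text
        $tokens[] = ['type' => $types[$piece] ?? 'TEXT', 'value' => $piece];
    }
    return $tokens;
}
```

The preparser could then work on this flat list (for example, spotting an OPEN_START followed by a TEXT token) instead of running more regular expressions over the raw string.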

Playing with my scratch code some more: https://gist.github.com/joshuaadickerson/c3669645a6ae59e37a6d46b4efe1bed1