ElkArte Community

Elk Development => Feature Discussion => Exterminated Features => Topic started by: TE on February 27, 2013, 04:57:00 am

Title: shorten Urls..
Post by: TE on February 27, 2013, 04:57:00 am
I really, really hate these endlessly long URLs, for example:
http://www.google.de/search?hl=de&site=imghp&tbm=isch&source=hp&biw=1920&bih=1119&q=elk+cliparts&oq=elk+cliparts&gs_l=img.3...3042.8288.0.8576.14.6.1.7.7.0.92.503.6.6.0...0.0...1ac.1.4.img.oqgTBlIA8aM

Possible solutions to avoid this:
1) Modify parse_bbc() and do something similar to the trimURL mod: http://custom.simplemachines.org/mods/index.php?mod=425
2) Use jQuery / JavaScript and let the client do the work:
Code: [Select]
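// Fragment of a PHP single-quoted string that emits JavaScript; the $modSettings values are spliced in server-side.
// The placeholder is wrapped in double quotes so that it is emitted as a JS string literal.
$(".bbc_link").text(function(index, sUrl) {
	if (sUrl.length > '. $modSettings['max_url_lenght'] . ') {
		return sUrl.substr(0, '. $modSettings['max_url_lenght_before'] . ') + "'. $modSettings['max_url_placeholder'] . '" + sUrl.substr(-'. $modSettings['max_url_lenght_after'] . ');
	}
	return sUrl;
});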
3) Use plain CSS to shorten the output:
Code: [Select]
.bbc_link { display: block; max-width: 200px; overflow:hidden; text-overflow:ellipsis;}

I'm going to make a pull request soon, but which variant should I choose?
I favor the JavaScript variant: it is simple to implement, configurable via the admin interface and doesn't affect parse_bbc() performance. The CSS variant seems possible as well, but it is not configurable via the admin interface and may need additional HTML, since <a> is an inline element and text-overflow: ellipsis only seems to work with block-level elements. Maybe Antechinus can find a better version?!?
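
For reference, the truncation logic of variant 2 can be sketched as a plain function without the PHP interpolation. This is only a sketch: the function name and its parameters are hypothetical stand-ins for the $modSettings values above, not existing Elk settings.

```javascript
// Hypothetical helper: none of these names come from the trimURL mod or from Elk.
function shortenUrlText(text, maxLength, keepBefore, keepAfter, placeholder) {
  // Leave text that is already short enough untouched.
  if (text.length <= maxLength) {
    return text;
  }
  // keepAfter === 0 needs special handling: slice(-0) would return the whole string.
  const tail = keepAfter > 0 ? text.slice(-keepAfter) : "";
  return text.slice(0, keepBefore) + placeholder + tail;
}

// With jQuery loaded it could be wired up much like the snippet above
// (the numbers are arbitrary example values):
// $(".bbc_link").text((index, text) => shortenUrlText(text, 50, 30, 15, "..."));
```

Only the visible link text is changed; the href keeps the full URL.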
Title: Re: shorten Urls..
Post by: Trekkie101 on February 27, 2013, 06:42:01 am
If I copy and paste a post, would one of them work or would both fail?
Title: Re: shorten Urls..
Post by: TE on February 27, 2013, 07:21:04 am
Quote from: Trekkie101 – If I copy and paste a post, would one of them work or would both fail?
If you mark and copy the plain text: no
If you use "quote" or "edit" or even the browser's "copy hyperlink" function, then yes.
Title: Re: shorten Urls..
Post by: Antechinus on February 27, 2013, 07:27:49 am
Plain CSS. Simple. Bulletproof. Doesn't rely on js. All you need to do is set the anchor to display: inline-block;

It's also an advantage to have an extra tag in Subs.php, so you can NOT shorten the text when people put their own title inside the tags (http://www.elkarte.net/index.php?topic=215.0), like that. I'm already running this sort of code and it works well. Complete code for 1.1.x version on the CEMB theme is as follows. Very easy to adapt for 2.0.x/2.1/Elk/whatever.

/* Shorten URLs inside posts. */
.post a.raw_url, .quote a.raw_url, .personalmessage a.raw_url {
    display: inline-block;
    white-space: pre;
    overflow: hidden;
    text-overflow: ellipsis;
    max-width: 24em;
    vertical-align: bottom;
}

Subs.php:
Code: [Select]
array(
    'tag' => 'url',
    'type' => 'unparsed_content',
    'content' => '<a href="$1" class="raw_url" target="_blank">$1</a>',
    'validate' => create_function('&$tag, &$data, $disabled', '
        $data = strtr($data, array(\'<br />\' => \'\'));
        if (strpos($data, \'http://\') !== 0 && strpos($data, \'https://\') !== 0)
            $data = \'http://\' . $data;
    '),
),
array(
    'tag' => 'url',
    'type' => 'unparsed_equals',
    'before' => '<a href="$1" target="_blank">',
    'after' => '</a>',
    'validate' => create_function('&$tag, &$data, $disabled', '
        if (strpos($data, \'http://\') !== 0 && strpos($data, \'https://\') !== 0)
            $data = \'http://\' . $data;
    '),
    'disallow_children' => array('email', 'ftp', 'url', 'iurl'),
    'disabled_after' => ' ($2)',
),

Another advantage of using CSS for it is that you can also tweak it per screen resolution via media queries if you want to. Can be handy sometimes.
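
If the JavaScript route were taken instead, the same distinction between raw and titled links could be approximated on the client by comparing the link text with its href. A rough sketch, where isRawUrlLink is a hypothetical name and not part of SMF or Elk:

```javascript
// Hypothetical helper: decides whether a link shows its own URL as text.
function isRawUrlLink(href, text) {
  const shown = text.trim();
  // The validate callback quoted above prepends http:// when the scheme is
  // missing. Depending on the template, the visible text may or may not
  // carry that prefix, so both forms are accepted here.
  return shown === href || "http://" + shown === href;
}
```

A titled link such as "like that" would return false and could be left unshortened.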
Title: Re: shorten Urls..
Post by: Antechinus on February 27, 2013, 07:30:13 am
You could make it configurable in admin if you really wanted to. You'd just do it the same way all css stuff is done in admin: use an inline style that calls the relevant variable (like for theme width, etc).

ETA: Oh and I did do the iurl tags in Subs too, but didn't bother quoting them here. Works the same way.

ETA again :P : Btw, my preference is not to have it set in admin, because I prefer to keep the option of using media queries to set the width.
Title: Re: shorten Urls..
Post by: TestMonkey on February 27, 2013, 05:57:00 pm
TE, thanks for this. I agree it's silly for us to keep this behavior, with all long urls slapped in posts.

I see tradeoffs either way. Server-side solutions are "more permanent". I doubt even an addition in parse_bbc() would be so terrible (only Spuds' reaction to this statement will be, meknows, ah well), but it's probably not worth it since it's not permanent anyway.
On the other hand: people have long become accustomed to web services that implement an idea similar to trimUrl (e.g. tinyurl, ur1.ca) as actual permanent links, and yes, for similar reasons. After all, it's not cool to the reader to slap those uselessly long links in their face.
I wonder, how about: add a bit of javascript/whatever to one of Emanuele's boxes, to show the user a notice that their post has long urls (and that they will be tweaked when Elk displays them). For example, when they do 'preview' action. A simple 'notice'/'information' box.
ETA: reason is people might want to choose a more 'permanent' solution, when reminded. Dunno.
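
The preview notice idea could be sketched roughly as follows. Everything here is an assumption made for illustration: the function names, the 60-character default and the wording of the notice are hypothetical, and the URL match is deliberately crude.

```javascript
// Hypothetical helper; the 60-character default is an arbitrary assumption.
function findLongUrls(postText, maxLength = 60) {
  // Rough match: http:// or https:// up to the next whitespace or BBC bracket.
  const matches = postText.match(/https?:\/\/[^\s\[\]]+/g) || [];
  return matches.filter((url) => url.length > maxLength);
}

// Builds the text for an information box, or an empty string when nothing is long.
function longUrlNotice(postText, maxLength = 60) {
  const count = findLongUrls(postText, maxLength).length;
  if (count === 0) {
    return "";
  }
  return "This post contains " + count + " long URL(s); they will be shortened when displayed.";
}
```

On 'preview', the returned text could be shown in one of the existing notice boxes whenever it is non-empty.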

It looks like there's a consensus for CSS:
http://www.elkarte.net/index.php?topic=134.msg1496#msg1496
Title: Re: shorten Urls..
Post by: Antechinus on February 27, 2013, 07:08:49 pm
If they use preview they'll see it trimmed anyway, so do you really need the extra warning?
Title: Re: shorten Urls..
Post by: Spuds on February 27, 2013, 07:14:27 pm
Seems CSS is the best way to just cut those down to length.

On several of the sites I help we use titled links, so the links get changed to the website title, and in general that provides a much better look to the page.  Of course to get those (external ones) you have to fire off some web requests on save so it can slow that initial save down .... I suppose you could ajaxify that as well and convert as they are entered hummm
Title: Re: shorten Urls..
Post by: Antechinus on February 27, 2013, 07:31:40 pm
I do quite like grabbing the title for externals, but would it work for gnarly URLs like Google search results?

ETA: Come to think of it, I really like grabbing the title for internals too. That's already an SMF mod and would make a good default feature.
Title: Re: shorten Urls..
Post by: TE on February 28, 2013, 12:22:43 pm
Grabbing the title looks nice and seems easy, and it works for Google Search, too ..

However I'm a bit worried about performance.

Just tested with this small piece of code,  seems fast but I'm not sure what will happen if the remote host is slow and the html document is fairly big ( related to the php values max_execution_time, memory_limit ..)
Code: [Select]
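// Fetch the remote page; file_get_contents() returns false on failure.
$url_content = file_get_contents($url);
if (!empty($url_content))
{
	$domobject = new DOMDocument();
	// Real-world HTML is rarely valid, so suppress the parser warnings.
	@$domobject->loadHTML($url_content);
	$title = $domobject->getElementsByTagName('title');
	// Only echo when the page actually has a <title> element.
	if ($title->length > 0)
		echo $title->item(0)->nodeValue;
}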


Quote from: Spuds – Seems CSS is the best way to just cut those down to length.

On several of the sites I help we use titled links, so the links get changed to the website title, and in general that provides a much better look to the page.  Of course to get those (external ones) you have to fire off some web requests on save so it can slow that initial save down .... I suppose you could ajaxify that as well and convert as they are entered hummm
Does it work well on these sites? An Ajax based solution seems great, but I'm not the javascript expert  ;)
Title: Re: shorten Urls..
Post by: Arantor on February 28, 2013, 03:13:03 pm
The problem with doing that is that there is an interesting vulnerability vector attached, though this is probably a safe enough use of it. DOMDocument etc. all rely on libxml2, which has one interesting quirk: when fed a given document with a DTD, it will try to load it. Since DTDs include entity definitions, it's entirely possible to embed something you didn't necessarily anticipate from that source.

As ever, validate+sanitise the external data; to be honest I think you'd be fine with just regexp-matching on the title tag, assuming the page gave you an HTTP 200 in the first place. Note that file_get_contents() can't always grab external URLs and you might need to look at a cURL solution, which has its own issues with respect to getting external data (namely redirections with open_basedir).
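
The regexp-on-the-title-tag idea might look roughly like this. It is a sketch only: extractTitle and escapeHtml are hypothetical names, and the status code and body are assumed to have been fetched already by whatever mechanism is available.

```javascript
// Hypothetical helper: takes an already-fetched status code and response body.
function extractTitle(statusCode, html) {
  // Only trust a page that answered with HTTP 200.
  if (statusCode !== 200) {
    return null;
  }
  // Non-greedy match on the first <title> element; [\s\S] also spans newlines.
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  if (!match) {
    return null;
  }
  // Collapse runs of whitespace so multi-line titles become one line.
  const title = match[1].replace(/\s+/g, " ").trim();
  if (title === "") {
    return null;
  }
  // Escape the remote text so it cannot inject markup into the post.
  return escapeHtml(title);
}

// Minimal escaping; entities already present in the title get double-encoded.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}
```

Since no XML parser is involved, the DTD loading behaviour described above does not come into play.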
Title: Re: shorten Urls..
Post by: Spuds on February 28, 2013, 08:18:09 pm
Quote from: TE – Just tested with this small piece of code,  seems fast but I'm not sure what will happen if the remote host is slow and the html document is fairly big ( related to the php values max_execution_time, memory_limit ..)
Generally it's not noticeable .. however if the link is to a site that's down,  then you will notice the delay during the save as the request has to time out.  Also the requests are done serially, so if you have a lot of links in a post .... well it can get progressively slower up to a max threshold.

Thinking about that since we did add a cURL based fetch web data, we could add in curl_multi_exec capability and fire off a group of them to avoid the serial processing.  Of course the site would need curl, and if missing would have to default to the serial method.

Quote from: Arantor – cURL solution, which has its own issues with respect to getting external data (namely redirections with open_basedir)
I remember trying to take that all into account when I worked on the cURL class for 2.1, https://github.com/elkarte/Elkarte/blob/master/sources/CurlFetchWeb.class.php  but I also only tested it on a couple of sites :P  Now that seems like ages ago ... I was doing that as a learning exercise so who knows what's in there :D

Thinking more about that ajax approach ... although it could work, I don't think it would save anything. There are so many ways a URL can end up in a post that you end up having to poll the post every so often and build on what you know ...... unless I'm missing something, it just seems like it would be clumsy at best.
Title: Re: shorten Urls..
Post by: TestMonkey on March 01, 2013, 09:02:14 am
Quote from: Antechinus – If they use preview they'll see it trimmed anyway, so do you really need the extra warning?

"Information/notice", not warning.

And, perhaps yes. They might have a long post. They might use preview by simple habit (I know I do). They may not think at the moment about URLs. Dunno, just some thoughts.
The other advantage, if people reconsider and choose a short url, will be for interoperability reasons: content easier to send to another medium without exposing it to the same problem.
Title: Re: shorten Urls..
Post by: TestMonkey on March 01, 2013, 09:07:50 am
Re: you folks' discussion on titles.
...I thought my first post here was unorthodox about performance. :D


Titles are a nice thing to do, but they're exposed to risks and potential slowness, yes. It's okay if you want it, but let's just implement it externally?
Title: Re: shorten Urls..
Post by: Antechinus on March 01, 2013, 02:03:36 pm
Yup, if it's going to be slower than just copy/pasting the title into the bbc tags then IMO it's not worth doing. The only reason to do it is if it sped up the entire posting process. Plus, you'd still want to allow for people doing things like this (http://www.simplemachines.org/community/index.php) with external links in the middle of sentences.

I am still quite interested in automatically grabbing the title for internal links though. That shouldn't give any performance hit (or not a noticeable one).
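
Recognising an internal link could look roughly like this. A sketch only: parseInternalTopicId and boardUrl are hypothetical names, and only the index.php?topic=N style of URL seen in this thread is handled. The title lookup itself, and any permission check, is left out.

```javascript
// Hypothetical helper; boardUrl is the forum's base URL, supplied by the caller.
function parseInternalTopicId(link, boardUrl) {
  let url;
  let base;
  try {
    url = new URL(link);
    base = new URL(boardUrl);
  } catch (e) {
    // Not a parseable absolute URL.
    return null;
  }
  // Links to other hosts are external and are not handled here.
  if (url.host !== base.host) {
    return null;
  }
  const topic = url.searchParams.get("topic");
  if (topic === null) {
    return null;
  }
  // Values such as "215.0" and "134.msg1496" both start with the numeric topic ID.
  const match = /^(\d+)/.exec(topic);
  return match ? Number(match[1]) : null;
}
```

The returned ID could then be used for a local title lookup at save time, with no remote request involved.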
Title: Re: shorten Urls..
Post by: Arantor on March 01, 2013, 02:05:59 pm
If you're doing it purely on saving, you only have to accept the lookup penalty the first time. (Though that gets into a situation in itself, over whether you link to a private topic in a public board, the title potentially would be 'leaked')

I hate to say it but what FB does is pretty slick... it goes to get previews/whatever while you're still posting.
Title: Re: shorten Urls..
Post by: Antechinus on March 01, 2013, 03:28:51 pm
Yes I understand that it's a one hit thing, but it'd be a PITA to have timeouts just because some external site you wanted to link to was being unresponsive.

And linking to private topics in public boards is a moderation thing.
Title: Re: shorten Urls..
Post by: Arantor on March 01, 2013, 03:35:25 pm
I was mostly concentrating on the internal links angle - the lookup is not particularly expensive to do and even then it's still only once per save.

You're right, the leaking aspect is a moderation thing. I'm just naturally wary about users griping about the software being broken when it isn't.
Title: Re: shorten Urls..
Post by: TestMonkey on March 01, 2013, 07:52:16 pm
Quote from: Arantor – I hate to say it but what FB does is pretty slick... it goes to get previews/whatever while you're still posting.
Ah yes, so does G+, for the first link, and it is a quite cool behavior I'd say.
Title: Re: shorten Urls..
Post by: emanuele on November 04, 2013, 06:00:15 pm
I think this was not implemented, right?
Title: Re: shorten Urls..
Post by: Spuds on November 04, 2013, 06:43:07 pm
I don't think so, we should add the css since that's safe and clean.  I'll look to convert the mod over to Elk so we have that to play with.
Title: Re: shorten Urls..
Post by: emanuele on November 05, 2013, 02:13:24 am
Yep, yep, I was thinking about the simple css-shortening, sorry if I didn't explain better. ;)