regex issue

Discussions about the phpBB2 Forum. How to get the best from this powerful script.

Postby swiety » Fri Apr 03, 2009 4:37 pm

Hello!
I have a problem with a regular expression and I'd like to hear what you think about it.
It's about custom bbcode. As you know, there is no way to add your own bbcodes via the ACP in phpBB2, but there is a way to do it by editing the includes/bbcode.php and templates/your_template/bbcode.tpl files.
I have been using bbcodes I created myself for quite a long time, but there is a "bug" in one of them...
This bbcode is simply called "toggled code": code which is hidden by default, and when you click the "show" button it expands and you see all the text inside it. The only difference from the normal "code" bbcode is that hyperlinks in toggled code are clickable by default.
As it's very similar to the normal "code" bbcode, it is also very similar on the coding side, so its function also returns the $text variable at the end, and this variable goes through the make_clickable($text) function in includes/bbcode.php. Here is the issue...

The problem is that the script doesn't make an http link on the first line of a "toggled code" block clickable.
These are the lines from the make_clickable function that parse the links; as you can see, there needs to be a newline or a space before the http link to make it clickable:
Code:
// matches an "xxxx://yyyy" URL at the start of a line, or after a space.
// xxxx can only be alpha characters.
// yyyy is anything up to the first space, newline, comma, double quote or <
$ret = preg_replace("#(^|[\n ])([\w]+?://[\w\#$%&~/.\-;:=,?@\[\]+]*)#is", "\\1<a href=\"\\2\" target=\"_blank\">\\2</a>", $ret);
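
A minimal sketch (assuming PHP 7 or later) to see the behaviour described above in isolation. `link_urls_phpbb2()` is a hypothetical wrapper, not the full phpBB2 function: it reuses only the URL rule quoted above and pads the text with a leading space (which, as far as I know, the stock make_clickable() also does) so that a URL at the very start can match the `[\n ]` alternative.

```php
<?php
// Hypothetical wrapper around the URL rule quoted from make_clickable().
// The leading space lets a URL at the very start of the text match "[\n ]";
// substr() removes that space again afterwards.
function link_urls_phpbb2(string $text): string
{
    $ret = ' ' . $text;
    $ret = preg_replace(
        "#(^|[\n ])([\w]+?://[\w\#$%&~/.\-;:=,?@\[\]+]*)#is",
        "\\1<a href=\"\\2\" target=\"_blank\">\\2</a>",
        $ret
    );
    return substr($ret, 1);
}

// Linked: a URL at the start, and a URL after a space.
echo link_urls_phpbb2("http://example.com"), "\n";
echo link_urls_phpbb2("see http://example.com"), "\n";
// Not linked: a URL directly after "]" or directly after ">".
echo link_urls_phpbb2("[tcode:1:05c769fb0a]http://example.com"), "\n";
echo link_urls_phpbb2("<div>http://example.com</div>"), "\n";
```

In this sketch only the first two calls produce a link; the last two come back unchanged because neither `]` nor `>` is a space or a newline.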

Take a look at just the first group, because that is where the fix has to go! I tried adding "]" (square bracket) to the range, but that didn't work.
Why "]"? Because the string in the posts_text table in the database (this is where the script takes the entire post body from before running it through the bbcode regexes) looks like:
Code:
[tcode:1:05c769fb0a]http

so adding "]" to the first group like this:
Code:
(^|[\]\n ])

should work, but it didn't, and I don't know why. I tried many, many different ways, but none of them worked. Then I found a mod that adds ed2k links to phpBB2 (HERE) and copied part of its code into my bbcode.php:
Code:
$ret = preg_replace("#(^|(?<=[^\w\"']))([\w]+?://[\w\#$%&~/.\-;:=,?@\[\]+]*)#is", "\\1<a href=\"\\2\" target=\"_blank\">\\2</a>", $ret);

and it actually works!!
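
One way to narrow down why `(^|[\]\n ])` seemed to do nothing is to test the three first-group variants side by side, outside phpBB. This is a sketch assuming PHP 7 or later; `URL_PART`, `finds_url()` and the `<div class="tcode">` markup are hypothetical, since the real tcode template output is not shown in this thread. As far as I can tell, phpBB2's viewtopic.php runs bbencode_second_pass() before make_clickable(), so the `]` may already have been replaced by HTML when the URL rule runs; that is an assumption this test is meant to probe, not a confirmed diagnosis.

```php
<?php
// URL part of the make_clickable() pattern, as quoted in this thread.
const URL_PART = "([\w]+?://[\w\#$%&~/.\-;:=,?@\[\]+]*)";

// The three variants of the first group discussed in this thread.
$firstGroups = [
    'stock'      => "(^|[\n ])",
    'bracket'    => "(^|[\]\n ])",
    'lookbehind' => "(^|(?<=[^\w\"']))",
];

// Raw stored bbcode vs. a hypothetical HTML rendering of the same block.
$inputs = [
    'raw'  => "[tcode:1:05c769fb0a]http://example.com",
    'html' => "<div class=\"tcode\">http://example.com</div>",
];

// True if the pattern built from this first group finds a URL in $text.
function finds_url(string $firstGroup, string $text): bool
{
    return preg_match('#' . $firstGroup . URL_PART . '#is', $text) === 1;
}

foreach ($firstGroups as $name => $group) {
    foreach ($inputs as $kind => $text) {
        printf("%-10s %-4s %s\n", $name, $kind, finds_url($group, $text) ? 'match' : 'no match');
    }
}
```

With these inputs the stock group matches neither string, the `]` group matches only the raw text, and the lookbehind group matches both, because `>` is neither a word character nor a quote.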
But... here is my question:

I use regex in VB .NET and know how to build a match, but I cannot understand this regex structure: "^" matches the beginning of the string, but "^|", what is that? "Beginning or..."? I simply don't get it...
And about the part I copied from the ed2k mod: "?<=" is a positive lookbehind, which matches a group before your main expression without including it in the result (I use this quite often), but then it matches the range of any word character, the double quote and the single quote. Without " or ' it won't match the string, but there are no quotes in my string, so why are they needed in this range?

OK, but please tell me: is a regex like this safe, or could it mess up other things in phpBB's post parsing? So far I haven't noticed any other broken links and it works properly... so, what do you think, people?
Thanks in advance for your replies.
swiety
 
Posts: 13
Joined: Tue Mar 11, 2008 8:23 am

Re: regex issue

Postby dcz » Sat Apr 04, 2009 9:02 am

Hello,

So:
Code:
(^|something)[0-9]+$

will match:
Code:
88

and:
Code:
something98

and:
Code:
blablasomething98

too, since "something" is not required to be at the beginning of the string: (^|something) means "at the start of the string" or "right after the text something", wherever that text appears.
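
This is easy to confirm by running dcz's pattern through preg_match() (a sketch assuming PHP 7 or later). The extra sample `blabla98` is an addition to show that, with neither the start of the string nor "something" in front of the digits, nothing matches.

```php
<?php
// dcz's pattern: the group matches either the start of the string (^)
// or the literal text "something", wherever it occurs.
$pattern = '/(^|something)[0-9]+$/';

$samples = ['88', 'something98', 'blablasomething98', 'blabla98'];

foreach ($samples as $s) {
    echo $s, ' => ', preg_match($pattern, $s) ? 'match' : 'no match', "\n";
}
```

The first three samples match; `blabla98` does not.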
The same can be done with $:
Code:
^[0-9]+($|something)

will match:
Code:
78

and:
Code:
98something

as well as:
Code:
98something something else
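
The mirror-image pattern can be checked the same way (PHP 7 or later assumed). The sample `98else` is an addition showing a non-match. Because nothing anchors the end of the whole pattern, `98something something else` still matches: only the text right after the digits is constrained.

```php
<?php
// The digits must start the string and be followed either by the end of
// the string ($) or by the literal text "something".
$pattern = '/^[0-9]+($|something)/';

$samples = ['78', '98something', '98something something else', '98else'];

foreach ($samples as $s) {
    echo $s, ' => ', preg_match($pattern, $s) ? 'match' : 'no match', "\n";
}
```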


And you can mix both, as we do in the phpBB3 phpbb_seo_class.php, to filter out words of fewer than 3 letters in URLs (the filtering occurs once the title has already been filtered, e.g. title-of-the-topic):
Code:
      if ($this->seo_opt['rem_small_words']) {
         $this->seo_opt['url_pattern'][] = '`(^|-)[a-z0-9]{1,2}(?=-|$)`i';
      }

This matches all cases at once (string starting / ending with hyphen or not).
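
To see the combined pattern at work outside phpbb_seo_class.php, it can be fed to preg_replace() with an empty replacement. This is a sketch assuming PHP 7 or later; how the class actually applies its url_pattern array is not shown here, so replacing matches with an empty string is an assumption, and the titles are made up.

```php
<?php
// Pattern quoted from phpbb_seo_class.php: a 1-2 character word preceded by
// the start of the string or a hyphen, and followed (lookahead, so it is not
// consumed) by a hyphen or the end of the string.
$pattern = '`(^|-)[a-z0-9]{1,2}(?=-|$)`i';

// Made-up, already-filtered titles.
$titles = ['how-to-use-a-regex', 'title-of-the-topic', 'a-guide-to-regex'];

foreach ($titles as $t) {
    echo $t, ' => ', preg_replace($pattern, '', $t), "\n";
}
```

Here `how-to-use-a-regex` becomes `how-use-regex` and `title-of-the-topic` becomes `title-the-topic`. A short word at the very start leaves a leading hyphen (`a-guide-to-regex` becomes `-guide-regex`), which would need trimming separately.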

Then:
Code:
(^|(?<=[^\w\"']))


It seems correct for what you want to do. The double/single quotes are there to leave double- or single-quoted URLs unclickable (from what I can tell):
Code:
"http://www.phpbb-seo.com/"


So it looks just fine. As you found out, phpBB adds an extra space to deal with this case, so there could be some other reason for not doing it your way, but I really don't think it can cause trouble if it's working.
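
dcz's reading of the quotes can be checked with a small sketch (PHP 7 or later assumed; `link_urls_lookbehind()` is a hypothetical name). Note that `[^\w\"']` is a negated class: the character before the URL must be anything except a word character, a double quote or a single quote, so a URL right after a quote is skipped. The second sample is a simplified, assumed form of an already-rendered link, not phpBB2's exact output; it shows one case that may be worth testing on a real board, since a URL right after `>` is linked as well.

```php
<?php
// The lookbehind version of the URL rule. [^\w\"'] is a NEGATED class: the
// character before the URL must be anything EXCEPT a word character, a
// double quote or a single quote.
function link_urls_lookbehind(string $text): string
{
    return preg_replace(
        "#(^|(?<=[^\w\"']))([\w]+?://[\w\#$%&~/.\-;:=,?@\[\]+]*)#is",
        "\\1<a href=\"\\2\" target=\"_blank\">\\2</a>",
        $text
    );
}

// A quoted URL (e.g. inside an href attribute) is left alone...
echo link_urls_lookbehind('"http://www.phpbb-seo.com/"'), "\n";
// ...but a URL directly after ">" is linked, including the visible text of
// an existing link, which here produces a nested <a> tag.
echo link_urls_lookbehind('<a href="http://example.com">http://example.com</a>'), "\n";
```

In this sketch the quoted URL comes back unchanged, while the visible text of the existing link gets wrapped in a second `<a>` tag. The stock `(^|[\n ])` rule would leave both alone, because neither `"` nor `>` is a space or a newline.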

++
Useful links :
SEO Forum || SEO Directory || SEO phpBB || Search
dcz
Admin
Admin
 
Posts: 21382
Joined: Fri Apr 28, 2006 9:03 pm

Re: regex issue

Postby swiety » Sat Apr 04, 2009 12:22 pm

Thank you for explaining the "^|" thing.
As I said, that first group:
Code:
(^|(?<=[^\w\"']))

works fine, and I haven't noticed any other problems (e.g. other broken URLs), but the part about the quotes isn't that clear, because this regex:
Code:
(^|(?<=[^\w\"']))([\w]+?://[\w\#$%&~/.\-;:=,?@\[\]+]*)

matches exactly what I want in:
Code:
[tcode:1:05c769fb0a]http://example.com
http://example.com
http://example.com http://example.com
http://example.com[/tcode]

It matches all the URLs in this tag, but when I delete " or ' from the range in the first capturing group, it doesn't match... strange.
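
Running swiety's test block through both versions in a standalone script (PHP 7 or later assumed) helps separate the regex from the surrounding phpBB code. In this sketch, removing the quotes (leaving `[^\w]`) still finds all five URLs, so the quotes themselves are not what makes this sample match; what caused the difference inside phpBB can't be told from the thread. Note also that on raw bbcode the URL character class, which includes `[` and `]`, runs on into `[/tcode]`.

```php
<?php
// swiety's test block, as posted (raw bbcode, not rendered HTML).
$sample = "[tcode:1:05c769fb0a]http://example.com\n"
        . "http://example.com\n"
        . "http://example.com http://example.com\n"
        . "http://example.com[/tcode]";

// URL part of the make_clickable() pattern.
$url = "([\w]+?://[\w\#$%&~/.\-;:=,?@\[\]+]*)";

// First group with and without the quotes in the negated class.
$withQuotes    = "#(^|(?<=[^\w\"']))" . $url . "#is";
$withoutQuotes = "#(^|(?<=[^\w]))" . $url . "#is";

foreach ([$withQuotes, $withoutQuotes] as $pattern) {
    preg_match_all($pattern, $sample, $m);
    print_r($m[2]); // group 2 holds the matched URLs
}
```

Both runs print the same five matches, the last one being `http://example.com[/tcode]`.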

So maybe the best solution would be to create an additional $ret variable used only for matching URLs in this toggled code bbcode block, also using a positive lookbehind, as I mentioned in my first post.
What I want to do is match, e.g., the last "]" in "[tcode:1:05c769fb0a]http" without including that square bracket in the result, and of course match every http link in this bbcode.
Any ideas?!
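
A minimal sketch of the lookbehind idea described above, assuming PHP 7 or later; `link_tcode_urls()` is a hypothetical helper meant to run on the tcode content only. It accepts the start of the string, a space or newline (as the stock rule does), or a `]` that is only checked by `(?<=\])` and therefore not part of the match.

```php
<?php
// Hypothetical rule for the tcode block only: a URL may follow the start of
// the string, a space/newline (as in the stock rule), or a "]". The "]" is
// only checked by the lookbehind, so it is not included in the match.
function link_tcode_urls(string $text): string
{
    return preg_replace(
        "#(^|[\n ]|(?<=\]))([\w]+?://[\w\#$%&~/.\-;:=,?@\[\]+]*)#is",
        "\\1<a href=\"\\2\" target=\"_blank\">\\2</a>",
        $text
    );
}

echo link_tcode_urls("[tcode:1:05c769fb0a]http://example.com\nhttp://example.com"), "\n";
```

Both URLs are linked and the `[tcode:1:05c769fb0a]` prefix is left untouched.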

Thanks in advance.

Re: regex issue

Postby dcz » Tue Apr 07, 2009 8:54 am

Well, if it's working, why change it?

I would not rely on matching ] in that case, because it could be confused with other bbcode blocks that may be used there, such as the [url] tag.
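
A sketch of the ambiguity described above (PHP 7 or later assumed): a lookbehind on `]` cannot tell which tag the bracket closes, so on raw bbcode it fires after a `[url]` opening tag just as it does after the tcode opening tag. Because the URL character class includes `[` and `]`, the `[url]` match also swallows the closing `[/url]`.

```php
<?php
// A lookbehind on "]" only checks the previous character, not which tag
// that bracket belongs to.
$pattern = "#(?<=\])[\w]+?://[\w\#$%&~/.\-;:=,?@\[\]+]*#i";

$samples = [
    "[tcode:1:05c769fb0a]http://example.com",
    "[url]http://example.com[/url]",
];

foreach ($samples as $s) {
    preg_match($pattern, $s, $m);
    echo $s, ' => ', $m[0] ?? '(none)', "\n";
}
```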

++

