Chinese and title injection in URLs

Discussions and support about the different URL Rewriting techniques for phpBB2.

Post by n260009610 » Sat Sep 23, 2006 7:05 am

I am Chinese, and I am running the Simplified Chinese version of Windows.



#-----[ OPEN ]------------------------------------------


#-----[ FIND ]------------------------------------------


#-----[ BEFORE ADD ]------------------------------------------

function if_query($amp)
{
   if($amp != '')
   {
      return '?';
   }
}

function format_url($url)
{
   $url = preg_replace("(\[.*\])U","",$url);
   $find = array('"','&','\r\n','\n');
   $url = str_replace ($find, '-', $url);
   $url = str_replace ('?, 'ss', $url);
   $url = str_replace (array('?,'?), 'oe', $url);
   $url = str_replace (array('?,'?), 'ae', $url);
   $url = str_replace (array('?,'?), 'ue', $url);
   $find = "懒旅培徕沐矣哉仳篝貘壬仕栝觌晴掏蜗祉铒仝垸疡";
   $replace = "AAAAAaaaaaOOOOOoooooEEEEeeeeCcIIIIiiiiUUUuuuyNn";
   $url = strtr($url,$find,$replace);
   $url = strtolower($url);
   $url = ereg_replace("[^a-zA-Z0-9]", "-", $url);
   while (strstr($url, '--')) $url = str_replace('--', '-', $url);
   $url = (substr($url, 0, 1) == '-') ? substr($url, 1) : $url;
   $url = (substr($url, strlen($url) - 1, 1) == '-') ? substr($url, 0, strlen($url) - 1) : $url;
   return $url;
}


On my system this line shows up as Chinese characters:

$find = "懒旅培徕沐矣哉仳篝貘壬仕栝觌晴掏蜗祉铒仝垸疡";

Is this an encoding error?

Please help me...



Post by dcz » Sat Sep 23, 2006 10:32 pm

And welcome :D

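This comes from your local character-set encoding. The string you see is really a list of accented Latin characters such as é, à, è and ü, which the function filters out before the title is injected into the URL. Your Simplified Chinese code page reads those same bytes as Chinese characters.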
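To illustrate the effect, here is a minimal sketch that decodes the same two bytes first as Latin-1 and then as GBK. It assumes the `iconv` extension is available and knows the `GBK` charset; the byte values are chosen for illustration and are not taken from the mod file itself.

```php
<?php
// Two bytes that mean "ÀÁ" when the file is read as ISO-8859-1 (Latin-1).
$bytes = "\xC0\xC1";

// Decoded with the Western code page: two accented letters.
$asLatin1 = iconv('ISO-8859-1', 'UTF-8', $bytes);

// Decoded with a Simplified Chinese code page (GBK): the same two bytes
// are taken as ONE double-byte character.
$asGbk = iconv('GBK', 'UTF-8', $bytes);

echo $asLatin1, "\n";
echo $asGbk, "\n";
```

The bytes in the file never change; only the code page used to display them does, which is why the mod still contains the right data even though the editor shows Chinese text.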

The problem is that the URL standard only accepts a subset of the 128 characters of the ASCII table, so nothing in Chinese can appear in a URL unencoded.
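As a quick illustration of what this means in practice, the sketch below percent-encodes a mixed title with PHP's `rawurlencode()`. The title is made up for the example, and the snippet assumes the source file is saved as UTF-8.

```php
<?php
// An illustrative topic title mixing Chinese and English.
$title = "中文 SEO";

// rawurlencode() percent-encodes every byte outside the unreserved ASCII set,
// so each Chinese character becomes three %XX groups in UTF-8.
$encoded = rawurlencode($title);

echo $encoded, "\n"; // %E4%B8%AD%E6%96%87%20SEO
```

The Chinese part is technically transportable this way, but it is no longer a readable keyword, whereas the English word survives untouched.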

So injecting keywords into URLs is only worthwhile if your phpBB forum and topic titles contain at least some English. That could be the case if, for example, you run a technical web site on web design or a similar subject where many English words are used.

For this kind of case, and thanks to our UTF-8 specialist (;) amir), I developed a solution that filters out everything, even UTF-8 (thus Chinese, Cyrillic, Arabic, Hebrew, Persian, etc.), except English keywords.
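The unreleased code is not shown in this thread, so the following is only a rough sketch of the general idea, not the actual phpBB SEO implementation. The function name `keep_ascii_keywords` is hypothetical, and it uses `preg_replace()` rather than the older `ereg_replace()`.

```php
<?php
// Hypothetical helper: keeps only ASCII letters and digits from a title.
// This is an illustration, NOT the phpBB SEO code mentioned in the post.
function keep_ascii_keywords(string $title): string
{
    // Collapse every run of bytes that is not an ASCII letter or digit
    // (including all bytes of multi-byte UTF-8 characters) into one hyphen.
    $slug = preg_replace('/[^a-zA-Z0-9]+/', '-', $title);

    // Drop leading and trailing hyphens, then lowercase the result.
    return strtolower(trim($slug, '-'));
}

echo keep_ascii_keywords("phpBB 中文 SEO 教程"), "\n"; // phpbb-seo
var_dump(keep_ascii_keywords("中文标题") === "");      // bool(true)
```

A title with no Latin characters at all yields an empty string, so a real mod would need a fallback URL for that case.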

I have not released it yet because it is not useful in many cases, and I first want to update and release the phpBB SEO mod rewrites before I go for more specific releases. The mod rewrites are working nicely, but the update is required to make them able to deal with a lot more than just rewriting phpBB URLs ;)

Anyway, the code is already deployed here, for example, so if you think your project will use English, or at least words written in Latin characters, more than occasionally, tell me and I'll send you a download link.

In all other cases, the phpBB SEO Simple mod rewrite is a great, lightning-fast mod rewrite solution for any language whose character set cannot be used in URLs.


Post by n260009610 » Sun Sep 24, 2006 9:49 am

Oh, thanks...
