two other problem with UTF-8 persian
AmirAbbas
phpBB SEO Team
Joined: 11 May 2006
Posts: 529
Location: IRAN

Posted: Mon Jul 31, 2006 11:12 am    Post subject: two other problem with UTF-8 persian

Hello again,

I know this forum is not really the right place for this question and that I should ask it on the phpbb.com site, but I think I can find the answer here.

There are two other problems with UTF-8 Persian. First I must explain something about the Persian language, which is very similar to Arabic.

Arabic has 28 characters, and Persian has all of those 28 characters plus 4 others. One character exists in both languages: its pronunciation is the same, its usage is the same, everything is the same. Unfortunately, Unicode has two separate codes for it.

The character is this:

Persian one
ی

Arabic one
ي
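
To make the difference concrete, here is a minimal sketch (PHP 7 or later, assuming the source file is saved as UTF-8) that prints nothing but exposes the two code points and their raw bytes. The variable names are only for illustration.

```php
<?php
// The two "ye" letters look alike but are distinct Unicode code points.
$persianYe = "\u{06CC}"; // ARABIC LETTER FARSI YEH, decimal 1740
$arabicYe  = "\u{064A}"; // ARABIC LETTER YEH, decimal 1610

// bin2hex() shows the raw UTF-8 bytes that end up in the database.
$persianBytes = bin2hex($persianYe); // "db8c"
$arabicBytes  = bin2hex($arabicYe);  // "d98a"

// A byte-wise comparison sees two different strings.
$sameLetter = ($persianYe === $arabicYe); // false
```

Because the bytes differ, any exact-match comparison treats the two spellings of a word as unrelated.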

This character is pronounced "ye" and sounds like "ee". As you can see, there is a small difference between the Arabic and the Persian character: the Arabic one has two dots under it. This difference causes the biggest SEO problem for all Persian sites.

As I said, both of these characters are supported in UTF-8. In Persian-speaking countries (Iran, Tajikistan, part of Armenia, a large part of Pakistan, Afghanistan) many people run an Arabic version of Windows. For example, one of my users has Arabic Windows; he comes to my site and creates a topic. All his "ye" characters are typed in the Arabic shape and, because UTF-8 supports the Arabic "ye", he can send his post without any problem. After that, someone like me with Persian Windows comes and writes another post that contains the Persian "ye", and he can also send it without any problem.

Now imagine:

Search results are not accurate, because for each word that contains the character "ye" there are two records in the database (one for the Arabic form and one for the Persian form).

For example, say you want to search for this word:

آموزشی

This word is pronounced "amuzeshi" and means "educational".

If you type the word with the Arabic "ye", you see a list of topics that contain it, but when you type it with the Persian "ye" you see a different result. Unfortunately Google, Yahoo and MSN can't recognize which site is Persian and which one is Arabic; they are not aware of this problem.
You can check this in the Yahoo and Google keyword tools at these two addresses:

adwords google keyword selector

yahoo selector tool

If you type a word with the different shapes of "ye", you will see different results. For example, test these two words in the Yahoo keyword selector tool:

Persian word

آموزشی

Arabic word

آموزشي

Note that both of these words are Persian, but in one of them we used the Arabic "ye".

Now my request:

Can we replace every Arabic "ye" with the Persian "ye" in

1- topic title
2- text body
3- search box

?

For example, when someone writes a post that contains the Arabic "ye", the system replaces every Arabic "ye" with the Persian one and only then saves the post in the database.
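
The request above could be sketched roughly as follows. This is only an illustration: `normalize_ye_to_persian()` is a hypothetical helper, not a phpBB function, and where exactly it would be called in phpBB is left open.

```php
<?php
// Hypothetical helper, not part of phpBB: map Arabic ye (U+064A) to Persian ye (U+06CC).
function normalize_ye_to_persian(string $text): string
{
    // str_replace() works on bytes, which is safe for whole UTF-8 sequences.
    return str_replace("\u{064A}", "\u{06CC}", $text);
}

// The stem "amuzesh" written with escapes, followed by each form of "ye".
$stem            = "\u{0622}\u{0645}\u{0648}\u{0632}\u{0634}";
$arabicSpelling  = $stem . "\u{064A}";
$persianSpelling = $stem . "\u{06CC}";

// Apply the same normalization before saving text and before searching.
$title  = normalize_ye_to_persian($arabicSpelling);
$search = normalize_ye_to_persian($persianSpelling);
```

If the same function runs on titles, bodies and search terms, both keyboard layouts end up producing identical strings.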

And the second problem: UTF-8 contains a lot of other characters. Some saboteur users come to the site and register with special characters in their names, then start sending bad posts. The system cannot find their usernames. How can I change the username field in the registration form so that only English characters are accepted?
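
One possible way to express "only English characters" is a whitelist. The sketch below is an assumption, not phpBB's own validate_username(): the function name and the 3 to 25 character limit are made up for illustration.

```php
<?php
// Hypothetical check: accept only ASCII letters, digits, underscore and hyphen.
// The length limit of 3 to 25 characters is an arbitrary example value.
function is_ascii_username(string $username): bool
{
    // The D modifier stops "$" from matching before a trailing newline.
    return preg_match('/^[A-Za-z0-9_-]{3,25}$/D', trim($username)) === 1;
}
```

A whitelist rejects anything not explicitly allowed, so a name that mixes Latin and Persian letters fails just like a purely Persian one.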

_________________
چهار گوش - طراحی وب - مجله طراحی وب

dcz
Site Admin
Joined: 28 Apr 2006
Posts: 15242

Posted: Mon Jul 31, 2006 11:56 am    Post subject: Re: two other problem with UTF-8 persian

Well, I am not sure this is that big a deal.

Results will certainly differ, just as they do in French with and without accents (é or e in the same word: it only changes the pronunciation, and the word is still readable without it), since many users won't type accents while others will.

What I think in the end is that people who post one way will also search the same way, and we want to be findable under both spellings.

People writing in your forums with the Arabic characters are more than likely to search with the same spelling, and the same goes for the Persian case.

So I am not sure it would be wise to filter out one case, as users will obviously continue searching the way they always did.

All you could do about this is content-wise: post good articles specifically about the keyword and spelling you want to optimize, one article, or at least one part, for each spelling. And post many links to them.

It's quite a good geofilter too; let me explain.

If one of your keywords or key topics uses this 'ye', you can write two articles about it, one using the Arabic 'ye' and the other the Persian one.
You'll be able to personalize the two articles a bit, because you know the one using the Arabic 'ye' will most likely be found by people using the Arabic char-set.

Now if the subject really can't be personalized on such criteria, then again, you should make sure to use both spellings as much as possible, to be findable by your two types of users.

Because if you filter them, users with the Arabic char-set won't find you as easily on those terms, and I bet search engine results differ more in such cases than in the French 'é' / 'e' accent case.

Then, about your registration problem, we could try using the UTF-8 filtering technique I put together.

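Try opening :
Code:

includes/functions_validate.php


Find:
Code:

function validate_username($username)
{
   global $db, $lang, $userdata;


After, add :

Code:
   // www.phpBB-SEO.com BEGIN
   // URL-encode the name: bytes outside letters, digits and -_.~ become %XX.
   $username = rawurlencode(trim($username));
   // Blank out the %XX sequences (preg_replace stands in for ereg_replace,
   // which was removed in PHP 7).
   $username = preg_replace('/%[a-zA-Z0-9]{2}/', ' ', $username);
   // www.phpBB-SEO.com END
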

++

_________________
Useful links :
SEO Forum || SEO Directory || SEO phpBB || SEO phpBB3 || Search

AmirAbbas

Posted: Mon Jul 31, 2006 1:35 pm    Post subject: Re: two other problem with UTF-8 persian

Quote:
just as they do in French with and without accents (é or e in the same word: it only changes the pronunciation, and the word is still readable without it), since many users won't type accents while others will.


Do you mean there is a similar problem in French?
I thought only Persian had a problem like this.


Quote:
So I am not sure it would be wise to filter out one case, as users will obviously continue searching the way they always did.

All you could do about this is content-wise: post good articles specifically about the keyword and spelling you want to optimize, one article, or at least one part, for each spelling. And post many links to them.


You are right, this method is better. Thanks.


And the trick for registration is very good. I tested it, and it can be very useful.

-http://community.iransalamat.com

This is a medical forum. All the moderators are doctors. Three months ago the site admin had an argument with one of the old members, and the user started swearing. The moderators banned him, but he came back and registered with an unusual username that contained some special characters. We tried a lot but couldn't delete that user. He caused a lot of problems for that forum, but with this trick we will not have problems like that.

After applying this trick, if a user enters Persian characters and submits the form, this message appears:

Quote:
Sorry, but this username has already been taken.


Is it possible to show another message? For example:

Quote:
Sorry, you cannot use Persian characters in a username. Please use English characters and try again.


Thank you very much.

dcz

Posted: Mon Jul 31, 2006 2:33 pm    Post subject: Re: two other problem with UTF-8 persian

Good.

I think the easiest way to deal with this without too many changes is just to edit the $lang['Username_taken'] key in lang_main.php. You can use HTML in those strings, so a simple <br /> lets you add that nicknames can only contain Latin characters for security reasons.
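
As a rough illustration of that edit, the entry might end up looking like this. The file path in the comment and the exact wording of the second sentence are assumptions; only the `$lang['Username_taken']` key and the existing English message come from the thread.

```php
<?php
// Assumed location: language/lang_english/lang_main.php (adjust to your language pack).
// The first sentence is the existing message; the part after <br /> is the addition.
$lang['Username_taken'] = 'Sorry, but this username has already been taken.'
    . '<br />For security reasons, usernames may only contain Latin characters.';
```

This keeps the validation code untouched and only changes what the visitor reads.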

I am pretty sure better filtering is possible here, such as making sure the nickname contains only Latin characters and/or the 32 allowed Persian characters, if you'd still like to allow Persian nicknames. Unfortunately, I don't know much about char-sets, even though I have been learning quite a lot thanks to you. Later maybe...
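
A sketch of what such a filter might look like, under loud assumptions: the function name is hypothetical, and instead of listing the 32 Persian letters it allows the whole Unicode Arabic block (U+0600 to U+06FF), which is wider than strictly needed.

```php
<?php
// Hypothetical check: ASCII letters and digits plus any code point in the
// Unicode Arabic block. A stricter version would list the allowed letters.
function is_latin_or_persian_username(string $username): bool
{
    // The u modifier makes PCRE treat pattern and subject as UTF-8;
    // invalid UTF-8 makes preg_match() return false, so the name is rejected.
    return preg_match('/^[A-Za-z0-9\x{0600}-\x{06FF}]+$/Du', $username) === 1;
}
```

Characters from any other script, as well as spaces and punctuation, are refused by this pattern.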


++

AmirAbbas

Posted: Mon Jul 31, 2006 3:50 pm    Post subject: Re: two other problem with UTF-8 persian

Quote:
I think the easiest way to deal with this without too many changes is just to edit the $lang['Username_taken'] key in lang_main.php. You can use HTML in those strings, so a simple <br /> lets you add that nicknames can only contain Latin characters for security reasons.


Yes, it's the best and easiest way. I don't want users to be able to register with Persian characters at all, so this trick is sufficient for me.

Excuse me, I forgot something about the first problem. I said that both the Persian and the Arabic 'ye' are supported in UTF-8, but the Persian 'ye' is not supported in the windows-1256 encoding.

I said before that we can use either encoding for our sites, but I prefer UTF-8 because it is the standard charset for Persian. Some people like to use windows-1256 (the Arabic charset), like these two forums:

-www.siscenter.com/forum/
-www.kowsarr.com/forum/

They claim that a windows-1256 database is smaller than a UTF-8 one, because each Unicode character takes 2 bytes in the database.
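
The size claim can be checked on a single word. In this small sketch (variable names are illustrative) the example word is written with escapes so that the result does not depend on how the file was saved.

```php
<?php
// The example word "amuzeshi", six letters, spelled with the Persian ye.
$word = "\u{0622}\u{0645}\u{0648}\u{0632}\u{0634}\u{06CC}";

$bytes   = strlen($word);                 // strlen() counts bytes: 12
$letters = preg_match_all('/./u', $word); // with /u, "." matches one code point: 6
```

So Arabic-script letters do take 2 bytes each in UTF-8, against 1 byte in a single-byte charset such as windows-1256, while ASCII text stays at 1 byte per character in both.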

But as I said, the Persian 'ye' is not supported in the windows-1256 charset. If someone uses the Persian 'ye' in a title, the system replaces that character with a numeric code.
See this picture:



Every Persian 'ye' is replaced with that code.

This problem exists only in titles; you can use the Persian 'ye' in the body text without any difficulty. I think it should be possible to replace the Persian 'ye' with the Arabic 'ye' for titles only. I asked this question in another forum and someone told me: "use 'ereg_replace'", but I didn't know anything about PHP, just like now.

dcz

Posted: Mon Jul 31, 2006 11:53 pm    Post subject: Re: two other problem with UTF-8 persian

Yes, and here it is.

ی is
Code:
&#1740;


ي is
Code:
&#1610;


It's easy to filter this particular char. ereg_replace isn't even needed, as str_replace is faster for such a basic replacement, but our problem here is to filter all topic titles.

As you have noticed, some mods, such as "Today at / Yesterday at", do not even filter those with the censor system, so I think the proper way to deal with this is to filter the title when posting or editing. A fresh install would be all right; others would need an SQL script to filter existing topic titles once and for all.

Quite a bit of work, even though it's not very complex.

The posting part could be dealt with like this, I guess:

Open :

Code:
includes/functions_post.php


Find :
Code:

   if (!empty($subject))
   {
      $subject = htmlspecialchars(trim($subject));
   }



Replace with :

Code:

   if (!empty($subject))
   {
      $subject = htmlspecialchars(trim($subject));
      // www.phpbb-seo.com
      $subject = str_replace('ی', 'ي', $subject);
      // www.phpbb-seo.com
   }


For the SQL script, it's a bit longer: you'd select, let's say, 50 topic titles at a time, perform the str_replace, update them, and so on until it's done.
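
The batch idea can be sketched without a database. Everything here is an assumption for illustration: the helper name is hypothetical, and a plain array stands in for the rows that a query on the topics table would return.

```php
<?php
// Hypothetical helper: given one batch of rows (topic_id => topic_title),
// return only the rows whose title changes after the replacement.
function titles_needing_update(array $batch): array
{
    $changed = [];
    foreach ($batch as $topicId => $title) {
        // Persian ye (U+06CC) becomes Arabic ye (U+064A), as in the edit above.
        $fixed = str_replace("\u{06CC}", "\u{064A}", $title);
        if ($fixed !== $title) {
            $changed[$topicId] = $fixed;
        }
    }
    return $changed;
}

// Stand-in for the rows a "SELECT topic_id, topic_title ... LIMIT 50" would return.
$stem      = "\u{0622}\u{0645}\u{0648}\u{0632}\u{0634}";
$allTitles = [1 => 'phpBB', 2 => $stem . "\u{06CC}", 3 => 'SEO'];

$updates = [];
foreach (array_chunk($allTitles, 50, true) as $batch) {
    // In a real script each entry would become one UPDATE on the topics table.
    $updates += titles_needing_update($batch);
}
```

Returning only the changed rows keeps the number of UPDATE statements down to the titles that actually contain the character.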

++

Last edited by dcz on Tue Aug 01, 2006 10:42 am; edited 1 time in total

AmirAbbas

Posted: Tue Aug 01, 2006 10:22 am    Post subject: Re: two other problem with UTF-8 persian

Unfortunately it doesn't work.

After I applied this change, I tried to send a post.
See this picture:



The titles are missing, and it makes no difference whether the title is English or Persian: in both cases the title is lost.

dcz

Posted: Tue Aug 01, 2006 10:43 am    Post subject: Re: two other problem with UTF-8 persian

Sorry, there was a typo in the replace ($url instead of $subject). Try :

Code:

   if (!empty($subject))
   {
      $subject = htmlspecialchars(trim($subject));
      // www.phpbb-seo.com
      $subject = str_replace('ی', 'ي', $subject);
      // www.phpbb-seo.com
   }


++

AmirAbbas

Posted: Tue Aug 01, 2006 11:50 am    Post subject: Re: two other problem with UTF-8 persian

dcz wrote:
Sorry, there was a typo in the replace ($url instead of $subject). Try :

Code:

   if (!empty($subject))
   {
      $subject = htmlspecialchars(trim($subject));
      // www.phpbb-seo.com
      $subject = str_replace('ی', 'ي', $subject);
      // www.phpbb-seo.com
   }


++


It still doesn't work.

I think the title must first be converted to ASCII codes, and only then should the replacement swap the two characters.

dcz

Posted: Tue Aug 01, 2006 1:07 pm    Post subject: Re: two other problem with UTF-8 persian

All right, let's try it just before the post is submitted then.

Open :

Code:
includes/functions_post.php


Find :

Code:
   if ($mode == 'editpost')
   {
      remove_search_post($post_id);
   }


After, add :

Code:
   $post_subject = str_replace('ی', 'ي', $post_subject);


And if this is not enough, we could try adding utf8_decode and utf8_encode to see if it helps.

++

AmirAbbas

Posted: Wed Aug 02, 2006 6:59 am    Post subject: Re: two other problem with UTF-8 persian

Same problem again: the titles are missing.

dcz

Posted: Wed Aug 02, 2006 8:38 pm    Post subject: Re: two other problem with UTF-8 persian

All right, I think I have understood our problem.

The first suggested code works, but you need to enter the HTML numeric codes instead of the characters copied from here.

eg :

Open :

Code:
includes/functions_post.php


Find :
Code:

   if (!empty($subject))
   {
      $subject = htmlspecialchars(trim($subject));
   }


Replace with :

Code:
   if (!empty($subject))
   {
      $subject = htmlspecialchars(trim($subject));
      $subject = str_replace('&#1740;', '&#1610;', $subject);
   }


Make sure the code in your PHP file contains the exact HTML codes, "&#" with no space after the "&".

++

AmirAbbas

Posted: Thu Aug 03, 2006 4:36 am    Post subject: Re: two other problem with UTF-8 persian

I had already done that.

I quoted your message and clicked the preview button; that way you can see the source correctly.

It didn't work.

dcz

Posted: Thu Aug 03, 2006 9:14 am    Post subject: Re: two other problem with UTF-8 persian

Mmmmh.

I have found a simple fix that works in UTF-8. The bug came from htmlspecialchars breaking the HTML numeric codes: it replaces every & with &amp;, so it breaks every &#. And actually this breaks any char that is not part of the defined char-set and is therefore encoded as an HTML numeric code.
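
The effect is easy to reproduce in isolation. This is a minimal sketch with made-up variable names; it also shows the fourth argument of htmlspecialchars() (double_encode), which is an alternative not used in the thread's own fix.

```php
<?php
// A title containing the numeric code for Persian ye.
$subject = 'title with &#1740;';

// Default behaviour: every "&" is escaped, so the code is broken.
$broken = htmlspecialchars($subject, ENT_QUOTES, 'UTF-8');
// $broken is 'title with &amp;#1740;'

// With double_encode set to false, existing entities are left untouched.
$kept = htmlspecialchars($subject, ENT_QUOTES, 'UTF-8', false);
// $kept is 'title with &#1740;'
```

Either approach, undoing the escaping afterwards or not double-encoding in the first place, leaves the numeric code intact for the replacement step.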

So here is how I worked around it :

Open :

Code:
includes/functions_post.php



Find :

Code:
   if (!empty($subject))
   {
      $subject = htmlspecialchars(trim($subject));
   }



Replace with :

Code:
   if (!empty($subject))
   {
      $subject = htmlspecialchars(trim($subject));
      // Undo the escaping of numeric codes, then swap Persian ye for Arabic ye.
      $subject = str_replace('&amp;#', '&#', $subject);
      $subject = str_replace('&#1740;', '&#1610;', $subject);
   }


Or you could just replace the last one with :

Code:
   if (!empty($subject))
   {
      $subject = htmlspecialchars(trim($subject));
      $subject = str_replace('&amp;#1740;', '&amp;#1610;', $subject);
   }


But the first version makes sure that any unsupported char in a title is not broken.

Make sure the code in your PHP file contains the exact HTML codes, "&#" with no space after the "&".


Let's just hope phpBB is not using the Windows char-set at this stage, but I doubt it.

If it works, then we have solved a general phpBB bug that any language could see on titles posted with unsupported chars.

++

AmirAbbas

Posted: Fri Dec 08, 2006 10:51 am    Post subject: Re: two other problem with UTF-8 persian

dcz wrote:


Then, about your registration problem, we could try using the UTF-8 filtering technique I put together.

Try opening :
Code:

includes/functions_validate.php


Find:
Code:

function validate_username($username)
{
   global $db, $lang, $userdata;


After, add :

Code:
   // www.phpBB-SEO.com BEGIN
   // URL-encode the name: bytes outside letters, digits and -_.~ become %XX.
   $username = rawurlencode(trim($username));
   // Blank out the %XX sequences (preg_replace stands in for ereg_replace,
   // which was removed in PHP 7).
   $username = preg_replace('/%[a-zA-Z0-9]{2}/', ' ', $username);
   // www.phpBB-SEO.com END


++


Hi,

Excuse me, I think this method doesn't work in some situations.

When someone tries to register with Persian characters only, the system blocks it, but if the user combines Persian characters with English characters, or Persian characters with numbers, this solution doesn't work.

see

-http://forum.persia-cms.com/member224.html
-http://forum.persia-cms.com/member222.html
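
A small sketch of what the filter does to a mixed name may help. `seo_filter_username()` wraps the two lines from the thread (with preg_replace() in place of ereg_replace()); `has_non_ascii()` is a hypothetical stricter check and not something proposed in the thread.

```php
<?php
// The filter from earlier in the thread, wrapped in a function for testing.
function seo_filter_username(string $username): string
{
    $encoded = rawurlencode(trim($username));
    return preg_replace('/%[a-zA-Z0-9]{2}/', ' ', $encoded);
}

// Hypothetical stricter check: true if any byte is outside printable ASCII.
function has_non_ascii(string $username): bool
{
    return preg_match('/[^\x20-\x7E]/', $username) === 1;
}

$mixed    = "ali\u{06CC}";               // Latin letters followed by Persian ye
$filtered = seo_filter_username($mixed); // "ali" followed by two spaces
$rejected = has_non_ascii($mixed);       // true
```

The filter only blanks out the encoded bytes, so the Latin part of a mixed name survives, which may be why such names get through. A check that refuses the name as soon as one non-ASCII byte appears does not have that gap.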
