AmirAbbas phpBB SEO Team


Joined: 11 May 2006 Posts: 529 Location: IRAN
|
Posted: Mon Jul 31, 2006 11:12 am Post subject: two other problem with UTF-8 persian |
|
|
Hello again.
I know this forum is not really the place for this question and I should ask it on the phpbb.com site, but I think I can find the answer here.
There are two other problems with UTF-8 Persian.
First I must say something about the Persian language.
The Persian alphabet is very similar to the Arabic one.
Arabic has 28 characters, and Persian has all of those 28 characters plus 4 others.
There is one character that exists in both languages.
Its pronunciation is the same in both languages, the usage is the same, everything is the same.
But unfortunately there are two codes for this character in Unicode.
The character is this:
Persian one
ی
Arabic one
ي
This character is called "ye" and it sounds like "ee".
As you can see, there is a small difference between the Arabic and the Persian character:
the Arabic one has two dots under it.
This difference causes the biggest SEO problem for all Persian sites.
As I said, both of these characters are supported in UTF-8.
In Persian-speaking countries (Iran, Tajikistan, part of Armenia, a large part of Pakistan, Afghanistan) there are a lot of Arabic Windows installations.
For example, one of my users has Arabic Windows. He comes to my site and starts a topic.
All the "ye" characters are typed in the Arabic shape, and because UTF-8 supports the Arabic "ye", that user can send his post without a problem.
After that, someone like me with Persian Windows comes and makes another post that contains the Persian "ye", and he can send his post without a problem too.
Now imagine:
the search results are not accurate, because for each word that contains the character "ye" there are two records in the database (one for Arabic and one for Persian).
For example, say you want to search for this word:
آموزشی
This word is pronounced "amuzeshi" and means "educational".
If you type the word with the Arabic "ye", you see a list of topics that contain this word,
but when you type it with the Persian "ye", you see a different result.
Unfortunately Google, Yahoo and MSN can't recognize which site is Persian and which one is Arabic.
They are not aware of this problem.
You can check this in the keyword tools on the Yahoo and Google sites:
Google AdWords keyword selector
Yahoo keyword selector tool
If you type a word with a different shape of "ye", you will see a different result.
For example, test these two words in the Yahoo keyword selector tool:
Persian word
آموزشی
Arabic word
آموزشي
Note that both of these words are Persian, but in one of them we used the Arabic "ye".
Now my request:
can we replace every Arabic "ye" with the Persian "ye" in
1- the topic title
2- the text body
3- the search box
??
For example, when a person makes a post that contains the Arabic "ye", the system replaces every Arabic "ye" with the Persian one and only then saves the post in the database.
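A minimal sketch of the replacement asked for above, assuming the text reaches PHP as UTF-8. The function name normalize_ye is hypothetical and not part of phpBB; the same call could in principle be applied to the title, the body and the search terms before they are saved or searched.

```php
<?php
// Hypothetical helper, not part of phpBB.
// Maps Arabic yeh (U+064A) to Persian yeh (U+06CC) in a UTF-8 string.
function normalize_ye(string $text): string
{
    $arabic_ye  = "\xD9\x8A"; // UTF-8 bytes of U+064A
    $persian_ye = "\xDB\x8C"; // UTF-8 bytes of U+06CC

    // A plain byte-wise replace is enough here: a complete UTF-8 sequence
    // can never match in the middle of another character.
    return str_replace($arabic_ye, $persian_ye, $text);
}
```

Because the search and the stored text would both pass through the same function, a word typed with either shape of "ye" would end up as the same byte sequence.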
And the second problem:
UTF-8 contains a lot of other characters.
Some saboteur users come to the site and register with special characters in their names.
After that they start sending bad posts.
The system cannot find their username. How can I change the username field in the registration form so that only English characters are accepted?
_________________ چهار گوش - طراحی وب - مجله طراحی وب |
dcz Administrateur - Site Admin

Joined: 28 Apr 2006 Posts: 15242
|
Posted: Mon Jul 31, 2006 11:56 am Post subject: Re: two other problem with UTF-8 persian |
|
|
Well, I am not sure this is that big of a deal.
For sure results will be different, just as with or without accents in French (é or e for the same word; it only changes the pronunciation, and the word is still readable without it), since many users won't use accents while others will.
What I think in the end is that people who post one way will search the same way, and we want to be findable under both spellings.
People writing in your forums with the Arabic characters are more than likely to search using the same spelling as well, and the same goes for the Persian case.
So I am not sure it would be that wise to filter just one case, as users will obviously continue searching the way they always did.
All you could do about this would be content-wise: posting good articles specifically about the keyword and spelling you want to SEO, one article, or at least a part of one, for each spelling, and posting many links to them.
And it's quite a good geofilter too, let me explain.
If one of your keywords/key topics uses this 'ye', you can write two articles about it, one using the Arabic 'ye' and the other one the Persian.
You'll be able to personalize the two articles a bit, because you know the one using the Arabic 'ye' will most likely be found by people using the Arabic char-set.
Now if the subject really can't be personalized on such criteria, then again, you should make sure to use both spellings as much as possible, to be findable by your two types of users.
Because if you filter those, users with the Arabic char-set won't find you as easily on those terms, and I bet there is more difference in search engine results in such cases than in the French 'é' / 'e' accent case.
Then, about your registration problem, we could try using the UTF-8 filtering technique I put together.
Try opening :
| Code: |
includes/functions_validate.php |
Find:
| Code: |
function validate_username($username)
{
global $db, $lang, $userdata; |
After add :
++ |
_________________ Useful links :
SEO Forum || SEO Directory || SEO phpBB || SEO phpBB3 || Search
AmirAbbas phpBB SEO Team


Joined: 11 May 2006 Posts: 529 Location: IRAN
|
Posted: Mon Jul 31, 2006 1:35 pm Post subject: Re: two other problem with UTF-8 persian |
|
|
| Quote: | just like when using or not using accents in French (é or e for the same word, just changing pronunciation, but it's still readable without), as many users won't use accents while others will. |
Do you mean there is a similar problem in French?
I thought only Persian had a problem like this.
| Quote: | So I am not sure it would be that wise to filter just one case, as users will obviously continue searching the way they always did.
All you could do about this would be content-wise: posting good articles specifically about the keyword and spelling you want to SEO, one article, or at least a part of one, for each spelling, and posting many links to them. |
You are right.
This method is better.
Thanks.
And the trick for registration is very good.
I tested it. It can be very useful.
-http://community.iransalamat.com
This is a medical forum.
All the moderators are doctors. Three months ago the site admin had an argument with one of the old members, and that user started swearing. The moderators banned him, but he came back and registered with an unusual username that contained some special characters. We tried a lot but we couldn't delete that user. He caused a lot of problems for that forum, but with this trick we will not have problems like that.
After applying this trick, if a user uses Persian characters and submits the form, this message appears:
| Quote: | | Sorry, but this username has already been taken. |
I want to know whether it is possible to show another message,
for example:
| Quote: | | Sorry, you cannot use Persian characters in a username. Please use English characters and try again. |
Thank you very much.
 |
dcz Administrateur - Site Admin

Joined: 28 Apr 2006 Posts: 15242
|
Posted: Mon Jul 31, 2006 2:33 pm Post subject: Re: two other problem with UTF-8 persian |
|
|
Good.
I think the easiest way to deal with this without too many changes would be to edit the $lang['Username_taken'] key in lang_main.php. You can use HTML in those strings, so a nice <br /> followed by a note saying that, in addition, nicknames can only contain Latin characters for security reasons.
I am pretty sure better filtering is possible here, like making sure the nickname only contains Latin characters and/or one of the 32 allowed Persian characters, if you'd still like to allow Persian for nicknames. Unfortunately, I don't know much about char-sets, even though I have been learning quite a lot thanks to you. Later maybe ...
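A rough sketch of what such a whitelist filter might look like, assuming usernames arrive as UTF-8 and PCRE is built with UTF-8 support. The function name username_chars_allowed is hypothetical, and this is not the filtering code used in the thread. The Persian set is the 32 basic letters written as Unicode code points, so the Arabic-only ye (U+064A) and kaf (U+0643) are deliberately left out.

```php
<?php
// Hypothetical helper, not phpBB code.
// Returns true only when EVERY character of $username is a Latin letter,
// a digit, an underscore or, optionally, one of the 32 Persian letters.
function username_chars_allowed(string $username, bool $allow_persian = false): bool
{
    // The 32 Persian letters as code points (alef ... Persian ye U+06CC).
    $persian = '\x{0627}\x{0628}\x{062A}-\x{063A}\x{0641}\x{0642}\x{0644}-\x{0648}'
             . '\x{067E}\x{0686}\x{0698}\x{06A9}\x{06AF}\x{06CC}';

    $class = 'A-Za-z0-9_' . ($allow_persian ? $persian : '');

    // \A and \z anchor the match to the whole string; the u modifier makes
    // PCRE read the pattern and subject as UTF-8. preg_match returns false
    // on invalid UTF-8, which the strict comparison also rejects.
    return preg_match('/\A[' . $class . ']+\z/u', $username) === 1;
}
```

With the second argument left at false this behaves as a Latin-only check; passing true would allow mixed Latin and Persian nicknames while still rejecting anything outside the list.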
++ |
AmirAbbas phpBB SEO Team


Joined: 11 May 2006 Posts: 529 Location: IRAN
|
Posted: Mon Jul 31, 2006 3:50 pm Post subject: Re: two other problem with UTF-8 persian |
|
|
| Quote: | | I think the easiest way to deal with this without too many changes would be to edit the $lang['Username_taken'] key in lang_main.php. You can use HTML in those strings, so a nice <br /> followed by a note saying that, in addition, nicknames can only contain Latin characters for security reasons. |
Yes, it's the best and easiest way. I don't want users to be able to register with Persian characters at all.
This trick is sufficient for me.
Excuse me, I forgot something about the first problem.
I said that both the Persian and the Arabic 'ye' are supported in UTF-8,
but the Persian 'ye' is not supported in the windows-1256 encoding.
I said before that we can use either of these encodings for our sites, but I preferred UTF-8 because it is the standard charset for Persian.
But some people like to use windows-1256 (the Arabic charset), like these two forums:
-www.siscenter.com/forum/
-www.kowsarr.com/forum/
They claim that with the windows-1256 charset the database is smaller than with UTF-8, because each Unicode character takes 2 bytes in the database, so a UTF-8 database is larger.
But as I said, the Persian 'ye' is not supported in the windows-1256 charset.
If a person uses the Persian 'ye' in a title, the system replaces that character with a code.
See this picture.
Every Persian 'ye' is replaced with that code.
This problem only exists in titles; you can use the Persian 'ye' in the body text without any difficulty.
I think it is possible to replace the Persian 'ye' with the Arabic 'ye' only in titles.
I asked this question in another forum. Someone told me:
"use 'ereg_replace'." But I didn't know anything about PHP, just like now.
dcz Administrateur - Site Admin

Joined: 28 Apr 2006 Posts: 15242
|
Posted: Mon Jul 31, 2006 11:53 pm Post subject: Re: two other problem with UTF-8 persian |
|
|
Yes and here it is :
ی is &#1740;
ي is &#1610;
It's easy to filter this particular char. ereg_replace isn't even needed, as str_replace is faster for such a basic replacement, but our problem here is to filter all topic titles.
As you have noticed, some mods, such as "Today at / Yesterday at", do not even filter titles with the censor system, so I think the proper way to deal with this would be to filter the title when posting or editing. A fresh install would be all right; others would need an SQL script to filter existing topic titles once and for all.
Quite a bit of work, even though it's not very complex.
The posting part could be dealt with like this, I guess :
Open :
| Code: | | includes/functions_post.php |
Find :
| Code: |
if (!empty($subject))
{
$subject = htmlspecialchars(trim($subject));
} |
Replace with :
For the SQL script, it's a bit longer: you'd have to select, let's say, 50 topic titles at a time, perform the str_replace, update them, and so on until it's done.
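The batch idea could be sketched roughly like this. Everything here is an assumption for illustration: filter_titles_in_batches, $fetch_batch and $save_title are hypothetical names standing in for the real SELECT ... LIMIT and UPDATE queries, and the demo assumes old titles were stored after htmlspecialchars, so the Persian 'ye' sits in the table as "&amp;#1740;". An in-memory array plays the role of the topics table so the loop can be tried without a database.

```php
<?php
// Hypothetical sketch, not phpBB code. $fetch_batch and $save_title stand in
// for the real database calls (a SELECT with LIMIT and an UPDATE).
function filter_titles_in_batches(
    callable $fetch_batch,
    callable $save_title,
    string $search,
    string $replace,
    int $batch_size = 50
): int {
    $changed = 0;
    $offset  = 0;

    // Keep asking for the next batch until nothing is left.
    while (($rows = $fetch_batch($offset, $batch_size)) !== []) {
        foreach ($rows as $id => $title) {
            $fixed = str_replace($search, $replace, $title);
            if ($fixed !== $title) {
                $save_title($id, $fixed); // only write rows that changed
                $changed++;
            }
        }
        $offset += count($rows);
    }

    return $changed;
}

// In-memory stand-in for the topics table: topic_id => topic_title.
$topics = [
    1 => 'plain title',
    2 => 'x&amp;#1740;y',
    3 => '&amp;#1740;&amp;#1740;',
];
$fetch = function (int $offset, int $limit) use (&$topics): array {
    return array_slice($topics, $offset, $limit, true);
};
$save = function (int $id, string $title) use (&$topics): void {
    $topics[$id] = $title;
};

$changed = filter_titles_in_batches($fetch, $save, '&amp;#1740;', '&#1610;', 2);
```

Working in small batches keeps each query short, and skipping unchanged rows avoids rewriting titles that never contained the character.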
++ |
Last edited by dcz on Tue Aug 01, 2006 10:42 am; edited 1 time in total |
AmirAbbas phpBB SEO Team


Joined: 11 May 2006 Posts: 529 Location: IRAN
|
Posted: Tue Aug 01, 2006 10:22 am Post subject: Re: two other problem with UTF-8 persian |
|
|
Unfortunately it doesn't work.
After I applied this change I tried to send a post.
See this picture.
The titles are missing,
and there is no difference between English and Persian in this case:
whether you use only English or only Persian, the title goes missing either way.
dcz Administrateur - Site Admin

Joined: 28 Apr 2006 Posts: 15242
|
|
AmirAbbas phpBB SEO Team


Joined: 11 May 2006 Posts: 529 Location: IRAN
|
Posted: Tue Aug 01, 2006 11:50 am Post subject: Re: two other problem with UTF-8 persian |
|
|
| dcz wrote: | Sorry there was a typo in the replace, $url instead of $subject erf, try :
++ |
It still doesn't work.
I think the title must first be converted to ASCII codes, and only then can the replacement swap those two characters.
dcz Administrateur - Site Admin

Joined: 28 Apr 2006 Posts: 15242
|
|
AmirAbbas phpBB SEO Team


Joined: 11 May 2006 Posts: 529 Location: IRAN
|
Posted: Wed Aug 02, 2006 6:59 am Post subject: Re: two other problem with UTF-8 persian |
|
|
Same problem again:
the titles are missing.
dcz Administrateur - Site Admin

Joined: 28 Apr 2006 Posts: 15242
|
Posted: Wed Aug 02, 2006 8:38 pm Post subject: Re: two other problem with UTF-8 persian |
|
|
All right, I think I understand our problem.
The first suggested code works, but you need to enter the HTML numeric codes instead of the characters copied from here.
e.g. :
Open :
| Code: | | includes/functions_post.php |
Find :
| Code: |
if (!empty($subject))
{
$subject = htmlspecialchars(trim($subject));
} |
replace with :
| Code: | if (!empty($subject))
{
$subject = htmlspecialchars(trim($subject));
$subject = str_replace ('& #1740;', '& #1610;', $subject);
} |
In this last block, make sure you paste the code with the correct HTML numeric code (change "& #" to "&#" once it is pasted into your PHP file).
++ |
AmirAbbas phpBB SEO Team


Joined: 11 May 2006 Posts: 529 Location: IRAN
|
Posted: Thu Aug 03, 2006 4:36 am Post subject: Re: two other problem with UTF-8 persian |
|
|
I did that before.
I quoted your message and clicked on the preview button;
with this method you can see the source correctly.
It didn't work.
dcz Administrateur - Site Admin

Joined: 28 Apr 2006 Posts: 15242
|
Posted: Thu Aug 03, 2006 9:14 am Post subject: Re: two other problem with UTF-8 persian |
|
|
Mmmmh.
I have found a simple solution that works in UTF-8. The bug came from htmlspecialchars breaking the HTML numeric codes: as it replaces every & with &amp;, it breaks every &#. And actually this would break any char that is not part of the defined char-set and is thus coded as an HTML numeric code.
So here is how I worked around it :
Open :
| Code: | | includes/functions_post.php |
Find :
| Code: | if (!empty($subject))
{
$subject = htmlspecialchars(trim($subject));
} |
replace with :
| Code: | if (!empty($subject))
{
$subject = htmlspecialchars(trim($subject));
// undo the &amp; that htmlspecialchars put in front of numeric codes
$subject = str_replace ('&amp;#', '&#', $subject);
// Persian ye -> Arabic ye
$subject = str_replace ('& #1740;', '& #1610;', $subject);
} |
Or you could just replace the last one with :
| Code: | if (!empty($subject))
{
$subject = htmlspecialchars(trim($subject));
// after htmlspecialchars the Persian ye code reads &amp;#1740;
$subject = str_replace ('&amp;#1740;', '&#1610;', $subject);
} |
But the first version makes sure that any unsupported char in a title is not broken.
Wherever the code shows "& #", make sure you change it to "&#" once it is pasted into your PHP file, so that the HTML numeric code is correct.
Let's just hope phpBB is not using the Windows char-set at this stage, but I doubt it.
If it works, then we have solved a general phpBB bug, as any language could see it on titles posted with unsupported chars.
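To see the effect in isolation, here is a small sketch, assuming current PHP defaults for htmlspecialchars. The function name escape_title is hypothetical, and the regular expression is a swapped-in variant of the str_replace above: it restores only complete decimal codes such as &#1740; and leaves any other "&amp;#" alone.

```php
<?php
// Hypothetical helper, not phpBB code.
// Escapes a title for HTML but keeps well-formed decimal numeric codes.
function escape_title(string $subject): string
{
    // htmlspecialchars turns every & into &amp;, so "&#1740;" arrives
    // here as "&amp;#1740;" and would be shown as literal text.
    $subject = htmlspecialchars(trim($subject));

    // Undo that only for complete decimal codes: &amp;#<digits>; -> &#<digits>;
    return preg_replace('/&amp;#(\d+);/', '&#$1;', $subject);
}
```

Matching the full code rather than every "&amp;#" is a design choice: a stray "&#" typed by a user stays escaped, while characters the browser sent as numeric codes survive.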
++ |
AmirAbbas phpBB SEO Team


Joined: 11 May 2006 Posts: 529 Location: IRAN
|
Posted: Fri Dec 08, 2006 10:51 am Post subject: Re: two other problem with UTF-8 persian |
|
|
| dcz wrote: |
Then, talking about your registering problem, we could try using the UTF-8 filtering technique I put together.
Try open :
| Code: |
includes/functions_validate.php |
Find:
| Code: |
function validate_username($username)
{
global $db, $lang, $userdata; |
After add :
++ |
Hi,
excuse me,
I think this method doesn't work in some situations.
When someone tries to register with Persian characters the system blocks it, but if the user combines Persian characters with English characters or with numbers, this solution doesn't work.
See:
-http://forum.persia-cms.com/member224.html
-http://forum.persia-cms.com/member222.html |
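The filter code itself is not shown in this thread, so the cause of the mixed-name case can't be confirmed from here. As a hedged sketch only: a check that looks for any single byte outside printable ASCII anywhere in the name would also catch mixed names. The function name has_non_ascii is hypothetical and this is not the code suggested earlier.

```php
<?php
// Hypothetical helper, not phpBB code.
// True when $username contains at least one byte outside printable ASCII,
// no matter where it sits or what surrounds it.
function has_non_ascii(string $username): bool
{
    // No u modifier on purpose: the test is byte-wise, and every byte of a
    // multi-byte UTF-8 character is 0x80 or above.
    return preg_match('/[^\x20-\x7E]/', $username) === 1;
}
```

A registration check could reject the name whenever this returns true, which would cover Persian mixed with English letters or with numbers.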