Google, PageRank and Persian UTF-8

 
AmirAbbas
phpBB SEO Team


Joined: 11 May 2006
Posts: 529
Location: IRAN

Posted: Mon Jun 19, 2006 9:40 am    Post subject: Google, PageRank and Persian UTF-8

Yesterday I tried searching the internet for some Persian keywords about phpBB.
I tried these words, a combination of an English word and a Persian word:

Code:
مقالات phpbb



The Persian word is the plural of "maghale", which means "article" in English.
This is the result from Google:



I was shocked when I saw this result.
This is an English forum, so how is it possible that it contains Persian words? Then I remembered that I had posted some Persian keywords inside [code] tags on this forum.
I have a question.

There are a lot of Persian articles about phpBB on the web (my own forum contains many articles that include that keyword), but Google shows this forum in third place in the search results.
How is that possible?

I know that one day this forum will have a very good PageRank, but at this time all the famous Persian sites have a better PageRank than this forum.

I thought that if two sites contain a keyword in their content,
Google checks the PageRank of the two sites and shows the link to the site with the higher PageRank first.

_________________
چهار گوش - طراحی وب - مجله طراحی وب
dcz
Administrateur - Site Admin


Joined: 28 Apr 2006
Posts: 14327

Posted: Mon Jun 19, 2006 12:42 pm    Post subject: Re: Google, PageRank and Persian UTF-8

Well, things are more complicated than just having the highest PageRank to show up first.

In this case, مقالات must have been posted on the page in question, and the search result illustrates how important the domain name is to search engines.
phpBB is part of this site's URL and, what's more, is one of the most used words on it.
So no surprise here, given how much weight "phpBB" carries: domain + URL + the overall density of the term on this site. If you search for the same thing without "phpBB", the results are totally different.
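
The three signals named above (term in the domain, term in the URL, density of the term in the text) can at least be measured, even though how Google weighs them is not known. A minimal Python sketch, assuming a made-up URL and page text; the function name `keyword_signals` is hypothetical, and no weighting or ranking is attempted:

```python
import re
from urllib.parse import urlparse


def keyword_signals(url: str, page_text: str, term: str) -> dict:
    """Report where `term` shows up: host name, URL path, and its share of the words."""
    term = term.lower()
    parts = urlparse(url)
    # In Python 3, \w matches letters of any script, so Persian words count too.
    words = re.findall(r"\w+", page_text.lower())
    density = words.count(term) / len(words) if words else 0.0
    return {
        "in_domain": term in parts.netloc.lower(),
        "in_path": term in parts.path.lower(),
        "density": density,
    }


# Hypothetical URL and text, for illustration only.
signals = keyword_signals(
    "http://www.phpbb-example.com/phpbb/topic.html",
    "phpBB mods and phpBB articles: \u0645\u0642\u0627\u0644\u0627\u062a phpbb",
    "phpBB",
)
print(signals)
```

On this sample the term is found in both the host name and the path, and makes up three of the seven words of the text.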

Another thing to notice is the second result, on drupal.org, which is obviously not in Persian. And the results are different on Google.fr with hl=fr instead of hl=all (this stands for the targeted language in Google results). Here we only get Persian web sites.

I think that Google is doing more than just comparing characters and words. I am pretty sure Google is able to check whether spidered words are part of a language, and does tell the difference between "pojqsgf", a word that does not exist in any language, and a word in real use.
I'll elaborate more on this later, but some examples have shown me that the underscore "_" was already being used as a separator while SEO web sites were supposedly proving the contrary by testing the indexing of two examples such as these: "mkosdqlhgq_gqlmgkjqglkj" and "kjfhgikgqepjeth-groijhrgkhgr".
Their idea was to prove that the underscore was not a separator. The demonstration was based on two search queries run after the page had been spidered: "mkosdqlhgq", part of the first string, and "kjfhgikgqepjeth", part of the second. Only the second one showed up, so they claimed the underscore was not a separator.
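
The experiment described above can be replayed with two toy tokenizers. This is only a sketch of the two competing hypotheses, not of how Google tokenizes: the first treats "_" as part of a word (which is what the regex class `\w` does), the second splits on it. The function names are made up for the example.

```python
import re


def tokens_underscore_joins(text: str) -> list[str]:
    # \w includes "_", so an underscore does NOT end a token; "-" does.
    return re.findall(r"\w+", text.lower())


def tokens_underscore_splits(text: str) -> list[str]:
    # Letters and digits only: both "_" and "-" end a token.
    return re.findall(r"[^\W_]+", text.lower())


page = "mkosdqlhgq_gqlmgkjqglkj kjfhgikgqepjeth-groijhrgkhgr"

for tokenize in (tokens_underscore_joins, tokens_underscore_splits):
    index = set(tokenize(page))
    # Would each of the two test queries be found as an indexed token?
    print(tokenize.__name__, "mkosdqlhgq" in index, "kjfhgikgqepjeth" in index)
```

With the first tokenizer only the hyphenated half is found, which matches what the SEO sites reported; with the second, both halves are found.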

I do think things are different, because at the same time I was using a site map installed in a folder named "site_map/" and was actually able to list all of it with search queries such as "site:www.exmple.com site", "site:www.exmple.com map" and "site:www.exmple.com site map".
In all cases Google highlighted the corresponding words in the URLs, and not the underscore ;)

To me, this proves that Google really goes deep into language analysis and actually uses the rules of each language, its grammar and spelling, while analysing a web site's content.
This explains why the underscore will not be a separator between two words that do not exist: the underscore is not used by any language, so such a string must not be interpreted as one word, nor as two, if we want good search results.

Using the "-", which is an actual separator in many languages, will certainly add something to Google's analysis; the "-" just has more meaning than an underscore since, again, it is part of many languages.
No big surprise then if Google uses it as it should, even though the two "words" did not exist.

And in the opposite case, two valid words separated by an underscore, Google understands it as more than two random character strings separated by an underscore, simply because they have a meaning: they are words.
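
The hypothesis above (a hyphen always separates, an underscore separates only when both sides are real words) can be written down as a small rule. A minimal sketch, assuming a tiny made-up vocabulary; `KNOWN_WORDS` and `index_terms` are hypothetical names, and nothing here is confirmed Google behaviour:

```python
# Hypothetical, tiny vocabulary; a real system would need a dictionary per language.
KNOWN_WORDS = {"site", "map", "seo", "forum"}


def index_terms(token: str) -> list[str]:
    """Split on "-" always; split on "_" only when every part is a known word."""
    terms = []
    for chunk in token.lower().split("-"):
        parts = chunk.split("_")
        if len(parts) > 1 and all(part in KNOWN_WORDS for part in parts):
            terms.extend(parts)   # "site_map" becomes two searchable words
        else:
            terms.append(chunk)   # unknown strings stay in one piece
    return terms


for token in ("site_map", "mkosdqlhgq_gqlmgkjqglkj", "kjfhgikgqepjeth-groijhrgkhgr"):
    print(token, index_terms(token))
```

Under this rule "site_map" is searchable as "site" and as "map", while the random underscore string stays a single token, so both observations in the thread are reproduced by the same rule.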

That's what was totally missed in the underscore discussions: language analysis.
That's where we'll start to interest more than just phpBB users on this site, when we start our own SEO experiments ;)
And that's what I like most about this project: my theoretical physics background as well as my newspaper and magazine experience will be useful to go further than trivial experiments.
I am sure that the phpBB SEO web site will be a powerful tool for all of us willing to experiment, understand, share and learn about Search Engine Optimization, with the bonus that we'll provide working SEO solutions and good backlinks for free :D

To come back to our subject and answer your question: your example uses phpBB, in Latin characters, and a Persian term.
I just think that "phpBB" was treated as the most important word here because it is likely to have been used a lot more across the whole Internet than the Persian one.
It's not that bad a result if you were searching for phpBB-related stuff since, for sure, English still leads on the Internet, even more so when it comes to our dear phpBB script.
Even from France, while searching for French words I often see English web sites showing up first, just because there was a PHP function in the query, things like that. And most of the time, I must admit, the English web sites are better on PHP stuff, so ... it's still a good search result ;)

To go further, I find it very interesting that you are part of this community, because you are using a UTF-8 char-set and I must admit I know little about how well search engines deal with it.

What we find out while analysing your web site's ranking will help us understand useful things for other languages using UTF-8, such as Russian or Chinese.
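
As background for the UTF-8 question, it may help to see what such a mixed query looks like at the byte level. A short Python sketch using the query from the first post; the parameter name `q` is an assumption for illustration, and `hl` is the parameter mentioned earlier in this thread:

```python
from urllib.parse import urlencode

# The query from the first post: مقالات phpbb
query = "\u0645\u0642\u0627\u0644\u0627\u062a phpbb"

# One line per character: code point and UTF-8 bytes.
# Persian letters take two bytes each, Latin letters one.
for ch in query:
    print(repr(ch), f"U+{ord(ch):04X}", ch.encode("utf-8").hex(" "))

# The same bytes, percent-encoded the way they travel in a URL query string.
encoded = urlencode({"q": query, "hl": "fr"})
print(encoded)
```

In a UTF-8 page the Latin word and the Persian word share one encoding; they differ only in how many bytes each character needs.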

We already observed in this case that a Latin char-set word queried together with a UTF-8 Persian one was the one carrying more weight in the results.
But does this always happen?
Is Google actually giving more weight to the Latin char-set, or is all of this just happening because English is the most used language on the net (which would be a good reason to use the local Google servers (.fr in France etc.) to obtain different results)?
It is interesting to notice that bonjour مقال (bonjour = hello in French) gives almost the same results on Google.com (hl=all).
Since "bonjour" is certainly less used than phpBB on the Internet, we get much better balanced results. The sites on the first page mostly use both languages; I don't see wrong matches among them ;)

_________________
Useful links :
SEO Forum || SEO Directory || SEO phpBB || SEO phpBB3 || Search
AmirAbbas

Posted: Wed Jun 21, 2006 1:57 pm    Post subject: Re: Google, PageRank and Persian UTF-8

Quote:
I think that Google is doing more than just comparing characters and words. I am pretty sure Google is able to check whether spidered words are part of a language, and does tell the difference between "pojqsgf", a word that does not exist in any language, and a word in real use.


Quote:
To me, this proves that Google really goes deep into language analysis and actually uses the rules of each language, its grammar and spelling, while analysing a web site's content.



These theories hold for English, French, German and Spanish.
I'm sure that Google cannot understand Persian words and doesn't know anything about Persian grammar. Google is not able to tell which character is the first and which is the last character of a Persian word. Persian words are completely meaningless to Google.
Look at this word:

Code:
آموزش


This is a Persian word. It is pronounced "amoozesh" and means "education".
This word is meaningful.
Now see this picture:



Everything is normal.

Now I want to search for part of that word in Google.
I removed the first character of the word (Persian is written right to left, so the first character is the rightmost one)
and searched for this word:

Code:
موزش
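
For readers who do not work with right-to-left scripts: the "first" character is first in storage order even though it is drawn on the right. A minimal Python sketch of the step just described, assuming the initial letter is stored as the single code point U+0622 (it could also be stored as an alef followed by a combining mark):

```python
word = "\u0622\u0645\u0648\u0632\u0634"   # آموزش, "amoozesh"

first = word[0]    # U+0622, the letter displayed at the far right
rest = word[1:]    # the four remaining letters, i.e. the truncated query above

# Code points in storage (logical) order, first to last.
for ch in word:
    print(f"U+{ord(ch):04X}")

print(rest == "\u0645\u0648\u0632\u0634")
```

Slicing off index 0 therefore removes the rightmost letter on screen, which is exactly how the truncated query was built.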


Now this word is completely meaningless.
I searched for it in Google.
See this page:




As you can see, Google can find this word in sites, but not as a complete word.
Imagine that you search for the word "raw" in Google
and Google shows you the word "crawl" in the search results,
because "crawl" contains "raw" inside itself.
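
The difference between the two behaviours can be sketched in a few lines of Python. This is only an illustration of substring matching versus whole-word matching on two made-up pages, not a description of Google's index; the function names are hypothetical.

```python
import re


def substring_hits(query: str, pages: list[str]) -> list[str]:
    """Pages where the query occurs anywhere, even inside a longer word."""
    return [page for page in pages if query in page]


def whole_word_hits(query: str, pages: list[str]) -> list[str]:
    """Pages where the query equals one complete token."""
    return [page for page in pages if query in re.findall(r"\w+", page)]


pages = [
    "how search engines crawl a forum",
    "\u0622\u0645\u0648\u0632\u0634 phpbb",   # آموزش phpbb
]
truncated = "\u0645\u0648\u0632\u0634"        # موزش

print(substring_hits("raw", pages), whole_word_hits("raw", pages))
print(substring_hits(truncated, pages), whole_word_hits(truncated, pages))
```

Both "raw" and the truncated Persian word are found by the substring search and missed by the whole-word search, which is the behaviour the screenshots were meant to show.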

Google can't understand Persian words: when you search for a word,
Google only checks its database for that sequence of characters.
Given this,
I think word weight does not have any meaning for Persian.
The PageRank calculation is different for Persian sites:
Persian and Arabic sites can obtain PageRank very quickly, yet you can't find any
Persian site with a PageRank higher than 6 (at least I couldn't find one).

And another thing:
Persian and Arabic are very closely related (like English and French).
The Arabic alphabet has 28 characters. Persian has all the Arabic characters plus these 4: پ, چ, ژ, گ

As for the first example (the word مقالات):
that word is originally Arabic, and the 6th link on the page in my first post is not a Persian site. It is an Arabic site with a UTF-8 charset.
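
Because the two alphabets overlap almost completely, a shared word such as مقالات cannot be attributed to one language from its letters alone. A rough Python sketch of that point, using the four letters Persian adds to the Arabic alphabet (پ چ ژ گ); the function name is made up, and the check is a heuristic, not a language detector:

```python
# The four letters Persian adds to the Arabic alphabet: پ چ ژ گ
PERSIAN_ONLY = {"\u067e", "\u0686", "\u0698", "\u06af"}


def has_persian_only_letter(text: str) -> bool:
    """True if the text contains at least one of the four extra Persian letters."""
    return any(ch in PERSIAN_ONLY for ch in text)


maghalat = "\u0645\u0642\u0627\u0644\u0627\u062a"   # مقالات, used in both languages
chahar = "\u0686\u0647\u0627\u0631"                 # چهار, "four" in Persian

print(has_persian_only_letter(maghalat))
print(has_persian_only_letter(chahar))
```

A hit is a strong hint that the text is Persian, but a miss says nothing either way, which is why an Arabic site can appear in the results for this keyword.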

I thought this information could be helpful :roll:
Thank you for the information.

dcz

Posted: Wed Jun 21, 2006 2:26 pm    Post subject: Re: Google, PageRank and Persian UTF-8

Thanks a lot for elaborating on Google's handling of UTF-8.

As a matter of fact, my first thought was that search engines did a lot less well with non-Latin char-sets.

What you pointed out shows us that Google certainly does not deal as well with Persian, and that this is most likely also the case for Arabic, Russian, Chinese, Japanese and every language not using Latin char-sets.

But I think the results may differ among those languages that search engines have paid less attention to so far.
It's most likely due to economic and technical reasons: English is the computer's original language and, believe me, even from France we can still feel it ;)

Then each language must have been taken into account more deeply as the Internet spread: faster where it spread faster, better where computer science research is being carried out.

The thing is, I don't think much could change the supremacy of English here, but I have good hopes that search engines will improve the way they deal with other languages; it's just a matter of time, actually.

The partial word match example you propose is very interesting and should be explored a bit further before we can conclude.

Because, and I'll reuse your English example to make it clear, going from "crawl" to "raw" is not exactly the same as just taking one letter off a word, even though I think you are right.

Then, it's interesting to notice this could mean that PageRank is not calculated in exactly the same way as for English web sites. But this is not so sure, because the algorithm takes into account relative data about the site, and since there are a lot fewer Persian sites, one can more easily appear important compared to the others.
Just like a growth rate when starting from nothing: the first steps mean high growth, even though the absolute difference is close to nothing.

And the same applies to word weight. We have seen that words weren't necessarily recognized as existing words in Persian, but this applies to other languages as well, with misspellings for example.
Google is still able to search for them, so I assume the rules that detect which language a word comes from, and other word details, fall back on shortcuts when the word is not part of the obvious dictionary; the "is it a word?" question can become "is it a symbol or code?" for example.
I am pretty sure Google still does word weight stats, at least basic ones, in Persian too; the analysis must just be less deep.


And then, the cool thing for you is that it seems to be the right time to start new Persian web sites: fewer backlinks for a better PageRank. This won't last too long, but at least until the Persian Internet community grows more.

Web sites started back when it was easier will be older, and age is very important to Google, so the advantage will most likely last if you keep adding new backlinks on a regular basis and Search Engine Optimize your web site with phpBB SEO ;)

++

AmirAbbas

Posted: Thu Jun 22, 2006 9:44 am    Post subject: Re: Google, PageRank and Persian UTF-8

DCZ wrote:
And then, the cool thing for you is that it seems to be the right time to start new Persian web sites: fewer backlinks for a better PageRank. This won't last too long, but at least until the Persian Internet community grows more.

Web sites started back when it was easier will be older, and age is very important to Google, so the advantage will most likely last if you keep adding new backlinks on a regular basis and Search Engine Optimize your web site with phpBB SEO ;)


Excuse me,
this part was not clear to me.
Start a new Persian site?
Fewer backlinks for a better PageRank?!

:roll:

dcz

Posted: Thu Jun 22, 2006 11:46 am    Post subject: Re: Google, PageRank and Persian UTF-8

My point was:

1) there must be far fewer Persian-language web sites worldwide than English-language web sites,

2) a web site's PageRank of course takes into account whether your site is important compared to others, and more specifically to others in the same language, and further still to others in the same language and about the same kind of topics.

So fewer Persian web sites means it's easier to become big compared to the others, at least when talking about sites providing content and having backlinks.
Easier than if the site were in English, because there are most likely already thousands of similar sites in the English-speaking world, so one needs an enormous number of backlinks to be "important" there, but a lot fewer in the Persian world, because what counts is having more than the others. So ... fewer web sites, each reaching a better PR with fewer backlinks, make it easier to rank well in Persian.
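
The "relative share" part of this argument can be illustrated with the textbook PageRank iteration on two made-up link graphs. This is a toy sketch only: the graph shape, the site names and the damping factor of 0.85 are assumptions, and nothing in it shows that Google scopes PageRank by language, which remains the hypothesis stated above.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration; `links` maps a page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                for target in targets:
                    new_rank[target] += damping * rank[page] / len(targets)
            else:
                # A page without outgoing links spreads its rank over every page.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


def toy_web(n_other_sites, backlinks=3):
    """Made-up web: the other sites link in a ring, and `backlinks` of them also link to 'mysite'."""
    others = [f"site{i}" for i in range(n_other_sites)]
    links = {s: [others[(i + 1) % n_other_sites]] for i, s in enumerate(others)}
    for s in others[:backlinks]:
        links[s].append("mysite")
    links["mysite"] = [others[0]]
    return links


small_web = pagerank(toy_web(10))      # few competing sites
large_web = pagerank(toy_web(1000))    # many competing sites
print(small_web["mysite"], large_web["mysite"])
```

With the same three backlinks, "mysite" holds a much larger share of the total rank in the small web than in the large one, which is the sense in which fewer competing sites make a given number of backlinks count for more.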

My point was that it must be easier to end up with PR 6 in Persian than in English. I also noted that this situation will change as more and more Persian web sites come online, so now is the time to start a web site.

It's easier now, and when it becomes harder, your site will have become older and thus will still have a bonus in search engines compared, for example, to new Persian web sites launched later.

So this is a good time ;)

You'll just need to make sure you get more and more web directory links, and any other type of new links to your domains, over time, and you'll have a great chance of showing up in a good position even on sought-after keywords. It'll just be easier than obtaining the same result with an English web site.

So yes, for now it seems you'll need fewer backlinks than a similar English-language web site would to obtain the same PageRank ;)

This is for sure, and it won't last 1000 years, so ;)
