[Toolkit] Optimizing phpBB meta keywords

Postby SeO » Tue Jun 13, 2006 1:11 pm

Optimizing phpBB meta keywords :

We are going to see how to optimize the phpBB meta keywords generated by the phpBB SEO Dynamic Meta Tags mod, which you must install before going any further with this method.
This optimization also concerns your phpBB search tables.

Warning :

    This method is to be used with care, and only by people with enough skills to understand what we propose to do here.
    On big forums, modifying or rebuilding the search tables can take some time. You will need to be well prepared and to choose carefully when to apply the DB changes.
    We advise you to lock your forum during the DB updates.
    On the other hand, taking care of the phpBB search tables is good for performance.
    In all cases, local testing will prevent many problems, and a full backup (files + DB) even more.

Overview :

  1. Introduction

  2. Solutions


The French version of this article is located here.
Last edited by SeO on Mon Apr 02, 2007 2:01 pm, edited 4 times in total.

Postby SeO » Tue Jun 13, 2006 1:39 pm

Introduction


    Every time a new message is posted, phpBB counts all the words in it, compares them with the ones in the search tables, and either creates a new entry or updates the existing one's count (the sum of its occurrences across all posted messages).
    The phpBB SEO meta keywords function picks words from the message according to their respective weight (number of occurrences) in the search tables.

    A filter system limits which words are taken into account. It uses the search_stopwords.txt files, which should be located in each of the phpBB lang folders.
    This simple text file lists the words that will not be searchable with the phpBB Search function, one word per line, without spaces at the end of the lines.
    The words on this list won't show up in the phpBB search tables.
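
    To make the filtering concrete, here is a minimal sketch of how such a stopword list can be read and applied. It is not phpBB's own code: parse_stopwords() and remove_stopwords() are hypothetical helpers written for this illustration, the sample words are made up, and the lowercasing only handles ASCII letters.

```php
<?php
// Hypothetical helper: turns the contents of a search_stopwords.txt file
// (one word per line) into a lookup table keyed by the lowercased word.
function parse_stopwords(string $contents): array
{
    $words = [];
    foreach (preg_split('/\r\n|\r|\n/', $contents) as $line) {
        $word = strtolower(trim($line)); // trim() also drops trailing spaces
        if ($word !== '') {
            $words[$word] = true; // array keys remove duplicates for free
        }
    }
    return $words;
}

// Hypothetical helper: returns the words of $text that are not stopwords.
function remove_stopwords(string $text, array $stopwords): array
{
    $kept = [];
    $tokens = preg_split('/\s+/', strtolower(trim($text)), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($tokens as $word) {
        if (!isset($stopwords[$word])) {
            $kept[] = $word;
        }
    }
    return $kept;
}

$stopwords = parse_stopwords("hello\nyou\nthey \n");
print_r(remove_stopwords('Hello you should optimize meta keywords', $stopwords));
// keeps: should, optimize, meta, keywords
```

    Storing the words as array keys keeps each lookup cheap, even with a long list.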

    A considerable number of words, such as "I", "you", "hello", "they", etc., are totally useless for searching posts and won't do any good in your meta keywords.
    It is very likely that your search tables list a large number of useless words, if not an enormous one.

    By excluding useless words through the search_stopwords.txt listing, you will :

    • Enhance your meta keywords (a lot);
    • Speed up the phpBB Search function, since it won't check useless entries any more;
    • Speed up page loads and posting on your forum.

    It takes a while to achieve, but it is worth it.

    You must back up your whole forum and database before you work on the search tables, and avoid doing it at the times your forum is most visited.
    You should lock your forum while proceeding.
Last edited by SeO on Mon Apr 02, 2007 2:01 pm, edited 1 time in total.

Postby SeO » Tue Jun 13, 2006 5:30 pm

Solutions


    Our first problem here is that the default search_stopwords.txt files, if any, do not list enough words. And the phpBB code that is supposed to filter out words of fewer than three letters does not really work.
    If you do nothing, your search tables will soon list a lot of useless words, if not a majority.

    One could think that some two-letter words are interesting to work on for Search Engine Optimization (SEO), but they are harder to deal with and not that useful in the end.
    To keep them, you would have to list every one- or two-letter word to exclude in your search_stopwords.txt, and it is tricky to catch them all.
    The few interesting short keywords you are thinking about will cost you some time and performance. The fact that they won't show up in the meta keywords will not change much for the page's PageRank, and search engines will still be able to search on such words.
    In addition, on bigger (and well Search Engine Optimized ;)) forums, it can be wise to promote the use of search engines for site searches. The results will be more accurate and you will save some server resources.

    Here is how to really exclude words of fewer than three letters from the phpBB search tables :

    Giefca's fix :

    Code: Select all
    #
    #--[ OPEN ]
    #
    includes/functions_search.php

    #
    #--[ FIND ]
    #
       if ( $mode == 'post' )
       {
          $entry = str_replace('*', ' ', $entry);

          // 'words' that consist of <3 or >20 characters are removed.
          $entry = preg_replace('/[ ]([\S]{1,2}|[\S]{21,})[ ]/',' ', $entry);
       }

    #
    #--[ REPLACE WITH ]
    #
    // 3 letters fix by Giefca
       if ( $mode == 'post' )
       {
          $entry = str_replace('*', ' ', $entry);

          // 'words' that consist of <3 or >20 characters are removed.
          $split = explode(' ', $entry);
          $taille_split = sizeof($split);
          for ($i = 0; $i < $taille_split; $i++)
          {
             $split[$i] = trim($split[$i]);
             if ((strlen($split[$i]) < 3) || (strlen($split[$i]) > 20))
             {
                $split[$i] = '';
             }
          }
          $entry = implode(' ', $split);
       }
    // 3 letters fix by Giefca



    Please note that this patch will not get rid of the short words already listed in the search tables; it only makes sure they are neither added nor updated once it is installed.
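
    To see why splitting on spaces catches more than the stock regular expression, here is a small stand-alone comparison. It is a sketch, not phpBB code: filter_regex() and filter_split() are hypothetical names, and filter_split() only reproduces the length rule of Giefca's fix (it drops rejected words instead of blanking them). One plausible reason the stock filter lets short words through is that each match consumes the space that the next short word would need in front of it.

```php
<?php
// The stock pattern, copied from the FIND block of the patch.
function filter_regex(string $entry): string
{
    return preg_replace('/[ ]([\S]{1,2}|[\S]{21,})[ ]/', ' ', $entry);
}

// Same length rule as Giefca's fix: keep words of 3 to 20 bytes.
function filter_split(string $entry): string
{
    $kept = [];
    foreach (explode(' ', $entry) as $word) {
        $word = trim($word);
        $length = strlen($word); // byte count, fine for single-byte charsets
        if ($length >= 3 && $length <= 20) {
            $kept[] = $word;
        }
    }
    return implode(' ', $kept);
}

$entry = ' to be or not to be ';
var_dump(filter_regex($entry)); // string(11) " be not be "
var_dump(filter_split($entry)); // string(3) "not"
```

    In the regex version, every second short word survives when several of them follow each other, which matches what ends up in the search tables.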

    Now we need to find out which words we should exclude from both the phpBB search tables and the meta keywords.

    Please start phpMyAdmin and run this SQL query :

    Code: Select all
    SELECT ls.word_id, ls.word_text, COUNT(wm.word_id) AS entries FROM `phpbb_search_wordlist` AS ls LEFT JOIN `phpbb_search_wordmatch` AS wm ON ls.word_id=wm.word_id GROUP BY ls.word_id, ls.word_text ORDER BY entries DESC LIMIT 0,100


    It will show you the 100 most used words in your forum.
    You will see how many useless words your search tables include.
    The idea here is to find out which of those 100 words we should exclude.

    Once you have built up your list, you can go further and check whether the next 100 also contain many words to exclude, with this SQL :

    Code: Select all
    SELECT ls.word_id, ls.word_text, COUNT(wm.word_id) AS entries FROM `phpbb_search_wordlist` AS ls LEFT JOIN `phpbb_search_wordmatch` AS wm ON ls.word_id=wm.word_id GROUP BY ls.word_id, ls.word_text ORDER BY entries DESC LIMIT 100,100


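    And so on, if your search tables are really filled with useless words, by changing this part of the query :

    Code: Select all
    LIMIT OFFSET,ROW_COUNT


    The first number is how many rows to skip and the second is how many rows to return, so LIMIT 200,100 lists words 201 to 300.
    There is no need to list too many words at a time, nor to go up to the top 1000 ;)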
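
    If you prefer not to edit the LIMIT clause by hand, the paging can be sketched as below. top_words_query() is a hypothetical helper, not part of phpBB; it assumes the default phpbb_ table prefix and groups on the wordlist columns. It only builds the SQL text, which you can then paste into phpMyAdmin.

```php
<?php
// Hypothetical helper: builds the "most used words" query for a given page.
// Page 1 is the top 100, page 2 the next 100, and so on.
function top_words_query(int $page, int $per_page = 100, string $prefix = 'phpbb_'): string
{
    // MySQL's LIMIT takes an offset (rows to skip) and a row count.
    $offset = max(0, $page - 1) * $per_page;

    return "SELECT ls.word_id, ls.word_text, COUNT(wm.word_id) AS entries"
        . " FROM `{$prefix}search_wordlist` AS ls"
        . " LEFT JOIN `{$prefix}search_wordmatch` AS wm ON ls.word_id = wm.word_id"
        . " GROUP BY ls.word_id, ls.word_text"
        . " ORDER BY entries DESC"
        . " LIMIT {$offset}, {$per_page}";
}

echo top_words_query(2), "\n"; // the query ends with "LIMIT 100, 100"
```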

    Then you just have to add those words to your search_stopwords.txt, one per line, without spaces at the end of the line.

    Note that there is one search_stopwords.txt file per language, so you will have to edit all of them.
    Depending on the number of languages used in your forum, you can either put the words of every language in a single search_stopwords.txt that you copy into every language folder, or build one search_stopwords.txt per language.
    In the latter case, you should still put all of your excluded words in the search_stopwords.txt of your default language first, and only split them up per language after you have worked on the search tables. This way, you won't have to do the same filtering for every language.

    Open your search_stopwords.txt (\language\lang_english) or create a new one if there is none.
    With Giefca's patch, do not include words (and numbers, since everything is "text" here) of one or two characters in the listing.
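
    Merging the words you picked into an existing list can be sketched like this. merge_stopwords() is a hypothetical helper written for this how-to, not something shipped with phpBB; it assumes a single-byte charset, since strlen() counts bytes. It trims each word, drops duplicates and, by default, leaves out words shorter than three characters.

```php
<?php
// Hypothetical helper: merges new words into the contents of an existing
// search_stopwords.txt and returns the new contents, one word per line.
function merge_stopwords(string $existing, array $new_words, int $min_length = 3): string
{
    $words = [];
    $candidates = array_merge(preg_split('/\r\n|\r|\n/', $existing), $new_words);
    foreach ($candidates as $word) {
        $word = strtolower(trim($word)); // no spaces at the end of the line
        if (strlen($word) >= $min_length) {
            $words[$word] = true; // keys remove duplicates
        }
    }
    // Numeric words come back from array_keys() as integers, so cast them.
    $list = array_map('strval', array_keys($words));
    sort($list, SORT_STRING);
    return implode("\n", $list) . "\n";
}

echo merge_stopwords("hello\nyou \n", ['They', 'I', 'hello', 'to']);
// hello
// they
// you
```

    Pass 1 as the third argument when you need to keep the one- and two-letter words in the file, for instance for a cleaning run on tables that still contain them.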

    It is worth building a really good word list from the start, so that you won't have to clean your search tables more often than required.

    In the end, we need to clean the search tables. There are two solutions for this : Rebuild Search, which will totally rebuild your search tables, or the simple script proposed in this article at phpBB.com.
    The interesting part of that article is the 4th one : Delete search_stopwords from your search tables.

    Rebuild Search is a bit heavier to use and will certainly take more time to run, but it can be very useful in a phpBB forum's life, because search table errors do occur and recovering search tables from backups can be tricky, especially with big forums.
    In addition, if you implemented Giefca's patch, words of fewer than three letters will be filtered out directly, since the method replays the posting process.

    The phpBB.com script is lighter and easier to use. It will only delete the newly excluded words from the search tables.
    With Giefca's patch, you will still need to add the short words to the search_stopwords.txt the first time you run the script, and take them out once done (as they won't show up any more).

    Once your list is complete, upload the new search_stopwords.txt and wait for the right time to run either Rebuild Search or the phpBB.com script.
    Try the top 100 SQL query again and enjoy the difference.

    There might be some useless words left, because they were not in the previous top 100 (or 200, etc.).
    No big deal, update your search_stopwords.txt and update your search tables again.
    You can also check your meta keywords to see if some less popular words should be excluded.
    After a few cycles, your meta keywords should start to become very interesting ;)

    With big forums, as there is likely a lot of work to do, try to minimize the number of cycles you have to go through by starting from a good search_stopwords.txt.
    Wait until you have found many more words to exclude before you go for another cycle.
    And again, choose the right time to work on your search tables.

    Rebuild Search has many options; here are the settings advised here :

    • Time limit : The server's timeout, usually 30s; use 25 to be safe.
    • Starting post_id : Should be 0 for the first use, as well as when you have updated the search_stopwords.txt and go for a new cycle.
    • Posts per cycle : 50, no need to put too much here.
    • Disable board : Do it, this will prevent users from posting during the process, which should not be a problem with this mod, but if it's a 100 ...

    You should delete your search tables when rebuilding them completely, after you have backed up everything, obviously.

    Launch it; once the first 50 posts have been processed, you will see the graphs and details. You just have to wait until it ends.
    Screenshots can be found here.
Last edited by SeO on Mon Mar 19, 2007 10:16 pm, edited 1 time in total.

Postby AmirAbbas » Thu Jun 15, 2006 4:17 pm

This fix doesn't work with the UTF-8 charset,
because strlen() counts bytes and a UTF-8 character can take two bytes or more where an iso-8859-1 character takes one.

Code: Select all
// 3 letters fix by Giefca
   if ( $mode == 'post' )
   {
           $entry = str_replace('*', ' ', $entry);

           // 'words' that consist of <3 or >20 characters are removed.
           $split = explode(' ', $entry);
           $taille_split = sizeof($split);       
           for ($i = 0; $i < $taille_split; $i++)
           {
              $split[$i] = trim($split[$i]);
              if ((strlen($split[$i]) < 3) || (strlen($split[$i]) > 20))
              {
                 $split[$i] = '';
              }
           }
           $entry = implode(' ', $split);
     }
// 3 letters fix by Giefca


I changed those two numbers (3 and 20) :

Code: Select all
// 3 letters fix by Giefca
   if ( $mode == 'post' )
   {
           $entry = str_replace('*', ' ', $entry);

           // 'words' that consist of <3 or >20 characters are removed.
           $split = explode(' ', $entry);
           $taille_split = sizeof($split);       
           for ($i = 0; $i < $taille_split; $i++)
           {
              $split[$i] = trim($split[$i]);
              if ((strlen($split[$i]) < 6) || (strlen($split[$i]) > 40))
              {
                 $split[$i] = '';
              }
           }
           $entry = implode(' ', $split);
     }
// 3 letters fix by Giefca


I think it can work now,
but I'm not sure :?

Postby dcz » Thu Jun 15, 2006 4:29 pm

Actually, have you checked what the smallest word in your search tables is ?

I am not sure either about how to deal with UTF-8 characters here; I think they would need to be converted to iso-8859-1 for this code to work.

You can still list the UTF-8 words to exclude in your search_stopwords.txt and proceed without Giefca's fix for now.

You'll still enhance your meta keywords a lot and end up with a smaller DB.

You just need a longer search_stopwords.txt, that's all.
Useful links :
SEO Forum || SEO Directory || SEO phpBB || Search
____________________

Liens Utiles :
Forum référencement || Annuaire référencement || Référencement phpBB || Recherche

Postby AmirAbbas » Thu Jun 15, 2006 4:51 pm

dcz wrote: Actually, have you checked what the smallest word in your search tables is ?

I am not sure either about how to deal with UTF-8 characters here; I think they would need to be converted to iso-8859-1 for this code to work.

You can still list the UTF-8 words to exclude in your search_stopwords.txt and proceed without Giefca's fix for now.

You'll still enhance your meta keywords a lot and end up with a smaller DB.

You just need a longer search_stopwords.txt, that's all.


You mean that I must make a comprehensive search_stopwords.txt?
I think making that file is better than Giefca's fix,
which can cause problems with UTF-8 encoding.

Thanks

Postby dcz » Thu Jun 15, 2006 4:54 pm

Yes, if the phpBB search function is able to deal with UTF-8, then if you list all the words you don't want to see, including the one- and two-letter ones, they will be neither searchable nor in your meta keywords ;)

Postby Peter77 » Fri Jun 16, 2006 12:00 am

I've read this a couple of times already. I want to understand what I'm doing before I try it lol. I know meta keywords are not as 'important' as they once were, but IMHO they can be very useful to have. I've always had meta keywords on my portal, but not on my phpBB. Then about a month ago I decided to add meta keywords to my forum. I've been noticing a really great improvement in search engine results from people looking for other sites that are similar to mine. In some cases my site is on the first page for certain keywords. And this is just the meta keywords doing their job (I suspect), as I'm still waiting for my indexed SEO friendly pages to show up in search engine results... Imagine when that starts kicking in! :)

Postby dcz » Fri Jun 16, 2006 12:10 am

Well, meta keywords were once very useful to Search Engine Optimize on selected keywords.

Now, the keyword list won't change much in the search results for those keywords, but the meta tags themselves, by being there and personalised for every page, will help a lot.

Many pages sharing the same meta description will for sure not be indexed and ranked as well as if they each had a different one, just like titles.

Now, this specific how-to will just optimize a bit further. It's really worth it because it also cleans unnecessary entries out of your phpBB search tables.

I'll soon release an optimized titles mod for phpBB. No big deal waiting, bots love it when pages are updated so ...

++

Postby Peter77 » Fri Jun 16, 2006 3:23 am

Ok... everything seemed to go well. Are we supposed to keep search_stopwords.txt in our language folder from now on?

Postby dcz » Fri Jun 16, 2006 10:10 am

Exactly :D

One per language folder; it can be the same file copied over, with words from all the languages used, if it is not too big (I do it here for now, the file is 3.5 KB, not too much).

If you implemented the Giefca fix, there is no need to keep the words under three letters long in it after cleaning the search tables (nor before using Rebuild Search).

Feel free to start a thread to share our stop word lists, it could be in the phpBB forum ;)

I think in most cases we should end up with pretty much the same thing, or at least a common base.

Anyway, this will be elaborated with time ;)

++

Postby dcz » Tue Jun 27, 2006 8:23 am

For UTF-8, I have found this :

mb_strlen :
Description
int mb_strlen ( string str [, string encoding] )

mb_strlen() returns number of characters in string str having character encoding encoding. A multi-byte character is counted as 1.

encoding is character encoding for str. If encoding is omitted, internal character encoding is used.


It looks like the Giefca fix was a bit heavy; with this function it seems we can just make sure mb_strlen($string, 'charset') is at least 3 and at most 20.

To be tested further ;)
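
As a starting point for that testing, here is a minimal sketch of the length rule applied to characters instead of bytes. char_length() and filter_utf8() are hypothetical names, not phpBB functions, and the sketch assumes the text really is UTF-8. It uses mb_strlen() when the mbstring extension is loaded and falls back to counting with a Unicode-aware regular expression otherwise.

```php
<?php
// Hypothetical helper: counts characters, not bytes, in a UTF-8 string.
function char_length(string $word): int
{
    if (function_exists('mb_strlen')) {
        return mb_strlen($word, 'UTF-8');
    }
    // Fallback without mbstring: each "." matches one UTF-8 character.
    return (int) preg_match_all('/./su', $word);
}

// Hypothetical helper: keeps the words of $min to $max characters.
function filter_utf8(string $entry, int $min = 3, int $max = 20): string
{
    $kept = [];
    foreach (explode(' ', $entry) as $word) {
        $word = trim($word); // only strips ASCII whitespace, safe on UTF-8
        $length = char_length($word);
        if ($length >= $min && $length <= $max) {
            $kept[] = $word;
        }
    }
    return implode(' ', $kept);
}

// A two-letter and a five-letter Persian word, written as escapes.
$short = "\u{0627}\u{0632}";
$long  = "\u{0627}\u{0646}\u{062F}\u{0631}\u{0632}";

var_dump(strlen($short));      // int(4): passes a byte-based "< 3" check
var_dump(char_length($short)); // int(2)
var_dump(filter_utf8($short . ' ' . $long) === $long); // bool(true)
```

Counting characters keeps the 3 and 20 limits meaningful whatever the script, instead of doubling them for two-byte alphabets.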

Postby dcz » Wed Jul 12, 2006 12:17 pm

Or these wonderful functions can help even more : utf8_encode & utf8_decode

We could try this code then :

Code: Select all
// 3 letters fix by Giefca for utf-8
   if ( $mode == 'post' )
   {
           $entry = utf8_decode($entry);
           $entry = str_replace('*', ' ', $entry);

           // 'words' that consist of <3 or >20 characters are removed.
           $split = explode(' ', $entry);
           $taille_split = sizeof($split);       
           for ($i = 0; $i < $taille_split; $i++)
           {
              $split[$i] = trim($split[$i]);
              if ((strlen($split[$i]) < 3) || (strlen($split[$i]) > 20))
              {
                 $split[$i] = '';
              }
           }
           $entry = implode(' ', $split);
           $entry = utf8_encode($entry);
     }
// 3 letters fix by Giefca for utf-8


But this needs to be tested on a UTF-8 installation to make sure no characters are lost in the process, which I cannot do as I do not use UTF-8.

++

Postby AmirAbbas » Wed Jul 12, 2006 3:47 pm

Ohh

Thanks,
I will test it :P

Postby AmirAbbas » Thu Jul 13, 2006 12:08 pm

I tested it.
It doesn't work for Persian UTF-8 (maybe it works for Chinese or Russian).
First I installed this fix,
then I sent a post that contains this word :

Code: Select all
اندرز


As you can see, this word has 5 characters, but the system couldn't find it in search.
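
A likely explanation, offered as a sketch rather than a confirmed diagnosis: utf8_decode() converts to ISO-8859-1 and turns every character that has no ISO-8859-1 equivalent into "?", so a Persian word cannot survive the decode/encode round trip. fits_latin1() below is a hypothetical helper that checks beforehand whether a UTF-8 string would come through intact.

```php
<?php
// Hypothetical helper: true when every character of a UTF-8 string exists
// in ISO-8859-1, i.e. has a code point between U+0000 and U+00FF.
// Invalid UTF-8 makes preg_match() fail, which is reported as false.
function fits_latin1(string $utf8): bool
{
    return preg_match('/^[\x{00}-\x{FF}]*$/u', $utf8) === 1;
}

$french  = "caf\u{00E9}";                              // "café"
$persian = "\u{0627}\u{0646}\u{062F}\u{0631}\u{0632}"; // the word posted above

var_dump(fits_latin1($french));  // bool(true): safe to round-trip
var_dump(fits_latin1($persian)); // bool(false): would be replaced by "?"
```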
