About robots.txt

GoogleBot, MSNBot, Yahoo! Slurp... Everything about indexing bots, IP lists, user agents, crawling and robots.txt.



Postby Peter77 » Tue Jun 06, 2006 5:15 pm

Hi, I've been trying for a while to get rid of my site's old URLs in Google. I submitted links for specific subdirectories to be removed, but Google still seems to be looking for these old URLs, including subfolders inside my old phpBB directory.

In my robots.txt file I have Disallow: /forum/ (/forum/ was the old name of my phpBB directory), but that directory no longer exists on my site... I figured this was okay, since search engines usually read robots.txt before indexing a site, right?
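A quick way to see what a crawler that honours robots.txt does with that rule is Python's standard urllib.robotparser module. The sketch below assumes a minimal robots.txt containing only the Disallow: /forum/ line (the real file isn't shown in the thread), and the sample paths are hypothetical. It shows that a Disallow value is just a path prefix, so it applies whether or not /forum/ still exists on the server. It only models the crawler-side check, not what a search engine already keeps in its index.

```python
from urllib.robotparser import RobotFileParser

# Assumed minimal robots.txt; only the Disallow line comes from the post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forum/
"""


def allowed(path, agent="Googlebot"):
    """True if a crawler that honours robots.txt may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Hypothetical sample paths: two inside the old folder, one outside it.
    for path in ("/forum/", "/forum/viewtopic.php?t=12", "/index.php"):
        print(path, "allowed" if allowed(path) else "blocked")
```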


Why would Google still be looking for a subdirectory or URL that I have blocked? My robots.txt is under 800 characters, so it can't be because I have too many rules...

Should this:

Disallow: /forum/

be this instead?

Disallow: /forum

Peter77
phpBB SEO Team
Posts: 532
Joined: Wed May 10, 2006 9:46 am


Postby dcz » Tue Jun 06, 2006 6:44 pm

Well, the problem is that even Googlebot sometimes has trouble following robots.txt exclusions.

Once, it even dug out an 8-month-old cached version of pages that had last been cached about a week before.
This means Google keeps a lot of backups ;)

After that, Googlebot started requesting old URLs again, from a previous mod_rewrite setup, even though they had already been cleared from its index.

So the safest approach is to also set up a proper redirect, like this:

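Code:
# 301-redirect every request under /old_folder/ to the new folder's index.
# The trailing "?" also drops the old query string (e.g. ?t=XX).
RewriteRule ^old_folder/ /new_folder/? [R=301,L]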


Put this in the .htaccess file at your site root, before the forum's RewriteRules.

This is pretty basic: any URL under www.example.com/old_folder/ will be redirected to www.example.com/new_folder/, without the rest of the URI, with a proper 301.
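To make the first rule's behaviour concrete, here is a small Python model of it. This is a sketch, not mod_rewrite itself, and the function name and sample paths are hypothetical. In a root .htaccess the pattern is matched against the path without its leading slash, and the query string is never part of the match. mod_rewrite passes the original query string through unchanged unless the target ends with a bare ?, which the drop_query flag stands in for.

```python
import re

# Hypothetical model of: RewriteRule ^old_folder/ /new_folder/ [R=301,L]
PATTERN = re.compile(r"^old_folder/")
TARGET = "/new_folder/"


def redirect(url_path, query="", drop_query=False):
    """Return (status, location) for a request, or None if the rule does not match."""
    # In a root .htaccess, mod_rewrite strips the leading "/" before matching,
    # and the query string is never part of what the pattern sees.
    relative = url_path[1:] if url_path.startswith("/") else url_path
    if PATTERN.search(relative) is None:
        return None
    location = TARGET
    # The original query string is passed through unless the target ends
    # with a bare "?" (modelled here by drop_query=True).
    if query and not drop_query:
        location += "?" + query
    return 301, location


if __name__ == "__main__":
    print(redirect("/old_folder/viewtopic.php", "t=42"))        # (301, '/new_folder/?t=42')
    print(redirect("/old_folder/viewtopic.php", "t=42", True))  # (301, '/new_folder/')
    print(redirect("/new_folder/index.php"))                    # None
```

The path after old_folder/ is always discarded. Only the query string depends on whether the target ends with ?.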

Here I don't think we should try to keep the old URI: if I remember correctly, you did not use mod_rewrite before, so you don't want to keep old links pointing to your vanilla (non-rewritten) URLs, which your robots.txt disallows anyway.

With this code, all the old links will be redirected to your new forum folder. It's a good target for concentrating the PageRank of all those old links, and users will still land "home".

If you prefer to keep the rest of the old URI, even though the result is duplicate content that you should disallow in your robots.txt, do this instead:

Code:
# 301-redirect /old_folder/anything to /new_folder/anything; $1 is the part captured by (.*).
RewriteRule ^old_folder/(.*) /new_folder/$1 [R=301,L]



But this means more work for the server, and you will not pass any old PageRank on to your new rewritten URLs, because the redirect targets are the old, non-rewritten URLs.

You will get things like this:
www.example.com/old_folder/viewtopic.php?t=XX => www.example.com/new_folder/viewtopic.php?t=XX
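The same kind of Python sketch for the second rule (again a hypothetical model with made-up names, not Apache code): the (.*) group captures everything after old_folder/, and $1 puts it back after /new_folder/. The query string is carried over untouched, which is how viewtopic.php?t=XX survives the redirect.

```python
import re

# Hypothetical model of: RewriteRule ^old_folder/(.*) /new_folder/$1 [R=301,L]
PATTERN = re.compile(r"^old_folder/(.*)")


def redirect_keep_path(url_path, query=""):
    """Return the 301 Location that keeps the rest of the path, or None if no match."""
    # Per-directory (.htaccess) context: the leading "/" is stripped before matching.
    relative = url_path[1:] if url_path.startswith("/") else url_path
    match = PATTERN.search(relative)
    if match is None:
        return None
    # $1 in the substitution is whatever (.*) captured.
    location = "/new_folder/" + match.group(1)
    # No "?" in the target, so the original query string is passed through.
    if query:
        location += "?" + query
    return location


if __name__ == "__main__":
    print(redirect_keep_path("/old_folder/viewtopic.php", "t=42"))
    # /new_folder/viewtopic.php?t=42
```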


By the way:

Code:
Disallow: /forum/


is correct ;)
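Why /forum/ is the better form: robots.txt Disallow values are matched as plain prefixes of the URL path. The sketch below uses Python's urllib.robotparser with hypothetical sample paths to compare both forms. Disallow: /forum/ blocks only what is inside the folder. Disallow: /forum would also block anything else starting with /forum, such as /forums/ or /forum.php.

```python
from urllib.robotparser import RobotFileParser


def blocked(rule, path):
    """True if `path` is disallowed for every agent by a single Disallow rule."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {rule}"])
    return not parser.can_fetch("Googlebot", path)


if __name__ == "__main__":
    # Hypothetical sample paths around the old folder name.
    paths = ["/forum", "/forum/", "/forum/viewtopic.php", "/forums/", "/forum.php"]
    for rule in ("/forum/", "/forum"):
        print(rule, {p: blocked(rule, p) for p in paths})
```

Note that the bare /forum URL (no trailing slash) is not covered by Disallow: /forum/. With a 301 in place for the old folder, that rarely matters.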
Useful links :
SEO Forum || SEO Directory || SEO phpBB || Search
dcz
Admin
Posts: 21406
Joined: Fri Apr 28, 2006 9:03 pm

Postby Peter77 » Wed Jun 07, 2006 3:17 pm


Thanks for the RewriteRule, it works like a charm! lol, yeah, it's amazing what some search engines still keep in their archives. Lately I've been getting hits from searches for music pages from way back, before I even installed phpBB, and I think the RewriteRule you just presented will be very handy.

Postby dcz » Wed Jun 07, 2006 3:19 pm

Yeah, I don't know what they do with all those broken links and outdated cached pages in their databases...

Anyway, Apache RuleZ ;)

