Google Sitemaps and Robots.txt don't jive

Post by daddyo » Wed Oct 24, 2007 4:40 am

I have the following in my Robots.txt file in my root website directory:

User-agent: *
Disallow: /phpBB/viewforum.php
Disallow: /phpBB/viewtopic.php
Disallow: /phpBB/index.php?
Disallow: /phpBB/posting.php
Disallow: /phpBB/groupcp.php
Disallow: /phpBB/search.php
Disallow: /phpBB/login.php
Disallow: /phpBB/privmsg.php
Disallow: /phpBB/post
Disallow: /phpBB/member
Disallow: /phpBB/profile.php
Disallow: /phpBB/memberlist.php
Disallow: /phpBB/faq.php

But today, when I logged into Google Sitemaps, it told me that about 4600 URLs had not been indexed, some of which are in my new SEO format.

Like this one:

-http://www.mydomain.com/phpBB/I-want-this-thread-indexed-t12345.html

Is everything functioning correctly? Shouldn't the Robots.txt block only the URLs with "viewtopic.php?t=#####" in them and leave ones like the URL above alone?
Re: Google Sitemaps and Robots.txt don't jive

Post by SeO » Wed Oct 24, 2007 6:49 am

I bet that:
-http://www.mydomain.com/phpBB/I-want-this-thread-indexed-t12345.html

is in fact -http://www.mydomain.com/phpBB/this-thread-is-private-t12345.html

;)

Topics posted in a private forum are, thankfully, not actually public. So when you follow such a link as a guest, you are redirected with an HTTP 302 (not a 301) to the login form, and login.php is disallowed in the robots.txt.

Because the redirect is a 302, Google still considers the topic URL to be the "real" one, but at the same time it takes the disallow on login.php into account.
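
The prefix matching behind this can be checked locally. The sketch below uses Python's standard `urllib.robotparser` with the rules from the first post; the `login.php?redirect=...` value is a hypothetical `Location` header, not something taken from the forum in question, and the `is_allowed` helper is only for illustration.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# Rules copied from the robots.txt in the first post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /phpBB/viewforum.php
Disallow: /phpBB/viewtopic.php
Disallow: /phpBB/index.php?
Disallow: /phpBB/posting.php
Disallow: /phpBB/groupcp.php
Disallow: /phpBB/search.php
Disallow: /phpBB/login.php
Disallow: /phpBB/privmsg.php
Disallow: /phpBB/post
Disallow: /phpBB/member
Disallow: /phpBB/profile.php
Disallow: /phpBB/memberlist.php
Disallow: /phpBB/faq.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(url, agent="Googlebot"):
    """Return True if the rules above let `agent` fetch `url`."""
    return parser.can_fetch(agent, url)


topic_url = "http://www.mydomain.com/phpBB/I-want-this-thread-indexed-t12345.html"

# Hypothetical Location header of the 302 a guest gets on a private topic.
login_target = urljoin(topic_url, "login.php?redirect=viewtopic.php&t=12345")

print(is_allowed(topic_url))     # the rewritten URL matches no Disallow prefix
print(is_allowed(login_target))  # the 302 target falls under /phpBB/login.php
```

Note that the rules are plain path prefixes, so `Disallow: /phpBB/post` and `Disallow: /phpBB/member` would also block a rewritten topic URL whose title happens to start with "post" or "member", for example `/phpBB/posting-tips-t99.html`.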

As for the other disallowed pages, it looks like your forum was previously indexed. If it was indexed with the regular URLs, you can remove:

Disallow: /phpBB/viewforum.php
Disallow: /phpBB/viewtopic.php
Disallow: /phpBB/index.php?

for a couple of months and let the zero duplicate mod (you need it, in case you have not activated it yet) do the rest ;)

Post by daddyo » Wed Oct 24, 2007 1:18 pm

So I have Zero Dupes and one of the SEO Mods (intermediate or advanced links I think) installed right now, trying to get Google to de-index my viewforum.php?t=12345 links and start indexing my This-link-needs-indexing.html instead.

And you're telling me I need to remove those three lines in my Robots.txt file?

Disallow: /phpBB/viewforum.php
Disallow: /phpBB/viewtopic.php
Disallow: /phpBB/index.php?

Post by dcz » Sun Oct 28, 2007 10:25 am

Yes, temporarily.

This way Google can still crawl the old viewtopic.php URLs and will notice sooner that they have moved (HTTP 301) to the new URLs.
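
As a rough illustration of why the lines have to go for a while: a crawler that obeys robots.txt never requests a disallowed URL, so it never gets to see the 301 that URL returns. The sketch below compares the two rule sets with Python's `urllib.robotparser`; `build_parser` is a hypothetical helper, and only a subset of the original Disallow lines is reproduced.

```python
from urllib.robotparser import RobotFileParser


def build_parser(disallowed_paths):
    """Build a parser for a 'User-agent: *' block with the given Disallow paths."""
    lines = ["User-agent: *"] + [f"Disallow: {path}" for path in disallowed_paths]
    rp = RobotFileParser()
    rp.parse(lines)
    return rp


# The three lines to drop temporarily, as quoted in the thread.
OLD_URL_RULES = [
    "/phpBB/viewforum.php",
    "/phpBB/viewtopic.php",
    "/phpBB/index.php?",
]
# A subset of the rules that stay in place.
KEPT_RULES = ["/phpBB/login.php", "/phpBB/search.php"]

old_url = "http://www.mydomain.com/phpBB/viewtopic.php?t=12345"

before = build_parser(OLD_URL_RULES + KEPT_RULES)
after = build_parser(KEPT_RULES)

# Blocked: the old URL is never requested, so its 301 goes unnoticed.
print(before.can_fetch("Googlebot", old_url))
# Crawlable: the bot can request the old URL and follow the 301.
print(after.can_fetch("Googlebot", old_url))
```

This only models the robots.txt side; whether the old URL really answers with a 301 depends on the zero duplicate mod being active.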


Post by daddyo » Sun Oct 28, 2007 2:18 pm

OK. Thanks.

BTW, do you have a Paypal donation button? Your techniques here and your responses to my questions have been worth so much over the past few months. Without your site, I don't know where my site would be.

Post by dcz » Sun Oct 28, 2007 2:37 pm

Thanks ;)

A PayPal donation account has been on my to-do list for ages. I should get to it one of these days :roll:

It would help with hosting.
