Review my robots.txt

Googlebot, MSNBot, Yahoo! Slurp ... Everything about indexing bots, IP lists, user agents, crawling and robots.txt.

Post by zinnerz » Sat Feb 10, 2007 11:27 pm

Dear all,

Please review my robots.txt. Is this OK?

Code:
User-agent: googlebot
Disallow:
Disallow: *.php$
Disallow: post
Disallow: /*?
Disallow: /*?*

User-agent: slurp
Crawl-delay: 15
Disallow:
Disallow: /*.php$
Disallow: /post
Disallow: /*?
Disallow: /*?*

User-agent: msnbot
Crawl-delay: 15
Disallow:
Disallow: /*.php$
Disallow: /post
Disallow: /*?
Disallow: /*?*

User-agent: Googlebot-Image
Disallow:

User-Agent: MediaPartners-Google
Disallow:

User-agent: *
Disallow: /


My forum is in the root directory ;)
zinnerz
PR0
Posts: 76
Joined: Thu Jun 08, 2006 7:25 am

Post by dcz » Sun Feb 11, 2007 12:20 pm

Code:
User-agent: *
Disallow: /


This basically disallows everything for all bots.
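Whether the catch-all group really shuts out every bot depends on how a crawler picks its group: under the usual rules a bot obeys only the group that names it, and falls back to "User-agent: *" only when no group matches. As a rough check, the sketch below feeds a simplified excerpt of the first post's file (only the /post rule kept, with a leading slash) to Python's standard-library urllib.robotparser. Real crawlers use their own parsers, so treat the result as an approximation; the helper names and test URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Simplified excerpt of the robots.txt from the first post:
# three named groups, then a catch-all group that blocks the whole site.
ROBOTS_TXT = """\
User-agent: googlebot
Disallow: /post

User-agent: slurp
Crawl-delay: 15
Disallow: /post

User-agent: msnbot
Crawl-delay: 15
Disallow: /post

User-agent: *
Disallow: /
"""


def make_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def check(agent: str, path: str) -> bool:
    """Return True if the given agent may fetch the path."""
    return make_parser(ROBOTS_TXT).can_fetch(agent, path)


if __name__ == "__main__":
    for agent in ("Googlebot", "msnbot", "Slurp", "SomeOtherBot"):
        # Named bots use their own group; unknown bots fall back to "*".
        print(agent, check(agent, "/viewtopic.php?t=1"))
```

With this parser, Googlebot, msnbot and Slurp may all fetch /viewtopic.php?t=1 while an unlisted bot is refused, so "Disallow: /" only affects bots that have no group of their own.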
Code:
User-Agent: MediaPartners-Google
Disallow:


This is a mistake if you are using AdSense, and useless if you are not (the Mediapartners bot does not really crawl unless you run ads).
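For reference, an empty Disallow value means that nothing is disallowed, so this group grants Mediapartners-Google the whole site even when a catch-all group blocks everyone else. A minimal check with Python's urllib.robotparser (the second agent name and the path are made up for illustration):

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-Agent: MediaPartners-Google
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The empty Disallow allows every path for the AdSense crawler...
adsense_allowed = parser.can_fetch("Mediapartners-Google", "/index.php")
# ...while a bot without its own group falls back to "Disallow: /".
other_allowed = parser.can_fetch("SomeOtherBot", "/index.php")

print(adsense_allowed, other_allowed)  # True False
```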

Disallow rules with wildcards are not really reliable.
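One reason wildcard rules are unreliable: the original robots.txt convention only does plain prefix matching, and * and $ are extensions that some crawlers (Google among them) support and others do not. Python's urllib.robotparser, for example, treats them as literal characters. The sketch contrasts that with a small, hypothetical wildcard_match helper that follows the Google-style meaning (* matches any run of characters, a trailing $ anchors the end of the URL); it is an illustration, not any crawler's actual code.

```python
import re
from urllib.robotparser import RobotFileParser


def wildcard_match(pattern: str, path: str) -> bool:
    """Hypothetical helper: Google-style robots.txt pattern matching.

    '*' matches any sequence of characters; a trailing '$' means the
    pattern must match up to the end of the path. Otherwise the
    pattern is a prefix match, as in plain robots.txt.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece and join the pieces with '.*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    # fullmatch enforces the '$' anchor; match is a prefix match.
    matcher = re.fullmatch if anchored else re.match
    return matcher(regex, path) is not None


def stdlib_allows(rule: str, path: str) -> bool:
    """Ask urllib.robotparser whether a bot may fetch path under one Disallow rule."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {rule}"])
    return parser.can_fetch("AnyBot", path)


if __name__ == "__main__":
    cases = [("/*.php$", "/viewtopic.php"),
             ("/*.php$", "/viewtopic.php?t=1"),
             ("/*?", "/index.php?sid=abc")]
    for rule, path in cases:
        print(rule, path,
              "wildcard match:", wildcard_match(rule, path),
              "stdlib allows:", stdlib_allows(rule, path))
```

For these three cases the wildcard-aware helper matches the first and third, while urllib.robotparser allows all three because it percent-encodes * and $ and then compares them as literal text. A crawler that behaves like the latter would ignore such rules entirely.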

Based on what you seem to be trying to do, I'd keep only:


Code:
User-agent: *
Crawl-delay: 15

Then, since this is a forum, it depends on your URL scheme. If it's phpBB running a phpBB SEO mod rewrite, you should just add the Disallow lines advised in the release thread; if not, you'd still need to add some Disallow lines to prevent duplicate content.
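To put the suggested "Crawl-delay: 15" in numbers: crawlers that honour the directive (Yahoo! Slurp and MSNBot did; Googlebot ignores it) commonly read it as the number of seconds to wait between requests, which caps one bot at 86400 / 15 = 5760 requests per day. The sketch reads the value with Python's urllib.robotparser (crawl_delay() exists since Python 3.6) and does that arithmetic; the helper name max_requests_per_day is hypothetical, and the "seconds between requests" reading is an assumption.

```python
from urllib.robotparser import RobotFileParser

SECONDS_PER_DAY = 24 * 60 * 60

ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 15
"""


def max_requests_per_day(robots_txt: str, agent: str):
    """Upper bound on daily fetches if the bot reads Crawl-delay as
    seconds between requests. Returns None if no non-zero delay applies."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(agent)
    if not delay:
        return None
    return SECONDS_PER_DAY // delay


if __name__ == "__main__":
    print(max_requests_per_day(ROBOTS_TXT, "msnbot"))  # 5760
```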

++
dcz
Administrateur - Site Admin
Posts: 17678
Joined: Fri Apr 28, 2006 9:03 pm

Post by zinnerz » Sun Feb 11, 2007 12:43 pm

dcz wrote:
Code:
User-agent: *
Disallow: /

This basically disallows everything for all bots.

I'm not sure about that, since the 3 bots are already allowed above it. So far, the Show Bots mod and View Online IP Address show that the 3 bots are still crawling, and the list grows every day.

dcz wrote:
Code:
User-Agent: MediaPartners-Google
Disallow:

This is a mistake if you are using AdSense, and useless if you are not (the Mediapartners bot does not really crawl unless you run ads).

I will remove it ;)

dcz wrote:
Disallow rules with wildcards are not really reliable.

I want to test them and see what happens; hopefully they work.

dcz wrote:
Then, since this is a forum, it depends on your URL scheme. If it's phpBB running a phpBB SEO mod rewrite, you should just add the Disallow lines advised in the release thread; if not, you'd still need to add some Disallow lines to prevent duplicate content.

++

I will post an update here in a week on whether they are still crawling the .php and post.html pages.

Thanks, dcz, for your review ;)

Last edited by zinnerz on Sun Feb 11, 2007 1:10 pm, edited 1 time in total.

Post by dcz » Sun Feb 11, 2007 1:07 pm

Well, seeing bots on your forum is not the same as having them actually cache your pages.

Since it's phpBB, my advice in your case is:

Code:
User-agent: *
Crawl-delay: 15

Disallow: /viewtopic.php
Disallow: /viewforum.php
Disallow: /index.php?
Disallow: /posting.php
Disallow: /groupcp.php
Disallow: /search.php
Disallow: /login.php
Disallow: /privmsg.php
Disallow: /post
Disallow: /member
Disallow: /profile.php
Disallow: /memberlist.php
Disallow: /faq.php


++
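One way to see what these rules block is to run sample URLs through a parser. The sketch below uses Python's urllib.robotparser; the sample paths, including the rewritten /post123.html and /forum-f1.html, are hypothetical. It also surfaces a caveat: this parser follows the original record format, in which a blank line ends a record, so with the blank line between "Crawl-delay: 15" and the first Disallow it keeps the delay but ignores every rule after it. Other crawlers may be more lenient, but leaving no blank lines inside a group avoids the question.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
Disallow: /viewtopic.php
Disallow: /viewforum.php
Disallow: /index.php?
Disallow: /posting.php
Disallow: /groupcp.php
Disallow: /search.php
Disallow: /login.php
Disallow: /privmsg.php
Disallow: /post
Disallow: /member
Disallow: /profile.php
Disallow: /memberlist.php
Disallow: /faq.php
"""

# As posted: a blank line separates Crawl-delay from the Disallow lines.
AS_POSTED = "User-agent: *\nCrawl-delay: 15\n\n" + RULES
# Same rules with the blank line removed.
NO_BLANK_LINE = "User-agent: *\nCrawl-delay: 15\n" + RULES


def blocked(robots_txt: str, path: str, agent: str = "msnbot") -> bool:
    """Return True if the parsed file forbids this agent from fetching path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Hypothetical sample URLs from a phpBB forum.
    for path in ("/viewtopic.php?t=1", "/post123.html", "/forum-f1.html"):
        print(path, blocked(AS_POSTED, path), blocked(NO_BLANK_LINE, path))
```

With the file as posted, nothing is reported as blocked; with the blank line removed, /viewtopic.php?t=1 and /post123.html are blocked and /forum-f1.html is not.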

Post by Ľubor » Mon Mar 12, 2007 5:23 pm

Code:
User-agent: *

Disallow: viewtopic.php
Disallow: viewforum.php
Disallow: index.php?
Disallow: posting.php
Disallow: groupcp.php
Disallow: search.php
Disallow: login.php
Disallow: privmsg.php
Disallow: profile.php
Disallow: post
Disallow: member

Is this good?
http://pc-war.sk/robots.txt
Last edited by Ľubor on Mon Mar 12, 2007 7:52 pm, edited 1 time in total.
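Two details in this file are worth checking against a parser: the blank line right after "User-agent: *", and rule paths that do not start with "/". URL paths always begin with "/", so a prefix such as viewtopic.php can never match under plain prefix matching. As one data point, Python's urllib.robotparser blocks nothing with the file as posted, but does block /viewtopic.php?t=1 once the blank line is removed and each path gets a leading "/". Other crawlers' parsers may differ; the corrected variant below is an assumption for illustration.

```python
from urllib.robotparser import RobotFileParser

PATHS = ["viewtopic.php", "viewforum.php", "index.php?", "posting.php",
         "groupcp.php", "search.php", "login.php", "privmsg.php",
         "profile.php", "post", "member"]

# The file as posted: blank line after User-agent, no leading slashes.
AS_POSTED = "User-agent: *\n\n" + "".join(f"Disallow: {p}\n" for p in PATHS)
# Hypothetical corrected variant: no blank line, every path starts with "/".
CORRECTED = "User-agent: *\n" + "".join(f"Disallow: /{p}\n" for p in PATHS)


def allowed(robots_txt: str, path: str) -> bool:
    """Return True if the parsed file lets Googlebot fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", path)


if __name__ == "__main__":
    print(allowed(AS_POSTED, "/viewtopic.php?t=1"))  # True: nothing blocked
    print(allowed(CORRECTED, "/viewtopic.php?t=1"))  # False: rule matches
```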
Ľubor
PR0
Posts: 52
Joined: Tue Feb 13, 2007 6:37 pm
Location: Slovakia, Bratislava

Post by dcz » Mon Mar 12, 2007 6:25 pm

Yes, the robots.txt is correct, but you should install the Zero Duplicate mod; it would get rid of all the SID duplicates in search engine listings ;)
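Why session IDs cause duplicates: phpBB can append a sid parameter to links, so one topic becomes reachable under many URLs that differ only in that parameter, and a search engine may list each of them separately. The sketch below, a hypothetical strip_sid helper built on urllib.parse, shows the normalization idea: dropping sid collapses those variants into one URL. It only illustrates the problem and is not how the Zero Duplicate mod works; the example.com URLs are made up.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_sid(url: str) -> str:
    """Hypothetical helper: remove the 'sid' query parameter from a URL."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "sid"]
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # Hypothetical URLs: the same topic seen with two different session IDs.
    urls = [
        "http://example.com/viewtopic.php?t=42&sid=0123abcd",
        "http://example.com/viewtopic.php?t=42&sid=9876fedc",
        "http://example.com/viewtopic.php?t=42",
    ]
    print({strip_sid(u) for u in urls})
    # {'http://example.com/viewtopic.php?t=42'}
```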

++

