what to block and what not to in robots.txt

Post by apeee » Tue Dec 11, 2007 9:06 pm

1. I'm using
RewriteRule ^index\.html$ /index.php [QSA,L,NC]
in .htaccess. Should I disallow index.php with a trailing question mark, as in
Disallow: /forums/index.php?
or without it, as in
Disallow: /forums/index.php

2. Why are you disallowing faq.php in robots.txt?

3. Can I also disallow all the other directories, like adm, cache, cgi-bin, docs, download, files, images, includes, install (deleted), language, store, phpbb_seo?

4. This may be the silliest question ever, but where are all these generated .html files stored? In some specific directories?

5. Can I rewrite unanswered posts and others to .html?

____________________________

One important thing: when I validate my website's links through w3.org, the link to your website in the footer, "http://www.phpbb-seo.com/", gets an error like this:

Code:
http://www.phpbb-seo.com/index.php
    What to do: The link is forbidden! This needs fixing. Usual suspects: a missing index.html or Overview.html, or a missing ACL.
    Response status code: 403
    Response message: Forbidden
    Line: 216


Are you blocking incoming links? Why?

Post by Peter77 » Tue Dec 11, 2007 9:35 pm

1. You can use
Disallow: /forums/index.php?
since forums/index.php is 301 redirected to forums/ anyway.

2. Generally, you want to disallow pages that do not add much value to your site content-wise. The FAQ page is an important page, but only to your members.
Please note that Google warns against having too many lines in your robots.txt. I believe the limit is 100 lines.

3. Yes, you can disallow any other directories and pages you wish.
I have extra directories in my robots.txt, such as modules, images, cache, etc.

4. The .html pages are not actually stored on your site. They are generated on the fly thanks to the rewrite rules in your .htaccess.
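
As a rough sketch of points 1 to 3, a robots.txt could look something like the one below. It is only an illustration: it assumes the forum is installed under /forums/ and that the directories named in the question sit inside it, so adjust the paths to your own layout. Disallow rules match by URL prefix, which is why the trailing question mark matters.

```
# Hypothetical sketch: assumes the forum lives under /forums/
User-agent: *

# Prefix match: blocks /forums/index.php?f=1 and any other query-string URL,
# but not the bare /forums/index.php (which is 301 redirected to /forums/).
# Without the "?", the rule would also cover the bare /forums/index.php.
Disallow: /forums/index.php?

# Page that is useful to members but adds little for search visitors
Disallow: /forums/faq.php

# Non-content directories (names taken from the question)
Disallow: /forums/adm/
Disallow: /forums/cache/
Disallow: /forums/includes/
Disallow: /forums/language/
```

The file has to be served from the root of the site (/robots.txt), not from inside the forum directory, for crawlers to find it.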

I'm moving this to the "robots" forum. It sounds like you are running phpBB3, but your question and topic are geared more towards robots.txt ...