Robots.txt question?

Post by hvactechforum » Thu Apr 19, 2007 12:17 am

My installed mods include:
- advanced mod rewrite
- dynamic meta tags
- advanced zero duplicates

What is a recommended robots.txt file? So far I have been using this:
User-agent: *
Disallow: /viewtopic.php
Disallow: /viewforum.php
Disallow: /index.php?
Disallow: /posting.php
Disallow: /groupcp.php
Disallow: /search.php
Disallow: /login.php
Disallow: /privmsg.php
Disallow: /post
Disallow: /member
Disallow: /profile.php
Disallow: /memberlist.php
Disallow: /faq.php


Is this good? Should I change it? I want the best possible SEO.
hvactechforum
 
Posts: 44
Joined: Sat Mar 03, 2007 12:50 am

Post by AmirAbbas » Thu Apr 19, 2007 5:23 am

I think your robots.txt file is good. Just one small thing: since you have installed Zero Dupe, you can remove

Disallow: /post

from your robots.txt file, because Zero Dupe will redirect every post URL to the proper topic URL. :wink:
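
A rough way to see what this change does is to test a few paths against both versions of the rules. The sketch below uses Python's standard `urllib.robotparser`; the rules are abridged from the file quoted above, and the paths (`/post123.html`, `/topic5.html`) are made-up examples, not URLs confirmed for this forum. It also assumes that bots match `Disallow` values as plain path prefixes, which is how this parser behaves.

```python
from urllib.robotparser import RobotFileParser

# Abridged from the robots.txt quoted earlier in the thread.
WITH_POST = """\
User-agent: *
Disallow: /viewtopic.php
Disallow: /posting.php
Disallow: /post
Disallow: /member
"""

# Same rules with the single "Disallow: /post" line removed.
WITHOUT_POST = WITH_POST.replace("Disallow: /post\n", "")


def is_allowed(rules: str, path: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


# Hypothetical paths, chosen only to illustrate prefix matching.
for path in ["/post123.html", "/posting.php", "/topic5.html"]:
    before = "allowed" if is_allowed(WITH_POST, path) else "blocked"
    after = "allowed" if is_allowed(WITHOUT_POST, path) else "blocked"
    print(f"{path}: with /post rule = {before}, without = {after}")
```

Because `/post` is a prefix, it also covers `/posting.php`; dropping it only opens up the post URLs, since `/posting.php` still has its own rule.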
AmirAbbas
phpBB SEO Team
 
Posts: 534
Joined: Thu May 11, 2006 3:30 pm
Location: IRAN

Post by hvactechforum » Fri Apr 20, 2007 10:36 am

Thank you for your reply. :)

Post by hvactechforum » Fri May 11, 2007 1:19 am

Shouldn't a robots.txt file block this sort of thing?

http : // www . hvacrunivers . com / probe . php ? extra=872a3e51e4ce4ac57bcc72afd2f4eb92,4b3e6ab3c7

+ADw-body+AD4 +ADw-iframe src+AD0AIg-http : // www . hvacrunivers . com ...

Post by dcz » Fri May 11, 2007 5:00 pm

What do you mean?

robots.txt does not actually block anything; it is just a set of directives we ask bots to follow. In the end, it is up to them to be nice or not.
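
To make that concrete, here is a minimal sketch of the check a well-behaved bot runs on its own side before requesting a page, using Python's standard `urllib.robotparser`. The bot name `ExampleBot`, the two-rule file, and the path list are illustrative assumptions; the point is only that the filtering happens in the bot, not on the server.

```python
from urllib.robotparser import RobotFileParser

# A small illustrative robots.txt; not the full file from this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search.php
Disallow: /login.php
"""


def polite_plan(robots_txt, agent, paths):
    """Paths a well-behaved bot would request after consulting robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if parser.can_fetch(agent, p)]


def rude_plan(robots_txt, agent, paths):
    """A bot that ignores robots.txt requests everything; nothing stops it."""
    return list(paths)


paths = ["/index.php", "/search.php", "/probe.php"]
print("polite:", polite_plan(ROBOTS_TXT, "ExampleBot", paths))
print("rude:  ", rude_plan(ROBOTS_TXT, "ExampleBot", paths))
```

Note that `/probe.php` is requested by both bots here, simply because no rule mentions it.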

++
dcz
Admin
 
Posts: 21238
Joined: Fri Apr 28, 2006 9:03 pm

Post by hvactechforum » Fri May 11, 2007 10:09 pm

dcz wrote: What do you mean?

robots.txt does not actually block anything; it is just a set of directives we ask bots to follow. In the end, it is up to them to be nice or not.

++


OK, I'll elaborate. What I mean is: is there any reason why I would want Google to go through files such as probe.php? Should I add more files to my robots.txt file to keep files like this from being indexed?

Post by HB » Sat May 12, 2007 2:52 am

I don't recognize probe.php, but as a general statement, robots.txt lists the files and directories that you don't want indexed, e.g.:

User-agent: *
Disallow: /forums/admin/
Disallow: /forums/images/
Disallow: /forums/cache/
Disallow: /forums/db/
Disallow: /forums/includes/
Disallow: /forums/language/
Disallow: /forums/templates/
Disallow: /forums/groupcp.php
Disallow: /forums/memberlist.php
Disallow: /forums/faq.php
Disallow: /forums/login.php
Disallow: /forums/modcp.php
Disallow: /forums/posting.php
Disallow: /forums/search.php
Disallow: /forums/profile.php
Disallow: /forums/privmsg.php
Disallow: /forums/viewonline.php

The robots.txt entries are "hints" that save the bot time and your server resources. robots.txt is not an access list; for actual access control, you would use .htaccess / .htpasswd.
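
Since the list above assumes a forum installed under `/forums/`, a small helper can render the same kind of list for another install path. This is only a sketch in Python: `build_robots_txt` and its parameters are hypothetical names, not part of phpBB or any mod, and the directory and script names are a subset of the example above.

```python
def build_robots_txt(base, directories, scripts, agent="*"):
    """Render a robots.txt body for a forum installed under `base`.

    `base` is the forum's path prefix, e.g. "/forums/" or "" for the site root.
    """
    # Normalise the prefix so it starts and ends with exactly one slash.
    base = "/" + base.strip("/")
    if base != "/":
        base += "/"

    lines = [f"User-agent: {agent}"]
    # Directories get a trailing slash so only their contents are matched.
    lines += [f"Disallow: {base}{d.strip('/')}/" for d in directories]
    # Scripts are listed by file name directly under the forum root.
    lines += [f"Disallow: {base}{s.lstrip('/')}" for s in scripts]
    return "\n".join(lines) + "\n"


# Example: a forum at the site root instead of /forums/.
print(build_robots_txt("", ["admin", "cache", "includes"], ["login.php", "search.php"]))
```

Keeping the directory and script lists separate from the prefix makes it harder to forget a rule when the forum is moved.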
Dan Kehn
HB
phpBB SEO Team
 
Posts: 1220
Joined: Mon Oct 16, 2006 2:25 am

Post by hvactechforum » Sat May 12, 2007 2:28 pm

HB wrote:The robots.txt entries are "hints" that save the bot time and your server resources. It's not an access list; for that, you would use .htaccess / .htpasswd


Thank you for your reply. The thing is, I don't know what is best to allow or disallow in a robots.txt file. I used a recommended one from this site and assumed it would "direct" the robots to the proper places. So I am simply wondering whether there are more files or directories that I should include in my robots.txt file. Perhaps the better question is: which files or directories should I be sure to list so that the search engines spend their time on the relevant areas? I am looking for a comprehensive answer on a well-formed robots.txt file that "asks" the bots not to go to unnecessary places. Thank you.

Post by HB » Sat May 12, 2007 5:38 pm

Assuming you're using GYM sitemap, you've got all the entry points covered for your forum. From the robots.txt viewpoint, the only question is what you DON'T want indexed. To the best of my knowledge, the settings above are exhaustive for the standard phpBB install. You may wish to check Google's index to confirm that extraneous pages aren't included.

FYI, Google's webmaster tools now include removal requests, which accept wildcards, if you wish to tidy up past mistakes. Whether it matters SEO-wise, I don't know, but it helps Googlebot's efficiency.

