Help with robots.txt

Postby technofreaks » Tue Mar 27, 2007 7:19 am

Dear all,

My root folder is public_html, where all my phpBB files live. I put the robots.txt file in the same folder. I have disallowed Googlebot from viewing the FAQ, register, login, and other pages, but yesterday I found it crawling all those pages. So where is the mistake in my robots.txt file?

Also, to stop Googlebot from viewing those pages, can I put this additional text in my robots.txt file?

user-agent: *
Disallow:http://www.technofreaks.org/profile.php?mode=editprofile
Disallow:http://www.technofreaks.org/faq.php
Disallow:http://www.technofreaks.org/search.php
Disallow:http://www.technofreaks.org/memberlist.php
Disallow:http://www.technofreaks.org/login.php?sid=45913f6103f97435083f4b2d82142576
Disallow:http://www.technofreaks.org/profile.php?mode=register&sid=45913f6103f97435083f4b2d82142576
Disallow:http://www.technofreaks.org/privmsg.php?folder=inbox&sid=45913f6103f97435083f4b2d82142576
Disallow:http://www.technofreaks.org/groupcp.php?sid=45913f6103f97435083f4b2d82142576

Thank you for your help.

Regards,
Technofreaks
technofreaks
 
Posts: 7
Joined: Tue Mar 27, 2007 7:03 am

Postby dcz » Tue Mar 27, 2007 9:24 am

And welcome :D

Yes, your robots.txt is wrong. You should not put the domain in it: each Disallow line takes a path relative to the site root, starting with /.

Code: Select all
user-agent: *
Disallow: /profile.php
Disallow: /faq.php
Disallow: /search.php?
Disallow: /memberlist.php
Disallow: /login.php
Disallow: /privmsg.php
Disallow: /groupcp.php


This is the correct way to do what you want.

Notice the ? after search.php: this rule leaves search.php itself allowed but disallows search.php?anything, so search results stay out of the index while the search page itself can still be indexed.

Drop the ? if you want to disallow the search.php page itself as well.
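
A minimal sketch of both points, assuming the bot does plain prefix matching of each Disallow value against the URL path plus query string (the original robots.txt convention). The helper name is_disallowed is made up for illustration.

```python
def is_disallowed(path, rules):
    """Return True if any non-empty Disallow value is a prefix of `path`.

    `path` is everything after the host name, including any query string.
    """
    return any(bool(rule) and path.startswith(rule) for rule in rules)


# A rule holding the full URL never matches, because the bot compares
# rules against the path only.
print(is_disallowed("/faq.php", ["http://www.technofreaks.org/faq.php"]))  # False
print(is_disallowed("/faq.php", ["/faq.php"]))                             # True

# With the trailing "?", only URLs that carry a query string are blocked.
print(is_disallowed("/search.php", ["/search.php?"]))           # False
print(is_disallowed("/search.php?anything", ["/search.php?"]))  # True
```

Real crawlers add more on top of this (per-bot groups, Allow lines, wildcards), so treat it as an illustration of the matching idea only.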

++
Useful links :
SEO Forum || SEO Directory || SEO phpBB || Search
dcz
Admin
Admin
 
Posts: 21238
Joined: Fri Apr 28, 2006 9:03 pm

Postby technofreaks » Tue Mar 27, 2007 11:12 am

Thank you DCZ,

I think you have not seen the link to my robots.txt file. It looks like this:

User-agent: *
Disallow: public_html/admin/
Disallow: public_html/cache/
Disallow: public_html/db/
Disallow: public_html/images/
Disallow: public_html/includes/
Disallow: public_html/language/
Disallow: public_html/templates/
Disallow: public_html/attach_mod/
Disallow: public_html/viewtopic.php
Disallow: public_html/viewforum.php
Disallow: public_html/index.php?
Disallow: public_html/posting.php
Disallow: public_html/groupcp.php
Disallow: public_html/search.php
Disallow: public_html/login.php
Disallow: public_html/privmsg.php
Disallow: public_html/post
Disallow: public_html/member
Disallow: public_html/profile.php
Disallow: public_html/memberlist.php
Disallow: public_html/faq.php
Disallow: public_html/viewonline.php
Disallow: *.html?start_letter

Now I want to know: is it correct, or do I need to remove public_html since it is the root folder?

Postby dcz » Wed Mar 28, 2007 1:02 pm

Your online robots.txt is just fine. It lists paths from the web root, without public_html:

Code: Select all
User-agent: *
Disallow: /admin/
Disallow: /cache/
Disallow: /db/
Disallow: /images/
Disallow: /includes/
Disallow: /language/
Disallow: /templates/
Disallow: /attach_mod/
Disallow: /viewtopic.php
Disallow: /viewforum.php
Disallow: /index.php?
Disallow: /posting.php
Disallow: /groupcp.php
Disallow: /search.php
Disallow: /login.php
Disallow: /privmsg.php
Disallow: /post
Disallow: /member
Disallow: /profile.php
Disallow: /memberlist.php
Disallow: /faq.php
Disallow: /viewonline.php
Disallow: *.html?start_letter


I just doubt this line would be very effective:

Code: Select all
Disallow: *.html?start_letter


Wildcards are not supported by all bots, and there is not much we can do about that.

Also note that:
Code: Select all
Disallow: *.html?


would have the same effect, but for every possible variable sent after the .html suffix.
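
To make the difference concrete, here is a rough sketch of how a wildcard-aware bot might read these two rules, next to a bot that only does literal prefix matching. It assumes that * matches any run of characters and that a trailing $ anchors the end of the URL, which is how the major search engines document it. The function names and the sample URLs are invented for the example.

```python
import re


def rule_to_regex(rule):
    """Translate a wildcard rule: '*' matches any run of characters and a
    trailing '$' anchors the end; everything else is taken literally."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def wildcard_match(path, rule):
    # re.match anchors at the start of the path, like a prefix match.
    return rule_to_regex(rule).match(path) is not None


def prefix_match(path, rule):
    # A bot without wildcard support treats '*' as an ordinary character.
    return path.startswith(rule)


letter_page = "/memberlist.html?start_letter=a"  # hypothetical URL
other_page = "/topic.html?foo=1"                 # hypothetical URL

print(wildcard_match(letter_page, "*.html?start_letter"))  # True
print(wildcard_match(other_page, "*.html?start_letter"))   # False
print(wildcard_match(other_page, "*.html?"))               # True
print(prefix_match(letter_page, "*.html?start_letter"))    # False
```

The last line shows the problem: a bot that ignores wildcards never sees a path starting with a literal *, so the rule does nothing for it.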


The proper alternative for such cases is the robots meta tag, conditionally set to noindex.
On the other hand, it's not that big a deal: .html?... URLs should carry less weight than plain .html URLs, which prevents most of the confusion that could occur.
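
As a sketch of the conditional meta robots idea, written in Python for brevity rather than in phpBB's PHP: the page decides per request which tag to print in its head section. The function name, the sample URLs, and the choice of "noindex,follow" (rather than a bare noindex) are assumptions for the example.

```python
from urllib.parse import urlsplit


def robots_meta(url):
    """Return the robots meta tag to print in <head> for this URL."""
    parts = urlsplit(url)
    # A rewritten .html page requested with extra variables: keep it out of
    # the index, but still let the bot follow the links on it.
    if parts.path.endswith(".html") and parts.query:
        return '<meta name="robots" content="noindex,follow">'
    return '<meta name="robots" content="index,follow">'


# Hypothetical URLs on the forum discussed in this thread.
print(robots_meta("http://www.technofreaks.org/memberlist.html?start_letter=a"))
print(robots_meta("http://www.technofreaks.org/memberlist.html"))
```

Unlike a wildcard Disallow, this does not depend on the bot understanding pattern syntax, only on it honoring the meta tag.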

The final fix, in case you need it, would be to add some more mod rewrite rules to handle pagination (which seems to be what the Disallow: *.html?start_letter line is about).

;)

Postby technofreaks » Mon Apr 02, 2007 10:57 am

Dear DCZ, as I could not follow your last post, I put the following code in my robots.txt file:

Code: Select all
User-agent: *
Disallow: /admin/
Disallow: /cache/
Disallow: /db/
Disallow: /images/
Disallow: /includes/
Disallow: /language/
Disallow: /templates/
Disallow: /attach_mod/
Disallow: /viewtopic.php
Disallow: /viewforum.php
Disallow: /index.php?
Disallow: /posting.php
Disallow: /groupcp.php
Disallow: /search.php
Disallow: /login.php
Disallow: /privmsg.php
Disallow: /post
Disallow: /member
Disallow: /profile.php
Disallow: /memberlist.php
Disallow: /faq.php
Disallow: /viewonline.php
Disallow: *.html?start_letter


Now Google says that my robots.txt file has blocked it from seeing 18 pages like this one:
http://www.technofreaks.org/viewtopic.php?p=53
That is not what I want. It has to be able to see the posts on the forum. So do I need to remove the following lines?
Code: Select all
Disallow: /viewtopic.php
Disallow: /viewforum.php


Please let me know, as I am pretty confused now.

Thanks for your help.


Technofreak

Postby dcz » Mon Apr 02, 2007 11:02 am

Actually, you're right, just drop:
Code: Select all
Disallow: /viewtopic.php
Disallow: /viewforum.php
Disallow: /index.php? 


Since you're not using any mod rewrite, these are the real URLs of your topics and forums, and you don't want to block your content from being indexed.
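
A quick way to see which line is responsible is to test the reported URL against the Disallow values. The sketch below assumes simple prefix matching and ignores User-agent groups, Allow lines, and wildcards; the function names are made up, and ROBOTS_BEFORE is a shortened copy of the file posted above.

```python
from urllib.parse import urlsplit

ROBOTS_BEFORE = """\
User-agent: *
Disallow: /viewtopic.php
Disallow: /viewforum.php
Disallow: /index.php?
Disallow: /faq.php
"""

DROPPED = {"/viewtopic.php", "/viewforum.php", "/index.php?"}
ROBOTS_AFTER = "\n".join(
    line for line in ROBOTS_BEFORE.splitlines()
    if line.partition(":")[2].strip() not in DROPPED
)


def disallow_rules(robots_txt):
    """Collect the non-empty Disallow values from a robots.txt text."""
    rules = []
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            rules.append(value.strip())
    return rules


def blocking_rules(url, robots_txt):
    """Return the Disallow values that are a prefix of the URL's path+query."""
    parts = urlsplit(url)
    target = parts.path + ("?" + parts.query if parts.query else "")
    return [rule for rule in disallow_rules(robots_txt) if target.startswith(rule)]


url = "http://www.technofreaks.org/viewtopic.php?p=53"
print(blocking_rules(url, ROBOTS_BEFORE))  # ['/viewtopic.php']
print(blocking_rules(url, ROBOTS_AFTER))   # []
```

Because /viewtopic.php is a prefix of every topic URL, that single line hides all posts; once it is dropped nothing in the file matches the URL any more.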

I encourage you to at least install CyberAlien's guest sessions mod, and to consider installing one of the phpBB SEO mod rewrites. You'd have better results for sure.

++

Postby technofreaks » Mon Apr 02, 2007 11:36 am

Thanks for your prompt reply.

As you suggested, I am going through the documentation of the mods you recommended. If I run into any problem, I will get back to you.

Thanks again

While looking at CyberAlien's mod, I realised that I have already installed an optimisation mod which edits sessions.php and makes the following changes.

Code: Select all
## MOD Title: enhance-google-indexing
## MOD Author: Showscout & R. U. Serious
## MOD Description: If the User_agent includes the string 'Googlebot', then no session_ids are appended to links, which will (hopefully) allow google to index more than just your index-site.
## MOD Version: 0.9.1
#-----[ OPEN ]------------------------------------------
includes/sessions.php
#-----[ FIND ]------------------------------------------
global $SID;
if ( !empty($SID) && !preg_match('#sid=#', $url) )
#-----[ REPLACE WITH ]------------------------------------------
global $SID, $HTTP_SERVER_VARS;
if ( !empty($SID) && !preg_match('#sid=#', $url) && !strstr($HTTP_SERVER_VARS['HTTP_USER_AGENT'] ,'Googlebot') && !strstr($HTTP_SERVER_VARS['HTTP_USER_AGENT'] ,'slurp@inktomi.com;'))
#
#-----[ SAVE/CLOSE ALL FILES ]------------------------------------------
#
# EoM
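
For readers who do not know PHP, the condition in this MOD can be restated as the following Python sketch. The function name and the way the separator is chosen are assumptions for the example; the real phpBB code that appends the session id is not shown in the MOD and may differ.

```python
def append_sid(url, sid, user_agent):
    """Append the session string (e.g. 'sid=abc123') to a link, unless the
    visitor looks like one of the two crawlers named in the MOD."""
    is_crawler = "Googlebot" in user_agent or "slurp@inktomi.com;" in user_agent
    if sid and "sid=" not in url and not is_crawler:
        # Assumption: '?' starts a query string, '&' extends an existing one.
        url += ("&" if "?" in url else "?") + sid
    return url


# Hypothetical session id and user agent strings.
print(append_sid("viewtopic.php?p=53", "sid=abc123", "Mozilla/5.0"))
print(append_sid("viewtopic.php?p=53", "sid=abc123", "Googlebot/2.1"))
```

The check is a case-sensitive substring test on the user agent, so any other crawler still receives session ids in its links.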


Shall I install CyberAlien's mod over this? Will that help me?

[EDIT] Well, I have just installed this mod and hope that it will help me. In the meantime I am going through the other link you provided.

There are three categories (Simple, Mixed and Advanced). Which one should I choose? I don't know anything about PHP or SQL databases, and I am not at all good at programming. :(
Please let me know. [/EDIT]

Thank you

Technofreak

Postby dcz » Mon Apr 02, 2007 8:58 pm

We're going off topic here.

CyberAlien's method is better than the one you currently use, at least IMHO, but I think we should discuss this elsewhere.

++

Thanks

Postby technofreaks » Tue Apr 03, 2007 7:09 am

Thanks for your help.