technofreaks
Joined: 27 Mar 2007 Posts: 7
|
Posted: Tue Mar 27, 2007 7:19 am Post subject: Help with robots.txt |
|
|
Dear all,
My root folder is public_html, where all my phpBB files are located, and I put the robots.txt file in the same folder. I have disallowed Googlebot from viewing the FAQ, register, login, and other pages, but yesterday I found it crawling all those pages. So where is the mistake in my robots.txt file?
Also, to stop Googlebot from viewing those pages, can I put this additional text in my robots.txt file?
user-agent: *
Disallow:http://www.technofreaks.org/profile.php?mode=editprofile
Disallow:http://www.technofreaks.org/faq.php
Disallow:http://www.technofreaks.org/search.php
Disallow:http://www.technofreaks.org/memberlist.php
Disallow:http://www.technofreaks.org/login.php?sid=45913f6103f97435083f4b2d82142576
Disallow:http://www.technofreaks.org/profile.php?mode=register&sid=45913f6103f97435083f4b2d82142576
Disallow:http://www.technofreaks.org/privmsg.php?folder=inbox&sid=45913f6103f97435083f4b2d82142576
Disallow:http://www.technofreaks.org/groupcp.php?sid=45913f6103f97435083f4b2d82142576
Thank you for your help.
Regards,
Technofreaks |
dcz Administrateur - Site Admin

Joined: 28 Apr 2006 Posts: 13354
|
Posted: Tue Mar 27, 2007 9:24 am Post subject: Re: Help with robots.txt |
|
|
Welcome!
Yes, your robots.txt is wrong. You should not include the domain in it: Disallow values are paths relative to the site root.
| Code: | user-agent: *
Disallow: /profile.php
Disallow: /faq.php
Disallow: /search.php?
Disallow: /memberlist.php
Disallow: /login.php
Disallow: /privmsg.php
Disallow: /groupcp.php |
This is the correct way to do what you want.
Notice the ? after search.php: it leaves search.php itself allowed but disallows search.php?anything, so search results are kept out of the index while the search page itself can still be indexed.
Remove the ? if you also want to disallow the search.php page itself.
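To make the ? trick concrete, here is a minimal Python sketch of plain prefix matching, which is how the basic robots.txt convention compares a Disallow value against a requested URL. The function name `is_disallowed` and the sample query strings are made up for illustration, and real crawlers may layer their own extensions on top of this.

```python
def is_disallowed(url_path, disallow_rules):
    """Return True if url_path starts with any non-empty Disallow value."""
    return any(rule and url_path.startswith(rule) for rule in disallow_rules)


# A few of the rules from the example above.
rules = ["/profile.php", "/faq.php", "/search.php?", "/memberlist.php"]

print(is_disallowed("/search.php", rules))                     # False
print(is_disallowed("/search.php?search_id=newposts", rules))  # True
print(is_disallowed("/profile.php?mode=register", rules))      # True
```

The bare search page does not start with `/search.php?`, so it stays crawlable, while any URL that carries a query string after it is blocked.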
++ |
_________________ Useful links :
SEO Forum || SEO Directory || SEO phpBB || SEO phpBB3 || Search
technofreaks
Joined: 27 Mar 2007 Posts: 7
|
Posted: Tue Mar 27, 2007 11:12 am Post subject: Re: Help with robots.txt |
|
|
Thank you DCZ,
I think you have not seen the link to my robots.txt file. It looks like this:
User-agent: *
Disallow: public_html/admin/
Disallow: public_html/cache/
Disallow: public_html/db/
Disallow: public_html/images/
Disallow: public_html/includes/
Disallow: public_html/language/
Disallow: public_html/templates/
Disallow: public_html/attach_mod/
Disallow: public_html/viewtopic.php
Disallow: public_html/viewforum.php
Disallow: public_html/index.php?
Disallow: public_html/posting.php
Disallow: public_html/groupcp.php
Disallow: public_html/search.php
Disallow: public_html/login.php
Disallow: public_html/privmsg.php
Disallow: public_html/post
Disallow: public_html/member
Disallow: public_html/profile.php
Disallow: public_html/memberlist.php
Disallow: public_html/faq.php
Disallow: public_html/viewonline.php
Disallow: *.html?start_letter
Now I want to know: is it correct, or do I need to remove public_html since it is the root folder? |
dcz Administrateur - Site Admin

Joined: 28 Apr 2006 Posts: 13354
|
Posted: Wed Mar 28, 2007 1:02 pm Post subject: Re: Help with robots.txt |
|
|
Your online robots.txt is just fine:
| Code: | User-agent: *
Disallow: /admin/
Disallow: /cache/
Disallow: /db/
Disallow: /images/
Disallow: /includes/
Disallow: /language/
Disallow: /templates/
Disallow: /attach_mod/
Disallow: /viewtopic.php
Disallow: /viewforum.php
Disallow: /index.php?
Disallow: /posting.php
Disallow: /groupcp.php
Disallow: /search.php
Disallow: /login.php
Disallow: /privmsg.php
Disallow: /post
Disallow: /member
Disallow: /profile.php
Disallow: /memberlist.php
Disallow: /faq.php
Disallow: /viewonline.php
Disallow: *.html?start_letter |
I just doubt this line will be very effective:
| Code: | Disallow: *.html?start_letter |
Wildcards are not supported by all bots, and there is not much we can do about it.
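The difference between a bot that understands wildcards and one that does not can be sketched in Python as follows. This is only an illustration under assumptions: `wildcard_match` mimics the `*` and trailing `$` extensions that some major crawlers document, `prefix_match` is the plain literal comparison, and the sample URL is hypothetical.

```python
import re


def wildcard_match(rule, url_path):
    """Match a rule the way wildcard-aware crawlers are documented to:
    '*' stands for any run of characters, a trailing '$' anchors the end."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    # Escape the literal parts and join them with '.*' for each '*'.
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    if anchored:
        pattern += "$"
    return re.match(pattern, url_path) is not None


def prefix_match(rule, url_path):
    """Match a rule as a literal prefix, with no special characters."""
    return url_path.startswith(rule)


rule = "*.html?start_letter"
url = "/memberlist.html?start_letter=a"  # hypothetical URL for illustration

print(wildcard_match(rule, url))  # True
print(prefix_match(rule, url))    # False
```

A bot that only does literal prefix matching never sees a URL starting with `*`, so for that bot the line simply has no effect.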
And notice that :
can have the same effect, but for all possible vars sent after the .html suffix.
The proper alternative in such cases is to use the robots meta tag and conditionally set it to noindex.
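As a rough sketch of that idea, the decision could look like this in Python. On a phpBB board the equivalent logic would live in the PHP or template code; the function name `robots_meta_tag` and the choice of `noindex,follow` versus `index,follow` are assumptions made for the example.

```python
from urllib.parse import urlsplit, parse_qs


def robots_meta_tag(url):
    """Return a robots meta tag: noindex when the URL carries a
    start_letter query variable, index otherwise."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    content = "noindex,follow" if "start_letter" in query else "index,follow"
    return f'<meta name="robots" content="{content}">'


print(robots_meta_tag("/memberlist.html?start_letter=a"))
print(robots_meta_tag("/memberlist.html"))
```

Unlike a wildcard Disallow line, this does not depend on what pattern syntax a given bot supports, since the page itself tells the crawler not to index it.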
On the other hand, it's not a big deal: .html?... URLs should be given less consideration than plain .html URLs, which prevents most of the confusion that could occur.
The definitive fix would be to add more mod_rewrite rules to handle pagination (which seems to be what the Disallow: *.html?start_letter line is for), in case you need it.
 |
technofreaks
Joined: 27 Mar 2007 Posts: 7
|
Posted: Mon Apr 02, 2007 10:57 am Post subject: Re: Help with robots.txt |
|
|
Dear DCZ, as I was unable to understand your last post, I put the following code in my robots.txt file:
| Code: | User-agent: *
Disallow: /admin/
Disallow: /cache/
Disallow: /db/
Disallow: /images/
Disallow: /includes/
Disallow: /language/
Disallow: /templates/
Disallow: /attach_mod/
Disallow: /viewtopic.php
Disallow: /viewforum.php
Disallow: /index.php?
Disallow: /posting.php
Disallow: /groupcp.php
Disallow: /search.php
Disallow: /login.php
Disallow: /privmsg.php
Disallow: /post
Disallow: /member
Disallow: /profile.php
Disallow: /memberlist.php
Disallow: /faq.php
Disallow: /viewonline.php
Disallow: *.html?start_letter |
Now Google is saying that my robots.txt file has restricted it from seeing 18 pages like this one:
http://www.technofreaks.org/viewtopic.php?p=53
This is not what I want. It has to see the posts on the forum. So do I need to remove the following code?
| Code: | Disallow: /viewtopic.php
Disallow: /viewforum.php |
Please let me know, as I am pretty confused now.
Thanks for your help.
Technofreak |
dcz Administrateur - Site Admin

Joined: 28 Apr 2006 Posts: 13354
|
technofreaks
Joined: 27 Mar 2007 Posts: 7
|
Posted: Mon Apr 02, 2007 11:36 am Post subject: Re: Help with robots.txt |
|
|
Thanks for your prompt reply.
As you suggested, I am going through the documentation of the mods you recommended. If I run into any problem, I will get back to you.
Thanks again
While looking at CyberAlien's mod, I realized that I had already installed an optimisation mod that edits sessions.php and makes the following changes.
| Code: | ## MOD Title: enhance-google-indexing
## MOD Author: Showscout & R. U. Serious
## MOD Description: If the User_agent includes the string 'Googlebot', then no session_ids are appended to links, which will (hopefully) allow google to index more than just your index-site.
## MOD Version: 0.9.1
#-----[ OPEN ]------------------------------------------
includes/sessions.php
#-----[ FIND ]------------------------------------------
global $SID;
if ( !empty($SID) && !preg_match('#sid=#', $url) )
#-----[ REPLACE WITH ]------------------------------------------
global $SID, $HTTP_SERVER_VARS;
if ( !empty($SID) && !preg_match('#sid=#', $url) && !strstr($HTTP_SERVER_VARS['HTTP_USER_AGENT'] ,'Googlebot') && !strstr($HTTP_SERVER_VARS['HTTP_USER_AGENT'] ,'slurp@inktomi.com;'))
#
#-----[ SAVE/CLOSE ALL FILES ]------------------------------------------
#
# EoM |
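For readers not familiar with PHP, the condition this mod adds can be sketched in Python. Only the condition itself comes from the mod; the function name `append_sid`, the `BOT_MARKERS` tuple, and the `?`/`&` separator handling are assumptions added so the example runs on its own.

```python
# Substrings the mod looks for in the User-Agent header.
BOT_MARKERS = ("Googlebot", "slurp@inktomi.com;")


def append_sid(url, sid, user_agent):
    """Append the session id to url unless it is already there
    or the visitor looks like one of the listed crawlers."""
    is_bot = any(marker in user_agent for marker in BOT_MARKERS)
    if sid and "sid=" not in url and not is_bot:
        # Separator choice is an assumption; the mod only shows the condition.
        separator = "&" if "?" in url else "?"
        url += separator + sid
    return url


print(append_sid("viewtopic.php?p=53", "sid=abc123", "Mozilla/5.0"))
print(append_sid("viewtopic.php?p=53", "sid=abc123", "Googlebot/2.1"))
```

A regular browser gets the session id appended, while a visitor whose User-Agent contains one of the markers gets the clean URL.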
Shall I install CyberAlien's mod over this? Will that help me?
[EDIT] I have just installed this mod and am hoping that it will help me. In the meantime I am going through the other link you provided.
There are three categories (Simple, Mixed and Advanced). Which one shall I choose? I don't know anything about PHP or SQL databases,
and I am not at all good at programming.
Please let me know. [/EDIT]
Thank you
Technofreak |
dcz Administrateur - Site Admin

Joined: 28 Apr 2006 Posts: 13354
|
technofreaks
Joined: 27 Mar 2007 Posts: 7
|
Posted: Tue Apr 03, 2007 7:09 am Post subject: Thanks |
|
|
| Thanks for your help. |