Ideal robots.txt with the zero dup

Zero duplicate support forum.
Personalized HTTP 301 dynamic redirections.

Post by j2ross » Fri May 23, 2008 3:20 pm

OK, so I read about the correct robots.txt file to use when installing the advanced mod rewrite, but I wasn't clear about what to use with the zero dupe.

Since all my existing viewtopic.php and viewforum.php URLs will be getting a 301 to the newly rewritten pages, is it safe to say that I should not be disallowing those URLs in robots.txt?
j2ross
 
Posts: 15
Joined: Wed May 21, 2008 5:17 pm

Post by SeO » Fri May 23, 2008 3:29 pm

Well, you should leave the redirected files out of your robots.txt only if they were previously indexed. Otherwise, or once the SEs (search engines) have acknowledged the redirects, the advised robots.txt is the same with and without the zero dupe. With the zero dupe on, it's of course not really important, but it can still make things clearer for SEs.
SeO
Admin
 
Posts: 6333
Joined: Wed Mar 15, 2006 9:41 pm

Post by Pigeon » Mon Jan 19, 2009 6:45 pm

Googlebot doesn't seem to be following the 301s for these files, though.

The redirections are certainly working properly... there is nothing wrong with the mod. Testing it using wget to spoof a googlebot request gives this output:

Code:
wget 'http://www.lucy-pinder.tv/forum/viewforum.php?f=6&sid=fd52e349e3fec18b2eb9352c126bb97e' -O /dev/null --user-agent 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
--18:25:04--  http://www.lucy-pinder.tv/forum/viewforum.php?f=6&sid=fd52e349e3fec18b2eb9352c126bb97e
           => `/dev/null'
Resolving www.lucy-pinder.tv... 213.162.113.18
Connecting to www.lucy-pinder.tv|213.162.113.18|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://www.lucy-pinder.tv/forum/what-s-this-all-about-then-f6.html [following]
--18:25:04--  http://www.lucy-pinder.tv/forum/what-s-this-all-about-then-f6.html
           => `/dev/null'
Reusing existing connection to www.lucy-pinder.tv:80.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]

    [ <=>                                                                             ] 19,439        --.--K/s             

18:25:05 (674.61 KB/s) - `/dev/null' saved [19439]


and the following entries in the server logs:

Code:
192.168.1.2 - - [19/Jan/2009:18:25:04 +0000] "GET /forum/viewforum.php?f=6&sid=fd52e349e3fec18b2eb9352c126bb97e HTTP/1.0" 301 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
192.168.1.2 - - [19/Jan/2009:18:25:05 +0000] "GET /forum/what-s-this-all-about-then-f6.html HTTP/1.0" 200 19439 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


(There is a reverse proxy between the server and the outside world, so every access shows up as coming from 192.168.1.2.)

When the real Googlebot requests these URLs, though, it never follows up the 301.
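One way to check this from the access log rather than by eye is to split the Googlebot hits into old-style URLs answered with a 301 and rewritten pages served with a 200. The following is only a minimal sketch: it assumes the combined log format shown in the entries above, the names `LOG_LINE`, `googlebot_hits`, and `summarize` are made up for illustration, and it matches on the user-agent string alone, which (as the wget test shows) can be spoofed.

```python
import re

# Assumed layout (combined log format):
# host ident user [time] "METHOD path PROTO" status size "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def googlebot_hits(lines):
    """Yield (path, status) for each request whose user agent mentions Googlebot."""
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and "Googlebot" in match.group("agent"):
            yield match.group("path"), int(match.group("status"))


def summarize(lines):
    """Return (redirected, served): paths answered with 301, and .html paths answered with 200."""
    redirected, served = [], []
    for path, status in googlebot_hits(lines):
        if status == 301:
            redirected.append(path)
        elif status == 200 and path.endswith(".html"):
            served.append(path)
    return redirected, served


# The two log entries quoted in this post, used as sample input.
SAMPLE = [
    '192.168.1.2 - - [19/Jan/2009:18:25:04 +0000] "GET /forum/viewforum.php?f=6&sid=fd52e349e3fec18b2eb9352c126bb97e HTTP/1.0" 301 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.168.1.2 - - [19/Jan/2009:18:25:05 +0000] "GET /forum/what-s-this-all-about-then-f6.html HTTP/1.0" 200 19439 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

redirected, served = summarize(SAMPLE)
print("301 on old URLs:", redirected)
print("200 on rewritten URLs:", served)
```

If the real Googlebot only ever shows up in the first list over several days, the redirects are being seen but not yet followed; hits in the second list would suggest the rewritten URLs are being picked up.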

I have therefore added viewforum.php and viewtopic.php to my robots.txt because it seems that something, I don't know what, is confusing the googlebot.

The forum had only been up for a week or so before I installed advanced mod rewrite and advanced zero duplicate, and I only installed them last night, so it seems Google is still requesting the URLs it had previously indexed.

It also doesn't seem to be respecting robots.txt properly. I have just noticed a Googlebot hit on groupcp.php. I have "Disallow: /forum/groupcp.php" in my robots.txt, and Google has read robots.txt since I added that entry, but that doesn't seem to have stopped it.
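To rule out a mistake in the rule itself, the Disallow line can be checked locally with Python's standard `urllib.robotparser`. This is a sketch under assumptions: only the Disallow line is quoted above, so the `User-agent: *` line is a guess at the rest of the file, `is_allowed` is a made-up helper name, and Python's parser is not Googlebot's own, so agreement here does not prove how Google interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Assumption: the Disallow line sits under a catch-all User-agent group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forum/groupcp.php
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


BASE = "http://www.lucy-pinder.tv"
blocked_url = BASE + "/forum/groupcp.php"
rewritten_url = BASE + "/forum/what-s-this-all-about-then-f6.html"

print(blocked_url, "allowed:", is_allowed(ROBOTS_TXT, "Googlebot", blocked_url))
print(rewritten_url, "allowed:", is_allowed(ROBOTS_TXT, "Googlebot", rewritten_url))
```

If the rule checks out locally and the hits continue, one possible explanation is that the crawler is still working from a cached copy of robots.txt.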
Pigeon
 
Posts: 17
Joined: Sun Jan 18, 2009 10:36 pm

Post by SeO » Wed Jan 21, 2009 8:10 am

For the same reason as in this post, do not disallow these files in your robots.txt!

Google will follow the redirect as long as the headers sent are correct. It does not act like a browser, though, and will not necessarily follow the redirect right when it discovers it, but believe me, it will ;)
SeO
Admin
 
Posts: 6333
Joined: Wed Mar 15, 2006 9:41 pm

