Peter77 phpBB SEO Team
Joined: 10 May 2006 Posts: 512 Location: Michigan
Posted: Mon Jul 10, 2006 5:31 pm Post subject: bots spidering SID URLs

Here is a little of what the Robots MOD is picking up. I'm not sure whether these are URLs that bots are actually crawling or just attempting to crawl...
/robots/pages.php?robot=23&d=20060709
Here is the Yahoo bot crawling forums that are supposed to be blocked for guests...
/robots/pages.php?robot=10&d=20060710
I hope those URLs are not actually getting crawled?
In my robots.txt I do have the following:
Disallow: /phpbb/login.php?
Disallow: /login.php?
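
For readers who want to check what those two rules cover, here is a minimal sketch of the plain prefix matching that robots.txt Disallow lines rely on. It is not a full robots.txt parser: it ignores User-agent groups (Disallow lines only take effect inside one), wildcards and Allow lines. The function name `is_disallowed` and the sample URLs are made up for illustration.

```python
def is_disallowed(url_path, disallow_rules):
    """Return True if url_path starts with any non-empty Disallow prefix."""
    return any(rule and url_path.startswith(rule) for rule in disallow_rules)


# The two rules quoted in the post above.
RULES = ["/phpbb/login.php?", "/login.php?"]

# Hypothetical URLs, for illustration only.
SAMPLES = [
    "/login.php?sid=abc123",
    "/phpbb/login.php?redirect=index.php",
    "/viewtopic.php?t=1&sid=abc123",
    "/login.php",
]

for path in SAMPLES:
    print(path, "->", "blocked" if is_disallowed(path, RULES) else "allowed")
```

Under these assumptions the two login URLs with a query string come out as blocked, while the viewtopic URL carrying a SID and the bare /login.php do not match either prefix, so these rules alone would not keep SID URLs on other pages out of a crawl.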
Last edited by Peter77 on Sat Jun 28, 2008 6:21 am; edited 2 times in total

dcz Administrator - Site Admin
Joined: 28 Apr 2006 Posts: 14403
Posted: Mon Jul 10, 2006 7:07 pm Post subject: Re: bots spidering SID URLs

Hello,
I thought this would be best answered in a new thread, so welcome to your new thread, Peter77.
And yes, those URLs are being spidered. It must be an HTTP_REFERER grabbed by a toolbar, or a bot following a link from within those stats.
Search engines find links in every possible way.
I don't think robostats poses any security issue, but I'd lock the folder with .htpasswd anyway, so you don't have to worry about it or about such links being spidered, which is useless.
++
_________________ Useful links :
SEO Forum || SEO Directory || SEO phpBB || SEO phpBB3 || Search
Peter77 phpBB SEO Team
Joined: 10 May 2006 Posts: 512 Location: Michigan
Posted: Fri Jul 14, 2006 5:31 pm Post subject: Re: bots spidering SID URLs

LOL, alright... I was lost for a minute. I'm also guessing that this was happening because of this problem, which has now been fixed, so maybe that will help a bit.
But I guess even Google's bots do not always follow robots.txt.
Last edited by Peter77 on Sat Jun 28, 2008 6:22 am; edited 1 time in total

dcz Administrator - Site Admin
Joined: 28 Apr 2006 Posts: 14403
Posted: Fri Jul 14, 2006 8:09 pm Post subject: Re: bots spidering SID URLs

Google does follow robots.txt, but what can't be avoided is old links that were not disallowed previously.
When Google visited your site before you implemented this rule, it spidered those URLs. After robots.txt was updated, Google kept crawling them for a while, until it picked up the new robots.txt (it downloads it fairly often, but not every day). It will then update its database, and the URLs should no longer appear as cached, though they may still stay listed.
Then you have to think about Google's complexity: thousands of web and database servers, continuous database updates and search queries. Sometimes it takes time for everything to be as it should.
Once, after I had changed the URLs on a web site, I used the automated removal tool to get rid of all the old URLs, which was done very quickly.
But about 8 months later, I saw those pages in the search results again, cached, and guess what: the cache date was correct, 8 months old.
By the way, that is a great example of the very deep backup strategy they deploy. I would like to know how many zillions of terabits it takes to keep every cached copy of every spidered page for 8 months or even more.
Actually, I still wonder whether this happened because I removed the disallows for the old mod_rewrite URLs from my robots.txt about 6 months after those URLs had disappeared from the Internet. In the end I submitted an updated robots.txt again, and this time it took one week to have them taken off Google's listing.
All this to say that exceptions are bound to occur on such an enormous system, and they are actually very few compared with how many times it works fine.
And Google, like all decent search engines, does respect robots.txt; it is in their interest as well not to lose time and resources spidering dupes.
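
To make the "dupes" point concrete, here is a rough sketch of why SID URLs are duplicates from a crawler's point of view. It assumes the session ID travels in a query parameter named `sid`; the helper name `strip_sid` and the sample URLs are hypothetical, and this is not code from phpBB or from any search engine.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_sid(url, param="sid"):
    """Return url with the session-ID query parameter removed."""
    parts = urlsplit(url)
    # Keep every query pair except the session ID, preserving order.
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k != param]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), parts.fragment))


# Hypothetical URLs: the same topic reached in two different sessions.
urls = [
    "/viewtopic.php?t=1&sid=aaa111",
    "/viewtopic.php?t=1&sid=bbb222",
    "/viewtopic.php?t=1",
]

# All three collapse to a single address once the SID is dropped.
print({strip_sid(u) for u in urls})
```

Each new session produces a new address for the same page, which is why a crawler that follows SID links ends up spending time on copies.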
++
Peter77 phpBB SEO Team
Joined: 10 May 2006 Posts: 512 Location: Michigan
Posted: Sun Jul 16, 2006 11:16 am Post subject: Re: bots spidering SID URLs

A few mistakes on my part. First, my robots.txt had been edited so that ALL search engines had no restrictions. Second, I had the phpBB version of ggsitemap available in non-rewrite mode... yes, along with my mxBB version of ggsitemap. I have corrected these two problems. That's what I get for messing with the site after a long day at work.