bots spidering SID URLs

 
Peter77
phpBB SEO Team


Joined: 10 May 2006
Posts: 512
Location: Michigan

Posted: Mon Jul 10, 2006 5:31 pm    Post subject: bots spidering SID URLs

Here is a little something of what the Robots MOD is picking up. I'm not sure whether these are URLs the bots are actually crawling or only attempting to crawl...

/robots/pages.php?robot=23&d=20060709


Here is the Yahoo bot crawling forums that are supposed to be blocked for guests...

/robots/pages.php?robot=10&d=20060710

I hope those URLs are not actually getting crawled?

In my robots.txt I do have the following:

Disallow: /phpbb/login.php?
Disallow: /login.php?
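
Rules like these can be sanity-checked offline with Python's standard `urllib.robotparser`. What follows is only a minimal sketch: it assumes the two Disallow lines sit under a `User-agent: *` group (the post does not show that line), and the sample paths and the `is_blocked` helper are made up for illustration. Python's parser uses simple prefix matching, so real crawlers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# The two Disallow lines from the post. The "User-agent: *" line is an
# assumption: the post does not show which group the rules belong to.
ROBOTS_TXT = """\
User-agent: *
Disallow: /phpbb/login.php?
Disallow: /login.php?
"""


def is_blocked(path, agent="Googlebot"):
    """Return True if the rules above forbid `agent` from fetching `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return not parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Hypothetical sample paths, not taken from the stats pages in the post.
    samples = (
        "/login.php?sid=0123abcd",
        "/phpbb/login.php?redirect=index.php",
        "/viewtopic.php?t=1&sid=0123abcd",
    )
    for path in samples:
        print(path, "->", "blocked" if is_blocked(path) else "allowed")
```

Under these assumptions the two login.php paths come back as blocked, while a SID URL on another script such as viewtopic.php is still allowed, because the rules only cover login.php.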


Last edited by Peter77 on Sat Jun 28, 2008 6:21 am; edited 2 times in total
dcz
Site Admin


Joined: 28 Apr 2006
Posts: 14403

Posted: Mon Jul 10, 2006 7:07 pm    Post subject: Re: bots spidering SID URLs

Hello,

I thought this would be best answered in a new thread, so welcome to your new thread, Peter77.

And yes, those URLs are being spidered. The bots must have picked them up from an HTTP_REFERER grabbed by a toolbar, or by following a link from within those stats.

Search engines find links in every possible way.

I don't think robostats poses any security issue, but I'd lock the folder with .htpasswd so you don't have to worry about it or about such links being spidered, which is useless.

++

Peter77
phpBB SEO Team


Joined: 10 May 2006
Posts: 512
Location: Michigan

Posted: Fri Jul 14, 2006 5:31 pm    Post subject: Re: bots spidering SID URLs

LOL, all right... I was lost for a minute. I'm also guessing that this was happening because of this problem, which has now been fixed, so maybe that will help a bit.


But I guess even Googlebot does not always follow robots.txt?


Last edited by Peter77 on Sat Jun 28, 2008 6:22 am; edited 1 time in total
dcz
Site Admin


Joined: 28 Apr 2006
Posts: 14403

Posted: Fri Jul 14, 2006 8:09 pm    Post subject: Re: bots spidering SID URLs

Google does follow robots.txt, but what can't be avoided is the old links that were not previously disallowed.

When Google visited your site before you had implemented this rule, it spidered those URLs. Once robots.txt was updated, Google kept crawling them for a while, until it eventually picked up the new robots.txt (it downloads it fairly often, but not every day). It will then update its database, and the URLs should no longer appear as cached, though they may still stay listed.
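
The lag described above can be illustrated with a small sketch of a crawler that caches robots.txt and only re-downloads it after some interval. This is not how Google actually works internally; the `CachedRobots` class, the 24-hour `REFRESH_SECONDS` value, and the `fetch_rules` callback are all hypothetical names chosen for the example.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical refresh interval; real crawlers do not publish a fixed value.
REFRESH_SECONDS = 24 * 60 * 60


class CachedRobots:
    """Toy crawler-side cache: robots.txt is re-fetched only when stale."""

    def __init__(self, fetch_rules, now=time.time):
        self._fetch_rules = fetch_rules  # callable returning robots.txt text
        self._now = now                  # injectable clock, handy for testing
        self._parser = None
        self._fetched_at = None

    def can_fetch(self, agent, path):
        stale = (
            self._parser is None
            or self._now() - self._fetched_at >= REFRESH_SECONDS
        )
        if stale:
            # Download and parse a fresh copy, then remember when we did so.
            self._parser = RobotFileParser()
            self._parser.parse(self._fetch_rules().splitlines())
            self._fetched_at = self._now()
        return self._parser.can_fetch(agent, path)
```

With a cache like this, a Disallow rule added right after a fetch is simply not seen until the next refresh, so the crawler keeps requesting the newly disallowed URLs in the meantime.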

Then you have to think about Google's complexity: thousands of web and database servers, continuous database updates and search queries. Sometimes it takes time for everything to be as it should.

Once, after I had changed the URLs on a web site, I used the automated removal tool to get rid of all the old URLs, which was done very quickly.
But about 8 months later, I saw those pages in the search results, cached, and guess what: the cache date was correct, 8 months old.

By the way, this is a great example of the very deep backup strategy they deploy. I would like to know how many zillions of terabits it takes to keep every cached copy of every spidered page for 8 months or even more.

Actually, I still wonder whether this happened because I removed the disallows for the old mod rewrite URLs from my robots.txt about 6 months after those URLs had disappeared from the Internet. I eventually submitted an updated robots.txt again, and this time it took one week to have them taken off Google's listing.

All this to say that exceptions are bound to occur on such an enormous system, and they are actually very few compared to how many times it works fine.

And Google, like all decent search engines, does respect robots.txt; it is in their interest as well not to lose time and resources spidering duplicates.

++

Peter77
phpBB SEO Team


Joined: 10 May 2006
Posts: 512
Location: Michigan

Posted: Sun Jul 16, 2006 11:16 am    Post subject: Re: bots spidering SID URLs

A few mistakes on my part. First, my robots.txt had been edited so that ALL search engines had no restrictions. Second, I had the phpBB version of ggsitemap available in non-rewrite mode... yes, along with my mxBB version of ggsitemap. I corrected these two problems. That's what I get for messing with the site after a long day at work.