blocking bad bots from .htaccess


Postby lavinya » Thu Sep 14, 2006 6:41 pm

Hello all.

Is this code correct or not?

Code:
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} *FrontPage* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} *httrack* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} *Teleport* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} *webzip* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} *WebStripper* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} *NetMechanic* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} *CherryPicker* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} *EmailCollector* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} *EmailSiphon* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} *WebBandit* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} *EmailWolf* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} *ExtractorPro* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} *SiteSnagger* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} *Cheese* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} *Quester* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} *WebZip* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} *moget* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} *WebSauger* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} *WebCopier* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} *WWW-Collector* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} *InfoNavi* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} *Harvest* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} *Bullseye* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} *LinkWalker* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} *LinkextractorPro* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} *Proxy* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} *BlowFish* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} *WebEnhancer* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} *TightTwatBot* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} *LinkScan* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} *WebDownloader* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} *BruteForce* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} *BruteForce* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} lwp [NC,OR]
RewriteCond %{HTTP_USER_AGENT} lwp-* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} *anonym* [NC,OR]
RewriteRule !^403.html$ - [F,L]


I would be happy if you could reply. Thanks.

Postby dcz » Thu Sep 14, 2006 9:33 pm

Well, apart from the wildcards (*), which are not useful here (if you don't set where the pattern begins and ends, it is searched for anywhere in the UA string), it's correct for banning all those User Agents.

But it's only one level above robots.txt, since really bad bots do not use a static User Agent, precisely so they can get through such walls.

It won't be ready next week, but I am working on a solution that will let us fight back against bad bots and known exploits.

++
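
For reference, a minimal sketch of the first block with the wildcards removed, shortened to a few of the User Agents from the original list. Two points are assumptions on top of what the post says: a pattern that starts with * is not a valid regular expression, so Apache is expected to reject it, and the last RewriteCond should not carry the OR flag, since a trailing OR is a well-known pitfall that can make the rule fire for every request.

```apache
RewriteEngine On

# Patterns are unanchored regular expressions, so a plain word
# already matches anywhere in the User-Agent string.
RewriteCond %{HTTP_USER_AGENT} FrontPage [NC,OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Teleport [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebZip [NC,OR]
RewriteCond %{HTTP_USER_AGENT} lwp [NC,OR]
# Last condition: NC only, no OR.
RewriteCond %{HTTP_USER_AGENT} anonym [NC]

# Answer 403 Forbidden for everything except the 403 page itself.
RewriteRule !^403\.html$ - [F]
```

Because "lwp" already matches as a substring, the separate lwp-* line from the original is not needed in this sketch.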

Postby lavinya » Fri Sep 15, 2006 8:35 am

thanks dcz. ok.

Postby lavinya » Fri Sep 15, 2006 9:40 am

New rule: redirect with a 302 to robotstxt.org. But, for example,

RewriteCond %{HTTP_USER_AGENT} ^HTTrack [NC,OR]

does not block every version of HTTrack. It only blocks old versions, or a User Agent that starts with "httrack".


Code:
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^Alexibot [OR]
RewriteCond %{HTTP_USER_AGENT} ^asterias [OR]
RewriteCond %{HTTP_USER_AGENT} ^BackDoorBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Black.Hole [OR]
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^BlowFish [OR]
RewriteCond %{HTTP_USER_AGENT} ^BotALot [OR]
RewriteCond %{HTTP_USER_AGENT} ^BuiltBotTough [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bullseye [OR]
RewriteCond %{HTTP_USER_AGENT} ^BunnySlippers [OR]
RewriteCond %{HTTP_USER_AGENT} ^Cegbfeieh [OR]
RewriteCond %{HTTP_USER_AGENT} ^CheeseBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^CopyRightCheck [OR]
RewriteCond %{HTTP_USER_AGENT} ^cosmos [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DittoSpyder [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^EroCrawler [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Foobot [OR]
RewriteCond %{HTTP_USER_AGENT} ^FrontPage [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^Harvest [OR]
RewriteCond %{HTTP_USER_AGENT} ^hloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} ^httplib [OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^humanlinks [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InfoNaviRobot [OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^JennyBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Kenjin.Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Keyword.Density [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^LexiBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^libWeb/clsHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkextractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkScan/8.1a.Unix [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^lwp-trivial [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mata.Hari [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIIxpc [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister.PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^moget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/2 [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/3.Mozilla/2.01 [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetMechanic [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^NPBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline.Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^Openfind [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^ProPowerBot/2.14 [OR]
RewriteCond %{HTTP_USER_AGENT} ^ProWebWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^ProWebWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^QueryN.Metasearch [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^RepoMonkey [OR]
RewriteCond %{HTTP_USER_AGENT} ^RMA [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SlySearch [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^SpankBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^spanner [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^suzuran [OR]
RewriteCond %{HTTP_USER_AGENT} ^Szukacz/1.4 [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Telesoft [OR]
RewriteCond %{HTTP_USER_AGENT} ^The.Intraformant [OR]
RewriteCond %{HTTP_USER_AGENT} ^TheNomad [OR]
RewriteCond %{HTTP_USER_AGENT} ^TightTwatBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Titan [OR]
RewriteCond %{HTTP_USER_AGENT} ^toCrawl/UrlDispatcher [OR]
RewriteCond %{HTTP_USER_AGENT} ^toCrawl/UrlDispatcher [OR]
RewriteCond %{HTTP_USER_AGENT} ^True_Robot [OR]
RewriteCond %{HTTP_USER_AGENT} ^turingos [OR]
RewriteCond %{HTTP_USER_AGENT} ^TurnitinBot/1.5 [OR]
RewriteCond %{HTTP_USER_AGENT} ^URLy.Warning [OR]
RewriteCond %{HTTP_USER_AGENT} ^VCI [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebBandit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEnhancer [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web.Image.Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebmasterWorldForumBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website.Quester [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]
RewriteCond %{HTTP_USER_AGENT} ^Webster.Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZip [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WWW-Collector-E [OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xenu's [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^(.*)$ http://www.robotstxt.org/

Postby dcz » Fri Sep 15, 2006 6:28 pm

Actually, only
Code:
RewriteCond %{HTTP_USER_AGENT} WWWOFFLE [OR]


is needed. There is no need to use the ^ anchor, as the test string could be anywhere in the UA string.
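
As a rough sketch of how the redirect variant could look with unanchored patterns, here it is cut down to two User Agents from the list. The explicit [R=302,L] flags and the catch-all ^ rule pattern are choices made for this example, not something taken from the posts above.

```apache
RewriteEngine On

# No ^ anchor: the name may appear anywhere in the User-Agent string,
# so HTTrack is caught even when the UA starts with "Mozilla/...".
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
# Last condition: no OR flag.
RewriteCond %{HTTP_USER_AGENT} WWWOFFLE [NC]

# Send matching clients to robotstxt.org with a temporary redirect.
RewriteRule ^ http://www.robotstxt.org/ [R=302,L]
```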

++

Postby lavinya » Fri Sep 15, 2006 7:28 pm

Hello, thanks dcz.

dcz, I posted the code without knowing whether it is correct. Can you give me an example of that? One line is enough. Thanks.

Postby dcz » Fri Sep 15, 2006 8:10 pm

oki doki ;)

Postby linus » Mon Dec 04, 2006 6:26 pm

GOOD :shock: :wink:

Postby arch stanton » Thu Feb 01, 2007 12:57 pm

Does it matter where in the .htaccess file you put this script?

At the moment, I have an anti-hotlink script, a mod rewrite script and Google sitemaps rewrite script.

Is it better to put the bad bots blocker above or below these, or does it make no difference?

Also, I would suggest adding ConveraCrawler to the list...

Postby dcz » Thu Feb 01, 2007 11:50 pm

You should put this kind of code before any other RewriteRule. There is no need to do more work when we are going to deny access anyway, and a matching RewriteRule with the [L] flag would stop mod_rewrite processing before it ever reaches the deny rule.
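
To illustrate that ordering, here is a sketch of an .htaccess layout with the bot block first. The hotlink and sitemap rules are hypothetical placeholders (example.com and sitemap.php are made-up names), standing in for whatever rules the file already has.

```apache
RewriteEngine On

# 1. Bad-bot block first: denied requests stop here.
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ConveraCrawler [NC]
RewriteRule ^ - [F]

# 2. Hypothetical anti-hotlink rule (example.com is a placeholder).
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com/ [NC]
RewriteRule \.(gif|jpe?g|png)$ - [F]

# 3. Hypothetical sitemap rewrite; its [L] flag ends processing,
#    which is why the bot block must not come after it.
RewriteRule ^sitemap\.xml$ sitemap.php [L]
```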

++

what to use?

Postby himanshu » Thu Sep 25, 2008 11:37 am

I'm pretty confused about whether to use
Code:
RewriteCond %{HTTP_USER_AGENT} WWWOFFLE [OR]

or
Code:
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]


or
Code:
RewriteCond %{HTTP_USER_AGENT} *anonym* [NC,OR]


In a nutshell: should I use *, ^, or nothing?
Can anybody explain the purpose of each?

Postby dcz » Sat Sep 27, 2008 12:18 pm

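In an .htaccess file, the * wildcard does not mean "match anything". The patterns are regular expressions, where * means "repeat the preceding element zero or more times".

Using :
Code: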
RewriteCond %{HTTP_USER_AGENT} WWWOFFLE [NC,OR]


will be enough to match WWWOFFLE and wwwoffle anywhere in the UA string, adding ^ :
Code:
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [NC,OR]


would mean that the UA string must start with WWWOFFLE or wwwoffle, and :
Code:
RewriteCond %{HTTP_USER_AGENT} WWWOFFLE$ [NC,OR]

would mean to match a string ending with WWWOFFLE or wwwoffle.

So using :
Code:
RewriteCond %{HTTP_USER_AGENT} WWWOFFLE [NC,OR]


is the form that matches the most cases.
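
The three forms can be compared against sample User-Agent strings, as in the sketch below. Both sample strings are assumptions chosen for illustration (the WWWOFFLE version number is made up), and the expected matches are worked out in the comments.

```apache
# Sample UA "WWWOFFLE/2.9" (hypothetical version):
#   WWWOFFLE    matches   (found as a substring)
#   ^WWWOFFLE   matches   (the UA starts with it)
#   WWWOFFLE$   no match  (the UA ends with "/2.9")
#
# Sample UA "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)":
#   HTTrack     matches   (found as a substring)
#   ^HTTrack    no match  (the UA starts with "Mozilla")

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} WWWOFFLE [NC,OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC]
RewriteRule ^ - [F]
```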

++