Jun 14 2009

Detect Bots By Parsing The User Agent With PHP

Because I’ve been starting to keep a closer eye on my traffic I’ve been logging everything to my databases. When writing reports for this information I noticed that there was huge amounts of traffic from bots. Yahoo, Google, MSN and many others have been hammering my sites quite a lot lately. Since I’m writing my reports in PHP I needed a quick little function to identify whether the visitor was real traffic or some machine scraping my site. I couldn’t find one after a few quick Google queries but the solution was trivial so I wrote my own. Hopefully this will save someone a few minutes. The function is as simple as possible and it seems to be working so far. I’ve been watching it for awhile and it hasn’t missed any yet. I’m assuming that it’s not going to catch everything but it would be nice to get most. Any bot user agent string suggestions would be helpful.


//returns 1 if the user agent is a bot
function is_bot($user_agent)
{
  //if no user agent is supplied then assume it's a bot
  if($user_agent == "")
    return 1;

  //array of bot strings to check for
  $bot_strings = Array(  "google",     "bot",
            "yahoo",     "spider",
            "archiver",   "curl",
            "python",     "nambu",
            "twitt",     "perl",
            "sphere",     "PEAR",
            "java",     "wordpress",
            "radian",     "crawl",
            "yandex",     "eventbox",
            "monitor",   "mechanize",
            "facebookexternal"
          );
  foreach($bot_strings as $bot)
  {
    if(strpos($user_agent,$bot) !== false)
    { return 1; }
  }
  
  return 0;
}

Share