Jun 14 2009

Detect Bots By Parsing The User Agent With PHP

Because I’ve been starting to keep a closer eye on my traffic I’ve been logging everything to my databases. When writing reports for this information I noticed that there was huge amounts of traffic from bots. Yahoo, Google, MSN and many others have been hammering my sites quite a lot lately. Since I’m writing my reports in PHP I needed a quick little function to identify whether the visitor was real traffic or some machine scraping my site. I couldn’t find one after a few quick Google queries but the solution was trivial so I wrote my own. Hopefully this will save someone a few minutes. The function is as simple as possible and it seems to be working so far. I’ve been watching it for awhile and it hasn’t missed any yet. I’m assuming that it’s not going to catch everything but it would be nice to get most. Any bot user agent string suggestions would be helpful.

//returns 1 if the user agent is a bot
function is_bot($user_agent)
  //if no user agent is supplied then assume it's a bot
  if($user_agent == "")
    return 1;

  //array of bot strings to check for
  $bot_strings = Array(  "google",     "bot",
            "yahoo",     "spider",
            "archiver",   "curl",
            "python",     "nambu",
            "twitt",     "perl",
            "sphere",     "PEAR",
            "java",     "wordpress",
            "radian",     "crawl",
            "yandex",     "eventbox",
            "monitor",   "mechanize",
  foreach($bot_strings as $bot)
    if(strpos($user_agent,$bot) !== false)
    { return 1; }
  return 0;


Jun 10 2009

Match An IP Address To A Physical Location With PHP

Sometimes it’s useful to locate where a certain IP address is located. Usually this isn’t really that accurate but at least it gives you a general idea for stats or marketing research. I’ve written a simple php function that uses curl to query a IP address location database for the location info. In the function I specify that I want a JSON data structure containing the location information. I chose JSON over the XML alternative just for ease of use. The object that we get returned needs to be referenced in a specific kind of way. I show how under the function.
This function uses the IPInfoDB JSON php API. There are quite a few alternatives out there so feel free to do your research. The IPInfoDB PHP API returns :

  • Country Code (CA or US)
  • Country Name (Canada)
  • Region Code
  • Region Name (Province or Dtate)
  • City
  • Postalcode or Zip Code (which doesn’t always work)
  • Latitude
  • Longitude
  • gmt offset
  • Dst offset

function get_location_info($ip_address)
  $url = 'http://ipinfodb.com/ip_query.php?ip=';
  $url .= $ip_address;
        $url .= '&output=json';
  $ch = curl_init();
  curl_setopt($ch, CURLOPT_URL, $url);
  curl_setopt($ch, CURLOPT_HEADER, 0);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
  $json_packet = curl_exec($ch);
  $location_array = json_decode($json_packet);


$loc_arr = get_location_info("");
echo "\n";
echo $loc_arr->{'City'};
echo "\n";
echo $loc_arr->{'RegionName'};
echo "\n";
echo $loc_arr->{'CountryCode'};

Hopefully theres no confusion there.


Jun 9 2009

Adding Sack Ajax To Your WordPress Plugin’s Admin Page

Using my previous wordpress example plugin I’m going to demonstrate how to use ajax within the admin panel. WordPress uses the Simple AJAX Code-Kit (SACK) which is relatively easy to use and understand.

For this example I’m going to expand on my previous example and add a simple javascript function that queries some data from the server. WordPress forces us to do this in a roundabout way. First we have to add two new function hooks. One prints my javascript function in the scripts section of the admin panel and the other is the code that gets called by the ajax sack request.

//add my custom ajax function to the scripts section in the admin panel
add_action('admin_print_scripts', 'ajax_request');

//add data returning ajax refresh table function
add_action('wp_ajax_do_something', 'get_random_number');

The ajax_request function is a shell function for the javascript ajax call get_random_number_from_server which is our sack ajax request. Note that we are calling the admin-ajax.php script. This script will handle the calling of the get_random_number function for us because of the action defined above.

//This function will print out in the header section. 
//Put all your javascript in this function.
function ajax_request()
        //print out the sack ajax library
        wp_print_scripts( array( 'sack' ));  
  <script type="text/javascript">
  function get_random_number_from_server()
    //creates the sack object and 
                //gives it the url that it should request to.
    var mysack = new sack( '<?php 
                bloginfo( "wpurl" ); ?>/wp-admin/admin-ajax.php' );    
    mysack.execute = 1;   //execute whatever is returned
    mysack.method = 'POST';
    //Set POST fields
    mysack.setVar( "action", "do_something" );
    mysack.encVar( "cookie", document.cookie, false );
    mysack.onError = function() { alert('Ajax Error')};
    mysack.runAJAX();  //run the result

    return true;

  } // end of JavaScript function myplugin_ajax_elevation

This is the php function that is called by the javascript function above. It spits out a alert javascript function and then dies. Not sure why but it is recommended to die in this function.

//This is the server side code for the ajax sack request
function get_random_number()
  $minimum_number = 0;
  $maximum_number = 100;

This code can now be run like the following in your plugins admin page.

<input type='button' value='Get Random Number' 

When the button is hit a popup with a random number in it will pop up. Here’s a link to the entire example plugin code.


Jun 7 2009

Simple WordPress Example Plugin

I wrote a plugin for wordpress this weekend and all the documentation and guides were fairly long winded. This is probably a good thing for most people but I never like large amounts of reading for something simple. So the following outlines and explains a very short and simple plugin for wordpress.

WordPress uses hooks to give plugins certain places to execute code. Hooks are created by using the add_action function where the first parameter is the desired, predefined hook and the second is the name of the function that you want to be called at that point. Check out the list of hooks.

In this example I first hook the install function which will be run when the plugin is activated. The second hook defines a function that will be called when a page is loaded and the footer code is executed. The third hook is for the admin menu. This one is a bit of a two step hook. For some reason wordpress makes you call a second function on top of the add_action function in order to get an entry in the admin panel. So the add_options_page is used for just that. It creates a page under the settings header called ‘WP Example’. the third parameter is the name of the function that creates the admin options page.

The commented out name and example at the top of the script is actually used by wordpress when you are activating and deactivating the plugin.

Here’s the code:


/* Plugin Name: Example Plugin
 * Description: Simple Example Plugin
 *  */

//hook the installation function to the activation plugin

//run the alert_user function in the wp_footer 
add_action( 'wp_footer', 'alert_user', 20);

//create a admin page 
add_action('admin_menu', 'my_example_admin_menu');

//make the example_options function is the admin option page
function my_example_admin_menu() {
  add_options_page(  'WP Example Options', 
            'WP Example', 

//This code is run on the admin options page for this plugin
function example_options() {
    echo "<div>This Simple Example Plugin Welcomes Every Visitor</div>";

//This funtion is called in the wp_footer and welcomes every visitor
function alert_user()
    echo "<script type='text/javascript'>".
                  "alert('Welcome to My Wordpress Blog');".

//This function is run when we activate the plugin
//It is where we would put all our create table and option code
//Since this is a simple example we won't do anything here
function install_example() 


If you put all that in a .php file and upload it to your wp-content/plugins directory you can activate it, deactivate it, view the options page and it will alert every user that comes to your site.


Jun 6 2009

Get Site Visitor Information With PHP

I wanted to do a few custom things with one of my site’s stats. PHP made this nice and simple with the $_SERVER array. When someone visits one of your scripts the $_SERVER array is automatically filled so as long as you know the keys then you can get all the information needed. These are the most useful items for identifying a visitor that I found in this using this array.

$_SERVER[‘REMOTE_ADDR’] gives you their ip address.
$_SERVER[‘HTTP_USER_AGENT’] is their user-agent (What browser they are using).
$_SERVER[‘HTTP_REFERER’] is where they came from.
$_SERVER[‘REQUEST_URI’] is the page they want to view.

This data is kinda useless unless you put it in a database so here’s a create table mySQL statement.

$sql = "CREATE TABLE visitors (
      `visitor_id` mediumint(11) NOT NULL AUTO_INCREMENT,
      `ip_address` VARCHAR(16)  NOT NULL,
      `user_agent` varchar(100) NOT NULL default '',
      `referrer` varchar(256) NOT NULL default '',
      `target_url` VARCHAR(256) NOT NULL,
      PRIMARY KEY  (`visitor_id`)      

If you’d like to explore the $_SERVER array more there is very detailed info at the php web site.


Jun 4 2009

Make Web Sites Think that PHP CURL is a Browser.

I’ve been doing a bit of site scraping using curl and PHP lately. I’ve found that most sites will ban your ip if they think you’re a bot (good thing I’m on DSL) so you need to make them think that your script is a browser. The easiest way to do this is to add a user agent header to your script. Here is an example of getting a results page from google for a specific search query.

function get_google_result($search_term)
  $ch = curl_init();
  $url = 'http://www.google.ca/search?hl=en&safe=off&q='
  $useragent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:".
                            " Gecko/20061204 Firefox/";
  curl_setopt($ch, CURLOPT_URL, $url);
  curl_setopt($ch, CURLOPT_HEADER, 1);
  curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
  $google_string = curl_exec($ch);
  $google_string = utf8_encode($google_string);
  return $google_string;

So this function will take the search term that was provided and request the results from google for it. Our user agent header makes the script look like firefox.
I put the returned string (The entire google results page) into the google_string variable and return it for parsing out what is needed. My newest scrape experiment is a site called Quick Content and basically scrapes some results from google based on some parameters from google hot trends and posts it all as a feed. It was fun to code up but it is desperate need of a makeover.


Jun 3 2009

10 Things To Know About JSON (JSON Javascript Examples)

I used JSON for the first time today and it’s really nothing special. I’ve heard about it a few times but never really given it much thought and theres no reason I should have. Here is the list of all you really need to know about this syntax for passing around name value pairs and arrays to javascript.

  1. JSON is an acronym for ‘Javascript Object Notation’.
  2. JSON is fast. Mostly because it is recognized natively by Javascript so there’s no processing overhead.
  3. JSON is an ordered list of name value pairs.
  4. JSON is so much easier to read and write than XML due to it’s simplicity.
  5. Apperently (Untested by me) data is formatted as JSON then Ajax can travel across domains.
  6. Almost every language used in web development either already has a JSON library or set of functions. If one doesn’t, then creating functions is a trivial task.
  7. ‘var jsonObject = { ‘cody : ‘taylor’ };’ is referenced by ‘jsonObject.cody’ which gives us ‘taylor’.
  8. ‘var jsonObect = {‘javascript’ : {‘json’ : ‘not xml’ };’ is referenced by ‘jsonObject.javascript.json’ which gives us ‘not xml’.
  9. You can also reference the JSON object as if it was an associative array like ‘jsonObject[‘cody’]’ or ‘jsonObject.javascript[‘json’]’ which gives the same values as previously.
  10. If you don’t want to use key/value pairs you can define a normal data array. ‘jsonObject = {‘arrayOfData’: {‘numbers’ : [‘1’, ‘2’, ‘3’]}};’ We use indexes for this dataset. Don’t use indexes for the collections defined in the ‘{ }’. ‘jsonObject.arrayOfData.numbers[1]’ will give us ‘2’.
  11. You can put functions in the dataset to pass around executable code.

So now you know basically all there is to know about JSON.


Jun 2 2009

Execute Javascript in URL Bar of Browser

So it seems that most of the popular web browsers will allow you to write javascript where you normally put the url address. I thought this was kinda neat.

If you put something like ‘javascript:alert(“it works”);’ into the url field of your browser you should get a popup that says ‘it works’. This can be used to change any attribute on whatever page you are currently looking at. All you need to do is browse to the desired page and put something like the following into the url bar in your browser.

javascript:document.getElementById(“title”).innerHTML=”Fake Title”;

Of course this will only work on pages that contain a div or p element with the id of title (which most sites have).
I did have some issues with this depending on the browser used. Some browsers, like firefox 3.0.10 on Ubuntu Linux would just show me a blank page with Fake Title on it. This was solved by putting the javascript statements in an anonymous function in the url line.

This is all good and fun for small little tweaks but what if you wanted to do something a little more… crafty.
It is possible to include an external javascript file and alter the page so that you can execute your own little script from a specific domain. Have you ever been playing around with some ajax and gotten an ‘Access to restricted URI denied (NS_ERROR_DOM_BAD_URI)’ error message? Well the method outlined below would allow you to put some proper ajax code into the site that isn’t allowing your requests.

While I’m not going to explain explicitly how to do anything fun I will show you how to include an external javascript file into a page in your browser that will allow you to call functions from that file on the page. Even though this will include the script you will still have to call a function from the script in the url bar. The code in the script will not execute by itself just because you included it.

javascript:{{ var e=document.createElement(“script”); alert(“here”);e.src = “http://quick-content.com/include_me.js”;e.type=”text/javascript”; document.getElementsByTagName(“head”)[0].appendChild(e);} test();}

The snippet above should all be one line. So if you paste that long one liner into the url bar of your preferred browser the javascript file from the url specified (http://quick-content.com/include_me.js) will be included into the html. At the very end of that statement I call the test() function that just changes the title of my blog to ‘Tech Stuff From Null’ as opposed to ‘Tech Stuff From Cody Taylor’. Try it out. Paste that code into your browser as you’re looking at my blog and check the title. This will work for any js file so have fun and don’t break anything.


Jun 1 2009

Bing VS Google Bandwidth Comparison

With the unveiling of Microsofts new search engine Bing I was curious to see which site uses more bandwidth per load and query. These results are taken from the firefox plugin firebug’s Net feature. This plugin will tell you what files were downloaded and how big they were along with how long they took. It’s a great way to see how fast your site is going to be.

Google Main Page 20 KB
Search for Cody Taylor 38 KB
Search for asdf asdf 10 KB

The search for Cody Taylor was so large because of images displayed at the top of the pages.

Bing Main Page 109 KB
Search for Cody Taylor 22 KB

Over 5 times the data of Google. The search for Cody Taylor on Bing didn’t show any pictures but was still twice the size of a normal Google seearch. If we take into account the New Ajax onMouseOver event on Bing for each search result it becomes 27 KB still without any images. Remember those articles that went off about how much energy every google query uses? Looks like Bing more than triples that amount.

Of course both these sites are much more effecient when we take into account that after the first visit most of the large data is already cached in our browser. For this test I was clearing all my cache between each transaction.

Bing does have quite a few redeeming features and for a first impression it looks like it may be a serious contender but it still lacks the simplicity and speed of google.