Jun 4, 2009
I’ve been doing a bit of site scraping using cURL and PHP lately. I’ve found that most sites will ban your IP if they think you’re a bot (good thing I’m on DSL), so you need to make them think that your script is a browser. The easiest way to do this is to send a User-Agent header with your request. Here is an example of getting a results page from Google for a specific search query.
function get_google_result($search_term)
{
    $ch = curl_init();
    $url = 'http://www.google.ca/search?hl=en&safe=off&q='
        . urlencode($search_term);
    // Pretend to be Firefox 2 on Windows XP.
    $useragent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1)"
        . " Gecko/20061204 Firefox/2.0.0.1";
    curl_setopt($ch, CURLOPT_URL, $url);
    // Include the response headers in the returned string.
    curl_setopt($ch, CURLOPT_HEADER, 1);
    curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
    // Return the page as a string instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $google_string = curl_exec($ch);
    curl_close($ch);
    // utf8_encode() assumes the response is ISO-8859-1.
    $google_string = utf8_encode($google_string);
    return $google_string;
}
This function takes the search term that was provided and requests the results for it from Google. Our User-Agent header makes the script look like Firefox.
I put the returned string (the entire Google results page, response headers included) into the $google_string variable and return it so the caller can parse out what is needed. My newest scraping experiment is a site called Quick Content, which scrapes some results from Google based on parameters from Google Hot Trends and posts it all as a feed. It was fun to code up but it is in desperate need of a makeover.
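For comparison, here is a rough sketch of the same idea in JavaScript. It is not part of the original script: buildSearchRequest and getGoogleResult are hypothetical names, and the sketch assumes a runtime with a global fetch() such as Node.js 18 or later (browsers generally do not let scripts choose their own User-Agent). Note that encodeURIComponent encodes a space as %20 where PHP’s urlencode uses +.

```javascript
// Hypothetical helper: builds the URL and headers for the search request.
function buildSearchRequest(searchTerm) {
  const userAgent =
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1)" +
    " Gecko/20061204 Firefox/2.0.0.1";
  return {
    url: "http://www.google.ca/search?hl=en&safe=off&q=" +
      encodeURIComponent(searchTerm),
    headers: { "User-Agent": userAgent }
  };
}

// Assumption: a global fetch() exists (for example Node.js 18 or later).
// Defined here for illustration only; it is not called in this sketch.
async function getGoogleResult(searchTerm) {
  const request = buildSearchRequest(searchTerm);
  const response = await fetch(request.url, { headers: request.headers });
  return response.text();
}

const example = buildSearchRequest("cody taylor");
console.log(example.url);
```

Keeping the request building separate from the network call makes it easy to check the URL and header without actually hitting the site.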
Comments Off on Make Web Sites Think that PHP CURL is a Browser. | tags: agent, google, internet, php, programming, scrape, user, web | posted in programming
Jun 3, 2009
I used JSON for the first time today and it’s really nothing special. I’ve heard about it a few times but never really given it much thought, and there’s no reason I should have. Here is a list of all you really need to know about this syntax for passing name/value pairs and arrays around to JavaScript.
- JSON is an acronym for ‘JavaScript Object Notation’.
- JSON is fast, mostly because its syntax is based on JavaScript’s own object literal syntax, so the browser can read it with very little processing overhead.
- A JSON object is an unordered collection of name/value pairs, and a JSON array is an ordered list of values.
- JSON is so much easier to read and write than XML due to its simplicity.
- Apparently (untested by me), if the data is returned as JSON wrapped in a function call and loaded through a script tag (the trick usually called JSONP), it can travel across domains where a normal Ajax request cannot.
- Almost every language used in web development already has a JSON library or set of functions. If one doesn’t, then writing the functions is a fairly small task.
- ‘var jsonObject = { 'cody' : 'taylor' };’ is referenced by ‘jsonObject.cody’ which gives us ‘taylor’. (This is a JavaScript object literal; JSON text sent over the wire needs double quotes around names and strings.)
- ‘var jsonObject = { 'javascript' : { 'json' : 'not xml' } };’ is referenced by ‘jsonObject.javascript.json’ which gives us ‘not xml’.
- You can also reference the object as if it were an associative array, like ‘jsonObject['cody']’ or ‘jsonObject.javascript['json']’, which gives the same values as before.
- If you don’t want to use key/value pairs you can define a normal data array. ‘jsonObject = { 'arrayOfData' : { 'numbers' : ['1', '2', '3'] } };’ We use indexes for this dataset. Don’t use indexes for the collections defined in the ‘{ }’. ‘jsonObject.arrayOfData.numbers[1]’ will give us ‘2’.
- You can put functions in a JavaScript object literal to pass around executable code, but strict JSON has no function type, so they will not survive being sent as JSON text.
So now you know basically all there is to know about JSON.
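The points in the list can be tried out with a short sketch like the one below. The greet property is a made-up example used only to show what happens to a function when the object is turned into JSON text.

```javascript
// A JavaScript object literal: single quotes and functions are allowed here.
const jsonObject = {
  cody: 'taylor',
  javascript: { json: 'not xml' },
  arrayOfData: { numbers: ['1', '2', '3'] },
  greet: function () { return 'hello'; } // hypothetical, for illustration
};

// Dot access and bracket access reach the same values.
const viaDot = jsonObject.javascript.json;
const viaBracket = jsonObject.javascript['json'];

// Arrays are read by index, starting at 0.
const secondNumber = jsonObject.arrayOfData.numbers[1];

// Converting to JSON text: names and strings become double quoted,
// and the function property is dropped.
const jsonText = JSON.stringify(jsonObject);

// Parsing the text gives back a plain object without the function.
const roundTripped = JSON.parse(jsonText);

console.log(viaDot, viaBracket, secondNumber);
console.log(jsonText);
console.log(typeof roundTripped.greet);
```

The last line prints undefined, which is why functions only work when you pass the object around inside JavaScript itself.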
4 comments | tags: Ajax, internet, javascript, JSON, web | posted in programming, reference
Jun 2, 2009
So it seems that most of the popular web browsers will allow you to write JavaScript where you normally put the URL. I thought this was kinda neat.
If you put something like ‘javascript:alert("it works");’ into the URL field of your browser you should get a popup that says ‘it works’. This can be used to change any attribute on whatever page you are currently looking at. All you need to do is browse to the desired page and put something like the following into the URL bar in your browser.
javascript:document.getElementById("title").innerHTML="Fake Title";
Of course this will only work on pages that contain an element with the id of title (which many sites have).
I did have some issues with this depending on the browser used. Some browsers, like Firefox 3.0.10 on Ubuntu Linux, would just show me a blank page with Fake Title on it. This was solved by wrapping the JavaScript statements in an anonymous function in the URL line.
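A likely explanation for the blank page is that the browser displays whatever value the javascript: code evaluates to, and an assignment evaluates to the assigned string. Wrapping the statement in an anonymous function that returns nothing avoids that. In the URL bar the wrapped version would look something like javascript:(function(){document.getElementById("title").innerHTML="Fake Title";})(); and the sketch below shows the difference outside a browser, using a plain object (fakeTitle, a made-up stand-in) in place of the page element.

```javascript
// Stand-in for a page element so the sketch runs outside a browser.
const fakeTitle = { innerHTML: "Real Title" };

// The bare statement: the value of the assignment is the string itself.
const bareResult = eval('fakeTitle.innerHTML = "Fake Title";');

// The same statement wrapped in an anonymous function that returns nothing.
const wrappedResult = eval(
  '(function () { fakeTitle.innerHTML = "Wrapped Title"; })();'
);

console.log(typeof bareResult);    // string
console.log(typeof wrappedResult); // undefined
console.log(fakeTitle.innerHTML);  // Wrapped Title
```

Both versions change the element, but only the bare one leaves a string behind for the browser to show.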
This is all good and fun for small little tweaks but what if you wanted to do something a little more… crafty.
It is possible to include an external javascript file and alter the page so that you can execute your own little script from a specific domain. Have you ever been playing around with some ajax and gotten an ‘Access to restricted URI denied (NS_ERROR_DOM_BAD_URI)’ error message? Well the method outlined below would allow you to put some proper ajax code into the site that isn’t allowing your requests.
While I’m not going to explain explicitly how to do anything fun I will show you how to include an external javascript file into a page in your browser that will allow you to call functions from that file on the page. Even though this will include the script you will still have to call a function from the script in the url bar. The code in the script will not execute by itself just because you included it.
javascript:{{ var e=document.createElement("script"); alert("here");e.src = "http://quick-content.com/include_me.js";e.type="text/javascript"; document.getElementsByTagName("head")[0].appendChild(e);} test();}
The snippet above should all be one line. If you paste that long one-liner into the URL bar of your preferred browser, the JavaScript file from the specified URL (http://quick-content.com/include_me.js) will be included into the HTML. At the very end of that statement I call the test() function, which just changes the title of my blog to ‘Tech Stuff From Null’ as opposed to ‘Tech Stuff From Cody Taylor’. Try it out. Paste that code into your browser as you’re looking at my blog and check the title. This will work for any js file, so have fun and don’t break anything.
6 comments | tags: browsers, fun, javascript, programming, web | posted in programming
Jun 1, 2009
With the unveiling of Microsoft’s new search engine Bing I was curious to see which site uses more bandwidth per page load and query. These results are taken from the Net feature of the Firefox plugin Firebug. This plugin will tell you which files were downloaded, how big they were, and how long they took. It’s a great way to see how fast your site is going to be.
Google
Main page: 20 KB
Search for Cody Taylor: 38 KB
Search for asdf asdf: 10 KB
The search for Cody Taylor was so large because of images displayed at the top of the pages.
Bing
Main page: 109 KB
Search for Cody Taylor: 22 KB
Bing’s main page is over 5 times the data of Google’s. The search for Cody Taylor on Bing didn’t show any pictures but was still twice the size of a normal Google search. If we take into account the new Ajax onMouseOver event on Bing for each search result it becomes 27 KB, still without any images. Remember those articles that went off about how much energy every Google query uses? Looks like Bing more than triples that amount.
Of course both these sites are much more efficient when we take into account that after the first visit most of the large data is already cached in our browser. For this test I was clearing my cache between each transaction.
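The comparisons above come down to a few divisions. A quick sketch using only the numbers measured in this post (the variable names are mine):

```javascript
// Sizes in KB, copied from the Firebug measurements in this post.
const google = { mainPage: 20, plainSearch: 10 };
const bing = { mainPage: 109, search: 22, searchWithHover: 27 };

// How many times larger Bing's download is in each comparison.
const mainPageRatio = bing.mainPage / google.mainPage;
const searchRatio = bing.search / google.plainSearch;
const hoverRatio = bing.searchWithHover / google.plainSearch;

console.log(mainPageRatio); // 5.45
console.log(searchRatio);   // 2.2
console.log(hoverRatio);    // 2.7
```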
Bing does have quite a few redeeming features, and on first impression it looks like it may be a serious contender, but it still lacks the simplicity and speed of Google.
7 comments | tags: bandwidth, bing, comparison, google | posted in reviews, technical news
May 31, 2009
I couldn’t find an easy-to-follow example of this anywhere on the web so I wrote my own. Basically I wanted a list where each item has an up and a down button that will dynamically reorder the list. I tried swapping the nodes themselves but it didn’t work, so it seems that I have to use the removeChild and insertBefore methods to actually see the results on the page.
Here is the Javascript DOM Example
<ul id='list' name='list'>
</ul>
<script type='text/javascript'>
//swap the current li node with the one above it
function up(button_node)
{
    var current_node = button_node.parentNode;
    var list_node = document.getElementById('list');
    var list_children = list_node.childNodes;
    var temp_node;
    var previous_node;
    for(var i=0;i<list_children.length;i++)
    {
        if(current_node.id == list_children[i].id && previous_node)
        {
            //re-insert a copy of the current li before the previous li
            temp_node = list_children[i].cloneNode(true);
            list_node.removeChild(list_children[i]);
            list_node.insertBefore(temp_node,previous_node);
            break;
        }
        //remember the last li seen (childNodes can also contain text nodes)
        if(list_children[i].tagName=='LI')
            previous_node=list_children[i];
    }
}
//swap the chosen li node with the one below it
function down(button_node)
{
    var current_node = button_node.parentNode;
    var list_node = document.getElementById('list');
    var list_children = list_node.childNodes;
    var temp_node;
    var previous_node = null;
    for(var i=0;i<list_children.length;i++)
    {
        if(current_node.id == list_children[i].id)
        {
            //found the chosen li; the next li will be moved above it
            previous_node=list_children[i];
        }
        else if(previous_node && list_children[i].tagName=='LI')
        {
            temp_node = list_children[i].cloneNode(true);
            list_node.removeChild(list_children[i]);
            list_node.insertBefore(temp_node,previous_node);
            break;
        }
    }
}
//create the li elements
function create_list()
{
    for(var i=0;i<6;i++)
    {
        //give every li and button its own id
        add_li('list', "item_"+i, " Item "+(i+1)+" " +
            "<input type='button' id='up_button"+i+"' value='Up' onclick='up(this);'>" +
            "<input type='button' id='down_button"+i+"' value='Down' onclick='down(this);'>");
    }
}
//add a li element
function add_li(list_id, id, text) {
    var list = document.getElementById(list_id);
    var li = document.createElement("li");
    li.innerHTML = text;
    li.id=id;
    list.appendChild(li);
}
create_list();
</script>
So it is a little longer than I expected but it works and it’s pretty easy to understand.
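If it helps to see the reordering without the DOM in the way, here is a small sketch of the same remove-then-insert-before idea on a plain array. The moveUp and moveDown names are made up for this sketch and are not part of the example above.

```javascript
// Move the item at `index` one position up; returns a new array.
function moveUp(items, index) {
  // Nothing to do for the first item or an out-of-range index.
  if (index <= 0 || index >= items.length) return items.slice();
  const copy = items.slice();
  const removed = copy.splice(index, 1)[0]; // plays the role of removeChild
  copy.splice(index - 1, 0, removed);       // plays the role of insertBefore
  return copy;
}

// Moving an item down is the same as moving the item after it up.
function moveDown(items, index) {
  if (index < 0 || index >= items.length - 1) return items.slice();
  return moveUp(items, index + 1);
}

const list = ["Item 1", "Item 2", "Item 3"];
console.log(moveUp(list, 1));   // [ 'Item 2', 'Item 1', 'Item 3' ]
console.log(moveDown(list, 1)); // [ 'Item 1', 'Item 3', 'Item 2' ]
```

The down() function in the page works the same way: it finds the chosen item and then moves the next li above it.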
Comments Off on Javascript DOM List Swap Nodes Example | tags: dom, javascript, nodes, programming, web | posted in programming
May 30, 2009
I tried a ‘tail settings.pyc’ in a new Django project and I guess pyc means compiled? (It does: a .pyc file holds Python’s compiled bytecode, so it is binary rather than text.)
Kinda neat what it did to my terminal though. Those lines ending in the dollar symbol are actually me hitting enter, that is my prompt.
1 comment | tags: problem, terminal, ubuntu
May 30, 2009
Django is a high-level Python web framework that I’ve been hearing a lot about lately. I decided to try it out this weekend.
It took a little reading to get up and running, so I documented the steps I took so others can get going a little more quickly.
sudo apt-get install python-django
django-admin --version
Create the appropriate directories and start the django project.
For no real reason I decided to put my new projects in /var/django.
sudo mkdir /var/django
sudo chmod 755 /var/django
cd /var/django
django-admin startproject mysite
cd mysite
mkdir apache
It seems that mod_wsgi is the best way to serve up Python web apps due to mod_python being a little outdated. mod_wsgi is an Apache module which can be used to host Python applications. Note that Django does come with a built-in web server that is really easy to get going, so if you’re just planning on evaluating the framework and not actually building any production applications then that would be the way to go.
Install the module:
sudo apt-get install libapache2-mod-wsgi
Now for the configuration. It took me a few tries to get this right.
For now I just put my django site on port 8080 so I can play around without it being public to anyone else.
In the /var/django/mysite/apache directory I created a file called django.wsgi and put this in it:
import os, sys
apache_configuration = os.path.dirname(__file__)
project = os.path.dirname(apache_configuration)
sys.path.append('/var/django')
os.environ['DJANGO_SETTINGS_MODULE'] = 'mysite.settings'
import django.core.handlers.wsgi
application = django.core.handlers.wsgi.WSGIHandler()
Make sure that the sys.path.append line contains the directory above the project directory. This one took me awhile to figure out.
I was getting this error until I got it right.
[Sat May 30 15:33:24 2009] [error] [client 127.0.0.1] raise ImportError, "Could not import settings '%s' (Is it on sys.path? Does it have syntax errors?): %s" % (self.SETTINGS_MODULE, e)
[Sat May 30 15:33:24 2009] [error] [client 127.0.0.1] ImportError: Could not import settings 'mysite.settings' (Is it on sys.path? Does it have syntax errors?): No module named mysite.settings
Next I added a new virtual host in my sites-enabled folder like this:
<VirtualHost *:8080>
    ServerAdmin root@mysite.com
    ServerName mysite.com
    ServerAlias mysite.com
    <Directory /var/django/mysite/apache>
        Allow from all
    </Directory>
    WSGIDaemonProcess www-data
    WSGIProcessGroup www-data
    WSGIScriptAlias / /var/django/mysite/apache/django.wsgi
</VirtualHost>
After restarting apache :
sudo /etc/init.d/apache2 restart
I got an “It worked!” page when I pointed my browser to http://localhost:8080/
Of course this doesn’t allow you to use a database in your applications yet but it’s a start.
1 comment | tags: apache, django, framework, programming, python, web, wsgi | posted in programming, reference
May 28, 2009
Firstly check out the effect example. Hover over the list in the middle to see the effect. This is called the fisheye effect in dojo and it’s one of my favorites because it’s the easiest to implement. It’s ridiculously easy to implement. I’m going to assume that you know what a basic html page looks like, so in a quick three parts here’s the code:
First the Javascript:
<script type="text/javascript" src="includes/dojo/dojo/dojo.js" djConfig="parseOnLoad: true"></script>
<script type="text/javascript">
    dojo.require("dojox.widget.FisheyeLite");
    dojo.addOnLoad(function(){
        //make the list items into fisheye items
        dojo.query("li.fish", dojo.byId("fishyList")).forEach(function(n){
            new dojox.widget.FisheyeLite({ },n);
        });
    });
</script>
Next the CSS:
<style type="text/css">
    @import "includes/dojo/dojo/resources/dojo.css";
    @import "css/dojo_lists.css";
    /* fishyList is the id of the ul itself */
    #fishyList {
        width:600px;
        list-style-type:none;
    }
    .fisheyeTarget {
        font-weight:bold;
        font-size:19px;
    }
</style>
None of the CSS really matters as far as functionality goes, but it does go a long way toward making it look a lot slicker.
Now the HTML:
<ul id="fishyList">
    <li class="fish"><span class="fisheyeTarget">Cody Taylor’s<br><br></span></li>
    <li class="fish"><span class="fisheyeTarget">Dojo Javascript <br><br></span></li>
    <li class="fish"><span class="fisheyeTarget">Slick FishEye Effect<br><br></span></li>
</ul>
Of course you’ll have to download and put the dojo framework on your webserver and alter the include statements but other than that it’s that simple. You may want to look at the dojo documentation for the dojo.query and forEach functions.
3 comments | tags: dojo, dojo fisheye example, dojo programming, fisheye, programming | posted in programming, reference
May 27, 2009
Q2 is basically April, May, and June, and the PHP 5.3 timeline says the stable release is due in Q2. I’ve been reading mostly good things about the experimental release, and considering all the new features being added every PHP developer should be stoked. I’m not just talking about the new closures/lambda functions, which are gonna make my life a lot more fun. Here’s a list of all the fun stuff that I’ll be playing with next month:
- Namespace support. It’s about time.
- Late static binding. Somewhere between static and dynamic binding.
- Jump label (limited goto)
- Native closures. Simple javascript-like lambda functions. None of that create_function stuff anymore.
- PHP archives (phar). Not sure why?
- Garbage collection for circular references.
- Ternary shortcut. Still haven’t seen an example of what this is?
- Internationalization extension.
I’ve always been told that gotos are bad programming practice but sometimes they can be very, very useful. Hopefully they get all the bugs out in the experimental versions and I can start using 5.3 at work shortly after release.
4 comments | tags: new features, php, php release | posted in programming, reviews
May 26, 2009
I’ve recently been playing with anonymous (lambda) functions in javascript and I was thinking that it would be great if I could do the same thing in php, which is what I spend most of my time with. Turns out that you can. It’s not as clean and pretty as with javascript but it’s still a lambda function and functional programmers are better…aren’t they?
First the spec :
string create_function ( string $args , string $code )
So here’s a little example of something completely useless that can be done using anonymous functions in PHP :
$first_arg = 100;
$some_function = create_function('$first_arg','return $first_arg/5;');
function process($arg,$func)
{
    while($func($arg)>5)
    {
        $arg--;
    }
    return $arg;
}
echo process($first_arg,$some_function);
So we get 25 echoed to the screen, because 25 is the first value where $arg/5 is no longer greater than 5. A bunch of better examples can be found at the PHP website. If you’re willing to read…
Nothing really special, but it’s still cool that you can pass functions into another PHP function just like a regular variable.
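For comparison, here is roughly what the same example looks like in JavaScript, where anonymous functions are part of the language. This is just my PHP example translated, with the names changed to JavaScript style.

```javascript
// Anonymous function stored in a variable, like $some_function in the PHP version.
const someFunction = function (firstArg) {
  return firstArg / 5;
};

// Decrement arg until func(arg) is no longer greater than 5.
function process(arg, func) {
  while (func(arg) > 5) {
    arg--;
  }
  return arg;
}

const result = process(100, someFunction);
console.log(result); // 25
```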
11 comments | tags: anonymous, anonymous function php, anonymous php, lambda, lambda php, php | posted in programming, reference