Thursday, December 27, 2007

Ubuntu's little upstart

One of my customers has a mix of Debian and Ubuntu servers installed in their network with a treasure trove of distribution versions from Debian GNU/Linux 3.0 (Woody) to Ubuntu 7.10 (Gutsy Gibbon). They have a fair amount of custom debs which need to coexist on these servers.

While installing a new Gutsy box I noticed that several of the in-house debs were failing while trying to modify /etc/inittab (our preferred way to keep things running that should never die). This was quite confusing to me as the last Ubuntu server I worked with was Ubuntu 6.06.1 LTS (Dapper Drake) which did not exhibit this weirdness.

Down the rabbit hole
Imagine my amazement when I tried to find /etc/inittab and it was completely missing! My reality reset, I checked again and it was still missing.

At first I thought some critical package (sysvinit in the versions of Ubuntu I know) was somehow missing after the base install. I jumped on the Ubuntu packages site and did a search for 'sysvinit'. Sure enough, there it was but the file list showed no inittab.

Suddenly I felt like a four-year-old whose mommy had lost them at the Mall.

Further down the spiral
The file list did however list a bunch of items that referred to something called 'upstart-compat-sysv'. Browsing to this package showed the following description:

This package contains compatibility tasks and utilities that emulate the behaviour of the original sysvinit package, including runlevels, and ensures that the initscripts in /etc/rc*.d are still run.

OK. I faintly hear someone announcing that my mother is looking for me at the Mall's security office.

So if upstart-compat-sysv _emulates_ the original sysvinit, then what has _replaced_ it?

Upstart, show thy face
Google provided me with this link that points to the Ubuntu upstart page which went a long way towards uniting me and my mommy:

Upstart is an event-based replacement for the /sbin/init daemon which handles starting of tasks and services during boot, stopping them during shutdown and supervising them while the system is running.

It was originally developed for the Ubuntu distribution, but is intended to be suitable for deployment in all Linux distributions as a replacement for the venerable System-V init.

Sounds even more pervasive than a simple missing inittab!

Features:
  • Tasks and Services are started and stopped by events
  • Events are generated as tasks and services are started and stopped
  • Events may be received from any other process on the system
  • Services may be respawned if they die unexpectedly
  • Bi-directional communication with init daemon to discover which jobs are running, why jobs failed, etc.
To my amazement upstart has been turned on by default since Ubuntu 6.10 (Edgy Eft) which explains why I was totally in the dark (but upstart was seemingly nothing new to everyone that had been following the normal upgrade path).

So, upstart was not only superseding everything we were trying to do with inittab but also changed the way one interacts with scripts via /etc/event.d.

Why reinvent the wheel?
With the Ubuntu decision to move to the 2.6 kernel, and all the hotplug facilities it provides, they were left with several problems in Dapper. The kernel could now completely cope with hardware coming and going, but that had a knock-on effect: there was now no way to guarantee that particular devices were available at a particular point in the boot process.

For example: Dapper cannot mount USB disks in /etc/fstab because it is not guaranteed that the block device exists at the point in the mount process where that happens.

Several other reasons are also provided on the Ubuntu site.

At first, the team decided to look at the available alternatives in third party projects such as Solaris SMF, Apple's launchd, the LSB initserv/chkconfig tools and initNG. None of these met the design criteria that the team had set out for themselves and in true OSS style they decided to roll their own.

The design and implementation documentation is pretty clear so I won't repeat it here.

Epilogue
In the end I simply had to add some logic to the custom debs to check whether the system was running upstart and do The Right Thing(TM) based on that: use upstart if it was available, or fall back to inittab if not. On systems using upstart, that amounted to dropping a relevant file in /etc/event.d/ for every server that we previously ran from inittab.


Friday, October 5, 2007

Func it up with JavaScript

UPDATE: David Pollak has a great introduction to FP via JavaScript and Ruby.

Functional programming paradigms in JavaScript are, in most cases, either non-existent or clumsy, verbose and difficult to read. Oliver Steele (http://osteele.com/) has built an excellent little library that does the grunt work for you when trying to get your func on.

To Func || ! Func
Most people go about their imperative programming days without a thought for functional programming techniques and how they can be applied to their daily problems. For the most part this is due to the inherent difficulties with _doing_ functional programming in their tool of choice.

I urge you to do some further investigation (read Why Functional Programming Matters) into functional programming techniques even if you think you'll never use them anywhere. This is not intended to be an exercise in (academic) pointlessness but to place you outside of your comfort zone and expand your thinking across different domains. The depth of knowledge gained from this will enable you to solve problems with a larger pool of tools (sometimes allowing you to bring functional programming paradigms to bear on a problem, or simply augmenting your existing tools for a more efficient or elegant solution).

Learn to get your Func on!

From a pure functional programming language perspective this approach to problem solving offers the following advantages over the imperative and OOP approaches:
  • No (re-)assignment
  • No side effects
  • No flow of control
Function calls can therefore have no effect other than to compute their result. In a pure functional language there are no assignment statements. Once you assign a value to a variable the variable never changes. In this sense variables in a functional language have more in common with algebraic variables than the normal programming stock we're used to.

At first this seems like a debilitating restriction but after giving your brain some time to expand you'll find that this simple restriction eliminates one of the largest sources of bugs in programming and makes the order of execution irrelevant as no side-effect can change the value of an expression and it can be evaluated at any time.

Gone are the days of worrying about orchestrating the flow control of your program. Your programs are now referentially transparent because expressions, variables and their values can be freely evaluated and replaced at any time.

Elements of Func
In the strict academic sense, functional programming refers to programs whose main body is a function that receives its input as its arguments and delivers the output/transformation as its result.

So far this definition should not seem too foreign to most people who have worked with C/C++. Where this departs from the general imperative meme is that the main function is generally defined in terms of other functions, which in turn are defined in terms of still more functions, until at the lowest level the functions are language primitives.

These functions are much like ordinary mathematical functions (in that the same input will always deliver the same output).

Higher-order programming (HOP), function level programming (FLP) and partial function application (PFA) are all styles used in functional programming.

Programming Transcendence
HOP is the ability to use functions as values, in other words you can pass functions as arguments to other functions and functions can be returned as a value of other functions. An example of HOP in JavaScript would be something like the simple sort() method that you can apply to an array.

In its simplest form the sort() function takes an unordered/ordered array and sorts the array:

var a = [2,3,1,4]
document.write(a.sort())
// prints "1,2,3,4"

The sort() method however allows you to use a comparison function as an optional argument, allowing you to pass it a function as a parameter, ergo implementing HOP. Let's assume we've got an array of date objects that we want to sort in a chronological order:

array_of_dates.sort(function (x, y) { return x.date - y.date; });

Here we pass in an anonymous function as our comparison function to sort(). sort() calls it with pairs of elements from the array, and it must return a negative value when x sorts before y, zero when the two are equal, and a positive value when x sorts after y.

This technique is best used when you have at least two functions that perform the same task with a slight variance. You would then combine the functions by replacing the part(s) that differ with a call to a separate function, which is passed in to the more general function as a parameter.

The Functional library implements string lambdas that allow you to express some of the functional programming tools more succinctly. The traditional JavaScript way of doing say a map or filter would be something like this:

map(function(x){return x+1}, [1,2,3]) // returns [2,3,4]
filter(function(x){return x>2}, [1,2,3,4]) // returns [3,4]
some(function(w){return w.length < 3}, 'are there any short words?'.split(' ')) // returns false

Instead, string lambdas allow you to write this in the following way:

map('x+1', [1,2,3])
select('x>2', [1,2,3,4])
some('_.length < 3', 'are there any short words?'.split(' '))

Here are some other ways to bend a program to your functional will using just map, reduce and filter:

// Double the items in a list:
map('*2', [1,2,3]) // [2, 4, 6]

// Find just the odd numbers:
filter('%2', [1,2,3,4]) // [1, 3]

// Find just the evens:
filter(not('%2'), [1,2,3,4]) // [2, 4]

// Find the length of the longest word:
reduce(Math.max, 0, map('_.length', 'how long is the longest word?'.split(' '))) // 7

// Parse a binary array:
reduce('2*x+y', 0, [1,0,1,0]) // 10

// Parse a (non-negative) decimal string:
reduce('x*10+y', 0, map('.charCodeAt(0)-48', '123'.split(/(?=.)/))) // 123
Much more succinct, clear to read and easier to understand.

Func Levels
Value-level programming manipulates values, transforming a sequence of inputs into an output. Function-level programming manipulates functions, applying operations to functions to construct a new function. This new function transforms the inputs into outputs.

How can we make JavaScript dance to a function-level programming paradigm using the Functional library as meter? Here are some examples:

// Find the reciprocal only of values that test true:
map(guard('1/'), [1,2,null,4]) // [1, 0.5, null, 0.25]

// Apply '10+' only to even values, leaving the odd ones alone:
map(guard('10+', not('%2')), [1,2,3,4]) // [1, 12, 3, 14]

// Write a version of map that only applies to the evens:
var even = not('%2');
var mapEvens = map.prefilterAt(0, guard.rcurry(even));
mapEvens('10+', [1,2,3,4])

// Find the first power of two that's greater than 100:
until('>100', '2*')(1) // 128

// Or, the first three-digit power of two (these are equivalent):
until('String(_).length>2', '2*')(1)
until(compose('>2', pluck('length'), String), '2*')(1)
until(sequence(String, pluck('length'), '>2'), '2*')(1)
Hot/Medium/Mild Curry?
Partial function application (aka currying) transforms a function that takes n arguments into a function that takes only one argument and returns a curried function of n - 1 arguments.

In English please! OK, let's try:

Currying is the process of partially, or incrementally, supplying arguments to a function. Curried functions are delayed functions expecting the remainder of the arguments to be supplied. Once all the arguments are supplied, the function evaluates normally. So, curried functions lead to lazy execution of the complete function.

From the definitions above it is clear that partial function application, or specialisation, creates a new function out of an old one. To illustrate how we apply this with Functional we'll implement increasing(a, b, c), which determines whether b lies strictly between a and c. We then curry the first and last arguments to produce a function that tests whether a number is positive:

// Function that needs to be curried
function increasing(a, b, c)
{
  return a < b && b < c;
}

// Define the set of positive numbers via lazy evaluation
var positive = increasing.partial(0, _, Infinity);

// Determine if each of the values -1, 0 and 1 fall in our range
map(positive, [-1, 0, 1]) // [false, false, true]

// Define the set of negative numbers via lazy evaluation
var negative = increasing.partial(-Infinity, _, 0);

// Determine if each of -1, 0 and 1 fall in our range
map(negative, [-1, 0, 1]) // [true, false, false]
Currying leads to lazy evaluation which allows you to work with structures like the infinite sets we created above. Cool eh?!

Epilogue
Functional does a great job at making your life easier if you want to experiment with functional programming in JavaScript without getting yourself tangled up in the verbose, standard syntax. The creator does however offer a word of warning with regards to performance if you use this lib in production. Functional is also confirmed to work in Firefox 2.0, Safari 3.0, and MSIE 6.0.


Tuesday, August 14, 2007

Links

Dynamic Attribute-based Finder Extensions
szeryf has written a nice little exposé on how you can extend ActiveRecord::Base to add _or_ and _not_ operators to the dynamic finders that rails provides you with.

You can write the following types of finders out of the box with rails:

User.find_by_login_and_status(some_login, 1)
User.find_by_login_and_status_and_role(some_login, 1, role)
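
To get a feel for how a finder name can carry its own query, here is a rough sketch in plain Ruby. It is not how Rails implements dynamic finders: conditions_for is a made-up helper for this post, it only understands the _and_ case, and it assumes no attribute name itself contains "_and_".

```ruby
# Hypothetical helper, for illustration only: this is NOT Rails' implementation.
# It turns a finder-style name into an SQL condition template plus bind values.
def conditions_for(finder_name, *values)
  attributes = finder_name.to_s.sub(/\Afind_by_/, '').split('_and_')
  unless attributes.size == values.size
    raise ArgumentError, "expected #{attributes.size} values, got #{values.size}"
  end
  template = attributes.map { |attribute| "#{attribute} = ?" }.join(' and ')
  [template, *values]
end

p conditions_for(:find_by_login_and_status, 'bob', 1)
# prints ["login = ? and status = ?", "bob", 1]
```

An _or_ or _not_ extension would, roughly speaking, need a smarter split than the one-liner above, which is where the article comes in.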

The additional extensions described in the article add the following to the repertoire above:

User.find_by_login_and_status_or_role(some_login, 1, role)
User.find_by_login_and_status_not_role(some_login, 1)

The two statements above would then result in the following SQL, respectively:

login = ? and (status = ? or (role = ?))
login = ? and (status = ? not (role = ?))

rparsec (the union of ActiveRecord :select and :include)
As you most probably already know, you use :select in a find to modify the fields that are in your result set and :include to specify that related tables are loaded via joins to provide improved performance. Unfortunately you cannot use the two together due to the limitations imposed by the ActiveRecord implementation.

:select meets :include (or a pitch for rparsec) is an interesting article by Charlie Savage which suggests using a SQL SELECT parser to provide the required functionality.

He goes on to suggest doing this with the rparsec parser combinator framework.

The Controller Formula
Nick Kallen provides a lucid look at how one can produce poetic controller code.

Creating Multiple Models in One Action
This article is a followup on the previous one elaborating on the method that can be used to create multiple models from one action.

It covers the simplistic case where one model is simply created based on the creation of the other (when creating a group model, the creating user needs to be the first member of the group) and the more complex case where there is a dependency relationship between two models that needs to be enforced (the creation of a cyclops cannot succeed if the creation of the eye does not succeed).

Sunday, August 12, 2007

Prototype, IE and Edge Rails Failures

I recently wrote a little conceptual file upload application with scaffold_resource that used the iframe remoting pattern with some baked-in AJAX goodness to minimise the amount of data returned from the server as well as making the UI a lot more snappy.

Everything went well and tested a-OK in Firefox. Unfortunately my default development platform does not support IE so I only tested the app in IE a little later. To my horror IE rendered the page differently as well as spewing the following JS error:

Line: 1629
Char: 9
Error: Invalid target element for this operation.
Code: 0
URL: <REMOVED>

Prototype goodness wherefore art thou?
A few concise questions to the Oracle of Google and I found a thread started by Rob Sanheim which detailed the same problem I was seeing.

A fix, that worked for many of the thread readers, was proposed by Andy (12 December 2005 @ 1pm) in the same thread to deal with the way in which prototype inserts content in a tbody or tr tag.

So, off I went and upgraded my app to the latest Edge Rails using:

rake rails:freeze:edge

Edge Rails will you be the end of me?

Starting my app up I got a similar nasty to this:

/Applications/Locomotive2/Bundles/standardRailsFeb2007.locobundle/i386/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `gem_original_require': no such file to load -- active_resource (MissingSourceFile)
from /Applications/Locomotive2/Bundles/standardRailsFeb2007.locobundle/i386/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `require'
from /Users/joelmeyer/src/FotoDir/vendor/rails/activerecord/lib/../../activesupport/lib/active_support/dependencies.rb:495:in `require'
from /Users/joelmeyer/src/FotoDir/vendor/rails/activerecord/lib/../../activesupport/lib/active_support/dependencies.rb:342:in `new_constants_in'
from /Users/joelmeyer/src/FotoDir/vendor/rails/activerecord/lib/../../activesupport/lib/active_support/dependencies.rb:495:in `require'
from ./config/../vendor/rails/railties/lib/initializer.rb:160:in `require_frameworks'
from ./config/../vendor/rails/railties/lib/initializer.rb:160:in `each'
from ./config/../vendor/rails/railties/lib/initializer.rb:160:in `require_frameworks'
from ./config/../vendor/rails/railties/lib/initializer.rb:88:in `process'
... 8 levels...
from /Applications/Locomotive2/Bundles/standardRailsFeb2007.locobundle/i386/lib/ruby/gems/1.8/gems/capistrano-1.4.1/lib/capistrano/cli.rb:12:in `execute!'
from /Applications/Locomotive2/Bundles/standardRailsFeb2007.locobundle/i386/lib/ruby/gems/1.8/gems/capistrano-1.4.1/bin/cap:11
from /Applications/Locomotive2/Bundles/standardRailsFeb2007.locobundle/i386/bin/cap:16:in `load'
from /Applications/Locomotive2/Bundles/standardRailsFeb2007.locobundle/i386/bin/cap:16

Seems that when you're using Edge Rails you need to use the active resource gem in tandem or your world will turn pear-shaped. Installing the gem, with dependencies, did the trick:

gem install -y activeresource --source http://gems.rubyonrails.org

IE: Dark side of the moon
So, I have now upgraded to the latest prototype lib (via the Edge Rails upgrade) and got Edge Rails to run with the active_scaffold and active resources I was using in my app.

Was everything fine then and did I get to live a more fulfilling existence? NAY!

Back to the entry by Rob Sanheim and I noticed that Chris Nolan reported that the prototype lib had been fixed but that he was still experiencing the same issue.

There is a boat and I feel that Chris and I are both in it together, brothers in misery.

While debugging the issue with trusty old JS alert() he found that he was trying to insert content at an id that belonged to a table tag which was not supported. The tags supported for this were of course tbody and/or tr.

Even though the tbody tag seems to be optional according to the W3C, IE still does not allow you to insert the content based on a table id.

The simple addition of the required tbody tags cured all.

Sunday, August 5, 2007

Turning responsibility inside-out via delegation

What is the delegation pattern? You find this where you have an object that expresses a certain behaviour externally but internally defers, or delegates, the responsibility for implementation to another object in an inversion of responsibility.

Turning responsibility inside-out
Ruby provides five ways to accomplish this: three (SimpleDelegator, DelegateClass and Delegator) encapsulated in the delegate library and the remaining two (Forwardable and SingleForwardable) via the forwardable library.

Let's use a queue data structure that delegates to an array to illustrate the various ways of accomplishing delegation.

SimpleDelegator
This is the simplest way to accomplish delegation. You simply pass an object to the constructor and all methods supported by the object will be delegated.

_This object can be changed later._

require 'delegate'

class Queue
  def initialize
    @sd = SimpleDelegator.new([]) # we delegate to an array object
  end

  def enqueue(element)
    @sd.push(element)
  end

  def dequeue
    @sd.shift
  end
end

q = Queue.new
q.enqueue(10) # queue is now [10]
q.enqueue(20) # queue is now [10, 20]
q.dequeue # returns 10; queue is now [20]

If you want to change the object you're delegating to you just use __setobj__(obj). You should just keep in mind that this does *not* cause SimpleDelegator’s methods to change which means that you should only be delegating to objects of the same type as the original delegate to avoid nastiness.
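
As a minimal sketch of swapping the delegate (the variable names are made up for this example, and both delegates are arrays so we stay within the same-type advice above):

```ruby
require 'delegate'

# Wrap an array, then point the same delegator at a different array.
numbers = SimpleDelegator.new([1, 2, 3])
size_before = numbers.size        # delegated to [1, 2, 3]

numbers.__setobj__([4, 5])        # swap the delegate for another Array
size_after = numbers.size         # delegated to [4, 5]

puts "before: #{size_before}, after: #{size_after}"
# prints "before: 3, after: 2"
```

The wrapper object itself never changes; only the object it forwards to does.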

DelegateClass
If SimpleDelegator does not spin your propeller then the next step would be to look at DelegateClass. Using the top level DelegateClass method to set up delegation through class inheritance is considered more flexible and is seemingly the most common use for this library.

require 'delegate'

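# NOTE: newer Rubies ship a core Queue class; rename this class there to avoid a superclass mismatch
class Queue < DelegateClass(Array) # we delegate to an array object
  def initialize(arg=[])
    super(arg)
  end

  alias_method :enqueue, :push # alias_method sets up the method aliasing for us
  alias_method :dequeue, :shift
end

q = Queue.new
q.enqueue(10) # queue is now [10]
q.enqueue(20) # queue is now [10, 20]
q.dequeue # returns 10; queue is now [20]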

Delegator
The final tool from the delegate library is Delegator which provides you with full control over the delegation scheme. The contrived example below is derived from SimpleDelegator’s implementation.

require 'delegate'

class QueueDelegator < Delegator # inherit from the Delegator class
  def initialize(obj)
    super # pass obj to Delegator constructor
    @_sd_obj = obj # store obj for future use
  end

  def __getobj__
    @_sd_obj # return the object we are delegating to
  end

  def __setobj__(obj)
    @_sd_obj = obj # change delegation object, a feature we're providing
  end
end

The conventional wisdom here however is that you should most likely be using the forwardable library instead of Delegator.

Forwardable
If you need class-level delegation this is your beast of burden.

require 'forwardable'

class Queue
  extend Forwardable

  def initialize(obj=[])
    @queue = obj # delegate to this object
  end

  def_delegator :@queue, :push, :enqueue
  def_delegator :@queue, :shift, :dequeue
  def_delegators :@queue, :clear, :empty?, :length, :size, :<<
end

There are a few things to take note of here. First, def_delegator is used to set up the delegation relationship between the method call, the delegated object and the method to call on the delegated object.

Second, notice the syntax (:@queue, instead of @queue or :queue) to specify the delegated object we're defining methods for. This is simply an artefact of the way that Forwardable is implemented.
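
To see what that buys you in use, here is a small sketch along the same lines. JobQueue and @items are names made up for this example (chosen so they don't collide with Ruby's own Queue); the Forwardable calls are the same ones used above.

```ruby
require 'forwardable'

# Hypothetical class for illustration; only the listed methods are forwarded.
class JobQueue
  extend Forwardable

  def initialize(items = [])
    @items = items # the array we forward to
  end

  def_delegator :@items, :push, :enqueue   # JobQueue#enqueue calls @items.push
  def_delegator :@items, :shift, :dequeue  # JobQueue#dequeue calls @items.shift
  def_delegators :@items, :empty?, :size   # forwarded under their own names
end

q = JobQueue.new
q.enqueue(10)
q.enqueue(20)
first = q.dequeue                # first in, first out
remaining = q.size
exposes_pop = q.respond_to?(:pop) # Array#pop was never forwarded

puts "first: #{first}, remaining: #{remaining}, pop exposed: #{exposes_pop}"
# prints "first: 10, remaining: 1, pop exposed: false"
```

Unlike the DelegateClass version, nothing from Array leaks through unless you list it, which is handy when you want the queue to stay a queue.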

SingleForwardable
Where Forwardable provides class-level delegation, SingleForwardable provides object level delegation. For this example I'll simply copy the example provided in the library documentation.

require 'forwardable'

printer = String.new
printer.extend SingleForwardable # prepare object for delegation
printer.def_delegator "STDOUT", "puts" # add delegation for STDOUT.puts()
printer.puts "Howdy!"

Epilogue
DelegateClass and Forwardable will most likely cover the majority of cases where you end up needing to implement the delegation pattern.

Did we default or not?

You may sometimes find yourself having to distinguish whether a method argument was supplied by the caller or taken from the default specified in the method definition.

Let's say, for example, that you want to warn a user when they have neglected to set an attribute but still continue with execution. Here is a snippet that would accomplish this:

irb(main):060:0> def some_method(first, second=(flag=true; '2nd'))
irb(main):061:1> p "Default value #{second} used for unspecified parameter 'second'" if flag.inspect == "true"
irb(main):062:1> end
=> nil
irb(main):063:0> some_method(1,2)
=> nil
irb(main):064:0> some_method(1,'2nd')
=> nil
irb(main):065:0> some_method(1)
"Default value 2nd used for unspecified parameter 'second'"
=> nil
irb(main):066:0>
Can you work out what is going on in the method parameter declaration?

All that's happening is that the code in the round brackets after the equals sign defines a local variable _flag_, sets its value and returns the default value we want to set it to.
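
The same trick outside of irb, as a small sketch. The names greet and defaulted are made up for this example; the point is that the local variable set inside the default expression stays nil whenever the caller supplies the argument, even if the supplied value happens to equal the default.

```ruby
# The default expression runs only when the caller omits `greeting`,
# so `defaulted` is true in that case and nil otherwise.
def greet(name, greeting = (defaulted = true; 'Hello'))
  note = defaulted ? ' (default greeting)' : ''
  "#{greeting}, #{name}#{note}"
end

puts greet('Ann')           # prints "Hello, Ann (default greeting)"
puts greet('Ann', 'Hello')  # prints "Hello, Ann"
puts greet('Ann', 'Howdy')  # prints "Howdy, Ann"
```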

Ruby rocks!

Sunday, July 29, 2007

Tracking fast-paced packages on debian based systems (aka debian-volatile project)

debian-volatile
If you run some ISP services (your own mail server with virus and/or spam scanning tools) you will have run into the age old problem that the scanning tools in the stable distribution do not evolve as fast as they should to keep up with their fast-paced projects.

Even continual updates of the software in your distribution are not enough to stay up to date as the release cycle of the stable distribution is out of sync with the speed at which things change in the wild.

According to the debian-volatile project page:

The main goal of volatile is allowing system administrators to update their systems
in a nice, consistent way, without getting the drawbacks of using unstable, even
without getting the drawbacks for the selected packages. So debian-volatile will
only contain changes to stable programs that are necessary to keep them functional.

volatile-sloppy
Great effort goes into ensuring that no functional changes are made to packages in debian-volatile (so that configuration file changes, etc. are not required) for painless upgrades. Unfortunately painful upgrades are not always avoidable, so a volatile-sloppy section was created to contain packages that are fast-paced but also require some functional change to how they run, are installed or are configured.

Security
You should note that the debian-volatile project is not supported by the _official_ security team. This responsibility falls to the debian-volatile team who currently has at least one member that is shared with the official debian testing security team.

How do I use it?
Add the relevant repository (volatile and/or volatile-sloppy) to your /etc/apt/sources.list file:

Sarge
deb http://volatile.debian.org/debian-volatile sarge/volatile main contrib non-free
deb http://volatile.debian.org/debian-volatile sarge/volatile-sloppy main contrib non-free

Etch
deb http://volatile.debian.org/debian-volatile etch/volatile main contrib non-free
deb http://volatile.debian.org/debian-volatile etch/volatile-sloppy main contrib non-free


Save sources.list and run _apt-get update_ which should generate something like this (your listing will vary depending on the repositories you have listed in your sources file):

# apt-get update
Get:1 http://archive.ubuntu.com dapper Release.gpg [189B]
Get:2 http://us.archive.ubuntu.com dapper Release.gpg [189B]
Get:3 http://us.archive.ubuntu.com dapper-backports Release.gpg [191B]
Get:4 http://archive.ubuntu.com dapper-updates Release.gpg [191B]
Get:5 http://volatile.debian.org etch/volatile Release.gpg [189B]
Hit http://us.archive.ubuntu.com dapper Release
Hit http://archive.ubuntu.com dapper Release
Get:6 http://volatile.debian.org etch/volatile Release [40.7kB]
Hit http://archive.ubuntu.com dapper-updates Release
Hit http://us.archive.ubuntu.com dapper-backports Release
Hit http://archive.ubuntu.com dapper/main Packages
Hit http://archive.ubuntu.com dapper/restricted Packages
Hit http://us.archive.ubuntu.com dapper/universe Packages
Hit http://us.archive.ubuntu.com dapper/universe Sources
Hit http://archive.ubuntu.com dapper/main Sources
Hit http://archive.ubuntu.com dapper/restricted Sources
Hit http://archive.ubuntu.com dapper-updates/main Packages
Hit http://archive.ubuntu.com dapper-updates/restricted Packages
Hit http://us.archive.ubuntu.com dapper-backports/main Packages
Hit http://us.archive.ubuntu.com dapper-backports/restricted Packages
Hit http://us.archive.ubuntu.com dapper-backports/universe Packages
Hit http://archive.ubuntu.com dapper-updates/main Sources
Hit http://archive.ubuntu.com dapper-updates/restricted Sources
Hit http://us.archive.ubuntu.com dapper-backports/multiverse Packages
Hit http://us.archive.ubuntu.com dapper-backports/main Sources
Hit http://us.archive.ubuntu.com dapper-backports/restricted Sources
Hit http://us.archive.ubuntu.com dapper-backports/universe Sources
Hit http://us.archive.ubuntu.com dapper-backports/multiverse Sources
Ign http://volatile.debian.org etch/volatile Release
Get:7 http://volatile.debian.org etch/volatile/main Packages [3953B]
Hit http://volatile.debian.org etch/volatile/contrib Packages
Hit http://volatile.debian.org etch/volatile/non-free Packages
Get:8 http://security.ubuntu.com dapper-security Release.gpg [191B]
Hit http://security.ubuntu.com dapper-security Release
Hit http://security.ubuntu.com dapper-security/main Packages
Hit http://security.ubuntu.com dapper-security/restricted Packages
Hit http://security.ubuntu.com dapper-security/main Sources
Hit http://security.ubuntu.com dapper-security/restricted Sources
Hit http://security.ubuntu.com dapper-security/universe Packages
Hit http://security.ubuntu.com dapper-security/universe Sources
Fetched 44.8kB in 5s (7554B/s)
Reading package lists... Done
W: GPG error: http://volatile.debian.org etch/volatile Release: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY EC61E0B0BBE55AB3
W: You may want to run apt-get update to correct these problems
#

The inclusion of the debian-volatile release fails because we do not have a key to authenticate the repository. Running the following will import their key (mentioned as EC61E0B0BBE55AB3 above) into your key ring:

# gpg --keyserver subkeys.pgp.net --recv-keys EC61E0B0BBE55AB3
gpg: directory `/root/.gnupg' created
gpg: new configuration file `/root/.gnupg/gpg.conf' created
gpg: WARNING: options in `/root/.gnupg/gpg.conf' are not yet active during this run
gpg: keyring `/root/.gnupg/secring.gpg' created
gpg: keyring `/root/.gnupg/pubring.gpg' created
gpg: requesting key BBE55AB3 from hkp server subkeys.pgp.net
gpg: /root/.gnupg/trustdb.gpg: trustdb created
gpg: key BBE55AB3: public key "Debian-Volatile Archive Automatic Signing Key (4.0/etch)" imported
gpg: no ultimately trusted keys found
gpg: Total number processed: 1
gpg: imported: 1
# gpg --armor --export EC61E0B0BBE55AB3 | apt-key add -
gpg: no ultimately trusted keys found
OK
#

Another spin of _apt-get update_ (and possibly _apt-get upgrade_ if there are any outdated packages) should then do the trick.

Tuesday, July 24, 2007

Links

IRB
Dr Nic has a great little article describing his favourite additions to his .irbrc. He covers some common productivity wins such as:
  • TABed auto-completion
  • Map by method which allows you to get rid of constructs like articles.columns.map {|p| p.name} or articles.columns.map &:name and simply replace it with a plural: articles.columns.names or articles.columns.name
  • MethodFinder/Object.what?
  • pp
  • Auto-tabbing
GuessMethod
This little gem is the best bad idea ever! Gone are those frustrating typos that waste extra cycles finding and fixing them.

Wirble
Continuing on the irb enhancements theme do yourself a favour and have a look at Wirble. It offers you tab-completion, history, and a built-in ri command as well as colorised results and a couple other goodies.

Tuesday, July 17, 2007

Links

Sequel
Sequel is a light-weight ORM that fills in the gaps where using ActiveRecord without rails doesn't fit your needs.

Gregory Brown has a nice little expose on it as the winner of the June 2007 Ruby Project Spotlight.

Introduction to .NET 3.0 for Architects
Keep your friends close and competition^H^H^H^H^H^H^H^H^H^H alternative platforms closer. InfoQ has a great 50000 foot look at .NET 3.0.

Profiling your Rails app with ruby-prof
Charlie Savage wrote a great article on doing some profiling on your rails app using ruby-prof. He discusses using both flat and graph (with associated call tree information) profiles to nuke performance hogs.

ruby-prof is a fast code profiler for Ruby. Its features include:
  • Speed - it is a C extension and therefore many times faster than the standard Ruby profiler.
  • Flat Profiles - similar to the reports generated by the standard Ruby profiler
  • Graph profiles - similar to GProf, these show how long a method runs, which methods call it and which methods it calls
  • Threads - supports profiling multiple threads simultaneously
  • Recursive calls - supports profiling recursive method calls
  • Reports - can generate both text and cross-referenced html reports
  • Output - can output to standard out or to a file

AR-Delegation
This plugin extends ActiveRecord::Base to add useful delegation features. For example: has_columns :from => :source, :only => ["title", "name"] or has_column "title", :from => :source, :as => "source_title".

It really improves the conciseness of your code but in so doing hides your implementation adding a layer of indirection that may make your code a little more difficult to understand if the person reading your code does not know that AR-Delegation was used.

Exception Notifier
I use this with most of my projects that reside on remote customer networks where the only way for the application to give me a heads up is if it sends me an email with an attached problem report.

Saturday, May 19, 2007

The Journey from Lambda to Unlambda

While scrounging around the Net looking at functional programming languages I came across Unlambda. The name by itself piqued my interest so I thought I'd go have a look-see.

Here follows the troubled tale of the journey from Lambda to Unlambda ...
Nature of the Beast
It was created by David Madore as a minimal functional programming language that was deliberately built to make programming in it obtuse (or, as the author's site puts it, "... fun and challenging"). It is based on combinatory logic but omits the lambda abstraction, forcing you to rely on the K and S combinators.

Besides hailing itself as a functional language it also succeeds in being an obfuscated programming language by severely restricting the set of allowed operations in the language and making it generally alien to programmers from more conventional languages.

It strictly only manipulates functions. A function is the only function parameter that you can pass in to a function and a function is the only construct that can be returned from a function. It relies heavily on its built-in k and s functions (K and S combinators) to get anything done.
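To get a feel for what living on S and K alone means, here is a minimal sketch in Ruby (not Unlambda) that models the two combinators as curried lambdas. The constant names K, S and I are my own labels for illustration; the point is that the identity function falls out of applying S to K twice.

```ruby
# K takes two arguments (one at a time) and returns the first.
K = ->(x) { ->(_y) { x } }

# S takes three arguments: it applies the first to the third and then
# applies that result to (the second applied to the third).
S = ->(x) { ->(y) { ->(z) { x.(z).(y.(z)) } } }

# Identity built from S and K alone: S K K z = K z (K z) = z
I = S.(K).(K)

puts I.(42)        # prints 42
puts K.(:a).(:b)   # prints a
```

Everything here is a function taking a function-shaped argument one at a time, which is roughly the world an Unlambda program lives in.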

The source is built to be intentionally incomprehensible for a human making it next to impossible to deduce what the intention of the program is by simply reading the source.

You are welcome to create your own functions, but be warned, you cannot name or save the custom function(s) because Unlambda does not have support for variables. Besides dispelling variables from its cloth you will also notice the lack of built-in support for data structures or code constructs (e.g. loops, conditionals, etc.). You are however welcome to build your own code constructs.

As an illustration, here is a loop that prints “Hello, world!” repeatedly, followed by an incrementing number of asterisks (an explanation of this severely limited subset of alphabet soup can be found here):

```s``sii`ki
``s``s`ks
``s``s`ks``s`k`s`kr
``s`k`si``s`k`s`k
`d````````````.H.e.l.l.o.,. .w.o.r.l.d.!
k
k
`k``s``s`ksk`k.*

Unlambda is, however, the little functional language that could: in the face of all its deliberately implemented obscurity it still manages to be Turing-complete.

Sure, but can it speak?
Here's the Unlambda equivalent for the trusty old "Hello world" we're used to:

`r```````````.H.e.l.l.o. .w.o.r.l.di

More examples, a tutorial and some HOWTOs can be found at David's site for Unlambda.

Secret Murmurings on Unlambda

  • “It's disgusting — it's revolting — we love it.” CyberTabloid

  • “Unlambda, the language in which every program is an IOUCC.” Encyclopædia Internetica

  • “The worst thing to befall us since Intercal.” Computer Languages Today

  • “The effect of reading an Unlambda program is like having your brains smashed out by a Lisp sexp wrapped around an ENIAC. You won't find anything like it west of Alpha Centauri.” The Hitch-Hacker's Guide to Programming



Epitaph
"Unlambda is inevitably compared to Intercal, but unlike Intercal it has a kind of weird elegance; this is because Intercal gets in your way, but Unlambda simply fails to help you."

Tuesday, May 15, 2007

OS X File Name Subterfuge

While saving a file via the well known OS X file Save As dialog I noticed something really queer. The filename I had pasted in was magically altered to adhere to OS X's ideas on which characters are kosher for file/directory names!

Silent Substitution
To see the magic simply open TextEdit, create a new document (if you are not already presented with one) and add the following text to the document:

watch: this: space:


Now, select the text, copy it and save the file with the selected text as the file name.



Et voilà!

Subterfuge Uncorked
The dialog magically converts the colons to hyphens (-) to ensure that you don't have to go through the whole 'This filename is not allowed' error cycle needlessly. Brill.

The reason for the substitution is that OS X prohibits the use of colon characters in file/directory names because this character is used as the directory separator in HFS+ path names.

According to the HFS+ spec you can use any Unicode or ASCII (including NUL) characters. OS APIs may limit some of these characters for legacy reasons.

I love subtlety.

Dynamic Arbitrary Depth Hashes In Ruby

UPDATE: Charles Duan has an interesting article in a similar vein.

Arbitrary array and hash depth constructs cannot be created in Ruby in the way you would in Perl or PHP. The following will simply fail with an error:

irb(main):001:0> a = []
=> []
irb(main):002:0> a[1][2][3][4] = 1
NoMethodError: undefined method `[]' for nil:NilClass
from (irb):2
irb(main):003:0> h = {}
=> {}
irb(main):004:0> h[1][2][3][4] = 5
NoMethodError: undefined method `[]' for nil:NilClass
from (irb):4

When dynamically constructing your array or hash (aka Autovivification) this really gets in the way.

Autovivification
This is a dynamic data structure creation feature that can be found in Perl and PHP (those are the ones I know of). It lets you build complex, nested data structures on the fly: the container types are implied by the syntax of the statement that accesses the structure.

IOW, the act of fetching or storing a value at a leaf through a branch dynamically creates the branch(es) to the leaf.

VivifiedHash
One approach is to do the following:

irb(main):011:0* VivifiedHash = Hash.new(&(p=lambda{|h,k| h[k] = Hash.new(&p)}))
=> {}
irb(main):012:0> VivifiedHash[1][2][3][4] = 5
=> 5
irb(main):013:0> VivifiedHash[1][2][3][4]
=> 5
irb(main):014:0> VivifiedHash[1][2][3]
=> {4=>5}
irb(main):015:0> VivifiedHash[1][2]
=> {3=>{4=>5}}
irb(main):016:0> VivifiedHash[1]
=> {2=>{3=>{4=>5}}}
irb(main):017:0> VivifiedHash
=> {1=>{2=>{3=>{4=>5}}}}

All this does is give the hash a default block that assigns a new hash (carrying the same default block) to any missing key. Each branch you name in an assignment therefore triggers the creation of a new hash, recursively.
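The same idea can be written without the intermediate lambda variable by reusing the hash's own default_proc. This is a minimal sketch; the helper name vivified_hash is hypothetical and not part of any library.

```ruby
# Build a hash whose missing keys spring into existence as nested hashes
# that share the same default block.
def vivified_hash
  Hash.new { |hash, key| hash[key] = Hash.new(&hash.default_proc) }
end

h = vivified_hash
h[1][2][3][4] = 5

p h[1][2][3][4]    # prints 5
p h.key?(:nope)    # prints false: key? does not trigger the default block
h[:nope]           # merely reading a missing key creates an empty branch
p h.key?(:nope)    # prints true
```

The last three lines show a side effect worth knowing about: a plain read of a missing key vivifies it, so use key? or fetch when you only want to test for existence.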

Limitations
The limitation of this is, of course, that your data structure cannot contain anything but hashes as branches. Leaf nodes can be any data type though.

Sources

  1. Ruby Hashes of Arbitrary Depth

  2. Multidimensional arrays and hashes discussion on the RubyTalk mailing list

  3. Auto Vivification

Monday, May 7, 2007

Mechanized Scraping

Ever needed to interface with a web application without any real APIs? Take one step back from looking for a traditional API and use WWW::Mechanize to bend the application to your will.

WWW::Mechanize (inspired by Andy Lester's Perl Mechanize module and written by Aaron Patterson) allows you to moonlight as a web User Agent (browser) from the comfort of your Ruby scripting environment. It is great for building automated tests of your web applications, creating your favourite mashups and also for treating another web application's UI as the API to the application.

I've been working on some code that needs to gather reporting information from our billing system but I have no real access to the Oracle db in the back to get to the required stored procedures. So, I decided to simply use the UI as my API to the data and dusted off my trusty old WWW::Mechanize (which uses Hpricot internally to parse and tokenise pages) for the challenge.

It provides you with all the required tools to log in to a site (as well as automatic cookie handling), click on links, submit forms and oh so much more. The only real feature currently lacking is support for JavaScript (they do however provide you with ideas on how to manoeuvre around some of the more mundane corners), which is becoming more and more painful in this Web 2.0 world of ours.

WWW::Mechanize is quite easy to use so I am not going to write an exposé on the ins and outs of the lib or share with you its secrets that helped me to sate world hunger and bring peace to all. Instead, I will mention some of the bits that tripped me up while trying to make the web application dance to my flute.

Button Value Attributes
I was getting nowhere while trying to submit a form in the web application with some crafted values. Tinker here, tinker there and still no go. Try a browser and the application itself and things work like Swiss watches.

Right you mangy ASP application, it's time for the big guns! Out comes Wireshark and the debugging starts in earnest. First I dump a session from my script and then one from a browser.

From the diff of the POST requests I notice that the browser has the value attribute for the 'Save' button in the form set whereas I didn't. Because the form was posting back to itself I assume they had some code like (pseudocode):

if $submit == 'Submit'
then
do your stuff when the form has been submitted
else
display the normal form
end

Adding something that resembles the following did the trick:

form.buttons.name('some_convoluted_button_name').value = 'Submit'


Out of Buffer Error
A few more form hoops later and I started getting an error like:

hpricot/parse.rb:44:in `scan': ran out of buffer space on element <group>, starting on line 361. (Hpricot::ParseError)

Hey?!

A quick look on the bug db for WWW::Mechanize on RubyForge listed this closed bug that has some application to our situation. The error messages are not the same (I assume this is because an earlier version of Hpricot was in use when it was reported).

This TT says it is an Hpricot issue and refers to this TT.

According to the problem description:

An 'OUT OF BUFFER SPACE' error shuts down my whole app when I try to parse through an aspx page with an abnormally (or normally?) large viewstate stuffed into an input. Here's what it looks like:

<input type="hidden" name="__VIEWSTATE"
value="dDw3NzQ0ODQ2ODQ ... 11954 characters in total ... DsXdJfP+k" />

If I remove the large value it works fine. Is there a way hpricot could not exit when trying to parse a page like this?

DING! DING DING!

I am also scraping an ASP application and lo and behold I too have a ginormous __VIEWSTATE input tag in the page in question. I knew ASP was evil, but this?!

The limit on the buffer was of course a protection mechanism to ensure that a parsed page does not cause your computer to become the black hole of memory. The workaround for this is quite simple though, just increase the buffer:

Okay, kids. [98] now has a buffer_size method.
Hpricot.buffer_size = 262144
doc = Hpricot(open("http://asp.net/big-viewstate-vomit.html"))

Perhaps I will find the wherewithal to fix the parser to read these massive attributes, but on-the-other-hand I don't want to encourage this disastrous behavior by ASP.NET!! You know?

"That's all good and well but we're not really using Hpricot directly, we're using WWW:Mechanize!", you all shout in unison.

True, true. All you do is simply add the buffer_size declaration after instantiating your shiny new WWW::Mechanize object like so:

agent = WWW::Mechanize.new
Hpricot.buffer_size = 204800

The default buffer size is defined in hpricot_scan.rl as:

[...]

#define BUFSIZE 16384

[...]

buffer_size = BUFSIZE;
if (rb_ivar_defined(self, rb_intern("@buffer_size")) == Qtrue) {
bufsize = rb_ivar_get(self, rb_intern("@buffer_size"));
if (!NIL_P(bufsize)) {
buffer_size = NUM2INT(bufsize);
}
}
buf = ALLOC_N(char, buffer_size);

[...]

That's a buffer of exactly 16 KB, which under normal circumstances would be more than ample space for an attribute, but working with ASP seems to be anything but normal.

In Closing
I have not had as much fun in quite some time. WWW::Mechanize had me clapping my little hands in glee while shouting "Wheeeeeeeeeee!" like a little kid that was given his first bunny rabbit just after having his second double espresso for the hour.

Wednesday, May 2, 2007

Ruby (Hpricot) Program Guide - III

As discussed in the previous article our next steps will be to refactor the constructor and provide an example of how we can use objects from the DSTVSchedule class to collect and display channels of our choice.

Let's change the constructor to take the channel ID, the time offset in minutes (to account for different time zones) and the period ahead in time, in days, for which we want to gather schedule information as parameters. This will mean that we get rid of the custom hash class and tidy things up a little bit:

def initialize(channel=219, offset=120, period=30)
start_date, end_date = get_search_dates(period)
url = build_url(build_query_string(channel, start_date ,end_date))

p "Start: #{start_date} End: #{end_date} URL: #{url}"

@hp = Hpricot(open(url))
@ic = Iconv.new('US-ASCII//TRANSLIT', 'UTF-8')
@coder = HTMLEntities.new
@schedule = process_html(@hp, offset)
end

def get_search_dates(period=30)
[DateTime.now().strftime("%d %b %Y"), (DateTime.now()+period).strftime("%d %b %Y")]
end

def build_query_string(channel, start_date, end_date)
urlencode({
'channelid' => channel,
'startDate' => start_date,
'EndDate' => end_date}) +
'&sType=5&searchstring=&submit=Submit'
end

def build_url(query_string)
host = 'www.mnet.co.za'
cgi = '/schedules/default.asp?'
"http://#{host}#{cgi}#{query_string}"
end

def urlencode(hash)
hash.map {|k, v| "#{URI::encode(k.to_s)}=#{URI::encode(v.to_s)}"}.join('&')
end

We no longer statically define the query parameters in the constructor and therefore have no real need for the custom hash. We can still use the urlencode() method though and add it as a helper in the class.

The start and end dates for the query are calculated based on today's date and the period provided to the constructor as an argument.

We also dumped all that horrible looking query string and url variable construction code into separate methods.
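A note for readers on a newer Ruby: URI::encode was removed in Ruby 3.0. The following is a sketch of the same three helpers using only the standard library (Date and URI.encode_www_form). Passing the start date in as an argument is my own change so that the result is reproducible; it is not part of the original class. encode_www_form writes spaces as + rather than %20, and that the schedule application accepts either form is an assumption.

```ruby
require 'date'
require 'uri'

# Return the start and end of the search window as "dd Mon YYYY" strings.
def search_dates(period = 30, today = Date.today)
  [today, today + period].map { |d| d.strftime('%d %b %Y') }
end

# Encode the variable parameters and append the fixed ones.
def build_query_string(channel, start_date, end_date)
  URI.encode_www_form(
    'channelid' => channel,
    'startDate' => start_date,
    'EndDate'   => end_date
  ) + '&sType=5&searchstring=&submit=Submit'
end

def build_url(query_string)
  "http://www.mnet.co.za/schedules/default.asp?#{query_string}"
end

start_date, end_date = search_dates(30, Date.new(2007, 4, 30))
puts build_url(build_query_string(219, start_date, end_date))
```

Nothing is fetched here; the script only prints the URL that would be handed to open().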

The next step is to provide some automation to the channel schedule collection code for our example program. Look at the HTML data in any of the search pages and you'll see the following (excerpt):

<select name="channelid" class="ScheduleInputSelect">
<option value="" >CHANNEL</option>
<option value=246>actionX </option>
<option value=322>Activate </option>
<option value=496>Africa Magic</option>
<option value=487>Africa Magic Channel (C-Band) </option>
<option value=639>Africa Magic W4</option>
<option value=417>Animal Planet </option>
[...]
<option value=254>TV Globo </option>
<option value=493>TV5 Afrique </option>
<option value=110>TV5 Afrique (Africa) </option>
<option value=65>VH1 </option>
<option value=67>ZEE TV </option>
</select>

These are the channels that we can search for. What we need is to represent this information as an internal data structure that we can use to look up the channels we want. I suggest a hash that has the channel name as the key and a tuple of the channel ID and time offset as the value.

I am lazy so I'd prefer to avoid typing all that information up or manually trying to transform it in the editor. Perhaps we can use some good old command line ruby to chew up and spit out the code we need which we can then just cut 'n paste or import (depending on the editor you use).

Copy the HTML and drop it in a file somewhere. Let's call the file in.html and run it through this command line script (output is truncated):

$ ruby -n -e '$_=~/value=(\d+)\>(.+)\s+\</;if $1&&$2 then a=$1;b=$2;print "\# \"#{b.sub(/\s+$/,"")}\" => [#{a}, 120],\n" end' < in.html | head
# "actionX" => [246, 120],
# "Activate" => [322, 120],
# "Africa Magic Channel (C-Band)" => [487, 120],
# "Animal Planet" => [417, 120],
# "B4U Movies" => [227, 120],
# "BBC Food" => [284, 120],
# "BBC Prime" => [121, 120],
# "BBC World" => [5, 120],
# "Bloomberg Information TV" => [8, 120],
# "Boomerang" => [314, 120],
[...]

Now take the output and place it in your script as a hash (as described above):

channels = {
# "actionX" => [246, 120],
# "Activate" => [322, 120],
# "Africa Magic Channel (C-Band)" => [487, 120],
# "Animal Planet" => [417, 120],
# "B4U Movies" => [227, 120],
"BBC Food" => [284, 120],
"BBC Prime" => [121, 120],
# "BBC World" => [5, 120],
# "Bloomberg Information TV" => [8, 120],
# "Boomerang" => [314, 120],
# "BVN" => [270, 120],
# "Canal+ Horizons" => [237, 120],
# "Cartoon Network" => [13, 120],
# "Cartoon Network (Africa)" => [219, 120],
# "Cartoon Network (W4)" => [182, 120],
# "Channel O - Sound Television" => [27, 120],
# "China Central Television 4" => [15, 120],
# "China Central Television 9 (Africa)" => [226, 120],
# "CNBC" => [90, 120],
# "CNBC (Africa)" => [194, 120],
# "CNBC (W4)" => [187, 120],
# "CNN International" => [18, 120],
# "Deukom - 3SAT" => [165, 120],
# "Deukom - ARD" => [93, 120],
# "Deukom - DW" => [94, 120],
# "Deukom - PRO 7" => [164, 120],
# "Deukom - RTL" => [91, 120],
# "Deukom - SAT 1" => [92, 120],
# "Deukom - ZDF" => [95, 120],
"Discovery Channel" => [21, 120],
# "E-Entertainment" => [646, 120],
"ESPN" => [24, 120],
# "eTV" => [111, 120],
# "Fashion TV" => [145, 120],
# "Fashion TV (Africa)" => [196, 120],
# "Fashion TV (W4)" => [216, 120],
"GO" => [542, 120],
# "Go (K-World Teen)" => [341, 120],
"Hallmark Entertainment Network" => [32, 120],
"History Channel" => [484, 120],
# "History Channel (Africa)" => [485, 120],
# "K-TV World" => [36, 120],
# "KTV (Indian Bouquet)" => [501, 120],
# "kykNET" => [112, 120],
# "M-Net Domestic" => [39, 120],
"M-Net East (Africa)" => [40, 120],
"M-Net Series" => [75, 120],
# "MK89" => [592, 120],
# "Movie Magic (Africa)" => [57, 120],
"Movie Magic 2 (Africa)" => [234, 120],
# "Movie Magic 2 (W4)" => [233, 120],
# "MTV" => [42, 120],
# "MTV Base" => [69, 120],
"National Geographic" => [102, 120],
# "NDTV" => [499, 120],
# "Parliamentary Service" => [45, 120],
# "Pay Per View" => [109, 120],
"Reality TV" => [248, 120],
# "Rhema Network" => [46, 120],
# "RTPi" => [48, 120],
# "SABC 1" => [84, 120],
# "SABC 2" => [85, 120],
# "SABC 3" => [86, 120],
# "SABC Africa" => [87, 120],
# "SIC" => [255, 120],
# "Sky News" => [120, 120],
"Sony Entertainment" => [228, 90],
# "Summit" => [104, 120],
# "Sun TV" => [500, 120],
# "SuperSport" => [52, 120],
# "SuperSport 2" => [54, 120],
# "SuperSport 3" => [80, 120],
# "SuperSport 3 (W4)" => [172, 120],
# "SuperSport 5" => [208, 120],
# "SuperSport 5 (Africa)" => [252, 120],
# "SuperSport 5 (W4)" => [251, 120],
# "SuperSport 6" => [209, 120],
# "SuperSport 7 (C-Band)" => [580, 120],
# "SuperSport Zone Mosaic" => [235, 120],
# "TellyTrack" => [34, 120],
# "Travel Channel" => [61, 120],
# "Trinity Broadcasting Network" => [276, 120],
# "Turner Classic Movies" => [59, 120],
# "Turner Classic Movies (Africa)" => [60, 120],
# "Turner Classic Movies (W4)" => [181, 120],
# "TV Globo" => [254, 120],
# "TV5 Afrique" => [493, 120],
# "TV5 Afrique (Africa)" => [110, 120],
# "VH1" => [65, 120],
# "ZEE TV" => [67, 120]
}

You'll notice I have uncommented the channels I want (I recommend you do the same for the channels you may be interested in). I also added a default time offset of 2 hours (120 minutes) for most of the channels to adjust the time for my time zone. You can change this in the command line ruby filter above to suit your needs.
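If the one-line filter is hard to read, the same transformation can be sketched as a small script. The method name channels_from_html and its default_offset parameter are hypothetical. Unlike the one-liner's regexp, this version also picks up options that have no space before the closing tag (such as Africa Magic).

```ruby
# Turn <option value=ID>Name </option> lines into { "Name" => [ID, offset] }.
def channels_from_html(html, default_offset = 120)
  pairs = html.scan(/<option value=(\d+)>([^<]+)</)
  pairs.each_with_object({}) do |(id, name), channels|
    channels[name.strip] = [id.to_i, default_offset]
  end
end

html = <<~HTML
  <option value="" >CHANNEL</option>
  <option value=246>actionX </option>
  <option value=496>Africa Magic</option>
  <option value=65>VH1 </option>
HTML

# Emit the entries commented out, ready to paste into the script.
channels_from_html(html).each do |name, (id, offset)|
  puts "# \"#{name}\" => [#{id}, #{offset}],"
end
```

The placeholder CHANNEL option is skipped because its value attribute is not numeric.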

All we need to do now is wrap our object creation and the output from it in a loop and we're off:

channels.keys.each do |channel|
p "Channel: #{channel}"
schedule = DSTVSchedule.new(channels[channel][0], channels[channel][1], 30)
schedule.print_schedule
print "\n\n"
end

All done. Here is the complete script source listing:

#!/usr/bin/ruby

class DSTVSchedule
require 'rubygems'
require 'hpricot'
require 'open-uri'
require 'htmlentities'
require 'iconv'
require 'date' # provides DateTime
require 'time' # provides Time.parse
require 'collections/sequenced_hash'

def initialize(channel=219, offset=120, period=30) # offset is in minutes
start_date, end_date = get_search_dates(period)
url = build_url(build_query_string(channel, start_date ,end_date))

p "Start: #{start_date} End: #{end_date} URL: #{url}"

@hp = Hpricot(open(url))
@ic = Iconv.new('US-ASCII//TRANSLIT', 'UTF-8')
@coder = HTMLEntities.new
@schedule = process_html(@hp, offset)
end

def process_html(hp, offset)
schedule = SequencedHash.new
date = ""
time = ""
(hp/"td").each do |line|
case line.inner_html
when /ScheduleChannel/
@channel = sanitize((line/"[@class='ScheduleChannel']").inner_html)
when /(ScheduleDate|date)/
date = utf7((line/"[@class='ScheduleDate']|[@class=date]").inner_html)
schedule[date] = SequencedHash.new
when /ScheduleTime/
time = sanitize((line/"[@class='ScheduleTime']").inner_html)
time = (Time.parse("#{date} #{time}") + (60 * offset)).strftime("%H:%M")
schedule[date][time] = []
when /ScheduleTitle/
schedule[date][time] << sanitize((line/"[@class='ScheduleTitle']").inner_html)
when /\<p\>/
schedule[date][time] << sanitize((line/"p").inner_html)
end
end

schedule
end

def to_s
self.print_schedule("\t")
end

alias :to_tdt :to_s

def to_csv
##TODO - Add channel to the output
self.print_schedule(",")
end

def print_schedule(separator="||")
sep = separator
@schedule.keys.each do |date|
@schedule[date].keys.each do |time|
print [date, time, @schedule[date][time][0], @schedule[date][time][1]].join(sep) + "\n"
end
end
end

protected

def sanitize(string)
string.gsub!(/\<\!\-\-.+$/, '') # remove HTML comments to the end of the line
string.gsub!(/^\s+/, '') # remove leading whitespace
string.gsub!(/\s+$/, '') # remove trailing whitespace
string
end

def utf7(string="")
@ic.iconv(@coder.decode(string))
end

def get_search_dates(period=30)
[DateTime.now().strftime("%d %b %Y"), (DateTime.now()+period).strftime("%d %b %Y")]
end

def build_query_string(channel, start_date, end_date)
urlencode({
'channelid' => channel,
'startDate' => start_date,
'EndDate' => end_date}) +
'&sType=5&searchstring=&submit=Submit'
end

def build_url(query_string)
host = 'www.mnet.co.za'
cgi = '/schedules/default.asp?'
"http://#{host}#{cgi}#{query_string}"
end

def urlencode(hash)
hash.map {|k, v| "#{URI::encode(k.to_s)}=#{URI::encode(v.to_s)}"}.join('&')
end
end


#
# Main
#
channels = {
# "actionX" => [246, 120],
# "Activate" => [322, 120],
# "Africa Magic Channel (C-Band)" => [487, 120],
# "Animal Planet" => [417, 120],
# "B4U Movies" => [227, 120],
"BBC Food" => [284, 120],
"BBC Prime" => [121, 120],
# "BBC World" => [5, 120],
# "Bloomberg Information TV" => [8, 120],
# "Boomerang" => [314, 120],
# "BVN" => [270, 120],
# "Canal+ Horizons" => [237, 120],
# "Cartoon Network" => [13, 120],
# "Cartoon Network (Africa)" => [219, 120],
# "Cartoon Network (W4)" => [182, 120],
# "Channel O - Sound Television" => [27, 120],
# "China Central Television 4" => [15, 120],
# "China Central Television 9 (Africa)" => [226, 120],
# "CNBC" => [90, 120],
# "CNBC (Africa)" => [194, 120],
# "CNBC (W4)" => [187, 120],
# "CNN International" => [18, 120],
# "Deukom - 3SAT" => [165, 120],
# "Deukom - ARD" => [93, 120],
# "Deukom - DW" => [94, 120],
# "Deukom - PRO 7" => [164, 120],
# "Deukom - RTL" => [91, 120],
# "Deukom - SAT 1" => [92, 120],
# "Deukom - ZDF" => [95, 120],
"Discovery Channel" => [21, 120],
# "E-Entertainment" => [646, 120],
"ESPN" => [24, 120],
# "eTV" => [111, 120],
# "Fashion TV" => [145, 120],
# "Fashion TV (Africa)" => [196, 120],
# "Fashion TV (W4)" => [216, 120],
"GO" => [542, 120],
# "Go (K-World Teen)" => [341, 120],
"Hallmark Entertainment Network" => [32, 120],
"History Channel" => [484, 120],
# "History Channel (Africa)" => [485, 120],
# "K-TV World" => [36, 120],
# "KTV (Indian Bouquet)" => [501, 120],
# "kykNET" => [112, 120],
# "M-Net Domestic" => [39, 120],
"M-Net East (Africa)" => [40, 120],
"M-Net Series" => [75, 120],
# "MK89" => [592, 120],
# "Movie Magic (Africa)" => [57, 120],
"Movie Magic 2 (Africa)" => [234, 120],
# "Movie Magic 2 (W4)" => [233, 120],
# "MTV" => [42, 120],
# "MTV Base" => [69, 120],
"National Geographic" => [102, 120],
# "NDTV" => [499, 120],
# "Parliamentary Service" => [45, 120],
# "Pay Per View" => [109, 120],
"Reality TV" => [248, 120],
# "Rhema Network" => [46, 120],
# "RTPi" => [48, 120],
# "SABC 1" => [84, 120],
# "SABC 2" => [85, 120],
# "SABC 3" => [86, 120],
# "SABC Africa" => [87, 120],
# "SIC" => [255, 120],
# "Sky News" => [120, 120],
"Sony Entertainment" => [228, 90],
# "Summit" => [104, 120],
# "Sun TV" => [500, 120],
# "SuperSport" => [52, 120],
# "SuperSport 2" => [54, 120],
# "SuperSport 3" => [80, 120],
# "SuperSport 3 (W4)" => [172, 120],
# "SuperSport 5" => [208, 120],
# "SuperSport 5 (Africa)" => [252, 120],
# "SuperSport 5 (W4)" => [251, 120],
# "SuperSport 6" => [209, 120],
# "SuperSport 7 (C-Band)" => [580, 120],
# "SuperSport Zone Mosaic" => [235, 120],
# "TellyTrack" => [34, 120],
# "Travel Channel" => [61, 120],
# "Trinity Broadcasting Network" => [276, 120],
# "Turner Classic Movies" => [59, 120],
# "Turner Classic Movies (Africa)" => [60, 120],
# "Turner Classic Movies (W4)" => [181, 120],
# "TV Globo" => [254, 120],
# "TV5 Afrique" => [493, 120],
# "TV5 Afrique (Africa)" => [110, 120],
# "VH1" => [65, 120],
# "ZEE TV" => [67, 120]
}

channels.keys.each do |channel|
p "Channel: #{channel}"
schedule = DSTVSchedule.new(channels[channel][0], channels[channel][1], 30)
schedule.print_schedule
print "\n\n"
end

I hope these articles have tickled your lobes and get you to go explore Hpricot and the Wonderful World of Web Scraping.

Monday, April 30, 2007

Ruby (Hpricot) Program Guide - II

For this installment we'll see if we can build on what we learnt last time to provide a less naive solution to get a complete schedule for a channel that spans several days, each having a variable number of programs.

First things first though. Let's add the code that will retrieve the page for the channel we choose. Let's assume we want the schedule for Cartoon Network (Africa). The channel id for this channel happens to be 219 (as per the select list on the search page).

class Hash
require 'uri'

def urlencode
map {|k, v| "#{URI::encode(k.to_s)}=#{URI::encode(v.to_s)}"}.join('&')
end
end

class DSTVSchedule
require 'rubygems'
require 'hpricot'
require 'open-uri'
require 'htmlentities'
require 'iconv'

def initialize()
query_params = {
'startDate' => '30 Apr 2007',
'EndDate' => '01 May 2007',
'channelid' => 219
}
query_string = query_params.urlencode + '&sType=5&searchstring=&submit=Submit'
host = 'www.mnet.co.za'
cgi = '/schedules/default.asp?'
url = "http://#{host}#{cgi}#{query_string}"
@hp = Hpricot(open(url))
@ic = Iconv.new('US-ASCII//TRANSLIT', 'UTF-8')
@coder = HTMLEntities.new
@channel = channel
@date = date
@time = time
@title = title
@synopsis = synopsis

printf "Channel: %s\nDate: %s\nTime: %s\nTitle: %s\nSynopsis: %s\n",
@channel, @date, @time, @title, @synopsis
end

def channel
sanitize(@hp.at("font[@class='ScheduleChannel']").inner_html)
end

def date
sanitize(@hp.at("font[@class='ScheduleDate']").inner_html)
end

def time
sanitize(@hp.at("font[@class='ScheduleTime']").inner_html)
end

def title
sanitize(@hp.at("font[@class='ScheduleTitle']").inner_html)
end

def synopsis
sanitize((@hp/"td[@colspan=5]/p").first.inner_html)
end

def sanitize(string)
@ic.iconv(@coder.decode(string))
end
end


#
# Main
#
schedule = DSTVSchedule.new()

So what interesting changes are there from our last try? The first thing you'll notice is that I monkey patched the Hash class and added a nifty urlencode method to encode my URL parameters that are used to construct the query string which we will be sending off to the search application.

Inside the DSTVSchedule class we've added query_params to temporarily hold our variable URL parameters. We then construct the URL we'll use for the query and simply pass that to the open() method from open-uri.

The rest should all seem familiar to you (if you followed the previous article).

Now that we have that behind us do you notice we sit with a little dilemma? If we want multiple days' programs we cannot use the class as it stands because we will religiously only output the first program in the schedule. Let's replace all those methods (channel, time, date, title, synopsis) with one method that initialises an internal data structure which will represent the channel information.

def initialize()
query_params = {
'startDate' => '30 Apr 2007',
'EndDate' => '01 May 2007',
'channelid' => 219
}
query_string = query_params.urlencode + '&sType=5&searchstring=&submit=Submit'
host = 'www.mnet.co.za'
cgi = '/schedules/default.asp?'
url = "http://#{host}#{cgi}#{query_string}"
@hp = Hpricot(open(url))
@ic = Iconv.new('US-ASCII//TRANSLIT', 'UTF-8')
@coder = HTMLEntities.new
@schedule = process_html(@hp)

self.print_schedule
end

def process_html(hp)
schedule = SequencedHash.new
date = ""
time = ""
(hp/"td").each do |line|
case line.inner_html
when /ScheduleChannel/
@channel = sanitize((line/"[@class='ScheduleChannel']").inner_html)
when /(date|ScheduleDate)/
date = utf7((line/"[@class=date]|[@class='ScheduleDate']").inner_html)
schedule[date] = SequencedHash.new
when /ScheduleTime/
time = sanitize((line/"[@class='ScheduleTime']").inner_html)
schedule[date][time] = []
when /ScheduleTitle/
schedule[date][time] << sanitize((line/"[@class='ScheduleTitle']").inner_html)
when /\<p\>/
schedule[date][time] << sanitize((line/"p").inner_html)
end
end

schedule
end

The process_html method replaces all the methods we removed. All we've done is use Hpricot to search for all table cell (td) tags, and their content, and then refine the search further in the case statement.

In the case structure I use simple regexps to find the classes I want and then use Hpricot to pull out the information contained in the matched tag. The structure I create is a hash of hashes that has the date and time as keys and the title and synopsis as 2 elements in an array (tuple).

There is one strange case above; when searching for dates. The reason for this is to cope with the inconsistent semantics used in the HTML (as mentioned in the previous article). The first date is listed with a class attribute of 'ScheduleDate' while all the rest have a class attribute of 'date'.

Take note of the specialised hash, SequencedHash, that is used instead of the vanilla hash included in the core of Ruby. SequencedHash is part of the Ruby Collections gem and keeps track of the order in which we add elements so that we're able to pull them out in the same order.

I suspect storing the order of the keys may be a lot faster than trying to sort through a (potentially) large data set at the end to ensure the data is printed out in ascending date/time order.
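On Ruby 1.9 and later the built-in Hash remembers insertion order, so the same nested structure can be sketched without the extra gem. The entries below are only illustrative (the '01 May 2007' row is made up); they are chosen so that insertion order differs from sorted order.

```ruby
# A plain Hash returns its keys in the order they were inserted (Ruby 1.9+).
schedule = {}

[
  ['30 April 2007', '00:20', "King Arthur's Disasters"],
  ['30 April 2007', '00:45', 'Spaced Out'],
  ['01 May 2007',   '00:20', "King Arthur's Disasters"]
].each do |date, time, title|
  schedule[date] ||= {}            # create the per-day hash on first sight
  schedule[date][time] = [title]   # the synopsis would be appended later
end

puts schedule.keys                   # 30 April first, then 01 May
puts schedule['30 April 2007'].keys  # 00:20 first, then 00:45
```

Sorting the date strings would put '01 May 2007' first, so the output shows that the hash is replaying insertion order rather than sorting.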

The sanitize() method has changed in the following ways from the last article:

  1. Forcing of the encoding has been moved to the utf7() method (which, despite its name, uses Iconv to transliterate UTF-8 to US-ASCII).

  2. Drop everything from the start of an HTML comment to the end of the line.

  3. Strip any leading and trailing white space.


Both sanitize() and utf7() are protected, so we can only use them from within our class.

protected

def sanitize(string)
  string.gsub!(/\<\!\-\-.+$/, '') # remove HTML comments to the end of the line
  string.gsub!(/^\s+/, '') # remove leading whitespace
  string.gsub!(/\s+$/, '') # remove trailing whitespace
  string
end

def utf7(string="")
  @ic.iconv(@coder.decode(string))
end
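
A quick way to see what sanitize() does is to run the same three substitutions on a sample string outside the class. The sketch below is a standalone variant under a hypothetical name (clean_text) that uses gsub instead of gsub!, so the caller's string is left untouched; the sample input is made up.

```ruby
# Standalone variant of the three cleanup steps. clean_text is a hypothetical
# name; it uses gsub (not gsub!) so the original string is not modified.
def clean_text(string)
  text = string.gsub(/<!--.+$/, '')   # drop everything from "<!--" to the end of the line
  text = text.gsub(/^\s+/, '')        # remove leading whitespace
  text.gsub(/\s+$/, '')               # remove trailing whitespace
end

raw = "   Spaced Out <!-- repeat -->  "   # made-up sample input
puts clean_text(raw).inspect
```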

We can now construct a valid query, execute the search and build an internal data structure that represents our schedule. We now need to find some way to output what we have internally.

def to_s
  self.print_schedule("\t")
end

alias :to_tdt :to_s

def to_csv
  ##TODO - Add channel to the output
  self.print_schedule(",")
end

def print_schedule(separator="||")
  sep = separator
  @schedule.keys.each do |date|
    @schedule[date].keys.each do |time|
      print [date, time, @schedule[date][time][0], @schedule[date][time][1]].join(sep) + "\n"
    end
  end
end

print_schedule() forms the basis of my output strategy. It takes an optional separator character(s) and walks the internal data structure to construct a schedule entry with data concatenated by the separator.

I reuse this method in the to_s() and to_csv() methods to print out TAB delimited and comma separated values, respectively. I also added a to_tdt (TAB Delimited Text) alias which is essentially just another name for to_s().
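
One thing to watch with to_csv(): the synopsis text often contains commas of its own (see the sample output below), so a plain join(",") produces rows with a varying number of columns. Here is a minimal sketch of quoting fields before joining, assuming the usual CSV convention of wrapping such a field in double quotes and doubling any embedded quotes. csv_field and csv_row are hypothetical helpers, not part of the class.

```ruby
# Quote a single field if it contains a comma, a double quote or a newline.
# csv_field and csv_row are hypothetical helpers for illustration.
def csv_field(value)
  text = value.to_s
  return text unless text =~ /[",\n]/
  '"' + text.gsub('"', '""') + '"'   # double embedded quotes, then wrap
end

# Join already-quoted fields into one CSV line.
def csv_row(fields)
  fields.map { |field| csv_field(field) }.join(",")
end

row = ["30 April 2007", "01:10", "The Cramp Twins",
       "Follow the fun and adventures of the troublesome twins, Lucien and Wayne Cramp, " +
       "who are always fighting, arguing and embarrassing each other!"]
puts csv_row(row)
```

Ruby's own csv library covers the same ground; the hand-rolled version is only here to show what the quoting amounts to.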

Running the class as it stands should give you something like this (extract):

30 April 2007||00:20||King Arthur's Disasters||Following the crazy adventures of King Arthur as he tries to find a present for his true love, Princess Guinevere.
30 April 2007||00:45||Spaced Out||'Death Of An Alien!'. George feels guilty when a Russian astronaut who saved his life is evicted from the space station.
30 April 2007||01:10||The Cramp Twins||Follow the fun and adventures of the troublesome twins, Lucien and Wayne Cramp, who are always fighting, arguing and embarrassing each other!
[...]
1 May 2007||00:20||King Arthur's Disasters||'The Ice Palace'. King Arthur and Merlin are sent to Switzerland to find Guinevere an ice palace that she can live inside.
1 May 2007||00:45||Spaced Out||'Invasion'. When cockroaches invade the space station, the Martins are asked by a cockroach prince to solve a conflict between his people and another clan.
1 May 2007||01:10||The Cramp Twins||Follow the fun and adventures of the troublesome twins, Lucien and Wayne Cramp, who are always fighting, arguing and embarrassing each other!
[...]

Feel free to play with the other output options for more fun.

Here is the complete class as it stands now:

class Hash
  require 'uri'

  def urlencode
    map {|k, v| "#{URI::encode(k.to_s)}=#{URI::encode(v.to_s)}"}.join('&')
  end
end

class DSTVSchedule
  require 'rubygems'
  require 'hpricot'
  require 'open-uri'
  require 'htmlentities'
  require 'iconv'
  require 'collections/sequenced_hash'

  def initialize(channel='', period=30, time_offset=2)
    query_params = {
      'startDate' => '30 Apr 2007',
      'EndDate' => '1 May 2007',
      'channelid' => "219"
    }
    query_string = query_params.urlencode + '&sType=5&searchstring=&submit=Submit'
    host = 'www.mnet.co.za'
    cgi = '/schedules/default.asp?'
    url = "http://#{host}#{cgi}#{query_string}"
    @hp = Hpricot(open(url))
    @ic = Iconv.new('US-ASCII//TRANSLIT', 'UTF-8')
    @coder = HTMLEntities.new
    @schedule = process_html(@hp)
  end

  def process_html(hp)
    schedule = SequencedHash.new
    date = ""
    time = ""
    (hp/"td").each do |line|
      case line.inner_html
      when /ScheduleChannel/
        @channel = sanitize((line/"[@class='ScheduleChannel']").inner_html)
      when /(ScheduleDate|date)/
        date = utf7((line/"[@class='ScheduleDate']|[@class=date]").inner_html)
        schedule[date] = SequencedHash.new
      when /ScheduleTime/
        time = sanitize((line/"[@class='ScheduleTime']").inner_html)
        schedule[date][time] = []
      when /ScheduleTitle/
        schedule[date][time] << sanitize((line/"[@class='ScheduleTitle']").inner_html)
      when /\<p\>/
        schedule[date][time] << sanitize((line/"p").inner_html)
      end
    end

    schedule
  end

  def to_s
    self.print_schedule("\t")
  end

  alias :to_tdt :to_s

  def to_csv
    ##TODO - Add channel to the output
    self.print_schedule(",")
  end

  def print_schedule(separator="||")
    sep = separator
    @schedule.keys.each do |date|
      @schedule[date].keys.each do |time|
        print [date, time, @schedule[date][time][0], @schedule[date][time][1]].join(sep) + "\n"
      end
    end
  end

  protected

  def sanitize(string)
    string.gsub!(/\<\!\-\-.+$/, '') # remove HTML comments to the end of the line
    string.gsub!(/^\s+/, '') # remove leading whitespace
    string.gsub!(/\s+$/, '') # remove trailing whitespace
    string
  end

  def utf7(string="")
    @ic.iconv(@coder.decode(string))
  end
end


#
# Main
#
schedule = DSTVSchedule.new()
schedule.print_schedule

Further refactoring may see us adding some attributes to the constructor (channel name, time offset) and providing an example on how we can use objects from this class to collect and display multiple channels of our choice.

Sounds like there's another article in there somewhere.

Friday, April 27, 2007

Unholy Triumvirate: TextMate, MacPorts and Ruby

After switching back from an Ubuntu laptop to my MacBook Pro I was once again getting back to using TextMate to do some development and systems scripting. The combination of ruby and RubyGems has been a little bit rocky on OS X.

In part it was due to the default install of ruby on OS X, me using Fink for package management and then later switching from that to MacPorts.

Apple (and I presumably) suck cvyrf.

The problem I ran into was that after installing ruby and rb-rubygem via the ports system, TextMate no longer seemed too interested in running ruby scripts when I hit CMD-R and provided me with a lovely:
"No such file to load -- rubygems"
Checking Google the first listing I get is this.

It did not provide me with an applicable solution but got me thinking ... Either I have some environment variables that are not being set (or set incorrectly) or my library paths are screwy somehow.

An easy way to confirm the former is to check if your shell environment also suffers from the same malady:
$ ruby -r rubygems -e "p 1"
1

Not the problem then. Next step, let's pull out find and off a hunting we go:
$ sudo find / -name ruby -type f
Password:
/opt/local/bin/ruby
/opt/local/var/db/dports/software/ruby/1.8.6_0/opt/local/bin/ruby
/usr/bin/ruby
Let's see if there is some disparity between the ruby binary in /opt/local/bin and /usr/bin:
$ /usr/bin/ruby -v
ruby 1.8.2 (2004-12-25) [universal-darwin8.0]
$ /opt/local/bin/ruby -v
ruby 1.8.6 (2007-03-13 patchlevel 0) [i686-darwin8.9.1]

Well, what do you know. The version in /usr/bin is older and also looks for its libs outside /opt, which means that it won't pick up the good work port has done for me. I moved /usr/bin/ruby to /tmp and added a soft link in /usr/bin that points to /opt/local/bin/ruby.
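
For the record, the hunt for stray ruby binaries can also be done from Ruby itself by walking the PATH. This is a minimal sketch with a hypothetical helper (find_on_path); keep in mind that an application launched from the GUI does not necessarily see the same PATH as your shell, so treat the result as the shell's view only.

```ruby
# List every executable called `name` in the given directories, in search order.
# find_on_path is a hypothetical helper; by default it walks the current PATH.
def find_on_path(name, dirs = ENV.fetch("PATH", "").split(File::PATH_SEPARATOR))
  candidates = dirs.map { |dir| File.join(dir, name) }
  candidates.select { |path| File.file?(path) && File.executable?(path) }
end

# The first entry printed is the one a bare `ruby` command would run.
find_on_path("ruby").each { |path| puts path }
```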

Running my script in TextMate now works like a charm!

Thursday, April 26, 2007

Puffing with SSHKeychain

In one of my previous articles I showed how you could use ssh-agent to your advantage to maximize lackadaisicalness. I have since then moved from the Ubuntu laptop that I was using at the time to my Mac that became available again.

I was looking for a nice and neat way to integrate ssh-agent into the Mac environment but could not get my shell scripting approach to gel elegantly. While doing the obligatory search on the web I found and fell in love with SSHKeychain.

This little app does all the hard work (running ssh-agent from the correct place and loading your keys into memory with ssh-add) for you, and more ... It not only handles the ssh-agent side of things but also provides support for integrating with the Apple Keychain and for forwarding local ports over an ssh connection to set up ssh tunnels.

Go see the full feature list for more info.

Installation
Here are the steps from their site:
  • Download SSHKeychain.dmg and mount it.
  • Copy SSHKeychain (SSHKeychain.app) to your Applications folder.
  • Run SSHKeychain. This should open a dock item and a statusbar item.
  • Click either the Statusbar Item, the Dock Item, or Main Menu and open the Preferences.
  • Open the Environment tab.
  • Enable "Manage global environment variables". This will make SSHKeychain available for other applications.
  • Open the keys tab and see if any of your keys are missing (~/.ssh/id_dsa and ~/.ssh/identity are default).
  • Re-login to make the global variables work.
  • Start up SSHKeychain, and you're set.
I added SSHKeychain to my Login Items in the System Preferences panel to ensure the app was running after a restart or log out/in sequence.

Setup
If you followed the installation instructions above there should be nothing further to do (assuming you had some pre-created keys in the default place like I had).

Excellent!

When I now fire Terminal.app up and log into a box that has my public key on it, no password is required and I am logged in without further ado.
