Monday, 22 November 2010

Creating dynamic methods with closures in Ruby

I came across an interesting problem while working on my yet-to-be-birthed MicroISV project, leadcapturer, and I want to document the discovery before I forget all about it.  leadcapturer will scrape company and lead details from directory websites to provide fresh leads for marketing departments.  To help me achieve this difficult goal, I have been using the totally awesome Nokogiri, which is a wrapper around the C-based libxml2 library.  With Nokogiri’s assistance, I can run XPath queries against HTML documents even if the documents are invalid XML, which, let us face facts, most HTML documents are.

Which brings me to the point of this post.  I wanted to use a regular expression in an XPath expression to return all text nodes that match the regex pattern.  This is not possible in plain XPath 1.0, but it is possible with the help of Nokogiri, which lets you call your own Ruby methods as XPath functions.  The Nokogiri documentation provides the code listed below as an example:

  node.xpath('.//title[regex(., "\w+")]', Class.new {
    def regex(node_set, regex)
      node_set.find_all { |node| node['some_attribute'] =~ /#{regex}/ }
    end
  }.new)

In the above example, an instance method named regex is defined in an anonymous class created by Class.new, and an instance of that class is passed to xpath as the handler for custom functions.  This Ruby instance method is used in the predicate of the XPath expression.  A predicate is the boolean expression inside the square brackets; the XPath expression returns only the nodes for which it is true.  In the above example, we want all title nodes whose some_attribute attribute matches the regex “\w+”.

The problem was that I wanted to make use of closures to call methods on the outer object, the one whose method creates the anonymous class with Class.new.  When I mention closure in this context, I specifically mean being able to refer to variables from the context in which the closure was created.  The variable I wanted to make use of in this case was the self of the outer class that defines the anonymous class.  In Ruby, methods defined with def are not closures; only blocks are.  I could not use self in the regex instance method either, because self in that context refers to the instance of the anonymous class and not to the containing object.
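The difference is easy to see in isolation.  Here is a minimal sketch, separate from leadcapturer, in which every name (greeting, greeter_class, with_def, with_block) is made up for illustration: a method defined with def cannot see a local variable from the surrounding scope, while a block handed to define_method can.

```ruby
greeting = "hello from the outer scope"

greeter_class = Class.new do
  # def opens a fresh scope, so the outer local `greeting` is not visible
  # here; Ruby treats the name as a method call and raises NameError.
  def with_def
    greeting
  rescue NameError
    :not_visible
  end

  # The block is a closure, so it still sees the outer local.
  define_method(:with_block) { greeting }
end

greeter = greeter_class.new
def_result = greeter.with_def
block_result = greeter.with_block
```

Under these assumptions, def_result ends up as the :not_visible marker and block_result holds the outer string.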

The answer to this puzzle was to use define_method, which dynamically creates a new instance method on a class.  define_method takes a method or a block as an argument.  As I stated earlier, methods are not closures in Ruby but blocks are.  As I can pass a block to define_method, my problem was solved.  Here is a stripped down version of the end result:

  1 lead = self
  2 expression = /\w+/
  3 parent.xpath("./descendant::text()[regex(.)]", Class.new {
  4   define_method(:regex) do |node_set|
  5     node_set.find_all do |node|
  6       if node.text =~ expression
  7         lead.attribute = node.text
  8         next true   # keep this node
  9       end
 10       false
 11     end
 12   end
 13 }.new)

In line 1, I bind self, the instance of the containing (outer) class, to a local variable named lead, which means I can call methods of the outer object from inside the anonymous class defined by Class.new.

In line 4, I am using define_method to dynamically create a method named regex, passing it a block that becomes the body of the dynamically created instance method.

In line 7, I am able to call a method (attribute=) of the outer object by using the variable I bound to self in line 1.
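If you want to try the pattern without Nokogiri, the following is a rough sketch that runs on plain Ruby.  The Lead class, its attribute accessor, and build_handler are hypothetical stand-ins for my real code, and an array of strings plays the part of the node set.

```ruby
class Lead
  attr_accessor :attribute

  # Returns an instance of an anonymous class whose regex method can
  # still reach this Lead through the captured local variable.
  def build_handler(expression)
    lead = self # inside the method below, self is the anonymous instance
    Class.new {
      define_method(:regex) do |texts|
        texts.find_all do |text|
          next false unless text =~ expression
          lead.attribute = text # call back into the outer object
          true
        end
      end
    }.new
  end
end

lead = Lead.new
handler = lead.build_handler(/\d+/)
matches = handler.regex(["abc", "42", "x7"])
```

After the call, matches contains the strings that matched the pattern, and lead.attribute holds the last of them, because the block overwrites it on every match.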

I think this is pretty cool and one of the reasons why I am really enjoying the different dynamic paradigms available in Ruby.  I also think that, depending on your circumstances of course, this is a more readable alternative to the “magic” of method_missing.


2 comments:

  1. Like it. Custom regex method in xslt. So much more versatile than wrestling with mad xpath functions http://is.gd/hzXdZ

  2. Perfect. I was looking for something with this kind of power in Ruby, and I struggled through the various closure-like mechanisms in Ruby, but this works.

    Funny how easily I did this in Java:

    new MyAbstractClass() {
    public void call() {
    ContainingClass.this.foo();
    }
    }
