Archive for February, 2007

Interview: Martin Fowler – man in the know

February 2, 2007

I am in search of an empty room at the Øredev conference. Normally this is an easy task, but I’ve got Martin Fowler on my tail. My mind is still blank. What on earth can I ask him that he hasn’t already written himself?

Finally, an empty room, well almost. Another speaker, Erik Dörnenburg, is sitting halfway into his screen, muttering.
– What’s up, I ask.
– I’ve updated my machine and my demo doesn’t work. I’ve got 45 minutes until the presentation.
We sit down next to him. Do not disturb a developer while he’s coding…

So I have the man who coined terms like Dependency Injection and POJO in front of me. What next? Martin is easily recognizable both in accent and appearance, a frequent and brilliant speaker.
He has an excellent web site, http://www.martinfowler.com, which contains loads about his work. Articles and references abound. That is when it suddenly hits me – who is he as a programmer and person?

Q: When was the last time you coded?
Well, I do code my own website. But it’s been a while since I had any paying customers. I’ve been pairing quite recently though. A real delivery? That was some time ago. I’m actually afraid to lose contact with code, but I have smart people around me.

Q: But what makes you tick?
I enjoy trying to figure out new techniques – to organize knowledge. I see myself as a conduit for transferring knowledge, to process what is out there and make some kind of structure out of it. Brian Foote actually described me as an “intellectual jackal with good taste in carrion”.
I look around for interesting stuff and try to make sense of it.
“Refactoring” is a good example. I figured out how to describe it and wrote a book that came out when it could make a difference and move the area forward.
I also enjoy writing a lot, that’s a big thing. I’m better now at speaking, but that’s not what makes me tick.

Q: You’ve written quite a few books – how do they compare?
Out of the five, “UML Distilled” sold more copies than the others put together. Usually you can’t make a living out of your books; I guess I could, though.
All of the books had their good sides, but I would have to say that it was fun to write with Kent Beck [ed.: wrote “Extreme Programming”, created JUnit, etc.]. We were in tune and could support each other through the dull bits.
I would have to say though that I’m proudest of “Refactoring”. It’s an important technique and didn’t get the attention it should have received – the book helped.

Q: How did you start out?
I was an independent consultant for many years. Giving talks was a good way of getting jobs. Articles same thing – it got my name known.
Also, I write something because I don’t understand a certain area or technology. It’s a good way to learn.
Erik is now on the phone with California. We calculate that time is roughly 6:30 there – in the morning.

Q: Then what? How come you started working for Thoughtworks (TW)?
I’ve been there for six years and done a lot of consulting. I never wanted to work for a company, but there was something about TW that made me interested.
Getting the work done, and tons of bright people. But more important is that it is a sort of social experiment: the notion that good people make a difference.
I hope we can affect IT, which is a difficult and skilled exercise at best.

Q: What is the most difficult part of being a celebrity?
I’m not an extrovert person. I’m not good at the “person to person”. I get emails with questions like “I got a problem on…what is the magic trick”. They worked for months on it and I can only point to a book. That clearly wasn’t an answer they liked. It’s frustrating.
However, celebrity is also a nice thing – it opens a few doors. I can email people like Rod Johnson [ed.: CEO of Interface21, the company that created the Spring framework] if I have a question about something. And he will answer.
People tend to think I’m an ingenious programmer. I’m not. I’m pretty good, but not necessarily that great.
Erik suddenly spits out:
– F—!…ok the demo will be shorter.

Q: Looking forward, what’s next?
Oh, there is tons of stuff to write about. The design patterns area for instance. I’m also interested in DSLs [domain-specific languages] and agile development. But in agile there are too many writers and I don’t like competition. There are too many smart people in agile development.
My strategy is to look for topics that no one has written about. Basically I don’t foretell the future.

Q: What are your top three pieces of advice to a programmer?
My first piece of advice must be to learn to collaborate with the user or purchaser. The really good ideas usually come from them. You don’t have to be an expert to do this. I have found this to be good general advice.
Secondly, it would be “continuous learning”. It’s like running up a downwards-moving escalator – you have to keep running.
The third one is difficult…
“Buy lots of books by good authors” would be it.
Erik suddenly releases a big:
– Yes!
I saw Erik’s demo some twenty minutes later – it was really good.
As for Martin, our discussions continued well into the debate panel and beyond. He would frequently forget his back pain and tap into some extra energy reserve. I wonder how he did that.

Originally published in JayView.

Neo – a netbase

February 1, 2007

Neo is a network-oriented database for semi-structured information. Too complicated? Let us try again. Neo handles data in networks – nodes, relationships and properties – instead of tables. This means entirely new solutions for data that is difficult to handle in static tables. It could mean we can go agile all the way into the persistence layer.

The relational database represents one of the most important developments in the history of computer science. Upon its arrival some 30 years ago, it revolutionized the way the industry views data management and today it is practically ubiquitous.

In fact, it is so taken for granted that we, as an industry, have stopped thinking. Could there be a better way to represent and store our data? In some cases the answer is – yes, absolutely. The relational database is showing its age. Some telltale signs:

  • The mismatch between relational data and object oriented programming.
  • Schema evolution – updating your data model when the domain model changes – is just so much manual labor.
  • Semi-structured or network-oriented data is difficult to handle.

The Neo Database

Neo is a database that is built with a different philosophy. Where the relational database squeezes all data into static tables, Neo uses a flexible and dynamic network model. This model allows data to evolve more naturally as business requirements change. There’s no need for “alter table…” on your production databases after you introduce a new version of the business layer and no need to rewire and migrate your O/R mapping configurations. The network will evolve along with your business logic. This spells agility.

Neo is an embedded persistence engine, which means it’s a small, lightweight and non-intrusive Java library that is easy to include in your development environment. It has been designed for performance and scalability and has been proven to handle large networks of data (more than 100 million nodes, relationships and properties). Neo is a newly founded open source project, but the software is robust. It has been in commercial production in a highly demanding 24/7 environment for almost four years and has full support for enterprise-grade features such as distributed ACID transactions, configurable isolation levels and full transaction recovery. But so much for sweet talk, let’s cut to some code!

Model and Code

Representation

In the Neo model, everything is represented by nodes, relationships and properties. A relationship connects two nodes and has a well-defined, mandatory type. Properties are key-value pairs that can be attached to both nodes and relationships. Together, the nodes, the relationships between them and the properties on both form a node space – a coherent network representing your business domain data.
This may sound fancy, but it’s all very intuitive. Here is how a simple social network might be modeled:

Figure 1: An example of a social network from a somewhat famous movie. Note the different type on the relation between Agent Smith and his creator The Architect.

Note how all nodes have integer identifiers and how all relationships have a type (KNOWS or CODED_BY). In this example, all nodes have a “name” property. But some nodes have other properties, for example, an “age” property (node 1) or a “last name” property (node 3). There’s no overall schema that forces all nodes to look the same. This allows Neo to capture so-called semi-structured information: information that has a small number of mandatory attributes but many optional attributes. Furthermore, the relationships have properties as well. In this example, all relationships have an “age” property to describe how long two people have known each other and some relationships have a “disclosure” property to describe whether the acquaintance is secret.
Working with nodes and relationships is easy. The basic operations are as follows:

Figure 2

This is an intuitive representation of a network and probably similar to many other implementations that want to represent a network of data in an object-oriented language. It’s worth noting, however, that relationships in this model are full-blown objects and not just implicit associations between nodes. If you have another look at the social network example, you’ll see that there’s more information in the relationships between nodes than in the nodes themselves. The value of a network is in the connections between the nodes and Neo’s model captures that.
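
To make the model concrete, here is a minimal sketch in plain Java of how nodes, typed relationships and properties could fit together in memory. It is not Neo’s implementation or API: the class names NodeSpaceSketch and PropertyHolder, the use of strings as relationship types and the “3 days” value are all assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical in-memory model for illustration; these classes are not Neo's API.
final class NodeSpaceSketch {

    // Nodes and relationships both carry free-form key-value properties.
    static class PropertyHolder {
        private final Map<String, Object> properties = new LinkedHashMap<>();

        void setProperty(String key, Object value) { properties.put(key, value); }
        Object getProperty(String key) { return properties.get(key); }
        boolean hasProperty(String key) { return properties.containsKey(key); }
        Map<String, Object> properties() { return Collections.unmodifiableMap(properties); }
    }

    static final class Node extends PropertyHolder {
        final int id;
        final List<Relationship> outgoing = new ArrayList<>();

        Node(int id) { this.id = id; }

        Relationship createRelationshipTo(Node other, String type) {
            Relationship rel = new Relationship(this, other, type);
            outgoing.add(rel);
            return rel;
        }
    }

    // A relationship is a full object: a mandatory type plus its own properties.
    static final class Relationship extends PropertyHolder {
        final Node start;
        final Node end;
        final String type;

        Relationship(Node start, Node end, String type) {
            this.start = start;
            this.end = end;
            this.type = Objects.requireNonNull(type, "relationship type is mandatory");
        }
    }

    static String describe(Relationship rel) {
        return rel.start.getProperty("name") + " -[" + rel.type + " " + rel.properties()
                + "]-> " + rel.end.getProperty("name");
    }

    static Relationship buildExample() {
        Node mrAnderson = new Node(1);
        mrAnderson.setProperty("name", "Thomas Anderson");
        mrAnderson.setProperty("age", 29);

        Node morpheus = new Node(2);
        morpheus.setProperty("name", "Morpheus");
        morpheus.setProperty("rank", "Captain"); // no "age" here: no schema forces one

        Relationship knows = mrAnderson.createRelationshipTo(morpheus, "KNOWS");
        knows.setProperty("age", "3 days"); // illustrative value, not taken from the article
        return knows;
    }

    public static void main(String[] args) {
        System.out.println(describe(buildExample()));
    }
}
```

Running the sketch prints Thomas Anderson -[KNOWS {age=3 days}]-> Morpheus. The point of the design is that the relationship carries its own data, and that two nodes of the same kind are free to have different sets of properties.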

Creating a Node Space

And now, finally, some code. Here’s how we would create the Matrix social network from figure 1:

// Start a transaction; nothing is persisted until it commits
Transaction tx = Transaction.begin();
EmbeddedNeo neo = ... // Get factory
// Create Thomas 'Neo' Anderson
Node mrAnderson = neo.createNode();
mrAnderson.setProperty( "name", "Thomas Anderson" );
mrAnderson.setProperty( "age", 29 );
// Create Morpheus
Node morpheus = neo.createNode();
morpheus.setProperty( "name", "Morpheus" );
morpheus.setProperty( "rank", "Captain" );
morpheus.setProperty( "occupation", "Total bad ass" );
// Create a relationship representing that they know each other
mrAnderson.createRelationshipTo( morpheus,
    MatrixRelationshipTypes.KNOWS );
// Create Trinity, Cypher, Agent Smith, Architect similarly
...
tx.commit();

As the code above shows, it is rather easy to construct the node space for our Matrix example. And, of course, our network is made persistent once we commit.

Traversing a Node Space

Now that we know how to represent our domain model in the node space, how do we get information out of it? Unlike a relational database, Neo does not support a declarative query language. Instead, Neo provides an object-oriented traverser framework that allows us to express complex queries in plain Java. Working with the traverser framework is very straightforward. The core abstraction is, unsurprisingly, the Traverser interface. A Traverser is a Java Iterable that encapsulates a “query” – i.e. a traversal on the node space such as “give me all Morpheus’ friends and his friends’ friends” or “does Trinity know someone who is acquainted with an agent?”. The most complex part of working with a Traverser is instantiating it. Here’s an example of how we would create a Traverser that will return all the (transitive) friends of the “Thomas Anderson” node of the example above:

// Instantiate a traverser that returns all mrAnderson’s friends
Traverser friendsTraverser = mrAnderson.traverse(
    Traverser.Order.BREADTH_FIRST,
    StopEvaluator.END_OF_NETWORK,
    ReturnableEvaluator.ALL_BUT_START_NODE,
    MatrixRelationshipTypes.KNOWS,
    Direction.OUTGOING );

Here we can see that traversers are created by invoking the traverse(...) method on a start node with a number of parameters. The parameters control the traversal, and in this example they tell Neo to traverse the network breadth-first (rather than depth-first), to traverse until it has covered all reachable nodes in the network (StopEvaluator.END_OF_NETWORK), to return all nodes except the first (ReturnableEvaluator.ALL_BUT_START_NODE), and to traverse all OUTGOING relationships of type KNOWS. How would we list the output of this traversal? After we’ve created a Traverser, working with it is as easy as working with any Java Iterable:

// Traverse the node space and print out the result
for ( Node friend : friendsTraverser )
{
    System.out.println( friend.getProperty( "name" ) + " at depth " +
        friendsTraverser.currentPosition().getDepth() );
}

Running the traversal above on the Matrix example would yield the following output:

$ bin/run-neo-example
Morpheus at depth 1
Trinity at depth 1
Cypher at depth 2
Agent Smith at depth 3
$

As you can see, the Traverser has started at the “Thomas Anderson” node and run through the entire network along the KNOWS relationship type, breadth first, and returned all nodes except the first one. “The Architect” is missing from this output since the relationship connecting him is of a different type, CODED_BY. This is a small, contrived example. But the code would work equally well on a network with hundreds of millions of nodes, relationships and properties.
Now, let’s look at a more complex traversal. Going with our example, suppose that we wanted to find all “hackers of the Matrix,” where we define a hacker of the Matrix as any node that you reach through a CODED_BY relationship. How would we create a Traverser that gives us those nodes? First off, we want to traverse both our relationship types (KNOWS and CODED_BY). Secondly, we want to traverse until the end of the network and lastly, we want to return only nodes which we came to through a CODED_BY relationship. Here’s the code:

// Instantiate a traverser that returns all hackers of the Matrix
Traverser hackerTraverser = mrAnderson.traverse(
    Traverser.Order.BREADTH_FIRST,
    StopEvaluator.END_OF_NETWORK,
    new ReturnableEvaluator()
    {
        public boolean isReturnableNode( TraversalPosition pos )
        {
            // Assumption: no relationship has been traversed at the start
            // node, so guard against null before checking the type
            Relationship last = pos.getLastRelationshipTraversed();
            return last != null &&
                last.isType( MatrixRelationshipTypes.CODED_BY );
        }
    },
    MatrixRelationshipTypes.CODED_BY,
    Direction.OUTGOING,
    MatrixRelationshipTypes.KNOWS,
    Direction.OUTGOING );

Now it’s getting interesting! The ReturnableEvaluator.ALL_BUT_START_NODE constant from the previous example was actually a convenience implementation of the ReturnableEvaluator interface. This interface contains a single method and you can supply a custom implementation of it to the traverser framework. It turns out that this is a simple but powerful way to express complex queries. Setting aside the anonymous inner class cruft surrounding the body of isReturnableNode, we basically pass in a snippet of code that checks whether we traversed a relationship of type CODED_BY to get to the current node. If this check evaluates to true, the current node will be included in the set of nodes that is returned from the traverser.
When executed with a simple print loop, the above code prints the following:

$ bin/run-neo-example
The Architect
$

StopEvaluators work the same way. In our experience, writing custom evaluators is very easy. Even the most advanced applications we have developed with Neo – applications that traverse extremely large and complex networks – are based on evaluators that are rarely more than a few lines of code.
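
The traversal semantics described in this section can be sketched without Neo at all. The following plain-Java example is a rough illustration, not Neo’s traverser framework: the class TraversalSketch, its Position and Edge types and the two predicates standing in for the stop and returnable evaluators are hypothetical. The edge list is also an assumption, chosen to be consistent with the depths printed earlier, and the sketch assumes that a stop condition prunes expansion below a position while still letting that position be returned.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.function.Predicate;

// Plain-Java sketch of a breadth-first traversal with pluggable evaluators.
// All names are hypothetical stand-ins, not Neo's traverser framework.
final class TraversalSketch {

    static final class Edge {
        final String type;
        final String target;
        Edge(String type, String target) { this.type = type; this.target = target; }
    }

    // Current node, its depth, and the relationship type used to reach it (null at the start).
    static final class Position {
        final String node;
        final int depth;
        final String lastType;
        Position(String node, int depth, String lastType) {
            this.node = node;
            this.depth = depth;
            this.lastType = lastType;
        }
    }

    private final Map<String, List<Edge>> outgoing = new HashMap<>();

    void relate(String from, String type, String to) {
        outgoing.computeIfAbsent(from, k -> new ArrayList<>()).add(new Edge(type, to));
    }

    // 'stop' prunes expansion below a position; 'returnable' decides
    // whether a visited position becomes part of the result.
    List<String> traverse(String start, Set<String> types,
                          Predicate<Position> stop, Predicate<Position> returnable) {
        List<String> result = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Queue<Position> queue = new ArrayDeque<>();
        visited.add(start);
        queue.add(new Position(start, 0, null));
        while (!queue.isEmpty()) {
            Position pos = queue.remove();
            if (returnable.test(pos)) {
                result.add(pos.node + " at depth " + pos.depth);
            }
            if (stop.test(pos)) {
                continue;
            }
            for (Edge edge : outgoing.getOrDefault(pos.node, Collections.emptyList())) {
                if (types.contains(edge.type) && visited.add(edge.target)) {
                    queue.add(new Position(edge.target, pos.depth + 1, edge.type));
                }
            }
        }
        return result;
    }

    // Assumed edges, consistent with the depths printed in the article.
    static TraversalSketch matrix() {
        TraversalSketch g = new TraversalSketch();
        g.relate("Thomas Anderson", "KNOWS", "Morpheus");
        g.relate("Thomas Anderson", "KNOWS", "Trinity");
        g.relate("Morpheus", "KNOWS", "Trinity");
        g.relate("Morpheus", "KNOWS", "Cypher");
        g.relate("Cypher", "KNOWS", "Agent Smith");
        g.relate("Agent Smith", "CODED_BY", "The Architect");
        return g;
    }

    public static void main(String[] args) {
        TraversalSketch g = matrix();
        Set<String> knows = Collections.singleton("KNOWS");
        Set<String> both = new HashSet<>(Arrays.asList("KNOWS", "CODED_BY"));

        // All transitive friends: never stop early, return everything but the start.
        System.out.println(g.traverse("Thomas Anderson", knows,
                p -> false, p -> p.depth > 0));
        // Hackers: return only nodes reached through CODED_BY (null-safe at the start).
        System.out.println(g.traverse("Thomas Anderson", both,
                p -> false, p -> "CODED_BY".equals(p.lastType)));
        // Custom stop condition: do not expand beyond depth 1.
        System.out.println(g.traverse("Thomas Anderson", knows,
                p -> p.depth >= 1, p -> p.depth > 0));
    }
}
```

Each of the three calls differs only in the two small predicates passed in, which is the property the evaluator design relies on: the walk itself stays generic, and the “query” lives in a few lines of caller-supplied code.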

Conclusion

Neo is not a silver bullet and some areas need to improve, for instance tooling, standardization of the model and a query language. However, if your data is naturally ordered in a network, is semi-structured, or you just need to go truly agile, give the Neo database a try. We hope you find it, as we do, to be an elegant and flexible alternative that is both robust and fast.

Emil Eifrém, Neo Technology
Björn Granvik, Jayway

Links

Neo specification
www.neo4j.org

Originally published in JayView.