Neo – a netbase

Neo is a network-oriented database for semi-structured information. Too complicated, let us try again. Neo handles data in networks – nodes, relationships and properties – instead of tables. This means entirely new solutions for data that is difficult to handle in static tables. It could mean we can go agile all the way into the persistence layer.

The relational database represents one of the most important developments in the history of computer science. Upon its arrival some 30 years ago, it revolutionized the way the industry views data management and today it is practically ubiquitous.

In fact, it is so taken for granted that we, as an industry, have stopped thinking. Could there be a better way to represent and store our data? In some cases the answer is – yes, absolutely. The relational database is showing its age. Some telltale signs:

  • The mismatch between relational data and object-oriented programming.
  • Schema evolution – updating your data model when the domain model changes – is just so much manual labor.
  • Semi-structured or network-oriented data is difficult to handle.

The Neo Database

Neo is a database that is built with a different philosophy. Where the relational database squeezes all data into static tables, Neo uses a flexible and dynamic network model. This model allows data to evolve more naturally as business requirements change. There’s no need for “alter table…” on your production databases after you introduce a new version of the business layer and no need to rewire and migrate your O/R mapping configurations. The network will evolve along with your business logic. This spells agility.

Neo is an embedded persistence engine, which means it’s a small, lightweight and non-intrusive Java library that is easy to include in your development environment. It has been designed for performance and scalability and has been proven to handle large networks of data (more than 100 million nodes, relationships and properties). Neo is a newly founded open source project, but the software is robust. It has been in commercial production in a highly demanding 24/7 environment for almost four years and has full support for enterprise-grade features such as distributed ACID transactions, configurable isolation levels and full transaction recovery. But so much for sweet talk, let’s cut to some code!

Model and Code

Representation

In the Neo model, everything is represented by nodes, relationships and properties. A relationship connects two nodes and has a well-defined, mandatory type. Properties are key-value pairs that can be attached to both nodes and relationships. Combined, the nodes, the relationships between them and the properties on both form a node space – a coherent network representing your business domain data.
This may sound fancy, but it’s all very intuitive. Here is how a simple social network might be modeled:

Figure 1: An example of a social network from a somewhat famous movie. Note the different type of the relationship between Agent Smith and his creator, The Architect.

Note how all nodes have integer identifiers and how all relationships have a type (KNOWS or CODED_BY). In this example, all nodes have a “name” property. But some nodes have other properties, for example, an “age” property (node 1) or a “last name” property (node 3). There’s no overall schema that forces all nodes to look the same. This allows Neo to capture so-called semi-structured information: information that has a small number of mandatory attributes but many optional attributes. Furthermore, the relationships have properties as well. In this example, all relationships have an “age” property to describe how long two people have known each other and some relationships have a “disclosure” property to describe whether the acquaintance is secret.
Working with nodes and relationships is easy. The basic operations are as follows:

Figure 2

This is an intuitive representation of a network and probably similar to many other implementations that want to represent a network of data in an object-oriented language. It’s worth noting, however, that relationships in this model are full-blown objects and not just implicit associations between nodes. If you have another look at the social network example, you’ll see that there’s more information in the relationships between nodes than in the nodes themselves. The value of a network is in the connections between the nodes and Neo’s model captures that.
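
To make the idea of relationships as full-blown objects concrete, here is a minimal in-memory sketch of the model described above. It is not Neo's implementation or API: the class names (NodeSpaceSketch, PropertyContainer, SketchNode, SketchRelationship) are hypothetical, nothing is persisted, and the property values are illustrative only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model for illustration; these classes are NOT Neo's API.
final class NodeSpaceSketch {

    /** Anything that can carry key-value properties (both nodes and relationships). */
    static class PropertyContainer {
        private final Map<String, Object> properties = new HashMap<>();

        void setProperty(String key, Object value) { properties.put(key, value); }
        Object getProperty(String key) { return properties.get(key); }
        boolean hasProperty(String key) { return properties.containsKey(key); }
    }

    /** A node with an integer identifier and a list of attached relationships. */
    static final class SketchNode extends PropertyContainer {
        final int id;
        final List<SketchRelationship> relationships = new ArrayList<>();

        SketchNode(int id) { this.id = id; }

        /** Creates a typed relationship and registers it on both end nodes. */
        SketchRelationship createRelationshipTo(SketchNode other, String type) {
            SketchRelationship rel = new SketchRelationship(this, other, type);
            relationships.add(rel);
            other.relationships.add(rel);
            return rel;
        }
    }

    /** A relationship is an object in its own right: two end nodes, a mandatory type, properties. */
    static final class SketchRelationship extends PropertyContainer {
        final SketchNode start;
        final SketchNode end;
        final String type;

        SketchRelationship(SketchNode start, SketchNode end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    public static void main(String[] args) {
        SketchNode anderson = new SketchNode(1);
        anderson.setProperty("name", "Thomas Anderson");
        anderson.setProperty("age", 29);

        SketchNode morpheus = new SketchNode(2);
        morpheus.setProperty("name", "Morpheus"); // no "age": there is no shared schema

        SketchRelationship knows = anderson.createRelationshipTo(morpheus, "KNOWS");
        knows.setProperty("disclosure", "public"); // illustrative value, not taken from Figure 1

        System.out.println(knows.start.getProperty("name") + " -" + knows.type + "-> "
                + knows.end.getProperty("name"));
    }
}
```

Because nodes and relationships share the same property mechanism, information about the connection itself (such as “disclosure”) lives on the relationship rather than being forced onto one of the nodes.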

Creating a Node Space

And now, finally, some code. Here’s how we would create the Matrix social network from Figure 1:

Transaction tx = Transaction.begin();
EmbeddedNeo neo = ... // Get factory
// Create Thomas 'Neo' Anderson
Node mrAnderson = neo.createNode();
mrAnderson.setProperty( "name", "Thomas Anderson" );
mrAnderson.setProperty( "age", 29 );
// Create Morpheus
Node morpheus = neo.createNode();
morpheus.setProperty( "name", "Morpheus" );
morpheus.setProperty( "rank", "Captain" );
morpheus.setProperty( "occupation", "Total bad ass" );
// Create a relationship representing that they know each other
mrAnderson.createRelationshipTo( morpheus,
    MatrixRelationshipTypes.KNOWS );
// Create Trinity, Cypher, Agent Smith, Architect similarly
...
tx.commit();

As the code above shows, it is rather easy to construct the node space for our Matrix example. And, of course, our network is made persistent once we commit.
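
The listing above uses MatrixRelationshipTypes.KNOWS without showing where it comes from. One plausible shape for it is a plain Java enum, sketched below. This assumes that relationship types are passed through a small interface that an enum can implement; the interface SketchRelationshipType is a stand-in declared here for illustration and is not Neo's actual type.

```java
// Stand-in for a relationship-type abstraction; declared locally, NOT Neo's actual interface.
interface SketchRelationshipType {
    String name();
}

// The two relationship types used in the Matrix example.
// Every Java enum already provides name(), so no method bodies are needed.
enum MatrixRelationshipTypes implements SketchRelationshipType {
    KNOWS,
    CODED_BY
}

final class RelationshipTypeDemo {
    public static void main(String[] args) {
        for (SketchRelationshipType type : MatrixRelationshipTypes.values()) {
            System.out.println(type.name());
        }
    }
}
```

Using an enum keeps the set of types in one place and lets the compiler catch misspelled type names, which a bare string would not.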

Traversing a Node Space

Now that we know how to represent our domain model in the node space, how do we get information out of it? Unlike a relational database, Neo does not support a declarative query language. Instead, Neo provides an object-oriented traverser framework that allows us to express complex queries in plain Java. Working with the traverser framework is very straightforward. The core abstraction is, unsurprisingly, the Traverser interface. A Traverser is a Java Iterable that encapsulates a “query” – i.e. a traversal on the node space such as “give me all Morpheus’ friends and his friends’ friends” or “does Trinity know someone who is acquainted with an agent?”. The most complex part of working with a Traverser is instantiating it. Here’s an example of how we would create a Traverser that returns all the (transitive) friends of the “Thomas Anderson” node from the example above:

// Instantiate a traverser that returns all mrAnderson’s friends
Traverser friendsTraverser = mrAnderson.traverse(
    Traverser.Order.BREADTH_FIRST,
    StopEvaluator.END_OF_NETWORK,
    ReturnableEvaluator.ALL_BUT_START_NODE,
    MatrixRelationshipTypes.KNOWS,
    Direction.OUTGOING );

Here we can see that traversers are created by invoking the traverse(...) method on a start node with a number of parameters. The parameters control the traversal and in this example they tell Neo to traverse the network breadth-first (rather than depth-first), to traverse until it has covered all reachable nodes in the network (StopEvaluator.END_OF_NETWORK), to return all nodes except the first (ReturnableEvaluator.ALL_BUT_START_NODE), and to traverse all OUTGOING relationships of type KNOWS. How would we list the output of this traversal? After we’ve created a Traverser, working with it is as easy as working with any Java Iterable:

// Traverse the node space and print out the result
for ( Node friend : friendsTraverser )
{
    System.out.println( friend.getProperty( "name" ) + " at depth " +
        friendsTraverser.currentPosition().getDepth() );
}

Running the traversal above on the Matrix example would yield the following output:

$ bin/run-neo-example
Morpheus at depth 1
Trinity at depth 1
Cypher at depth 2
Agent Smith at depth 3
$

As you can see, the Traverser has started at the “Thomas Anderson” node and run through the entire network along the KNOWS relationship type, breadth first, and returned all nodes except the first one. “The Architect” is missing from this output since the relationship connecting him is of a different type, CODED_BY. This is a small, contrived example. But the code would work equally well on a network with hundreds of millions of nodes, relationships and properties.

Now, let’s look at a more complex traversal. Going with our example, suppose that we wanted to find all “hackers of the Matrix,” where we define a hacker of the Matrix as any node that you reach through a CODED_BY relationship. How would we create a Traverser that gives us those nodes? First off, we want to traverse both our relationship types (KNOWS and CODED_BY). Secondly, we want to traverse until the end of the network and lastly, we want to return only nodes which we came to through a CODED_BY relationship. Here’s the code:

// Instantiate a traverser that returns all hackers of the Matrix
Traverser hackerTraverser = mrAnderson.traverse(
    Traverser.Order.BREADTH_FIRST,
    StopEvaluator.END_OF_NETWORK,
    new ReturnableEvaluator()
    {
        public boolean isReturnableNode( TraversalPosition pos )
        {
            return pos.getLastRelationshipTraversed().
                isType( MatrixRelationshipTypes.CODED_BY );
        }
    },
    MatrixRelationshipTypes.CODED_BY,
    Direction.OUTGOING,
    MatrixRelationshipTypes.KNOWS,
    Direction.OUTGOING );

Now it’s getting interesting! The ReturnableEvaluator.ALL_BUT_START_NODE constant from the previous example was actually a convenience implementation of the ReturnableEvaluator interface. This interface contains a single method and you can supply a custom implementation of it to the traverser framework. It turns out that this is a simple but powerful way to express complex queries. Setting aside the anonymous inner class cruft surrounding the body of isReturnableNode, we basically pass in a snippet of code that checks whether we traversed a relationship of type CODED_BY to get to the current node. If that check evaluates to true, the current node is included in the set of nodes returned from the traverser.
When executed with a simple print loop, the above code prints the following:

$ bin/run-neo-example
The Architect
$

StopEvaluators work the same way. In our experience, writing custom evaluators is very easy. Even the most advanced applications we have developed with Neo – applications that traverse extremely large and complex networks – are based on evaluators that are rarely more than a few lines of code.
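
To illustrate how little logic an evaluator needs, the following self-contained sketch mimics the idea with an in-memory breadth-first traversal that takes two predicates: one deciding where to stop and one deciding what to return. It is a sketch under stated assumptions, not Neo's traverser: the types Edge and Position are hypothetical, and the edge list is an assumption about Figure 1, chosen only so that it agrees with the outputs printed above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical stand-in for a traverser framework; none of these types are Neo's API.
final class TraverserSketch {

    /** A directed, typed relationship between two named nodes. */
    static final class Edge {
        final String from, to, type;
        Edge(String from, String to, String type) { this.from = from; this.to = to; this.type = type; }
    }

    /** Current position: node, depth from the start, and the type of the relationship that led here. */
    static final class Position {
        final String node;
        final int depth;
        final String lastType; // null for the start node
        Position(String node, int depth, String lastType) {
            this.node = node; this.depth = depth; this.lastType = lastType;
        }
    }

    // Assumed edges for Figure 1 (the figure itself is not reproduced in the text).
    static final List<Edge> MATRIX = Arrays.asList(
            new Edge("Thomas Anderson", "Morpheus", "KNOWS"),
            new Edge("Thomas Anderson", "Trinity", "KNOWS"),
            new Edge("Morpheus", "Trinity", "KNOWS"),
            new Edge("Morpheus", "Cypher", "KNOWS"),
            new Edge("Cypher", "Agent Smith", "KNOWS"),
            new Edge("Agent Smith", "The Architect", "CODED_BY"));

    /** Breadth-first traversal along outgoing edges of the given types. */
    static List<Position> traverse(String start, Set<String> types,
            Predicate<Position> stop, Predicate<Position> returnable) {
        List<Position> result = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Queue<Position> queue = new ArrayDeque<>();
        visited.add(start);
        queue.add(new Position(start, 0, null));
        while (!queue.isEmpty()) {
            Position pos = queue.remove();
            if (returnable.test(pos)) result.add(pos);
            if (stop.test(pos)) continue; // do not expand beyond this node
            for (Edge e : MATRIX) {
                if (e.from.equals(pos.node) && types.contains(e.type) && visited.add(e.to)) {
                    queue.add(new Position(e.to, pos.depth + 1, e.type));
                }
            }
        }
        return result;
    }

    static Set<String> types(String... names) { return new HashSet<>(Arrays.asList(names)); }

    /** All transitive friends: never stop, return everything but the start node. */
    static List<Position> friends() {
        return traverse("Thomas Anderson", types("KNOWS"), p -> false, p -> p.depth > 0);
    }

    /** "Hackers": return only nodes reached through a CODED_BY relationship. */
    static List<Position> hackers() {
        return traverse("Thomas Anderson", types("KNOWS", "CODED_BY"),
                p -> false, p -> "CODED_BY".equals(p.lastType));
    }

    /** A custom stop condition: do not expand nodes at depth 2 or deeper. */
    static List<Position> closeFriends() {
        return traverse("Thomas Anderson", types("KNOWS"), p -> p.depth >= 2, p -> p.depth > 0);
    }

    static List<String> describe(List<Position> positions) {
        List<String> lines = new ArrayList<>();
        for (Position p : positions) lines.add(p.node + " at depth " + p.depth);
        return lines;
    }

    public static void main(String[] args) {
        describe(friends()).forEach(System.out::println);
        describe(hackers()).forEach(System.out::println);
        describe(closeFriends()).forEach(System.out::println);
    }
}
```

Each query differs only in its two one-line predicates, which mirrors the point made above: the traversal machinery stays fixed, and the evaluators carry the query-specific logic.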

Conclusion

Neo is not a silver bullet and some areas need to improve, for instance tooling, standardization of the model and a query language. However, if your data is naturally ordered in a network or is semi-structured, or you just need to go truly agile, give the Neo database a try. We hope you find it, as we do, to be an elegant and flexible alternative that is both robust and fast.

Emil Eifrém, Neo Technology
Björn Granvik, Jayway

Links

Neo specification
www.neo4j.org

Originally published in JayView.
