As we know, Neo4j uses master-slave replication with eventual consistency, so the typical ACID requirements do not hold across the cluster. The usual way is to write to the master, which pushes the changes to the slaves. It is also possible to write to the slaves directly, which is super safe but much slower, since synchronization between slaves is required.

In general (not very specific to Neo4j) there are a few concerns:

  • Cluster management: how to handle machines joining or leaving the cluster, as well as heartbeat messages. This also covers failover (master election, distribution of the master status).
  • Replication: synchronized ID generation, distributed locks, and so on.
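
To make the cluster management concern a bit more concrete, here is a minimal sketch of heartbeat-based failure detection with a naive master election. It is not how Neo4j or ZooKeeper do it. The class `HeartbeatTracker`, its methods, and the rule "the alive member with the lowest ID becomes master" are all assumptions made up for illustration.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical helper, not part of Neo4j: tracks when each member was last heard from.
final class HeartbeatTracker {
    private final long timeoutMillis;
    // TreeMap keeps member IDs sorted, so the lowest ID comes first.
    private final Map<Integer, Long> lastSeen = new TreeMap<>();

    HeartbeatTracker(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // A new machine joins the cluster.
    void join(int memberId, long now) {
        lastSeen.put(memberId, now);
    }

    // A machine leaves the cluster on purpose.
    void leave(int memberId) {
        lastSeen.remove(memberId);
    }

    // Record a heartbeat; heartbeats from unknown members are ignored.
    void heartbeat(int memberId, long now) {
        lastSeen.computeIfPresent(memberId, (id, previous) -> now);
    }

    // A member counts as alive if its last heartbeat is recent enough.
    boolean isAlive(int memberId, long now) {
        Long seen = lastSeen.get(memberId);
        return seen != null && now - seen <= timeoutMillis;
    }

    // Assumed election rule for this sketch: lowest ID among the alive members.
    Optional<Integer> electMaster(long now) {
        return lastSeen.keySet().stream()
                .filter(id -> isAlive(id, now))
                .findFirst();
    }
}
```

Note that this only gives one machine's local view. Two machines with different views could pick different masters, and getting all machines to agree on a single one is exactly the kind of problem a consensus protocol such as Paxos addresses.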

Neo4j used to build on Apache ZooKeeper to take care of these concerns. Michael points out that there have been problems with using ZooKeeper:

  • How to coordinate ZooKeeper with the Neo4j cluster
  • Unreliable operations
  • People did not like the topology required by the ZooKeeper architecture
  • ZooKeeper elects a new master too often, which is especially bad in a heavy-load environment
  • No dynamic reconfiguration of the ZooKeeper cluster

Neo4j's solution was to implement the Multi-Paxos protocol itself and replace ZooKeeper. Michael especially suggests reading the Paxos Made Simple paper by Leslie Lamport. The core consists of state machines implemented using Java enums.
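
I did not note down the actual code from the talk, but the general idea of a state machine built from a Java enum can be sketched like this. Everything here is an assumption for illustration: the names `MessageType`, `ProposerState` and `EnumStateMachineDemo` are made up, and the transitions are a heavily simplified Paxos proposer in which a single `PROMISE` or `ACCEPTED` message stands in for "a majority of acceptors answered".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical message types, not taken from the Neo4j code base.
enum MessageType { JOIN, PROMISE, ACCEPTED, TIMEOUT }

// Each enum constant is one state. handle() reacts to a message,
// records the action it would take, and returns the next state.
enum ProposerState {
    START {
        @Override
        ProposerState handle(MessageType message, List<String> actions) {
            if (message == MessageType.JOIN) {
                actions.add("send prepare");
                return PREPARING;
            }
            return this;
        }
    },
    PREPARING {
        @Override
        ProposerState handle(MessageType message, List<String> actions) {
            if (message == MessageType.PROMISE) {
                actions.add("send accept");
                return ACCEPTING;
            }
            if (message == MessageType.TIMEOUT) {
                actions.add("retry prepare with higher ballot");
            }
            return this;
        }
    },
    ACCEPTING {
        @Override
        ProposerState handle(MessageType message, List<String> actions) {
            if (message == MessageType.ACCEPTED) {
                actions.add("value chosen");
                return CHOSEN;
            }
            if (message == MessageType.TIMEOUT) {
                actions.add("retry prepare with higher ballot");
                return PREPARING;
            }
            return this;
        }
    },
    CHOSEN {
        @Override
        ProposerState handle(MessageType message, List<String> actions) {
            return this; // terminal state, ignores further messages
        }
    };

    abstract ProposerState handle(MessageType message, List<String> actions);
}

final class EnumStateMachineDemo {
    // Feeds the messages through the state machine and returns the final state.
    static ProposerState run(List<MessageType> messages, List<String> actions) {
        ProposerState state = ProposerState.START;
        for (MessageType message : messages) {
            state = state.handle(message, actions);
        }
        return state;
    }

    public static void main(String[] args) {
        List<String> actions = new ArrayList<>();
        ProposerState end = run(List.of(MessageType.JOIN, MessageType.PROMISE,
                MessageType.TIMEOUT, MessageType.PROMISE, MessageType.ACCEPTED), actions);
        System.out.println(end + " " + actions);
    }
}
```

The nice thing about this style is that every state and every transition is visible in one place, and the compiler forces each state to say how it handles a message.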

I still remember a lot of discussions in the reading club on distributed graph databases. We never actually looked into Apache ZooKeeper and the Paxos paradigm, which would certainly be an interesting technique to learn!

In the next part there were a lot of detailed discussions which were hard for me to follow, since I am not yet familiar with the Paxos paradigm.

If you are curious about the HA of Neo4j (and you can bet I am), you can look into Peter's screencast, which leads you through setting up Neo4j HA:

Setting up a local HA cluster in Neo4j 1.9 from Peter Neubauer on Vimeo.
