graphdevroom – Data Science, Data Analytics and Machine Learning Consulting in Koblenz Germany
https://www.rene-pickhardt.de
Extract knowledge from your data and be ahead of your competition
Tue, 17 Jul 2018 12:12:43 +0000

Davy Suvee on FluxGraph – Towards a time aware graph built on Datomic
https://www.rene-pickhardt.de/davy-suvee-on-fluxgraph-towareds-a-time-aware-graph-built-on-datomic/
Sat, 02 Feb 2013 13:40:09 +0000

Davy gave a really nice introduction to the problem of looking at a snapshot of a database. This problem obviously exists for any database technology: you have a lot of timestamped records, but running a query as if you had fired it a couple of months ago is always a difficult challenge.
With FluxGraph a solution to this is introduced.
As I understood the talk, he introduces a new version of a vertex or an edge every time it is updated, added or removed. So far I am wondering about scaling and runtime, since this approach seems like a lot of overhead to me. Later, during Q & A, I began to have the feeling that he has a more efficient way of storing this information, so I really have to get in touch with Davy to discuss the internals again.
In any case, FluxGraph provides a very clean API to access this temporal information.
On the various snapshots of the graph one can, for example, calculate the difference graph of two checkpoints and get a fully Blueprints-compatible result graph.
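To make the idea of versioned elements and difference graphs a bit more concrete, here is a minimal sketch in Python. It is not FluxGraph's API or storage model; the class `VersionedGraph`, its methods and the sample edges are all hypothetical and only illustrate the naive "new version on every change" approach described above.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedGraph:
    """Append-only edge log: every add/remove creates a new timestamped version."""
    # Each entry is (timestamp, edge, present); nothing is ever overwritten.
    log: list = field(default_factory=list)

    def add_edge(self, t, src, dst):
        self.log.append((t, (src, dst), True))

    def remove_edge(self, t, src, dst):
        self.log.append((t, (src, dst), False))

    def snapshot(self, t):
        """Return the set of edges that existed at time t by replaying the log."""
        edges = set()
        for ts, edge, present in sorted(self.log, key=lambda entry: entry[0]):
            if ts > t:
                break
            if present:
                edges.add(edge)
            else:
                edges.discard(edge)
        return edges

    def difference(self, t1, t2):
        """Edges added and removed between the checkpoints t1 and t2."""
        before, after = self.snapshot(t1), self.snapshot(t2)
        return {"added": after - before, "removed": before - after}


# Hypothetical sample data, loosely inspired by the patient use case.
g = VersionedGraph()
g.add_edge(2001, "patient-1", "treatment-A")
g.add_edge(2005, "patient-1", "treatment-B")
g.remove_edge(2008, "patient-1", "treatment-A")

diff = g.difference(2004, 2010)
```

Note that `snapshot` replays the whole log, which is exactly the kind of overhead I was wondering about; a real implementation would presumably index the versions instead.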
github.com/datablend/fluxgraph
His use case comes from a data set with 15000 cancer patients from 2001 to 2010 on which he could ask questions.
To sum up, Davy built this software for his own work and then open sourced it, which is cool.

Frank Celler introduces ArangoDB
https://www.rene-pickhardt.de/frank-cellar-introduces-arangodb/
Sat, 02 Feb 2013 13:06:41 +0000

Frank Celler (https://twitter.com/fceller) introduces his ArangoDB, which is basically a document store (key, value) that also offers a Blueprints graph interface.
Interestingly, he runs his demonstrations on the DBLP data set, which is highly relevant for the related-work project that Heinrich and I are introducing in our talk.
ArangoDB offers several APIs to interact with it:

  • a JavaScript API
  • RESTful HTTP API
  • Blueprint bindings (Gremlin support is available: nice!)

After playing around with the ArangoDB and Gremlin consoles, Frank started to introduce AQL (the ArangoDB Query Language). ArangoDB also includes a traversal framework.
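As a rough illustration of how the RESTful HTTP API and AQL fit together, the following Python sketch builds (but does not send) a request that submits an AQL query. It assumes a local server on ArangoDB's default port 8529 and the `/_api/cursor` endpoint; the collection name `publications` and its attributes are hypothetical, and details may differ between ArangoDB versions.

```python
import json
import urllib.request

# Assumptions: a local server on the default port and a hypothetical
# collection called "publications" with "year" and "title" attributes.
BASE_URL = "http://localhost:8529"


def build_aql_request(query, bind_vars=None):
    """Build (but do not send) a POST to the cursor endpoint that runs AQL."""
    body = {"query": query, "bindVars": bind_vars or {}}
    return urllib.request.Request(
        url=BASE_URL + "/_api/cursor",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_aql_request(
    "FOR p IN publications FILTER p.year == @year RETURN p.title",
    {"year": 2012},
)
# urllib.request.urlopen(request) would send it to a running server.
```

Passing the year as a bind variable (`@year`) instead of pasting it into the query string keeps the query text reusable.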
After the talk I now know about the APIs of ArangoDB. What I am missing is a benchmark against some other technologies. Still, the technology looks very promising and I am sure we will take a closer look at it.

Michael Hunger talks about High Availability of Neo4j built on Paxos in the GraphDevroom @ FOSDEM
https://www.rene-pickhardt.de/michael-hunger-talks-about-high-availability-of-neo4j-built-on-paxos-in-the-graphdevroom-fosdem/
Sat, 02 Feb 2013 12:01:24 +0000

As we know, Neo4j has master-slave replication with eventual consistency, so the typical ACID requirements do not apply. The usual way is to write to the master, which pushes the changes to the slaves. It is also possible to write to the slaves directly, which is super safe but much slower since synchronization between the slaves is required.
In general (not specific to Neo4j) there are a few concerns:

  • Cluster management (how to handle new machines joining or leaving the cluster, as well as heartbeat messages); this also covers failover (master election, distribution of the master status)
  • Replication (synchronized id generation, distributed locks, and so on)
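The cluster management concern can be sketched in a few lines of Python: members send heartbeats, a member that stays silent longer than a timeout counts as gone, and a master is derived from the remaining members. This is a generic illustration, not how Neo4j or ZooKeeper do it; the class `ClusterView` and the "lowest live id wins" election rule are assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterView:
    """Tracks the last heartbeat per member and derives the current master."""
    timeout: float  # seconds without a heartbeat before a member counts as dead
    last_seen: dict = field(default_factory=dict)

    def heartbeat(self, member_id, now):
        # A first heartbeat doubles as a join message.
        self.last_seen[member_id] = now

    def alive(self, now):
        """Sorted ids of all members whose last heartbeat is recent enough."""
        return sorted(m for m, t in self.last_seen.items() if now - t <= self.timeout)

    def master(self, now):
        # Hypothetical election rule: the lowest live id wins.
        members = self.alive(now)
        return members[0] if members else None


view = ClusterView(timeout=5.0)
for member in (1, 2, 3):
    view.heartbeat(member, now=0.0)
master_before = view.master(now=1.0)

# Members 2 and 3 keep sending heartbeats, member 1 goes silent.
view.heartbeat(2, now=6.0)
view.heartbeat(3, now=6.0)
master_after = view.master(now=7.0)
```

The hard part that this sketch leaves out is getting all machines to agree on the same view, which is where a coordination service or a consensus protocol comes in.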

Neo4j used to build on Apache ZooKeeper to take care of these concerns. Michael points out that there have been problems with using ZooKeeper:

  • how to coordinate ZooKeeper with the Neo4j cluster
  • unreliable operations
  • people did not like the topology required by the ZooKeeper architecture
  • ZooKeeper elects a new master too often, which is especially bad in a heavy-load environment
  • no dynamic reconfiguration of the ZooKeeper cluster

Neo4j's solution was to write its own implementation of the Multi-Paxos protocol and replace ZooKeeper. Michael especially suggests reading the Paxos Made Simple paper by Leslie Lamport. The core consists of state machines implemented using Java enums.
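To get a feeling for the protocol, here is a minimal in-memory sketch of single-decree Paxos (the basic building block, not Multi-Paxos) with the proposer written as an enum-driven state machine. It uses Python's `enum` in place of Java enums and is in no way Neo4j's code; all class, function and state names are made up for illustration, and message loss and concurrency are ignored.

```python
from enum import Enum, auto


class ProposerState(Enum):
    PREPARING = auto()
    ACCEPTING = auto()
    CHOSEN = auto()
    FAILED = auto()


class Acceptor:
    def __init__(self):
        self.promised = 0      # highest ballot promised so far
        self.accepted = None   # (ballot, value) last accepted, if any

    def prepare(self, ballot):
        """Phase 1: promise to ignore lower ballots, report any accepted value."""
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted
        return False, None

    def accept(self, ballot, value):
        """Phase 2: accept unless a higher ballot was promised in the meantime."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


def propose(acceptors, ballot, value):
    """Drive one proposer through its states; returns (final_state, value)."""
    majority = len(acceptors) // 2 + 1
    state = ProposerState.PREPARING
    while True:
        if state is ProposerState.PREPARING:
            replies = [a.prepare(ballot) for a in acceptors]
            promises = [accepted for ok, accepted in replies if ok]
            if len(promises) < majority:
                state = ProposerState.FAILED
            else:
                # Adopt the value of the highest-numbered accepted proposal, if any.
                prior = [p for p in promises if p is not None]
                if prior:
                    value = max(prior, key=lambda p: p[0])[1]
                state = ProposerState.ACCEPTING
        elif state is ProposerState.ACCEPTING:
            acks = sum(a.accept(ballot, value) for a in acceptors)
            state = ProposerState.CHOSEN if acks >= majority else ProposerState.FAILED
        else:
            return state, value


acceptors = [Acceptor() for _ in range(3)]
first = propose(acceptors, ballot=1, value="node-A is master")
second = propose(acceptors, ballot=2, value="node-B is master")
stale = propose(acceptors, ballot=1, value="node-C is master")
```

The second proposer asks for a different value but ends up confirming the one already chosen, and the proposer with the stale ballot is rejected. That is the safety property that makes the protocol usable for things like master election.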
I still remember a lot of discussions in the reading club on distributed graph databases. We never actually looked into Apache ZooKeeper and the Paxos protocol, which would certainly be an interesting technique to learn!
The next part contained a lot of detailed discussion that was hard for me to follow, since I am not yet familiar with Paxos.
If you are curious about the HA of Neo4j (and you can bet I am), you can look into Peter’s screencast, which leads you through setting up Neo4j HA:

Setting up a local HA cluster in Neo4j 1.9 from Peter Neubauer on Vimeo.
