graphdb – Data Science, Data Analytics and Machine Learning Consulting in Koblenz Germany https://www.rene-pickhardt.de Extract knowledge from your data and be ahead of your competition

Slides of Related work application presented in the Graphdevroom at FOSDEM
https://www.rene-pickhardt.de/slides-of-related-work-application-presented-in-the-graphdevroom-at-fosdem/
Sat, 02 Feb 2013 15:13:02 +0000

Download the slide deck of our talk at FOSDEM 2013, including all the resources that we pointed to.
The most important other links are:

It was great talking here. Again, we are open source, open data and so on, so if you have suggestions or want to contribute, feel free: just do it or contact us. We are really looking forward to meeting other hackers who just want to go geek and change the world.
The video of the talk can be found here:

Davy Suvee on FluxGraph – Towards a time aware graph built on Datomic
https://www.rene-pickhardt.de/davy-suvee-on-fluxgraph-towareds-a-time-aware-graph-built-on-datomic/
Sat, 02 Feb 2013 13:40:09 +0000

Davy gave a really nice introduction to the problem of querying a snapshot of a database. This problem obviously exists for any database technology: you have a lot of timestamped records, but running a query as if you had fired it a couple of months ago is always a difficult challenge.
FluxGraph introduces a solution to this.
As I understood him in the talk, he introduces a new version of a vertex or an edge every time it gets updated, added or removed. So far I am wondering about scaling and runtime, since this approach seems like a lot of overhead to me. Later, during Q&A, I got the feeling that he has a more efficient way of storing this information, so I really have to get in touch with Davy to discuss the internals again.
In any case, FluxGraph provides a very clean API to access this temporal information.
On the various snapshots of the graph one can, for example, calculate the difference graph of two checkpoints and get a fully Blueprints-compatible result graph.
github.com/datablend/fluxgraph
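To make the snapshot and difference idea concrete, here is a minimal Python sketch. It is not FluxGraph's API or its storage layout (the internals were not clear from the talk); the class `VersionedGraph`, its methods and the example data are all hypothetical. It simply keeps an append-only log of edge changes and replays it up to a point in time.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedGraph:
    """Toy append-only edge log; NOT FluxGraph's API or storage layout."""
    # Each entry is (timestamp, operation, edge), operation is "add" or "remove".
    log: list = field(default_factory=list)

    def add_edge(self, t, src, dst):
        self.log.append((t, "add", (src, dst)))

    def remove_edge(self, t, src, dst):
        self.log.append((t, "remove", (src, dst)))

    def snapshot(self, t):
        """Replay the log up to and including time t and return the edge set."""
        edges = set()
        # sorted() is stable, so entries with equal timestamps keep log order.
        for ts, op, edge in sorted(self.log, key=lambda entry: entry[0]):
            if ts > t:
                break
            if op == "add":
                edges.add(edge)
            else:
                edges.discard(edge)
        return edges

    def difference(self, t_old, t_new):
        """Return (added, removed) edges between two checkpoints."""
        old, new = self.snapshot(t_old), self.snapshot(t_new)
        return new - old, old - new


# Hypothetical example data, loosely inspired by the patient use case.
g = VersionedGraph()
g.add_edge(2001, "patient_1", "diagnosis_a")
g.add_edge(2005, "patient_1", "treatment_b")
g.remove_edge(2008, "patient_1", "diagnosis_a")

added, removed = g.difference(2006, 2010)
```

Replaying the full log for every snapshot is exactly the kind of overhead I was wondering about; a real system would presumably index versions by time instead.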
His use case comes from a data set of 15,000 cancer patients from 2001 to 2010, on which he could ask questions.
To sum up, Davy used his software for his own work and open sourced it, which is cool.

Frank Celler introduces ArangoDB
https://www.rene-pickhardt.de/frank-cellar-introduces-arangodb/
Sat, 02 Feb 2013 13:06:41 +0000

Frank Celler (https://twitter.com/fceller) introduces his ArangoDB, which is basically a document store (key, value) that offers a Blueprints graph interface.
Interestingly, he is doing his demonstrations on the DBLP data set, which is highly relevant for the related work project that Heinrich and I are introducing in our talk.
ArangoDB has several APIs to interact with it:

  • available from JavaScript
  • RESTful HTTP API
  • Blueprints bindings (Gremlin support is available: nice!)

After playing around with the ArangoDB and Gremlin consoles, Frank started to introduce AQL (ArangoDB Query Language). ArangoDB also includes a traverser framework.
After the talk I now know about the APIs of ArangoDB; what I am missing is a benchmark against some other technologies. Still, the technology looked very promising and I am sure we will take a closer look at it.

Reading Club on distributed graph db returns with a new Format on April 4th 2012
https://www.rene-pickhardt.de/reading-club-on-distributed-graph-db-returns-with-a-new-format-on-april-4th-2012/
Mon, 02 Apr 2012 09:37:05 +0000

The reading club was quite inactive due to traveling and a suboptimal process for choosing the literature. That is why a new format for the reading club has been discussed and agreed upon.
The new format means that we have 4 new rules:

  1. We will discuss at most 3 papers in 90 minutes. So roughly speaking we have 30 minutes per paper, but this does not have to be strict.
  2. The chosen papers should be read by everyone before the reading club takes place.
  3. For every paper there is one responsible person (moderator) who has read the entire paper before suggesting it as a common reading.
  4. Open questions on the (potential) reading assignments and ideas for reading can and should be discussed on http://related-work.rene-pickhardt.de/ (use the same template as I used for the reading assignments in this blog post), e.g.:

Moderator:
Paper download:
Why to read it:
topics to discuss / open questions:

For the next meeting on April 4th, 2 pm CET (in two days), the literature will be:

While preparing these papers we might come across some other interesting literature.
If you want to suggest some literature, you should also have read that piece of work by the time the reading club meeting takes place and know why you want everybody to prepare the same paper and discuss it (rule 3). Additionally, you should open a topic on the paper on http://related-work.rene-pickhardt.de/ using the above template before the reading club takes place (rule 4).
I hope this is of help for the entire project and I am looking forward to the next meeting!

Paul Wagner and Till Speicher won State Competition "Jugend Forscht Hessen" and best Project award using neo4j
https://www.rene-pickhardt.de/paul-wagner-and-till-speicher-won-state-competition-jugend-forscht-hessen-and-best-project-award-using-neo4j/
Fri, 16 Mar 2012 11:18:38 +0000

6 months of hard coding, supervised by me, are over and end with a huge success! After analyzing 80 GB of Google n-grams data, Paul and Till put it into a neo4j graph database in order to make predictions for fast sentence completion. Today was the award ceremony, and the two students from Darmstadt and Saarbrücken (respectively) won first place. Additionally, they received the “beste schöpferische Arbeit” award, which is the award for the best project in the entire competition (across all disciplines).
With their technology and the almost finished Android app, typing will be revolutionized! While a sentence is being typed, they are able to predict the next word with a recall of 67%, creating huge additional value for today’s smartphones.
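To give a rough feeling for how next-word prediction from n-gram counts can work, here is a minimal Python sketch. It is not Paul and Till's implementation and does not use neo4j or the Google n-grams data: the function names `build_bigram_graph` and `predict_next` and the four-sentence toy corpus are assumptions made up for illustration. Words are nodes, and an edge from one word to the next carries the number of times that pair was seen.

```python
from collections import Counter, defaultdict


def build_bigram_graph(sentences):
    """Map each word to a Counter of the words that directly follow it."""
    graph = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            graph[current][following] += 1
    return graph


def predict_next(graph, word, prefix="", k=3):
    """Return up to k most frequent followers of `word` starting with `prefix`."""
    candidates = graph.get(word.lower(), Counter())
    matching = [(w, c) for w, c in candidates.items() if w.startswith(prefix)]
    # Highest count first; ties are broken alphabetically.
    matching.sort(key=lambda item: (-item[1], item[0]))
    return [w for w, _ in matching[:k]]


# Toy corpus (an assumption for illustration, not the real training data).
corpus = [
    "warum ist die banane krumm",
    "warum ist die welt rund",
    "die banane ist gelb",
    "die banane ist krumm",
]
graph = build_bigram_graph(corpus)
suggestions = predict_next(graph, "banane")
```

The `prefix` argument mimics what happens while the next word is being typed: already entered letters narrow down the candidates, so `predict_next(graph, "banane", prefix="k")` only considers followers that start with "k".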
So stay tuned for the upcoming news and the federal competition in May in Erfurt.
Have a look at their website, where you can find the (still German) documentation as well as the source code and a demo (which I also include here; use tab completion (-: as in the Unix bash).
Right now it only works for the German language, since only German data was processed, so try sentences like

  • “Warum ist die Banane krumm” (where the rare word krumm is correctly predicted because of the famous question “why is the banana curved?”)
  • “Das kann ich doch auch” (I am also able to do that)
  • “geht wirklich nur deutsche Sprache ?” (Is only the German language really possible?)


Demo: http://complet.typology.de

Related work of the Reading club on distributed graph data bases (Beehive, Scalable SPARQL Querying of Large RDF Graphs, memcached)
https://www.rene-pickhardt.de/related-work-of-the-reading-club-on-distributed-graph-data-bases-beehive-scalable-sparql-querying-of-large-rdf-graphs-memcached/
Wed, 07 Mar 2012 16:34:00 +0000

Today we finally had our reading club and discussed several papers from last week’s assignments.
Before I give my usual summary I want to introduce our new infrastructure for the reading club. Go to: 
http://related-work.rene-pickhardt.de/
There you can find a question and answer system which we will use to discuss questions and answers about papers. Thanks to the included voting system, we thought this would be much more convenient than a closed, unstructured mailing list. I hope this is of help to the entire community, and I invite everyone to read and discuss with us on http://related-work.rene-pickhardt.de/

Reading list for next meeting Wed March 14th 2 pm CET

We first discussed the memcached paper:

One of the first topics we discussed was how the dynamic hashing is done. We also wondered how DHTs take care of overloading in general. In the memcached paper this is not discussed very well. Schegi knows a good paper that explains the dynamics behind DHTs and will provide the link soon.
Afterwards we discussed what would happen if a distributed key-value store like memcached were used to implement a graph store. Obviously, creating a graph store on the key-value model is possible. Additionally, memcached is very fast in its lookups. One could add a persistence layer to memcached that would enable disk writes.
We think the main counter arguments are:

  • In this setting the graph is distributed to the worker nodes randomly.
  • No performance gain from graph partitioning is possible.

We realized that we should really read more about how to distribute graphs across machines.
When using memcached, you can store much more than an adjacency list in the value of one key. In this way the amount of information that has to be fetched separately is reduced.
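The points above can be illustrated with a small Python sketch. It does not talk to a real memcached server: the class `KeyValueGraph` is hypothetical, each worker is simulated by a plain dict, and placing keys by `crc32` modulo the number of workers is an assumption standing in for whatever hashing a memcached client uses. The sketch stores adjacency lists as values and counts how many edges end up crossing worker boundaries.

```python
import zlib


class KeyValueGraph:
    """Adjacency lists stored as values in plain dicts, one dict per worker."""

    def __init__(self, num_workers):
        self.workers = [{} for _ in range(num_workers)]

    def worker_of(self, node):
        # Hash placement ignores graph structure, so the neighbours of a
        # node land on arbitrary workers.
        return zlib.crc32(node.encode("utf-8")) % len(self.workers)

    def add_edge(self, src, dst):
        store = self.workers[self.worker_of(src)]
        store.setdefault(src, []).append(dst)

    def neighbours(self, node):
        return self.workers[self.worker_of(node)].get(node, [])

    def cut_edges(self):
        """Count edges whose endpoints are stored on different workers."""
        return sum(
            1
            for worker_id, store in enumerate(self.workers)
            for src, targets in store.items()
            for dst in targets
            if self.worker_of(dst) != worker_id
        )


graph = KeyValueGraph(num_workers=4)
for src, dst in [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]:
    graph.add_edge(src, dst)

print(graph.neighbours("a"))  # ['b', 'c']
print(graph.cut_edges())      # depends on the hash values
```

A single lookup of a node's neighbours is cheap, but every cut edge means a traversal has to hop to another worker, and with hash placement there is no way to reduce that number by clever partitioning.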
Again I pointed out that separating the data model from the data storage could help substantially. I will soon write an entire blog article about this idea in the setting of relational / graph models and relational database management systems.
Personally, I am still convinced that memcached could be used to improve asynchronous message passing in distributed systems like Signal/Collect.

Scalable SPARQL Querying of Large RDF Graphs:

We agreed that one of the core principles in this paper is that they remove supernodes (everything connected via RDF type) in order to obtain a much sparser graph, and then do the partitioning, which speeds up computation a lot. Afterwards they add the supernodes redundantly to all workers where they could be needed. This methodology could generalize pretty well to arbitrary graphs: you just look at the node degree, remove the x% of nodes with the highest degree from the graph, run a clustering algorithm, and then add the supernodes redundantly to the workers.
Thomas pointed out that this paper had the drawback of not using a distributed clustering algorithm, even though it then used a framework like MapReduce.
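The generalized recipe (remove the highest-degree nodes, cluster the rest, replicate the removed nodes) can be sketched in a few lines of Python. This is not the paper's algorithm: the function `partition_without_supernodes` is hypothetical, and connected components are used as a simple stand-in for a real clustering or partitioning algorithm.

```python
from collections import defaultdict


def partition_without_supernodes(edges, fraction=0.1):
    """Remove the highest-degree nodes, cluster the rest, replicate supernodes."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    # Pick the top `fraction` of nodes by degree as supernodes (at least one).
    ranked = sorted(adjacency, key=lambda n: (-len(adjacency[n]), n))
    count = max(1, int(len(ranked) * fraction))
    supernodes = set(ranked[:count])

    # Stand-in for a real clustering step: connected components of the
    # graph that remains once the supernodes are gone.
    partitions, seen = [], set()
    for start in sorted(adjacency):
        if start in supernodes or start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - supernodes - seen)
        partitions.append(component)

    # Replicate each supernode onto every partition that contains one of
    # its neighbours.
    for component in partitions:
        component |= {s for s in supernodes if adjacency[s] & component}
    return supernodes, partitions


# Toy graph: "type" plays the role of an RDF type node linked to everything.
edges = [
    ("type", "a"), ("type", "b"), ("type", "c"), ("type", "d"),
    ("a", "b"), ("c", "d"),
]
supernodes, partitions = partition_without_supernodes(edges, fraction=0.2)
```

With the supernode in place the toy graph is one connected blob; without it, it falls apart into two small partitions, each of which gets its own copy of the supernode.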

Beehive:

We all agreed that the Beehive paper solves a problem with a really great methodology: first looking into the query distribution and then using proactive caching strategies. The interesting point is that they create an analytical model which they can solve in closed form. The P2P protocols are enhanced by gossip to distribute the parameters of the protocol. In this way an adaptive system is created which adjusts its caching strategy once the queries change.
We thought that the Beehive approach could be generalized to various settings. In particular, it might be possible to analyze not only Zipf distributions but also other query distributions and to derive various analytical models, which could even coexist in such a system.
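To see why the query distribution matters so much for proactive caching, here is a small Python sketch. It does not reproduce the analytical model from the Beehive paper; the function names and the parameter values (1000 objects, Zipf exponent 1.0) are assumptions chosen for illustration. It only computes which share of queries is answered if the most popular objects are cached.

```python
def zipf_probabilities(num_objects, alpha):
    """Query probability of each object, ranked from most to least popular."""
    weights = [1.0 / (rank ** alpha) for rank in range(1, num_objects + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def hit_rate(num_objects, alpha, cached):
    """Share of queries answered from cache if the top `cached` objects are cached."""
    return sum(zipf_probabilities(num_objects, alpha)[:cached])


def smallest_cache_for(num_objects, alpha, target):
    """Smallest number of cached objects whose hit rate reaches `target`."""
    cumulative = 0.0
    for cached, p in enumerate(zipf_probabilities(num_objects, alpha), start=1):
        cumulative += p
        if cumulative >= target:
            return cached
    return num_objects


# Assumed example values: 1000 objects, Zipf exponent 1.0.
rate = hit_rate(num_objects=1000, alpha=1.0, cached=10)
needed = smallest_cache_for(num_objects=1000, alpha=1.0, target=0.5)
uniform_rate = hit_rate(num_objects=1000, alpha=0.0, cached=10)
```

Under the Zipf assumption, caching 1% of the objects already answers a large share of the queries, while under a uniform distribution (`alpha=0.0`) the same cache answers only 1%. Swapping `zipf_probabilities` for another distribution is the kind of generalization we had in mind.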
You can find our questions and thoughts and join our discussion about Beehive online!

Challenges in parallel Graph processing:

Unfortunately we did not really have the time to discuss this (in my opinion) great paper. I created a discussion on our new question board, so feel free to discuss this paper at: http://related-work.rene-pickhardt.de/questions/13/challenges-in-parallel-graph-processing-what-do-you-think
