From Graph (batch) processing towards a distributed graph data base

Yesterday’s meeting of the reading club was quite nice. We all agreed that the papers were of good quality and that we gained some nice insights. The only drawback was that the papers did not directly tell us how to achieve our goal of a real-time distributed graph data base technology. For the next meeting (which will take place on Wednesday, March 7th, at 2pm CET) we tried to choose papers that don’t discuss these distributed graph / data processing techniques but instead focus more on speed or point out the general challenges in parallel graph processing.

Reading list for the next meeting (Wednesday, March 7th, 2pm CET)

Again, while reading and preparing, feel free to add more reading wishes in the comments of this blog post or drop me a mail!

Summary of yesterday’s meeting

As written in the introduction, we agreed that the papers were interesting but not heading in our direction. Claudio pointed out that everyone should consider the following set of questions.

  • Do we want the graph to be mutable (writable), or is it supposed to be read-only?
    • Writing makes sense. If it is read-only, it is called batch processing.
    • Writing is hard: you have to care about locking and consistency.
  • Do we want to answer queries (Cypher/gremlin/whatever)?
  • Do we want to provide an API for processing?
  • How big is the data set we want to support?
    • Many people do it in memory.
    • If you go to disk, you open up a whole new set of topics.
    • One approach would be to solve the problem in memory first.
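
To make the point about locking a bit more concrete, here is a minimal sketch of what "solve it in memory first, but allow writes" could look like. It is only an illustration under my own assumptions, not something we discussed in the meeting: the class `InMemoryGraph`, its methods and the `writer` helper are hypothetical names, and a single coarse lock is the simplest possible choice, not a recommendation.

```python
import threading
from collections import defaultdict


class InMemoryGraph:
    """Hypothetical mutable in-memory graph guarded by one coarse lock."""

    def __init__(self):
        self._adj = defaultdict(set)   # node -> set of successor nodes
        self._lock = threading.Lock()  # serializes all readers and writers

    def add_edge(self, src, dst):
        with self._lock:
            self._adj[src].add(dst)
            self._adj[dst]  # touch dst so it exists as a node

    def neighbors(self, node):
        with self._lock:
            # Return a copy so callers never see a set that is being mutated.
            return set(self._adj.get(node, ()))

    def edge_count(self):
        with self._lock:
            return sum(len(successors) for successors in self._adj.values())


def writer(graph, start, count):
    """Insert a chain of `count` edges beginning at node `start`."""
    for i in range(start, start + count):
        graph.add_edge(i, i + 1)


# Four concurrent writers, each working on its own disjoint node range.
graph = InMemoryGraph()
threads = [
    threading.Thread(target=writer, args=(graph, k * 1000, 100))
    for k in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

One global lock keeps the structure consistent but serializes every operation, which is exactly the kind of trade-off the questions above are about. Finer-grained locking, or spreading the graph over several machines, is where it gets hard.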

I am very confident that it was a good idea to start with graph processing, and that we are now taking the right steps towards real distributed graph data base systems. I think there are some more questions and high-level assumptions that one has to fix, which I will post on this blog in a few days. Sorry, I am in a hurry today and for the rest of the week.

Infrastructure

Schegi just suggested creating a mailing list for the reading club or switching to Google Groups. He pointed out that a private blog is kind of a weird medium to be so central. What is your opinion on that? Do we need some other / more formal infrastructure?

8 Comments

  1. One thing: I don’t believe that batch processing read-only. Pregel/Giraph support modifying the graph during computation and writing it out at the end.
    Second thing: I agree on the necessity of a different form of medium; I propose Google Groups + Google Docs.

    1. Mhm, it ate some of my sentence:
      It was originally: I don’t believe that batch processing means read-only and vice versa.

  2. As far as mailing list, etc., discussion on a variety of graph topics might be easier. But then your blog hasn’t been over-run with comments. 😉
    Whatever is the easiest for you to maintain. If the group grows and continues, can always add more facilities later.
    Hope you are having a great week!
    Patrick

  3. At the very beginning of the reading club, Peter from Neotechnology suggested using http://www.meetup.com/ and I just stumbled upon it again on another website. Does anyone have thoughts on this?

  4. […] already found the time to look over our current reading assignments. Especially the VLDB paper (Topology partitioning applied to SPARQL, HADOOP and TripleStores) and […]

  5. […] From Graph (batch) processing towards a distributed graph data base by René Pickhardt. […]

  6. […] Today we finally had our reading club and discussed several papers from last week’s assignments. […]
