Comments on: Google Pregel vs Signal Collect for distributed Graph Processing – pros and cons

By: Philip Stutz

Mon, 12 Mar 2012 16:52:51 +0000

That was a very interesting read, thank you for the analysis and comparison. A few comments about Signal/Collect (I am one of the authors of the paper):
“From here the authors say that one can get rid of the superstep model and make the entire calculation asynchronous. This is done by introducing randomization on the set of vertices on which signal and collect computations have to be computed (as long as the threshold scores are overcome)”
The execution order in that one example is random only to illustrate that asynchronous execution gives no guarantees about the order of operations. In practice we use different heuristics (above-average, eager) to determine the order in which the signal/collect operations are executed. The randomness just serves to explain the absent guarantees; it is not actually implemented that way.
Sorry for not explaining this better in the paper.
“I personally understand that Signal Collect can only send signals from one vertex to another if an edge exists and is also not able to add or remove edges or vertices.”
It is possible to modify the graph even during computations (https://github.com/uzh/signal-collect/blob/master/src/main/scala/com/signalcollect/GraphEditor.scala). Modifying the graph can be problematic, because concurrent modifications introduce nondeterminism.
You can also send messages to vertices that are not connected by an edge, but this is a slight violation of the programming model and should mainly be done to save memory.
“I wonder how an integration of memcached to Signal/Collect would work in order to make the asynchronous computation possible in a distributed fashion.”
The architecture of Signal/Collect is based on message passing, so we are using the Akka actor framework for distribution. This should be more efficient than remote reads.
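
The point about execution order can be illustrated with a small sketch. This is plain Python, not the Signal/Collect API (which is written in Scala). All names here (Vertex, run_async, pick_highest_score, build_graph) are invented for illustration, and pick_highest_score is only a stand-in for a score-guided heuristic, not the actual above-average or eager scheduler. The example computes single-source shortest paths and assumes positive edge weights: operations run one at a time, in whatever order the picker chooses, as long as some score exceeds the threshold.

```python
import random


class Vertex:
    """A vertex for single-source shortest paths (hypothetical helper, not the real API)."""

    def __init__(self, vertex_id, state):
        self.vertex_id = vertex_id
        self.state = state           # shortest distance known so far
        self.out_edges = []          # (target_id, weight) pairs
        self.inbox = []              # signals received but not yet collected
        self.last_signaled = None    # state at the time of the last signal operation

    def score_signal(self):
        # Signaling is only worthwhile if the state changed since the last signal.
        return 1.0 if self.state != self.last_signaled else 0.0

    def score_collect(self):
        # Collecting is only worthwhile if there are uncollected signals.
        return float(len(self.inbox))


def run_async(vertices, pick, threshold=0.0):
    """Execute single operations until no score exceeds the threshold.

    `pick` selects the next operation from the pending ones; it plays the
    role of the scheduling heuristic. Returns the number of operations run.
    """
    steps = 0
    while True:
        pending = []
        for v in vertices.values():
            if v.score_signal() > threshold:
                pending.append((v.score_signal(), "signal", v))
            if v.score_collect() > threshold:
                pending.append((v.score_collect(), "collect", v))
        if not pending:
            return steps
        _, operation, v = pick(pending)
        if operation == "signal":
            # Send the candidate distance along every outgoing edge.
            for target_id, weight in v.out_edges:
                vertices[target_id].inbox.append(v.state + weight)
            v.last_signaled = v.state
        else:
            # Keep the smallest distance seen so far and empty the inbox.
            v.state = min([v.state] + v.inbox)
            v.inbox.clear()
        steps += 1


def pick_highest_score(pending):
    # Stand-in heuristic: run the operation with the highest score first.
    return max(pending, key=lambda entry: entry[0])


def build_graph():
    edges = [("a", "b", 1), ("a", "c", 4), ("b", "c", 2), ("c", "d", 1)]
    vertices = {name: Vertex(name, 0 if name == "a" else float("inf")) for name in "abcd"}
    for source, target, weight in edges:
        vertices[source].out_edges.append((target, weight))
    return vertices


def distances(vertices):
    return {name: v.state for name, v in vertices.items()}


if __name__ == "__main__":
    rng = random.Random(42)
    for pick in (pick_highest_score, rng.choice):
        graph = build_graph()
        run_async(graph, pick)
        print(distances(graph))
```

Both pickers end with the same distances for this graph. The order of operations only affects how many steps are needed, which is why a random order is enough to argue about guarantees while a heuristic is preferable in practice.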

By: From Graph (batch) processing towards a distributed graph data base

Thu, 23 Feb 2012 12:45:29 +0000

[…] memcached paper: To understand how distributed shared memory works, which could essentially speed up approaches like Signal Collect […]

By: Stefan

Wed, 22 Feb 2012 12:50:23 +0000

My suggestions for what to read next:
CHALLENGES IN PARALLEL GRAPH PROCESSING:
http://www.sandia.gov/~bahendr/papers/graphs-and-machines.pdf
We should also take a look at the graph libraries from the Boost people:
http://www.boost.org/doc/libs/1_48_0/libs/graph/doc/index.html
http://www.boost.org/doc/libs/1_48_0/libs/graph_parallel/doc/html/index.html

By: Google Pregel vs Signal Collect for distributed Graph Processing – pros and cons « Another Word For It

Wed, 22 Feb 2012 01:03:12 +0000

[…] Google Pregel vs Signal Collect for distributed Graph Processing – pros and cons […]
