Comments on: Google Pregel vs Signal Collect for distributed Graph Processing – pros and cons
https://www.rene-pickhardt.de/google-pregel-vs-signal-collect-for-distributed-graph-processing-pros-and-cons/
Extract knowledge from your data and be ahead of your competition
Tue, 17 Jul 2018 11:07:57 +0000

By: Philip Stutz
https://www.rene-pickhardt.de/google-pregel-vs-signal-collect-for-distributed-graph-processing-pros-and-cons/#comment-29487
Mon, 12 Mar 2012 16:52:51 +0000

That was a very interesting read; thank you for the analysis and comparison. A few comments about Signal/Collect (I am one of the authors of the paper):
“From here the authors say that one can get rid of the superstep model and make the entire calculation asynchronous. This is done by introducing randomization on the set of vertices on which signal and collect computations have to be computed (as long as the threshold scores are overcome)”
In one example the execution order is random. This illustrates that asynchronous execution gives no guarantees about the order of execution. In practice we use different heuristics (above-average, eager) to decide the order in which the signal/collect operations are executed. The randomness only serves to illustrate the absence of guarantees; it is not how execution is actually implemented.
Sorry for not explaining this better in the paper.
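To make the difference between "random" and "heuristic" ordering concrete, here is a minimal single-process sketch in Python of score-guided execution. It is not Signal/Collect's Scala API. Each vertex has a signal score, assumed here to be the absolute change in its state since it last signaled. In every round, only pending vertices whose score is at least the average pending score are executed, which is one reading of an "above-average" heuristic. The class and function names, the PageRank-style collect rule, the 0.15/0.85 constants, the eager collect-on-delivery shortcut and the threshold are all illustrative assumptions. The schedule is deterministic, yet it still gives no global ordering guarantee across vertices.

```python
from typing import Dict, List, Optional


class Vertex:
    """Toy vertex for a PageRank-style signal/collect run (hypothetical, not the library API)."""

    def __init__(self, vid: str, targets: List[str]):
        self.id = vid
        self.targets = targets
        self.state = 0.15                          # assumed initial rank
        self.last_signaled: Optional[float] = None
        self.inbox: Dict[str, float] = {}          # most recent signal per source vertex

    def collect(self) -> None:
        # Combine the latest signal from every source into a new state.
        self.state = 0.15 + 0.85 * sum(self.inbox.values())

    def signal_score(self) -> float:
        # A vertex that has never signaled is treated as maximally urgent.
        if self.last_signaled is None:
            return float("inf")
        return abs(self.state - self.last_signaled)


def run_above_average(vertices: Dict[str, Vertex], threshold: float = 1e-6,
                      max_rounds: int = 10_000) -> bool:
    """Run until no signal score exceeds `threshold`; return True if that happened."""
    for _ in range(max_rounds):
        scores = {vid: v.signal_score() for vid, v in vertices.items()}
        pending = [vid for vid, s in scores.items() if s > threshold]
        if not pending:
            return True
        # Above-average heuristic: run only pending vertices scoring >= the pending mean.
        # min() with the maximum guards against float rounding pushing the mean above every score.
        average = sum(scores[vid] for vid in pending) / len(pending)
        cutoff = min(average, max(scores[vid] for vid in pending))
        for vid in pending:
            if scores[vid] < cutoff:
                continue
            v = vertices[vid]
            if v.targets:
                share = v.state / len(v.targets)
                for t in v.targets:
                    vertices[t].inbox[vid] = share
                    vertices[t].collect()          # eager collect on delivery (simplification)
            v.last_signaled = v.state
    return False
```

On a three-vertex cycle a→b→c→a, every rank approaches 1.0, the fixed point of x = 0.15 + 0.85x. Work is spent only where the state is still changing, which is the point of score-guided scheduling.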
“I personally understand that Signal Collect can only send signals from one vertex to another if an edge exists and is also not able to add or remove edges or vertices.”
It is possible to modify the graph even during computations (https://github.com/uzh/signal-collect/blob/master/src/main/scala/com/signalcollect/GraphEditor.scala). Modifying the graph can be problematic, because concurrent modifications introduce nondeterminism.
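The nondeterminism from concurrent graph edits can be shown with a tiny model. The sketch below is hypothetical and does not use the GraphEditor API linked above. It applies a set of edits in every possible order, assuming the worst case where any interleaving can happen. It then collects the distinct final graphs. The edit names and the rule that an edge is silently dropped when an endpoint is missing are assumptions made for illustration.

```python
from itertools import permutations
from typing import Dict, FrozenSet, Set, Tuple

Graph = Dict[str, Set[str]]
Op = Tuple[str, ...]


def apply(graph: Graph, op: Op) -> Graph:
    """Apply one edit to a copy of `graph` (toy semantics, not the GraphEditor API)."""
    g = {v: set(out) for v, out in graph.items()}
    kind = op[0]
    if kind == "add_vertex":
        g.setdefault(op[1], set())
    elif kind == "add_edge":
        src, dst = op[1], op[2]
        if src in g and dst in g:                  # assumed rule: both endpoints must exist
            g[src].add(dst)
    elif kind == "remove_vertex":
        g.pop(op[1], None)
        for out in g.values():
            out.discard(op[1])                     # drop edges that now dangle
    else:
        raise ValueError(f"unknown edit {kind!r}")
    return g


def freeze(g: Graph) -> FrozenSet[Tuple[str, FrozenSet[str]]]:
    # Hashable snapshot so different final graphs can be compared in a set.
    return frozenset((v, frozenset(out)) for v, out in g.items())


def possible_outcomes(graph: Graph, edits) -> set:
    """All final graphs reachable by some ordering of concurrently issued edits."""
    results = set()
    for order in permutations(edits):
        g = graph
        for op in order:
            g = apply(g, op)
        results.add(freeze(g))
    return results
```

Starting from {"a": set(), "b": {"a"}} with the edits add_vertex c, add_edge a→c and remove_vertex b, there are two possible final graphs. The edge a→c survives only if vertex c already exists when the edge edit arrives.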
You can also send messages without edges, but this slightly violates the programming model and should be done mainly to save memory.
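The memory argument for edge-less messages can be sketched with a toy graph in which vertices are addressed by id. The ToyGraph class and its method names are hypothetical and are not Signal/Collect's API. The model assumes that the normal path delivers one signal per stored out-edge, while the shortcut addresses the target vertex directly and keeps no edge object.

```python
from typing import Dict, List, Tuple


class ToyGraph:
    """Vertices hold inboxes; edges are optional (hypothetical model, not the library API)."""

    def __init__(self) -> None:
        self.inboxes: Dict[int, List[float]] = {}
        self.edges: List[Tuple[int, int]] = []

    def add_vertex(self, vid: int) -> None:
        self.inboxes.setdefault(vid, [])

    def add_edge(self, src: int, dst: int) -> None:
        self.edges.append((src, dst))

    def signal_along_edges(self, src: int, value: float) -> None:
        # Regular path in the programming model: one signal per stored out-edge of `src`.
        for s, d in self.edges:
            if s == src:
                self.inboxes[d].append(value)

    def send_direct(self, dst: int, value: float) -> None:
        # Shortcut: address the target by id; no edge is stored in memory.
        self.inboxes[dst].append(value)
```

Suppose a hub vertex only occasionally needs to reach every other vertex. With direct sends it avoids storing n − 1 edges, but that communication pattern is no longer visible in the graph structure. This is the "slight violation" of the model.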
“I wonder how an integration of memcached to Signal/Collect would work in order to make the asynchronous computation possible in a distributed fashion.”
The architecture of Signal/Collect is based on message passing, so we are using the Akka actor framework for distribution. This should be more efficient than remote reads.
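As a rough illustration of the message-passing design, the sketch below partitions vertices across toy "workers", each with its own mailbox. Each signal goes to the worker that owns the target vertex, and no worker ever reads another worker's state. This is single-threaded Python standing in for actors. It is not Akka or Signal/Collect code, and the Worker class, the modulo partitioning rule and the summing handler are assumptions. The code also counts signals that cross a worker boundary, which is roughly where a shared-memory design would issue remote reads instead.

```python
from collections import deque
from typing import Dict, List, Tuple


class Worker:
    """Toy stand-in for an actor: owns a vertex partition and a mailbox (hypothetical)."""

    def __init__(self, wid: int) -> None:
        self.wid = wid
        self.mailbox: deque = deque()
        self.received: Dict[int, float] = {}       # vertex id -> accumulated signal

    def handle(self, msg: Tuple[int, float]) -> None:
        target, value = msg
        # Only the owning worker ever touches this vertex's state.
        self.received[target] = self.received.get(target, 0.0) + value


def owner(vid: int, n_workers: int) -> int:
    # Assumed partitioning rule: integer vertex ids assigned by modulo.
    return vid % n_workers


def run(edges: List[Tuple[int, int]], values: Dict[int, float], n_workers: int = 2):
    workers = [Worker(i) for i in range(n_workers)]
    remote_messages = 0
    # Each source pushes its value along its out-edges into the target owner's mailbox.
    for src, dst in edges:
        if owner(src, n_workers) != owner(dst, n_workers):
            remote_messages += 1
        workers[owner(dst, n_workers)].mailbox.append((dst, values[src]))
    # Each worker drains only its own mailbox.
    for w in workers:
        while w.mailbox:
            w.handle(w.mailbox.popleft())
    merged: Dict[int, float] = {}
    for w in workers:
        merged.update(w.received)
    return merged, remote_messages
```

With edges [(0, 2), (1, 2), (2, 0), (3, 1)], values {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0} and two workers, only the signal from vertex 1 to vertex 2 crosses a worker boundary. The merged result is {0: 3.0, 1: 4.0, 2: 3.0}.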

By: From Graph (batch) processing towards a distributed graph data base
https://www.rene-pickhardt.de/google-pregel-vs-signal-collect-for-distributed-graph-processing-pros-and-cons/#comment-29484
Thu, 23 Feb 2012 12:45:29 +0000

[…] memcached paper: To understand how distributed shared memory works, which could essentially speed up approaches like Signal Collect […]

By: Stefan
https://www.rene-pickhardt.de/google-pregel-vs-signal-collect-for-distributed-graph-processing-pros-and-cons/#comment-29481
Wed, 22 Feb 2012 12:50:23 +0000

My suggestions for what to read next:
CHALLENGES IN PARALLEL GRAPH PROCESSING:
http://www.sandia.gov/~bahendr/papers/graphs-and-machines.pdf
We should also take a look at the work of the Boost people:
http://www.boost.org/doc/libs/1_48_0/libs/graph/doc/index.html
http://www.boost.org/doc/libs/1_48_0/libs/graph_parallel/doc/html/index.html

By: Google Pregel vs Signal Collect for distributed Graph Processing – pros and cons « Another Word For It
https://www.rene-pickhardt.de/google-pregel-vs-signal-collect-for-distributed-graph-processing-pros-and-cons/#comment-29478
Wed, 22 Feb 2012 01:03:12 +0000

[…] Google Pregel vs Signal Collect for distributed Graph Processing – pros and cons […]
