Comments on: Google Pregel vs Signal Collect for distributed Graph Processing – pros and cons
https://www.rene-pickhardt.de/google-pregel-vs-signal-collect-for-distributed-graph-processing-pros-and-cons/
Extract knowledge from your data and be ahead of your competition
Last updated: Tue, 17 Jul 2018 11:07:57 +0000
By: Philip Stutz
https://www.rene-pickhardt.de/google-pregel-vs-signal-collect-for-distributed-graph-processing-pros-and-cons/#comment-29487
Mon, 12 Mar 2012 16:52:51 +0000
That was a very interesting read; thank you for the analysis and comparison. A few comments about Signal/Collect (I am one of the authors of the paper):
“From here the authors say that one can get rid of the superstep model and make the entire calculation asynchronous. This is done by introducing randomization on the set of vertices on which signal and collect computations have to be computed (as long as the threshold scores are overcome)”
The execution in one example is randomized to illustrate that asynchronous execution gives no guarantees about execution order. In practice we use different heuristics (above-average, eager) to determine the order in which the signal/collect operations are executed. The randomness only serves to explain the absence of guarantees; it is not actually implemented that way.
Sorry for not explaining this better in the paper.
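To make the scheduling point concrete, here is a minimal Python sketch of asynchronous, score-guided signal/collect execution for PageRank. It is not the Signal/Collect API. The function pagerank_async, the toy graph, the damping factor 0.85 and the threshold 1e-6 are all hypothetical. The "above-average" rule below is also an assumed reading: run only the vertices whose signal score is at least the average of the active scores. There are no supersteps. A target collects as soon as it receives a signal, and the order in which vertices run is whatever the heuristic picks.

```python
from collections import defaultdict

# Hypothetical toy graph: vertex id -> out-neighbor ids (every vertex has >= 1 out-edge).
EDGES = {1: [2, 3], 2: [3], 3: [1]}


def pagerank_async(edges, damping=0.85, threshold=1e-6):
    """Score-guided asynchronous PageRank in the signal/collect style (illustrative only)."""
    state = {v: 1.0 - damping for v in edges}
    last_signalled = {v: 0.0 for v in edges}  # state value each vertex last sent out
    inbox = defaultdict(dict)  # target -> {source: latest signal from that source}
    while True:
        # Signal score: how far a vertex's state has moved since it last signalled.
        scores = {v: abs(state[v] - last_signalled[v]) for v in edges}
        active = [v for v in edges if scores[v] > threshold]
        if not active:
            return state  # every remaining change is below the threshold
        average = sum(scores[v] for v in active) / len(active)
        # Assumed "above-average" rule: only the most urgent vertices run in this pass.
        # The max-score fallback guards against rounding when all scores are equal.
        chosen = [v for v in active if scores[v] >= average] or [max(active, key=scores.get)]
        for v in chosen:
            sent = state[v]
            last_signalled[v] = sent
            for target in edges[v]:
                inbox[target][v] = sent / len(edges[v])  # signal along the edge
                # Collect immediately: no superstep barrier waits for other vertices.
                state[target] = (1.0 - damping) + damping * sum(inbox[target].values())


ranks = pagerank_async(EDGES)
print({v: round(r, 4) for v, r in ranks.items()})
```

With the defaults, pagerank_async(EDGES) returns approximately {1: 1.1634, 2: 0.6444, 3: 1.1922}. The ranks sum to about 3, the number of vertices, because this formulation uses 1 - d rather than (1 - d)/N as the base rank.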
“I personally understand that Signal Collect can only send signals from one vertex to another if an edge exists and is also not able to add or remove edges or vertices.”
It is possible to modify the graph even during computations (https://github.com/uzh/signal-collect/blob/master/src/main/scala/com/signalcollect/GraphEditor.scala). Modifying the graph can be problematic, because concurrent modifications introduce nondeterminism.
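The nondeterminism point can be seen with two operations racing against each other. The minimal Python sketch below replays both orders in which a scheduler could apply "vertex a signals" and "add edge a -> c", using a max-propagation collect. The operation names and the toy graph are hypothetical; this is not the GraphEditor API linked above. The sketch also assumes that a does not signal again afterwards, as would happen in an engine that only re-signals vertices whose state changed.

```python
from itertools import permutations


def run(schedule):
    """Replay one interleaving of operations on a fresh toy graph and return the states."""
    edges = {"a": ["b"], "b": [], "c": []}
    state = {"a": 5, "b": 0, "c": 0}
    for op in schedule:
        if op == "signal_a":
            for target in edges["a"]:
                state[target] = max(state[target], state["a"])  # collect keeps the maximum
        elif op == "add_edge_a_c":
            edges["a"].append("c")  # graph modification issued during the computation
    return state


outcomes = {order: run(order) for order in permutations(["signal_a", "add_edge_a_c"])}
for order, state in outcomes.items():
    print(order, "->", state["c"])
```

Vertex c ends with 0 if the signal wins the race and with 5 if the edit does, so the same program can produce different results.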
You can also send messages without edges, but it is a slight violation of the programming model and should mainly be done to save memory.
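One way to read "send messages without edges ... to save memory" is as a reply pattern: a vertex answers whoever messaged it by vertex id instead of storing a reverse edge for every sender. The Python sketch below assumes a toy graph with a single mailbox. ToyGraph, send_to_id and ping_and_ack are hypothetical names, not Signal/Collect methods.

```python
from collections import defaultdict, deque


class ToyGraph:
    """Hypothetical in-memory graph with one shared mailbox (not the Signal/Collect API)."""

    def __init__(self):
        self.edges = defaultdict(list)  # source id -> target ids
        self.mailbox = deque()  # queued (target id, sender id, payload) messages

    def add_edge(self, source, target):
        self.edges[source].append(target)

    def signal_along_edges(self, source, payload):
        for target in self.edges[source]:
            self.mailbox.append((target, source, payload))

    def send_to_id(self, target, sender, payload):
        # Direct message: the target only needs a known id; no edge object is stored.
        self.mailbox.append((target, sender, payload))


def ping_and_ack(leaves):
    graph = ToyGraph()
    for leaf in leaves:
        graph.add_edge("hub", leaf)
    graph.signal_along_edges("hub", "ping")
    acks = []
    while graph.mailbox:
        target, sender, payload = graph.mailbox.popleft()
        if payload == "ping":
            # Reply by id instead of materializing a reverse edge leaf -> hub.
            graph.send_to_id(sender, target, "ack")
        elif payload == "ack" and target == "hub":
            acks.append(sender)
    stored_edges = sum(len(targets) for targets in graph.edges.values())
    return acks, stored_edges
```

ping_and_ack(["x", "y", "z"]) returns (["x", "y", "z"], 3). The hub collects three acknowledgements while only the three hub -> leaf edges are stored. The trade-off mentioned above still applies: a direct message bypasses the edge-based model, so the graph no longer records every data dependency.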
“I wonder how an integration of memcached to Signal/Collect would work in order to make the asynchronous computation possible in a distributed fashion.”
The architecture of Signal/Collect is based on message passing, so we are using the Akka actor framework for distribution. This should be more efficient than remote reads.
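A simple message count suggests why pushing messages can beat remote reads. Under message passing, the owner of a vertex sends its signal once per cross-worker edge and moves on. A remote read needs a request and a response, and the reading worker waits for the round trip. The Python sketch below only counts messages under that simplified, hypothetical cost model: no batching, no caching, one round. The partitioning and the function name are illustrative, and the sketch says nothing about how Akka or Signal/Collect actually route messages.

```python
def messages_per_round(edges, owner):
    """Count cross-worker messages for one round of neighbor updates (toy cost model)."""
    push = 0  # message passing: the source worker sends one signal per remote edge
    pull = 0  # remote reads: the target worker sends a request and waits for a reply
    for source, targets in edges.items():
        for target in targets:
            if owner[source] != owner[target]:
                push += 1
                pull += 2
    return push, pull


EDGES = {1: [2, 3], 2: [3], 3: [1]}
OWNER = {1: "worker-1", 2: "worker-1", 3: "worker-2"}  # hypothetical partitioning
print(messages_per_round(EDGES, OWNER))
```

For this partitioning the call gives (3, 6). The edges 1 -> 3, 2 -> 3 and 3 -> 1 cross workers, and each costs one message when pushed but a request/response pair when pulled.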
By: From Graph (batch) processing towards a distributed graph data base
https://www.rene-pickhardt.de/google-pregel-vs-signal-collect-for-distributed-graph-processing-pros-and-cons/#comment-29484
Thu, 23 Feb 2012 12:45:29 +0000
[…] memcached paper: To understand how distributed shared memory works, which could essentially speed up approaches like Signal Collect […]
By: Stefan
https://www.rene-pickhardt.de/google-pregel-vs-signal-collect-for-distributed-graph-processing-pros-and-cons/#comment-29481
Wed, 22 Feb 2012 12:50:23 +0000
My suggestions for what to read next:
Challenges in Parallel Graph Processing: http://www.sandia.gov/~bahendr/papers/graphs-and-machines.pdf
We should also take a look at the work of the Boost people (the Boost Graph Library and the Parallel Boost Graph Library): http://www.boost.org/doc/libs/1_48_0/libs/graph/doc/index.html http://www.boost.org/doc/libs/1_48_0/libs/graph_parallel/doc/html/index.html
By: Google Pregel vs Signal Collect for distributed Graph Processing – pros and cons « Another Word For It
https://www.rene-pickhardt.de/google-pregel-vs-signal-collect-for-distributed-graph-processing-pros-and-cons/#comment-29478
Wed, 22 Feb 2012 01:03:12 +0000
[…] Google Pregel vs Signal Collect for distributed Graph Processing – pros and cons […]