The start of the Linked Data benchmark council Eu FP7 Big Data pro

Peter who is working for Neo4j is an industry partner of the http://www.ldbc.eu/ which is a EU FP7 Project in the Big Data call.
The goal of this project is to put out good methodologies for benchmarking linked open data and rdf stores as well as graph data bases. In this context the council should also provide data sets for benchmarking.
Peter points out that a simple problem exists with benchmarks:”who ever puts it out wins” One simple reason is that benchmarking has so many flexible variables that it is really hard. he compared the challanges to the tpc http://de.wikipedia.org/wiki/Transaction_Processing_Performance_Council
After talking about the need for good benchmarks he pointed out again why the transaction processing Performence Council Benchmarks are not sufficient anymore giving many different examples of exploding big graphs being around in the world (Facebook, Google Knowledge Graph, Linked open data, dbpedia).
Since the project is really new Peter could not report any results yet. Anyway I am pretty sure that anyone interested in graph data bases and graph data should look into the project which has the following list of deliverables

overvew of current graph benchmakrs and designs
benchmark principles and methods
Query Languages (Cypher, Gremlin, SPARQL)
Analysis and classification of Choke points (Supernodes, data generators)
Benchmark transactions (which are in general very slow)
Benchmark the complexity of queries
Analysis (if anyone has data sets and usecases contact the LDBC, actually I think we have data comming from related work)
Navigational benchmark (e.g. open streetmaps)
Benchmarking design for pattern matching (e.g. SPARQL and Cypher)

As you could here from beside there is a huge discussion going on about query languages which i like. Creating a query language is a tough task. The more expressive a language is (like SPARQL) the less efficient this might become. So I hope the EU project will really create some good solid output. I am also happy that many different industry vendors are part of this project. In this sense the results will hopefully be objective and don’t suffer from the “Who ever puts it on wins” paradigm.
Interestingly the LDBC makes a speration between graph data bases and rdf stores which I am very pleased to see and have been thinking a lot.

Popular Posts

What are the 57 signals google uses to filter search results?

Graphity: An efficient Graph Model for Retrieving the Top-k News Feeds for users in social networks

Algorithmic Information Filter from Eli Pariser’s TED Talks

Time lines and news streams: Neo4j is 377 times faster than MySQL

Leave a Reply Cancel reply

You may also like...

Popular Posts

Leave a Reply Cancel reply