Comparison of platforms and places to use to host your MOOC
https://www.rene-pickhardt.de/comparison-of-platforms-and-places-to-use-to-host-your-mooc/
Wed, 24 Jul 2013

As many of you know (and voted on, thanks for that!), Steffen and I tried to get a MOOC fellowship in order to create a web science MOOC. Even though our application was not successful, we decided that online teaching in the MOOC format is suitable for the web science lecture. With the structure from our application and the teaching last term we have a basic structure for the content the students should learn. Now we are starting to create the material, but the question is: which platform should we use, and where should we host a MOOC? I was actually planning to write one single article on that topic, but it turned out that there are so many different approaches to online learning that I will have to split my work into several articles. So here I will just explain my methodology and the criteria I will use to compare the platforms for your MOOC.
There is a lot of good information about the MOOC industry and current trends on the Wikipedia page about MOOCs.
Basically there are 3 different approaches to online education:

  1. Free content: The focus of these platforms (Khan Academy, Wikiversity, OER Commons, P2P University,…) lies in freeing educational content from the publishing industry. In most cases the focus seems to be on content and not so much on learning paths, didactics or pedagogy. The argumentation seems to be: “first we need the content, then we can think about how to use it”. Have a look at my blog post http://www.rene-pickhardt.de/comparison-of-open-educational-resources-services-to-host-your-mooc/ to see which open platforms perform well.
  2. Commercial: There is a rising industry (Coursera, Udacity, edX, iversity,…) trying to commercialize massive open online education. Commercial platforms usually have high-quality content and strong relationships with universities (most often Ivy League), serving a lot of classes in this new format. Courses are usually not available under an open licence. So far most content is available at no cost, and the business model is related to certification, but sometimes also to tuition fees.
  3. Self-hosted with a learning management system: There are various learning management systems (OLAT, Moodle, Google Course Builder, ILIAS,…) available as open source software which enable one to host a MOOC oneself. Most of these systems are made for eLearning but lack the MOOC feel of excellent usability. Often openness is not their primary intent either.

This means that besides this article I will publish three blog articles comparing platforms for each of the 3 different approaches. There is a German list of learning platforms on Wikipedia as well as the MOOC template in the English Wikipedia, from which I extracted the following lists:

Platforms for online education

People related to online education

Not all of the platforms are relevant for a web science MOOC, but I still extracted some of the most relevant sites and added a few others. As for the evaluation methodology, we did a little survey and identified some possibilities. Since there are so many hosting services and possibilities, we tried to find some dimensions that are important to us in order to find out which hosting service makes the most sense. We will use the following dimensions for our evaluation:

  1. Overhead: How much overhead is associated with providing the content for a certain platform infrastructure?
  2. Open: Will the platform accept our course?
  3. Licence: Who holds the copyright and what is the licencing model?
  4. Hosting time: How much time of hosting does the platform guarantee?
  5. Open Format: Will the course content be in an open format so that we can easily export the data from the host and take it to some other service?
  6. Feedback: Does the platform give instructors feedback, e.g. on how long people interact with a piece of content?
  7. Quizzes: Are quizzes supported by the platform?
  8. Community: Is there an active community and an exchange among instructors?
  9. Audience: Is there a large audience using the platform?
  10. Support: Is there active support from the platform?
  11. Online Meetings: Does the platform support meetings of students and teachers in cyberspace?
  12. Account Management: Is it possible to have different roles for the accounts (e.g. student, tutor, creator,…)?
  13. Risk: What are the risks of using this particular platform?

Ideally, I would like to find a service with the following answers to our dimensions:

  1. Overhead: Little overhead to submit the course material.
  2. Open: The platform should be open to any course.
  3. Licence: We should maintain the copyright, or the licence should at least be Creative Commons.
  4. Hosting time: Forever.
  5. Open Format: Data export of the material is needed, e.g. respecting http://en.wikipedia.org/wiki/IMS_Global
  6. Feedback: In order to improve we need feedback.
  7. Quizzes: We need various forms of quizzes.
  8. Community: A community of instructors with which one can exchange ideas and from which one can learn would be amazing.
  9. Audience: In the end good content will win, but the larger the audience the better.
  10. Support: A platform that offers support when problems arise is preferable.
  11. Online Meetings: It would be nice if the platform supported online meetings of users, with Q&A systems or even video chat.
  12. Account Management: Multiple account roles would support the learning process.
  13. Risk: Obviously we want the risks to be minimized.

I am looking forward to your feedback on missing platforms or further dimensions for the evaluation of the learning platforms.

Video of FOSDEM talk finally online
https://www.rene-pickhardt.de/video-of-fosdem-talk-finally-online/
Tue, 25 Jun 2013

I was visiting FOSDEM 2013 with Heinrich Hartmann and talking about related-work.net. The video of this talk is finally online, and of course I would like to share it with the community:

The slides can be found here.

Slides of Related work application presented in the Graphdevroom at FOSDEM
https://www.rene-pickhardt.de/slides-of-related-work-application-presented-in-the-graphdevroom-at-fosdem/
Sat, 02 Feb 2013

Download the slide deck of our talk at FOSDEM 2013, including all the resources that we pointed to.
Most important other links are:

It was great talking here. As always: we are open source, open data and so on, so if you have suggestions or want to contribute, feel free to just do it or to contact us. We are really looking forward to meeting other hackers who just want to geek out and change the world.
The video of the talk can be found here:

The start of the Linked Data Benchmark Council, an EU FP7 Big Data project
https://www.rene-pickhardt.de/the-start-of-the-linked-data-benchmark-council-eu-fp7-big-data-pro/
Sat, 02 Feb 2013

Peter, who is working for Neo4j, is an industry partner of http://www.ldbc.eu/, which is an EU FP7 project in the Big Data call.
The goal of this project is to put out good methodologies for benchmarking linked open data and RDF stores as well as graph databases. In this context the council should also provide data sets for benchmarking.
Peter points out that a simple problem exists with benchmarks: "whoever puts it out wins". One simple reason is that benchmarking has so many flexible variables that it is really hard. He compared the challenges to those of the TPC (http://de.wikipedia.org/wiki/Transaction_Processing_Performance_Council).
After talking about the need for good benchmarks he pointed out again why the Transaction Processing Performance Council benchmarks are not sufficient anymore, giving many different examples of exploding big graphs in the world (Facebook, Google Knowledge Graph, Linked Open Data, DBpedia).
Since the project is really new, Peter could not report any results yet. Anyway, I am pretty sure that anyone interested in graph databases and graph data should look into the project, which has the following list of deliverables:

  • Overview of current graph benchmarks and designs
  • Benchmark principles and methods
  • Query languages (Cypher, Gremlin, SPARQL)
  • Analysis and classification of choke points (supernodes, data generators)
  • Benchmarking transactions (which are in general very slow)
  • Benchmarking the complexity of queries
  • Analysis (if anyone has data sets and use cases, contact the LDBC; actually I think we have data coming from Related Work)
  • Navigational benchmarks (e.g. OpenStreetMap)
  • Benchmarking design for pattern matching (e.g. SPARQL and Cypher)

As you could hear on the side, there is a huge discussion going on about query languages, which I like. Creating a query language is a tough task: the more expressive a language is (like SPARQL), the less efficient it may become. So I hope the EU project will really create some good, solid output. I am also happy that many different industry vendors are part of this project. In this sense the results will hopefully be objective and not suffer from the "whoever puts it out wins" paradigm.
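To make the expressiveness trade-off a bit more concrete, here is a sketch of the same co-author pattern expressed once in Cypher (Neo4j 1.x syntax) and once in SPARQL; the relationship type, predicate name and node id are made up for illustration:

// Cypher: follow two author edges from a fixed start node
String cypher = "START a=node(42) MATCH a-[:AUTHOROF]-()-[:AUTHOROF]-co RETURN co";

// SPARQL: the same pattern as two triple patterns over a hypothetical vocabulary
String sparql = "PREFIX ex: <http://example.org/> " +
        "SELECT ?co WHERE { ?paper ex:author ?a ; ex:author ?co . FILTER(?co != ?a) }";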
Interestingly, the LDBC makes a separation between graph databases and RDF stores, which I am very pleased to see and which I have been thinking about a lot.

Get the full neo4j power by using the Core Java API for traversing your graph database instead of the Cypher Query Language
https://www.rene-pickhardt.de/get-the-full-neo4j-power-by-using-the-core-java-api-for-traversing-your-graph-data-base-instead-of-cypher-query-language/
Tue, 06 Nov 2012

As I said yesterday, I have been busy over the last months producing content, so here you go. For Related Work we will most likely use neo4j as the core database. This makes sense since we are basically building a kind of social network, and most queries that we need to answer while offering the service or during data mining have a friend-of-a-friend structure.
For some of the queries we are doing counting or aggregations, so I was wondering what the most efficient way of querying a neo4j database is. So I ran a benchmark, with quite surprising results.
Just a quick remark: we used a database consisting of papers and authors extracted from arxiv.org, one of the biggest preprint sites available on the web. The data set is available for download and for reproducing the benchmark results at http://blog.related-work.net/data/
The database as a neo4j file is 2 GB (zipped); the schema looks pretty much like this:

 Paper1  <--[ref]-->  Paper2
   |                    |
   |[author]            |[author]
   v                    v
 Author1              Author2
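To make the schema concrete, here is a minimal sketch of how such a graph could be created with the embedded Neo4j 1.x Java API. The RelationshipTypes enum is the one used in the benchmark code below; the REF relationship type and the property names are illustrative assumptions, not the actual import code:

GraphDatabaseService graphDB = new EmbeddedGraphDatabase("target/papers-db");
Transaction tx = graphDB.beginTx();
try {
    // two papers referencing each other, each with one author
    Node paper1 = graphDB.createNode();
    paper1.setProperty("title", "Paper1"); // property name is an assumption
    Node paper2 = graphDB.createNode();
    paper2.setProperty("title", "Paper2");
    paper1.createRelationshipTo(paper2, RelationshipTypes.REF); // hypothetical type

    Node author1 = graphDB.createNode();
    author1.setProperty("name", "Author1");
    paper1.createRelationshipTo(author1, RelationshipTypes.AUTHOROF);

    Node author2 = graphDB.createNode();
    author2.setProperty("name", "Author2");
    paper2.createRelationshipTo(author2, RelationshipTypes.AUTHOROF);

    tx.success();
} finally {
    tx.finish(); // Neo4j 1.x transaction style
}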

For the benchmark we were trying to find co-authors, which is basically a friend-of-a-friend query following the author relationship (or a breadth-first search of depth 2).
As we know, there are basically 3 ways of communicating with the neo4j database:

Java Core API

Here you work on the node and relationship objects within Java. Once you have fixed an author node, formulating the query looks pretty much like this:

int resCnt = 0; // counts the co-author results
for (Relationship rel : author.getRelationships(RelationshipTypes.AUTHOROF)) {
    Node paper = rel.getOtherNode(author);
    for (Relationship coAuthorRel : paper.getRelationships(RelationshipTypes.AUTHOROF)) {
        Node coAuthor = coAuthorRel.getOtherNode(paper);
        if (coAuthor.getId() == author.getId()) continue; // skip the start author
        resCnt++;
    }
}

We see that the code can easily look very confusing (especially if queries get more complicated). On the other hand, one can easily combine several similar traversals into one big query, making readability worse but increasing performance, as sketched below.
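As a sketch of what such a combined query could look like (assuming the same author node and RelationshipTypes enum as above), here two aggregations, the number of papers and the number of co-authors, are computed in a single pass instead of two separate traversals:

int paperCnt = 0;
int coAuthorCnt = 0;
for (Relationship rel : author.getRelationships(RelationshipTypes.AUTHOROF)) {
    Node paper = rel.getOtherNode(author);
    paperCnt++; // first aggregation: papers of the author
    for (Relationship coRel : paper.getRelationships(RelationshipTypes.AUTHOROF)) {
        // second aggregation: co-authors, reusing the same traversal
        if (coRel.getOtherNode(paper).getId() != author.getId()) coAuthorCnt++;
    }
}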

Traverser Framework

The Traverser Framework ships with the Java API and I really like the idea of it. I think it makes the meaning of a query really easy to understand, and in my opinion it really helps the readability of the code.

int resCnt = 0;
// Traversal.description() is the static factory for traversal descriptions
TraversalDescription td = Traversal.description()
        .breadthFirst()
        .relationships(RelationshipTypes.AUTHOROF)
        .evaluator(Evaluators.atDepth(2))
        .uniqueness(Uniqueness.NONE);
for (Path p : td.traverse(author)) {
    Node coAuthor = p.endNode();
    resCnt++;
}

Especially if you have a lot of similar queries, or queries that are refinements of other queries, you can save them and extend them using the Traverser Framework; see the sketch below. What a cool technique!
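A minimal sketch of this reuse, assuming the Neo4j 1.x traversal API: TraversalDescription is immutable, so every builder call returns a new description, and a saved base description can be refined without modifying it.

// base query: breadth-first over author edges, saved for later reuse
TraversalDescription base = Traversal.description()
        .breadthFirst()
        .relationships(RelationshipTypes.AUTHOROF)
        .uniqueness(Uniqueness.NONE);

// two refinements of the same saved description
TraversalDescription coAuthors = base.evaluator(Evaluators.atDepth(2));
TraversalDescription coCoAuthors = base.evaluator(Evaluators.atDepth(4));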

Cypher Query Language

And then there is the Cypher Query Language, an interface pushed a lot by neo4j. If you look at the query you can totally understand why: it is a really beautiful language that is close to SQL (looking at Stack Overflow it is actually frightening how many people are trying to answer FOAF queries using MySQL) but still emphasizes the graph-like structure.

ExecutionEngine engine = new ExecutionEngine(graphDB);
String query = "START author=node(" + author.getId() +
        ") MATCH author-[:" + RelationshipTypes.AUTHOROF.name() +
        "]-()-[:" + RelationshipTypes.AUTHOROF.name() +
        "]-coAuthor RETURN coAuthor";
ExecutionResult result = engine.execute(query);
int resCnt = 0;
scala.collection.Iterator<Node> it = result.columnAs("coAuthor");
while (it.hasNext()) {
    Node coAuthor = it.next();
    resCnt++;
}
I was always wondering about the performance of this query language. Writing a query language is a very complex task, and the more expressive the language is, the harder it is to achieve good performance (the same holds true for SPARQL in the semantic web). And let's just point out: Cypher is quite expressive.

What were the results?

All queries were executed 11 times; the first run was thrown away since it warms up the neo4j caches. The values are averages over the other 10 executions.
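The measurement loop looked roughly like the following sketch, where runQuery() is a hypothetical placeholder for any of the three query variants above:

int runs = 11;
long total = 0;
for (int i = 0; i < runs; i++) {
    long start = System.currentTimeMillis();
    runQuery(); // one of: Core API, Traverser Framework, Cypher
    long elapsed = System.currentTimeMillis() - start;
    if (i > 0) total += elapsed; // the first run only warms up the caches
}
double avgMillis = total / (double) (runs - 1);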
  • The Core API is able to answer about 2000 friend-of-a-friend queries per second (I have to admit, on a very sparse network).
  • The Traverser Framework is about 25% slower than the Core API.
  • Worst is Cypher, which is slower by at least one order of magnitude, only able to answer about 100 FOAF-like queries per second.
  • I was shocked, so I talked to Andres Taylor from neo4j, who mainly works on Cypher. He asked me which neo4j version I used, and I said 1.7. He told me I should check out 1.9, since Cypher had become more performant. So I ran the benchmarks on neo4j 1.8 and neo4j 1.9; unfortunately, Cypher became slower in the newer releases.

One can see that the Core API outperforms Cypher by an order of magnitude and the Traverser Framework by about 25%. In newer neo4j versions the Core API became faster and Cypher became slower.

Quotes from Andres Taylor:

Cypher is just over a year old. Since we are very constrained on developers, we have had to be very picky about what we work on. The focus in this first phase has been to explore the language, learn about how our users use the query language, and expand the feature set to a reasonable level.

I believe that Cypher is our future API. I know you can very easily outperform Cypher by handwriting queries. Like every language ever created, in the beginning you can always do better than the compiler by writing by hand, but eventually the compiler catches up.

Conclusion:

So far I have only been using the Java Core API to work with neo4j, and I will continue to do so.
If you are in a high-speed scenario (and I believe every web application is one), you should really think about switching to the neo4j Java Core API for writing your queries. It might not look as nice as Cypher or the Traverser Framework, but the gain in speed pays off.
Also, I personally like the amount of control you have when doing the traversals over the core yourself.
Additionally, I will soon post an article on why scripting languages like PHP, Python or Ruby aren't suitable for building web applications anyway. So changing to the Core API makes sense for several reasons.
The complete source code of the benchmark can be found at https://github.com/renepickhardt/related-work.net/blob/master/RelatedWork/src/net/relatedwork/server/neo4jHelper/benchmarks/FriendOfAFriendQueryBenchmark.java (commit: 0d73a2e6fc41177f3249f773f7e96278c1b56610)
The detailed results can be found in this spreadsheet.
