Collective Intelligence – Data Science, Data Analytics and Machine Learning Consulting in Koblenz, Germany (https://www.rene-pickhardt.de)

What happened to Vensenya's "Changing mindset" project? (Thu, 02 Apr 2015)

Two and a half years ago I posted about Simon's project, which at that time was just starting and, to me, still very fuzzy. Still, I donated 150 Euro and asked others to do the same. It was the trust I had in him that convinced me it would work out great, even though it was not yet clear how.
Today I received an email saying that the project has become much more focused and will finally go public in September 2015. They will produce a TV series to be published on YouTube. Have a look at their trailer (in German).

Together with youngsters they are producing a series about the lives, problems, and challenges of youngsters. They focus on a growth-mindset approach that moves from "I can never do this" to "I will be able to do this." The best thing is the authenticity of the project. It is done with non-professional actors, camera operators, and editors, and the equipment is borrowed. It seems the project will reach really high quality on a low budget – right in the spirit of: "Of course we can do this if we really want to, and we don't need much money."
In that spirit they are running a second crowdfunding campaign (which I guess is much more about publicity than about actually raising money), which I warmly recommend supporting:
https://socialimpactfinance.startnext.com/kaempfergeist
I will certainly keep you up to date as soon as the results are published! But first I will send an email to Simon and ask him whether it would be possible to use an open license for the material. I guess they want to earn money by licensing, but for a social, crowdfunded project I think an open license would be appropriate.

Creating an award-winning video doesn't need much technology or technical know-how (Thu, 04 Dec 2014)

After winning the community award in the Wikipedia video contest in the category "documentation and interview" with my pointers-in-C video, I would like to share some experiences with creating educational videos. This is mainly to encourage anyone to do the same as I did.
Have a look at the winning video again if you don't remember it.

In my opinion it doesn't take much more than a real interest in education. The video that won the award was used in a real teaching scenario. I only had one dry run before recording the one and only published version (which, with more iterations, could still be a bit shorter, more focused, and slicker). Most of the time (about 3 hours) went into planning how to present the learning content – something everyone who teaches should do anyway. In total it took me less than 5 hours, including planning, the dry run, recording, uploading, and sharing with students.
The impact: while originally only 16 students participated in my class, the video has by now been watched about 10 thousand times. In particular, it was included in the Wikipedia article on pointers and is thus hopefully a helpful resource for people interested in that topic.
Most importantly, I did not need expensive technology. As you can see from the attached picture, I did not even have a proper way of mounting my digital camera. The microphone was the internal one of that very camera. I used a couple of books together with a ruler to bring the camera into the correct position for a nice shot of the whiteboard I was using. Other than that, I used two lamps for proper light and lowered the curtains outside the window.

What I am basically saying: everyone who owns a camera (which most people nowadays do) can take a video and explain something. You can contribute your explanatory video to the growing knowledge base on Wikimedia Commons. You can contribute to the ongoing discussion about whether Wikipedia articles should be enhanced with videos or not. Most importantly, if you do everything on the whiteboard like me, you will most certainly not run into any of the copyright problems that I ran into before.
So what are you waiting for? I am sure you are an expert on something. Go give it a shot and share your video here in the comments, but also via Wikimedia Commons – and maybe even include it in a fitting Wikipedia article.

About the future of Videos on Wikiversity, Wikipedia and Wikimedia Commons (Tue, 04 Nov 2014)

In the following article I want to give an overview of the discussions and movements going on around video and multimedia content for Wikipedia and its sister projects. I will start with some positive experiences and then tell you about some bad ones. This article is not meant to whine about some edit war; it is more about observing an undecided, open topic within the community of Wikipedians.
During my time as a PhD student I actively contributed to open educational resources by uploading 52 educational videos to Wikimedia Commons so far. Some of those videos were created together with Robert Naumann, and another share of them was uploaded by him. A large fraction of those videos were made for the Web Science MOOC and can be found at:
https://commons.wikimedia.org/wiki/Category:Videos_for_Web_Science_MOOC_on_Wikiversity
Last week we submitted the following video to the OPERA Award, an award for OER video material established with the goal of encouraging more such content.

As you can see, it was selected as the media file of the day on November 2nd on Wikimedia Commons (*cheering*). Can anyone show me how this happened? I looked for the selection process but did not find it.
I have also included another video about pointers in C (in German: "Zeiger in C") in a Wikipedia article.

Does Wikipedia like videos within articles?

From my experience, the pointer video was removed a couple of times from the related German Wikipedia article and then also brought back. So it seems there isn't any consensus within the community yet about having videos. Interestingly enough, I was asked by some Wikipedians to submit my video to a video competition they are running. The goal of this competition is to get more content creators like me to upload their material to Commons and include it in Wikipedia articles. This effort seems to be funded by money donated by the users. There seems to be a similar project in the English Wikipedia. So at least money is flowing in the direction of creating more video content.
Even though these seem to be strong arguments, I have the feeling that not the entire Wikipedia community supports this movement – or one could call it a strategic move. One year ago, without knowing about these kinds of efforts, I tried to include some of the Web Science videos in Wikipedia articles. For example, I included the following video:

in the corresponding Wikipedia article. It was removed with a statement calling it video SPAM, which in my opinion is a bit of an overreaction.
A summary of the discussion can be found in the slides of my talk at the German Open Educational Resources conference:
2014MoocOnWikiversity
If you are interested, you can find the entire discussion on the discussion page of the Ethernet frame article.

Problems with creating video content for Commons:

Obviously there is a copyright problem. For example, I have pointed out in the past that creating a screencast of a lecture on a Windows machine means committing a copyright violation, since the Start button and the Windows interface are protected by copyright under the Microsoft EULA. Also, in earlier discussions at #OER13de we agreed that it is hard to edit videos collaboratively (sorry, link in German), because the software is often not free and Wikimedia Commons does not support uploading the source files of the videos anyway.

Conclusion

It is not clear whether video content will survive in Wikipedia, even though some strategic effort is being put into that idea. The people who are against it have pretty decent arguments, and I agree that it is really hard to find a tool for collaboratively editing video files. Without such tools, even access to the source files of the videos would make it hard for people to work on them together. So I am curious to see what the competitions will bring and how the discussions on videos will evolve over time.
At least on Wikiversity we are able to use our videos for teaching as we intended, and I am pretty sure this space won't be affected by the ongoing discussion.

Drug junkie steals my neo4j t-shirt out of my physical mailbox (Wed, 17 Jul 2013)
Me wearing my stolen neo4j shirt, which I took back from the thief

At FOSDEM 2013, Peter from Neo4j asked me if I would like to get a neo4j shirt sent to my home address. Keep in mind that I had just moved back to Koblenz from China – and not just to Koblenz, but to Koblenz Lützel. I knew from my colleagues that this part of Koblenz is supposed to be like the Harlem of NYC. But I found a really nice flat where I live together with 3 other nice students. Even though I was skeptical when looking at the flat, I had to move there after I met my future roommates.
A couple of weeks after moving in, I smelled pot more and more frequently on the streets, especially from people smoking it in front of my front door. I had even observed people exchanging packages in the backyard of our house. Of course I cannot say for sure, but even at the time I was pretty confident they were dealing drugs. Over the last couple of weeks we have had several problems in our house:

  • People broke into our basement and stole some stuff.
  • Another time people broke into our basement and stored bikes there which did not belong to any of our neighbors.
  • There is a bullet hole in our front door.
  • Last but not least: I was about to leave our backyard when I saw a guy wearing my neo4j shirt.

OK, let me elaborate on the last one:
Neo4j, the graph database, is not exactly known as the most famous fashion brand in the world. So I hardly recognized the shirt when I saw him wearing it. But I somehow recognized the design of the shirts and decided to turn around to get a second look. Who in my neighborhood would wear such a shirt, and what connection would he have to this rather new piece of technology?
When I turned around, things got even stranger. I saw the back of the guy, and his shirt said:
"My stream is faster than yours" – certainly a reference to Graphity –
and also displayed the Cypher query:
(renepickhardt) <-[:cites]- (peter)
I was so perplexed that I didn't realize I was alone while the guy was standing there with 2 other men. I said: "Sorry, you are wearing my shirt!" His friends came over, told me I was crazy, and asked how I could come up with such an idea. I insisted that my name was written on the shirt – in particular, my full name! I also knew the quote, which was exactly what Peter had planned to print on my shirt.
The guys started mocking me and telling me to f... off. But I somehow stood my ground and pointed out again that this was certainly my shirt. At that moment the door of the Kung Fu school opened, and the coach, Mr. Lai, came out and asked whether the guys had been stealing packages from our post box again. The guy with my shirt then had to turn around again, so everybody could see my name. He started telling me some weird lie about how he had gotten the shirt as a present and just thought it looked nice, but he finally returned it to me.
Most interestingly, the police didn't care. The policeman only said: "It's your own fault for moving to a place like Koblenz Lützel." I find this very disappointing. I always thought our police should be objective and neutral. Stealing and opening other people's mail is a crime in Germany, as is dealing drugs or stealing bikes... It is sad that the police refuse to help us with our situation.
Well, anyway: if you have an important message for me, why not use email rather than physical mail? My email is also potentially read by third parties, but at least it is still delivered safely. Anyway, big thanks to the guys from neo4j for my new shirt (:

Video of FOSDEM talk finally online (Tue, 25 Jun 2013)

I visited FOSDEM 2013 with Heinrich Hartmann, talking about related-work.net. The video of this talk is finally online, and of course I would like to share it with the community:

The slides can be found here.

GWT + database connection in Servlet ContextListener – Auto Complete Video Tutorial Part 5 (Mon, 24 Jun 2013)

Finally we have all the basics needed for building an autocomplete service, and now comes the juicy part. From now on we will look at how to make it fast and robust. In the current approach we open a new database connection for every HTTP request. This takes quite some time to lock the database (at least when using neo4j in embedded mode) and then also to run the query, without any opportunity to benefit from the database's caching strategy.
In this tutorial I will introduce you to the concept of a ContextListener. Roughly speaking, this is a way of storing objects in the servlet container's global memory using key-value pairs. Once we understand this, the roadmap is very clear: we can store objects like database connections or search indices in the memory of our web server. From what I currently understand, this could also be used to implement some server-side caching. I have not done any benchmarking yet to test how fast retrieving objects from the context is in Tomcat. Also, this method of caching does not scale horizontally as well as memcached does.
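To make this concrete, here is a minimal sketch of the idea – not the code from the screencast: the class name, attribute key, and database path are invented for illustration, and the embedded-neo4j call assumes the 1.x/2.x-era GraphDatabaseFactory API.

import javax.servlet.ServletContext;
import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

// Opens the embedded database once at application start-up and stores it
// in the ServletContext so every servlet reuses the same connection.
public class DatabaseContextListener implements ServletContextListener {

    // key under which the database is stored (name invented for this sketch)
    public static final String DB_KEY = "graphDb";

    @Override
    public void contextInitialized(ServletContextEvent event) {
        ServletContext context = event.getServletContext();
        // one embedded database for the whole application instead of
        // one connection per HTTP request
        GraphDatabaseService graphDb =
                new GraphDatabaseFactory().newEmbeddedDatabase("data/autocomplete.db");
        context.setAttribute(DB_KEY, graphDb);
    }

    @Override
    public void contextDestroyed(ServletContextEvent event) {
        GraphDatabaseService graphDb = (GraphDatabaseService)
                event.getServletContext().getAttribute(DB_KEY);
        if (graphDb != null) {
            graphDb.shutdown(); // release the embedded store's lock on undeploy
        }
    }
}

The listener is registered with a <listener> element in web.xml; any servlet can then fetch the shared instance via getServletContext().getAttribute(DatabaseContextListener.DB_KEY) instead of opening its own connection.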
Anyway, have fun learning about the ContextListener.

If you have any suggestions, comments, or thoughts – or even know of some solid benchmarks on caching via the ServletContext (I did a quick web search for a few minutes and didn't find any) – feel free to contact me and discuss this!

The best way to create an autocomplete service: And the winner is… Giuseppe Ottaviano (Wed, 15 May 2013)

Over one year ago I started to think about indexing scored strings for autocompletion queries. I stumbled upon this problem after seeing the strength of the predictions of the typology approach for next-word prediction on smartphones. The typology approach had one major drawback: though its suggestions had high precision, at 50 milliseconds per suggestion it was rather slow, especially for a server-side application.

  • On August 16th, 2012 I found a first solution building on Nicolai Diethelm's Suggest Tree. Though the speedup was great, the Suggest Tree at the time had several major drawbacks (1. the number of suggestions had to be known before building the tree, 2. large memory overhead and high redundancy, 3. no possibility of updating weights or even inserting new strings after building the tree – the last two issues have been fixed just last month).
  • So I tried to find a solution requiring less redundancy. But for indexing gigabytes of 5-grams we still needed a persistent method. We tried Lucene and MySQL in December and January. After seeing that MySQL does not provide any indices for this kind of query, I decided to misuse MySQL's multidimensional trees in a highly redundant way, just to be able to evaluate the strength of typology on large data sets with gigabytes of n-grams. Creating one of the dirtiest hacks of my life, I could at least handle the data, but the solution was over-engineered and amounted to throwing hardware at the problem.
  • After Christoph tried to solve this using bitmap indices – which was quite fast but had issues with scaling and index maintainability – we had a discussion, and the solution finally popped into my mind at the beginning of March this year.

Even though I had thought about scored tries before, they always had the problem that only the top-1 element could be found efficiently. Then I realized that one has to sort the children of a node by score and use a priority queue during retrieval. In this way one gets the best possible runtime. I was doing this in a rather redundant way because I was aiming for fast prefix retrieval of the trie node and then fast retrieval of the top children.
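To illustrate the principle – this is my own minimal sketch, not the implementation Heinrich and I benchmarked – each node stores the best score found anywhere in its subtree, and retrieval runs a best-first search with a priority queue, so completions pop out in descending score order:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

// Scored trie with top-k completion via best-first search.
class ScoredTrie {

    private static final class Node {
        final Map<Character, Node> children = new TreeMap<>();
        String word;        // non-null iff a word ends here
        double score;       // score of that word
        double bestBelow;   // max score anywhere in this subtree
    }

    // Queue entry: an unexpanded subtree, or a finished word (node == null).
    private static final class Entry {
        final double priority;
        final Node node;
        final String word;
        Entry(double priority, Node node, String word) {
            this.priority = priority; this.node = node; this.word = word;
        }
    }

    private final Node root = new Node();

    // Insert-only sketch: lowering an existing score would require
    // recomputing bestBelow along the path.
    void insert(String word, double score) {
        Node n = root;
        n.bestBelow = Math.max(n.bestBelow, score);
        for (char c : word.toCharArray()) {
            n = n.children.computeIfAbsent(c, k -> new Node());
            n.bestBelow = Math.max(n.bestBelow, score);
        }
        n.word = word;
        n.score = score;
    }

    List<String> topK(String prefix, int k) {
        Node n = root;
        for (char c : prefix.toCharArray()) {
            n = n.children.get(c);
            if (n == null) return Collections.emptyList();
        }
        PriorityQueue<Entry> queue = new PriorityQueue<>(
                (a, b) -> Double.compare(b.priority, a.priority));
        queue.add(new Entry(n.bestBelow, n, null));
        List<String> results = new ArrayList<>();
        while (!queue.isEmpty() && results.size() < k) {
            Entry e = queue.poll();
            if (e.node == null) {
                // A word popped at its own score beats every remaining
                // subtree bound, so it is the next-best completion.
                results.add(e.word);
            } else {
                if (e.node.word != null) {
                    queue.add(new Entry(e.node.score, null, e.node.word));
                }
                for (Node child : e.node.children.values()) {
                    queue.add(new Entry(child.bestBelow, child, null));
                }
            }
        }
        return results;
    }
}

The queue always expands the most promising branch first, so the loop touches only the nodes needed for the k best completions instead of enumerating every completion of the prefix.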
After I came up with my solution, and after talking to Lucene contributors from IBM in Haifa, I realized that Lucene had a pretty similar solution as a less popular "hidden feature", which I tested. Anyway, in my experiments I also saw a large memory overhead with the Lucene solution, so my friend Heinrich and I started to develop my trie-based solution and benchmark it against various baselines in order to produce good, solid output.
Development started last month, and we made quite some progress. Our goal was always to be about as fast as Nicolai Diethelm's Suggest Tree without running into all the drawbacks of his solution. In our coding session yesterday we realized that Nicolai has improved his data structure a lot, getting rid of the memory overhead and adding the ability to update, insert, and delete items in his index (though the number of suggestions still has to be known before the tree is built).
Yet while learning more about the ternary tree data structure he used to build his solution, I found a paper that will be presented TODAY at the WWW conference. Guess what: independently of us, Giuseppe Ottaviano explains in Chapter 4 the exact solution and algorithm that I came up with this March. Combined with an efficient implementation of the tries and many compression techniques (even respecting the processor's cache locality), he even beats Nicolai Diethelm's Suggest Tree.
I looked up Giuseppe Ottaviano, and the only two things I have to say are:

  1. Congratulations, Giuseppe. You have really worked on this kind of problem for a long time and created an amazing paper. This is also reflected in the related-work section and all the small details in your paper that we were still in the process of figuring out.
  2. If anyone needs an autocompletion service, this is the way to go. Being able to provide suggestions from a dictionary with 10 million entries in a few microseconds (yes, micro, not milli!) means that a single computer can handle about 100,000 requests per second – at, say, 10 microseconds per suggestion, 100,000 of them fit into one second – which is certainly web scale. The updated Suggest Tree by Nicolai is also a viable option, and maybe much easier to use, since it is Java-based rather than C++ and the full code is open source.
OK, so much for the history of events and the congratulations to Giuseppe. I am happy to see that the algorithm really performs that well, but there is one little thing that really bothers me:
How come our community of researchers hasn't come up with a good way of giving credit to a person like me who came up with the solution independently? I feel that the strongest chapter of my dissertation has just collapsed and one year of research has burnt away. Personally I gained and learnt a lot from it, but from a career point of view this seems like a huge setback.

Anyway, life goes on. Thinking about the trie-based solution has already given us a decent list of future work which we can most certainly use for follow-up work, and I will certainly contact the authors – maybe a collaboration will be possible in the future.

Building an Autocompletion on GWT screencast Part 1: Getting Warm – Reviewing remote procedure calls (Tue, 19 Feb 2013)

Quite a while ago I promised to create some screencasts on how to build a (personalized) autocompletion in GWT. Even though the screencasts have been finished for quite some time, I had to delay publishing them for various reasons.
Finally it is now time to go public with the first video. I really do start from scratch, so the first video might be a little boring, since I am only reviewing GWT's remote procedure calls.
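As a quick reminder of the pattern the video reviews – this is just the generic GWT RPC boilerplate with invented names, deliberately not the screencast's source code – a service is declared as a pair of interfaces: a synchronous one implemented by a RemoteServiceServlet on the server, and an async twin that client code calls.

import com.google.gwt.user.client.rpc.AsyncCallback;
import com.google.gwt.user.client.rpc.RemoteService;
import com.google.gwt.user.client.rpc.RemoteServiceRelativePath;

// Synchronous interface; the server-side servlet implements this.
@RemoteServiceRelativePath("suggest")
interface SuggestionService extends RemoteService {
    String[] getSuggestions(String prefix);
}

// Async twin used by client code: same method names, an extra
// AsyncCallback parameter, and a void return type.
interface SuggestionServiceAsync {
    void getSuggestions(String prefix, AsyncCallback<String[]> callback);
}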
A little note: the video is hosted on Wikipedia! I think it is important to spread knowledge under a Creative Commons licence, and the YouTubes, Vimeos, ... of this world are rather trying to achieve vendor lock-in. So if the embedded player does not work well for you, you can go directly to Wikipedia for a fullscreen version or a direct download of the video.

Another note: I did not publish the source code! This has a pretty simple reason (and yes, you can call me crazy): if you really want to learn something, copying and pasting code doesn't give you the full understanding. Doing it step by step – watching the screencasts and reproducing the steps – is the way to go.
As always, I am open to suggestions and feedback, but please keep in mind that the entire course of videos has already been recorded.

Slides of Related work application presented in the Graphdevroom at FOSDEM (Sat, 02 Feb 2013)

Download the slide deck of our talk at FOSDEM 2013, including all the resources we pointed to.
The most important other links are:

It was great talking here, and again: we are open source, open data, and so on. So if you have suggestions or want to contribute, feel free – just do it or contact us. We are really looking forward to meeting other hackers who just want to geek out and change the world.
The video of the talk can be found here:

The start of the Linked Data Benchmark Council – an EU FP7 Big Data project (Sat, 02 Feb 2013)

Peter, who works for Neo4j, is an industry partner of http://www.ldbc.eu/, an EU FP7 project in the Big Data call.
The goal of this project is to put out good methodologies for benchmarking linked open data and RDF stores as well as graph databases. In this context the council should also provide data sets for benchmarking.
Peter points out a simple problem with benchmarks: "whoever puts it out wins". One simple reason is that benchmarking has so many flexible variables that it is really hard. He compared the challenges to those of the TPC: http://de.wikipedia.org/wiki/Transaction_Processing_Performance_Council
After talking about the need for good benchmarks, he pointed out again why the Transaction Processing Performance Council benchmarks are no longer sufficient, giving many examples of exploding big graphs around the world (Facebook, Google Knowledge Graph, Linked Open Data, DBpedia).
Since the project is really new, Peter could not report any results yet. Anyway, I am pretty sure that anyone interested in graph databases and graph data should look into the project, which has the following list of deliverables:

  • Overview of current graph benchmarks and designs
  • Benchmark principles and methods
  • Query languages (Cypher, Gremlin, SPARQL)
  • Analysis and classification of choke points (supernodes, data generators)
  • Benchmarking transactions (which are in general very slow)
  • Benchmarking the complexity of queries
  • Analysis (if anyone has data sets and use cases, contact the LDBC; actually I think we have data coming from Related-Work)
  • Navigational benchmarks (e.g. OpenStreetMap)
  • Benchmark design for pattern matching (e.g. SPARQL and Cypher)

As one could hear on the sidelines, there is a huge discussion going on about query languages, which I like. Creating a query language is a tough task: the more expressive a language is (like SPARQL), the less efficient it may become. So I hope the EU project will really produce some good, solid output. I am also happy that many different industry vendors are part of this project. In this sense the results will hopefully be objective and not suffer from the "whoever puts it out wins" paradigm.
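To give a feeling for the kind of navigational pattern matching such benchmarks target, here is a purely illustrative friends-of-friends query in 2013-era Cypher; the schema and the "person" index are invented for the example:

START me = node:person(name = "Alice")
MATCH (me)-[:KNOWS]->(friend)-[:KNOWS]->(foaf)
WHERE me <> foaf AND NOT (me)-[:KNOWS]->(foaf)
RETURN foaf.name, COUNT(friend) AS mutualFriends
ORDER BY mutualFriends DESC

A query like this is short to write but expensive to evaluate on a supernode with millions of KNOWS edges, which is exactly the kind of choke point the deliverables above mention.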
Interestingly, the LDBC makes a separation between graph databases and RDF stores, which I am very pleased to see and which I have been thinking about a lot.
