GRAPHITY – Data Science, Data Analytics and Machine Learning Consulting in Koblenz Germany
https://www.rene-pickhardt.de
Extract knowledge from your data and be ahead of your competition

Graphity Server for social activity streams released (GPLv3)
Mon, 02 Sep 2013
Almost 2 years have passed since I published my first ideas and work on Graphity, which is nowadays a collection of algorithms to support efficient storage and retrieval of more than 10k social activity streams per second. You know the typical application from Twitter, Facebook and co.: retrieve the most current status updates from your circle of friends.
Today I proudly present the first version of the Graphity News Stream Server. Big thanks to Sebastian Schlicht, who implemented most of the servlet for me and did an amazing job! The Graphity Server is a neo4j-powered servlet with the following properties:

  • Response times for requests are usually below 10 milliseconds (plus network I/O, e.g. the TCP round trips that come with HTTP).
  • The Graphity News Stream Server is free open source software (GPLv3) and is hosted in the metalcon git repository. (Please also use the bug tracker there to submit bugs and feature requests.)
  • It runs two Graphity algorithms: one is read-optimized, and the other is write-optimized for applications that expect more write than read requests.
  • The server comes with a REST API, which makes it easy to plug the server into whatever application you have.
  • The server's responses also follow the activitystrea.ms format, so out of the box a large number of clients are available to render them.
  • The server ships with unit tests and extensive documentation, especially of the news stream server protocol (NSSP), which specifies how to talk to the server. The server can currently handle about 100 write requests in medium-sized networks (about a million nodes). I do not recommend using this server if you expect your user base to grow beyond 10 million users (though we are working on making the server scale). This is mostly because our database currently won't scale beyond one machine and some internal operations have to be synchronized.
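
To illustrate what an activitystrea.ms-style response looks like, here is a minimal Python sketch that builds a single Activity Streams 1.0 "post" activity. The helper make_activity and all IDs and names are hypothetical; the field names (published, actor, verb, object, objectType, id, displayName, content) come from the Activity Streams 1.0 JSON specification, while the exact payload the Graphity server returns is defined in the NSSP documentation and may differ.

```python
import json
from datetime import datetime, timezone


def make_activity(actor_id, actor_name, status_id, text, when):
    """Build a minimal Activity Streams 1.0-style 'post' activity (hypothetical helper).

    Field names follow the activitystrea.ms JSON 1.0 spec; the real NSSP
    payload of the Graphity server may contain different or additional fields.
    """
    return {
        "published": when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "actor": {"objectType": "person", "id": actor_id, "displayName": actor_name},
        "verb": "post",
        "object": {"objectType": "note", "id": status_id, "content": text},
    }


activity = make_activity(
    "user:42", "Alice", "status:7", "Hello stream!",
    datetime(2013, 9, 2, 7, 11, 22, tzinfo=timezone.utc),
)
print(json.dumps(activity, indent=2))
```

Because every activity carries its actor, verb and object in a fixed shape, a generic Activity Streams client can render it without knowing anything about Graphity itself.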

Koding.com is currently considering implementing Graphity-like algorithms to power their activity streams. It was Richard from their team who pointed out, in a very fruitful discussion, how to avoid the neo4j limit of 2^15 = 32768 relationship types by using an overlay network. His idea of an overlay network has been implemented in the read-optimized Graphity algorithm. Big thanks to him!
Now I am really excited to see what kind of applications you will build with Graphity.

If you use Graphity

Please tell me if you start using Graphity; that would be awesome to know, and I will certainly include you in a list of testimonials.
By the way, if you want to help spread the server (which is also good for you, since more developers using it means a higher chance of newer versions), you can upvote my answer on Stack Overflow:
http://stackoverflow.com/questions/202198/whats-the-best-manner-of-implementing-a-social-activity-stream/13171306#13171306

How to get started

It's darn simple!

  1. Clone the git repository or otherwise get hold of the source code.
  2. Switch to the repository directory and type sudo ./install.sh
  3. Copy the war file to your Tomcat webapps folder (if you don't know how to set up Tomcat and Maven, which are required, we have a detailed setup guide).
  4. You're done! More configuration details are in our README.md.
  5. Look in the newswidget folder to find a simple HTML / JavaScript client that can interact with the server.
I also created a short screencast to demonstrate the setup:

Get involved

There are plenty of ways to get involved:

  • Fork the server
  • Submit a bug report
  • Fix a bug
  • Subscribe to the mailing list.

Further links:

Drug junkie steals my neo4j t-shirt out of my physical mailbox
Wed, 17 Jul 2013
Me wearing my stolen neo4j shirt which I took back from the thief

At FOSDEM 2013, Peter from Neo4j asked me if I would like to get a neo4j shirt sent to my home address. Keep in mind that I had just moved back to Koblenz from China. And I did not just move to Koblenz, I moved to Koblenz Lützel. I knew from my colleagues that this part of Koblenz is supposed to be like the Harlem of NYC. But I found a really nice flat where I live together with 3 other nice students. Even though I was skeptical when looking at the flat, I had to move there after I met my future roommates.
A couple of weeks after moving in, I realized that I smelled pot on the streets more and more frequently, especially from people smoking it in front of my front door. I had even observed people exchanging packages in the backyard of our house. Of course I cannot say for sure, but even at the time I was pretty confident that they were dealing drugs. Over the last couple of weeks we had several problems in our house:

  • People broke into our basement and stole some stuff.
  • Another time people broke into our basement and stored bikes there that did not belong to any of our neighbors.
  • There is a bullet hole in our front door.
  • Last but not least: I was about to leave our backyard when I saw a guy wearing my neo4j shirt.

OK, let me elaborate on the last one:
Neo4j the graph database is not exactly the most famous fashion brand in the world, so I barely recognized the shirt when I saw him wearing it. But I somehow recognized the design and decided to turn around to take a second look. Who in my neighborhood would wear such a shirt, and what connection would he have to this rather new piece of technology?
When I turned around, things got even stranger. I saw the back of the guy, and his shirt said:
“My stream is faster than yours”, which is certainly a reference to Graphity,
and it also displayed the Cypher query:
“(renepickhardt) <-[:cites]- (peter)”
I was so perplexed that I didn't realize I was alone and the guy was standing there with 2 other men. I said: “Sorry, you are wearing my shirt!” His friends stepped in, told me I was crazy and asked how I could come up with this idea. I insisted that my name was written on the shirt, in particular my full name! Besides, I knew the quote, which was exactly what Peter had planned to print on my shirt.
The guys started mocking me and telling me to f… off. But I somehow held my ground and pointed out again that this was certainly my shirt. At that moment the door of the Kung Fu school opened and the coach, Mr. Lai, came out and asked whether the guys had stolen packages from our mailbox again. The guy with my shirt had to turn around again, so everybody could see my name. He started telling me some weird lie about how he got the shirt as a present and just thought it looked nice, but he finally returned it to me.
Most interestingly, the police didn't care. The policeman only said: “It's your own fault when you move to a place like Koblenz Lützel.” I find this very disappointing. I always thought that our police should be objective and neutral. Stealing and opening other people's mail is a crime in Germany, and so is possessing drugs or stealing bikes… It is sad that the police refuses to help us with our situation.
Well, anyway: if you have an important message for me, why don't you use email rather than physical mail? My email may also be read by third parties, but at least it is still delivered safely. Anyway, big thanks to the guys from neo4j for my new shirt (:

Metalcon finally gets a redesign – Thinking about high scalability
Mon, 17 Jun 2013
Finally metalcon.de, the social networking site which Jonas, Jens and I created in 2008, gets a redesign. Thanks to the great opportunities at the Institute for Web Science and Technologies here in Koblenz (why don't you apply for a PhD position with us?) I will have the chance to code up the new version of metalcon. Kicking off on July 15th, I will lead a team of 5 programmers for 4 months. Not only will the development be open source, but during this time I will also constantly (hopefully daily) write on this blog about the design decisions we take in order to build a well-scaling web service.
Before I share my thoughts on highly scalable architectures for websites, I want to give a little history and background on what metalcon is and why this redesign is so necessary:

Metalcon is a social networking site for German fans of metal music. It currently has

  • a user base of 10’000 users.
  • about 500 registered bands
  • a highly semantic and interlinked database (bands, geographical coordinates, friendships, events)
  • 624 MB of text and structured data about the mentioned topics.
  • fairly good visibility in search engines.
  • > 30k lines of code (mostly PHP)
  • a badly scaling architecture (own OR mapper, own AJAX libraries, big monolithic database design, bad usage of PHP,…)
  • no unit tests (so code maintenance is almost impossible)
  • no music and audio files
  • no processes for content moderation
  • no processes to fight spam and block users
  • really bad usability (I could write tons of posts about where the usability is lacking)
  • no clear distinction of features for users to understand

When we built metalcon, no one on the team had experience with highly scalable web applications, and we were just happy to get it running at all. After returning from China and starting my PhD program in 2011, I was about to shut down metalcon. Though we had become close friends, the core team had already moved on to new projects and we were lacking manpower. On the other hand, everyone kept telling me that metalcon would be a great place to do research. So in 2011 Jonas and I decided to give it another shot and do an open redevelopment. We set up a wiki to document our features and the software, and we created a developer blog which we used to exchange ideas. We also created some open source projects, to which we hardly contributed code due to the lack of manpower…
Well, at that time we already knew of too many problems, so fixing them was not the way to go. At least we learned a lot. Thinking about highly scalable architectures at that time, I knew that a news feed (which the old version of metalcon already had) was central to the user experience. Reading many Stack Exchange discussions, I knew that you wouldn't build such a stream on MySQL. Playing around with graph databases like neo4j also led me to my first research paper, building Graphity, software designed to distribute highly personalized news streams to users. Since our development was not progressing, we never deployed Graphity within metalcon. Also, building an autocomplete service for the site should not be a problem anymore.

Roadmap for the redesign

  • Over the next weeks I hope to read as many interesting articles about technologies and high scalability as I can find, and I will be more than happy to get your feedback and suggestions here. I will start by reading many articles on http://highscalability.com/. This blog is pure gold for serious web developers.
  • During a nice discussion about scalability with Heinrich, we already came up with a potential architecture for metalcon. I will introduce this architecture soon but first want to check the best practices on the high scalability blog.
  • In parallel I will also collect the features needed for the new metalcon version and hopefully be able to pair them with useful technologies. I already started a wiki page about features and planned technologies to support them.
  • I will also need to decide on the programming language and paradigms for development. Right now I am playing around with Ruby on Rails vs GWT. We have had some great experiences with the power of GWT, but one major drawback is certainly that the website becomes more of an application than a lightweight website.

So again, feel free to give input and share your ideas and experiences with me and with the community. I will be very grateful for every recommendation of articles, videos, books and so on.

Open access and data from my research. Old resources for various topics finally online.
Mon, 05 Nov 2012
Being strongly in favor of open access, I always try to publish all my work on my blog, but sometimes I am busy or I forget to update it. So today I took the time to look at all my old drafts and the material that hasn't been published yet. Here is a list of new content on my blog that should have been published long ago; I also linked it in the articles of interest:

In the last month I have created quite a lot of content for my blog, and it will be published over the next weeks. So watch out for screencasts on how to create autocompletion in GWT with neo4j, how to create n-grams from Wikipedia, thoughts and techniques for related work, and research ideas and questions that we found but probably do not have the time to work on.

How to do a presentation in China? Some of my experiences
Fri, 02 Nov 2012
Chinese culture is different from Western culture, we all know that! I am certainly not an expert on China, but after living in China for almost 2 years, learning some of the language, working in a Chinese company, seeing presentations every week and visiting over 30 Western and Chinese companies based in China, I think I have some insights into how you should organize a presentation in China.
Since I recently went to Shanghai for a research exchange with Jiaotong University, I had to give a presentation introducing my institute and myself. So here you can find my rather uncommon presentation and some remarks on why some slides were designed the way they are.
http://www.rene-pickhardt.de/wp-content/uploads/2012/11/ApexLabIntroductionOfWeST.pdf

Guanxi – your relations

First of all, I think it is really important to understand that in China everything is related to your relations (http://en.wikipedia.org/wiki/Guanxi). A Chinese business card will always name a few of your best and strongest contacts. This is more important than your address, for example. When a conference starts, people exchange name cards before they sit down and discuss.
This principle of Guanxi is also reflected in the style in which presentations are made. Here are some basic rules:

  • Show pictures of the people you worked with
  • Show group pictures from events you organized
  • Show pictures of the panels that ran events
  • Show your partners (for business, not only clients but also suppliers and the people you work with in general)

My way of respecting these principles:

  • I first showed a group picture of our institute!
  • For almost every project where I could get hold of them, I also showed pictures of the people responsible for the project
  • I did not only show the European research projects our university is part of, but also listed all the different partners and showed their logos

Family

The second thing is that in China the concept of family is very important. As a rule of thumb, I would say that if you want to do business with someone in China and you haven't been introduced to their family, things will not go as you might expect.
For this reason I included some slides with a world map zooming down to the place where I was born, where I studied and where my parents still live!

Localizing

When I chose a world map, I did not only take one in Chinese but also one with China at the center. In my contact data I also listed Chinese social networks. Remember that Twitter, Facebook and many other sites are blocked in China. So if you really want to communicate with Chinese people, why not get a QQ number or a Weibo account?

Design of the slides

You have seen this at conferences many times: Chinese people just put a heck of a lot of stuff on a slide. I strongly believe this is because reading and recognizing Chinese characters is much faster than reading Western characters. So if your presentation is in Chinese, don't be afraid to fill your slides with information. I have seen many talks by Chinese people who were literally reading word by word what was written on the slides. Whereas in Western countries this is considered bad practice, in China it is all right.

Language

Speaking of language: of course, if you know some Chinese, it shows respect if you at least try to include some. I split my presentation into 2 parts, one in Chinese and one in English.

Have an interesting take away message

In my case I included the fact that we have open PhD positions and scholarships, that our institute is really international and that the working language is English. Of course I also included some slides about my past and current research, like Graphity and Typology.

During the presentation:

In China it is not at all rude if someone's cellphone rings and they have more important things to do. As the presenter you should switch off your phone, but you should not be disturbed or annoyed if people in the audience receive phone calls and leave the room to take care of that business. This is very common in China.
I am sure there are many more rules on how to give a presentation in China, and maybe I even made some mistakes in mine, but at least I have the feeling that the reaction was quite positive. So if you have questions, suggestions or feedback, feel free to drop a line; I am more than happy to discuss cultural topics!

Submission history of my first academic research paper (graphity at socialcom 2012)
Tue, 10 Jul 2012
My graph index Graphity was, as mentioned in another blog post, accepted at SocialCom 2012. After explaining how it works and sharing the source code, I now want to share some information about the history of submissions, reviews, quality of reviews, actions taken and so on. So if you are a coder, hacker, geek or whatever, just skip this post and dig into the source code of Graphity. If you are a researcher, you might enjoy learning from my experiences.

2011 April: Defining the research question

When I was wondering about possible research topics, many people told me that running a social network like metalcon and seeing so much data would be a treasure trove of research questions. Even though at that time our goal was just to keep metalcon running in its current state, I started thinking about what research questions could be asked. For most questions we simply had not collected good data. So I realized that if metalcon were really to give rise to research questions, I would have to reimplement big parts of it.
Since metalcon also had the problem of slow page loading times, I decided to recode the project efficiently. I knew that running metalcon on a graph database like neo4j would be a good idea. That is why I started wondering how to implement a newsfeed system efficiently and scalably on top of a graph database.
==> A research question was formulated.

2011 mid-August: A solution to the problem

After 3 months of hard thinking and trying out different database designs, I had a meeting with my potential supervisor Prof. Dr. Steffen Staab. He would have loved to see some preliminary results. While summing up everything for the meeting, the solution, or rather the idea of Graphity, suddenly became clear before my eyes.
In late August I talked about the ideas in the Oberseminar and presented a poster at the European Summer School in Information Retrieval. I wondered where to submit this.
Steffen thought that the idea was pretty smart and could be published in a top journal like VLDB, which he suggested I choose. He said it would be my first scientific work, and VLDB has a very short reviewing cycle of less than one month that could give me fast feedback. A piece of advice that I neither understood nor followed. Well, sometimes you have to learn the hard way…
Being confident about the quality of the index, also due to Steffen's positive feedback and his plans for VLDB, I planned instead to submit to the WWW conference. I thought social networks would be a relevant topic for this conference. Steffen agreed but pointed out that he was a program chair for that conference, which meant that submissions from his own institute would not be the best idea.
Even though Graphity was not implemented or evaluated yet and no related work had been read, we decided that the paper should be submitted to the SIGMOD conference, with a deadline only 2 months later.
By the way, having these results and starting this hard work earned me funding for 3 years starting in October 2012!

2011 November: Submission to SIGMOD

After two months of very hard work, especially together with Jonas Kunze, a physicist from metalcon from whom I really learned a lot about setting up software and evaluation frameworks, the paper was submitted to SIGMOD. Meanwhile I tried to get an overview of the related work (which at that time was the hardest part), since I really did not know where to start: my research question came from a very practical and relevant use case rather than from reading many papers.
Anyway, we finished our submission just in time, together with all the experiments and what I would still call a decent text about Graphity. (You can get the final copy of the SIGMOD version on request.)

2011 mid-November: Blog post with the best content from the paper

I talked to my now-supervisor Steffen and asked him what he thought about blogging the results of Graphity already, to get some awareness in the community. He was a little skeptical, since blogs cannot really be cited and the paper was not published yet. I argued that blogs are becoming more and more important in research. I said that even if a blog post is not a publication in the scientific sense, it still is a form of publication and gains visibility. Steffen agreed to test this out, and I have to say the idea was perfect. I put the core results on my blog and received very positive feedback from people working at LinkedIn, Microsoft and from other hackers. I also realized that the problem was currently unpublished / unsolved and relevant to many people.
Interestingly, one of my co-authors tried to persuade me to publish the SIGMOD version of the paper as a technical report. I did not get the point of doing so (he said something like “then it is officially published and no one can steal the idea…”). To this day I don't get the point of doing this other than inflating one's official publication count…

2012 February: Reject from SIGMOD

After all the strong feedback on my blog and via mail, the reviews from the SIGMOD conference were quite disappointing. Every single reviewer highly valued the idea and the relevance of the problem, but they criticized the evaluation and the related work.

  • For the evaluation, they criticized that using a memory disk was manipulative and that the distribution and scaling behaviour could not be demonstrated by theoretical and empirical proofs of some big O notations.
  • As for the related work, they were missing some papers, especially from the database community, and as I said, related work was definitely one of the weaknesses of the paper.
  • Other feedback concerned our notation and the naming of some baselines.

The most disappointing of all the feedback was the following quote:

W1: I cannot find any strong and novel technical contribution in their algorithms. Their proposed method is too simple and straightforward. It is simply an application of using doubly linked list to graph structures.

To answer this at least once: yes, the algorithm is simple and based only on doubly linked lists. BUT it is in the best complexity class one can imagine for the problem, it scales perfectly, and this was proven. How can one throw away an answer because it is too simple if it is the best possible answer? I sense that this is one of the biggest differences between a mathematician and a computer scientist. In particular, the same reviewer pointed out that the problem was relevant and important.
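
To show why "simply doubly linked lists" can be enough for fast reads, here is a minimal Python sketch of the underlying idea: one reader's friends are kept in a doubly linked list ordered by most recent activity, a new update moves its author to the front in O(1), and reading the k most recently active friends only walks k nodes. This is a simplified in-memory illustration under my own assumptions (the classes Node and EgoList are hypothetical), not the neo4j implementation from the paper; in a full system a write by one user would call touch on the list of each of that user's followers.

```python
class Node:
    """One friend entry in a reader's recency-ordered friend list."""
    __slots__ = ("user", "prev", "next")

    def __init__(self, user):
        self.user = user
        self.prev = None
        self.next = None


class EgoList:
    """Friends of one reader, most recently active friend first (hypothetical sketch)."""

    def __init__(self):
        self.head = None
        self.nodes = {}  # user -> Node, gives O(1) lookup of a friend's list entry

    def _unlink(self, node):
        # Detach node from its neighbours in O(1).
        if node.prev is not None:
            node.prev.next = node.next
        else:
            self.head = node.next
        if node.next is not None:
            node.next.prev = node.prev
        node.prev = node.next = None

    def touch(self, user):
        """Record that `user` just posted: move (or insert) them at the front in O(1)."""
        node = self.nodes.get(user)
        if node is None:
            node = self.nodes[user] = Node(user)
        elif node is self.head:
            return
        else:
            self._unlink(node)
        node.next = self.head
        if self.head is not None:
            self.head.prev = node
        self.head = node

    def most_recent(self, k):
        """Return up to k friends in recency order; cost grows with k, not list length."""
        result, node = [], self.head
        while node is not None and len(result) < k:
            result.append(node.user)
            node = node.next
        return result


ego = EgoList()
for author in ["bob", "carol", "dave", "bob"]:
    ego.touch(author)
print(ego.most_recent(2))  # ['bob', 'dave']
```

The point of the design is that neither operation depends on how many friends the reader has, which is the kind of degree-independence the paper argues for.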
Given this feedback, we concluded that the database community might have been the wrong community for the problem. Again, we would have figured this out much faster by submitting to VLDB as Steffen had suggested.

2012 February: Resubmission to Hypertext

The submission deadline for the Hypertext conference was only 1 week after the feedback. Since one week was not really much time, we decided not to implement any of the feedback and just check what another community would say. My supervisor Steffen was not really convinced by this idea, but we argued that the Hypertext notification was only one month later and this quick, additional feedback might help.

2012 late March: Reject from Hypertext

The reviews from the Hypertext conference were rather strange: one strong accept, but that reviewer said he was not an expert in the field; one strong reject from a person who did not understand the correctness of our top-k n-way merge (which was not even core to the paper); and a borderline from one reviewer who had some really interesting comments on our terminology and the related work.
Overall the paper was accepted as a poster presentation; we decided that this was not sufficient for our work, so we withdrew our submission.

2012 May: Resubmission to SocialCom

Discussions arose about which conference we should target now. We were sure that the style of the evaluation was not suited to the database community. On the other hand, graph databases alone would not be sufficient for semantic conferences, and social is such a hype right now that we really were not sure.
Steffen would have liked to submit the paper to ISWC, since this is an important community for our institute. I suggested SocialCom, since the core of our paper really lay in social networking. All reviews so far had valued the problem as important and relevant for social networks, and people basically liked the idea.
We tried to find related work for the baselines we used and found that for our strongest baseline there was no related work. 3 days before the SocialCom submission, Steffen argued that we should drop the name of the paper and not call it Graphity anymore. He said that we should just sell the work as two new indices for social news streams (which I thought was kind of reasonable, since the baseline to Graphity really makes sense to use and had not been presented in any related work). We could also enhance the paper with some feedback from my blog and the neo4j mailing list. The only downside of this strategy was that changing the title and the storyline would lead to a complete rewrite of the paper, something I was not too keen on doing within 3 days. My co-author and I were convinced that we should stick to the changes we had made while working in the feedback and to our argument for the baseline without related work.
We talked to Steffen again. He said he left the final decision to us but would strongly recommend changing the storyline of the paper and dropping the name. Even though I was not 100% convinced and my co-author also did not want to rewrite the paper, I decided to trust Steffen's experience. We rewrote the storyline.

2012 July: Accept at SocialCom

The paper was accepted as a full paper at SocialCom. So following Steffen's advice was a good idea. Not getting discouraged by bad reviews from the wrong community and staying convinced that the work had some strong results also turned out to be a good thing. I am sure that posting preliminary results on the blog was really helpful: this way we could integrate openly accessible feedback from the community into our third submission. I am excited to see how much I will have learned from this experience in later submissions.
I will publish the paper on the blog as soon as the camera-ready version is done! Subscribe to my newsletter, RSS or Twitter if you don't want to miss the final version of the paper!

Graphity source code and wikipedia raw data is online (neo4j based social news stream framework)
Mon, 09 Jul 2012
UPDATE: the source code of an entire Graphity server application is now online!
8 months ago I posted the results of my research on fast retrieval of social news feeds, in particular my graph index Graphity. The index is able to serve more than 12 thousand personalized social news streams per second in social networks with several million active users. I was able to show that the system is independent of the node degree and the network size. Therefore it scales to graphs of arbitrary size.
Today I am pleased to announce that our joint work was accepted as a full research paper at the IEEE SocialCom conference 2012. The conference will take place in early September 2012 in Amsterdam. As promised, I will now open the source code of Graphity to the community. Its documentation could (and might in the future) be improved, and I am sure that one could even use a better data structure for our implementation of the priority queue.
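
The priority queue is what merges the friends' individual streams into one news feed. As a minimal sketch (not the code in the graphity-evaluation repository), a top-k n-way merge can be written in Python with the standard library's heapq.merge, which keeps one pending item per input stream in a heap. Each stream is assumed to be one friend's status updates as (timestamp, text) tuples sorted newest first; the function name top_k_merge is hypothetical.

```python
import heapq
from itertools import islice


def top_k_merge(streams, k):
    """Return the k newest updates across all streams (hypothetical helper).

    Each stream is an iterable of (timestamp, text) tuples sorted newest first.
    heapq.merge keeps one pending item per stream in a heap, so items are
    produced lazily and islice stops after k of them.
    """
    merged = heapq.merge(*streams, key=lambda update: update[0], reverse=True)
    return list(islice(merged, k))


streams = [
    [(9, "a3"), (5, "a2"), (1, "a1")],  # friend a
    [(8, "b2"), (2, "b1")],             # friend b
    [(7, "c1")],                        # friend c
]
print(top_k_merge(streams, 4))  # [(9, 'a3'), (8, 'b2'), (7, 'c1'), (5, 'a2')]
```

heapq.merge heapifies the head of each of the n streams once, so returning the k newest updates costs roughly O(n + k log n). This sketch opens every stream up front and does not exploit any recency ordering of the friends themselves.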
Still, the attention from the developer community for Graphity was quite high, so maybe the source code is of help to someone. The source code consists of the entire framework that we used for our evaluation against other baselines, which will also help anyone reproduce our evaluation.
There are some nice things one can learn about setting up multithreading for time measurements and about setting up a good logging mechanism.
The code can be found at https://github.com/renepickhardt/graphity-evaluation and the main algorithm should be in the file:
https://github.com/renepickhardt/graphity-evaluation/blob/master/src/de/metalcon/neo/evaluation/GraphityBuilder.java
other files of high interest should be:

I have not touched the code over the last couple of months, and it still contains a lot of debugging comments. My apologies for this bad practice. I hope you can overlook it, keeping in mind that I am a mathematician and this was one of my first bigger evaluation projects. In my own interest, I promise that next time I will produce code that is easier to read, understand and reuse.
Still, if you have any questions, suggestions or comments, feel free to contact me.
The raw data can be downloaded at:

The format of these files is straightforward:
de-nodeIs.txt contains one line per article: first an ID, then a tab, then the title of the Wikipedia article. You only need it if you want to display your data with titles rather than IDs.
The interesting file is de-events.log. It has 4 columns:
timestamp TAB FromNodeID TAB [ToNodeID] TAB U/R/A
Every line records exactly when the article FromNodeID changed. If only 3 columns are present and the last one is U, the article itself was updated. If links in the article changed, the third column contains another node ID (ToNodeID), and the last column is A or R for an added or removed link, respectively.
I think processing these files is rather straightforward. With this file you can simulate the growth of Wikipedia over time. The file is sorted by the second column. If you want to use it in our evaluation framework, you should sort it by the first column (the timestamp). On a Unix shell this takes less than 10 minutes with the sort command.
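Here is a minimal Python sketch of reading `de-events.log` line by line. It makes these assumptions:
* The file is tab-separated.
* A U line has exactly 3 columns.
* Timestamps and node IDs are integers. The post does not specify the timestamp format, so adjust the conversion if needed.

The `Event` class and the `parse_events` function are hypothetical helpers, not part of the graphity-evaluation code.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Event:
    timestamp: int
    from_node: int
    to_node: Optional[int]  # None when the line only records an update (U)
    kind: str               # "U" = article updated, "A" = link added, "R" = link removed


def parse_events(lines: Iterable[str]) -> Iterator[Event]:
    """Parse tab-separated lines: timestamp, FromNodeID, [ToNodeID], U/R/A."""
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) == 3:
            ts, src, kind = fields
            dst = ""
        elif len(fields) == 4:
            ts, src, dst, kind = fields
        else:
            continue  # skip blank or malformed lines
        if kind not in ("U", "A", "R"):
            continue  # skip lines with an unknown event type
        yield Event(int(ts), int(src), int(dst) if dst else None, kind)


sample = ["1000\t42\tU\n", "1001\t42\t7\tA\n", "1002\t42\t7\tR\n", "\n"]
for event in parse_events(sample):
    print(event)
```

Because `parse_events` is a generator, it can be fed an open file object directly without loading the whole log into memory.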
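To simulate the growth of the link graph, the events can be replayed in timestamp order. Here is a minimal Python sketch. It assumes the lines have already been parsed into `(timestamp, from_node, to_node, kind)` tuples with integer timestamps; `replay` is a hypothetical helper.

Sorting in memory is fine for small samples. For the full file, the Unix `sort` command is the better tool, since it can spill to temporary files instead of holding everything in RAM.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

# One parsed line of de-events.log: (timestamp, FromNodeID, ToNodeID or None, "U"/"A"/"R").
Row = Tuple[int, int, Optional[int], str]


def replay(rows: Iterable[Row]) -> Tuple[Dict[int, Set[int]], Dict[int, int]]:
    """Replay events in timestamp order; return final out-links and last change per article."""
    links: Dict[int, Set[int]] = defaultdict(set)
    last_change: Dict[int, int] = {}
    # sorted() is stable, so events sharing a timestamp keep their original order.
    for ts, src, dst, kind in sorted(rows, key=lambda row: row[0]):
        if kind == "A" and dst is not None:
            links[src].add(dst)
        elif kind == "R" and dst is not None:
            links[src].discard(dst)
        last_change[src] = ts
    return dict(links), last_change


rows = [(3, 1, 2, "R"), (1, 1, 2, "A"), (2, 1, 3, "A"), (2, 4, None, "U")]
links, last_change = replay(rows)
print(links)        # {1: {3}}
print(last_change)  # {1: 3, 4: 2}
```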
Sorry, I cannot publish the paper on my blog yet, since the camera-ready version still has to be prepared and submitted to IEEE. Follow me on Twitter or subscribe to my newsletter, and I will let you know as soon as the entire paper is available as a PDF.

]]>
https://www.rene-pickhardt.de/graphity-source-code/feed/ 7
My first PhD year summarized: What a great choice of mine! https://www.rene-pickhardt.de/my-first-phd-year-summerized-what-a-great-choice-of-mine/ https://www.rene-pickhardt.de/my-first-phd-year-summerized-what-a-great-choice-of-mine/#comments Wed, 28 Dec 2011 13:48:11 +0000 http://www.rene-pickhardt.de/?p=992 2011 is almost over, and more than 9 months of my PhD have already passed. During my math diploma I was funded by the German National Academic Foundation. Besides some really nice benefits that came with this, I had to write a report on my study progress every 6 months. Even though these reports were sometimes quite annoying, I realized that they are a good way to focus and work more efficiently. That is why I decided to continue writing them, this time in English and for a wider audience.
So here is the outline of this longer article:

Things I have done in 2011

  • I started my PhD in Koblenz on a scholarship and felt I almost had too much freedom, with no one to report to. I have to admit that in the beginning it was hard to focus with that much freedom.
  • I attended the TET workshop, where I learned some techniques and methods of design thinking. What a great subject! I also met some students from WHU, which was nice too!
  • I was allowed to attend the WebScie summer school at DERI, NUI Galway, Ireland. That was really fantastic.
  • I attended the Future Music Camp, where I learned a lot about the business and why I don't think I fit there. In particular, I ran a session on band page SEO.
  • Since my university organized it, I also attended the European Summer School in Information Retrieval. It was nice because I gave my first poster presentation, which reminded me of my school days when I took part in Jugend Forscht.
  • I gave up my scholarship and moved to a three-year contract as a research assistant. That was nice from a financial perspective, but in particular I wanted teaching responsibilities and the security of being funded for longer than 2 years.
  • I had the idea for my first paper, Graphity, conducted an evaluation, and wrote the paper in a team of 5 people.
  • I attended the SocialSensor kick-off meeting in Thessaloniki.
  • For the second time I taught a class at "Deutsche Schüler Akademie", this time with students from 5 different countries and only 20% native German speakers.
  • I am supervising a Jugend Forscht project by 2 highly gifted high school students, which also uses neo4j and develops software to improve typing.
  • I learned more about the impact of social networks by creating the in legend Facebook streaming app.
  • I am advising a bachelor thesis on graph databases and linked open data.
  • I became a Most Valuable Blogger for DZone.
  • I had my first blog article (or anything on the internet, really) that went somewhat viral.
  • I read Tim Berners-Lee's book Weaving the Web.

Things that I have learned

  • It took some time, but I got to know computer scientists and their culture and way of thinking.
  • By now I am finally less afraid of programming.
  • I am also less afraid of using, configuring and fixing Linux.
  • Even though there is room for improvement, I have a much more structured approach to getting things done (especially writing papers).
  • I realized the power of blogging. It is amazing how much feedback you receive if you share your thoughts. You also discover better resources and get to know people! It is really amazing how much reach a blog can create and how much it can grow. I am really excited to see where this will go, and I encourage anyone to start blogging!
  • I have gained more background on internet technology (protocols / technologies / general understanding).
  • Reading the law might help more than talking to a lawyer.
  • There is a lot of diplomacy in teamwork, and it is really good if someone (not necessarily oneself) is able to apply it.
  • Smart and creative ideas are much appreciated at university.
  • Amazingly, motivating people is still one of my greatest assets.
  • I now roughly understand how EU-funded research projects are applied for, how they work, and where a lot of our institute's money comes from.
  • I am becoming more familiar with how the system within computer science works.
  • I experienced how easy (at least in theory) it is to create a paper.
  • Lenovo ThinkPads are just amazing, and having a suitable business notebook really makes you take it everywhere and work on your stuff. 8 hours of battery life is just perfect! (No, I am not sponsored by Lenovo, but honestly it is the first notebook I literally take everywhere, and it just works.)
  • Diversity is the key to everything. Diversity in teams and among human beings in general will almost always lead to the most amazing things.
  • Unfortunately, I am mentally not as flexible and quick to learn as I was when I was younger (thinking in familiar patterns seems to be very comfortable).
  • Mathematicians really have an amazing ability to understand complex abstract concepts in any context. They are able to generalize almost everything.
  • It is incredibly easy to become an authority or gain social proof by making statements on a topic. In particular, it is interesting how much more credible things become once someone else vouches for you (e.g. by citing you or inviting you to give a talk).
  • I increased my marketing knowledge and my experience in how to create a brand.
  • If you want to be a successful entrepreneur or company, focus on great products and outstanding service! This is how you beat the market. Marketing is not about selling and promoting stuff. It is about having the best product or service and communicating this in a smart way….
  • I understood many different levels of the general information retrieval problem: which levels of search exist and the concept of an information need. In particular, I understood the many non-technical challenges of this problem and the problems of language and semantics.
  • I finally realized why social networks should not be monetized via advertising (click-through rates on banner ads are very low). I also understood why search is such a cash cow (at least revenue-wise): the user states an information need (which also applies to advertising), and this naturally leads to high click-through rates.
  • I understood how big the Facebook bubble being created is. I almost hope they will go public soon; I will definitely bet some money and buy puts.
  • Speaking of this, I was introduced to the concept of an ego network and its implications for a social network and for running one. It is actually embarrassing that I never realized these concepts myself while running metalcon. Furthermore, it is amazing to me how much impact a person's ego network has on their everyday life and mindset.
  • I learned why Linux users, after a while, only give Windows users a sad smile. It is unbelievable what a pain in the ass Windows is.

Weaknesses I still have

  • Even though I have accomplished many things, I still have the feeling that I procrastinate a lot (I guess due to bad time management).
  • I still tend toward overcommitment, which means too many parallel projects.
  • Along with this comes my difficulty focusing (especially on scientific output).
  • For some reason I am still not too keen on reading. I am not reading enough papers / blogs / magazines / mailing lists / news …
  • My written communication skills could improve a lot, especially spelling and structure.
  • Along with this, my communication in general has a lot of room for improvement.
  • I am not doing enough physical exercise, I have gained weight, and I am tired a lot.
  • I am not learning enough Chinese, not to mention that I am forgetting it.
  • I have to make things happen and state the obvious. In particular, I have to recognize the moments in which I think outside the box and have creative ideas. In my research I constantly talked about a social circle to refer to the ego network of a user. Some months later Google+ came up with the circle concept. I had been talking about this for ages without realizing that I should separate it from everything else in order to create something big!

Goals for 2012

  • The main goal is to write a solid PhD proposal. I am still torn between two topics: 1.) organizing social news feeds from your circle of friends, and 2.) distributing graph databases. The first problem is more application-oriented; the second one seems more technical. There are many reasons in favor of both topics. I guess I will move towards the second problem. In any case I will have to write the proposal and submit it to a suitable conference. I will also have to write a German version of it in order to apply for the PhD scholarship of the German National Academic Foundation.
  • I want to go back to China, in the best case for a 3-month research trip with Jiaotong Daxue in Shanghai.
  • I am still very interested in doing an internship. Since my professor said I could either do an internship or go abroad, I will have to choose. If I go for the internship, I will look into Google (Research), Yahoo Research, LinkedIn, Facebook, Last.fm, maybe even Simfy, or find another interesting company.
  • I certainly need to write more papers, and I already have some very concrete and specific ideas (including solutions), so if anyone is interested in real joint work, contact me any time! The ideas are in the areas of information retrieval, differential geometry, logging in graph databases, SPARQL queries as graph traversals, and sentence prediction using n-grams and neo4j.
  • There has to be progress on metalcon and on in legend.
  • I want to run a seminar on one or two topics, which could be search engines or graph databases and their applications.
  • I want to spend more time contributing to Wikipedia. In particular, I want to include this in my teaching at university.
  • I have already started to improve my time management skills and want to improve them further. I want to use to-do lists more efficiently and make more use of tools like a calendar system. Also, the balance between my free time and my work time has to improve.
  • I want to improve my Chinese language skills.
  • I want to teach my third course at Deutsche Schüler Akademie.
  • I want to do even more teamwork projects.
  • I want to create a reading class at university in order to do research more efficiently.
  • As some private goals, I want to make more music, model, do the tongue-twister video, and exercise on a regular basis.

Some final thoughts

I really received a lot of help from my advisor and my university. Going back to university was so far the best idea of my life. I know exactly why I returned. I am enjoying my time at university from both perspectives: the freedom as well as the topic.
So what have you guys been doing in 2011 and what are your goals for 2012? In any case I wish you a happy new year!

]]>
https://www.rene-pickhardt.de/my-first-phd-year-summerized-what-a-great-choice-of-mine/feed/ 4
11 lessons learnt after my first scientific paper was submitted https://www.rene-pickhardt.de/11-lessons-learnt-after-my-first-scientific-paper-was-submitted/ https://www.rene-pickhardt.de/11-lessons-learnt-after-my-first-scientific-paper-was-submitted/#comments Wed, 02 Nov 2011 00:58:28 +0000 http://www.rene-pickhardt.de/?p=855 During the last month my blog was rather quiet. I had decided to aim for submitting my first paper to a top conference with a deadline of November 1. Well, besides the fact that I almost forgot that I also have a private life (as did my colleagues helping me with the paper), there were several lessons learnt:

  1. If your advisor tells you that the deadline is too short, he is probably right! We beat the deadline, but the cost of doing so was really high.
  2. Physicists rock like hell. While evaluating my algorithms, I did many experiments together with Jonas Kunze, my partner at metalcon. I was totally amazed by the way he approached measuring things. I remembered my time as an undergrad, standing in the lab measuring things for my physics classes. Even though I knew I was being taught a useful skill, I hated it and decided to go for pure mathematics… Well, now I have learnt the hard way what I didn't learn as an undergraduate.
  3. Things become clearer when you really dig into them. It is amazing how all the practical runtimes of my graph database index for social news feeds (let's call it GRAPHITY) matched the theoretical runtimes. But while evaluating, you see how badly some experiments were designed in the planning phase, and you readjust. Even when things worked out right away, I several times gained a deeper understanding just by seeing and feeling it. What I want to say is: things are more complicated than you might think after 2 minutes, half a day or half a month of thinking.
  4. The whole learning experience was really nice, and it was more about techniques for scientific work than about the graph database index.
  5. If your advisor tells you to change notation, he is most likely right: even though he is not as deep into the topic as you are, he has more experience, and changing notation is a good idea! Even though I was totally convinced that my notation was great (at least I learnt how to model things while studying mathematics), it made things more complicated. After I finally listened to my advisor, things worked out like magic (at least concerning notation).
  6. People at university have a very different approach from people in consultancies. But as the deadline comes closer, both work until late at night!
  7. Freedom is perfect! While thinking about the problem and its solution, I did not have many conferences and current research topics in mind. I thought about practical problems for improving metalcon. When I came up with my ideas, the first criticism was that my motivation was not scientific. Well, screw that! As soon as you really work on describing the problem and doing the evaluation, you are doing science!
  8. You can always generalize. I was pretty sure my skills at doing so were quite good. Well, now I know there is room for improvement.
  9. Structure, structure, structure. You cannot have enough structure!
  10. Making a traffic-light status overview document in Google Docs or some other collaborative system is time and effort well invested! My friend Heinrich Harmann showed me this during "Schüler Akademie"; he learnt it at McKinsey & Company.
  11. neo4j is a really cool and exciting technology, and the guys in Sweden are really helpful and cool.

I guess I could bore you all as hell for the next couple of pages. I actually should, because I know myself: I will never come back and write down what is on my mind right now. That is why I am publishing this right now at 2 o'clock in the morning!
 

]]>
https://www.rene-pickhardt.de/11-lessons-learnt-after-my-first-scientific-paper-was-submitted/feed/ 4