UPDATE: the source code of an entire Graphity server application is now online!

Eight months ago I posted the results of my research on fast retrieval of social news feeds, in particular my graph index Graphity. The index is able to serve more than 12 thousand personalized social news streams per second in social networks with several million active users. I was able to show that the system's performance is independent of node degree and network size. Therefore it scales to graphs of arbitrary size.

Today I am pleased to announce that our joint work was accepted as a full research paper at the IEEE SocialCom conference 2012. The conference will take place in early September 2012 in Amsterdam. As promised before, I am now opening the source code of Graphity to the community. Its documentation could certainly be improved, and I am sure that an even better data structure could be used for our implementation of the priority queue.

Still, Graphity received quite a lot of attention from the developer community, so the source code may be of help to someone. It contains the entire evaluation framework we used to compare Graphity against the other baselines, which will also help anyone who wants to reproduce our evaluation.

There are also some nice things to learn from it about setting up multithreading for time measurements and about building a good logging mechanism.

The code can be found at https://github.com/renepickhardt/graphity-evaluation and the main algorithm is in the file:
https://github.com/renepickhardt/graphity-evaluation/blob/master/src/de/metalcon/neo/evaluation/GraphityBuilder.java
Other files of high interest are:

I have not touched the code over the last couple of months, and it still contains a lot of debugging comments. My apologies for this bad practice. I hope you can overlook it, keeping in mind that I am a mathematician and this was one of my first bigger evaluation projects. For my own sake, I promise that next time I will produce code that is easier to read, understand, and reuse.

Still, if you have any questions, suggestions, or comments, feel free to contact me.

The raw data can be downloaded at:

The format of these files is straightforward:

de-nodeIs.txt contains one line per article: first a node ID, then a tab, then the title of the Wikipedia article. You only need this file if you want to display your data with article titles rather than IDs.
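As an illustration, here is a minimal Python sketch for loading the title file into a lookup table. It assumes exactly the layout described above (ID, one tab, title) and keeps IDs as strings, because the post does not say what type they are. `load_titles` is a hypothetical helper and is not part of the graphity-evaluation code.

```python
from typing import Dict, Iterable


def load_titles(lines: Iterable[str]) -> Dict[str, str]:
    """Map node ID -> Wikipedia article title.

    Assumes each line looks like "<ID>\t<title>". Only the first tab is
    treated as the separator, so any further tabs stay in the title.
    """
    titles: Dict[str, str] = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        node_id, _, title = line.partition("\t")
        titles[node_id] = title
    return titles


if __name__ == "__main__":
    sample = ["1\tBerlin\n", "2\tHamburg\n"]
    print(load_titles(sample))
```

For the real file you would pass an open file object, e.g. `with open("de-nodeIs.txt", encoding="utf-8") as f: titles = load_titles(f)`. The UTF-8 encoding is an assumption about how the dump was written.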

The interesting file is de-events.log. It has four columns:
timestamp TAB FromNodeID TAB [ToNodeID] TAB U/R/A

So every line records exactly when the article FromNodeID changed. If only three columns are present and the last one is U, the article was simply updated. If links in the article changed, the third column contains another node ID (ToNodeID) and the last column is A or R, for a link that was added or removed respectively.
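To make the two line shapes concrete, here is a minimal Python parser for a single line of de-events.log. It is a sketch under a few assumptions. Timestamps are taken to be integers, and node IDs are kept as strings. A U line may appear either with three fields or with an empty ToNodeID field, and both are accepted. The `Event` class and `parse_event` function are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    timestamp: int          # assumed to be an integer (e.g. Unix epoch seconds)
    from_node: str          # FromNodeID: the article that changed
    to_node: Optional[str]  # ToNodeID, or None for a plain update
    kind: str               # "U" = updated, "A" = link added, "R" = link removed


def parse_event(line: str) -> Event:
    """Parse one tab-separated line of de-events.log."""
    fields = line.rstrip("\r\n").split("\t")
    to_node: Optional[str] = None
    if len(fields) == 3:
        # timestamp, FromNodeID, U
        ts, src, kind = fields
    elif len(fields) == 4:
        # timestamp, FromNodeID, ToNodeID, A/R (an empty ToNodeID is tolerated)
        ts, src, dst, kind = fields
        to_node = dst or None
    else:
        raise ValueError(f"unexpected number of columns: {line!r}")
    if kind not in ("U", "A", "R"):
        raise ValueError(f"unknown event type {kind!r} in {line!r}")
    if kind in ("A", "R") and to_node is None:
        raise ValueError(f"link event without ToNodeID: {line!r}")
    return Event(int(ts), src, to_node, kind)


if __name__ == "__main__":
    print(parse_event("1000\t42\tU"))
    print(parse_event("1001\t42\t7\tA"))
```

Raising on malformed lines, rather than skipping them silently, makes it obvious if the assumed column layout does not match the actual dump.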

Processing these files should be straightforward. With the events file you can simulate the entire growth of Wikipedia over time. The file is sorted by the second column (FromNodeID). To use it in our evaluation framework, you should sort it by the first column (timestamp) instead. On a Unix shell this takes less than 10 minutes with the sort command.
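The growth simulation can be sketched in Python as a replay over time-ordered events. The function below takes already-parsed events as plain tuples `(timestamp, from_node, to_node_or_None, kind)`. It rebuilds the link graph (article mapped to the set of articles it links to) and records when each article last changed. It also refuses input whose timestamps go backwards, which is why the file has to be sorted by the first column first. `replay` and the tuple layout are hypothetical illustrations. This is not how the Java evaluation framework consumes the data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical event layout: (timestamp, from_node, to_node_or_None, kind)
EventTuple = Tuple[int, str, Optional[str], str]


def replay(events: Iterable[EventTuple]) -> Tuple[Dict[str, Set[str]], Dict[str, int]]:
    """Replay events that are already sorted by timestamp.

    Returns (links, last_change): links maps each article to the set of
    articles it currently links to; last_change maps each article to the
    timestamp of its most recent event.
    """
    links: Dict[str, Set[str]] = defaultdict(set)
    last_change: Dict[str, int] = {}
    previous_ts: Optional[int] = None
    for ts, src, dst, kind in events:
        if previous_ts is not None and ts < previous_ts:
            raise ValueError("events are not sorted by timestamp (first column)")
        previous_ts = ts
        last_change[src] = ts
        if kind == "A" and dst is not None:
            links[src].add(dst)
        elif kind == "R" and dst is not None:
            links[src].discard(dst)
        # kind == "U": the article changed but its links did not
    return dict(links), last_change


if __name__ == "__main__":
    sample = [(1, "a", "b", "A"), (2, "a", "c", "A"), (3, "a", "b", "R"), (4, "b", None, "U")]
    print(replay(sample))
```

Because `replay` consumes an iterable, it can stream a file that has been sorted on disk beforehand. The events then never have to fit into memory at once. With bash and integer timestamps, one way to sort the file is `sort -t $'\t' -k1,1n de-events.log > de-events.sorted.log`.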

Sorry that I cannot publish the paper on my blog yet, since the camera-ready version still has to be prepared and submitted to IEEE. But follow me on Twitter or subscribe to my newsletter and I will let you know as soon as the full paper is available as a PDF.


