Almost two months ago I gave a talk about Typology in our Oberseminar. Update: Download slides. Most readers of my blog will already know the project, which was initially implemented by my students Till and Paul. I am just about to share some slides with you. On the one hand they explain how the system works, and on the other hand they give an overview of the related work.
As you can see from the slides, we are planning to submit our results to the SIGIR conference. So one year after my first blog post on graphity, which developed into a full paper for SocialCom 2012 (graphity blog post and blog post for source code), there is the as yet informal Typology blog post with the slides from the Typology Oberseminar talk, and 3 months left for our SIGIR submission. I expect that this time the submission will not be such a hassle as with graphity, since I should have learnt some lessons and I also have a good student who is helping me with the implementation of all the tests.

Additionally, I have finally uploaded some source code to GitHub that makes the Typology retrieval algorithm pretty fast. There are still some issues with this code, since it lowers the quality of predictions a little bit. Also, the index has to be built first. Last but not least, the original SuggestTree code did not save the weights of the items to be suggested. I need those weights in the aggregation phase. Since I did not want to extend the original code, I placed the weights at the end of the suggested items. This is a little inefficient.
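
To make the workaround with the weights a bit more concrete, here is a minimal sketch in Python (the actual code is not shown in this post, so this is not it). The separator character, the function names and the plain summing of weights in the aggregation step are all assumptions for illustration only.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical separator; assumed never to occur inside a suggested word.
SEPARATOR = "#"


def encode_item(word: str, weight: int) -> str:
    """Append the weight to the end of the suggested item, e.g. 'home#7'."""
    return f"{word}{SEPARATOR}{weight}"


def decode_item(item: str) -> Tuple[str, int]:
    """Split the item at the last separator to recover word and weight."""
    word, _, weight = item.rpartition(SEPARATOR)
    return word, int(weight)


def aggregate(result_lists: Iterable[List[str]], k: int) -> List[Tuple[str, int]]:
    """Sum the decoded weights per word over all result lists and keep the top k.

    Plain summing is an assumption; the real aggregation may weight
    the individual lists differently.
    """
    totals: Dict[str, int] = {}
    for items in result_lists:
        for item in items:
            word, weight = decode_item(item)
            totals[word] = totals.get(word, 0) + weight
    # Highest total first; ties are broken alphabetically.
    return sorted(totals.items(), key=lambda pair: (-pair[1], pair[0]))[:k]
```

The parsing in `decode_item` has to happen for every suggested item at query time, which is the kind of overhead that makes this approach a little inefficient compared to storing the weight as a number next to the item.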

The main idea behind the speed-up is this: Typology needs to sort all out-edges of a node, which is rather slow, especially if one only needs the top k elements. Since neo4j as a graph database does not provide indices for this kind of data, I was forced to look for another way to presort the data. Additionally, if a prefix is known, one does not have to look at all outgoing edges. I found the SuggestTree class by Nicolai Diethelm, which solved the problem in a very good way and led to this great speed. The index is not persistent yet and it also needs quite some memory. On the other hand, a suggest tree is built for every node. This means that the index can easily be distributed over several machines, allowing for horizontal scaling!
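
The idea of presorting the out-edges per node can be sketched roughly as follows. This is a deliberately simplified stand-in in Python, not Nicolai Diethelm's SuggestTree: it stores the k heaviest completions for every prefix in a plain dictionary. The class name, the method names and the example data are made up for illustration.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple


class PrefixTopKIndex:
    """Index for the out-edges of ONE node: prefix -> k heaviest completions."""

    def __init__(self, weighted_words: Dict[str, int], k: int) -> None:
        self.k = k
        buckets = defaultdict(list)
        for word, weight in weighted_words.items():
            # Register the word under every one of its prefixes,
            # including the empty prefix.
            for end in range(len(word) + 1):
                buckets[word[:end]].append((weight, word))
        # Presort once at build time: keep only the k heaviest per prefix.
        self.top = {
            prefix: heapq.nlargest(k, entries)
            for prefix, entries in buckets.items()
        }

    def suggest(self, prefix: str) -> List[Tuple[str, int]]:
        """Return up to k (word, weight) pairs, heaviest first, without sorting."""
        return [(word, weight) for weight, word in self.top.get(prefix, [])]


# One independent index per source node (hypothetical example data).
indices = {
    "the": PrefixTopKIndex({"house": 5, "home": 7, "horse": 1, "cat": 9}, k=2),
}
```

A query such as `indices["the"].suggest("ho")` is then a single dictionary lookup instead of a sort over all out-edges. Storing a list for every prefix also shows why such an index needs quite some memory, and because each node has its own independent index, the entries of `indices` could be spread over several machines.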

Anyway, the old algorithm was only able to handle about 20 requests per second, and now we have something like 14k requests per second, and as I mentioned there is still a little room for more (:

I hope indices like this will be standard in neo4j soon. This would open up the range of applications that could make good use of neo4j.

As always, I am happy about any suggestions, and I am looking forward to doing the complete evaluation and paper writing for Typology.

1 Comment on Typology Oberseminar talk and Speed up of retrieval by a factor of 1000

  1. [...] Typology Oberseminar talk and Speed up of retrieval by a factor of 1000 by René Pickhardt. [...]