Typology Oberseminar talk and Speed up of retrieval by a factor of 1000
https://www.rene-pickhardt.de/typology-oberseminar-talk-and-speed-up-of-retrieval-by-a-factor-of-1000/
Thu, 16 Aug 2012

Almost 2 months ago I talked in our Oberseminar about Typology. Update: Download slides. Most readers of my blog will already know the project, which was initially implemented by my students Till and Paul. I am just about to share some slides with you. They explain on the one hand how the system works and on the other hand give some overview of the related work.
As you can see from the slides we are planning to submit our results to the SIGIR conference. So one year after my first blog post on Graphity, which developed into a full paper for SocialCom 2012 (graphity blog post and blog post for source code), there is this still informal Typology blog post with the slides about the Typology Oberseminar talk and 3 months left until our SIGIR submission. I expect this time the submission will not be such a hassle as Graphity, since I should have learnt some lessons and also have a good student who is helping me with the implementation of all the tests.
Additionally I have finally uploaded some source code to GitHub that makes the Typology retrieval algorithm pretty fast. There are still some issues with this code since it lowers the quality of the predictions a little bit. Also the index has to be built first. Last but not least the original SuggestTree code did not save the weights of the items to be suggested. I need those weights in the aggregation phase. Since I did not want to extend the original code I placed the weights at the end of the suggested items. This is a little inefficient.
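To make this concrete, here is a minimal sketch of how such an encoding could look. The delimiter character and the helper names are my own and not taken from the code on GitHub; the idea is simply to store term and weight in one string and split them again during the aggregation phase.

```java
// Hypothetical sketch of the weight-encoding workaround; the delimiter and
// method names are my own, not taken from the actual code on GitHub.
public final class WeightedTerm {

    // Assumed to never occur inside a term.
    private static final char DELIMITER = '\u0001';

    // Append the weight to the term before handing it to the suggest tree.
    public static String encode(String term, int weight) {
        return term + DELIMITER + weight;
    }

    // Split the suggested item back into its term ...
    public static String term(String encoded) {
        return encoded.substring(0, encoded.lastIndexOf(DELIMITER));
    }

    // ... and its weight for the aggregation phase.
    public static int weight(String encoded) {
        return Integer.parseInt(encoded.substring(encoded.lastIndexOf(DELIMITER) + 1));
    }
}
```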
The main reason why retrieval speeds up with the new algorithm is that Typology previously had to sort over all outgoing edges of a node. This is rather slow, especially if one only needs the top k elements. Since neo4j as a graph database does not provide indices for this kind of data, I was forced to look for another way to presort the data. Additionally, if a prefix is known one does not have to look at all outgoing edges. I found the SuggestTree class by Nicolai Diethelm, which solved the problem in a very good way and led to such a great speedup. The index is not persistent yet and it also needs quite some memory. On the other hand, a suggest tree is built for every node. This means that the index can be distributed over several machines in a very easy manner, allowing for horizontal scaling!
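The following is a minimal sketch of the presorting idea (my own illustration, not the actual SuggestTree implementation): for every node we precompute, for each short prefix, the top-k suggestions already sorted by weight, so a query becomes a single lookup instead of a sort over all outgoing edges. It also shows why such an index eats quite some memory and why it is trivial to distribute node by node.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of a per-node prefix index (not the SuggestTree code): for every node
// we precompute, for each prefix up to MAX_PREFIX characters, the top-k
// suggestions sorted by weight, so no sorting happens at query time.
public class PerNodePrefixIndex {

    private static final int MAX_PREFIX = 5;
    private final int k;
    // nodeId -> prefix -> top-k suggestions (highest weight first)
    private final Map<Long, Map<String, List<String>>> index =
            new HashMap<Long, Map<String, List<String>>>();

    public PerNodePrefixIndex(int k) {
        this.k = k;
    }

    // Build the index entries for one node from its weighted suggestions.
    public void addNode(long nodeId, Map<String, Integer> weightedSuggestions) {
        List<Map.Entry<String, Integer>> sorted =
                new ArrayList<Map.Entry<String, Integer>>(weightedSuggestions.entrySet());
        // Presort by weight once, at index build time.
        Collections.sort(sorted, (a, b) -> Integer.compare(b.getValue(), a.getValue()));
        Map<String, List<String>> prefixMap = new HashMap<String, List<String>>();
        for (Map.Entry<String, Integer> entry : sorted) {
            String term = entry.getKey();
            for (int len = 1; len <= Math.min(MAX_PREFIX, term.length()); len++) {
                String prefix = term.substring(0, len);
                List<String> top = prefixMap.get(prefix);
                if (top == null) {
                    top = new ArrayList<String>();
                    prefixMap.put(prefix, top);
                }
                if (top.size() < k) {
                    top.add(term); // keep only the k heaviest terms per prefix
                }
            }
        }
        index.put(nodeId, prefixMap);
    }

    // Top-k retrieval for a known prefix is now one map lookup plus a filter.
    public List<String> suggest(long nodeId, String prefix) {
        Map<String, List<String>> prefixMap = index.get(nodeId);
        if (prefixMap == null) {
            return Collections.emptyList();
        }
        String key = prefix.length() <= MAX_PREFIX ? prefix : prefix.substring(0, MAX_PREFIX);
        List<String> result = new ArrayList<String>();
        for (String term : prefixMap.getOrDefault(key, Collections.<String>emptyList())) {
            if (term.startsWith(prefix)) {
                result.add(term);
            }
        }
        return result;
    }
}
```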
Anyway, the old algorithm was only able to handle about 20 requests per second and now we handle something like 14k requests per second, and as I mentioned there is still a little room for more (:
I hope indices like this will be standard in neo4j soon. This would open up the range of applications that could make good use of neo4j.
As always I am happy about any suggestions, and I am looking forward to doing the complete evaluation and paper writing for Typology.

Building an Autocompletion on GWT with RPC, ContextListener and a Suggest Tree: Part 0
https://www.rene-pickhardt.de/building-an-autocompletion-on-gwt-with-rpc-contextlistener-and-a-suggest-tree-part-0/
Wed, 13 Jun 2012

Over the last weeks there was quite some quality programming time for me. First of all I built some indices on the Typology database, by which I was able to increase the retrieval speed of Typology by a factor of over 1000, which is something that rarely happens in computer science. I will blog about this soon. But having those techniques at hand I also used them to build a better autocompletion for the search function of my online social network metalcon.de.
The search functionality is not deployed to the real site yet. But on the demo page you can find a demo showing how the completion helps you while typing. Right now the network requests are faster than Google search (which, I admit, is quite easy if you only have to handle one request per second and also have a much smaller concept space). Still I was amazed by the ease and beauty of the program and by the fact that the suggestions for autocompletion are actually more accurate than our current database search. So feel free to have a look at the demo:
http://134.93.129.135:8080/wiki.html
Right now it consists of about 150 thousand concepts which come from 4 different data sources (metal bands, metal records, tracks and German venues for heavy metal). I am pretty sure that increasing the size of the concept space by 2 orders of magnitude should not be a problem. And if everything works out fine I will be able to test this hypothesis on my joint project related work, which will have a database with at least 1 million concepts that need to be autocompleted.
Everything I used except for the ContextListener and my small but effective caching strategy can be found at http://developer-resource.blogspot.de/2008/07/google-web-toolkit-suggest-box-rpc.html and the data structure (suggest tree) is open source and can be found at http://sourceforge.net/projects/suggesttree/. Nevertheless, I am planning to produce a series of screencasts and to release the source code of my implementation together with some test data over the next weeks, in order to spread the knowledge of how to build strong autocompletion engines. The planned structure of these articles will be:

part 1: introduction to which parts exist and where to find them

  • Set up a GWT project
  • Erase all files that are not required
  • Create a basic design

part 2: AutoComplete via RPC

  • Necessary client-side stuff
  • Integration of SuggestBox and SuggestOracle
  • Setting up the remote procedure call (a minimal sketch follows below)
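As a teaser for this part, here is a minimal client-side sketch (my own, not the code from the upcoming screencasts): a custom SuggestOracle that fetches completions over GWT-RPC. AutoCompleteService and AutoCompleteServiceAsync are hypothetical interface names; the async interface is assumed to have a single method getSuggestions(String, AsyncCallback<List<String>>).

```java
import java.util.ArrayList;
import java.util.List;

import com.google.gwt.core.client.GWT;
import com.google.gwt.user.client.rpc.AsyncCallback;
import com.google.gwt.user.client.ui.SuggestOracle;

// A custom SuggestOracle that asks the server for completions via GWT-RPC.
// AutoCompleteService / AutoCompleteServiceAsync are hypothetical interfaces.
public class RpcSuggestOracle extends SuggestOracle {

    private final AutoCompleteServiceAsync service =
            (AutoCompleteServiceAsync) GWT.create(AutoCompleteService.class);

    @Override
    public void requestSuggestions(final Request request, final Callback callback) {
        service.getSuggestions(request.getQuery(), new AsyncCallback<List<String>>() {

            public void onSuccess(List<String> terms) {
                // Wrap the plain strings into SuggestOracle suggestions.
                List<Suggestion> suggestions = new ArrayList<Suggestion>();
                for (final String term : terms) {
                    suggestions.add(new Suggestion() {
                        public String getDisplayString()     { return term; }
                        public String getReplacementString() { return term; }
                    });
                }
                callback.onSuggestionsReady(request, new Response(suggestions));
            }

            public void onFailure(Throwable caught) {
                // On errors simply return an empty suggestion list.
                callback.onSuggestionsReady(request, new Response());
            }
        });
    }
}

// Wiring it into the UI is a one-liner: new SuggestBox(new RpcSuggestOracle());
```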

part 3: A basic AutoComplete Server

  • show how to fill it with data and where to include it in the autocompletion
  • disclaimer: not a good solution yet
  • always returns the same suggestions (a sketch follows below)
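A sketch of this deliberately naive first server could look as follows (again my own illustration, not the series code). It defines the hypothetical RPC interface used in the sketch above and an implementation that ignores the query and always returns the same hard-coded suggestions.

```java
import java.util.Arrays;
import java.util.List;

import com.google.gwt.user.client.rpc.RemoteService;
import com.google.gwt.user.client.rpc.RemoteServiceRelativePath;
import com.google.gwt.user.server.rpc.RemoteServiceServlet;

// The hypothetical RPC interface shared by client and server.
@RemoteServiceRelativePath("autocomplete")
interface AutoCompleteService extends RemoteService {
    List<String> getSuggestions(String query);
}

// Deliberately naive first server: it ignores the query and always returns
// the same hard-coded suggestions -- hence the disclaimer above.
public class AutoCompleteServiceImpl extends RemoteServiceServlet
        implements AutoCompleteService {

    public List<String> getSuggestions(String query) {
        return Arrays.asList("Metallica", "Megadeth", "Meshuggah");
    }
}
```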

part 4: AutoComplete pulling suggestions from a database

  • including a database
  • locking the database for every autocomplete HTTP request
  • show how this is a poor design
  • demonstrate the resulting slow response times (a sketch of this naive approach follows below)
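The anti-pattern of this part could be sketched like this (table name, column name and JDBC URL are made up for illustration): every single autocomplete request opens its own database connection and runs a LIKE query, which is exactly what makes the response times poor.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;

import com.google.gwt.user.server.rpc.RemoteServiceServlet;

// Naive database-backed version of the hypothetical autocomplete service.
public class DbAutoCompleteServiceImpl extends RemoteServiceServlet
        implements AutoCompleteService {

    public List<String> getSuggestions(String query) {
        List<String> result = new ArrayList<String>();
        try {
            // Opening a connection per request is exactly the poor design
            // this part is meant to demonstrate.
            Connection con = DriverManager.getConnection(
                    "jdbc:mysql://localhost/metalcon", "user", "password");
            PreparedStatement stmt = con.prepareStatement(
                    "SELECT name FROM concepts WHERE name LIKE ? LIMIT 10");
            stmt.setString(1, query + "%");
            ResultSet rs = stmt.executeQuery();
            while (rs.next()) {
                result.add(rs.getString(1));
            }
            con.close();
        } catch (Exception e) {
            // Swallowing the exception keeps the sketch short.
        }
        return result;
    }
}
```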

part 5: Introducing the ContextListener

  • introducing a context listener (a minimal sketch follows below)
  • demonstrate the remaining lack of speed with every network request
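A minimal sketch of the ContextListener idea, under the assumption that a plain map from prefix to suggestions stands in for the real index: the expensive index is built once at application startup and shared via the servlet context, so the request handlers only do a lookup. The listener still has to be registered in web.xml with a <listener> element.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;

// Build the (expensive) in-memory index once when the web application starts
// and share it via the servlet context instead of creating it per request.
public class IndexContextListener implements ServletContextListener {

    public static final String INDEX_ATTRIBUTE = "suggestionIndex";

    public void contextInitialized(ServletContextEvent event) {
        // Placeholder index: a map from prefix to suggestions.
        // Part 6 replaces this with a suggest tree.
        Map<String, List<String>> index = new HashMap<String, List<String>>();
        // ... fill the index from the database once at startup ...
        event.getServletContext().setAttribute(INDEX_ATTRIBUTE, index);
    }

    public void contextDestroyed(ServletContextEvent event) {
        event.getServletContext().removeAttribute(INDEX_ATTRIBUTE);
    }
}

// Inside the RPC servlet the shared index is then retrieved with:
//   getServletContext().getAttribute(IndexContextListener.INDEX_ATTRIBUTE);
```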

part 6: Introducing a fast Index (Suggest Tree)

  • include the suggest tree (a small stand-in sketch follows below)
  • demonstrate the increased speed
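To illustrate the effect, here is a small stand-in for the suggest tree (my own sketch, not the SuggestTree API from sourceforge): a presorted in-memory structure from which all completions for a prefix can be read off directly, without touching the database per request. The real SuggestTree additionally ranks the suggestions by weight.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// In-memory prefix retrieval as a stand-in for the suggest tree: answer
// completions from a presorted structure instead of querying the database.
public class InMemoryPrefixIndex {

    private final TreeSet<String> terms = new TreeSet<String>();

    public void add(String term) {
        terms.add(term);
    }

    // All terms starting with the prefix lie in the range [prefix, prefix + '\uffff').
    public List<String> suggest(String prefix, int limit) {
        List<String> result = new ArrayList<String>();
        SortedSet<String> range = terms.subSet(prefix, prefix + Character.MAX_VALUE);
        for (String term : range) {
            if (result.size() == limit) break;
            result.add(term);
        }
        return result;
    }
}
```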

part 7: Introducing client side caching and formatting

  • introducing caching (sketched below)
  • demonstrate no network traffic for cached completions
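A sketch of one possible client-side cache (my own variant, not necessarily the strategy used on metalcon.de): a SuggestOracle decorator that remembers the response for every query string, so repeated queries cause no network traffic at all.

```java
import java.util.HashMap;
import java.util.Map;

import com.google.gwt.user.client.ui.SuggestOracle;

// Wraps another SuggestOracle (e.g. the RPC oracle from part 2) and caches
// its responses per query string on the client.
public class CachingSuggestOracle extends SuggestOracle {

    private final SuggestOracle delegate;
    private final Map<String, Response> cache = new HashMap<String, Response>();

    public CachingSuggestOracle(SuggestOracle delegate) {
        this.delegate = delegate;
    }

    @Override
    public void requestSuggestions(final Request request, final Callback callback) {
        final String query = request.getQuery();
        Response cached = cache.get(query);
        if (cached != null) {
            // Served locally, no network request is sent.
            callback.onSuggestionsReady(request, cached);
            return;
        }
        delegate.requestSuggestions(request, new Callback() {
            public void onSuggestionsReady(Request req, Response response) {
                cache.put(query, response);
                callback.onSuggestionsReady(request, response);
            }
        });
    }
}
```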

topics not covered (but for some points I am happy about hints):

  • on user login: create a personalized suggest tree and save it in some context data structure
  • merging of the personalized AND the global index (Google will only display 2 or 3 personalized results)
  • index compression
  • scheduling / caching / precalculation of the index
  • non-prefix retrieval (merging?)
  • CSS of the retrieval box
  • parallel architectures for searching
balanced binary search trees exercise for algorithms and data structures class
https://www.rene-pickhardt.de/balanced-binary-search-trees-exercise-for-algorithms-and-data-structures-class/
Tue, 29 Nov 2011

I created some exercises regarding binary search trees. This time there is no coding involved. My experience from teaching former classes is that many people have a hard time understanding why trees are useful and what the dangers of these trees are. Therefore I have created some straightforward exercises that nevertheless involve some work and will hopefully help the students to better understand and internalize the concepts of binary search trees, which are in my opinion one of the most fundamental and important concepts in a class about algorithms and data structures.

Part A: finding elements in a binary search tree – 1 Point

You are given a binary search tree and you know that the root element has the value 2. Considering that the path for finding an element in the tree is unique, decide which of the following two lists can be an actual search path for retrieving the element 363 from the binary search tree. Why?

  • 2, 252, 401, 398, 330, 344, 397, 363
  • 2, 252, 397, 398, 330, 344, 401, 363

Part B: Create binary search trees – 1 Point

You are given an empty binary search tree and two lists of the same elements.

  • 10, 20, 5, 15, 2, 7, 23
  • 10, 5, 7, 2, 20, 23, 15

For both lists, draw all the trees that are created while inserting the elements one after another.

Part C: skewed binary search trees and traversing trees – 1 Point

Compare the trees from part B to the tree you would get when inserting the numbers in the order 2, 5, 7, 10, 15, 20, 23.
To understand the different tree traversals, please give the result of the inorder and preorder traversals applied to the trees from parts B and C.
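As a reminder of the two traversal orders (the exercise itself is pen and paper, so this is only a minimal illustration of the definitions, not a solution):

```java
// Minimal illustration of the two traversal orders used in part C.
class Node {
    int value;
    Node left, right;
    Node(int value) { this.value = value; }
}

class Traversals {
    // Inorder: left subtree, node, right subtree -- yields the sorted order for a BST.
    static void inorder(Node n) {
        if (n == null) return;
        inorder(n.left);
        System.out.print(n.value + " ");
        inorder(n.right);
    }

    // Preorder: node, left subtree, right subtree -- reinserting this sequence
    // into an empty BST recreates the same tree.
    static void preorder(Node n) {
        if (n == null) return;
        System.out.print(n.value + " ");
        preorder(n.left);
        preorder(n.right);
    }
}
```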

Part D: Balanced binary search trees. Counting permutations – 2 Points

We realize that trees can have different topologies as soon as the order of the inserted items changes. Since balanced trees are the most desirable, your task is to count how many permutations of our 7 elements will lead to a balanced binary search tree!
To do so it would be sufficient to write down all the permutations that lead to a balanced binary search tree, but you do not have to do this explicitly. It is also ok to write down all classes and cases of permutations and count them.
Compare the number to the number of all permutations of 7 elements (= 7!) and give the probability of ending up with a balanced binary search tree when given a random permutation of 7 different elements.

Part E: A closed formula for the probability to create a balanced binary search tree – 2 Extra Points

Your task is to find and prove a formula that states the number of permutations of the natural numbers 1, 2, …, 2^k-1 such that inserting the numbers will create a balanced binary search tree.
Give a closed formula for the probability P(k) to end up with a balanced search tree. Give the explicit results for k = 1, …, 10.
