3 exercises for Sorting problems (Quicksort , Mergesort) in Algorithms and data structures class

November 14, 2011November 14, 2011/ Rene

#1: Sorting huge files

Sorting big files might not be as simple as just implementing an sort algorithm. As soon as the file does not fit in memory any more smarter implementations have to be applied. One way is to sort the file on the hard disk. We remark that not every algorithm is easily adopted for this kind of task. So your task for the exercise is to decide what kind of alogrithms are good to solve the problem and what approach to handle huge files could be taken?

Discuss what kind of operations are efficient while retrieving / processing data from the hard disk
Discuss what kind of operations are needed in the different algorithms
Create a table to display the results and choose the most apropriate algorithm.

One possible way of implementing this would be to split the file in smaller files which can be sorted in memory and then use a bottom up merge function to merge all those files.
In order to do so you can sort this Snapshot of all wikipedia revisions taken from the German wikipedia 2011. The file is uncompressed 3.1 gigabyte in size and consists of 128 million rows. In particular it already contains a partial order.

#2: Finding the k smallest element in an unsorted List

Your task is to find the k smallest element from an unsorted list. (thanks to Robert Sedgwick for inspiration!)
Obviously one approach would be to sort all the data and then retrieve the k-th element. The runtime of this approach would be O(n log(n)) though. We want to achieve this in linear runtime which is possible due to the help of the partition function of quicksort.
After calling the partition function the unordered list is split in two sublists with lenght i and n-i. The first list contains the first i elements (not neccessarily sorted). comparing i to k tells you weather to search in the first or second sublist for the element.

Use this idea to implement findMinK(ArrayList <Integer> array, int k, int l, int r)
Test the runtime of your implementation against the primitive approach of first sorting. In order to test you can just download the testframe work code below. In your function you should increase the global variable cmpcnt every time the partition function swaps elements in the list
Write down the recursive equation of your solution and solve it in order to prove that the average case runtime is also theoretically linear.
Compare this runtime behaviour to quicksort (next exercise) and explain why these approaches are in different complexity classes

import java.util.ArrayList; import java.util.Collections; import java.util.Random; public class mink { static public int cmpcnt = 0; public static void main(String[] args) { testFramework(); } public static int findMinK(ArrayList array, int k, int l, int r) { // Implement here } public static int findMinK(ArrayList array, int k){ Collections.sort(array); return array.get(k); } private static void testFramework() { ArrayList a = new ArrayList(); for (int j=2;j<8;j++){ a.clear(); for (int i=0;i<(int)Math.pow(10, j);i++){ a.add(i); } System.out.println("\n\n"+a.size()+" Elements\n\n"); double slow=0; double fast=0; for (int i = 0; i < 10; i++) { cmpcnt = 0; Collections.shuffle(a); int k = (int)(Math.random()*(Math.pow(10, j)-1))+1; System.out.println("test run number: " + i + " find: " + k); long start = System.currentTimeMillis(); findMinK(a, k, 0, a.size()-1); long end = System.currentTimeMillis(); long smarttime=(end-start); fast = fast + smarttime; System.out.println("SMART ALGO \t --- time in ms: " + smarttime + " comparisons: " + cmpcnt); start = System.currentTimeMillis(); findMinK(a, k); end = System.currentTimeMillis(); long slowtime = (end-start); System.out.println("WITH SORTING \t --- time in ms: " + slowtime); System.out.println("sorting is " +(double)slowtime/(double)smarttime + " times slower"); slow = slow + slowtime; } System.out.println("sorting (="+slow+"ms) is " +slow/fast + " times slower than smart algo (="+fast+"ms)"); } } }

#3: Solving recursive equations: Proving runtime of Quicksort

Quicksort is a probabilistic algorithm and its recursive equation is given by the implicit equation
[latex]T(n) = n + 1/n \sum_{i=1}^n(T(i)+T(n-i))[/latex]

Explain the meaning of the sum in this equation and its connection to stochastics.

Solve the equation. In order to do so. you can use the following equivalences and solve the recursive equation by substitution.

[latex]1/n \sum_{i=1}^n(T(i)+T(n-i)) = 2/n \sum_{i=1}^n(T(i)) = 2/n (T(n) + \sum_{i=1}^{n-1}(T(i)))[/latex]

November 14, 2011November 14, 2011/ Rene