Teaching and Exercises – Data Science, Data Analytics and Machine Learning Consulting in Koblenz Germany

Web Science MOOC – first lessons about Ethernet and Internet Protocol online

Rene — Tue, 22 Oct 2013 21:55:01 +0000

Two months ago I started to create the Web Science MOOC, and now you can join it as a student. We will start streaming flipped-classroom lessons online on October 29th. Our MOOC is truly open, meaning that all the teaching material will be provided as open educational resources under a Creative Commons Attribution-ShareAlike 3.0 licence.
In the first month we will learn about the following topics:

Ethernet
Internet Protocol
Transmission Control Protocol
Domain Name System
URIs
HTTP
HTML
RDF
JavaScript / CSS

The Ethernet lessons can be found at:
https://en.wikiversity.org/wiki/Topic:Web_Science/Part1:_Foundations_of_the_web/Internet_Architecture/Ethernet

The Internet Protocol lessons can be found at:
https://en.wikiversity.org/wiki/Topic:Web_Science/Part1:_Foundations_of_the_web/Internet_Architecture/Internet_Protocol

Since Wikiversity, in comparison to other MOOC platforms, is truly open, you might also want to watch some of my introductory videos. They are particularly helpful in showing how to make the best use of Wikiversity as a MOOC platform and how to really engage in the discussion. You can find the videos at:
https://en.wikiversity.org/wiki/Topic:Web_Science/New_here

But maybe you are already interested in watching some of the content right here, right away:

MOOCs at Wikiversity: A Barcamp proposal for #OERde13

Rene — Sat, 14 Sep 2013 06:49:26 +0000

I would like to have a discussion with people who have experience with, or are interested in, MOOCs and Wikiversity. The goal is to check out the possibilities for creating MOOCs (an otherwise over-commercialized format) in an OER environment (especially Wikiversity).

Background:

According to my previous blog post there are three ways to create a MOOC that is truly OER:

Of these, I would love to discuss what possibilities exist in the context of Wikiversity and how such a MOOC could benefit from the ecosystem of other Wikimedia projects (e.g. Wikibooks, Wikimedia Commons, Wikipedia and of course Wikiversity itself).
I would also love to create a list of requirements for the Wikiversity software, i.e. the functionality needed to create an OER MOOC (e.g. access to students' multiple-choice results). This list could be presented to the Wikimedia Foundation in order to extend the Wikiversity software.

My experiences:

I have created a German online course on Wikiversity about how computers work.
Currently I am creating an English MOOC on Web Science, which we will use at our university in a flipped classroom for the Web Science lecture. The content of the first block is almost ready:

Comparison of open educational resources services to host your MOOC

Rene — Thu, 25 Jul 2013 17:43:24 +0000

This article on open and free platforms to host your MOOC belongs to the series “Comparison of places to host your MOOC”. As already mentioned, only a few platforms really belong to the category of open educational resources. The term is described in the Wikipedia article “Open educational resources” as follows:

Open Educational Resources (OER) are freely accessible, usually openly licensed documents and media that are useful for teaching, learning, educational, assessment and research purposes. Although some people consider the use of an open format to be an essential characteristic of OER, this is not a universally acknowledged requirement. The development and promotion of open educational resources is often motivated by a desire to curb the commodification of knowledge and provide an alternate or enhanced educational paradigm

I go a little further than this definition and strictly require both an open licence and open document formats:

Open Educational Resources (OER) are freely accessible, ~~usually~~ openly licensed documents and media that are useful for teaching, learning, educational, assessment and research purposes. ~~Although some people consider~~ the use of an open format ~~to be~~ is an essential characteristic of OER, ~~this is not a universally acknowledged requirement~~. The development and promotion of open educational resources is often motivated by a desire to curb the commodification of knowledge and provide an alternate or enhanced educational paradigm

Taking this into account, I’ll now compare OER platforms that offer services to host a MOOC. In short, I would suggest hosting your MOOC either on Khan Academy or on Wikiversity.

Khan Academy

Khan Academy is a non-profit educational website created in 2006 by educator Salman Khan, a graduate of MIT and Harvard Business School. The stated mission is to provide “a free world-class education for anyone anywhere”. It is strongly supported by the Bill and Melinda Gates Foundation and won Google's Project 10^100 award, which gave it 2 million dollars. The content is currently being translated into various languages, including German. You can find more information for instructors on the website at https://www.khanacademy.org/about

Overhead: You have to learn the Khan Academy software.
Open: Anyone can create courses on Khan Academy. I am not quite sure about including videos, since Khan Academy seems to require some standard branding.
Licence: CC BY-SA 3.0 (Attribution-ShareAlike)
Hosting time: As long as the project is funded.
Open Format: The website provides an API to obtain data at http://api-explorer.khanacademy.org/, and all (?) of Khan Academy's source code is available at https://github.com/Khan
Feedback: Various feedback mechanisms are provided, as explained on the website.
Quizzes: Yes
Community: As far as I understand, instructors cannot collaborate within the software.
Audience: Yes, more than a quarter billion lessons have been delivered.
Support: There are a lot of online courses for training coaches.
Online Meetings: There are Q&A style discussions related to every content created
Account Management:
Risk: Besides Khan Academy running out of money I don’t see any risks

Recommendation: Khan Academy is a very good platform to choose if you want to host a massive open online course. The material is free and open. The platform and its community are very active, and there is a lot of outside support. Exporting data doesn’t seem to work yet, but there seems to be a will to be open in the future. In any case, Khan Academy is the only open educational resources platform that offers a user experience close to the otherwise commercialized MOOC format.

Wikiversity

Wikiversity is a Wikimedia Foundation project which supports learning communities, their learning materials, and resulting activities. It differs from more structured projects such as Wikipedia in that it instead offers a series of tutorials, or courses, for the fostering of learning, rather than formal content. Like Wikipedia it is offered in several languages. The English version of Wikiversity seems quite active, whereas the German version is currently being restructured.

Overhead: Wiki markup is very easy to learn. Also, there is a network of wiki tutors who can come to your place and teach you how to use MediaWiki.
Open: Anybody can contribute to Wikimedia projects.
Licence: CC BY-SA 3.0
Hosting time: Forever, as long as Wikimedia exists
Open Format: Database dumps are available and the software is open source.
Feedback: So far there is little feedback for instructors, but there are potential ways of changing this.
Quizzes: Yes
Community: Instructors help each other out and share content with each other. Minor mistakes in the material are quickly corrected.
Audience: There is a large audience. If the video content is uploaded to Wikimedia Commons and included in related Wikipedia articles, the MOOC becomes highly visible to the targeted audience.
Support: Especially in Germany there is the mentoring network of MediaWiki users who teach best practices for using the MediaWiki software.
Online Meetings: Holger Brenner also uses MediaWiki on Wikiversity to hold online meetings, but this is rather tricky.
Account Management: There are different user roles in MediaWiki, but they do not really reflect a student/teacher relationship.
Risk: Basically none. The database dumps as well as the software are available for download. Even if the platform closes, one can still easily host the content oneself.

Recommendation: The MediaWiki software is very flexible and offers a lot of opportunities. However, the software itself is not best suited for the “commercialized” massive open online course format. The biggest drawback is the missing analytics that would let instructors see how the course is progressing. On the other hand, if one actively uses Wikiversity (as I did in my last course), one gets a lot of personal feedback. Wikiversity enjoys a lot of trust (inherited from Wikipedia) and has users who explore content and attract many new people. Also, Wikimedia really follows the concept of free content without any limitations. Finally, MediaWiki is open source, and extensions can be added to Wikiversity if the community agrees.

OER Commons

OER Commons is a freely accessible online library located at www.oercommons.org that provides a web-based infrastructure for teachers and others to search and discover Open Educational Resources (OER) and other freely available instructional materials. OER Commons is a project created by ISKME, an independent non-profit organization based in Half Moon Bay, California, founded by Lisa Petrides in 2002. Launched in 2007, OER Commons aggregates Open Educational Resources, which are teaching and learning materials that are openly licensed for anyone to use and reuse, in order to support a global network for engaging with flexible, adaptable curriculum

Overhead: None at all
Open: To anybody. I don’t know about content moderation.
Licence: Creative Commons
Hosting time: The content itself can be hosted on any website.
Open Format: All formats are supported.
Feedback: No
Quizzes: No
Community: Yes
Audience: Not students, but rather teachers collecting teaching material
Support: No
Online Meetings: No
Account Management: No
Risk: No

Recommendation: OER Commons is a very interesting approach, since a lot of the content needed for an open MOOC can be drawn from it. All of the MOOC content can be integrated into OER Commons and spread from this hub to other instructors again. The platform itself doesn’t seem suitable for hosting an entire course. I think anybody who creates a MOOC should submit their material to OER Commons. This is really easy, even if the content is just provided as a web link. I did this with my last course, which was hosted on Wikiversity.

European MOOC platform OpenupEd

The European Union created its own MOOC platform at www.openuped.eu/.

Overhead: None at all
Open: Only selected partners
Licence: Partner's choice
Hosting time: You host the MOOC yourself.
Open Format: Your decision
Feedback: Possible
Quizzes: Possible
Community: There is a network of partners, but it’s hard to say how much collaboration exists.
Audience: Your own students
Support: n/a
Online Meetings: possible
Account Management: possible
Risk: None

Recommendation: This platform seems interesting since there is political will behind it. Right now it seems to only aggregate MOOCs from various partners, so no hosting service is offered. On the other hand, you keep control of the licence of everything and can probably add an existing MOOC to the platform's index. In short: nice to have, but for now it cannot serve as a standalone hosting service. Also, it is not clear whether you can participate, since they only work with selected partners.

P2P University

Peer to Peer University (P2PU) is a nonprofit online open learning community which allows users to organize and participate in courses and study groups to learn about specific topics. Peer 2 Peer University was started in 2009 with funding from the Hewlett Foundation and the Shuttleworth Foundation. The main learning management system for P2PU courses is called Lernanta (the Esperanto word for “learning”). P2PU also hosts a wiki and an OSQA server for questions and answers.

Overhead: low
Open: Anybody
Licence: CC BY-SA
Hosting time: I did not spot video content.
Open Format: As far as I can see, there is no standard format used.
Feedback: Through discussions
Quizzes: No
Community: There are strong partners like Mozilla connected to the project.
Audience: Doesn’t seem too large
Support: Many courses on the platform teach how to use the platform. Since courses are P2P, I assume there is quite a lot of support.
Online Meetings: possible
Account Management: probably not
Risk: This platform doesn’t seem to be mature yet. Will it survive?

Recommendation: I like the approach of this learning platform, but I have the feeling it is much more targeted towards student study groups. It also doesn’t seem very mature, and it is not quite clear in which direction it will develop. Also, I could not find database dumps on the website, which decreases my trust in the platform.

Summary

I hope I did not overlook any platform. My advice is to go for either Khan Academy or Wikiversity and to submit your entire course as well as pieces of the material to OER Commons. In the same spirit, I would also suggest adding parts of your course content to Wikimedia Commons if they can enhance a Wikipedia article. Whether to go for Khan Academy or for Wikiversity is probably a personal choice. Personally I would probably go for Wikiversity, since I have already had good experiences with it and my trust in this platform's long-term sustainability is higher. Also, more languages are supported out of the box. In any case: when you want to create a MOOC, don’t let yourself be blinded by commercialized platforms and offers just because they look nicer. Education is something that belongs to the citizens!

Comparison of platforms and places to use to host your MOOC

Rene — Wed, 24 Jul 2013 16:03:50 +0000

As many of you know (and voted for, thanks for that), Steffen and I applied for a MOOC fellowship in order to create a Web Science MOOC. Even though our application was not successful, we decided that online teaching in the MOOC format is suitable for the Web Science lecture. From our application and last term's teaching, we already have a basic structure for the content the students should learn. Now we are starting to create the material, but the question is: which platform should we use, and where should we host the MOOC? I was actually planning to write a single article on that topic, but it turned out that there are so many different approaches to online learning that I will have to split my work into several articles. So here I will just explain my methodology and the criteria I will use to compare platforms for your MOOC.
There is a lot of good information about the MOOC industry and current trends on the Wikipedia page about MOOCs.
Basically there are three different approaches to online education:

Free content: The focus of these platforms (Khan Academy, Wikiversity, OER Commons, P2P University,…) lies in freeing educational content from the publishing industry. In most cases the focus seems to be on content and not so much on learning paths, didactics or pedagogy. The argument seems to be: “first we need the content, next we can think about how to use it”. Have a look at my blog post http://www.rene-pickhardt.de/comparison-of-open-educational-resources-services-to-host-your-mooc/ to see which open platforms perform well.
Commercial: There is a rising industry (Coursera, Udacity, edX, iversity,…) trying to commercialize massive open online education. Commercial platforms usually have high-quality content and strong relationships with universities (most often Ivy League) offering many classes in this new format. Courses are usually not available under an open licence. So far most content is available at no cost, and the business model is related to certification but sometimes also to tuition fees.
Self-hosted with a learning management system: There are various learning management systems (OLAT, Moodle, Google Course Builder, ILIAS,…) available as open source software that let you host a MOOC yourself. Most of these systems are made for e-learning but lack the excellent usability that gives MOOCs their feel. Often their primary intent is also not to be open.

This means that besides this article I will publish three blog articles comparing platforms for each of the three approaches. There is a German list of learning platforms on Wikipedia as well as the MOOC template in the English Wikipedia, from which I extracted the following lists:

Platforms for online education

People related to online education

Not all of the platforms are relevant for a Web Science MOOC, but I still extracted some of the most relevant sites and added a few others. As for the evaluation methodology, we did a little survey and identified some possibilities. Since there are so many hosting services and possibilities, we tried to find the dimensions that are important to us in order to find out which hosting service makes the most sense. We will use the following dimensions for our evaluation:

Overhead: How much overhead is associated with providing the content for a certain platform infrastructure?
Open: Will the platform accept our course?
Licence: Who holds the copyright and what is the licensing model?
Hosting time: How long does the platform guarantee hosting?
Open Format: Will the course content be in an open format so that we can easily export the data from the host and take it to some other service?
Feedback: Is there feedback for instructors, e.g. how long do people interact with some content?
Quizzes: Will quizzes be supported by the platform?
Community: Is there an active community and exchange among instructors?
Audience: Is there a large audience using the platform?
Support: Is there active support from the platform?
Online Meetings: Does the platform support online meetings of students and teachers?
Account Management: Is it possible to have different roles for the accounts (e.g. student, tutor, creator,…)?
Risk: What are the risks of using this particular platform?

My goal, at least, would be to find a service with the following answers to our dimensions:

Overhead: Little overhead to submit the course material.
Open: The platform should be open to any course.
Licence: We should maintain the copyright or the licence should be at least creative commons
Hosting time: forever
Open Format: Data export of the material is needed, e.g. respecting the IMS Global standards (http://en.wikipedia.org/wiki/IMS_Global).
Feedback: In order to improve, we need feedback.
Quizzes: We need various forms of quizzes.
Community: A community of instructors with which one can exchange ideas and from which one can learn would be amazing.
Audience: In the end good content will win, but the larger the audience the better.
Support: A platform that offers support with problems is preferable
Online Meetings: It would be nice if the platform supports online meetings of users with Q&A systems or even with video chat.
Account Management: Multiple account roles would support the learning process.
Risk: Obviously we want the risks to be minimized

I am looking forward to your feedback on missing platforms or other dimensions for evaluating the learning platforms.

Please help me to realize my Web science massive open online course

Rene — Wed, 01 May 2013 09:59:57 +0000

I am asking you for a big favor in this blog post! You can help me to achieve one of my childhood dreams:
I am an enthusiastic teacher and love to share information (as you might have seen by reading my blog). Over the last month I have designed the structure for an online course on Web Science, together with a short video. In this blog post I will introduce the course to you, but I am also asking you to vote for it, since only 10 of the 250 courses that applied for the fellowship will be sponsored and thus realized.
So please go to https://moocfellowship.org/submissions/web-science and learn more about the course and vote for it. You can find almost all details of the course in this blog post.

Why create such a course?

The web has become important to its 2.3 billion users. Yet only a small group of people understand the processes that take place on it and quickly steer its development into new directions.

Novelty of the subject

Web Science is an upcoming academic field. Much information about the web already exists online, but no course that comprises all of it.

High value for every web user

The MOOC would be of high value and relevance for anybody using the web, e.g.:

A programmer who is building the next web application
A company deciding their web strategy
A judge who has to decide a case regarding net neutrality or copyright infringement
The government as well as public authorities, which have to make decisions on how to regulate the web
…

The web is the right place to learn about the web

The web itself is the best platform to educate people about the web since you can always point directly to the object of study. By creating a MOOC we will be able to aggregate, organize and filter much of the available information.

Integration within our institution

The MOOC will be a core element for the web science lecture of our web science master program. The goal is that students will work with the material provided by the MOOC and the instructors will replace classical lectures with public Q&A sessions. Additionally the Web Science lecture of 2013/2014 will serve as an internal testing of the MOOC such that the improved MOOC can launch on iversity in 2014.

Course content

This MOOC consists of ten lessons divided into three parts.

Lesson 1 – 3: Foundations of the web
Lesson 4 – 7: Theoretical results of web user behavior
Lesson 8 – 10: Web & society

Lesson 1 & 2: History of the Web & Web Architecture

You will understand the historical development of the web and see how the cold war in combination with advances in technical developments led to the Internet Protocol suite.

On each layer you will get to know one protocol and understand how these protocols build an open, interoperable and decentralized system. Furthermore, you will learn about the Domain Name System and find out why the concepts of URI and hypertext were crucial for the success of the web.

Lesson 3: Structure of the Web

You will learn about the six degrees of separation and understand concepts like small-world networks by studying ‘the other’ Milgram experiment. You will be able to use power-law distributions to describe the structure of the web, its content and its users.

Lesson 4 & 5: Micro and Macro behavior of web users & Social Network (Analysis)

You will be introduced to theories from Microsociology and see how applying them to the behavior of people on the web leads to macro structures such as:

By analyzing social network data from the Koblenz Network Collection using Octave, you will gain a deeper understanding of social theories and social networks.

Lesson 6 & 7: Information Retrieval & Recommender systems

After completing this section you will understand the basic architecture of a (web) search engine. You will be able to name the fundamental (non-technical) difficulties one faces when creating a good information retrieval system. You will learn about the connection to recommender systems, which are used (not only!) by large web shops to increase cross-selling.
You will be able to discuss the dangers of such algorithms, like the relevance paradox and the filter bubble.

Lesson 8: Trust and Security

You will learn how third parties act as trust providers on the web and how this issue is related to markets with asymmetric information. You will see that trust issues in the online world differ from offline problems. You will know approaches like cryptography, secure communication and certificates for resolving trust issues, and how those techniques can even lead to a new currency.

Lesson 9: Web Economics

You will learn about e-commerce models like online shopping and auctions as well as online advertising and marketing. You will be able to interpret and apply web analytics metrics such as

Lesson 10: Web Governance and Web Ethics

Finally, you will understand the important role of institutions like W3C, IETF and ICANN. You will use your understanding of the web architecture to discuss and explain the connections between

Net neutrality
Piracy and copyright infringement
Internet censorship and the freedom of speech

So please go to https://moocfellowship.org/submissions/web-science and learn more about the course and vote for it.

Teaching Web Science (web architecture and Web ethics) to students

Rene — Tue, 26 Mar 2013 13:52:51 +0000

In July 2012 we taught a course at the German National Summer School for high school students. The course consisted of 50 hours over 14 days. Due to some specific constraints of the Summer School we had to make a few adjustments to the format of our curriculum and lectures. Still, we gathered some good experience for future teaching. The main lesson learnt was that knowledge of the Internet protocol suite contributes to a better understanding of the decentralized and open aspects of the web. This leads to a better comprehension of the ethical aspects of the web, like net neutrality, copyright, the relevance paradox, censorship and others. We propose that any curriculum about Web Science should include a fair share of lectures on Web Architecture and the Internet protocol stack.

Course context (level, students, discipline, etc.)

The course was designed for 16 highly gifted high school students (11th and 12th grade). The level was supposed to be manageable for a second year undergraduate student. Since our students came from different grades and schools we were forced to sacrifice some course time to teach some basic programming skills. Thus we could not cover all the aspects of Web Science. Instead we focused on three main course objectives:

Course objectives and targeted competencies

By the end of the course our students should…

understand the current web architecture, in particular its decentralized and open aspects.
gain the ability to form and defend a solid opinion on currently ongoing ethical discussions related to the Web.
realize that the study of the Web requires a much broader skill set than knowledge of Computer Science.

Course content (Structure, sections, topics, references)

All students were asked to prepare a talk and to read the book “Weaving the Web” by Sir Tim Berners-Lee before the summer school started. Ten of the talks covered the technical foundations, starting with binary numbers and going all the way up to the application layer and all the necessary protocols. This included the theoretical study of IP, TCP and HTTP as well as routing (BGP) and DNS. To ensure a better understanding, the students had to form groups and, during course time, implement a simple Web Server and Web Client that were able to process HTTP/1.0 GET requests. This was done using the Java programming language and the socket classes from the Java API. These topics were covered in the first week of the course. In the second half we focused on the ethics of the web. After each talk on an ethical topic, which was meant to give an overview in about 20 minutes, we held a two-hour group discussion. For example, for the discussion on net neutrality we knew the following interest groups from the overview talk: large internet providers, big web companies, small web companies, politicians and consumers. Students were randomly assigned to one of these groups. Within 10 minutes they had to prepare a list of arguments that reflected the interests of their particular group as well as arguments they expected from the other groups. While discussing the issue at a round table, they had to find a good solution that respected both the technical nature of the web and the interests of their group.
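The client/server exercise can be sketched in a few lines. The students worked in Java with the socket classes of the Java API; what follows is instead a minimal Python sketch under our own assumptions (the helper names `build_get_request`, `parse_response`, `handle_request` and `http_get` as well as the in-memory `pages` dictionary are hypothetical, not the students' code). It shows both halves of an HTTP/1.0 exchange: a request line plus headers terminated by an empty line, and a response consisting of a status line, headers and body.

```python
import socket


def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.0 GET request.

    HTTP/1.0 does not require a Host header, but sending one is harmless
    and servers using virtual hosting expect it.
    """
    lines = [f"GET {path} HTTP/1.0", f"Host: {host}", "", ""]  # empty line ends the headers
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw: bytes) -> tuple[int, dict[str, str], bytes]:
    """Split a raw HTTP/1.0 response into status code, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    status_code = int(status_line.split(" ", 2)[1])  # e.g. "HTTP/1.0 200 OK"
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body


def handle_request(raw_request: bytes, pages: dict[str, bytes]) -> bytes:
    """Server side: answer a request from a hypothetical in-memory dict of pages."""
    request_line = raw_request.split(b"\r\n", 1)[0].decode("ascii")
    method, path, _version = request_line.split(" ")
    if method != "GET":
        status, body = "501 Not Implemented", b""
    elif path in pages:
        status, body = "200 OK", pages[path]
    else:
        status, body = "404 Not Found", b"Not Found"
    header = f"HTTP/1.0 {status}\r\nContent-Length: {len(body)}\r\n\r\n"
    return header.encode("ascii") + body


def http_get(host: str, path: str = "/", port: int = 80) -> tuple[int, dict[str, str], bytes]:
    """Client side: send the request over a plain TCP socket and read the reply."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_get_request(host, path))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # HTTP/1.0: the server closes the connection after responding
                break
            chunks.append(chunk)
    return parse_response(b"".join(chunks))
```

Because an HTTP/1.0 server closes the connection after each response, the client can simply read until `recv` returns an empty byte string; calling `http_get("example.org")` against a reachable server would return the status code, a header dictionary and the raw body.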

Evaluation methods (Tests, projects, papers,etc.)

Even though the Summer School is very competitive, participation is voluntary, so there can’t be an exam or anything similar at the end. Also, all work had to be completed during the 50 hours of course time without any homework assignments. We had three evaluation methods to ensure comprehension of the course content.
1. Hacking Project: As already mentioned, students implemented a Web Server and Web Client during the first half of the course. Since the students worked in groups of 2 or 3 and were new to programming, we teachers helped them out, which gave us good feedback on whether or not they understood the content.
2. Oral presentation: After the middle of the course, students had to prepare and give a presentation for an interdisciplinary audience, i.e. the students from the other courses of the summer school, none of which covered IT topics. We asked the students to create a theatre role-play of what happens when someone types www.wikipedia.org into a web browser and hits the enter key. The students placed routing tables on the audience's seats, created TCP/IP packets (filled with candy that represented the time to live) and routed DNS requests as well as HTTP requests, together with the TCP handshake, around the audience in the classroom, demonstrating that everyone in the course had understood the basic decentralized web architecture.
3. Paper Writing: During the last days of the course the students were expected to collectively prepare 25 pages of documentation, meeting scientific standards, on what they had learned during the summer school. The process of creating this documentation was not only guided by us teachers but also gave us a good feedback loop to see whether the goals of the course had been achieved.
Overall we can say that the concept of the course worked really well. In particular, putting such a strong focus on the Web Architecture and letting students actually implement protocols helped them gain a deeper understanding.
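The candy-filled packets in the role-play model the IP time-to-live (TTL) field: every router that forwards a packet decrements the TTL and drops the packet once it reaches zero, so a packet caught in a routing loop cannot circulate forever. The following Python sketch simulates this with hypothetical hosts `H1`/`H2`, hypothetical routers `R1`/`R2` and toy routing tables that map a destination name directly to a next hop. Real routers match address prefixes and send an ICMP "time exceeded" message when they drop a packet; both are left out of this sketch.

```python
from dataclasses import dataclass


class TTLExpired(Exception):
    """Raised when a router decrements the TTL to zero and drops the packet."""


@dataclass
class Packet:
    src: str
    dst: str
    ttl: int
    payload: str


# Hypothetical routing tables: node -> {destination: next hop}.
ROUTING_TABLES = {
    "H1": {"H2": "R1"},  # the sending host's default route
    "R1": {"H2": "R2"},
    "R2": {"H2": "H2"},  # H2 is directly attached to R2
}


def route(packet: Packet, tables: dict[str, dict[str, str]]) -> list[str]:
    """Forward `packet` hop by hop and return the list of visited nodes."""
    path = [packet.src]
    node = tables[packet.src][packet.dst]  # the source host hands the packet to its first hop
    while True:
        path.append(node)
        if node == packet.dst:
            return path
        # `node` is a router: decrement the TTL before forwarding ("eat one candy")
        packet.ttl -= 1
        if packet.ttl <= 0:
            raise TTLExpired(f"packet dropped at {node}")
        node = tables[node][packet.dst]


if __name__ == "__main__":
    pkt = Packet("H1", "H2", ttl=64, payload="GET / HTTP/1.0")
    print(route(pkt, ROUTING_TABLES), pkt.ttl)  # ['H1', 'R1', 'R2', 'H2'] 62
    # A misconfigured loop between R1 and R2: the TTL guarantees termination.
    loop = {"H1": {"H2": "R1"}, "R1": {"H2": "R2"}, "R2": {"H2": "R1"}}
    try:
        route(Packet("H1", "H2", ttl=5, payload="lost"), loop)
    except TTLExpired as err:
        print(err)  # packet dropped at R1
```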

Paul Wagner and Till Speicher won the State Competition "Jugend Forscht Hessen" and the best project award using neo4j

Rene — Fri, 16 Mar 2012 11:18:38 +0000

Six months of hard coding and supervision by me are over and have ended in a huge success! After analyzing 80 GB of Google n-gram data, Paul and Till put it into a neo4j graph database in order to make predictions for fast sentence completion. Today was the award ceremony, and the two students from Darmstadt and Saarbrücken (respectively) won first place. Additionally, they received the “beste schöpferische Arbeit” (best creative work) award, which is the award for the best project in the entire competition (across all disciplines).
With their technology and the almost finished Android app, typing will be revolutionized! While a user types a sentence, they are able to predict the next word with a recall of 67%, creating huge additional value for today’s smartphones.
So stay tuned for upcoming news and the federal competition in May in Erfurt.
Have a look at their website, where you can find the (still German) documentation as well as the source code and a demo, which I also include here (use tab completion as in the Unix bash (-: ).
Right now it only works for the German language, since only German data was processed, so try sentences like

“Warum ist die Banane krumm” (Why is the banana curved? The rare word “krumm” is correctly predicted due to its relation to this famous question.)
“Das kann ich doch auch” (I am also able to do that)
“geht wirklich nur deutsche Sprache ?” (Is really only German language possible?)
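The core idea of next-word prediction from n-grams can be illustrated without a database. The following Python sketch is not Paul and Till's neo4j implementation; it is a minimal in-memory stand-in, assuming a tiny hypothetical corpus instead of the Google n-gram data, that counts bigrams and suggests the most frequent followers of the last typed word. In a graph database the same information could be stored as word nodes connected by edges weighted with bigram counts, so that a suggestion amounts to picking the heaviest outgoing edges.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences: list[str]) -> dict[str, Counter]:
    """Count, for every word, how often each other word directly follows it."""
    followers: dict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        # zip pairs every word with its successor: (w0, w1), (w1, w2), ...
        for current, nxt in zip(words, words[1:]):
            followers[current][nxt] += 1
    return followers


def suggest(prefix: str, followers: dict[str, Counter], k: int = 3) -> list[str]:
    """Return up to k candidate next words for the last word of `prefix`, most frequent first."""
    words = prefix.lower().split()
    if not words:
        return []
    # .get avoids inserting unseen words into the defaultdict
    counts = followers.get(words[-1], Counter())
    return [word for word, _ in counts.most_common(k)]


if __name__ == "__main__":
    corpus = [  # hypothetical toy corpus
        "Warum ist die Banane krumm",
        "die Banane krumm",
        "die Banane ist gelb",
    ]
    model = train_bigrams(corpus)
    print(suggest("Warum ist die Banane", model))  # ['krumm', 'ist']
```

A bigram model only looks at the last word; using longer n-grams (e.g. the last two or three words as the key) follows the same pattern with more context and sparser counts.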

Algorithms exercise: Find mistakes in Wikipedia articles

Rene — Wed, 11 Jan 2012 14:47:20 +0000

Today I started an experiment I created an excercise for coursework in algorithms and data structures that is very unusuale and many people have been criticle if this was a good idea. The idea behind the exercise is that studens should read wikipedia articles to topics related to lectures and find mistakes or suggest things that could be improoved. Thereby I hope that people will do something that many people in science don’t do often enough: Read something critically and carefully and question the things that you have learnt. (more discussions after the exercise)
Read the following Wikipedia articles:

Find at least 5 mistakes or passages that could be improved. Write down what is wrong or what could be improved. Give a justification for your statements and write down your suggested new version of this very passage.
To get inspired, you can look for mistakes on the discussion pages or in the version histories of the articles. You might also be able to look at the same article in other language editions!
Here are some example types of things that could possibly be improved:

pure mistakes
semantics of links
semantics of pictures
easy concepts explained in difficult words
missing citations
missing links to original scientific work
…

Further discussion
I am really excited to see how many students will try to do this exercise, how well it is accepted and what the quality of the answers will be…
I would also love to receive your feedback, thoughts and comments about this kind of exercise! Maybe you have some ideas for extending it, or have you asked students to do similar coursework?

Balanced binary search trees exercise for algorithms and data structures class

Rene — Tue, 29 Nov 2011 14:20:40 +0000

I created some exercises on binary search trees. This time there is no coding involved. My experience from teaching previous classes is that many people have a hard time understanding why trees are useful and what their dangers are. Therefore I have created some straightforward exercises that nevertheless involve some work. They will hopefully help the students to better understand and internalize the concepts of binary search trees, which in my opinion are among the most fundamental and important concepts in a class on algorithms and data structures.

Part A: finding elements in a binary search tree – 1 Point

You are given a binary search tree, and you know that the root element has the value 2. Since the search path to any element in the tree is unique, decide which of the following two lists can be an actual traversal path for retrieving the element 363 from the binary search tree. Why?

2, 252, 401, 398, 330, 344, 397, 363
2, 252, 397, 398, 330, 344, 401, 363
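Although this exercise needs no coding, students may want to check their reasoning mechanically. The Java sketch below assumes distinct keys. It relies on the fact that every step of a search narrows an open interval (lo, hi) in which all later keys on the path must lie. The class and method names are hypothetical.

```java
public class SearchPathCheck {
    // Returns true if 'path' can be the sequence of keys visited when searching for
    // 'target' in a binary search tree with distinct keys.
    public static boolean isValidSearchPath(int[] path, int target) {
        long lo = Long.MIN_VALUE; // every later key must be greater than lo ...
        long hi = Long.MAX_VALUE; // ... and smaller than hi
        for (int i = 0; i < path.length; i++) {
            int key = path[i];
            if (key <= lo || key >= hi) return false;       // contradicts an earlier left/right decision
            if (key == target) return i == path.length - 1; // the search must stop at the target
            if (target > key) lo = key; else hi = key;      // go right / go left
        }
        return false; // the target was never reached
    }

    public static void main(String[] args) {
        int[] first = {2, 252, 401, 398, 330, 344, 397, 363};
        int[] second = {2, 252, 397, 398, 330, 344, 401, 363};
        System.out.println("list 1 valid: " + isValidSearchPath(first, 363));
        System.out.println("list 2 valid: " + isValidSearchPath(second, 363));
    }
}
```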

Part B: Create binary search trees – 1 Point

You are given an empty binary search tree and two lists containing the same elements.

10, 20, 5, 15, 2, 7, 23
10, 5, 7, 2, 20, 23, 15

For both lists, draw all the trees that are created while inserting the elements one after another.
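To check hand-drawn trees, a minimal Java sketch of plain (unbalanced) BST insertion can print every intermediate tree. The textual format key(left,right), with "-" for an empty subtree, and all names are assumptions made for this illustration. The height function is included because it also helps with the comparison in Part C.

```java
public class BstInsert {
    static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Plain (unbalanced) BST insertion; duplicate keys are ignored.
    static Node insert(Node root, int key) {
        if (root == null) return new Node(key);
        if (key < root.key) root.left = insert(root.left, key);
        else if (key > root.key) root.right = insert(root.right, key);
        return root;
    }

    // Height counted in nodes: empty tree = 0, single node = 1.
    public static int height(Node n) {
        return n == null ? 0 : 1 + Math.max(height(n.left), height(n.right));
    }

    // Text form key(left,right); "-" marks an empty subtree, a leaf is printed as its key.
    public static String show(Node n) {
        if (n == null) return "-";
        if (n.left == null && n.right == null) return Integer.toString(n.key);
        return n.key + "(" + show(n.left) + "," + show(n.right) + ")";
    }

    // Inserts the keys one after another, optionally printing every intermediate tree.
    public static Node build(int[] keys, boolean printSteps) {
        Node root = null;
        for (int key : keys) {
            root = insert(root, key);
            if (printSteps) System.out.println("after " + key + ": " + show(root));
        }
        return root;
    }

    public static void main(String[] args) {
        build(new int[]{10, 20, 5, 15, 2, 7, 23}, true);
        System.out.println();
        build(new int[]{10, 5, 7, 2, 20, 23, 15}, true);
    }
}
```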

Part C: skewed binary search trees and traversing trees – 1 Point

Compare the trees from part B to the tree you would get by inserting the numbers in the order 2, 5, 7, 10, 15, 20, 23.
To understand the different tree traversals, please give the results of the inorder and the preorder traversal applied to the trees from parts B and C.
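A possible Java sketch for checking traversal results is shown below; the names are hypothetical. It builds the tree from an insertion order and then collects the keys either in inorder (left subtree, node, right subtree) or in preorder (node, left subtree, right subtree).

```java
import java.util.ArrayList;
import java.util.List;

public class BstTraversal {
    static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Plain (unbalanced) BST insertion; duplicate keys are ignored.
    static Node insert(Node root, int key) {
        if (root == null) return new Node(key);
        if (key < root.key) root.left = insert(root.left, key);
        else if (key > root.key) root.right = insert(root.right, key);
        return root;
    }

    // Inorder: left subtree, node, right subtree.
    static void inorder(Node n, List<Integer> out) {
        if (n == null) return;
        inorder(n.left, out);
        out.add(n.key);
        inorder(n.right, out);
    }

    // Preorder: node, left subtree, right subtree.
    static void preorder(Node n, List<Integer> out) {
        if (n == null) return;
        out.add(n.key);
        preorder(n.left, out);
        preorder(n.right, out);
    }

    // Builds the tree from the insertion order and returns the chosen traversal.
    public static List<Integer> traverse(int[] insertionOrder, boolean pre) {
        Node root = null;
        for (int key : insertionOrder) root = insert(root, key);
        List<Integer> out = new ArrayList<>();
        if (pre) preorder(root, out); else inorder(root, out);
        return out;
    }

    public static void main(String[] args) {
        int[][] orders = {
            {10, 20, 5, 15, 2, 7, 23},
            {10, 5, 7, 2, 20, 23, 15},
            {2, 5, 7, 10, 15, 20, 23}
        };
        for (int[] order : orders) {
            System.out.println("inorder:  " + traverse(order, false));
            System.out.println("preorder: " + traverse(order, true));
        }
    }
}
```

The inorder traversal of any binary search tree lists its keys in sorted order, which makes it a quick sanity check for the drawn trees.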

Part D: Balanced binary search trees. Counting Permutations – 2 Points

We have seen that the topology of a tree can change as soon as the order of the inserted items changes. Since balanced trees are the most desirable, your task is to count how many permutations of our 7 elements lead to a balanced binary search tree!
To do so, it is sufficient to write down all the permutations that lead to a balanced binary search tree. But you do not have to do this explicitly: it is also OK to write down all classes and cases of permutations and count them.
Compare this number to the number of all permutations of 7 elements (7! = 5040) and give the probability of ending up with a balanced binary search tree when given a random permutation of 7 different elements.
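Counting by hand is the point of the exercise, but for 7 elements a brute-force check over all 5040 permutations is cheap. The Java sketch below assumes that "balanced" means minimal height. For 7 = 2^3 - 1 keys, this is the complete tree of height 3. The class and method names are hypothetical.

```java
public class BalancedPermutations {
    static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    static Node insert(Node root, int key) {
        if (root == null) return new Node(key);
        if (key < root.key) root.left = insert(root.left, key);
        else if (key > root.key) root.right = insert(root.right, key);
        return root;
    }

    static int height(Node n) {
        return n == null ? 0 : 1 + Math.max(height(n.left), height(n.right));
    }

    // Smallest possible height of a BST with n nodes: floor(log2 n) + 1.
    static int minHeight(int n) {
        return 32 - Integer.numberOfLeadingZeros(n);
    }

    static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }

    // Generates every permutation of a[i..] in place (swap-based backtracking) and counts
    // the insertion orders that produce a tree of the target height.
    static long count(int[] a, int i, int targetHeight) {
        if (i == a.length) {
            Node root = null;
            for (int x : a) root = insert(root, x);
            return height(root) == targetHeight ? 1 : 0;
        }
        long c = 0;
        for (int j = i; j < a.length; j++) {
            swap(a, i, j);
            c += count(a, i + 1, targetHeight);
            swap(a, i, j); // undo the swap before trying the next element at position i
        }
        return c;
    }

    public static long countBalanced(int[] keys) {
        return count(keys.clone(), 0, minHeight(keys.length));
    }

    public static void main(String[] args) {
        int[] keys = {2, 5, 7, 10, 15, 20, 23};
        long balanced = countBalanced(keys);
        long all = 5040; // 7!
        System.out.println(balanced + " of " + all + " permutations, probability " + (double) balanced / all);
    }
}
```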

Part E: A closed formula for the probability to create a balanced binary search tree – 2 Extra Points

Your task is to find and prove a formula for the number of permutations of the natural numbers 1, 2,…, 2^k-1 for which inserting the numbers in that order creates a balanced binary search tree.
Give a closed formula for the probability P(k) of ending up with a balanced search tree. Give the explicit results for k = 1,…,10.
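Once a recursive decomposition has been found, the exact numbers for k = 1,…,10 are easier to compute than to work out by hand. The Java sketch below assumes the decomposition N(k) = C(2^k - 2, 2^(k-1) - 1) · N(k-1)^2 with N(1) = 1. The idea is that the median 2^(k-1) must be inserted first, and the elements of the two subtrees may then be interleaved arbitrarily, provided that each subtree's own order is balanced. Proving this is still part of the task. The names are hypothetical.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.MathContext;

public class BalancedCount {
    public static BigInteger factorial(int n) {
        BigInteger f = BigInteger.ONE;
        for (int i = 2; i <= n; i++) f = f.multiply(BigInteger.valueOf(i));
        return f;
    }

    // Assumed decomposition: N(k) = C(2^k - 2, 2^(k-1) - 1) * N(k-1)^2, N(1) = 1.
    public static BigInteger count(int k) {
        if (k == 1) return BigInteger.ONE;
        int rest = (1 << k) - 2;        // elements inserted after the root
        int half = (1 << (k - 1)) - 1;  // size of each subtree
        BigInteger interleavings = factorial(rest).divide(factorial(half).pow(2));
        BigInteger sub = count(k - 1);
        return interleavings.multiply(sub).multiply(sub);
    }

    public static void main(String[] args) {
        for (int k = 1; k <= 10; k++) {
            int n = (1 << k) - 1;
            // P(k) = N(k) / n!, printed with 16 significant digits
            BigDecimal p = new BigDecimal(count(k))
                    .divide(new BigDecimal(factorial(n)), MathContext.DECIMAL64);
            System.out.println("k=" + k + "  n=" + n + "  P(k)=" + p);
        }
    }
}
```

BigInteger keeps the counts exact even though (2^10 - 1)! = 1023! has thousands of digits. A candidate closed formula for P(k) can be compared against these values.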

3 exercises for sorting problems (Quicksort, Mergesort) in algorithms and data structures class

Rene — Mon, 14 Nov 2011 21:34:49 +0000

#1: Sorting huge files

Sorting big files might not be as simple as just implementing a sort algorithm. As soon as the file no longer fits in memory, smarter implementations have to be applied. One way is to sort the file on the hard disk. Note that not every algorithm is easily adapted to this kind of task. So your task in this exercise is to decide which algorithms are well suited to the problem and which approach to handling huge files could be taken.

Discuss which kinds of operations are efficient when retrieving / processing data from the hard disk
Discuss which kinds of operations are needed in the different algorithms
Create a table to display the results and choose the most appropriate algorithm.

One possible way of implementing this is to split the file into smaller files that can be sorted in memory and then use a bottom-up merge function to merge all those files.
To try this, you can sort this snapshot of all Wikipedia revisions taken from the German Wikipedia in 2011. Uncompressed, the file is 3.1 gigabytes in size and consists of 128 million rows. Note that it is already partially sorted.
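The split-then-merge idea can be sketched in Java as follows. Instead of merging pairs of files bottom-up, this sketch performs a single k-way merge with a java.util.PriorityQueue. The heap always holds the smallest unread line of every sorted run, so each line is read from disk once during the merge. The class name ExternalSort and the maxLines parameter, which stands for the number of lines that fit in memory, are hypothetical. Lines are compared with String.compareTo.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class ExternalSort {

    // Smallest unread line of one sorted run, plus the reader it came from.
    private static final class Head implements Comparable<Head> {
        final String line;
        final BufferedReader reader;
        Head(String line, BufferedReader reader) { this.line = line; this.reader = reader; }
        @Override public int compareTo(Head other) { return line.compareTo(other.line); }
    }

    // Phase 1: read at most maxLines lines at a time, sort them in memory, write a run file.
    static List<Path> createSortedRuns(Path input, int maxLines) throws IOException {
        List<Path> runs = new ArrayList<>();
        List<String> buffer = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                buffer.add(line);
                if (buffer.size() == maxLines) {
                    runs.add(writeRun(buffer));
                    buffer.clear();
                }
            }
        }
        if (!buffer.isEmpty()) runs.add(writeRun(buffer));
        return runs;
    }

    private static Path writeRun(List<String> lines) throws IOException {
        Collections.sort(lines);
        Path run = Files.createTempFile("run", ".txt");
        Files.write(run, lines, StandardCharsets.UTF_8);
        return run;
    }

    // Phase 2: k-way merge of all runs through a min-heap.
    static void mergeRuns(List<Path> runs, Path output) throws IOException {
        PriorityQueue<Head> heap = new PriorityQueue<>();
        List<BufferedReader> readers = new ArrayList<>();
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (Path run : runs) {
                BufferedReader reader = Files.newBufferedReader(run, StandardCharsets.UTF_8);
                readers.add(reader);
                String first = reader.readLine();
                if (first != null) heap.add(new Head(first, reader));
            }
            while (!heap.isEmpty()) {
                Head smallest = heap.poll();
                out.write(smallest.line);
                out.newLine();
                String next = smallest.reader.readLine();
                if (next != null) heap.add(new Head(next, smallest.reader));
            }
        } finally {
            for (BufferedReader reader : readers) reader.close();
            for (Path run : runs) Files.deleteIfExists(run);
        }
    }

    public static void sort(Path input, Path output, int maxLines) throws IOException {
        mergeRuns(createSortedRuns(input, maxLines), output);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ExternalSort <input> <output> <linesPerRun>
        sort(Paths.get(args[0]), Paths.get(args[1]), Integer.parseInt(args[2]));
    }
}
```

Both phases read and write files strictly sequentially, which is the kind of access pattern the first discussion point is about.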

#2: Finding the k-th smallest element in an unsorted list

Your task is to find the k-th smallest element of an unsorted list (thanks to Robert Sedgewick for the inspiration!).
Obviously, one approach would be to sort all the data and then retrieve the k-th element. The runtime of this approach would be O(n log(n)), though. We want to achieve this in linear time on average, which is possible with the help of the partition function of quicksort.
After calling the partition function, the unordered list is split into two sublists of length i and n-i. The first list contains the i smallest elements (not necessarily sorted). Comparing i to k tells you whether to search for the element in the first or in the second sublist.

Use this idea to implement findMinK(ArrayList array, int k, int l, int r)
Test the runtime of your implementation against the primitive approach of sorting first. For testing, you can use the test framework code below. In your function, you should increase the global variable cmpcnt every time the partition function swaps elements in the list.
Write down the recursive equation of your solution and solve it in order to prove that the average case runtime is also theoretically linear.
Compare this runtime behaviour to quicksort (next exercise) and explain why these approaches fall into different complexity classes.

import java.util.ArrayList;
import java.util.Collections;

public class mink {
    // Global counter: increase it every time the partition function swaps elements.
    static public int cmpcnt = 0;

    public static void main(String[] args) {
        testFramework();
    }

    // Smart approach: find the k-th smallest element of array[l..r] using the partition function.
    public static int findMinK(ArrayList<Integer> array, int k, int l, int r) {
        // Implement here
        return -1; // placeholder so that the stub compiles; replace with your result
    }

    // Primitive approach: sort everything, then read off position k.
    public static int findMinK(ArrayList<Integer> array, int k) {
        Collections.sort(array);
        return array.get(k);
    }

    private static void testFramework() {
        ArrayList<Integer> a = new ArrayList<Integer>();
        for (int j = 2; j < 8; j++) {
            a.clear();
            for (int i = 0; i < (int) Math.pow(10, j); i++) {
                a.add(i);
            }
            System.out.println("\n\n" + a.size() + " Elements\n\n");
            double slow = 0;
            double fast = 0;
            for (int i = 0; i < 10; i++) {
                cmpcnt = 0;
                Collections.shuffle(a);
                int k = (int) (Math.random() * (Math.pow(10, j) - 1)) + 1;
                System.out.println("test run number: " + i + " find: " + k);

                long start = System.currentTimeMillis();
                findMinK(a, k, 0, a.size() - 1);
                long end = System.currentTimeMillis();
                long smarttime = (end - start);
                fast = fast + smarttime;
                System.out.println("SMART ALGO \t --- time in ms: " + smarttime + " comparisons: " + cmpcnt);

                start = System.currentTimeMillis();
                findMinK(a, k);
                end = System.currentTimeMillis();
                long slowtime = (end - start);
                System.out.println("WITH SORTING \t --- time in ms: " + slowtime);
                System.out.println("sorting is " + (double) slowtime / (double) smarttime + " times slower");
                slow = slow + slowtime;
            }
            System.out.println("sorting (=" + slow + "ms) is " + slow / fast + " times slower than smart algo (=" + fast + "ms)");
        }
    }
}
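One possible way to fill in the findMinK stub is sketched below as a standalone Java class with the hypothetical name QuickSelect. It assumes a Lomuto-style partition around a randomly chosen pivot and, like the sorting version above, treats k as a 0-based index. Unlike the test framework, it does not update cmpcnt. It accepts any List<Integer>, so the framework's ArrayList<Integer> can be passed in unchanged.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class QuickSelect {
    private static final Random RNG = new Random();

    // Lomuto partition of list[l..r] around list[r]; returns the pivot's final index m.
    // Afterwards list[l..m-1] < pivot <= list[m+1..r].
    static int partition(List<Integer> list, int l, int r) {
        int pivot = list.get(r);
        int i = l;
        for (int j = l; j < r; j++) {
            if (list.get(j) < pivot) {
                Collections.swap(list, i, j);
                i++;
            }
        }
        Collections.swap(list, i, r);
        return i;
    }

    // Returns the element that would be at index k (0-based) if list[l..r] were sorted.
    // Precondition: l <= k <= r. Only the side that contains k is searched further.
    public static int findMinK(List<Integer> list, int k, int l, int r) {
        if (l == r) return list.get(l);
        Collections.swap(list, l + RNG.nextInt(r - l + 1), r); // random pivot
        int m = partition(list, l, r);
        if (k == m) return list.get(m);
        return k < m ? findMinK(list, k, l, m - 1) : findMinK(list, k, m + 1, r);
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) list.add(i);
        Collections.shuffle(list);
        // The values are 0..n-1, so the element at sorted index 42 is 42.
        System.out.println(findMinK(list, 42, 0, list.size() - 1));
    }
}
```

Because only one side is recursed into, the expected work is roughly n + n/2 + n/4 + …, whereas quicksort has to recurse into both sides.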

#3: Solving recursive equations: Proving runtime of Quicksort

Quicksort (with a randomly chosen pivot) is a probabilistic algorithm, and its runtime is described by the implicit recursive equation
[latex]T(n) = n + \frac{1}{n} \sum_{i=1}^{n}(T(i-1)+T(n-i))[/latex]

Explain the meaning of the sum in this equation and its connection to stochastics.

Solve the equation. To do so, you can use the following equivalences and solve the recursive equation by substitution.

[latex]\frac{1}{n} \sum_{i=1}^{n}(T(i-1)+T(n-i)) = \frac{2}{n} \sum_{i=0}^{n-1}T(i) = \frac{2}{n} \left(T(n-1) + \sum_{i=0}^{n-2}T(i)\right)[/latex]
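A closed-form solution can be sanity-checked numerically. The Java sketch below assumes T(0) = 0 and the symmetric form T(n) = n + (2/n)·(T(0) + … + T(n-1)), in which every pivot rank is equally likely. It evaluates the recurrence in linear time by keeping a running prefix sum. The class name is hypothetical.

```java
public class QuicksortRecurrence {
    // Evaluates T(0..maxN) for T(0) = 0 and T(n) = n + (2/n) * (T(0) + ... + T(n-1)).
    public static double[] evaluate(int maxN) {
        double[] t = new double[maxN + 1];
        double prefix = 0; // running sum T(0) + ... + T(n-1)
        for (int n = 1; n <= maxN; n++) {
            prefix += t[n - 1];
            t[n] = n + 2.0 * prefix / n;
        }
        return t;
    }

    public static void main(String[] args) {
        int maxN = 1_000_000;
        double[] t = evaluate(maxN);
        for (int n = 10; n <= maxN; n *= 10) {
            // If T(n) is in O(n log n), this ratio should settle towards a constant.
            System.out.printf("n=%d  T(n)=%.1f  T(n)/(n ln n)=%.4f%n",
                    n, t[n], t[n] / (n * Math.log(n)));
        }
    }
}
```

Comparing the printed T(n) with the values of your closed formula for a few n is a quick way to catch algebra mistakes in the substitution.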