how to – Data Science, Data Analytics and Machine Learning Consulting in Koblenz Germany
https://www.rene-pickhardt.de
Extract knowledge from your data and be ahead of your competition

How to learn to learn programming for non techies?
https://www.rene-pickhardt.de/how-to-learn-to-learn-programming-for-non-techies/ (Mon, 01 Sep 2014 22:33:41 +0000)

Out of my personal offline social network, I have frequently been approached by non-techies who ask me questions like

  • “I want to learn programming, which language should I learn?”
  • “At what school / course can I participate to learn programming?”
  • “Which book can you recommend if I want to learn programming?”

These questions are, in my opinion, very tough, but I will try to answer them. Although much of what I say probably also holds for people like me, who learned programming out of curiosity in high school, or for people who want a career in IT engineering, I aim this article at people with a university degree who already know how to approach learning and want programming as an additional skill rather than as their main livelihood.
The article is structured as follows:

  • Why do you want to learn programming? What I will assume about you.
  • What does it mean to learn programming?
  • What is really necessary in order to learn programming?
  • Discussing various kinds of languages for certain purposes.
  • Your concrete road map.

Why do you want to learn programming? What I will assume about you.

I will assume that you do not want to learn programming because you want to switch your major to IT, or because you have suddenly become fascinated by programming, computers, nerds and geek talk. Rather, I assume that you might be a linguist who realizes that statistical natural language processing could be useful. Or you are a biologist or psychologist and need to automate some work with your experimental data from the lab. You could also be a sociologist who wants to run experiments on the web or study how people behave there. Maybe you ended up in a company because jobs in your field are rare, and you are looking for additional qualifications. You might have similar reasons, but I guess you get the gist.

What does it mean to learn programming? 

Being able to program is a somewhat weird skill. Those who can code know clearly what they can expect from another hacker. Yet I have the feeling that people who want to learn programming often have the wrong expectations about what programming means. So let me try to manage your expectations.
Knowing how to code is a skill that exists at various levels. Becoming a good programmer and acquiring programming skills that are really useful to you can be a long road that requires considerable effort (cf. Peter Norvig’s “Teach Yourself Programming in Ten Years”). Still, I have to state (in a somewhat arrogant way) that the first step is very easy, at least in theory.
The first step is probably learning the syntax, i.e. the commands or instructions of a programming language. This is simple because programming languages usually have a very small syntax. Learning the syntax and playing around with it should seriously not take you more than a few hours in total. If you include practice and repetition, this time will be spread over a couple of days. This is what books with titles like “how to learn XXX in YYY days”, which Norvig heavily criticizes, will teach you. The same can be learned from online courses such as http://www.codecademy.com/. The main problem is that this skill, even though at its core it is all you need, seems rather useless to a beginner.
In particular, once you have acquired this skill, you will feel that you know no more than before. Compare it to having the Force like a Jedi knight: you might feel it, but you don’t know how to use it yet.
In my opinion, this is where the main problem starts. To do something meaningful with your new skill, you will have to deal with many other things that distract you from using or learning it. You will become frustrated. Everything will seem complicated, and you will often feel that you don’t know why something does not work as explained; and when it finally does, you may not know why either.
Common sources of distraction are:

  • Interacting with an integrated development environment (IDE).
  • The same operator in a programming language might have different semantics depending on the context.
  • Interacting with a compiler or interpreter.
  • (Most likely implicitly) interacting with the operating system.
  • Using third-party APIs and libraries.
  • Interacting with an editor that has its own peculiarities.
  • Making typos in the new syntax and receiving error messages that don’t tell you “hey, you forgot a semicolon here…” but instead look like total chaos.

The problem is that these things are hardly documented at all; they are assumed to be learned implicitly, by talking to people or by sitting in a classroom. You might want to have a look at my German screencast in which I explain how to run “hello world in C”.

Hello world is the most basic program, and it involves almost no programming at all; it just has all the overhead described above. Even though the screencast is 20 minutes long, I still omit so much information that you might think: “Why the hell should I learn all this?” The point I am trying to make is that it is all the other stuff that is complex.
With this in mind, I sometimes wonder how anyone manages to learn programming at all. The bad news is: books won’t tell you about all these differences. Most of the time they don’t even help you set up your development environment, so they leave you lost while learning a task which, again, is actually pretty simple at its core.
I would say that part of being able to program is being aware of all these things, being able to distinguish what is happening at which point in time, and not becoming confused by all this “magic”.
It takes some experience to know all of this. The good news is: once you master these things, you know just about everything you need in order to move on.
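The screencast uses C; as a rough equivalent, here is hello world in Java, one of the three languages discussed below. The comments are my annotation of which parts are language syntax and which part is a standard-library call you simply use as a black box. Even this tiny file needs the surrounding toolchain: save it as HelloWorld.java, compile it with javac HelloWorld.java and run it with java HelloWorld.

```java
// The file must be named HelloWorld.java because the public class is called HelloWorld.
public class HelloWorld {

    // Syntax: a method that returns a String; kept separate so the text is easy to test.
    static String greeting() {
        return "hello world";
    }

    // Syntax: the entry point the Java runtime looks for when you run "java HelloWorld".
    public static void main(String[] args) {
        // API, not syntax: System.out.println is a standard-library call used as a black box.
        System.out.println(greeting());
    }
}
```

Everything except the println call is part of the language itself; println is exactly the kind of library function that beginners often mistake for syntax.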

What is really necessary?

Even if you only want to program a little, I think what will help you most is curiosity about computers in general and the willingness to iterate, fail and play around. Don’t get frustrated if you suddenly seem not to understand anything anymore. Abstract and structured thinking is a very helpful skill, and you probably already have it if you have read and followed this article so far.
Most likely you will have to move out of your comfort zone. I remember learning ten-finger typing. In the beginning I thought chatting with friends was faster with my old system. I had to move out of my comfort zone to use the new skill, and I had to keep using it to get used to it. The same will hold true for programming. In the beginning you might think that tasks from your day-to-day job go faster the way you have always done them. Take the challenge. At first, using programming might take longer. But over time it will save you a heck of a lot of time.
Moving out of your comfort zone also means being open to geek culture. Computer scientists are nerds, and they celebrate it. Honestly, I learned things about hacking from xkcd comics (and similar sources).
Having the courage to try things out and play around will be essential, but many people seem to confuse “trial and error” with meaningful testing of “what actually happens if I …” (sorry, this was so important that I had to print it in bold, and I will repeat it later…). Reading the fucking manual (or the documentation, even though it is often missing or badly written) will be of tremendous help. What you have to learn is to ask as many questions as possible. A good practice is to write down your questions and explain your problems. While doing so, you will understand your problem better and will most likely solve it on your own. (Ask questions about this article too. I am well aware that it leaves questions open. Go and ask me in the comments as an exercise. (NOW (:))
In general, googling before asking questions is a great idea; this, of course, is just another way of asking questions. Honestly, using a search engine is a skill. I have seen so many students who are not able to use a search engine properly to solve their problems. The fact is: if you want to learn programming, almost every question you have has already been asked on the web and is indexed by a search engine. Being aware of this and learning to find those answers is a large part of what you need to become a programmer.
Finally, you will need a lot of patience. The complexity of programming comes from the sheer number of technologies that play together. You will inevitably be confused. Stay focused, take a deep breath, start over and try once more.

Discussing various kinds of languages for certain purposes

There are various kinds of languages, each serving certain purposes. If you come from outside the field, you will most likely already know which language you want to learn. Honestly, the syntax is almost the same everywhere anyway; what differs are the libraries and APIs. As you gain experience, you will realize that you use tools to remember APIs, and at some point you develop a feeling for which functions already exist in an API; you can probably guess their names or at least google for them. What I am trying to say is that it doesn’t really matter which language you choose for learning how to program.
Still, I think there are three languages that are particularly interesting candidates. All of them support object-oriented concepts. Even though I think object-oriented programming is great and probably not too difficult for you, you could probably ignore these concepts at first.
C/C++: These languages are great for people who want to focus on understanding fundamentals and how computers work. They are certainly the most difficult to learn, and it is easy to mess things up. The level of frustration will be the highest, but so will the reward. I learned programming with C first, and from there I could very easily move on to other languages.
Python: I think Python has a very beautiful syntax, and it minimizes the distractions I wrote about at such length. Especially for people who want programming as an additional skill, Python is very quick for solving smaller tasks with little code overhead. The standard library that comes with Python is also easy to use. Python can be used on the web as well as in the shell, so it is probably a very good compromise among scripting languages.
Java: I would say Java is one of the most widespread languages. It has many strong concepts and, above all, a large toolchain around it. I have not found any other language with so much code completion and so many tools to make your life easier. Still, having tools think for you while you have not understood something yet is a tricky thing.
There are other languages and excellent reasons why you would want to learn them. The only language I would not recommend is JavaScript. It is a mess for various reasons, so unless you explicitly want to build user interfaces for the web, I see almost no reason why a non-IT person should learn JavaScript.

Your concrete road map

Let’s finally get concrete. Here is how to approach the project of learning to program.
10 relatively easy tips for the first week:

  1. Choose a language
  2. Install the IDE, compiler, toolchain, …
  3. Do it the hard way. Install Linux (e.g. Ubuntu), use a fancy editor like Emacs or vim, and go for a fancy toolchain or a version control system like git. Not that you need it, but, as I said, it forces you out of your comfort zone and puts you in a hacking mindset. This is the most important part: you need to lose your fear of technology.
  4. Learn the syntax with any book or course about learning programming, or your chosen language, “in 7 days”. (Again, nowadays the book could also be an online course.)
  5. Understand that most books are didactically very bad and don’t distinguish between concepts, APIs, operating systems and language-specific parts.
  6. Use a cheat sheet for learning the syntax. Not in order to cheat all the time, but because it helps you distinguish between what is syntax and what is distracting overhead. Just google for “c cheat sheet”, “java cheat sheet” or “python cheat sheet”.
  7. Understand the syntax and learn it by heart.
  8. Don’t just read the examples and memorize them; try to reproduce them without looking. If you can’t, read the chapter again and iterate.
  9. Most important: while working through the book or course, play around! If something works, start asking questions: what happens if I change the code here? Try to answer the question first and then try it out. Even try to break the code on purpose to see the error messages from the compiler or interpreter. Don’t get confused by naming; try, try, try and try again.
  10. Understand the concept of variable scope; this, too, is best learned while playing around.
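Variable scope is easiest to see by experiment. The small Java class below is one possible illustration (my own, not taken from any particular book): a static field that keeps its value between method calls, a local variable that is created fresh on every call, a loop variable that disappears after the loop, and a local variable that shadows a field of the same name. Try moving the declarations around and read the compiler errors you get.

```java
public class ScopeDemo {

    // A static field: visible in every method of this class, and it keeps its value between calls.
    public static int counter = 0;

    public static int countUp(int times) {
        // "total" is a local variable: a fresh one is created on every call of countUp.
        int total = 0;
        for (int i = 0; i < times; i++) {
            // "i" exists only inside this loop.
            total += 1;
            counter += 1;
        }
        // Using "i" here would not compile: it is out of scope after the loop.
        return total;
    }

    public static int shadow() {
        // This local "counter" hides (shadows) the field of the same name, inside this method only.
        int counter = 100;
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(countUp(3)); // 3
        System.out.println(countUp(3)); // 3 again: "total" started from 0
        System.out.println(counter);    // 6: the field kept its value between calls
        System.out.println(shadow());   // 100: the local variable wins inside shadow()
        System.out.println(counter);    // still 6: shadow() never touched the field
    }
}
```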

Once you are here (and, as I said, it should not take you too long), go to the next level: the transfer / experience stage.

  1. Ask yourself which task you frequently do by hand (on a computer or elsewhere) that IT people might automate. Try to automate it.
  2. Find an open source project. Look for its bug tracker, mailing list or IRC channel. Go and contribute. Experience, experience, experience. Talk to the people. Almost every IT person is happy if you try to solve their problems (which they might just not find the time to solve themselves). Tell people you are a beginner; chances are pretty high that they will help you out.
  3. Learn black-boxing. (I learned programming at 12 and built a large-scale application at 22; four years ago I started to really dig into what computers and programming languages are. I understood many things down to the physical process, yet most of what I use is still a black box to me. It is much less of a black box than 10 years ago, but most of it still is!) You cannot understand everything. Sometimes it is enough to know: “hey, if I enter print(‘a’); the letter a will be printed to the terminal.” You don’t have to understand how system I/O (input and output) really works; it is sufficient to be able to use it.
  4. Understand what libraries and APIs are. Understand that some instructions and commands that at first seem to be typical code (or syntax) of your language are actually just APIs that you should use as a black box. A typical example is the hello world program mentioned above.
  5. To some degree, understand what an operating system is and what its purposes are.
  6. Have a concrete project on your roadmap. Nothing is more boring than a computer scientist telling you “hey, I can automatically calculate square numbers…” I once wrote a blog article explaining how to use Python to find the most frequent words in books, or the longest sentence and how long it is… (I admit this program was probably too complex for a beginner, but you could still try.)
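To make the last idea concrete, here is a small Java sketch of the kind of program described there (my original article used Python, and this is not that code). It counts the most frequent words in a text and finds the longest sentence, assuming for simplicity that a word is a run of letters and that sentences end with “.”, “!” or “?”; real books need more careful tokenization.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TextStats {

    // Counts how often each lower-cased word occurs; a "word" is a run of letters (simplifying assumption).
    public static Map<String, Integer> wordCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Returns the n most frequent words, most frequent first; ties are broken alphabetically.
    public static List<String> topWords(String text, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(wordCounts(text).entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    // Returns the sentence with the most words; sentences are assumed to end with '.', '!' or '?'.
    public static String longestSentence(String text) {
        String longest = "";
        int maxWords = 0;
        for (String sentence : text.split("[.!?]+")) {
            String trimmed = sentence.trim();
            int words = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
            if (words > maxWords) {
                maxWords = words;
                longest = trimmed;
            }
        }
        return longest;
    }

    public static void main(String[] args) {
        String book = "The cat sat. The cat sat on the mat! Did the dog see the cat?";
        System.out.println(topWords(book, 2));      // [the, cat]
        System.out.println(longestSentence(book)); // The cat sat on the mat
    }
}
```

To try it on a real book, read a plain-text file into a String and pass it to the same methods.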

Finally, the hard parts

  1. As with all learning tasks, a) find a sparring partner and b) teach the material to others as early as possible.
  2. Get a feeling for what is important to know while programming.
  3. Listen to the fucking nerd. We might not always communicate well (guess what: communication is a two-way thing, and chances are high that you are not asking well either or not communicating your needs clearly). But most nerds will be very happy to help you out. They might assume knowledge that you do not have. If they do, stop them or slow them down. They might lose themselves in details. If they do, ask them whether this is still relevant to your question and heading in the right direction. Do your part for good communication. Remember, you’re talking to someone from a different subculture (:
  4. Have clear questions and clear goals.

I wish you good luck and a lot of fun while acquiring your new skills. If this article was helpful or changed your life, please tell me in 10 years or so what amazing things you have built since then. Just put a reminder in your calendar right now. You don’t use an electronic calendar yet? Ohhh, much to learn, young Padawan (:

Version control of your Linux config with git
https://www.rene-pickhardt.de/version-control-of-your-linux-config-with-git/ (Thu, 24 Apr 2014 17:05:37 +0000)

I was just reading through Heinrich’s recent notes, which I can recommend, as well as his older notes. When I stumbled upon the note called “Monitor /etc/ using git”, I was confused. Why would anyone do this?
So I talked to Heinrich and he said:

“Well you want to monitor changes of your system config. You want to be able to revert them and you don’t want to care about this when you do something.”

I really liked this and think it is such a useful and smart idea that I wanted to share it with you. Just make sure you don’t push the git repository to some public place, since the config files might contain a lot of passwords. Also have a look at his .gitignore: in his case the printer makes a lot of automatic changes and is therefore ignored. You might have similar settings among your configs.
I hope sharing this was useful for you!

What should I do with my 10 to 15 year old desktop pcs?
https://www.rene-pickhardt.de/what-should-i-do-with-my-10-to-15-year-old-desktop-pcs/ (Sat, 18 Jan 2014 12:08:35 +0000)

Hey everyone, I wonder if you could help me out. I am currently at my parents’ home, and there are some old PCs from when I was young (even my very first PC is among them). They were probably bought between 1997 and 2002 and have single-core processors ranging from 333 MHz up to 1800 MHz. Memory varies between 64 MB and 1 GB, and the hard disks vary as well. These computers need way too much energy, the fans are really loud and so on…
In general, the electronic parts of these computers are still in good shape, and they served their purpose well for a long time. I can imagine many use cases, yet judging by eBay you would only be able to sell them for 1 euro each.
I kind of refuse to give them away for free or, even worse, throw them away. But apparently these computers are worth nothing, which, sorry to say it again, I refuse to accept. Computing power is an amazing thing.
Does anyone have a cool idea what to do with them? Maybe install some lightweight Linux and use them to control some hardware or for networking projects. I even considered using them as a file / backup server, but that does not seem like a good idea either, since their energy consumption is too high, as mentioned above, and the network storage devices you can buy nowadays seem to do the job much better.
I tried googling the problem, but I only found boring articles without any good ideas. So if any of you has an idea, it would be highly appreciated.

GWT + database connection in Servlet ContextListener – Auto Complete Video Tutorial Part 5
https://www.rene-pickhardt.de/gwt-database-connection-in-servlet-contextlistener-auto-complete-video-tutorial-part-5/ (Mon, 24 Jun 2013 11:44:47 +0000)

Finally we have all the basics needed for building an autocomplete service, and now comes the juicy part: from now on we look at how to make it fast and robust. In the current approach we open a new database connection for every HTTP request. This takes quite some time, first to lock the database (at least when using neo4j in embedded mode) and then to run the query, without any opportunity to benefit from the database’s caching.
In this tutorial I introduce the concept of a ContextListener. Roughly speaking, it lets you store objects as key-value pairs in the application-wide memory of a Java servlet application (the ServletContext) when the application starts. Once we understand this, the roadmap is very clear: we can keep objects like database connections or search indices in the memory of our web server. From what I currently understand, this could also be used to implement some server-side caching. I have not yet done any benchmarking of how fast retrieving objects from the context is in Tomcat. Also, this method of caching does not scale horizontally as well as using memcached.
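The idea can be sketched without a servlet container. The Java code below is a plain-Java analogue, not the Servlet API and not the code from the screencast: a ConcurrentHashMap stands in for the ServletContext attributes, and the hypothetical Database class stands in for an expensive handle such as an embedded neo4j instance. In a real web application the two lifecycle methods would live in a class implementing javax.servlet.ServletContextListener (contextInitialized / contextDestroyed), and the map calls would become getServletContext().setAttribute(...) and getAttribute(...).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class ContextSketch {

    // Stand-in for the ServletContext attributes: one application-wide key-value store.
    public static final Map<String, Object> appContext = new ConcurrentHashMap<>();

    // Counts how often the expensive resource is created, to show that it happens only once.
    public static final AtomicInteger connectionsOpened = new AtomicInteger();

    // Hypothetical stand-in for an expensive database handle (e.g. an embedded neo4j instance).
    public static class Database {
        Database() {
            connectionsOpened.incrementAndGet();
        }

        String query(String prefix) {
            return "results for " + prefix; // placeholder for a real lookup
        }

        void shutdown() {
            // A real database would release its locks and files here.
        }
    }

    // Plays the role of ServletContextListener.contextInitialized: runs once at startup.
    public static void contextInitialized() {
        appContext.put("db", new Database());
    }

    // Plays the role of ServletContextListener.contextDestroyed: runs once at shutdown.
    public static void contextDestroyed() {
        Object db = appContext.remove("db");
        if (db instanceof Database) {
            ((Database) db).shutdown();
        }
    }

    // Plays the role of a per-request servlet method: it fetches the shared handle instead of opening a new one.
    public static String handleRequest(String prefix) {
        Database db = (Database) appContext.get("db");
        return db.query(prefix);
    }

    public static void main(String[] args) {
        contextInitialized();
        for (String typed : new String[] {"n", "ne", "neo"}) {
            System.out.println(handleRequest(typed));
        }
        System.out.println("connections opened: " + connectionsOpened.get()); // 1, not 3
        contextDestroyed();
    }
}
```

The design choice is the one described above: the cost of opening the database is paid once per application start instead of once per HTTP request, and the database’s own caches stay warm between requests.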
Anyway have fun learning about the context listener.

If you have any suggestions, comments or thoughts, or even know of some solid benchmarks about caching with the ServletContext (I did a quick web search for a few minutes and didn’t find any), feel free to contact me and discuss this!

Building an Autocomplete Service in GWT screencast Part 4: Integrating the neo4j Data base
https://www.rene-pickhardt.de/building-an-autocomplete-service-in-gwt-screencast-part-4-integrating-the-neo4j-data-base/ (Thu, 20 Jun 2013 12:38:46 +0000)

In this screencast of my series I explain, at a very basic level, how to integrate a database to pull data for autocomplete queries. Since we were working with neo4j at the time, I used a neo4j database. Only in the next two parts of this series do I introduce an efficient way of handling the database (using the context listener of the web server) and building fast indices. So in this lesson the resulting autocomplete service will be really slow and impractical to use, but I am sure that for didactic reasons it is OK to invest 7 minutes in a rather bad design.
Anyway, if you want to use the same data set I used in this screencast, you can find it at http://data.related-work.net together with a description of the database schema.

Building an Autocomplete Service in GWT screencast Part 3: Getting the Server code to send a basic response
https://www.rene-pickhardt.de/building-an-autocomplete-service-in-gwt-screencast-part-3-getting-the-server-code-to-send-a-basic-response/ (Mon, 17 Jun 2013 12:20:11 +0000)

In this screencast of my series on building an autocomplete service, you will learn how to implement a server servlet in GWT so that autocomplete queries receive a response. In this video the response is always static and very naive. Only in the fourth part of this series, which will follow this week, will the server do something meaningful with the query. This part rather shows how the server is supposed to be invoked and which tools and classes are needed. So see it as preparation for the really interesting stuff.

If you have any questions, suggestions and comments feel free to discuss them.

Create a Screencast in Ubuntu with recordmydesktop and do Soundengineering and post production
https://www.rene-pickhardt.de/create-a-screencast-in-ubuntu-with-recordmydesktop-and-do-soundengineering-and-post-production/ (Fri, 17 May 2013 18:36:55 +0000)

I promised to create some screencasts about the autocomplete service with GWT based on neo4j. I have now created all of them, but I had to go through quite some hassle to do so.
So let me share my experience and the toolchain I used to produce a somewhat acceptable screencast. I am still not happy with the quality of the results, so if you have any suggestions, feel free to tell me.
First of all, download and install recordmydesktop. Once this is done, you can record a screencast by running the following command:
recordmydesktop --quick-subsampling --full-shots --no-shared --v_bitrate 2000000 --on-the-fly-encoding
Bear in mind that you should have at least a dual-core processor if you really want to encode the video on the fly.
After creating this screencast I realized that my audio and video tracks were not in sync. Even worse, the video track was always a few seconds shorter than the audio track. At the end of the video, while the audio wasn’t finished yet, the last video frame would just stay on screen. This might be alright for a 20-second screencast, but some of my screencasts were up to 14 minutes long, and that really sucked, because speech and video were simply not in sync anymore.
In another screencast that I made today, the sound was not loud enough. So here is the stack of software that works fine with the corrupt .ogv files produced by recordmydesktop.
After searching for quite a while and trying several programs (pitivi, Avidemux, oggtools(!), …) that all crashed on the .ogv file that came out of recordmydesktop, I realized that with ffmpeg2theora I could actually extract the audio and video into separate files. To do so, just enter:
ffmpeg2theora video.ogv --noaudio
ffmpeg2theora video.ogv --novideo

The problem was the audio file. I was able to play it with my audio player, but again I could not open it with any editing software except oggconvert!
With this little tool I was able to open the corrupt .oga audio file and save it as an .ogg file (with the Vorbis audio codec!).
Now that I had a non-corrupted audio file, I was able to adjust its speed to the length of the recorded video with Audacity. Easy instructions on how to do this are available at http://wiki.audacityteam.org/wiki/Change_Speed. Audacity also provided the option to amplify the volume of the sound.
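The number to enter in Audacity’s Change Speed effect follows from the two lengths. Writing A for the audio length and V for the video length (both in seconds, measured by you), the audio has to play A/V times as fast, so the percent change p is given below; the 840 s / 835 s values are only an illustration of a 14-minute recording, not measurements from my files:

```latex
p = \left(\frac{A}{V} - 1\right) \times 100\,\%
\qquad\text{e.g.}\quad
A = 840\,\mathrm{s},\; V = 835\,\mathrm{s}
\;\Rightarrow\;
p = \left(\frac{840}{835} - 1\right) \times 100\,\% \approx 0.60\,\%
```

A positive value speeds the audio up and shortens it. Change Speed shifts the pitch together with the tempo; for corrections well under one percent the shift is small, and Audacity’s Change Tempo effect is the alternative if the pitch must stay untouched.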
Finally I had two files of the same length that just needed to be merged again. The oggTools again failed to help me with this, but pitivi is able to do the job.
So you see: if you stay patient, you will reach your goal. Some of the screencasts are already published, and more will come. I hope you enjoyed reading about my experiences and solutions.
By the way, on Wikipedia (Wikimedia Commons) there are quite a few tutorials on how to edit Ogg videos:
http://commons.wikimedia.org/wiki/Help:Converting_video

Building an Autocompletion on GWT screencast Part 2: Invoking The Remote Procedure Call
https://www.rene-pickhardt.de/building-an-autocompletion-on-gwt-screencast-part-2-invoking-the-remote-procedure-call/ (Tue, 12 Mar 2013 07:25:00 +0000)

Hey everyone, after my first screencast in this series, which reviewed the basic process of creating remote procedure calls in GWT, we are now finally starting with the real tutorial on building an autocomplete service.
This tutorial (again hosted on Wikipedia) covers the basic user interface, namely:

  • how to integrate a SuggestBox instead of a text field into the GWT starter project
  • how to set up the necessary parts (extending a SuggestOracle) to fire a remote procedure call that requests suggestions whenever the user has typed something
  • how to override the necessary methods of the SuggestOracle abstract class
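Whatever the SuggestOracle sends to the server, the server finally has to answer one question: which known terms start with the text typed so far? A minimal stdlib-only Java sketch of that lookup, which is my illustration and not the index built in the screencasts, keeps the terms in a sorted TreeSet. All strings sharing a prefix form one contiguous range of a sorted set, so subSet returns exactly those terms without scanning the whole collection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class PrefixIndex {

    // Terms are kept sorted; lower-casing makes the lookup case-insensitive.
    private final NavigableSet<String> terms = new TreeSet<>();

    public void add(String term) {
        terms.add(term.toLowerCase());
    }

    // Returns up to 'limit' indexed terms that start with 'prefix', in alphabetical order.
    public List<String> suggest(String prefix, int limit) {
        String from = prefix.toLowerCase();
        List<String> result = new ArrayList<>();
        if (from.isEmpty()) {
            return result; // nothing typed yet: no suggestions
        }
        // Every term starting with 'from' sorts at or after 'from' and before from + '\uffff'
        // (assuming terms never contain the noncharacter U+FFFF).
        for (String term : terms.subSet(from, true, from + Character.MAX_VALUE, false)) {
            if (result.size() == limit) {
                break;
            }
            result.add(term);
        }
        return result;
    }

    public static void main(String[] args) {
        PrefixIndex index = new PrefixIndex();
        for (String t : new String[] {"neo4j", "network", "neural", "new york", "graph"}) {
            index.add(t);
        }
        System.out.println(index.suggest("ne", 3)); // [neo4j, network, neural]
    }
}
```

In a GWT setup, a method like suggest would sit behind the remote procedure call on the server, and the client-side SuggestOracle would turn the returned strings into suggestions.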

So here is the second part of the screencast, which you can of course also download directly from Wikipedia:

Feel free to ask questions, give comments and improve the screencast!

Building an Autocompletion on GWT screencast Part 1: Getting Warm – Reviewing remote procedure calls
https://www.rene-pickhardt.de/building-an-autocompletion-on-gwt-screencast-part-1-getting-warm-reviewing-remote-procedure-calls/ (Tue, 19 Feb 2013 09:11:29 +0000)

Quite a while ago I promised to create some screencasts on how to build a (personalized) autocompletion in GWT. Even though the screencasts were created quite some time ago, I had to wait to publish them for various reasons.
Now it is finally time to go public with the first video. I really start from scratch, so the first video might be a little boring, since I only review GWT’s remote procedure calls.
A little note: the video is hosted on Wikipedia! I think it is important to spread knowledge under a Creative Commons licence, whereas the YouTubes, Vimeos, … of this world rather try to create vendor lock-in. So if the embedded player does not work well, you can go directly to Wikipedia for a fullscreen version or a direct download of the video.

Another note: I did not publish the source code! There is a pretty simple reason for this (and yes, you can call me crazy): if you really want to learn something, copying and pasting code doesn’t give you a full understanding. Doing it step by step, e.g. watching the screencasts and reproducing the steps, is the way to go.
As always, I am open to suggestions and feedback, but please keep in mind that the entire series of videos has already been recorded.

Experiences on semantifying a Mediawiki for the biggest recource about Chinese rock music: rockinchina .com https://www.rene-pickhardt.de/experiences-on-semantifying-a-mediawiki-for-the-biggest-recource-about-chinese-rock-music-rockinchina-com/ https://www.rene-pickhardt.de/experiences-on-semantifying-a-mediawiki-for-the-biggest-recource-about-chinese-rock-music-rockinchina-com/#comments Mon, 07 Jan 2013 09:38:45 +0000 http://www.rene-pickhardt.de/?p=1486 During my trip in China I was visiting Beijing on two weekends and Maceau on another weekend. These trips have been mainly motivated to meet old friends. Especially the heads behind the biggest English resource of Chinese Rock music Rock in China who are Max-Leonhard von Schaper and the founder of the biggest Chinese Rock Print Magazin Yang Yu. After looking at their wiki which is pure gold in terms of content but consists mainly of plain text I introduced them the idea of putting semantics inside the project. While consulting them a little bit and pointing them to the right resources Max did basically the entire work (by taking a one month holiday from his job. Boy this is passion!).
I am very happy to announce that the data of Rock in China is published as linked open data and the process of semantifying the website is in great shape. In the following you can read about Max's experiences doing the work. This is particularly interesting because Max has no scientific background in semantic technologies, so we can learn a lot about how to improve these technologies to make them ready to be used by everybody:

Max's report on semantifying

Max-Leonhard von Schaper in Beijing.
To summarize: for a non-scientific greenhorn experimenting with Semantic MediaWiki and semantic data principles in general, a good two months were required to bring our system to where it is today. As easy as it seems at the beginning, there is still a lot of manual coding and changing to be done, as well as trial and error to understand how the new system works.
Apart from the great learning experience and the availability of our data in RDF format, our own website grew in the process by more than 1000 content pages (from about 4000 to above 5000), added over 10000 real property triples and gained an additional 300 thousand pageviews.
Lessons learnt, in brief:

  • DBpedia resources must be linked via the “resource” URI, not the “page” URI
  • SMW requires the prefix “foaf:”, “mo:” or similar for EACH imported property
  • Check Special:ExportRDF early to see whether your properties work
  • Properties / predicates: SMW makes no difference between them
  • Getting data into Freebase depends on backlinks and sameas links to other ontologies, as well as on entering data in semantic search engines
  • Forms for user data entry are very important!
  • As a non-scientific person I would not have been able to implement this without feedback.
  • DBpedia and the Music Ontology are NOT interlinked with sameas (as checked on sameas.org).
  • Factbox only works with the standard skin (monoskin). For other skins one has to include it in the PHP code oneself.

Main article

The online wiki Rock in China has been online for a number of years and focuses on Chinese underground music. Before we started implementing Semantic MediaWiki, our wiki had roughly 4000 content pages, including over 1800 artists and 900 records. We used a number of templates for bands, CDs, venues and labels, but apart from numerous categories and the DynamicPageList (DPL) extension for a few joins, we were not able to make tangible use of the available data.
DPL example of a JOINT (intersection) between two wiki categories:

<DynamicPageList>
category = Metal Artists
category = Beijing Artists
mode     = ricstyle
order  = ascending
</DynamicPageList>
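Conceptually, the DPL query above is a set intersection over category members: a page is listed only if it belongs to every given category. A toy sketch in Python, with hypothetical page titles; sorting by title is our assumption about what `order = ascending` produces, and the custom `ricstyle` display mode is ignored:

```python
from typing import Dict, List, Set

def category_intersection(members: Dict[str, Set[str]], categories: List[str]) -> List[str]:
    """Titles that belong to every listed category, sorted ascending."""
    if not categories:
        return []
    result = set(members.get(categories[0], set()))
    for cat in categories[1:]:
        result &= members.get(cat, set())  # keep only pages also in this category
    return sorted(result)

# Hypothetical category membership, for illustration only.
members = {
    "Metal Artists": {"Band A", "Band C", "Band D"},
    "Beijing Artists": {"Band A", "Band B", "Band C"},
}
print(category_intersection(members, ["Metal Artists", "Beijing Artists"]))  # ['Band A', 'Band C']
```

This is also the limit of the category approach: anything beyond "member of category X and Y" has to be encoded as yet another category.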

Results of a simple mashup query: display venues in Beijing on a Google Map

After an interesting discussion with Rene on the benefits of semantic data and Linked Open Data, we decided to go semantic. As total greenhorns in the field and with only limited programming skills available at short notice, we started by googling the key terms and quickly came across the websites of the Music Ontology and Semantic MediaWiki, which we decided to install.
Being an electrical engineer with a basic IT background and many years of working on the web with PHP, HTML, Joomla and MediaWiki, I still found it a challenge to get used to the new semantic way of talking and to understand the principles behind it. Not so much because there aren't enough tutorials or information on the web, but because the guiding principle is somewhere other than where I was looking. Without the help of Rene and several feedback discussions I don't think it would have been possible for us to implement this system within the month that it took us.
Our first difficulty (after getting the extension onto our FTP server) was upgrading our existing MediaWiki from version 1.16 to version 1.19. This upgrade used up the better part of two days, including updating all other extensions as well (five of which no longer worked at all, as they are no longer being developed), and finally getting our first semantic property running.
When we started implementing the semantic approach, I read a lot online about the various ontologies available and checked the Music Ontology intensively. However, the Music Ontology turned out to be a poor fit for our wiki: it goes more into the musical creation process, whereas Rock in China describes developments in the scene. All our implementation steps were tracked on the wiki page Rock in China – Semantic Approach, so that other team members could understand the current process, and to document workarounds and problems.
Our first test class was Venue, a category containing 40 – 50 live houses in China with varying levels of detail, which we could put into the following template, SemanticVenue:

{{SemanticVenue
|Image=
|ImageDescription=
|City=
|Address=
|Phone=
|Opened=
|Closed=
|GeoLocation=
}}

As can be seen from the template above, both predicates (City, pointing to another wiki page) and properties (Opened, holding a plain value) are proposed for the semantic class VENUE. Semantic MediaWiki implements this decisive difference in a very user-friendly way by setting the TYPE of each SMW property either to PAGE or to something else. As good as this is, it can cause confusion when talking with someone else about the semantic concept in general.
A major problem was the integration of external ontologies, which was not sufficiently documented on the Semantic MediaWiki pages, most probably due to a change between versions. Referencing the URIs was especially problematic. According to the Semantic MediaWiki documentation aliases would be allowed; however, through trial and error we found that only properties with a namespace prefix, e.g. foaf:phone or owl:sameas, were correctly recognized. We used the Special:ExportRDF page to find most of these errors: every time our URI referencing was wrong, we got a parser function error.
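To make the predicate/property distinction concrete, here is a small Python sketch that turns SemanticVenue fields into (subject, property, object) triples. The type assignments and the venue values are hypothetical illustrations, not taken from the wiki: a Page-typed value becomes a reference to another wiki page, everything else stays a quoted literal.

```python
from typing import Dict, List, Tuple

# Hypothetical type assignments for some SemanticVenue fields.
PROPERTY_TYPES: Dict[str, str] = {
    "City": "Page",     # value names another wiki page -> link between two resources
    "Opened": "Date",   # plain value -> literal
    "Address": "Text",
}

def to_triples(subject: str, fields: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Page-typed values become 'wiki:<Title>' references; others become quoted literals."""
    subj = "wiki:" + subject.replace(" ", "_")
    triples = []
    for prop, value in fields.items():
        if not value:
            continue  # empty template parameters produce no triple
        if PROPERTY_TYPES.get(prop) == "Page":
            obj = "wiki:" + value.replace(" ", "_")
        else:
            obj = '"' + value + '"'
        triples.append((subj, prop, obj))
    return triples

# Made-up venue data, for illustration only.
venue = {"City": "Beijing", "Opened": "2005", "Address": ""}
for t in to_triples("Example Venue", venue):
    print(t)
```

Only the City triple links two resources; the Opened triple ends in a literal, which is exactly the difference SMW hides behind the property TYPE.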
First, the wrong way for the following two wiki pages:

  • Mediawiki:smw_import_mo
  • Property:genre

Mediawiki:smw_import_mo:

http://purl.org/ontology/mo/ |[http://musicontology.com/ Music Ontology Specification]
activity_end|Type:Date
activity_start|Type:Date
MusicArtist|Category
genre|Type:Page
Genre|Category
track|Type:String
media_type|Type:String
publisher|Type:Page
origin|Type:Page
lyrics|Type:Text
free_download|Type:URL

Property:genre:

[[Has type::Page]][[Imported from::mo:genre]]

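And now the correct way, as it actually has to be implemented to work. Note the two differences: there is no space between the namespace URI and the pipe on the import page, and the property page carries the prefix in its name (Property:mo:genre instead of Property:genre):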
Mediawiki:smw_import_mo:

http://purl.org/ontology/mo/|[http://musicontology.com/ Music Ontology Specification]
activity_end|Type:Date
activity_start|Type:Date
MusicArtist|Category
genre|Type:Page
Genre|Category
track|Type:String
media_type|Type:String
publisher|Type:Page
origin|Type:Page
lyrics|Type:Text
free_download|Type:URL

Property:mo:genre:

[[Has type::Page]][[Imported from::mo:genre]]
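The import page is essentially a small lookup table: a namespace URI on the first line, then one `name|Type` entry per line. A toy Python parser can illustrate how a prefixed name such as `mo:genre` expands to a full URI. This is a hypothetical sketch of the format shown above, not Semantic MediaWiki's own parser; the "only prefixed names resolve" rule mirrors the behaviour we observed rather than SMW's code.

```python
from typing import Dict, Tuple

IMPORT_PAGE = """\
http://purl.org/ontology/mo/|[http://musicontology.com/ Music Ontology Specification]
activity_end|Type:Date
MusicArtist|Category
genre|Type:Page
origin|Type:Page
"""

def parse_import_page(text: str) -> Tuple[str, Dict[str, str]]:
    """Return (namespace URI, {local name: declared type})."""
    lines = [line for line in text.splitlines() if line.strip()]
    # First line: "<namespace URI>|<link label>". Strip defensively so a stray
    # space before the pipe cannot end up inside every expanded URI.
    namespace = lines[0].split("|", 1)[0].strip()
    entries: Dict[str, str] = {}
    for line in lines[1:]:
        name, kind = line.split("|", 1)
        entries[name.strip()] = kind.strip()
    return namespace, entries

def resolve(prefixed: str, prefix: str, namespace: str, entries: Dict[str, str]) -> str:
    """Expand e.g. 'mo:genre'; unprefixed or undeclared names are rejected."""
    pre, sep, local = prefixed.partition(":")
    if not sep or pre != prefix or local not in entries:
        raise KeyError(f"{prefixed!r} is not an imported {prefix}: property")
    return namespace + local

ns, decl = parse_import_page(IMPORT_PAGE)
print(resolve("mo:genre", "mo", ns, decl))  # http://purl.org/ontology/mo/genre
print(decl["genre"])                        # Type:Page
```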

The ontology that caused the most problems was DBpedia, whose documentation did not tell us the correct URI. Luckily the mailing list provided support and we learned the correct URI:

http://www.dbpedia.org/ontology/

With that information, we were able to implement semantic properties for a number of classes and start updating our wiki pages to get their data into our semantic database.
To use semantic properties within a wiki there are a number of extensions available, such as Semantic Forms, Semantic Result Formats and Semantic Maps. The benefits we gained were tremendous. For example, the original JOINT query shown with DPL at the beginning of this post could now be expressed with the following ASK query:

{{#ask: [[Category:Artists]] [[mo:origin::Beijing]]
|format=list
}}

This time, with the major benefit that the <references/> tag is NOT broken by placing the inline query on a page. DynamicPageList breaks <references/>, so a lot of information is lost. Another example of how we benefitted from semantics: previously we could only use categories and read information by joining one or two of them, e.g. artist pages categorized both as BEIJING artists and as METAL artists. Now, with semantic properties, we had a lot more data to play with and could create mashup pages such as ROCK or Category:Records, on which we were able to show random videos from any ROCK artist or include a TIMELINE view of released records.
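The ASK query combines a category condition with a property condition. A toy Python model of what it selects (hypothetical pages; this illustrates the query's meaning, not how SMW evaluates it):

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class Page:
    title: str
    categories: Set[str]
    properties: Dict[str, Set[str]] = field(default_factory=dict)

def ask(pages: List[Page], category: str, prop: str, value: str) -> List[str]:
    """Titles matching [[Category:<category>]] [[<prop>::<value>]], sorted for stable output."""
    return sorted(
        p.title
        for p in pages
        if category in p.categories and value in p.properties.get(prop, set())
    )

# Hypothetical wiki content, for illustration only.
pages = [
    Page("Band A", {"Artists"}, {"mo:origin": {"Beijing"}}),
    Page("Band B", {"Artists"}, {"mo:origin": {"Shanghai"}}),
    Page("Some Venue", {"Venues"}, {"City": {"Beijing"}}),
]
print(ask(pages, "Artists", "mo:origin", "Beijing"))  # ['Band A']
```

Unlike the category intersection, the condition on `mo:origin` reads a typed value stored on the page, so new filters need no new categories.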

Mashup Page with a suitable video

With the help of the mailing list of Semantic Mediawiki itself (which was of great help when we were struggling) we implemented inline queries using templates to avoid later data changes on multiple pages. That step taken, the basic semantic structures were set up at our wiki and it was time for our next step: Bringing the semantic data of our wiki to others!
And here we are, asking ourselves: how will Freebase or DBpedia actually find our data? How will they include it? Discussing this with Rene, a few structural problems became apparent. Being used to working with Wikipedia, we usually set the property

Owl:sameas (or sameas)

on many of our pages, pointing directly to Wikipedia pages.
However, we learnt that the property

foaf:primaryTopic

is a better and more accurate property for this. The sameas property should be used for semantic RDF pages, i.e. the respective DBpedia RESOURCE page (not the PAGE page). Luckily we had already implemented the sameas property mostly in templates, so it was easy enough to exchange the properties.
Having figured out this issue, we checked Freebase as well as other sites such as DBpedia and MusicBrainz, but there seems to be no “submit RDF” form. Hence we decided that the best way to get recognized in the Semantic Web is to include more links to other RDF resources; e.g. for our Category:Artists we set sameas links to DBpedia and the Music Ontology. For DBpedia we linked to the class, and for the Music Ontology to the URI of the class.
As a side note: when checking on sameas.org, it seems that the Music Ontology is NOT yet cross-linked to DBpedia.
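Since sameas should point to the DBpedia RESOURCE URI rather than the human-readable PAGE URL, a small helper can rewrite links before they go into templates. A minimal sketch, assuming the usual layout `http://dbpedia.org/page/<Name>` versus `http://dbpedia.org/resource/<Name>`; the function name is hypothetical:

```python
from urllib.parse import urlsplit, urlunsplit

def dbpedia_resource_uri(url: str) -> str:
    """Rewrite a DBpedia /page/ URL to its /resource/ URI; leave other URLs unchanged."""
    parts = urlsplit(url)
    prefix = "/page/"
    if parts.path.startswith(prefix):
        path = "/resource/" + parts.path[len(prefix):]
        return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
    return url  # already a resource URI (or not a DBpedia page URL)

print(dbpedia_resource_uri("http://dbpedia.org/page/Beijing"))
# http://dbpedia.org/resource/Beijing
```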
Following the recommendations set forth at Sindice, we changed our robots.txt to include our semantic sitemap(s):

Sitemap: http://www.music-china.org/wiki/index.php?title=Special:RecentChanges&feed=atom
Sitemap: http://www.rockinchina.com/wiki/index.php?title=Special:RecentChanges&feed=atom
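To check that crawlers can pick up these lines, Python's standard-library robots.txt parser can read them back (`RobotFileParser.site_maps()` requires Python 3.8+). The `User-agent` and `Disallow` lines below are assumptions added to make a complete file; only the two `Sitemap` lines come from our actual robots.txt:

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content; only the Sitemap lines are taken from our setup.
ROBOTS_TXT = """\
User-agent: *
Disallow:
Sitemap: http://www.music-china.org/wiki/index.php?title=Special:RecentChanges&feed=atom
Sitemap: http://www.rockinchina.com/wiki/index.php?title=Special:RecentChanges&feed=atom
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())
for url in parser.site_maps() or []:  # site_maps() returns None when no Sitemap lines exist
    print(url)
```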

As a next step, we analyzed how we could include external data in our SMW, e.g. from MusicBrainz or YouTube. As a music-oriented site, we were particularly interested in YouTube. We found the External Data extension, which we could use to connect to the Google API:

{{#get_web_data:
url=https://www.googleapis.com/youtube/v3/search?part=snippet&q=carsick+cars&topicId=%2Fm%2F03cmgbv&type=video&key=Googlev3API&maxResults=50
|format=JSON
|data= videoId=videoId,title=title
}}

And

{{#for_external_table:
{{Youtube|ID={{{videoId}}}|title={{{title}}} }}<br/>
{{{videoId}}} and {{{title}}}<br/>
}}

See our internal TESTPAGE for the live example.
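Written out in Python, the #get_web_data / #for_external_table pair roughly does the following: read `videoId` and `title` from a YouTube Data API v3 search response and emit one `{{Youtube}}` template call per video. This is a sketch of the data flow, not the extension's implementation; the response below is a made-up, trimmed sample in the v3 `search` shape (`items[].id.videoId`, `items[].snippet.title`):

```python
import json
from typing import List, Tuple

# Made-up, trimmed sample response; real responses contain many more fields.
SAMPLE_RESPONSE = """
{"items": [
  {"id": {"kind": "youtube#video", "videoId": "VIDEO_ID_1"}, "snippet": {"title": "First video"}},
  {"id": {"kind": "youtube#video", "videoId": "VIDEO_ID_2"}, "snippet": {"title": "Second video"}}
]}
"""

def extract_videos(raw: str) -> List[Tuple[str, str]]:
    """Return (videoId, title) pairs; items without a videoId are skipped."""
    data = json.loads(raw)
    videos = []
    for item in data.get("items", []):
        video_id = item.get("id", {}).get("videoId")
        title = item.get("snippet", {}).get("title", "")
        if video_id:
            videos.append((video_id, title))
    return videos

def render_templates(videos: List[Tuple[str, str]]) -> str:
    # One {{Youtube|ID=...|title=...}} call per row, as in #for_external_table.
    return "\n".join(f"{{{{Youtube|ID={vid}|title={title}}}}}<br/>" for vid, title in videos)

print(render_templates(extract_videos(SAMPLE_RESPONSE)))
```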
YouTube uses its in-house Freebase ID system to generate auto-channels filled with official music videos of bands and singers. The Freebase ID can be found on the individual Freebase RESOURCE page after pressing the EDIT button. Alternatively, one could use the Google API to retrieve the ID, but would need a YouTube-internal HC ID beforehand. Easy implementation for our wiki: include the Freebase ID as a semantic property on artist pages within our Definitions template:

{{Definitions
|wikipedia=
|dbpedia=
|freebase=
|freebaseID=
|musicbrainz=
|youtubeautochannel=
}}
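With the Freebase ID stored on the artist page, the search URL used in #get_web_data can be assembled from its parts. A minimal Python sketch: `youtube_search_url` is a hypothetical helper, the parameters are the ones in our query above, and `Googlev3API` is the placeholder key from that query (substitute a real API key):

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"

def youtube_search_url(artist: str, freebase_id: str, api_key: str, max_results: int = 50) -> str:
    params = {
        "part": "snippet",
        "q": artist,             # spaces are encoded as '+'
        "topicId": freebase_id,  # '/m/...' is percent-encoded to '%2Fm%2F...'
        "type": "video",
        "key": api_key,
        "maxResults": max_results,
    }
    return SEARCH_ENDPOINT + "?" + urlencode(params)

print(youtube_search_url("carsick cars", "/m/03cmgbv", "Googlev3API"))
```

For Carsick Cars this reproduces exactly the URL in the #get_web_data call, so the template only needs to pass the artist name and the freebaseID value.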

Voilà: with the additional SQL-based caching of query results (e.g. JSON), our API load on Google is extremely low, and pages on our wiki load faster. Using this method we increased our saved YouTube ID tags from the original 500 to well over 1000 within half a day.
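The caching idea itself is simple: store each API response in a table keyed by the request URL and only call the API again when the entry is missing or too old. A minimal SQLite sketch, assuming a one-day time-to-live; the table and function names are hypothetical, and this is not the External Data extension's own cache code:

```python
import sqlite3
import time
from typing import Callable, List

def get_cached(conn: sqlite3.Connection, url: str, fetch: Callable[[str], str],
               ttl_seconds: float = 24 * 3600) -> str:
    """Return the body for url, calling fetch() only on a miss or a stale entry."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS api_cache "
        "(url TEXT PRIMARY KEY, body TEXT, fetched_at REAL)"
    )
    row = conn.execute(
        "SELECT body, fetched_at FROM api_cache WHERE url = ?", (url,)
    ).fetchone()
    if row is not None and time.time() - row[1] < ttl_seconds:
        return row[0]  # fresh hit: no request to the API
    body = fetch(url)  # miss or expired: one real request
    conn.execute(
        "INSERT OR REPLACE INTO api_cache (url, body, fetched_at) VALUES (?, ?, ?)",
        (url, body, time.time()),
    )
    conn.commit()
    return body

# Usage with a stand-in fetch function that just counts calls.
calls: List[str] = []

def fake_fetch(url: str) -> str:
    calls.append(url)
    return '{"items": []}'

conn = sqlite3.connect(":memory:")
first = get_cached(conn, "https://www.googleapis.com/youtube/v3/search?q=example", fake_fetch)
second = get_cached(conn, "https://www.googleapis.com/youtube/v3/search?q=example", fake_fetch)
print(len(calls))  # 1: the second lookup was served from the cache
```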

A big variety of videos for an act like Carsick Cars is now available thanks to semantifying

With these structures in place it was time to inform the people in our community not only on the changes that have been made but also on the additional benefits and possibilities. We used our own blog as well as our Facebook page and Facebook group to spread the word.

]]>