metalcon – Data Science, Data Analytics and Machine Learning Consulting in Koblenz Germany
https://www.rene-pickhardt.de
Extract knowledge from your data and be ahead of your competition
Tue, 17 Jul 2018 12:12:43 +0000

My ranked list of priorities for Backend Web Programming: Scalability > Maintainable code > performance
https://www.rene-pickhardt.de/my-ranked-list-of-priorities-for-backend-web-programming-scalability-maintainable-code-performance/
Sat, 27 Jul 2013 09:21:55 +0000

The redevelopment of metalcon is under way, and so far I have been very concerned about performance and web scale. Because of Martin's progress on his bachelor thesis, we did a code review of his code for calculating Generalized Language Models with Kneser-Ney smoothing. Even though his code is standalone (but very performance-sensitive) software, I realized that for a web application, writing maintainable code seems to be as important as thinking about scalability.

Scaling vs performance

I am a performance guy. I love algorithms and data structures. When I was 16 years old, I had already written a program that could play chess against you, using a high-performance programming technique called bitboards.
But thinking about metalcon, I realized that web scale is not so much about the performance of single services or parts of the software but rather about the scalability of the entire architecture. After many discussions with colleagues, Heinrich Hartmann and I came up with a software architecture that we believe will scale for websites that need to handle several million monthly active users (probably not billions, though). After I discussed the architecture with my team of developers, Patrik wrote a nice blog article about the service-oriented, data-denormalized architecture for scalable web applications (which, of course, was not invented by Heinrich and me: Patrik found out that it had already been described in a WWW publication from 2008).
Anyway, this discussion showed me that scalability is more important than performance! That said, the standalone services should still be very performant: if a service can only handle 20 requests per second, you will simply need too many machines, even if it scales horizontally with ease.
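A back-of-the-envelope sketch of why per-service throughput still matters: with horizontal scaling, the number of machines grows roughly as peak load divided by the throughput of a single machine. The peak load of 2000 requests per second below is a made-up assumption for illustration, not a metalcon figure, and the formula ignores headroom and redundancy.

```python
import math


def machines_needed(peak_rps: float, rps_per_machine: float) -> int:
    """Machines required to absorb a peak load (no headroom, no redundancy)."""
    if rps_per_machine <= 0:
        raise ValueError("throughput per machine must be positive")
    return math.ceil(peak_rps / rps_per_machine)


ASSUMED_PEAK_RPS = 2000  # hypothetical peak load, not a measured value

for throughput in (20, 200, 1000):
    print(throughput, machines_needed(ASSUMED_PEAK_RPS, throughput))
# prints: "20 100", then "200 10", then "1000 2"
```

At 20 requests per second per machine, the same load needs fifty times as many machines as at 1000, which is the cost the paragraph above warns about.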

Performance vs. Maintainable code

Especially after the code review, but also with the currently running metalcon version in mind, I came to the conclusion that maintainable code has incredibly high value. The hacker community seems to agree that maintainability comes before performance (this is only one of many examples).
At this point I want to recall my initial post on the redesign of metalcon. Back then I assumed that performance is the same as scaling (which is wrong) and asked about Ruby on Rails vs. GWT. I am totally convinced that GWT is much more performant than Ruby. But I have seen GWT code, and it seems almost unmaintainable. On the other hand, from all that I know, Ruby on Rails is very easy to maintain, though less performant. The good thing is that it scales horizontally with ease, so using Ruby on Rails rather than GWT for the front end and middle layer of metalcon seems almost like a no-brainer.

Maintainable code vs scalability

Now comes the most interesting thing I realized. A software architecture scales best if it consists of many independent services. If services need to interact, the interaction should be asynchronous and non-blocking. Creating a clear software architecture with clear communication protocols between its parts will do two things for you:

  1. It will help you maintain the code. This cuts development cost and time. In particular, it will be easy to add, remove or exchange functionality anywhere in the software architecture. The last point is crucial since
  2. Being able to easily exchange parts of the software or single services will help you scale. Every time you identify a bottleneck, you can fix it by replacing that part of the software with a better-performing system.
To achieve scalable code, one needs to include a middle layer for caching and to abstract certain things. The same is done to get maintainable code (often at the cost of performance).
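The idea that a clear contract makes parts exchangeable and leaves room for a caching middle layer can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not metalcon's actual code: `SimilarBandsService`, `LookupTableBackend`, `CachingLayer` and `render_band_page` are hypothetical names, the cache never expires, and the asynchronous, non-blocking communication between services is left out entirely.

```python
from typing import Dict, List, Protocol


class SimilarBandsService(Protocol):
    """Hypothetical contract between the front end and a backend service."""

    def similar_bands(self, band_id: int) -> List[int]:
        ...


class LookupTableBackend:
    """Hypothetical backend; any class with the same method can replace it."""

    def __init__(self, table: Dict[int, List[int]]) -> None:
        self.table = table
        self.calls = 0  # how often the (possibly slow) backend is actually hit

    def similar_bands(self, band_id: int) -> List[int]:
        self.calls += 1
        return list(self.table.get(band_id, []))


class CachingLayer:
    """Middle layer: implements the same contract and wraps any backend."""

    def __init__(self, backend: SimilarBandsService) -> None:
        self.backend = backend
        self.cache: Dict[int, List[int]] = {}

    def similar_bands(self, band_id: int) -> List[int]:
        if band_id not in self.cache:
            self.cache[band_id] = self.backend.similar_bands(band_id)
        return list(self.cache[band_id])  # copy so callers cannot mutate the cache


def render_band_page(service: SimilarBandsService, band_id: int) -> str:
    """Front-end code depends only on the contract, not on a concrete backend."""
    return f"band {band_id}: similar to {service.similar_bands(band_id)}"


backend = LookupTableBackend({3006: [1, 2]})
service = CachingLayer(backend)
print(render_band_page(service, 3006))  # band 3006: similar to [1, 2]
print(render_band_page(service, 3006))  # same result, served from the cache
print(backend.calls)                    # 1
```

Replacing a bottleneck then means writing another class with a `similar_bands` method and handing it to `CachingLayer`; `render_band_page` stays untouched. The extra indirection is the kind of abstraction that costs a little performance in exchange for maintainability.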

Summary

I find this very interesting and counterintuitive. One would think that performance is a core element of scalability, but I have the strong feeling that writing maintainable code is much more important. So my ranked list of priorities for backend web programming (!) looks like this:

  1. Scalability first: no amount of maintainable code helps you if the system doesn’t scale and can’t serve millions of users.
  2. Maintainable code: as stated above, this should go almost hand in hand with scalability.
  3. Performance: of course we can’t have a database design where queries take seconds or minutes to run. Everything should happen within a few milliseconds. But if the code can become more maintainable at the cost of a few more milliseconds, I guess that’s a good investment.

Why would musicians use online social networking sites?
https://www.rene-pickhardt.de/why-would-musicians-use-online-social-networking-sites/
Wed, 17 Jul 2013 09:56:45 +0000

For the last 5 years I have been running metalcon, an online social network for metal fans and metal bands.
As I wrote recently, I have the chance to rewrite the entire platform with a team of 6 programmers.
This time we want to do it the right way. Instead of thinking about features right away, we are now thinking about the various stakeholders for whom we are creating metalcon.
Of course, being a fan of metal music, I know pretty well what requirements a social network for metal music should fulfill to add value for me.
Running such a platform and being a member of the In Legend team, I can also think of various requirements from musicians, but I want to open the discussion and ask the musicians themselves:

What do you expect from a social networking site and for what reasons would you use it?

We have already created a small list at:
https://github.com/renepickhardt/metalcon/wiki/requirementsBand
which I am sharing here, and I am asking musicians to contribute to it, either here in the comment section or via the GitHub wiki. Thanks a lot!

Self promotion

Bands want to advertise their music and get a lot of attention.

Music hosting

Bands often lack technical knowledge to host their music. Metalcon can provide them with the functionality to host promotional songs and also share the player on other websites and with other services.

Control

The worst that could happen to a band is that it spends a lot of effort building up its fan base on a social networking site, as bands did in the early 2000s, and then the site becomes irrelevant. Similar problems exist with Facebook, where page owners nowadays have to pay to get their message spread to everybody who liked the fan page.

Therefore a requirement for musicians is to keep control of the fan base they have grown so far.

Contact with fans and street team

Bands want contact with their most important fans and with people who can organize things for them. Having a street team is essential to the success of many bands. A social networking service can help a band achieve this goal.

Privacy

Famous musicians want to use the band profile on a social networking site without revealing their private account.

Staying in contact with other industry players

Musicians might want to stay in contact with partners from

  • labels
  • booking agencies
  • promoters
  • photographers
  • video producers
  • music producers
  • venue owners

Sell products

Bands want the possibility to sell

  • Tickets
  • Merchandise
  • Music (MP3 and CD)

Booking

Bands want the opportunity to book gigs. Helping them get in contact with bookers will support this.

Release management

Bands often have a completely produced record that should be shared with selected players from the industry. In this process the music should be

  • hosted in the web
  • watermarked (to prevent leaking)
  • shared privately with selected partners
  • streamed only from the web

Metalcon finally gets a redesign – Thinking about high scalability
https://www.rene-pickhardt.de/metalcon-finally-becomes-a-redesign-thinking-about-high-scalability/
Mon, 17 Jun 2013 15:21:30 +0000

Finally metalcon.de, the social networking site that Jonas, Jens and I created in 2008, gets a redesign. Thanks to the great opportunities at the Institute for Web Science and Technologies here in Koblenz (why don’t you apply for a PhD position with us?), I will have the chance to code up the new version of metalcon. Kicking off on July 15th, I will lead a team of 5 programmers for 4 months. Not only will the development be open source, but during this time I will also regularly (hopefully daily) write on this blog about the design decisions we take in order to achieve a well-scaling web service.
Before I share my thoughts on highly scalable architectures for websites, I want to give a little history and background on what metalcon is and why this redesign is so necessary:

Metalcon is a social networking site for German fans of metal music. It currently has

  • a user base of 10’000 users.
  • about 500 registered bands
  • a highly semantic and interlinked database (bands, geographical coordinates, friendships, events)
  • 624 MB of text and structured data about the mentioned topics.
  • fairly good visibility in search engines.
  • > 30k lines of code (mostly PHP)
  • a poorly scaling architecture (own OR mapper, own AJAX libraries, big monolithic database design, bad usage of PHP,…)
  • no unit tests (so code maintenance is almost impossible)
  • no music and audio files
  • no processes for content moderation
  • no processes to fight spam and block users
  • really bad usability (I could write tons of posts about where the usability falls short)
  • no clear distinction of features for users to understand

When we built metalcon, no one on the team had experience with highly scalable web applications, and we were just happy to get it running at all. After returning from China and starting my PhD program in 2011, I was about to shut down metalcon. Though we had become close friends, the core team had already moved on to new projects, and we were lacking manpower. On the other hand, everyone kept telling me that metalcon would be a great place to do research. So in 2011 Jonas and I decided to give it another shot and do an open redevelopment. We set up a wiki to document our features and the software, and we created a developer blog which we used to exchange ideas. We also created an open source project to which we hardly contributed any code due to the lack of manpower…
Well, at that time we already knew of too many problems, so fixing the old code was not the way to go. At least we learned a lot. Thinking about highly scalable architectures at that time, I knew that a news feed (which the old version of metalcon already had) was core to the user experience. From reading many Stack Exchange discussions, I knew that you wouldn’t build such a stream on MySQL. Playing around with graph databases like neo4j also led me to my first research paper, in which I built Graphity, a software designed to distribute highly personalized news streams to users. Since our development was not progressing, we never deployed Graphity within metalcon. Building an autocomplete service for the site should not be a problem anymore either.

Roadmap for the redesign

  • Over the next weeks I hope to read as many interesting articles about technologies and high scalability as I can find, and I will be more than happy to get your feedback and suggestions here. I will start by reading many articles on http://highscalability.com/. This blog is pure gold for serious web developers.
  • During a nice discussion about scalability with Heinrich, we already came up with a potential architecture for metalcon. I will introduce this architecture soon, but first I want to check it against the best practices in the High Scalability blog.
  • In parallel I will also collect the features needed for the new metalcon version and hopefully be able to pair them with useful technologies. I already started a wiki page about features and the technologies planned to support them.
  • I will also need to decide on the programming language and paradigms for the development. Right now I am comparing Ruby on Rails and GWT. We have had some great experiences with the power of GWT, but one major drawback is certainly that the website becomes more of an application than a lightweight website.

So again, feel free to give input and to share your ideas and experiences with me and with the community. I will be very grateful for every recommendation of articles, videos, books and so on.

How Tim Berners Lee told me in front of thousand people: “Go geek and do it”
https://www.rene-pickhardt.de/how-tim-berners-lee-told-me-in-front-of-thousand-people-go-geek-and-do-it/
Fri, 20 Apr 2012 12:50:12 +0000
The statement was already tweeted by my colleague Thomas Gottron and retweeted by many others.

I am at the WWW2012 conference, and after the keynote by Neelie Kroes there was a panel discussion with her, Tim Berners-Lee and Gilles Babinet.
The discussion was about the question of whether access to an open internet should be a human right.
Knowing clearly where I stand on this issue (yes, it should be!), I was very happy that this question was discussed in front of such an audience. Tim Berners-Lee obviously agreed on this point, and Neelie Kroes had some really great and very diplomatic insights.
But for some reason the discussion kept drifting to the drawbacks of the web, like copyright infringement. I started to get annoyed by this, especially because it was always framed as free web vs. copyright protection. So I decided to ask a question during the Q&A, which I am now about to blog about.

During the Q&A I also gave a little background on the actual question, but I want to be a bit more detailed on my blog:

  • So yes, I wish for the “open web” to be a human right.
  • And I also think it is really important to protect the copyrights of artists, musicians and other people who create things. Working together with In Legend, I really know how hard it is for a musician to survive, and it is really important that musicians get paid for what they do and share.
  • BUT: the discussion is always an “either/or” discussion and goes in the wrong direction! Bastian Emig from In Legend is very open-minded about new ways of making the web work for the musician. Already in the plenary session, Tim pointed out that he did not invent the Web to harm the record industry. Rather, it is the record industry that refuses to think about new business models and just wishes everything to stay the old way, which used to work quite well for it.
  • My experience is that a band still needs a record label. You don’t get booked without a label. You don’t get articles in big print magazines. The label gives you credibility within the industry, and without it you are not seen by many people. And so on…
  • But in my experience the record label also does a musician great harm. As a member of this band I want to share our music on the web. Since piracy exists, which I can’t change, I just have to think about how I could profit from it. Obviously, by sharing the music myself I can increase my reach. This could significantly increase my chances for direct marketing (making the record label somewhat obsolete), and this is what the labels seem to be afraid of. The web offers several huge opportunities for musicians to get recognized and become an established act. But labels own the licences and block musicians from making smart moves on the web.
  • I realize this problem exists because labels have a monopoly on the product and too much power while pretending to protect the interests of the artists, thereby hiding the fact that they are just fighting for their own interests, which do not necessarily align with those of the artists.

Here is my question / point

It is not about copyright vs. a free / open internet. It is much more about a new model of copyright that can coexist with a free internet. In this new model, licence owners (e.g. the labels) wouldn’t build those exclusive monopolies that give them so much power. I asked what can be done to establish a new way of thinking about copyright, since it really does not make sense that iTunes gets 50% royalties for a digital distribution that is almost free of cost and that I could easily run myself!
First of all, to my surprise, this won me big applause from the audience, which happened very rarely during the conference.

The full panel and discussion can be found at: http://www2012.wwwconference.org/media/videos/keynote-neelie-kroes/
Gilles, to whom the question was originally directed and who is very friendly to the record industry, answered with something I don’t even remember; he was basically stumbling around.
But then two really great answers came along:
Neelie:
“We are working on this and we see that the biggest issue is the record industry. They pretend to protect the artists and they are not! We need legislation but maybe we need new forms of legislation. Models that worked well in the past may not serve our needs in today’s world. I agree with you that you are pointing to the most crucial point in this discussion.”
Totally satisfied with her answer, I sat down, but Tim Berners-Lee wanted to say something:
“You know it! Think of a world that you want. Just imagine it!

  • What would be the distribution? 
  • What would be the user interface?
  • What would be the processes?
  • What third parties would be involved?

Go out and build it! Talk to the people here. Install an apache server and just go geek and make it happen!”

What a great statement!

It is always nice to have ideas and see solutions to problems. And yes, you can always whine and do nothing. But as a matter of fact, right now the web is still open and free! The technology is there. It really is just a matter of going out and building it. This is what I have always said: this is why big traditional media companies didn’t build the YouTubes, Googles, Facebooks, Twitters, Flickrs,… of this world.
This statement gave me the confidence to believe more strongly in my ideas, and even one day later I really feel that it will change my future life. It is really interesting that a man whom I value a lot tells me something I always felt but hardly acted on, and thereby hits right on one of my weakest points!
After the session I got my copy of Tim Berners-Lee’s book signed, and he asked me to send him an email once my site is up. It is really amazing to receive this kind of feedback from such a great person.
That was one of the most inspiring moments of my life! So anyone who wants to join me in going geek on the next-generation music web app is very welcome to contact me or leave a comment! There really is a lot on my mind, and I have already dreamt a lot and seen what is possible…

Tim Berners-Lee signing my copy of his book Weaving the Web at WWW2012 in Lyon

Amazed by neo4j, gwt and my apache tomcat webserver
https://www.rene-pickhardt.de/amazed-by-neo4j-gwt-and-my-apache-tomcat-webserver/
Thu, 15 Sep 2011 09:56:48 +0000

Edit: the demo is finally online, though on a different data set: check out the demo and read about the new data set. An evaluation of Graphity can be found here.
Besides reading papers, I am currently implementing the infrastructure of my social news stream for the new metalcon version. For the very first time I really used neo4j on a remote web server in a real web application built on GWT. This combined the advantages of all these technologies with our new fast server! After seeing the results I was so excited I almost couldn’t sleep last night!

Setting

I selected a very small bipartite subgraph of metalcon, which means just the fans and bands together with the fanship relation between them. This graph consists of 12’198 nodes (6’870 bands and 5’328 users) and 119’379 edges.

Results

  • For every user I displayed all of their favourite bands.
  • For each of those bands I calculated similar bands (on the fly, during the page request!).
  • This was done by a breadth-first search (depth 2), counting nodes on the fly.

A page load for a random user with 56 favourite bands results in a traversal of 555’372. Including sending the result via GWT over the web, this took about 0.9 seconds!
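The depth-2 traversal described above can be sketched in plain Python on an in-memory adjacency index. This is an illustration under assumptions, not the neo4j code used here: the helper names `build_index`, `similar_bands` and `user_page` are hypothetical, the fanship data is made up, and similarity is taken to be the number of shared fans, which is what counting the bands reached at depth 2 amounts to.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

Fanship = Tuple[str, str]  # (user, band) edge of the bipartite graph


def build_index(fanships: Iterable[Fanship]) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Adjacency sets for both sides of the bipartite fan/band graph."""
    fans_of: Dict[str, Set[str]] = {}
    bands_of: Dict[str, Set[str]] = {}
    for user, band in fanships:
        fans_of.setdefault(band, set()).add(user)
        bands_of.setdefault(user, set()).add(band)
    return fans_of, bands_of


def similar_bands(band: str, fans_of: Dict[str, Set[str]],
                  bands_of: Dict[str, Set[str]], top_k: int = 10) -> List[Tuple[str, int]]:
    """Expand band -> fans -> their other bands and count how often each is reached."""
    counts: Counter = Counter()
    for fan in fans_of.get(band, set()):  # depth 1: the band's fans
        for other in bands_of[fan]:       # depth 2: every band those fans like
            if other != band:
                counts[other] += 1        # each arrival is one shared fan
    return counts.most_common(top_k)


def user_page(user: str, fans_of: Dict[str, Set[str]], bands_of: Dict[str, Set[str]],
              top_k: int = 10) -> Dict[str, List[Tuple[str, int]]]:
    """For every favourite band of a user, compute similar bands on the fly."""
    return {band: similar_bands(band, fans_of, bands_of, top_k)
            for band in sorted(bands_of.get(user, set()))}


fanships = [("u1", "A"), ("u1", "B"), ("u2", "A"), ("u2", "B"), ("u2", "C"),
            ("u3", "A"), ("u3", "B"), ("u3", "D"), ("u4", "C")]
fans_of, bands_of = build_index(fanships)
print(similar_bands("A", fans_of, bands_of)[0])  # ('B', 3)
```

The work for one page is one such expansion per favourite band, so it grows with the number of favourite bands times the size of their fan neighbourhoods.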

Comparison to MySQL

I calculated the most similar bands using this query:
select ub.Band_ID, count(*) as anzahl
from UserBand ub
join UserBand ub1 on ub.User_ID = ub1.User_ID
where ub1.Band_ID = 3006
group by ub.Band_ID
order by anzahl desc;
This took 0.17 seconds on average for just one band!
Multiply this number by 56 and you get 9.5 seconds! And that doesn’t even include sending the data and rendering it as HTML.
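To make the self-join concrete, here is a minimal sketch that runs a variant of the query above against an in-memory SQLite database filled with made-up fanships (SQLite stands in for MySQL here). Two changes relative to the original query are assumptions of this sketch: the band itself is excluded, since otherwise it always appears at the top (possibly tied) with a count equal to its number of fans, and ties are broken by `Band_ID` so the output order is deterministic.

```python
import sqlite3
from typing import List, Tuple

# Variant of the query above; the self-exclusion and the tie-breaker are additions.
SIMILAR_BANDS_SQL = """
    SELECT ub.Band_ID, COUNT(*) AS anzahl
    FROM UserBand ub
    JOIN UserBand ub1 ON ub.User_ID = ub1.User_ID
    WHERE ub1.Band_ID = ? AND ub.Band_ID <> ub1.Band_ID
    GROUP BY ub.Band_ID
    ORDER BY anzahl DESC, ub.Band_ID
"""


def similar_bands_sql(conn: sqlite3.Connection, band_id: int) -> List[Tuple[int, int]]:
    """Return (band, number of shared fans) pairs for the given band."""
    return conn.execute(SIMILAR_BANDS_SQL, (band_id,)).fetchall()


def toy_database() -> sqlite3.Connection:
    """In-memory UserBand table filled with made-up (User_ID, Band_ID) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE UserBand (User_ID INTEGER, Band_ID INTEGER)")
    fanships = [
        (1, 3006), (1, 1),
        (2, 3006), (2, 1), (2, 2),
        (3, 3006), (3, 1), (3, 4),
        (4, 2),
    ]
    conn.executemany("INSERT INTO UserBand VALUES (?, ?)", fanships)
    return conn


conn = toy_database()
print(similar_bands_sql(conn, 3006))  # [(1, 3), (2, 1), (4, 1)]
```

Each call is one self-join for one band, so a page with 56 favourite bands issues 56 such queries, which is where the 56 × 0.17 seconds estimate comes from.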

Demo

Though we will release the software as open source soon, right now I cannot provide a demo. This is because browsing this data currently reveals more user data than the users’ privacy settings would allow! But I encourage you to bookmark this link and check it out once in a while, since we are about to fix these privacy problems and demonstrate our results!
http://gwt.metalcon.de/GWT-Modelling/

Summary

I am really excited. Seldom have I been so keen to keep programming something to see further results! Unfortunately there is still a long way to go, but we will make it. What will the speed be once I have really implemented the efficient data structures and caching in the live system? And once multiple users use it and also write to the database?
If you want to join our open source project feel free to contact me!
