My List of People who I admire and which I find truly inspiring
Wed, 27 Aug 2014
This is my personal list of people whom I admire. In a sense, if you want to know what I stand for, you can just take a brief look at this list and at the values, norms and ideas its people stand for. I have been heavily criticised because this list contains too many white men and too few people of other cultures and genders. I think the main reason is that I am a Western person: even though I lived in China, I can only see as far as the horizon of my own culture, and of course I am influenced by it. This is also where my values come from. So if you know people from other cultures with a similar set of ideas and beliefs, feel free to contact me or leave a comment and point them out to me. I am very excited to “meet” more inspiring people, especially from outside my current horizon.
The following list is in random order.

Tank man

from: https://en.wikipedia.org/wiki/Tank_Man

A man who stood in front of a column of tanks on June 5, 1989, the morning after the Chinese military had suppressed the Tiananmen Square protests of 1989 by force, became known as the Tank Man or Unknown Protester. The tanks manoeuvred to pass by the man, and he moved to continue to obstruct them, in something like a dance. The incident was filmed and seen worldwide.

further info:

own reason:
This is an unbelievable example of civil courage. Obviously his actions did not really change the course of events around Tiananmen, but I think what he did was truly heroic and brave.
I hope I will always have similar courage when it comes to fighting for a good cause or idea.

Aaron Swartz

from: https://en.wikipedia.org/wiki/Aaron_Swartz

Aaron Hillel Swartz (November 8, 1986 – January 11, 2013) was an American computer programmer, writer, political organizer and Internet hacktivist.
Swartz was involved in the development of the web feed format RSS, the organization Creative Commons, the website framework web.py and the social news site, Reddit, in which he became a partner after its merger with his company, Infogami.
Swartz’s work also focused on sociology, civic awareness and activism. He helped launch the Progressive Change Campaign Committee in 2009 to learn more about effective online activism. In 2010 he became a research fellow at Harvard University’s Safra Research Lab on Institutional Corruption, directed by Lawrence Lessig. He founded the online group Demand Progress, known for its campaign against the Stop Online Piracy Act.
On January 6, 2011, Swartz was arrested by MIT police on state breaking-and-entering charges, after systematically downloading academic journal articles from JSTOR. Federal prosecutors later charged him with two counts of wire fraud and 11 violations of the Computer Fraud and Abuse Act, carrying a cumulative maximum penalty of $1 million in fines, 35 years in prison, asset forfeiture, restitution and supervised release.
Swartz declined a plea bargain under which he would serve six months in federal prison. Two days after the prosecution rejected a counter-offer by Swartz, he was found dead in his Brooklyn, New York apartment, where he had hanged himself.
In June 2013, Swartz was posthumously inducted into the Internet Hall of Fame.

further info:

own reason:
Just read the Guerilla Open Access Manifesto. Writing something like this and understanding the impact of open access is terrific. But living it, through the PACER project and the JSTOR case at MIT, is a completely different story.
I strongly believe that unjust laws exist, but we have to understand that law is a relative thing. It is we, as a society, who make the laws, so it is also up to us to change them. I think the norms and values of a society should stand above any particular law. What Aaron did was follow a very strong set of norms and values and fight for a better law. One might ask whether his actions were too radical and not in line with the democratic processes we as a society have agreed on, but I am sure Aaron was driven by a deep wish to make the world a more just place.

Lawrence Lessig

from: https://en.wikipedia.org/wiki/Lawrence_Lessig

Lawrence “Larry” Lessig (born June 3, 1961) is an American academic and political activist. He is a proponent of reduced legal restrictions on copyright, trademark, and radio frequency spectrum, particularly in technology applications, and he has called for state-based activism to promote substantive reform of government with a Second Constitutional Convention. In May 2014, he launched a crowd-funded political action committee which he termed May Day PAC with the purpose of electing candidates to Congress who would pass campaign finance reform.
Lessig is director of the Edmond J. Safra Center for Ethics at Harvard University and a Professor of Law at Harvard Law School. Previously, he was a professor of law at Stanford Law School and founder of the Center for Internet and Society. Lessig is a founding board member of Creative Commons and the founder of Rootstrikers, and is on the board of MapLight. He is on the advisory boards of the Democracy Café, Sunlight Foundation and Americans Elect. He is a former board member of the Free Software Foundation, Software Freedom Law Center and the Electronic Frontier Foundation.

further info:


own reason:
I have to admit that I have not gotten around to reading his book Code 2.0, which is said to be excellent. But from his talks and actions, I love how Lessig points out problems within society and how he tries to educate people about them. He seems to have a very similar set of norms and values to Aaron's (and mine), but he follows “the protocol” of our society to fight for them. Above all, he seems to be a true intellectual and not just someone who made a career in academia.

Geschwister Scholl

from: https://en.wikipedia.org/wiki/Geschwister_Scholl

Hans and Sophie Scholl, often referred to in German as die Geschwister Scholl (literally: the Scholl siblings), were a brother and sister who were members of the White Rose, a student group in Munich that was active in the non-violent resistance movement in Nazi Germany, especially in distributing flyers against the war and the dictatorship of Adolf Hitler. In post-war Germany, Hans and Sophie Scholl are recognized as symbols of the humanist German resistance movement against the totalitarian Nazi regime.

further info:

own reason:
It is always hard to pick a single person, or in this case a pair of siblings, as role models for opposing a regime that harms the people of a society. Of course, the Geschwister Scholl were not the only people in the resistance movement in Nazi Germany, and other regimes in other places have also had resistance movements. Still, I believe their actions were very remarkable. I think it is the role of students to point out problems in our society; nowadays many students seem to just accept everything that is happening. Distributing flyers with the “truth” about Nazi Germany was not only brave; doing it at the university also reached many people who could multiply the message.
I think this is similar to Aaron Swartz: students and young people have the role of pointing out problems within society more radically, and the Geschwister Scholl most certainly fulfilled this role.

Randy Pausch

from: https://en.wikipedia.org/wiki/Randy_Pausch

Randolph Frederick “Randy” Pausch (October 23, 1960 – July 25, 2008) was an American professor of computer science, human-computer interaction, and design at Carnegie Mellon University (CMU) in Pittsburgh, Pennsylvania.
Pausch learned that he had pancreatic cancer in September 2006, and in August 2007 he was given a terminal diagnosis: “3 to 6 months of good health left”. He gave an upbeat lecture titled “The Last Lecture: Really Achieving Your Childhood Dreams” on September 18, 2007, at Carnegie Mellon, which became a popular YouTube video and led to other media appearances. He then co-authored a book called The Last Lecture on the same theme, which became a New York Times best-seller. Pausch died of complications from pancreatic cancer on July 25, 2008.

further info:

own reason:
It might be the American optimism behind Randy Pausch’s lecture, but I actually do not admire him for giving an inspiring lecture even though he was dying. I admire him much more because he seems to have lived his life in a very positive way. His goal of enabling the dreams of others sounds very honest to me. I also like his statement that “if you live your life in the right way, the dreams come to you”. I think Randy is a very good example of the idea that no matter what fate does to a person, it is that person’s responsibility to respond to it. When people cry out they might receive pity, but they probably do not really improve their situation. I guess one can summarise Randy with his quote:

We cannot change the cards we are dealt, only the way we play them.

By the way, I especially like the idea that he gave this talk for his kids, to teach them a lesson for a time when they would be grown up and he would no longer be around.

Tim Berners-Lee

from: https://en.wikipedia.org/wiki/Tim_Berners-Lee

Sir Timothy John “Tim” Berners-Lee, OM, KBE, FRS, FREng, FRSA, DFBCS (born 8 June 1955), also known as “TimBL”, is an English computer scientist, best known as the inventor of the World Wide Web. He made a proposal for an information management system in March 1989, and he implemented the first successful communication between a Hypertext Transfer Protocol (HTTP) client and server via the Internet sometime around mid November of that same year.
Berners-Lee is the director of the World Wide Web Consortium (W3C), which oversees the Web’s continued development. He is also the founder of the World Wide Web Foundation, and is a senior researcher and holder of the Founders Chair at the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). He is a director of the Web Science Research Initiative (WSRI), and a member of the advisory board of the MIT Center for Collective Intelligence.
In 2004, Berners-Lee was knighted by Queen Elizabeth II for his pioneering work. In April 2009, he was elected a foreign associate of the United States National Academy of Sciences. He was honoured as the “Inventor of the World Wide Web” during the 2012 Summer Olympics opening ceremony, in which he appeared in person, working with a vintage NeXT Computer at the London Olympic Stadium. He tweeted “This is for everyone”, which instantly was spelled out in LCD lights attached to the chairs of the 80,000 people in the audience.

further info:
Even though he is not a great speaker and reading his book (Weaving the Web) will help much more, I link a video here:

own reason:
In my opinion there are many reasons to admire Tim Berners-Lee. Of course, he is famous for inventing the World Wide Web. But I think the time was ripe for this invention: the Internet by itself was not very useful, the ideas of hypertext were around, and similar systems existed. As always on the Internet, there is a strong winner-takes-all phenomenon. So bringing us the World Wide Web is certainly something Tim deserves credit for, but it is not the main reason why I admire him.
What is really cool about Tim Berners-Lee is that he seems to have a very clear sense of technical things, their abstractions and especially their impact. Maybe it is easy to develop this sense after creating a technology that literally everyone on the Internet uses, but I still like his activism for openness, interoperability, net neutrality and freedom in general, and freedom of speech in particular. Also, he addressed me directly after I asked a question in a Q&A session at a conference. His attitude, “if you want to change the world, you have the tools; don’t talk, just go geek and do it”, will certainly stay with me for the rest of my life.

Other than that, I like that he is not afraid to make political statements about the problems with the Web and where it should go, and that he seems to have no interest whatsoever in becoming a multi-billionaire, which he could easily have achieved by sitting on the invention of the World Wide Web, given how central he was to its development.

Albert Einstein

from: https://en.wikipedia.org/wiki/Albert_Einstein

Albert Einstein (/ˈælbərt ˈaɪnstaɪn/; German: [ˈalbɐt ˈaɪnʃtaɪn]; 14 March 1879 – 18 April 1955) was a German-born theoretical physicist and philosopher of science. He developed the general theory of relativity, one of the two pillars of modern physics (alongside quantum mechanics). He is best known in popular culture for his mass–energy equivalence formula E = mc² (which has been dubbed “the world’s most famous equation”). He received the 1921 Nobel Prize in Physics “for his services to theoretical physics, and especially for his discovery of the law of the photoelectric effect”. The latter was pivotal in establishing quantum theory.
Near the beginning of his career, Einstein thought that Newtonian mechanics was no longer enough to reconcile the laws of classical mechanics with the laws of the electromagnetic field. This led to the development of his special theory of relativity. He realized, however, that the principle of relativity could also be extended to gravitational fields, and with his subsequent theory of gravitation in 1916, he published a paper on the general theory of relativity. He continued to deal with problems of statistical mechanics and quantum theory, which led to his explanations of particle theory and the motion of molecules. He also investigated the thermal properties of light which laid the foundation of the photon theory of light. In 1917, Einstein applied the general theory of relativity to model the large-scale structure of the universe.
He was visiting the United States when Adolf Hitler came to power in 1933 and, being Jewish, did not go back to Germany, where he had been a professor at the Berlin Academy of Sciences. He settled in the U.S., becoming an American citizen in 1940. On the eve of World War II, he endorsed a letter to President Franklin D. Roosevelt alerting him to the potential development of “extremely powerful bombs of a new type” and recommending that the U.S. begin similar research. This eventually led to what would become the Manhattan Project. Einstein supported defending the Allied forces, but largely denounced the idea of using the newly discovered nuclear fission as a weapon. Later, with the British philosopher Bertrand Russell, Einstein signed the Russell–Einstein Manifesto, which highlighted the danger of nuclear weapons. Einstein was affiliated with the Institute for Advanced Study in Princeton, New Jersey, until his death in 1955.
Einstein published more than 300 scientific papers along with over 150 non-scientific works. His great intellectual achievements and originality have made the word “Einstein” synonymous with genius.

further info:

own reason:
He was probably one of my first role models. I admire him for two reasons.
The first, which nowadays I actually find a stupid reason to admire someone, is simply his pure intellect. Creating relativity theory was an amazing achievement of ignoring what we seem to know and just following the facts (as all good mathematicians and computer scientists should do all the time). Admittedly, David Hilbert submitted a paper on the foundations of general relativity a few days before Einstein presented his final field equations, after Einstein had lectured on the theory in Göttingen at Hilbert’s invitation, and historians still debate who deserves priority for the field equations. But the list of Einstein’s achievements in physics does not stop at relativity theory: the list of independent fields he worked on is just incredibly long.
The second reason is the way Einstein behaved regarding the development of the nuclear bomb. He first pointed out, by signing a letter to the American president of the time, Franklin D. Roosevelt, that there was a danger of Nazi Germany creating a nuclear weapon. This led to the Manhattan Project. The interesting part comes when Einstein came to regret signing the letter. He said that had he known that this weapon would be used against civilians and that Nazi Germany would not succeed in developing such a bomb, he would have done nothing.
Many scientists carry a great responsibility. Knowledge can quickly become very dangerous or be misused for strategic advantage in harmful actions. Unfortunately, I have the feeling that many scientists lack the time or courage to think about ethics and the real impact of their research (I mean the impact that is not measured by citations and impact factors…). Even Einstein seemed not to be aware of his impact when he signed the letter that led to the Manhattan Project. Still, he took responsibility after the bombs had been used in Japan. I think many people in Einstein’s position would have found a way of justifying how the Americans had used the bomb against Japan. He did not. He publicly regretted what he had done and started. Finally, he was one of the key figures behind the Russell–Einstein Manifesto, an open letter urging the governments of the world to resolve conflicts peacefully.

Chelsea Manning

from: https://en.wikipedia.org/wiki/Chelsea_Manning

Chelsea Elizabeth Manning (born Bradley Edward Manning, December 17, 1987) is a United States Army soldier who was convicted in July 2013 of violations of the Espionage Act and other offenses, after releasing the largest set of classified documents ever leaked to the public. Manning was sentenced in August 2013 to 35 years confinement with the possibility of parole in eight years, and to be dishonorably discharged from the Army. Manning is a trans woman who, in a statement the day after sentencing, said she had felt female since childhood, wanted to be known as Chelsea, and desired to begin hormone replacement therapy. From early life and through much of her Army life, Manning was known as Bradley; she was diagnosed with gender identity disorder while in the Army.
Assigned in 2009 to an Army unit in Iraq as an intelligence analyst, Manning had access to classified databases. In early 2010, she leaked classified information to WikiLeaks and confided this to Adrian Lamo, an online acquaintance. Lamo informed Army Counterintelligence, and Manning was arrested in May that same year. The material included videos of the July 12, 2007 Baghdad airstrike, and the 2009 Granai airstrike in Afghanistan; 250,000 U.S. diplomatic cables; and 500,000 Army reports that came to be known as the Iraq War logs and Afghan War logs. Much of the material was published by WikiLeaks or its media partners between April and November 2010.
Manning was ultimately charged with 22 offenses, including aiding the enemy, which was the most serious charge and could have resulted in a death sentence. She was held at the Marine Corps Brig, Quantico in Virginia, from July 2010 to April 2011 under Prevention of Injury status—which entailed de facto solitary confinement and other restrictions that caused domestic and international concern—before being transferred to Fort Leavenworth, Kansas, where she could interact with other detainees. She pleaded guilty in February 2013 to 10 of the charges. The trial on the remaining charges began on June 3, 2013, and on July 30 she was convicted of 17 of the original charges and amended versions of four others, but was acquitted of aiding the enemy. She is serving her sentence at the maximum-security U.S. Disciplinary Barracks at Fort Leavenworth.
Reaction to Manning’s disclosures, arrest, and sentence was mixed. Denver Nicks, one of her biographers, writes that the leaked material, particularly the diplomatic cables, was widely seen as a catalyst for the Arab Spring that began in December 2010, and that Manning was viewed as both a 21st-century Tiananmen Square Tank Man and an embittered traitor. Reporters Without Borders condemned the length of the sentence, saying that it demonstrated how vulnerable whistleblowers are.

further info:

own reason:
Obviously I have not had the time to read everything that Manning made public, so I might be blinded by the media coverage of her case. From what I know, I can say that, like many others on this list, Manning was bound by her morals and not by what she was or was not allowed to do. I think she was truly trying to point out unjust things, and I think the way she did it was actually pretty smart. I guess there is a lot of structural violence in politics and the military, and pointing out problems in the “correct way” does not seem to really change anything. Therefore she just had to release the video of American soldiers randomly shooting at civilians. Did she have to make everything else public? Who knows. Actually, who cares? Making this video public was heroic in itself and should have had a much bigger impact than it did.
Seeing her go to jail for 35 years while society accepts this just makes me sad. I really wonder what has to happen for people to start a revolution. Not that I believe in such drastic action, but having Manning in prison for 35 years is f***ed up. I strongly hope that Chelsea Manning will one day receive the Nobel Peace Prize.

Noam Chomsky

from: https://en.wikipedia.org/wiki/Noam_Chomsky

Avram Noam Chomsky (/ˈnoʊm ˈtʃɒmski/; born December 7, 1928) is an American linguist, philosopher, cognitive scientist, logician, political commentator and activist. Sometimes described as the “father of modern linguistics”, Chomsky is also a major figure in analytic philosophy. He has spent most of his career at the Massachusetts Institute of Technology (MIT), where he is currently Professor Emeritus, and has authored over 100 books. He has been described as a prominent cultural figure, and was voted the “world’s top public intellectual” in a 2005 poll.
Born to a middle-class Ashkenazi Jewish family in Philadelphia, Chomsky developed an early interest in anarchism from relatives in New York City. He later undertook studies in linguistics at the University of Pennsylvania, where he obtained his BA, MA, and PhD, while from 1951 to 1955 he was appointed to Harvard University’s Society of Fellows. In 1955 he began work at MIT, soon becoming a significant figure in the field of linguistics for his publications and lectures on the subject. He is credited as the creator or co-creator of the Chomsky hierarchy, the universal grammar theory, and the Chomsky–Schützenberger theorem. Chomsky also played a major role in the decline of behaviorism, and was especially critical of the work of B.F. Skinner. In 1967 he gained public attention for his vocal opposition to U.S. involvement in the Vietnam War, in part through his essay The Responsibility of Intellectuals, and came to be associated with the New Left while being arrested on multiple occasions for his anti-war activism. While expanding his work in linguistics over subsequent decades, he also developed the propaganda model of media criticism with Edward S. Herman. Following his retirement from active teaching, he has continued his vocal public activism, praising the Occupy movement for example.
Chomsky has been a highly influential academic figure throughout his career, and was cited within the field of Arts and Humanities more often than any other living scholar between 1980 and 1992. He was also the eighth most cited scholar overall within the Arts and Humanities Citation Index during the same period. His work has influenced fields such as artificial intelligence, cognitive science, computer science, logic, mathematics, music theory and analysis, political science, programming language theory and psychology. Chomsky continues to be well known as a political activist, and a leading critic of U.S. foreign policy, state capitalism, and the mainstream news media. Ideologically, he aligns himself with anarcho-syndicalism and libertarian socialism.

further info:

own reason:
Chomsky is very new on the list, so I cannot say very much about him. I have watched several interviews and talks by him, and I find it amazing how completely he turned towards ethics and political activism while being highly educated, rational and fact-driven (he always seems to have the better argument). In particular, I like his view on power systems (as far as I understand him, he does not blame single individuals for injustice but sees the problem as structural violence). I also like his critical view on mass media, so I am eager to read his book Manufacturing Consent.
I particularly like his very clear view on fundamental issues and on how certain policies inevitably lead to certain abuses.

Melinda Gates (also Bill Gates)

from: https://en.wikipedia.org/wiki/Bill_%26_Melinda_Gates_Foundation

Bill & Melinda Gates Foundation (BMGF or the Gates Foundation) is one of the largest private foundations in the world, founded by Bill and Melinda Gates. It was launched in 2000 and is said to be the largest transparently operated private foundation in the world. It is “driven by the interests and passions of the Gates family”. The primary aims of the foundation are, globally, to enhance healthcare and reduce extreme poverty, and in America, to expand educational opportunities and access to information technology. The foundation, based in Seattle, Washington, is controlled by its three trustees: Bill Gates, Melinda Gates and Warren Buffett. Other principal officers include Co-Chair William H. Gates, Sr. and Chief Executive Officer Susan Desmond-Hellmann.
It had an endowment of US$38.3 billion as of 30 June 2013. The scale of the foundation and the way it seeks to apply business techniques to giving makes it one of the leaders in the philanthrocapitalism revolution in global philanthropy, though the foundation itself notes that the philanthropic role has limitations. In 2007, its founders were ranked as the second most generous philanthropists in America, and Warren Buffett the first. As of May 16, 2013, Bill Gates had donated US$28 billion to the foundation.

further info:

own reason:
OK, I admit it is not fair to name only her. After all, it is still the Bill and Melinda Gates Foundation. But from my perception, Melinda was the driving force and the eye-opener for Bill Gates. I always perceived Bill Gates as one of the coldest and most disgusting businessmen out there (on the same list as Steve Jobs and Mark Zuckerberg), using patents, licence agreements and closed systems just for the purpose of becoming incredibly rich. Like other computer scientists, he already had a deep impact on people, and bringing us the operating systems and office suite was probably not that bad after all; they were still useful tools for most people. Still, he could have chosen a more ethical business model. Then again, how could he have seen these things when he was young? I guess he was also bound to investors and to what they wanted.
I guess that with Melinda’s help he also realised it would be too late to make drastic changes to Microsoft, so he shifted the focus of his life to creating something new: something much more sustainable that feels very good.
Now, using their wealth, Bill and Melinda Gates are tackling really important issues that we as humans could all tackle but which seem economically unimportant. This feels a little like a modern version of Robin Hood: Microsoft pulls money out of the rich part of the world with today’s OK-ish software at high cost and with vendor lock-in, while Bill and Melinda distribute this money, e.g. to fight diseases in areas of the world where the Western world simply does not care to fight them. They also act as multipliers, convincing other rich people to do the same. I think this contributes a lot to more justice and progress.
Apart from my love of technological topics, the Bill and Melinda Gates Foundation is, besides the Wikimedia Foundation, probably the only interesting NGO I am aware of that I would be willing to work for and sacrifice my tech career for. But I guess this could still be done after a successful tech career (:
By the way, a fun fact: the rich-get-richer principle holds incredibly well in the case of Bill and Melinda Gates. Warren Buffett, Gates’s “rival” for the title of wealthiest person in the world, pledged the bulk of his fortune to the Bill and Melinda Gates Foundation, which I think is an incredible vote of confidence in what Bill and Melinda are doing.

Uncertain candidates (since it’s hard to say)

There are some borderline candidates which I am still not sure about.

Julian Assange

I cannot even make up my mind. On the one hand, Julian Assange seems to be an incredibly important person who is really doing a lot of good. On the other hand, he seems very self-centred and sometimes not authentic. I understand that he of course has operational costs and no fixed income. Still, I am not sure how much of it is real.

RAF (or rather Ulrike Meinhof)

I guess in Germany it is almost as impossible to say that one sympathises with the RAF as it would be to say that one sympathises with the NSDAP. Yet I liked the fundamental problems the RAF addressed. Their methods were stupid, and I guess there were a lot of “dead fish” swimming along with the RAF and carrying out all the terror the RAF committed, but in terms of their core beliefs and their problems with German society, they seemed to have some really valid points.

Richard Stallman

Inventing the GPL was an incredibly smart move. I am not sure whether it was the first copyleft licence or whether Stallman really came up with the idea himself. Still, if he didn’t, he probably could and would have.
Stallman is often perceived as too radical and unable to compromise. From what I understand (and within this article I believe this is the topic where I have the most expertise), this is simply the only way. There cannot be such a thing as “half-free software”: you are either free or you are not. The impact of being free is so incredibly big that I think it is indeed one of the points in life where people really should not compromise. So I think what Stallman is frequently criticised for is actually one of his strongest points.

Linus Torvalds

I am not sure whether he is just a winner-takes-it-all guy or whether there is more to him. Besides Linux, bringing Git to the hacker community is his second and, in the long run, maybe even more impactful innovation. Also, the way he seems to work, and how he seems to understand the dynamics and social processes of the open source community, is crazy.

Larry Page

People might ask: “Rene, why are Steve Jobs and Zuckerberg on your bad list and Larry Page is not? Where has he donated his money, and has he done all the philanthropic work Bill Gates has?” My only response is: yes, that is a problem, and it is part of the reason why I am still undecided about Page. What speaks for Page is his creativity combined with his strong will to use technology and financial power to change the world and make it more automated and efficient. In pursuing this goal he seems to ignore economic principles: Google has released a bunch of products that are hard to monetise (even indirectly), as well as true “moonshot” projects. I have the feeling that Page cannot donate his money or give up power within Google until he has brought the amount of innovation to the world that he wants, for example:

  • Self-driving cars (probably as part of the sharing economy, for taxis, logistics and online shopping, rather than for sale)
  • A better “semantic” search (in combination with Android and more knowledge of the user’s context)

Even though not everything Google does is perfect, it is still incredible that a company with so many employees has managed to keep such a great company culture. At least Google is a company that started with a clear mission statement (“to organize the world’s information and make it universally accessible and useful”), and, as I said, Page probably cannot rest and focus on other things until he has fulfilled these noble goals.

      Graphity Server for social activity streams released (GPLv3)
      https://www.rene-pickhardt.de/graphity-server-for-social-activity-streams-released-gplv3/
      Mon, 02 Sep 2013 07:11:22 +0000

      It has been almost two years since I published my first ideas and work on Graphity, which is nowadays a collection of algorithms for efficiently storing and retrieving more than 10k social activity streams per second. You know the typical application from Twitter, Facebook and co.: retrieve the most recent status updates from your circle of friends.
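      To make the task concrete, here is a minimal sketch of the naive read-time approach: merge each friend's newest-first list of updates and keep the k most recent. This is a generic k-way merge with Python's heapq.merge, not the Graphity algorithm; the data layout and the function name news_stream are hypothetical.

```python
import heapq
import itertools
from typing import Dict, List, Tuple

# A status update: (timestamp, author, text). Integer timestamps are an
# assumption made for this sketch (e.g. Unix seconds).
Update = Tuple[int, str, str]


def news_stream(friends: List[str],
                updates_by_user: Dict[str, List[Update]],
                k: int) -> List[Update]:
    """Return the k most recent updates from the given friends.

    Assumes each user's list is already sorted newest-first, so the merge
    only ever compares the current head of each list.
    """
    streams = [updates_by_user.get(friend, []) for friend in friends]
    # heapq.merge lazily merges sorted iterables; reverse=True because the
    # inputs are sorted by descending timestamp.
    merged = heapq.merge(*streams, key=lambda u: u[0], reverse=True)
    return list(itertools.islice(merged, k))


updates = {
    "alice": [(30, "alice", "New album out!"), (10, "alice", "Rehearsal")],
    "bob": [(25, "bob", "Concert tonight"), (5, "bob", "Hello world")],
    "carol": [(40, "carol", "Festival tickets")],
}
print(news_stream(["alice", "bob", "carol"], updates, k=3))
```

      Every read touches the lists of all friends, which is exactly the kind of per-request work that specialised index structures try to avoid when there are thousands of requests per second.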
      Today I proudly present the first version of the Graphity News Stream Server. Big thanks to Sebastian Schlicht, who implemented most of the servlet for me and did an amazing job! The Graphity Server is a Neo4j-powered servlet with the following properties:

      • Response times for requests are usually below 10 milliseconds (plus network I/O, e.g. the TCP round trips that HTTP requires).
      • The Graphity News Stream Server is free open source software (GPLv3) and is hosted in the Metalcon git repository. (Please also use the bug tracker there to submit bugs and feature requests.)
      • It runs two Graphity algorithms: one is read-optimized, the other is write-optimized for applications that expect more write than read requests.
      • The server comes with a REST API, which makes it easy to plug the server into whatever application you have.
      • The server's responses follow the activitystrea.ms format, so a large number of clients are available out of the box to render them.
      • The server ships with unit tests and extensive documentation, especially of the news stream server protocol (NSSP), which specifies how to talk to the server. The server can currently handle about 100 write requests in medium-sized networks (about a million nodes). I do not recommend using this server if you expect your user base to grow beyond 10 million users (though we are working on making the server scale). This is mostly because our database currently won't scale beyond one machine and some internal operations have to be synchronized.
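      Since the responses follow the activitystrea.ms format, it may help to see roughly what such an entry looks like. A minimal sketch, assuming the JSON Activity Streams 1.0 field names (published, actor, verb, object, objectType, displayName, totalItems, items); the id scheme and content are made up, and this is not the exact payload defined by NSSP.

```python
import json
from datetime import datetime, timezone


def make_activity(user_id: str, display_name: str, text: str,
                  published: datetime) -> dict:
    """Build one activity in the JSON Activity Streams 1.0 style."""
    return {
        "published": published.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "actor": {
            "objectType": "person",
            "id": f"user:{user_id}",  # hypothetical id scheme
            "displayName": display_name,
        },
        "verb": "post",
        "object": {"objectType": "note", "content": text},
    }


def make_stream(activities: list) -> str:
    """Wrap activities in an Activity Streams collection and serialise it."""
    return json.dumps({"totalItems": len(activities), "items": activities},
                      indent=2)


when = datetime(2013, 9, 2, 7, 11, 22, tzinfo=timezone.utc)
print(make_stream([make_activity("42", "Rene", "Graphity is out!", when)]))
```

      Because generic clients only rely on these shared field names, any renderer that understands Activity Streams can display the stream without knowing anything about Graphity.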

      Koding.com is currently considering implementing Graphity-like algorithms to power their activity streams. It was Richard from their team who pointed out, in a very fruitful discussion, how to avoid Neo4j's limit of 2^15 = 32768 relationship types by using an overlay network. His overlay network idea has been implemented in the read-optimized Graphity algorithm. Big thanks to him!
      Now I am really excited to see what kind of applications you will build with Graphity.

      If you use Graphity

      Please tell me if you start using Graphity; it would be awesome to know, and I will most certainly add you to a list of testimonials.
      By the way, if you want to help spread the server (which is also good for you, since more developers using it means a higher chance of newer versions), you can upvote my answer on Stack Overflow:
      http://stackoverflow.com/questions/202198/whats-the-best-manner-of-implementing-a-social-activity-stream/13171306#13171306

      How to get started

      It's darn simple!

      1. Clone the git repository or otherwise get hold of the source code.
      2. Switch to the repository directory and type sudo ./install.sh
      3. Copy the WAR file to your Tomcat webapps folder (Tomcat and Maven are required; if you don't know how to set them up, we have a detailed setup guide).
      4. And you're done! More configuration details are in our README.md.
      5. Look in the newswidget folder for a simple HTML/JavaScript client that can interact with the server.
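      If you script your deployment, step 3 (copying the WAR file into Tomcat) is the only part that depends on where Tomcat lives. A minimal Python sketch, assuming a standard Tomcat layout with a webapps folder under the install directory (often referenced by the CATALINA_HOME environment variable) and Tomcat's default autoDeploy setting; the function name deploy_war is hypothetical, and a plain cp does the same job.

```python
import shutil
from pathlib import Path


def deploy_war(war_file: str, catalina_home: str) -> Path:
    """Copy a WAR file into Tomcat's webapps folder and return the target.

    With autoDeploy enabled (Tomcat's default), a WAR placed in webapps
    is picked up and deployed automatically.
    """
    war = Path(war_file)
    if war.suffix != ".war":
        raise ValueError(f"not a WAR file: {war}")
    webapps = Path(catalina_home) / "webapps"
    if not webapps.is_dir():
        raise FileNotFoundError(f"no webapps folder under {catalina_home}")
    target = webapps / war.name
    shutil.copy2(war, target)  # keeps file timestamps, overwrites an older copy
    return target
```

      Typical use would be deploy_war("path/to/your-build.war", os.environ["CATALINA_HOME"]), where the WAR path is a placeholder for whatever the install script produced.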
      I also created a short screencast that demonstrates the setup:

      Get involved

      There are plenty of ways to get involved:

      • Fork the server
      • Submit a bug report
      • Fix a bug
      • Subscribe to the mailing list

      Further links:

      Metalcon finally gets a redesign – Thinking about high scalability
      https://www.rene-pickhardt.de/metalcon-finally-becomes-a-redesign-thinking-about-high-scalability/
      Mon, 17 Jun 2013 15:21:30 +0000

      Finally metalcon.de, the social networking site that Jonas, Jens and I created in 2008, gets a redesign. Thanks to the great opportunities at the Institute for Web Science and Technologies here in Koblenz (why don't you apply for a PhD position with us?), I will have the chance to code up the new version of Metalcon. Starting on July 15th, I will lead a team of 5 programmers for 4 months. Not only will the development be open source, but during this time I will also write regularly (hopefully daily) on this blog about the design decisions we take to build a well-scaling web service.
      Before I share my thoughts on highly scalable architectures for websites, I want to give a little history and background on what Metalcon is and why this redesign is so necessary:

      Metalcon is a social networking site for German fans of metal music. It currently has

      • a user base of 10,000 users
      • about 500 registered bands
      • a highly semantic and interlinked database (bands, geographical coordinates, friendships, events)
      • 624 MB of text and structured data about the topics mentioned above
      • fairly good visibility in search engines
      • more than 30k lines of code (mostly PHP)
      • a poorly scaling architecture (own OR mapper, own AJAX libraries, big monolithic database design, bad usage of PHP, …)
      • no unit tests (so code maintenance is almost impossible)
      • no music and audio files
      • no processes for content moderation
      • no processes to fight spam and block users
      • really bad usability (I could write tons of posts about where the usability falls short)
      • no clear distinction between features, which makes them hard for users to understand

      When we built Metalcon, no one on the team had experience with highly scalable web applications, and we were just happy to get it running at all. After returning from China and starting my PhD program in 2011, I was about to shut Metalcon down. Though we had become close friends, the core team had already moved on to new projects and we lacked manpower. On the other hand, everyone kept telling me that Metalcon would be a great place to do research. So in 2011 Jonas and I decided to give it another shot and do an open redevelopment. We set up a wiki to document our features and the software, and we created a developer blog to exchange ideas. We also created some open source projects to which we hardly contributed any code due to the lack of manpower…
      Well, at that time we already knew of too many problems for fixing them to be the way to go. At least we learned a lot. Thinking about highly scalable architectures back then, I knew that a news feed (which the old version of Metalcon already had) was core to the user experience. From reading many Stack Exchange discussions I knew that you wouldn't build such a stream on MySQL. Playing around with graph databases like Neo4j also led to my first research paper, on Graphity, software designed to distribute highly personalized news streams to users. Since our development was not progressing, we never deployed Graphity within Metalcon. Building an autocomplete service for the site should also no longer be a problem.
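      At its simplest, an autocomplete service is a prefix lookup. A minimal sketch, assuming a fixed in-memory vocabulary (for example band names) and plain prefix matching with no ranking or typo tolerance; the class name PrefixIndex is hypothetical and this is not the service Metalcon uses.

```python
import bisect
from typing import List


class PrefixIndex:
    """Minimal autocomplete over a fixed vocabulary.

    Names are lower-cased and kept sorted, so all names sharing a prefix
    form one contiguous run whose start bisect finds in O(log n).
    """

    def __init__(self, names: List[str]):
        self._names = sorted(set(n.lower() for n in names))

    def complete(self, prefix: str, limit: int = 5) -> List[str]:
        prefix = prefix.lower()
        i = bisect.bisect_left(self._names, prefix)
        results = []
        # Walk forward from the first candidate until the prefix stops matching.
        while i < len(self._names) and len(results) < limit:
            name = self._names[i]
            if not name.startswith(prefix):
                break
            results.append(name)
            i += 1
        return results


index = PrefixIndex(["Metallica", "Megadeth", "Manowar", "Slayer"])
print(index.complete("me"))  # ['megadeth', 'metallica']
```

      A production service would add ranking (e.g. by popularity) on top, but the sorted-list lookup shows why completing a prefix is cheap.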

      Roadmap for the redesign

      • Over the next weeks I hope to read as many interesting articles about technologies and high scalability as I can find, and I will be more than happy to get your feedback and suggestions here. I will start by reading many articles on http://highscalability.com/, a blog that is pure gold for serious web developers.
      • During a nice discussion about scalability with Heinrich, we already came up with a potential architecture for Metalcon. I will introduce this architecture soon but first want to check it against the best practices on the High Scalability blog.
      • In parallel I will collect the features needed for the new Metalcon version and hopefully pair them with useful technologies. I already started a wiki page about features and the technologies planned to support them.
      • I will also need to decide on the programming language and paradigms for the development. Right now I am playing around with Ruby on Rails vs. GWT. We have had some great experiences with the power of GWT, but one major drawback is certainly that the result is more an application than a lightweight website.

      So again, feel free to give input and share your ideas and experiences with me and with the community. I will be very grateful for every recommendation of articles, videos, books and so on.

      Analyzing the final and intermediate results of the iversity MOOC Fellowship online voting
      https://www.rene-pickhardt.de/analyzing-the-final-and-intermediate-results-of-the-iversity-mooc-fellowship-online-voting/
      Thu, 23 May 2013 23:07:24 +0000

      As written before, Steffen and I participated in the online voting for the MOOC fellowship. Today the competition ended, and I would like to thank everyone who participated in the voting, in particular the 435 people who supported our course. I never imagined that so many people would be interested in our course!
      The voting period ran from May 1st until today. During this period the user interface of the iversity website changed several times, providing different kinds of information about the voting to us users. Since I observed a drastic change in the rankings on May 9th, and since the process and scores had not been very transparent, I decided on that very day to collect some data about the rankings. I already did a quick analysis of the data and found some interesting facts, but I am running out of time right now to conduct an extensive data analysis. So I am releasing the data set into the public domain:
      http://rene-pickhardt.de/mooc.tar.bz2 (33MB)
      If you download and extract the archive, you'll find a folder for every hour after May 9th. Every folder contains 26 HTML files representing the ranking of the courses at that time, plus a transaction log of the HTTP requests that were made to download those 26 files. There are 26 HTML files because 10 courses were displayed per page and 255 courses participated (255 / 10 = 25.5, rounded up to 26 pages).
      During data collection my web server had 2 or 3 short downtimes, so some data points may be missing.
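      Because of those downtimes, anyone working with the archive may want to find incomplete snapshots first. A minimal sketch, assuming one sub-folder per hour with the ranking pages saved as *.html files (folder names are not interpreted); the function name incomplete_snapshots is hypothetical.

```python
import math
from pathlib import Path
from typing import Dict

COURSES = 255          # participating courses (from the post)
COURSES_PER_PAGE = 10  # courses shown per ranking page (from the post)
# 255 / 10 = 25.5, rounded up to 26 pages per hourly snapshot
EXPECTED_PAGES = math.ceil(COURSES / COURSES_PER_PAGE)


def incomplete_snapshots(root: str) -> Dict[str, int]:
    """Return {folder name: number of HTML pages} for every hourly folder
    that does not contain the expected number of ranking pages."""
    gaps = {}
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue
        pages = len(list(folder.glob("*.html")))
        if pages != EXPECTED_PAGES:
            gaps[folder.name] = pages
    return gaps
```

      This only flags hours with partial downloads; hours for which no folder was written at all would need a separate check against the expected hourly timeline.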
      I already wrote a "dirty hack" and pushed it to GitHub; it extracts the interesting information from the downloaded HTML files.

      1. rank.tsv (334 KB) contains the hourly ranking of every course.
      2. vote.tsv (113 KB) contains the hourly number of votes of every course between May 20th and today. The period covered by vote.tsv is so short because the vote counts were only shown in the HTML files during this time.

      Just skimming the data, I already see some facts that make me very curious about a deeper analysis:

      1. Some courses gained several hundred votes within a short period of time (usually only 2 or 3 hours), whereas most courses (especially those gaining votes in such bursts) often stayed far below 1000 votes in total.
      2. It is also interesting to see how much variation there has been in the last couple of days.
      3. I did not crawl the view counts of the courses' YouTube videos, and even after noticing the following I did not take a snapshot of them. Still, it is interesting how much the conversion rate (votes per video view) differs between courses. The top courses in particular seem to have many more votes than their application videos have views, whereas some really high-class and outstanding applications, like the ones from Christian Spannagel (Math) or Oliver Vornberger (Algorithms and data structures), have two or three times as many views on YouTube as votes. Notably, these have about the same number of YouTube views as the top-voted courses.
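      The first observation (hundreds of votes within 2 or 3 hours) can be checked systematically. A minimal sketch, assuming a hypothetical layout for the hourly vote table: course name, then one cumulative vote count per hour, tab-separated. The real vote.tsv may be laid out differently, and the function names are made up.

```python
import csv
import io
from typing import Dict, List, Tuple


def parse_votes(tsv_text: str) -> Dict[str, List[int]]:
    """Parse the assumed layout: course name, then hourly cumulative votes."""
    series = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if row:
            series[row[0]] = [int(v) for v in row[1:] if v != ""]
    return series


def max_gain(counts: List[int], window: int = 3) -> int:
    """Largest increase of a cumulative count within `window` hours."""
    best = 0
    for i in range(len(counts)):
        for j in range(i + 1, min(i + window + 1, len(counts))):
            best = max(best, counts[j] - counts[i])
    return best


def bursts(series: Dict[str, List[int]], threshold: int,
           window: int = 3) -> List[Tuple[str, int]]:
    """Courses whose votes rose by at least `threshold` within `window`
    hours, largest jump first."""
    hits = [(name, max_gain(c, window)) for name, c in series.items()]
    return sorted((h for h in hits if h[1] >= threshold),
                  key=lambda h: h[1], reverse=True)
```

      The window of 3 hours mirrors the "2 or 3 hours" observed above; the threshold is a free parameter, and the numbers in any test input are synthetic rather than taken from the data set.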

      I am pretty sure there are more interesting facts to find, and maybe someone else has collected a better data set covering the complete period, including YouTube snapshots as well as Facebook and Twitter mentions.
      Since I have been asked several times already, here are the final rankings for download and also as a table in this blog post:

        Rank  Course name  Number of votes
      1 sectio chirurgica anatomie interaktiv 8013
      2 internationales agrarmanagement 2 7557
      3 ingenieurmathematik fur jedermann 2669
      4 harry potter and issues in international politics 2510
      5 online surgery 2365
      6 l3t s mooc der offene online kurs uber das lernen und lehren mit technologien 2270
      7 design 101 or design basics 2 2216
      8 einfuhrung in das sozial und gesundheitswesen sozialraume entdecken und entwickeln 2124
      9 changeprojekte planen nachhaltige entwicklung durch social entrepreneurship 2083
      10 social work open online course swooc14 2059
      11 understanding sustainability environmental problems collective action and institutions 1912
      12 the dance of functional programming languaging with haskell and python 1730
      13 zyklenbasierte grundung systematische entwicklung von geschaftskonzepten 1698
      14 a virtual living lab course for sustainable housing and lifestyle 1682
      15 family politics domestic life revolution and dictatorships between 1900 1950 1476
      16 h2o extrem 1307
      17 dark matter in galaxies the last mystery 1261
      18 algorithmen und datenstrukturen 1207
      19 psychology of judgment and decision making 1168
      20 the future of storytelling 1164
      21 web engineering 1152
      22 die autoritat der wissenschaften eine einfuhrung in das wissenschaftstheoretische denken 2 1143
      23 magic and logic of music a comprehensive course on the foundations of music and its place in life 1138
      24 nmooc nachhaltigkeit fur alle 1130
      25 sovereign bond pricing 1115
      26 soziale arbeit eine einfuhrung 1034
      27 mathematische denk und arbeitsweisen in geometrie und arithmetik 1016
      28 social entrepreneurship wir machen gesellschaftlichen wandel moglich 1010
      29 molecular gastronomy an experimental lecture about food food processing and a bit of physiology 984
      30 fundamentals of remote sensing for earth observation 920
      31 kompetenzkurs ernahrungswissenschaft 891
      32 erfolgreich studieren 879
      33 deciphering ancient texts in the digital age 868
      34 qualitative methods 861
      35 karl der grosse pater europae 855
      36 who am i mind consciousness and body between science and philosophy 837
      37 programmieren mit java 835
      38 systemisches projektmanagement 811
      39 lernen ist sexy 764
      40 modelling and simulation using matlab one mooc more brains an interdisciplinary course not just for experts 760
      41 suchmaschinen verstehen 712
      42 hands on course on embedded computing systems with raspberry pi 679
      43 introduction to mixed methods and doing research online 676
      44 game ai 649
      45 game theory and experimental economic research 633
      46 cooperative innovation 613
      47 blue engineering ingenieurinnen und ingenieure mit sozialer und okologischer verantwortung 612
      48 my car the unkown technical being 612
      49 gesundheit ein besonderes gut eine multidisziplinare erkundung des deutschen gesundheitssystems 608
      50 teaching english as a foreign language tefl part i pronunciation 597
      51 wie kann lesen gelernt gelehrt und gefordert werden lesesozialisation lesedidaktik und leseforderung vom grundschulunterricht bis zur erwachsenenbildung 593
      52 the european dream 576
      53 education of the present what is the future of education 570
      54 faszination kristalle und symmetrie 561
      55 italy today a girlfriend in a coma a walk through today s italy 557
      56 dna from structure to therapy 556
      57 grundlagen der mensch computer interaktion 549
      58 malnutrition in developing countries 548
      59 marketing als strategischer erfolgsfaktor von der produktinnovation bis zur kundenbindung 540
      60 environmental ethics for scientists 540
      61 stem cells in biology and medicine 528
      62 praxiswissen fur den kunstlerischen alltagsdschungel 509
      63 physikvision 506
      64 high five evidence based practice 505
      65 future climate water 484
      66 diversity and communication challenges for integration and mobility 477
      67 social entrepreneurship 469
      68 die kunst des argumentierens 466
      69 der hont feat mit dem farat wek wie kinder schreiben und lesen lernen 455
      70 antikrastination moocen gegen chronisches aufschieben 454
      71 exercise for a healthier life 454
      72 the startup source code 438
      73 web science 435
      74 medizinische immunologie 433
      75 governance in and through human rights 431
      76 europe in the world law and policy aspects of the eu in global governance 419
      77 komplexe welt strukturen selbstorganisation und chaos 419
      78 mooc basics of surgery want to become a real surgeon 416
      79 statistical data analysis for the humanities 414
      80 business math r edux 406
      81 analyzing behavioral dynamics non linear approaches to social and cognitive sciences 402
      82 space technology 397
      83 der erzahler materialitat und virtualitat vom mittelalter bis zur gegenwart 396
      84 kriminologie 395
      85 von e mail skype und xing kommunikation fuhrung und berufliche zusammenarbeit im netz 394
      86 wissenschaft erzahlen das phanomen der grenze 392
      87 nachhaltige entwicklung 389
      88 die nachste gesellschaft gesellschaft unter bedingungen der elektrizitat des computers und des internets 388
      89 die grundrechte 376
      90 medienbildung und mediendidaktik grundbegriffe und praxis 368
      91 bubbles everywhere speculative bubbles in financial markets and in everyday life 364
      92 the heart of creativity 363
      93 physik und weltraum 358
      94 sim suchmaschinenimplementierung als mooc 354
      95 order of magnitude physics from atomic nuclei to the universe 350
      96 entwurfsmethodik eingebetteter systeme 343
      97 monte carlo methods in finance 335
      98 texte professionell mit latex erstellen 331
      99 wissenschaftlich arbeiten wissenschaftlich schreiben 330
      100 e x cite join the game of social research 330
      101 forschungsmethoden 323
      102 complex problem solving 321
      103 programmieren lernen mit effekt 317
      104 molecular devices and machines 317
      105 wie man erfolgreich ein startup aufbaut 315
      106 grundlagen der prozeduralen und objektorientierten programmierung 314
      107 introduction to disability studies 314
      108 eu2c the european union explained by two partners cologne and cife 313
      109 the english language a linguistic introduction 2 311
      110 allgemeine betriebswirtschaftslehre 293
      111 interaction design open design 293
      112 how we learn nowadays possibilities and difficulties 288
      113 foundations of educational technology 288
      114 projektmanagement und designbasiertes lernen 281
      115 human rights 278
      116 kompetenz des horens technische gehorbildung 278
      117 it infrastructure management 276
      118 a media history in 10 artefacts 274
      119 introduction to the practice of statistics and regression 271
      120 what is a good society introduction to social philosophy 268
      121 modellierungsmethoden in der wirtschaftsinformatik 265
      122 objektorientierte programmierung von web anwendungen von anfang an 262
      123 intercultural diversity networking vielfalt interkulturell vernetzen 260
      124 foundations of entrepreneurship 259
      125 business communication for impact and results 257
      126 gamification 257
      127 creativity and design in innovation management 256
      128 mechanik i 252
      129 global virtual project management 252
      130 digital signal processing for everyone 249
      131 kompetenzen fur klimaschutz anpassung 248
      132 digital economy and social innovation 246
      133 synthetic biology 245
      134 english phonetics and phonology 245
      135 leibspeisen nahrung im wandel der zeiten molekule brot kase fleisch schokolade und andere lebensmittel 243
      136 critical decision making in the contemporary globalized world 238
      137 einfuhrung in die allgemeine betriebswirtschaftslehre schwerpunkt organisation personalmanagement und unternehmensfuhrung 236
      138 didaktisches design 235
      139 an invitation to complex analysis 235
      140 grundlagen der programmierung teil 1 234
      141 allgemein und viszeralchirurgie 233
      142 mathematik 1 fur ingenieure 231
      143 consumption and identity you are what you buy 231
      144 vampire fictions 230
      145 grundlagen der anasthesiologie 228
      146 marketing strategy and brand management 227
      147 political economy an introduction 225
      148 gesundheit 221
      149 object oriented databases 219
      150 lebenswelten perspektiven fur menschen mit demenz 217
      151 applications of graphs to real life problems 210
      152 introduction to epidemiology epimooc 207
      153 network security 207
      154 global civics 207
      155 wissenschaftliches arbeiten 204
      156 annaherungen an zukunfte wie lassen sich mogliche wahrscheinliche und wunschbare zukunfte bestimmen 202
      157 einstieg wissenschaft 200
      158 engineering english 199
      159 das erklaren erklaren wie infografik klart erklart und wissen vermittelt 198
      160 betriebswirtschaftliche und rechtliche grundlagen fur das nonprofit management 192
      161 art and mathematics 191
      162 vom phanomen zum modell mathematische modellierung von natur und alltag an ausgewahlten beispielen 190
      163 design interaktiver medien technische grundlagen 189
      164 business englisch 187
      165 erziehung sehen analysieren gestalten 184
      166 basic clinical research methods 184
      167 ordinary differential equations and laplace transforms 180
      168 mathematische logik 179
      169 die geburt der materie in der evolution des universums 179
      170 innovationsmanagement von kleinen und mittelstandischen unternehmen kmu 176
      171 introduction to qualitative methods in the social sciences 175
      172 advert retard wirkung industrieller interessen auf rationale arzneimitteltherapie 175
      173 animation beyond the bouncing ball 174
      174 entropie einfuhrung in die physikalische chemie 172
      175 edufutur education for a sustainable future 165
      176 social network effects on everyday life 164
      177 pharmaskills for africa 163
      178 nachhaltige energiewirtschaft 162
      179 qualitat in der fruhpadagogik auf den anfang kommt es an 158
      180 dementias 157
      181 beyond armed confrontation multidisciplinary approaches and challenges from colombia s conflict 154
      182 investition und finanzierung 150
      183 praxis des wissensmanagements 149
      184 gutenberg to google the social construction of the communciations revolution 145
      185 value innovation and blue oceans 145
      186 kontrapunkt 144
      187 shakespeare s politics 142
      188 jetzt erst recht wissen schaffen uber recht 141
      189 rechtliche probleme von sozialen netzwerken 138
      190 augmented tuesday suppers 137
      191 positive padagogik 137
      192 digital storytelling mit bewegenden bildern erzahlen 136
      193 wirtschaftsethik 134
      194 energieeffizientes bauen 134
      195 advising startups 133
      196 urban design and communication 133
      197 bildungsreform 2 0 132
      198 mooc management basics 130
      199 healthy teeth a life long course of preventive dentistry 129
      200 digitales tourismus marketing 127
      201 the arctic game the struggle for control over the melting ice 127
      202 disease mechanisms 127
      203 special operations from raids to drones 125
      204 introduction to geospatial technology 120
      205 social media marketing strategy smms 119
      206 korpusbasierte analyse sprechsprachlichen problemlosungsverhaltens 116
      207 introduction to marketing 115
      208 creative coding 114
      209 mooc meets 3d 110
      210 unternehmenswert die einzig sinnvolle spitzenkennzahl fur unternehmen 110
      211 forming behaviour gestaltung und konzeption von web applications 109
      212 technology demonstration 108
      213 lebensmittelmikrobiologie und hygiene 105
      214 estudi erfolgreich studieren mit dem internet 105
      215 moderne geldtheorie eine paische perspektive 103
      216 kollektive intelligenz 103
      217 geschichte der optischen medien 100
      218 alter und soziale arbeit 99
      219 semantik eine theorie visueller kommunikation 97
      220 erziehung und beratung in familie und schule 96
      221 foreign language learning in indian context 95
      222 bildgebende verfahren 92
      223 applied biology 92
      224 bildung in der wissensgesellschaft gerechtigkeit 92
      225 standortmanagement 92
      226 europe a solution from history 90
      227 methodology of research in international law 90
      228 when african americans came to paris 90
      229 contemporary architecture 89
      230 past recent encounters turkey and germany 88
      231 wars to end all wars 83
      232 online learning management systems 82
      233 software applications 81
      234 business in germany 78
      235 requirements engineering 77
      236 anything relationship management xrm 77
      237 global standards and local practices 76
      238 prodima professionalisation of disaster medicine and management 75
      239 cytology with a virtual correlative light and electron microscope 75
      240 the organisation of innovation 75
      241 sensors for all 75
      242 diagnostik in der beruflichen bildung 73
      243 scientific working 71
      244 escience saxony lectures 71
      245 internet marketing strategy how to gain influence and spread your message online 69
      246 grundlagen des e business 69
      247 principles of public health 64
      248 methods for shear wave velocity measurements in urban areas 64
      249 democracy in america 64
      250 building typology studies gebaudelehre 63
      251 multi media based learning environments at the interface of science and practice hamburg university of applied sciences prof dr andrea berger klein 61
      252 math mooc challenge 60
      253 the value of the social 58
      254 dienstleistungsmanagement und informationssysteme 57
      255 ict integration in education systems e readiness e integration e transformation 56
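      Some course names in the table contain digits themselves (e.g. "internationales agrarmanagement 2" or "bildungsreform 2 0"), so anyone re-using the rows should take the first token as the rank and the last token as the vote count. A minimal sketch of that parsing, plus one example statistic (the share of all votes going to the top n courses); the function names are hypothetical and the header line has to be skipped.

```python
from typing import List, Tuple

Row = Tuple[int, str, int]  # (rank, course name, votes)


def parse_ranking_line(line: str) -> Row:
    """Split e.g. '2 internationales agrarmanagement 2 7557': first token is
    the rank, last token the vote count, everything between is the name."""
    tokens = line.split()
    return int(tokens[0]), " ".join(tokens[1:-1]), int(tokens[-1])


def top_share(rows: List[Row], n: int = 10) -> float:
    """Fraction of all votes that went to the n highest-ranked courses."""
    ranked = sorted(rows)  # tuples sort by rank, their first element
    total = sum(votes for _, _, votes in ranked)
    return sum(votes for _, _, votes in ranked[:n]) / total
```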
      Please help me to realize my Web science massive open online course
      https://www.rene-pickhardt.de/please-help-me-to-realize-my-web-science-massive-open-online-course/
      Wed, 01 May 2013 09:59:57 +0000

      I am asking you for a big favor in this blog post! You can help me achieve one of my childhood dreams:
      I am an enthusiastic teacher and love to share information (as you might have seen by reading my blog). Over the last month I have designed a structure for an online course on Web Science, together with a short video. In this blog post I will introduce the course to you, but I am also asking you to vote for it, since only 10 of the 250 courses that applied for the fellowship will be sponsored and thus realized.
      So please go to https://moocfellowship.org/submissions/web-science to learn more about the course and vote for it. You can find almost all details of the course in this blog post.

      Why create such a course?

      The web has become important to its 2.3 billion users. Yet only a small group of people understands the processes that take place on it, and this group quickly steers its development in new directions.

      Novelty of the subject

      Web Science is an emerging academic field. Much information about the web already exists online, but there is no course that brings all of it together.

      High value for every web user

      The MOOC would be of high value and relevance for anybody using the web, e.g.:

      • A programmer who is building the next web application
      • A company deciding their web strategy
      • A judge who has to decide a case regarding net neutrality or copyright infringement
      • Governments and public authorities that have to decide how to regulate the web

      The web is the right place to learn about the web

      The web itself is the best platform to educate people about the web since you can always point directly to the object of study. By creating a MOOC we will be able to aggregate, organize and filter much of the available information.

      Integration within our institution

      The MOOC will be a core element of the web science lecture in our Web Science master's program. The goal is that students work with the material provided by the MOOC while the instructors replace classical lectures with public Q&A sessions. Additionally, the Web Science lecture of 2013/2014 will serve as an internal test of the MOOC, so that an improved version can launch on iversity in 2014.

      Course content

      This MOOC consists of ten lessons divided into three parts.

      1. Lesson 1 – 3: Foundations of the web
      2. Lesson 4 – 7: Theoretical results of web user behavior
      3. Lesson 8 – 10: Web & society

      Lesson 1 & 2: History of the Web & Web Architecture

      You will understand the historical development of the web and see how the Cold War, in combination with technical advances, led to the Internet Protocol suite.

      For each layer you will get to know one protocol and understand how these protocols build an open, interoperable and decentralized system. Furthermore, you will learn about the Domain Name System and find out why the concepts of URIs and hypertext were crucial for the success of the web.
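      To see how a URI ties the layers together, here is a minimal sketch that turns a URI into the HTTP/1.1 request text a browser would send: the host part is what DNS resolves, and the resulting text is the application-layer message that TCP/IP (and, for https, TLS) then carries. It uses only urllib.parse.urlsplit from the standard library; the function name http_get_request is hypothetical and nothing is sent over the network.

```python
from urllib.parse import urlsplit


def http_get_request(uri: str) -> str:
    """Build the HTTP/1.1 GET request for `uri` (application layer only)."""
    parts = urlsplit(uri)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"not an HTTP(S) URI: {uri}")
    target = parts.path or "/"  # an empty path means the root resource
    if parts.query:
        target += "?" + parts.query
    return (f"GET {target} HTTP/1.1\r\n"
            f"Host: {parts.netloc}\r\n"  # the name DNS has to resolve
            "Connection: close\r\n"
            "\r\n")


print(http_get_request("https://moocfellowship.org/submissions/web-science"))
```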

      Lesson 3: Structure of the Web


      You will learn about the six degrees of separation and understand concepts like small-world networks by studying 'the other' Milgram experiment. You will be able to use power-law distributions to describe the structure of the web, its content and its users.
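      "Six degrees of separation" is a statement about shortest-path lengths in an acquaintance graph. A minimal breadth-first search sketch over a hypothetical adjacency list; in a small-world network this distance stays short even when the graph is very large.

```python
from collections import deque
from typing import Dict, List, Optional


def degrees_of_separation(graph: Dict[str, List[str]],
                          source: str, target: str) -> Optional[int]:
    """Shortest number of hops between two people, or None if unreachable.

    Breadth-first search visits people in order of distance, so the first
    time `target` is reached the distance is minimal.
    """
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        person, dist = queue.popleft()
        for friend in graph.get(person, []):
            if friend == target:
                return dist + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, dist + 1))
    return None
```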

      Lesson 4 & 5: Micro and Macro behavior of web users & Social Network (Analysis)

      You will be introduced to theories from Microsociology and see how applying them to the behavior of people on the web leads to macro structures such as:

      By analyzing social network data from the Koblenz Network Collection using Octave, you will gain a deeper understanding of social theories and social networks.
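      The course itself uses Octave for this. Purely as an illustration (in Java, and on a tiny made-up edge list rather than an actual Koblenz Network Collection file), here is a minimal sketch of a typical first step of such an analysis: counting how many nodes have each degree. Plotting this histogram on log-log axes is the usual first check for the power law distributions mentioned in Lesson 3. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class DegreeDistribution {

    /** Maps each degree to the number of nodes with that degree (undirected edges, each listed once). */
    static TreeMap<Integer, Integer> degreeHistogram(int[][] edges) {
        // First pass: the degree of every node that appears in some edge.
        Map<Integer, Integer> degree = new HashMap<>();
        for (int[] edge : edges) {
            degree.merge(edge[0], 1, Integer::sum);
            degree.merge(edge[1], 1, Integer::sum);
        }
        // Second pass: how many nodes share each degree, ordered by degree.
        TreeMap<Integer, Integer> histogram = new TreeMap<>();
        for (int d : degree.values()) {
            histogram.merge(d, 1, Integer::sum);
        }
        return histogram;
    }

    public static void main(String[] args) {
        // Made-up toy graph in which node 0 acts as a hub.
        int[][] edges = {{0, 1}, {0, 2}, {0, 3}, {0, 4}, {1, 2}};
        System.out.println(degreeHistogram(edges)); // {1=2, 2=2, 4=1}
    }
}
```

      Isolated nodes never appear in an edge list, so they are not counted here; a real analysis would read the node set separately.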

      Lesson 6 & 7: Information Retrieval & Recommender systems


      After completing this section, you will understand the basic architecture of a (web) search engine. You will be able to name the fundamental (non-technical) difficulties in creating a good information retrieval system. You will learn about the connection to recommender systems, which are used (not only!) by large web shops to increase cross-selling.
      You will be able to discuss the dangers of such algorithms, like the relevance paradox and the filter bubble.

      Lesson 8: Trust and Security


      You will learn how third parties act as trust providers on the web and how this issue is related to markets with asymmetric information. You will see that trust issues in the online world differ from the offline problems. You will get to know ways to resolve trust issues, such as cryptography, secure communication and certificates, and see how these techniques can even lead to a new currency.

      Lesson 9: Web Economics


      You will learn about e-commerce models like online shopping and auctions as well as online advertising and marketing. You will be able to interpret and apply metrics for web analytics such as

      Lesson 10: Web Governance and Web Ethics


      Finally, you will understand the important role of institutions like W3C, IETF and ICANN. You will use your understanding of the web architecture to discuss and explain the connections between

      So please go to https://moocfellowship.org/submissions/web-science and learn more about the course and vote for it.

      ]]>
      https://www.rene-pickhardt.de/please-help-me-to-realize-my-web-science-massive-open-online-course/feed/ 6
      Open access and data from my research. Old resources for various topics finally online. https://www.rene-pickhardt.de/open-access-and-data-from-my-research-old-resources-for-various-topics-finally-online/ https://www.rene-pickhardt.de/open-access-and-data-from-my-research-old-resources-for-various-topics-finally-online/#respond Mon, 05 Nov 2012 05:19:53 +0000 http://www.rene-pickhardt.de/?p=1430 As a strong proponent of open access, I always try to publish all my work on my blog, but sometimes I am busy or forget to update it. So today I took the time to look at all my old drafts and the material that has not been published yet. Here is a list of new content on my blog that should have been published long ago; I have also linked it in the relevant articles:

      Over the last month I have created quite a bit of content for my blog, which will be published over the coming weeks. So watch out for screencasts on how to create an autocompletion in GWT with neo4j and how to create n-grams from Wikipedia, as well as thoughts and techniques for related work, and research ideas and questions that we found but probably do not have the time to work on.

      ]]>
      https://www.rene-pickhardt.de/open-access-and-data-from-my-research-old-resources-for-various-topics-finally-online/feed/ 0
      Typology Oberseminar talk and Speed up of retrieval by a factor of 1000 https://www.rene-pickhardt.de/typology-oberseminar-talk-and-speed-up-of-retrieval-by-a-factor-of-1000/ https://www.rene-pickhardt.de/typology-oberseminar-talk-and-speed-up-of-retrieval-by-a-factor-of-1000/#comments Thu, 16 Aug 2012 11:39:25 +0000 http://www.rene-pickhardt.de/?p=1396 Almost two months ago I gave a talk about Typology in our Oberseminar. Update: Download slides. Most readers of my blog will already know the project, which was initially implemented by my students Till and Paul. I am now sharing the slides with you. On the one hand they explain how the system works; on the other hand they give an overview of the related work.
      As you can see from the slides, we are planning to submit our results to the SIGIR conference. So one year after my first blog post on Graphity, which developed into a full paper for SocialCom 2012 (graphity blog post and blog post for source code), there is now the still informal Typology blog post with the slides of the Typology Oberseminar talk, and 3 months are left until our SIGIR submission. I expect that this time the submission will not be such a hassle as Graphity, since I should have learned some lessons and also have a good student who is helping me implement all the tests.
      Additionally, I have finally uploaded some source code to GitHub that makes the Typology retrieval algorithm pretty fast. There are still some issues with this code, since it slightly lowers the quality of the predictions. Also, the index has to be built first. Last but not least, the original SuggestTree code did not store the weights of the items to be suggested. I need those weights in the aggregation phase. Since I did not want to extend the original code, I appended the weights to the end of the suggested items. This is a little inefficient.
      The main reason why retrieval speeds up with the new algorithm is this: Typology needs to sort over all outgoing edges of a node, which is rather slow, especially if one only needs the top k elements. Since neo4j as a graph database does not provide indices for this kind of data, I was forced to look for another way to presort the data. Additionally, if a prefix is known, one does not have to look at all outgoing edges. I found the SuggestTree class by Nicolai Diethelm, which solved the problem very well and led to this great speed-up. The index is not persistent yet and it also needs quite a lot of memory. On the other hand, a separate suggest tree is built for every node. This means that the index can easily be distributed over several machines, allowing for horizontal scaling!
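      To make the idea of presorting concrete, here is a minimal sketch under clear assumptions: it is not the SuggestTree class or its API, just a naive stand-in that precomputes, for every prefix, the k heaviest completions together with their weights (the weights are kept because the aggregation phase needs them). The names PrefixTopK and Suggestion are hypothetical. Building the index costs time and memory once, but a query then becomes a single lookup instead of a sort over all outgoing edges.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PrefixTopK {
    /** A completion together with its weight (e.g. an edge count), kept for later aggregation. */
    record Suggestion(String term, long weight) {}

    private final Map<String, List<Suggestion>> byPrefix = new HashMap<>();

    /** Builds the index once: for every prefix of every term, keep the k heaviest completions. */
    PrefixTopK(Map<String, Long> weightedTerms, int k) {
        Map<String, List<Suggestion>> candidates = new HashMap<>();
        for (Map.Entry<String, Long> entry : weightedTerms.entrySet()) {
            String term = entry.getKey();
            Suggestion s = new Suggestion(term, entry.getValue());
            for (int len = 1; len <= term.length(); len++) {
                candidates.computeIfAbsent(term.substring(0, len), p -> new ArrayList<>()).add(s);
            }
        }
        // Heaviest first; ties broken alphabetically so results are deterministic.
        Comparator<Suggestion> heaviestFirst = Comparator.comparingLong(Suggestion::weight)
                .reversed()
                .thenComparing(Suggestion::term);
        for (Map.Entry<String, List<Suggestion>> entry : candidates.entrySet()) {
            List<Suggestion> list = entry.getValue();
            list.sort(heaviestFirst);
            byPrefix.put(entry.getKey(), List.copyOf(list.subList(0, Math.min(k, list.size()))));
        }
    }

    /** Query time is one hash lookup; no sorting over all outgoing edges happens here. */
    List<Suggestion> suggest(String prefix) {
        return byPrefix.getOrDefault(prefix, Collections.emptyList());
    }

    public static void main(String[] args) {
        PrefixTopK index = new PrefixTopK(
                Map.of("graph", 5L, "graphity", 3L, "great", 7L, "typology", 2L), 2);
        System.out.println(index.suggest("gr"));
        // [Suggestion[term=great, weight=7], Suggestion[term=graph, weight=5]]
    }
}
```

      Storing a list for every prefix is memory-hungry, which mirrors the memory concern above; because one such index would exist per node and the instances are independent, they can be spread over several machines.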
      Anyway, the old algorithm could only handle about 20 requests per second, and now we have something like 14k requests per second. As I mentioned, there is still a little room for more (:
      I hope indices like this will soon be standard in neo4j. This would widen the range of applications that could make good use of neo4j.
      As always, I am happy about any suggestions, and I am looking forward to doing the complete evaluation and writing the paper for Typology.

      ]]>
      https://www.rene-pickhardt.de/typology-oberseminar-talk-and-speed-up-of-retrieval-by-a-factor-of-1000/feed/ 2
      Graphity source code and wikipedia raw data is online (neo4j based social news stream framework) https://www.rene-pickhardt.de/graphity-source-code/ https://www.rene-pickhardt.de/graphity-source-code/#comments Mon, 09 Jul 2012 15:43:57 +0000 http://www.rene-pickhardt.de/?p=1377 UPDATE: the source code of an entire Graphity server application is now online!
      Eight months ago I posted the results of my research on fast retrieval of social news feeds, in particular my graph index Graphity. The index is able to serve more than 12 thousand personalized social news streams per second in social networks with several million active users. I was able to show that the system is independent of the node degree or network size. Therefore it scales to graphs of arbitrary size.
      Today I am pleased to announce that our joint work was accepted as a full research paper at the IEEE SocialCom conference 2012. The conference will take place in early September 2012 in Amsterdam. As promised before, I will now open the source code of Graphity to the community. Its documentation could (and might) be improved in the future, and I am sure that one could even use a better data structure for our implementation of the priority queue.
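      For readers new to this technique: the basic job of a priority queue in news stream retrieval is a k-way merge. If each followed user's updates are stored newest-first, the k newest items overall can be taken from a heap that always holds the current head of every stream. The sketch below shows only this generic merge step under that assumption; it is not the Graphity index itself, and the names NewsStreamMerge and Event are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class NewsStreamMerge {
    /** A made-up status update: who posted it and when. */
    record Event(String author, long timestamp) {}

    /** Returns the k newest events over all streams; each stream must be sorted newest-first. */
    static List<Event> topK(List<List<Event>> streams, int k) {
        // A heap entry {stream, position} is a cursor; the cursor with the newest event is on top.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
                Comparator.comparingLong((int[] c) -> streams.get(c[0]).get(c[1]).timestamp()).reversed());
        for (int s = 0; s < streams.size(); s++) {
            if (!streams.get(s).isEmpty()) {
                heap.add(new int[] {s, 0});
            }
        }
        List<Event> result = new ArrayList<>();
        while (result.size() < k && !heap.isEmpty()) {
            int[] cursor = heap.poll();
            List<Event> stream = streams.get(cursor[0]);
            result.add(stream.get(cursor[1]));
            // Advance this stream's cursor and put it back if the stream has more events.
            if (cursor[1] + 1 < stream.size()) {
                heap.add(new int[] {cursor[0], cursor[1] + 1});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Event>> streams = List.of(
                List.of(new Event("alice", 30), new Event("alice", 10)),
                List.of(new Event("bob", 25), new Event("bob", 20), new Event("bob", 5)));
        for (Event e : topK(streams, 3)) {
            System.out.println(e.author() + " " + e.timestamp());
        }
        // prints one per line: alice 30, bob 25, bob 20
    }
}
```

      With s followed streams, filling the heap and taking k items costs roughly O((s + k) log s), so the heap implementation directly affects retrieval speed.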
      Still, Graphity received quite a lot of attention from the developer community, so maybe the source code will be of help to someone. The source code consists of the entire evaluation framework that we used to evaluate Graphity against other baselines, which will also help anyone who wants to reproduce our evaluation.
      There are some nice things one can learn from it about setting up multithreading for time measurements and about setting up a good logging mechanism.
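      The framework's own benchmarking code is in the repository. As a generic illustration only, a requests-per-second measurement of the kind quoted in these posts can be set up with a fixed thread pool and the monotonic System.nanoTime() clock. The request being measured is passed in as a placeholder, and the names ThroughputBenchmark, Result and run are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.IntConsumer;

class ThroughputBenchmark {
    /** How many requests finished and how long the whole run took. */
    record Result(long completed, long elapsedNanos) {
        double requestsPerSecond() {
            return completed / (elapsedNanos / 1e9);
        }
    }

    /** Runs totalRequests calls of request on a fixed pool of worker threads and times the run. */
    static Result run(int threads, int totalRequests, IntConsumer request) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong completed = new AtomicLong();
        long start = System.nanoTime(); // monotonic clock, meant for measuring elapsed time
        for (int i = 0; i < totalRequests; i++) {
            final int requestId = i;
            pool.execute(() -> {
                request.accept(requestId);
                completed.incrementAndGet();
            });
        }
        pool.shutdown(); // accept no new tasks; already submitted ones still run
        try {
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt status for the caller
        }
        return new Result(completed.get(), System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Placeholder "request": replace with a real retrieval call to benchmark it.
        Result r = run(4, 10_000, id -> Math.sqrt(id));
        System.out.printf("%d requests, %.0f requests/second%n", r.completed(), r.requestsPerSecond());
    }
}
```

      Submitting one task per request adds some executor overhead, so for very cheap requests it can be better to let each thread loop over a batch of requests instead.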
      The code can be found at https://github.com/renepickhardt/graphity-evaluation and the main algorithm should be in the file:
      https://github.com/renepickhardt/graphity-evaluation/blob/master/src/de/metalcon/neo/evaluation/GraphityBuilder.java
      Other files of high interest should be:

      I have not touched the code again over the last couple of months, and it really has a lot of debugging comments inside. My apologies for this bad practice. I hope you can overlook this, keeping in mind that I am a mathematician and this was one of my first bigger evaluation projects. In my own interest, I promise that next time I will produce code that is easier to read, understand and reuse.
      Still, if you have any questions, suggestions or comments, feel free to contact me.
      The raw data can be downloaded at:

      The format of these files is straightforward:
      de-nodeIs.txt first has an ID, then a tab and then the title of the Wikipedia article. This file is only necessary if you want to display your data with titles rather than IDs.
      The interesting file is de-events.log. This file has 4 columns:
      timestamp TAB FromNodeID TAB [ToNodeID] TAB U/R/A
      So every line tells exactly when the article FromNodeID changed. If only 3 columns are present and the last one is a U, the article was simply updated. If links in the article changed, there is another node ID in the third column and an A or an R for an added or removed link, respectively.
      I think processing these files is rather straightforward. With this file you can fully simulate the growth of Wikipedia over time. The file is sorted by the second column. If you want to use it in our evaluation framework, you should sort it by the first column. This can be done on a Unix shell in less than 10 minutes with the sort command.
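      A minimal Java sketch of reading de-events.log according to the format above. It assumes (the post does not say) that the timestamp and node IDs are plain integers and that columns are separated by single tabs; it also tolerates update lines whose third column is present but empty. The names WikipediaEventLog, Event, parse and readSortedByTime are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class WikipediaEventLog {
    /** One line of de-events.log; toNode is null when the line has no ToNodeID (plain update "U"). */
    record Event(long timestamp, long fromNode, Long toNode, char type) {}

    /** Parses "timestamp TAB FromNodeID TAB [ToNodeID] TAB U/R/A" with 3 or 4 columns. */
    static Event parse(String line) {
        String[] f = line.split("\t");
        if (f.length < 3 || f.length > 4) {
            throw new IllegalArgumentException("expected 3 or 4 columns: " + line);
        }
        long timestamp = Long.parseLong(f[0].trim()); // assumption: numeric timestamps
        long from = Long.parseLong(f[1].trim());       // assumption: numeric node IDs
        Long to = (f.length == 4 && !f[2].isBlank()) ? Long.valueOf(f[2].trim()) : null;
        char type = f[f.length - 1].trim().charAt(0);
        if (type != 'U' && type != 'A' && type != 'R') {
            throw new IllegalArgumentException("unknown event type: " + line);
        }
        return new Event(timestamp, from, to, type);
    }

    /** Reads the log and orders it by the first column (time), which the evaluation expects. */
    static List<Event> readSortedByTime(Path file) throws IOException {
        List<Event> events = new ArrayList<>();
        for (String line : Files.readAllLines(file)) {
            if (!line.isBlank()) {
                events.add(parse(line));
            }
        }
        events.sort(Comparator.comparingLong(Event::timestamp)); // stable: equal times keep file order
        return events;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(parse("1000\t42\tU"));    // Event[timestamp=1000, fromNode=42, toNode=null, type=U]
        System.out.println(parse("1001\t42\t7\tA")); // Event[timestamp=1001, fromNode=42, toNode=7, type=A]
        if (args.length > 0) {
            System.out.println(readSortedByTime(Path.of(args[0])).size() + " events");
        }
    }
}
```

      Files.readAllLines keeps the whole log in memory, so for the full file the Unix sort command mentioned above is the more practical way to reorder it; the in-memory version is mainly handy for smaller samples.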
      Sorry, I cannot publish the paper on my blog yet, since the camera-ready version still has to be prepared and submitted to IEEE. But follow me on Twitter or subscribe to my newsletter so I can let you know as soon as the full paper is available as a PDF.

      ]]>
      https://www.rene-pickhardt.de/graphity-source-code/feed/ 7
      Related-work.net – Product Requirement Document released! https://www.rene-pickhardt.de/related-work-net-product-requirement-document-released/ https://www.rene-pickhardt.de/related-work-net-product-requirement-document-released/#comments Mon, 12 Mar 2012 10:26:50 +0000 http://www.rene-pickhardt.de/?p=1176 Recently I visited my friend Heinrich Hartmann in Oxford. We talked about various issues with how research is done these days and how the web could, in theory, help to spread information faster and connect people interested in the same papers/topics more efficiently.
      The idea of http://www.related-work.net was born: a scientific platform that is open source and open data and tries to solve these problems.
      But we did not want to reinvent the wheel. So we did some research on existing online solutions and also asked people from various disciplines to name their problems. Find our product requirement document below! If you like our approach, you can contact us or contribute to the source code, for which you can find some starting documentation!
      So the plan is to fork an open source question-and-answer system and enrich it with features that fulfill the needs of scientists, plus some social aspects (hopefully using neo4j as a supporting database technology), which will eventually help to rank the related work of a paper.
      Feel free to provide us with feedback and wishes and join our effort!

      Beginning of our Product Requirement Document

      We propose to create a new website for the scientific community that brings together people who are reading the same paper. The basic idea is to mix the functionality of a Q&A platform (like MathOverflow) with a paper database (like arXiv). We follow a strict openness principle by making the source code and the data we collect available.
      We start with an analysis of how the internet is currently used in different fields and explain its shortcomings. The actual product description can be found under the section “Basic idea”. At the end we present an overview of the websites that follow a similar approach.
      This document – as well as the whole project – is work in progress. We are happy about any kind of comments or other contributions.

      The distribution of scientific knowledge

      Every scientist has to stay up to date with the developments in his area of research. The basic sources for finding new information are:

      • Conferences
      • Research Seminars
      • Journals
      • Preprint-servers (arXiv)
      • Review Databases (MathSciNet, Zentralblatt, …)
      • Q&A Sites (MathOverflow, StackOverflow, …)
      • Blogs
      • Social Networks (Twitter, Google+)
      • Bibliographic Databases (Mendeley, nNode, Medline, etc.)

      Every community has found its very own way of using these tools.

      Mathematics by Heinrich Hartmann – Oxford:

      To stay up to date with recent developments, I check arxiv.org on a daily basis (RSS feed), participate in mathoverflow.net and search for papers via Google Scholar or MathSciNet. Occasionally interesting work is shared by people in my Google+ circles. In general, pure mathematics moves very slowly. New research often builds upon work which has been out for a few years. To stay reasonably up to date, it is enough to go to conferences every 3-5 months.
      I read many papers on my own because I am the only one at the department who does research on that particular topic. We have a reading class where we read papers/lecture notes that are relevant to more people. Usually they are introductions to certain kinds of theory. We have weekly seminars where people talk about their recently published work. There are some very active blogs by famous mathematicians, but in my area blogs play virtually no role.

      Computer Science by René Pickhardt – Uni Koblenz

      In computer science, topics evolve and change very quickly. It is always important to have both an overview of upcoming technologies (which you get from tech blogs) and access to current research trends.
      Since computer science moves so fast and the review process of journals often takes a long time, our main sources of information and papers are conferences and Twitter.

      • Usually conference papers are distributed digitally to participants. For those interested in these papers, Google queries like “conference name year papers” are frequently used. Sites like http://www.sciweavers.org/ host and aggregate preprints of papers and organize them by conference.
      • The general method to follow a conference one is not attending is to follow its hashtag on Twitter. In general, Twitter is the most used tool to share, distribute and find information, not only for papers but also for the above-mentioned news about upcoming technologies.

      Another rich source for computer scientists is, of course, the related work of papers and Google Scholar. Especially useful is the method of taking a very influential paper with more than 1000 citations and finding newer papers that cite it and contain a certain keyword, which is one of the features of Google Scholar.
      The main problem in computer science is not finding a rare paper or idea but rather filtering the huge number of publications (including bad ones) and keeping track of trends. A system that ranks and summarizes papers (not only by abstract and citation counts) would therefore help me a lot to select which related work of a paper I should read!

      Psychology by Elisa Scheller – Uni Freiburg

      As a psychologist/neuroscientist, I receive recommendations for scientific papers via Google Scholar alerts or ScienceDirect alerts (http://www.sciencedirect.com/); I receive alerts regarding keywords or regarding entire journal issues. When I search for a certain publication, I use pubmed.org or scholar.google.com. This can sometimes be kind of annoying, as I receive multiple alerts from different sources, but I guess it is the best way to stay up to date with recent developments. This is especially important in my field, as we feel a lot of “publication pressure”; I work on a method which is considered “quite fancy” at the moment, so I also use the alerts to make sure nobody has published “my” experiment yet.
      Sometimes a facebook friend recommends a certain publication or a colleague points me to it. Most of the time, I read articles on my own, as I am the only person working on this specific topic at my institution. Additionally, we have a weekly journal club where everyone in turn presents work which is related to our focus of research, e.g. a certain part of the human brain. There is also a weekly seminar dedicated to presentations about ongoing projects.
      Blogs (e.g. mindhacks.com, http://neuroskeptic.blogspot.com/) can be a source to get an overview about recent developments, but I have to admit I use them mainly for work-related entertainment.
      All in all, it is easy to stay up to date using alerts from different platforms; the annoying part is the flood of emails you receive and the fact that you are quite often alerted to articles that don’t fit your interests (no matter how exactly you try to specify your keywords).

      Biomedical Research by Johanna Goldmann – MIT

      In the biological sciences, in research at the bench – communication is one of the most fundamental tools a scientist can have. Communication with other scientists may open up the possibility of new collaborations, can lead to a completely new viewpoint on a known question and to the integration and expansion of methods, and allows a scientist to have a good understanding of what is known, what is not known and what other people have – both successfully and unsuccessfully – tried to investigate.
      Yet communication is something that is currently very much lacking in academic science – lacking to the extent that most scientists will agree it hinders the progress of research. Nonetheless, the lack of communication and the issues it brings with it are something that most scientists have accepted as a necessary evil – not knowing how to possibly change it.
      Progress is only reported in peer-reviewed journals – many of which are greatly affected not only by what is currently “sexy” in research but also by politics, connections and the “publish or perish” pressure. Because of this pressure to publish in journals, and the weight that the list of publications carries for any young scientist's chances of success, scientists also tend to be very reluctant to share any information before publication.
      Furthermore, one of the major issues is that there is currently no real way of publishing or communicating either negative results or minor findings, which causes many questions or methods to be investigated repeatedly and leads to a loss of information.
      Given how much social networks and the internet have changed communication as well as access to information over the past years, there is a need for this change to affect research and communication in the life sciences, and to transform the way we think not only about approaching and solving research questions but also about the information and insights we gain as a whole.

      Philosophy by Sascha Benjamin Fink – Uni Osnabrück

      The most important source of information for philosophers is http://philpapers.org/. You can follow trends going on in your field of interest. Philpapers has a list of almost all papers together with their abstracts, keywords and categories as well as a link to the publisher. Additional information about similar papers is displayed.
      Every category of papers is managed by an editor. For each category it is possible to subscribe to a newsletter. This way I am informed once per month about current publications in journals related to my topic of interest. Every user can create an account and manage his literature and the papers he is interested in.
      Other research and information exchange methods among philosophers are mailing lists, reading clubs and blogs. Have a look at David Chalmers’ blog list. Blogs are also becoming more and more important. Unfortunately, they usually cover general topics and developments of the community (e.g. Leiter’s Blog, Chalmers’ Blog and Schwitzgebel’s Blog).
      But all together I still think that for me a centralized service like Philpapers is my favourite tool because it aggregates most information. If I don’t hear about it on Philpapers usually it is not that important. I think among Philosophers this platform – though incomplete – seems to be the standard for the next couple of years.

      Problems

      As a scientist it is crucial to be informed about the current developments in the research area. Abstracting from the reports above we divide the tasks roughly into the following stages.

      1. Finding and filtering new publications:

      • What is happening right now? What are the current hot topics in my area? What are current trends? (→ Check arXiv/Twitter)
      • Did a friend of mine write something? Did a “big shot” write something?
        (→ Check meta information: title, authors)
      • Are my colleagues excited about a new development? (→ Talk to them.)

      2. Getting more information about a given paper:

      • What is actually done in a given paper? Is it relevant for me? Is it really new? Is it a breakthrough? (→ Read abstracts. Find a good readable summary/review.)
      • Judge the quality of a paper: Is it correct? Is it well written?
        ( → Where is it published, if at all? Skim through content.)

      Finally there is a fundamental decision: Shall I read the whole paper or not? This leads us to the next task.

      3. Understanding a paper: Understanding a paper in depth can be a very time-consuming and tedious process. The presentation is often very short and much knowledge is assumed from the reader. The notation choices can be bad, so that even the statements are hard to understand. In effect, the paper is easily readable only for a very small circle of specialists in the area. If one is not lucky enough to belong to that circle, one usually applies the following strategies:

      1. Lookup references. This forces you to process a whole tree of older papers which might be hard to read, and hard to get hold of. Sometimes it is worthwhile to consult a textbook to polish up fundamentals.
      2. Finding additional resources. Is there a review? Is there a related video lecture or slides explaining the material in more detail? Is the author going to a conference in the near future, or even giving a seminar in the area?
      3. Join forces. Find people thinking about the same paper: Has somebody at my department already read the paper, so that I can ask some questions? Is there enough interest to make a reading group, or more formally, run a seminar about that paper.
      4. Contact the author. This is a last resort. If you have struggled with understanding the paper for a very long time and really need/want to get it, you might eventually write an email to the author – who might respond, or not. Sometimes even errors are found! – and not published! And indeed, there is no easy way to publish “errata” anywhere on the net.

      In mathematics most papers are not read through to the end. One uses strategies 1 & 2 until one gets stuck and then moves on to something more exciting. The chances of getting through are much better with strategy 3, where one is committed to putting a lot of effort into it over weeks.

      4. Finding related work. Where to go from there? Is the paper superseded by a more recent development? Which are the relevant papers the author builds upon? What are the historic influences? What are the founding ideas of the subject? Finding related work is very time-consuming. It is easy to overlook things, given that the references are often vast and sometimes hard to get hold of. Getting citation information often requires access to commercial databases.

      Basic idea:

      All researchers around the world are faced with the same problems and come up with their individual solutions. There are great synergies in bringing these people together with an online platform! Most of the addressed problems are solved with a paper centric service which allows you to…

      • …get to know other readers of the paper.
      • …exchange with the other readers: ask questions, write comments, reviews.
      • …share the gained insights with the community.
      • …ask questions about the paper.
      • …discuss the paper.
      • …review the paper.

      We want to do that with a new mixture of a traditional Q&A system like StackExchange or MathOverflow with a paper database and social features. The key features of this system are as follows:

      Openness: We follow a strict openness principle. The software will be developed in open source. All data generated on this site will be under a creative commons license (like Wikipedia) and will be made available to the community in form of database dumps or an API (open data).

      We use two different types of content sites in our system: Papers and Discussions.

      Paper sites. A paper site is dedicated to a single publication and has the following features:

      1. Paper meta information
        – show title, author, abstract, journal, tags
        – leave a comment
        – write a review (with wiki option)
        – vote up/down
      2. Paper resources
        – show pdfs, slides, notes, video lectures, etc.
        – add a resource
      3. Related Work
        – show the reference-tree and citations in an intelligent way.
      4. Discussions:
        – show related discussions
        – start a new discussion
      5. Social features
        – bookmark
        – share on G+, twitter

      The point “Related Work” deserves some further explanation. The citation graph offers a great deal more information than just a list of references. Together with user-generated content like votes, the individual paper bookmarks and the social graph, one has a very interesting data set which can be harvested. At the very least we want to offer views of this graph with respect to popularity, topics and “read by friends”. Later on one could add more sophisticated, even graphical views of this graph.
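      The document leaves open how “popularity” would be computed. One common way to derive a popularity score from a citation graph is PageRank; this is a swapped-in technique for illustration, not a decision made in this document. A minimal power-iteration sketch, with hypothetical names and papers numbered 0..n-1, could look like this:

```java
import java.util.Arrays;

class CitationPageRank {
    /**
     * Power-iteration PageRank. cites[i] lists the papers that paper i cites, so rank flows
     * from citing papers to cited ones. Returns scores that sum to 1.
     */
    static double[] pageRank(int[][] cites, double damping, int iterations) {
        int n = cites.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double danglingMass = 0.0; // rank held by papers with no references
            for (int i = 0; i < n; i++) {
                if (cites[i].length == 0) {
                    danglingMass += rank[i];
                } else {
                    double share = rank[i] / cites[i].length;
                    for (int j : cites[i]) {
                        next[j] += share;
                    }
                }
            }
            for (int j = 0; j < n; j++) {
                // Teleport term plus damped incoming rank; dangling rank is spread evenly.
                next[j] = (1 - damping) / n + damping * (next[j] + danglingMass / n);
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Made-up example: papers 0, 1 and 2 all cite paper 3, which cites nothing.
        int[][] cites = {{3}, {3}, {3}, {}};
        System.out.println(Arrays.toString(pageRank(cites, 0.85, 50)));
    }
}
```

      Votes, bookmarks and the “read by friends” signal could later be mixed in, for example through a personalized teleport vector, but that is beyond this sketch.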


      Discussion sites.
      A discussion looks more like a traditional Q&A question, with the difference that each discussion may have (many) related papers. A discussion site contains:

      1. Discussion meta information (title, author, body)
      2. Discussion content
      3. Related papers
      4. Voting
      5. Follow/Bookmark

      Besides the content sites we want to provide the following features:

      News Stream. This is the start page of our website. It will be generated from the network consisting of friends, papers and authors. There should be several modes like:

      • hot: heavily discussed papers/discussions
      • new papers: list new publications (filtered by tag, like arXiv feed)
      • social: What did your friends do lately
      • default: intelligent mix of recent activity that is relevant to the logged in user


      Moreover, filter by tag should always be available.
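      As one possible reading of the “hot” mode, here is a minimal sketch that ranks papers and discussions by how much activity (comments, votes, answers) they received within a recent time window. The Activity record, the window length and the tie-breaking rule are assumptions for illustration, not part of the requirements above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class HotStream {
    /** A made-up activity record: a comment, vote or answer on some paper or discussion. */
    record Activity(String itemId, long timestampSeconds) {}

    /** Item ids ordered by activity count within the last windowSeconds, most active first. */
    static List<String> hot(List<Activity> activities, long nowSeconds, long windowSeconds, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        for (Activity a : activities) {
            if (nowSeconds - a.timestampSeconds() <= windowSeconds) {
                counts.merge(a.itemId(), 1, Integer::sum);
            }
        }
        return counts.entrySet().stream()
                .sorted((x, y) -> {
                    int byCount = Integer.compare(y.getValue(), x.getValue()); // descending count
                    return byCount != 0 ? byCount : x.getKey().compareTo(y.getKey()); // ties: by id
                })
                .limit(limit)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Activity> log = List.of(
                new Activity("paper-1", 100), new Activity("paper-1", 95),
                new Activity("paper-2", 99), new Activity("discussion-7", 10));
        System.out.println(hot(log, 100, 10, 5)); // [paper-1, paper-2]
    }
}
```

      Filtering by tag would simply be one more condition inside the loop.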

      Search bar:

      • Searches the contents of the site, but should also find papers in freely available databases (e.g. arXiv). Adding a paper from there should be a very seamless process.
      • Search result ranking uses vote and view information.
      • Personalized search information. (Physicists usually do not want sociology results.)
      • Auto completion on paper titles, author, discussions.

      Social: (hard to implement, maybe for second version!)

      • Easily refer to users by @-syntax familiar from Twitter/Google+
      • Maintain a friendship / trust graph
      • Friendship recommendations
      • Find friends from Google+ on the site

      Benefits

      Our proposed website improves on the above-mentioned problems in the following ways.
      1. Finding and filtering new publications: This step can be improved with even very little community effort:

      • Tell other people, that you are interested in the paper. Vote it up or leave a comment if you are very excited about it.
      • Point out a paper to a colleague.

      2. Getting more information about a given paper:

      • Write a summary or review about a paper you have read or skimmed through. Maybe the introduction is hard to read or some results are not clearly stated.
      • Can you recommend reading this paper? Vote it up!
      • Ask a colleague for his opinion on the paper. Maybe he can write a summary?

      Many reviews of new papers are already written. E.g. MathSciNet and Zentralblatt maintain a large database of reviews which are provided by the community and are not freely available. Many authors would be much happier to write them for an open system!
      3. Understanding a paper: Here are the major synergies which we want to address with our project.

      • Ask a question: Why is the author using this experimental method? How does Lemma 3.4 work? Why do I need this assumption? What is the intuition behind the “virtual truncation”? What implications does this work have?
      • Start a discussion: (might involve more than one paper.) What is the difference of these two papers? Is there a reference explaining this more clearly? What should I read in advance to understand the theory?
      • Add resources. Tell the community about related videos, notes, books etc. which are available on other sites.
      • Share your notes. If you have discussed a paper in a reading class or seminar, collect your notes or opinions and make them available to the community.
      • Restate interesting statements. Tell the community when you have found a helpful result which is buried inside the paper. In that way Google may find it!

      4. Finding related work. Having a well structured and easily navigable view on related papers simplifies the search a lot. The filtering benefits from the content generated by the users (votes) and individual information, like friends who have written/bookmarked a paper.

      Similar Sites on the Web

      There are several discussions in Q&A forums which discuss precisely this problem:

      We found three sites on the internet which follow a similar approach which we examined more carefully.
      1. There is a social network which has most of our features implemented:

      researchgate.net
      “Connect with researchers, make your work visible, and stay current.”

      The Economist has dedicated an article to them. It is essentially a Facebook clone with special features for scientists.

      • Large, fast-growing community: 1.4m accounts, +50,000 per month. Mainly Biology and Medicine.
        (As Daniel Mietchen points out, the size might be misleading due to institutional accounts)
      • Very professional Look and Feel. Company from Berlin, Germany, funded by VC. (48 People involved, 10 Jobs advertised)
      • Huge Feature set:
        • Profile site, Connect to friends
        • News Feed
        • Publication Database, Conference Finder, Jobmarket
        • Every Paper its own page: with
          • Voting up/down
          • Comments
          • Metadata (Title, Author, Abstract, Preview)
          • Social Media (Share, Bookmark, Follow author)
        • Organize Workgroups/Reading Classes.

      Differences from our approach:

      • Closed Data / Closed Source
      • Very complex site that serves many purposes
      • Only very basic features on paper pages: vote and comment.
      • Q&A system is not well linked to the paper database
      • No MathML
      • Mainly populated by undergraduates

      2. Another website which comes reasonably close is:

      http://www.sciweavers.org/

      “an academic network that aggregates links to research paper preprints
      then categorizes them into proceedings.”

      • Includes a large collection of online tools for various purposes
      • Has a big library of papers, software, datasets and conferences for computer science.
        Paper pages have:
        • Metadata and preview
        • Voting, view statistics and tags
        • Comments
        • Related work
        • Bookmarking
        • Author information
      • User profiles (no friendships)


      Differences from our approach:

      • Focus on the computer science community
      • Comments and discussions are well hidden on paper pages
      • No news stream
      • Very spacious design

       
      3. Another very similar site is:

      journalfire.com – beta
      “Share what you read – connect to colleagues – create journal clubs.”

      It has the following features:

      • Comment on papers. Activity feed (?). Follow articles.
      • Host journal clubs. Create events related to papers.
      • Powerful search box fetching papers from arXiv and PubMed (slow)
      • Social features on site: user profiles, friend finder (no Facebook/Google+ integration yet)
      • News feed – from subscribed papers and friends
      • Easy paper import via bookmarklet
      • Good usability (but slow loading times)!
      • Private reading clubs cost money!

      The team is very skilled: the site is maintained by three PhD students/postdocs from Caltech and MIT.

      Differences from our approach:

      • Closed Data, Closed Source
      • This site also (currently) lacks ranking features
      • Very closed model – signup required
      • Weak crowdsourcing: users cannot add metadata

      The site is still in its very early stages, with few users. The project started in 2010 and has not gained much momentum since.

      The other sites fall roughly into the following categories:
      1. Individuals pursuing a very similar idea:

      • annotatr.appspot.com. Combines a metadata database with the Disqus plugin. You can comment but not rate. Good usability. Nice CSS. Good search function. No MathML. No related-article suggestions. Maintained by two academics in their spare time. Hosted on Google App Engine. Closed Source – Closed Data.
      • r-Forum – a resource where mathematicians can collect and record reviews and corrections of a resource (e.g. paper, talk, …). A simple Vanilla Forum/wiki with almost no content, used by perhaps 12 people in the US. No automated data import. No rating system.
      • http://math-arch.org/ – Post comments on math papers. Very poor usability; even produces errors. Maintained by LogicSun, a group of Russian programmers. Closed Source – Closed Data.

      Analysis: Although the basic idea of connecting people who read papers is there, the implementations are very poor in terms of usability and even basic programming. Voting features are also missing.

      2. (Semi-)professional sites:

      • Public Library of Science – very professional, huge paper database, mainly biology and medicine. Features full-text papers and lots of interesting metadata, including references. Has comment features (not very visible) and a news stream on the start page.
        No Q&A features (+1, ask a question) on the site. Only published articles are on the site.
      • Mendeley.com – Huge bibliographic database with bookmarking and social features. You can organize reading groups there, with comments and notes shared among the participants. Features a news stream with papers by friends. Nice import. Impressive full-text data and reference features.
        No Q&A features for papers. No comments on papers. Requires signup to do anything useful.
      • papercritic.com – Open review database. Connected to the Mendeley bibliographic library. You can post reviews. No rating. No comments. Not open: Mendeley is commercial.
      • webofknowledge.com – Commercial academic citation index.
      • zotero.org – provides a program that runs inside the browser. “easy-to-use tool to help you collect, organize, cite, and share your research sources”

      Analysis: The goal of all these tools is to simplify reference management by providing metadata such as references, citations, abstracts and author profiles. Commenting features on paper pages are either absent or not promoted.
      3. Vaguely related sites which solve different problems:

      • citeulike.org – Social bookmarking for papers. Closed Source – Open Data.
      • http://www.scholarpedia.org. A peer reviewed open access encyclopedia.
      • Philica.com – Online journal that publishes articles from any field along with their reviews.
      • MathSciNet/Zentralblatt – Review database for the math community. Closed Source – Commercial.
      • http://f1000research.com/ – Online journal with a public, post-publication review process. “Open Science – Open Data – Open Review”
      • http://altmetrics.org/manifesto/ – an emerging trend from the Web Science Trust community. Their goal is to revolutionize the review process and create better filters for scientific publications by making use of link structures and public discussions. (Might be interesting for us.)
      • http://meta.wikimedia.org/wiki/WikiScholar – one of several ideas under discussion at Wikimedia as to a central repository for references (that are cited on Wikipedias and other Wikimedia projects)

      Upshot of all this:

      There is not a single site offering good Q&A features for papers.

      If you like our approach, you can contact us or contribute to the source code, where you can find some starting documentation!
      The plan is to fork an open-source question-answering system and enrich it with features that meet the needs of scientists, plus some social features that will eventually help to rank the related work of a paper.
      Feel free to provide us with feedback and wishes and join our effort!

      Claudio Martella talks @ FOSDEM about Apache Giraph: Distributed Graph Processing in the Cloud
      Sun, 05 Feb 2012 09:01:45 +0000
      Claudio Martella introduces Apache Giraph, which according to him is a loose implementation of Google's Pregel, presented at SIGMOD 2010. He points out that MapReduce is not well suited to graph processing.
      He then gives an example of how MapReduce can be used to compute PageRank. He points out that PageRank can be computed in a distributed way as a local property of the graph: each vertex updates its PageRank from the values of its neighbours. He uses this example to show what, in his opinion, the drawbacks of this approach are:

      • Job bootstrap takes some time
      • The disk is hit about 6 times
      • Data is sorted
      • The graph is passed through

      As in the Pregel paper, he says that other graph algorithms, such as single-source shortest paths, have the same problems.
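      A minimal, single-process sketch of one MapReduce-style PageRank iteration may make the drawbacks listed above concrete. It is not Hadoop code and not the example from the talk: map, shuffle/sort and reduce are plain Python functions, the damping factor of 0.85 is an assumption, and every vertex is assumed to have at least one out-link.

```python
from collections import defaultdict

DAMPING = 0.85  # assumed damping factor; the talk summary does not specify one


def map_phase(vertex, state):
    """Map: re-emit the graph structure and send rank shares to out-neighbours."""
    rank, out_links = state
    # The adjacency list must be emitted again in every iteration,
    # because MapReduce keeps no state between jobs.
    yield vertex, ("links", out_links)
    for target in out_links:
        yield target, ("share", rank / len(out_links))


def shuffle_and_sort(pairs):
    """Shuffle: group all emitted values by key and sort by key, as Hadoop does."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(vertex, values, n):
    """Reduce: sum incoming shares and rebuild the vertex record."""
    out_links, total = [], 0.0
    for kind, payload in values:
        if kind == "links":
            out_links = payload
        else:
            total += payload
    return vertex, ((1 - DAMPING) / n + DAMPING * total, out_links)


def pagerank_iteration(graph):
    """One full map -> shuffle/sort -> reduce pass over {vertex: (rank, out_links)}."""
    n = len(graph)
    emitted = (pair for v, s in graph.items() for pair in map_phase(v, s))
    return dict(reduce_phase(v, vals, n) for v, vals in shuffle_and_sort(emitted))


if __name__ == "__main__":
    graph = {"a": (1 / 3, ["b", "c"]), "b": (1 / 3, ["c"]), "c": (1 / 3, ["a"])}
    for _ in range(30):  # each iteration would be a separate job on a real cluster
        graph = pagerank_iteration(graph)
    print({v: round(rank, 3) for v, (rank, _) in graph.items()})
```

      Note how the adjacency lists are emitted, sorted and grouped again in every iteration; on a real cluster each iteration is a separate job that reads its input from and writes its output to disk.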
       

      Claudio Martella from Apache explains how Giraph works in the graph dev room @ FOSDEM 2012

       
       
      After explaining more about how Pregel is implemented on top of the existing MapReduce infrastructure for distribution, he says that this system has some advantages over MapReduce:

      • It is a stateful computation
      • The disk is hit only for checkpoints, if at all
      • No sorting is necessary
      • Only messages hit the network
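      To contrast this with the MapReduce approach, here is a minimal single-process sketch of the Pregel-style vertex-centric model, using single-source shortest paths, the algorithm mentioned above. It does not use Giraph's actual Java API: the `Vertex` class, `compute` method and driver loop are hypothetical names for illustration, and checkpointing is not modelled. Vertex state stays in memory across supersteps, nothing is sorted, and only messages cross the superstep barrier.

```python
import math
from collections import defaultdict


class Vertex:
    """Hypothetical vertex: keeps its value in memory across supersteps."""

    def __init__(self, vid, edges):
        self.id = vid
        self.edges = edges      # list of (target_id, weight)
        self.value = math.inf   # best known distance from the source
        self.active = True

    def compute(self, superstep, messages, source, outbox):
        # In superstep 0 only the source starts with distance 0.
        if superstep == 0 and self.id == source:
            candidate = 0.0
        else:
            candidate = min(messages, default=math.inf)
        if candidate < self.value:
            self.value = candidate
            for target, weight in self.edges:
                outbox[target].append(candidate + weight)
        self.active = False  # vote to halt; an incoming message reactivates the vertex


def single_source_shortest_paths(graph, source):
    """Bulk-synchronous driver: run supersteps until no vertex is active and no messages remain."""
    vertices = {vid: Vertex(vid, edges) for vid, edges in graph.items()}
    inbox = defaultdict(list)
    superstep = 0
    while inbox or any(v.active for v in vertices.values()):
        outbox = defaultdict(list)
        for v in vertices.values():
            msgs = inbox.get(v.id, [])
            if v.active or msgs:
                v.compute(superstep, msgs, source, outbox)
        inbox = outbox  # barrier: messages become visible in the next superstep
        superstep += 1
    return {vid: v.value for vid, v in vertices.items()}


if __name__ == "__main__":
    graph = {
        "a": [("b", 1.0), ("c", 4.0)],
        "b": [("c", 2.0), ("d", 5.0)],
        "c": [("d", 1.0)],
        "d": [],
    }
    print(single_source_shortest_paths(graph, "a"))
```

      Unlike the MapReduce version, the adjacency lists never move: each superstep only ships the candidate distances, and vertices whose value does not improve stay silent.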

      He points out that Giraph's main advantage over other systems (Hama, GoldenOrb, Signal/Collect) is the active community behind the project (Facebook, Yahoo, LinkedIn, Twitter). Personally, I think another advantage is that it is an Apache project, and Apache already runs MapReduce (Hadoop) with great success, so it is something people trust…
      Claudio explicitly points out that they are looking for more contributors, and I think this is a really interesting topic to work on! So thank you, Claudio, for your inspiring work!

      Here are the video streams from the graph dev room:
