open source – Data Science, Data Analytics and Machine Learning Consulting in Koblenz Germany

My List of People who I admire and which I find truly inspiring

Rene — Wed, 27 Aug 2014 14:44:59 +0000

This is my personal list of people that I admire. In a sense, if you want to know what I stand for, you can just have a brief look at this list and at the values, norms and ideas the people on it stand for. I have been heavily criticised because this list contains too many white men and too few people of other cultures and genders. I think the main reason is that I am a Western person, and even though I lived in China I can only see as far as the horizon of my own culture, which of course influences me. This is also where my values come from. So if you know people with a similar set of ideas and beliefs from other cultures, feel free to contact me or leave a comment and point them out to me. I am very excited to “meet” more inspiring people, especially outside of my current horizon.
Also, the following list is in random order.

Tank man

from:https://en.wikipedia.org/wiki/Tank_Man

A man who stood in front of a column of tanks on June 5, 1989, the morning after the Chinese military had suppressed the Tiananmen Square protests of 1989 by force, became known as the Tank Man or Unknown Protester. The tanks manoeuvred to pass by the man, and he moved to continue to obstruct them, in something like a dance. The incident was filmed and seen worldwide.

further info:

own reason:
This is an unbelievable example of civil courage. Obviously his actions did not really change how things went around Tiananmen, but I think this is truly heroic and brave.
I hope I will always have similar courage when it comes to fighting for a good cause or idea.

Aaron Swartz

from:https://en.wikipedia.org/wiki/Aaron_Swartz

Aaron Hillel Swartz (November 8, 1986 – January 11, 2013) was an American computer programmer, writer, political organizer and Internet Hacktivist.
Swartz was involved in the development of the web feed format RSS, the organization Creative Commons, the website framework web.py and the social news site, Reddit, in which he became a partner after its merger with his company, Infogami.
Swartz’s work also focused on sociology, civic awareness and activism. He helped launch the Progressive Change Campaign Committee in 2009 to learn more about effective online activism. In 2010 he became a research fellow at Harvard University’s Safra Research Lab on Institutional Corruption, directed by Lawrence Lessig. He founded the online group Demand Progress, known for its campaign against the Stop Online Piracy Act.
On January 6, 2011, Swartz was arrested by MIT police on state breaking-and-entering charges, after systematically downloading academic journal articles from JSTOR. Federal prosecutors later charged him with two counts of wire fraud and 11 violations of the Computer Fraud and Abuse Act, carrying a cumulative maximum penalty of $1 million in fines, 35 years in prison, asset forfeiture, restitution and supervised release.
Swartz declined a plea bargain under which he would serve six months in federal prison. Two days after the prosecution rejected a counter-offer by Swartz, he was found dead in his Brooklyn, New York apartment, where he had hanged himself.
In June 2013, Swartz was posthumously inducted into the Internet Hall of Fame.

further info:

own reason:
Just read the Guerilla Open Access Manifesto. Writing something like this and understanding the impact of open access is terrific. But living it through the PACER project and also through the JSTOR case at MIT is a completely different story.
I strongly believe that unjust laws exist, but we have to understand that law is a relative thing. It is we as a society who make the laws, so it is also up to us to change them. I think the norms and values of a society should stand above any particular law. So what Aaron did was follow a very strong set of norms and values and fight for a better law. One might question whether his actions were too radical and outside the way we as a society have decided to run our democratic processes, but I am sure Aaron was driven by the deep wish to make the world a place with more justice.

Lawrence Lessig

from:https://en.wikipedia.org/wiki/Lawrence_Lessig

Lawrence “Larry” Lessig (born June 3, 1961) is an American academic and political activist. He is a proponent of reduced legal restrictions on copyright, trademark, and radio frequency spectrum, particularly in technology applications, and he has called for state-based activism to promote substantive reform of government with a Second Constitutional Convention. In May 2014, he launched a crowd-funded political action committee which he termed May Day PAC with the purpose of electing candidates to Congress who would pass campaign finance reform.
Lessig is director of the Edmond J. Safra Center for Ethics at Harvard University and a Professor of Law at Harvard Law School. Previously, he was a professor of law at Stanford Law School and founder of the Center for Internet and Society. Lessig is a founding board member of Creative Commons and the founder of Rootstrikers, and is on the board of MapLight. He is on the advisory boards of the Democracy Café, Sunlight Foundation and Americans Elect. He is a former board member of the Free Software Foundation, Software Freedom Law Center and the Electronic Frontier Foundation.

further info:

own reason:
I have to admit that I have not yet gotten around to reading his book Code 2.0, which is said to be excellent. But from his talks and actions I love how Lessig points out problems within society and how he tries to educate people about them. He seems to have a very similar set of norms and values as Aaron did (and I do), but he follows “the protocol” of our society to fight for them. Above all, he seems to be a true intellectual and not just a person who made a career in academia.

Geschwister Scholl

from:https://en.wikipedia.org/wiki/Geschwister_Scholl

Hans and Sophie Scholl, often referred to in German as die Geschwister Scholl (literally: the Scholl siblings), were a brother and sister who were members of the White Rose, a student group in Munich that was active in the non-violent resistance movement in Nazi Germany, especially in distributing flyers against the war and the dictatorship of Adolf Hitler. In post-war Germany, Hans and Sophie Scholl are recognized as symbols of the humanist German resistance movement against the totalitarian Nazi regime.

further info:

own reason:
It is always hard to pick a single person, or in this case a pair of siblings, when it comes to role models in opposing a regime that is harmful to the people of a society. Of course the Geschwister Scholl were not the only people in the resistance movement in Nazi Germany, and there have been other regimes in other places that also had resistance movements. Still, I believe their actions are very remarkable. I think it is the role of students to point out problems in our society. Nowadays many students seem to just accept everything that is happening. Distributing the flyers with the “truth” about Nazi Germany was not only brave; doing so at the university also reached many people who could multiply the message.
I think it is similar to Aaron Swartz. Students and young people are in the role of pointing out problems within society more radically, and the Geschwister Scholl most certainly fulfilled this role.

Randy Pausch

from:https://en.wikipedia.org/wiki/Randy_Pausch

Randolph Frederick “Randy” Pausch (October 23, 1960 – July 25, 2008) was an American professor of computer science, human-computer interaction, and design at Carnegie Mellon University (CMU) in Pittsburgh, Pennsylvania.
Pausch learned that he had pancreatic cancer in September 2006, and in August 2007 he was given a terminal diagnosis: “3 to 6 months of good health left”. He gave an upbeat lecture titled “The Last Lecture: Really Achieving Your Childhood Dreams” on September 18, 2007, at Carnegie Mellon, which became a popular YouTube video and led to other media appearances. He then co-authored a book called The Last Lecture on the same theme, which became a New York Times best-seller. Pausch died of complications from pancreatic cancer on July 25, 2008.

further info:

own reason:
It might be American optimism that is behind Randy Pausch’s lecture and talk, but I actually do not admire him for giving an inspiring lecture even though he was dying. I admire him much more for the fact that he seemed to have lived his life in a very positive way. His goal of enabling the dreams of others sounds very honest to me. I also like the statement he made: “If you live your life in the right way, the dreams come to you”. I think Randy is a very good example to show that no matter what fate does to a person, it is the person’s responsibility to respond to it. When people cry out they might receive pity, but they will probably not really improve their situation. I guess one can summarise Randy with his quote:

We cannot change the cards we are dealt, only the way we play them.

By the way, I especially like the idea that he gave this talk for his kids, to teach them a lesson for a time when they would be grown up and he would not be around anymore.

Tim Berners-Lee

from:https://en.wikipedia.org/wiki/Tim_Berners-Lee

Sir Timothy John “Tim” Berners-Lee, OM, KBE, FRS, FREng, FRSA, DFBCS (born 8 June 1955), also known as “TimBL”, is an English computer scientist, best known as the inventor of the World Wide Web. He made a proposal for an information management system in March 1989, and he implemented the first successful communication between a Hypertext Transfer Protocol (HTTP) client and server via the Internet sometime around mid November of that same year.
Berners-Lee is the director of the World Wide Web Consortium (W3C), which oversees the Web’s continued development. He is also the founder of the World Wide Web Foundation, and is a senior researcher and holder of the Founders Chair at the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). He is a director of the Web Science Research Initiative (WSRI), and a member of the advisory board of the MIT Center for Collective Intelligence.
In 2004, Berners-Lee was knighted by Queen Elizabeth II for his pioneering work. In April 2009, he was elected a foreign associate of the United States National Academy of Sciences. He was honoured as the “Inventor of the World Wide Web” during the 2012 Summer Olympics opening ceremony, in which he appeared in person, working with a vintage NeXT Computer at the London Olympic Stadium. He tweeted “This is for everyone”, which instantly was spelled out in LCD lights attached to the chairs of the 80,000 people in the audience.

further info:
Even though he is not a good speaker and reading his book (Weaving the Web) will help much more, I link a video here:

own reason:
In my opinion there are many reasons to admire Tim Berners-Lee. Of course he is famous for inventing the World Wide Web. But I think the time was ripe for this invention. The Internet by itself was not very useful. The ideas of hypertext were around and similar systems existed. As always on the Internet, we have a strong “winner takes it all” phenomenon. So bringing us the World Wide Web is certainly something Tim should get credit for, but it is not the main reason why I admire him.
What is really cool about Tim Berners-Lee is that he seems to have a very clear sense and abstraction of technical things and especially of their impact. Maybe it is easy to develop this sense after creating a technology that literally everyone on the Internet is using, but still I like his activism for openness, interoperability, net neutrality and freedom in general, and freedom of speech in particular. He also addressed me directly after I asked a question in a Q&A session at a conference. His attitude of saying “if you want to change the world, you have the tools; don’t talk, just go geek and do it” will certainly stick with me for the rest of my life.

Other than that, I like that he is not afraid to make political statements about the problems with the web and where it should go, and that he seems to have no interest whatsoever in becoming a multi-billionaire, which he could easily have achieved by sitting on the invention of the World Wide Web and being so central to its development.

Albert Einstein

from:https://en.wikipedia.org/wiki/Albert_Einstein

Albert Einstein (/ˈælbərt ˈaɪnstaɪn/; German: [ˈalbɐt ˈaɪnʃtaɪn]; 14 March 1879 – 18 April 1955) was a German-born theoretical physicist and philosopher of science. He developed the general theory of relativity, one of the two pillars of modern physics (alongside quantum mechanics). He is best known in popular culture for his mass–energy equivalence formula E = mc² (which has been dubbed “the world’s most famous equation”). He received the 1921 Nobel Prize in Physics “for his services to theoretical physics, and especially for his discovery of the law of the photoelectric effect”. The latter was pivotal in establishing quantum theory.
Near the beginning of his career, Einstein thought that Newtonian mechanics was no longer enough to reconcile the laws of classical mechanics with the laws of the electromagnetic field. This led to the development of his special theory of relativity. He realized, however, that the principle of relativity could also be extended to gravitational fields, and with his subsequent theory of gravitation in 1916, he published a paper on the general theory of relativity. He continued to deal with problems of statistical mechanics and quantum theory, which led to his explanations of particle theory and the motion of molecules. He also investigated the thermal properties of light which laid the foundation of the photon theory of light. In 1917, Einstein applied the general theory of relativity to model the large-scale structure of the universe.
He was visiting the United States when Adolf Hitler came to power in 1933 and, being Jewish, did not go back to Germany, where he had been a professor at the Berlin Academy of Sciences. He settled in the U.S., becoming an American citizen in 1940. On the eve of World War II, he endorsed a letter to President Franklin D. Roosevelt alerting him to the potential development of “extremely powerful bombs of a new type” and recommending that the U.S. begin similar research. This eventually led to what would become the Manhattan Project. Einstein supported defending the Allied forces, but largely denounced the idea of using the newly discovered nuclear fission as a weapon. Later, with the British philosopher Bertrand Russell, Einstein signed the Russell–Einstein Manifesto, which highlighted the danger of nuclear weapons. Einstein was affiliated with the Institute for Advanced Study in Princeton, New Jersey, until his death in 1955.
Einstein published more than 300 scientific papers along with over 150 non-scientific works. His great intellectual achievements and originality have made the word “Einstein” synonymous with genius.

further info:

own reason:
He was probably one of my first role models. I admire him for two reasons.
The first – which I nowadays actually find a stupid reason to admire someone – is just his pure intellect. Creating the theory of relativity was an amazing achievement of ignoring what we seem to know and just following the facts (as all good mathematicians and computer scientists should do all the time). But the list of his achievements in physics does not stop at relativity (as far as I know, David Hilbert arrived at general relativity more quickly and before Einstein, after having talked to him at a conference, but I still have to double-check this fact). Beyond that, the list of independent fields in physics that he worked on is just incredibly long.
The second reason is the way Einstein behaved regarding the development of the nuclear bomb. He first pointed out – by signing a letter to Roosevelt, the American president at that time – that there was a danger that Nazi Germany might create a nuclear weapon. This led to the Manhattan Project. The interesting part comes at the moment where Einstein regrets signing the letter. He said that if he had known that this weapon would be used against civilians and that Nazi Germany would not succeed in developing such a bomb, he would have done nothing.
Many scientists bear a great responsibility. Knowledge can quickly become very dangerous or can be misused for a strategic advantage in harmful actions. Unfortunately I have the feeling that many scientists do not have the time or courage to think about ethics and the real impact of their research (I mean the impact that is not measured by citations and impact factors…). Even Einstein seemed not to be aware of the impact he would have by signing the letter that led to the Manhattan Project. Still, he took responsibility after the bombs had been used in Japan. I think many people in Einstein’s position would have found a way of justifying how the Americans had used the bomb against Japan. He did not. He publicly regretted what he had done and started. Finally, he was a key player and intellectual behind an open letter that calls on the governments of this world to resolve conflicts in a peaceful way.

Chelsea Manning

from:https://en.wikipedia.org/wiki/Chelsea_Manning

Chelsea Elizabeth Manning (born Bradley Edward Manning, December 17, 1987) is a United States Army soldier who was convicted in July 2013 of violations of the Espionage Act and other offenses, after releasing the largest set of classified documents ever leaked to the public. Manning was sentenced in August 2013 to 35 years confinement with the possibility of parole in eight years, and to be dishonorably discharged from the Army. Manning is a trans woman who, in a statement the day after sentencing, said she had felt female since childhood, wanted to be known as Chelsea, and desired to begin hormone replacement therapy. From early life and through much of her Army life, Manning was known as Bradley; she was diagnosed with gender identity disorder while in the Army.
Assigned in 2009 to an Army unit in Iraq as an intelligence analyst, Manning had access to classified databases. In early 2010, she leaked classified information to WikiLeaks and confided this to Adrian Lamo, an online acquaintance. Lamo informed Army Counterintelligence, and Manning was arrested in May that same year. The material included videos of the July 12, 2007 Baghdad airstrike, and the 2009 Granai airstrike in Afghanistan; 250,000 U.S. diplomatic cables; and 500,000 Army reports that came to be known as the Iraq War logs and Afghan War logs. Much of the material was published by WikiLeaks or its media partners between April and November 2010.
Manning was ultimately charged with 22 offenses, including aiding the enemy, which was the most serious charge and could have resulted in a death sentence. She was held at the Marine Corps Brig, Quantico in Virginia, from July 2010 to April 2011 under Prevention of Injury status—which entailed de facto solitary confinement and other restrictions that caused domestic and international concern—before being transferred to Fort Leavenworth, Kansas, where she could interact with other detainees. She pleaded guilty in February 2013 to 10 of the charges. The trial on the remaining charges began on June 3, 2013, and on July 30 she was convicted of 17 of the original charges and amended versions of four others, but was acquitted of aiding the enemy. She is serving her sentence at the maximum-security U.S. Disciplinary Barracks at Fort Leavenworth.
Reaction to Manning’s disclosures, arrest, and sentence was mixed. Denver Nicks, one of her biographers, writes that the leaked material, particularly the diplomatic cables, was widely seen as a catalyst for the Arab Spring that began in December 2010, and that Manning was viewed as both a 21st-century Tiananmen Square Tank Man and an embittered traitor. Reporters Without Borders condemned the length of the sentence, saying that it demonstrated how vulnerable whistleblowers are.

further info:

own reason:
Obviously I did not have the time to read everything that Manning has made public, so I might be blinded by the media coverage of her case. From what I know I can say that, like many others on the list, Manning was bound to her morals and not to what she was or was not allowed to do. I think she was truly trying to point out unjust things, and I think especially the way she did it was actually pretty smart. I guess there is a lot of structural violence in politics and the military. Pointing out problems in the “correct way” does not really seem to change anything. Therefore she just had to release the video of American soldiers randomly shooting civilians. Did she have to make everything else public? Who knows. Actually, who cares? Making this video public is heroic in itself and should have had a much bigger impact than it did.
Her going to jail for 35 years, and society accepting this, just makes me sad. I really wonder what has to happen for people to start a revolution. Not that I believe in such a drastic action, but having Manning in prison for 35 years is f***ed up. I strongly hope that one day Chelsea Manning will receive the Nobel Peace Prize.

Noam Chomsky

from:https://en.wikipedia.org/wiki/Noam_Chomsky

Avram Noam Chomsky (/ˈnoʊm ˈtʃɒmski/; born December 7, 1928) is an American linguist, philosopher, cognitive scientist, logician, political commentator and activist. Sometimes described as the “father of modern linguistics”, Chomsky is also a major figure in analytic philosophy. He has spent most of his career at the Massachusetts Institute of Technology (MIT), where he is currently Professor Emeritus, and has authored over 100 books. He has been described as a prominent cultural figure, and was voted the “world’s top public intellectual” in a 2005 poll.
Born to a middle-class Ashkenazi Jewish family in Philadelphia, Chomsky developed an early interest in anarchism from relatives in New York City. He later undertook studies in linguistics at the University of Pennsylvania, where he obtained his BA, MA, and PhD, while from 1951 to 1955 he was appointed to Harvard University’s Society of Fellows. In 1955 he began work at MIT, soon becoming a significant figure in the field of linguistics for his publications and lectures on the subject. He is credited as the creator or co-creator of the Chomsky hierarchy, the universal grammar theory, and the Chomsky–Schützenberger theorem. Chomsky also played a major role in the decline of behaviorism, and was especially critical of the work of B.F. Skinner. In 1967 he gained public attention for his vocal opposition to U.S. involvement in the Vietnam War, in part through his essay The Responsibility of Intellectuals, and came to be associated with the New Left while being arrested on multiple occasions for his anti-war activism. While expanding his work in linguistics over subsequent decades, he also developed the propaganda model of media criticism with Edward S. Herman. Following his retirement from active teaching, he has continued his vocal public activism, praising the Occupy movement for example.
Chomsky has been a highly influential academic figure throughout his career, and was cited within the field of Arts and Humanities more often than any other living scholar between 1980 and 1992. He was also the eighth most cited scholar overall within the Arts and Humanities Citation Index during the same period. His work has influenced fields such as artificial intelligence, cognitive science, computer science, logic, mathematics, music theory and analysis, political science, programming language theory and psychology. Chomsky continues to be well known as a political activist, and a leading critic of U.S. foreign policy, state capitalism, and the mainstream news media. Ideologically, he aligns himself with anarcho-syndicalism and libertarian socialism.

further info:

own reason:
Chomsky is very new on the list, so I cannot say very much about him. I have watched several interviews and talks by him and I just find it amazing how he turned completely towards ethics and political activism and is highly educated, rational and fact-driven (he always seems to just have the better argument). In particular I like his point of view on power systems (as far as I understand him, he is not blaming individual people for injustice but sees the problem of structural violence). I also like his critical view of mass media, therefore I am eager to read his book Manufacturing Consent.
I particularly like his very clear view on fundamental issues and how certain policies inevitably lead to certain abuses.

Melinda Gates (also Bill Gates)

from:https://en.wikipedia.org/wiki/Bill_%26_Melinda_Gates_Foundation

Bill & Melinda Gates Foundation (BMGF or the Gates Foundation) is one of the largest private foundations in the world, founded by Bill and Melinda Gates. It was launched in 2000 and is said to be the largest transparently operated private foundation in the world. It is “driven by the interests and passions of the Gates family”. The primary aims of the foundation are, globally, to enhance healthcare and reduce extreme poverty, and in America, to expand educational opportunities and access to information technology. The foundation, based in Seattle, Washington, is controlled by its three trustees: Bill Gates, Melinda Gates and Warren Buffett. Other principal officers include Co-Chair William H. Gates, Sr. and Chief Executive Officer Susan Desmond-Hellmann.
It had an endowment of US$38.3 billion as of 30 June 2013. The scale of the foundation and the way it seeks to apply business techniques to giving makes it one of the leaders in the philanthrocapitalism revolution in global philanthropy, though the foundation itself notes that the philanthropic role has limitations. In 2007, its founders were ranked as the second most generous philanthropists in America, and Warren Buffett the first. As of May 16, 2013, Bill Gates had donated US$28 billion to the foundation.

further info:

own reason:
OK, I admit it is not fair to name just her. I mean, it is still the Bill and Melinda Gates Foundation. But from my perception it is Melinda who was the driving force and the eye-opener for Bill Gates. I always perceived Bill Gates as one of the coldest and most disgusting businessmen out there (on the same list as Steve Jobs and Mark Zuckerberg), using patents, licence agreements and closed systems just for the purpose of becoming incredibly rich. Like other computer scientists he already had a deep impact on people, and bringing us the operating systems and office suite was probably not that bad after all. I mean, they were still useful tools for most people. Still, he could have chosen a more ethical business model. Well, how should he have seen these things when he was young? I guess he was even bound to investors and to what they wanted.
I guess with the help of Melinda he also realised that it would be too late to make drastic changes to Microsoft, so he changed the focus of his life to create something new. Something that is much more sustainable and that feels very good.
Now, using their wealth, Bill and Melinda Gates are starting to tackle really important issues that we as humans could all tackle but which seem economically unimportant. This feels a little bit like a modern version of Robin Hood. Microsoft is pulling money out of the rich part of the world with software that is nowadays OK, at high cost and with vendor lock-in, but Bill and Melinda are distributing this money, e.g. to fight diseases in areas of the world where the Western world simply doesn’t care to fight them. They also act as multipliers to convince other rich people to do the same. I think this contributes a lot to more justice and progress.
Apart from my love for technological topics, the Bill and Melinda Gates Foundation is, besides the Wikimedia Foundation, probably the only interesting NGO I am aware of that I would be willing to work for and sacrifice my tech career for. But I guess this could still be done after a successful tech career (:
By the way, fun fact: the “rich get richer” principle holds incredibly well in the case of Bill and Melinda Gates. Warren Buffett, Gates’s “rival” for the title of wealthiest person in the world, pledged most of his fortune to the Bill and Melinda Gates Foundation, which I think is an incredible vote of confidence in what Bill and Melinda are doing.

Uncertain candidates – since it’s hard to say

There are some borderline candidates which I am still not sure about.

Julian Assange

I do not even know how to make up my mind. On the one hand Julian Assange seems to be an incredibly important person who is really doing a lot of good. On the other hand he seems very self-centered and sometimes not authentic. I understand that he of course has operational costs and no fixed income. Still, I am not sure how much is real.

RAF – resp. Ulrike Meinhof

I guess in Germany it is almost as impossible to say that one sympathises with the RAF as it would be to state that one sympathises with the NSDAP. Yet I liked the fundamental problems the RAF addressed. Their methods were stupid, and I guess there were a lot of “dead fish” swimming with the RAF and taking part in all the terror the RAF committed, but in their core beliefs and their problems with German society they seemed to have some really valid points.

Richard Stallman

Inventing the GPL was an incredibly smart move. I am not sure if this was the first copyleft licence and if Stallman really came up with the idea himself. Still, he probably could and would have if he didn’t.
Stallman is often perceived to be too radical and unable to compromise. From what I understand (and within this article I believe that this is the topic where I have the most expertise) this is just the only way. There cannot be such a thing as “half free software”: you are free or you are not free. The impact of being free is so incredibly big that I think it is indeed one of the few points in life where people really should not compromise. So I think that what Stallman is frequently criticised for is actually one of his strongest points.

Linus Torvalds

I am not sure if he is just a “winner takes it all” guy or if there is more to him. Besides Linux, bringing git to the hacker community is the second and, in the long term, maybe even more impactful innovation by Linus Torvalds. Also, the way he seems to work and how he seems to understand the dynamics and social processes of the open source community is crazy.

Larry Page

People might ask: “Rene, why are Steve Jobs and Zuckerberg on your bad list and Larry Page not? Where did he donate his money to, and did he do all the philanthropic work like Bill Gates?” My only response is: yes, that is a problem, and that is part of the reason why I am still undecided about Page. What speaks for Page is his creativity combined with his strong will to use technology and financial power to change the world and make it more automated and efficient. In pursuing this goal he seems to ignore economic principles. Google has released a bunch of products that are hard to monetise (even indirectly) or that are really “moonshot” projects. I have the feeling that Page cannot donate money or give up power within Google until he has brought the amount of innovation to the world that he wanted.

Self driving cars (probably as shared economy with taxi, logistics, online shopping and not for sale)
a better “semantic” search (in combination with android and more knowledge of user context)

Graphity Server for social activity streams released (GPLv3)

Rene — Mon, 02 Sep 2013 07:11:22 +0000

It has been almost 2 years since I published my first ideas and works on Graphity, which is nowadays a collection of algorithms to support efficient storage and retrieval of more than 10k social activity streams per second. You know the typical application from Twitter, Facebook and co.: retrieve the most current status updates from your circle of friends.
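To make the retrieval problem concrete, here is a minimal sketch in Python. It is not the Graphity implementation and it does not use neo4j; it only assumes that every friend already has a list of updates sorted newest first, and it merges those lists lazily until the k most recent updates are found. The function name, the tuple layout and the sample data are made up for illustration.

```python
import heapq
from itertools import islice


def newest_updates(streams, k):
    """Return the k most recent (timestamp, author, text) updates.

    `streams` maps each friend to a list of updates that is already
    sorted newest first, so a lazy k-way merge only has to look at the
    head of each list until k items have been produced.
    """
    merged = heapq.merge(*streams.values(), key=lambda update: update[0], reverse=True)
    return list(islice(merged, k))


# Hypothetical sample data: integer timestamps, larger means more recent.
friends = {
    "alice": [(50, "alice", "shipping a release"), (10, "alice", "hello")],
    "bob": [(40, "bob", "reading papers"), (30, "bob", "coffee")],
    "carol": [(45, "carol", "at a conference")],
}

top3 = newest_updates(friends, 3)
for timestamp, author, text in top3:
    print(timestamp, author, text)
```

The point of the sketch is only that a news stream can be assembled from per-user lists that are kept in order, without sorting all updates of all friends on every request.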
Today I proudly present the first version of the Graphity News Stream Server. Big thanks to Sebastian Schlicht who worked for me implementing most of the Servlet and did an amazing job! The Graphity Server is a neo4j powered servlet with the following properties:

Response times for requests are usually less than 10 milliseconds (+network i/o e.g. TCP round trips coming from HTTP)
The Graphity News Stream Server is a free open source software (GPLv3) and hosted in the metalcon git repository. (Please also use the bug tracker there to submit bugs and feature requests)
It is running two Graphity algorithms: One is read optimized and the other one is write optimized, if you expect your application to have more write than read requests.
The server comes with a REST API which makes it easy to plug the server into whatever application you have.
The server’s response also follows the activitystrea.ms format, so out of the box there are a large number of clients available to render the response of the server.
The server ships together with unit tests and extensive documentation, especially of the news stream server protocol (NSSP), which specifies how to talk to the server.

The server can currently handle about 100 write requests in medium-sized networks (about a million nodes). I do not recommend using this server if you expect your user base to grow beyond 10 million users (though we are working on getting the server to scale). This is mostly due to the fact that our database right now won’t really scale beyond one machine and some internal operations have to be synchronized.
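To give an idea of what a response in the activitystrea.ms format looks like, here is a minimal sketch in Python. The field names (published, actor, verb, object, items, totalItems) come from the Activity Streams 1.0 JSON format. The helper names make_activity and make_stream, the example user and the choice of fields are assumptions for illustration only. The exact payload of the Graphity server is defined in the NSSP documentation, not here.

```python
import json
from datetime import datetime, timezone


def make_activity(actor_id, actor_name, verb, message, published):
    """Build one Activity Streams 1.0 style activity as a plain dict.

    `published` is expected to be a timezone-aware UTC datetime.
    """
    return {
        "published": published.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "actor": {
            "objectType": "person",
            "id": actor_id,
            "displayName": actor_name,
        },
        "verb": verb,
        "object": {"objectType": "note", "content": message},
    }


def make_stream(activities):
    """Wrap a list of activities in an Activity Streams 1.0 style collection."""
    return {"totalItems": len(activities), "items": activities}


# Hypothetical example data, not taken from a real server response.
status = make_activity(
    actor_id="user:42",
    actor_name="Example User",
    verb="post",
    message="Hello news stream!",
    published=datetime(2013, 9, 2, 7, 11, 22, tzinfo=timezone.utc),
)
print(json.dumps(make_stream([status]), indent=2))
```

Any client that understands this format can render such a collection without knowing anything about the graph index behind it.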

Koding.com is currently considering implementing Graphity-like algorithms to power their activity streams. It was Richard from their team who pointed out in a very fruitful discussion how to avoid the neo4j limit of 2^15 = 32768 relationship types by using an overlay network. His ideas of an overlay network have been implemented in the read optimized Graphity algorithm. Big thanks to him!
Now I am really excited to see what kind of applications you will build using Graphity.
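The overlay idea mentioned above can be illustrated with a small in-memory sketch. It rests on two assumptions that are mine and not taken from the server code: that a naive layout would need one relationship type per user, and that the overlay replaces those types with proxy nodes linked by a fixed set of three relationship types. All class, method and type names are hypothetical.

```python
MAX_RELATIONSHIP_TYPES = 2 ** 15  # 32768, the neo4j limit quoted above


def naive_relationship_types(user_ids):
    """Assumed naive layout: one relationship type per user."""
    return {f"ego:{user_id}" for user_id in user_ids}


class OverlayGraph:
    """Toy overlay: proxy nodes instead of per-user relationship types."""

    EGO_HEAD = "EGO_HEAD"        # owner -> first proxy of the owner's list
    NEXT_IN_EGO = "NEXT_IN_EGO"  # proxy -> next proxy in the same list
    REPRESENTS = "REPRESENTS"    # proxy -> the friend it stands for

    def __init__(self):
        # Maps (source node, relationship type) to the target node.
        self._out = {}

    def set_ego_network(self, owner, friends_in_order):
        """Store the ordered friend list of `owner` as a chain of proxies."""
        previous, rel_type = owner, self.EGO_HEAD
        for friend in friends_in_order:
            proxy = ("proxy", owner, friend)
            self._out[(previous, rel_type)] = proxy
            self._out[(proxy, self.REPRESENTS)] = friend
            previous, rel_type = proxy, self.NEXT_IN_EGO

    def ego_network(self, owner):
        """Walk the proxy chain and return the friends in stored order."""
        friends = []
        proxy = self._out.get((owner, self.EGO_HEAD))
        while proxy is not None:
            friends.append(self._out[(proxy, self.REPRESENTS)])
            proxy = self._out.get((proxy, self.NEXT_IN_EGO))
        return friends

    def relationship_types(self):
        return {rel_type for (_, rel_type) in self._out}


graph = OverlayGraph()
graph.set_ego_network("alice", ["bob", "carol"])
graph.set_ego_network("bob", ["alice"])
print(graph.ego_network("alice"))
print(len(naive_relationship_types(range(40000))) > MAX_RELATIONSHIP_TYPES)
print(len(graph.relationship_types()) <= MAX_RELATIONSHIP_TYPES)
```

The point of the sketch is only that the number of relationship types stays constant while the number of users grows. The price is one extra node per entry in each list.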

If you use Graphity

Please tell me if you start using Graphity. That would be awesome to know and I will most certainly include you in a list of testimonials.
By the way, if you want to help spread the server (which is also good for you since more developers using it means a higher chance of getting newer versions) you can vote up my answer on Stack Overflow:
http://stackoverflow.com/questions/202198/whats-the-best-manner-of-implementing-a-social-activity-stream/13171306#13171306

How to get started

It’s darn simple!

Clone the git repository or get hold of the source code.
Then switch to the repo and type sudo ./install.sh
Copy the war file to your tomcat webapps folder (if you don’t know how to set up tomcat and maven, which are needed, we have a detailed setup guide).
You’re done! More configuration details are in our README.md.
Look in the newswidget folder to find a simple HTML / JavaScript client which can interact with the server.

I also created a small simple screen cast to demonstrate the setup:

Get involved

There are plenty of ways to get involved:

Fork the server
Submit a bug report
Fix a bug
Subscribe to the mailing list.

Further links:

git repository
original graphity blogpost
graphity paper
Stack overflow discussion on social activity streams (for voting thanks!)
issue tracker

Metalcon finally gets a redesign – Thinking about high scalability

Rene — Mon, 17 Jun 2013 15:21:30 +0000

Finally metalcon.de, the social networking site which Jonas, Jens and I created in 2008, gets a redesign. Thanks to the great opportunities at the Institute for Web Science and Technologies here in Koblenz (why don’t you apply for a PhD position with us?) I will have the chance to code up the new version of metalcon. Kicking off on July 15th I will lead a team of 5 programmers for 4 months. Not only will the development be open source, but during this time I will constantly (hopefully on a daily basis) write in this blog about the design decisions we take in order to achieve a well-scaling web service.
Before I share my thoughts on high scaling architectures for web sites I want to give a little history and background on what metalcon is and why this redesign is so necessary:

Metalcon is a social networking site for German fans of metal music. It currently has

a user base of 10’000 users.
about 500 registered bands
highly semantic and interlinked data base (bands, geographical coordinates, friendships, events)
624 MB of text and structured data about the mentioned topics.
fairly good visibility in search engines.
> 30k lines of code (mostly PHP)
a bad scaling architecture (own OR-mapper, own AJAX libraries, big monolithic data base design, bad usage of PHP,…)
no unit tests (so code maintenance is almost impossible)
no music and audio files
no processes for content moderation
no processes to fight spam and block users
really bad usability (I could write tons of posts about where the usability is lacking)
no clear distinction of features for users to understand
…

When we built metalcon no one on the team had experience with highly scalable web applications and we were happy just to get it running at all. After returning from China and starting my PhD program in 2011 I was about to shut down metalcon. Though we had become close friends, the core team had already moved on to new projects and we were lacking manpower. On the other hand everyone kept telling me that metalcon would be a great place to do research. So in 2011 Jonas and I decided to give it another shot and do an open redevelopment. We set up a wiki to document our features and the software and we created a developer blog which we used to exchange ideas. We also created some open source projects to which we hardly contributed code due to the lack of manpower…
Well, at that time we already knew of too many problems, so fixing the old code was not the way to go. At least we learned a lot. Thinking about highly scalable architectures at that time I knew that a news feed (which the old version of metalcon already had) was core to the user experience. Reading many Stack Exchange discussions I knew that you wouldn’t build such a stream on MySQL. Playing around with graph databases like neo4j also led me to my first research paper, on Graphity, a piece of software designed to distribute highly personalized news streams to users. Since our development was not proceeding we never deployed Graphity within metalcon. Also, building an autocomplete service for the site should not be a problem anymore.

Roadmap for the redesign

Over the next weeks I hope to read as many interesting articles about technologies and high scalability as I can possibly find, and I will be more than happy to get your feedback and suggestions here. I will start by reading many articles on http://highscalability.com/. This blog is pure gold for serious web developers.
During a nice discussion about scalability with Heinrich we already came up with a potential architecture for metalcon. I will soon introduce this architecture but first want to check the best practices described on the High Scalability blog.
In parallel I will also collect the features needed for the new metalcon version and hopefully be able to pair them with useful technologies. I already started a wiki page about features and planned technologies to support them.
I will also need to decide on the programming language and paradigms for the development. Right now I am playing around with Ruby on Rails vs GWT. We had some great experiences with the power of GWT, but one major drawback is for sure that the website becomes more of an application than a lightweight website.

So again, feel free to give input and share your ideas and experiences with me and with the community. I will be very grateful for every recommendation of articles, videos, books and so on.

Analyzing the final and intermediate results of the iversity MOOC Fellowship online voting

Rene — Thu, 23 May 2013 23:07:24 +0000

As written before, Steffen and I participated in the online voting for the MOOC fellowship. Today the competition finished and I would like to say thank you to everyone who participated in the voting, in particular to the 435 people supporting our course. I never imagined that so many people would be interested in our course!
The voting period went from May first till today. During this period the user interface of the iversity website changed several times providing different kind of information about the voting to us users. Since I have observed a drastic change in rankings on May 9th and since the process and scores have not been very transparent I have decided on that very day to collect some data about the rankings. I already did some quick analysis on the data and found some interesting facts but I am running out of time right now to conduct an extensive data analysis. So I will share the data set with the public domain:
http://rene-pickhardt.de/mooc.tar.bz2 (33MB)
If you download the archive and extract it you’ll find folders for every hour after May 9th. In every folder you will find 26 html files representing the ranking of the courses at that time and a transaction log of the http requests which were made to download the 26 html files. There are 26 html files since 10 courses were displayed per page and we had 255 courses participating.
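The number of 26 files per hourly folder follows directly from the pagination. As a quick check, here is the arithmetic as a small Python sketch (the function names are only for illustration):

```python
import math

COURSES = 255          # participating courses, as stated in the post
COURSES_PER_PAGE = 10  # courses displayed per ranking page


def pages_needed(total_items, per_page):
    """Number of pages needed to show total_items, per_page at a time."""
    return math.ceil(total_items / per_page)


def items_on_last_page(total_items, per_page):
    """How many items end up on the final, possibly partial page."""
    remainder = total_items % per_page
    return remainder if remainder else per_page


print(pages_needed(COURSES, COURSES_PER_PAGE))        # 26
print(items_on_last_page(COURSES, COURSES_PER_PAGE))  # 5
```

So 25 full pages plus one last page with the remaining 5 courses give the 26 html files per snapshot.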
During the time of data collection I had 2 or 3 short down times of my web server so it could be possible that some data points are missing.
I already wrote a “dirty hack” and pushed it on github which also extracts the interesting information out of the downloaded html files.

There is a file rank.tsv (334 kb) that lists the hourly ranking of every course.
There is a file vote.tsv (113 kb) that contains, for every course on an hourly basis (between May 20th and today), the number of votes the course had acquired. The period of time covered by vote.tsv is so short because the votes were only available in the html files during this time.

Skimming the data with my eyes there are already some facts that make me very curious for a deeper data analysis:

Some courses gained several hundred votes within a short period of time (usually only 2 or 3 hours), whereas most courses (especially those gaining such a large number of votes) stayed far below 1000 votes overall.
It is also interesting to see how much variation has been going on in the last couple of days.
I haven’t crawled the views of the YouTube videos of the courses, and even now, after observing the following, I did not take a snapshot of them. Still, it is interesting that there is such a large difference in conversion rate. The top courses in particular seem to have many more votes than views of their application video, whereas some really high-class and outstanding applications, like the ones from Christian Spannagel (Math) or Oliver Vornberger (Algorithms and data structures), have two or three times as many views on YouTube as votes. In fact they have about the same number of views on YouTube as the top-voted courses.
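For anyone who wants to look for such bursts systematically, here is a minimal sketch in Python. It assumes that the hourly vote counts of one course have already been extracted into a plain list (the column layout of vote.tsv is not described here, so no parsing is shown). The function names, the window of 3 hours, the threshold of 200 votes and all numbers in the example are made up for illustration.

```python
def find_vote_jumps(hourly_votes, window_hours=3, min_gain=200):
    """Return (start, end, gain) for every window with a gain >= min_gain.

    hourly_votes is a list of cumulative vote counts, one entry per hour.
    """
    jumps = []
    for start in range(len(hourly_votes) - window_hours):
        end = start + window_hours
        gain = hourly_votes[end] - hourly_votes[start]
        if gain >= min_gain:
            jumps.append((start, end, gain))
    return jumps


def votes_per_view(votes, views):
    """Conversion rate: how many votes a course got per video view."""
    if views == 0:
        raise ValueError("views must be greater than zero")
    return votes / views


# Made-up cumulative vote counts for one hypothetical course.
example = [100, 105, 110, 400, 420, 425, 430]
print(find_vote_jumps(example))   # [(0, 3, 300), (1, 4, 315), (2, 5, 315)]
print(votes_per_view(500, 1000))  # 0.5
```

A conversion rate above 1 would mean more votes than video views, which is the pattern described for the top courses.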

I am pretty sure there are some more interesting facts, and maybe someone else has collected a better data set covering the complete period of time and including YouTube snapshots as well as Facebook and Twitter mentions.
Since I have been asked several times already: here are the final rankings to download and also as a table in the blog post:

Rank	Course name	Number of votes
1	sectio chirurgica anatomie interaktiv	8013
2	internationales agrarmanagement 2	7557
3	ingenieurmathematik fur jedermann	2669
4	harry potter and issues in international politics	2510
5	online surgery	2365
6	l3t s mooc der offene online kurs uber das lernen und lehren mit technologien	2270
7	design 101 or design basics 2	2216
8	einfuhrung in das sozial und gesundheitswesen sozialraume entdecken und entwickeln	2124
9	changeprojekte planen nachhaltige entwicklung durch social entrepreneurship	2083
10	social work open online course swooc14	2059
11	understanding sustainability environmental problems collective action and institutions	1912
12	the dance of functional programming languaging with haskell and python	1730
13	zyklenbasierte grundung systematische entwicklung von geschaftskonzepten	1698
14	a virtual living lab course for sustainable housing and lifestyle	1682
15	family politics domestic life revolution and dictatorships between 1900 1950	1476
16	h2o extrem	1307
17	dark matter in galaxies the last mystery	1261
18	algorithmen und datenstrukturen	1207
19	psychology of judgment and decision making	1168
20	the future of storytelling	1164
21	web engineering	1152
22	die autoritat der wissenschaften eine einfuhrung in das wissenschaftstheoretische denken 2	1143
23	magic and logic of music a comprehensive course on the foundations of music and its place in life	1138
24	nmooc nachhaltigkeit fur alle	1130
25	sovereign bond pricing	1115
26	soziale arbeit eine einfuhrung	1034
27	mathematische denk und arbeitsweisen in geometrie und arithmetik	1016
28	social entrepreneurship wir machen gesellschaftlichen wandel moglich	1010
29	molecular gastronomy an experimental lecture about food food processing and a bit of physiology	984
30	fundamentals of remote sensing for earth observation	920
31	kompetenzkurs ernahrungswissenschaft	891
32	erfolgreich studieren	879
33	deciphering ancient texts in the digital age	868
34	qualitative methods	861
35	karl der grosse pater europae	855
36	who am i mind consciousness and body between science and philosophy	837
37	programmieren mit java	835
38	systemisches projektmanagement	811
39	lernen ist sexy	764
40	modelling and simulation using matlab one mooc more brains an interdisciplinary course not just for experts	760
41	suchmaschinen verstehen	712
42	hands on course on embedded computing systems with raspberry pi	679
43	introduction to mixed methods and doing research online	676
44	game ai	649
45	game theory and experimental economic research	633
46	cooperative innovation	613
47	blue engineering ingenieurinnen und ingenieure mit sozialer und okologischer verantwortung	612
48	my car the unkown technical being	612
49	gesundheit ein besonderes gut eine multidisziplinare erkundung des deutschen gesundheitssystems	608
50	teaching english as a foreign language tefl part i pronunciation	597
51	wie kann lesen gelernt gelehrt und gefordert werden lesesozialisation lesedidaktik und leseforderung vom grundschulunterricht bis zur erwachsenenbildung	593
52	the european dream	576
53	education of the present what is the future of education	570
54	faszination kristalle und symmetrie	561
55	italy today a girlfriend in a coma a walk through today s italy	557
56	dna from structure to therapy	556
57	grundlagen der mensch computer interaktion	549
58	malnutrition in developing countries	548
59	marketing als strategischer erfolgsfaktor von der produktinnovation bis zur kundenbindung	540
60	environmental ethics for scientists	540
61	stem cells in biology and medicine	528
62	praxiswissen fur den kunstlerischen alltagsdschungel	509
63	physikvision	506
64	high five evidence based practice	505
65	future climate water	484
66	diversity and communication challenges for integration and mobility	477
67	social entrepreneurship	469
68	die kunst des argumentierens	466
69	der hont feat mit dem farat wek wie kinder schreiben und lesen lernen	455
70	antikrastination moocen gegen chronisches aufschieben	454
71	exercise for a healthier life	454
72	the startup source code	438
73	web science	435
74	medizinische immunologie	433
75	governance in and through human rights	431
76	europe in the world law and policy aspects of the eu in global governance	419
77	komplexe welt strukturen selbstorganisation und chaos	419
78	mooc basics of surgery want to become a real surgeon	416
79	statistical data analysis for the humanities	414
80	business math r edux	406
81	analyzing behavioral dynamics non linear approaches to social and cognitive sciences	402
82	space technology	397
83	der erzahler materialitat und virtualitat vom mittelalter bis zur gegenwart	396
84	kriminologie	395
85	von e mail skype und xing kommunikation fuhrung und berufliche zusammenarbeit im netz	394
86	wissenschaft erzahlen das phanomen der grenze	392
87	nachhaltige entwicklung	389
88	die nachste gesellschaft gesellschaft unter bedingungen der elektrizitat des computers und des internets	388
89	die grundrechte	376
90	medienbildung und mediendidaktik grundbegriffe und praxis	368
91	bubbles everywhere speculative bubbles in financial markets and in everyday life	364
92	the heart of creativity	363
93	physik und weltraum	358
94	sim suchmaschinenimplementierung als mooc	354
95	order of magnitude physics from atomic nuclei to the universe	350
96	entwurfsmethodik eingebetteter systeme	343
97	monte carlo methods in finance	335
98	texte professionell mit latex erstellen	331
99	wissenschaftlich arbeiten wissenschaftlich schreiben	330
100	e x cite join the game of social research	330
101	forschungsmethoden	323
102	complex problem solving	321
103	programmieren lernen mit effekt	317
104	molecular devices and machines	317
105	wie man erfolgreich ein startup aufbaut	315
106	grundlagen der prozeduralen und objektorientierten programmierung	314
107	introduction to disability studies	314
108	eu2c the european union explained by two partners cologne and cife	313
109	the english language a linguistic introduction 2	311
110	allgemeine betriebswirtschaftslehre	293
111	interaction design open design	293
112	how we learn nowadays possibilities and difficulties	288
113	foundations of educational technology	288
114	projektmanagement und designbasiertes lernen	281
115	human rights	278
116	kompetenz des horens technische gehorbildung	278
117	it infrastructure management	276
118	a media history in 10 artefacts	274
119	introduction to the practice of statistics and regression	271
120	what is a good society introduction to social philosophy	268
121	modellierungsmethoden in der wirtschaftsinformatik	265
122	objektorientierte programmierung von web anwendungen von anfang an	262
123	intercultural diversity networking vielfalt interkulturell vernetzen	260
124	foundations of entrepreneurship	259
125	business communication for impact and results	257
126	gamification	257
127	creativity and design in innovation management	256
128	mechanik i	252
129	global virtual project management	252
130	digital signal processing for everyone	249
131	kompetenzen fur klimaschutz anpassung	248
132	digital economy and social innovation	246
133	synthetic biology	245
134	english phonetics and phonology	245
135	leibspeisen nahrung im wandel der zeiten molekule brot kase fleisch schokolade und andere lebensmittel	243
136	critical decision making in the contemporary globalized world	238
137	einfuhrung in die allgemeine betriebswirtschaftslehre schwerpunkt organisation personalmanagement und unternehmensfuhrung	236
138	didaktisches design	235
139	an invitation to complex analysis	235
140	grundlagen der programmierung teil 1	234
141	allgemein und viszeralchirurgie	233
142	mathematik 1 fur ingenieure	231
143	consumption and identity you are what you buy	231
144	vampire fictions	230
145	grundlagen der anasthesiologie	228
146	marketing strategy and brand management	227
147	political economy an introduction	225
148	gesundheit	221
149	object oriented databases	219
150	lebenswelten perspektiven fur menschen mit demenz	217
151	applications of graphs to real life problems	210
152	introduction to epidemiology epimooc	207
153	network security	207
154	global civics	207
155	wissenschaftliches arbeiten	204
156	annaherungen an zukunfte wie lassen sich mogliche wahrscheinliche und wunschbare zukunfte bestimmen	202
157	einstieg wissenschaft	200
158	engineering english	199
159	das erklaren erklaren wie infografik klart erklart und wissen vermittelt	198
160	betriebswirtschaftliche und rechtliche grundlagen fur das nonprofit management	192
161	art and mathematics	191
162	vom phanomen zum modell mathematische modellierung von natur und alltag an ausgewahlten beispielen	190
163	design interaktiver medien technische grundlagen	189
164	business englisch	187
165	erziehung sehen analysieren gestalten	184
166	basic clinical research methods	184
167	ordinary differential equations and laplace transforms	180
168	mathematische logik	179
169	die geburt der materie in der evolution des universums	179
170	innovationsmanagement von kleinen und mittelstandischen unternehmen kmu	176
171	introduction to qualitative methods in the social sciences	175
172	advert retard wirkung industrieller interessen auf rationale arzneimitteltherapie	175
173	animation beyond the bouncing ball	174
174	entropie einfuhrung in die physikalische chemie	172
175	edufutur education for a sustainable future	165
176	social network effects on everyday life	164
177	pharmaskills for africa	163
178	nachhaltige energiewirtschaft	162
179	qualitat in der fruhpadagogik auf den anfang kommt es an	158
180	dementias	157
181	beyond armed confrontation multidisciplinary approaches and challenges from colombia s conflict	154
182	investition und finanzierung	150
183	praxis des wissensmanagements	149
184	gutenberg to google the social construction of the communciations revolution	145
185	value innovation and blue oceans	145
186	kontrapunkt	144
187	shakespeare s politics	142
188	jetzt erst recht wissen schaffen uber recht	141
189	rechtliche probleme von sozialen netzwerken	138
190	augmented tuesday suppers	137
191	positive padagogik	137
192	digital storytelling mit bewegenden bildern erzahlen	136
193	wirtschaftsethik	134
194	energieeffizientes bauen	134
195	advising startups	133
196	urban design and communication	133
197	bildungsreform 2 0	132
198	mooc management basics	130
199	healthy teeth a life long course of preventive dentistry	129
200	digitales tourismus marketing	127
201	the arctic game the struggle for control over the melting ice	127
202	disease mechanisms	127
203	special operations from raids to drones	125
204	introduction to geospatial technology	120
205	social media marketing strategy smms	119
206	korpusbasierte analyse sprechsprachlichen problemlosungsverhaltens	116
207	introduction to marketing	115
208	creative coding	114
209	mooc meets 3d	110
210	unternehmenswert die einzig sinnvolle spitzenkennzahl fur unternehmen	110
211	forming behaviour gestaltung und konzeption von web applications	109
212	technology demonstration	108
213	lebensmittelmikrobiologie und hygiene	105
214	estudi erfolgreich studieren mit dem internet	105
215	moderne geldtheorie eine paische perspektive	103
216	kollektive intelligenz	103
217	geschichte der optischen medien	100
218	alter und soziale arbeit	99
219	semantik eine theorie visueller kommunikation	97
220	erziehung und beratung in familie und schule	96
221	foreign language learning in indian context	95
222	bildgebende verfahren	92
223	applied biology	92
224	bildung in der wissensgesellschaft gerechtigkeit	92
225	standortmanagement	92
226	europe a solution from history	90
227	methodology of research in international law	90
228	when african americans came to paris	90
229	contemporary architecture	89
230	past recent encounters turkey and germany	88
231	wars to end all wars	83
232	online learning management systems	82
233	software applications	81
234	business in germany	78
235	requirements engineering	77
236	anything relationship management xrm	77
237	global standards and local practices	76
238	prodima professionalisation of disaster medicine and management	75
239	cytology with a virtual correlative light and electron microscope	75
240	the organisation of innovation	75
241	sensors for all	75
242	diagnostik in der beruflichen bildung	73
243	scientific working	71
244	escience saxony lectures	71
245	internet marketing strategy how to gain influence and spread your message online	69
246	grundlagen des e business	69
247	principles of public health	64
248	methods for shear wave velocity measurements in urban areas	64
249	democracy in america	64
250	building typology studies gebaudelehre	63
251	multi media based learning environments at the interface of science and practice hamburg university of applied sciences prof dr andrea berger klein	61
252	math mooc challenge	60
253	the value of the social	58
254	dienstleistungsmanagement und informationssysteme	57
255	ict integration in education systems e readiness e integration e transformation	56

Please help me to realize my Web science massive open online course

Rene — Wed, 01 May 2013 09:59:57 +0000

I am asking you for a big favor in this blog post! You can help me to achieve one of my childhood dreams:
I am an enthusiastic teacher and love to share information (as you might have seen by reading my blog). Over the last month I have designed a structure for an online course on Web Science together with a short video. In this blog post I will introduce the course to you, but I am also asking you to vote for the course since only 10 of the 250 courses that applied for the fellowship will be sponsored and thus be realized.
So please go to https://moocfellowship.org/submissions/web-science and learn more about the course and vote for it. You can find almost all details of the course in this blog post.

Why create such a course?

The web has become important to its 2.3 billion users. Yet only a small group of people understand the processes that take place on it and quickly steer its development into new directions.

Novelty of the subject

Web Science is an upcoming academic field. Much information about the web already exists online, but no course that comprises all of it.

High value for every web user

The MOOC would be of high value and relevance for anybody using the web, e.g.:

A programmer who is building the next web application
A company deciding their web strategy
A judge who has to decide a case regarding net neutrality or copyright infringements
The Government as well as public authorities which have to make decisions on how to regulate the web
…

The web is the right place to learn about the web

The web itself is the best platform to educate people about the web since you can always point directly to the object of study. By creating a MOOC we will be able to aggregate, organize and filter much of the available information.

Integration within our institution

The MOOC will be a core element for the web science lecture of our web science master program. The goal is that students will work with the material provided by the MOOC and the instructors will replace classical lectures with public Q&A sessions. Additionally the Web Science lecture of 2013/2014 will serve as an internal testing of the MOOC such that the improved MOOC can launch on iversity in 2014.

Course content

This MOOC consists of ten lessons divided into three parts.

Lesson 1 – 3: Foundations of the web
Lesson 4 – 7: Theoretical results of web user behavior
Lesson 8 – 10: Web & society

Lesson 1 & 2: History of the Web & Web Architecture

You will understand the historical development of the web and see how the cold war in combination with advances in technical developments led to the Internet Protocol suite.

On each layer you will know one protocol and understand how these protocols build an open, interoperable and decentralized system. Furthermore you will learn about the domain name system and find out why the concepts of URI and Hypertext were crucial for the success of the web.

Lesson 3: Structure of the Web

You will learn about the six degrees of separation and understand concepts like small world networks by studying ‘the other’ Milgram experiment. You will be able to use power law distributions to describe the structure of the web, its content and its users.

Lesson 4 & 5: Micro and Macro behavior of web users & Social Network (Analysis)

You will be introduced to theories from Microsociology and see how applying them to the behavior of people on the web leads to macro structures such as:

Analyzing social network data from the Koblenz Network Collection using Octave you will gain a deeper understanding of social theories and social networks.

Lesson 6 & 7: Information Retrieval & Recommender systems

Completing this section you will understand the basic architecture of a (web) search engine. You can name the fundamental (non technical) difficulties one has in order to create a good information retrieval system. You will learn about the connection to recommender systems that are (not only!) used by large web shops to increase cross selling.
You will be able to discuss the danger of such algorithms like the relevance paradox and the filter bubble.

Lesson 8: Trust and Security

You will learn how third parties act as trust providers on the web and how this issue is related to markets with asymmetric information. You will see that trust issues in the online world differ from the offline problems. You will know of ways like cryptography, secure communication and certificates to resolve trust issues and how those techniques can even lead to a new currency.

Lesson 9: Web Economics

You will know of e-commerce models like online shopping & auctions as well as online advertising and marketing. You will be able to interpret and apply metrics for web analytics such as

Lesson 10: Web Governance and Web Ethics

Finally you will understand the important role of institutions like W3C, IETF and ICANN. You will use your understanding of the web architecture to discuss and explain the connections between

Net neutrality
Piracy and copyright infringement
Internet censorship and the freedom of speech

So please go to https://moocfellowship.org/submissions/web-science and learn more about the course and vote for it.

Open access and data from my research. Old resources for various topics finally online.

Rene — Mon, 05 Nov 2012 05:19:53 +0000

Being strongly in favour of open access, I always try to publish all my work on my blog, but sometimes I am busy or I forget to update. So today I took the time to look at all my old drafts and the material that hasn’t been published yet. Here is a list of new content on my blog that should have been published long ago. I also linked it in the articles of interest:

The slides of my Graphity talk at FOSDEM 2012
The slides of my Graphity talk at SocialCom 2012
The slides of my Oberseminar talk on Typology.
We consolidated the source code for related work into a git repo.

In the last month I have created quite some content for my blog and it will be published over the next weeks. So watch out for screencasts on how to create an autocompletion in GWT with neo4j, how to create n-grams from Wikipedia, thoughts and techniques for related work, and research ideas and questions that we found but probably do not have the time to work on.

Typology Oberseminar talk and Speed up of retrieval by a factor of 1000

Rene — Thu, 16 Aug 2012 11:39:25 +0000

Almost 2 months ago I talked in our Oberseminar about Typology. Update: Download slides. Most readers of my blog will already know the project, which was initially implemented by my students Till and Paul. I am just about to share some slides with you. On the one hand they explain how the system works and on the other hand they give an overview of the related work.
As you can see from the slides we are planning to submit our results to the SIGIR conference. So one year after my first blog post on graphity, which developed into a full paper for SocialCom 2012 (graphity blog post and blog post for source code), there is the as yet informal typology blog post with the slides from the Typology Oberseminar talk, and 3 months are left until our SIGIR submission. I expect this time the submission will not be such a hassle as with graphity, since I should have learnt some lessons and also have a good student who is helping me with the implementation of all the tests.
Additionally I have finally uploaded some source code to GitHub that makes the typology retrieval algorithm pretty fast. There are still some issues with this code since it lowers the quality of predictions a little bit. Also the index has to be built first. Last but not least the original SuggestTree code did not save the weights of the items to be suggested. I need those weights in the aggregation phase. Since I did not want to extend the original code I placed the weights at the end of the suggested items. This is a little inefficient.
The main reason why retrieval speeds up with the new algorithm is that typology otherwise needs to sort all outgoing edges of a node. This is rather slow, especially if one only needs the top k elements. Since neo4j as a graph database does not provide indices for this kind of data I was forced to look for another way to presort the data. Additionally, if a prefix is known one does not have to look at all outgoing edges. I found the SuggestTree class by Nicolai Diethelm, which solved the problem in a very good way and led to such a great speed. The index is not persistent yet and it also needs quite some memory. On the other hand a suggest tree is built for every node. This means that the index can be distributed very easily over several machines, allowing for horizontal scaling!
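To make the difference concrete, here is a minimal Python sketch. It is not the SuggestTree data structure and not the typology code; it only contrasts sorting all outgoing edges at query time with looking up a presorted top-k list per prefix. The function names and the example edges of the node "the" are made up for illustration.

```python
import heapq
from collections import defaultdict


def top_k_by_sorting(out_edges, prefix, k):
    """Baseline: filter all outgoing edges by prefix, sort them all, cut to k."""
    matches = [(w, t) for t, w in out_edges.items() if t.startswith(prefix)]
    matches.sort(key=lambda item: (-item[0], item[1]))
    return [t for _, t in matches[:k]]


def build_prefix_index(out_edges, k):
    """Precompute the k heaviest target words for every prefix (built once per node)."""
    candidates = defaultdict(list)
    for target, weight in out_edges.items():
        for end in range(len(target) + 1):
            candidates[target[:end]].append((weight, target))
    return {
        prefix: [
            t
            for _, t in heapq.nsmallest(
                k, items, key=lambda item: (-item[0], item[1])
            )
        ]
        for prefix, items in candidates.items()
    }


# Hypothetical outgoing edges of the node "the": next word -> count
out_edges = {"house": 40, "horse": 25, "hotel": 30, "cat": 50, "car": 10}
index = build_prefix_index(out_edges, k=2)

# Query time is now a single dictionary lookup instead of a sort.
suggestions = index.get("ho", [])
```

Storing a list for every prefix is the simplest possible layout and wastes memory, which fits the remark above that such an index needs quite some memory. Since each node gets its own index, the per-node dictionaries could live on different machines.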
Anyway the old algorithm was only able to handle about 20 requests per second and now we have something like 14 thousand requests per second, and as I mentioned there is still a little room for more (:
I hope indices like this will be standard in neo4j soon. This would open up the range of applications that could make good use of neo4j.
As always I am happy about any suggestions and I am looking forward to doing the complete evaluation and paper writing for typology.

Graphity source code and wikipedia raw data is online (neo4j based social news stream framework)

Rene — Mon, 09 Jul 2012 15:43:57 +0000

UPDATE: there is now the source code of an entire graphity server application online!
8 months ago I posted the results of my research about fast retrieval of social news feeds and in particular my graph index graphity. The index is able to serve more than 12 thousand personalized social news streams per second in social networks with several million active users. I was able to show that the system is independent of the node degree or network size. Therefore it scales to graphs of arbitrary size.
Today I am pleased to announce that our joint work was accepted as a full research paper at the IEEE SocialCom conference 2012. The conference will take place in early September 2012 in Amsterdam. As promised before I will now open the source code of Graphity to the community. Its documentation could and might be improved in the future. I am also sure that one could use a better data structure for our implementation of the priority queue.
Still the attention from the developer community for Graphity was quite high, so maybe the source code is of help to someone. The source code contains the entire evaluation framework that we used for our evaluation against other baselines, which will also help anyone to reproduce our evaluation.
There are some nice things one can learn about setting up multithreading for time measurements and also about how to set up a good logging mechanism.
The code can be found at https://github.com/renepickhardt/graphity-evaluation and the main Algorithm should lie in the file:
https://github.com/renepickhardt/graphity-evaluation/blob/master/src/de/metalcon/neo/evaluation/GraphityBuilder.java
other files of high interest should be:

https://github.com/renepickhardt/graphity-evaluation/blob/master/src/de/metalcon/neo/evaluation/neo/SortUtils.java topk nway merge inside a graph db
https://github.com/renepickhardt/graphity-evaluation/blob/master/src/de/metalcon/neo/evaluation/neo/NodeQueueIterator.java iterator over the graphity index
https://github.com/renepickhardt/graphity-evaluation/blob/master/src/de/metalcon/neo/evaluation/neo/NeoUtils.java some shortcuts for neo4j coding
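For readers who do not want to dig through the Java files, here is a minimal Python sketch of a top-k n-way merge with a priority queue. It is not the code from SortUtils.java. It assumes that every followed user has a list of (timestamp, item) pairs sorted newest first; the function name and the sample data are made up.

```python
import heapq


def top_k_news_stream(streams, k):
    """Merge several newest-first lists of (timestamp, item) into the k newest items."""
    heap = []
    # Seed the queue with the newest entry of every non-empty stream.
    for sid, stream in enumerate(streams):
        if stream:
            ts, item = stream[0]
            heapq.heappush(heap, (-ts, sid, 0, item))

    result = []
    while heap and len(result) < k:
        neg_ts, sid, pos, item = heapq.heappop(heap)
        result.append((-neg_ts, item))
        # Only the stream we just took from has to contribute a new candidate.
        nxt = pos + 1
        if nxt < len(streams[sid]):
            ts, next_item = streams[sid][nxt]
            heapq.heappush(heap, (-ts, sid, nxt, next_item))
    return result


# Hypothetical per-user streams, newest first
streams = [
    [(9, "a3"), (4, "a2"), (1, "a1")],
    [(8, "b2"), (7, "b1")],
    [],
]
newest = top_k_news_stream(streams, 3)
```

The queue never holds more than one entry per stream, so retrieving k items touches only about k entries in total no matter how long the individual streams are.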

I did not touch it again over the last couple of months and it really has a lot of debugging comments inside. My apologies for this bad practice. I hope you can overlook this, bearing in mind that I am a mathematician and this was one of my first bigger evaluation projects. In my own interest I promise that next time I will produce code that is easier to read, understand and reuse.
Still, if you have any questions, suggestions or comments feel free to contact me.
The raw data can be downloaded at:

18 MB: http://glm.rene-pickhardt.de/de-nodeIds.txt.bz2
650 MB: http://glm.rene-pickhardt.de/de-events.log.bz2 All events that ever happened to german wikipedia articles up to middle of 2011

The format of these files is straightforward:
de-nodeIds.txt has first some ID, then a tab and then the title of the Wikipedia article. This is only necessary if you want to display your data with titles rather than IDs.
The interesting file is de-events.log. In this file there are 4 columns:
timestamp TAB FromNodeID TAB [ToNodeID] TAB U/R/A
So every line tells exactly when the article FromNodeID changed. If only 3 columns are present and a U is written, then the article itself just changed. If links in the article changed, there is another node ID in the third column and an A or an R for add or remove respectively.
I think processing these files is rather straightforward. With this file you can totally simulate the growth of Wikipedia over time. The file is sorted by the second column. If you want to use it in our evaluation framework you should sort it by the first column. This can be done on a unix shell in less than 10 minutes with the sort command.
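As a starting point, here is a small Python sketch that parses lines in the format described above and sorts a sample by the first column. It is not part of the evaluation framework. The names Event and parse_event are made up, and the sample lines use plain integer timestamps, which is an assumption about the timestamp format. For the full file the unix sort command is the better tool, since this sketch sorts in memory.

```python
from typing import NamedTuple, Optional


class Event(NamedTuple):
    timestamp: str
    from_node: str
    to_node: Optional[str]  # None for plain article updates
    kind: str               # "U" = update, "A" = link added, "R" = link removed


def parse_event(line):
    """Parse one line: timestamp TAB FromNodeID TAB [ToNodeID] TAB U/R/A."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) == 3:
        timestamp, from_node, kind = fields
        return Event(timestamp, from_node, None, kind)
    if len(fields) == 4:
        timestamp, from_node, to_node, kind = fields
        return Event(timestamp, from_node, to_node or None, kind)
    raise ValueError("unexpected number of columns: %r" % line)


# Made-up sample lines, ordered by the second column like the published file
sample = ["300\t7\t9\tA\n", "100\t7\tU\n", "200\t8\t7\tR\n"]

# Assumption: timestamps are integers, so they are compared numerically.
events = sorted((parse_event(l) for l in sample), key=lambda e: int(e.timestamp))
```

Replaying the sorted events in order gives the add and remove operations on links as they happened over time.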
Sorry, I cannot publish the paper on my blog yet since the camera-ready version has to be prepared and submitted to IEEE. But follow me on twitter or subscribe to my newsletter so I can let you know as soon as the entire paper is available as a pdf.

Related-work.net – Product Requirement Document released!

Rene — Mon, 12 Mar 2012 10:26:50 +0000

Recently I visited my friend Heinrich Hartmann in Oxford. We talked about various issues of how research is done these days and how the web could theoretically help to spread information faster and connect people interested in the same papers / topics more efficiently.
The idea of http://www.related-work.net was born: a scientific platform which is open source and open data and tries to solve those problems.
But we did not want to reinvent the wheel. So we did some research on existing online solutions and also asked people from various disciplines to name their problems. Find below our product requirement document! If you like our approach you can contact us, contribute to the source code or find some starting documentation!
So the plan is to fork an open source question answer system and enrich it with features fulfilling the needs of scientists and with some social aspects (hopefully using neo4j as a supporting database technology), which will eventually help to rank the related work of a paper.
Feel free to provide us with feedback and wishes and join our effort!

Beginning of our Product Requirement Document

We propose to create a new website for the scientific community which brings together people who are reading the same paper. The basic idea is to mix the functionality of a Q&A platform (like MathOverflow) with a paper database (like arXiv). We follow a strict openness principle by making available the source code and the data we collect.
We start with an analysis of how the internet is currently used in different fields and explain the shortcomings. The actual product description can be found in the section “Basic idea”. At the end we present an overview of the websites which follow a similar approach.
This document – as well as the whole project – is work in progress. We are happy about any kind of comments or other contributions.

The distribution of scientific knowledge

Every scientist has to stay up to date with the developments in his area of research. The basic sources for finding new information are:

Conferences
Research Seminars
Journals
Preprint-servers (arXiv)
Review Databases (MathSciNet, Zentralblatt, …)
Q&A Sites (MathOverflow, StackOverflow, …)
Blogs
Social Networks (Twitter, Google+)
Bibliographic Databases (Mendeley, nNode, Medline, etc.)

Every community has found its very own way of using these tools.

Mathematics by Heinrich Hartmann – Oxford:

To stay up to date with recent developments I check arxiv.org on a daily basis (RSS feed), participate in mathoverflow.net and search for papers via Google Scholar or MathSciNet. Occasionally interesting work is shared by people in my Google+ circles. In general the speed of pure mathematics is very slow. New research often builds upon work which has been out for a few years. To stay reasonably up to date it is enough to go to conferences every 3-5 months.
I read many papers by myself because I am the only one at the department who does research on that particular topic. We have a reading class where we read papers/lecture notes which are relevant for more people. Usually they are concerned with introductions to certain kinds of theory. We have weekly seminars where people talk about their recently published work. There are some very active blogs by famous mathematicians, but in my area blogs play virtually no role.

Computer Science by René Pickhardt – Uni Koblenz

In Computer Science topics are evolving but also changing very quickly. It is always important to have both an overview of upcoming technologies (which you get from tech blogs) as well as access to current research trends.
Since the speed in computer science is so fast and the review process in journals often takes much time, our main sources of information and papers are conferences and Twitter.

Usually conference papers are distributed digitally to participants. If one is interested in those papers google queries like “conference name year papers” are frequently used. Sites like http://www.sciweavers.org/ host and aggregate preprints of papers and organize them by conference.
The general method to follow a conference that one is not attending is to follow the hashtag of the conference on Twitter. In general Twitter is the most used tool to share, distribute and find information, not only for papers but also for the above-mentioned news about upcoming technologies.

Another rich source for computer scientists is, of course, the related work of papers and Google Scholar. Especially useful is the method of finding a very influential paper with more than 1000 citations and then finding newer papers that cite this paper and contain a certain keyword, which is one of the features of Google Scholar.
The main problem in computer science is not to find a rare paper or idea but rather to filter the huge amount of publications, including bad ones, and also to keep track of trends. In this way a system that ranks and summarizes papers (not only by abstract and citation counts) would help me a lot to select what related work of a paper I should read!

Psychology by Elisa Scheller – Uni Freiburg

As a psychologist/neuroscientist, I receive recommendations for scientific papers via google scholar alerts or science direct alerts (http://www.sciencedirect.com/); I receive alerts regarding keywords or regarding entire journal issues. When I search for a certain publication, I use pubmed.org or scholar.google.com. This can sometimes be kind of annoying, as I receive multiple alerts from different sources; but I guess it is the best way to stay up to date regarding recent developments. This is especially important in my field, as we feel a big amount of “publication pressure”; I work on a method which is considered as “quite fancy” at the moment, so I also use the alerts to make sure nobody has published “my” experiment yet.
Sometimes a facebook friend recommends a certain publication or a colleague points me to it. Most of the time, I read articles on my own, as I am the only person working on this specific topic at my institution. Additionally, we have a weekly journal club where everyone in turn presents work which is related to our focus of research, e.g. a certain part of the human brain. There is also a weekly seminar dedicated to presentations about ongoing projects.
Blogs (e.g. mindhacks.com, http://neuroskeptic.blogspot.com/) can be a source to get an overview about recent developments, but I have to admit I use them mainly for work-related entertainment.
All in all, it is easy to stay up to date using alerts from different platforms; the annoying part of it is the flood of emails you receive and that you are quite often alerted to articles that don’t fit your interests (no matter how exact you try to specify your keywords).

Biomedical Research by Johanna Goldmann – MIT

In the biological sciences, in research at the bench – communication is one of the most fundamental tools a scientist can have. Communication with other scientists may open up the possibility of new collaborations, can lead to a completely new viewpoint on a known question and to the integration and expansion of methods, and allows a scientist to have a good understanding of what is known, what is not known and what other people have – both successfully and unsuccessfully – tried to investigate.
Yet communication is something that is currently very much lacking in academic science – lacking to an extent that most scientists will agree hinders the progress of research. Nonetheless the lack of communication and the issues it brings with it are something that most scientists have accepted as a necessary evil – not knowing how to possibly change it.
Progress is only reported in peer-reviewed journals – many of which are greatly affected not only by what is currently “sexy” in research but also by politics and connections and the “publish or perish” pressure. Due to the pressure to publish in journals and the weight the list of publications has for any young scientist’s chances of success, scientists also tend to be very reluctant to share any information pre-publication.
Furthermore one of the major issues is that currently there really is no way of publishing or communicating either negative results or minor findings, which causes many questions or methods to be repeatedly investigated as well as a loss of information.
Given how much social networks and the internet have changed communication as well as the access to information over the past years – there is a need for this change to affect research and communication in the life sciences and to transform the way we think, not only about solving and approaching research questions but about the information and insights we gain as a whole.

Philosophy by Sascha Benjamin Fink – Uni Osnabrück

The most important source of information for philosophers is http://philpapers.org/. You can follow trends going on in your field of interest. Philpapers has a list of almost all papers together with their abstracts, keywords and categories as well as a link to the publisher. Additional information about similar papers is displayed.
Every category of papers is managed by some editor. For each category it is possible to subscribe to a newsletter. In this way once per month I will be informed about current publications in journals related to my topic of interest. Every user is able to create an account and manage his literature and the papers he is interested in.
Other research and information exchange methods among philosophers consist of mailing lists, reading clubs and blogs. Have a look at David Chalmers’ blog list. Blogs are also becoming more and more important. Unfortunately they are usually on general topics and discuss developments of the community (e.g. Leiter’s Blog, Chalmers’ Blog and Schwitzgebel’s Blog).
But all together I still think that for me a centralized service like Philpapers is my favourite tool because it aggregates most information. If I don’t hear about it on Philpapers usually it is not that important. I think among Philosophers this platform – though incomplete – seems to be the standard for the next couple of years.

Problems

As a scientist it is crucial to be informed about the current developments in the research area. Abstracting from the reports above we divide the tasks roughly into the following stages.

1. Finding and filtering new publications:

What is happening right now? What are the current hot topics in my area? What are current trends? (→ Check arXiv/Twitter)
Did a friend of mine write something? Did a “big shot” write something?
(→ Check meta information: title, authors)
Are my colleagues excited about a new development? (→ Talk to them.)

2. Getting more information about a given paper:

What is actually done in a given paper? Is it relevant for me? Is it really new? Is it a breakthrough? (→ Read abstracts. Find a good readable summary/review.)
Judge the quality of a paper: Is it correct? Is it well written?
( → Where is it published, if at all? Skim through content.)

Finally there is a fundamental decision: Shall I read the whole paper, or not? This leads us to the next task.

3. Understanding a paper: Understanding a paper in depth can be a very time-consuming and tedious process. The presentation is often very short and much knowledge is assumed from the reader. The notation choices can be bad, so that even the statements are hard to understand. In effect the paper is easily readable only for a very small circle of specialists in the area. If one is not in the lucky situation of belonging to that circle, one usually applies the following strategies:

Look up references. This forces you to process a whole tree of older papers which might be hard to read, and hard to get hold of. Sometimes it is worthwhile to consult a textbook to polish up fundamentals.
Finding additional resources. Is there a review? Is there a related video lecture or slides explaining the material in more detail? Is the author going to a conference in the near future, or even giving a seminar in the area?
Join forces. Find people thinking about the same paper: Has somebody at my department already read the paper, so that I can ask some questions? Is there enough interest to form a reading group, or more formally, run a seminar about that paper?
Contact the author. This is a last resort. If you have struggled with understanding the paper for a very long time and really need/want to get it, you might eventually write an email to the author – who might respond, or not. Sometimes even errors are found! – and not published! And indeed, there is no easy way to publish “errata” anywhere on the net.

In mathematics most papers are not read through to the end. One uses strategies 1 & 2 until one gets stuck and moves on to something more exciting. The chances of survival are much better with strategy 3, where one is committed to putting a lot of effort into it over weeks.

4. Finding related work. Where to go from there? Is the paper superseded by a more recent development? Which are the relevant papers which the author builds upon? What are the historic influences? What are the founding ideas of the subject? Finding related work is very time-consuming. It is easy to overlook things given that the references are often vast, and sometimes hard to get hold of. Getting information about citations often requires access to commercial databases.

Basic idea:

All researchers around the world are faced with the same problems and come up with their individual solutions. There are great synergies in bringing these people together with an online platform! Most of the addressed problems are solved with a paper centric service which allows you to…

…get to know other readers of the paper.
…exchange with the other readers: ask questions, write comments, reviews.
…share the gained insights with the community.
…ask questions about the paper.
…discuss the paper.
…review the paper.

We want to do that with a new mixture of a traditional Q&A system like StackExchange or MathOverflow with a paper database and social features. The key features of this system are as follows:

Openness: We follow a strict openness principle. The software will be developed in open source. All data generated on this site will be under a creative commons license (like Wikipedia) and will be made available to the community in form of database dumps or an API (open data).

We use two different types of content sites in our system: Papers and Discussions.

Paper sites. A paper site is dedicated to a single publication. And has the following features:

Paper meta information
– show title, author, abstract, journal, tags
– leave a comment
– write a review (with wiki option)
– vote up/down
Paper resources
– show pdfs, slides, notes, video lectures, etc.
– add a resource
Related Work
– show the reference-tree and citations in an intelligent way.
Discussions:
– show related discussions
– start a new discussion
Social features
– bookmark
– share on G+, twitter

The point “Related Work” deserves some further explanation. The citation graph offers a great deal more information than just a list of references. Together with the user-generated content like votes, the individual paper bookmarks and the social graph, one has a very interesting data set which can be harvested. For this point we want at least views with respect to: Popularity/Topics/Read by Friends. Later on one could add more sophisticated, even graphical views on this graph.

Discussion sites. A discussion looks more like a traditional Q&A question, with the difference that each discussion may have (many) related papers. A discussion site contains:

Discussion meta information (title, author, body)
Discussion content
Related papers
Voting
Follow/Bookmark

Besides the content sites we want to provide the following features:

News Stream. This is the start page of our website. It will be generated from the network consisting of friends, papers and authors. There should be several modes like:

hot: heavily discussed papers/discussions
new papers: list new publications (filtered by tag, like arXiv feed)
social: What did your friends do lately
default: intelligent mix of recent activity that is relevant to the logged in user

Moreover, filter by tag should be always available.

Search bar:

Searches contents of the site, but should also find papers in freely available databases (e.g. arXiv). Adding a paper should be a very seamless process from there.
Search result ranking uses vote and view information.
Personalized search information. (Physicists usually do not want sociology results.)
Auto completion on paper titles, author, discussions.

Social: (hard to implement, maybe for second version!)

Easily refer to users by @-syntax familiar from Twitter/Google+
Maintain a friendship / trust graph
Friendship recommendations
Find friends from Google+ on the site

Benefits

Our proposed website addresses the above-mentioned problems in the following ways.
1. Finding and filtering new publications: This step can be improved with even very little community effort:

Tell other people, that you are interested in the paper. Vote it up or leave a comment if you are very excited about it.

Point out a paper to a colleague.

2. Getting more information about a given paper:

Write a summary or review about a paper you have read or skimmed through. Maybe the introduction is hard to read or some results are not clearly stated.
Can you recommend reading this paper? Vote it up!
Ask a colleague for his opinion on the paper. Maybe he can write a summary?

Many reviews of new papers are already written. E.g. MathSciNet and Zentralblatt maintain a large database of reviews which are provided by the community and are not freely available. Many authors would be much happier to write them for an open system!
3. Understanding a paper: Here are the major synergies which we want to address with our project.

Ask a question: Why is the author using this experimental method? How does Lemma 3.4 work? Why do I need this assumption? What is the intuition behind the “virtual truncation”? What implications does this work have?
Start a discussion: (might involve more than one paper.) What is the difference of these two papers? Is there a reference explaining this more clearly? What should I read in advance to understand the theory?
Add resources. Tell the community about related videos, notes, books etc. which are available on other sites.
Share your notes. If you have discussed a paper in a reading class or seminar. Collect your notes or opinions and make them available for the community.
Restate interesting statements. Tell the community when you have found a helpful result which is buried inside the paper. In that way Google may find it!

4. Finding related work. Having a well structured and easily navigable view on related papers simplifies the search a lot. The filtering benefits from the content generated by the users (votes) and individual information, like friends who have written/bookmarked a paper.

Similar Sites on the Web

There are several discussions in Q&A forums which discuss precisely this problem:

Quora.com: Where can I comment on sci papers?
MathOverflow: Good websites for discussions of mathematical papers?
MathOverflow: Is a free alternative to MathSciNet possible?
MathOverflow: Errata-Database

We found three sites on the internet which follow a similar approach which we examined more carefully.
1. There is a social network which has most of our features implemented:

researchgate.net
“Connect with researchers, make your work visible, and stay current.”

The Economist has dedicated an article to them. It is essentially a Facebook clone with special features for scientists.

Large, fast growing community. 1.4m +50.000/m. Mainly Biology and Medicine.
(As Daniel Mietchen points out, the size might be misleading due to institutional accounts)
Very professional Look and Feel. Company from Berlin, Germany, funded by VC. (48 People involved, 10 Jobs advertised)
Huge Feature set:

Profile site, Connect to friends
News Feed
Publication Database, Conference Finder, Jobmarket
Every Paper its own page: with

Voting up/down
Comments
Metadata (Title, Author, Abstract, Preview)
Social Media (Share, Bookmark, Follow author)

Organize Workgroups/Reading Classes.

Differences to our approach:

Closed Data / Closed Source
Very complex site which serves a lot of purposes
Only very basic features on paper site: vote/comment.
QA system is not linked well to paper database
No MathML
Mainly populated by undergraduates

2. Another website which comes reasonably close is:

http://www.sciweavers.org/

“an academic network that aggregates links to research paper preprints
then categorizes them into proceedings.”

Includes a large collection of online tools for various purposes
Have a big library of papers/software/datasets/conferences for computer science.
Paper sites have:

Meta information and preview
Vote functionality and view statistics, tags
Comments
Related work
Bookmarking
Author information

User profiles (no friendships)

Differences to our approach:

Focus on computer science community
Comment and Discussions are well hidden on paper sites
No News stream
Very spacious design

3. Another very similar site is:

journalfire.com – beta
“Share what your read – connect to colleagues – create journal clubs.”

It has the following features:

Comment on Papers. Activity feed (?). Follow articles.
Host Journal Clubs. Create Events related to papers.
Powerful search box fetching papers from Arxiv and Pubmed (slow)
Social features on site: User profiles, friend finder (no fb/g+ integration yet)
News feed – from subscribed papers and friends
Easy paper import via Bookmarklet
Good usability!! (but slow loading times)
Private reading clubs cost money!

They are very skilled: Maintained by 3 PhD students/postdocs from Caltech and MIT.

Differences to our approach:

Closed Data, Closed Source
This site also (currently) misses out on ranking features
Very Closed model – Signup required
Weak Crowd sourcing: Cannot add Meta information

The site is still at its very beginning with few users. The project started in 2010 and has not gained much momentum since.

The other sites are roughly classified in the following categories:
1. Single people who are following a very similar idea:

annotatr.appspot.com. Combines a metadata-base with the disqus plugin. You can comment but not rate. Good usability. Nice CSS. Good search function. No MathML. No related article suggestion. Maintained by two academics in private time. Hosted on Google Apps. Closed Source – Closed Data.
r-Forum – a resource where mathematicians can collect record reviews, corrections of a resource (e.g. paper, talk, …). A simple Vanilla-Forum/Wiki with almost no content used by maybe 12 people in US. No automated Data import. No rating system.
http://math-arch.org/ – Post comments on math papers. Very bad usability – you even get errors. Maintained by a group of Russian programmers, LogicSun. Closed Source – Closed Data.

Analysis: Although the principal idea to connect people reading papers is there, the implementation is very bad in terms of usability and even basic programming. Also the voting features are missing.

2. (Semi) Professional sites.

Public Library of Science – very professional, huge paper database for mainly biology and medicine. Features full-text papers and lots of interesting meta information including references. Has comment features (not very visible) and a news stream on the start page.
No QA features (+1, Ask question) on the site. Only published articles are on the site.
Mendeley.com – Huge Bibliographic database with bookmarking and social features. You can organize reading groups in there, with comments and notes shared among the participants. Features a news stream with papers by friends. Nice import. Impressive fulltext data and Reference features.
No QA features for paper. No comments for paper. Requires Signup to do anything useful.
papercritic.com – Open review database. Connected to the Mendeley bibliographic library. You can post reviews. No rating. No comments. Not open: Mendeley is commercial.
webofknowledge.com. Commercial academic citation index.
zotero.org – features a program that runs inside a browser. “easy-to-use tool to help you collect, organize, cite, and share your research sources”

Analysis: The goal of all these tools is to simplify the reference management, by providing metadata like references, citations, abstracts, author profiles. Commenting features on the paper site are not there or not promoted.
3. Vaguely related sites which solve different problems:

citeulike.org – Social bookmarking for papers. Closed Source – Open Data.
http://www.scholarpedia.org. A peer reviewed open access encyclopedia.
Philica.com Online Journal which publishes articles from any field along with its reviews.
MathSciNet/Zentralblatt – Review database for math community. Closed Source – Commercial.
http://f1000research.com/ – Online Journal with a public, post publish review process. “Open Science – Open Data – Open Review”
http://altmetrics.org/manifesto/ as an emerging trend from the web-science trust community. Their goal is to revolutionize the review process and create better filters for scientific publications making use of link structures and public discussions. (Might be interesting for us).
http://meta.wikimedia.org/wiki/WikiScholar – one of several ideas under discussion at Wikimedia as to a central repository for references (that are cited on Wikipedias and other Wikimedia projects)

Upshot of all this:

There is not a single site featuring good Q&A features for papers.

If you like our approach you can contact us, contribute to the source code or find some starting documentation!
So the plan is to fork an open source question answer system and enrich it with the features fulfilling the needs of scientists and some social aspects which will eventually help to rank related work of a paper.
Feel free to provide us with feedback and wishes and join our effort!

Claudio Martella talks @ FOSDEM about Apache Giraph: Distributed Graph Processing in the Cloud

Rene — Sun, 05 Feb 2012 09:01:45 +0000

Claudio Martella introduces Apache Giraph, which according to him is a loose implementation of Google Pregel, which was introduced at SIGMOD in 2010. He points out that MapReduce is not well suited to graph processing.

He then gave an example of how MapReduce can be used to do a PageRank calculation. He points out that PageRank can be calculated as a local property of a graph in a distributed way, by calculating the local PageRank from the knowledge of the neighbours. He did this to show what the drawbacks of this method are in his opinion:

job bootstrap takes some time
disk is hit about 6 times
Data is sorted
Graph is passed through

Like in the Pregel paper, he says that other graph algorithms like single-source shortest paths have the same problems.
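To illustrate what "calculating local PageRank from the knowledge of the neighbours" means, here is a minimal single-machine Python sketch. It is not code from the talk or from Giraph. The damping factor 0.85, the function name and the tiny example graph are assumptions for illustration, and the sketch assumes every node has at least one outgoing edge. In a MapReduce setting each pass of the outer loop would typically be a separate job, which is where the repeated bootstrap, sorting and disk access come from.

```python
def pagerank(graph, damping=0.85, iterations=20):
    """graph maps every node to the list of nodes it links to."""
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    for _ in range(iterations):
        # "Map" side: every node sends an equal share of its rank to its neighbours.
        incoming = {v: 0.0 for v in graph}
        for v, neighbours in graph.items():
            if neighbours:
                share = rank[v] / len(neighbours)
                for u in neighbours:
                    incoming[u] += share
        # "Reduce" side: every node combines what it received into its new rank.
        rank = {v: (1 - damping) / n + damping * incoming[v] for v in graph}
    return rank


# Tiny example graph without dangling nodes
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
```

Each node only needs the ranks sent by the nodes that link to it, so the update is local, but the whole graph still has to be passed through in every iteration.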

Claudio Martella from Apache explains how Giraph works in the graph dev room @ FOSDEM 2012

After explaining more about implementing Pregel on top of the existing MapReduce infrastructure for distribution, he says that this system has some advantages over MapReduce:

it’s a stateful computation
Disk is hit only for checkpoints
No sorting is necessary
Only messages hit the network
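A minimal Python sketch of this vertex-centric style, using the single-source shortest paths example mentioned above. This is a single-process toy and not the Giraph API. The function name and the example graph are made up, and every vertex is assumed to appear as a key in the edge dictionary. It only shows the two points from the list: vertex state survives between supersteps, and only messages travel between vertices.

```python
import math


def pregel_sssp(edges, source):
    """edges maps every vertex to a list of (neighbour, weight) pairs."""
    distance = {v: math.inf for v in edges}  # vertex state kept across supersteps
    inbox = {source: [0]}                    # messages delivered in this superstep
    while inbox:
        outbox = {}
        for vertex, messages in inbox.items():
            best = min(messages)
            # A vertex only becomes active if a message improves its state.
            if best < distance[vertex]:
                distance[vertex] = best
                for neighbour, weight in edges[vertex]:
                    outbox.setdefault(neighbour, []).append(best + weight)
        inbox = outbox  # barrier: messages are delivered in the next superstep
    return distance


# Hypothetical weighted graph
edges = {
    "a": [("b", 1), ("c", 4)],
    "b": [("c", 2)],
    "c": [("d", 1)],
    "d": [],
}
distances = pregel_sssp(edges, "a")
```

The computation stops on its own once no vertex receives a message that improves its distance, so no pass over the whole graph is needed when only a few vertices are still changing.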

He points out that the advantages of Giraph over other methods (Hama, GoldenOrb, Signal/Collect) are especially an active community (Facebook, Yahoo, Linkedin, Twitter) behind this project. I personally think another advantage is that it is run by Apache who already run MapReduce (Hadoop) with great success. So it is something that people trust…
Claudio points out explicitly that they are searching for more contributors and I think this is really an interesting topic to work on! So thank you Claudio for your inspiring work!

Here are the video streams from the graph dev room: