We propose to create a new website for the scientific community that brings together people who are reading the same paper. The basic idea is to mix the functionality of a Q&A platform (like MathOverflow) with a paper database (like arXiv). We follow a strict openness principle by making available the source code and the data we collect.
We start with an analysis of how the internet is currently used in different fields and explain the shortcomings. The actual product description can be found in the section “Basic idea”. At the end we present an overview of websites that follow a similar approach.
This document – as well as the whole project – is a work in progress. We are happy about any kind of comments or other contributions.
Every scientist has to stay up to date with the developments in their area of research. The basic sources for finding new information are:
Every community has found its own way of using these tools.
To stay up to date with recent developments I check arxiv.org on a daily basis (via RSS feed), participate in mathoverflow.net, and search for papers via Google Scholar or MathSciNet. Occasionally interesting work is shared by people in my Google+ circles. In general the pace of pure mathematics is very slow. New research often builds upon work which has been out for a few years. To stay reasonably up to date it is enough to go to conferences every 3-5 months.
I read many papers on my own because I am the only one at the department who does research on that particular topic. We have a reading class where we read papers/lecture notes which are relevant to more people. Usually these are introductions to certain kinds of theory. We have weekly seminars where people talk about their recently published work. There are some very active blogs by famous mathematicians, but in my area blogs play virtually no role.
In computer science, topics evolve and change very quickly. It is always important to have both an overview of upcoming technologies (which you get from tech blogs) and access to current research trends.
Since the field moves so fast and the review process in journals often takes a long time, our main sources of information and papers are conferences and Twitter.
Another rich source for computer scientists is, of course, the related work sections of papers and Google Scholar. Especially useful is the method of finding a very influential paper with more than 1000 citations and then finding newer papers that cite it and contain a certain keyword – one of the features of Google Scholar.
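The “cited-by plus keyword” strategy described above can be sketched as a simple filter. This is an illustrative toy example with made-up paper records; a real version would query a citation database such as Google Scholar.

```python
# Toy list of papers that cite a seminal, highly cited paper.
# The records below are invented for illustration only.
seminal_citations = [
    {"title": "Fast graph kernels for large networks", "year": 2011,
     "abstract": "We study graph kernels for node classification."},
    {"title": "A survey of recommender systems", "year": 2009,
     "abstract": "Collaborative filtering and content-based methods."},
    {"title": "Streaming graph partitioning", "year": 2012,
     "abstract": "Partitioning massive graph streams with kernels."},
]

def citing_papers_with_keyword(citations, keyword, min_year=2010):
    """Return newer citing papers that mention the keyword."""
    keyword = keyword.lower()
    return [p for p in citations
            if p["year"] >= min_year
            and keyword in (p["title"] + " " + p["abstract"]).lower()]

hits = citing_papers_with_keyword(seminal_citations, "kernel")
for p in hits:
    print(p["year"], p["title"])
```

The filter keeps only recent papers whose title or abstract contains the keyword, which is exactly the narrowing step the search engine performs.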
The main problem in computer science is not finding a rare paper or idea but rather filtering the huge number of publications (including bad ones) and keeping track of trends. A system that ranks and summarizes papers (not only by abstract and citation counts) would help me a lot in selecting which related work of a paper I should read!
As a psychologist/neuroscientist, I receive recommendations for scientific papers via Google Scholar alerts or ScienceDirect alerts (http://www.sciencedirect.com/); I receive alerts for keywords or for entire journal issues. When I search for a certain publication, I use pubmed.org or scholar.google.com. This can sometimes be kind of annoying, as I receive multiple alerts from different sources; but I guess it is the best way to stay up to date on recent developments. This is especially important in my field, as we feel a great deal of “publication pressure”; I work on a method which is considered “quite fancy” at the moment, so I also use the alerts to make sure nobody has published “my” experiment yet.
Sometimes a Facebook friend recommends a certain publication or a colleague points me to it. Most of the time, I read articles on my own, as I am the only person working on this specific topic at my institution. Additionally, we have a weekly journal club where everyone in turn presents work related to our focus of research, e.g. a certain part of the human brain. There is also a weekly seminar dedicated to presentations about ongoing projects.
Blogs (e.g. mindhacks.com, http://neuroskeptic.blogspot.com/) can be a source for an overview of recent developments, but I have to admit I use them mainly for work-related entertainment.
All in all, it is easy to stay up to date using alerts from different platforms; the annoying part is the flood of emails you receive and that you are quite often alerted to articles that don’t fit your interests (no matter how exactly you try to specify your keywords).
In the biological sciences, in research at the bench, communication is one of the most fundamental tools a scientist can have. Communication with other scientists may open up possibilities for new collaborations, can lead to a completely new viewpoint on a known question and to the integration and expansion of methods, and allows a scientist to have a good understanding of what is known, what is not known, and what other people have – both successfully and unsuccessfully – tried to investigate.
Yet communication is something that is currently very much lacking in academic science – lacking to an extent that, most scientists will agree, hinders the progress of research. Nonetheless, the lack of communication and the issues it brings with it are something most scientists have accepted as a necessary evil, not knowing how to change it.
Progress is only reported in peer-reviewed journals – many of which are greatly affected not only by what is currently “sexy” in research but also by politics, connections, and the “publish or perish” pressure. Because of this pressure to publish in journals, and the weight a publication list carries for any young scientist’s chances of success, scientists also tend to be very reluctant to share any information pre-publication.
Furthermore, one of the major issues is that there is currently no real way of publishing or communicating either negative results or minor findings, which causes many questions and methods to be repeatedly investigated, as well as a loss of information.
Given how much social networks and the internet have changed communication and access to information over the past years, there is a need for this change to reach research and communication in the life sciences and to transform the way we think not only about approaching and solving research questions but also about the information and insights we gain as a whole.
The most important source of information for philosophers is http://philpapers.org/. You can follow trends going on in your field of interest. Philpapers has a list of almost all papers together with their abstracts, keywords and categories as well as a link to the publisher. Additional information about similar papers is displayed.
Every category of papers is managed by an editor, and for each category it is possible to subscribe to a newsletter. In this way I am informed once per month about current publications in journals related to my topic of interest. Every user can create an account to manage their literature and the papers they are interested in.
Other research and information exchange methods among philosophers are mailing lists, reading clubs, and blogs. Have a look at David Chalmers’ blog list. Blogs are becoming more and more important, but unfortunately they usually cover general topics and developments of the community (e.g. Leiter’s blog, Chalmers’ blog and Schwitzgebel’s blog).
But altogether I still think that a centralized service like Philpapers is my favourite tool because it aggregates most information. If I don’t hear about something on Philpapers, it usually is not that important. Among philosophers this platform – though incomplete – seems set to be the standard for the next couple of years.
As a scientist it is crucial to be informed about the current developments in one’s research area. Abstracting from the reports above, we divide the tasks roughly into the following stages.
1. Finding and filtering new publications:
2. Getting more information about a given paper:
Finally there is a fundamental decision: shall I read the whole paper or not? This leads us to the next task.
3. Understanding a paper: Understanding a paper in depth can be a very time-consuming and tedious process. The presentation is often very short and much knowledge is assumed of the reader. The notation choices can be bad, so that even the statements are hard to understand. In effect the paper is easily readable only by a very small circle of specialists in the area. If one is not in the lucky situation of belonging to that circle, one usually applies the following strategies:
In mathematics most papers are not read through to the end. One uses strategies 1 & 2 until one gets stuck and moves on to something more exciting. The chances of survival are much better with strategy 3, where one is committed to putting in a lot of effort over weeks.
4. Finding related work. Where to go from here? Is the paper superseded by a more recent development? Which are the relevant papers the author builds upon? What are the historic influences? What are the founding ideas of the subject? Finding related work is very time consuming. It is easy to overlook things, given that the reference lists are often vast and sometimes hard to get hold of. Getting information about citations often requires access to commercial databases.
All researchers around the world are faced with the same problems and come up with their individual solutions. There are great synergies in bringing these people together on an online platform! Most of the problems described above are solved by a paper-centric service which allows you to…
We want to do that with a new mixture of a traditional Q&A system like StackExchange or MathOverflow with a paper database and social features. The key features of this system are as follows:
Openness: We follow a strict openness principle. The software will be developed as open source. All data generated on this site will be under a Creative Commons license (like Wikipedia) and will be made available to the community in the form of database dumps or an API (open data).
We use two different types of content sites in our system: Papers and Discussions.
Paper sites. A paper site is dedicated to a single publication and has the following features:
The point “Related Work” deserves some further explanation. The citation graph offers a great deal more information than just a list of references. Together with user-generated content like votes, individual paper bookmarks, and the social graph, one obtains a very interesting data set which can be harvested. We want to present this view at least with respect to popularity, topics, and papers read by friends. Later on one could add more sophisticated, even graphical, views on this graph.
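As a rough illustration of how such signals could be combined, here is a minimal scoring sketch. The field names, weights, and data are assumptions for illustration, not a specification of the actual ranking.

```python
# Hypothetical "related work" ranking that combines citation counts
# with user-generated signals: votes and bookmarks by friends.

def related_work_score(paper, my_friends,
                       w_votes=1.0, w_cites=0.1, w_friend=2.0):
    """Weighted mix of community votes, citations, friend bookmarks."""
    friend_bookmarks = len(set(paper["bookmarked_by"]) & set(my_friends))
    return (w_votes * paper["votes"]
            + w_cites * paper["citations"]
            + w_friend * friend_bookmarks)

# Invented candidate papers related to the one being viewed.
candidates = [
    {"id": "A", "votes": 12, "citations": 340,  "bookmarked_by": ["carol"]},
    {"id": "B", "votes": 3,  "citations": 25,   "bookmarked_by": ["alice", "bob"]},
    {"id": "C", "votes": 0,  "citations": 1200, "bookmarked_by": []},
]

my_friends = ["alice", "bob"]
ranked = sorted(candidates,
                key=lambda p: related_work_score(p, my_friends),
                reverse=True)
print([p["id"] for p in ranked])
```

Changing the weights shifts the view between “popular”, “highly cited”, and “read by friends” – the three perspectives mentioned above.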
Discussion sites. A discussion site looks more like a traditional Q&A question, with the difference that each discussion may have (many) related papers. A discussion site contains:
Besides the content sites we want to provide the following features:
News Stream. This is the start page of our website. It will be generated from the network consisting of friends, papers and authors. There should be several modes like:
Moreover, filter by tag should be always available.
Search bar:
Social: (hard to implement, maybe for second version!)
Our proposed website addresses the problems mentioned above in the following ways.
1. Finding and filtering new publications: This step can be improved with even very little community effort:
2. Getting more information about a given paper:
Many reviews of new papers are already written. For example, MathSciNet and Zentralblatt maintain large databases of reviews which are provided by the community but are not freely available. Many authors would be much happier to write them for an open system!
3. Understanding a paper: Here are the major synergies which we want to address with our project.
4. Finding related work. Having a well-structured and easily navigable view on related papers simplifies the search a lot. The filtering benefits from the content generated by the users (votes) and individual information, like friends who have written or bookmarked a paper.
There are several discussions in Q&A forums addressing precisely this problem:
We found three sites on the internet that follow a similar approach, which we examined more carefully.
1. There is a social network which has most of our features implemented:
researchgate.net
“Connect with researchers, make your work visible, and stay current.”
The Economist has dedicated an article to them. It is essentially a Facebook clone with special features for scientists.
Differences to our approach:
2. Another website which comes reasonably close is:
“an academic network that aggregates links to research paper preprints, then categorizes them into proceedings.”
Differences to our approach:
3. Another very similar site is:
journalfire.com – beta
“Share what you read – connect to colleagues – create journal clubs.”
It has the following features:
They are very skilled: maintained by three PhD students/postdocs from Caltech and MIT.
Differences to our approach:
The site is still at its very beginning, with few users. The project started in 2010 and has not gained much momentum since.
The other sites are roughly classified into the following categories:
1. Single people who are following a very similar idea:
Analysis: Although the principal idea of connecting people reading papers is there, the implementation is very poor in terms of usability and even basic programming. The voting features are also missing.
2. (Semi) Professional sites.
Analysis: The goal of all these tools is to simplify reference management by providing metadata like references, citations, abstracts, and author profiles. Commenting features on the paper sites are either absent or not promoted.
3. Vaguely related sites which solve different problems:
Upshot of all this:
If you like our approach you can contact us or contribute to the source code, where you will find some starting documentation!
So the plan is to fork an open-source question-and-answer system and enrich it with features fulfilling the needs of scientists, plus some social aspects which will eventually help to rank the related work of a paper.
Feel free to provide us with feedback and wishes and join our effort!
We are lucky: modern, great companies have very informative videos about their products.
I also love the open approach they took. Freebase, the database created by Metaweb, is not protected from others; it is under a Creative Commons license. Everyone in the community knows that the semantic database they run is nice to have, but the key point is really the technology they built to run queries against the database and handle data reconciliation. Making it Creative Commons helps others improve it, like Wikipedia; it also created trust and built a community around Freebase.
I know it sounds as if I were 5 years old, but I am fully convinced that there is an abstract concept behind the success of Metaweb, and I call it their business model. Of course I can think of many ways other than selling to Google to monetize this technology. But in the case of Metaweb this was just not necessary.
In order to create a news stream, one possibility is to just show the most recent information to the user (as Twitter does). Due to the huge amount of information created, one wants to filter the results in order to improve the user experience. Facebook first started to filter the news stream on their site, which led to the widespread discussion about their ironically named EdgeRank algorithm. Many users seem to be unhappy with the user experience of Facebook’s Top News.
Also, for some information, such as an event in the future, the moment it becomes available might not be the best moment to display it.
I observed these trends and realized that this problem can be seen as a special case of search or, more generally, of recommendation engines in information retrieval. We want to present the most relevant information updates within a certain time window to every specific user.
This problem seems to me algorithmically much harder than web search, where the results don’t have this time component and for a long time weren’t personalized to the user’s interests. The time component makes the question of relevance hard to decide: the information is new, and you don’t have any votes or other indicators of relevance. Consider a news source or person in someone’s environment that wasn’t important before; all of a sudden this person could provide highly relevant and useful information to the user.
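One common way to capture the time component is to discount an item’s base relevance exponentially with its age. The following is a minimal sketch under assumed parameters (the half-life and the sample items are invented), not a description of any deployed algorithm.

```python
import math

# Exponential time decay: an item loses half its score every
# HALF_LIFE_HOURS. The half-life value is an illustrative assumption.
HALF_LIFE_HOURS = 24.0

def stream_score(base_relevance, age_hours, half_life=HALF_LIFE_HOURS):
    """Base relevance (votes, source affinity, ...) discounted by age."""
    decay = math.exp(-math.log(2) * age_hours / half_life)
    return base_relevance * decay

# Invented stream items: a strong but old item vs. a weaker fresh one.
items = [
    {"text": "new CD release",          "relevance": 5.0, "age_hours": 48.0},
    {"text": "friend's concert review", "relevance": 3.0, "age_hours": 2.0},
]

for item in sorted(items,
                   key=lambda i: stream_score(i["relevance"], i["age_hours"]),
                   reverse=True):
    print(item["text"])
```

With these numbers the fresh, lower-relevance item outranks the older, higher-relevance one, which is exactly the behavior a chronologically biased stream needs.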
Fortunately, in the past I created metalcon.de together with several friends. Metalcon is a social network for heavy metal fans. On Metalcon, users can access information (CD releases, upcoming concerts, discussions, news, reviews, …) about their favorite bands, concerts and venues in their region, and updates from their friends. This information can be displayed perfectly in a social news stream. On the other hand, Metalcon users share information about their taste in music, the venues they go to, and the people they are friends with.
This means that I have a perfect sandbox to develop and test (with real users) smart social news algorithms that aggregate and filter the most relevant news for our users based on their interests.
Furthermore, regional information and information about music are available as linked open data, so the news stream can easily be enriched with semantic components.
Since I am about to redesign Metalcon (a lot of work) for the purpose of research, and since I am about to go in this direction for my PhD thesis, I would be very happy to receive some feedback and thoughts about my suggested future research topic. You can leave a comment or contact me.
Thank you!
You might wonder now how data can be linked. This is pretty easy. Let us take me as an example, with the following statement about me: “Rene lives in Koblenz, which is a city in Germany.”
So I could create the following data triples:
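The statement above breaks down into subject-predicate-object triples. The sketch below shows them in plain Python; the URIs are illustrative placeholders, not real identifiers (real linked data would use dereferenceable IRIs, e.g. from DBpedia).

```python
# "Rene lives in Koblenz, which is a city in Germany" as three triples.
# All URIs below are made-up placeholders for illustration.
triples = [
    ("http://example.org/Rene",    "livesIn",   "http://example.org/Koblenz"),
    ("http://example.org/Koblenz", "isA",       "http://example.org/City"),
    ("http://example.org/Koblenz", "locatedIn", "http://example.org/Germany"),
]

def objects(triples, subject, predicate):
    """Follow a link: all objects for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

# Where does Rene live?
print(objects(triples, "http://example.org/Rene", "livesIn"))
```

Because each entity is named by a URI, triples from different sources can be merged into one graph and traversed across data sets – that is the “linking” in linked open data.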
I think it is clear what great things can be achieved once we have this linked open data. But there are still a lot of challenges to tackle.
Fortunately we already have more or less satisfying answers to these questions. But as with any science we have to watch out carefully: since linked open data is accessible to everyone, it probably enables as many bad uses as good ones. So of course we should all become very euphoric about this great thing, but bear in mind that nuclear science was not only a good thing; in the end it led to really bad things like nuclear bombs!
I am happy to get your feedback and opinions about linked open data! I will very soon publish some articles with links to sources of linked open data. If you know some, why don’t you tell me in the comments?
Some data sets might only be available through an API. If no good tutorials exist, I will introduce the API in those cases.
In the end, I just want to encourage you: if you think I am missing out on some good data sets, please contact me and tell me about it immediately! That would be very helpful!