filter bubble – Data Science, Data Analytics and Machine Learning Consulting in Koblenz, Germany
Extract knowledge from your data and be ahead of your competition
Tue, 17 Jul 2018 12:12:43 +0000

Swiftkey XKCD comic: Sorry this has not happened before…
Wed, 13 Jun 2012 15:29:17 +0000

Current readers of my blog know about typology, the project that predicts what you will type next on your smartphone. This is of course pretty similar to Swiftkey. It is amazing to see that xkcd took Swiftkey as the topic of the current comic. Even more amazing is that 3 people (none of them on the typology development team) independently sent me the link to it.
I liked the comic and I think it really demonstrates the importance of software like this, so I am really looking forward to seeing how typology will evolve. It also demonstrates the pitfalls of personalizing everything. In this sense I think there is also a hint at the filter bubble.

Wikipedia to Blackout for 24 hours to fight SOPA and PIPA – Copy of the user discussion and poll on my blog
Tue, 17 Jan 2012 11:31:49 +0000

I am one of the web pioneers, but this is about the most amazing thing I have witnessed on the web for as long as I can remember. Tomorrow, on January 18th, the English version of Wikipedia will shut down for 24 hours to protest two upcoming (?) American laws (SOPA and PIPA) that lay the legal foundations for censoring the web. This is happening in the country that is so proud of its freedom of speech.

This is such an important act of democracy that I stood still for a couple of minutes after I heard of it! 1,800 active Wikipedia authors, moderators and administrators collectively agreed to make this move as a protest! I am very excited to see where this will go and what impact it has. Freedom of the internet is what makes it such a beautiful space. Everyone, spread the word! Discuss this! Don’t let anyone take the freedom of speech and information sharing from you!
Since the user discussion and poll won’t be available tomorrow, I attached them to my blog post.
I will not comment on this any further. Please, everyone, have your own opinion and act with responsibility.

Propaganda, filtering and blocking by Facebook?
Tue, 16 Aug 2011 17:33:22 +0000

The discussions about the ethics of Facebook are old and everyone knows my opinion on their ethics. But now I discovered a YouTube video, shared by a Google product manager on Google Plus, showing that Facebook will filter invitations to Google Plus out of your friends’ Facebook news streams.
It is not new that Facebook filters the news in your friends’ news streams, but now they seem to filter out information and content they don’t like and don’t want to reach you!
In my opinion this is very close to propaganda and threatens our freedom! Of course there is competition in social networking and Facebook is afraid of Google. But this kind of blocking and filtering of content is, in my opinion, unacceptable!
Have a look at the video for yourself!

This is not about competition anymore, this is about ethics! Ask yourself: “Do you really want to use a social network and give so much power to a company that blocks your friends from telling you something?”

Stop Facebook – Filterbubble of facebook's news stream & wall
Tue, 07 Jun 2011 21:31:39 +0000

As everyone knows, in my limited free time I am currently doing two things:

  1. I am reading Eli Pariser’s book about the filter bubble. I don’t want to review it until I am done, which will hopefully be soon. But it is connected to the other thing I am doing.
  2. I am promoting my band’s first album, ballads n bullets, which just came out.

I was able to convince my band that money invested in Google AdWords and Facebook is probably a better deal than spending money on print campaigns. Online is a much more efficient way to reach our target group and introduce them to our music. So far so good! But sometimes you discover the worst or boldest things while doing something.

Some Background

As we all know, Facebook is filtering your social news stream. Recently we posted an update to all our fans on Facebook (2000) to remind them about the upcoming release. We got 2500 impressions and received 100 clicks, resulting in 6 sales. Not so cool!
Right now, just from my experience of watching impressions on Facebook and several websites and seeing how much traffic comes from Facebook to our homepage, I am starting to believe that Facebook counts ten impressions for your status update if one user visits their Facebook news stream ten times and was thereby able to see your update ten times.
Apparently I am not mistaken with my guess. Have a look at Tim Wilson’s post on how impressions are counted on Facebook.
Let us think again about what it means to have 2500 impressions of a status update. Could it be possible that these 2500 impressions were generated by only a couple of users, let us say fewer than 200?

See how bold facebook is!

While I was booking ads, Facebook offered me a deal that made me sit down and look twice! I am now able to buy visibility for my own status updates in the social news stream. If I pay for each click my status update receives, Facebook will not only show the update to all of our fans but also show their (highly filtered) interactions with it to their friends!
To make buying advertising even more attractive, Facebook tells me that our friend-of-a-friend network has about 206,000 users who could be reached with the status updates of my band. Up to today my status updates have reached a visibility of at most 5500 impressions. The 206,000 that Facebook offers me is roughly 40 times as much, and that assumes every user produces only one impression, although we know that a user produces many more.
A factor of 40 or higher is an amazingly huge number. It is probably the number every marketing person has in mind when they decide that everyone has to be on Facebook now!
Isn’t that insane? Facebook’s user experience suggests that we should be there because we think we get this incredibly high reach. But the reality is that we reach nothing but our premium customers if we don’t pay Facebook. Facebook should pay me for producing such great content on Facebook!
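
The numbers from this post can be put side by side in a few lines of Python. This is only a back-of-the-envelope sketch: the figures are the ones quoted above, the 200 unique viewers is my guess and not a measured value, and all variable names are made up for illustration.

```python
# Figures quoted in this post.
fans = 2000
impressions = 2500
clicks = 100
sales = 6
best_impressions_so_far = 5500
friends_of_fans = 206_000

# Assumption (a guess, not a measurement): only about 200 distinct fans saw the update.
assumed_unique_viewers = 200

click_rate = clicks / impressions                         # clicks per impression
sale_rate = sales / clicks                                # sales per click
views_per_viewer = impressions / assumed_unique_viewers   # repeated views per person
share_of_fans_reached = assumed_unique_viewers / fans     # fraction of fans who saw it
reach_factor = friends_of_fans / best_impressions_so_far  # offered reach vs. reach so far

print(f"click rate:            {click_rate:.1%}")
print(f"sale rate:             {sale_rate:.1%}")
print(f"views per viewer:      {views_per_viewer:.1f}")
print(f"share of fans reached: {share_of_fans_reached:.0%}")
print(f"reach factor:          {reach_factor:.1f}")
```

If the guess of 200 viewers is anywhere near right, each of them saw the update many times while most fans never saw it at all, which is exactly the gap Facebook then offers to close for money.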

What should I say? It is a curse!

Everyone jumps on the Facebook train with totally wrong expectations. No, not everyone and all of their friends see your advertising and status updates!
Of course, if something really special happens, Facebook really does make you go viral, but most of the time you have no advantage from using Facebook compared to other marketing methods. In particular, the only one that always wins is Facebook. I have hardly seen any other brand in the world that was printed and promoted on so many flyers, posters, magazines and even TV commercials. Amazingly, Facebook did not pay a single cent to appear in all these media. Everyone pushes their own Facebook channel, hoping to go viral, instead of pushing their own brand and thinking about how to really bring out the brand and do marketing, or about how to make a great product.

Everyone seems to expect miracles from Facebook

Hello everybody! Think about it! The worldwide GNP won’t grow just because everyone is now using Facebook! It is only Facebook that is growing!
By the way, in one of my articles about the perfect band website I warned everyone to focus on their own website and not on Facebook. It is just too risky to rely on Facebook. At first it was great. Now it is big and the policies are changed over and over again!

Filterbubble appeared on Eli Parisers!
Tue, 24 May 2011 17:53:21 +0000

Yesterday I clicked on Eli Pariser’s site and right away I was surprised that the AddThis plugin under the headline story offered Facebook, email, Twitter AND meinVZ (one of Germany’s biggest social networking sites).
MeinVZ logo can be found in the German Version of

I asked a friend in the States to visit the same site. For him the plugin offered Facebook / email / Twitter and Google, and of course no MeinVZ for the Americans.
Instead of MeinVZ there is a link to Google on the American Moveon website

Well, of course this isn’t really the kind of personalization that Eli Pariser is telling us to watch out for. In my opinion it is a rather useful personalization of technology and not a personalization of information. But I was sure that Eli had no idea this was actually happening. Since Eli always points out that the filter bubble is invisible, I thought it was an excellent little example of how INVISIBLE the filter bubble actually is and how easily people can contribute to it. So I contacted him and told him about it.

Eli Pariser’s reply to my mail

It’s a good catch — I wasn’t aware that AddThis was doing that. In the long run, we’re hoping to move away from that plugin for a number of reasons, but it’s fine to point out if you’d like — it does underscore the point that one can miss this happening under one’s own nose.

Google uses your Location to personalize search results
Mon, 23 May 2011 11:47:23 +0000

So today I’ll start my research into the 57 signals Google uses to personalize search results. Verifying that Google uses your location to tailor search results was an easy score. After the experiment we can be certain that your location influences the results, so it is very likely one of the 57 signals. Well, I guess that is no surprise. Anyway, personalizing search by location is in my opinion a lot of help, especially because Google seems to give the user the freedom to bypass this filter, and the filter also doesn’t seem to be that strong:

First Experiment: verify Location as a signal

Right now I am sitting in the city of Koblenz. When I typed in a search for weather (Wetter in German), Google returned the weather forecast for Koblenz. Afterwards I used a proxy tunnel to my web server, which is located in the city of Stuttgart. After doing the exact same search on the same computer with the same browser, Google returned the weather forecast for Stuttgart.

google search for weather. My location was Koblenz

Google search for weather when using a proxy tunnel to my Location Stuttgart
Since my blog runs on shared hosting, I also logged in to my third web server, which is a root server. Google couldn’t personalize the search in this case because Google doesn’t know where the server is located. Instead, Google asked me to enter the location where I am.

Second experiment: How to bypass the gatekeeper

Googling the weather is kind of boring, so I did some other queries for bars and nightclubs (also in German = “bars und kneipen”). Again I got different results for Koblenz and Stuttgart. The interesting thing is that for both locations the results were very general, also including city guides for Berlin / Hamburg / Köln / Munich and so on. Only the snippets from Google Maps were tailored to my location.

Search for bars with a computer located in Koblenz

Search for bars with a computer located in Koblenz but using a proxy server located in Stuttgart

After adding Koblenz to the query I obtained much better results for Koblenz, regardless of whether I used my proxy or not (note that the results are the same but the location flag on the left side changed from Koblenz to Stuttgart).
Search for bars in koblenz with a computer located in Koblenz

Search for bars in koblenz with a computer located in Koblenz but using a proxy server in Stuttgart. the results remain the same as without proxy

We conclude that, at the moment, Google pays much more attention to what you tell it you want to know than to the information it collects while you browse. Although Google includes information about your location in a keyword search, it seems to do so only if your search query is very general. Once you ask a more specific query, Google puts your stated interest first.
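
If you want to repeat this experiment and put a number on “the results are the same”, one simple option is to compare the two result lists by their overlap. The sketch below is only an illustration: `jaccard_overlap` is a helper I made up, and the result lists are hypothetical placeholders, not the results I actually received.

```python
def jaccard_overlap(results_a, results_b):
    """Share of distinct results that appear in both lists (1.0 = identical sets)."""
    set_a, set_b = set(results_a), set(results_b)
    if not set_a and not set_b:
        return 1.0  # two empty result lists count as identical
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical placeholder results for a general query ("bars und kneipen").
general_koblenz = ["city-guide-berlin", "city-guide-hamburg", "maps-snippet-koblenz"]
general_stuttgart = ["city-guide-berlin", "city-guide-hamburg", "maps-snippet-stuttgart"]

# Hypothetical placeholder results for a specific query (with "Koblenz" added).
specific_koblenz = ["bar-a-koblenz", "bar-b-koblenz", "bar-c-koblenz"]
specific_via_proxy = ["bar-a-koblenz", "bar-b-koblenz", "bar-c-koblenz"]

print("general query overlap: ", jaccard_overlap(general_koblenz, general_stuttgart))
print("specific query overlap:", jaccard_overlap(specific_koblenz, specific_via_proxy))
```

An overlap well below 1 for the general query and close to 1 for the specific one would match what I saw by eye. Note that this measure ignores the order of the results, so it would miss personalization that only reorders them.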

What are the 57 signals google uses to filter search results?
Tue, 17 May 2011 22:58:16 +0000

Since my blog post on Eli Pariser’s TED talk about the filter bubble became quite popular and a lot of people seem to be interested in which 57 signals Google uses to filter search results, I decided to extend the list from my article and list the signals I would use if I were Google. It might not be 57 signals but I guess it is enough to get an idea:

  1. Our Search History.
  2. Our location (verified)
  3. the browser we use.
  4. the browsers version
  5. The computer we use
  6. The language we use
  7. the time we need to type in a query
  8. the time we spend on the search result page
  9. the time between selecting different results for the same query
  10. our operating system
  11. our operating systems version
  12. the resolution of our computer screen
  13. average amount of search requests per day
  14. average amount of search requests per topic (to finish search)
  15. distribution of search services we use (web / images / videos / real time / news / mobile)
  16. average position of search results we click on
  17. time of the day
  18. current date
  19. topics of ads we click on
  20. frequency we click advertising
  21. topics of adsense advertising we click while surfing other websites
  22. frequency we click on adsense advertising on other websites
  23. frequency of searches of domains on Google
  24. use of or google toolbar
  25. our age
  26. our sex
  27. use of “i feel lucky button”
  28. do we use the enter key or mouse to send a search request
  29. do we use keyboard shortcuts to navigate through search results
  30. do we use advanced search commands  (how often)
  31. do we use igoogle (which widgets / topics)
  32. where on the screen do we click besides the search results (how often)
  33. where do we move the mouse and mark text in the search results
  34. amount of typos while searching
  35. how often do we use related search queries
  36. how often do we use autosuggestion
  37. how often do we use spell correction
  38. distribution of short / general  queries vs. specific / long tail queries
  39. which other google services do we use (gmail / youtube/ maps / picasa /….)
  40. how often do we search for ourselves

Uff, I have to say that after 57 minutes of brainstorming I am running out of ideas for the moment. But this might be because it is already one hour after midnight!
If you have some other ideas for signals or think some of my guesses are totally unreasonable, why don’t you tell me in the comments?
Disclaimer: this list of signals is a pure guess based on my knowledge of and education in data mining. It may be that not one of the signals I name corresponds to the 57 signals Google is using. In the future I might discuss why each of these signals could be interesting. But remember: as long as you have a high diversity in the distribution you are fine with any list of signals.

Social news streams – a possible PhD research topic?
Mon, 25 Apr 2011 22:03:08 +0000

I have now spent two months reading papers since I started my PhD program. That is enough time to think about possible research topics. I am more and more interested in search, in social networks in general and in social news streams in particular. It is obvious that it is becoming more and more important to aggregate news around a user’s interests and social circle and to display it to the user in an efficient manner. Facebook and Twitter do this in an obvious way, but Google, Google News and a lot of other sites have similar products.

Too much information in one’s social environment

To create a news stream, one possibility is to just show the most recent information to the user (as Twitter does). Due to the huge amount of information created, one wants to filter the results in order to improve the user experience. Facebook started filtering the news stream on its site, which led to the widespread discussion about its ironically named EdgeRank algorithm. Many users seem to be unhappy with the user experience of Facebook’s Top News.
Also, for some information, such as the existence of an event in the future, the moment it becomes available might not be the best moment to display it.

Interesting research hook points and difficulties

I observed these trends and realized that this problem can be seen as a special case of search or, more generally, of recommendation engines in information retrieval. We want to obtain the most relevant information updates around a certain time window for every specific user.
This problem seems to me algorithmically much harder than web search, where the results don’t have this time component and for a long time also weren’t personalized to the user’s interests. The time component makes it hard to decide the question of relevance. The information is new and you don’t have any votes or indicators of relevance. Consider a news source or person in someone’s environment that wasn’t important before. All of a sudden this person could provide highly relevant and useful information to the user.

My goal and roadmap

Fortunately, in the past I created Metalcon together with several friends. Metalcon is a social network for heavy metal fans. On Metalcon users can access information (CD releases, upcoming concerts, discussions, news, reviews,…) about their favorite bands, concerts and venues in their region, and updates from their friends. This information can perfectly be displayed in a social news stream. On the other hand, Metalcon users share information about their taste in music, the venues they go to and the people they are friends with.
This means that I have a perfect sandbox to develop and test (with real users) some smart social news algorithms that are supposed to aggregate and filter the most relevant news to our users based on their interests.
Furthermore regional information and information about music are available as linked open data. So the news stream can easily be enriched with semantic components.
Since I am about to redesign Metalcon (a lot of work) for the purpose of research and I am about to go in this direction for my PhD thesis, I would be very happy to receive some feedback and thoughts on my suggestions for my future research topic. You can leave a comment or contact me.
Thank you!

Current Achievements:

Algorithmic Information Filter from Eli Pariser’s TED Talks
Sun, 13 Mar 2011 13:34:06 +0000

Just today an interesting story came up on a German news site which goes back to Eli Pariser’s talk at TED about a thing he calls the Filter Bubble and how personalization is changing the Internet. Before commenting on his talk I want to personally thank him for using his reputation to start a discussion on such a fundamental and important topic!
UPDATE: most likely you are looking for my list of almost 57 signals Google might use to filter.
I had a short mail conversation with Eli. He asked me to temporarily remove his TED talk since his book isn’t on sale yet. I found a very similar talk by him which he allowed me to make public on my blog. So here you go folks:

Google is filtering and personalizing search results

Eli is pointing out a thing some people might have already noticed. If two different people search for the same thing on Google, it is very probable that the search results will be very different. Google is doing this without telling the user that it is actually filtering the results based on what the algorithm thinks the user might like. According to Eli Pariser, Google is using 57 signals to determine our interests. Among those we find:

Of course this kind of personalization has its good sides. When I am about to buy a new notebook computer I definitely want to see different websites depending on whether I live in Germany or in the US. This could be due to tax and shipping fees, which means that I am most probably interested in local stores and not in overseas shops. Still, this personalization and filtering has a huge potential for serious problems. Let me ask a few questions:

  • What happens if Google misinterprets our 57 signals?
  • What happens if I only receive results of a certain type?
  • What if I rely on the assumption that I have access to all kinds of information?

We might think we get all the information we need. But in reality we are being blinded by the filters Google is using. We have no chance to determine what other information on a certain topic is filtered out but potentially available. On the other hand, due to the amount of information, we need filters and computers to help us. But the systems should be more transparent!

Facebook is also filtering the newsstream from your friends:

I have always thought that Facebook’s huge success is strongly correlated with the fact that there is hardly any spam on Facebook and that the information economy is very smart and user friendly. The attention users pay to status updates is very high, making Facebook a great place for every company to do online and viral marketing. This of course contributes to Facebook’s reach. In fact the information architecture on Facebook is so smart that your 20,000 followers on Facebook might not receive your status updates because Facebook’s EdgeRank algorithm decides they are not relevant to your fans or friends. EdgeRank might not have 57 signals but it still takes into consideration:

  • who your fans are friends with
  • what other news they like
  • how heavily they have interacted with you in the past
  • the time passed since your last status update
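
To get a feeling for how factors like these can keep an update away from your fans, here is a toy score in Python. It is purely an illustration and not Facebook’s actual EdgeRank formula: the function names, the 24-hour half-life and the visibility threshold are all assumptions I made up, and only two of the factors above (past interaction and elapsed time) are modelled.

```python
def toy_feed_score(past_interactions, hours_since_update, half_life_hours=24.0):
    """Toy relevance score: more past interaction raises it, age lowers it."""
    # Assumption: affinity saturates towards 1 as interactions grow.
    affinity = past_interactions / (past_interactions + 1.0)
    # Assumption: the score halves every `half_life_hours`.
    time_decay = 0.5 ** (hours_since_update / half_life_hours)
    return affinity * time_decay


def visible_updates(updates, threshold=0.25):
    """Return the fans whose stream would show the update under the toy score."""
    return [
        u["fan"]
        for u in updates
        if toy_feed_score(u["past_interactions"], u["hours_since_update"]) >= threshold
    ]


# Hypothetical fans of a band page and the same status update at different ages.
updates = [
    {"fan": "heavy_interactor", "past_interactions": 9, "hours_since_update": 12},
    {"fan": "silent_fan", "past_interactions": 0, "hours_since_update": 1},
    {"fan": "old_update", "past_interactions": 9, "hours_since_update": 72},
]

print(visible_updates(updates))
```

Even in this crude model, a fan who never interacted with the page does not get the update at all, no matter how fresh it is, and an active fan misses it once it is a few days old.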

Great news, isn’t it? Just compare this with my statement in a recent blog post about creating newsletters as a musician in order to communicate with your fans and not rely solely on other services like Facebook or MySpace.
You don’t believe the Facebook thing? There is a video about the EdgeRank algorithm used by Facebook to determine which status updates should reach us and which shouldn’t. Feel free to have a look, and thanks to the guys from Klurig Analytics for producing such a great video resource:

So what can we do?

  1. We should join the discussion in order to persuade Google, Facebook and others to become more transparent.
  2. We should be aware of the fact that a lot of information might not reach us.
  3. Even though more and more information is made available through the Internet, we should not become lazy and rely on all these great web services.
  4. Last but not least, you can help to spread the information about this topic! As we have seen, information only breaks through the filtering system if a lot of people spread it. And this topic is worth spreading!

Again, thanks a lot to Eli Pariser for starting this discussion!
