I came up with the concept of virtual payment channels when I finally started my work on a splicing protocol specification for BOLT 1.1. On paper I have already laid out the data flow for asynchronous, non-blocking splice-in and splice-out operations – they resemble pretty much the ones discussed in the GitHub issue. I will write about them soon. Before digging deeper into my specification I wanted to increase the usefulness of the splicing operations, so I started to wonder whether splicing could take place without ever hitting the blockchain. While playing around with some ideas I realized (though I have not proven it) that splicing without hitting the blockchain seems impossible as long as the trustless property of payment channels is mandatory. Hence I asked myself what would happen if we dropped this central property. As a result I accidentally came up with the concept of virtual payment channels.
Having talked to larger exchanges and potential hubs of the lightning network in the past, I have seen that they worry a lot about rebalancing their funds and about how to provide capacity without uselessly locking up their liquidity. I realized that virtual payment channels might be a solution to some of their core problems. Throughout this article, and at its end, I will provide use case stories from the perspective of a central node in the network as well as from the perspective of a mining pool.
Let us begin by looking at one of the two core elements of the lightning network: the (standard) payment channel. A standard payment channel operates in a trustless manner, allowing two parties to create a payment channel without having to trust each other. The ability to create trustless payment channels is certainly a property the lightning network must have. However, this trustless property comes at a rather high price, bringing the following three drawbacks (which would still exist even if we upgraded payment channels to follow the eltoo protocol proposal):
Additionally, an indirect problem is that companies like exchanges, large-scale shops, payment providers or even banks will have many payment channels, since they are likely to be central nodes in the network. The number of channels might exceed the capacity of the hardware running the node, so providers of core financial infrastructure might need to operate several lightning nodes (let us call them A, B, C, …) to serve their customers. These companies then have to decide, by collecting and evaluating data, which customers to connect to which of their nodes. They also have to connect their own nodes to each other via heavily funded payment channels. Even if this is done in some smart ring-like topology (with channels arranged so that each node can be reached in log N hops), it is not preferable, since far too many funds are unnecessarily locked up just to clear funds between one's own nodes. Another drawback is that those additional funds again sit in hot wallets and need to be available in the first place.
Let us assume in the above setting some node N1 wants to make a payment to some node N2. The route for the payment happens to be:
N1 –> A –> B –> N2.
In the classical lightning network the routing can only take place if A literally sends the requested money to node B within a payment channel between A and B. For brevity I omit the exchange of preimages and the negotiation of HTLCs and new commitment transactions. Those details are defined in the protocol specification so that A and B don't lose money while they route a payment from N1 to N2.
However, if A and B are owned by the same entity (as assumed above), there would be no harm if A received some money and B forwarded it without A actually forwarding the money to B. In particular, a payment channel with bitcoins locked between A and B would be superfluous. This observation is exactly the idea behind a virtual payment channel.
A high level protocol for operating a virtual payment channel could look like this:
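The exact message flow is up for discussion, but to make the idea concrete here is a minimal sketch in Python of the bookkeeping such a channel needs (all names and fields are my own hypothetical choices, nothing from an existing BOLT): the state is just a mutually signed balance sheet, and unlike with a real channel there is no funding transaction that could enforce it on-chain.

from dataclasses import dataclass

@dataclass
class VirtualChannelState:
    channel_id: str
    balance_a: int      # satoshis B vouches for on behalf of A
    balance_b: int      # satoshis A vouches for on behalf of B
    version: int        # strictly increasing, newest state wins
    sig_a: bytes = b""  # A's signature over (channel_id, balances, version)
    sig_b: bytes = b""  # B's signature over the same data

def update(state, amount_a_to_b, sign_a, sign_b):
    # A pure bookkeeping step: no on-chain transaction is involved.
    # The signatures only serve as proof of what both sides agreed to;
    # they are not enforceable by the blockchain.
    new = VirtualChannelState(
        channel_id=state.channel_id,
        balance_a=state.balance_a - amount_a_to_b,
        balance_b=state.balance_b + amount_a_to_b,
        version=state.version + 1,
    )
    assert new.balance_a >= 0 and new.balance_b >= 0
    new.sig_a = sign_a(new)
    new.sig_b = sign_b(new)
    return new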
Attention! If for some reason the other party turns out not to be trustworthy, one could lose as many funds as the other side virtually provided. In the best case the two nodes are owned by the same person or company, or there is some written agreement backing the virtual payment channel. Obviously, however, one cannot enforce such circumstances and prohibit arbitrary people from creating a virtual payment channel. That is why virtual payment channels have to be used with extreme caution. As soon as you accept an inbound virtual payment channel from a third party, you vouch with your own funds for the payments of your channel partner. Besides that, there are a couple of other drawbacks I see with this proposal.
The 3 biggest drawbacks that I see are:
Things to take into consideration:
I will now describe a realistic setting which would benefit hugely from the chance to create virtual payment channels.
A use case for virtual payment channels: Payouts in mining pools.
Let us assume you are operating a mining pool. You want to do daily payouts for your – let us say 100k – customers. Obviously your customers already trust you, since they commit their hash power to your pool and rely on the fact that you will eventually do a payout.
The payout scenario without virtual payment channels and without the lightning network.
Every day one has to fit 100k transactions onto the blockchain. With batching one could obviously use fewer transactions, but the number of transactions and the fees will still be considerable.
If we have the lightning network, the situation does not even get better:
For the payouts on day one we have to create payment channels. Obviously a mining pool cannot overfund those channels by much, since it would need to have that liquidity in the first place. So on day two the channels either have to be spliced in or closed and reopened – unless, of course, the money has already been spent by the customer over the lightning network. (This, however, will only be possible if additional funds of the mining pool are used to fund other, external channels, giving customers the possibility to spend their funds in a meaningful way.) We see that using lightning in this case actually seems to increase the number of blockchain transactions as well as the amount of liquidity being used.
Obviously the answer to the sketched problems could be that the mining pool does not operate channels to all its customers but just relies on the lightning network to find routes to the customers while doing payouts. I predict that in many cases it won't be possible for the pool to find a route to the customer when she creates an invoice, and a channel has to be funded anyway.
Situation with virtual payment channels on the lightning network:
The pool could accept a virtual payment channel in which the customer's side of the VPC holds funds as high as the amount the pool owes the customer. This does not require a blockchain transaction and saves fees. For the second payout on the next day, the balance can just be increased by the amount the pool owes the user. Obviously this kind of payout is not real, as the customer does not directly control the funds. However, the pool provides a cryptographic proof and a signature to the customer, which again cannot be enforced by the blockchain. The customer can at any time pay via lightning using exactly this virtual payment channel.
In order for this to be useful for the customer, his virtual payment channel needs to be connected to the “real” lightning network. This is achieved by the mining pool, which has to have several outgoing channels funded towards rather central nodes on the network (e.g. shops, exchanges, payment providers, …). Since the pool can assume that not all customers will spend or transfer all their funds immediately, the pool only has to provide a fraction of the funds allocated in the virtual payment channels as outgoing capacity on the real channels. This reduces the amount of liquidity that needs to sit on the hot lightning node wallet (see the back-of-the-envelope sketch below). In case a channel dries out, funds will be spliced in or the channel is closed and reopened. This will happen far less frequently than payouts to all customers and thus reduces the load on the blockchain. Also, the customer can already receive bitcoins over the lightning network in this virtual payment channel (if he decides to trust even more funds to the virtual channel). In any case the customer can obviously also accept or demand payments on a standard payment channel by disallowing inbound routing over the virtual payment channel.
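As a back-of-the-envelope calculation (every number here is invented purely for illustration), the saving in hot liquidity looks roughly like this:

# Illustrative only: all numbers are made up, not taken from any real pool.
customers = 100_000
avg_owed_sats = 50_000                       # average payout balance per customer
total_virtual = customers * avg_owed_sats    # 50 BTC of virtual liabilities

spend_fraction = 0.05                        # assume 5% of balances leave per day
hot_capacity = int(total_virtual * spend_fraction)

print(f"virtual liabilities: {total_virtual / 1e8:.1f} BTC")           # 50.0 BTC
print(f"real outgoing capacity needed: {hot_capacity / 1e8:.1f} BTC")  # 2.5 BTC

Instead of locking 50 BTC in customer channels, the pool would only fund 2.5 BTC of real channels towards the network, topping them up by splicing as they deplete.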
Finally the customer could even get hold of his funds by fulfilling inbound routing requests on his standard payment channels which do outbound routing over the virtual payment channel.
For customers of the pool this mechanism probably also has other great advantages:
Conclusion:
We see that virtual payment channels increase flexibility for groups of users that for some reason trust each other. We can think of a virtual payment channel as an agreement to spend the funds of the channel partner in one's own name. Applications include services that have plenty of customers and don't want to reallocate funds all the time, as well as very central power nodes that cannot handle all of their traffic on one computer and need to distribute the node across several machines. However, virtual payment channels should be used with great caution, as one needs to trust another party. In the case of merely distributing one's own node this will never be an issue. In the mining pool scenario the trust already needed to exist in the first place, so it should be acceptable, but it still carries a risk for the customer. The biggest downside is that a virtual payment channel can be seen as creating liquidity out of thin air; it is a kind of credit that might default and force chain reactions. However, in that case it is just the channel partner who should have known better than to trust blindly. In particular, if large operators go bankrupt as a consequence of such a chain reaction, everyone who has standard payment channels with those operators won't lose his or her funds, thanks to the trustless property of the lightning network.
To the best of my knowledge the concept of virtual payment channels for the lightning network has not been proposed before. If I am mistaken I will be happy to receive a pointer. Now I am very excited to see what you as a community think about my proposal. In particular I am excited because I understand that my proposal violates the core philosophy of the lightning network, namely that payment channels can operate in a trustless manner. Yet I think progress only happens if creativity is allowed to explore all possibilities instead of fixed limitations. Also, it will not change the trustless nature of the lightning network, since a virtual channel only comes into existence if two consenting parties agree to engage in such an endeavor. As outlined in this article, they might have good reasons to do so. In the great tradition of Bitcoin Improvement Proposals (BIPs) I suggest that we start a process for Lightning Network Improvement Proposals (aka LIPs). This draft would need to be specified in a little more detail if people think it would be a useful feature, and could eventually become such a LIP.
Obviously, if my proposal doesn't make it into future BOLT specifications, the idea is out anyway, and we cannot prevent companies like a mining pool operator from implementing this within the proprietary lightning-supported wallet they provide to their customers. However, I think it would be much more preferable if virtual payment channels either become part of the protocol standard or are never used.
If you liked this article and are interested in the follow ups you can follow me on twitter.
Having read Christian Decker's main publication for his dissertation, in which the authors suggest the use of invalidation trees to invalidate old states of a payment channel, I was confident that eltoo would be a really nice improvement to the lightning network protocol. However, following the “Don't trust. Verify!” philosophy of the Bitcoin community, I obviously had to study the eltoo protocol proposal myself. In particular I did not believe the bullish news from the magazines, since from a technical perspective I do not see how the speed of the lightning network could significantly be increased: at its current state it is basically bound by the speed of TCP/IP sockets at the payment channel level and, during routing, by the time locks of the HTLCs. Since eltoo only looks at channel management we can obviously omit the latter case. As I am starting to work on the splicing part of the BOLT 1.1 specification, which is a protocol extension of payment channel management, I obviously need to understand eltoo anyway.
Yesterday, after holding an openly licensed German-language lecture on the basics of the lightning network technology (not to be confused with BOLT, the Lightning Network RFC), I had a 5-hour train ride home from the event and finally found the time to carefully study the eltoo paper. The paper is very well structured and written, but I thought it might still be helpful to share my understanding of the proposed protocol back with the community. My main reason, however, is that I hope that by writing down and explaining the paper I get an even clearer understanding myself. In case I misunderstood something about the eltoo paper please let me know so that I can update this article. There are also some rather technical questions I still have (for example this one, which I have already posted on Bitcoin Stack Exchange).
So let us analyze eltoo by first understanding the motivation for yet another payment channel protocol, reviewing some well-known facts about current payment channel management.
Unpublished commitment transactions encode the balance (or, more technically speaking, the state) of a payment channel in the lightning network.
For the current version of the lightning network payment channel protocol we need to store information about all the old states of a channel. By the state of a channel we mean the current distribution of the channel's capacity (aka the balance) among the channel partners. At its core, the current payment channel (and also the solution proposed in eltoo) is just a 2-2 multisignature wallet. Currently, the balance of a payment channel is encoded by two unpublished transactions that would spend the channel's capacity according to the balance sheet, with the channel partners as recipients. We call those transactions commitment transactions. They are signed by both channel partners, and each partner stores one of them on his lightning node. This creates a kind of information asymmetry and a risk of funds being stolen should the information leak to the channel partner.
The funds within the channel are reallocated by creating a new commitment transaction (a kind of double spend) that spends the funds of the multisignature wallet (the channel's capacity) in a different way. In order for both parties to be able to trust the newly agreed-upon channel balance, we obviously have the problem of invalidating the old commitment transactions. In a trustless setting (which we wish to achieve) channel partners cannot rely on their counterparts to delete the old commitment transaction once they agree on a new channel balance. The reason is that those commitment transactions are valid bitcoin transactions signed by both channel partners, and both partners own a copy and could theoretically have made more copies in the meantime.
All currently known protocols for maintaining a full duplex payment channel have in common that old channel states need to be invalidated.
The current implementation of the lightning network achieves the goal of invalidating old channel states by including the hash of a revocation key within the output script of a commitment transaction. The revocation key itself is initially known only to the channel partner that does not hold a signed copy of that transaction. (Again, this is where the information asymmetry kicks in.) The output script of the commitment transaction will eventually spend the funds according to the channel balance that was agreed upon for this particular commitment transaction. However, it also allows the output of a published commitment transaction to be spent directly – sending the entire capacity of the channel to one's own bitcoin address – once one can present a signature from the secret revocation key. If the channel partners wish to update to a new channel state, they collaboratively compute the revocation key whose hash is stored in the most current commitment transaction. The important requirement of the protocol design is that neither partner can compute the revocation key without the help of the other partner. Owning the revocation key deters the channel partner from publishing an old commitment transaction (aka state), since the publishing side would be punished by the other side releasing the revocation key and spending the outputs of the commitment transaction directly. Note that releasing the revocation key itself is not dangerous, since it is only useful against the transaction whose output script includes the hash of that very key. As you would imagine, each new state / commitment transaction includes the hash of a new revocation key. Therefore it is really the old transaction that needs to be kept secret, or better yet destroyed. Obviously, with such a setup both sides need to store all old revocation keys in order to protect themselves from the other side publishing an old channel state (which, for example, could be a state in which all the funds belonged to the side that tries to publish it).
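As a toy model of this bookkeeping (deliberately ignoring Bitcoin Script, the actual per-commitment secret derivation of BOLT 3 and everything else that makes the real protocol safe), one side's storage can be pictured like this:

import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

class ChannelSide:
    # Toy bookkeeping for one side of a channel, not real BOLT 3 logic.
    def __init__(self):
        self.current_commitment = None   # the one transaction I may broadcast
        self.old_revocation_keys = []    # grows with every state update

    def update_state(self, new_commitment, revocation_key_of_old_state):
        # Receiving the revocation key for the previous commitment is what
        # makes it unsafe for my partner to ever broadcast that old state.
        self.old_revocation_keys.append(revocation_key_of_old_state)
        self.current_commitment = new_commitment

    def punish(self, broadcast_commitment):
        # If the broadcast transaction commits to the hash of a key I hold,
        # it is an old state and I can sweep its entire output directly.
        for key in self.old_revocation_keys:
            if broadcast_commitment["revocation_hash"] == h(key):
                return key   # use it to spend the output to my own address
        return None          # it was the current state: nothing to punish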
Eltoo suggests a different protocol for channel management, such that neither side needs to store all old states but just the current one. This is supposed to simplify implementing lightning nodes and also running them. According to Christian Decker this will also simplify implementing features like splicing. (On a side note, I have not thought the splicing operation completely through for eltoo, but it seems obvious to me that the chain of update transactions in eltoo can easily be extended by at least splice-out operations.)
The initial setup of a payment channel within the Eltoo protocol is in my opinion exactly the same as in the current specification of the lightning network.
A payment channel is created by funding a 2-2 multisignature wallet. The state of the channel is again encoded within a transaction that is just not published yet; only the process of invalidating old state differs. Both channel parties create two sets of public and private keys: the first set is called the update keys and the second set the settlement keys. Actually, the settlement keys should be derived from a hierarchical deterministic wallet so that settlement transactions can only be bound to one specific update transaction (a rather subtle though important detail). According to Christian Decker, what is new is that
… commitment is no longer a single transaction customized for each side, but an update/commitment that is symmetric and a settlement built on top.
A chain of update and settlement transactions allows for encoding channel state.
The state of the channel is encoded as the most recent element of a sequence of settlement transactions: the current settlement transaction encodes the current channel state, and all old settlement transactions are not needed and can safely be thrown away. Though not the same as the commitment transactions in current lightning network implementations, the settlement transactions can probably be considered their analogous counterpart. Each settlement transaction spends a so-called update transaction. The update transactions form a “linked list” starting from the funding transaction (which can itself be interpreted as an update transaction): each update transaction spends the output of the funding transaction or of a previous update transaction. The output of a new update transaction is a two-path multisignature script. In the first path the output can be spent directly with the pair of update keys; in the second path the output can be spent with the settlement keys once a timelock has expired. This always gives priority to double-spending an old settlement transaction with a newer update transaction. If the newest settlement transaction is broadcast, no newer update transaction exists, so neither channel partner alone can double-spend it with another update transaction; once the timelock is over, the channel is therefore closed.
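A minimal data-structure sketch of this chain (my own simplification: real update and settlement transactions are bitcoin transactions with the two-path script written in Bitcoin Script, not Python objects):

from dataclasses import dataclass

@dataclass
class UpdateTx:
    state_num: int             # increases with every channel update
    spends: "UpdateTx | None"  # the funding tx (None here) or an earlier update tx
    # output script, informally:
    #   path 1: update_key_A AND update_key_B          -- spendable immediately
    #   path 2: settlement_key_A AND settlement_key_B  -- only after the timelock

@dataclass
class SettlementTx:
    spends: UpdateTx           # may only use path 2, i.e. after the timelock
    balance_a: int
    balance_b: int

funding = UpdateTx(state_num=0, spends=None)
u1 = UpdateTx(state_num=1, spends=funding)
u2 = UpdateTx(state_num=2, spends=u1)   # publishing u2 invalidates state 1
s2 = SettlementTx(spends=u2, balance_a=40_000, balance_b=60_000)
# only (u2, s2) need to be kept; u1 and its settlement tx can be thrown away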
This setup is still bad, however, since the current channel balance is encoded in a settlement transaction which is the child of the latest update transaction, which is the child of the previous update transaction, and so on. Therefore, in order to close the channel, one would need to broadcast the complete chain of update transactions. This would take no load off the blockchain compared to directly transferring bitcoins every time one wants to update a payment channel, and we would also need to store all old update transactions, not really decreasing the overhead. With the current version of the bitcoin protocol there is nothing we can do about this, and eltoo would be nothing but a nice thought and completely impractical. However, a minor update to bitcoin (which was actually already discussed in the original lightning network paper) has finally been proposed as BIP 118 and would make eltoo useful.
Eltoo achieves the simplification by introducing the idea of floating transactions, which are realized by introducing the sighash modifier SIGHASH_NOINPUT for the existing OP codes check{,multi}sig{,verify}.
With SIGHASH_NOINPUT the signature of a transaction that spends some output ignores the prevout point (tx hash, output index), the sequences and the scriptCode in its signature calculation. This allows the output being spent to be changed later, as long as the output script matches the input script. Since the output scripts of the update transactions are all the same (at least for the path that spends the output to the update keys of the 2-2 multisig wallet again), one can bind a later update transaction to the parent, grandparent, great-grandparent, … of the most recent update transaction. (I don't see the point of summarizing all the technical details here, since they are very well described in the paper and I don't see how I would formulate them differently or more concisely.) The consequence, however, is that instead of having to publish all previous update transactions, the current one can simply be bound to the funding transaction and broadcast together with its settlement transaction. Since there is a timelock in the update transaction, the settlement transaction (which would be invalid before the timeout) should obviously only be broadcast after the timeout, as it would be rejected by the bitcoin network beforehand.
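The rebinding trick can be pictured as follows (again a toy model of my own: a real implementation signs the transaction serialization under the BIP 118 rules, here the “signature” is just a hash that omits the prevout from the signed data):

import hashlib

def sighash_noinput(tx: dict) -> bytes:
    # Toy sighash: commit to everything except which output is being spent.
    signed = {k: v for k, v in tx.items() if k != "prevout"}
    return hashlib.sha256(repr(sorted(signed.items())).encode()).digest()

# the latest update transaction is signed once ...
update_42 = {"prevout": "update_41:0", "state": 42, "script": "2-2 update"}
sig = sighash_noinput(update_42)   # stand-in for an actual ECDSA signature

# ... and can later be rebound directly onto the funding output: the
# signature still verifies because the prevout was never part of it.
update_42["prevout"] = "funding:0"
assert sighash_noinput(update_42) == sig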
That being said, the current channel state is defined by the already mined funding transaction together with the most recent update transaction and its child, the settlement transaction. If channel partner Mallory wanted to publish an older state (encoded within some settlement transaction), she would first need to publish the update transaction that is the parent of this very settlement transaction. She can do so, as she can also bind that update transaction to the funding transaction. However, before she can spend the settlement transaction, the path in the output script of that update transaction allows direct spending by a newer update transaction (the newest update transaction – known to the other side – being bound to that particular update transaction). This prevents old state from entering the blockchain.
There are some minor details about including a trigger transaction which I will omit here. However, it seems important to mention that this mechanism also requires the other side to be online, or a watchtower mechanism to be included. Also, with this mechanism, in the uncollaborative closing case, the collaborative one and the case where one party tries to steal funds, at least three transactions need to be broadcast to the blockchain (the trigger transaction, the most recent update transaction and the settlement transaction). In that sense I understand eltoo as a tradeoff between convenience of channel management and minimal blockchain involvement. As I mentioned during the autopilot discussion at the lightning hackday, the blockchain will still need about 10 years to have one payment channel open for every internet user in the world. On the other side, eltoo seems to have many advantages over the current protocol for payment channel management.
Advantages of eltoo:
Disadvantages of eltoo:
Open Questions:
Which Christian Decker answered by stating that, should this really happen, one can reset the counter by splicing or re-anchoring (a splice without changing the values).
If you are more interested in technical write-ups about the technology of the lightning network you can follow me on Twitter. If you are also an enthusiast of the lightning network you could connect to my lightning node via 036f464b54416ea583dcfae3872d28516dbe85414ed838513b1c34fb3a4aee4e7a@144.76.235.20:9735 and help grow the lightning network. Also I want to give a shoutout and thanks to Christian Decker, who reviewed this article and pointed out some mistakes, which have since been fixed.
I have been visiting Berlin to attend the second Lightning Hackday and want to give a brief wrap-up of the event. This article basically covers two topics. First, as promised in my bar camp session on “Building an automated topology for autopilot features of the lightning network nodes”, I will give an extensive protocol / summary of the session itself. Second, I will talk about an already well-known technique called splicing, which I realized during the event might be one of the more important yet unimplemented features of lightning nodes.
Let me just make a quick shoutout to Jeff and his team from Fulmo Lightning: thanks for organizing this excellent event. Especially bringing together such a group of high-profile lightning network enthusiasts was astonishing to me. Of the many tech events and conferences I have attended in the past, this was one of my top experiences.
In my bar camp session we had roughly 30 people attending. Luckily Conner Fromknecht and Olaoluwa Osuntokun from the San Francisco based Lightning Labs also joined the session and gave many insights resulting from their hands-on experience. I started the session with a short presentation of my thoughts about the problem. I had previously formulated those in my GitHub repo as a rough draft / sketch for a whitepaper on the topic. That happened after I opened issue 677 in the LND project criticizing the current design choices for the autopilot. The main points of those thoughts are a list of metrics and criteria I thought I would want to monitor and optimize in order to find a good network topology. Before the hackday that list looked like this (we discussed the list and people basically agreed on it):
Before I state the new ideas that came from the attendees and the discussion, I want to briefly sum up the great keynote by the guys from Lightning Labs that preceded the bar camp session. In particular I think the insights and techniques they talked about (atomic multi-path routing and splicing) have a huge impact on the autopilot and the topology generation problems of the lightning network. Long story short, the magic concept in my opinion is splicing. For those who are unfamiliar with the topic: splicing is the process of updating the channel balance of a payment channel by adding or removing (partial) funds. In the past I always thought that, even though it is not implemented in any of the lightning nodes, this was a problem that is technically rather trivial to solve and thus of minor importance. The guys from Lightning Labs basically stated the same, emphasizing that splicing funds out of a channel is trivial and can even be achieved easily in a non-blocking way, such that the channel can be used again directly after splicing out, even if the splicing transaction is not yet mined. However, splicing in (aka adding more funds to a channel) seems to be a little more cumbersome. Without going into too many technical details, the real eye-opener for me was the fact that splicing (together with the autopilot) seems to make lightning wallets, as they exist these days, obsolete. This obviously is a game changer. So to make it crystal clear:
With splicing and the autopilot implemented in a standard bitcoin wallet (running in the background without users needing to be aware of it), users can efficiently, quickly and cheaply send funds from one to another. If a path via lightning exists, the transaction will be lightning fast. If no path can be found, one could just splice out some funds from existing channels to create a new channel for the transaction. This basically boils down to a common on-chain transaction, which happened before we had the lightning network anyway – but it no longer matters if all funds are currently frozen up in payment channels. Also, it doesn't waste the blockchain transaction but rather uses the opportunity to create a funding transaction for the next channel, increasing the size of the lightning network. I dare to say a bigger lightning network is a better lightning network in general. The goal would basically be that eventually all bitcoin funds are locked in payment channels (which, with splicing, doesn't reduce the control or flexibility of the user). In case a user really needs to do a transaction which can't be achieved via lightning, it will just be spliced out and takes as much processing time as a regular on-chain transaction. As a disclaimer: obviously watchtowers are still needed, particularly in this scenario in which users might not even realize they are using the lightning network.
Taking the opportunities of splicing into consideration, I think the topology problem of the autopilot becomes an issue of only minor importance. One can easily splice out from existing payment channels to new payment channels if needed. The only bad thing is that such a transaction is not routed at lightning speed but rather takes a couple of block times to be mined and processed. However, it eventually creates a user-generated network topology that hopefully follows actual flows of funds rather closely and would thus be rather robust. The only drawback of such a process is that transactions would frequently include channel creations, which takes some time, and that only a maximum of 300k channels can be altered per day on top of the current bitcoin protocol. This observation explains why topology generation by the autopilot is still a reasonable topic to think about, since it should still help move traffic away from the blockchain.
Finally, I will now list some new thoughts that were collected during the session. I will also update my whitepaper soon. Feel free to fork me on GitHub and open a pull request to fix mistakes or add your own thoughts.
Number of nodes reachable within x hops:
It was pointed out that this metric looks quite nice. As a result of the discussion we came to realize that this greedy heuristic would basically lead to a scenario in which every node opens a channel to the most central node in the network: such a central node increases the number of nodes that can be reached within x hops by the maximum amount. Still, it looks like an important number to somehow optimize for; a sketch of how one might compute it follows below.
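For completeness, here is a small sketch (my own, assuming networkx as a dependency) of how a node could score a candidate channel by this metric before opening it:

import networkx as nx

def reachable_within(graph: nx.Graph, node, max_hops: int) -> int:
    # count nodes reachable from `node` in at most `max_hops` hops
    lengths = nx.single_source_shortest_path_length(graph, node, cutoff=max_hops)
    return len(lengths) - 1   # exclude the node itself

def gain_of_channel(graph: nx.Graph, me, candidate, max_hops: int = 3) -> int:
    # how many additional nodes a new channel to `candidate` would unlock
    before = reachable_within(graph, me, max_hops)
    trial = graph.copy()
    trial.add_edge(me, candidate)
    return reachable_within(trial, me, max_hops) - before

A greedy strategy that always maximizes gain_of_channel is exactly what would funnel everyone towards the most central node, which is why we were skeptical of optimizing this metric in isolation.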
Honesty and well behavior of nodes:
Following a similar philosophy, we discussed whether a distributed topology creation algorithm should aim for the global health of the network, in comparison to greedy strategies in which every node tries to optimize its own view of the network. Though it wasn't pointed out in the session, I think that a strategy where every node tries to optimize its own access to the network will at some point yield a Nash equilibrium. With my rather limited understanding of game theory, I think this might not necessarily be the best solution from a global perspective. We also discussed that, in the later mentioned settings where clients share information with neighbors, an algorithm must be robust against fraudulent behavior or lying nodes.
Stakeholder models:
Pretty much everyone agreed right away that different nodes might have different needs on the lightning network, so the topology creation should be configurable (or learnable by the node), taking into account whether the node is just a casual consumer, a shop, a bank / exchange, or …
Privacy vs information sharing:
We discussed quite extensively that, for a distributed algorithm to predict which channels should be created, it would be great if channel balances were public (or at least if some rough information were available about the distribution of the balance within a channel). We realized that as a first step, following the ideas of LND issue 1253, nodes should start collecting historic information about their own operations and routing activities. Actually, I just saw that a pull request claiming to resolve issue 1253 already exists.
We also realized that channel fees might act as a reasonably good proxy for the channel balance. Assume Alice and Bob have a channel and the balance is very skewed, in the sense that Alice has almost no funds and Bob has all of them. If Bob is asked to route a payment through that channel, he will probably charge a smaller fee than Alice would if she were asked to route a payment through her almost dried-up channel.
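A toy version of this heuristic (entirely my own guess at how a node might price routing; I am not aware of lnd or any other implementation doing this) could look as follows:

def routing_fee(base_fee: int, local_balance: int, capacity: int) -> int:
    # charge more the less outbound liquidity is left on our side:
    # a full channel quotes base_fee, an empty one about ten times that
    depletion = 1.0 - local_balance / capacity   # 0.0 = full, 1.0 = empty
    return int(base_fee * (1 + 9 * depletion))

print(routing_fee(base_fee=1000, local_balance=900_000, capacity=1_000_000))  # 1900
print(routing_fee(base_fee=1000, local_balance=50_000, capacity=1_000_000))   # 9550

An observer can then invert the logic: a low quoted fee hints at a well-funded side of the channel, a high fee at an almost depleted one.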
Still, the idea circulated that nodes could share their channel balances with neighbors, in a similar fashion to how routing information in the IP network is shared with neighbors. In such a way each node would eventually construct a map of paths.
A point mentioned – which in my opinion is important but only somewhat related to these issues – was the fact that nodes should of course charge higher routing fees if the timelock of the routing request is high, since in the worst case this makes a channel, or many other paths, unusable for quite some time, maybe even without a payment taking place. As a side note, I just realized that this could be a more sophisticated strategy for nodes to estimate their fees if they are aware of the number of routing paths their channel makes possible.
Some technical details:
One could use the number of disjoint paths between nodes as a good criterion, since it also enables heavy use of atomic multi-path transactions. It was also mentioned that one could look at the AS number of the underlying internet hosts.
Why not use machine learning instead of trying to find some smart algorithm / heuristics?
For me the most surprising idea was that this entire autopilot problem could easily be turned into a machine learning problem. There are some obvious flaws, because single nodes might not have enough training data, and even if one had enough data, sharing the model would probably not work out of the box. So we discussed a little whether this would be a use case for transfer learning. I will not dig deeper into the topic here, since the article is already quite long. But working in the field of machine learning, being a data scientist, and never having put even the slightest thought into this idea before the event took place was a little shocking and surprising to me.
Anyway I hope my summary of the session will be useful for you and the community! My personal roadmap now consists of four things.
Also kudos to Susette Bader, who took this really nice snapshot and cover image of this post while we were hacking.
This graph consists of 39338 edges. There is a directed edge for each email sent from one person to another person, together with a timestamp of when this happened. If a person puts n recipients in CC, n edges are added to the graph.
rpickhardt$ wc -l temporalGraph.tsv
39338 temporalGraph.tsv
rpickhardt$ head -5 temporalGraph.tsv
GardeM@dnc.org DavisM@dnc.org 1 17 May 2016 19:51:22
ShapiroA@dnc.org KaplanJ@dnc.org 1 4 May 2016 06:58:23
JacquelynLopez@perkinscoie.com EMail-Vetting_D@dnc.org 1 13 May 2016 21:27:16
JacquelynLopez@perkinscoie.com LykinsT@dnc.org 1 13 May 2016 21:27:16
JacquelynLopez@perkinscoie.com ReifE@dnc.org 1 13 May 2016 21:27:16
Clearly the format is: sender TAB receiver TAB 1 TAB date
The data is currently not sorted by the fourth column, but this can easily be done. Clearly an email network is directed and can have multi-edges.
Looking at the data I discovered that many mails have more than one recipient, so I thought it would be nice to look at the social network structure by counting how often two people occur together in the recipient list of an email. This can reveal a lot about the social network structure of the DNC.
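Deriving this weighted co-occurrence graph from the recipient lists is straightforward; here is a sketch (the parsing of the raw mails into per-mail recipient lists is assumed to have happened already):

from itertools import combinations
from collections import Counter

def cc_graph(mails):
    # `mails` is a list of recipient lists, one per email,
    # e.g. [["DavisM@dnc.org", "KaplanJ@dnc.org"], ...]
    counts = Counter()
    for recipients in mails:
        for a, b in combinations(sorted(set(recipients)), 2):
            counts[(a, b)] += 1   # undirected: count each pair once
    return counts

def write_tsv(counts, path="weightedCCGraph.tsv"):
    with open(path, "w") as f:
        for (a, b), n in counts.most_common():
            # emit both directions, matching the symmetric rows shown below
            f.write(f"{a}\t{b}\t{n}\n")
            f.write(f"{b}\t{a}\t{n}\n")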
rpickhardt$ wc -l weightedCCGraph.tsv
20864 weightedCCGraph.tsv
rpickhardt$ head -5 weightedCCGraph.tsv
PaustenbachM@dnc.org MirandaL@dnc.org 848
MirandaL@dnc.org PaustenbachM@dnc.org 848
WalkerE@dnc.org PaustenbachM@dnc.org 624
PaustenbachM@dnc.org WalkerE@dnc.org 624
WalkerE@dnc.org MirandaL@dnc.org 596
Clearly the format is: recipient1 TAB recipient2 TAB count
where count counts how often recipient1 and recipient2 have appeared together in mails.
There have been
included in the mails. In total I found 1226 different senders and 1384 different receivers. The top 7 senders are:
MirandaL@dnc.org 1482
ComerS@dnc.org 1449
ParrishD@dnc.org 750
DNCPress@dnc.org 745
PaustenbachM@dnc.org 608
KaplanJ@dnc.org 600
ManriquezP@dnc.org 567
And the top 7 receivers are:
MirandaL@dnc.org 2951
Comm_D@dnc.org 2439
ComerS@dnc.org 1841
PaustenbachM@dnc.org 1550
KaplanJ@dnc.org 1457
WalkerE@dnc.org 1110
kaplanj@dnc.org 987
As you can see, both kaplanj@dnc.org and KaplanJ@dnc.org occur in the data set, so – as I mention in the roadmap section at the end of the article – more cleanup of the data might be necessary to get a more precise picture.
Still, at first glimpse the data looks pretty natural. In the following diagram I show the rank-frequency plot of senders and receivers. One can see that some people are way more active than others. Also, the recipient curve is above the sender curve, which makes sense since every mail has exactly one sender but at least one recipient.
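Such a rank-frequency plot can be reproduced along these lines (a sketch; senders and receivers are assumed to be dicts mapping each address to its mail count, computed from the TSV file above):

import matplotlib.pyplot as plt

def rank_frequency(counts: dict):
    # sort activity counts descending: index = rank, value = frequency
    return sorted(counts.values(), reverse=True)

def plot(senders: dict, receivers: dict):
    plt.loglog(range(1, len(senders) + 1), rank_frequency(senders), label="senders")
    plt.loglog(range(1, len(receivers) + 1), rank_frequency(receivers), label="receivers")
    plt.xlabel("rank")
    plt.ylabel("number of mails")
    plt.legend()
    plt.show()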
You can also see the rank / co-occurrence-count diagram of the co-occurrence network. When the ranks are above 2000, the standard network structure picture changes a little. I have no plausible explanation for this; maybe it is due to the fact that the data dump is not complete. Still, the data looks pretty natural to me, so further investigations might make sense.
The crawler code is a two-liner – just some wget and sleep magic.
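I am not reproducing the exact two-liner here, but a Python equivalent would be something like the following (the URL pattern and id range are placeholders, not the actual endpoint I crawled):

import time, urllib.request

for mail_id in range(1, 20000):   # placeholder id range
    url = f"https://example.org/dnc-emails/{mail_id}"   # placeholder URL
    urllib.request.urlretrieve(url, f"{mail_id}.eml")
    time.sleep(1)   # throttle the requests – the "sleep magic" of the original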
The Python code for processing the mails builds upon the Python email library by Alain Spineux, which is released under the LGPL license. My code on top is released under GPLv3 and can be found on GitHub.
I met Matthias Bröcheler (CTO of Aurelius, the company behind Titan graph) 4 years ago in a teaching situation at the German national student high school academy. It was a time when I was still more mathematician than computer scientist, and my journey towards becoming a computer scientist had just started. Matthias was in the middle of his PhD program and I valued his insights and experiences a lot. It was through him that my eyes were first opened to what big data means and how companies like Facebook, Google and so on knit their business model around collecting data. Anyway, Matthias influenced me in quite some ways and I have a lot of respect for him.
I did not start my PhD right away and we lost contact. I knew he was interested in graphs, but that was about it. Only when I started to use neo4j more and more did I realize that Matthias was also one of the authors of the TinkerPop Blueprints, the interfaces for talking to graphs that most graph database vendors use. At that time I looked him up again and realized he was working on Titan, a distributed graph database. I found this promising-looking slide deck:
Slide 106:
Slide 107:
But at that time, for me there wasn't much evidence that Titan would really hold the promise given in slides 106 and 107. In fact those goals seemed as crazy and unreachable as my former PhD proposal on distributed graph databases. (By the way: reading the PhD proposal now, I am kind of amused, since I did not really aim for the important and big points like Titan did.)
During the redesign phase of metalcon we started playing around with HBase to support the architecture of our like button, and especially to be able to integrate this with recommendations coming from Mahout. I started to realize the big fundamental differences between HBase (an implementation of Google's Bigtable) and Cassandra (an implementation of Amazon's Dynamo), which result from the CAP theorem about distributed systems. Looking around for information about distributed storage engines I stumbled upon Titan again and saw Matthias' talk at the Cassandra Summit 2013. Around minute 21 / 22 the talk gets really interesting; I can also suggest skipping the first 15 minutes:
Let me sum up the amazing parts of the talk:
Skimming the data with my eyes, there are already some facts that make me very curious for a deeper data analysis:
I am pretty sure there are some more interesting facts, and maybe someone else has collected a better data set over the complete period of time, including YouTube snapshots as well as Facebook and Twitter mentions.
Since I have been asked several times already: here are the final rankings to download and also as a table in the blog post:
Rank | Course name | Number of votes |
1 | sectio chirurgica anatomie interaktiv | 8013 |
2 | internationales agrarmanagement 2 | 7557 |
3 | ingenieurmathematik fur jedermann | 2669 |
4 | harry potter and issues in international politics | 2510 |
5 | online surgery | 2365 |
6 | l3t s mooc der offene online kurs uber das lernen und lehren mit technologien | 2270 |
7 | design 101 or design basics 2 | 2216 |
8 | einfuhrung in das sozial und gesundheitswesen sozialraume entdecken und entwickeln | 2124 |
9 | changeprojekte planen nachhaltige entwicklung durch social entrepreneurship | 2083 |
10 | social work open online course swooc14 | 2059 |
11 | understanding sustainability environmental problems collective action and institutions | 1912 |
12 | the dance of functional programming languaging with haskell and python | 1730 |
13 | zyklenbasierte grundung systematische entwicklung von geschaftskonzepten | 1698 |
14 | a virtual living lab course for sustainable housing and lifestyle | 1682 |
15 | family politics domestic life revolution and dictatorships between 1900 1950 | 1476 |
16 | h2o extrem | 1307 |
17 | dark matter in galaxies the last mystery | 1261 |
18 | algorithmen und datenstrukturen | 1207 |
19 | psychology of judgment and decision making | 1168 |
20 | the future of storytelling | 1164 |
21 | web engineering | 1152 |
22 | die autoritat der wissenschaften eine einfuhrung in das wissenschaftstheoretische denken 2 | 1143 |
23 | magic and logic of music a comprehensive course on the foundations of music and its place in life | 1138 |
24 | nmooc nachhaltigkeit fur alle | 1130 |
25 | sovereign bond pricing | 1115 |
26 | soziale arbeit eine einfuhrung | 1034 |
27 | mathematische denk und arbeitsweisen in geometrie und arithmetik | 1016 |
28 | social entrepreneurship wir machen gesellschaftlichen wandel moglich | 1010 |
29 | molecular gastronomy an experimental lecture about food food processing and a bit of physiology | 984 |
30 | fundamentals of remote sensing for earth observation | 920 |
31 | kompetenzkurs ernahrungswissenschaft | 891 |
32 | erfolgreich studieren | 879 |
33 | deciphering ancient texts in the digital age | 868 |
34 | qualitative methods | 861 |
35 | karl der grosse pater europae | 855 |
36 | who am i mind consciousness and body between science and philosophy | 837 |
37 | programmieren mit java | 835 |
38 | systemisches projektmanagement | 811 |
39 | lernen ist sexy | 764 |
40 | modelling and simulation using matlab one mooc more brains an interdisciplinary course not just for experts | 760 |
41 | suchmaschinen verstehen | 712 |
42 | hands on course on embedded computing systems with raspberry pi | 679 |
43 | introduction to mixed methods and doing research online | 676 |
44 | game ai | 649 |
45 | game theory and experimental economic research | 633 |
46 | cooperative innovation | 613 |
47 | blue engineering ingenieurinnen und ingenieure mit sozialer und okologischer verantwortung | 612 |
48 | my car the unkown technical being | 612 |
49 | gesundheit ein besonderes gut eine multidisziplinare erkundung des deutschen gesundheitssystems | 608 |
50 | teaching english as a foreign language tefl part i pronunciation | 597 |
51 | wie kann lesen gelernt gelehrt und gefordert werden lesesozialisation lesedidaktik und leseforderung vom grundschulunterricht bis zur erwachsenenbildung | 593 |
52 | the european dream | 576 |
53 | education of the present what is the future of education | 570 |
54 | faszination kristalle und symmetrie | 561 |
55 | italy today a girlfriend in a coma a walk through today s italy | 557 |
56 | dna from structure to therapy | 556 |
57 | grundlagen der mensch computer interaktion | 549 |
58 | malnutrition in developing countries | 548 |
59 | marketing als strategischer erfolgsfaktor von der produktinnovation bis zur kundenbindung | 540 |
60 | environmental ethics for scientists | 540 |
61 | stem cells in biology and medicine | 528 |
62 | praxiswissen fur den kunstlerischen alltagsdschungel | 509 |
63 | physikvision | 506 |
64 | high five evidence based practice | 505 |
65 | future climate water | 484 |
66 | diversity and communication challenges for integration and mobility | 477 |
67 | social entrepreneurship | 469 |
68 | die kunst des argumentierens | 466 |
69 | der hont feat mit dem farat wek wie kinder schreiben und lesen lernen | 455 |
70 | antikrastination moocen gegen chronisches aufschieben | 454 |
71 | exercise for a healthier life | 454 |
72 | the startup source code | 438 |
73 | web science | 435 |
74 | medizinische immunologie | 433 |
75 | governance in and through human rights | 431 |
76 | europe in the world law and policy aspects of the eu in global governance | 419 |
77 | komplexe welt strukturen selbstorganisation und chaos | 419 |
78 | mooc basics of surgery want to become a real surgeon | 416 |
79 | statistical data analysis for the humanities | 414 |
80 | business math r edux | 406 |
81 | analyzing behavioral dynamics non linear approaches to social and cognitive sciences | 402 |
82 | space technology | 397 |
83 | der erzahler materialitat und virtualitat vom mittelalter bis zur gegenwart | 396 |
84 | kriminologie | 395 |
85 | von e mail skype und xing kommunikation fuhrung und berufliche zusammenarbeit im netz | 394 |
86 | wissenschaft erzahlen das phanomen der grenze | 392 |
87 | nachhaltige entwicklung | 389 |
88 | die nachste gesellschaft gesellschaft unter bedingungen der elektrizitat des computers und des internets | 388 |
89 | die grundrechte | 376 |
90 | medienbildung und mediendidaktik grundbegriffe und praxis | 368 |
91 | bubbles everywhere speculative bubbles in financial markets and in everyday life | 364 |
92 | the heart of creativity | 363 |
93 | physik und weltraum | 358 |
94 | sim suchmaschinenimplementierung als mooc | 354 |
95 | order of magnitude physics from atomic nuclei to the universe | 350 |
96 | entwurfsmethodik eingebetteter systeme | 343 |
97 | monte carlo methods in finance | 335 |
98 | texte professionell mit latex erstellen | 331 |
99 | wissenschaftlich arbeiten wissenschaftlich schreiben | 330 |
100 | e x cite join the game of social research | 330 |
101 | forschungsmethoden | 323 |
102 | complex problem solving | 321 |
103 | programmieren lernen mit effekt | 317 |
104 | molecular devices and machines | 317 |
105 | wie man erfolgreich ein startup aufbaut | 315 |
106 | grundlagen der prozeduralen und objektorientierten programmierung | 314 |
107 | introduction to disability studies | 314 |
108 | eu2c the european union explained by two partners cologne and cife | 313 |
109 | the english language a linguistic introduction 2 | 311 |
110 | allgemeine betriebswirtschaftslehre | 293 |
111 | interaction design open design | 293 |
112 | how we learn nowadays possibilities and difficulties | 288 |
113 | foundations of educational technology | 288 |
114 | projektmanagement und designbasiertes lernen | 281 |
115 | human rights | 278 |
116 | kompetenz des horens technische gehorbildung | 278 |
117 | it infrastructure management | 276 |
118 | a media history in 10 artefacts | 274 |
119 | introduction to the practice of statistics and regression | 271 |
120 | what is a good society introduction to social philosophy | 268 |
121 | modellierungsmethoden in der wirtschaftsinformatik | 265 |
122 | objektorientierte programmierung von web anwendungen von anfang an | 262 |
123 | intercultural diversity networking vielfalt interkulturell vernetzen | 260 |
124 | foundations of entrepreneurship | 259 |
125 | business communication for impact and results | 257 |
126 | gamification | 257 |
127 | creativity and design in innovation management | 256 |
128 | mechanik i | 252 |
129 | global virtual project management | 252 |
130 | digital signal processing for everyone | 249 |
131 | kompetenzen fur klimaschutz anpassung | 248 |
132 | digital economy and social innovation | 246 |
133 | synthetic biology | 245 |
134 | english phonetics and phonology | 245 |
135 | leibspeisen nahrung im wandel der zeiten molekule brot kase fleisch schokolade und andere lebensmittel | 243 |
136 | critical decision making in the contemporary globalized world | 238 |
137 | einfuhrung in die allgemeine betriebswirtschaftslehre schwerpunkt organisation personalmanagement und unternehmensfuhrung | 236 |
138 | didaktisches design | 235 |
139 | an invitation to complex analysis | 235 |
140 | grundlagen der programmierung teil 1 | 234 |
141 | allgemein und viszeralchirurgie | 233 |
142 | mathematik 1 fur ingenieure | 231 |
143 | consumption and identity you are what you buy | 231 |
144 | vampire fictions | 230 |
145 | grundlagen der anasthesiologie | 228 |
146 | marketing strategy and brand management | 227 |
147 | political economy an introduction | 225 |
148 | gesundheit | 221 |
149 | object oriented databases | 219 |
150 | lebenswelten perspektiven fur menschen mit demenz | 217 |
151 | applications of graphs to real life problems | 210 |
152 | introduction to epidemiology epimooc | 207 |
153 | network security | 207 |
154 | global civics | 207 |
155 | wissenschaftliches arbeiten | 204 |
156 | annaherungen an zukunfte wie lassen sich mogliche wahrscheinliche und wunschbare zukunfte bestimmen | 202 |
157 | einstieg wissenschaft | 200 |
158 | engineering english | 199 |
159 | das erklaren erklaren wie infografik klart erklart und wissen vermittelt | 198 |
160 | betriebswirtschaftliche und rechtliche grundlagen fur das nonprofit management | 192 |
161 | art and mathematics | 191 |
162 | vom phanomen zum modell mathematische modellierung von natur und alltag an ausgewahlten beispielen | 190 |
163 | design interaktiver medien technische grundlagen | 189 |
164 | business englisch | 187 |
165 | erziehung sehen analysieren gestalten | 184 |
166 | basic clinical research methods | 184 |
167 | ordinary differential equations and laplace transforms | 180 |
168 | mathematische logik | 179 |
169 | die geburt der materie in der evolution des universums | 179 |
170 | innovationsmanagement von kleinen und mittelstandischen unternehmen kmu | 176 |
171 | introduction to qualitative methods in the social sciences | 175 |
172 | advert retard wirkung industrieller interessen auf rationale arzneimitteltherapie | 175 |
173 | animation beyond the bouncing ball | 174 |
174 | entropie einfuhrung in die physikalische chemie | 172 |
175 | edufutur education for a sustainable future | 165 |
176 | social network effects on everyday life | 164 |
177 | pharmaskills for africa | 163 |
178 | nachhaltige energiewirtschaft | 162 |
179 | qualitat in der fruhpadagogik auf den anfang kommt es an | 158 |
180 | dementias | 157 |
181 | beyond armed confrontation multidisciplinary approaches and challenges from colombia s conflict | 154 |
182 | investition und finanzierung | 150 |
183 | praxis des wissensmanagements | 149 |
184 | gutenberg to google the social construction of the communciations revolution | 145 |
185 | value innovation and blue oceans | 145 |
186 | kontrapunkt | 144 |
187 | shakespeare s politics | 142 |
188 | jetzt erst recht wissen schaffen uber recht | 141 |
189 | rechtliche probleme von sozialen netzwerken | 138 |
190 | augmented tuesday suppers | 137 |
191 | positive padagogik | 137 |
192 | digital storytelling mit bewegenden bildern erzahlen | 136 |
193 | wirtschaftsethik | 134 |
194 | energieeffizientes bauen | 134 |
195 | advising startups | 133 |
196 | urban design and communication | 133 |
197 | bildungsreform 2 0 | 132 |
198 | mooc management basics | 130 |
199 | healthy teeth a life long course of preventive dentistry | 129 |
200 | digitales tourismus marketing | 127 |
201 | the arctic game the struggle for control over the melting ice | 127 |
202 | disease mechanisms | 127 |
203 | special operations from raids to drones | 125 |
204 | introduction to geospatial technology | 120 |
205 | social media marketing strategy smms | 119 |
206 | korpusbasierte analyse sprechsprachlichen problemlosungsverhaltens | 116 |
207 | introduction to marketing | 115 |
208 | creative coding | 114 |
209 | mooc meets 3d | 110 |
210 | unternehmenswert die einzig sinnvolle spitzenkennzahl fur unternehmen | 110 |
211 | forming behaviour gestaltung und konzeption von web applications | 109 |
212 | technology demonstration | 108 |
213 | lebensmittelmikrobiologie und hygiene | 105 |
214 | estudi erfolgreich studieren mit dem internet | 105 |
215 | moderne geldtheorie eine paische perspektive | 103 |
216 | kollektive intelligenz | 103 |
217 | geschichte der optischen medien | 100 |
218 | alter und soziale arbeit | 99 |
219 | semantik eine theorie visueller kommunikation | 97 |
220 | erziehung und beratung in familie und schule | 96 |
221 | foreign language learning in indian context | 95 |
222 | bildgebende verfahren | 92 |
223 | applied biology | 92 |
224 | bildung in der wissensgesellschaft gerechtigkeit | 92 |
225 | standortmanagement | 92 |
226 | europe a solution from history | 90 |
227 | methodology of research in international law | 90 |
228 | when african americans came to paris | 90 |
229 | contemporary architecture | 89 |
230 | past recent encounters turkey and germany | 88 |
231 | wars to end all wars | 83 |
232 | online learning management systems | 82 |
233 | software applications | 81 |
234 | business in germany | 78 |
235 | requirements engineering | 77 |
236 | anything relationship management xrm | 77 |
237 | global standards and local practices | 76 |
238 | prodima professionalisation of disaster medicine and management | 75 |
239 | cytology with a virtual correlative light and electron microscope | 75 |
240 | the organisation of innovation | 75 |
241 | sensors for all | 75 |
242 | diagnostik in der beruflichen bildung | 73 |
243 | scientific working | 71 |
244 | escience saxony lectures | 71 |
245 | internet marketing strategy how to gain influence and spread your message online | 69 |
246 | grundlagen des e business | 69 |
247 | principles of public health | 64 |
248 | methods for shear wave velocity measurements in urban areas | 64 |
249 | democracy in america | 64 |
250 | building typology studies gebaudelehre | 63 |
251 | multi media based learning environments at the interface of science and practice hamburg university of applied sciences prof dr andrea berger klein | 61 |
252 | math mooc challenge | 60 |
253 | the value of the social | 58 |
254 | dienstleistungsmanagement und informationssysteme | 57 |
255 | ict integration in education systems e readiness e integration e transformation | 56 |
The online wiki Rock in China has been online for a number of years and focusses on Chinese underground music. Prior to starting to implement Semantic Mediawiki, our wiki had roughly 4000 content pages, with over 1800 artists and 900 records. We used a number of templates for bands, CDs, venues and labels, but apart from using numerous categories and the DynamicPageList extension for a few joins, we were not able to tangibly use the available data.
DPL example for a JOIN between two wiki categories:
<DynamicPageList>
category = Metal Artists
category = Beijing Artists
mode = ricstyle
order = ascending
</DynamicPageList>
For the semantic conversion we created new templates, for example our venue template:

{{SemanticVenue
|Image=
|ImageDescription=
|City=
|Address=
|Phone=
|Opened=
|Closed=
|GeoLocation=
}}
As can be seen from the above template, both predicates (City, relating the venue to another page) and properties (Opened, a literal value) are proposed for the semantic class VENUE. Semantic MediaWiki implements this decisive difference in a very user-friendly way by setting the TYPE of each SMW property to either PAGE or one of the literal datatypes. As good as this is, it can be confusing when one talks with someone else about the semantic concepts in principle.
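For illustration, the corresponding property pages could look like this (a sketch; the property names are taken from the template above and Date is just one of the available literal types):

Property:City:
[[Has type::Page]]

Property:Opened:
[[Has type::Date]]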
A major problem was the implementation of external ontologies, which was not sufficiently documented on the Semantic MediaWiki page, most probably due to a change in versioning. Especially the cross-referencing to the URI was a major problem. As per the Semantic MediaWiki documentation, aliases should be allowed; however, trial and error revealed that only a property with a domain prefix, e.g. foaf:phone or owl:sameas, would be correctly recognized. We used the Special:RDFExport function to find most of these errors: every time our URI referencing was wrong, we would get a parser function error.
First, the wrong way for the following two wiki pages:
Mediawiki:smw_import_mo:
http://purl.org/ontology/mo/ |[http://musicontology.com/ Music Ontology Specification]
activity_end|Type:Date
activity_start|Type:Date
MusicArtist|Category
genre|Type:Page
Genre|Category
track|Type:String
media_type|Type:String
publisher|Type:Page
origin|Type:Page
lyrics|Type:Text
free_download|Type:URL
Property:genre:
[[Has type::Page]][[Imported from::mo:genre]]
And now the correct way, how it actually has to be implemented to work (note that the only difference is that there is no space between the base URI and the first pipe):
Mediawiki:smw_import_mo:
http://purl.org/ontology/mo/|[http://musicontology.com/ Music Ontology Specification]
activity_end|Type:Date
activity_start|Type:Date
MusicArtist|Category
genre|Type:Page
Genre|Category
track|Type:String
media_type|Type:String
publisher|Type:Page
origin|Type:Page
lyrics|Type:Text
free_download|Type:URL
Property:genre:
[[Has type::Page]][[Imported from::mo:genre]]
The ontology with the most problems was DBpedia, whose documentation did not tell us what the correct URI was. Luckily the mailing list provided support and we learned the correct URI:
http://www.dbpedia.org/ontology/
With that in place, we were able to implement a number of semantic properties for several classes and start updating our wiki pages to get the data into our semantic database.
To utilize semantic properties within a wiki, there are a number of extensions available, such as Semantic Forms, Semantic Result Formats and Semantic Maps. The benefits we were able to gain were tremendous. For example, the original JOIN query from the beginning of this blog post, which we had been running with DPL, can now be expressed with the following ASK query:
{{#ask: [[Category:Artists]] [[mo:origin::Beijing]]
|format=list
}}
This comes with the major benefit that the <references/> extension is NOT broken by placing the inline query on a page; DynamicPageList breaks <references/>, losing a lot of information. Another example of how we benefitted from semantics: previously we were only able to use categories and read information by joining one or two of them, e.g. artist pages that were categorized as both BEIJING artists and METAL artists. Now, with semantic properties, we have a lot more data to play around with and can create mashup pages such as ROCK or Category:Records, on which we can embed random videos from any ROCK artist or include a TIMELINE view of released records.
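A timeline of releases, for instance, can be sketched with an inline query like this (the property Has release date is hypothetical; the timeline format ships with Semantic Result Formats):

{{#ask: [[Category:Records]] [[Has release date::+]]
|format=timeline
|sort=Has release date
}}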
We had been using the property owl:sameas (or sameas) on various of our pages to link directly to Wikipedia pages. However, we learnt that the property foaf:primaryTopic is a much better and more accurate property for this. The sameas property should be used for semantic RDF pages, i.e. the respective DBpedia RESOURCE page (not the PAGE page). Luckily we had implemented the sameas property mostly in templates, so it was easy enough to exchange the properties.
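On an artist page the distinction then looks roughly like this (a sketch using Carsick Cars as an example; note that sameas points to the DBpedia RESOURCE, not the Wikipedia PAGE):

[[foaf:primaryTopic::http://en.wikipedia.org/wiki/Carsick_Cars]]
[[owl:sameas::http://dbpedia.org/resource/Carsick_Cars]]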
Having figured out this issue, we checked both the Freebase page and other pages, such as DBpedia or MusicBrainz, but there seems to be no “submit RDF” form. Hence we decided that the best way to get recognized in the Semantic Web is to include more links to other RDF resources, e.g. for our Category:Artists we set sameas links to DBpedia and the Music Ontology. For DBpedia we linked to the class and for the Music Ontology to the URI of the class.
As a side note: when checking on sameas.org, it seems that the Music Ontology is NOT cross-linked to DBpedia so far.
Following the recommendations set forth at Sindice, we changed our robots.txt to include our semantic sitemap(s):
Sitemap: http://www.music-china.org/wiki/index.php?title=Special:RecentChanges&feed=atom
Sitemap: http://www.rockinchina.com/wiki/index.php?title=Special:RecentChanges&feed=atom
Going the next step, we analyzed how we could include external data on our SMW, e.g. from MusicBrainz or from YouTube. Being a music-oriented site, YouTube was of particular interest for us. We found the SMW extension External Data, which we could use to connect to the Google API:
{{#get_web_data: url=https://www.googleapis.com/youtube/v3/search?part=snippet&q=carsick+cars&topicId=%2Fm%2F03cmgbv&type=video&key=Googlev3API&maxResults=50
|format=JSON
|data=videoId=videoId,title=title
}}
And
{{#for_external_table:
{{Youtube|ID={{{videoId}}}|title={{{title}}} }}<br/>
{{{videoId}}} and {{{title}}}<br/>
}}
See our internal TESTPAGE for the live example.
YouTube is using its in-house Freebase ID system to generate auto-channels filled with official music videos of bands and singers. The Freebase ID can be found on the individual Freebase RESOURCE page after pressing the EDIT button. Alternatively one could use the Google API to retrieve the ID, but would need a YouTube-internal HC ID prior to that. The implementation for our wiki was easy: include the Freebase ID as a semantic property on artist pages within our definitions template:
{{Definitions
|wikipedia=
|dbpedia=
|freebase=
|freebaseID=
|musicbrainz=
|youtubeautochannel=
}}
Voilà: with the additional SQL-based caching of request results (e.g. the JSON responses), our API load on Google is extremely low, and pages on our wiki load faster as well. Using this method we were able to increase our saved YouTube ID tags from the original 500 to well over 1000 within half a day.
Paper1 <--[ref]--> Paper2
  |                   |
  |[author]           |[author]
  v                   v
Author1             Author2
For the benchmark we were trying to find coauthors, which is basically a friend-of-a-friend query following the author relationship (or a breadth-first search of depth 2).
As we know, there are basically 3 ways of communicating with the neo4j database: the Java core API, the Traverser framework, and the Cypher query language.
With the Java core API you work directly on the node and relationship objects within Java. Once you have fixed an author node, formulating the query looks pretty much like this:
// iterate over all papers written by the author
for (Relationship rel : author.getRelationships(RelationshipTypes.AUTHOROF)) {
    Node paper = rel.getOtherNode(author);
    // iterate over all authors of each of those papers
    for (Relationship coAuthorRel : paper.getRelationships(RelationshipTypes.AUTHOROF)) {
        Node coAuthor = coAuthorRel.getOtherNode(paper);
        if (coAuthor.getId() == author.getId()) continue; // skip the start author
        resCnt++;
    }
}
We see that the code can easily become confusing (when queries get more complicated). On the other hand, one can easily combine several similar traversals into one big query, making readability even worse but increasing performance.
The Traverser framework ships with the Java API and I really like the idea of it. I think it is really easy to understand the meaning of a query, and in my opinion it really helps the readability of the code:
// Traversal is the static helper class shipped with the neo4j kernel
for (Path p : Traversal.description().breadthFirst()
        .relationships(RelationshipTypes.AUTHOROF)
        .evaluator(Evaluators.atDepth(2))
        .uniqueness(Uniqueness.NONE)
        .traverse(author)) {
    Node coAuthor = p.endNode();
    resCnt++;
}
Especially if you have a lot of similar queries, or queries that are refinements of other queries, you can save the traversal descriptions and extend them later. What a cool technique.
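A minimal sketch of what I mean (assuming the neo4j 1.x API; since traversal descriptions are immutable, every refinement returns a new description that can be stored and reused):

// a saved base description fixing the relationship type and ordering
TraversalDescription base = Traversal.description()
        .breadthFirst()
        .relationships(RelationshipTypes.AUTHOROF)
        .uniqueness(Uniqueness.NONE);

// refinements of the saved base query
TraversalDescription papers    = base.evaluator(Evaluators.atDepth(1)); // papers of the author
TraversalDescription coAuthors = base.evaluator(Evaluators.atDepth(2)); // coauthors

for (Path p : coAuthors.traverse(author)) {
    Node coAuthor = p.endNode(); // same result as the traversal above
}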
And then there is the Cypher query language, an interface pushed a lot by neo4j. If you look at the query you can totally understand why: it is a really beautiful language that is close to SQL (looking at Stack Overflow it is actually frightening how many people try to answer friend-of-a-friend queries using MySQL) but still emphasizes the graph-like structure:
ExecutionEngine engine = new ExecutionEngine(graphDB);
String query = "START author=node(" + author.getId()
        + ") MATCH author-[:" + RelationshipTypes.AUTHOROF.name()
        + "]-()-[:" + RelationshipTypes.AUTHOROF.name()
        + "]-coAuthor RETURN coAuthor";
ExecutionResult result = engine.execute(query);
scala.collection.Iterator<Node> it = result.columnAs("coAuthor");
while (it.hasNext()) {
    Node coAuthor = it.next();
    resCnt++;
}
I was always wondering about the performance of this query language. Writing a query language is a very complex task, and the more expressive the language is, the harder it is to achieve good performance (the same holds true for SPARQL in the semantic web). And let's just point out that Cypher is quite expressive.
I was shocked by the results, so I talked with Andres Taylor from neo4j, who mainly works on Cypher. He asked me which neo4j version I used, and I said it was 1.7. He told me I should check out 1.9, since Cypher had become more performant. So I ran the benchmarks against neo4j 1.8 and neo4j 1.9; unfortunately, Cypher became slower in the newer neo4j releases. Andres commented:
“Cypher is just over a year old. Since we are very constrained on developers, we have had to be very picky about what we work on; the focus in this first phase has been to explore the language, learn about how our users use the query language, and to expand the feature set to a reasonable level.”
“I believe that Cypher is our future API. I know you can very easily outperform Cypher by handwriting queries. Like every language ever created, in the beginning you can always do better than the compiler by writing by hand, but eventually the compiler catches up.”
So far I have only been using the Java core API when working with neo4j, and I will continue to do so.
If you are in a high-speed scenario (I believe every web application is one) you should really think about switching to the neo4j Java core API for writing your queries. It might not look as nice as Cypher or the Traverser framework, but the gain in speed pays off.
Also, I personally like the amount of control that you have when doing the traversals in the core API yourself.
Additionally, I will soon post an article on why scripting languages like PHP, Python or Ruby aren't suitable for building web applications anyway. So changing to the core API makes sense for several reasons.
The complete source code of the benchmark can be found at https://github.com/renepickhardt/related-work.net/blob/master/RelatedWork/src/net/relatedwork/server/neo4jHelper/benchmarks/FriendOfAFriendQueryBenchmark.java (commit: 0d73a2e6fc41177f3249f773f7e96278c1b56610)
The detailed results can be found in this spreadsheet.
This chapter is split into several sections and is supposed to give some motivation. It already demonstrates nicely that in order to understand natural language processing you really have to bridge the gap between mathematics and computer science on the one hand and linguistics on the other. Even in the basic examples given from linguistics there was some notation that I did not understand right away. In this sense I am really looking forward to reading chapter 3, which is supposed to give a rich overview of all the concepts needed from linguistics.
I personally found the motivating section of chapter 1 too long as well. I am about to learn some concepts, and right now I don't really have the feeling that I need to understand all the philosophical discussions about grammar, syntax and semantics.
What I really loved on the other hand was the last section, “dirty hands”. In this section a small corpus (Tom Sawyer) is used to introduce some of the phenomena that one faces in natural language processing, some of which I had already discussed, without knowing it, in the article about text mining for linguists on Ulysses. In the book they are of course discussed in a more structured way, but one can easily download the source code from the above-mentioned article and play around to understand the basic concepts from the book. Among these were:
Word counts / tokens / types. The most basic operation one can perform on a text is counting words. This is something that I already did in the Ulysses article. Counting words is interesting since in today's world it can be automated. What I didn't see in my last blog post was that counting words already leads to more insights than just a distribution of words, which I will discuss now:
Distinctive words can be spotted. Once I have a corpus consisting of many different texts I can count all words and create a ranking of the most frequent words. I will realize that for any given text the ranking looks quite similar to that global ranking. But once in a while I might spot words in a single text's top 100 most frequent words that would not appear (let's say) in the top 200 of the global ranking. Those words seem to be distinctive words that are of particular interest for the current text. In the example of Tom Sawyer, “Tom” is such a distinctive word.
Hapax legomena. If one looks at all the words in a given text one will realize that most words occur less than 3 times; a word that occurs only once is called a hapax legomenon. This phenomenon demonstrates the difficulty of natural language processing: the data one analyses is very sparse. The frequent words are most of the time grammatical structures, whereas the infrequent words carry the semantics. From this phenomenon one gets very quickly to:
Zipf’s law. Roughly speaking, Zipf’s law says that when you count the word frequencies of a text and order the words from the most frequent to the least frequent, you get a table in which the rank of a word multiplied by its frequency is always roughly the same number (saying that the frequency is anti-proportional to the rank, f ≈ C/r). This is of course only an approximation: just imagine the most frequent word occurs an odd number of times; then there is no integer frequency for the second-ranked word that, multiplied by 2, yields exactly the frequency of the most frequent word.
Anyway, Zipf’s law was a very important discovery and has been generalized by Mandelbrot’s law (which I so far only knew from chaos theory and fractals). Maybe sometime in the near future I will find some time to calculate the word frequencies of my blog and see if Zipf’s law holds (:
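Such a check is quick to sketch in Java (a minimal sketch; the file name corpus.txt and the crude tokenization are assumptions):

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// count word frequencies and print rank * frequency, which Zipf's law
// predicts to be roughly constant across ranks
public class ZipfCheck {
    public static void main(String[] args) throws Exception {
        String text = new String(Files.readAllBytes(Paths.get("corpus.txt"))).toLowerCase();
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.split("[^a-z]+")) {
            if (!token.isEmpty()) counts.merge(token, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> ranking = new ArrayList<>(counts.entrySet());
        ranking.sort((a, b) -> b.getValue() - a.getValue());
        for (int rank = 1; rank <= Math.min(20, ranking.size()); rank++) {
            Map.Entry<String, Integer> e = ranking.get(rank - 1);
            System.out.printf("%2d %-15s f=%6d rank*f=%d%n",
                    rank, e.getKey(), e.getValue(), rank * e.getValue());
        }
    }
}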
Collocations / bigrams. Another important concept is that of collocations. Many words only have a meaning together; in this sense “New York” or “United States” are more than the sum of the single words. The text pointed out that it is not sufficient to find the most frequent bigrams in order to find good collocations: those bigrams have to be filtered, either with grammatical structures or by normalization. I think calculating a Jaccard coefficient might be interesting (even though it was not mentioned in the text). Should I really try to verify Zipf’s law in the near future, I will also try to verify my method for calculating collocations. I hope that I would find collocations in my blog like social network, graph data base or news stream…
KWIC. What I did not have in mind so far is keyword-in-context (KWIC) analysis. Here you look at all text snippets that occur in a certain window around a keyword. This seems more like work from linguistics, but I think automating this task would also be useful in natural language processing. So far it never came to my mind that a computer system could also describe words by their context. Actually pretty funny, since this is the most natural operation we do when learning a new language.
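Automating KWIC is straightforward; a minimal sketch (the window size of 4 tokens and the whitespace tokenization are assumptions):

import java.util.ArrayList;
import java.util.List;

// for every occurrence of the keyword, collect the surrounding window
// of tokens as one context snippet
public class Kwic {
    public static List<String> contexts(String text, String keyword, int window) {
        String[] tokens = text.toLowerCase().split("\\s+");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (!tokens[i].equals(keyword)) continue;
            StringBuilder sb = new StringBuilder();
            for (int j = Math.max(0, i - window); j <= Math.min(tokens.length - 1, i + window); j++) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(tokens[j]);
            }
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Tom said that Tom and Huck would meet at the river";
        for (String c : contexts(text, "tom", 4)) {
            System.out.println(c);
        }
    }
}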
Exercises
I really liked the exercises in the book. Some of them were really straightforward. But I had the feeling they were carefully chosen to demonstrate some of the information given in the book so far.
What I was missing around these basic words is the order of the words. This was somewhat reflected in the collocation problem. Even though I am not an expert yet, I have the feeling that most methods in statistical natural language processing seem to “forget” the order in which the words appear in the text. I am convinced that this is an important piece of information; it already inspired me in my diploma thesis to create a method similar to explicit semantic analysis, and it is a core element in Typology!
Anyway, reading the first chapter of the book I did not really learn anything new, but it helped me to take a certain point of view. I am really excited to proceed. The next chapter will be about probability theory. I already saw that it is written in a purely mathematical style, with examples like rolling a die rather than examples from corpus linguistics, which I find sad.