Relayed Broadcasting in Off-Grid Mesh Networks Towards Efficient Decentralized Algorithms
The leader of an overland fire-fighting crew in Colorado recently remarked “It is really big being able to see where people are at….. that’s extremely helpful” in a report on using ATAK over the goTenna Pro. The ability to see one’s team members’ locations on a map in real-time, drop annotated “pins” on specific locations, or “fences” around an area is invaluable not only in fire-fighting but across a range of public safety, disaster relief and military scenarios. And because these scenarios are likely to happen in areas where cellular infrastructure is not available or has been destroyed, such collaborative mapping may need to be provided over off-grid mesh networks.
The central capability needed from a networking perspective to provide collaborative mapping over a mesh network is relayed broadcasting: given a piece of information (e.g. a message containing the geolocation), get it to every node (a user or device) in the mesh network, relayed over multiple hops. There are several challenges: the nodes may move around and change the connectivity, messages may get lost due to noise and interference, the network may get clogged due to low capacity or too many messages, and batteries may die out due to too many transmissions. In view of the last two points, we would like our solution to be efficient, that is, use as few transmissions as possible.
Further, in an off-grid setting, there is no central “base station” or equivalent, and so we need decentralized algorithms. In such algorithms, nodes start off with no information other than “I exist”. Each node receives a message, executes a set of steps, and transmits the message, and out of this emerges some global feature — in our case, the network-wide dissemination of messages. In this article, we shall walk through some solutions to the broadcasting problem, and along the way, hopefully convey the flavor and beauty of decentralized algorithms.
To make this less abstract (and fun), let us take an analogy. Imagine a scenario in medieval England where a “lookout network” covering the population of a county is built up to warn people of Viking raiders across the North Sea. The first sighter shouts “Northmen Alert” (or some equivalent “bird call” to obfuscate from the enemy!) and this needs to be relayed by the network of lookouts to reach everyone in the county. Further, each lookout may itself originate a message periodically to give an update of activity in their area. What is the algorithm each lookout should execute to get this done efficiently, i.e., use a minimum number of “shouts”? For the moment, let us not worry about message loss.
Let us begin with the obvious solution to this problem: each lookout that either originates or receives a message simply relays (or re-shouts) the message if she hasn’t already done so. This last bit is crucial, otherwise there will be a cycle of re-shout cacophony! This decentralized algorithm, commonly called Flooding, clearly works, is very easy to implement, and is pretty robust. But how scalable is it? If each lookout originates a message per minute, and there are N nodes, the total number of shouts is N2 / min. Such a quadratic dependency quickly gets out of hand as N increases. If there are 30 nodes, we have of the order of 900 messages/min, which may be tolerable for high-bandwidth wireless networks, but is prohibitively high for long-range short-burst networks where fundamental tradeoffs imply low capacity.
Although mesh networks, alternatively called Mobile Ad Hoc Networks (MANETs), have been studied for a long time, most of the work has been on unicasting, that is, sending a message from one node to another node, rather than to all other nodes. Broadcasting is a very different problem, and Flooding remains the de facto standard/solution despite its poor scalability. For example, Bluetooth has a mesh option based on flooding even for unicasting. Many unicasting protocols such as B.A.T.M.A.N, AODV and OLSR depend on Flooding for disseminating their control information, but do not natively support Broadcasting for data.
Can we do better than Flooding? Looking carefully at the operation of Flooding in our medieval scenario, one observes that many of the shouts are redundant. In fact, it suffices if lookouts A and C (or B and C) relay the message, no matter who originated it. But how does each lookout figure out, no matter what the network topology is, if they should relay or not relay? In other words, how can we compute critical relays in a decentralized manner?
Given that it doesn’t seem too hard to figure out critical relays if we know the connectivity network of the lookouts, one approach to the problem might be to first acquire the necessary network connectivity map at each node. This is the approach adopted by a protocol known as Multi-Point Relay (MPR) — developed to disseminate link-state information in OLSR, and perhaps the best known non-flooding choice. MPR figures out which subset of its 1-hop neighbors reaches the maximum number of 2-hop neighbors, breaking ties randomly. It then selects a subset of 1-hop neighbors as critical relays and instructs them to relay. The figure below illustrates the key idea.
This works, except that exchanging control packets across a part of the network to figure out the entire connectivity in a mobile mesh network may itself require periodic Flooding, which scales poorly2.
Is there a scalable, delivery-guaranteed, and decentralized solution to the Broadcasting problem that doesn’t require a priori collection of connectivity information? This was the question at hand at goTenna while designing the Pro device. With no solutions “out there”, we were forced to innovate. And what we came up with was an elegant, scalable, fully decentralized algorithm called ECHO that easily outperforms both Flooding and MPR in bandwidth-constrained networks.
How does ECHO work? Back to our medieval lookouts, who by now have time-traveled to learn and use ECHO! The lookouts make a tiny modification to Flooding — instead of just shouting “Alert”, they also include how they first knew about the alert. So, for example, if B first heard about the alert from A, lookout B shouts “A says Alert”. Each lookout also executes the following (decentralized) rule: “if I heard my name being mentioned, I will become a critical relay (that is, I will relay future shouts) else I will not”.
Let us execute the above decentralized rules on each lookout in our network (see Figure). In Step 1, A simply transmits ALERT since he is the originator. In Step 2, since B, C, D and E all first heard the Alert from A, they all transmit “A says Alert”. All lookouts except F hear each others’ shouts, but only A has heard his name being mentioned. Thus, by the rule mentioned earlier, only A marks himself critical in Step 2.
In Step 3, lookout F, who has heard “A says Alert” from C, re-broadcasts, but changes it to “C says Alert”, since he heard it from C. So now C has heard her name mentioned and marks herself critical. The wave of flooding now terminates, with A and C marking themselves critical and the others marking themselves non-critical, which is a correct solution to the problem! For all future relayed broadcasts — no matter who originates — only A and C need relay.
This works not only in the above example, but in any network. Why? Intuitively, if a lookout X does not hear his or her name in a broadcast, it means that no node depends on her as a relay, i.e., there are others who have gotten to all of her neighbors earlier than her. In that case, X does not need to be a relay. For example, B, D and E in the example network above. And vice-versa, if a lookout Y does hear his name, someone could depend on him, therefore he should continue relaying.
In the 6 node network above, the number of transmissions per origination is reduced from 6 to 2 or 3 depending on who originates. For N nodes, each generating a message per minute, the load is c*N / min — a factor of N/c lower than Flooding. For large N and small c (i.e., large dense networks), the gains are immense — as much as a factor of 10x in a 30 node highly dense network. For a maximally sparse network, such as a chain of nodes, there is no room for pruning relays and any algorithm including ECHO will use a similar number of relays.
The above example omits a number of issues to focus on the key idea for critical relay selection, but obviously a real implementation has to address a number of other issues. ECHO uses a single 2-byte additional field in the header that we call “previous sender” to do the equivalent of “X says…”. It has no other “overhead”. It addresses message loss, and accommodates the mobility of nodes by periodically and randomly doing full floods to re-compute the critical relays afresh. This has the beneficial side effect of randomizing the selection of relays to balance battery consumption. The full details of ECHO are given in this paper, including a formal (mathematical) proof that the above logic works correctly to self-anoint critical relays no matter the topology and originator.
ECHO represents a key breakthrough in relayed broadcasting: it is the only real-world protocol that efficiently selects relays without using any control packets. This is crucial to scalability and energy-efficiency in low-bit-rate battery-powered networks. Our simulations show that for a realistic network of 30 nodes in a roughly 3km x 3km area, ECHO uses 6.4x fewer transmissions than Flooding, and 5.3x fewer transmissions than MPR, while delivering a larger percentage of originated messages than either. This means significantly longer battery life, and support for higher numbers of nodes and/or messages. Not using control packets also means better security, since a whole slew of attacks that target and spoof these packets are eliminated.
ECHO is part of the goTenna Aspen GroveTM suite of protocols running on the goTenna Pro providing scalable, resilient, decentralized mesh networking for mobile mesh networks. It is a game-changing technology that has improved the world of low-cost long-range short-burst communications in disaster relief and public safety scenarios.
Not to mention helping our medieval lookouts get their Alerts across with minimal shouting!
1 For those with some familiarity with graph theory or network algorithms, this is the Minimum Connected Dominating Set problem.
2 MPR can be modified to use 2-hop neighborhood information only, but even this scales as N2 for dense networks in terms of communication complexity.