Scalability and Reliability Limitations of ROS

Robot Cars in a Line Formation – Part 1: Purpose and Design

Analysing ROS Distribution Capabilities (July 20, 2016)

My last post was quite a while ago. In the meantime, my supervisors and I discussed the angle from which we'd want to approach the ROS experiments, so that they best demonstrate ROS' scalability and reliability capabilities while also reflecting real-life scenarios. Building on the initial vision of a few small independent robot cars being deployed on foreign planets or in a disaster zone, we consolidated the design of a ROS-powered 9-robot system capable of maintaining a line formation as the robots navigate their environment. Here's an overview of the system concept:

Devices:

9 robot cars
1 "loose", static Raspberry Pi, in a non-mobile chassis
1 wireless network router ideally placed near the master Raspberry Pi, functioning as a connection hub for the robots to be able to discover each other and communicate with the master.

ROS nodes:

Node(s) running on each robot, used for sensor processing, actuator operation and inter-robot communication.
The ROS master, running on the static Raspberry Pi
1 node responsible for electing a team leader; the election algorithm chooses the robot at the front of the line but it also performs "leader election" in the traditional distributed network sense, since it must ensure the team is not left in disarray if the current leader fails or no longer meets leadership criteria.
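
To make the election rule a little more concrete, here is a minimal Python sketch of the selection step only: pick the robot furthest forward among those whose sensors and actuators report as working. The `RobotStatus` fields, the `line_position` measure and the robot name "venus" are assumptions made for illustration, not the actual message layout or code used in the experiments, and the ROS plumbing is left out entirely.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class RobotStatus:
    """Hypothetical summary of what a robot reports about itself."""
    name: str
    line_position: float  # assumed measure: larger means further forward in the line
    sensors_ok: bool
    actuators_ok: bool


def elect_leader(statuses: Iterable[RobotStatus]) -> Optional[str]:
    """Return the name of the healthy robot furthest forward, or None if none qualify."""
    healthy = [s for s in statuses if s.sensors_ok and s.actuators_ok]
    if not healthy:
        return None
    # Ties on position are broken by name so repeated elections agree.
    return max(healthy, key=lambda s: (s.line_position, s.name)).name


team = [
    RobotStatus("mars", 4.0, True, True),
    RobotStatus("terra", 6.5, True, False),  # furthest forward, but actuators failed
    RobotStatus("venus", 5.2, True, True),
]
leader = elect_leader(team)  # "terra" is skipped, so "venus" is chosen
```

Re-running `elect_leader` whenever a status update arrives would cover both cases described above: a new robot reaching the front of the line, and the current leader failing.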

ROS topics:

2 topics between each pair of robots, one per direction (so, 9x8 = 72 topics in total), to act as relays for movement instructions; one is for inbound messages (e.g. mars_to_terra) and one for outbound communication (e.g. terra_to_mars).
A leader information topic, to which all robots subscribe to find out who the current leader should be and how they are supposed to line up.
9 status topics, one for each robot, containing information about the functioning of its sensors/actuators; this is relevant for both the leader election algorithm (which must know that everything is working OK in order to consider a robot for leadership) and the other robots (which must know whether other robots have failed).
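
As a quick check of the topic arithmetic, the following Python sketch enumerates the topic names for a 9-robot team. Only "mars" and "terra" come from the examples above; the other robot names, the `_status` suffix and the `leader_info` topic name are placeholders assumed for illustration.

```python
from itertools import permutations

# Hypothetical robot names; only "mars" and "terra" appear in the post.
ROBOTS = ["mars", "terra", "venus", "mercury", "jupiter",
          "saturn", "uranus", "neptune", "pluto"]

LEADER_TOPIC = "leader_info"  # assumed name for the leader information topic


def relay_topics(robots):
    """One directed topic per ordered pair of robots, e.g. 'mars_to_terra'."""
    return [f"{src}_to_{dst}" for src, dst in permutations(robots, 2)]


def status_topics(robots):
    """One status topic per robot (naming scheme assumed)."""
    return [f"{name}_status" for name in robots]


relay = relay_topics(ROBOTS)    # 9 x 8 = 72 directed relay topics
status = status_topics(ROBOTS)  # 9 status topics
all_topics = relay + status + [LEADER_TOPIC]  # 72 + 9 + 1 = 82 topics overall
```

Because the relay topics cover every ordered pair, their number grows as n x (n - 1) with the team size n, which is part of what makes this layout interesting for scalability tests.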

While the setup appears simple enough, some clarifications as to why it works this way might be useful.

Why the additional hardware? Well, firstly, the robots need a reliable and far-reaching communication channel. This is where the wireless router comes in, as it can provide easy connectivity with better speed and range than, say, an inter-robot Bluetooth connection. Secondly, in the interest of energy savings, we reasoned that the robots themselves should not be burdened with any operations that are not essential to communication or task completion, such as periodic leader polling. And of course, for leader election to work effectively, it must run on a machine with a strong power supply and a low probability of failure. A static Raspberry Pi meets these criteria much better than a mobile robot carrying a small battery pack and open to damage from the environment.

There is also another reason for the decision to make the election centralised. If you're a bit familiar with ROS, you probably know that the ROS master, which coordinates node interaction, is essentially a single point of failure. If the master goes down, nodes can no longer register, look each other up or set up new connections, and since all nodes must be registered with a master before starting their processing, the system can very easily become entirely inoperable. As we cannot eliminate this centralised point of failure, it is sensible to run the master on the device with the lowest expected chance of failure, which in this case is the static Raspberry Pi.

That said, a ROS package called multimaster_fkie endeavours to remedy this weakness in the ROS design by allowing multiple masters to merge their managed nodes and to act as an active redundancy failsafe for one another. However, in the interest of keeping things simple for our first run of the experiments, we decided not to make use of it just yet.

Also of interest is the justification for the use of a fully connected topic graph for robot intercommunication. While 72 relay topics may seem excessive, having dedicated channels for each robot pair allows for more thorough scalability benchmarks. And, in contrast to our previous tendency towards centralisation, it also relieves the master Pi of some responsibilities, in addition to reducing the latency of inter-robot communication. With a well-constructed message type, the topics should be capable of passing not only instructions, but also environment information or any other types of transmissions the robots may find useful. Topics that are not in use remain dormant and should not (in theory) consume much processing power.
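
One possible shape for such a general-purpose message is sketched below in plain Python. This is an illustration only: the `RelayMessage` class, its field names and the JSON encoding are assumptions, not the message definition used in the project (in ROS the type would normally be declared in a `.msg` file).

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class RelayMessage:
    """Hypothetical envelope for anything one robot sends to another."""
    sender: str
    recipient: str
    kind: str  # assumed categories, e.g. "instruction" or "environment"
    payload: Dict[str, Any] = field(default_factory=dict)

    def topic(self) -> str:
        """Name of the dedicated relay topic this message would travel on."""
        return f"{self.sender}_to_{self.recipient}"

    def to_json(self) -> str:
        """Serialise to a JSON string with a stable key order."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "RelayMessage":
        """Rebuild a message from the string produced by to_json."""
        return cls(**json.loads(text))


msg = RelayMessage("mars", "terra", "instruction", {"move": "forward", "speed": 0.5})
decoded = RelayMessage.from_json(msg.to_json())  # round trip preserves every field
```

The `kind` field is what would let a single topic carry instructions, environment readings or anything else, with the receiver deciding how to interpret `payload`.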

This post will conclude here, but keep an eye out for the next part, in which I will go deeper into explaining the technical details of the initial version of the line formation program.
