Scalability and Reliability Limitations of ROS

Robot Cars in a Line Formation – Part 2: Implementation and Challenges

Analysing ROS Distribution Capabilities (August 11, 2016)

Following on from the previous post, this post goes into more depth on how the team leader election and team movement coordinator nodes running on my robot system work, and on some of the challenges we encountered along the way.

Leader election

Let's start with the team leader election algorithm and the supporting virtual battery class. The criteria for a successful leader election consist of three factors: robot "liveness" (response to pings), sensor operation and battery level. On initialisation, each robot’s main node sets up a virtual battery, which is initialised with a given level of charge as well as a charge rate and a discharge rate. This class was developed as a workaround because the Pi cannot read the voltage of the actual robot batteries without an AD/DA converter module, as its inputs are purely digital.
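
The battery class itself is not shown in this post. The following is therefore only a minimal Python sketch of how such a virtual battery might work. The charge level and the two rates come from the description above. Everything else is an assumption: the units (percent and percent per second), the `charging` flag, the `update` method and the clamping to the range 0 to 100.

```python
from dataclasses import dataclass


@dataclass
class VirtualBattery:
    """Hypothetical simulated battery; level is a percentage between 0 and 100."""
    level: float           # current charge, percent (assumed unit)
    charge_rate: float     # percent gained per second while charging (assumed unit)
    discharge_rate: float  # percent lost per second while in use (assumed unit)
    charging: bool = False  # assumption: a single flag selects charging vs. discharging

    def update(self, dt: float) -> float:
        """Advance the simulation by dt seconds and return the new level."""
        rate = self.charge_rate if self.charging else -self.discharge_rate
        # Clamp so the simulated level never leaves the 0..100 range.
        self.level = min(100.0, max(0.0, self.level + rate * dt))
        return self.level
```

For example, `VirtualBattery(level=80.0, charge_rate=0.5, discharge_rate=0.1).update(10)` drains 1 percent over ten seconds. The election node can then read `level` in place of a real voltage measurement.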

The leader election node then subscribes to each robot's status topic, thus keeping track of whether their sensors are still functioning normally. Election is triggered every N seconds and begins by pinging all robots to determine which of them are alive. All of the live robots are then checked for proper sensor functioning. If a robot reports a status different from "OK", it will be excluded from the list of leader candidates. Lastly, the leader candidates are sorted by remaining battery level and the robot with the most remaining battery is chosen as the leader, with the other robots expected to follow behind it in descending order of their battery level.
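
The three-stage election described above can be sketched as a pure Python function. The name `elect_lineup` and its parameters are hypothetical. How the ping is actually performed, and how status and battery readings are gathered from the status topics, is left to the caller.

```python
from typing import Callable, Dict, List


def elect_lineup(
    robots: List[str],
    is_alive: Callable[[str], bool],
    status: Dict[str, str],
    battery: Dict[str, float],
) -> List[str]:
    """Return hostnames in line order: leader first, then by descending battery level."""
    # 1. Liveness: keep only the robots that answer a ping.
    alive = [r for r in robots if is_alive(r)]
    # 2. Sensors: exclude any robot whose last reported status is not "OK".
    candidates = [r for r in alive if status.get(r) == "OK"]
    # 3. Battery: the most remaining charge leads; the rest follow in descending order.
    return sorted(candidates, key=lambda r: battery[r], reverse=True)
```

In a rospy node, a function like this could be called from a periodic timer such as `rospy.Timer(rospy.Duration(N), callback)`, with the result then published as the lineup. Python's `sorted` is stable, so robots with equal battery levels keep their original relative order.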

Robot operation

So how do the robots respond to these elections? After the election node decides on a team leader, it publishes the chosen line order on a dedicated topic as a Lineup message. This is in essence a single String containing the robots' hostnames in the chosen order. An obvious question arises here: why use a single String for all of the robot names when a String array would be clearer? Through trial and error, it appeared that String array messages were being lost in transit through the topic no matter what queue size and publish rate I chose. It is unclear whether this is an inherent ROS limitation, but the single String message is a workaround that will suffice for our experiments.
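
Packing the lineup into one String only requires a delimiter that cannot appear in a hostname. The sketch below assumes a comma. A comma is safe because valid hostnames contain only letters, digits, hyphens and dots. The post does not say which delimiter was actually used, and the helper names are hypothetical.

```python
from typing import List

SEPARATOR = ","  # assumed delimiter; valid hostnames never contain a comma


def encode_lineup(hostnames: List[str]) -> str:
    """Pack the ordered hostnames into the single string carried by the lineup message."""
    if any(SEPARATOR in h for h in hostnames):
        raise ValueError("hostname contains the separator")
    return SEPARATOR.join(hostnames)


def decode_lineup(data: str) -> List[str]:
    """Recover the ordered hostnames; an empty string means an empty lineup."""
    return data.split(SEPARATOR) if data else []
```

Decoding is the inverse of encoding, so `decode_lineup(encode_lineup(order)) == order` for any list of valid hostnames.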

Each robot's main node subscribes to this topic and checks every incoming lineup message to see where it stands in the line. It also subscribes to the status topics of each of the other robots (to determine whether they are down or still moving) and to the incoming direct message topics from each robot (a total of 8 incoming topics). Finally, it registers itself as a publisher on another 8 outgoing direct message topics.
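
Once a robot has decoded the lineup, working out where it stands reduces to finding its own index and its neighbours. This is an illustration only. Both `neighbours` and the `/<sender>/dm/<receiver>` topic naming scheme are hypothetical, because the post does not give the actual topic names.

```python
from typing import List, Optional, Tuple


def neighbours(lineup: List[str], me: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (robot in front, robot behind) for `me`, with None where there is none.

    A robot missing from the lineup (e.g. excluded by the election) gets (None, None).
    """
    if me not in lineup:
        return None, None
    i = lineup.index(me)
    front = lineup[i - 1] if i > 0 else None
    behind = lineup[i + 1] if i + 1 < len(lineup) else None
    return front, behind


def dm_topic(sender: str, receiver: str) -> str:
    """Hypothetical naming scheme for the direct message topic from sender to receiver."""
    return f"/{sender}/dm/{receiver}"
```

A robot is the leader exactly when it is the first entry in the lineup, which means it has no robot in front of it.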

If a robot finds out it is the leader, it executes a set of instructions (given by the user when the main node is initialised), storing each one as the CURRENT instruction. These instructions are actually command sets, composed of one or more "non-moving" commands (camera pan/tilt, wheel direction change) and exactly one "moving" command (forwards, backwards).
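
A command set that obeys the "exactly one moving command" rule can be modelled as a small validated value type. The command names below are hypothetical stand-ins for the pan/tilt and wheel-direction commands. This sketch also accepts a set with no non-moving commands, so that a bare "forwards" instruction is valid as well.

```python
from dataclasses import dataclass
from typing import Tuple

MOVING = frozenset({"forwards", "backwards"})
# Hypothetical names for the non-moving commands (camera pan/tilt, wheel direction).
NON_MOVING = frozenset(
    {"pan_left", "pan_right", "tilt_up", "tilt_down", "turn_left", "turn_right"}
)


@dataclass(frozen=True)
class CommandSet:
    """One instruction: any number of non-moving commands plus exactly one moving command."""
    commands: Tuple[str, ...]

    def __post_init__(self) -> None:
        unknown = [c for c in self.commands if c not in MOVING and c not in NON_MOVING]
        if unknown:
            raise ValueError(f"unknown commands: {unknown}")
        # Count the moving commands; exactly one is allowed per set.
        if sum(c in MOVING for c in self.commands) != 1:
            raise ValueError("a command set needs exactly one moving command")

    @property
    def move(self) -> str:
        """The single moving command in this set."""
        return next(c for c in self.commands if c in MOVING)
```

Validating in `__post_init__` means that a malformed instruction is rejected when the user supplies it, not halfway through a run.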

The leader then sends its LAST instruction to the robot behind it over the outgoing direct message topic. If the CURRENT instruction is the first instruction the leader has executed since being elected, the LAST instruction has a default value of "forwards". This design choice was made because, in a line formation, the robot behind first needs to move into the previous position of the robot in front of it before it can execute the CURRENT instruction.
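
The CURRENT/LAST bookkeeping on the leader fits in a few lines. In this sketch `InstructionRelay`, `execute` and `send_behind` are hypothetical names. In a real node, `send_behind` would publish on the outgoing direct message topic to the robot behind. `reset` would be called whenever this robot is newly elected.

```python
from typing import Callable, Optional

DEFAULT_LAST = "forwards"  # LAST value before the first instruction after an election


class InstructionRelay:
    """Tracks CURRENT/LAST instructions and hands LAST to the robot behind."""

    def __init__(self, execute: Callable[[str], None], send_behind: Callable[[str], None]):
        self.execute = execute          # hypothetical hook that drives motors/camera
        self.send_behind = send_behind  # hypothetical hook that publishes to the robot behind
        self.reset()

    def reset(self) -> None:
        """Call on election: nothing executed yet, so LAST falls back to the default."""
        self.current: Optional[str] = None
        self.last: str = DEFAULT_LAST

    def step(self, instruction: str) -> None:
        """Execute one instruction, then send the previous one (LAST) backwards."""
        self.execute(instruction)
        if self.current is not None:
            self.last = self.current
        self.current = instruction
        self.send_behind(self.last)
```

After an election, the robot behind therefore receives "forwards" first and then each of the leader's instructions one step late. That one-step lag is exactly the lag the line formation needs.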

If a robot is not the leader, it executes every incoming instruction that it determines has come from the robot in front of it, likewise marks it as the CURRENT instruction, and relays its LAST instruction to the robot behind. The program can also respond to dynamic changes in leadership: if the leader goes down, the next run of the election algorithm will choose a new leader and new instructions will be executed.
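
On the follower side, the main extra step is discarding messages that do not come from the robot directly in front. The sketch below is hypothetical throughout. In particular, it resets CURRENT and LAST to the "forwards" default whenever the robot in front changes. That reset mirrors the leader's post-election behaviour but is an assumption, not something the post specifies. Executing the instruction is left to the caller.

```python
from typing import List, Optional

DEFAULT_LAST = "forwards"


class Follower:
    """Follower side: act only on instructions from the robot directly in front."""

    def __init__(self, me: str):
        self.me = me
        self.front: Optional[str] = None
        self.current: Optional[str] = None
        self.last = DEFAULT_LAST

    def on_lineup(self, lineup: List[str]) -> None:
        """New election result: recompute who is in front; reset state if it changed (assumption)."""
        i = lineup.index(self.me) if self.me in lineup else -1
        front = lineup[i - 1] if i > 0 else None
        if front != self.front:
            self.front, self.current, self.last = front, None, DEFAULT_LAST

    def on_direct_message(self, sender: str, instruction: str) -> Optional[str]:
        """Return the instruction to relay to the robot behind, or None if ignored."""
        if self.front is None or sender != self.front:
            return None  # leader, excluded, or message not from the robot in front
        if self.current is not None:
            self.last = self.current
        self.current = instruction  # the caller executes this instruction
        return self.last
```

When the leader goes down and a new lineup arrives, `on_lineup` retargets the follower automatically. A robot promoted to leader ends up with no robot in front, so it stops accepting relayed instructions.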

Issues

What of the challenges we encountered? I have already mentioned the String array message's apparent incompatibility with topic publishing, but that problem turned out to be fairly minor compared with our biggest issue: time synchronisation. ROS topics feature fairly strict timing requirements. This is usually a non-issue for nodes running on the same machine, but once multiple hosts are introduced, it can be difficult to keep their clocks in sync. Two factors complicated this further:

Our robots are connected to a LAN isolated from the Internet (and thus, any NTP servers)
The Raspberry Pis have no real-time clock (RTC) on board, meaning they cannot keep time while they are powered off

Since the ROS master does not offer any built-in way to synchronise the clocks of connected machines, we had to investigate third-party solutions. We decided to use chrony, a popular and versatile NTP implementation designed with isolated LANs and intermittent dial-up connections in mind.

Using the master Pi as a chrony server and the robots as clients, we were able to synchronise the robots' clocks to what we initially believed was fairly high accuracy. However, it soon became apparent that great care was needed to ensure the robots synced with the server on boot. Otherwise, the timing discrepancies introduced each time a device was powered off would compromise topic communication. The robots were also found to periodically disconnect from the chrony server, resulting in some fairly significant timing differences (on the order of hundreds of seconds!).
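
One way to enforce the sync-on-boot requirement is to refuse to start a robot's ROS nodes until chrony reports that the clock is synchronised. The sketch below wraps `chronyc waitsync`, which polls chronyd until it is synchronised or a maximum number of tries is reached. The following are all assumptions: the function name, the default limits, and the use of chronyc's exit status as the success signal. The `run` parameter exists only so that the call can be replaced in tests.

```python
import subprocess
from typing import Callable


def wait_for_clock_sync(
    max_tries: int = 30,
    max_correction: float = 0.01,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Block until chronyd reports synchronisation, or give up after max_tries polls.

    Wraps `chronyc waitsync <max-tries> <max-correction>`; assumes a zero exit
    status means the remaining clock correction is below max_correction seconds.
    """
    try:
        result = run(["chronyc", "waitsync", str(max_tries), str(max_correction)])
    except FileNotFoundError:
        return False  # chronyc is not installed on this machine
    return result.returncode == 0
```

A robot's start-up script could call `wait_for_clock_sync()` before launching the main node and abort if it returns False. chrony's `makestep` directive in chrony.conf is also relevant here. It lets chronyd step the clock, rather than slowly slew it, when the offset after boot is large.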

To add to these issues, ongoing experiments suggest that even machines that appear to be in sync have a clock error large enough to cast doubt on the accuracy of the gathered experimental data, particularly data on how long messages take to transit topics.
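
This effect is easy to quantify. Suppose a one-way transit time is computed from a send timestamp on one machine and a receive timestamp on another. The result then includes the difference between the two clocks, and NTP can only estimate that difference from request/response exchanges. The following sketch shows both sides. The function names are illustrative, while the offset and delay formulas are the standard ones from the NTP specification (RFC 5905).

```python
from typing import Tuple


def measured_latency(true_latency: float, sender_offset: float, receiver_offset: float) -> float:
    """One-way latency computed as (receive stamp - send stamp) on two different clocks.

    Each *_offset is how far that machine's clock runs ahead of true time, in seconds.
    """
    return true_latency + receiver_offset - sender_offset


def ntp_offset_and_delay(t1: float, t2: float, t3: float, t4: float) -> Tuple[float, float]:
    """Standard NTP estimate from one request/response exchange.

    t1: client send, t2: server receive, t3: server send, t4: client receive.
    Returns (offset of the server clock relative to the client, round-trip delay).
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay
```

For example, a 5 ms transit time between two clocks that differ by 20 ms would be recorded as 25 ms. The NTP offset estimate assumes symmetric network paths, so its error can be as large as half the round-trip delay. Round-trip times measured entirely on one machine's clock avoid the offset altogether.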

In spite of these challenges, we have succeeded in compiling the first of our experiment results, which will be described in detail in future posts!

