XMPP in OMF

Introduction

Previous versions of OMF used a TCP/Multicast communication server to send experiment instructions from the Experiment Controller (EC) to the Resource Controller (RC) on the nodes. In a federated environment where the experiment nodes are spread across different testbeds in different subnets, this implementation has some limitations, e.g. for multicast traffic that has to cross network boundaries. Furthermore, the TCP communicator was implemented in C and was therefore not as portable as the rest of the code, which is written in Ruby.

In OMF version 5.2 we are introducing the Extensible Messaging and Presence Protocol (XMPP) for communication between the EC and the nodes. All entities register with the XMPP server and subscribe to a set of PubSub nodes. Messages posted to a certain PubSub node are relayed to all node subscribers by the XMPP server. The message format is standardised XML, and we are making use of XMPP standard features such as authentication, encryption and presence.

Architecture

PubSub nodes

In OMF, we can identify several receiver groups. Messages can be sent either to a specific node, addressed by an identifier such as the node's IP address or node ID, or to a group. A group can be a group of nodes as defined in the experiment script, or a "meta" group such as all nodes in a specific experiment, session or domain. Using XMPP's publish/subscribe scheme, each domain, session, experiment, node and group is represented by a PubSub node. Members of a group subscribe to the respective group nodes, all nodes in an experiment subscribe to the experiment node, and each node subscribes to its own PubSub node, named after its node ID or IP address. To send a message to a certain node or group, the sender just needs to pick the right PubSub node to publish to.

The following graph (attachment xmpp-hierarchy.png) illustrates the hierarchy of PubSub nodes in OMF:

It is important to note, however, that XMPP does not inherently support such a "tree" hierarchy of PubSub nodes. Each node stands on its own; there are no dependencies or inherent grouping among them. Messages are not cascaded from branches to leaves. We only use the tree for visualisation and node naming purposes.

A Domain is the name of a testbed site, which consists of a System and Sessions. The System subtree holds the PubSub nodes that refer directly to the testbed hardware, while the Session subtree holds the nodes for user sessions and their experiments.

Here's a typical list of OMF-created PubSub nodes during an experiment:

/Domain
/Domain/System
/Domain/System/10.0.0.1
/Domain/System/10.0.0.2
/Domain/System/10.0.0.3
/Domain/System/10.0.0.4
/Domain/Session/123
/Domain/Session/123/npc_2009_09_11_11_21_01
/Domain/Session/123/npc_2009_09_11_11_21_01/source
/Domain/Session/123/npc_2009_09_11_11_21_01/sink
/Domain/Session/123/npc_2009_09_11_11_21_01/n_1_1
/Domain/Session/123/npc_2009_09_11_11_21_01/n_1_2
/Domain/Session/123/npc_2009_09_11_11_21_01/n_1_3
/Domain/Session/123/npc_2009_09_11_11_21_01/n_1_4
/Domain/Session/123/npc_2009_09_11_11_21_01/master
/Domain/Session/123/npc_2009_09_11_11_21_01/client

Four nodes are available in the testbed (each has a PubSub node under System). The session ID is 123, and the experiment ID is npc_2009_09_11_11_21_01. All four nodes participate in the experiment (n_1_1 to n_1_4). The experiment also defines the groups source, sink, master and client.
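The naming scheme behind this list can be sketched in a few lines of Ruby. This is only an illustration of how the names are composed: the helper pubsub_node_names and its parameters are hypothetical and not part of the OMF code base.

```ruby
# Sketch: compose the PubSub node names for one experiment.
# The helper name and its keyword arguments are hypothetical.
def pubsub_node_names(domain:, system_ips:, session_id:, experiment_id:, members:)
  system  = "/#{domain}/System"
  session = "/#{domain}/Session/#{session_id}"
  exp     = "#{session}/#{experiment_id}"

  names = ["/#{domain}", system]
  # One PubSub node per testbed machine, named after its IP address.
  names.concat(system_ips.map { |ip| "#{system}/#{ip}" })
  names << session << exp
  # Experiment node names (n_1_1 ...) and group names (source ...) share
  # the same level below the experiment node.
  names.concat(members.map { |member| "#{exp}/#{member}" })
  names
end

names = pubsub_node_names(
  domain: "Domain",
  system_ips: %w[10.0.0.1 10.0.0.2 10.0.0.3 10.0.0.4],
  session_id: "123",
  experiment_id: "npc_2009_09_11_11_21_01",
  members: %w[source sink n_1_1 n_1_2 n_1_3 n_1_4 master client]
)
puts names
```

With the values from the example above, the sketch prints the same 16 names as the list, in the same order.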

Some of the PubSub nodes are static in a testbed, such as Domain, Domain/System and all its subnodes, and Domain/Session. These nodes are typically created by the Aggregate Manager (AM) or manually by the testbed administrator before any experiments are run. All other nodes relate to a running session or experiment, and are therefore created by the EC at experiment runtime. The EC also deletes those nodes after the experiment has finished. The RC does not create any nodes itself; it only subscribes to them.

In XMPP, the creator of a node is automatically a subscriber. Since all nodes below /Domain/Session/SessionID/ are created by the EC, it is subscribed to all of them. The EC also subscribes to the nodes Domain, Domain/System, Domain/Session and /Domain/Session/SessionID/ to receive testbed- or session-related messages. It furthermore subscribes to the nodes below Domain/System that are part of the current experiment.

At startup, the RC subscribes to Domain, Domain/System, Domain/Session and Domain/System/<control IP address>. The basic idea here was that a subscriber to a PubSub node should also receive the messages sent to the parents of the node it subscribed to. So when a node subscribes to 'x/y/z', it should also subscribe to 'x/y' and 'x'. We are assuming that a message sent to 'x' is relevant to all children of 'x'. At the moment there are no messages sent to Domain, Domain/System and Domain/Session.
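Because XMPP does not cascade messages down the tree, the "subscribe to all parents" rule has to be applied by the client. A minimal sketch of that rule follows; the function names with_parents and rc_startup_subscriptions are made up for illustration and do not appear in OMF.

```ruby
# Sketch: expand a PubSub node name into itself plus all of its parents.
def with_parents(node)
  parts = node.split("/").reject(&:empty?)
  (1..parts.length).map { |count| parts.first(count).join("/") }
end

# Sketch: the default subscriptions of an RC at startup, as listed above.
def rc_startup_subscriptions(domain, control_ip)
  (with_parents("#{domain}/System/#{control_ip}") + ["#{domain}/Session"]).uniq
end

p with_parents("x/y/z")
# => ["x", "x/y", "x/y/z"]
p rc_startup_subscriptions("Domain", "10.0.0.1")
# => ["Domain", "Domain/System", "Domain/System/10.0.0.1", "Domain/Session"]
```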

When an experiment is started, the RC receives the session and experiment IDs as well as its node name and a list of groups it belongs to. It then subscribes to the respective PubSub nodes.

XMPP users

Openfire creates an admin user by default upon installation. XMPP clients can register and unregister user accounts for themselves. The AM creates a user named aggmgr that is used to create the PubSub nodes Domain, Domain/System and all its subnodes and Domain/Session. This user account as well as the admin account are permanent, while all others are just temporary. The EC creates a user named after the current experiment ID for each experiment. After the experiment has finished, that user and all its PubSub nodes are deleted. The RC registers a user named after its control network IP address and deletes it when the RC is shut down.

Protocol

The XMPP communicator is a drop-in replacement for the TCP communicator, so the message exchange between EC and RC is not much different from previous versions of OMF. The main differences are the removal of sequence numbers and an overhauled YOUARE/ENROL procedure. There is no longer any direct communication between the EC and the RC; both now talk only to the XMPP server.

The following diagram (attachment xmpp-message-flow.png) shows the message exchange between the EC and XMPP server as well as between the RC and XMPP server:

When the RC starts up, it registers a user for itself and subscribes to a default set of PubSub nodes. The most important node here is Domain/System/<control IP address>, since the EC will send a message there to bootstrap this node with a YOUARE message at the beginning of a new experiment. Before the EC does that, it also creates an XMPP account for itself and subscribes to a few default nodes. During initialisation of the experiment it creates session-related PubSub nodes using the session and experiment IDs. It also creates PubSub nodes for each group and each experiment node. The first messages that are sent out from the EC are the YOUARE messages to each node. They contain experiment and session IDs as well as the node's name and any groups it may belong to. As the RC receives this message, it subscribes to the /Domain/Session/SessionID/, /Domain/Session/SessionID/ExpID, /Domain/Session/SessionID/ExpID/<nodename> and /Domain/Session/SessionID/ExpID/<groupname> nodes. After this has been done, the RC replies to the YOUARE message by sending an ENROL, confirming its name and group associations.
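The RC's reaction to a YOUARE message can be sketched as follows. The wire format of YOUARE and ENROL is not described here, so the YouAre struct, its field names and the hash returned by enrol_reply are assumptions made purely for illustration.

```ruby
# Hypothetical in-memory representation of a YOUARE message.
YouAre = Struct.new(:session_id, :experiment_id, :node_name, :groups, keyword_init: true)

# Sketch: the PubSub nodes an RC subscribes to after receiving a YOUARE.
def subscriptions_for(domain, youare)
  session = "/#{domain}/Session/#{youare.session_id}"
  exp     = "#{session}/#{youare.experiment_id}"
  [session, exp, "#{exp}/#{youare.node_name}"] +
    youare.groups.map { |group| "#{exp}/#{group}" }
end

# Sketch: the ENROL reply confirms the node name and group associations.
def enrol_reply(youare)
  { type: "ENROL", node_name: youare.node_name, groups: youare.groups }
end

youare = YouAre.new(
  session_id: "123",
  experiment_id: "npc_2009_09_11_11_21_01",
  node_name: "n_1_1",
  groups: %w[source master]
)
subscriptions = subscriptions_for("Domain", youare)
puts subscriptions
reply = enrol_reply(youare)
```

In this sketch the ENROL is only built after the subscription list has been computed, mirroring the order described above: subscribe first, then reply.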

After all nodes have enrolled in the experiment, messages are published by both sides to the /Domain/Session/SessionID/ExpID/<nodename> nodes. The EC may also publish to /Domain/Session/SessionID/ExpID/<groupname>. A /Domain/Session/SessionID/ExpID/<nodename> node only has two subscribers, the EC and the RC with the respective node name. A /Domain/Session/SessionID/ExpID/<groupname> node can have multiple RC subscribers. The XMPP server duplicates the published messages to the group nodes and relays them to each subscriber.
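The fan-out behaviour can be illustrated with a small in-memory stand-in for the server's PubSub service. TinyPubSub is a toy written for this page, not an XMPP implementation, and it involves no network traffic.

```ruby
# Toy stand-in for the PubSub service of the XMPP server.
class TinyPubSub
  def initialize
    @subscribers = Hash.new { |hash, node| hash[node] = [] }
  end

  def subscribe(node, &handler)
    @subscribers[node] << handler
  end

  # Delivers one copy of the message to every subscriber of the node
  # and returns the number of copies delivered.
  def publish(node, message)
    handlers = @subscribers.fetch(node, [])
    handlers.each { |handler| handler.call(message) }
    handlers.length
  end
end

EXP = "/Domain/Session/123/npc_2009_09_11_11_21_01"
server = TinyPubSub.new
inbox = Hash.new { |hash, who| hash[who] = [] }

# Per-node PubSub node: exactly two subscribers, the EC and one RC.
server.subscribe("#{EXP}/n_1_1") { |msg| inbox["EC"] << msg }
server.subscribe("#{EXP}/n_1_1") { |msg| inbox["RC n_1_1"] << msg }

# Group PubSub node: the EC plus every RC that belongs to the group.
server.subscribe("#{EXP}/source") { |msg| inbox["EC"] << msg }
server.subscribe("#{EXP}/source") { |msg| inbox["RC n_1_1"] << msg }
server.subscribe("#{EXP}/source") { |msg| inbox["RC n_1_2"] << msg }

node_copies  = server.publish("#{EXP}/n_1_1", "command for n_1_1")
group_copies = server.publish("#{EXP}/source", "command for group source")
puts "node copies: #{node_copies}, group copies: #{group_copies}"
```

Note that the EC, being a subscriber itself, also receives a copy of everything it publishes.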

In this example, the RC (the node) is started before the experiment is run. It enrols immediately after the YOUARE message has been sent. In another scenario, the EC might have to wait until all RCs sign in. In that situation, we make use of a specific XMPP feature: the last message sent to a PubSub node is cached, and every new subscriber receives this message immediately after subscribing. Therefore the EC can send out the YOUARE messages even if only some or none of the nodes are available yet. As soon as a node comes up, its RC receives the cached YOUARE and enrols in the experiment. If the user aborts the experiment before the node has enrolled, the session-related PubSub nodes will already have been removed. When the node then comes up and receives the cached, outdated YOUARE message, it checks whether the session nodes exist. If they do not, the RC concludes that this YOUARE must be from a previous experiment and ignores it.
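The late-join case and the stale YOUARE check can be sketched with another toy broker that caches the last message per node. CachingPubSub, react_to_youare and the hash layout of the YOUARE message are all hypothetical; the real RC talks to an XMPP server instead.

```ruby
# Toy broker that remembers the last message published to each node and
# replays it to every new subscriber.
class CachingPubSub
  def initialize
    @last_item = {} # node name => last message (nil if none yet)
    @handlers  = Hash.new { |hash, node| hash[node] = [] }
  end

  def create(node)
    @last_item[node] = nil unless @last_item.key?(node)
  end

  def delete(node)
    @last_item.delete(node)
    @handlers.delete(node)
  end

  def exists?(node)
    @last_item.key?(node)
  end

  def publish(node, message)
    raise ArgumentError, "no such node: #{node}" unless exists?(node)
    @last_item[node] = message
    @handlers[node].each { |handler| handler.call(message) }
  end

  def subscribe(node, &handler)
    raise ArgumentError, "no such node: #{node}" unless exists?(node)
    @handlers[node] << handler
    cached = @last_item[node]
    handler.call(cached) unless cached.nil?
  end
end

# RC side: a YOUARE is only valid if its experiment node still exists.
def react_to_youare(server, youare)
  exp_node = "/Domain/Session/#{youare[:session_id]}/#{youare[:experiment_id]}"
  server.exists?(exp_node) ? :enrol : :ignore
end

def run_scenario(abort_before_rc_starts:)
  server      = CachingPubSub.new
  system_node = "/Domain/System/10.0.0.1"
  exp_node    = "/Domain/Session/123/npc_2009_09_11_11_21_01"
  server.create(system_node)
  server.create(exp_node)

  # The EC publishes the YOUARE while the RC is still offline.
  youare = { session_id: "123", experiment_id: "npc_2009_09_11_11_21_01",
             node_name: "n_1_1", groups: ["source"] }
  server.publish(system_node, youare)
  server.delete(exp_node) if abort_before_rc_starts

  # The RC comes up late and receives the cached YOUARE on subscription.
  decision = nil
  server.subscribe(system_node) { |msg| decision = react_to_youare(server, msg) }
  decision
end

puts run_scenario(abort_before_rc_starts: false) # prints "enrol"
puts run_scenario(abort_before_rc_starts: true)  # prints "ignore"
```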

Implementation

XMPP Server

Since XMPP is an open standard, any XMPP server implementation should work in conjunction with OMF. However, we recommend the use of Openfire (http://www.igniterealtime.org/projects/openfire/), which we have used extensively in our local testbeds. Openfire is a free software XMPP server implementation written in Java. It runs on Windows, Linux and Mac OS X and is configurable via a web interface. Openfire makes use of an internal web server and can use MySQL or HSQLDB as its database backend.

Openfire also supports advanced features such as server-to-server communication, SSL certificates and traffic statistics. The details on how to configure Openfire for OMF can be found in the Installation Guide.

Shared OMF XMPP code

Since XMPP is used by different components of OMF, we've made the basic XMPP code available in our shared library package, omf-common. The class OmfPubSubService provides XMPP communication functions to the EC and RC. It makes heavy use of the Ruby gem XMPP4r, which is a free XMPP implementation for Ruby.

In order to connect to an XMPP server, an OmfPubSubService object must be created, which calls its initialize function with the parameters user, password and host. Since we've set the Openfire permissions to "open", the user is registered on the server if it doesn't exist yet. Immediately after initialize has been called, a TCP connection to the XMPP server is established and the user shows up as online in Openfire's "Users/Groups" pane.

OmfPubSubService then provides various functions to create and remove PubSub nodes, subscribe to and unsubscribe from PubSub nodes, register an event callback for incoming messages and send an XMPP ping to the server to keep the connection alive. These functions are essentially thin wrappers around functions of XMPP4r.

The "ping" message, which is sent once an hour, is necessary since we've observed that Openfire disconnects XMPP clients after an idle timeout. It has been implemented in accordance with the XMPP specifications.
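The timing logic of such a keep-alive can be sketched independently of XMPP. The KeepAlive class below is hypothetical; the block passed to it stands in for whatever call actually sends the ping, and the caller is assumed to invoke tick regularly, for example from a timer thread.

```ruby
# Sketch: decide when a keep-alive ping is due.
class KeepAlive
  def initialize(interval_seconds, &ping)
    @interval  = interval_seconds
    @ping      = ping
    @last_sent = nil
  end

  # Sends a ping if none has been sent yet or the interval has elapsed.
  # Returns true if a ping was sent during this call.
  def tick(now = Time.now)
    return false if @last_sent && (now - @last_sent) < @interval
    @ping.call
    @last_sent = now
    true
  end
end

sent = 0
keep_alive = KeepAlive.new(3600) { sent += 1 } # placeholder for the real ping

start = Time.at(0)
first  = keep_alive.tick(start)        # first call always pings
second = keep_alive.tick(start + 1800) # half an hour later: not due yet
third  = keep_alive.tick(start + 3600) # one hour later: due again
puts "pings sent: #{sent}"
```

Passing the current time as an argument keeps the sketch easy to try out without waiting for an hour to pass.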

Experiment Controller

To communicate with the testbed nodes, the EC uses an instance of the XMPPCommunicator class, named communicator. XMPPCommunicator is derived from the Communicator superclass. All XMPP-related code is located in the XMPPCommunicator class, which makes the communication scheme pluggable.

The EC also registers an event callback for incoming messages. It filters out messages that it has sent itself (they are always echoed to the creator of the node, which is the EC in most cases) and hands the remaining messages down to the AgentCommands module for further processing.
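The self-echo filter amounts to a simple rejection step in the callback. How the EC actually recognises its own messages is not described here, so the sender field of the Message struct below is an assumption, as are all names in the sketch.

```ruby
# Hypothetical message representation carrying the publisher's identity.
Message = Struct.new(:sender, :body)

# Sketch: drop the messages that were published by ourselves.
def without_own_messages(messages, own_id)
  messages.reject { |message| message.sender == own_id }
end

own_id = "npc_2009_09_11_11_21_01" # the EC's user is named after the experiment ID
incoming = [
  Message.new(own_id, "YOUARE n_1_1"),   # echo of our own message
  Message.new("10.0.0.1", "ENROL n_1_1") # reply from an RC
]

to_process = without_own_messages(incoming, own_id)
to_process.each { |message| puts message.body }
```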

The contact information for the XMPP server is located in the config file nodehandler.yaml:

:communicator:
  :type: 'xmpp'
  :xmpp:
    :server: '10.0.0.200'
    :password: '123'
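The keys in this file are written in Ruby symbol notation. As a rough sketch of how such a fragment can be read with Ruby's standard YAML library (the EC's actual loading code may differ), the following parses the same settings from a string:

```ruby
require "yaml"

# The same settings as in nodehandler.yaml above, inlined for the example.
CONFIG_TEXT = <<~CONFIG
  :communicator:
    :type: 'xmpp'
    :xmpp:
      :server: '10.0.0.200'
      :password: '123'
CONFIG

# Keys such as ":server" are loaded as Ruby symbols, which safe_load
# only accepts when Symbol is explicitly permitted.
config = YAML.safe_load(CONFIG_TEXT, permitted_classes: [Symbol])
communicator = config.fetch(:communicator)
xmpp = communicator.fetch(:xmpp)

puts "type:   #{communicator.fetch(:type)}"
puts "server: #{xmpp.fetch(:server)}"
```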

Resource Controller

The implementation of XMPP in the RC is located in the AgentPubSubCommunicator class. Its structure is very similar to that of the EC implementation described in the previous section. It also registers an event handler for incoming messages and, after some filtering, hands them down to NodeAgent.instance.execCommand().

Issues

There are currently no known issues with XMPP in OMF. Experiments involving up to 40 nodes did not indicate any scalability problems. A possible bottleneck could be the XMPP server, but since our code is laid out to work with any XMPP server implementation, we could swap out the server for another one to rectify any performance issues. Another possibility would be to partition the nodes and have them connect to different XMPP servers, which themselves are interconnected using the XMPP server-to-server feature.

A potential security issue is that at the moment we need to configure Openfire to allow anyone to register XMPP accounts. Furthermore, we are not currently using SSL. In a closed testbed this might not be a problem, but we will have to rethink security when we make the XMPP server accessible from the web in order to allow people to run ECs from outside the testbed.

Future Work

In OMF 5.2 there is currently no AM gridservice to create the System nodes. This has to be done manually during testbed installation, using a script that is provided as part of the OMF release (omf_create_sysnode).

The communication between AM and EC/RC is currently using HTTP. We might want to use XMPP here in the future as well.

xmpp-hierarchy.png (132 kB) Christoph Dwertmann, 26/11/2009 01:53 pm

xmpp-message-flow.png (113.9 kB) Christoph Dwertmann, 26/11/2009 04:52 pm