11 The Control Network
11.1 Overview
At every Emulab cluster, the majority of the allocatable nodes (primarily general-purpose servers) are connected to a persistent, shared IPv4 subnet known as the control network. Each cluster has its own control network namespace, typically a subset of the affiliated university’s larger IPv4 namespace.
The control network at each cluster is used for two primary purposes. First, since it is Internet accessible, it provides the path through which users interact with their experiment nodes, including getting data in and out of Emulab. Second, it is used by Emulab infrastructure for experiment control. This includes initial imaging and configuration of nodes as well as ongoing monitoring of resource usage at the cluster level. It is also used for infrastructure coordination and control between the individual clusters and the master Emulab portal at Utah.
The control network is distinct from the network topology connecting nodes as defined in an experiment’s profile. That topology, known as the experiment network, is realized at experiment instantiation time on a completely separate fabric of physical switches using vlans on those switches to implement an experiment’s links and lans. Each experiment network is isolated from those of other active experiments, and exists only for the lifetime of the experiment.
We do allow Internet wide experiments on occasion, but this requires permission from, and coordination with, Emulab staff since such traffic tends to trigger monitors, resulting in abuse reports from both Emulab host institutions and external sites.
The following sections elaborate on how to ensure your experiments use the correct network (including a list of common mistakes and how to empirically verify correct interface use), how to respond to a Control Network Violation message, why it is so important to avoid the control network, and finally, after all the negativity, what activity is allowed on the control network.
11.2 Ensuring your experiment nodes use the correct interfaces
From the perspective of an experiment node, the control network and any experiment networks are accessed through the same OS-provided mechanisms. This requires that the experimenter know how to distinguish the two and ensure that they use the correct interfaces.
On the surface, the distinction is simple: active experiment interfaces will have 10.x.x.x private IPv4 addresses assigned to them. The control network interface will have an Internet routable IP address in the IANA-assigned IP space of the specific Emulab cluster. All applications running on experiment nodes that transmit packets should use the 10.x.x.x addresses (or the aliases in /etc/hosts) when addressing other nodes in experiments. All applications listening for connections from other nodes in the experiment should bind to those names/addresses as well.
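For example, a quick way to check which interface is which on a node (a sketch; interface names and addresses vary by node and cluster) is:

# List IPv4 addresses: the 10.x.x.x entries are experiment interfaces,
# the Internet routable address is the control interface.
ip -4 -br addr show
# The Emulab client software also records the control interface name here:
cat /var/emulab/boot/controlif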
This sounds simple enough, but unfortunately there are a number of scenarios in which identifying the control interface is not so easy or it is not obvious that you are using it. The next section details some of the common mistakes made and how to avoid them.
11.2.1 Common causes of control network misuse
Use of a fully qualified domain name in applications. In Emulab, fully qualified domain names (FQDN) for a node are mapped by the Emulab DNS to the control network interface. This includes both the permanent “physical” node name assigned by the infrastructure (e.g., amd047.utah.cloudlab.us) and the ephemeral user-assigned virtual node name specified in the profile of an experiment instantiation (e.g., node1.myexpt.myproj.utah.cloudlab.us). Many applications, in particular Java applications, use the FQDN of a host to determine an IP address to use for “the network”, resulting in improper use of the control net. You will need to configure the application to explicitly use the “short name” of your experiment nodes from /etc/hosts or the explicit 10.x.x.x IP address.
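As a quick check (a sketch using the example names above; substitute your own), compare what the FQDN and the short name resolve to:

# The FQDN resolves to the routable control network address...
getent hosts node1.myexpt.myproj.utah.cloudlab.us
# ...while the short name should resolve via /etc/hosts to a 10.x.x.x experiment address.
getent hosts node1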
Using INADDR_ANY (0.0.0.0) for listening services. It is very common for applications that provide a network service to default to listening on every active interface via the INADDR_ANY address in a socket bind call. In Emulab this will include the control network interface resulting in a service that is exposed to the Internet at large. While this is not directly a misuse of the control network, it becomes one if one of these services is compromised, as they often are due to weak security in the default configuration or due to known bugs. At this point compromised nodes often become part of a botnet launching attacks on other sites.
Fortunately, most applications include options to limit the interfaces on which they listen either by specifying those interfaces directly or by having an option to exclude specific interfaces.
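For example, with iperf3 (assuming 10.1.1.2 is the node's experiment network address; most daemons have an equivalent bind or listen-address option), you would start the server with:

# Listen only on the experiment network address rather than INADDR_ANY:
iperf3 -s -B 10.1.1.2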
Use of the wrong interface with RDMA and kernel-bypass networking. If your experimentation involves RDMA or other uses of highly optimized kernel-bypass network stacks (e.g., DPDK), you will need to make sure that your configuration of these tools is using the correct interface. In the case of RDMA, simply specifying an experiment network 10.x.x.x IP is often insufficient for ensuring your RDMA traffic flows over the correct interface. In utilities such as ib_send_bw, for example, that IP is used only for the handshake (QP info exchange), while the physical port that traffic is actually sent out on is selected with a different argument. By default, it will select the first port on the first RDMA-capable NIC it finds, which is the control network port on a number of Emulab hardware types.
You will need to use utilities such as ibstat or ibv_devinfo to find the device that you actually want to send on, and then ensure that your application is properly configured to use that. In the case of ib_send_bw for example, you specify it with the -d command line argument.
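A minimal sketch (device names vary by hardware type; mlx5_1 below is only an example of a device attached to the experiment fabric):

# List RDMA devices and their ports to identify the one on the experiment fabric:
ibv_devinfo | grep -E 'hca_id|port:|state|link_layer'
# Then point ib_send_bw at that device explicitly on both ends:
ib_send_bw -d mlx5_1              # server
ib_send_bw -d mlx5_1 10.1.1.2     # client, handshaking over the experiment IP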
Modifying the configuration of the control network interface. Emulab infrastructure makes a number of assumptions about the control network namespace and bad things can happen if those assumptions are violated. Changing the IP address assigned to the control network interface, or removing it entirely, is obviously bad because the infrastructure and outside world can no longer talk to the node. Adding an alias to the interface can likewise cause problems. If you were to assign an additional routable IP address from a non-Emulab network IP namespace, you are likely to trigger anti-spoofing measures at our gateway or beyond as non-Emulab sourced packets get routed out of Emulab. Even assigning a non-routable IP alias may have consequences as it could result in unintentional communication between your experiment node and another experiment’s nodes that happens to use an alias in the same subnet.
As a general rule, never change settings on the control network interface and ensure that applications do not do so either.
Unintentional use of the default route. The default route on Emulab nodes is through the control network. This is necessary so that applications on nodes can communicate with arbitrary Internet hosts for SSH, DNS, and other well-behaved services.
For experiment network traffic, explicit routes via the experiment interfaces are added by the Emulab software based on IP addresses explicitly or implicitly assigned in profiles. Thus, using any IP address other than these in your generated traffic will cause that traffic to be routed using the default route via the control network. For example, replaying real-world packet traces with routable IP destinations could send those packets out the control network, potentially wreaking havoc. Even simple typos when specifying 10.x.x.x addresses (e.g., using 10.2.1.1 in an app when you meant 10.1.1.1) will result in use of the default route. Use of IP multicast without a route for that traffic is another cause. Finally, disabling automatic route configuration in your experiment profile and attempting to set up the routing yourself, or forgetting to set up routing at all, is another common source of misrouting.
The best advice here is to let Emulab handle IP assignment and route setup whenever possible and to verify that traffic is not going through the control network.
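One quick sanity check (substitute the destination addresses your application actually uses) is to ask the kernel which route it would pick for a given destination:

# A correct experiment destination should show an experiment interface:
ip route get 10.1.1.1
# Anything without an explicit route falls through to the default route
# via the control interface:
ip route get 8.8.8.8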
Use of the wrong interface when broadcasting or multicasting. Using the wrong interface for broadcast and multicast traffic can have much worse consequences than doing so for unicast traffic, so it is called out specially here.
Broadcasting packets on the control network interface of a node will affect every other active experiment node in the cluster, as they must receive and ignore those packets, adding unexpected and unwanted overhead. Likewise, every control network infrastructure switch must forward those packets on every active port. Given that even the oldest nodes in Emulab can generate in excess of one million packets per second, the impact can be considerable. Certain types of multicast traffic can also cause issues when sent via the control network interface, in particular STP and IGMP packets that will be interpreted by the infrastructure switches.
Thus you must be extremely careful when using broadcast and multicast in an experiment. You should ensure that you have bound the traffic to the correct interface or that you have specified an explicit route for IP multicast.
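If an application cannot be bound to a specific interface, one approach (a sketch; eth1 stands in for whatever your experiment interface is actually named) is to add an explicit route covering the multicast range:

# Send all IPv4 multicast out the experiment interface instead of the default route:
sudo ip route add 224.0.0.0/4 dev eth1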
Overuse of the infrastructure NFS filesystems. On each cluster, Emulab provides a private, persistent NFS filesystem to every project. At any time, all nodes allocated to any experiment in project foo can access that space over the control network via /proj/foo. When using the Emulab portal, there is also a per-user NFS filesystem mounted under /users which hosts users’ home directories. This is a very convenient mechanism for providing a shared name space to all nodes in an experiment or across experiments. However, it should only be used for sporadic access to files, not for continuous reading and writing of datasets, databases, VM images, or log files, to avoid overloading the control network (and the infrastructure NFS server). This is especially a problem for large multi-node experiments simultaneously accessing the shared filesystem. See the storage section for alternatives to use for shared storage.
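A common pattern is to stage data from NFS to node-local storage once at startup, then run against the local copy (a sketch assuming a project named foo; the paths and tarball name are illustrative):

# Copy the dataset off the shared NFS filesystem once...
cp /proj/foo/datasets/input.tar.gz /local/
# ...and do all subsequent heavy reading and writing against the local copy.
tar -xzf /local/input.tar.gz -C /local/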
11.2.2 Empirically verifying interface use
It is always a good idea to ensure that your experiment is using the correct network interfaces after initial experiment setup has been done and once the involved applications are running. To do this you should start with the smallest instance of your experiment that is reasonable, just two nodes if possible, and then scale up to your desired size once you are confident that things are behaving correctly.
“Behaving correctly” in this context means “not using the control network excessively”, so we are only interested in getting a sense of the volume of traffic on the control network interface and whether there are any services listening on the interface that need not be. Following is a short list of techniques you can use to determine that.
Use the Graphs tab in the experiment status page. Every experiment node runs a monitoring daemon which collects resource usage as provided by the node OS. This information is reported (every five minutes) to the portal where it is aggregated and the results for all experiment nodes summarized on the Graphs tab. If the Control Traffic Graph shows over 100,000 packets-per-second, you are using the wrong interface. Only the Experiment Traffic Graph should show rates that high.
However, if you are using kernel-bypass networking (e.g., RDMA), then the OS-provided metrics will be wrong since the traffic is not passing through the kernel. In that case the Graphs tab will tell you nothing and you should try the Portstats tab.
Use the Portstats tab in the experiment status page. The Portstats data come from the infrastructure switches that a node’s control and experiment interfaces are connected to. Thus, it will still see traffic that the OS does not when using kernel-bypass. Unfortunately, Portstats data only show experiment network traffic, so this is at best an indirect metric; i.e., if there is a lot of experiment network traffic, the experiment is probably configured correctly. Note that the page also only displays the delta from the previous refresh, so the first view will show all experiment traffic since the experiment was started. You will have to wait a few minutes and click the refresh icon to get a more meaningful reading. Also, not all control net switches support access to the switch counters, so you might not always get non-zero values from this tab.
Run tcpdump on the node. The Graphs tab will only give you an aggregate packet rate at five minute intervals. If you need to know more about the traffic or get an instantaneous snapshot, then you may be able to learn more using tcpdump. You need to be careful when running it on the control net interface, however, since you are likely logged into the node via ssh over the control network, and you do not want to capture your own interactions. Try:
tcpdump -c 500 -n -i `cat /var/emulab/boot/controlif` not port ssh and not arp
This will print out info about the first 500 packets it sees that are not from ssh (your login session) and are not ARP packets (we have a lot of normal ARP traffic on our control networks). If the command finishes nearly instantaneously and produces output too fast to read, then it is likely that something is sending or receiving at an abnormal rate. From the tcpdump output, you should be able to see the source and destination IP addresses and ports as well as the type of traffic. This may offer a clue as to what is happening.
Note that as with the Graphs tab, this method will not work with kernel-bypass network activity: tcpdump will not capture any of that traffic.
If you want to figure out if any of your experiment services are listening for connections on the control network, there are a couple of ways to test:
Try connecting to the service from another host. What “connecting” means is highly dependent on the service, but generally you can use telnet to attempt to connect to the service port. You should first try doing this from another Emulab node using the target node’s control network IP address. If you can connect, then the service is definitely listening on the control network interface. This may still not be a problem, as the service might be blocked by the Emulab firewall. You can try connecting to the service from your home or university machine as well. If you can connect, then the service is listening and likely exposed to the Internet at large. You should fix the configuration of the service to disable that interface or ensure that it is securely configured.
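For example (port 8080 is a made-up port for illustration; substitute your service's port and the target node's control network address or FQDN):

telnet amd047.utah.cloudlab.us 8080
# or, if netcat is available:
nc -vz amd047.utah.cloudlab.us 8080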
Run nmap on the experiment node. An nmap scan of your node will give you a good idea of all the ports (services) exposed from the node. Here it is better to scan from outside of Emulab, since a number of services that are visible from inside do not pose a problem because they are blocked at the firewall. So if you have access to the nmap utility at your home location, you can try scanning one of your experiment hosts using its FQDN. The only “open” service you should see is ssh (port 22).
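For example, from a machine outside Emulab (node name taken from the earlier example; substitute your own):

# A default scan of the common TCP ports; only 22/tcp (ssh) should show as open.
nmap node1.myexpt.myproj.utah.cloudlab.us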
Run lsof on server processes. You can use the lsof command, which should be installed on all supported Emulab OS images, to see what ports a specific service (process) has open or do the same for all processes. Run:
sudo lsof -Pn -p PID | grep LISTEN
for a single process PID or:
sudo lsof -Pn | grep LISTEN
for all processes. If it shows *:portnum or IP:portnum where IP is the control network address, then it is listening on the control network.
This can be used with the remote method to determine which process is exposing a port.
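For example, if the remote test shows an open port, lsof can also tell you which process owns it (again using the made-up port 8080):

# Show the process listening on a specific TCP port:
sudo lsof -Pn -iTCP:8080 -sTCP:LISTEN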
11.3 What to do if you get a Control Network Violation email
There are two forms of “control network violations” that we send email notifications about. One is the “very unusual amount of traffic over the shared control network” message, which is the result of our auto-detection of high traffic volumes (cluster-specific, but typically exceeding 1Gb/sec or 100,000 packets/sec) over an extended period of time (averaged over 10 minutes). The other is the “we have determined that one or more nodes in the experiment has been compromised” message, which is usually the result of Emulab staff receiving a message from the host site or an external administrator that an Emulab node is engaging in undesirable behavior.
In addition to sending the email, we may also:
freeze your Emulab account, logging you out of the portal and blocking further logins.
quarantine your experiment, which entails rebooting all nodes in the experiment into a minimal Ubuntu Linux environment running from a RAM-based filesystem, collectively known as “the MFS”. The disk is left intact, allowing the experiment to resume later as though the nodes had just rebooted.
terminate your experiment, causing all nodes in the experiment to have their disks reloaded and returning them to the free pool. The experiment cannot be resumed.
In the high traffic volume case, we generally first send the email and wait for an hour or two for a response. If we get no response, we will freeze your account and quarantine your experiment. If we still get no response, we will terminate the experiment and leave your account frozen. Upon receiving a reply and reaching a resolution, we will unfreeze your account and release the experiment from quarantine, allowing the nodes to boot from disk again.
See Using the Recovery MFS for more information on accessing a quarantined node’s disk from the MFS.
Your responsibility as a user is first to make sure you whitelist email from Emulab! We will only use email to reach out to you. It is not good to discover that your experiment has been terminated because you never saw email from us.
If you receive one of these emails from us, do not panic! In most instances, it is not a serious problem for us, just a situation that needs to be fixed. Even with compromised nodes, we understand that these things happen. If you make an honest effort to fix the problem and be diligent in future experiments, all is well. We want you to continue using Emulab for your research!
Don’t panic, but do be responsible and respond quickly to the email so that the problem can be resolved quickly. Your initial response should include an explanation of what you are attempting to do, why that may have caused the problem observed, and what you plan to do to fix it.
We do make exceptions when the “excessive” traffic is necessary, or when a “compromised” node really isn’t.
11.4 Why is it so important to avoid the control network?
This document has been hammering home the point “Using the control network is bad!” while largely citing reasons why it causes problems for the Emulab infrastructure. For example, high volumes of traffic (data or packet rate) can interfere with infrastructure services, in particular those that are UDP-based such as DHCP, TFTP, and our multicast image distribution mechanism. This traffic can also affect other experiments, interfering with their interactions with the infrastructure or introducing overhead as they receive and reject rogue traffic. This is exacerbated by the control net fabric not being as well provisioned as the experiment fabric: switches are typically 1Gb or 10Gb, with less inter-switch bandwidth and a less balanced topology. The control network’s relative transparency to applications, coupled with its accessibility from the Internet, makes it much easier to inadvertently run vulnerable services with insecure default configurations that are exposed more widely than intended. It also allows for running services that can hijack identical infrastructure services; e.g., ARP, DHCP, DNS, or NFS.
These are all reasons we as infrastructure providers care about. There are also good reasons that you as an experimenter should care. The experiment network fabric consists of higher bandwidth node NICs (10, 25, 100Gbps) coupled with better switches and higher bandwidth interconnects. This fabric provides isolation guarantees by using separate switch vlans per experiment link/lan, and optional performance guarantees by not allowing over-provisioning of logical links onto the physical links. Together these provide a more reproducible environment for experimentation. Accidental use of the control network instead of the experiment network can undermine these characteristics. You might perform a benchmark expecting it to run over an isolated 100Gb link, but misconfiguration might cause it to use a 10Gb link over the shared network. Likewise, a dynamic routing experiment over a carefully constructed multi-hop topology might be circumvented by the routing daemons discovering and using the control network, where all nodes are one hop from every other node.
11.5 When can I use the control network?
While there are many usage patterns of the control network you should avoid, there are times when it is legitimate to use it. These include:
Monitoring and control of nodes in an experiment.
Modest use of the shared NFS filesystem rooted at /proj, such as storage of gigabyte or smaller package tarballs that you install on nodes, or datasets that you reference infrequently.
Importing and exporting data at the start and end of experiments, or transferring data between an expiring experiment instance and a new one, as long as you throttle the bandwidth when possible (scp -l or rsync --bwlimit; see the example after this list).
In general, low bandwidth or one-time “bursty” use between Emulab nodes and the Internet, or between Emulab nodes in one experiment and nodes in another experiment, is okay: think human interaction or 1Gb/sec or less of TCP traffic.
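For example (host name and rates are illustrative), both scp and rsync can cap their transfer rate:

# scp -l takes a limit in Kbit/s (here roughly 100Mb/sec):
scp -l 100000 results.tar.gz me@myhost.example.org:
# rsync --bwlimit takes KBytes/sec (here roughly the same rate):
rsync --bwlimit=12500 results.tar.gz me@myhost.example.org: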
We are flexible however, so if you have a need to move a large amount of data in or out of Emulab or between experiments quickly, just let us know in advance.