ClusterFrontends

From BroWiki

Cluster Frontends

A crucial part of a Bro Cluster installation is its frontend(s). A frontend operates on a 10Gbps packet stream and rewrites the destination MAC address of each packet to that of the intended backend. The choice of backend is based on a hash calculated over the packet's IP addresses. Once the MACs are rewritten, a switch dispatches the modified packet stream across the backends. See the RAID paper for more information about the cluster's architecture.

Goals

We are looking for options which allow us to implement front-ends that

  • support monitoring 10Gbps links.
  • are reasonably priced, relative to the rest of the cluster equipment.

Requirements

A frontend needs to be able to perform these tasks:

  1. Receiving & sending packets at 10 Gbps line-rate.
  2. Calculating a hash function based on the IP addresses found in the packet's header. The hash can be as simple as an xor, though ideally it is a cryptographic scheme such as MD5 or SHA1. The hash function doesn't need to be great, but ideally it should be keyed, which prevents an attacker from precomputing hotspots along the lines of the Crosby/Wallach hash-function attack.
  3. Mapping each hash value to one of a set of destination MAC addresses.
  4. Rewriting the destination MAC of the packet accordingly.
  5. Sending the packet out again.
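
The hashing, mapping, and rewriting steps above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code running on any of our front-ends: it assumes untagged IPv4-over-Ethernet frames, uses HMAC-SHA1 as one possible keyed hash, and sorts the two addresses so that both directions of a connection reach the same backend. The names pick_backend and rewrite_frame are made up for this example.

```python
import hashlib
import hmac
import struct
from typing import Sequence

ETH_HDR_LEN = 14          # dst MAC (6) + src MAC (6) + EtherType (2)
ETHERTYPE_IPV4 = 0x0800


def pick_backend(src_ip: bytes, dst_ip: bytes, key: bytes, n_backends: int) -> int:
    """Keyed hash over the address pair, reduced to a backend index."""
    # Assumption: sort the pair so A->B and B->A hash identically.
    lo, hi = sorted((src_ip, dst_ip))
    digest = hmac.new(key, lo + hi, hashlib.sha1).digest()
    return int.from_bytes(digest[:4], "big") % n_backends


def rewrite_frame(frame: bytes, key: bytes, backend_macs: Sequence[bytes]) -> bytes:
    """Return the frame with its destination MAC replaced by a backend's MAC.

    Frames that are too short or not IPv4 are returned unchanged.
    """
    if len(frame) < ETH_HDR_LEN + 20:
        return frame
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_IPV4:
        return frame
    # Source and destination addresses sit at offsets 12 and 16 of the IPv4 header.
    src_ip = frame[ETH_HDR_LEN + 12:ETH_HDR_LEN + 16]
    dst_ip = frame[ETH_HDR_LEN + 16:ETH_HDR_LEN + 20]
    idx = pick_backend(src_ip, dst_ip, key, len(backend_macs))
    # The destination MAC occupies the first six bytes of the frame.
    return backend_macs[idx] + frame[6:]
```

A real frontend has to do this per packet at line rate, which is exactly what makes a pure software solution hard; the sketch only pins down the logic.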

Current Installations

Currently, we are using two different front-end implementations. Unfortunately, neither of them is ideal:

  • A purely software-based approach using the Click modular router. Our implementation uses reduced-round RC5 encryption.
    • (+) Uses commodity hardware.
    • (+) Easy to set up
    • (+) Keyed function
    • (-) Limited performance (up to 2Gbps in our tests; perhaps a bit more with different hardware but certainly not able to do 10Gbps).
    • (-) Linux only (because Click currently does not support kernel mode on, e.g., FreeBSD).
  • A modified P10 appliance from Force10 Networks. This does an xor-based hash.
    • (+) Can do 10Gbps line-rate.
    • (-) Very expensive.
    • (-) Requires custom reprogramming of the P10's FPGA.

We need something in between these two options which is able to do 10Gbps line-rate at reasonable cost.
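
To illustrate why the keyed function counts as a plus and the plain xor hash as a weakness, here is a small sketch. It is not the RC5-based hash from our Click setup nor the P10's actual function; xor_hash, keyed_hash, and craft_hotspot are hypothetical helpers, addresses are plain 32-bit integers, and HMAC-SHA1 merely stands in for any keyed function.

```python
import hashlib
import hmac
from typing import List


def xor_hash(src: int, dst: int, n: int) -> int:
    """Unkeyed hash: anyone who knows n can predict the backend."""
    return (src ^ dst) % n


def keyed_hash(src: int, dst: int, n: int, key: bytes) -> int:
    """Keyed hash: the backend depends on a secret the attacker lacks."""
    lo, hi = sorted((src, dst))
    msg = lo.to_bytes(4, "big") + hi.to_bytes(4, "big")
    digest = hmac.new(key, msg, hashlib.sha1).digest()
    return int.from_bytes(digest[:4], "big") % n


def craft_hotspot(dst: int, n: int, target: int, count: int) -> List[int]:
    """Source addresses an attacker can precompute offline so that all of
    them land on backend `target` under xor_hash."""
    sources: List[int] = []
    src = 0
    while len(sources) < count:
        if xor_hash(src, dst, n) == target:
            sources.append(src)
        src += 1
    return sources
```

With the xor hash, every address returned by craft_hotspot hits the same backend; with a keyed hash and an unknown key, the same addresses should spread across backends like any other traffic.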

More Options

Here we are collecting potential options which might be worth exploring further. More ideas welcome!

For all the 10 Gigabit NICs: if the card's programmability allows a received packet to be modified and sent back out on the same port, we don't need a dual-port card, because we can use a VLAN-rewriting strategy and a switch.
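
One plausible reading of the VLAN-rewriting strategy is that the NIC sets the 802.1Q VLAN ID of each packet before sending it back, and the switch is configured so that each VLAN leads to one backend. The sketch below shows only the tag manipulation; the function name set_vlan and the idea of one VLAN per backend are assumptions made for illustration, not a description of any of the cards listed here.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def set_vlan(frame: bytes, vid: int) -> bytes:
    """Return the frame carrying VLAN ID `vid`.

    A tagged frame gets its VLAN ID overwritten (priority bits are kept);
    an untagged frame gets a new tag inserted after the source MAC.
    """
    if not 1 <= vid <= 4094:
        raise ValueError("VLAN ID must be in 1..4094")
    (tpid,) = struct.unpack_from("!H", frame, 12)
    if tpid == TPID_8021Q:
        (tci,) = struct.unpack_from("!H", frame, 14)
        tci = (tci & 0xF000) | vid  # keep PCP/DEI, replace the 12-bit VID
        return frame[:14] + struct.pack("!H", tci) + frame[16:]
    tag = struct.pack("!HH", TPID_8021Q, vid)
    return frame[:12] + tag + frame[12:]
```

For example, a frontend could call set_vlan(frame, 100 + backend_index), with the base value 100 being arbitrary, and leave the fan-out to the switch's VLAN configuration.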

  • There are some NICs available which appear to be programmable:
    • LeWiz's Talon3220
      • "Dual port, 10GigE, CX-4 Copper/Fiber Optical NIC, Dual TOE Engine, PCI-Express"
      • Programmable hardware planned; LeWIZ is potentially interested in doing the programming for our application.
    • Netxen's NXB-10GXxR Intelligent NIC
      • "The NX203x is a family of highly integrated, fully programmable Intelligent NIC devices designed to offload and accelerate processing tasks."
      • Netxen said programming their NICs for our application shouldn't be difficult. However, a few months ago it wasn't high on their priority list to do so, and they also didn't provide an SDK so that we could do it ourselves.
      • Seems to be single-port.
    • Myricom's 10GBase-CX4
      • "Myri-10G NICs include processors and firmware, and can be used in system solutions that go beyond the standard TCP/IP and UDP/IP over Ethernet. These same NICs can be used with the MX (Myrinet Express) software for kernel-bypass operation for low latency and low host-CPU load over either Ethernet or Myrinet networks."
      • Seems to be single-port.
    • MoreThanIP's 10G HiGig/HiGig2 Ethernet MAC
      • "The 10 Gigabit HiGig / HiGig+ / HiGig2 MAC Core provides a solution to interconnect standard Ethernet devices to Switch HiGig Ports and can be implemented in FPGAs or ASIC devices" / "Optional MAC address comparison on receive and overwrite on transmit for NIC applications"
      • Dual-port.
      • This is a MAC core for an FPGA or ASIC. The problem is finding a board which has a suitable interface. IP core licenses are also pretty expensive, probably $50K or so for a redistributable license (rough pricing from Xilinx for a similar core, based on memory).
    • Neterion
      • Current system is not programmable, but next-generation platform promises programmability.
    • Endace's cards should also be able to do this if programmed accordingly.
  • Open FPGA platforms:
    • Stanford's NetFPGA
      • Only 4x1 Gbps.
    • Rice's RiceNIC
      • Also only 1Gbps; but 10 Gbps design available
  • This paper describes how a generic switch with support for EtherChannels can be used to split a high-bandwidth link into several lower-bandwidth links. The paper demonstrates this for 1Gbps->100Mbps.
    • (+) Uses available components.
    • (-) Expensive with 10Gbps switches (10Gbps support adds $2000-4000 to the cost of a high-end cluster switch, and more if the cluster switch is "cheap").
    • (-) Limits the number of backends to the available output-ports on the switch.
  • Another idea leveraging a switch: using ACLs to restrict which traffic leaves each port: it may be hard to express the right ACLs for load balancing however.
  • An idea to stick with commodity hardware: using multiple taps and a set of frontend PCs, each of which only considers a share of the traffic for rewriting (i.e., a "frontend cluster"). This might raise the bar a bit but probably still won't provide the desired performance (and it is complex and rather expensive to set up).

One possibility is the following: 10 Gbps -> Ethernet switch -(span 8)-> 8 front-ends. On each front-end, an in-kernel Click module passes some of the traffic on to the local Bro; the rest is rewritten and injected back into the switch. This way, the front-ends do double duty as both load balancer and Bro host, which makes sense given the dual-processor nature of most systems, while allowing more than 8 nodes in the cluster.

10 Gbps modules for switches are expensive but not THAT expensive. E.g., for the ProCurve switches, a dual-port copper 10 GigE module is $1800. (Unfortunately, these switches only support SA/DA distribution based on Ethernet MAC, not IP; I just use them as an example because I'm more familiar with HP than Cisco. -NW) Fiber, however, is much more expensive.

On the Cisco 3750-E series switches, EtherChannel supports SA/DA IP-based distribution. These switches cost $6800 for a 24-port version with 2x 10 Gbps slots, plus the cost of the 10 GigE transceivers. So that is about a $2000 premium over a comparable HP switch.

For doing a 10 Gbps cluster today, this appears to be the best option: use a Cisco 3750-E (or 3560-E) as both the cluster switch and the first dispatch stage. Packets go from the 10 Gbps link onto an EtherChannel to 8 front-end systems in SA/DA IP-based balancing mode. These front-end systems run Click in kernel mode as well as Bro. Some of the packets a front-end hands directly to its local Bro (by using "ToHost" Click devices). The rest it rewrites and sends back into the switch (probably on a separate Ethernet card) to the remaining Bro compute nodes.
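
The per-packet decision on each front-end in this design might look roughly like the following. This is a Python sketch of the logic only, not a Click configuration. It assumes untagged IPv4 frames, assumes that each front-end is statically assigned its own list of remote compute nodes (remote_macs), and uses HMAC-SHA1 as a stand-in keyed hash; the function name dispatch is hypothetical.

```python
import hashlib
import hmac
from typing import Sequence, Tuple

# Byte ranges of the IPv4 addresses in an untagged Ethernet frame.
IP_SRC = slice(26, 30)
IP_DST = slice(30, 34)


def dispatch(frame: bytes, key: bytes, remote_macs: Sequence[bytes]) -> Tuple[str, bytes]:
    """Decide whether this front-end analyzes the packet locally or
    forwards it, rewritten, to one of its remote compute nodes.

    Slot 0 stands for the local Bro; slots 1..len(remote_macs) stand for
    the remote nodes assigned to this front-end.
    """
    # Sort the addresses so both directions of a flow pick the same slot.
    lo, hi = sorted((frame[IP_SRC], frame[IP_DST]))
    digest = hmac.new(key, lo + hi, hashlib.sha1).digest()
    slot = int.from_bytes(digest[:4], "big") % (len(remote_macs) + 1)
    if slot == 0:
        return ("local", frame)  # would be handed to the host, cf. ToHost
    # Replace the destination MAC and send the frame back to the switch.
    return ("rewrite", remote_macs[slot - 1] + frame[6:])
```

Flows stay together only if the switch's EtherChannel hashing also sends both directions of a connection to the same front-end, which would need to be verified for the chosen balancing mode.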
