HOWTO BONDING

Revision as of 15:36, 23 February 2011 by Jinzishuai

NIC Bonding

Bonding is sometimes called port trunking, and goes by other names too, but we will use the term bonding. But what is bonding? In short, it makes a number of NICs work as one, with the purpose of increasing the throughput (HT), increasing the network availability (HA), or a combination of both.

It's possible to use different brands and models of NICs. In an HA setup the NICs can have different speeds (the bond will adapt to the slowest). Even if a NIC supports jumbo frames on its own, jumbo frames may not always work well once the NIC is part of a bond.

Before you begin setting up your bond, check that all of the components used in it are working properly; broken hardware and bad cables are somewhat harder to detect when you are setting up a bond for the first time.


Example

This example includes 3 servers, each using 3 NICs for its bond (the servers could have more NICs and/or bonds). They all run a Red Hat-like Linux, which uses network-scripts to configure the network settings.

[Figure: Bonding example.png]

- Server 1: NFS server (ip: 10.0.0.1)
- Server 2: NFS client (ip: 10.0.0.2)
- Server 3: NFS client (ip: 10.0.0.3)

You have to decide whether to use mii or arp monitoring of the "ports". Mii monitoring is done locally and won't detect if something stops working remotely. Arp monitoring has the disadvantage that not all NIC drivers support the features needed for it to work.

You also need to pick the mode your bond should work in. Modes 0 - 3 should work with most switches, while mode 4 requires features you won't find in home switches, and modes 5 - 6 require that your NIC driver has ethtool support.
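Since some modes depend on ethtool support in the NIC driver, it can be worth checking this before you pick one of them. The following is only a sketch: it assumes the ethtool utility is installed, that you run it as root, and that the NICs are named eth1 to eth3 as in this example. check_nics is a made-up helper name, not part of any package.

```shell
# check_nics NIC...: report speed, duplex and link state for each NIC.
# (check_nics is a hypothetical helper name used only for this example.)
check_nics() {
    for nic in "$@"; do
        if ! command -v ethtool >/dev/null 2>&1; then
            echo "$nic: ethtool is not installed"
        elif [ ! -e "/sys/class/net/$nic" ]; then
            echo "$nic: no such interface"
        else
            echo "$nic:"
            # Print only the lines the bonding driver cares about.
            ethtool "$nic" 2>/dev/null | grep -E 'Speed|Duplex|Link detected' \
                || echo "  no speed or duplex reported (no ethtool support in the driver?)"
        fi
    done
}

check_nics eth1 eth2 eth3
```

A NIC that reports neither speed nor duplex here is probably a poor candidate for the modes that list ethtool support as a prerequisite.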


In the /etc/modprobe.conf file add the following (mii):

alias bond0 bonding
options bond0 miimon=80 mode=0


In the /etc/modprobe.conf file add the following (arp, server 1):

alias bond0 bonding
options bond0 arp_interval=80 arp_ip_target=10.0.0.2,10.0.0.3 mode=0

You must specify between 1 and 16 IP addresses, separated by commas. The more IP addresses listed in arp_ip_target, the lower the risk that the "port" will be taken down when a remote machine reboots.
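The arp example above is written for server 1 only. As a rough sketch of how the same two lines could be produced for the other servers, the helper below builds them from a list of targets and refuses lists outside the 1 to 16 range. bond_arp_options is a hypothetical name, and letting server 2 monitor servers 1 and 3 is an assumption based on the example network.

```shell
# bond_arp_options INTERVAL_MS MODE TARGET...
# Print the modprobe.conf lines for arp monitoring of bond0.
# (bond_arp_options is a hypothetical helper name used only for this example.)
bond_arp_options() {
    arp_interval=$1
    bond_mode=$2
    shift 2
    if [ "$#" -lt 1 ] || [ "$#" -gt 16 ]; then
        echo "need between 1 and 16 arp targets, got $#" >&2
        return 1
    fi
    # Join the remaining arguments with commas.
    arp_targets=$(IFS=,; echo "$*")
    printf 'alias bond0 bonding\n'
    printf 'options bond0 arp_interval=%s arp_ip_target=%s mode=%s\n' \
        "$arp_interval" "$arp_targets" "$bond_mode"
}

# Server 2 (10.0.0.2) watching the two other machines in the example:
bond_arp_options 80 0 10.0.0.1 10.0.0.3
```

The output is meant to be reviewed and then pasted into /etc/modprobe.conf by hand; the sketch deliberately does not write to the file itself.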


Create the /etc/sysconfig/network-scripts/ifcfg-bond0 (server 1):

DEVICE=bond0
IPADDR=10.0.0.1
NETMASK=255.255.255.0
NETWORK=10.0.0.0
BROADCAST=10.0.0.255
GATEWAY=
ONBOOT=yes
BOOTPROTO=none
USERCTL=no

The ifcfg-bond0 file doesn't really differ from a traditional ifcfg-eth0, and it may have a gateway specified.


Change the /etc/sysconfig/network-scripts/ifcfg-eth1 to (all servers):

DEVICE=eth1
HWADDR=c6:73:4b:1b:ba:45
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
MASTER=bond0
SLAVE=yes

Always specify the hardware address, or else you will never know which NIC is eth0, eth1 and so on, which will cause you problems if you have more than one bond or a NIC that is not part of the bond. Make a similar modification for eth2 and eth3.
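To avoid typing the hardware addresses by hand, something along these lines could print a file for each NIC. This is a sketch under assumptions: slave_ifcfg is a made-up helper name, the NICs are assumed to be called eth1 to eth3, and the addresses are read from /sys/class/net/ethX/address. Run it before the bond is up, since a NIC that is already in a bond usually shows the bond's address there instead of its own.

```shell
# slave_ifcfg NIC MAC [BOND]: print an ifcfg file for one bonded NIC.
# (slave_ifcfg is a hypothetical helper name used only for this example.)
slave_ifcfg() {
    printf 'DEVICE=%s\nHWADDR=%s\nONBOOT=yes\nBOOTPROTO=none\nUSERCTL=no\nMASTER=%s\nSLAVE=yes\n' \
        "$1" "$2" "${3:-bond0}"
}

# Print a file for each example NIC that exists on this machine.
# Review the output before copying it to
# /etc/sysconfig/network-scripts/ifcfg-ethX.
for nic in eth1 eth2 eth3; do
    addr_file=/sys/class/net/$nic/address
    if [ -r "$addr_file" ]; then
        echo "# ifcfg-$nic"
        slave_ifcfg "$nic" "$(cat "$addr_file")"
    else
        echo "# $nic not found on this machine, skipping"
    fi
done
```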

Now you can restart the network service, and you will have a new entry, bond0, when you run ifconfig. It will have the same MAC address as eth1, and eth2 and eth3 will show that address too. If you want to change the mode used, you need to unload the bonding module, change the setting and then load the module again; this can cause problems if you do it remotely.
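Besides ifconfig, the bonding driver reports its own view of the bond under /proc/net/bonding. A small sketch for checking it, where bond_status is a hypothetical helper name and bond0 is the bond from this example:

```shell
# bond_status [BOND]: show what the bonding driver reports for a bond.
# (bond_status is a hypothetical helper name used only for this example.)
bond_status() {
    bond=${1:-bond0}
    if [ -r "/proc/net/bonding/$bond" ]; then
        # Lists the bonding mode, the monitoring settings and each slave.
        cat "/proc/net/bonding/$bond"
    else
        echo "$bond is not up (no /proc/net/bonding/$bond)"
    fi
}

# After "service network restart" (as root) this should list eth1, eth2, eth3:
bond_status bond0
```

If one of the NICs is missing from the output, check its ifcfg-ethX file and its cable before suspecting the bond itself.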

If you decide to remove a NIC from the bond, either take it down manually with ifconfig, or stop the network, change the ifcfg-ethX file so that the NIC is no longer part of the bond, and then start the network again. If you instead change the file and then restart the network, the NIC will still be part of the bond.
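As an illustration of the second approach, here is a sketch of what ifcfg-eth3 could look like once eth3 is out of the bond: the MASTER and SLAVE lines are simply gone. The hardware address is a made-up placeholder, and ONBOOT=no is an assumption that the NIC should stay unused for now.

```shell
# Order matters (run as root):
#   service network stop
#   ... edit /etc/sysconfig/network-scripts/ifcfg-eth3 as below ...
#   service network start

# ifcfg-eth3, no longer part of bond0
DEVICE=eth3
# Placeholder address, use the real one of your NIC.
HWADDR=00:11:22:33:44:57
# Assumption: leave the NIC unused until it gets a new role.
ONBOOT=no
BOOTPROTO=none
USERCTL=no
```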


Modes

You can select the mode by either its number or its name in the kernel module options.

       0 or balance-rr 
               Round-robin policy: Transmit packets in sequential
               order from the first available slave through the
               last.  This mode provides load balancing and fault
               tolerance. (This is the default mode if no mode specified)
       1 or active-backup 
               Active-backup policy: Only one slave in the bond is
               active.  A different slave becomes active if, and only
               if, the active slave fails.  The bond's MAC address is
               externally visible on only one port (network adapter)
               to avoid confusing the switch.
               In bonding version 2.6.2 or later, when a failover
               occurs in active-backup mode, bonding will issue one
               or more gratuitous ARPs on the newly active slave.
               One gratuitous ARP is issued for the bonding master
               interface and each VLAN interface configured above
               it, provided that the interface has at least one IP
               address configured.  Gratuitous ARPs issued for VLAN
               interfaces are tagged with the appropriate VLAN id.
               This mode provides fault tolerance.  The primary
               option, documented in the Linux Ethernet Bonding
               Driver HOWTO, affects the behavior of this mode.
       2 or balance-xor 
               XOR policy: Transmit based on the selected transmit
               hash policy.  The default policy is a simple [(source
               MAC address XOR'd with destination MAC address) modulo
               slave count].  Alternate transmit policies may be
               selected via the xmit_hash_policy option, described
               in the Linux Ethernet Bonding Driver HOWTO.
               This mode provides load balancing and fault tolerance.
       3 or broadcast 
               Broadcast policy: transmits everything on all slave
               interfaces.  This mode provides fault tolerance.
       4 or 802.3ad 
               IEEE 802.3ad Dynamic link aggregation.  Creates
               aggregation groups that share the same speed and
               duplex settings.  Utilizes all slaves in the active
               aggregator according to the 802.3ad specification.
               Slave selection for outgoing traffic is done according
               to the transmit hash policy, which may be changed from
               the default simple XOR policy via the xmit_hash_policy
               option (documented in the Linux Ethernet Bonding
               Driver HOWTO).  Note that not all transmit
               policies may be 802.3ad compliant, particularly in
               regards to the packet mis-ordering requirements of
               section 43.2.4 of the 802.3ad standard.  Differing
               peer implementations will have varying tolerances for
               noncompliance.
               Prerequisites:
               1. Ethtool support in the base drivers for retrieving
               the speed and duplex of each slave.
               2. A switch that supports IEEE 802.3ad Dynamic link
               aggregation.
               Most switches will require some type of configuration
               to enable 802.3ad mode.
       5 or balance-tlb 
               Adaptive transmit load balancing: channel bonding that
               does not require any special switch support.  The
               outgoing traffic is distributed according to the
               current load (computed relative to the speed) on each
               slave.  Incoming traffic is received by the current
               slave.  If the receiving slave fails, another slave
               takes over the MAC address of the failed receiving
               slave.
               Prerequisite:
               Ethtool support in the base drivers for retrieving the
               speed of each slave.
       6 or balance-alb 
               Adaptive load balancing: includes balance-tlb plus
               receive load balancing (rlb) for IPV4 traffic, and
               does not require any special switch support.  The
               receive load balancing is achieved by ARP negotiation.
               The bonding driver intercepts the ARP Replies sent by
               the local system on their way out and overwrites the
               source hardware address with the unique hardware
               address of one of the slaves in the bond such that
               different peers use different hardware addresses for
               the server.
               Receive traffic from connections created by the server
               is also balanced.  When the local system sends an ARP
               Request the bonding driver copies and saves the peer's
               IP information from the ARP packet.  When the ARP
               Reply arrives from the peer, its hardware address is
               retrieved and the bonding driver initiates an ARP
               reply to this peer assigning it to one of the slaves
               in the bond.  A problematic outcome of using ARP
               negotiation for balancing is that each time that an
               ARP request is broadcast it uses the hardware address
               of the bond.  Hence, peers learn the hardware address
               of the bond and the balancing of receive traffic
               collapses to the current slave.  This is handled by
               sending updates (ARP Replies) to all the peers with
               their individually assigned hardware address such that
               the traffic is redistributed.  Receive traffic is also
               redistributed when a new slave is added to the bond
               and when an inactive slave is re-activated.  The
               receive load is distributed sequentially (round robin)
               among the group of highest speed slaves in the bond.
               When a link is reconnected or a new slave joins the
               bond the receive traffic is redistributed among all
               active slaves in the bond by initiating ARP Replies
               with the selected mac address to each of the
               clients. The updelay parameter (detailed in the Linux
               Ethernet Bonding Driver HOWTO) must be set to a value
               equal to or greater than the switch's forwarding delay
               so that the ARP Replies sent to the peers will not be
               blocked by the switch.
               Prerequisites:
               1. Ethtool support in the base drivers for retrieving
               the speed of each slave.
               2. Base driver support for setting the hardware
               address of a device while it is open.  This is
               required so that there will always be one slave in the
               team using the bond hardware address (the
               curr_active_slave) while having a unique hardware
               address for each slave in the bond.  If the
               curr_active_slave fails its hardware address is
               swapped with the new curr_active_slave that was
               chosen.
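To get a feeling for the default XOR policy used by balance-xor, the formula quoted above can be tried out by hand. This is only a sketch, assuming the hash is reduced to the last octet of each MAC address; the exact computation in the driver differs between kernel versions. xor_slave is a hypothetical helper name, and the destination addresses are made up.

```shell
# xor_slave SRC_MAC DST_MAC SLAVE_COUNT
# Print the index of the slave a frame would be sent on:
# (source MAC XOR destination MAC) modulo slave count,
# simplified here to the last octet of each address.
# (xor_slave is a hypothetical helper name used only for this example.)
xor_slave() {
    src_last=${1##*:}   # last octet of the source MAC, in hex
    dst_last=${2##*:}   # last octet of the destination MAC, in hex
    echo $(( (0x$src_last ^ 0x$dst_last) % $3 ))
}

# Bond MAC from the ifcfg-eth1 example, three slaves, made-up peers:
xor_slave c6:73:4b:1b:ba:45 00:16:3e:00:00:01 3   # prints 2
xor_slave c6:73:4b:1b:ba:45 00:16:3e:00:00:03 3   # prints 1
```

The result depends only on the two addresses, so all traffic between the same pair of machines goes out on the same slave. With few peers, as in the three-server example, the load may therefore be spread unevenly.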

Use together with KVM

This isn't something you are meant to do within your KVM guests, but on the host. There you can attach the bridge to the bond instead of the traditional eth0; this way you get an HA, HT or HA/HT setup.
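A sketch of what this could look like with the network-scripts used earlier: the IP settings move from the bond to the bridge, and the bond only points at the bridge. The bridge name br0 is an assumption, and the addresses are those of server 1 from the example.

```shell
# /etc/sysconfig/network-scripts/ifcfg-bond0
# The bond carries no IP address of its own, it is a port of the bridge.
DEVICE=bond0
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
BRIDGE=br0

# /etc/sysconfig/network-scripts/ifcfg-br0
# br0 is a hypothetical name; the guests attach to this bridge.
DEVICE=br0
TYPE=Bridge
IPADDR=10.0.0.1
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
```

The ifcfg-ethX files stay as they are, since the NICs still belong to bond0 and know nothing about the bridge.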

Problem with Bridge + Bonding

There is a known ARP problem with a bridge on a bonded interface. Ref:

Please let me know if you know a solution.

Read more

Here are some useful external links on how to set up your bond on other Linux distributions, and of course the more detailed Linux Ethernet Bonding Driver HOWTO, where you can read about different examples of how to build your network with one switch (single point of failure).

- Linux Ethernet Bonding Driver HOWTO
- Gentoo bonding HOWTO
- Ubuntu 6 Bonding


--Trizt 13:57, 16 August 2009 (EDT)