Note: This is not a Nutanix best practice and should not be done before discussing the caveats with upper management.
This is a unique situation I came across in the most recent build of a Nutanix cluster. The project required the cluster to support two distinct, separate networks. Since the NTNX nodes each have 2x 10Gbps and 2x 1Gbps interfaces, I needed to create two bonds, each containing one 10Gbps and one 1Gbps interface.
Herein lies the issue and why this is an unsupported setup. The architecture of NTNX’s clusters leverages a Controller VM (CVM) that resides on each node in the cluster. These CVMs talk to one another and are essentially the brains behind the operation and what makes NTNX so great. This is why the best practice is to pair your 10Gbps links in bond0 and your 1Gbps links in bond1. In a dual-homed setup like this, losing one of your 10Gbps links means the bond fails over to the 1Gbps link, which CVM traffic will saturate in most scenarios, and performance will be degraded. As long as all parties understand the risks associated with a dual-homed setup of this nature, let’s get cracking.
Time to create the new bridge… SSH into one of your AHV hosts and run:
ovs-vsctl add-br br1
SSH into the CVM of your choice, or if you’re already SSH’d into an AHV host, just run ‘ssh nutanix@192.168.5.254’ to reach its local CVM over the internal interface.
Take a look at your interfaces first:
manage_ovs show_interfaces

name  mode   link  speed
eth0  1000   True  1000
eth1  1000   True  1000
eth2  10000  True  10000
eth3  10000  True  10000
Acropolis puts all of your interfaces into a single bridge and bond by default, so we will create a new bridge and bond and split the interfaces into eth0/eth2 and eth1/eth3. Obviously you will have to physically cable each pair to its respective network, so these interfaces might look different on your end.
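For context, if you run manage_ovs show_uplinks before making any changes, the default single-bridge layout should look something like this (illustrative output; formatting may vary slightly by AOS version):

manage_ovs show_uplinks

Bridge br0:
  Uplink ports: bond0
  Uplink ifaces: eth0 eth1 eth2 eth3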
Modify br0 and bond0 so they contain only the interfaces of network 1:
manage_ovs --bridge_name br0 --bond_name bond0 --interfaces eth0,eth2 update_uplinks
Create br1 and bond1 containing only the interfaces of network 2:
manage_ovs --bridge_name br1 --bond_name bond1 --interfaces eth1,eth3 update_uplinks
Verify everything looks good:
manage_ovs show_uplinks

Bridge br1:
  Uplink ports: bond1
  Uplink ifaces: eth1 eth3
Bridge br0:
  Uplink ports: bond0
  Uplink ifaces: eth0 eth2
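If you also want to confirm things from the AHV host itself (not part of the original steps, just a sanity check with standard Open vSwitch tooling), you can dump the bridge/port/interface layout:

ovs-vsctl show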
Now, if that all looks good: you don’t want to have to do this manually on every CVM, so use the built-in allssh command to push the change across the cluster. Keep in mind that br1 has to exist on every AHV host first; see the sketch after the commands below.
allssh manage_ovs --bridge_name br0 --bond_name bond0 --interfaces eth0,eth2 update_uplinks
allssh manage_ovs --bridge_name br1 --bond_name bond1 --interfaces eth1,eth3 update_uplinks
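One caveat on that second command: br1 has to exist on every AHV host before its uplinks can be updated. A rough way to create it everywhere from a CVM, assuming the standard 192.168.5.1 internal host address is reachable from each CVM, is:

allssh "ssh root@192.168.5.1 ovs-vsctl add-br br1"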
At this point, your new bridges and bonds are set up to allow access to two different physical networks for dual-homed NTNX love. But now you’re seeing an alert in Prism about your CVMs using an interface which is slower than 10 Gbps. In this setup that alert will trigger constantly, regardless of whether the active interface is the 10Gbps or the 1Gbps, but thankfully there is a way to set which interface is active. Let’s make sure our new bonds are using the 10Gbps links.
Since we have mismatched interface speeds in our bonds, we need to use the active-backup bonding mode. If this were a traditional cluster with matched uplinks, you would have the option of load balancing across two active links.
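For reference only, since it does not apply to our mismatched-speed bonds: a bond with two matched active uplinks could be switched to OVS’s balance-slb mode with something like:

ovs-vsctl set port bond0 bond_mode=balance-slb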
Back to our active-backup bonds. From the AHV host, check the bond state:
ovs-appctl bond/show bond0
---- bond0 ----
bond_mode: active-backup
bond may use recirculation: no, Recirc-ID : -1
bond-hash-basis: 0
updelay: 0 ms
downdelay: 0 ms
lacp_status: off
active slave mac: xxxxxxxxxxxx(eth2)

slave eth0: enabled
  may_enable: true

slave eth2: enabled
  active slave
  may_enable: true
First verify that the bond is in active-backup mode; if it is not, you can set it with:
ovs-vsctl set port bond0 bond_mode=active-backup
Once set, we need to see which interface holds the “active slave” role. In my example above, that is eth2: the active slave MAC is reported against eth2, and “active slave” also appears under the eth2 interface details. I will admit that calling the active interface the “active slave” seems a bit counter-intuitive, but alas.
In my case, the 10Gbps port is the active slave, so I am done. If you aren’t as lucky as me, or you have a 10Gbps link fail and need to fail back manually afterwards (the bond will not automatically revert to the 10Gbps interface as of Acropolis 4.6), all you have to do is:
ovs-appctl bond/set-active-slave bond0 ethX
Repeat that for the appropriate bond# and eth# on each of your hosts and that’s it. Long-winded post, but ultimately a few commands get the job done. If you have a large number of hosts, feel free to run these AHV host commands via a bash for loop to automate it a bit; a rough sketch follows.
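As a sketch only, assuming you run it from a CVM, that hostips returns your AHV host IPs, and that you can SSH to the hosts as root (you may be prompted for a password on each). Here eth2 stands in for whichever interface is your 10Gbps uplink:

# Rough sketch: force the 10Gbps interface (eth2 here) active on bond0 of every AHV host.
# Assumes hostips lists the hypervisor IPs and that root SSH access to the hosts works.
for host in $(hostips); do
  echo "Setting active slave on ${host}"
  ssh root@"${host}" "ovs-appctl bond/set-active-slave bond0 eth2"
done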