

Fixing crashed alerts service in Nutanix Prism

This week I randomly ran across this issue when I went to resolve some alerts through a STIG-applied version of IE11. Prism itself was fully functional, with the exception of any place where ‘Alerts’ would appear; there, I was met with an error instead.

In order to properly troubleshoot, I SSH’d into a CVM and ran ‘cluster status’ to verify all components were up on each CVM… when I found they were, it was log parsing time. Thankfully, Nutanix has an awesome log system in place for troubleshooting things like this. All logs are kept in /home/nutanix/data/logs and are organized in a way that makes it very easy to sift through what you need. Each major component has an INFO, WARNING, ERROR, and FATAL log to expedite the hunt.
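Those first-pass checks look roughly like this (the log file names follow the standard per-component pattern; your output will obviously differ):

```shell
# On any CVM: confirm every service reports UP on every CVM
cluster status

# All component logs live in one place
cd /home/nutanix/data/logs

# Each major component writes severity-split logs, e.g. for the alert manager:
ls alert_manager.*    # INFO, WARNING, ERROR, FATAL
```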

With this issue, I knew the problem was with the alert_manager, since that was the component not functioning properly in Prism. I changed directories, grep’d the alert_manager files, and then tail’d the alert_manager.FATAL log.
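The grep/tail step would look something like this (a sketch; the original command output was shown as a screenshot):

```shell
cd /home/nutanix/data/logs

# Find everything the alert manager has written
ls -l | grep alert_manager

# Check the FATAL log for the crash reason
tail alert_manager.FATAL
```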

The FATAL log contained only a single line:

This verified that the alert_manager had crashed, and since the last thing I had done was try to clear an alert, it made sense to manually clear the alerts over SSH and restart the service.
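A sketch of the recovery, assuming the standard ncli and genesis tooling (exact ncli flags vary between releases, so verify against `ncli alert help` on your version; `<alert-id>` is a placeholder):

```shell
# From a CVM: list outstanding alerts, then resolve them by ID
ncli alert ls
ncli alert resolve ids=<alert-id>

# Stop the crashed service on every CVM; `cluster start` then
# brings any stopped services back up cluster-wide
allssh "genesis stop alert_manager"
cluster start
```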

Close and re-open your browser of choice, log back in to Prism, and you should see that your Alerts page is now empty but displaying information properly.


Setting up a dual-homed Nutanix cluster on Acropolis 4.6

Note: This is not a Nutanix best practice and should not be done before discussing the caveats of this with upper management.

This is a unique situation I came across in the most recent build of a Nutanix cluster. The project required the Nutanix cluster to support 2 distinct and separate networks. Since the NTNX nodes each have 2x 10 Gbps and 2x 1 Gbps interfaces, I needed to create 2 bonds… each bond containing a single 10 Gbps and a single 1 Gbps link.

Herein lies the issue and why this is an unsupported setup. The architecture of NTNX’s clusters leverages a Controller VM that resides on each node in the cluster. These CVMs talk to one another and are essentially the entire brains behind the operation and what makes NTNX so great. This is why the best practice is to pair your 10 Gbps links in bond0 and pair your 1 Gbps links in bond1. If you do a dual-homed setup and happen to lose one of your 10 Gbps links, most scenarios will see you saturating that 1 Gbps link, and performance will be degraded. As long as all parties understand the risks associated with a dual-homed setup of this nature, let’s get cracking.

Time to create the new bridge… SSH into one of your AHV hosts and run:
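The bridge creation is a standard OVS command; assuming the new bridge is named br1 (the original command was shown as a screenshot):

```shell
# On each AHV host: create the second bridge
ovs-vsctl add-br br1
```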

SSH into the CVM of your choice or if you’re already SSH’d into your AHV node, just run ‘ssh nutanix@’ for internal access to the CVM.

Take a look at your interfaces first:
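On this vintage of Acropolis, interface inspection is done with manage_ovs from the CVM. A sketch:

```shell
# From the CVM: show physical NICs and their link speeds
manage_ovs show_interfaces

# Show which NICs are currently uplinked to which bridge/bond
manage_ovs show_uplinks
```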

Acropolis sets up all your interfaces in a single bridge and bond by default, so we will create a new bridge and a new bond and split the interfaces into eth0/eth2 and eth1/eth3. Obviously, you will have to physically run your cables to the appropriate separate networks, so these interfaces might look different on your end.

Modify bridge0 and bond0 to only contain the interfaces of network 1:
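Assuming the default bridge is br0 and the eth0/eth2 pairing from above, this step would look roughly like (hedged; check `manage_ovs --help` on your AOS version):

```shell
# From the CVM: leave only eth0/eth2 (network 1) on the default bridge
manage_ovs --bridge_name br0 --bond_name bond0 --interfaces eth0,eth2 update_uplinks
```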

Create bridge1 and bond1 to only contain the interfaces of network 2:
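Likewise for the second bridge, assuming it was created as br1 and takes the eth1/eth3 pair:

```shell
# From the CVM: put eth1/eth3 (network 2) on the new bridge's bond
manage_ovs --bridge_name br1 --bond_name bond1 --interfaces eth1,eth3 update_uplinks
```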

Verify everything looks good:
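Verification could be done from either side, e.g.:

```shell
# From the CVM: confirm each bond now holds the intended interface pair
manage_ovs show_uplinks

# Or, on the AHV host: dump the full OVS bridge/bond layout
ovs-vsctl show
```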

Now if that all looks good, you don’t want to have to do this manually on every CVM so use the built-in allssh command:
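With allssh, the same manage_ovs changes run on every CVM in one shot. This sketch assumes br1 has already been created on every AHV host and that the interface pairing matches the example above:

```shell
# Apply the uplink split on every node in the cluster
allssh "manage_ovs --bridge_name br0 --bond_name bond0 --interfaces eth0,eth2 update_uplinks"
allssh "manage_ovs --bridge_name br1 --bond_name bond1 --interfaces eth1,eth3 update_uplinks"
```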

At this point, your new bridge and bond are set up to allow access to 2 different physical networks for dual-homed NTNX love. But now you’re seeing an alert in Prism about your CVMs using an interface slower than 10 Gbps. In this setup, that alert will trigger regardless of whether the active interface is the 10 Gbps or the 1 Gbps link, but thankfully there is a way to set which interface is active. Let’s make sure that our new bonds are using the 10 Gbps links.

Since we have mismatched interface speeds in our bonds, we need to use the active-backup bonding mode. If this were a traditional cluster, you would have the option of load balancing between 2 active links.

From the AHV host:

First verify that you are in active-backup mode, but if you are not, you can set it by:
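Both the check and the fix are standard OVS commands; assuming the bond is named bond0 as above:

```shell
# On the AHV host: inspect the bond; the output reports bond_mode
# and flags one member as the "active slave"
ovs-appctl bond/show bond0

# If it is not already active-backup, set it
ovs-vsctl set port bond0 bond_mode=active-backup
```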

Once set, we need to see which interface holds the “active slave” parameter. In my example above, that would be eth2: the active slave MAC matches eth2’s, and the eth2 interface section also reads “active slave.” I will admit that calling the active interface the “active slave” seems a bit counter-intuitive, but alas.

In my case, the 10 Gbps port is the active slave so I am done. If you aren’t as lucky as me or perhaps you have a 10 Gbps link fail and have to manually set this back (it will not auto-repair back to 10 Gbps as of Acropolis 4.6), all you have to do is:
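Forcing the active interface is a one-liner; assuming bond0 with eth2 as the 10 Gbps member, as in my example:

```shell
# On the AHV host: force the 10 Gbps interface to be the active slave
ovs-appctl bond/set-active-slave bond0 eth2
```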

Repeat that for the appropriate bond# and eth# on each of your hosts and that’s it. Long-winded post, but ultimately just a few commands to get the job done. If you have a large number of hosts, feel free to run these AHV host commands via a bash for loop to automate it a bit.
