
Fixing crashed alerts service in Nutanix Prism

This week I ran into this issue when I went to resolve some alerts through a STIG-applied version of IE11. Prism itself was fully functional, except that anywhere ‘Alerts’ would appear, I was met with:
[Screenshot: alertmanager_error]

To troubleshoot, I SSH’d into a CVM and ran ‘cluster status’ to verify that all components were up on each CVM. They were, so it was log parsing time. Thankfully Nutanix has an awesome log system in place for troubleshooting things like this. All logs are kept in /home/nutanix/data/logs and are organized so that it is easy to sift through to what you need. Each major component has an INFO, WARNING, ERROR, and FATAL log to expedite the hunt.
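As a rough illustration of that layout, here is a minimal shell sketch that checks which of the four severity logs exist for one component and how large each is. The function name `list_component_logs` is hypothetical (it is not a Nutanix tool), and the default directory is simply the path mentioned above.

```shell
# Hypothetical helper, not part of Nutanix: list the per-severity logs
# for one component and print each file's size in bytes.
list_component_logs() {
  local component="$1"
  # Default log directory is the one named in the post.
  local log_dir="${2:-/home/nutanix/data/logs}"
  local severity file
  for severity in INFO WARNING ERROR FATAL; do
    file="$log_dir/$component.$severity"
    if [ -e "$file" ]; then
      # wc -c reads the file on stdin and prints its byte count
      printf '%s\t%s bytes\n' "$file" "$(wc -c < "$file" | tr -d ' ')"
    else
      printf '%s\t(missing)\n' "$file"
    fi
  done
}
```

Running `list_component_logs alert_manager` on a CVM would then show at a glance whether a FATAL log has been written for that component.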

With this issue, I knew the problem was with the alert_manager, since that was the component not functioning properly in Prism. I changed directories to the log folder, grep’d for the alert_manager files, and then tail’d the alert_manager.FATAL log.
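The steps just described can be sketched roughly as follows. This is an approximation rather than the exact commands from the original session: the function name `show_fatal` is hypothetical, and the 20-line tail length is an arbitrary choice.

```shell
# Hypothetical helper: change to the log directory, list the files for
# one component, then show the end of that component's FATAL log.
show_fatal() {
  local component="$1"
  # Default log directory is the one named in the post.
  local log_dir="${2:-/home/nutanix/data/logs}"
  (
    # Run in a subshell so the caller's working directory is unchanged.
    cd "$log_dir" || exit 1
    ls | grep "$component"          # which log files exist for it
    tail -n 20 "$component.FATAL"   # last lines of the FATAL log
  )
}
```

On a CVM this would be called as `show_fatal alert_manager`.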

The FATAL log only contained 1 line:

This verified that the alert_manager had crashed. Since the last thing I did was try to clear an alert, it made sense to manually clear the alerts over SSH and restart the service.

Close and re-open your browser of choice and log back in to Prism. You should see that your Alerts are now empty, but the alert views display properly again.