Set up a dual-homed Nutanix cluster on Acropolis 4.6

Note: This is not a Nutanix best practice and should not be done without first discussing the caveats with upper management.

This is a unique situation I came across in the most recent build of a Nutanix cluster. The project required the Nutanix cluster to support 2 distinct and separate networks. Since the NTNX nodes each have 2x 10Gbps and 2x 1Gbps, I needed to create 2 bonds… each bond having a single 10Gbps and a single 1Gbps.

Herein lies the issue and why this is an unsupported setup. The architecture of NTNX's clusters leverages a Controller VM that resides on each node in the cluster. These CVMs talk to one another and are essentially the entire brains behind the operation and what makes NTNX so great. This is why the best practice is to pair your 10Gbps links in bond0 and pair your 1Gbps links in bond1. If you do a dual-homed setup and happen to lose one of your 10Gbps links, most scenarios will see you saturating that 1Gbps link, and performance will be degraded. As long as all parties understand the risks associated with a dual-homed setup of this nature, let's get cracking.

Time to create the new bridge… SSH into one of your AHV hosts and run:
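
A sketch, assuming the new bridge is named br1 (AHV's default bridge is br0):

    # On the AHV host: create the second Open vSwitch bridge
    ovs-vsctl add-br br1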

SSH into the CVM of your choice or if you’re already SSH’d into your AHV node, just run ‘ssh nutanix@192.168.5.254’ for internal access to the CVM.

Take a look at your interfaces first:
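
Assuming the stock layout, manage_ovs from the CVM shows what you're working with:

    # List the physical NICs and their link speeds
    manage_ovs show_interfaces
    # See how they are currently bonded under the default bridge
    manage_ovs show_uplinks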

Acropolis sets up all your interfaces in a single bridge and bond by default, so we will create a new bridge and a new bond and split the interfaces into eth0/eth2 and eth1/eth3. Obviously you will have to physically run your cables to the appropriate separate networks, so these interfaces might look different on your end.

Modify bridge0 and bond0 to only contain the interfaces of network 1:
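
A sketch, assuming the defaults are actually br0/bond0 on AHV and that eth0/eth2 are the pair cabled to network 1:

    # From the CVM: leave only eth0 and eth2 in bond0 on br0
    manage_ovs --bridge_name br0 --bond_name bond0 --interfaces eth0,eth2 update_uplinks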

Create bridge1 and bond1 to only contain the interfaces of network2:
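
And the counterpart for the second network, assuming br1/bond1 and the eth1/eth3 pair:

    # From the CVM: put eth1 and eth3 into bond1 on the new br1 bridge
    manage_ovs --bridge_name br1 --bond_name bond1 --interfaces eth1,eth3 update_uplinks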

Verify everything looks good:
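
show_uplinks should now reflect the split, with each bridge carrying its own two-NIC bond:

    manage_ovs show_uplinks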

Now if that all looks good, you don’t want to have to do this manually on every CVM so use the built-in allssh command:
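
Roughly like so; keep in mind the br1 bridge itself still has to exist on every AHV host first (the ovs-vsctl step above):

    # Run the same manage_ovs changes against every CVM in the cluster
    allssh 'manage_ovs --bridge_name br0 --bond_name bond0 --interfaces eth0,eth2 update_uplinks'
    allssh 'manage_ovs --bridge_name br1 --bond_name bond1 --interfaces eth1,eth3 update_uplinks'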

At this point, your new bridge and bond are set up to allow access to 2 different physical networks for dual-homed NTNX love. But now you're seeing an alert in Prism about your CVMs using an interface which is slower than 10 Gbps. This alert will trigger all the time in this setup, regardless of whether the active interface is the 10Gbps or the 1Gbps, but thankfully there is a way to set which interface is active. Let's make sure that our new bonds are using the 10 Gbps links.

Since we have mismatched interface speeds in our bonds, we need to use the active-backup bonding mode. If this was a traditional cluster, you could have the option of load balancing between 2 active links.

From the AHV host:
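
Assuming the bond names above, ovs-appctl shows the bonding mode, the member NICs, and which one is currently active:

    ovs-appctl bond/show bond0
    ovs-appctl bond/show bond1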

First verify that you are in active-backup mode; if you are not, you can set it with:
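
Something like this flips it (repeat for bond1):

    # Force the bond into active-backup mode
    ovs-vsctl set port bond0 bond_mode=active-backup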

Once set, we need to see which interface holds the "active slave" parameter. In my example above, that would be eth2: the "active slave mac" at the top of the output matches eth2, and under the eth2 interface information you can see "active slave." I will admit that calling the active interface the "active slave" seems a bit counter-intuitive, but alas.

In my case, the 10 Gbps port is the active slave, so I am done. If you aren't as lucky, or you have a 10 Gbps link fail and need to manually set this back (it will not auto-repair back to the 10 Gbps as of Acropolis 4.6), all you have to do is:
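
One command per bond, naming the interface you want active (bond0/eth2 here are just my example names):

    # Manually make eth2 the active member of bond0
    ovs-appctl bond/set-active-slave bond0 eth2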

Repeat that for the appropriate bond# and eth# on your hosts and that's it. Long-winded post, but ultimately a few commands to get the job done. If you have a large number of hosts, feel free to run these AHV host commands via a for loop in bash to automate it a bit.
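
For example, something along these lines, where the host IPs and interface names are placeholders:

    # Force eth2 back to being the active interface in bond0 on a handful of AHV hosts
    for host in 10.0.0.11 10.0.0.12 10.0.0.13; do
      ssh root@$host "ovs-appctl bond/set-active-slave bond0 eth2"
    done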

Citrix PVS 7.x : No servers available for disk

This error appeared out of the blue on a vDisk maintenance patch day, right after the scheduled auto-update was set to start. I was given notice by a colleague that the latest vDisk had failed and thus was not updated. Upon further investigation, it turned out that when the update VM boots, it pulls an IP from DHCP but is never served a vDisk, even though PVS creates the proper maintenance version.

Good news! This fix is about as simple as it gets, only needing a single new key added to the PVS registry on each PVS server.
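
For what it's worth, the key usually cited for this exact symptom is SkipBootMenu under the Stream Process; treat the snippet below as an assumption and confirm it against the Citrix article for your PVS version before pushing it out:

    rem On each PVS server (elevated prompt). Assumption: SkipBootMenu is the key in question.
    reg add "HKLM\SOFTWARE\Citrix\ProvisioningServices\StreamProcess" /v SkipBootMenu /t REG_DWORD /d 1 /f
    rem Restarting the Citrix PVS Stream Service (not the whole server) is typically enough to pick it up.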

That's it: copy/paste that into your GPOs or SCCM deployments and your PVS maintenance VMs will boot without issue. No PVS server reboot required!

Homelab of Doom – Thanks to PernixData


Some of you may have seen on Twitter that I was the lucky, insanely lucky, winner of the VMworld 2015 PernixData Force Awakens Homelab. Winning it was so shocking, I stared at the Tweetdeck 'notification' column for a good 5 minutes in disbelief but lo and behold, it was true!

I’ve always been a huge proponent of homelabs for building, breaking, and in turn, learning. Prior to winning this, I had 2 HP Proliant tower servers running over iSCSI to my Synology NAS. I loved that lab… it was with me for my VCP-Cloud, VCP-5, and VCP-5.5 certifications and even learning Xen Server and Hyper-V. But, BRING ON THE SPEED!

The PernixData lab is loaded with 3 SuperMicro 5018D-FN4T’s each with:

  • 8-core Intel Xeon D-1540 CPU
  • 64GB of DDR4 PC4-2133
  • 64GB mSATA SSD for ESXi
  • 400GB Intel P3600 NVMe PCIe SSD
  • 3TB Seagate Enterprise HDD
  • 1x 1Gbps IPMI NIC
  • 2x 1Gbps NICs
  • 2x 10Gbps NICs
Needless to say, these hosts SCREAM. EMC ScaleIO is serving up the capacity layer via the spinning disks while PernixData FVP is providing the performance tier over the Intel SSDs.

As you can see, they also supplied some nice switches to get all this connected:

  • Netgear GS728TSB 24 x 1Gbps
  • Netgear XS708E 8 x 10Gbps
Beyond the speed of this setup, it's also surprisingly quiet. I used a decibel meter app on my Nexus 6P to gauge the sound and, as you can see, it's remarkably quiet. That is 3 servers, 2 switches, and 2 NAS's all running.

    I cannot even begin to thank PernixData, Intel, EMC, and Micron for putting this all together.

Citrix Xen Server 6.x Error: VDI Not Available

The Enterprise team shot this issue up to me today, and it was one I hadn't seen before. There was a hung VDI VM in our Xen Server 6.5 farm that refused to boot. Every time it would attempt to power up, it would fail with the error "VDI Not Available." Typically this occurs if a VM hangs and is terminated improperly or a host fails. In my case, the host wouldn't let go of the writecache disk for the VDI VM. Thankfully the fix is relatively easy and doesn't result in any downtime for the rest of the environment.

  • First, SSH into the host holding onto the VM in question
  • Get the UUID of the VM and copy it to your clipboard
  • Check to see if the VM is properly shut down across the cluster/site
  • Get the VDI disk UUID and copy that to your clipboard
  • Run the reset script packaged with Xen Server (the full sequence is sketched below)

And that's all there is to it. Quick, simple, and effective. If only all solutions were this simple…
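
For reference, here's roughly what that sequence looks like at the CLI. The VM name and UUIDs are placeholders, and the reset script's exact arguments can vary between 6.x builds, so check its usage output first:

    # On the host holding onto the VM: find the VM and its UUID
    xe vm-list name-label="Hung-VDI-VM" params=uuid,name-label,power-state
    # Confirm the VM really shows as halted before touching anything
    xe vm-list uuid=<vm-uuid> params=power-state
    # Find the UUID of the VDI (the writecache disk) attached to that VM
    xe vbd-list vm-uuid=<vm-uuid> params=vdi-uuid,device,currently-attached
    # Reset the stale attach state on that single VDI
    /opt/xensource/sm/resetvdis.py single <vdi-uuid>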

vSphere Thick Client? I don't need no stinkin' thick client!

Quick post about an awesome new VMware Fling that was released somewhat recently. I'm a little late to the party, but I hadn't needed to deploy a new host since its release until today.

If you haven't heard, VMware Flings are small applications built by VMware engineers that aren't officially supported but can still prove very powerful for your environment. Recently, they addressed something that has bugged me since the day VMware announced that the web client was the future and the C# thick client was going bye-bye. Long story short, if you wanted to interface directly with an ESXi host without vCenter middle-manning, you were left with either PowerCLI, SSH, or the bloated C# client.

This new Fling is called the ESXi Embedded Host Client, a lightweight web client installed directly on your hosts that gives you a familiar vCenter web client experience. It takes about a minute to install via a one-line esxcli command.

    SSH into your host(s) and execute this command:
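
The VIB URL changes as the Fling gets updated, so treat the one below as a placeholder and grab the current link from the Fling page; the install itself is a single esxcli VIB install:

    # Install the Embedded Host Client VIB straight from a URL (or point -v at a copy on a datastore)
    esxcli software vib install -v http://<current-esxui-signed.vib-URL-from-the-Fling-page>
    # Confirm it landed
    esxcli software vib list | grep -i esx-ui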

    That will pull down the latest version of the Fling from VMware’s servers and auto-install. From that point on, you can use your favorite flavor of browser and point to the DNS/IP of your host and interface with it as you please.

If you find you love this Fling and want to deploy it across your datacenter, Brian Graf wrote a nice PowerCLI script to automate the whole ordeal, which you can find here.

Upgrading vCSA 6.0 to 6.0U1 without VAMI

Quick post on what I believe to be the fastest way to update your vCSA 6.0 installation. The VAMI that comes with the vCSA is a great little tool, but I find it to be hit or miss at times, so I wanted a more reliable and visible way to upgrade. Behold the baked-in software-packages tool via SSH!

1) Go to https://my.vmware.com/group/vmware/patch#search, search for the latest VC patches, and download the ISO
2) Upload it to a datastore visible to your vCSA
3) Attach the *.iso to the vCSA VM
4) SSH into the vCSA with your root credentials
5) Run software-packages install --iso --acceptEulas and wait for the update to finish; it should look like:

6) Reboot the vCSA via shutdown reboot -r updating and rejoice!
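
Putting steps 4 through 6 together, the SSH session boils down to this (the reboot reason string is free-form):

    # From an SSH session on the vCSA as root, with the patch ISO attached to the VM
    software-packages install --iso --acceptEulas
    # Once the install reports success, reboot the appliance
    shutdown reboot -r updating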

vCSA 6.0 CLI Install

It is the week following VMworld 2015, so that marks my annual homelab wipe. I normally do this after every VMworld due to the renewed urge to test out all the new tricks I learned over the course of the week. In doing this, I decided to ditch the new vCSA 6 web installer in favor of the CLI. I work in environments where browsers are typically locked down to the point of supreme frustration, so below you'll find a faster, in my opinion, way of deploying the vCSA using JSON.

    Note: This is simply a fully embedded vCSA install using the vPostGres database and localized SSO.

    First, mount the vCSA ISO to your system (or extract the ISO if that is your preference), fire up command prompt/terminal, and change directories to the “vcsa-cli-installer” folder. As you can see, there are tools for OS X, Windows, and Linux so you can pull this off on any system you prefer.

The tool utilized is vcsa-deploy. This application can accept direct parameters or reference a JSON file, which is what this article will concentrate on. Simply point it at your JSON template and away you go:
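
A sketch of what that looks like from the Mac folder of the installer; the JSON path is a placeholder, and the flags can differ slightly between 6.0 builds, so check the tool's help output:

    # From the vcsa-cli-installer/mac (or lin64/win32) folder of the mounted ISO
    ./vcsa-deploy --accept-eula /path/to/my-vcsa.json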

    Below you will find the JSON file I created for my install.

As you can see, it's pretty self-explanatory, so just adjust the settings to fit your environment and fire! You can also check out /vcsa-cli-installer/templates/full_conf.json for more settings if you are curious.

    Assuming your JSON is formatted properly, you should see output similar to this:

    And there you have it, a fresh vCSA 6.0 install without having to use the web installer.

Quick Script: Syslog Server Updater

Recently deployed a new syslog server and needed a script to update the ~20+ ESXi hosts as fast as possible. This is pretty cut and dry in terms of what happens… it prompts for the vCenter and syslog addresses, then updates the Syslog Server field on each host associated with that vCenter, as well as allowing UDP/514 through the ESXi firewall.
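
A minimal PowerCLI sketch of that flow, assuming the stock Syslog.global.logHost advanced setting and the built-in 'syslog' firewall exception (prompts and error handling trimmed for brevity):

    # Prompt for the vCenter and the new syslog target
    $vc     = Read-Host "vCenter address"
    $syslog = Read-Host "Syslog server (e.g. udp://10.0.0.50:514)"

    Connect-VIServer -Server $vc

    foreach ($vmhost in Get-VMHost) {
        # Point the host at the new syslog server
        Get-AdvancedSetting -Entity $vmhost -Name "Syslog.global.logHost" |
            Set-AdvancedSetting -Value $syslog -Confirm:$false
        # Open the syslog firewall exception (port 514)
        Get-VMHostFirewallException -VMHost $vmhost -Name "syslog" |
            Set-VMHostFirewallException -Enabled:$true
    }

    Disconnect-VIServer -Server $vc -Confirm:$false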

vCAC and Linux Guest Agent How-To and Gotchas

    Earlier this week, I ran into an issue in a new environment that I had just deployed. The vCloud stack was installed as vCAC 6.1 Appliance, external vCO, and vCAC IaaS VM running on Windows Server 2012R2.

    In this post, we’ll run through setting up a CentOS VM with the vCAC guest agent in order to get all the goodies that come with it like the management of new disks, new networking, as well as execution of scripts after deployment. This tutorial can be applied to other distros like Ubuntu, Debian, or SLES but for this example, I kept it in the EL6 family.

    What you’ll need:
    CentOS VM
    Linux Guest Agent Packages
    Certificate file from your vCAC IaaS Server
    DNS working properly

Since this VM will be a template, I won't tell you what you should or shouldn't put on it, but may I suggest giving 'yum update -y' a little love? After that is completed, you need to get the LGA (Linux Guest Agent) packages onto the server. The zip is located on your vCAC appliance at port 5480 under /installer, e.g. https://vCAC-server.local:5480/installer. Feel free to use SCP or wget with the --no-check-certificate flag. Lastly, explode the zip to the directory of your choice.
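
Something along these lines works; the exact zip filename is an assumption here, so copy the real link from the installer page:

    # Pull the Linux Guest Agent packages from the vCAC appliance and unpack them
    wget --no-check-certificate https://vCAC-server.local:5480/installer/LinuxGuestAgentPkgs.zip
    unzip LinuxGuestAgentPkgs.zip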

Next, you need to install the certificate of the IaaS server you deployed. Whether it was self-signed or from a CA, we need a copy of it on our soon-to-be template VM. The easiest way is to use the browser of your choice and go to your IaaS FQDN, e.g. https://IaaS-server.local/, click the lock on the far left of the address bar, view the certificate information, go to the Details tab, then Copy to File, leaving it as an encoded X.509 .CER and saving it wherever you choose. SCP this file onto your VM; we'll come back to it in a moment.

Now let's get to installing the actual agent. Change directories to where you unzipped the prior package and go into the folder matching your distro's architecture. In our case, we're going into /rhel6-amd64 and then running:
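
The RPM in that folder carries the build number in its name, so a wildcard keeps it simple (assuming the package is named gugent; adjust the glob if your bundle differs):

    # From the rhel6-amd64 folder of the unpacked Linux Guest Agent packages
    rpm -ivh gugent-*.rpm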

    This will install itself to /usr/share/gugent/ so change directories to that path. Remember the IaaS cert? Now is the time to copy it to /usr/share/gugent/axis2/ and run:
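
Depending on which encoding you exported, one of these will leave a cert.pem in the right spot (the source path is a placeholder):

    # A Base-64 encoded X.509 export is already PEM, so a copy/rename is enough
    cp /tmp/IaaS-server.cer /usr/share/gugent/axis2/cert.pem
    # If you exported the DER-encoded variant instead, convert it with openssl
    openssl x509 -inform der -in /tmp/IaaS-server.cer -out /usr/share/gugent/axis2/cert.pem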

    Note: If you open /usr/share/gugent/axis2.xml, you can change the final name and path of where the cert file will exist. By default, the cert file will be named cert.pem and reside in /usr/share/gugent/axis2/
    Now run the install script in /usr/share/gugent as such:
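
As a sketch, the installer takes the IaaS/Manager Service host and port plus whether to use SSL; the hostname below is a placeholder, and it's worth checking the script's usage output if your build expects different arguments:

    cd /usr/share/gugent
    ./installgugent.sh IaaS-server.local:443 ssl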

    To verify everything is working properly, run ./rungugent.sh and ensure all you see are [Debug] and not [Error] messages.

If you do see errors, they're most likely cert-related; grep through /usr/share/gugent/axis2/logs/gugent-axis.log to verify. If you see:

[info] [ssl client] Client certificate chain file not specified
    [error] ssl/ssl_utils.c(153) Error occured in SSL engine
    [error] ssl/ssl_stream.c(108) Error occured in SSL engine

    Ensure you have placed the cert in the correct directory and/or modified axis2.xml to reflect wherever the finalized cert.pem exists. You will know you’re good to go once you see:

    [Thu Mar 19 15:58:37 2015] [debug] ssl/ssl_utils.c(190) [ssl client] SSL certificate verified against peer

    Now finish setting up your template to your liking with a kickstart script and you’re done!

Automate password changes with Ansible

Everyone should be changing root passwords from time to time on their infrastructure. It's something we all put off as long as possible for various reasons, whether they be the hatred of learning a new password or just sheer laziness. Needless to say, it is a necessity of being an admin of ANY system, home or otherwise.

    One way to get more people on the bandwagon of security and password changes is to make them as seamless as possible. Once again, I turn to Ansible to touch all my boxes for me so I can continue listening to my hero Henry Rollins wax poetic with Pete Holmes on his podcast.

    It is worth noting that I do all my admin work via Ansible on my Macbook Pro. As such, I will assume you already have Ansible running on Mac OS X as well as Python.

Within Ansible, we will leverage the 'user' module to quickly change the password for the root account on all our servers. Ansible doesn't let you pass a cleartext password through its playbooks (the user module expects a pre-hashed value), so you have to install a password-hashing library for Python to generate one.

    To install the library:
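
passlib is the hashing library the Ansible docs point you at, and pip takes care of it:

    # Install the passlib hashing library for the system Python
    sudo pip install passlib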

    Generate a hash for the new root password you want:
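
One way to do it is passlib's sha512_crypt, prompting for the password so it stays out of your shell history (paste the resulting $6$... hash into the playbook below):

    python -c "from passlib.hash import sha512_crypt; import getpass; print(sha512_crypt.encrypt(getpass.getpass()))"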

    Simple Ansible playbook:
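
A minimal playbook along these lines does the trick; root_password_hash is a placeholder variable for the hash generated above, which you can supply via vars, vars_prompt, or --extra-vars:

    ---
    - hosts: all
      remote_user: root   # or swap in your sudo/become setup
      tasks:
        - name: change root password
          user:
            name: root
            password: "{{ root_password_hash }}"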

    And that’s all there is to it. Execute this playbook against whatever servers you wish and you’re done. This is also a useful addition to your bootstrap playbooks for new provisioning!