A vRealize Automation 7 installation can look fairly different from one organization to the next. There are quite a few components involved, and most of them are flexible enough to give you truly granular control over how they perform. So in this post, we'll take a look at a real-life example of a vRA7 distributed installation. Hopefully, it will give most of you a guide to follow if you don't know where to start.
As with any design, you need to know as much as possible about your requirements before you begin. In this particular case, my (admittedly simple) requirements are:
(1) Redundancy of each of the components.
(2) Installation is capable of handling a load of 10 simultaneous deployments.
Each of these deployments consists of an average of 10 VMs and includes multiple NSX components: a load balancer instance with multiple virtual servers & pools, Edge Gateways for routing, Security Groups to automatically apply Distributed Firewall rules and of course Logical Switches for VXLAN traffic.
Notice that these 2 requirements are related to (1) Redundancy and (2) Performance. This is the whole reason we're doing this!
The foundation of this vRealize Automation 7 installation environment is derived from the Reference Architecture document on VMware's Documentation Center. You should definitely give that document a look if you're planning a vRA7 distributed installation. Also, you'll want to keep the vRealize Automation 7 installation guide in your back pocket during the installation procedure!
Here's a shopping list of the components that will be part of our distributed installation:
- 2 x vRealize Automation appliances
- 2 x vRealize Orchestrator appliances
- 2 x Microsoft SQL Servers (1 cluster)
- 2 x DEM Servers
- 2 x Manager / IaaS Web / DEO Servers
- 2 x Agent Servers
- A load balancer instance for the vRealize Automation appliances
- A load balancer instance for the vRealize Orchestrator appliances
- A load balancer instance for the IaaS Web Server
- A load balancer instance for the Manager Server
And here's a bigger version of that diagram that shows all these components:
Let's say you're setting up a vRA7 lab: you can have a very simple installation with just a few machines. The bare minimum to get vRA up and running is 2 machines: (1) the vRealize Automation 7 appliance and (2) a Windows server with A LOT of stuff installed on it: IaaS components, DEM, DEO, Manager Service and a Microsoft SQL Server instance for the IaaS components. That is a lot of stuff on one machine, and this layout should only be used for test environments.
One great thing about distributed installations is that the new vRA7 installation wizard guides you through them step by step. So installing a vRA7 instance with many components is not much more difficult than doing a simple little test installation with just a few machines.
So if you're ready to play with the big boys and you're planning an Enterprise Distributed Installation, then there are 2 ways to do it: the easy way or the hard way.
The easy way involves deploying a vRA appliance, using it as a repository from which all the Windows servers download and install their Management Agent, and then letting the vRA enterprise installer automatically deploy all the Windows server components on its own. That way is quick, easy and painless. The installation wizard takes a while to go through, but it's very straightforward. After you decide which components will be installed where, the rest of the installation is automated. Here's a screenshot of the installation wizard:
If, however, you're the type of person who enjoys unnecessary pain and suffering, you could choose to install every single component separately and manually. That's the hard way. If that's your thing, here's the documentation, have a nice day. Whichever path you decide to take, the vRealize Automation 7 Installation Guide will become your best friend at this point. Be sure to skim through it, it's worth a read. Oh and by the way, before you begin the vRealize Automation installation on all those VMs, do yourself a favor and tttaaaaaakkkkkeeeee ssnnnnaaaappppssshhoooooooootttttsssss!!! You don't want to have to redeploy all the VMs just because of a typo you made in the installation wizard!
Update: Even if the Enterprise Installation Wizard doesn't explicitly say so, ALWAYS use FQDNs when asked for the hostnames of the servers on which you install the different components. I've personally encountered a significant number of issues just because of this. Do yourself a favor and always use the FQDN.
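If you keep your server names in a list before feeding them to the wizard, a quick sanity check can catch short names or IP addresses early. The following is only a minimal sketch: the function name `looks_like_fqdn` and the `corp.example` hostnames are made up for illustration, and the check is purely syntactic (it does not confirm that the name exists in DNS).

```python
import re

# One DNS label: 1-63 characters, letters/digits/hyphens,
# with no leading or trailing hyphen.
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def looks_like_fqdn(name: str) -> bool:
    """Return True if `name` is a dotted hostname with at least two valid
    labels and does not look like an IPv4 address."""
    name = name.rstrip(".")            # tolerate a trailing root dot
    if not name or len(name) > 253:
        return False
    labels = name.split(".")
    if len(labels) < 2:                # a short name such as "vra-web-01"
        return False
    if labels[-1].isdigit():           # all-numeric last label: likely an IP
        return False
    return all(_LABEL.fullmatch(label) is not None for label in labels)


# Hypothetical names, for illustration only.
candidates = ["vra-web-01.corp.example", "vra-web-01", "10.0.0.15"]
not_fqdn = [c for c in candidates if not looks_like_fqdn(c)]
```

Here `not_fqdn` ends up holding the short name and the IP address, which are exactly the entries you'd want to fix before typing them into the wizard.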
LET'S GET SCALING!
Now let's get serious and break down the list of the components and look at how we can scale and failover each of them.
vRealize Automation Appliance
You can deploy 2 or more appliances and configure them in a cluster. Doing so will create an Active/Passive cluster for the vPostgres DB component. Since the Web portal is available on both appliances, you will also need to position a load balancer in front of these two appliances on port 443 to account for it. To make your connection to Active Directory redundant, you also have to configure a 2nd "connector" to the same AD for the 2nd vRA appliance. When you need to fail over to another appliance, you'll need to go to the surviving appliance's management page and promote its Postgres database to the MASTER copy. If your load balancer is properly configured, the appliance that failed should already be considered Down in the load balancer pool and not be receiving any traffic.
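To make the two halves of that failover concrete (the load balancer part is automatic, the database promotion is manual), here is a toy model. It is a sketch under assumptions, not a real vRA or load balancer API: the `Appliance` class, the role strings and the `corp.example` names are all invented for illustration, and `promote` just stands in for the manual action on the appliance's management page.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Appliance:
    fqdn: str            # hypothetical appliance name
    db_role: str         # "MASTER" or "REPLICA" (labels assumed for this sketch)
    healthy: bool = True


def pool_targets(appliances: List[Appliance]) -> List[str]:
    """Members a health-checking load balancer would still send 443 traffic to."""
    return [a.fqdn for a in appliances if a.healthy]


def needs_manual_promotion(appliances: List[Appliance]) -> bool:
    """True when no healthy appliance currently holds the MASTER database."""
    return not any(a.healthy and a.db_role == "MASTER" for a in appliances)


def promote(appliance: Appliance) -> None:
    """Stand-in for the manual promote step on the surviving appliance."""
    if not appliance.healthy:
        raise ValueError("cannot promote a failed appliance")
    appliance.db_role = "MASTER"


vra1 = Appliance("vra-01.corp.example", "MASTER")
vra2 = Appliance("vra-02.corp.example", "REPLICA")
cluster = [vra1, vra2]

vra1.healthy = False                  # the master appliance fails
if needs_manual_promotion(cluster):   # traffic already avoids vra-01 ...
    promote(vra2)                     # ... but the database still needs you
```

The point of the model is simply that the pool shrinks on its own, while the database role only changes when someone acts.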
DEM – Distributed Execution Manager
Simply install another instance of a DEM and it will be automatically detected (using the vRA IaaS Management Agents) by an existing one. Once it comes online and becomes automatically paired, it puts itself in Passive mode and regularly checks if the Active DEM is alive. When a failure occurs, it becomes the master and is the only active DEM. If the other DEM comes back online, it will become the Passive one until another failure occurs.
DEO – Distributed Execution Manager Orchestrator
The DEO works in exactly the same way as the DEM. The above description for DEM applies here as well.
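The Active/Passive behaviour described in this post for the DEM and DEO can be summed up as a tiny election rule: the current active instance keeps its role while it is alive, and a returning instance does not take the role back. The sketch below only illustrates that rule; `elect_active` and the instance names are hypothetical and this is not how the product implements it internally.

```python
from typing import Dict, Optional


def elect_active(alive: Dict[str, bool], current_active: Optional[str]) -> Optional[str]:
    """Keep the current active instance while it is alive;
    otherwise promote the first live instance, if any."""
    if current_active is not None and alive.get(current_active, False):
        return current_active
    for name, is_alive in alive.items():
        if is_alive:
            return name
    return None


# Both instances up: the first one becomes active.
active = elect_active({"dem-01": True, "dem-02": True}, None)
# dem-01 fails: dem-02 takes over.
active = elect_active({"dem-01": False, "dem-02": True}, active)
# dem-01 comes back: it stays passive, dem-02 remains active.
active = elect_active({"dem-01": True, "dem-02": True}, active)
```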
Manager Service
The Manager Service is an Active-Passive component that needs to be put behind a load balancer. You can't have two active at the same time, so you need to stop the "Manager Service" Windows service on one of your two Windows servers. It is even recommended to disable the Manager Service on the Passive node, just to avoid any surprises. You also need to configure the load balancer to send all traffic to ONLY the Active server at all times. When you want to fail over, make sure the Manager Service is stopped on the failed primary node. Once that's done, start the Manager Service on the Passive node, configure your load balancer to send all traffic to this newly active node and deactivate the failed one.
IaaS Web Server
The IaaS website is another component that needs to be put behind a load balancer. The first IaaS server is the one used to initialize the database that resides in a Microsoft SQL Server instance, but after the installation is done, both IaaS servers are for all intents and purposes equals. When you have a failed IaaS Web server, there's not much to do besides ensuring that the remaining servers are active and responding through the load balancer's virtual IP.
Agents
To make Agents redundant, all you need to do is install 2 of them on different servers with exactly the same configuration. Using the Management Agents, they will become aware of each other and automatically be configured. This is easily achieved if you use the Installation Wizard. Failover is automatic as well: if one Agent goes down, the other takes over seamlessly.
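Because the Manager Service failover is manual, the order of the steps matters. Here is a small sketch of that order as a state model. Everything in it (the `ManagerNode` class, its fields and the server names) is hypothetical and only mirrors the procedure described above; it does not call any Windows or load balancer API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ManagerNode:
    fqdn: str               # hypothetical server name
    service_running: bool   # state of the "Manager Service" Windows service
    in_lb_pool: bool        # whether the load balancer sends traffic here


def fail_over(failed: ManagerNode, standby: ManagerNode) -> None:
    # 1. Make sure the service is stopped on the failed primary first.
    failed.service_running = False
    # 2. Only then start it on the passive node, so two are never active together.
    standby.service_running = True
    # 3. Point the load balancer at the newly active node and drop the failed one.
    standby.in_lb_pool = True
    failed.in_lb_pool = False


def exactly_one_active(nodes: List[ManagerNode]) -> bool:
    """Exactly one node runs the service, and it is the only one in the pool."""
    running = [n for n in nodes if n.service_running]
    pooled = [n for n in nodes if n.in_lb_pool]
    return len(running) == 1 and pooled == running


mgr1 = ManagerNode("mgr-01.corp.example", service_running=True, in_lb_pool=True)
mgr2 = ManagerNode("mgr-02.corp.example", service_running=False, in_lb_pool=False)
fail_over(failed=mgr1, standby=mgr2)
```

After `fail_over`, `exactly_one_active([mgr1, mgr2])` still holds, which is the property you want to preserve at every point of the real procedure.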
vRealize Orchestrator Appliance
vRO is actually an entirely separate product from a vRA installation, but in the end it's so tightly intertwined with vRA that I have to mention it. There's a more or less well-documented procedure available in the vRealize Orchestrator Installation Guide that you can follow, but I found that I needed to look around elsewhere to find all the information I needed to feel confident about my vRO deployment. Someone should really take care of documenting that… Who knows, maybe in a future article… Anywho, to scale these things, slap 2 of them behind a load balancer, point all the nodes to the same database and call it a day. Failover shouldn't cause any headaches, as there should be no manual tasks to do to ensure availability after a node failure. However, keep in mind that once the vRO instances are clustered, it can be quite a pain to keep them in sync when you make changes to one of the nodes in the cluster.
UPDATE: It has been brought to my attention that when you cluster vRA appliances, the vRO instances within them are automagically clustered as well. Considering the pain it can be to maintain a standalone vRO cluster, I think this is the way to go when building out a distributed vRA installation. Also, since version 7.1 of the vRealize Automation Reference Architecture Guide, this is the recommended setup!
Microsoft SQL Server for IaaS components
The SQL Server database contains all the information related to IaaS components and objects. You'll need a clustered installation if you want redundancy. Keep in mind that it won't be possible to use fancy features like "AlwaysOn", because vRA's installation has a dependency on the MSDTC component, which apparently doesn't jive well with that feature. There are a ton of guides for installing an MSSQL cluster that are better than anything I could provide, so if you're lookin' for one, LET ME GOOGLE THAT FOR YOU.
Load Balancers
Whether you use NSX or F5 Big-IP or anything else for load balancing, everything you need to know is in this document. There's a bunch of information there: load balancing algorithms, health monitor configurations, etc.
Keep in mind, your load balancer should be fully configured before you begin the vRA7 distributed installation. You'll also have to create DNS records for the load balancer virtual server VIPs, since the vRA7 installer will ask you to input them during the installation process. Finally, you will have to accommodate the installer by doing 2 things: (1) enable only 1 vRA appliance, 1 IaaS Web server and 1 IaaS Manager Server in the load balancer's pools and (2) disable the load balancer's health checks during the installation process. Once the process is completed, you can re-enable the other nodes and reactivate the health checks. You don't need to worry about forgetting these details, the Installation Wizard will remind you when it's time to do all this.
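If you like checklists you can run, the pre-installation conditions above can be written down as a small check. This is a sketch under assumptions: the `pools` dictionary layout, the function names and the VIP names are invented for illustration, and you would have to fill the dictionary from your own load balancer by hand or with its own tooling. Only the DNS lookup uses a real call (`socket.getaddrinfo` from the Python standard library).

```python
import socket
from typing import Callable, Dict, List


def resolves(name: str) -> bool:
    """True if the local resolver returns at least one address for `name`."""
    try:
        return len(socket.getaddrinfo(name, 443)) > 0
    except socket.gaierror:
        return False


def preinstall_problems(
    pools: Dict[str, dict],
    resolver: Callable[[str], bool] = resolves,
) -> List[str]:
    """List what still has to be fixed before starting the installer."""
    problems = []
    for vip, pool in pools.items():
        if not resolver(vip):
            problems.append(f"{vip}: no DNS record")
        if len(pool["enabled_members"]) != 1:
            problems.append(f"{vip}: enable exactly 1 member during install")
        if pool["health_check_enabled"]:
            problems.append(f"{vip}: disable health checks during install")
    return problems


# Hypothetical VIP names and pool state, keyed by the VIP's DNS name.
pools = {
    "vra.corp.example": {
        "enabled_members": ["vra-01.corp.example"],
        "health_check_enabled": False,
    },
    "iaas-web.corp.example": {
        "enabled_members": ["web-01.corp.example", "web-02.corp.example"],
        "health_check_enabled": True,
    },
    "iaas-mgr.corp.example": {
        "enabled_members": ["mgr-01.corp.example"],
        "health_check_enabled": False,
    },
}
```

Calling `preinstall_problems(pools)` returns an empty list once every VIP resolves, each pool has a single enabled member and the health checks are off. The `resolver` parameter is there so the check can be tried out without touching real DNS.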
Here is a table of all the components and a summary of how to configure them and how to fail them over in case of disaster:
[table id=1 /]
Also… IaaS Management Agents
The IaaS management agents need to be installed on any Windows machine that will host any of vRealize Automation 7's IaaS components. The management agents are sort of like that one friend who always brings everyone together: if it weren't for them, you'd never see your friends! These agents are what allow you to have a distributed installation by being the glue that keeps everyone together. Every IaaS Management Agent that is installed registers itself with the vRA appliance, which allows the services to know about each other and to be automagically installed.
THE BOTTOM LINE
So as you can see, you need to understand what each of these components is responsible for before you can decide how much to scale each of them, trying to balance redundancy and performance vs complexity. I used NSX load balancers with HA in my vRA installation, but no matter which load balancer you use, make sure you have some sort of redundancy on them, otherwise it sort of defeats the purpose of distributing your installation!
If you use vRealize Automation 7 anywhere other than a lab or your parents' basement, you want to consider using a distributed installation. We all know how "it's just a lab" can quickly become "this is production now, right?". It's better to be safe than sorry, so think about it before your users really start hammering your vRA installation with requests and you notice that it's about to crack under the pressure.
So now that VMware made every component of vRealize Automation redundant, maybe it's time we see the same treatment given to vCenter and NSX Manager… Fingers crossed!