The goal of orchestration is to make the process of using Infrastructure as Code repeatable. An idempotent approach achieves this.
But just what is orchestration? As discussed in Infrastructure as Code, the number of servers has exploded with the use of virtualization. Whether these servers are on-premises or hosted in the cloud, there are now many servers doing similar jobs. Orchestration is a framework that addresses this proliferation of servers much like a conductor addresses a symphony: there are strings (web servers), brass (application servers), and percussion (database servers), for example. To coordinate all of this, the conductor, reading from a sheet of music (source code), directs the instruments as appropriate. Orchestration is supported by automation, usually in the form of a build server. We will cover build servers in a future post.
The primary open source orchestration tools in current use are Puppet, Ansible, Chef, and Salt. There are others, but they are not as well known or as widely used. Some people refer to these programs as configuration control or version control. I will refer to them as orchestration tools to prevent confusion with tools such as Git or Mercurial, which are version control tools, not orchestration tools.
Before we begin, we need to address the idea of idempotency. The concept comes from mathematics. Specifically:
denoting an element of a set which is unchanged in value when multiplied or otherwise operated on by itself.
Confused? How about this from Wikipedia:
Idempotence is the property of certain operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application. The concept of idempotence arises in many places in abstract algebra (in particular, in the theory of projectors and closure operators) and functional programming (in which it is connected to the property of referential transparency).
Still unclear? Understanding it takes a bit. Here is a concrete example.
The idea of Infrastructure as Code is that you use building blocks to achieve your final goal, automated through an orchestration tool. To build a web server, for example, a starting image is selected. This is usually a barebones kernel and some necessary supporting files. Next, the web server software and additional support files are installed. The final solution is tested, then made available.
The orchestration tool takes the base image and then executes the following:
install web server
Simple, right? The obvious questions arise: which operating system? What packages are needed? Just like an orchestra's conductor, who has the sheet music but does not tell the individual players how to play their instruments, orchestration software is an abstraction over the actual commands. Combined with the guarantee that applying the same instruction repeatedly leaves the system in the same state, this is idempotency.
For a Red Hat based initial image, installing a web server manually would mean instantiating a base image and then installing the appropriate RPMs with this command:
$ sudo yum install httpd
This would install the web server into the running instance. Running something Debian based? This command would work:
$ sudo apt-get install apache2 apache2-doc apache2-utils
In either case, once the software is installed, the final product needs to be converted into an image and then made available to the next process that will consume it. Clearly, the commands to install software vary by distribution, to say nothing of the difference between Linux and Windows.
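Without orchestration, you end up scripting that distribution check yourself. Here is a hedged shell sketch of the kind of wrapper people write by hand:

# Detect the package manager, then pick the matching command.
if command -v yum >/dev/null 2>&1; then
    sudo yum install -y httpd          # Red Hat family
elif command -v apt-get >/dev/null 2>&1; then
    sudo apt-get install -y apache2    # Debian family
fi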
With orchestration, the process is much simpler. For example, in Chef, the command would look like this:
package "httpd"
service "httpd" do
action [ :enable, :start ]
end
This will install the web server and enable it to start at boot. It does not matter which package manager the underlying operating system uses; Chef figures that out.
Many things are being glossed over in this example, but the idea is this: the code declares what you want done, and the orchestration software figures out how, and where, to install what is needed. This is idempotency, and orchestration.
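To make the guarantee concrete, here is a hedged Chef-style sketch (the file paths are hypothetical). The guard makes an otherwise blind command idempotent: on a second run the resource does nothing, because the desired state already holds.

# Copy a vhost file only if it is not already in place.
execute 'install-vhost-config' do
  command 'cp /tmp/vhost.conf /etc/httpd/conf.d/vhost.conf'
  not_if { ::File.exist?('/etc/httpd/conf.d/vhost.conf') }
end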
All of the orchestration tools share similar aspects. They have both a client and a server component; some are designed so that only the client component is needed, while others require both pieces. They are all designed to be run from a command line, with the code stored in version control systems. They are designed to build a system from scratch or modify an existing configuration. They can also be used to maintain a current configuration by ensuring that any changes made outside of configuration are reverted. Finally, they can be used to destroy running instances.
Each of these programs uses YAML, a recursive acronym for YAML Ain't Markup Language (it originally stood for Yet Another Markup Language). In certain cases, it is used to define parameters that must be set, as well as some other background information needed by the orchestration tool. In other cases, it defines the actual values to be configured. If there are problems, check the YAML.
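As a hedged illustration (the keys and values are hypothetical, not taken from any one tool), a YAML fragment expresses structure through indentation rather than brackets:

# Hypothetical fragment: indentation alone defines the nesting.
web_server:
  package: httpd
  port: 80
  modules:
    - ssl
    - rewrite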
All of these tools work with a variety of host environments, including VMware, VirtualBox, KVM, and Hyper-V, along with all the major cloud providers.
Puppet works in a client-server model. The system resources and their state are defined using either the Puppet declarative language or a Ruby DSL (domain-specific language). Puppet calls its installation plans manifests. It discovers resources and dependencies using a utility called Facter. A normal Puppet transaction follows these stages: the agent gathers facts about the node with Facter and sends them to the server; the server compiles a catalog describing the node's desired state; the agent applies the catalog, making only the changes needed; and the agent reports the results back to the server.
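Facter can also be queried directly from the shell, which is handy when writing manifests. For example (fact name as in recent Facter releases; output varies by machine):

$ facter os.family
RedHat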
Puppet was one of the first of the orchestration tools used in large-scale deployments, both for the management of host and guest server environments. You can use either a database or a version control system for managing configuration, as well as report on system state over time. Puppet does not have an intrinsic testing tool suite and relies on other tools for verification of compliance. A successful Puppet installation requires a bit of setup in advance of its use, including DNS, NTP, and some additional firewall settings depending on the selected feature set. Puppet Server also runs in a Java virtual machine, and tuning the JVM is essential for large-scale operations.
Puppet Server even has a manifest for installing itself. It looks like this:
# Installs the application
package { 'puppetserver':
  ensure => installed,
}

# Firewall manipulation
exec { 'open-puppet-port':
  command => '/usr/bin/firewall-cmd --permanent --zone=public --add-port=8140/tcp',
}

exec { 'reload-firewall':
  command => '/usr/bin/firewall-cmd --reload',
}

file { '/var/opt/puppetlabs/puppetserver':
  ensure => directory,
  owner  => 'puppet',
  group  => 'puppet',
  mode   => '0755',
}

file { '/etc/puppetlabs/puppetserver/conf.d/puppetserver.conf':
  content => 'master-var-dir = /var/opt/puppetlabs/puppetserver',
}
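To try a manifest like this on a single machine, it can be applied locally (the file name here is just a placeholder):

$ sudo puppet apply install_puppetserver.pp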
There are a couple of things that should be apparent. First, this is not quite idempotent: the exec resources run every time the manifest is applied, and we are spelling out exactly which firewall port to open and exactly where to put the configuration file. Second, this is written for a Linux system and would not work on a Windows system. Even so, it shows the power of the orchestrator.
Like Puppet, Chef uses a Ruby DSL, and it is itself written in Ruby, which makes it extensible. It has a local client mode (Chef Zero) and a server mode (Chef Server). Chef also has a number of support tools: InSpec, an automated testing suite for compliance, security, and policy requirements; Kitchen, which tests the efficacy of Chef recipes; Bento, for image management; and Knife, which provides an interface between a local Chef repository and a Chef server. The entire suite of Chef tools and documents revolves around food. There is even a linting tool for recipes, which is what Chef calls its installation plans, named Foodcritic. It can get confusing at times, especially when using Google to track down documentation. More than once, while looking for a solution, I got links for eight-inch cooking knives rather than syntax.
Chef, like Puppet, works equally well with host and guest machines. A combination of shell scripts passing variables, such as a machine image identifier, and a Chef recipe can instantiate a full virtual machine in a matter of seconds, depending on the complexity of the installation.
Chef’s recipes can be more complicated than Puppet manifests. They are more modular, which makes them flexible, but when chained with other recipes, troubleshooting can become challenging and even a trifle annoying. When you generate a new Chef recipe, a test suite is scaffolded alongside the installation instructions. The premise is that unit tests are an integrated part of recipe creation; Chef expects you to write the tests hand in hand with the recipe, along the lines of the sketch below.
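Here is a hedged ChefSpec-style sketch of such a unit test; the cookbook and recipe names are hypothetical:

# Verify that converging the recipe installs the web server package.
require 'chefspec'

describe 'mycookbook::webserver' do
  let(:chef_run) { ChefSpec::SoloRunner.new.converge(described_recipe) }

  it 'installs the web server package' do
    expect(chef_run).to install_package('httpd')
  end
end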
Here is a sample recipe. It uses a JSON input file to populate the variables and then installs the AWS SDK in an AWS machine image, all of which is triggered from a shell script on a build machine.
# Install the AWS SDK gems into Chef's embedded Ruby.
chef_gem 'aws-sdk' do
  action :install
  compile_time false if respond_to?(:compile_time)
  version node['amazon']['gem']['version']
end

chef_gem 'aws-sdk-core' do
  action :install
  compile_time false if respond_to?(:compile_time)
  version node['amazon']['gem']['version']
end

chef_gem 'aws-sdk-resources' do
  action :install
  compile_time false if respond_to?(:compile_time)
  version node['amazon']['gem']['version']
end

# Fetch a tarball from an S3 bucket (s3_file is a community resource).
s3_file 'download-phantom' do
  path lazy { "#{node['phantomjs']['base_url'].gsub('file://', '')}/#{node['phantomjs']['basename']}.tar.bz2" }
  action :create
  bucket node['amazon']['s3_buckets']['binaries']
  remote_path lazy { "#{node['phantomjs']['basename']}.tar.bz2" }
end
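The build-machine trigger mentioned above might look like this hedged sketch; the attributes file and recipe name are placeholders:

$ chef-client --local-mode --json-attributes build_vars.json --override-runlist 'recipe[mycookbook::aws_sdk]'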
Like Puppet, there is a bit of configuration that has to be done before you can successfully use Chef. There is a core YAML file, Test Kitchen's .kitchen.yml, that has to be modified, especially for use with different guest types (and especially AWS). A Red Hat on AWS system looks like this:
---
driver:
  name: ec2
  security_group_ids: ["security group"]
  require_chef_omnibus: true
  region: us-east-1              # verify the region
  availability_zone: e           # verify the availability zone
  subnet_id: "subnet-yoursubnet"
  associate_public_ip: true
  interface: private             # when building from inside AWS

transport:
  ssh_key: ~/.ssh/AWS.pem        # set to your key name
  username: ec2-user             # root for CentOS, ubuntu for Ubuntu

provisioner:
  name: chef_solo

platforms:
  - name: centos-6.4
    driver:
      image_id: ami-26cc934e     # verify the AMI ID
      instance_type: t1.micro    # verify the instance type
      block_device_mappings:
        - ebs_device_name: /dev/sdb
          ebs_volume_type: gp2
          ebs_virtual_name: test
          ebs_volume_size: 8
          ebs_delete_on_termination: true

suites:
  - name: default
    run_list:
    attributes:
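With the file in place, a single command builds the instance, converges the recipe, runs the tests, and tears everything down again:

$ kitchen test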
One nice feature of Chef is the error report it gives if you do not set a version in your recipe.
Ansible, unlike the other applications, has its own declarative language, formatted in, you guessed it, YAML. It is critical to understand YAML. Each of these programs relies on it, some more than others, so knowing YAML syntax is crucial for success. Ansible is available in an open-source version as well as a paid, supported version from Red Hat. Fun fact: the name Ansible refers to a fictional instantaneous hyperspace communication system, first coined by the late Ursula K. Le Guin, and used in numerous science fiction stories since.
Ansible calls its installation plans playbooks. Ansible is based on the idea of a single controlling machine, the control node, which should not be confused with Ansible Tower, a commercial REST API web service and web-based console. From the control node, Ansible's playbooks can be launched to configure systems of all shapes and sizes.
Ansible was designed to be minimal in structure, with fewer dependencies, which results in a lighter-weight application. It uses SSH and Python to manage nodes, rather than agents as Puppet does. And, of course, it is idempotent.
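A quick way to confirm the control node can reach its targets over SSH is an ad hoc ping; 'webservers' here is a hypothetical inventory group:

$ ansible webservers -m ping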
An Ansible playbook looks something like this; variables can also be pulled in from a JSON file:
---
- name: Create a new RHEL EC2 instance
  hosts: localhost
  gather_facts: False

  vars:
    region: us-east-1
    instance_type: t2.micro
    ami: ami-26ebbc5c        # RHEL 7.4
    keypair: ansible-key     # pem file name

  tasks:
    - name: Create an ec2 instance
      ec2:
        key_name: "{{ keypair }}"
        group: dlane_sg_nova              # security group name
        instance_type: "{{ instance_type }}"
        image: "{{ ami }}"
        wait: true
        region: "{{ region }}"
        count: 1                          # default
        count_tag:
          Name: ansible
        instance_tags:
          Name: ansible
        vpc_subnet_id: subnet-17968460
        assign_public_ip: yes
      register: ec2
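Launching it from the control node is a one-liner; the file name is a placeholder:

$ ansible-playbook create_rhel_ec2.yml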
Salt, or SaltStack, is the newest of these tools, and several of its design decisions were reactions to the tools that came before it. This makes Salt the odd man out in several ways. It is written in Python, which also serves as its extension language, and it has at its heart the idea of high-speed data collection. As a result, it is highly modular, more so even than Chef, and relies on many agents, called minions, to collect that information. As any IT specialist knows, you cannot get anything done without a good collection of minions.
An example of the YAML-style data for adding a user might look like this (the row of dashes is Salt's own record separator, not YAML syntax):
----------
fullname:
gid:
    501
groups:
    - wilma
home:
    /home/wilma
homephone:
name:
    wilma
passwd:
    x
roomnumber:
shell:
    /bin/bash
uid:
    501
workphone:
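For comparison with the web server example from earlier in this post, here is a minimal Salt state (SLS) sketch; the file name webserver.sls and its location are assumptions:

# /srv/salt/webserver.sls: install httpd and keep it running.
webserver:
  pkg.installed:
    - name: httpd
  service.running:
    - name: httpd
    - enable: True
    - require:
      - pkg: webserver

It would then be applied to the minions with salt '*' state.apply webserver.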
Salt has a straightforward command syntax that can be easily chained together, but it does require Python to be installed on your hosts, which makes it the most limited of the four applications in a Windows environment, since Python is not installed by default on Windows.
All four of these applications share several similarities. They are idempotent in their operation: like a sheet of music with the notes to play, you tell them what you want done, and the application figures out how to do it. They all utilize YAML in some fashion or another. You will note, for example, that the Ansible playbook opens with three dashes, which is YAML's document-start marker, while the row of dashes in the Salt example is Salt's own separator. If you are having issues with the YAML, check the dashes, spaces, tabs, and indentation. YAML can be very fussy about that sort of thing, and it will drive you nuts if you are not aware of it. The same is true for JSON. Be sure to do a full lint and format check on your JSON before you test your playbook. It will not catch logic errors, but it will help get those pesky braces, brackets, and commas lined up. That can be an exercise in itself; the quick checks below help.
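Two quick sanity checks (the file names are placeholders; the JSON check uses only Python's standard library, while the YAML check needs the PyYAML module installed):

$ python -m json.tool vars.json > /dev/null
$ python -c 'import yaml, sys; yaml.safe_load(open(sys.argv[1]))' playbook.yml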
The choice of tool will usually come down to a corporate decision. Each tool has its advantages and disadvantages, but in practice the choice depends on what has already been developed. Regardless of which one the company has decided to use, it is worth being familiar with all of them.