Gordon Moore, a co-founder and former CEO of Intel, postulated back in 1965 that the number of transistors in a dense integrated circuit doubles roughly every two years. What has become known as Moore's law has been generalized to explain the rapid proliferation of technology. In the mid-1990s, we saw the transition from mainframes and mini-computers kept in the glass house of the Information Services department to the mass deployment of personal computers on every employee's desk, connected by a variety of networking technologies. These were wild days. Technologists had to know multiple protocols, a variety of operating systems, and how to make cables of all shapes and sizes. With the application of Moore's law, we watched the personal computer's capabilities explode. Companies added new hardware to their networks almost daily, and the need for more, and bigger, servers in the IT department expanded with it. By the turn of the century, things began to stabilize. Or at least, normalize.
With the advent of public access to the Internet, TCP/IP (IPv4) became the de facto standard protocol for moving data. Tim Berners-Lee's new protocol for document sharing, HTTP, became the standard for viewing content, and network traffic had grown to the point that layer-three switches were standard in every data closet, replacing shared-media hubs. Corporations were attaching themselves to the Internet at a rapid pace, and Internet Service Providers were scrambling to keep up with the demands for bandwidth.
No longer was it enough to be a generalist. Companies scrambled to hire specialists, and IT staffs grew exponentially. Departments dedicated entire teams to rolling out servers, others to rolling out desktops, a third to managing the network infrastructure, and so on. And if we were lucky, they had a cup of coffee together now and then. With the rise of network penetration, originally little more than a nuisance and of little concern, the big foot of security came down and only complicated things.
As we came into the year 2000, coders wrestled with a memory space problem. The Y2K bug was the result of code being kept around too long, and of shortcuts made in that code to squeeze out better operational capabilities. The next significant shift came when the 32-bit memory space was declared too small for the improvements needed in routing code and other software. The move to a 64-bit memory space opened up the next frontier. But before we jump into that, let's take a look at a small idea that was beginning to grow.
Back in the early years of 2000, I worked for a company that spun up and tore down development servers almost on an hourly basis. We maintained many configurations as JumpStart profiles (Solaris) and Kickstart files (Red Hat) that utilized a PXE ("pixie") boot methodology. Under increased pressure to make it happen faster, the boss asked his team to develop a system that would let us push a button and deploy any configuration to any server. These were not virtual servers. These were physical servers, and we had racks of them. At least fifty. After several attempts to write something in Perl, we discovered a program called CFEngine, which is still around today. Our efforts to automate and orchestrate the environment fell short, but it was our first look into a brand new world.
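To make that push-a-button idea concrete, here is a rough sketch, in modern Python rather than the Perl we wrote back then, of what such a system boils down to: map a server role to a Kickstart profile and drop a PXE boot entry so the next network boot rebuilds the machine. The paths, template names, and kernel arguments below are purely illustrative, not what we actually ran.

```python
# Hypothetical sketch of "push a button, deploy a server": map a server role
# to a Kickstart template and write a per-MAC PXE boot entry. Paths, template
# names, and kernel arguments are illustrative only.
from pathlib import Path

TEMPLATES = {
    "web": "ks-web.cfg",       # assumed Kickstart profiles kept in version control
    "db": "ks-db.cfg",
    "build": "ks-build.cfg",
}

PXE_DIR = Path("/tftpboot/pxelinux.cfg")   # typical pxelinux layout; adjust to taste

def provision(mac: str, role: str, ks_server: str = "10.0.0.5") -> Path:
    """Write a PXE config so the next network boot reinstalls the host."""
    ks_file = TEMPLATES[role]
    # pxelinux expects the file name to be the dash-separated MAC address,
    # prefixed with the ARP hardware type ("01-" for Ethernet).
    entry = PXE_DIR / ("01-" + mac.lower().replace(":", "-"))
    entry.write_text(
        "default reinstall\n"
        "label reinstall\n"
        "  kernel vmlinuz\n"
        f"  append initrd=initrd.img ks=http://{ks_server}/{ks_file}\n"
    )
    return entry

if __name__ == "__main__":
    print(provision("00:1A:2B:3C:4D:5E", "web"))
```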
With the advent of 64-bit code, virtualization exploded. Affordable physical server hardware that could support numerous virtual machines, usually 32-bit guests running on 64-bit hosts, meant that a data center could host upwards of 16 to 32 guest instances on a single 2U server. And as the cost of RAM came down and the RAM densities of servers increased, the upper limit on guests all but disappeared. Companies that were limited in floor space could suddenly increase the number of servers without an increase in real estate. Where an administration team used to manage dozens of servers, they were now responsible for managing hundreds. And it only got worse.
There were several attempts made to replace desktop computers during this time. Wyse and others developed thin terminals: machines that connected to servers where virtual user environments lived. As server proliferation exploded inside companies, so did the problem of workstation management. Patching operating systems and applications, and maintaining security postures, became a full-time job for administrators. Even with tools, the challenge was almost overwhelming. And we did not see the next wave coming, when the management of servers, applications, patching, and traffic intersected as mobile devices began to supplant the desktop. If server proliferation under virtualization was explosive, the server and application proliferation needed to support mobile devices was an avalanche meeting a tsunami. Management of servers, applications, and network infrastructure had to change. One administrator, even a team of skilled administrators, could not keep up with the burden.
Enter infrastructure as code. In the late 1990s, Microsoft introduced Active Directory as a solution to the user virus: the proliferation of user accounts needed to log into the various servers in the corporation. Infrastructure as code is a solution to the server virus: the ever-growing need to manage and deploy servers across the enterprise, whether that enterprise is local or remote.
But what problem does Infrastructure as Code solve? Here are three of the common problems it addresses.
When you manipulate code outside of configuration management or version control, code drift occurs. We all know the scenario. A problem crops up, sometimes on one system, sometimes on all of them. The solution is tested in development before it moves to production. A configuration file is adjusted, a comma added. Something small. Hardly worth the effort. Maybe the change is noted in a comment in the configuration file, but usually it is forgotten ten minutes after it happens as some new problem comes up to be wrestled with and addressed. This does not happen once; it happens frequently throughout the lifecycle of the application or the server supporting it. I liken it to a piece of hard candy that has rolled under the couch. By the time you find it, cruft covers the original candy, along with lint and other indecipherable things.
Code drift does not become evident until the application needs to be updated, the OS needs to be upgraded, or the server has to be restored from backup. Administrators have lost many hours struggling to make things work, either by bringing the server back to the baseline or just trying to get the application running again, and usually on limited available hardware.
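As a rough illustration of how drift is usually discovered, here is a minimal Python sketch that compares live configuration files against baseline copies kept under version control. The file paths and the baseline location are invented for the example.

```python
# Minimal drift check: compare live files against the copies kept under
# version control. Paths and the baseline location are illustrative.
import hashlib
from pathlib import Path

BASELINE = Path("/srv/config-baseline")   # assumed checkout of the approved configs
MANAGED = [Path("/etc/httpd/conf/httpd.conf"), Path("/etc/ntp.conf")]

def digest(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def report_drift() -> list[Path]:
    drifted = []
    for live in MANAGED:
        baseline = BASELINE / live.name
        if not baseline.exists() or digest(live) != digest(baseline):
            drifted.append(live)   # the comma someone added two years ago lives here
    return drifted

if __name__ == "__main__":
    for path in report_drift():
        print(f"drift detected: {path}")
```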
Infrastructure as code solves the problem of code drift by enforcing the configuration and version of the infrastructure. In a proper implementation, no changes are possible at the OS or application level outside of the version control system. There are several ways to manage the enforcement: as a managed configuration, or as a managed version. Configuration management is not a new idea. Programs like CFEngine or, more commonly, Tripwire would take a snapshot of a valid configuration and then, on command or on a periodic basis, run against the server configuration and reset any manual changes back to the default. Tripwire suffered from two problems. First, it was not designed to work well across server platforms; it was intended to enforce a configuration on a single server instance. Second, it is not idempotent. We will discuss the value of idempotency in a future edition, but orchestration and automation programs like Chef, Puppet, Ansible, and Salt are both designed for large-scale, cross-platform environments and idempotent in their operation.
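To give a flavor of what idempotency means, here is a tiny Python sketch, not taken from any of the tools named above, that ensures a setting exists in a configuration file. Running it once or fifty times leaves the file in exactly the same state; the target file and setting are invented for the example.

```python
# A tiny illustration of idempotency: ensure a line exists in a config file.
# Repeated runs converge to the same state and make no further changes.
from pathlib import Path

def ensure_line(path: Path, line: str) -> bool:
    """Append `line` only if it is missing. Returns True if a change was made."""
    text = path.read_text() if path.exists() else ""
    if line in text.splitlines():
        return False                      # already converged, do nothing
    if text and not text.endswith("\n"):
        text += "\n"
    path.write_text(text + line + "\n")
    return True

if __name__ == "__main__":
    conf = Path("/tmp/demo.conf")                       # illustrative target file
    print(ensure_line(conf, "max_connections = 512"))   # True on first run
    print(ensure_line(conf, "max_connections = 512"))   # False thereafter
```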
In either case, you make changes only outside of the server environment, in the code that builds and maintains the server configuration. If you need to push a change, you change the code, update the version, and the new configuration is deployed automatically or on command. Every impacted server gets the update, and new server and application builds come out the same way, every time.
The second problem addressed by infrastructure as code is the age-old "it works for me" problem seen between development and operations. Traditional development systems are under the control of the development group, and operations manages production. Developers test their build locally, push it to the development environment, test it again, pass it to QA, and then on to operations. The problem usually comes up at QA, where the code does not work, generally because of a different sort of code drift: local libraries. Developers install libraries to support their work and forget they introduced them. As a result, the requirement for those libraries is overlooked and they are never installed downstream. The code fails. The bug is written up, and the battle lines are drawn. Development cannot reproduce the error, so they accuse QA of not doing something right; QA insists they are doing it the way they always have. Meetings occur, blood pressure rises, and the issue is tossed around like a rubber ball.
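One way to catch the local-library problem is to compare what is actually installed in an environment against a pinned manifest that lives in version control. The Python sketch below is a hedged illustration of that idea; the manifest name and its format are assumptions, not a standard.

```python
# Compare the running environment against a declared, version-controlled
# manifest. The manifest path and "name==version" format are invented here.
import importlib.metadata as md
from pathlib import Path

MANIFEST = Path("requirements.lock")   # assumed pinned list, one "name==version" per line

def check_environment() -> list[str]:
    problems = []
    for line in MANIFEST.read_text().splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        name, _, wanted = line.partition("==")
        try:
            installed = md.version(name)
        except md.PackageNotFoundError:
            problems.append(f"{name} is missing entirely")
            continue
        if wanted and installed != wanted:
            problems.append(f"{name}: have {installed}, manifest says {wanted}")
    return problems

if __name__ == "__main__":
    for issue in check_environment():
        print("environment mismatch:", issue)
```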
Infrastructure as code, as I said before, builds the server and application the same way, every time, whether that build is local, development, QA, or production. Further, done correctly, version control allows management to control the pace of adoption of new external code, such as library updates and patches, areas that tend to cause problems as code moves through the lifecycle.
The last thing any team wants to hear is "we've been hacked." Whether it comes from external malfeasance or a lack of internal controls, bots, worms, rootkits, spear phishing, and misconfiguration are the norm in today's IT world. A system that is penetrated and has a rootkit or other trojan installed can linger for weeks, months, or years, depending on the backup strategy and server retention. It is the hard candy model again, except this time somebody eats the candy. Ewww.
Infrastructure as code, combined with DevOps practices, reduces the risk in several ways. First, as part of good DevOps practice, you are constantly scanning your code for vulnerabilities, at least at every build. This guards against the internal insertion of malware and the accidental importation of it via third-party libraries. With additional security tools built into the server image, further on-demand scanning at server instantiation will alert on and kill off compromised servers before they are brought online and allowed to come into use. Finally, because it is so easy to spin an instance up or down, servers can be killed off frequently, resulting in less exposure should malware sneak in despite preventative measures.
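As a toy illustration of the scan-at-every-build gate, the following Python sketch checks installed packages against a small advisory list and fails the build if anything matches. A real pipeline would call an actual scanner; the advisory data here is entirely made up.

```python
# Toy build gate: fail if any installed package appears on an advisory list.
# The KNOWN_BAD entries are invented for illustration only.
import importlib.metadata as md
import sys

KNOWN_BAD = {                # hypothetical advisories: package -> vulnerable versions
    "examplelib": {"1.4.2"},
    "oldparser": {"0.9.0", "0.9.1"},
}

def scan() -> list[str]:
    findings = []
    for dist in md.distributions():
        name = (dist.metadata["Name"] or "").lower()
        if dist.version in KNOWN_BAD.get(name, set()):
            findings.append(f"{name} {dist.version} is on the advisory list")
    return findings

if __name__ == "__main__":
    hits = scan()
    for hit in hits:
        print("FAIL:", hit)
    sys.exit(1 if hits else 0)   # a non-zero exit kills the build (or the instance)
```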
Remember also that infrastructure as code maintains version and configuration control for more than servers and applications; it can be used for firewalls and other security operations as well. Your entire enterprise, managed as code, can reduce human error.
With the explosion of devices, modes of access, operational tempo, and increasing demands to do more with less, infrastructure as code offers administrators and management teams powerful tools. It enables them to get their arms around a problem that will only grow as business demands grow. But it is not a panacea. Infrastructure as code solves a select set of problems. It allows companies to reduce code drift, normalize operations, and reduce fragility across the enterprise. It does not solve all lifecycle development problems. It does not address version control scenarios, even though it aids in configuration management. It does not prevent people from doing bad things, and, without proper controls, it can cause catastrophic damage.
Infrastructure as code does address system consistency and repeatable processes, and it allows you to adapt to an ever-changing set of demands in a rapid, flexible manner. Done well, it will let you do so before anyone even knows they have a need.