As we began moving apps into AWS, the security team challenged us to “harden” the AMI used for our ec2 instances. We were already thinking of using HashiCorp Terraform for infrastructure as code, so HashiCorp Packer was an obvious choice, and while not always elegant it has done a good job.
Packer uses the concepts of builders and provisioners: a builder spins up an ec2 instance, and provisioners run scripts to configure it. Once complete, the instance is shut down and an AMI is “baked” from it. We build multiple AMIs that inherit from each other, so an image with our minimum configuration applied is used as the base for other AMIs that in turn install and provision additional resources - rinse and repeat. To keep things simple, Jenkins polls our packer repository for changes and kicks off packer builds as soon as commits are merged to master.
At the beginning of our journey there were many changes to our ‘base’ image, and multiple sets of eyes on those builds as it was hardened and configured. Over time the configurations stabilized and updates became infrequent, which meant our AMIs got a bit stale: OS patches piled up, and we realized that all our AMIs needed to be rebuilt on a schedule. Jenkins can run a job on a schedule, but with infrequent changes to the packer configurations there were fewer eyes on the build process. If something went wrong in a way that didn’t cause packer to fail, we could produce defective AMIs that would go unnoticed or, worse, be noticed by customers.
Building AMIs frequently automates us out of the OS patch management business, but instances still need to be terminated and configurations updated to take advantage of the new AMIs. What if a defect was introduced during a packer build, something sneaky enough that the build didn’t fail yet damaging enough that an application couldn’t start or communicate on the updated AMI? Since the processes that build the AMI and restart the applications are all automated, it’s easy to imagine coming in to work and discovering multiple applications stuck in boot loops, failing health checks.
Packer will fail a build and not produce an AMI if a command exits with a non-zero status. But what if packer is just copying iptables rules into place? It’s easy enough to foul one up and only realize the mistake later, when applications that can’t communicate are stuck in ALB health-check reboot land, causing outages and wasting money.
Chef InSpec to the rescue - “InSpec is an open-source testing framework for infrastructure with a human-readable language for specifying compliance, security and other policy requirements.” While I imagine InSpec is used primarily to test security configurations, I decided it would be well suited for acceptance testing as well. By integrating InSpec with Packer I can verify that the configurations applied by Packer’s provisioners are actually getting … applied, and that an AMI built with our process meets a base level of acceptance. For example, if I enable selinux I would like to test that selinux is actually enabled when an ec2 instance is spun up; if docker is installed I would like to verify, using the docker binary, that the configuration applied produces the expected results. This goes beyond checking configuration files: a configuration is better validated when a process can start, run and respond.
Lastly I didn’t want to install anything on the ec2 instance to run the acceptance test, and ideally I want to install the minimum of things on the instance being used to run the packer build / InSpec tests.
Ok, enough rambling, let’s get into the details.
Creating an AMI with Packer
First, a directory structure was created: a base directory contains a common variable file and a bash script that kicks off a packer build, and sub-directories, each named for the image that will be built, contain the .json file that defines the builders and provisioners used by Packer.
This script is run by Jenkins (or locally for testing purposes) and takes a single argument that defines which image should be built, so packer-build.sh base would create an image named base, using files in the base directory. We use Amazon Linux as our base image, but this could easily be tailored to suit your needs by updating the
Here we define some variables that will be referenced in all our images. To make ssh connections easier, we use an ssh key that is already populated in AWS.
This file defines the builder (amazon-ebs), and then the provisioners (only using shell so far).
config.sh & other-config.sh
These are simple shell scripts that run commands to configure the instance - you would write these to comply with your business requirements and set up things like docker, selinux, logging configuration, etc.
Running the packer-build.sh script
Now that you have the directory structure and scripts in place, and assuming you have packer installed and IAM configured, you can run the packer-build.sh script manually or have it triggered by your build tool of choice. Either way it takes a single argument that corresponds to the image you want to build. Assuming everything works, you should end up with an AMI built to your spec.
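For illustration, here is a minimal sketch of what a wrapper like this could look like. It is not our actual script: the template path (image/image.json), the IMAGE_NAME variable and the use of tee to keep a copy of the output in build.log are all assumptions made for the example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a packer build wrapper. The directory layout,
# variable names and log handling are assumptions, not the original script.

build_image() {
  local image="$1"
  local template="${image}/${image}.json"   # assumed: <image>/<image>.json
  local var_file="common-vars.json"         # shared variables in the base directory

  if [ -z "$image" ]; then
    echo "usage: build_image <image>" >&2
    return 2
  fi
  if [ ! -f "$template" ]; then
    echo "no packer template found at ${template}" >&2
    return 1
  fi

  # Exported so scripts started by packer can see which image is being built.
  export IMAGE_NAME="$image"

  # Keep a copy of the output in build.log. pipefail is set inside a subshell
  # so a packer failure is not masked by tee's exit status, and the option
  # does not leak into the calling shell.
  (
    set -o pipefail
    packer build -var-file="$var_file" "$template" 2>&1 | tee build.log
  )
}
```

Calling build_image base from the top of the repository would then build the image defined in base/base.json and return a non-zero status if packer fails, which is what lets Jenkins mark the job as failed.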
Integrating InSpec tests
As discussed earlier, building an AMI is really only half the battle; acceptance testing is what makes this fully automated process run smoothly. There are a couple of hints in the packer-build.sh script and common-vars.json that help set up our environment for InSpec testing, like exporting the name of the image we want built, and the ssh keypair and username. Additional work needs to be done to discover the IP address of the ec2 instance, since Packer doesn’t expose this data. First you’ll need some additional provisioners added to your image.json files:
We need to restart the instance since any changes we made in our config.sh scripts may not take effect until boot time - things like services starting up automatically or selinux getting enabled. Packer will happily wait while the instance reboots and pick up where it left off.
Next you’ll need to actually run the InSpec test. Here we are using the shell-local provisioner, which runs scripts on the packer builder node.
pause_before - InSpec is much less tolerant of connection timeouts so we just wait patiently while the node comes back online, then run the InSpec tests.
Directory structure for InSpec looks like this:
This file defines what acceptance tests (InSpec tests) you are going to run against the instance after it has been rebooted. Note the name of this file (base.rb) matches the name of the directory (in this case base/) that we are building - each build will require a similarly named .rb file containing InSpec tests.
This is just a sample test - you’d want to write tests for as many services as you have configured. Refer to the InSpec documentation and tutorials for what you can test (hint: all the things).
This file is the real MVP… using a function to define a docker command is the key to avoiding installing InSpec on the ec2 instance being tested as well as the ec2 instance packer is running from - in our case the nodes running the packer build already happen to be running docker so all that’s required is pulling an image.
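A function along these lines, similar to the wrapper suggested in the InSpec docs for its container image, gives the general idea. Treat it as a sketch rather than our exact file: it assumes docker is available on the build node and that the chef/inspec image uses /share as its working directory, and it leaves out the -it flags because Jenkins jobs typically have no TTY.

```shell
#!/usr/bin/env bash
# Run InSpec from the chef/inspec container image instead of a local install.
# The current directory is mounted at /share (assumed to be the image's
# working directory), so relative paths to test files and ssh keys still
# resolve inside the container.
inspec() {
  docker run --rm -v "$(pwd):/share" chef/inspec "$@"
}
```

Once the function is defined, inspec can be called as if it were installed locally, for example inspec version, and the container's exit status is passed back to the caller.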
Earlier on in the packer-build.sh script you may have noticed that we capture the output in build.log. Because Packer doesn’t expose the IP of the ec2 instance, we scrape build.log for the instance ID and then ask AWS what the IP is. It’s a little convoluted but was the cleanest method I could come up with. With the InSpec function defined, the IP address discovered, and Packer aware of the ssh user and key used to authenticate, we can simply run InSpec using the test file that matches the name that was passed to packer-build.sh. Any failure in InSpec will cause packer to fail, which in turn will cause Jenkins to fail the build. If you cover enough code with InSpec tests you can provide some semblance of a guarantee that the AMI has been built to spec and will provide all the services necessary when it is spun up!
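The lookup and test run could be sketched like this. The “Instance ID:” log line format, the SSH_USER and SSH_KEY variable names, and the inspec/ directory are assumptions made for the example, so check them against your own build.log and layout.

```shell
#!/usr/bin/env bash
# Sketch only: log format, variable names and paths are assumptions.

# Print the first ec2 instance ID found in a packer log.
instance_id_from_log() {
  grep -o 'Instance ID: i-[0-9a-f]*' "$1" | head -n 1 | awk '{print $3}'
}

# Ask AWS for the public IP of an instance (needs the aws CLI and credentials).
instance_ip() {
  aws ec2 describe-instances \
    --instance-ids "$1" \
    --query 'Reservations[0].Instances[0].PublicIpAddress' \
    --output text
}

# Run the InSpec file named after the image against the build instance.
run_acceptance_tests() {
  local image="$1" log="${2:-build.log}" id ip

  id="$(instance_id_from_log "$log")"
  if [ -z "$id" ]; then
    echo "no instance ID found in ${log}" >&2
    return 1
  fi

  ip="$(instance_ip "$id")"
  if [ -z "$ip" ] || [ "$ip" = "None" ]; then
    echo "no public IP found for ${id}" >&2
    return 1
  fi

  # A non-zero exit from inspec propagates to the caller and fails the build.
  inspec exec "inspec/${image}.rb" -t "ssh://${SSH_USER}@${ip}" -i "${SSH_KEY}"
}
```

A shell-local provisioner could then call run_acceptance_tests base, and the exit status of inspec decides whether the packer build carries on. If your build instances only have private addresses, query PrivateIpAddress instead.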
These are the tests for a different image (named ‘secondary’ for this example). In our case we have a base image and then images that build on the base and apply additional configuration. I have chosen not to duplicate tests in these files, but only to test for the additional services that have been configured.
ssh-key-name & ssh-key-name.pub
Because Packer does not expose the ssh connection information without running in debug mode we need to define a static ssh-key instead of using the packer-generated keys. I store these in the InSpec folder to keep them in a central location that can be mounted in the InSpec docker container without additional mount options.
It uh, reboots the ec2 instance. Rebooting the instance prior to running InSpec tests ensures that all services and configurations are applied; if a new kernel has been installed it should now be running, etc.
Complete packer json example
Here is a complete sanitized version of our base build json file used by packer. Note that we remove the ssh key used for the build process at the very end - whether you use a statically defined key or a dynamically generated key, this is an important step that is easy to miss!
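As a rough idea of what that final cleanup step can look like in a shell provisioner, the function below removes authorized_keys for a list of home directories. This is a sketch and not our exact script; the paths in the usage note are assumptions, so use whichever accounts your build key was installed for.

```shell
#!/usr/bin/env bash
# Sketch of a last-provisioner cleanup: remove the ssh keys used during the
# build from each home directory passed as an argument.
remove_build_keys() {
  local home_dir
  for home_dir in "$@"; do
    rm -f "${home_dir}/.ssh/authorized_keys"
  done
}
```

In a final shell provisioner this might be called as remove_build_keys /root /home/ec2-user, after every other provisioner (including the InSpec run) has finished with the ssh connection.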
Wrapping it all up
Creating the structure to test our AMIs was not trivial. There are a fair number of moving pieces and small gotchas that can completely fail builds, and troubleshooting can be complicated when you run this through a build system such as Jenkins. Hopefully the method and scripts provided above can help you along your journey to automate and test with Packer and InSpec!
- HashiCorp Packer
- HashiCorp Terraform
- Chef InSpec
- Creating tested GCE images using Chef+Test Kitchen+InSpec+Packer
- Testing Packer builds with Serverspec