Vagrant is a fantastic tool for defining how virtual instances are to be run and provisioned. I’ve used Vagrant with the Chef-Solo and Ansible provisioners, and it’s helped me understand those tools and iterate quickly. There are some gotchas, however, and in this post I’ll explore a particular flaw in the way Vagrant and Ansible cooperate.
Multi-machine setup
Let’s begin by defining a Vagrant environment that we will play with (you will need VirtualBox, Vagrant and Ansible installed):
mkdir multi-vagrant-ansible
cd multi-vagrant-ansible
vagrant init
This will create a Vagrantfile in the current directory with commented contents. Let’s cut it back to the essentials and add a URL for the base box (Ubuntu Trusty is the latest LTS release, so that’s what I’ll use):
# -*- mode: ruby -*-
# vi: set ft=ruby :
Vagrant.configure(2) do |config|
  config.vm.box = "trusty64"
  config.vm.box_url = "https://cloud-images.ubuntu.com/vagrant/trusty/current/trusty-server-cloudimg-amd64-vagrant-disk1.box"
  config.vm.network "private_network", type: "dhcp"
end
If we run Vagrant now, it’ll clone that base box (downloading it first if it hasn’t already done so) and boot it up. This is already quicker than downloading an ISO, creating a new VirtualBox instance, booting that up and going through the installation procedure.
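Concretely, that boils down to:

vagrant up
vagrant ssh   # have a look around the box, then exit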
Let’s define some machines and set them up to be provisioned by Ansible. We’ll have two web servers and one load balancer, because that’s boringly conventional:
config.vm.define "ariadne" do |ariadne|
ariadne.vm.provision "ansible" do |ansible|
ansible.playbook = "loadbalancer.yml"
ansible.sudo = true
end
end
config.vm.define "minos" do |minos|
minos.vm.provision "ansible" do |ansible|
ansible.playbook = "webserver.yml"
ansible.sudo = true
end
end
config.vm.define "pasiphae" do |pasiphae|
pasiphae.vm.provision "ansible" do |ansible|
ansible.playbook = "webserver.yml"
ansible.sudo = true
end
end
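These define blocks nest inside the existing Vagrant.configure block, so the overall shape of the Vagrantfile at this point is roughly:

# -*- mode: ruby -*-
# vi: set ft=ruby :
Vagrant.configure(2) do |config|
  config.vm.box = "trusty64"
  # ... box_url and private_network as before ...

  config.vm.define "ariadne" do |ariadne|
    # ... ansible provisioner block as above ...
  end

  config.vm.define "minos" do |minos|
    # ...
  end

  config.vm.define "pasiphae" do |pasiphae|
    # ...
  end
end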
So in the above, minos and pasiphae are web servers (i.e. they will be running nginx) and ariadne is the load balancer. The locations of the ansible playbooks are relative to the Vagrantfile, so in the same directory we will create webserver.yml with the following contents:
---
- hosts: all
  tasks:
    - apt: name=nginx state=present
    - service: name=nginx state=started
Which ensures that nginx is not only installed but also running. (If you want to be explicit that nginx should come back after a reboot, add enabled=yes to the service task; on Ubuntu the package enables it on boot by default anyway.)
Now for the load balancer, loadbalancer.yml:
---
- hosts: all
  tasks:
    - apt: name=haproxy state=present
    - service: name=haproxy state=started
Which ensures haproxy is installed and running in the same way.
These two playbooks are not aware of each other; they act independently, and you could use ansible-playbook to provision any server you liked with them.
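For instance, once the machines exist you could run the web server playbook against pasiphae by hand, using the inventory and per-machine private key that Vagrant generates under .vagrant/ (something like this):

ansible-playbook -i .vagrant/provisioners/ansible/inventory/vagrant_ansible_inventory \
  -u vagrant --private-key=.vagrant/machines/pasiphae/virtualbox/private_key \
  --limit pasiphae -s webserver.yml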
If you run vagrant up at this point (assuming you’ve not done that with this Vagrantfile before), it’ll boot up new VirtualBox instances and provision them with ansible, installing the necessary software etc. All well and good so far.
Ansible facts
Ansible starts off by collecting facts about the nodes it’ll run on. It does this so that you can use information about the node in your playbooks, roles and tasks.
To see the kind of facts that ansible collects about a node, you can run ansible’s setup module like this (for the minos instance):
ansible -i .vagrant/provisioners/ansible/inventory/vagrant_ansible_inventory \
  -m setup -u vagrant \
  --private-key=.vagrant/machines/minos/virtualbox/private_key minos
The above command should print out a large JSON structure of all the facts ansible has collected about that node. Ansible facts are somewhat extensible, so they can include information gathered by the Ohai or Facter tools if those are installed.
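If you only care about one fact, the setup module also takes a filter argument, which cuts the output down considerably:

ansible -i .vagrant/provisioners/ansible/inventory/vagrant_ansible_inventory \
  -m setup -a 'filter=ansible_eth1' -u vagrant \
  --private-key=.vagrant/machines/minos/virtualbox/private_key minos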
The facts relevant to our example are under the ansible_eth1 key and they include an IPv4 address, which will come in handy in a moment.
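The interesting part of that structure looks something like this (the address will vary from machine to machine; this excerpt is trimmed for illustration):

"ansible_eth1": {
    "active": true,
    "device": "eth1",
    "ipv4": {
        "address": "172.28.128.4",
        "netmask": "255.255.255.0",
        "network": "172.28.128.0"
    },
    "type": "ether"
}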
Facts & templates
Now let’s create a template for the haproxy configuration (in templates/haproxy.cfg.j2):
{% include_code Haproxy config lang:jinja haproxy.cfg.j2 %}
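The key part of that template is a backend section that loops over the webservers group and reads each host’s eth1 address out of the gathered facts; something along these lines (the global and frontend sections are omitted here, the backend is what matters for this post):

backend web-backend
    balance roundrobin
    mode http
{% for host in groups['webservers'] %}
    server {{ host }} {{ hostvars[host]['ansible_eth1']['ipv4']['address'] }}:80 check
{% endfor %}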
We’ll also need to ensure that template gets used in the loadbalancer playbook:
---
- hosts: all
  tasks:
    - apt: name=haproxy state=present
    - service: name=haproxy state=started
    - name: Configure haproxy
      template: src=templates/haproxy.cfg.j2 dest=/etc/haproxy/haproxy.cfg
If we run this now, we’ll get a cryptic error:
fatal: [ariadne] => {'msg': "AnsibleUndefinedVariable: One or more undefined variables: 'dict object' has no attribute 'webservers'", 'failed': True}
One possible reason for this is that we haven’t defined any groups for our vagrant instances, so let’s do that now. We’ll start by defining the groups at the top of the Vagrantfile, before anything else (but after the emacs/vi mode comments):
groups = {
  "webservers" => ["minos", "pasiphae"],
  "loadbalancers" => ["ariadne"],
  "all_groups:children" => ["webservers", "loadbalancers"]
}
This corresponds to the playbooks we’ve assigned to each node in the Vagrantfile. Then we need to refer to that variable in each of our machine definitions by adding a line that says ansible.groups = groups, so the modified ariadne definition should now be:
config.vm.define "ariadne" do |ariadne|
ariadne.vm.provision "ansible" do |ansible|
ansible.playbook = "loadbalancer.yml"
ansible.sudo = true
ansible.groups = groups
end
end
If we run vagrant provision now we get a different error! Ah Ha! Progress:
fatal: [ariadne] => {'msg': "AnsibleUndefinedVariable: One or more undefined variables: 'dict object' has no attribute 'ansible_eth1'", 'failed': True}
Oh no!
It would be useful at this point to examine what we do have in that dictionary object. Maybe I mistyped the key? To do that, we can add a debug line above the haproxy configuration line in the loadbalancer.yml file, like this: - debug: var=hostvars['minos'].
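So the task list in loadbalancer.yml temporarily becomes:

---
- hosts: all
  tasks:
    - apt: name=haproxy state=present
    - service: name=haproxy state=started
    - debug: var=hostvars['minos']
    - name: Configure haproxy
      template: src=templates/haproxy.cfg.j2 dest=/etc/haproxy/haproxy.cfg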
When we run vagrant provision now, we will get the facts about minos printed in JSON to the console. It’ll look something like this:
{
    "hostvars['minos']": {
        "inventory_hostname_short": "minos",
        "inventory_hostname": "minos",
        "group_names": [
            "all_groups",
            "webservers"
        ],
        "ansible_ssh_port": 2200,
        "ansible_ssh_host": "127.0.0.1"
    }
}
Clearly, not all of the gathered facts are here. Why? The reason is that Vagrant runs provisioning separately on each virtual machine, so each ansible run is not aware of anything from any other ansible run. If you look this up online, you will find apparent answers to this problem that reconfigure vagrant to connect to all hosts when doing an ansible run. Let’s do that now.
For each ansible block in the Vagrantfile, add the line ansible.limit = 'all'. Let’s try vagrant provision again now that’s in place.
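The ariadne block, for example, now looks like this:

config.vm.define "ariadne" do |ariadne|
  ariadne.vm.provision "ansible" do |ansible|
    ansible.playbook = "loadbalancer.yml"
    ansible.sudo = true
    ansible.groups = groups
    ansible.limit = 'all'
  end
end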
The error that I get after making this change is that SSH is failing. If we add ansible.verbose = 'vvvv' to each ansible block in the Vagrantfile then, with a lot of scrolling around, we can deduce that ansible is attempting to connect to each machine in the inventory using the same private key as it would for the machine it’s currently provisioning. In other words, while provisioning ariadne, it uses the ariadne private SSH key to log on to both of the other servers. This won’t work, of course, because those SSH keys are generated by Vagrant per machine. Not only that, but the private keys live on the host machine, not on the guests, so it’s a fool’s errand. I’m not sure what kind of SSH key setup would allow ansible.limit = 'all' to work at all, but it’s hardly straightforward.
Potential workaround: Redis
The only way I’ve discovered to have ansible and Vagrant work well together is to use Fact Caching. This allows ansible to cache all facts from a node in Redis (or memcached) so that nodes can refer to each other without requiring an extra ssh connection for every node.
In order to enable fact caching, you will need Redis installed and running. Then create an ansible.cfg file in the same directory as your Vagrantfile, with the following contents:
[defaults]
fact_caching = redis
fact_caching_timeout = 86400
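If Redis isn’t already available on the machine you run vagrant from, something like this should do the trick on an Ubuntu host (ansible’s redis cache plugin also needs the redis Python package; the exact package names here are an assumption for Debian-ish systems):

sudo apt-get install redis-server
sudo pip install redis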
You will need to provision minos and pasiphae first, so that their facts are stored in Redis before provisioning ariadne (because it refers to those other nodes):
vagrant provision minos
vagrant provision pasiphae
Now that those facts have been gathered, we can run vagrant provision and it should complete without trouble this time.
Now, to verify that the haproxy config has been written as we expect, we can run vagrant ssh ariadne -- cat /etc/haproxy/haproxy.cfg and get something akin to:
backend web-backend
    balance roundrobin
    mode http
    server minos 172.28.128.4:80 check
    server pasiphae 172.28.128.5:80 check
It worked! So although fact caching is intended for use in large organisations with thousands of nodes (possibly in disparate data centres) to speed up deployment, it can also be handy for working around weaknesses in the vagrant+ansible combination.