My organization is preparing for a migration of several resources to Azure. Some colleagues are eager to use Terraform.
I have a lot of Terraform experience, but in my opinion it’s best to use it for VPC networking, subnets, and static resources, and maybe virtual machines.
That said, LXD instances (so-called “system containers”) do blur the lines between VM workload and container. And I want to continue to encourage LXD adoption.
I tried the LXD Terraform provider exactly once a couple of months ago, without much success. But this was on my local LAN, and I didn’t try too hard to be honest :). Has anyone used the LXD Terraform provider a lot on a cloud? Can you give your impressions of whether/how I should pursue it?
LXD is an ever-evolving landscape. I’d never discourage its use, but without projects support, IMO it’s worth considering whether your provider is ever going to “keep up”.
That’s an excellent point. Perhaps not a dealbreaker, though. We don’t use anything besides the default project right now. And we don’t use clustering. So we might be lucky. Also, we’re all Go devs, so perhaps we can contribute this feature.
I’m more concerned with the basic mechanics of managing a container’s lifecycle. Terraform providers are notoriously buggy. Even quasi-official ones like AWS have frustrated me in the past. A big cloud provider’s API is more of a moving target than LXD, though.
Update. This reminds me. I need to add projects support to the Bolt provider.
Sounds like you’re already prepared to argue against Terraform.
Believe me, I have major criticisms of Terraform as a tool. I am a programmer at heart, but I’ve had full-time ops roles. There’s simply no denying it’s a widely-used tool. I think Terraform gets you into trouble if you use it for the following:
Any kind of active lifecycle management beyond create/destroy
Anything involving the internals of a Linux system (use Ansible/Chef/Puppet for that!)
DNS (kind of scary, imo)
But LXD presents an interesting possibility. Containers + profiles are reasonably declarative. And if you have a private image server solution, you could imagine launching large, beefy servers with lots of containers in Terraform.
The trouble is “Day 2”. How do you evolve the system? The UX of this is provider dependent. If the provider is reliable with nice config, it can be good.
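To make the “Day 2” question concrete, here is a rough sketch of the diff a declarative tool has to compute between what you declared and what is actually running. Everything in it is hypothetical: the `Instance` and `Plan` types, the `Diff` function, and the rule that an image change means replace while a profile change means in-place update are my assumptions for illustration, not how terraform-provider-lxd actually behaves.

```go
package main

import (
	"fmt"
	"sort"
)

// Instance is a hypothetical, simplified description of an LXD container:
// the image it was launched from and the profiles applied to it.
type Instance struct {
	Image    string
	Profiles []string
}

// Plan lists the instance names affected by each kind of action.
type Plan struct {
	Create, Replace, Update, Destroy []string
}

// sameProfiles compares profile lists in order, because LXD applies
// profiles in sequence and later ones override earlier ones.
func sameProfiles(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// Diff computes the actions needed to move from actual to desired.
// Assumption: an image change forces a replace, a profile change is
// an in-place update.
func Diff(desired, actual map[string]Instance) Plan {
	var p Plan
	for name, want := range desired {
		have, ok := actual[name]
		switch {
		case !ok:
			p.Create = append(p.Create, name)
		case have.Image != want.Image:
			p.Replace = append(p.Replace, name)
		case !sameProfiles(have.Profiles, want.Profiles):
			p.Update = append(p.Update, name)
		}
	}
	for name := range actual {
		if _, ok := desired[name]; !ok {
			p.Destroy = append(p.Destroy, name)
		}
	}
	// Map iteration order is random, so sort for stable output.
	for _, s := range [][]string{p.Create, p.Replace, p.Update, p.Destroy} {
		sort.Strings(s)
	}
	return p
}

func main() {
	desired := map[string]Instance{
		"web": {Image: "ubuntu/22.04", Profiles: []string{"default", "web"}},
		"db":  {Image: "ubuntu/22.04", Profiles: []string{"default"}},
	}
	actual := map[string]Instance{
		"web": {Image: "ubuntu/22.04", Profiles: []string{"default"}},
		"old": {Image: "ubuntu/18.04", Profiles: []string{"default"}},
	}
	fmt.Printf("%+v\n", Diff(desired, actual))
}
```

How pleasant Day 2 feels depends largely on how many changes the provider can put in the `Update` bucket rather than the `Replace` bucket.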
Oh please, it’s a Saturday, don’t make me think about Active Directory / service discovery!
The only “problem” I see here is storage. We should be able to compose LXD instances the same way on Ubuntu 18.04 or Ubuntu 22.04.
cloud-config gets very close and probably works, but it requires extra dev steps to apply it to a profile and then deploy. I understand the worry; at work I haven’t mitigated this problem.
Completely agree, it’s hard to build a system that isn’t competing with Snap. Many have tried and failed.
Terraform sure has its share of bugs, but people often blame Terraform when they make random upgrades and stuff breaks. Don’t make random upgrades and it won’t break! The other case is when the cloud provider changes stuff on their side. That’s more difficult to work around, but I’d still rather provision my infra with a tool that works in any cloud than give up and start using a hyperscaler’s own tools (fewer problems, but more lock-in).
There’s an LXD provider for Nomad. Not sure if it works, but maybe it doesn’t require a lot of fixing to get it to work - https://lxd-nomad.readthedocs.io. I mention Nomad because if you have DNS and service discovery problems, Nomad and Consul may be able to solve that.
It’s OK to use Ansible after a Terraform deploy, but you can’t use it in a way that ignores the fact that certain settings mustn’t be changed from Ansible if you plan to keep managing those resources from Terraform later on.
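One way to enforce that split is a small guard that refuses a change set if it touches keys Terraform owns. This is only a sketch: the `Conflicts` helper and the idea of keeping an explicit list of Terraform-owned keys are hypothetical, and the keys in the example are just illustrative LXD config keys.

```go
package main

import (
	"fmt"
	"sort"
)

// Conflicts returns, sorted, the keys in ansibleChanges that are also
// marked as owned by Terraform. An empty result means the change set
// is safe to apply from the configuration-management side.
func Conflicts(terraformOwned map[string]bool, ansibleChanges map[string]string) []string {
	var out []string
	for key := range ansibleChanges {
		if terraformOwned[key] {
			out = append(out, key)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Hypothetical ownership list: Terraform manages resource limits.
	owned := map[string]bool{"limits.cpu": true, "limits.memory": true}

	// Hypothetical change set coming from the Ansible side.
	changes := map[string]string{
		"limits.memory":    "4GB",
		"user.app_version": "1.2.3",
	}

	if c := Conflicts(owned, changes); len(c) > 0 {
		fmt.Println("refusing to apply; Terraform owns:", c)
		return
	}
	fmt.Println("no overlap, safe to apply")
}
```

Without something like this, the next `terraform apply` will see the Ansible change as drift and quietly revert it.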
I wouldn’t personally recommend using terraform-provider-lxd in a professional setting without being willing to contribute some code. There are some rough edges with using it. As mentioned, it’s missing projects, but I’ve also run into some option guards that need updating around devices. The provider also exits with an error if a container in the state was deleted outside of Terraform.
─❯ lxc delete -f build1
─❯ terraform apply -target lxd_container.build1
╷
│ Error: Instance not found
│
│ with lxd_container.build1,
│ on config.tf.json line 55, in resource.lxd_container.build1:
│ 55: },
│
╵