Added terraform support for Exoscale (#7141)

* Added terraform support for Exoscale
* Fixed markdown lint error on exoscale terraform
2021-01-23 05:37:39 +01:00 · 2021-01-23 05:37:39 +01:00 · 404ea0270e
parent ef939dee74
commit 404ea0270e
12 changed files with 675 additions and 0 deletions
--- a/contrib/terraform/exoscale/README.md
+++ b/contrib/terraform/exoscale/README.md
@ -0,0 +1,154 @@
+# Kubernetes on Exoscale with Terraform
+
+Provision a Kubernetes cluster on [Exoscale](https://www.exoscale.com/) using Terraform and Kubespray.
+
+## Overview
+
+The setup looks like the following:
+
+```text
+                           Kubernetes cluster
+                        +-----------------------+
++---------------+       |   +--------------+    |
+|               |       |   | +--------------+  |
+| API server LB +---------> | |              |  |
+|               |       |   | | Master/etcd  |  |
++---------------+       |   | | node(s)      |  |
+                        |   +-+              |  |
+                        |     +--------------+  |
+                        |           ^           |
+                        |           |           |
+                        |           v           |
++---------------+       |   +--------------+    |
+|               |       |   | +--------------+  |
+|  Ingress LB   +---------> | |              |  |
+|               |       |   | |    Worker    |  |
++---------------+       |   | |    node(s)   |  |
+                        |   +-+              |  |
+                        |     +--------------+  |
+                        +-----------------------+
+```
+
+## Requirements
+
+* Terraform 0.13.0 or newer
+
+*0.12 also works if you modify the provider block to include version and remove all `versions.tf` files*
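+
+As a rough sketch of that 0.12 workaround, the provider block in `main.tf` could carry the version constraint that the `versions.tf` files otherwise declare. This assumes the Exoscale provider can be installed in your 0.12 setup and that you have removed the `versions.tf` files; it is an illustration, not a tested configuration.
+
+```hcl
+# Terraform 0.12 only (assumption: versions.tf files have been removed).
+# The constraint mirrors the one declared in versions.tf.
+provider "exoscale" {
+  version = ">= 0.21"
+}
+```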
+
+## Quickstart
+
+NOTE: *Assumes you are at the root of the kubespray repo*
+
+Copy the sample inventory for your cluster and copy the default terraform variables.
+
+```bash
+CLUSTER=my-exoscale-cluster
+cp -r inventory/sample inventory/$CLUSTER
+cp contrib/terraform/exoscale/default.tfvars inventory/$CLUSTER/
+cd inventory/$CLUSTER
+```
+
+Edit `default.tfvars` to match your setup:
+
+```bash
+# Ensure $EDITOR points to your favorite editor, e.g., vim, emacs, VS Code, etc.
+$EDITOR default.tfvars
+```
+
+For authentication you can use the credentials file `~/.cloudstack.ini` or `./cloudstack.ini`.
+The file should look something like this:
+
+```ini
+[cloudstack]
+key = <API key>
+secret = <API secret>
+```
+
+Follow the [Exoscale IAM Quick-start](https://community.exoscale.com/documentation/iam/quick-start/) to learn how to generate API keys.
+
+### Encrypted credentials
+
+To have the credentials encrypted at rest, you can use [sops](https://github.com/mozilla/sops) and only decrypt the credentials at runtime.
+
+```bash
+cat << EOF > cloudstack.ini
+[cloudstack]
+key =
+secret =
+EOF
+sops --encrypt --in-place --pgp <PGP key fingerprint> cloudstack.ini
+sops cloudstack.ini
+```
+
+Run Terraform to create the infrastructure:
+
+```bash
+terraform init ../../contrib/terraform/exoscale
+terraform apply -var-file default.tfvars ../../contrib/terraform/exoscale
+```
+
+If your cloudstack credentials file is encrypted using sops, run the following:
+
+```bash
+terraform init ../../contrib/terraform/exoscale
+sops exec-file --no-fifo cloudstack.ini 'CLOUDSTACK_CONFIG={} terraform apply -var-file default.tfvars ../../contrib/terraform/exoscale'
+```
+
+You should now have an inventory file named `inventory.ini` that you can use with Kubespray to set up a cluster.
+Run `terraform output` to find the IP addresses of the nodes and of the control-plane and data-plane (ingress) load balancers.
+
+It is a good idea to check that you have basic SSH connectivity to the nodes. You can do that by running:
+
+```bash
+ansible -i inventory.ini -m ping all
+```
+
+For example, to run Kubespray against this inventory with the default sample settings:
+
+```bash
+ansible-playbook -i inventory.ini ../../cluster.yml -b -v
+```
+
+## Teardown
+
+Because the Kubernetes cluster cannot create load balancers or disks on its own, Terraform manages every resource and teardown only requires `terraform destroy`:
+
+```bash
+terraform destroy -var-file default.tfvars ../../contrib/terraform/exoscale
+```
+
+## Variables
+
+### Required
+
+* `ssh_pub_key`: Path to the public SSH key to use for all machines
+* `zone`: The zone in which to run the cluster
+* `machines`: Machines to provision. The key of each entry is used as the name of the machine
+  * `node_type`: The role of this node *(master|worker)*
+  * `size`: The size to use
+  * `boot_disk`: The boot disk to use
+    * `image_name`: Name of the image
+    * `root_partition_size`: Size *(in GB)* for the root partition
+    * `ceph_partition_size`: Size *(in GB)* for the partition for Rook to use as Ceph storage. *(Set to 0 to disable)*
+    * `node_local_partition_size`: Size *(in GB)* for the partition for node-local-storage. *(Set to 0 to disable)*
+* `ssh_whitelist`: List of IP ranges (CIDR) that will be allowed to SSH to the nodes
+* `api_server_whitelist`: List of IP ranges (CIDR) that will be allowed to connect to the API server
+* `nodeport_whitelist`: List of IP ranges (CIDR) that will be allowed to connect to the Kubernetes nodes on ports 30000-32767 (Kubernetes NodePorts)
+* `inventory_file`: Where to store the generated inventory file
+
+### Optional
+
+* `prefix`: Prefix to use for all resources. It must be unique across all clusters in the same project *(Defaults to `default`)*
+
+An example variables file can be found in `default.tfvars`.
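+
+The sample file opens SSH, the API server, and the NodePort range to `0.0.0.0/0`. A more restrictive variant might look like the sketch below. The range `203.0.113.0/24` is a documentation-only placeholder and an assumption here, so replace it with the networks you actually connect from.
+
+```hcl
+# Placeholder CIDR (203.0.113.0/24 is reserved for documentation).
+# Substitute your own office or VPN ranges.
+ssh_whitelist = [
+  "203.0.113.0/24"
+]
+
+api_server_whitelist = [
+  "203.0.113.0/24"
+]
+
+nodeport_whitelist = [
+  "203.0.113.0/24"
+]
+```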
+
+## Known limitations
+
+### Only single disk
+
+Because Exoscale doesn't support attaching additional disks to an instance, this module can instead create partitions on the boot disk for [Rook](https://rook.io/) and [node-local-storage](https://kubernetes.io/docs/concepts/storage/volumes/#local).
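+
+As a sketch of how the partition sizes combine, a worker entry in `machines` could look like the following. The sizes are illustrative only. The module requests a boot disk equal to the sum of the three sizes, so this machine would get a 100 GB disk; cloud-init then creates an ext4 partition for node-local storage starting at the 50 GB mark (mounted at `/mnt/disks/node-local-storage`) and leaves the final partition unformatted for Rook.
+
+```hcl
+machines = {
+  "worker-0": {
+    "node_type": "worker",
+    "size": "Large",
+    "boot_disk": {
+      "image_name": "Linux Ubuntu 20.04 LTS 64-bit",
+      # Illustrative sizes: 50 + 20 + 30 = 100 GB boot disk
+      "root_partition_size": 50,
+      "node_local_partition_size": 20,
+      "ceph_partition_size": 30
+    }
+  }
+}
+```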
+
+### No Kubernetes API
+
+The current solution doesn't use the [Exoscale Kubernetes cloud controller](https://github.com/exoscale/exoscale-cloud-controller-manager).
+This means that we need to set up an HTTP(S) load balancer in front of all workers and set the Ingress controller to DaemonSet mode.
--- a/contrib/terraform/exoscale/default.tfvars
+++ b/contrib/terraform/exoscale/default.tfvars
@ -0,0 +1,61 @@
+prefix = "default"
+zone   = "ch-gva-2"
+
+inventory_file = "inventory.ini"
+
+ssh_pub_key = "~/.ssh/id_rsa.pub"
+
+machines = {
+ "master-0": {
+   "node_type": "master",
+   "size": "Small",
+   "boot_disk": {
+     "image_name": "Linux Ubuntu 20.04 LTS 64-bit",
+     "root_partition_size": 50,
+     "node_local_partition_size": 0,
+     "ceph_partition_size": 0
+   }
+ },
+ "worker-0": {
+   "node_type": "worker",
+   "size": "Large",
+   "boot_disk": {
+     "image_name": "Linux Ubuntu 20.04 LTS 64-bit",
+     "root_partition_size": 50,
+     "node_local_partition_size": 0,
+     "ceph_partition_size": 0
+   }
+ },
+ "worker-1": {
+   "node_type": "worker",
+   "size": "Large",
+   "boot_disk": {
+     "image_name": "Linux Ubuntu 20.04 LTS 64-bit",
+     "root_partition_size": 50,
+     "node_local_partition_size": 0,
+     "ceph_partition_size": 0
+   }
+ },
+ "worker-2": {
+   "node_type": "worker",
+   "size": "Large",
+   "boot_disk": {
+     "image_name": "Linux Ubuntu 20.04 LTS 64-bit",
+     "root_partition_size": 50,
+     "node_local_partition_size": 0,
+     "ceph_partition_size": 0
+   }
+ }
+}
+
+nodeport_whitelist = [
+  "0.0.0.0/0"
+]
+
+ssh_whitelist = [
+  "0.0.0.0/0"
+]
+
+api_server_whitelist = [
+  "0.0.0.0/0"
+]
--- a/contrib/terraform/exoscale/main.tf
+++ b/contrib/terraform/exoscale/main.tf
@ -0,0 +1,49 @@
+provider "exoscale" {}
+
+module "kubernetes" {
+  source = "./modules/kubernetes-cluster"
+
+  # Pass the zone through; otherwise the module always uses its own default
+  zone   = var.zone
+  prefix = var.prefix
+
+  machines = var.machines
+
+  ssh_pub_key = var.ssh_pub_key
+
+  ssh_whitelist        = var.ssh_whitelist
+  api_server_whitelist = var.api_server_whitelist
+  nodeport_whitelist   = var.nodeport_whitelist
+}
+
+#
+# Generate ansible inventory
+#
+
+data "template_file" "inventory" {
+  template = file("${path.module}/templates/inventory.tpl")
+
+  vars = {
+    connection_strings_master = join("\n", formatlist("%s ansible_user=ubuntu ansible_host=%s ip=%s etcd_member_name=etcd%d",
+                                        keys(module.kubernetes.master_ip_addresses),
+                                        values(module.kubernetes.master_ip_addresses).*.public_ip,
+                                        values(module.kubernetes.master_ip_addresses).*.private_ip,
+                                        range(1, length(module.kubernetes.master_ip_addresses) + 1)))
+    connection_strings_worker = join("\n", formatlist("%s ansible_user=ubuntu ansible_host=%s ip=%s",
+                                        keys(module.kubernetes.worker_ip_addresses),
+                                        values(module.kubernetes.worker_ip_addresses).*.public_ip,
+                                        values(module.kubernetes.worker_ip_addresses).*.private_ip))
+
+    list_master               = join("\n", keys(module.kubernetes.master_ip_addresses))
+    list_worker               = join("\n", keys(module.kubernetes.worker_ip_addresses))
+    api_lb_ip_address         = module.kubernetes.control_plane_lb_ip_address
+  }
+}
+
+resource "null_resource" "inventories" {
+  provisioner "local-exec" {
+    command = "echo '${data.template_file.inventory.rendered}' > ${var.inventory_file}"
+  }
+
+  triggers = {
+    template = data.template_file.inventory.rendered
+  }
+}
--- a/contrib/terraform/exoscale/modules/kubernetes-cluster/main.tf
+++ b/contrib/terraform/exoscale/modules/kubernetes-cluster/main.tf
@ -0,0 +1,198 @@
+data "exoscale_compute_template" "os_image" {
+  for_each = var.machines
+
+  zone = var.zone
+  name = each.value.boot_disk.image_name
+}
+
+data "exoscale_compute" "master_nodes" {
+  for_each = exoscale_compute.master
+
+  id = each.value.id
+
+  # Since private IP address is not assigned until the nics are created we need this
+  depends_on = [exoscale_nic.master_private_network_nic]
+}
+
+data "exoscale_compute" "worker_nodes" {
+  for_each = exoscale_compute.worker
+
+  id = each.value.id
+
+  # Since private IP address is not assigned until the nics are created we need this
+  depends_on = [exoscale_nic.worker_private_network_nic]
+}
+
+resource "exoscale_network" "private_network" {
+  zone = var.zone
+  name = "${var.prefix}-network"
+
+  start_ip = cidrhost(var.private_network_cidr, 1)
+  # cidr -1 = Broadcast address
+  # cidr -2 = DHCP server address (exoscale specific)
+  end_ip  = cidrhost(var.private_network_cidr, -3)
+  netmask = cidrnetmask(var.private_network_cidr)
+}
+
+resource "exoscale_compute" "master" {
+  for_each = {
+    for name, machine in var.machines :
+    name => machine
+    if machine.node_type == "master"
+  }
+
+  display_name    = "${var.prefix}-${each.key}"
+  template_id     = data.exoscale_compute_template.os_image[each.key].id
+  size            = each.value.size
+  disk_size       = each.value.boot_disk.root_partition_size + each.value.boot_disk.node_local_partition_size + each.value.boot_disk.ceph_partition_size
+  key_pair        = exoscale_ssh_keypair.ssh_key.name
+  state           = "Running"
+  zone            = var.zone
+  security_groups = [exoscale_security_group.master_sg.name]
+
+  user_data = templatefile(
+    "${path.module}/templates/cloud-init.tmpl",
+    {
+      eip_ip_address            = exoscale_ipaddress.ingress_controller_lb.ip_address
+      node_local_partition_size = each.value.boot_disk.node_local_partition_size
+      ceph_partition_size       = each.value.boot_disk.ceph_partition_size
+      root_partition_size       = each.value.boot_disk.root_partition_size
+      node_type                 = "master"
+    }
+  )
+}
+
+resource "exoscale_compute" "worker" {
+  for_each = {
+    for name, machine in var.machines :
+    name => machine
+    if machine.node_type == "worker"
+  }
+
+  display_name    = "${var.prefix}-${each.key}"
+  template_id     = data.exoscale_compute_template.os_image[each.key].id
+  size            = each.value.size
+  disk_size       = each.value.boot_disk.root_partition_size + each.value.boot_disk.node_local_partition_size + each.value.boot_disk.ceph_partition_size
+  key_pair        = exoscale_ssh_keypair.ssh_key.name
+  state           = "Running"
+  zone            = var.zone
+  security_groups = [exoscale_security_group.worker_sg.name]
+
+  user_data = templatefile(
+    "${path.module}/templates/cloud-init.tmpl",
+    {
+      eip_ip_address            = exoscale_ipaddress.ingress_controller_lb.ip_address
+      node_local_partition_size = each.value.boot_disk.node_local_partition_size
+      ceph_partition_size       = each.value.boot_disk.ceph_partition_size
+      root_partition_size       = each.value.boot_disk.root_partition_size
+      node_type                 = "worker"
+    }
+  )
+}
+
+resource "exoscale_nic" "master_private_network_nic" {
+  for_each = exoscale_compute.master
+
+  compute_id = each.value.id
+  network_id = exoscale_network.private_network.id
+}
+
+resource "exoscale_nic" "worker_private_network_nic" {
+  for_each = exoscale_compute.worker
+
+  compute_id = each.value.id
+  network_id = exoscale_network.private_network.id
+}
+
+resource "exoscale_security_group" "master_sg" {
+  name        = "${var.prefix}-master-sg"
+  description = "Security group for Kubernetes masters"
+}
+
+resource "exoscale_security_group_rules" "master_sg_rules" {
+  security_group_id = exoscale_security_group.master_sg.id
+
+  # SSH
+  ingress {
+    protocol  = "TCP"
+    cidr_list = var.ssh_whitelist
+    ports     = ["22"]
+  }
+
+  # Kubernetes API
+  ingress {
+    protocol  = "TCP"
+    cidr_list = var.api_server_whitelist
+    ports     = ["6443"]
+  }
+}
+
+resource "exoscale_security_group" "worker_sg" {
+  name        = "${var.prefix}-worker-sg"
+  description = "security group for kubernetes worker nodes"
+}
+
+resource "exoscale_security_group_rules" "worker_sg_rules" {
+  security_group_id = exoscale_security_group.worker_sg.id
+
+  # SSH
+  ingress {
+    protocol  = "TCP"
+    cidr_list = var.ssh_whitelist
+    ports     = ["22"]
+  }
+
+  # HTTP(S)
+  ingress {
+    protocol  = "TCP"
+    cidr_list = ["0.0.0.0/0"]
+    ports     = ["80", "443"]
+  }
+
+  # Kubernetes Nodeport
+  ingress {
+    protocol  = "TCP"
+    cidr_list = var.nodeport_whitelist
+    ports     = ["30000-32767"]
+  }
+}
+
+resource "exoscale_ipaddress" "ingress_controller_lb" {
+  zone                     = var.zone
+  healthcheck_mode         = "http"
+  healthcheck_port         = 80
+  healthcheck_path         = "/healthz"
+  healthcheck_interval     = 10
+  healthcheck_timeout      = 2
+  healthcheck_strikes_ok   = 2
+  healthcheck_strikes_fail = 3
+}
+
+resource "exoscale_secondary_ipaddress" "ingress_controller_lb" {
+  for_each = exoscale_compute.worker
+
+  compute_id = each.value.id
+  ip_address = exoscale_ipaddress.ingress_controller_lb.ip_address
+}
+
+resource "exoscale_ipaddress" "control_plane_lb" {
+  zone                     = var.zone
+  healthcheck_mode         = "tcp"
+  healthcheck_port         = 6443
+  healthcheck_interval     = 10
+  healthcheck_timeout      = 2
+  healthcheck_strikes_ok   = 2
+  healthcheck_strikes_fail = 3
+}
+
+resource "exoscale_secondary_ipaddress" "control_plane_lb" {
+  for_each = exoscale_compute.master
+
+  compute_id = each.value.id
+  ip_address = exoscale_ipaddress.control_plane_lb.ip_address
+}
+
+resource "exoscale_ssh_keypair" "ssh_key" {
+  name       = "${var.prefix}-ssh-key"
+  public_key = trimspace(file(pathexpand(var.ssh_pub_key)))
+}
--- a/contrib/terraform/exoscale/modules/kubernetes-cluster/output.tf
+++ b/contrib/terraform/exoscale/modules/kubernetes-cluster/output.tf
@ -0,0 +1,31 @@
+output "master_ip_addresses" {
+  value = {
+    for key, instance in exoscale_compute.master :
+    instance.name => {
+      "private_ip" = contains(keys(data.exoscale_compute.master_nodes), key) ? data.exoscale_compute.master_nodes[key].private_network_ip_addresses[0] : ""
+      "public_ip"  = exoscale_compute.master[key].ip_address
+    }
+  }
+}
+
+output "worker_ip_addresses" {
+  value = {
+    for key, instance in exoscale_compute.worker :
+    instance.name => {
+      "private_ip" = contains(keys(data.exoscale_compute.worker_nodes), key) ? data.exoscale_compute.worker_nodes[key].private_network_ip_addresses[0] : ""
+      "public_ip"  = exoscale_compute.worker[key].ip_address
+    }
+  }
+}
+
+output "cluster_private_network_cidr" {
+  value = var.private_network_cidr
+}
+
+output "ingress_controller_lb_ip_address" {
+  value = exoscale_ipaddress.ingress_controller_lb.ip_address
+}
+
+output "control_plane_lb_ip_address" {
+  value = exoscale_ipaddress.control_plane_lb.ip_address
+}
--- a/contrib/terraform/exoscale/modules/kubernetes-cluster/templates/cloud-init.tmpl
+++ b/contrib/terraform/exoscale/modules/kubernetes-cluster/templates/cloud-init.tmpl
@ -0,0 +1,38 @@
+#cloud-config
+%{ if ceph_partition_size > 0 || node_local_partition_size > 0}
+bootcmd:
+- [ cloud-init-per, once, move-second-header, sgdisk, --move-second-header, /dev/vda ]
+%{ if node_local_partition_size > 0 }
+  # Create partition for node local storage
+- [ cloud-init-per, once, create-node-local-part, parted, --script, /dev/vda, 'mkpart extended ext4 ${root_partition_size}GB %{ if ceph_partition_size == 0 }-1%{ else }${root_partition_size + node_local_partition_size}GB%{ endif }' ]
+- [ cloud-init-per, once, create-fs-node-local-part, mkfs.ext4, /dev/vda2 ]
+%{ endif }
+%{ if ceph_partition_size > 0 }
+  # Create partition for rook to use for ceph
+- [ cloud-init-per, once, create-ceph-part, parted, --script, /dev/vda, 'mkpart extended ${root_partition_size + node_local_partition_size}GB -1' ]
+%{ endif }
+%{ endif }
+
+write_files:
+  - path: /etc/netplan/eth1.yaml
+    content: |
+      network:
+        version: 2
+        ethernets:
+          eth1:
+            dhcp4: true
+runcmd:
+  - netplan apply
+  - /sbin/sysctl net.ipv4.conf.all.forwarding=1
+%{ if node_type == "worker" }
+  # TODO: When a VM is seen as healthy and is added to the EIP loadbalancer
+  #       pool it no longer can send traffic back to itself via the EIP IP
+  #       address.
+  #       Remove this if it ever gets solved.
+  - iptables -t nat -A PREROUTING -d ${eip_ip_address} -j DNAT --to 127.0.0.1
+%{ endif }
+%{ if node_local_partition_size > 0 }
+  - mkdir -p /mnt/disks/node-local-storage
+  - chown nobody:nogroup /mnt/disks/node-local-storage
+  - mount /dev/vda2 /mnt/disks/node-local-storage
+%{ endif }
--- a/contrib/terraform/exoscale/modules/kubernetes-cluster/variables.tf
+++ b/contrib/terraform/exoscale/modules/kubernetes-cluster/variables.tf
@ -0,0 +1,40 @@
+variable "zone" {
+  type = string
+  # This is currently the only zone expected to support
+  # so-called "managed private networks".
+  # See: https://www.exoscale.com/syslog/introducing-managed-private-networks
+  default = "ch-gva-2"
+}
+
+variable "prefix" {}
+
+variable "machines" {
+  type = map(object({
+    node_type = string
+    size      = string
+    boot_disk = object({
+      image_name                = string
+      root_partition_size       = number
+      ceph_partition_size       = number
+      node_local_partition_size = number
+    })
+  }))
+}
+
+variable "ssh_pub_key" {}
+
+variable "ssh_whitelist" {
+  type = list(string)
+}
+
+variable "api_server_whitelist" {
+  type = list(string)
+}
+
+variable "nodeport_whitelist" {
+  type = list(string)
+}
+
+variable "private_network_cidr" {
+  default = "172.0.10.0/24"
+}
--- a/contrib/terraform/exoscale/modules/kubernetes-cluster/versions.tf
+++ b/contrib/terraform/exoscale/modules/kubernetes-cluster/versions.tf
@ -0,0 +1,9 @@
+terraform {
+  required_providers {
+    exoscale = {
+      source  = "exoscale/exoscale"
+      version = ">= 0.21"
+    }
+  }
+  required_version = ">= 0.13"
+}
--- a/contrib/terraform/exoscale/output.tf
+++ b/contrib/terraform/exoscale/output.tf
@ -0,0 +1,15 @@
+output "master_ips" {
+  value = module.kubernetes.master_ip_addresses
+}
+
+output "worker_ips" {
+  value = module.kubernetes.worker_ip_addresses
+}
+
+output "ingress_controller_lb_ip_address" {
+  value = module.kubernetes.ingress_controller_lb_ip_address
+}
+
+output "control_plane_lb_ip_address" {
+  value = module.kubernetes.control_plane_lb_ip_address
+}
--- a/contrib/terraform/exoscale/templates/inventory.tpl
+++ b/contrib/terraform/exoscale/templates/inventory.tpl
@ -0,0 +1,19 @@
+[all]
+${connection_strings_master}
+${connection_strings_worker}
+
+[kube-master]
+${list_master}
+
+[kube-master:vars]
+supplementary_addresses_in_ssl_keys = [ "${api_lb_ip_address}" ]
+
+[etcd]
+${list_master}
+
+[kube-node]
+${list_worker}
+
+[k8s-cluster:children]
+kube-master
+kube-node
--- a/contrib/terraform/exoscale/variables.tf
+++ b/contrib/terraform/exoscale/variables.tf
@ -0,0 +1,46 @@
+variable "zone" {
+  description = "The zone where to run the cluster"
+}
+
+variable "prefix" {
+  description = "Prefix for resource names"
+  default     = "default"
+}
+
+variable "machines" {
+  description = "Cluster machines"
+  type = map(object({
+    node_type = string
+    size      = string
+    boot_disk = object({
+      image_name                = string
+      root_partition_size       = number
+      ceph_partition_size       = number
+      node_local_partition_size = number
+    })
+  }))
+}
+
+variable "ssh_pub_key" {
+  description = "Path to public SSH key file which is injected into the VMs."
+  type        = string
+}
+
+variable "ssh_whitelist" {
+  description = "List of IP ranges (CIDR) to whitelist for ssh"
+  type        = list(string)
+}
+
+variable "api_server_whitelist" {
+  description = "List of IP ranges (CIDR) to whitelist for kubernetes api server"
+  type        = list(string)
+}
+
+variable "nodeport_whitelist" {
+  description = "List of IP ranges (CIDR) to whitelist for kubernetes nodeports"
+  type        = list(string)
+}
+
+variable "inventory_file" {
+  description = "Where to store the generated inventory file"
+}
--- a/contrib/terraform/exoscale/versions.tf
+++ b/contrib/terraform/exoscale/versions.tf
@ -0,0 +1,15 @@
+terraform {
+  required_providers {
+    exoscale = {
+      source  = "exoscale/exoscale"
+      version = ">= 0.21"
+    }
+    null = {
+      source = "hashicorp/null"
+    }
+    template = {
+      source = "hashicorp/template"
+    }
+  }
+  required_version = ">= 0.13"
+}