# Kubernetes on Equinix Metal with Terraform
Provision a Kubernetes cluster with [Terraform](https://www.terraform.io) on
[Equinix Metal](https://metal.equinix.com) ([formerly Packet](https://blog.equinix.com/blog/2020/10/06/equinix-metal-metal-and-more/)).
## Status
This will install a Kubernetes cluster on Equinix Metal. It should work in all locations and on most server types.
## Approach
The Terraform configuration reads the variables defined in
[variables.tf](variables.tf) to create resources in your Equinix Metal project.
A [Python script](../terraform.py) then reads the generated `.tfstate`
file and produces a dynamic inventory, which [cluster.yml](../../../cluster.yml)
consumes to install Kubernetes with Kubespray.
### Kubernetes Nodes
You can create many different Kubernetes topologies by setting the number of
hosts in each of the following classes:
- Master nodes with etcd: `number_of_k8s_masters` variable
- Master nodes without etcd: `number_of_k8s_masters_no_etcd` variable
- Standalone etcd hosts: `number_of_etcd` variable
- Kubernetes worker nodes: `number_of_k8s_nodes` variable

Note that the Ansible playbook rejects any configuration that ends up with an
*even number* of etcd instances. The count includes standalone etcd nodes as well as
master nodes with etcd replicas. For example, three master nodes with etcd replicas
plus three standalone etcd nodes give six etcd replicas in total, so the run will fail.
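
The parity rule can be checked before provisioning. The following is a minimal sketch and not part of Kubespray: `etcd_count_is_valid` is a hypothetical helper that takes the values you plan to use for `number_of_k8s_masters` and `number_of_etcd` and succeeds only when their sum is odd.

```shell
# Hypothetical pre-flight helper (not shipped with Kubespray).
# $1: planned number_of_k8s_masters (masters that also run etcd)
# $2: planned number_of_etcd (standalone etcd hosts)
etcd_count_is_valid() {
    total=$(( $1 + $2 ))
    # Reject an empty etcd cluster as well as any even member count.
    [ "$total" -gt 0 ] && [ $(( total % 2 )) -eq 1 ]
}

if etcd_count_is_valid 3 0; then
    echo "3 + 0 etcd members: accepted"
fi
if ! etcd_count_is_valid 3 3; then
    echo "3 + 3 etcd members: rejected (even total)"
fi
```

Masters created through `number_of_k8s_masters_no_etcd` are left out on purpose because they do not run an etcd replica.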
## Requirements
- [Install Terraform](https://www.terraform.io/intro/getting-started/install.html)
- [Install Ansible dependencies](/docs/ansible/ansible.md#installing-ansible)
- Account with Equinix Metal
- An SSH key pair
## SSH Key Setup
An SSH key pair is required so that Ansible can access the newly provisioned nodes (Equinix Metal hosts). By default, the public SSH key defined in `cluster.tfvars` (`~/.ssh/id_rsa.pub`) is installed in `authorized_keys` on the newly provisioned nodes. Terraform uploads this public key, and it is then distributed to all the nodes. If you have already added this public key to Equinix Metal (for example, via the portal), set the public key file name in `cluster.tfvars` to blank so that Terraform does not upload a duplicate key, which would cause an error.

If you don't already have a key pair (`~/.ssh/id_rsa` and `~/.ssh/id_rsa.pub`), you can generate one with the command:
```ShellSession
ssh-keygen -f ~/.ssh/id_rsa
```
## Terraform
Terraform will be used to provision all of the Equinix Metal resources with base software as appropriate.
### Configuration
#### Inventory files
Create an inventory directory for your cluster by copying the existing sample and linking the `hosts` script (used to build the inventory based on Terraform state):
```ShellSession
cp -LRp contrib/terraform/equinix/sample-inventory inventory/$CLUSTER
cd inventory/$CLUSTER
ln -s ../../contrib/terraform/equinix/hosts
```
This will be the base for subsequent Terraform commands.
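
The commands in this guide refer to `$CLUSTER` but never define it, so set it in your shell before running the `cp` command above. A minimal sketch, assuming `example` as a placeholder cluster name:

```shell
# "example" is a placeholder; pick any name for your cluster.
# An already exported CLUSTER value is kept as-is.
export CLUSTER="${CLUSTER:-example}"

# Show the inventory directory the rest of this guide will use.
echo "Inventory directory: inventory/$CLUSTER"
```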
#### Equinix Metal API access
Your Equinix Metal API key must be available in the `METAL_AUTH_TOKEN` environment variable.
This key is typically stored outside of the code repo since it is considered secret.
If someone gets this key, they can start up or shut down hosts in your project!

For more information on how to generate an API key or find your project ID, please see
[Accounts Index](https://metal.equinix.com/developers/docs/accounts/).

The Equinix Metal Project ID associated with the key will be set later in `cluster.tfvars`.

For more information about the API, please see [Equinix Metal API](https://metal.equinix.com/developers/api/).

For more information about Terraform provider authentication, please see [the equinix provider documentation](https://registry.terraform.io/providers/equinix/equinix/latest/docs).

Example:
```ShellSession
export METAL_AUTH_TOKEN="Example-API-Token"
```
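
Because the token lives only in the environment, it is easy to start a Terraform run from a shell where it was never exported. The sketch below shows one way to fail early; `require_metal_token` is a hypothetical helper, not something Kubespray or the provider ships.

```shell
# Hypothetical guard: returns non-zero when METAL_AUTH_TOKEN is unset or empty.
require_metal_token() {
    if [ -z "${METAL_AUTH_TOKEN:-}" ]; then
        echo "METAL_AUTH_TOKEN is not set; export your Equinix Metal API key first." >&2
        return 1
    fi
}

if require_metal_token; then
    echo "API token found in the environment"
fi
```

The helper only checks that the variable is non-empty. It does not validate the key against the Equinix Metal API.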
Note that to deploy several clusters within the same project you need to use [terraform workspace](https://www.terraform.io/docs/state/workspaces.html#using-workspaces).
#### Cluster variables
The construction of the cluster is driven by values found in
[variables.tf](variables.tf).

For your cluster, edit `inventory/$CLUSTER/cluster.tfvars`.
The `cluster_name` is used to set a tag on each server deployed as part of this cluster.
This helps when identifying which hosts are associated with each cluster.
While the defaults in `variables.tf` will successfully deploy a cluster, it is recommended to set the following values:

- `cluster_name` = the name of the inventory directory created above as `$CLUSTER`
- `equinix_metal_project_id` = the Equinix Metal Project ID associated with the Equinix Metal API token above
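
A `cluster.tfvars` along the following lines covers the recommended values together with the node counts described under Kubernetes Nodes. Treat it as a sketch: the project ID is a placeholder, the counts are only illustrative, and `write_sample_tfvars` is a hypothetical helper used here to keep the example self-contained. Check [variables.tf](variables.tf) for the authoritative variable names and defaults.

```shell
# Hypothetical helper: writes an illustrative cluster.tfvars to the given path.
write_sample_tfvars() {
    cat > "$1" <<'EOF'
# Name of the inventory directory created above as $CLUSTER
cluster_name = "example"

# Placeholder; use the project ID tied to your API token
equinix_metal_project_id = "Example-Project-ID"

# Three masters with etcd keep the etcd member count odd
number_of_k8s_masters         = 3
number_of_k8s_masters_no_etcd = 0
number_of_etcd                = 0
number_of_k8s_nodes           = 2
EOF
}

# Write to a scratch file so an existing cluster.tfvars is not overwritten.
write_sample_tfvars ./cluster.tfvars.sample
```

Compare the scratch file with `inventory/$CLUSTER/cluster.tfvars` and copy over only the values you want to change.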
#### Enable localhost access
Kubespray can pull down a Kubernetes configuration file for accessing this cluster if you set
`kubeconfig_localhost: true` in the Kubespray configuration.

Edit `inventory/$CLUSTER/group_vars/k8s_cluster/k8s_cluster.yml`, uncomment the following line, and change `false` to `true`:

`# kubeconfig_localhost: false`

becomes:

`kubeconfig_localhost: true`

Once the Kubespray playbooks are run, a Kubernetes configuration file will be written to the local host at `inventory/$CLUSTER/artifacts/admin.conf`.
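
If you prefer to script this edit, the sketch below does the same with `sed`. `enable_kubeconfig_localhost` is a hypothetical helper name, and the pattern assumes the setting sits on its own line starting in the first column, optionally commented out with `#`.

```shell
# Hypothetical helper: uncomment kubeconfig_localhost and set it to true.
# $1: path to k8s_cluster.yml
enable_kubeconfig_localhost() {
    tmp="$1.tmp"
    # Rewrite the line whether or not it is commented out, whatever its value.
    # A temporary file is used because in-place sed flags differ between platforms.
    sed -e 's/^#* *kubeconfig_localhost:.*/kubeconfig_localhost: true/' "$1" > "$tmp" &&
        mv "$tmp" "$1"
}
```

Call it as `enable_kubeconfig_localhost "inventory/$CLUSTER/group_vars/k8s_cluster/k8s_cluster.yml"` and review the result with `git diff` before running the playbooks.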
#### Terraform state files
In the cluster's inventory folder, the following files might be created (either by Terraform
or manually). To prevent you from pushing them accidentally, they are listed in a
`.gitignore` file in the `contrib/terraform/equinix` directory:
- `.terraform`
- `.tfvars`
- `.tfstate`
- `.tfstate.backup`
- `.lock.hcl`

You can still add them to the repository manually if you want to.
### Initialization
Before Terraform can operate on your cluster you need to install the required
plugins. This is accomplished as follows:
```ShellSession
cd inventory/$CLUSTER
terraform -chdir=../../contrib/terraform/equinix init -var-file=$PWD/cluster.tfvars
```
This should finish fairly quickly telling you Terraform has successfully initialized and loaded necessary modules.
### Provisioning cluster
You can apply the Terraform configuration to your cluster with the following command
issued from your cluster's inventory directory (`inventory/$CLUSTER`):
```ShellSession
terraform -chdir=../../contrib/terraform/equinix apply -var-file=$PWD/cluster.tfvars
export ANSIBLE_HOST_KEY_CHECKING=False
ansible-playbook -i hosts ../../cluster.yml
```
### Destroying cluster
You can destroy your new cluster with the following command issued from the cluster's inventory directory:
```ShellSession
terraform -chdir=../../contrib/terraform/equinix destroy -var-file=$PWD/cluster.tfvars
```
If you've started the Ansible run, it may also be a good idea to do some manual cleanup:
- Remove SSH keys from the destroyed cluster from your `~/.ssh/known_hosts` file
- Clean up any temporary cache files: `rm /tmp/$CLUSTER-*`
### Debugging
You can enable debugging output from Terraform by setting `TF_LOG` to `DEBUG` before running the Terraform command.
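
As a sketch, the variables can be exported once for the current shell session. `TF_LOG_PATH` is a second Terraform environment variable that writes the log to a file instead of the terminal; the file name used here is only an example.

```shell
# TF_LOG selects the verbosity of Terraform's log output.
export TF_LOG=DEBUG
# Optional: send the log to a file. An absolute path avoids surprises
# when Terraform is run with -chdir. The file name is an example.
export TF_LOG_PATH="$PWD/terraform-debug.log"

# Now run Terraform as usual, for instance the apply command shown
# under "Provisioning cluster".
```

Run `unset TF_LOG TF_LOG_PATH` to return to normal output.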
## Ansible
### Node access
#### SSH
Ensure your local ssh-agent is running and your SSH key has been added. This
step is required by the Terraform provisioner:
```ShellSession
eval $(ssh-agent -s)
ssh-add ~/.ssh/id_rsa
```
If you have deployed and destroyed a previous iteration of your cluster, you will need to clear out any stale keys from your SSH "known hosts" file (`~/.ssh/known_hosts`).
#### Test access
Make sure you can connect to the hosts. Note that hosts running Flatcar Container Linux by Kinvolk will report the state `FAILED` because Python is not present. This is okay as long as the hosts are not `UNREACHABLE`, because Python will be installed during bootstrapping.
```ShellSession
$ ansible -i inventory/$CLUSTER/hosts -m ping all
example-k8s_node-1 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
example-etcd-1 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
example-k8s-master-1 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
```
If it fails, try to connect manually via SSH. It could be something as simple as a stale host key.
### Deploy Kubernetes
```ShellSession
ansible-playbook --become -i inventory/$CLUSTER/hosts cluster.yml
```
This will take some time as there are many tasks to run.
## Kubernetes
### Set up kubectl
- [Install kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) on the localhost.
- Verify that kubectl runs correctly
```ShellSession
kubectl version
```
- Verify that the Kubernetes configuration file has been copied over
```ShellSession
cat inventory/$CLUSTER/artifacts/admin.conf
```
- Verify that all the nodes are running correctly.
```ShellSession
kubectl version
kubectl --kubeconfig=inventory/$CLUSTER/artifacts/admin.conf get nodes
```
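
To avoid passing `--kubeconfig` on every call, you can point kubectl at the generated file through the `KUBECONFIG` environment variable. A minimal sketch, assuming the current directory is the root of the Kubespray checkout and using `example` only as a fallback when `CLUSTER` is not set:

```shell
# Assumes the shell is in the root of the Kubespray checkout.
# "example" is a fallback placeholder for an unset CLUSTER.
export KUBECONFIG="$PWD/inventory/${CLUSTER:-example}/artifacts/admin.conf"

# With KUBECONFIG exported, plain kubectl commands target the new cluster:
#   kubectl get nodes
```

The setting lasts only for the current shell session, so your default `~/.kube/config` is left untouched.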
## What's next
Try out your new Kubernetes cluster with the [Hello Kubernetes service](https://kubernetes.io/docs/tasks/access-application-cluster/service-access-application-cluster/).