Large deployments of K8s
========================

For large-scale deployments, consider the following configuration changes:

* Tune the `forks` and `timeout`
  [Ansible settings](https://docs.ansible.com/ansible/latest/intro_configuration.html)
  to fit the large number of nodes being deployed.

* Override the containers' `foo_image_repo` vars to point to an intranet registry.

* Set ``download_run_once: true`` and/or ``download_localhost: true``.
  See [Downloading binaries and containers](/docs/advanced/downloads.md) for details.

* Adjust the `retry_stagger` global var as appropriate. It should keep the
  load on the delegate (the first K8s control plane node) reasonable when
  retrying failed push or download operations.

* Tune parameters for DNS-related applications.
  Those are ``dns_replicas``, ``dns_cpu_limit``,
  ``dns_cpu_requests``, ``dns_memory_limit``, ``dns_memory_requests``.
  Please note that limits must always be greater than or equal to requests.

* Tune CPU/memory limits and requests. Those are located in the roles' defaults
  and are named like ``foo_memory_limit``, ``foo_memory_requests``,
  ``foo_cpu_limit`` and ``foo_cpu_requests``. Note that when these values are
  applied to ``docker run``, K8s 'Mi' memory units are submitted as 'M' and
  the 'm' suffix of K8s CPU units is dropped. This is required because
  docker does not understand K8s units well.

* Tune ``kubelet_status_update_frequency`` to increase reliability of kubelet,
  and tune
  ``kube_controller_node_monitor_grace_period``,
  ``kube_controller_node_monitor_period``,
  ``kube_apiserver_pod_eviction_not_ready_timeout_seconds`` and
  ``kube_apiserver_pod_eviction_unreachable_timeout_seconds`` for better Kubernetes reliability.
  See [Kubernetes Reliability](/docs/advanced/kubernetes-reliability.md) for details.

* Tune network prefix sizes. Those are ``kube_network_node_prefix``,
  ``kube_service_addresses`` and ``kube_pods_subnet``.

* Add calico_rr nodes if you are deploying with Calico or Canal. Nodes recover
  from host/network interruptions much more quickly with calico_rr.

* Check out the
  [Inventory](/docs/getting_started/getting-started.md#building-your-own-inventory)
  section of the Getting started guide for tips on creating a large-scale
  Ansible inventory.

* Set ``etcd_events_cluster_setup: true`` to store events in a separate,
  dedicated etcd instance.
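
The network prefix item in the list above comes down to simple arithmetic,
which the following sketch illustrates. The variable names come from the list;
the CIDR values are illustrative assumptions, not recommendations, and should
be chosen to fit your own address plan.

```yaml
# Illustrative values only (assumptions, not recommendations).

# Service network: a /18 holds 2^(32 - 18) = 16384 addresses.
kube_service_addresses: 10.233.0.0/18

# Pod network: must not overlap with the service network.
kube_pods_subnet: 10.234.0.0/16

# Each node gets one block of this prefix size out of kube_pods_subnet.
# With a /16 pod network and /24 node blocks:
#   2^(24 - 16) = 256 node blocks, so at most 256 nodes
#   each /24 block holds 256 addresses, which bounds the pods per node
kube_network_node_prefix: 24
```

If the node count is expected to grow past the number of available node blocks,
a wider pod subnet or a longer node prefix would be needed.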

For example, when deploying 200 nodes, you may want to run Ansible with
``--forks=50`` and ``--timeout=600``, and set ``retry_stagger: 60``.
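
As a rough sketch, the ``retry_stagger`` part of this example, together with
some of the overrides listed earlier, could be collected in an inventory
group_vars file. The file path is hypothetical, and every value other than
``retry_stagger: 60`` is an illustrative assumption. ``--forks`` and
``--timeout`` are command-line options and are not set here.

```yaml
# Hypothetical file: inventory/mycluster/group_vars/all/all.yml

# From the 200-node example above.
retry_stagger: 60

# Download settings from the list above (assumed choice for this sketch).
download_run_once: true

# Store events in a separate, dedicated etcd instance.
etcd_events_cluster_setup: true

# DNS sizing (assumed values); limits must be >= requests.
dns_replicas: 3
dns_cpu_requests: 100m
dns_cpu_limit: 300m
dns_memory_requests: 70Mi
dns_memory_limit: 300Mi
```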