Commit Graph

178 Commits (605738757d3abfc02ecf3755b43be3c94d5aee97)

Author SHA1 Message Date
Anton Nerozya 1fedbded62 ignore_errors instead of failed_when: false 2017-06-29 20:15:14 +02:00
gdmelloatpoints 649654207f mount the etcd data directory in the container with the same path as on the host. 2017-06-27 09:29:47 -04:00
gdmelloatpoints 3123502f4c move `etcd_backup_prefix` to new home. 2017-06-27 09:12:34 -04:00
gdmelloatpoints 4ba237c5d8 Make etcd_backup_prefix configurable. Ensures that backups can be stored on a different location other than ${HOST}/var/backups, say an EBS volume on AWS. 2017-06-26 09:42:30 -04:00
gdmelloatpoints 5c1891ec9f In the etcd container, the etcd data directory is always /var/lib/etcd. Reverting to this value, since `etcd_data_dir` on the host maps to `/var/lib/etcd` in the container. 2017-06-23 13:49:31 -04:00
Gregory Storme fff0aec720 add configurable parameter for etcd_auto_compaction_retention 2017-06-14 10:39:38 +02:00
Brad Beam db3e8edacd Fixing up vault variables 2017-06-08 16:15:33 -05:00
Matthew Mosesohn ae7f59e249 Skip vault cert task evaluation completely when using script cert generation 2017-04-13 19:29:07 +03:00
Matthew Mosesohn 798f90c4d5 Merge pull request #1153 from mattymo/graceful_drain
Move graceful upgrade test to Ubuntu canal HA, adjust drain
2017-04-04 17:33:53 +03:00
Aleksandr Didenko 58acbe7caf Fix multiline when condition in sync_certs task
Folded style in multiline 'when' condition causes error with
unexpected ident. Changing it to literal style should fix
the issue.

Closes #1190
2017-03-30 22:21:04 +02:00
Matthew Mosesohn fb467df47c fix etcd restart 2017-03-29 23:22:49 +04:00
Sergii Golovatiuk f144fd1ed3 Refactor etcd role
- Run docker run from script rather than directly from systemd target
- Refactoring styling/templates

Signed-off-by: Sergii Golovatiuk <sgolovatiuk@mirantis.com>
2017-03-24 12:34:15 +01:00
Sergii Golovatiuk c04a6254b9 Backup etcd data before restarting etcd
etcd is crucial part of kubernetes cluster. Ansible restarts etcd on
reconfiguration. Backup helps operator to restore cluster manually in
case of any issues.

Signed-off-by: Sergii Golovatiuk <sgolovatiuk@mirantis.com>
2017-03-20 14:50:52 +01:00
Matthew Mosesohn 8195957461 Merge branch 'master' into idempotency2 2017-03-16 09:29:43 +03:00
Matthew Mosesohn a422ad0d50 More idempotency fixes
Fixed sync_tokens fact
Fixed sync_certs for k8s tokens fact
Disabled register docker images changability
Fixed CNI dir permission
Fix idempotency for etcd pre upgrade checks
2017-03-15 19:06:39 +03:00
Matthew Mosesohn 4c6829513c Fix etcd idempotency 2017-03-14 17:23:29 +03:00
Matthew Mosesohn 02a8e78902 Remove standalone etcd specific play, cleanup host mode
Now etcd role can optionally disable etcd cluster setup for faster
deployment when it is combined with etcd role.
2017-03-04 00:34:26 +04:00
Matthew Mosesohn 8f3d9e93ce Merge pull request #1111 from mattymo/use_find_for_certs
Use find module for checking for certificates
2017-03-03 20:08:33 +03:00
Matthew Mosesohn d176818c44 Use find module for checking for certificates
Also generate certs only when absent on master (rather than
when absent on target node)
2017-03-03 16:21:01 +03:00
Vijay Katam a0b1eda1d0 Add support for atomic host
Updates based on feedback

Simplify checks for file exists

remove invalid char

Review feedback. Use regular systemd file.

Add template for docker systemd atomic
2017-03-01 09:38:19 -08:00
Antoine Legrand 77e5171679 Merge pull request #1076 from VincentS/etcd_openssl_count_fix
Fixed counter in ETCD Openssl.conf
2017-03-01 14:17:27 +01:00
Sergii Golovatiuk f9ff93c606 Make etcd data dir configurable.
Closes: #1073
Signed-off-by: Sergii Golovatiuk <sgolovatiuk@mirantis.com>
2017-02-27 21:35:51 +01:00
Vincent Schwarzer 0cbc3d8df6 Fixed counter in ETCD Openssl.conf
When a apiserver_loadbalancer_domain_name is added to the Openssl.conf
the counter gets not increased correctly. This didnt seem to have an
effect at the current kargo version.
2017-02-27 12:01:09 +01:00
Sergii Golovatiuk 00cfead9bb Increase SSL TTL to 3650 days
In real scenarios 365 days is short period of time. 3650 days is good
enough for long running k8s environments
2017-02-24 15:38:13 +01:00
Matthew Mosesohn d821448e2f Merge branch 'master' into synthscale 2017-02-21 22:17:43 +03:00
Matthew Mosesohn 0afadb9149 Merge pull request #1046 from skyscooby/pedantic-syntax-cleanup
Cleanup legacy syntax, spacing, files all to yml
2017-02-21 17:03:16 +03:00
Matthew Mosesohn d19e6dec7a speed up etcd preupgrade check 2017-02-20 20:18:10 +03:00
Matthew Mosesohn a21eb036ee Add no_log to cert tar tasks
This works around 4MB limit for gitlab CI runner.
2017-02-18 14:09:57 +04:00
Matthew Mosesohn 9c1701f2aa Add synthetic scale deployment mode
New deploy modes: scale, ha-scale, separate-scale
Creates 200 fake hosts for deployment with fake hostvars.

Useful for testing certificate generation and propagation to other
master nodes.

Updated test cases descriptions.
2017-02-18 14:09:55 +04:00
Andrew Greenwood ca9ea097df Cleanup legacy syntax, spacing, files all to yml
Migrate older inline= syntax to pure yml syntax for module args as to be consistant with most of the rest of the tasks
Cleanup some spacing in various files
Rename some files named yaml to yml for consistancy
2017-02-17 16:22:34 -05:00
Antoine Legrand e16ebcad6e Merge pull request #1042 from holser/fix_facts
Fix fact tags
2017-02-17 17:56:29 +01:00
Sergii Golovatiuk e91e58aec9 Fix fact tags
Ansible playbook fails when tags are limited to "facts,etcd" or to
"facts". This patch allows to run ansible-playbook to gather facts only
that don't require calico/flannel/weave components to be verified. This
allows to run ansible with 'facts,bootstrap-os' or just 'facts' to
gether facts that don't require specific components.

Signed-off-by: Sergii Golovatiuk <sgolovatiuk@mirantis.com>
2017-02-17 12:32:33 +01:00
Matthew Mosesohn 80c0e747a7 Fix references to CoreOS and Container Linux by CoreOS
Fixes #967
2017-02-16 19:25:17 +03:00
Vladimir Rutsky 09847567ae set "check_mode: no" for read-only "shell" steps that registers result
"shell" step doesn't support check mode, which currently leads to failures,
when Ansible is being run in check mode (because Ansible doesn't run command,
assuming that command might have effect, and no "rc" or "output" is registered).

Setting "check_mode: no" allows to run those "shell" commands in check mode
(which is safe, because those shell commands doesn't have side effects).
2017-02-13 18:53:41 +03:00
Josh Conant 245e05ce61 Vault security hardening and role isolation 2017-02-08 21:41:36 +00:00
Josh Conant f4ec2d18e5 Adding the Vault role 2017-02-08 21:31:28 +00:00
Matthew Mosesohn 0180ad7f38 Merge pull request #990 from mattymo/fix_cert_upgrade
Fix check for node-NODEID certs existence
2017-02-08 14:44:09 +03:00
Matthew Mosesohn e5779ab786 Fix check for node-NODEID certs existence
Fixes upgrade from pre-individual node cert envs.
2017-02-07 21:06:48 +03:00
Matthew Mosesohn 71e14a13b4 Re-tune ETCD performance params
Reduce election timeout to 5000ms (was 10000ms)
Raise heartbeat interval to 250ms (was 100ms)
Remove etcd cpu share (was 300)
Make etcd_cpu_limit and etcd_memory_limit optional.
2017-02-07 20:15:14 +03:00
Matthew Mosesohn fd30131dc2 Revert "Drop linux capabilities and rework users/groups" 2017-02-06 15:58:54 +03:00
Bogdan Dobrelya cb2e5ac776 Drop linux capabilities and rework users/groups
* Drop linux capabilities for unprivileged containerized
  worlkoads Kargo configures for deployments.
* Configure required securityContext/user/group/groups for kube
  components' static manifests, etcd, calico-rr and k8s apps,
  like dnsmasq daemonset.
* Rework cloud-init (etcd) users creation for CoreOS.
* Fix nologin paths, adjust defaults for addusers role and ensure
  supplementary groups membership added for users.
* Add netplug user for network plugins (yet unused by privileged
  networking containers though).
* Grant the kube and netplug users read access for etcd certs via
  the etcd certs group.
* Grant group read access to kube certs via the kube cert group.
* Remove priveleged mode for calico-rr and run it under its uid/gid
  and supplementary etcd_cert group.
* Adjust docs.
* Align cpu/memory limits and dropped caps with added rkt support
  for control plane.

Signed-off-by: Bogdan Dobrelya <bogdando@mail.ru>
2017-01-20 08:50:42 +01:00
Matthew Mosesohn 8ce32eb3e1 Merge pull request #905 from galthaus/async-runs
Add tasks to ensure that the first nodes have their directories for cert gen
2017-01-19 18:32:27 +03:00
Greg Althaus 0d44599a63 Add explicit name printing in task names for deletgated task during
cert creation
2017-01-18 14:06:50 -06:00
Sergii Golovatiuk 43fa72b7b7 Flush handlers before etcd restart
systemctl daemon-reload should be run before when task modifies/creates
union for etcd. Otherwise etcd won't be able to start

Closes #892

Signed-off-by: Sergii Golovatiuk <sgolovatiuk@mirantis.com>
2017-01-17 15:04:25 +01:00
Greg Althaus 6c69da1573 This PR adds/or modifies a few tasks to allow for the playbook to
be run by limit on each node without regard for order.

The changes make sure that all of the directories needed to do
certificate management are on the master[0] or etcd[0] node regardless
of when the playbook gets run on each node.  This allows for separate
ansible playbook runs in parallel that don't have to be synchronized.
2017-01-14 23:24:34 -06:00
Greg Althaus 95bf380d07 If the inventory name of the host exceeds 63 characters,
the openssl tools will fail to create signing requests because
the CN is too long.  This is mainly a problem when FQDNs are used
in the inventory file.

THis will truncate the hostname for the CN field only at the
first dot.  This should handle the issue for most cases.
2017-01-13 10:02:23 -06:00
Aleksandr Didenko d9539e0f27 Fix etcd cert generation for calico-rr role
"etcd_node_cert_data" variable is undefinded for "calico-rr" role.
This patch adds "calico-rr" nodes to task where "etcd_node_cert_data"
variable is registered.
2017-01-09 12:06:25 +01:00
Bogdan Dobrelya 5af2c42bde Better fix for different CoreOS os family facts
Signed-off-by: Bogdan Dobrelya <bogdando@mail.ru>
2017-01-05 16:32:08 +01:00
Bogdan Dobrelya f7447837c5 Rename CoreOS fact
Signed-off-by: Bogdan Dobrelya <bogdando@mail.ru>
2017-01-05 14:02:29 +01:00
Brad Beam 4b6f29d5e1 Adding kubelet in rkt 2017-01-03 14:49:48 -06:00
Brad Beam 8dc19374cc Allowing etcd to run via rkt 2017-01-03 10:10:38 -06:00
Bogdan Dobrelya 58062be2a3 Drop non systemd OS types support
Signed-off-by: Bogdan Dobrelya <bogdando@mail.ru>
2017-01-02 12:14:03 +01:00
Matthew Mosesohn 1f9f885379 Fix etcd cert generation to support large deployments
Due to bash max args limits, we should pass all node filenames and
base64-encoded tar data through stdin/stdout instead.

Fixes #832
2016-12-30 12:55:26 +03:00
Bogdan Dobrelya a56d9de502 Systemd units, limits, and bin path fixes
* Add restart for weave service unit
* Reuse docker_bin_dir everythere
* Limit systemd managed docker containers by CPU/RAM. Do not configure native
  systemd limits due to the lack of consensus in the kernel community
  requires out-of-tree kernel patches.

Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
2016-12-28 15:49:42 +01:00
Matthew Mosesohn f0c0390646 Fix creation and sync of etcd certs
Admin certs only go to etcd nodes
Only generate cert-data for nodes that need sync
2016-12-28 14:21:17 +04:00
Matthew Mosesohn e7a1949d85 Merge pull request #818 from mattymo/calico-rr-certs
Fix calico-rr to use etcd certs instead of kube certs
2016-12-28 08:47:16 +03:00
Matthew Mosesohn 6d9cd2d720 Fix calico-rr to use etcd certs instead of kube certs 2016-12-27 17:04:50 +03:00
Bogdan Dobrelya 79996b557b Rework ignore_errors to report no reds
Signed-off-by: Bogdan Dobrelya <bogdando@mail.ru>
2016-12-27 13:00:50 +01:00
Matthew Mosesohn 385f7f6e75 Update etcd.j2 2016-12-22 22:29:24 +03:00
Matthew Mosesohn 9f1e3db906 Adjust etcd server certificates
ETCD doesn't need cert/key options set. It only requires peer
cert options.
2016-12-22 23:05:17 +04:00
Spencer Smith b63d900625 Workaround etcdctl not yet being installed (#797)
workaround case for etcdctl not yet being installed, only allow for return code of 0 (no error)
2016-12-22 12:41:38 -05:00
Matthew Mosesohn ad796d188d Individual etcd ssl certs
Includes hooks for triggering calico, kubelet, and kube-apiserver restarts
if etcd certs changed.
2016-12-22 13:31:11 +03:00
Matthew Mosesohn 348fc5b109 Fix etcd to-SSL upgrade and task register vars 2016-12-19 15:05:49 +03:00
Matthew Mosesohn 9cc73bdf08 Fix etcd member list when upgrading ETCD from an old version 2016-12-15 12:00:45 +04:00
Bogdan Dobrelya a15d626771 Preconfigure DNS stack and docker early
In order to enable offline/intranet installation cases:
* Move DNS/resolvconf configuration to preinstall role. Remove
  skip_dnsmasq_k8s var as not needed anymore.

* Preconfigure DNS stack early, which may be the case when downloading
  artifacts from intranet repositories. Do not configure
  K8s DNS resolvers for hosts /etc/resolv.conf yet early (as they may be
  not existing).

* Reconfigure K8s DNS resolvers for hosts only after kubedns/dnsmasq
  was set up and before K8s apps to be created.

* Move docker install task to early stage as well and unbind it from the
  etcd role's specific install path. Fix external flannel dependency on
  docker role handlers. Also fix the docker restart handlers' steps
  ordering to match the expected sequence (the socket then the service).

* Add default resolver fact, which is
  the cloud provider specific and remove hardcoded GCE resolver.

* Reduce default ndots for hosts /etc/resolv.conf to 2. Multiple search
  domains combined with high ndots values lead to poor performance of
  DNS stack and make ansible workers to fail very often with the
  "Timeout (12s) waiting for privilege escalation prompt:" error.

* Update docs.

Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
2016-12-09 17:30:55 +01:00
Bogdan Dobrelya 8cc84e132a Add tags
Add tags to allow more granular tasks filtering.
Add generator script for MD formatted tags found.
Add docs for tags how-to.

Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
2016-12-09 12:14:28 +01:00
Spencer Smith 8870178a2d Merge pull request #627 from kubernetes-incubator/issue-626
add restart flag for docker run kubelet
2016-12-06 08:47:18 -08:00
Dan Bode ff675d40f9 Ensure that etcd health checks always pass
in the etcd handler, the reload etcd action
was called after ansible waits for etcd to be
up, this means that the health checks which are
called immediately after fail (resulting in the etcd
role always failing and never finishing)

This patch changes the order to move the 'wait for etcd
up' resource after the 'reload etcd resource', ensuring that
the service is up before the health check is called.
2016-11-18 14:15:00 -08:00
Spencer Smith 0eebe43c08 updated all instances of restart always to restart on-failure with a max of 5 times 2016-11-18 14:33:22 -05:00
Matthew Mosesohn 46ee9faca9 Fix ca certificate loading on CoreOS 2016-11-14 08:47:09 +04:00
Matthew Mosesohn a32cd85eb7 Add etcd TLS support 2016-11-09 18:38:28 +03:00
Matthew Mosesohn 95b460ae94 Remove etcd-proxy from all nodes and use etcd multiaccess 2016-11-09 13:31:12 +03:00
Smaine Kahlouch 4c0bf6225a Merge pull request #562 from kubespray/enable_standalone_node
Enable standalone node deployment
2016-10-24 13:10:53 +02:00
Matthew Mosesohn 65d2a3b0e5 Use only native cachable hostvars for etcd set_facts 2016-10-21 14:39:58 +03:00
Chad Swenson e6902d8ecc Use absolute path for etcdctl
Small fix. The shell module won't automatically resolve the path to the etcdctl binary, so i prefixed with {{ bin_dir }}/
2016-10-20 14:56:52 -05:00
Bogdan Dobrelya 390764c2b4 Add retry_stagger var for failed download/pushes.
* Add the retry_stagger var to tweak push and retry time strategies.
* Add large deployments related docs.

Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
2016-09-15 16:43:58 +02:00
Bogdan Dobrelya 422428908a Download containers and save all
Move version/repo vars to download role.
Add container to download params, which overrides url/source_url,
if enabled.
Fix networking plugins download depending on kube_network_plugin.

Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
2016-09-15 16:43:56 +02:00
Bogdan Dobrelya 6fdcaa1a63 Add retries for copying binaries from containers
Closes issue: https://github.com/kubespray/kargo/issues/479

Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
2016-09-13 15:09:34 +02:00
Smaine Kahlouch 311baeed5d Merge pull request #448 from kubespray/etcdnosync
Add --no-sync to etcdctl member list
2016-08-29 18:58:14 +02:00
Matthew Mosesohn 256a4e1f29 Rebase etcd to v3.0.6
Fixes #450
2016-08-29 15:31:05 +03:00
Matthew Mosesohn 1345dd07f7 Add --no-sync to etcdctl member list
Fixes #447
2016-08-29 12:51:43 +03:00
Bogdan Dobrelya 8168689caa Refactor roles and hosts
Shorten deployment time with:
- Remove redundand roles if duplicated by a dependency and vice versa
- When a member of k8s-cluster, always install docker as a dependency
  of the etcd role and drop the docker role from cluster.yaml.
- Drop etcd and node role dependencies from master role as they are
  covered by the node role in k8s-cluster group as well. Copy defaults
  for master from node role.
- Decouple master, node, secrets roles handlers and vars to be used w/o
  cross references.

Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
2016-08-25 13:27:57 +02:00
Matthew Mosesohn 0c953101ff Fix init scripts for etcd. Fixes #383
Fixes Ubuntu 14.04 deployment of etcd.
2016-08-15 14:09:42 +03:00
Matthew Mosesohn 5668e5f767 Fix etcd restart and handler systemd tasks
Changed Wants=docker.service to docker.socket

Renamed handlers for reloading systemd to contain role in task name.
2016-07-29 16:32:35 +03:00
Matthew Mosesohn 90fc407420 Fix etcd user for etcd-proxy service
Only affects sys V OSes (Ubuntu 14.04)

Fixes ##383
2016-07-27 11:54:47 +03:00
Matthew Mosesohn 1b1f5f22d4 Fix etcd standalone deployment
etcd facts are generated in kubernetes/preinstall, so etcd nodes need
to be evaluated first before the rest of the deployment.

Moved several directory facts from kubernetes/node to
kubernetes/preinstall because they are not backward dependent.
2016-07-26 18:15:06 +03:00
mattymo 8141b72d5e Merge branch 'master' into etcddockerdefault 2016-07-20 19:16:47 +03:00
Matthew Mosesohn 7a86b6c73e Set default etcd deployment to docker
Improved docker reload command to wait for etcd to be
up before proceeding. Switched reload to run restart
because it can't reload if it is not guaranteed to be
in running state.
2016-07-20 18:26:16 +03:00
Bogdan Dobrelya a76e5dbb11 Fix set_facts visibility
Move set_facts to the preinstall scope, so every role
may see it. For example, network plugins to see the etcd_endpoint.

Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
2016-07-20 11:41:09 +02:00
Bogdan Dobrelya 32cd6e99b2 Add etcd proxy support
* Enforce a etcd-proxy role to a k8s-cluster group members. This
provides an HA layout for all of the k8s cluster internal clients.
* Proxies to be run on each node in the group as a separate etcd
instances with a readwrite proxy mode and listen the given endpoint,
which is either the access_ip:2379 or the localhost:2379.
* A notion for the 'kube_etcd_multiaccess' is: ignore endpoints and
loadbalancers and use the etcd members IPs as a comma-separated
list. Otherwise, clients shall use the local endpoint provided by a
etcd-proxy instances on each etcd node. A Netwroking plugins always
use that access mode.
* Fix apiserver's etcd servers args to use the etcd_access_endpoint.
* Fix networking plugins flannel/calico to use the etcd_endpoint.
* Fix name env var for non masters to be set as well.
* Fix etcd_client_url was not used anywhere and other etcd_* facts
evaluation was duplicated in a few places.
* Define proxy modes only in the env file, if not a master. Del
an automatic proxy mode decisions for etcd nodes in init/unit scripts.
* Use Wants= instead of Requires= as "This is the recommended way to
hook start-up of one unit to the start-up of another unit"
* Make apiserver/calico Wants= etcd-proxy to keep it always up

Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Co-authored-by: Matthew Mosesohn <mmosesohn@mirantis.com>
2016-07-19 14:09:40 +02:00
Bogdan Dobrelya 0b874e8db2 Fix systemd service unit for etcd
See https://github.com/coreos/etcd/issues/4308

Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
2016-07-15 16:22:17 +02:00
Smana ab8fdba484 deployment idempotent 2016-07-14 21:33:24 +02:00
Matthew Mosesohn b3282cd0bb Add optional deployment mode for Docker etcd_deployment_type
Running etcd in Docker reduces the number of individual file
downloads and services running on the host.

Note: etcd container v3.0.1 moves bindir to /usr/local/bin

Fixes: #298
2016-07-07 19:31:28 +03:00
Smana 40fbb3691d uprade to etcd v3.0.1 2016-07-02 14:14:32 +02:00
Smana 536454b079 upgrade etcd version to 2.3.7 2016-06-28 12:31:57 +02:00
Evgeny L 0500f27db8 Scale-up functionality for etcd cluster
* Set ETCD_INITIAL_CLUSTER_STATE from `new` to `existing`,
because parameter `new` makes sense only on cluster assembly
stage.
* If cluster exists and current node is not a part
of the cluster, add it with command `etcdctl add member name url`.

Closes kubespray/kargo/#270
2016-05-31 18:23:46 +03:00
Paul Czarkowski 7de87d958e turn adduser/download roles into meta roles
This should make things a little more composable,
by making these roles meta roles that perform no
actions by default we allow each role to own its own
resources.
2016-05-22 17:25:52 -05:00
David Reuss 180f2d1fde Pull correct variable for etcd initial variable
This shouldn't use the `inventory_hostname` variable, as that will just yield the same variable, but rather use the `host` which we're looping over.
2016-04-29 14:37:01 +02:00
Christopher M Luciano 47982ea21c Use ansible array format instead of dot-notation.
This fixes the ansible error ```'dict object' has no attribute
'ansible_default_ipv4'"}```. Closes #215
2016-04-25 08:45:58 -04:00
Smana fca384e24c first version of CoreOS on GCE
Please enter the commit message for your changes. Lines starting
2016-02-21 00:06:36 +01:00