According to code apiserver, scheduler, controller-manager, proxy don't
use resolution of objects they created. It's not harmful to change
policy to have external resolver.
Signed-off-by: Sergii Golovatiuk <sgolovatiuk@mirantis.com>
In kubernetes 1.6 ClusterFirstWithHostNet was added as an option. In
accordance to it kubelet will generate resolv.conf based on own
resolv.conf. However, this doesn't create 'options', thus the proper
solution requires some investigation.
This patch sets the same resolv.conf for kubelet as host
Signed-off-by: Sergii Golovatiuk <sgolovatiuk@mirantis.com>
- Run docker run from script rather than directly from systemd target
- Refactoring styling/templates
Signed-off-by: Sergii Golovatiuk <sgolovatiuk@mirantis.com>
Updates based on feedback
Simplify checks for file exists
remove invalid char
Review feedback. Use regular systemd file.
Add template for docker systemd atomic
* Leave all.yml to keep only optional vars
* Store groups' specific vars by existing group names
* Fix optional vars casted as mandatory (add default())
* Fix missing defaults for an optional IP var
* Relink group_vars for terraform to reflect changes
Signed-off-by: Bogdan Dobrelya <bogdando@mail.ru>
Migrate older inline= syntax to pure yml syntax for module args as to be consistant with most of the rest of the tasks
Cleanup some spacing in various files
Rename some files named yaml to yml for consistancy
Kubelet is responsible for creating symlinks from /var/lib/docker to /var/log
to make fluentd logging collector work.
However without using host's /var/log those links are invisible to fluentd.
This is done on rkt configuration too.
Since systemd kubelet.service has {{ ssl_ca_dirs }}, fact should be
gathered before writing kubelet.service.
Closes: #1007
Signed-off-by: Sergii Golovatiuk <sgolovatiuk@mirantis.com>
- Exclude kubelet CPU/RAM (kube-reserved) from cgroup. It decreases a
chance of overcommitment
- Add a possibility to modify Kubelet node-status-update-frequency
- Add a posibility to configure node-monitor-grace-period,
node-monitor-period, pod-eviction-timeout for Kubernetes controller
manager
- Add Kubernetes Relaibility Documentation with recomendations for
various scenarios.
Signed-off-by: Sergii Golovatiuk <sgolovatiuk@mirantis.com>
kubelet lost the ability to load kernel modules. This
puts that back by adding the lib/modules mount to kubelet.
The new variable kubelet_load_modules can be set to true
to enable this item. It is OFF by default.
Netchecker is rewritten in Go lang with some new args instead of
env variables. Also netchecker-server no longer requires kubectl
container. Updating playbooks accordingly.
- Docker 1.12 and further don't need nsenter hack. This patch removes
it. Also, it bumps the minimal version to 1.12.
Closes#776
Signed-off-by: Sergii Golovatiuk <sgolovatiuk@mirantis.com>
* Drop linux capabilities for unprivileged containerized
worlkoads Kargo configures for deployments.
* Configure required securityContext/user/group/groups for kube
components' static manifests, etcd, calico-rr and k8s apps,
like dnsmasq daemonset.
* Rework cloud-init (etcd) users creation for CoreOS.
* Fix nologin paths, adjust defaults for addusers role and ensure
supplementary groups membership added for users.
* Add netplug user for network plugins (yet unused by privileged
networking containers though).
* Grant the kube and netplug users read access for etcd certs via
the etcd certs group.
* Grant group read access to kube certs via the kube cert group.
* Remove priveleged mode for calico-rr and run it under its uid/gid
and supplementary etcd_cert group.
* Adjust docs.
* Align cpu/memory limits and dropped caps with added rkt support
for control plane.
Signed-off-by: Bogdan Dobrelya <bogdando@mail.ru>
* Add restart for weave service unit
* Reuse docker_bin_dir everythere
* Limit systemd managed docker containers by CPU/RAM. Do not configure native
systemd limits due to the lack of consensus in the kernel community
requires out-of-tree kernel patches.
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Also place in global vars and do not repeat the kube_*_config_dir
and kube_namespace vars for better code maintainability and UX.
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
In order to enable offline/intranet installation cases:
* Move DNS/resolvconf configuration to preinstall role. Remove
skip_dnsmasq_k8s var as not needed anymore.
* Preconfigure DNS stack early, which may be the case when downloading
artifacts from intranet repositories. Do not configure
K8s DNS resolvers for hosts /etc/resolv.conf yet early (as they may be
not existing).
* Reconfigure K8s DNS resolvers for hosts only after kubedns/dnsmasq
was set up and before K8s apps to be created.
* Move docker install task to early stage as well and unbind it from the
etcd role's specific install path. Fix external flannel dependency on
docker role handlers. Also fix the docker restart handlers' steps
ordering to match the expected sequence (the socket then the service).
* Add default resolver fact, which is
the cloud provider specific and remove hardcoded GCE resolver.
* Reduce default ndots for hosts /etc/resolv.conf to 2. Multiple search
domains combined with high ndots values lead to poor performance of
DNS stack and make ansible workers to fail very often with the
"Timeout (12s) waiting for privilege escalation prompt:" error.
* Update docs.
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Fixes#655.
This is a teporary solution for long-polling idle connections to
apiserver. It will make Nginx not cut them for the duration of expected
timeout. It will also make Nginx extremely slow in realizing that there
is some issue with connectivity to apiserver as well, so it might not be
perfect permanent solution.
* Add an option to deploy K8s app to test e2e network connectivity
and cluster DNS resolve via Kubedns for nethost/simple pods
(defaults to false).
* Parametrize existing k8s apps templates with kube_namespace and
kube_config_dir instead of hardcode.
* For CoreOS, ensure nameservers from inventory to be put in the
first place to allow hostnet pods connectivity via short names
or FQDN and hostnet agents to pass as well, if netchecker
deployed.
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
* Add dns_replicas, dns_memory/cpu_limit/requests vars for
dns related apps.
* When kube_log_level=4, log dnsmasq queries as well.
* Add log level control for skydns (part of kubedns app).
* Add limits/requests vars for dnsmasq (part of kubedns app) and
dnsmasq daemon set.
* Drop string defaults for kube_log_level as it is int and
is defined in the global vars as well.
* Add docs
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
W/o this patch, the "Download containers" task may be skipped
when running on the delegate node due to wrong "when" confition.
Then it fails to upload nginx image to the nodes as well.
Fix download nginx dependency so it always can be pushed to
nodes when download_run_once is enabled.
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
According to http://kubernetes.io/docs/user-guide/images/ :
By default, the kubelet will try to pull each image from the
specified registry. However, if the imagePullPolicy property
of the container is set to IfNotPresent or Never, then a local\
image is used (preferentially or exclusively, respectively).
Use IfNotPresent value to allow images prepared by the download
role dependencies to be effectively used by kubelet without pull
errors resulting apps to stay blocked in PullBackOff/Error state
even when there are images on the localhost exist.
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Pre download all required container images as roles' deps.
Drop unused flannel-server-helper images pre download.
Improve pods creation post-install test pre downloaded busybox.
Improve logs collection script with kubectl describe, fix sudo/etcd/weave
commands.
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
- Move CNI configuration from `kubernetes/node` role to
`network_plugin/canal`
- Create SSL dir for Canal and symlink etcd SSL files
- Add needed options to `canal-config` configmap
- Run flannel and calico-node containers with proper configuration
- Drop debugs from collect-info playbook
- Drop sudo from collect-info step and add target dir var (required for travis jobs)
- Label all k8s apps, including static manifests
- Add logs for K8s apps to be collected as well
- Fix upload to GCS as a public-read tarball
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
* Don't push containers if not changed
* Do preinstall role only once and redistribute defaults to
corresponding roles
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Change the kubelet --hostname-override flag to use the ansible_hostname variable which should be more consistent with the value required by cloud providers
Add ansible_hostname alias to /etc/hosts when it is different from inventory_hostname to overcome node name limitations see https://github.com/kubernetes/kubernetes/issues/22770
Signed-off-by: Chad Swenson <chadswen@gmail.com>
The requirements for network policy feature are described here [1]. In
order to enable it, appropriate configuration must be provided to the CNI
plug in and Calico policy controller must be set up. Beside that
corresponding extensions needed to be enabled in k8s API.
Now to turn on the feature user can define `enable_network_policy`
customization variable for Ansible.
[1] http://kubernetes.io/docs/user-guide/networkpolicies/