Having max_mds value equals to the number of mds nodes generates a
warning in the ceph cluster status:
cluster:
id: 6d3e49a4-ab4d-4e03-a7d6-58913b8ec00a'
health: HEALTH_WARN'
insufficient standby MDS daemons available'
(...)
services:
mds: cephfs:3 {0=mds1=up:active,1=mds0=up:active,2=mds2=up:active}'
Let's use 2 active and 1 standby mds.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 4a6d19dae2)
Since we now support podman, let's rename the playbook so it's more
generic.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7bc7e3669d)
If the new mon/osd node doesn't have python installed then we need to
execute the tasks from raw_install_python.yml.
Closes: #4368
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 34b03d1873)
The wait_for ansible module doesn't support the backets on IPv6 address
so need to remove them.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1769710
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 55adc10be3)
In addition to the grafana container tag change, we need to do the same
for the prometheus container stack based on the release present in the
OSE 4.1 container image.
$ docker run --rm openshift4/ose-prometheus-node-exporter:v4.1 --version
node_exporter, version 0.17.0
build user: root@67fee13ed48f
build date: 20191023-14:38:12
go version: go1.11.13
$ docker run --rm openshift4/ose-prometheus-alertmanager:4.1 --version
alertmanager, version 0.16.2
build user: root@70b79a3f29b6
build date: 20191023-14:57:30
go version: go1.11.13
$ docker run --rm openshift4/ose-prometheus:4.1 --version
prometheus, version 2.7.2
build user: root@12da054778a3
build date: 20191023-14:39:36
go version: go1.11.13
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 3e29b8d5ff)
When a container is already running on a non containerized node then the
umount ceph partition task is skipped.
This is due to the container ps command which always returns 0 even if
the filter matches nothing.
We should run the umount task when:
1/ the container command is failing (not installed) : rc != 0
2/ the container command reports running ceph-osd containers : rc == 0
Also we should not fail on the ceph directory listing.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1616159
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 39cfe0aa65)
If the container binary is podman, we shouldn't try to stop docker here.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b18476a1a6)
in order to be able to call container_binary without having to run the
whole ceph-facts role.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fe5ffe589e)
All containers are removed when systemd stops them.
There is no need to call this module in purge container playbook.
This commit also removes all docker_image task and remove all container
images in the final cleanup play.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1776736
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d23383a820)
When using the shortname, the URL for active alert launches with short
hostname and fails to connect to the server.
This commit changes the template in order to use the fqdn.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1765485
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a8d76d72d7)
This commit makes the ceph-dashboard role only printing ceph-dashboard
URL of the nodes present in grafana-server group
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1762163
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit cc0c1ce301)
This is needed to avoid following error:
```
ERROR! The requested handler 'restart ceph mons' was not found in either the main handlers list nor in the listening handlers list
```
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1777829
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a43a872105)
let's use `client_group_name` instead of hardcoding the name.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7fe0d55eff)
We must import this role in the first play otherwise the first call to
`client_group_name`fails.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1777829
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6526a25ab5)
This commit reverts the following change:
fcf181342a (diff-23b6f443c01ea2efcb4f36eedfea9089R7-R14)
this is causing CI failures so this commit is intended to unlock the CI.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5353ab8a23)
Configuration of cephfs with an existing cluster using --limit used to fail
at different tasks while running with site-docker.yml
This commit addresses both of those tasks
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1773489
Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com>
(cherry picked from commit 72c43cc5d9)
If we execute the site-container.yml playbook with specific tags (like
ceph_update_config) then we need to be sure to gather the facts otherwise
we will see error like:
The task includes an option with an undefined variable. The error was:
'ansible_hostname' is undefined
This commit also adds missing 'gather_facts: false' to mons plays.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1754432
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d7fd769b6d)
This will prevent failure of site-docker.yml with configs in doc.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1769760
Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com>
Co-Authored-By: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9a1f1626c3)
when `import_key` is enabled, if the key already exists, it will only be
fetched using ceph cli, if the mode specified in the `ceph_key` task is
different from what is applied by the ceph cli, the mode isn't restored because
we don't call `module.set_fs_attributes_if_different()` before
`module.exit_json(**result)`
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1734513
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b717b5f736)
This commit adds a playbook to be played before we run purge playbook,
it first creates an rbd image then map an rbd device on client0 so the
purge playbook will try to unmap it.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit db77fbda15)
in containerized context, using the binary provided in atomic os won't
work because it's an old version provided by ceph-common based on
10.2.5.
Using a container could be an idea but for large cluster with hundreds
of client nodes, that would require to pull the image of each of them
just to unmap the rbd devices.
Let's use the sysfs method in order to avoid any issue related to ceph
version that is shipped on the host.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1766064
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3cfcc7a105)
This commit removes the mergify config on stable-4.0
At the moment there is no need to have a mergify config on this branch
given that we don't use it.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
[1] introduced a regression on the fs.aio-max-nr sysctl value condition.
The enable key isn't a boolean but a string because the expression isn't
evaluated.
This string output "(osd_objectstore == 'bluestore')" is always true
because item.enable condition only matches non empty string. So the
sysctl value was applyied for both filestore and bluestore backend.
[2] added the bool filter to the condition but the filter always returns
false on string and the sysctl wasn't applyed at all.
This commit fixes the enable key value by evaluating the value instead
of using the string.
[1] https://github.com/ceph/ceph-ansible/commit/08a2b58
[2] https://github.com/ceph/ceph-ansible/commit/ab54fe2
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ece46d33be)
The ansible ssh connections are now using the ssh backend instead of
paramiko starting testinfra 3.1 and persistent connections too.
pytest 4.6 is the latest release to be supported by python 2.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 02df2ab5ea)
The latest grafana container tag is using grafana 6.x release which could
cause issue with the ceph dashboard integration.
Considering that the grafana container in RHCS 3 is based on 5.x then we
should use the same version.
$ docker run --rm rhceph/rhceph-3-dashboard-rhel7:3 -v
Version 5.2.4 (commit: unknown-dev)
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 2037fb87b6)
Even if this improves ceph-disk/ceph-volume performances then it also
impact the ceph-osd process.
The ceph-osd process shouldn't use 1024:4096 value for the max open
files.
Removing the ulimit option from the container engine and doing this kind
of change on the container side [1].
[1] https://github.com/ceph/ceph-container/pull/1497
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1702285
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 9a996aef7f)
This change adds two tasks to set grafana-api user and password
that are required to inject dashboard layouts to the external
grafana instance.
Without these two parameters the ceph-ansible playbook fails
showing an authorization error (HTTPError: 401 Client Error:
Unauthorized").
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1767365
Signed-off-by: fmount <fpantano@redhat.com>
(cherry picked from commit 41b8c17356)
This commit adds a default value in the `with_dict` because when using
python 2.7, if a task using a `with_dict` has a condition, it is
evaluated anyway whereas in python 3 it isn't.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1766499
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e9823f319b)
There's no need to use the default filter on active/standby groups
because if the group doesn't exist then the play is just skipped.
Currently this generates warnings like:
[WARNING]: Could not match supplied host pattern, ignoring: |
[WARNING]: Could not match supplied host pattern, ignoring: default([])
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 2ca79fcc99)
The active mds host should be based on the inventory hostname and not on
the ansible hostname.
The value returns under the mdsmap structure is based on the OS hostname
so we need to find the right node in the inventory with this value when
doing operation on inventory nodes.
Othewise we could see error like:
The task includes an option with an undefined variable. The error was:
"hostvars[foobar]" is undefined
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f1f2352c79)
pycodestyle returns:
E111 indentation is not a multiple of four
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 8a0a13f67a)
To avoid unnecessary ansible warnings during playbook execution we can
move the library and plugins test files under a different directory.
[WARNING]: Skipping plugin (plugins/filter/test_ipaddrs_in_ranges.py) as
it seems to be invalid:
cannot import name 'ipaddrs_in_ranges'
Closes: #4656
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 6ce4fde820)
mon_host should use the inventory hostname and not the node hostname.
Fix creates an issue when the inventory and node hostname are different.
Closes: #4670
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 650bc0c3f0)
Without the become flag set to true, we can't executed the roles
successfully.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 77b212833e)
This must be consistent with what is used in `name` parameter.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d06057ebd2)
Let's skip this part of the code if there's no mds node in the
inventory.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5ec906c3af)
The ceph-container-engine role is missing from both playbooks so the
container engine (docker, podman) isn't install resulting in a failure
on the added nodes.
fatal: [xxxxx]: FAILED! => changed=false
cmd: docker --version
msg: '[Errno 2] No such file or directory'
rc: 2
Closes: #4634
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit bfb1d6be12)
Add ceph_docker_registry_username and ceph_docker_registry_password
variables in ceph-defaults role so they will be present in the group_vars
samples but commented.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1763139
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit b33c476f16)
on master, it doesn't make sense anymore to use device name, we should
use osd id instead.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b5a61fe2e3)
This commit removes the backslash in allow command parameter, this was
needed before the ceph_key module integration.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 384161edcd)
It doesn't make sense to test the old 3.0.x container images with
nautilus+ ceph releases.
Also disable the dashboard deployment and switch to bluestore backend.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 3c2840da03)
This commit adds the support of the ceph-iscsi stable repository when
use ceph_repository community instead of always using the devel
repositories.
We're still using the devel repositories for rtslib and tcmu-runner in
both cases (dev and community).
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f2cb937193)
We don't have to use ceph_iscsi_config_dev (default true) on RHCS
because all iscsi packages are already included in the RHCS
repositories.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 82495eaf97)
When deploying with packages then the ceph-container-common role isn't
executed so the registry authentication task is ignored.
Closes: #4636
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 9ad000618f)
If ansible-lint reports an error then it's skipped. We should fail in
this case.
This patch also fixes the pipefail lint in the rbd mirror role
[306] Shells that use pipes should set the pipefail option
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 3969470fca)
[303] mktemp used in place of tempfile module
[602] Don't compare to empty string
[701] No 'galaxy_info' found
[702] Use 'galaxy_tags' rather than 'categories'
This patch also changes the ansible log_path value via the
ANSIBLE_LOG_PATH environment variable in the travis configuration to
avoid warnings.
[WARNING]: log file at /home/travis/ansible/ansible.log is not writeable
and we cannot create it, aborting
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f7fd0b6d4f)