It might be possible at some point even with osd flags `noout` and
`norebalance` set the PGs states can change depending on the amount of data
written meantime. It means the check for PGs state will fail.
This commit changes the way we set those flags:
we set them before an OSD node upgrade and unset them before the PGs
state check so they can recover.
Fixes: #3961
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
- Set default mock configuration to epel-8-x86_64, to match the
default dist value.
- Add support for alpha tags, like the recently added v5.0.0alpha1
Signed-off-by: Javier Pena <jpena@redhat.com>
This commit adds a playbook to be played before we run purge playbook,
it first creates an rbd image then map an rbd device on client0 so the
purge playbook will try to unmap it.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
in containerized context, using the binary provided in atomic os won't
work because it's an old version provided by ceph-common based on
10.2.5.
Using a container could be an idea but for large cluster with hundreds
of client nodes, that would require to pull the image of each of them
just to unmap the rbd devices.
Let's use the sysfs method in order to avoid any issue related to ceph
version that is shipped on the host.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1766064
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When ceph_rbd_mirror_configure is set to true we need to ensure that
the required variables aren't empty.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1760553
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
If for some reason, there's an old rgw socket file present in the
/var/run/ceph/ directory then the test command could fail with
test: xxxxxxxxx.asok: binary operator expected
$ ls -hl /var/run/ceph/
total 0
srwxr-xr-x. ceph-client.rgw.rgw0.rgw0.68.94153614631472.asok
srwxr-xr-x. ceph-client.rgw.rgw0.rgw0.68.94240997655088.asok
We can check the radosgw socket in /proc/net/unix to avoid using wildcard
in the socket name.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
If the new mon/osd node doesn't have python installed then we need to
execute the tasks from raw_install_python.yml.
Closes: #4368
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
[1] introduced a regression on the fs.aio-max-nr sysctl value condition.
The enable key isn't a boolean but a string because the expression isn't
evaluated.
This string output "(osd_objectstore == 'bluestore')" is always true
because item.enable condition only matches non empty string. So the
sysctl value was applyied for both filestore and bluestore backend.
[2] added the bool filter to the condition but the filter always returns
false on string and the sysctl wasn't applyed at all.
This commit fixes the enable key value by evaluating the value instead
of using the string.
[1] https://github.com/ceph/ceph-ansible/commit/08a2b58
[2] https://github.com/ceph/ceph-ansible/commit/ab54fe2
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The ansible ssh connections are now using the ssh backend instead of
paramiko starting testinfra 3.1 and persistent connections too.
pytest 4.6 is the latest release to be supported by python 2.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The latest grafana container tag is using grafana 6.x release which could
cause issue with the ceph dashboard integration.
Considering that the grafana container in RHCS 3 is based on 5.x then we
should use the same version.
$ docker run --rm rhceph/rhceph-3-dashboard-rhel7:3 -v
Version 5.2.4 (commit: unknown-dev)
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Even if this improves ceph-disk/ceph-volume performances then it also
impact the ceph-osd process.
The ceph-osd process shouldn't use 1024:4096 value for the max open
files.
Removing the ulimit option from the container engine and doing this kind
of change on the container side [1].
[1] https://github.com/ceph/ceph-container/pull/1497
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1702285
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This change adds two tasks to set grafana-api user and password
that are required to inject dashboard layouts to the external
grafana instance.
Without these two parameters the ceph-ansible playbook fails
showing an authorization error (HTTPError: 401 Client Error:
Unauthorized").
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1767365
Signed-off-by: fmount <fpantano@redhat.com>
Curently, the dist and mock configurations are hardcoded in the
Makefile to be el8 and epel-7-x86_64, respectively. This commit
allows the user to override those settings using the DIST and
MOCK_CONFIG environment variables, falling back to the current
defaults if not set.
This provides additional flexibility when building the RPM directly
from the repository.
Signed-off-by: Javier Peña <jpena@redhat.com>
This brings the possibility to modify the rgw nfs user to use specific
keys when those are defined.
Signed-off-by: Radu Toader <radu.m.toader@gmail.com>
This commit adds a default value in the `with_dict` because when using
python 2.7, if a task using a `with_dict` has a condition, it is
evaluated anyway whereas in python 3 it isn't.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1766499
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
There's no need to use the default filter on active/standby groups
because if the group doesn't exist then the play is just skipped.
Currently this generates warnings like:
[WARNING]: Could not match supplied host pattern, ignoring: |
[WARNING]: Could not match supplied host pattern, ignoring: default([])
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The active mds host should be based on the inventory hostname and not on
the ansible hostname.
The value returns under the mdsmap structure is based on the OS hostname
so we need to find the right node in the inventory with this value when
doing operation on inventory nodes.
Othewise we could see error like:
The task includes an option with an undefined variable. The error was:
"hostvars[foobar]" is undefined
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
To avoid unnecessary ansible warnings during playbook execution we can
move the library and plugins test files under a different directory.
[WARNING]: Skipping plugin (plugins/filter/test_ipaddrs_in_ranges.py) as
it seems to be invalid:
cannot import name 'ipaddrs_in_ranges'
Closes: #4656
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
mon_host should use the inventory hostname and not the node hostname.
Fix creates an issue when the inventory and node hostname are different.
Closes: #4670
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The ceph-container-engine role is missing from both playbooks so the
container engine (docker, podman) isn't install resulting in a failure
on the added nodes.
fatal: [xxxxx]: FAILED! => changed=false
cmd: docker --version
msg: '[Errno 2] No such file or directory'
rc: 2
Closes: #4634
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Add ceph_docker_registry_username and ceph_docker_registry_password
variables in ceph-defaults role so they will be present in the group_vars
samples but commented.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1763139
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
since c09b82a80a392ccd0da7677c7b424ce5cd3fa5d6 in ceph/ceph we must call
mon_status from asok instead.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit adds the support of the ceph-iscsi stable repository when
use ceph_repository community instead of always using the devel
repositories.
We're still using the devel repositories for rtslib and tcmu-runner in
both cases (dev and community).
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
We don't have to use ceph_iscsi_config_dev (default true) on RHCS
because all iscsi packages are already included in the RHCS
repositories.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
We don't need this since [1]. Also this was only working for python2 and
not supporting python3.
[1] https://github.com/ceph/ceph-iscsi/commit/00f198a
This reverts commit 167737dd3d.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When deploying with packages then the ceph-container-common role isn't
executed so the registry authentication task is ignored.
Closes: #4636
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This commit removes the backslash in allow command parameter, this was
needed before the ceph_key module integration.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
It doesn't make sense to test the old 3.0.x container images with
nautilus+ ceph releases.
Also disable the dashboard deployment and switch to bluestore backend.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This task is failing when `ceph_docker_registry_auth` is enabled and
`ceph_docker_registry_username` is undefined with an ansible error
instead of the expected message.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1763139
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
If ansible-lint reports an error then it's skipped. We should fail in
this case.
This patch also fixes the pipefail lint in the rbd mirror role
[306] Shells that use pipes should set the pipefail option
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The [group|host]_vars directories are ignored for the dashboard playbook
when the inventory file directory doesn't contain those directories.
Closes: #4601
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1761612
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Otherwise it fails like following:
```
TASK [ceph-mds : allow multimds] **************************************************************************************************************************************************
Monday 22 July 2019 16:37:38 +0800 (0:00:03.269) 0:13:25.651 ***********
fatal: [rhel7u6clone1]: FAILED! => {"msg": "The conditional check 'ceph_release_num[ceph_release] == ceph_release_num.luminous' failed. The error was: error while evaluating conditional (ceph_release_num[ceph_release] == ceph_release_num.luminous): 'dict object' has no attribute u'dummy'\n\nThe error appears to have been in '/usr/share/ceph-ansible/roles/ceph-mds/tasks/create_mds_filesystems.yml': line 43, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: allow multimds\n ^ here\n"}
```
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1645379
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The commit replaces the pv/vg/lv commands used with the ansible command
module by the lvg and lvol modules.
This also fixes the size of the second data LV because we were only using
50% of the remaining space instead of 100%.
With a 50G device, the result was:
- data-lv1 was 25G
- data-lv2 was 12.5G
Instead of:
- data-lv1 was 25G
- data-lv2 was 25G
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When using python3 the name of the rtslib rpm is python3-rtslib. The
packages that use rtslib already have code that detects the python
version and distro deps, so drop it from the ceph iscsi gw task list and
let the ceph-iscsi rpm dependency handle it.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1760930
Signed-off-by: Mike Christie <mchristi@redhat.com>
this task is a leftover and no longer needed.
It even causes bug when collocating nfs with mon.
Closes: #4609
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Due the 'failed_when: false' statement present in the peer task then
the playbook continues to ran even if the peer task was failing (like
incorrect remote peer format.
"stderr": "rbd: invalid spec 'admin@cluster1'"
This patch adds a task to list the peer present and add the peer only if
it's not already added. With this we don't need the failed_when statement
anymore.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1665877
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>