When using monitor_address_block or radosgw_address_block variables
to configure the mon/rgw address we're getting the first ip address
from the ansible facts present in that cidr.
When there's VIP on that network the first filter could return the
wrong value.
This seems to affect only IPv6 setup because the VIP addresses are
added to the ansible facts at the beginning of the list. This is the
opposite (at the end) when using IPv4.
This causes the mon/rgw processes to bind on the VIP address.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1680155
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit fd4b0ec7eb)
The path of the RGW environment file (in the /var/lib/ceph/radosgw/
directory) depends on the Ceph clustername. It was not taken into
account in the Ansible role `ceph-rgw`.
Signed-off-by: flaf <francois.lafont.1978@gmail.com>
(cherry picked from commit 4c3e77d869)
When mgrs are implicitly collocated on monitors (no mgrs in mgrs group).
That include was skipped because of this condition :
`inventory_hostname == groups[mgr_group_name][0]`
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit cbfdbab177)
before managing mgr modules, we must ensure all mgr are available
otherwise we can hit failure like following:
```
stdout:Error ENOENT: all mgr daemons do not support module 'restful', pass --force to force enablement
```
It happens because all mgr are not yet available when trying to manage
with mgr modules.
Closes: #3100
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f596cc1711)
According to rdo testing https://review.rdoproject.org/r/#/c/18721
a check on the output of the ceph_health value is added to
allow the playbook to make several attempts (according to the
retry/delay variables) when waiting the cluster quorum or
when the container bootstrap is not ended.
It avoids the failure of the command execution when it doesn't
receive a valid json object to decode (because cluster is too
slow to boostrap compared to ceph-ansible task execution).
Signed-off-by: fpantano <fpantano@redhat.com>
(cherry picked from commit afbb90e4ac)
In containerized deployment the default radosgw quota is too low
for production environment.
This is causing performance degradation compared to bare-metal.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1680171
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d3ae9fd05f)
looks like newer version of pytest-xdist requires pytest>=4.4.0
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ba0a95211c)
perform upgrade from luminous to nautilus.
All dev* references are not needed anymore in stable-4.0
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
`lvm_volumes` and/or `devices` variable(s) can be undefined depending on
the scenario chosen.
These tasks should be run only if these variable are defined, otherwise
it ends up with undefined variable errors.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1653307
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0180738313)
Since https://github.com/ceph/ceph/commit/77912c0 ceph-volume uses
stdout encoding based on LC_CTYPE and PYTHONIOENCODING environment
variables.
Thoses variables aren't set when using ansible.
Currently this commit breaks non containerized deployment on Ubuntu.
TASK [use ceph-volume to create bluestore osds] ********************
cmd:
- ceph-volume
- --cluster
- ceph
- lvm
- create
- --bluestore
- --data
- /dev/sdb
rc: 1
stderr: |-
Traceback (most recent call last):
(...)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in
position 132: ordinal not in range(128)
Note that the task is failing on ansible side due to the stdout
decoding but the osd creation is successful.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 7e5e4229b7)
there's no need to test this on all scenarios.
testing idempotency on all_daemons should be enough and allow us to save
precious resources for the CI.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 136bfe096c)
On stable-4.0 branch we don't want to use dev setup but stable
release (nautilus).
Also update the container image tag to reflect this change.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Similar to #3658
Since there's too many changes between master and stable branches let's
commit directly in each branches instead of trying to backport this
commit.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When installing python-minimal on Ubuntu bionic, this will add the
/usr/bin/python symlink to the default python interpreter.
On bionic, this isn't python2 but python3.
$ /usr/bin/python --version
Python 3.6.7
The python docker library is only installed for python2 which causes
issues when running the purge-docker-cluster playbook. This playbook
uses the ansible docker modules and requires to have python bindings
installed on the remote host.
Without the bindings we can see python error reported by the docker
module.
msg: Failed to import docker or docker-py - No module named 'docker'.
Try `pip install docker` or `pip install docker-py` (Python 2.6)
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
On containerized CI jobs the playbook executed is purge-cluster.yml
but it should be set to purge-docker-cluster.yml
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Ubuntu cloud archive is configurable via ceph_repository variable but
the uca choice isn't accepted.
This commit fixes this issue and also validates the associated uca
repository variables.
Resolves: #3739
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
sometimes those tasks might fail because of a timeout.
I've been facing this several times in the CI, adding this retry might
help and won't hurt in any case.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
all rgw instances should be stopped according to the multiple rgw
instances support added in rolling_update.yml
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Add a couple of fixes to allow containerized deployments upgrade support
to upgrade from luminous/mimic to nautilus.
- pass CEPH_CONTAINER_IMAGE and CEPH_CONTAINER_BINARY environment
variable to the ceph_key module,
- fix the docker exec command in 'waiting for the containerized monitor
to join the quorum' task according to the `delegate_to` parameter,
- override `docker_exec_cmd` in `ceph-facts` with `mon_host` when
rolling_update is `True`,
- do not run unnecessarily `create_mds_filesystems.yml` when performing an
upgrade.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
iscsigws were missing.
The 'complete upgrade' couldn't complete because rolling_update was set
to False for iscsigw nodes.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This file is becoming too big, let's isolate the update related code in
a dedicated tox configuration file.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
once the cluster is upgraded to nautilus, we can complete the process by
disallowing pre-nautilus OSDs and enabling all new nautilus-only functionality
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
As of 1c760904b0, ceph-ansible implicitly
bootstrap managers on monitors.
mgrs must be upgraded only after all monitors, therefore, this commit
refact the way mgrs are upgraded to be sure we don't upgrade a mgr
during the monitors upgrade.
This commit also ensure we handle the case were we split managers on
dedicated nodes.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This directory is created by ceph-config node by node.
In the upgrade context we need it to be created on ALL monitors as soon
as the first iteration because of the task right after which creates and sends
the keyrings on all monitors.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This prevents the packaging from restarting services before we do need
to restart them in the rolling update sequence.
We want to handle services restart at rolling_update playbook.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
There is no need to set osd flags (noout, norebalance) each time we
upgrade a mon.
This commit moves up those tasks (before stopping the mon) so we don't need
to delegate them.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
We actually want to ensure the node being upgraded is joining the quorum
instead of the monitor picked up earlier.
Indeed, the `mon_host`is used only in `delegate_to:` so we can still run ceph
commands while the monitor being upgraded is stopped.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
the `containerized` parameter in ceph_key module doesn't exist anymore.
This was making the module failing but was hidden because of the
`ignore_errors: True`.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
As of nautilus, the initial keyrings list has changed, it means when
upgrading from Luminous or Mimic, it is expected there's a mismatch
between what is found on the cluster and the expected initial keyring
list hardcoded in ceph_key module. We shouldn't fail when upgrading to
nautilus.
str_to_bool() took from ceph-volume.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-Authored-by: Alfredo Deza <adeza@redhat.com>
rolling_update playbook already takes care of stopping/starting services
during the sequence. There's no need to trigger potential unwanted
services restart.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When using osd_scenario lvm, we never check if the lvm2 package is
present on the host.
When using containerized deployment and docker on CentOS/RedHat this
package will be automatically installed as a dependency but not for
Ubuntu distribution.
OSD deployed via ceph-volume require the lvmetad.socket to be active
and running.
Resolves: #3728
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Also fixed rhcs_edits.txt for variable ceph_docker_registry.
Moved namespace to ceph_docker_image variable.
Signed-off-by: Phuong Nguyen <pnguyen@redhat.com>
Since all files in container image have moved to `/opt/ceph-container`
this check must look for new AND the old path so it's backward
compatible. Otherwise it could end up by templating an inconsistent
`ceph-osd-run.sh`.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When using monitor_address_block to determine the ip address of the
monitor node, we need an ip address available in that cidr to be
present in the ansible facts (ansible_all_ipv[46]_addresses).
Currently we don't check if there's an ip address available during
the ceph-validate role.
As a result, the ceph-config role fails due to an empty list during
ceph.conf template creation but the error isn't explicit.
TASK [ceph-config : generate ceph.conf configuration file] *****
fatal: [0]: FAILED! => {"msg": "No first item, sequence was empty."}
With this patch we will fail before the ceph deployment with an
explicit failure message.
Resolves: rhbz#1673687
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When using community repository we need to set the priority on the
ceph repositories because we could have some conflict with EPEL
packages.
In order to set the priority on the ceph repositories, we need to
install the yum-plugin-priorities package.
http://docs.ceph.com/docs/master/install/get-packages/#rpm-packages
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>