e6bfb84 introduced a regression in the switch from non containerized
to container deployment.
We need to stop all previous OSDs services. We just don't need the
ceph-disk pattern in the regex.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
remove old systemd unit files (non-containerized) during the
switch_to_containers transition.
We have seen sometimes the unit started is the old one instead of the
new systemd unit generated.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
use the ceph binary from the container instead of the host.
If the ceph CLI version isn't compatible between host and container
image, it can cause the CLI to hang.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
`ceph-mon` tries to redeploy monitors because it assumes it was not yet
deployed since `mon_socket_stat` and `ceph_mon_container_stat` are
undefined (indeed, we stop the daemon before calling `ceph-mon` in the
switch_to_containers playbook).
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
sometimes we play the whole role `ceph-defaults` just to access the
default value of some variables. It means we play the `facts.yml` part
in this role while it's not desired. Splitting this role will speedup
the playbook.
Closes: #3282
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Since sharing variables amongst roles has been made default since
Ansible 2.6, private option has been deprecated; so stop using it.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Use podman or docker wether they are available or not. podman will be
prioritized over docker if present.
Signed-off-by: Sébastien Han <seb@redhat.com>
It's easier lookup a directoriy instead of the block devices,
especially because of ceph-volume and ceph-disk have a different way to
handle devices.
Signed-off-by: Sébastien Han <seb@redhat.com>
Prior to this commit we were only disabling ceph-osd units, but forgot
the ceph.target which is controlling everything and will restart the
ceph-osd units at each reboot.
Now that everything gets disabled there won't be any conflicts between
the old non-container and the new container units.
Signed-off-by: Sébastien Han <seb@redhat.com>
If we mask it we won't be able to start the OSD container since now the
osd container use the osd ID as a name such as: ceph-osd@0
Fixes the error: Failed to execute operation: Cannot send after transport endpoint shutdown
Signed-off-by: Sébastien Han <seb@redhat.com>
Since import_role and include_role are more readable, explicit (about
the nature of inclusion) and flexible (allows placibf inclusion
anywhere) amongst the tasks, use them instead of using roles or role
keyword. Besides, these keywords also allow more arguments.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
The current regex had a limitation of 99 OSDs, now this limit has been
removed and regardless the number of OSDs they will all be collected.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1630430
Signed-off-by: Sébastien Han <seb@redhat.com>
Fixes the deprecation warning:
[DEPRECATION WARNING]: Using tests as filters is deprecated. Instead of
using `result|search` use `result is search`.
Signed-off-by: Noah Watkins <nwatkins@redhat.com>
We need to copy this key into /etc/ceph so when ceph-docker-common runs
it can fetch it to the ansible server. Previously the task wasn't not
failing because `fail_on_missing` was False before 2.5, so now it's True
hence the failure.
Signed-off-by: Sébastien Han <seb@redhat.com>
Add missing call the ceph-handler role, otherwise we can't have
reference to variable registered from ceph-handler from other roles.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Similar to c13a3c3 we must allow scrubbing when running this playbook.
In cluster with a large number of PGs, it can be expected some of them
scrubbing, it's a normal operation.
Preventing from scrubbing operation force to set noscrub flag.
This commit allows to switch from non containerized to containerized
environment even while PGs are scrubbing.
Closes: #3182
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
During the transition from jewel non-container to container old ceph
units are disabled. ceph-disk can still remain in some cases and will
appear as 'loaded failed', this is not a problem although operators
might not like to see these units failing. That's why we remove them if
we find them.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1577846
Signed-off-by: Sébastien Han <seb@redhat.com>
In addition to b324c17 this commit fix the ceph uid for osd role in the
switch from non containerized to containerized playbook.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
If we don't do this, umounting devices declared like this
/dev/disk/by-id/ata-QEMU_HARDDISK_QM00001
will fail like:
umount: /dev/disk/by-id/ata-QEMU_HARDDISK_QM000011: mountpoint not found
Since we append '1' (partition 1), this won't work.
So we need to resolved the link to get something like /dev/sdb and then
append 1 to /dev/sdb1
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
We know bindmount with the :z option at the end of the -v command so
this will basically run the exact same command as we used to run. So to
speak:
chcon -Rt svirt_sandbox_file_t /var/lib/ceph
Signed-off-by: Sébastien Han <seb@redhat.com>
If we're working with a jewel cluster then this service will not exist.
This is mainly a problem with CI testing because our tests are setup to
work with both jewel and luminous, meaning that eventhough we want to
test jewel we still have a nfs-ganesha host in the test causing these
tasks to run.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
Use the pg check before doing the pg check, not on the quorum check.
Also never quote int when doing comparaison.
Signed-off-by: Sébastien Han <seb@redhat.com>
This commits allows us to run
switch-from-non-containerized-to-containerized-ceph-daemons.yml multiple
times.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1489353
Signed-off-by: Sébastien Han <seb@redhat.com>
If devices is passed through an extra var this register won't work so
let's only register the var is devices is not defined.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1489099
Signed-off-by: Sébastien Han <seb@redhat.com>
we need to force the value of `docker` variable which is initially set
to `false` since it's a migration from non-containerized to
containerized cluster.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
We must mask the image so we are sure that even if the system reboots
then the OSDs won't start.
Also remove Ceph udev rules if found on the system prior to deploy
containers. If we don't do this we are exposed to conflicts between udev
rules and sytemd unit files.
Also add the CI will now test the migration from a non-containerized cluster to a
containerized cluster.
Signed-off-by: Sébastien Han <seb@redhat.com>
This commit eases the use of the
infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml
playbook. We basically run it with a couple of pre-tasks and then we let
the playbook run the docker roles.
It obviously expect to have proper variables configured in order to
work.
Signed-off-by: Sébastien Han <seb@redhat.com>
In the switch to containers migration there were broken references
to ceph_mon_docker_subnet variable, replaced with public_network.
Also fixes references to ceph_mon_docker_extra_env setting for it
a default as it could be undefined.
remove `ceph_mon_docker_interface` and use `monitor_interface` instead
for both containerized and non-containerized deployment.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The new test in the checks PGs are no longer working on distributions
where /bin/sh isn't linked to /bin/bash.
Fix: #1619
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Avoid screen scrapping by rewriting `waiting for clean pgs` tasks like it is
done in 304de48.
Use the json output returned by `ceph -s` instead
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Problem: too many different commands to do the same thing. The 'cut'
command on infrastructure-playbooks/purge-cluster.yml was also wrong.
This sed command from osixia in ceph-docker
https://github.com/ceph/ceph-docker/pull/580/ addresses all the
scenarios.
Signed-off-by: Sébastien Han <seb@redhat.com>