in containerized context, using the binary provided in atomic os won't
work because it's an old version provided by ceph-common based on
10.2.5.
Using a container could be an idea but for large cluster with hundreds
of client nodes, that would require to pull the image of each of them
just to unmap the rbd devices.
Let's use the sysfs method in order to avoid any issue related to ceph
version that is shipped on the host.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1766064
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3cfcc7a105)
This tries to first unmount any cephfs/nfs-ganesha mount point on client
nodes, then unmap any mapped rbd devices and finally it tries to remove
ceph kernel modules.
If it fails it means some resources are still busy and should be cleaned
manually before continuing to purge the cluster.
This is done early in the playbook so the cluster stays untouched until
everything is ready for that operation, otherwise if you try to redeploy
a cluster it could end up by getting confused by leftover from previous
deployment.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1337915
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 20e4852888)
`lvm_volumes` and/or `devices` variable(s) can be undefined depending on
the scenario chosen.
These tasks should be run only if these variable are defined, otherwise
it ends up with undefined variable errors.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1653307
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0180738313)
The systemd ceph-osd@.service file used for starting the ceph osd
containers is used in all osd_scenarios.
Currently purging a containerized deployment using the lvm scenario
didn't remove the ceph-osd systemd service.
If the next deployment is a non-containerized deployment, the OSDs
won't be online because the file is still present and override the
one from the package.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 7cc626b72d)
sometimes we play the whole role `ceph-defaults` just to access the
default value of some variables. It means we play the `facts.yml` part
in this role while it's not desired. Splitting this role will speedup
the playbook.
Closes: #3282
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0eb56e36f8)
This task has to be called after the role `ceph-defaults` has been
played, otherwise, `mon_group_name` will never be known.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a12de3e048)
we want a default value for `mon_group_name`, not for
`groups[mon_group_name]`.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d0b3cb7f85)
the OSD part of the purge delegates commands on monitor node, we need to
gather monitors facts to know the `ansible_hostname` fact that is used
in the `docker_exec_cmd` fact.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1a4a6ec855)
ceph-defaults relies on facts so we must gather facts before running it.
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 62111ff53c)
Recently we introduced the collocation of mon and mgr by default, so we
don't need to have an explicit mgrs section for this. This means we have
to remove the mgr container on the mon machines too.
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 325a159415)
# Conflicts:
# infrastructure-playbooks/purge-docker-cluster.yml
It's useful when running on CI to see what might remain on the machines.
So we list all the containers and images. We expect the list to be
empty.
We fail if we see containers running.
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 2bcc00896f)
This commits adds the support for purging cluster that were deployed
with ceph-volume. It also separates nicely with a block intruction the
work to do when lvm is used or not.
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 1751885bc9)
add iscsi support for both non containerized and containerized
deployment in purge playbooks.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1651054
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 78116fa6db)
38dc20e74b introduced a bug in the purge
playbooks because using `*` in `command` module doesn't work.
`/var/lib/ceph/*` files are not purged it means there is a leftover.
When trying to redeploy a cluster, it failed because monitor daemon was
detecting existing keyring, therefore, it assumed a cluster already
existed.
Typical error (from container output):
```
Sep 26 13:18:16 mon0 docker[31316]: 2018-09-26 13:18:16 /entrypoint.sh: Existing mon, trying to rejoin cluster...
Sep 26 13:18:16 mon0 docker[31316]: 2018-09-26 13:18:16.9323937f15b0d74700 -1 auth: unable to find a keyring on /etc/ceph/test.client.admin.keyring,/etc/ceph/test.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,:(2) No such file or directory
Sep 26 13:18:23 mon0 docker[31316]: 2018-09-26 13:18:23 /entrypoint.sh:
SUCCESS
```
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1633563
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Sometime /var/lib/ceph is mounted on a device so we won't be able to
remove it (device busy) so let's remove its content only.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1615872
Signed-off-by: Sébastien Han <seb@redhat.com>
Added 'ignore_errors: true' to multiple lines which run docker commands; even in cases where docker is no longer installed. Because of this, certain tasks in the purge-docker-cluster.yml will cause the playbook to fail if re-run and stop the purge. This leaves behind a dirty environment, and a playbook which can no longer be run.
Fix Regex line 275: Sometimes 'list-units' will output 4 spaces between loaded+active. The update will account for both scenarios.
purge fetch_directory: in other roles fetch_directory is hard linked ex.: "{{ fetch_directory }}"/"{{ somedir }}". That being said, fetch_directory will never have a trailing slash in the all.yml so this task was never being run(causing failures when trying to re-deploy).
Signed-off-by: Randy J. Martinez <ramartin@redhat.com>
The `remove_packages` prompt is redundant to the `ireallymeanit` prompt
since it does exactly the same thing. I guess the only goal of this task
was to make a break to warn user about `--skip-tags=with_pkg` feature.
This warning should be part of the first prompt.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
the `zap ceph osd disks` task should iter on `resolved_parent_device`
instead of `combined_devices_list` which contain only the base device
name (vs. full path name in `combined_devices_list`).
this fixes the issue where docker complain about container name because
of illegal characters such as `/` :
```
"/usr/bin/docker-current: Error response from daemon: Invalid container
name (ceph-osd-zap-magna074-/dev/sdb1), only [a-zA-Z0-9][a-zA-Z0-9_.-]
are allowed.","See '/usr/bin/docker-current run --help'."
""
```
having the the basename of the device path is enough for the container
name.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1540137
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This is a typo caused by leftover.
It was previously written like this :
`shell: echo /dev/$(lsblk -no pkname "{{ item }}") }}")`
and has been rewritten to :
`shell: $(lsblk --nodeps -no pkname "{{ item }}") }}")`
because we are appending later the '/dev/' in the next task.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1540137
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Using `lsblk` to resolv the parent device is better than just removing the last
char when passing it to the zap container.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Avoid using `awk` to get the different devices from the partlabel.
Using `blkid` is more readable.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
purge-docker-cluster must remove all osd_disk_prepare logs in
`{{ ceph_osd_docker_run_script_path }}`, otherwise if you purge your
cluster and try to redeploy it, osds will fail to start since because it
will try to retrieve find a partition uuid which doesn't exist.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1510470
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
During purge osd, the containers are not stopped because of a typo, as a
result, all the devices can't be unmounted later.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Ansible throws warnings when using yum/dnf/rpm with the command
module:
[WARNING]: Consider using yum module rather than running yum
This patch adds the `warn: no` argument to suppress the warnings
in the Ansible output.
rbd-mirror containers are not stopped in purge-docker-cluster playbook
because of the wrong name used.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
unti now, mgr nodes are not managed by purge-cluster.yml, therefore it
breaks scenario like purge_cluster.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The old name is used in `rolling_update.yml` and
`purge-docker-cluster.yml`, it breaks the
`test_rgw_service_is_running()` test.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
There is only two main scenarios now:
* collocated: everything remains on the same device:
- data, db, wal for bluestore
- data and journal for filestore
* non-collocated: dedicated device for some of the component
Signed-off-by: Sébastien Han <seb@redhat.com>
We need to include ceph_docker_registry when removing containers/images
because if we don't it will assume docker.io which is not always where
the image originated from, causing the playbook to fail.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
If we're purging a containerized cluster that did not use the
raw_multi_journal OSD scenario then raw_journal_devices will not be
defined which causes the playbook to fail.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1455187
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
Doing so at playbook level overrides whatever values might be set for
these in the user's group_vars directory that's relative to their
inventory.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
We now run the container and waits until it dies. Prior to this we were
stopping it before completion so not all the devices where zapped.
Signed-off-by: Sébastien Han <seb@redhat.com>
We changed the way we declare image.
Prior to this patch we must have a "user/image:tag"
format, which is incompatible with non docker-hub registry where you
usually don't have a "user". On the docker hub a "user" is also
identified as a namespace, so for Ceph the user was "ceph".
Variables have been simplified with only:
* ceph_docker_image
* ceph_docker_image_tag
1. For docker hub images: ceph_docker_name: "ceph/daemon" will give
you the 'daemon' image of the 'ceph' user.
2. For non docker hub images: ceph_docker_name: "daemon" will simply
give you the "daemon" image.
Infrastructure playbooks have been modified as well.
The file group_vars/all.docker.yml.sample has been removed as well.
It is hard to maintain since we have to generate it manually. If
you want to configure specific variables for a specific daemon simply
edit group_vars/$DAEMON.yml
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1420207
Signed-off-by: Sébastien Han <seb@redhat.com>