remove old systemd unit files (non-containerized) during the
switch_to_containers transition.
We have seen sometimes the unit started is the old one instead of the
new systemd unit generated.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
use the ceph binary from the container instead of the host.
If the ceph CLI version isn't compatible between host and container
image, it can cause the CLI to hang.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
instead of using `RuntimeDirectory` parameter in systemd unit files,
let's use a systemd `tmpfiles.d` to ensure `/run/ceph`.
Explanation:
`podman` doesn't create the `/var/run/ceph` if it doesn't exist the time
where the container is run while `docker` used to create it.
In case of `switch_to_containers` scenario, `/run/ceph` gets created by
a tmpfiles.d systemd file; when switching to containers, the systemd
unit file complains because `/run/ceph` already exists
The better fix would be to ensure `/usr/lib/tmpfiles.d/ceph-common.conf`
is removed and only rely on `RuntimeDirectory` from systemd unit file parameter
but we come from a non-containerized environment which is already running,
it means `/run/ceph` is already created and when starting the unit to
start the container, systemd will still complain and we can't simply
remove the directory if daemons are collocated.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
`ceph-mon` tries to redeploy monitors because it assumes it was not yet
deployed since `mon_socket_stat` and `ceph_mon_container_stat` are
undefined (indeed, we stop the daemon before calling `ceph-mon` in the
switch_to_containers playbook).
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Since it's already confusing whether ntp_daemon_type should be "ntp" or
"ntpd", fix the mistake in the title of the task that aborts if
ntp_daemon_type is set to "ntpd" and OS being used is Atomic.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Since Atomic doesn't allow any installations and NTPd is not present
on Atomic image we are using, abort when ntp_daemon_type is set to ntpd.
https://github.com/ceph/ceph-ansible/issues/3572
Signed-off-by: Rishabh Dave <ridave@redhat.com>
167 is the ceph uid for Red Hat based system, thus trying to deploy a
monitor on Debian fail since the ceph user id on that system is 64045.
This commit uses the ceph_uid variable which contains the right uid
based on system/container detection.
Closes: https://github.com/ceph/ceph-ansible/issues/3589
Signed-off-by: Sébastien Han <seb@redhat.com>
When {{omit}} is concatenated with another string, it expands to something
like __omit_place_holder__63eea0d96dd6ed867b95405e11d87dddf61f448d.
However, in these use-cases we need an empty string.
Regression introduced in d53f55e807.
Signed-off-by: Leah Neukirchen <leah.neukirchen@mayflower.de>
The task setup chronyd called the handler disable chronyd, which of
course defeats the purpose.
Changing the task to disable ntpd instead fixes the issue of chronyd
being disabled after it got enabled.
Fixes: #3582
Signed-off-by: Patrick C. F. Ernzer pcfe@redhat.com
Typical error:
```
fatal: [iscsi-gw0]: FAILED! =>
msg: 'an error occurred while trying to read the file ''/home/guits/ceph-ansible/tests/functional/all_daemons/fetch/e5f4ab94-c099-4781-b592-dbd440a9d6f3/iscsi-gateway.key'': [Errno 13] Permission denied: b''/home/guits/ceph-ansible/tests/functional/all_daemons/fetch/e5f4ab94-c099-4781-b592-dbd440a9d6f3/iscsi-gateway.key'''
```
`become: True` is not needed on the following task:
`copy crt file(s) to gateway nodes`.
Since it's already set in the main playbook (site.yml/site-container.yml)
The thing is that the files get generated in the 'fetch_directory' with
root user because there is a 'delegate_to' + we run the playbook with
`become: True` (from main playbook).
The idea here is to create files under ansible user so we can open them
later to copy them on the remote machine.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
With 'podman version 1.0.0' on RHEL8 beta the 'get ceph version' and
'ceph monitor mkfs' commands fail [1] with "error configuring network
namespace for container Missing CNI default network".
When net=host is added these errors are resolved. net=host is used in
many other calls (grep -R net=host | wc -l --> 38).
Fixes: #3561
Signed-off-by: John Fulton <fulton@redhat.com>
(cherry picked from commit 410abd7745)
when ceph-container-common notifies handlers because a new container
image has been pulled, ceph-handler will throw an error because of
undefined variables since they are set in ceph-facts role.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
/var/run/ceph resides in a non persistent filesystem (tmpfs)
After a reboot, all daemons won't start because this directory will be
missing.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
without this, the command `ceph-volume lvm list --format json` hangs and
takes a very long time to complete.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
`test_mon_host_line_has_correct_value()` will cover this test in
anycase. It doesn't worth to have a dedicated test for this.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
- also add `--foreground` which seems to fix some issue we are facing when
using timeout with `podman`.
- use this fact in the `is ceph running already?` task.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Since fact_caching_connection uses a directory in $HOME already, write the
ansible.log to the same directory.
Fixes: #3509
Signed-off-by: Patrick C. F. Ernzer <pcfe@redhat.com>
The restart_osd_daemon.sh generated from the j2 template
contains a python call which uses 'print x' instead of
'print(x)'. Add the missing parentheses to make this call
compatible with both 2 and 3.
Also add parentheses to other python print calls found
in roles/ceph-client/defaults/main.yml and
infrastructure-playbooks/cluster-os-migration.yml.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1671721
Signed-off-by: John Fulton <fulton@redhat.com>
This task used to live in ceph-osd, but we need it defined here to that
ceph-config can use it when trying to determine the number of osds.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
If osd_auto_discovery is set with the lvm scenario it's expected for
lvm_volumes and devices to be empty.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
If user sets "docker_pull_timeout: '0'" then do not use the
timeout command when running podman/docker pull. Also, use
"timeout -s KILL"; without KILL, podman on RHEL8 beta does
not timeout and deployment can hang.
Related: https://bugzilla.redhat.com/show_bug.cgi?id=1670625
Signed-off-by: John Fulton <fulton@redhat.com>