This adds ceph-*.target systemd unit files support for containerized
deployments.
This also fixes a regression introduced by PR #6719 (rgw and nfs systemd
units not getting purged)
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1962748
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 09ef465f62)
When calling the `ceph_key` module with `state: info`, if the ceph
command called fails, the actual error is hidden by the module which
makes it pretty difficult to troubleshoot.
The current code always states that if rc is not equal to 0 the keyring
doesn't exist.
`state: info` should always return the actual rc, stdout and stderr.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1964889
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d58500ade0)
Skip the `get initial keyring when it already exists` task when both commands
whose `stdout` output it requires have been skipped (e.g. when running in check
mode).
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit e98d9b70bd)
It has come to our attention that using ansible_* vars that are
populated with INJECT_FACTS_AS_VARS=True is not very performant. In
order to be able to support setting that to off, we need to update the
references to use ansible_facts[<thing>] instead of ansible_<thing>.
Related: ansible#73654
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1935406
Signed-off-by: Alex Schultz <aschultz@redhat.com>
(cherry picked from commit a7f2fa73e6)
Due to missing condition on `cephx` variable, cephx disabled deployments
are broken.
This commit fixes this.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1910151
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4af0845702)
Since the backing generate_secret() just hands out urandom output,
running as privileged doesn't seem to be required. It's not
desireable to provide sudo in some Ansible runner environments.
Signed-off-by: Jukka Nousiainen <jukka.nousiainen@csc.fi>
(cherry picked from commit eb7473491b)
adding monitor is no longer possible because we generate a new mon
keyring each time the playbook is run.
Fixes: #5864
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1902281
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 970c6a4ee6)
Most ansible module using a state parameter default to the present
value (when available) instead of using it as a mandatory option.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit abb4023d76)
Since this variable makes it possible to customize the mode for ceph
directories, let's make it a bit more explicit by adding a default value
in ceph-defaults.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 483adb5d79)
After rolling updates performed with
`infrastructure-playbooks/rolling_updates.yml`, files located in
`/var/lib/ceph/mon/{{ cluster }}-{{ monitor_name }}` had mode 0755 (including
the keyring), making them world-readable.
This commit separates the task that configured permissions recursively on
`/var/lib/ceph/mon/{{ cluster }}-{{ monitor_name }}` into two separate tasks:
1. Set the ownership and mode of the directory itself;
2. Recursively set ownership in the directory, but don't modify the mode.
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit 0d76826bbb)
Skip the `get initial keyring when it already exists` task when both commands
whose `stdout` output it requires have been skipped (e.g. when running in check
mode).
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit 8f436ab5d8)
Make sure the `site.yml.sample` playbook can be run in check mode by skipping
tasks that try to read the output of commands that have been skipped.
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit 54ba38e35e)
In non containerized deployment we check if the service is running
via the socket file presence.
This is done via the xxx_socket_stat variable that check the file
socket in the /var/run/ceph/ directory.
In some scenarios, we could have the socket file still present in
that directory but not used by any process.
That's why we have the xxx_stat variable which clean those leftovers.
The problem here is that we're set the variable for the handlers status
(like handler_mon_status) based on xxx_socket_stat instead of xxx_stat.
That means we will trigger the handlers if there's an old socket file
present on the system without any process associated.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1866834
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 733596582d)
If we want to create crush rules with the create-replicated sub command
and device class then we need to have the OSD created before the crush
rules otherwise the device classes won't exist.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ed36a11eab)
We must pick up a mon which actually exists in ceph-facts in order to
detect if a cluster is running. Otherwise, it will state no cluster is
already running which will end up deploying a new monitor isolated in a
new quorum.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622688
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 86f3eeb717)
[303] mktemp used in place of tempfile module
[602] Don't compare to empty string
[701] No 'galaxy_info' found
[702] Use 'galaxy_tags' rather than 'categories'
This patch also changes the ansible log_path value via the
ANSIBLE_LOG_PATH environment variable in the travis configuration to
avoid warnings.
[WARNING]: log file at /home/travis/ansible/ansible.log is not writeable
and we cannot create it, aborting
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f7fd0b6d4f)
This commit isolates the systemd unit files generation for containers into
separate yml files in order to be able importing each corresponding roles
without playing all tasks.
This is needed so we can run ceph-ansible to render systemd unit files
so they call podman instead of docker.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit bd64167469)
This commit makes it possible to parametrize the ceph directories modes.
So it changes hardocded mode for ceph related directories from 0755 to
customizable with `ceph_directories_mode` variable.
Closes: #2920
Signed-off-by: Artur Fijalkowski <artur.fijalkowski@ing.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 011270ca69)
there is no need to use `shell` in these tasks. Let's use `command`.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4df92152c0)
in order to use the right binary name when using python cli in command
or shell module.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 13815ad3ca)
The ability to add nodes with the monitor role to an existing cluster
whose name differs from the default name is fixed.
Signed-off-by: ilyashestopalov <usr.tester@yandex.ru>
(cherry picked from commit 904532c5e2)
We already set the become flag to true at a play level in the site*
playbooks so we don't need to set it at a task level.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 7c3640177b)
By running ceph-ansible there are a lot ``[DEPRECATION WARNING]`` like these:
```
[DEPRECATION WARNING]: evaluating containerized_deployment as a bare variable,
this behaviour will go away and you might need to add |bool to the expression
in the future. Also see CONDITIONAL_BARE_VARS configuration toggle.. This
feature will be removed in version 2.12. Deprecation warnings can be disabled
by setting deprecation_warnings=False in ansible.cfg.
```
Now appended ``| bool`` on a lot of the affected variables.
Sometimes the coding style from ``variable|bool`` changed to ``variable | bool`` *(with spaces at the pipe)*.
Closes: #4022
Signed-off-by: L3D <l3d@c3woc.de>
(cherry picked from commit ab54fe20ec)
This commit renames the `docker_exec_cmd` variable to
`container_exec_cmd` so it's more generic.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e74d80e72f)
Except for some corner case, it's not correct to access some other
node's copy of variable docker_exec_cmd. Therefore replace
"hostvars[groups[mon_group_name][0]]['docker_exec_cmd']" by
"docker_exec_cmd".
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 89748d579a)
Add code in ceph-mgr for creating a keyring for manager in so that
managers can be deployed on a separate node too.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 56bfec7c58)
Keywords requiring only one item shouldn't express it by creating a
list with single item.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 739a662c80)
Conflicts:
roles/ceph-mon/tasks/ceph_keys.yml
roles/ceph-validate/tasks/check_devices.yml
Otherwise the reader is forced to search for "when" when blocks are too
long.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit e0beaf123a)
Conflicts:
roles/ceph-config/tasks/main.yml
roles/ceph-container-common/tasks/pre_requisites/prerequisites.yml
roles/ceph-validate/tasks/check_devices.yml
When adding a new monitor, we must reuse the existing initial monitor
keyring. Otherwise, the new monitor will issue its 'mkfs' with a new
monitor keyring and it will result with a mismatch between them. The
new monitor will be unable to join the quorum in the end.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit edf1ee2073)
Let's use a condition to run this task only on the first mon.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 631e5d3144)
According to rdo testing https://review.rdoproject.org/r/#/c/18721
a check on the output of the ceph_health value is added to
allow the playbook to make several attempts (according to the
retry/delay variables) when waiting the cluster quorum or
when the container bootstrap is not ended.
It avoids the failure of the command execution when it doesn't
receive a valid json object to decode (because cluster is too
slow to boostrap compared to ceph-ansible task execution).
Signed-off-by: fpantano <fpantano@redhat.com>
(cherry picked from commit afbb90e4ac)
This prevents the packaging from restarting services before we do need
to restart them in the rolling update sequence.
We want to handle services restart at rolling_update playbook.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
As of nautilus, the initial keyrings list has changed, it means when
upgrading from Luminous or Mimic, it is expected there's a mismatch
between what is found on the cluster and the expected initial keyring
list hardcoded in ceph_key module. We shouldn't fail when upgrading to
nautilus.
str_to_bool() took from ceph-volume.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-Authored-by: Alfredo Deza <adeza@redhat.com>
Currently the default crush rule value is added to the ceph config
on the mon nodes as an extra configuration applied after the template
generation via the ansible ini module.
This implies two behaviors:
1/ On each ceph-ansible run, the ceph.conf will be regenerated via
ceph-config+template and then ceph-mon+ini_file. This leads to a
non necessary daemons restart.
2/ When other ceph daemons are collocated on the monitor nodes
(like mgr or rgw), the default crush rule value will be erased by
the ceph.conf template (mon -> mgr -> rgw).
This patch adds the osd_pool_default_crush_rule config to the ceph
template and only for the monitor nodes (like crush_rules.yml).
The default crush rule id is read (if exist) from the current ceph
configuration.
The default configuration is -1 (ceph default).
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1638092
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The following lint issues have been resolved:
[301] Commands should not change things if nothing needs doing
/home/travis/build/ceph/ceph-ansible/roles/ceph-mon/tasks/ceph_keys.yml:2
[305] Use shell only when shell functionality is required
/home/travis/build/ceph/ceph-ansible/roles/ceph-osd/tasks/start_osds.yml:47
[301] Commands should not change things if nothing needs doing
/home/travis/build/ceph/ceph-ansible/roles/ceph-rgw/tasks/multisite/destroy.yml:2
[301] Commands should not change things if nothing needs doing
/home/travis/build/ceph/ceph-ansible/roles/ceph-rgw/tasks/multisite/destroy.yml:7
[301] Commands should not change things if nothing needs doing
/home/travis/build/ceph/ceph-ansible/roles/ceph-rgw/tasks/multisite/destroy.yml:14
[301] Commands should not change things if nothing needs doing
/home/travis/build/ceph/ceph-ansible/roles/ceph-rgw/tasks/multisite/destroy.yml:19
[301] Commands should not change things if nothing needs doing
/home/travis/build/ceph/ceph-ansible/roles/ceph-rgw/tasks/multisite/destroy.yml:24
Signed-off-by: Kevin Coakley <kcoakley@sdsc.edu>
There's no need to set the client_admin_ceph_authtool_cap variable
via a set_fact task.
Instead we can set this in the role defaults.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The administrator keyring needs full capabilities on mds like mon,
osd and mgr.
Whithout this, the client.admin key won't be able to run commands
against mds (like ceph tell mds.0 session ls)
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1672878
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Set directories to 755 and files to 644 to /var/lib/ceph/mon/{{ cluster }}-{{ monitor_name }} recursively instead of setting files and directories to 755 recursively. The ceph mon process writes files to this path with permissions 644. This update stops ansible from updating the permissions in /var/lib/ceph/mon/{{ cluster }}-{{ monitor_name }} every time ceph mon writes a file and increases idempotency.
Signed-off-by: Kevin Coakley <kcoakley@sdsc.edu>
`ceph-mon` tries to redeploy monitors because it assumes it was not yet
deployed since `mon_socket_stat` and `ceph_mon_container_stat` are
undefined (indeed, we stop the daemon before calling `ceph-mon` in the
switch_to_containers playbook).
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
167 is the ceph uid for Red Hat based system, thus trying to deploy a
monitor on Debian fail since the ceph user id on that system is 64045.
This commit uses the ceph_uid variable which contains the right uid
based on system/container detection.
Closes: https://github.com/ceph/ceph-ansible/issues/3589
Signed-off-by: Sébastien Han <seb@redhat.com>