When using podman, the systemd unit scripts don't have a dependency
on the network. So we're not sure that the network is up and running
when the containers are starting.
With docker this behaviour is already handled because the systemd
unit scripts depend on docker service which is started after the
network.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
By running ceph-ansible there are a lot ``[DEPRECATION WARNING]`` like these:
```
[DEPRECATION WARNING]: evaluating containerized_deployment as a bare variable,
this behaviour will go away and you might need to add |bool to the expression
in the future. Also see CONDITIONAL_BARE_VARS configuration toggle.. This
feature will be removed in version 2.12. Deprecation warnings can be disabled
by setting deprecation_warnings=False in ansible.cfg.
```
Now appended ``| bool`` on a lot of the affected variables.
Sometimes the coding style from ``variable|bool`` changed to ``variable | bool`` *(with spaces at the pipe)*.
Closes: #4022
Signed-off-by: L3D <l3d@c3woc.de>
This commit renames the `docker_exec_cmd` variable to
`container_exec_cmd` so it's more generic.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Add code in ceph-mgr for creating a keyring for manager in so that
managers can be deployed on a separate node too.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Except for some corner case, it's not correct to access some other
node's copy of variable docker_exec_cmd. Therefore replace
"hostvars[groups[mon_group_name][0]]['docker_exec_cmd']" by
"docker_exec_cmd".
Signed-off-by: Rishabh Dave <ridave@redhat.com>
When adding a new monitor, we must reuse the existing initial monitor
keyring. Otherwise, the new monitor will issue its 'mkfs' with a new
monitor keyring and it will result with a mismatch between them. The
new monitor will be unable to join the quorum in the end.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Rishabh Dave <ridave@redhat.com>
According to rdo testing https://review.rdoproject.org/r/#/c/18721
a check on the output of the ceph_health value is added to
allow the playbook to make several attempts (according to the
retry/delay variables) when waiting the cluster quorum or
when the container bootstrap is not ended.
It avoids the failure of the command execution when it doesn't
receive a valid json object to decode (because cluster is too
slow to boostrap compared to ceph-ansible task execution).
Signed-off-by: fpantano <fpantano@redhat.com>
This prevents the packaging from restarting services before we do need
to restart them in the rolling update sequence.
We want to handle services restart at rolling_update playbook.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
As of nautilus, the initial keyrings list has changed, it means when
upgrading from Luminous or Mimic, it is expected there's a mismatch
between what is found on the cluster and the expected initial keyring
list hardcoded in ceph_key module. We shouldn't fail when upgrading to
nautilus.
str_to_bool() took from ceph-volume.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-Authored-by: Alfredo Deza <adeza@redhat.com>
Currently the default crush rule value is added to the ceph config
on the mon nodes as an extra configuration applied after the template
generation via the ansible ini module.
This implies two behaviors:
1/ On each ceph-ansible run, the ceph.conf will be regenerated via
ceph-config+template and then ceph-mon+ini_file. This leads to a
non necessary daemons restart.
2/ When other ceph daemons are collocated on the monitor nodes
(like mgr or rgw), the default crush rule value will be erased by
the ceph.conf template (mon -> mgr -> rgw).
This patch adds the osd_pool_default_crush_rule config to the ceph
template and only for the monitor nodes (like crush_rules.yml).
The default crush rule id is read (if exist) from the current ceph
configuration.
The default configuration is -1 (ceph default).
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1638092
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
We don't need to set After=docker.service when the container_binary
variable isn't set to docker.
It doesn't break anything currently but it could be confusing when
using podman.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The following lint issues have been resolved:
[301] Commands should not change things if nothing needs doing
/home/travis/build/ceph/ceph-ansible/roles/ceph-mon/tasks/ceph_keys.yml:2
[305] Use shell only when shell functionality is required
/home/travis/build/ceph/ceph-ansible/roles/ceph-osd/tasks/start_osds.yml:47
[301] Commands should not change things if nothing needs doing
/home/travis/build/ceph/ceph-ansible/roles/ceph-rgw/tasks/multisite/destroy.yml:2
[301] Commands should not change things if nothing needs doing
/home/travis/build/ceph/ceph-ansible/roles/ceph-rgw/tasks/multisite/destroy.yml:7
[301] Commands should not change things if nothing needs doing
/home/travis/build/ceph/ceph-ansible/roles/ceph-rgw/tasks/multisite/destroy.yml:14
[301] Commands should not change things if nothing needs doing
/home/travis/build/ceph/ceph-ansible/roles/ceph-rgw/tasks/multisite/destroy.yml:19
[301] Commands should not change things if nothing needs doing
/home/travis/build/ceph/ceph-ansible/roles/ceph-rgw/tasks/multisite/destroy.yml:24
Signed-off-by: Kevin Coakley <kcoakley@sdsc.edu>
There's no need to set the client_admin_ceph_authtool_cap variable
via a set_fact task.
Instead we can set this in the role defaults.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The administrator keyring needs full capabilities on mds like mon,
osd and mgr.
Whithout this, the client.admin key won't be able to run commands
against mds (like ceph tell mds.0 session ls)
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1672878
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Set directories to 755 and files to 644 to /var/lib/ceph/mon/{{ cluster }}-{{ monitor_name }} recursively instead of setting files and directories to 755 recursively. The ceph mon process writes files to this path with permissions 644. This update stops ansible from updating the permissions in /var/lib/ceph/mon/{{ cluster }}-{{ monitor_name }} every time ceph mon writes a file and increases idempotency.
Signed-off-by: Kevin Coakley <kcoakley@sdsc.edu>
instead of using `RuntimeDirectory` parameter in systemd unit files,
let's use a systemd `tmpfiles.d` to ensure `/run/ceph`.
Explanation:
`podman` doesn't create the `/var/run/ceph` if it doesn't exist the time
where the container is run while `docker` used to create it.
In case of `switch_to_containers` scenario, `/run/ceph` gets created by
a tmpfiles.d systemd file; when switching to containers, the systemd
unit file complains because `/run/ceph` already exists
The better fix would be to ensure `/usr/lib/tmpfiles.d/ceph-common.conf`
is removed and only rely on `RuntimeDirectory` from systemd unit file parameter
but we come from a non-containerized environment which is already running,
it means `/run/ceph` is already created and when starting the unit to
start the container, systemd will still complain and we can't simply
remove the directory if daemons are collocated.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
`ceph-mon` tries to redeploy monitors because it assumes it was not yet
deployed since `mon_socket_stat` and `ceph_mon_container_stat` are
undefined (indeed, we stop the daemon before calling `ceph-mon` in the
switch_to_containers playbook).
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
167 is the ceph uid for Red Hat based system, thus trying to deploy a
monitor on Debian fail since the ceph user id on that system is 64045.
This commit uses the ceph_uid variable which contains the right uid
based on system/container detection.
Closes: https://github.com/ceph/ceph-ansible/issues/3589
Signed-off-by: Sébastien Han <seb@redhat.com>
With 'podman version 1.0.0' on RHEL8 beta the 'get ceph version' and
'ceph monitor mkfs' commands fail [1] with "error configuring network
namespace for container Missing CNI default network".
When net=host is added these errors are resolved. net=host is used in
many other calls (grep -R net=host | wc -l --> 38).
Fixes: #3561
Signed-off-by: John Fulton <fulton@redhat.com>
(cherry picked from commit 410abd7745)
/var/run/ceph resides in a non persistent filesystem (tmpfs)
After a reboot, all daemons won't start because this directory will be
missing.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
You can now use 'ceph_mon_container_listen_port' to change the port the
monitor will listen on.
Setting the default to 3300 (assigned by IANA) since Nautilus has released the messenger2
transport protocol.
Signed-off-by: Sébastien Han <seb@redhat.com>
This reverts commit ee08d1f89a which was
mostly to workaround a bug in ceph@master. Now, ceph@master is fixed so
reverting this. Thanks to https://github.com/ceph/ceph/pull/25900
Signed-off-by: Sébastien Han <seb@redhat.com>
Somewhat something changed with the introduction of msg2 and we have to
add each node as a peer so the monitors can form a quorum. This might be
due to our CI environment, although adding this is completly harmless
and solves monitors not being able to form quorum.
It seems that the initial monitor map wasn't containing the right
information about the peers (addresses like 0.0.0.0/0r1, for each rank.
Signed-off-by: Sébastien Han <seb@redhat.com>
These aliases have led to several issues making believe that ceph
binaries are actually present on the host when running the command.
However it wasn't explicit that the commands were only ran inside a
container.
It has brought to much confusion so we decided to remove them.
Closes: https://github.com/ceph/ceph-ansible/issues/3445
Signed-off-by: Sébastien Han <seb@redhat.com>
Json is a type structure which is always typed as a string, where before
this we were declaring a dict, which is not a json valid structure.
Signed-off-by: Sébastien Han <seb@redhat.com>
This commit unifies the container and non-container code, which in the
meantime gives use the ability to deploy N mon container at the same
time without having to serialized the deployment. This will drastically
reduces the time needed to bootstrap the cluster.
Note, this is only possible since Nautilus because the monitors are
bootstrap the initial keys on their own once they reach quorum. In the
Nautilus version of the ceph-container mon, we stopped generating the
keys 'manually' from inside the container, for more detail see: https://github.com/ceph/ceph-container/pull/1238
Signed-off-by: Sébastien Han <seb@redhat.com>
This will speed up the deployment and also deploy mon and mgr collocated
just as recommended.
This won't prevent you of adding more and dedicaded machines for mgr if
needed.
Signed-off-by: Sébastien Han <seb@redhat.com>
During the first iteration, the command won't return anything, or can
simply fail and might not return a valid json structure. Ansible will
fail parsing it in the filter `from_json` so let's default that variable
to empty dictionary.
Signed-off-by: Sébastien Han <seb@redhat.com>
`osd_pool_default_pg_num` parameter is set in `ceph-mon`.
When using ceph-ansible with `--limit` on a specifc group of nodes, it
will fail when trying to access this variables since it wouldn't be
defined.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1518696
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This is needed for Nautilus since the ceph-create-keys script goes away.
(https://github.com/ceph/ceph/pull/21305)
Now the module if called with 'state: fetch_initial_keys' will lookup
keys generated by the monitor and write them down on the filesystem to
the right location (/etc/ceph and /var/lib/ceph/boostrap*).
This is not applicable to container since keys are generated by the
container only.
Signed-off-by: Sébastien Han <seb@redhat.com>