Playbook must fail anyway, the `rescue` block has been introduced for
unmasking the unit after the playbook has failed.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e9ddb972fe)
This commit makes the playbook fetch the minimal current ceph
configuration and write it later on monitoring nodes so `cephadm` can
proceed with the adoption.
When a monitoring stack was deployed on a dedicated node, it means no
`ceph.conf` file was written, `cephadm` requires a `ceph.conf` in order
to adopt the daemon present on the node.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1939887
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b445df0479)
It has come to our attention that using ansible_* vars that are
populated with INJECT_FACTS_AS_VARS=True is not very performant. In
order to be able to support setting that to off, we need to update the
references to use ansible_facts[<thing>] instead of ansible_<thing>.
Related: ansible#73654
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1935406
Signed-off-by: Alex Schultz <aschultz@redhat.com>
(cherry picked from commit a7f2fa73e6)
`command -v` is a bash script which needs a shell to run.
Fixes: #6325
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 14c472707c)
This commit makes the playbook fetch the minimal current ceph
configuration and write it later on monitoring nodes so `cephadm` can
proceed with the adoption.
When a monitoring stack was deployed on a dedicated node, it means no
`ceph.conf` file was written, `cephadm` requires a `ceph.conf` in order
to adopt the daemon present on the node.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1939887
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b445df0479)
This commit makes sure purge playbooks remove those file if for any reason they
have been left.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1920900
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b9dd253a4f)
There's no need to slow down the playbook execution time by migrating
all the `ceph-crash` instances in a serial way. Let's remove the
`serial: 1` so the migration is achieved in a parallel way.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 980a5a7df4)
When running the rolling_update.yml playbook and adding the dashboard
component in the same time then the requirement (like container packages)
aren't installed.
This could lead to a failure in case of using authentication on the
container registry because the playbook will try to login on the registry
but podman/docker aren't yet installed.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1903504
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1918650
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 48a456dc8c)
Since b105549 we don't install node-exporter on client nodes so we should
also exclude the client node from the node-exporter upgrade.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 94af3c87d1)
Those devices (db/wal) are never zapped in lvm batch deployment.
Iterating over `dedicated_devices` and `bluestore_wal_devices` fixes
this issue.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1922926
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 984191ac7f)
Since the default of `osd_objectstore` has changed as of 3.2, some
deployments might have a mix of filestore and bluestore OSDs on a same
node. In some specific cases, there's a possibility that a filestore OSD
shares a journal/db device with a bluestore OSD. We shouldn't try to
redeploy in this context because ceph-volume will complain. (either
because in lvm batch you can't pass partition or about gpt header).
The safest option is to skip the migration on the node when such a mix
is detected or force all osds including those already using bluestore
(option `force_filestore_to_bluestore=True` has to be passed as an extra var).
If all OSDs are using filestore, then they will be migrated to
bluestore.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1875777
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e66f12d138)
The current check makes no sense because it checks any of other monitor
than the one being played (either a previous one already converted or a
next that isn't yet converted) is present on the quorum.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1909011
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 175ffa1b88)
Instead of iterate over the host list for adding the node/label to the
host orchestrator configuration then we can do it parallelly.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5b6f907a72)
This adds cephadm_bootstrap ansible module for replacing the command module
usage with the cephadm bootstrap command.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c3ed124d31)
This adds cephadm_adopt ansible module for replacing the command module
usage with the cephadm adopt command.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 08f118077f)
This adds ceph_osd_flag ansible module for replacing the command module
usage with the ceph osd set/unset commands.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5da593604a)
Since podman 2.x, there's now a confirmation when running podman
container prune command.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 0108c9f941)
Let's discard the ansible lint error 306 and add a "# noqa 306" on tasks
where we don't need `set -o pipefail`
Fixes: #6090
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 86a8889ee3)
`ceph.target` should be disabled only. Otherwise, in collocation
scenario you stop other collocated services in the OSD play which isn't
what we want to do. Each daemon has its corresponding play for managing
the transition to container.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1901865
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0b05620597)
adding monitor is no longer possible because we generate a new mon
keyring each time the playbook is run.
Fixes: #5864
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 970c6a4ee6)
ignore 302,303 and 505 errors
[302] Using command rather than an argument to e.g. file
[303] Using command rather than module
[505] referenced files must exist
they aren't relevant on these tasks.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 195d88fcda)
Fix ansible-lint 504 error:
[504] Do not use 'local_action', use 'delegate_to: localhost'
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c948b668eb)
Fix ansible-lint 502 error:
[502] All tasks should be named
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 97dd9218dd)
Fix ansible-lint 305 error:
[305] Use shell only when shell functionality is required
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 11b4bf5083)
Fix ansible lint 206 error:
[206] Variables should have spaces before and after: {{ var_name }}
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9fba6eecfa)
Fix ansible lint 301 error:
[301] Commands should not change things if nothing needs doing
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5450de58b3)
Fix ansible lint 306 error:
[306] Shells that use pipes should set the pipefail option
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1879c26eb9)
fa2bb3a only fix the symlink owner/group issue in the OSD play. If the
OSDs are collocated with other services like MONs and MGRs then the
chown command will fail.
$ find /var/lib/ceph/osd/ceph-0 -not -user 167 -execdir chown 167:167 {} +
chown: cannot dereference './block': Permission denied
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1896448
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 35ed9977aa)
When deploying the ceph OSD via the packages then the ceph-osd@.service
unit is configured as enabled-runtime.
This means that each ceph-osd service will inherit from that state.
The enabled-runtime systemd state doesn't survive after a reboot.
For non containerized deployment the OSD are still starting after a
reboot because there's the ceph-volume@.service and/or ceph-osd.target
units that are doing the job.
$ systemctl list-unit-files|egrep '^ceph-(volume|osd)'|column -t
ceph-osd@.service enabled-runtime
ceph-volume@.service enabled
ceph-osd.target enabled
When switching to containerized deployment we are stopping/disabling
ceph-osd@XX.servive, ceph-volume and ceph.target and then removing the
systemd unit files.
But the new systemd units for containerized ceph-osd service will still
inherit from ceph-osd@.service unit file.
As a consequence, if an OSD host is rebooting after the playbook execution
then the ceph-osd service won't come back because they aren't enabled at
boot.
This patch also adds a reboot and testinfra run after running the switch
to container playbook.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1881288
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit fa2bb3af86)
cec994b introduced a regression when a mgr is collocated with a mon.
During the mon upgrade, the mgr service is masked to avoid to be
restarted on packages update.
Then the start mgr task is failing because the service is still masked.
Instead we should unmask it.
Fixes: #5983
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 3d3ce26327)
bd611a7 introduced the new ceph_fs module but missed some tasks in
rolling_update and shrink-mds playbooks.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 16afe90806)
The ceph status command returns a lot of information stored in variables
and/or facts which could consume resources for nothing.
When checking the cluster health, we're using the health structure in the
ceph status output.
To optimize this, we could use the ceph health command which contains
the same needed information.
$ ceph status -f json | wc -c
2001
$ ceph health -f json | wc -c
46
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit acddf4fb67)
The ceph status command returns a lot of information stored in variables
and/or facts which could consume resources for nothing.
When checking the rgw/rbdmirror services status, we're only using the
servicmap structure in the ceph status output.
To optimize this, we could use the ceph service dump command which contains
the same needed information.
This command returns less information and is slightly faster than the ceph
status command.
$ ceph status -f json | wc -c
2001
$ ceph service dump -f json | wc -c
1105
$ time ceph status -f json > /dev/null
real 0m0.557s
user 0m0.516s
sys 0m0.040s
$ time ceph service dump -f json > /dev/null
real 0m0.454s
user 0m0.434s
sys 0m0.020s
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 3f9081931f)
The ceph status command returns a lot of information stored in variables
and/or facts which could consume resources for nothing.
When checking the quorum status, we're only using the quorum_names
structure in the ceph status output.
To optimize this, we could use the ceph quorum_status command which contains
the same needed information.
This command returns less information.
$ ceph status -f json | wc -c
2001
$ ceph quorum_status -f json | wc -c
957
$ time ceph status -f json > /dev/null
real 0m0.577s
user 0m0.538s
sys 0m0.029s
$ time ceph quorum_status -f json > /dev/null
real 0m0.544s
user 0m0.527s
sys 0m0.016s
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 88f91d8c12)
The ceph status command returns a lot of information stored in variables
and/or facts which could consume resources for nothing.
When checking the pgs state, we're using the pgmap structure in the ceph
status output.
To optimize this, we could use the ceph pg stat command which contains
the same needed information.
This command returns less information (only about pgs) and is slightly
faster than the ceph status command.
$ ceph status -f json | wc -c
2000
$ ceph pg stat -f json | wc -c
240
$ time ceph status -f json > /dev/null
real 0m0.529s
user 0m0.503s
sys 0m0.024s
$ time ceph pg stat -f json > /dev/null
real 0m0.426s
user 0m0.409s
sys 0m0.016s
The data returned by the ceph status is even bigger when using the
nautilus release.
$ ceph status -f json | wc -c
35005
$ ceph pg stat -f json | wc -c
240
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ee50588590)
This adds the ceph_fs ansible module for replacing the command module
usage with the ceph fs command.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit bd611a785b)
This playbook isn't needed anymore, we can achieve this operation by
running main playbook with `--limit` option.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 20718582da)
This commit adds the `osd_auto_discovery` scenario support in the
filestore-to-bluestore playbook.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1881523
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 8b1eeef18a)
There's no need ot have a copy of this file in infrastructure-playbooks
directory.
playbooks in that directory can be run from the root dir of
ceph-ansible.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f906caa6da)
As of 2.10, group names containing a dash are invalid.
However, setting this option makes it still possible to use a dash in
group names and prevent this warning to show up.
It might need to be definitely addressed in a future ansible release.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1880476
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6938ed1302)
When running the switch2container playbook on a Debian based system
then the systemd unit path isn't the same than Red Hat based system.
Because the systemd unit files aren't removed then the new container
systemd unit isn't take in count.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c1af69a7e7)
Most ansible module using a state parameter default to the present
value (when available) instead of using it as a mandatory option.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit abb4023d76)
We already do this in the site-container.yml playbook because we don't
need docker/podman installed on all client nodes and having the
container image only on the first client node.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 8ecbdc6ede)
When running the rolling_update playbook with an inventory without
monitor nodes defined (like external scenario) then we can't retrieve
the cluster fsid from the running monitor.
In this scenario we have to pass this information manually (group_vars
or host_vars).
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1877426
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f63022dfec)
On DCN environments, or when multiple ceph cluster are configured,
we need to specify the cluster name before running the command or
the rolling_update playbook will fail during minor updates.
Closes: https://bugzilla.redhat.com/1876447
Signed-off-by: Francesco Pantano <fpantano@redhat.com>
(cherry picked from commit cb64df30b6)
In the OSP context, during the rolling update the playbook fails
with the following error:
'''
ERROR! The field 'hosts' has an invalid value, which includes an
undefined variable. The error was: list object has no element 0
'''
This PR just change the hosts field providing a valid mons group
value.
Closes: https://bugzilla.redhat.com/1876803
Signed-off-by: Francesco Pantano <fpantano@redhat.com>
(cherry picked from commit e65f9a5c72)
There's no need to use `ignore_errors: true` on these tasks.
Using a loop on the task stopping mon daemons allows us to avoid
duplicating this task, the `ignore_errors` isn't needed here because it
won't fail the playbook if one of the ID doesn't exist (shortname vs. fqdn)
Using the right condition on the task starting the mgr daemon allows us
to avoid using an `ignore_errors: true` as well.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit cec994b973)
When using fqdn in inventory host file, this task will fail because the
mds is registered with its shortname.
It means we must use `mds_to_kill_hostname` in this task.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1869837
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 51c382677d)
This way we keep consistency with purge-container-cluster.yml playbook.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f77fa6e2a4)
ceph-volume can generate large logs at some point.
debug logs by definition should be enabled only when debugging.
Let's make it customizable with a variable which is set to `False` by
default.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 448cc280b7)
When running `infrastructure-playbooks/purge-cluster.yml` twice, it fails the
second time on the `ensure rbd devices are unmapped` task, because `rbdmap`
isn't installed anymore at that point.
This commit adds a check that ensures `rbdmap` is available, and skips the
`ensure rbd devices are unmapped` task if it isn't.
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit a57fd7a090)
The task "remove old systemd unit file" under "switching from
non-containerized to containerized ceph rgw" only removes
the ceph-radosgw@.service file. The task should also remove
the ceph-radosgw.target file, like the "remove old systemd unit
files" tasks for the mons, mgrs, osds, etc, in order to clean up
all of the unused systemd unit files.
Signed-off-by: Kevin Coakley <kcoakley@sdsc.edu>
(cherry picked from commit d19e6033b2)
Otherwise it leaves an empty directory.
When shrinking and redeploying multiple OSDs you have no guarantee it
will reuse the same osd id.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8933bfde33)
This handles missing /etc/ceph/osd, by ensuring we actually found files in
`/etc/ceph/osd` before trying to slurp their content.
This also add a missing `| default(False)` to avoid fowlloing error:
```
fatal: [ceph01]: FAILED! =>
msg: |-
The conditional check 'ceph_osd_data_json[item.2]['encrypted'] | bool' failed. The error was: error while evaluating conditional (ceph_osd_data_json[item.2]['encrypted'] | bool): 'dict object' has no attribute 'encrypted'
```
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1862416
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit fe8fbd3ee2)
In addition of 155e2a2, the active mds daemons isn't stop/start
correctly as opposed as the other services so that daemon doesn't come
back after the upgrade.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1861688
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ec0a37a74f)
The dashboard upgrade workflow should do the same process than the ceph
upgrade otherwise any systemd unit modification won't be apply on the
monitoring/dashboard stack.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1859173
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit a6209bd957)
During the daemon upgrade we're
- stopping the service when it's not containerized
- running the daemon role
- start the service when it's not containerized
- restart the service when it's containerized
This implementation has multiple issue.
1/ We don't use the same service workflow when using containers
or baremetal.
2/ The explicity daemon start isn't required since we'are already
doing this in the daemon role.
3/ Any non backward changes in the systemd unit template (for
containerized deployment) won't work due to the restart usage.
This patch refacts the rolling_update playbook by using the same service
stop task for both containerized and baremetal deployment at the start
of the upgrade play.
It removes the explicit service start task because it's already included
in the dedicated role.
The service restart tasks for containerized deployment are also
removed.
Finally, this adds the missing service stop task for ceph crash upgrade
workflow.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1859173
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 155e2a23d5)
This commit introduces a new role `ceph-crash` in order to deploy
everything needed for the ceph-crash daemon.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9d2f2108e1)
Set the cephadm cmd as a fact instead of rewriting the same command
over and over.
This also fix an issue when using docker as container engine because
the --docker cephadm parameter should be use before the subcommand
not after.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5ef965c4dc)
This adds a new playbook for deploying ceph via cephadm.
This also adds a new dedicated tox file for CI purpose.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 957903d561)
This is a partial revert of b38019e because we don't want to execute
the whole play on the monitor otherwise if we have some empty group
like rgws or mdss then the orchestrator commands will still be
executed.
Instead we should keep the real target group name at play level and
delegate the orchestator commands to the monitor. The whole play
will be skipped is the group is empty.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 9596494911)
Print a message at the end of the playbook to inform users that they
don't have to user ceph-ansible playbooks anymore as everything else
need to be done via cephadm (day 2 operation).
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 75ae1b7e90)
When reporting the orchestrator service/daemon list at the end of the
playbook, we can use the --refresh option otherwise we could have
an outdated output.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 7164426456)
After adopting a monitor we need to wait that monitor to join back
the quorum before moving to the next node.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 0c3a2b72ff)
Like rolling_update or switch2container playbooks, we need to set/unset
some osd flags before and after the OSD daemons adoption.
This also adds a task for waiting for clean pgs at then of an OSd node.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d3b3c8948e)
At the end of the process when don't need the cephadm script.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c3bbc6b13c)
At the end of the playbook we can show the orchestrator status like
we do with the ceph status in initial deployment.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 381201a394)
It's better to use the --placement parameter when using ceph orch apply
commands to avoid confusion in the parameters.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 91a6c79e41)
cephadm uses default value for dashboard container images which need to
be customized by ansible for upstream or downstream purpose.
This feature wasn't present when cephadm-adopt.yml has been designed.
Also set the container_image_base variable for upgrade purpose.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f2d997396e)
It looks like we can't run the ceph orch apply commands on nodes other
than monitors even if it used to work in the past.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit b38019e3ca)
If the systemd service exists successfully then we don't need to reset
the failed state.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 27efcbc0e5)
The ceph config assimilate-conf command requires the client.admin
keyring which isn't present on all nodes most of the time.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit fd36433826)
When rgw and osd are collocated, the current workflow prevents from
scaling out the radosgw_num_instances parameter when rerunning the
playbook.
The environment file used in the rgw systemd template is rendered when
executing the `ceph-rgw` role but during a new run of the playbook (in
order to scale out rgw instances), handlers are triggered from `ceph-osd`
role which is run before `ceph-rgw`, therefore it tries to start the new
rgw daemon whereas its corresponding environment file hasn't been
rendered yet and fails like following:
```
ceph-radosgw@rgw.ceph4osd3.rgw1.service failed to run 'start-pre' task: No such file or directory
```
This commit moves the tasks generating this file in `ceph-config` role
so it is generated early.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1851906
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7dd68b9ac1)
By default, ansible gathers facts from facter and ohai if installed on
the remote nodes, given we don't need them, let's exclude these facts
from our facts gathering
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c95adc564b)
If a failure occurs in ceph-validate, the upgrade playbook keeps running
where we expect it to fail.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8f9cdf4b10)
The commit adds a new playbook for converting an existing ceph cluster
deployed by ceph-ansible to the cephadm orchestrator.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 548ff26256)
Since we only have one scenario since nautilus then we can just move
the container start command from ceph-osd-run.sh to the systemd unit
service.
As a result, the ceph-osd-run.sh.j2 template and the
ceph_osd_docker_run_script_path variable are removed.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 829990e60d)
This commit makes the images pulling skipped if podman isn't installed
on the machine.
In OSP context, the podman installation is done later in the workflow,
it means all `podman pull` commands will fail.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1849559
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 37b20b6525)
We shouldn't set this flag when running switch_to_containers playbook.
Otherwise the playbook fails waiting for pgs to be clean.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1843569
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b91d60d384)
The systemd LOAD and ACTIVE fileds could have more than one space between
both values.
This update the systemd regex the same way we're using it in different
part of the code.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1843500
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 50140c9b5d)
The rbdmirror group name was using the wrong variable definition.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c0a213f928)
The dashboard nodes (alertmanager, grafana, node-exporter, and prometheus)
were not manage during the docker to podman migration.
This adds the systemd container template of those services to a dedicated
file (systemd.yml) in order to include it in the docker2podman playbook.
This also adds the dashboard container images pull from docker to podman.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1829389
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 252e78b4e4)
The docker2podman playbook only installs the podman package and updates
the systemd units with the right container_binary value.
We never pull the container image so if one service is restarted then
the container image will be pulled first before the service can start
which could cause longer downstream.
To avoid to download the container image from internet again we can just
pull it from the local docker daemon.
The container_{binding,package,service}_name variables are removed
because they are only used in the ceph-container-engine role which
isn't call in this playbook.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d38f21aeba)
When using skipped variables with from_json filter and python2 then we
need to have a default value otherwise the skipped task will fail.
Unexpected templating type error occurred on
({{ (ceph_volume_lvm_list.stdout | from_json) }}): expected string or
buffer
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1790472
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 2b9edba131)
The workflow in this playbook should be the same than in rolling_update,
we should first set noout and nodeep-scrub flags before migrating the
first osd and unset osd flags after the last osd is migrated.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2cfaa056e0)
When setting/unsetting osd flags, we can use `tasks_from` when importing
`ceph-facts` role to save some times given that we only need this role
for setting `container_binary`
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6df7887f87)
We must call `ceph-osd` role from `container_options_facts.yml` because
ceph-osd-run.sh.j2 needs variables set in this file.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1819681
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4a4f54f6ee)
This commits removes these two symlinks.
They were there for backward compatibility and were marked deprecated as
of stable-4.0
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9219991441)
Adding `--all` to the `systemctl list-units` command in order to get
*all* osds id on the node (including stoppped osds). Otherwise, it will
purge the cluster but there will be leftover after that.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1814542
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5e7962ccf6)
Since Ceph Octopus is python3 only we don't need to specify the max open
files anymore with the container engine.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 64701437de)