This adds cephadm_adopt ansible module for replacing the command module
usage with the cephadm adopt command.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 08f118077f)
When creating a new pool, target_size_ratio was ignored by ansible module ceph_pool.py.
target_size_ratio is now used when pg_autoscale_mode is on.
Tests added to library tests.
This adds too the use in the role ceph-rgw.
Signed-off-by: Fabien Brachere <fabien.brachere@celeste.fr>
(cherry picked from commit 4026ba9da1)
Since the major ceph-volume lvm batch refactoring, the report value
is different.
Before the refact, the report was a dict with the OSDs list to be created
under the "osds" key.
After the refact, the report is a list of dict.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 827b23353f)
This adds ceph_osd_flag ansible module for replacing the command module
usage with the ceph osd set/unset commands.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5da593604a)
The alertmanager, grafana and prometheus configuration file are
generated with the template module which doesn't allow for using
config overrides.
Instead we could use the config_template plugin action and add a
new variable for overrides (one for each component).
With this patch, one should be able to add configuration to
prometheus with the following:
---
alertmanager_conf_overrides:
global:
smtp_smarthost: 'localhost:25'
...
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1902999
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5a41026347)
Use global crush_device_class variable if it's not set per OSD
Signed-off-by: Seena Fallah <seenafallah@gmail.com>
(cherry picked from commit 5e9444fa5c)
This avoids interactive mode for `vagrant box remove`.
This can happen for some reason when there's leftover from previous
deployment (VMs not destroyed as expected)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 011c97786b)
Currently we create an object from the primary sites but we try to read
that object still from the master which doesn't make sense, we should
try to read it from a secondary site.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e2ea403d5e)
set fetch_directory variable in default/main.yml instead of using the
defaults jinja filter in tasks/main.yml.
Fixes: #6072
Signed-off-by: Karl-Heinz Preuß <karl-heinz.preuss@cms.hu-berlin.de>
(cherry picked from commit 6ce34ef59f)
This reverts commit 4d1fdd2b05.
This breaks the backward compatibility with previous osd_memory_target
calculation and we could have a value lower than the minimum value allowed
(896M) which causes some ceph commands to fail (like ceph assimilate-conf).
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit aa6e1f20ea)
Since podman 2.x, there's now a confirmation when running podman
container prune command.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 0108c9f941)
Since the backing generate_secret() just hands out urandom output,
running as privileged doesn't seem to be required. It's not
desireable to provide sudo in some Ansible runner environments.
Signed-off-by: Jukka Nousiainen <jukka.nousiainen@csc.fi>
(cherry picked from commit eb7473491b)
Since the fetch_directory variable has been dropped then we don't need
the override in rhcs file.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit a2cbab16a4)
Let's discard the ansible lint error 306 and add a "# noqa 306" on tasks
where we don't need `set -o pipefail`
Fixes: #6090
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 86a8889ee3)
This allows skipping this task if really desired.
Use it carefully. Use it at your own risk.
Fixes: #6073
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5c4ae5356d)
As of podman `2.0.5`, `--cap-add` and `--privileged` are exclusive
options.
```
Nov 30 13:56:30 magna089 podman[171677]: Error: invalid config provided: CapAdd and privileged are mutually exclusive options
```
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1902149
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d40dd764e0)
As of podman 2.0.5, `--ignore` param conflicts with `--storage`.
```
Nov 30 13:53:10 magna089 podman[164443]: Error: --storage conflicts with --volumes, --all, --latest, --ignore and --cidfile
```
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c68b124ba8)
`ceph.target` should be disabled only. Otherwise, in collocation
scenario you stop other collocated services in the OSD play which isn't
what we want to do. Each daemon has its corresponding play for managing
the transition to container.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1901865
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0b05620597)
Set the owner/group on alertmanager and prometheus directories and
files to nobody and nogroup (uid and gid 65534) to avoid permission
issues.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1901543
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit eb452d35bc)
adding monitor is no longer possible because we generate a new mon
keyring each time the playbook is run.
Fixes: #5864
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 970c6a4ee6)
When using a custom pool for iSCSI gateway then we need to set the pool
name in the configuration otherwise the default rbd pool name will be
used.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 40a87c4b92)
Let's use a github workflow instead of travis for this.
With this commit we can get rid of Travis.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 94c37b9de8)
ignore 302,303 and 505 errors
[302] Using command rather than an argument to e.g. file
[303] Using command rather than module
[505] referenced files must exist
they aren't relevant on these tasks.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 195d88fcda)
Fix ansible-lint 504 error:
[504] Do not use 'local_action', use 'delegate_to: localhost'
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c948b668eb)
Fix ansible-lint 502 error:
[502] All tasks should be named
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 97dd9218dd)
Fix ansible-lint 305 error:
[305] Use shell only when shell functionality is required
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 11b4bf5083)
Fix ansible lint 206 error:
[206] Variables should have spaces before and after: {{ var_name }}
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9fba6eecfa)
Fix ansible lint 301 error:
[301] Commands should not change things if nothing needs doing
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5450de58b3)
Fix ansible lint 306 error:
[306] Shells that use pipes should set the pipefail option
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1879c26eb9)
This commit ensures that the `/var/lib/ceph/osd/{{ cluster }}-{{ osd_id }}` is
present before starting OSDs.
This is needed specificly when redeploying an OSD in case of OS upgrade
failure.
Since ceph data are still present on its devices then the node can be
redeployed, however those directories aren't present since they are
initially created by ceph-volume. We could recreate them manually but
for better user experience we can ask ceph-ansible to recreate them.
NOTE:
this only works for OSDs that were deployed with ceph-volume.
ceph-disk deployed OSDs would have to get those directories recreated
manually.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1898486
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 873fc8ec0f)
We don't need to use run_once on that task when having running monitors
otherwise the read task could be skip and the set task will fail.
The conditional check 'crush_rule_variable.rc == 0' failed. The error
was: error while evaluating conditional (crush_rule_variable.rc == 0):
'dict object' has no attribute 'rc'
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1898856
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit e150df789e)
This commit enforces the pytest-rerunfailures installed so it's <9.0
This is to avoid the following error:
```
ERROR: pytest-rerunfailures 9.0 has requirement pytest>=5.0, but you'll have pytest 4.6.11 which is incompatible.
```
latest version of pytest-rerunfailures isn't compatible with the version
of pytest we are using.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 19097026fb)
This commit changes the bind mount option for the mount point
`/var/lib/ceph` in the systemd template for mon and mgr containers. This
is needed in case of collocating mon/mgr with osds using dmcrypt
scenario.
Once mon/mgr got converted to containers, the dmcrypt layer sub mount is
still seen in `/var/lib/ceph`. For some reason it makes the
corresponding devices busy so any other container can't open/close it.
As a result, it prevents osds from starting properly.
Since it only happens on the nodes converted before the OSD play, the idea is
to bind mount `/var/lib/ceph` on mon and mgr with the `rshared` option
so once the sub mount is unmounted, it is propagated inside the
container so it doesn't see that mount point.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1896392
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f5ba6d9b01)
This is a workaround to avoid error like following:
```
Error: error creating container storage: the container name "ceph-mgr-magna022" is already in use by "4a5f674e113f837a0cc561dea5d2cd55d16ca159a647b7794ab06c4c276ef701"
```
that doesn't seem to be 100% reproducible but it shows up after a
reboot. The only workaround we came up with at the moment is to run
`podman rm --storage <container>` before starting it.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1887716
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5ba7824c55)
fa2bb3a only fix the symlink owner/group issue in the OSD play. If the
OSDs are collocated with other services like MONs and MGRs then the
chown command will fail.
$ find /var/lib/ceph/osd/ceph-0 -not -user 167 -execdir chown 167:167 {} +
chown: cannot dereference './block': Permission denied
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1896448
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 35ed9977aa)
The `osd_pool_default_crush_rule` is set based on `crush_rule_variable`, which
is the output of a `grep` command.
However, two consecutive tasks can set that variable, and if the second task is
skipped, it still overwrites the `crush_rule_variable`, leading the
`osd_pool_default_crush_rule` to be set to `ceph_osd_pool_default_crush_rule`
instead of the output of the first task.
This commit ensures that the fact is set right after the `crush_rule_variable`
is assigned, before it can be overwritten.
Closes#5912
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit c5f7343a2f)
The osd_memory_target variable was only used if it was higher than the
calculated value based on the number of OSDs. This is changed to always
use the value if it is set in the configuration. This allows this value
to be intentionally set lower so that it does not have to be changed
when more OSDs are added later.
Signed-off-by: Gaudenz Steinlin <gaudenz.steinlin@cloudscale.ch>
(cherry picked from commit 4d1fdd2b05)
When deploying the ceph OSD via the packages then the ceph-osd@.service
unit is configured as enabled-runtime.
This means that each ceph-osd service will inherit from that state.
The enabled-runtime systemd state doesn't survive after a reboot.
For non containerized deployment the OSD are still starting after a
reboot because there's the ceph-volume@.service and/or ceph-osd.target
units that are doing the job.
$ systemctl list-unit-files|egrep '^ceph-(volume|osd)'|column -t
ceph-osd@.service enabled-runtime
ceph-volume@.service enabled
ceph-osd.target enabled
When switching to containerized deployment we are stopping/disabling
ceph-osd@XX.servive, ceph-volume and ceph.target and then removing the
systemd unit files.
But the new systemd units for containerized ceph-osd service will still
inherit from ceph-osd@.service unit file.
As a consequence, if an OSD host is rebooting after the playbook execution
then the ceph-osd service won't come back because they aren't enabled at
boot.
This patch also adds a reboot and testinfra run after running the switch
to container playbook.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1881288
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit fa2bb3af86)
There are some use cases where there's a need to skip the execution
of the ceph-ansible client role even though the client section of the
inventory isn't empty.
This can happen in contexts where the services are colocated or when
a all-in-one deployment is performed.
The purpose of this change is adding a 'ceph_client' tag to avoid
altering the ceph-ansible execution flow but at the same time be able
to include or exclude a set of tasks using this tag.
Signed-off-by: Francesco Pantano <fpantano@redhat.com>
(cherry picked from commit fafd5f871a)
This sets the `dashboard_grafana_api_no_ssl_verify` default value
according to the length of `dashboard_crt` and `dashboard_key`.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5cadfea42e)
If some OSDs are to be created and others already exist the calculation
only counted the to be created OSDs. This changes the calculation to
take all OSDs into account.
Signed-off-by: Gaudenz Steinlin <gaudenz.steinlin@cloudscale.ch>
(cherry picked from commit 15044da030)
cec994b introduced a regression when a mgr is collocated with a mon.
During the mon upgrade, the mgr service is masked to avoid to be
restarted on packages update.
Then the start mgr task is failing because the service is still masked.
Instead we should unmask it.
Fixes: #5983
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 3d3ce26327)