When collocating OSDs with other daemon, `num_osds` is incorrectly calculated
because `ceph-config` is called multiple times.
Indeed, the following code:
```
num_osds: "{{ lvm_list.stdout | default('{}') | from_json | length | int + num_osds | default(0) | int }}"
```
makes `num_osds` be incremented each time `ceph-config` is called.
We have to reset it in order to get the correct number of expected OSDs.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This adds the realm pull operation to the current radosgw_realm module.
The pull operation requires the url, access/secret key variables.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Skip the `get initial keyring when it already exists` task when both commands
whose `stdout` output it requires have been skipped (e.g. when running in check
mode).
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
There are times where being able to skip OSD creation is useful to the
admin (see #1777 for example), and skipping the prepare_osd tag is a
way to achieve this. Document this fact.
Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
Sometimes it's useful to be able to skip the OSD creation step when
running ceph-ansible (cf #1777). The lvm scenario has a prepare_osd
tag on the relevant play. This commit adds the same tag to the
lvm-batch scenario.
Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
It has come to our attention that using ansible_* vars that are
populated with INJECT_FACTS_AS_VARS=True is not very performant. In
order to be able to support setting that to off, we need to update the
references to use ansible_facts[<thing>] instead of ansible_<thing>.
Related: ansible#73654
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1935406
Signed-off-by: Alex Schultz <aschultz@redhat.com>
http://docs.ceph.com/docs/nautilus/radosgw/frontends/ 404s so replace
it with a working "latest" docs link, and correct the spelling of
"additional" while I'm at it.
Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
the `ceph_cmd` fact is missing the `--net=host` parameter.
Some tasks consuming this fact can fail like following:
```
Error: error configuring network namespace for container b8ec913db1fb694ae683faf202680de7a59c714a004e533aba87e8503d29261f: Missing CNI default network
```
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1931365
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The current dashboard images deployed have a bad health index.
Updating to a newer version fixes this issue.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1925350
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
config_template.py depends on six, which isn't listed in the default
requirements.txt. This previously frequently wasn't a problem, because
six used to be a standard package being installed into a venv, and
lots of other projects depended on it.
It also does get installed for unit and integration tests via
tests/requirements.txt, so any broken dependency on six wouldn't be
detected by tox runs.
However, as other projects and distributions have phased out Python
2.7 support the dependency on six becomes less common. Thus, as long
as ceph-ansible does require it for config_template.py, add it to the
base requirements.
Signed-off-by: Florian Haas <florian@citynetwork.eu>
When asking `ceph-volume` to report only in `lvm batch` context, there's
a bug described in bz1896803 [1] when `--yes` is passed (which by the
way isn't necessary with `--report`).
This commit ensure `--yes` isn't passed to `ceph-volume` when `--report`
is used.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1896803
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1896803
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit makes sure purge playbooks remove those file if for any reason they
have been left.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1920900
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
There's no need to slow down the playbook execution time by migrating
all the `ceph-crash` instances in a serial way. Let's remove the
`serial: 1` so the migration is achieved in a parallel way.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
we aren't deploying enough OSD daemon, so it fails like following:
```
stderr: 'Error ERANGE: pool id 10 pg_num 256 size 2 would mean 1536 total pgs, which exceeds max 1500 (mon_max_pg_per_osd 250 * num_in_osds 6)'
```
Let's increase the value of `mon_max_pg_per_osd` in order to get around
this issue in the CI.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This adds more documentation to the configuration and usage of
containerizerd deployment.
Closes: #6198
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When running the rolling_update.yml playbook and adding the dashboard
component in the same time then the requirement (like container packages)
aren't installed.
This could lead to a failure in case of using authentication on the
container registry because the playbook will try to login on the registry
but podman/docker aren't yet installed.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1903504
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1918650
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The monitoring node running grafana needs the rhcs tools repostory
enabled in non containerized deployment to be able to install the
ceph-grafana-dashboards rpm package.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1918650
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This commit pins the ansible-lint version to 4.3.7 as ceph-ansible isn't
compatible with recent changes in 5.0.0
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Otherwise, the job fails when it tries to create a bucket with `s3cmd mb`
command because we have too many PGs per OSD.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
if `rgw_zonegroupmaster` is not defined at the rgw instance level in
`rgw_instances` it will fallback to a wrong variable (`rgw_zonemaster`).
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1925247
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Since b105549 we don't install node-exporter on client nodes so we should
also exclude the client node from the node-exporter upgrade.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Since eefe11d the grafana-server group has been renamed to monitoring
but the dashboard playbook wasn't updated.
This was still working due to the backward compatibility added in the
ceph-facts role.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The CentOS 8 vagrant box has finally been updated [1] with a recent
version (the latest one 2011 which means CentOS 8.3).
We don't need to download the vagrant libvirt box with a direct url
anymore from the CentOS infrastructure.
[1] https://app.vagrantup.com/centos/boxes/8
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
since master is now deploying quincy, we must update this.
Otherwise, it will fail like following:
```
Error EPERM: require_osd_release cannot be lowered once it has been set
```
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Due to recent changes in shaman, there's a chance it returns the wrong
repository from architecture point of view.
We can query shaman and ask for the correct architecture to get around
this.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The ceph mgr command output is printed on stderr instead of stdout which
prevent to set the changed flag to false if the module is already enabled.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
There's no reason to not use the ceph_osd_flag module to set/unset osd
flags.
Also if there's no OSD nodes in the inventory then we don't need to
execute the set/unset play.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Instead of doing some scripting via the shell module, we can use the
parted ansible module to check the boot flag on partitions.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Those devices (db/wal) are never zapped in lvm batch deployment.
Iterating over `dedicated_devices` and `bluestore_wal_devices` fixes
this issue.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1922926
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
There's no need to set the rgw_instances_all fact for each node. We can
rely on run_once for that one.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When the zonegroup or the zone doesn't have a realm associated then
it's not possible to modify that ressource.
This patch allows to retrieve the current realm id and compare it to
the realm id from the realm in parameter.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When rerunning the cephadm-adopt.yml playbook the radosgw realm,
zonegroup and zone tasks will fail because the task isn't
idempotent.
Using the radosgw ansible modules solves that problem.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>