When collocating OSDs with other daemon, `num_osds` is incorrectly calculated
because `ceph-config` is called multiple times.
Indeed, the following code:
```
num_osds: "{{ lvm_list.stdout | default('{}') | from_json | length | int + num_osds | default(0) | int }}"
```
makes `num_osds` be incremented each time `ceph-config` is called.
We have to reset it in order to get the correct number of expected OSDs.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 31a0f2653d)
If the legacy name `grafana-server` is still being used when upgrading
from Nautilus to Pacific, the task that sets the fact `rolling_update`
to `true` doesn't run on the node(s) included in that group. Indeed the
play where we set this fact (`rolling_update`) only runs on the group
`monitoring_group_name | default('monitoring')`.
As a workaround, we can run earlier the task which converts the
`grafana-server` group name to `monitoring`.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1935554
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6ccc8b4722)
There are times where being able to skip OSD creation is useful to the
admin (see #1777 for example), and skipping the prepare_osd tag is a
way to achieve this. Document this fact.
Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
(cherry picked from commit e66b7b7449)
Sometimes it's useful to be able to skip the OSD creation step when
running ceph-ansible (cf #1777). The lvm scenario has a prepare_osd
tag on the relevant play. This commit adds the same tag to the
lvm-batch scenario.
Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
(cherry picked from commit 88d119e95a)
While working on the previous PR, I found a couple of typos in the
docs. This fixes those.
Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
(cherry picked from commit 8b1474ab75)
http://docs.ceph.com/docs/nautilus/radosgw/frontends/ 404s so replace
it with a working "pacific" docs link, and correct the spelling of
"additional" while I'm at it.
Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
(cherry picked from commit 847611048e)
In order to avoid false positive in the CI that I've been unable to
reproduce.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f7fd1c2298)
the `ceph_cmd` fact is missing the `--net=host` parameter.
Some tasks consuming this fact can fail like following:
```
Error: error configuring network namespace for container b8ec913db1fb694ae683faf202680de7a59c714a004e533aba87e8503d29261f: Missing CNI default network
```
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1931365
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f143b1a647)
config_template.py depends on six, which isn't listed in the default
requirements.txt. This previously frequently wasn't a problem, because
six used to be a standard package being installed into a venv, and
lots of other projects depended on it.
It also does get installed for unit and integration tests via
tests/requirements.txt, so any broken dependency on six wouldn't be
detected by tox runs.
However, as other projects and distributions have phased out Python
2.7 support the dependency on six becomes less common. Thus, as long
as ceph-ansible does require it for config_template.py, add it to the
base requirements.
Signed-off-by: Florian Haas <florian@citynetwork.eu>
(cherry picked from commit d49ea9818b)
The current dashboard images deployed have a bad health index.
Updating to a newer version fixes this issue.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1925350
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a16ae693d8)
When asking `ceph-volume` to report only in `lvm batch` context, there's
a bug described in bz1896803 [1] when `--yes` is passed (which by the
way isn't necessary with `--report`).
This commit ensure `--yes` isn't passed to `ceph-volume` when `--report`
is used.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1896803
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1896803
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fe6d6ba622)
This commit makes sure purge playbooks remove those file if for any reason they
have been left.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1920900
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b9dd253a4f)
There's no need to slow down the playbook execution time by migrating
all the `ceph-crash` instances in a serial way. Let's remove the
`serial: 1` so the migration is achieved in a parallel way.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 980a5a7df4)
we aren't deploying enough OSD daemon, so it fails like following:
```
stderr: 'Error ERANGE: pool id 10 pg_num 256 size 2 would mean 1536 total pgs, which exceeds max 1500 (mon_max_pg_per_osd 250 * num_in_osds 6)'
```
Let's increase the value of `mon_max_pg_per_osd` in order to get around
this issue in the CI.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 682116023d)
Given there's no pacific packages available at
https://download.ceph.com, let's use shaman in order to test against
Ceph Pacific
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This adds more documentation to the configuration and usage of
containerizerd deployment.
Closes: #6198
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When running the rolling_update.yml playbook and adding the dashboard
component in the same time then the requirement (like container packages)
aren't installed.
This could lead to a failure in case of using authentication on the
container registry because the playbook will try to login on the registry
but podman/docker aren't yet installed.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1903504
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1918650
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The monitoring node running grafana needs the rhcs tools repostory
enabled in non containerized deployment to be able to install the
ceph-grafana-dashboards rpm package.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1918650
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This commit pins the ansible-lint version to 4.3.7 as ceph-ansible isn't
compatible with recent changes in 5.0.0
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Otherwise, the job fails when it tries to create a bucket with `s3cmd mb`
command because we have too many PGs per OSD.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
if `rgw_zonegroupmaster` is not defined at the rgw instance level in
`rgw_instances` it will fallback to a wrong variable (`rgw_zonemaster`).
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1925247
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Since b105549 we don't install node-exporter on client nodes so we should
also exclude the client node from the node-exporter upgrade.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Since eefe11d the grafana-server group has been renamed to monitoring
but the dashboard playbook wasn't updated.
This was still working due to the backward compatibility added in the
ceph-facts role.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The CentOS 8 vagrant box has finally been updated [1] with a recent
version (the latest one 2011 which means CentOS 8.3).
We don't need to download the vagrant libvirt box with a direct url
anymore from the CentOS infrastructure.
[1] https://app.vagrantup.com/centos/boxes/8
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
since master is now deploying quincy, we must update this.
Otherwise, it will fail like following:
```
Error EPERM: require_osd_release cannot be lowered once it has been set
```
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Due to recent changes in shaman, there's a chance it returns the wrong
repository from architecture point of view.
We can query shaman and ask for the correct architecture to get around
this.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The ceph mgr command output is printed on stderr instead of stdout which
prevent to set the changed flag to false if the module is already enabled.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
There's no reason to not use the ceph_osd_flag module to set/unset osd
flags.
Also if there's no OSD nodes in the inventory then we don't need to
execute the set/unset play.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Instead of doing some scripting via the shell module, we can use the
parted ansible module to check the boot flag on partitions.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Those devices (db/wal) are never zapped in lvm batch deployment.
Iterating over `dedicated_devices` and `bluestore_wal_devices` fixes
this issue.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1922926
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
There's no need to set the rgw_instances_all fact for each node. We can
rely on run_once for that one.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When the zonegroup or the zone doesn't have a realm associated then
it's not possible to modify that ressource.
This patch allows to retrieve the current realm id and compare it to
the realm id from the realm in parameter.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When rerunning the cephadm-adopt.yml playbook the radosgw realm,
zonegroup and zone tasks will fail because the task isn't
idempotent.
Using the radosgw ansible modules solves that problem.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Rerunning the cephadm_adopt module on an already adopted daemon will
fail because the cephadm adopt command isn't idempotent.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1918424
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
If the cephadm-adopt.yml fails during the first execution and some
daemons have already been adopted by cephadm then we can't rerun
the playbook because the old container won't exist anymore.
Error: no container with name or ID ceph-mon-xxx found: no such container
If the daemons are adopted then the old systemd unit doesn't exist anymore
so any call to that unit with systemd will fail.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1918424
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
We already do that in the other systemd templates (mgr, mds, etc..)
and would present to add workaround in other orchestration tool.
This change is for containerized deployment only.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1882724
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>