Commit Graph

5640 Commits (1fd0661d3ed5acae02120daeb56b33f3b3c570ff)
 

Author SHA1 Message Date
Guillaume Abrioux 1fd0661d3e rolling_update: unmask monitor service after a failure
if for some reason the playbook fails after the service was
stopped, disabled and masked and before it got restarted, enabled and
unmasked, the playbook leaves the service masked and which can make users
confused and forces them to unmask the unit manually.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1917680

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 07029e1bf1)
2021-03-26 15:20:35 +01:00
Ali Maredia fcd9544048 docs: rgw multisite docs with new rgw_instances config
Docs reflect that each instance of `rgw_instances`
can now take rgw_zonemaster, rgw_zonesecondary,
rgw_zonegroupmaster, rgw_multisite_proto.

Signed-off-by: Ali Maredia <amaredia@redhat.com>
(cherry picked from commit a59bc2da3b)
2021-03-26 07:42:35 +01:00
Guillaume Abrioux 8aa0dc2868 library: drop ceph_facts
This is never called in the playbook and seems unmaintained.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b01f16e835)
2021-03-26 00:07:18 +01:00
Ken Dreyer 37088d8c4f README-MULTISITE: fix typos
This commit fixes some typos in MULTISITE documentation.

Signed-off-by: Ken Dreyer <ktdreyer@redhat.com>
(cherry picked from commit 63a246db41)
2021-03-26 00:06:21 +01:00
Guillaume Abrioux 8a4fd99db7 convert some missed `ansible_*`` calls to `ansible_facts['*']`
This converts some missed calls to `ansible_*` that were missed in
initial PR #6312

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0163ecc924)
2021-03-26 00:04:49 +01:00
Alex Schultz 181924db7b Disable facts by default in ansible.cfg
As a continuation of a7f2fa73e6, this
change switches fact injection to off by default in the provided
ansible.cfg.

Signed-off-by: Alex Schultz <aschultz@redhat.com>
(cherry picked from commit db031a4993)
2021-03-26 00:04:49 +01:00
Alex Schultz 56aac327dd Use ansible_facts
It has come to our attention that using ansible_* vars that are
populated with INJECT_FACTS_AS_VARS=True is not very performant.  In
order to be able to support setting that to off, we need to update the
references to use ansible_facts[<thing>] instead of ansible_<thing>.

Related: ansible#73654
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1935406
Signed-off-by: Alex Schultz <aschultz@redhat.com>
(cherry picked from commit a7f2fa73e6)
2021-03-26 00:04:49 +01:00
Guillaume Abrioux ab857d8b54 tests: use master build for iscsigws
pacific builds for iscsi pkgs aren't available, as a workaround we can
use builds from master.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-03-24 21:36:24 +01:00
Guillaume Abrioux 723efc8576 tests: switch to quay.ceph.io for dashboard images
for some reason, `quay.io/app-sre/grafana` no longer exist.
as a workaround, all dashboard related images have been mirrored on
quay.ceph.io.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c90b0985e5)
2021-03-24 21:36:24 +01:00
Guillaume Abrioux 65d1cfd634 iscsi: fetch right repo from shaman
due to recent changes in shaman, we must fetch the right repo by
filtering on the desired architecture.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5801171b37)
2021-03-24 21:36:24 +01:00
Guillaume Abrioux 439cb79e3e tests: fix `test_rgw_is_up` test
The data structure seems to have been modified in ceph@master (quincy).

This commit update the test accordingly.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b8080bac41)
2021-03-24 21:36:24 +01:00
Guillaume Abrioux fb75fce4fa tests: fix `test_nfs_is_up` test
the data structure seems to have been modified in ceph@master (quincy).

This commit update the test accordingly.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7e1db0b599)
2021-03-24 21:36:24 +01:00
Guillaume Abrioux 43c7c20fa9 ceph_volume: fix bug in `is_lv()`
This function makes the `ceph_volume` module be not idempotent in
containerized context because it tries to run a container and bindmount
directories that no longer exist.

In that case, the `lvs` command being executed returns something
different than `0` so we can't call `json.loads(out)['report'][0]['lv']`
since it might throw an python error.

The idea is to return `True` only if `rc` is equal to `0` and
`len(result)` is greater than `0`, which means the command matched an
LV.

Fixes: #6284

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ed79bc7a4e)
2021-03-24 21:36:24 +01:00
Guillaume Abrioux a4d4f53080 fix 'command -v' tasks
`command -v` is a bash script which needs a shell to run.

Fixes: #6325

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 14c472707c)
2021-03-22 13:52:39 +01:00
Guillaume Abrioux 8d25b4305e adopt: convert legacy grafana-server groupname early
This is a follow up on PR #6332

cephadm-adopt.yml playbook is affected by the same bug

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1938658

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit af95595c82)
2021-03-18 08:56:44 +01:00
Guillaume Abrioux 5893a17886 tests: remove 1 client VM in external_clients job
We only use 2 client in this scenario, there's no need to fire up a
third VM.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fb1a5f071a)
2021-03-18 08:54:33 +01:00
Guillaume Abrioux 05ab3a7d50 validate: update `ceph_repository_community` check
this updates the `ceph_repository_community` check in `ceph-validate`
with the right ceph release expected.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 47b9b75ace)
2021-03-18 08:54:33 +01:00
Guillaume Abrioux 01939808b0 nfs: bump nfs-ganesha version
This commit updates the default version of nfs-ganesha to V3.5 which is the
latest version available upstream.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c78388e580)
2021-03-18 08:54:33 +01:00
Guillaume Abrioux c296824ae0 cephadm_adopt: fetch and write ceph minimal config
This commit makes the playbook fetch the minimal current ceph
configuration and write it later on monitoring nodes so `cephadm` can
proceed with the adoption.
When a monitoring stack was deployed on a dedicated node, it means no
`ceph.conf` file was written, `cephadm` requires a `ceph.conf` in order
to adopt the daemon present on the node.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1939887

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b445df0479)
2021-03-18 08:51:59 +01:00
Guillaume Abrioux 688e432c32 facts: fix nfs/external cluster scenario
These tasks shouldn't be run when at least 1 monitor isn't present in
the inventory.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1937997

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ccd1cbb732)
2021-03-18 06:40:33 +01:00
Guillaume Abrioux d65c7b4035 config: reset num_osds
When collocating OSDs with other daemon, `num_osds` is incorrectly calculated
because `ceph-config` is called multiple times.

Indeed, the following code:
```
num_osds: "{{ lvm_list.stdout | default('{}') | from_json | length | int + num_osds | default(0) | int }}"
```

makes `num_osds` be incremented each time `ceph-config` is called.

We have to reset it in order to get the correct number of expected OSDs.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 31a0f2653d)
2021-03-17 17:35:19 +01:00
Guillaume Abrioux 732e5b10b8 update: convert legacy grafana-server groupname early
If the legacy name `grafana-server` is still being used when upgrading
from Nautilus to Pacific, the task that sets the fact `rolling_update`
to `true` doesn't run on the node(s) included in that group. Indeed the
play where we set this fact (`rolling_update`) only runs on the group
`monitoring_group_name | default('monitoring')`.
As a workaround, we can run earlier the task which converts the
`grafana-server` group name to `monitoring`.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1935554

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6ccc8b4722)
2021-03-16 14:33:40 +01:00
Matthew Vernon 3c8191194d docs: Document the prepare_osd tag
There are times where being able to skip OSD creation is useful to the
admin (see #1777 for example), and skipping the prepare_osd tag is a
way to achieve this. Document this fact.

Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
(cherry picked from commit e66b7b7449)
2021-03-12 09:19:55 +01:00
Matthew Vernon 6deb88d8fb ceph-osd: add prepare_osd tag to lvm-batch scenario
Sometimes it's useful to be able to skip the OSD creation step when
running ceph-ansible (cf #1777). The lvm scenario has a prepare_osd
tag on the relevant play. This commit adds the same tag to the
lvm-batch scenario.

Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
(cherry picked from commit 88d119e95a)
2021-03-12 09:19:55 +01:00
Matthew Vernon 6a23be19f4 Docs: fix some typos
While working on the previous PR, I found a couple of typos in the
docs. This fixes those.

Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
(cherry picked from commit 8b1474ab75)
2021-03-11 22:04:53 +01:00
Matthew Vernon 1a67f59789 Fix typo and broken link for documenting RGW frontends
http://docs.ceph.com/docs/nautilus/radosgw/frontends/ 404s so replace
it with a working "pacific" docs link, and correct the spelling of
"additional" while I'm at it.

Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
(cherry picked from commit 847611048e)
2021-03-03 14:17:31 +01:00
Guillaume Abrioux 6832c8d7a5 tests: increase nb of rerun in pytest
In order to avoid false positive in the CI that I've been unable to
reproduce.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f7fd1c2298)
2021-03-03 14:12:46 +01:00
Guillaume Abrioux f42ed8e1e0 dashboard: add missing parameter in `ceph_cmd`
the `ceph_cmd` fact is missing the `--net=host` parameter.

Some tasks consuming this fact can fail like following:

```
Error: error configuring network namespace for container b8ec913db1fb694ae683faf202680de7a59c714a004e533aba87e8503d29261f: Missing CNI default network
```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1931365

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f143b1a647)
2021-03-03 14:12:46 +01:00
Florian Haas 95949ec787 requirements.txt: Move the six dependency into the general requirements
config_template.py depends on six, which isn't listed in the default
requirements.txt. This previously frequently wasn't a problem, because
six used to be a standard package being installed into a venv, and
lots of other projects depended on it.

It also does get installed for unit and integration tests via
tests/requirements.txt, so any broken dependency on six wouldn't be
detected by tox runs.

However, as other projects and distributions have phased out Python
2.7 support the dependency on six becomes less common. Thus, as long
as ceph-ansible does require it for config_template.py, add it to the
base requirements.

Signed-off-by: Florian Haas <florian@citynetwork.eu>
(cherry picked from commit d49ea9818b)
2021-03-01 15:16:55 +01:00
Guillaume Abrioux accdcf78e6 defaults: update rhcs dashboard images versions
The current dashboard images deployed have a bad health index.
Updating to a newer version fixes this issue.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1925350

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a16ae693d8)
2021-02-18 18:21:53 +01:00
Guillaume Abrioux bb9bba685f library: do not always add --yes in batch mode
When asking `ceph-volume` to report only in `lvm batch` context, there's
a bug described in bz1896803 [1] when `--yes` is passed (which by the
way isn't necessary with `--report`).
This commit ensure `--yes` isn't passed to `ceph-volume` when `--report`
is used.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1896803

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1896803

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fe6d6ba622)
2021-02-14 06:29:16 +01:00
Guillaume Abrioux 3326b6d54f purge: rm service-cid files
This commit makes sure purge playbooks remove those file if for any reason they
have been left.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1920900

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b9dd253a4f)
2021-02-12 18:33:19 +01:00
Guillaume Abrioux 5803619a5d switch2container: do not serialize the ceph-crash migration
There's no need to slow down the playbook execution time by migrating
all the `ceph-crash` instances in a serial way. Let's remove the
`serial: 1` so the migration is achieved in a parallel way.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 980a5a7df4)
2021-02-12 14:06:15 +01:00
Guillaume Abrioux 2feefdc861 tests: increase `mon_max_pg_per_osd`
we aren't deploying enough OSD daemon, so it fails like following:

```
  stderr: 'Error ERANGE: pool id 10 pg_num 256 size 2 would mean 1536 total pgs, which exceeds max 1500 (mon_max_pg_per_osd 250 * num_in_osds 6)'
```

Let's increase the value of `mon_max_pg_per_osd` in order to get around
this issue in the CI.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 682116023d)
2021-02-12 09:15:24 +01:00
Guillaume Abrioux 980a0dd00e rolling_update: update specific pacific task
update the 'require-osd-release' task.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-02-12 09:15:24 +01:00
Guillaume Abrioux 7dd4a8a059 tests: use shaman to test against ceph pacific
Given there's no pacific packages available at
https://download.ceph.com, let's use shaman in order to test against
Ceph Pacific

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-02-12 09:15:24 +01:00
Guillaume Abrioux 9102d6c090 doc: add a note about "latest" tags
See the change for details.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4e95180c80)
2021-02-11 16:41:50 +01:00
Dimitri Savineau 950a6ae406 cephadm-adopt: remove prometheus workaround
This was fixed by [1][2]

[1] https://tracker.ceph.com/issues/45120
[2] https://github.com/ceph/ceph/commit/252d4b30

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2021-02-10 13:51:41 +01:00
Dimitri Savineau d42d584085 doc: update containerized deployment
This adds more documentation to the configuration and usage of
containerizerd deployment.

Closes: #6198

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2021-02-10 13:50:53 +01:00
Guillaume Abrioux 7e5071856c doc: update the documentation
- mention `stable-6.0` requirements.
- update some patterns.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-02-10 13:50:10 +01:00
Dimitri Savineau 48a456dc8c rolling_update: enforce ceph-container-engine
When running the rolling_update.yml playbook and adding the dashboard
component in the same time then the requirement (like container packages)
aren't installed.
This could lead to a failure in case of using authentication on the
container registry because the playbook will try to login on the registry
but podman/docker aren't yet installed.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1903504
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1918650

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2021-02-10 08:17:11 +01:00
Dimitri Savineau e4dd0067c6 ceph-common: enable rhcs tools repo for monitoring
The monitoring node running grafana needs the rhcs tools repostory
enabled in non containerized deployment to be able to install the
ceph-grafana-dashboards rpm package.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1918650

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2021-02-10 08:17:11 +01:00
Guillaume Abrioux 2f1d287b1c tests: pin ansible-lint version
This commit pins the ansible-lint version to 4.3.7 as ceph-ansible isn't
compatible with recent changes in 5.0.0

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-02-10 07:48:24 +01:00
Guillaume Abrioux 54bae480d2 tests: set `mon_max_pg_per_osd` in rgw_multisite
Otherwise, the job fails when it tries to create a bucket with `s3cmd mb`
command because we have too many PGs per OSD.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-02-10 07:01:21 +01:00
Guillaume Abrioux 931b87e830 rgw: fix a typo in multisite
if `rgw_zonegroupmaster` is not defined at the rgw instance level in
`rgw_instances` it will fallback to a wrong variable (`rgw_zonemaster`).

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1925247

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-02-10 07:01:21 +01:00
Dimitri Savineau 94af3c87d1 rolling_update: exclude clients from node-exporter
Since b105549 we don't install node-exporter on client nodes so we should
also exclude the client node from the node-exporter upgrade.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2021-02-09 14:41:13 +01:00
Dimitri Savineau 58b101d9ff docs: nautilus uses ansible 2.9
This updates the ansible release required to deploy nautilus with the
stable-4.0 branch.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2021-02-09 12:46:38 +01:00
Dimitri Savineau e7cdcfa342 dashboard: update with the new monitoring group
Since eefe11d the grafana-server group has been renamed to monitoring
but the dashboard playbook wasn't updated.
This was still working due to the backward compatibility added in the
ceph-facts role.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2021-02-09 12:45:02 +01:00
Dimitri Savineau ed094ea07a vagrant: remove centos/8 workaround
The CentOS 8 vagrant box has finally been updated [1] with a recent
version (the latest one 2011 which means CentOS 8.3).
We don't need to download the vagrant libvirt box with a direct url
anymore from the CentOS infrastructure.

[1] https://app.vagrantup.com/centos/boxes/8

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2021-02-09 12:41:59 +01:00
Guillaume Abrioux b9cdee40a2 update: update ceph release pattern in complete upgrade play
since master is now deploying quincy, we must update this.
Otherwise, it will fail like following:

```
Error EPERM: require_osd_release cannot be lowered once it has been set
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-02-06 00:34:14 +01:00