Commit Graph

5744 Commits (a305296384c44e387965d9970ede001e1745141d)
 

Author SHA1 Message Date
Guillaume Abrioux c90b0985e5 tests: switch to quay.ceph.io for dashboard images
for some reason, `quay.io/app-sre/grafana` no longer exist.
as a workaround, all dashboard related images have been mirrored on
quay.ceph.io.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-03-23 19:58:27 +01:00
Guillaume Abrioux 5801171b37 iscsi: fetch right repo from shaman
due to recent changes in shaman, we must fetch the right repo by
filtering on the desired architecture.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-03-23 19:58:27 +01:00
Guillaume Abrioux b8080bac41 tests: fix `test_rgw_is_up` test
The data structure seems to have been modified in ceph@master (quincy).

This commit update the test accordingly.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-03-23 19:58:27 +01:00
Guillaume Abrioux 7e1db0b599 tests: fix `test_nfs_is_up` test
the data structure seems to have been modified in ceph@master (quincy).

This commit update the test accordingly.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-03-23 19:58:27 +01:00
Guillaume Abrioux ed79bc7a4e ceph_volume: fix bug in `is_lv()`
This function makes the `ceph_volume` module be not idempotent in
containerized context because it tries to run a container and bindmount
directories that no longer exist.

In that case, the `lvs` command being executed returns something
different than `0` so we can't call `json.loads(out)['report'][0]['lv']`
since it might throw an python error.

The idea is to return `True` only if `rc` is equal to `0` and
`len(result)` is greater than `0`, which means the command matched an
LV.

Fixes: #6284

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-03-23 19:58:27 +01:00
Brad Hubbard bf8cdad937 Make sure the repo url contains the correct arch
We can end up with an arm only repo unless we are specific about the
architecture we require. Brings the deb code in line with the rpm
equivalent.

Signed-off-by: Brad Hubbard <bhubbard@redhat.com>
2021-03-22 09:39:48 +01:00
Guillaume Abrioux fe918722fb github: use actions/stale
This commit replaces the current stale bot which seems to be broken with
the github actions/stale one.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-03-22 09:38:14 +01:00
Guillaume Abrioux 14c472707c fix 'command -v' tasks
`command -v` is a bash script which needs a shell to run.

Fixes: #6325

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-03-18 20:29:05 +01:00
Guillaume Abrioux 07029e1bf1 rolling_update: unmask monitor service after a failure
if for some reason the playbook fails after the service was
stopped, disabled and masked and before it got restarted, enabled and
unmasked, the playbook leaves the service masked and which can make users
confused and forces them to unmask the unit manually.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1917680

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-03-18 15:22:38 +01:00
Guillaume Abrioux b445df0479 cephadm_adopt: fetch and write ceph minimal config
This commit makes the playbook fetch the minimal current ceph
configuration and write it later on monitoring nodes so `cephadm` can
proceed with the adoption.
When a monitoring stack was deployed on a dedicated node, it means no
`ceph.conf` file was written, `cephadm` requires a `ceph.conf` in order
to adopt the daemon present on the node.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1939887

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-03-17 17:39:12 +01:00
Guillaume Abrioux ccd1cbb732 facts: fix nfs/external cluster scenario
These tasks shouldn't be run when at least 1 monitor isn't present in
the inventory.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1937997

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-03-17 16:05:48 +01:00
Guillaume Abrioux af95595c82 adopt: convert legacy grafana-server groupname early
This is a follow up on PR #6332

cephadm-adopt.yml playbook is affected by the same bug

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1938658

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-03-17 16:04:11 +01:00
Guillaume Abrioux ee1f0ce444 Revert "tests: disable nfs testing on master"
This reverts commit 8372b6792f.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-03-17 13:42:20 +01:00
Guillaume Abrioux b27398163a validate: followup on 98e32b9
update the message accordingly to the check updated in
commit 98e32b92f3

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-03-17 09:46:05 +01:00
Guillaume Abrioux a112572734 clients: build filtered clients group early
when the group `_filtered_clients` is built, the order can change from
the original `clients` group which can cause issues since we run
`ceph-container-engine` on the first client only. It means later in the
playbook we can make call to the container CLI on a node where the
container engine wasn't installed.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-03-16 19:38:04 +01:00
Guillaume Abrioux 49668378fb tests: remove 1 client VM in external_clients job
We only use 2 client in this scenario, there's no need to fire up a
third VM.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-03-16 19:38:04 +01:00
Guillaume Abrioux 8372b6792f tests: disable nfs testing on master
nfs-ganesha builds in shaman are broken.
This commit disables nfs-ganesha testing in order to unlock the CI.

This is a temporary commit intented to be reverted.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-03-16 19:38:04 +01:00
Guillaume Abrioux 98e32b92f3 validate: update `ceph_repository_community` check
this updates the `ceph_repository_community` check in `ceph-validate`
with the right ceph release expected.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-03-16 19:38:04 +01:00
Guillaume Abrioux 6c6939104d nfs: bump nfs-ganesha version
This commit updates the default version of nfs-ganesha to V3.5 which is the
latest version available upstream.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-03-16 19:38:04 +01:00
Guillaume Abrioux 6ccc8b4722 update: convert legacy grafana-server groupname early
If the legacy name `grafana-server` is still being used when upgrading
from Nautilus to Pacific, the task that sets the fact `rolling_update`
to `true` doesn't run on the node(s) included in that group. Indeed the
play where we set this fact (`rolling_update`) only runs on the group
`monitoring_group_name | default('monitoring')`.
As a workaround, we can run earlier the task which converts the
`grafana-server` group name to `monitoring`.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1935554

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-03-15 15:25:48 +01:00
Guillaume Abrioux 31a0f2653d config: reset num_osds
When collocating OSDs with other daemon, `num_osds` is incorrectly calculated
because `ceph-config` is called multiple times.

Indeed, the following code:
```
num_osds: "{{ lvm_list.stdout | default('{}') | from_json | length | int + num_osds | default(0) | int }}"
```

makes `num_osds` be incremented each time `ceph-config` is called.

We have to reset it in order to get the correct number of expected OSDs.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-03-15 10:41:28 +01:00
Dimitri Savineau 5b86ac8801 library: add realm pull to radosgw_realm module
This adds the realm pull operation to the current radosgw_realm module.
The pull operation requires the url, access/secret key variables.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2021-03-12 18:21:37 +01:00
Benoît Knecht 2437f14581 ceph-mon: Fix check mode for deploy monitor tasks
Skip the `get initial keyring when it already exists` task when both commands
whose `stdout` output it requires have been skipped (e.g. when running in check
mode).

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
2021-03-12 18:19:46 +01:00
Guillaume Abrioux 1f3289934a stalebot: update config
This decreases the number of days of inactivity before an issue becomes
sstale.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-03-12 09:45:19 +01:00
Kai Stian Olstad 31f8fd793d docs: Update URL to LVM batch
Signed-off-by: Kai Stian Olstad <kai.stian.olstad@gmail.com>
2021-03-11 22:03:51 +01:00
Matthew Vernon e66b7b7449 docs: Document the prepare_osd tag
There are times where being able to skip OSD creation is useful to the
admin (see #1777 for example), and skipping the prepare_osd tag is a
way to achieve this. Document this fact.

Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
2021-03-11 22:02:52 +01:00
Matthew Vernon 88d119e95a ceph-osd: add prepare_osd tag to lvm-batch scenario
Sometimes it's useful to be able to skip the OSD creation step when
running ceph-ansible (cf #1777). The lvm scenario has a prepare_osd
tag on the relevant play. This commit adds the same tag to the
lvm-batch scenario.

Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
2021-03-11 22:02:52 +01:00
Matthew Vernon 8b1474ab75 Docs: fix some typos
While working on the previous PR, I found a couple of typos in the
docs. This fixes those.

Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
2021-03-11 17:00:11 +01:00
Alex Schultz a7f2fa73e6 Use ansible_facts
It has come to our attention that using ansible_* vars that are
populated with INJECT_FACTS_AS_VARS=True is not very performant.  In
order to be able to support setting that to off, we need to update the
references to use ansible_facts[<thing>] instead of ansible_<thing>.

Related: ansible#73654
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1935406
Signed-off-by: Alex Schultz <aschultz@redhat.com>
2021-03-08 20:54:02 +01:00
Guillaume Abrioux f7fd1c2298 tests: increase nb of rerun in pytest
In order to avoid false positive in the CI that I've been unable to
reproduce.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-03-03 12:31:51 +01:00
Matthew Vernon 847611048e Fix typo and broken link for documenting RGW frontends
http://docs.ceph.com/docs/nautilus/radosgw/frontends/ 404s so replace
it with a working "latest" docs link, and correct the spelling of
"additional" while I'm at it.

Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
2021-03-02 18:31:38 +01:00
Guillaume Abrioux f143b1a647 dashboard: add missing parameter in `ceph_cmd`
the `ceph_cmd` fact is missing the `--net=host` parameter.

Some tasks consuming this fact can fail like following:

```
Error: error configuring network namespace for container b8ec913db1fb694ae683faf202680de7a59c714a004e533aba87e8503d29261f: Missing CNI default network
```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1931365

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-03-02 07:34:25 +01:00
Guillaume Abrioux a16ae693d8 defaults: update rhcs dashboard images versions
The current dashboard images deployed have a bad health index.
Updating to a newer version fixes this issue.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1925350

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-02-18 03:13:03 +01:00
Florian Haas d49ea9818b requirements.txt: Move the six dependency into the general requirements
config_template.py depends on six, which isn't listed in the default
requirements.txt. This previously frequently wasn't a problem, because
six used to be a standard package being installed into a venv, and
lots of other projects depended on it.

It also does get installed for unit and integration tests via
tests/requirements.txt, so any broken dependency on six wouldn't be
detected by tox runs.

However, as other projects and distributions have phased out Python
2.7 support the dependency on six becomes less common. Thus, as long
as ceph-ansible does require it for config_template.py, add it to the
base requirements.

Signed-off-by: Florian Haas <florian@citynetwork.eu>
2021-02-15 20:56:20 +01:00
Guillaume Abrioux fe6d6ba622 library: do not always add --yes in batch mode
When asking `ceph-volume` to report only in `lvm batch` context, there's
a bug described in bz1896803 [1] when `--yes` is passed (which by the
way isn't necessary with `--report`).
This commit ensure `--yes` isn't passed to `ceph-volume` when `--report`
is used.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1896803

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1896803

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-02-12 14:07:14 +01:00
Dimitri Savineau 4047d02ee6 Add quincy release
Add the 17th ceph release: quincy.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2021-02-12 10:02:08 +01:00
Guillaume Abrioux b9dd253a4f purge: rm service-cid files
This commit makes sure purge playbooks remove those file if for any reason they
have been left.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1920900

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-02-12 10:01:31 +01:00
Guillaume Abrioux 980a5a7df4 switch2container: do not serialize the ceph-crash migration
There's no need to slow down the playbook execution time by migrating
all the `ceph-crash` instances in a serial way. Let's remove the
`serial: 1` so the migration is achieved in a parallel way.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-02-11 21:36:23 +01:00
Guillaume Abrioux 682116023d tests: increase `mon_max_pg_per_osd`
we aren't deploying enough OSD daemon, so it fails like following:

```
  stderr: 'Error ERANGE: pool id 10 pg_num 256 size 2 would mean 1536 total pgs, which exceeds max 1500 (mon_max_pg_per_osd 250 * num_in_osds 6)'
```

Let's increase the value of `mon_max_pg_per_osd` in order to get around
this issue in the CI.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-02-11 16:35:55 +01:00
Guillaume Abrioux 4e95180c80 doc: add a note about "latest" tags
See the change for details.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-02-11 14:07:33 +01:00
Guillaume Abrioux 26acaf9eeb mergify: add stable-6.0 backport configuration
This adds the stable-6.0 backport configuration in mergify.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-02-10 14:57:01 +01:00
Dimitri Savineau 950a6ae406 cephadm-adopt: remove prometheus workaround
This was fixed by [1][2]

[1] https://tracker.ceph.com/issues/45120
[2] https://github.com/ceph/ceph/commit/252d4b30

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2021-02-10 13:51:41 +01:00
Dimitri Savineau d42d584085 doc: update containerized deployment
This adds more documentation to the configuration and usage of
containerizerd deployment.

Closes: #6198

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2021-02-10 13:50:53 +01:00
Guillaume Abrioux 7e5071856c doc: update the documentation
- mention `stable-6.0` requirements.
- update some patterns.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-02-10 13:50:10 +01:00
Dimitri Savineau 48a456dc8c rolling_update: enforce ceph-container-engine
When running the rolling_update.yml playbook and adding the dashboard
component in the same time then the requirement (like container packages)
aren't installed.
This could lead to a failure in case of using authentication on the
container registry because the playbook will try to login on the registry
but podman/docker aren't yet installed.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1903504
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1918650

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2021-02-10 08:17:11 +01:00
Dimitri Savineau e4dd0067c6 ceph-common: enable rhcs tools repo for monitoring
The monitoring node running grafana needs the rhcs tools repostory
enabled in non containerized deployment to be able to install the
ceph-grafana-dashboards rpm package.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1918650

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2021-02-10 08:17:11 +01:00
Guillaume Abrioux 2f1d287b1c tests: pin ansible-lint version
This commit pins the ansible-lint version to 4.3.7 as ceph-ansible isn't
compatible with recent changes in 5.0.0

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-02-10 07:48:24 +01:00
Guillaume Abrioux 54bae480d2 tests: set `mon_max_pg_per_osd` in rgw_multisite
Otherwise, the job fails when it tries to create a bucket with `s3cmd mb`
command because we have too many PGs per OSD.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-02-10 07:01:21 +01:00
Guillaume Abrioux 931b87e830 rgw: fix a typo in multisite
if `rgw_zonegroupmaster` is not defined at the rgw instance level in
`rgw_instances` it will fallback to a wrong variable (`rgw_zonemaster`).

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1925247

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-02-10 07:01:21 +01:00
Dimitri Savineau 94af3c87d1 rolling_update: exclude clients from node-exporter
Since b105549 we don't install node-exporter on client nodes so we should
also exclude the client node from the node-exporter upgrade.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2021-02-09 14:41:13 +01:00