5581 Commits (rhcs-4.3)
 

Author SHA1 Message Date
Guillaume Abrioux b2cf677b71 dashboard: support prometheus storage.tsdb.retention.time parameter
This commit adds the parameter `--storage.tsdb.retention.time` to the
prometheus systemd unit template.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1928000

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b60c61ce45)
2021-04-02 13:17:59 +02:00
Guillaume Abrioux 5fd299e358 update: followup on 07029e1
Playbook must fail anyway, the `rescue` block has been introduced for
unmasking the unit after the playbook has failed.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e9ddb972fe)
2021-03-29 15:22:23 +02:00
Guillaume Abrioux 82b934cfc1 rolling_update: unmask monitor service after a failure
If for some reason the playbook fails after the service was
stopped, disabled and masked but before it was restarted, enabled and
unmasked, the playbook leaves the service masked, which can confuse
users and forces them to unmask the unit manually.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1917680

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 07029e1bf1)
2021-03-29 15:22:23 +02:00
Guillaume Abrioux 653d180ec0 defaults: add a comment about `igw_network`
This adds brief documentation in ceph-defaults about `igw_network`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c5728bdc63)
2021-03-29 11:24:28 +02:00
Guillaume Abrioux fe47a02134 dashboard: support igw nodes with dedicated subnet
This adds the possibility to deploy the dashboard with igw nodes using
a dedicated subnet.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1926170

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c33de174f1)
2021-03-26 21:26:14 +01:00
VasishtaShastry 58a28656ff Peer addition won't be skipped if remote is not in peer
rbd-mirroring does not get configured because adding the peer is skipped.
Peer addition should not be skipped if the peer has not been added already.

Closes - https://bugzilla.redhat.com/show_bug.cgi?id=1942444

Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com>
(cherry picked from commit 006998e804)
2021-03-26 19:14:35 +01:00
Guillaume Abrioux a8420d41c6 update: stop ceph-crash service before upgrading
This adds the missing service stop task for ceph-crash upgrade workflow.

It should have been added through commit
`15872e3db1e342238636bc9c8e1aef6bd1d3dcd8` in stable-4.0, but at the time
we backported this patch ceph-crash wasn't implemented yet, so the
ceph-crash related content in this patch was removed. ceph-crash
was implemented later, so we are still missing this part of the patch in
stable-4.0.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1943471

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-03-26 16:18:50 +01:00
Guillaume Abrioux a65968a9d1 tests: pin ruamel.yaml version
0.17.0 which was released today (03/26/2021) breaks ansible-lint execution with
py2.7.

From https://pypi.org/project/ruamel.yaml we can read:

> The 0.16.13 release was the last that will tested to be working on Python 2.7.

Let's pin the version to 0.16.13 when running with py2.7.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-03-26 14:42:27 +01:00
Ali Maredia ba6fa5c959 docs: rgw multisite docs with new rgw_instances config
Docs reflect that each instance of `rgw_instances`
can now take rgw_zonemaster, rgw_zonesecondary,
rgw_zonegroupmaster, rgw_multisite_proto.

Signed-off-by: Ali Maredia <amaredia@redhat.com>
(cherry picked from commit a59bc2da3b)
2021-03-26 07:43:02 +01:00
Guillaume Abrioux 9780490b2f convert some missed `ansible_*` calls to `ansible_facts['*']`
This converts some calls to `ansible_*` that were missed in the
initial PR #6312.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0163ecc924)
2021-03-26 00:16:58 +01:00
Alex Schultz 6229b3bdba Disable facts by default in ansible.cfg
As a continuation of a7f2fa73e6, this
change switches fact injection to off by default in the provided
ansible.cfg.

Signed-off-by: Alex Schultz <aschultz@redhat.com>
(cherry picked from commit db031a4993)
2021-03-26 00:16:58 +01:00
Alex Schultz 7ddbe74712 Use ansible_facts
It has come to our attention that using ansible_* vars that are
populated with INJECT_FACTS_AS_VARS=True is not very performant.  In
order to be able to support setting that to off, we need to update the
references to use ansible_facts[<thing>] instead of ansible_<thing>.

Related: ansible#73654
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1935406
Signed-off-by: Alex Schultz <aschultz@redhat.com>
(cherry picked from commit a7f2fa73e6)
2021-03-26 00:16:58 +01:00
Guillaume Abrioux 697e5823f3 library: drop ceph_facts
This is never called in the playbook and seems unmaintained.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b01f16e835)
2021-03-26 00:07:43 +01:00
Ken Dreyer 4aadd659e2 README-MULTISITE: fix typos
This commit fixes some typos in MULTISITE documentation.

Signed-off-by: Ken Dreyer <ktdreyer@redhat.com>
(cherry picked from commit 63a246db41)
2021-03-26 00:07:04 +01:00
Guillaume Abrioux 93defc4f4b tests: switch to quay.ceph.io for dashboard images
For some reason, `quay.io/app-sre/grafana` no longer exists.
As a workaround, all dashboard-related images have been mirrored on
quay.ceph.io.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c90b0985e5)
2021-03-25 14:11:11 +01:00
Guillaume Abrioux 7fd332e7fe iscsi: fetch right repo from shaman
due to recent changes in shaman, we must fetch the right repo by
filtering on the desired architecture.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5801171b37)
2021-03-25 14:11:11 +01:00
Guillaume Abrioux bd6cc79fa0 tests: fix `test_rgw_is_up` test
The data structure seems to have been modified in ceph@master (quincy).

This commit updates the test accordingly.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b8080bac41)
2021-03-25 14:11:11 +01:00
Guillaume Abrioux 358ea3853a tests: fix `test_nfs_is_up` test
The data structure seems to have been modified in ceph@master (quincy).

This commit updates the test accordingly.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7e1db0b599)
2021-03-25 14:11:11 +01:00
Guillaume Abrioux b46d2bf0a6 ceph_volume: fix bug in `is_lv()`
This function makes the `ceph_volume` module non-idempotent in a
containerized context because it tries to run a container and bind-mount
directories that no longer exist.

In that case, the `lvs` command being executed returns something
other than `0`, so we can't call `json.loads(out)['report'][0]['lv']`
since it might throw a Python error.

The idea is to return `True` only if `rc` is equal to `0` and
`len(result)` is greater than `0`, which means the command matched an
LV.

Fixes: #6284

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ed79bc7a4e)
2021-03-25 14:11:11 +01:00
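The return rule described in b46d2bf0a6 can be sketched roughly as follows. This is a minimal illustration rather than the module's actual code: the `rc` and `out` parameters are assumptions standing in for the return code and stdout of the `lvs` command, and only the `json.loads(out)['report'][0]['lv']` access comes from the commit message.

```python
import json


def is_lv(rc, out):
    """Return True only if `lvs` succeeded and matched at least one LV.

    `rc` and `out` are hypothetical parameters for this sketch: the
    return code and stdout of the `lvs` command.
    """
    if rc != 0:
        # The command failed, so `out` may not be valid JSON at all.
        return False
    result = json.loads(out)['report'][0]['lv']
    return len(result) > 0
```

Checking `rc` first means the JSON is never parsed when the command failed, which is the case the commit message singles out.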
Guillaume Abrioux 2cd8c3637c fix 'command -v' tasks
`command -v` is a shell builtin, so it needs a shell to run.

Fixes: #6325

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 14c472707c)
2021-03-22 13:53:11 +01:00
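To illustrate the point made in 2cd8c3637c outside of Ansible, here is a small sketch that looks up an executable with `command -v` by going through a shell. The helper name `find_executable` is made up for this example, and it assumes a POSIX `/bin/sh` is available.

```python
import shlex
import subprocess


def find_executable(name):
    """Return the path reported by `command -v`, or None if not found.

    Hypothetical helper: shell=True runs the string through /bin/sh,
    where `command` is available as a builtin.
    """
    proc = subprocess.run(
        'command -v ' + shlex.quote(name),
        shell=True,
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        return None
    return proc.stdout.strip()
```

Passing `['command', '-v', name]` without a shell relies on a standalone `command` executable being installed, which is not guaranteed on every system.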
Guillaume Abrioux bbf8b2fdf6 facts: fix nfs/external cluster scenario
These tasks should only be run when at least one monitor is present in
the inventory.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1937997

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ccd1cbb732)
2021-03-18 06:41:00 +01:00
Guillaume Abrioux dc2a11ce3f config: reset num_osds
When collocating OSDs with other daemons, `num_osds` is incorrectly calculated
because `ceph-config` is called multiple times.

Indeed, the following code:
```
num_osds: "{{ lvm_list.stdout | default('{}') | from_json | length | int + num_osds | default(0) | int }}"
```

makes `num_osds` be incremented each time `ceph-config` is called.

We have to reset it in order to get the correct number of expected OSDs.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 31a0f2653d)
2021-03-17 17:35:52 +01:00
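The accumulation described in dc2a11ce3f can be reproduced with a few lines of Python. This is only a model of the arithmetic in the Jinja expression: the `count_osds` helper and the sample `lvm_list` data are made up for illustration.

```python
def count_osds(lvm_list, num_osds=0):
    """Model of the expression: length of the LVM list plus the old value."""
    return len(lvm_list) + num_osds


# Made-up data: two OSDs reported by the LVM listing.
lvm_list = {'0': [], '1': []}

# Without a reset, every call to the role adds to the previous value.
num_osds = 0
for _ in range(3):
    num_osds = count_osds(lvm_list, num_osds)
accumulated = num_osds

# With a reset before each call, the value stays at the real count.
for _ in range(3):
    num_osds = 0
    num_osds = count_osds(lvm_list, num_osds)
reset_value = num_osds

print(accumulated, reset_value)
```

With three calls and two OSDs, the first loop ends at 6 while the second stays at 2.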
Guillaume Abrioux 8b86b2ede3 tests: increase nb of rerun in pytest
In order to avoid false positives in the CI that I've been unable to
reproduce.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f7fd1c2298)
2021-03-12 17:52:00 +01:00
Matthew Vernon ce25fc74eb Docs: fix some typos
While working on the previous PR, I found a couple of typos in the
docs. This fixes those.

Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
(cherry picked from commit 8b1474ab75)
2021-03-12 09:36:11 +01:00
Dimitri Savineau 6921aafb2b debian/uca: remove the handler notification
The "update apt cache" in the ceph-handler role was never called and the
handler trigger after adding the uca repository doesn't exist at all.
Instead of using a handler for that we can just set the update_cache
parameter to true like the other apt_repository tasks.

Resolve merge conflict from cherry-picking this commit.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 09d6706697)
2021-03-11 22:06:11 +01:00
Guillaume Abrioux e6447bdc2b library: do not always add --yes in batch mode
When asking `ceph-volume` to report only in `lvm batch` context, there's
a bug described in bz1896803 [1] when `--yes` is passed (which by the
way isn't necessary with `--report`).
This commit ensures `--yes` isn't passed to `ceph-volume` when `--report`
is used.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1896803

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1896803

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fe6d6ba622)
2021-03-11 13:53:06 +01:00
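A rough sketch of the behaviour described in e6447bdc2b: `--yes` is only added when `--report` is not requested. The `batch_args` helper is hypothetical and does not reflect how the real module assembles its command line.

```python
def batch_args(report=False):
    """Build illustrative arguments for `ceph-volume lvm batch`.

    Hypothetical helper: `--yes` is skipped whenever `--report` is used.
    """
    args = ['lvm', 'batch']
    if report:
        args.append('--report')
    else:
        args.append('--yes')
    return args
```

For example, `batch_args(report=True)` returns `['lvm', 'batch', '--report']`.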
Guillaume Abrioux 0d0723298f purge: rm service-cid files
This commit makes sure purge playbooks remove those files if for any reason they
have been left behind.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1920900

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b9dd253a4f)
2021-03-11 13:52:48 +01:00
Guillaume Abrioux 932abbc8cf switch2container: do not serialize the ceph-crash migration
There's no need to slow down the playbook execution time by migrating
all the `ceph-crash` instances in a serial way. Let's remove the
`serial: 1` so the migration is achieved in a parallel way.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 980a5a7df4)
2021-03-11 13:52:39 +01:00
Dimitri Savineau 8f26ffdbac rolling_update: enforce ceph-container-engine
When running the rolling_update.yml playbook and adding the dashboard
component at the same time, the requirements (like container packages)
aren't installed.
This can lead to a failure when authentication is used on the
container registry because the playbook will try to log in to the registry
while podman/docker isn't installed yet.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1903504
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1918650

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 48a456dc8c)
2021-03-11 13:52:21 +01:00
Dimitri Savineau 735965ef9c ceph-common: enable rhcs tools repo for monitoring
The monitoring node running grafana needs the rhcs tools repository
enabled in non containerized deployment to be able to install the
ceph-grafana-dashboards rpm package.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1918650

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit e4dd0067c6)
2021-03-11 13:52:21 +01:00
Dimitri Savineau 3ba27c9387 rolling_update: exclude clients from node-exporter
Since b105549 we don't install node-exporter on client nodes so we should
also exclude the client node from the node-exporter upgrade.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 94af3c87d1)
2021-03-11 13:52:02 +01:00
Guillaume Abrioux 1b424ad5e9 purge: zap and destroy db and wal devices for lvm batch
Those devices (db/wal) are never zapped in lvm batch deployment.
Iterating over `dedicated_devices` and `bluestore_wal_devices` fixes
this issue.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1922926

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 984191ac7f)
2021-03-11 13:51:38 +01:00
Tyler Bishop ba76102952 facts: support device aliases for (dedicated|bluestore_wal)_devices
Just like `devices`, this commit adds the support for linux device aliases for
`dedicated_devices` and `bluestore_wal_devices`.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1919084

Signed-off-by: Tyler Bishop <tbishop@liquidweb.com>
(cherry picked from commit ee4b8804ae)
2021-03-11 13:51:19 +01:00
Guillaume Abrioux e3165f9a07 mon: fix cephx disabled deployment
Due to missing condition on `cephx` variable, cephx disabled deployments
are broken.
This commit fixes this.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1910151

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4af0845702)
2021-03-11 13:51:04 +01:00
Guillaume Abrioux bb1f66cb51 switch2container: fix mon quorum check
The current check makes no sense because it checks whether any monitor
other than the one being played (either a previous one already converted or a
next one that isn't yet converted) is present in the quorum.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1909011

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 175ffa1b88)
2021-03-11 13:50:27 +01:00
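The commit message for bb1f66cb51 does not spell out the corrected check. One plausible reading, shown here purely as an assumption, is to verify that the monitor being converted is itself listed in the `quorum_names` field of the `ceph quorum_status` JSON output. The `mon_in_quorum` helper is hypothetical.

```python
import json


def mon_in_quorum(quorum_status_json, mon_name):
    """Return True if `mon_name` appears in `quorum_names`.

    Assumption for this sketch: `quorum_status_json` is the JSON text
    printed by `ceph quorum_status --format json`.
    """
    status = json.loads(quorum_status_json)
    return mon_name in status.get('quorum_names', [])
```

Checking the monitor currently being played avoids a result that is true merely because some other monitor is in the quorum.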
Guillaume Abrioux 241418409d common: ensure shaman returns right repo
Due to recent changes in shaman, there's a chance it returns the wrong
repository from an architecture point of view.
We can query shaman and ask for the correct architecture to get around
this.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 39649f0ce8)
2021-03-10 16:43:04 +01:00
Matthew Vernon fdf437743c Fix typo and broken link for documenting RGW frontends
http://docs.ceph.com/docs/nautilus/radosgw/frontends/ 404s so replace
it with a working "latest" docs link, and correct the spelling of
"additional" while I'm at it.

Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
(cherry picked from commit 847611048e)
2021-03-03 14:20:26 +01:00
Florian Haas 6fe14c6d01 requirements.txt: Move the six dependency into the general requirements
config_template.py depends on six, which isn't listed in the default
requirements.txt. This previously frequently wasn't a problem, because
six used to be a standard package being installed into a venv, and
lots of other projects depended on it.

It also does get installed for unit and integration tests via
tests/requirements.txt, so any broken dependency on six wouldn't be
detected by tox runs.

However, as other projects and distributions have phased out Python
2.7 support the dependency on six becomes less common. Thus, as long
as ceph-ansible does require it for config_template.py, add it to the
base requirements.

Signed-off-by: Florian Haas <florian@citynetwork.eu>
(cherry picked from commit d49ea9818b)
2021-03-03 13:22:29 +01:00
Guillaume Abrioux c3304c213b dashboard: add missing parameter in `ceph_cmd`
The `ceph_cmd` fact is missing the `--net=host` parameter.

Some tasks consuming this fact can fail as follows:

```
Error: error configuring network namespace for container b8ec913db1fb694ae683faf202680de7a59c714a004e533aba87e8503d29261f: Missing CNI default network
```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1931365

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f143b1a647)
2021-03-03 12:57:08 +01:00
Guillaume Abrioux 858048560e update: fix require-osd-release task
This commit fixes two issues in rolling_update.yml:

- `container_exec_cmd_update_osd` is unset in the `complete osd upgrade`
play so it never runs the command in a container.
- the 'require-osd-release' task is never applied because the condition
  looks for luminous release.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1930164

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-02-18 22:22:06 +01:00
Guillaume Abrioux 158224503d defaults: update rhcs dashboard images versions
The current dashboard images deployed have a bad health index.
Updating to a newer version fixes this issue.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1925350

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a16ae693d8)
2021-02-18 18:22:28 +01:00
Guillaume Abrioux 3c0a5a0b61 doc: add a note about "latest" tags
See the change for details.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4e95180c80)
2021-02-11 16:50:43 +01:00
Dimitri Savineau a43960790f doc: update containerized deployment
This adds more documentation to the configuration and usage of
containerized deployment.

Closes: #6198

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d42d584085)
2021-02-11 16:50:43 +01:00
Guillaume Abrioux 55d0c79046 tests: install correct ansible-lint version
We need to pin the ansible-lint version depending on python version
being used.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-02-10 08:32:24 +01:00
Guillaume Abrioux 8fada83589 tests: set `mon_max_pg_per_osd` in rgw_multisite
Otherwise, the job fails when it tries to create a bucket with `s3cmd mb`
command because we have too many PGs per OSD.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 54bae480d2)
2021-02-10 08:32:24 +01:00
Guillaume Abrioux b5d082c4bc rgw: fix a typo in multisite
If `rgw_zonegroupmaster` is not defined at the rgw instance level in
`rgw_instances`, it will fall back to the wrong variable (`rgw_zonemaster`).

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1925247

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 931b87e830)
2021-02-10 08:32:24 +01:00
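The fallback fixed in b5d082c4bc can be pictured with plain dictionaries. This is a simplified model, not the role's Jinja code: the `zonegroupmaster_for` helper and the sample values are made up, and only the variable names come from the commit message.

```python
def zonegroupmaster_for(instance, defaults):
    """Return the instance-level value, else the matching global default.

    Hypothetical helper: the fallback key must be `rgw_zonegroupmaster`,
    not `rgw_zonemaster`.
    """
    return instance.get('rgw_zonegroupmaster', defaults['rgw_zonegroupmaster'])


# Made-up values chosen so that the two defaults differ.
defaults = {'rgw_zonemaster': False, 'rgw_zonegroupmaster': True}
instance = {'instance_name': 'rgw0'}

print(zonegroupmaster_for(instance, defaults))
```

With these sample values the typo would have yielded `False` (the `rgw_zonemaster` default) instead of `True`.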
Guillaume Abrioux 920f07514a rgw: quick fix in create_zone_user.yml
typical error:

```
2021-02-01 03:11:09,809 p=93834 u=cephuser n=ansible | TASK [ceph-rgw : check if the realm system user already exists] ***************************************************************************************************************************************************
2021-02-01 03:11:09,809 p=93834 u=cephuser n=ansible | Monday 01 February 2021  03:11:09 -0500 (0:00:00.084)       0:14:38.607 *******
2021-02-01 03:11:09,836 p=93834 u=cephuser n=ansible | fatal: [ceph-kvm-ms2-1611241931591-node7-rgw]: FAILED! =>
  msg: |-
    The task includes an option with an undefined variable. The error was: 'None' has no attribute 'realm'
```

This task should be skipped when `zone_users` is undefined.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1922998

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-02-01 11:28:57 -05:00
Dimitri Savineau 6278c5a4e3 ceph-mon: add ExecStartPre docker stop to systemd
We already do that in the other systemd templates (mgr, mds, etc.)
and it would prevent having to add a workaround in other orchestration tools.
This change is for containerized deployment only.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1882724

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 3749d297c7)
2021-01-29 12:00:14 -05:00
Guillaume Abrioux aeee3471e3 rgw: avoid useless call to ceph-rgw
since `ceph-rgw` may be called from `ceph-handler` in some contexts we
should avoid rerunning it unnecessarily.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8617081664)
2021-01-28 16:37:50 -05:00
Guillaume Abrioux b903446fa4 containers: use --cpus instead of --cpu-quota
When using docker 1.13.1, the current condition:

```
{% if (container_binary == 'docker' and ceph_docker_version.split('.')[0] is version_compare('13', '>=')) or container_binary == 'podman' -%}
```

is wrong because it compares the first digit (1) whereas it should
compare the second one.
It means we always use `--cpu-quota` although the documentation recommends
using `--cpus` when the docker version is 1.13.1 or higher.

From the doc:
> --cpu-quota=<value>	Impose a CPU CFS quota on the container. The number of
> microseconds per --cpu-period that the container is limited to before
> throttled. As such acting as the effective ceiling.
> If you use Docker 1.13 or higher, use --cpus instead.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3e262e072b)
2021-01-28 16:37:50 -05:00
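The version comparison problem in b903446fa4 can be shown with a short Python sketch. Both functions are illustrative only: `first_component_check` approximates the old template condition with an integer comparison, and `supports_cpus_flag` is a hypothetical alternative that compares the major and minor components together. Neither is the template's actual fix.

```python
def first_component_check(docker_version):
    """Approximate the old condition: first component compared with 13."""
    return int(docker_version.split('.')[0]) >= 13


def supports_cpus_flag(container_binary, docker_version=''):
    """Hypothetical check: podman, or docker at version 1.13 or higher."""
    if container_binary == 'podman':
        return True
    if container_binary != 'docker':
        return False
    # Compare (major, minor) as a tuple so that 1.13 sorts after 1.12.
    major, minor = (int(part) for part in docker_version.split('.')[:2])
    return (major, minor) >= (1, 13)


print(first_component_check('1.13.1'), supports_cpus_flag('docker', '1.13.1'))
```

For docker 1.13.1 the first check is false, which matches the behaviour the commit describes, while the tuple comparison is true.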