Commit Graph

5560 Commits (694be0dce95084f758e1e6ac465589ef927e88d6)
 

Author SHA1 Message Date
Guillaume Abrioux 694be0dce9 facts: set is_rgw_instances_defined from configure_dashboard
When we come from configure_dashboard.yml, this fact should be set if
`rgw_instances` is defined in group_vars/host_vars. Otherwise, the next
task that set the fact `rgw_instances` will be run as it will assume it
wasn't user defined.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2117294

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 33ac715cfb)
2022-08-11 17:01:48 +02:00
Guillaume Abrioux f3296c18d8 config: use osd_memory_target value from ceph_conf_overrides if defined
otherwise it's impossible to override `osd_memory_target`
via `ceph_conf_overrides`.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2056675

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit cf4836dc95)
2022-08-10 16:32:49 +02:00
Guillaume Abrioux 611c8399ff rbd-mirror: do not call config_template with fqcn
This branch supports Ansible 2.9 only which doesn't use collections.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2037646

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2022-08-09 06:55:02 +02:00
Guillaume Abrioux 97c3b9691d replace 'master' references with 'main'
Because the branch'master' was renamed 'main'

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0ce39416e8)
2022-08-08 13:46:31 +02:00
Guillaume Abrioux 9167e19b1c tests: skip rbdmirror tests on non-secondary daemon
the daemon is not running on the 'primary' daemon.
Therefore, these tests are not needed.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 37e67fb672)
2022-08-08 13:06:50 +02:00
Guillaume Abrioux 056de01714 tests: set no_log_on_ceph_key_tasks=false
In order to not have to always reproduce it when a failure shows up in the CI
having the failure logged can make us save some time.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f1239b6907)
2022-08-08 13:05:23 +02:00
Guillaume Abrioux 64d912bad2 rbd-mirror: follow up on recent rbd-mirror refactor
- ensure /var/lib/ceph/bootstrap-rbd-mirror exists
- always install ceph-base on rbdmirror nodes (otherwise, ceph-crash
  isn't present)

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 041435e1e3)
(cherry picked from commit b634fb1cb3)
(cherry picked from commit 302da16c27)
2022-08-08 13:05:23 +02:00
Teoman ONAY 0ab159242a Set ceph_rbd_mirror_pool default value
Signed-off-by: Teoman ONAY <tonay@redhat.com>
(cherry picked from commit 0c50bfac98)
(cherry picked from commit 8a0b5a9571)
(cherry picked from commit 4359464bb6)
2022-08-08 13:05:23 +02:00
Guillaume Abrioux ee5303bd0f rbd-mirror: major refactor
- Use config-key store to add cluster peer.
- Support multiple pools mirroring.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b74ff6e22c)
2022-08-08 13:05:20 +02:00
Guillaume Abrioux 658274b7f2 config: do not always set _osd_memory_target
When 'osd_memory_target' is overridden in ceph_conf_overrides.
The task that sets the fact `osd_memory_target` in the ceph-config role
should be skipped.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2056675#c11

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit cb5d6b48fb)
2022-08-08 09:46:26 +02:00
Teoman ONAY 89d7518db8 Playbook fails when using --limit to install new MDS
"set_fact container_run_cmd" is not set when using --limit on MDS as facts
were not run on first MON.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2111017

Signed-off-by: Teoman ONAY <tonay@redhat.com>
(cherry picked from commit 9a4a3f5f19)
2022-08-01 15:55:20 +02:00
Guillaume Abrioux 76ae5f9ef7 config: followup on 8a5628b51
Add missing `--cluster {{ cluster }}` on task
`set osd_memory_target` in the main.yml file of the
ceph-config role.
Also it moves the task after ceph configuration file is actually written.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9f59c7286f)
2022-07-22 11:12:26 +02:00
Guillaume Abrioux 4661c5679f shrink-osd: use command instead of ceph_volume_simple_scan
This module isn't available in RHCS 4

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2071035

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2022-07-18 10:11:27 +02:00
Guillaume Abrioux f8e969c890 backup-and-restore: use archive/unarchive approach
current approach is too complex and causes too many issues permission
issues.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2051640

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit dffe7b47de)
2022-07-11 14:05:47 +02:00
Guillaume Abrioux 1b09682f75 backup-and-restore: various fixes
- preserve mode and ownership on main directories
- make sure the directories are well present prior to restoring files.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2051640

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 047af3a3f6)
2022-07-11 14:05:43 +02:00
Guillaume Abrioux 1ecdc189b1 backup-and-restore: fix check on 'target_node' variable
If the user doesn't pass a valid name (present in the inventory)
the playbook will fail like following:

```
fatal: [localhost -> {{ target_node }}]: FAILED! =>
  msg: |-
    The task includes an option with an undefined variable. The error was: "hostvars['10.70.46.40']" is undefined
```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2051640

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b18a1aa3ca)
2022-07-11 14:05:40 +02:00
Guillaume Abrioux e28615809c backup-and-restore: fix check on 'mode' variable
Typical failure:

```
fatal: [localhost]: FAILED! =>
  msg: |-
    The conditional check 'mode not in ['backup', 'restore']' failed. The error was: error while evaluating conditional (mode not in ['backup', 'restore']): 'mode' is undefined
```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2051640

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 848dd03fa6)
2022-07-11 14:05:36 +02:00
Guillaume Abrioux 85049464ce backup-and-restore: fix a typo
Typo introduced during initial implementation.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2051640

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e28c486e52)
2022-07-11 14:05:33 +02:00
Guillaume Abrioux 2c2833b540 contrib: add a playbook
this playbook can backup or restore some ceph files.
(/etc/ceph, /var/lib/ceph, ...)

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2051640

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ed0bba4d77)
2022-07-11 14:05:27 +02:00
Guillaume Abrioux f9b7bd327c config/osd: various fixes
- sets `osd_memory_target` per osd host.
- ceph.conf refactor (osd)

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2056675

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8a5628b516)
2022-07-11 14:01:23 +02:00
Guillaume Abrioux 68470e91e7 config: fix indentation in main.yml
For consistency and readability.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5283fa6e96)
2022-07-11 14:01:09 +02:00
Teoman ONAY 6feb7646a1 Refresh /etc/ceph/osd json files content before zapping the disks
If the physical disk to device path mapping has changed since the
last ceph-volume simple scan (e.g. addition or removal of disks),
a wrong disk could be deleted.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2071035

Signed-off-by: Teoman ONAY <tonay@redhat.com>
(cherry picked from commit 64e08f2c0b)
2022-07-11 09:16:34 +02:00
Guillaume Abrioux 6dc7bfdd3c facts: fix set_radosgw_address.yml
use `include_tasks` instead of `import_tasks`.
Given that with `import_tasks` statements are preprocessed
and the tasks that defines it hasn't been run yet, it will fail
and complain like following:

```
The error was: 'ansible.vars.hostvars.HostVarsVars object' has no attribute '_interface'
```

Using `include_tasks` instead fixes this.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 434793e2fe)
(cherry picked from commit d57377ef61)
2022-07-07 10:30:25 +02:00
Guillaume Abrioux f8cc43558f facts: fix deployments with different net interface names
Deployments when radosgws don't have the same names for
network interface.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2095605

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f6b49f78a9)
(cherry picked from commit ab6fc3e6f7)
2022-07-07 10:30:20 +02:00
Guillaume Abrioux 28063d1d69 purge: reset-failed ceph-crash
This ensures we always reset-failed the ceph-crash service.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2055992

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e368ee0fc9)
2022-05-23 09:53:31 +02:00
Guillaume Abrioux 6485e1a69e purge: remove ceph directories on client nodes
Otherwise any ceph directories are left over on client nodes
after the purge.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2024815

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 20035852a4)
(cherry picked from commit 346d4a1e1d)
2022-05-19 18:00:13 +02:00
Guillaume Abrioux 36fceeaaff update: speed up client play
there's no need to run the roles ceph-facts, ceph-config and ceph-client
altogether on client nodes in rolling update playbook.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2019831

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 817c03bc0e)
(cherry picked from commit c0da98b1d6)
2022-05-19 17:58:01 +02:00
Guillaume Abrioux 8d9ebf2d0b container: align systemd units with rpm
Update `After=` and `Wants=` parameters in container systemd units
and make them be aligned with the systemd units that come
from the packaging.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2027440

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f01536ea19)
(cherry picked from commit 690c879aef)
2022-05-19 17:55:17 +02:00
Guillaume Abrioux f4bd5c7d91 switch2containers: fail if less than 3 monitors
This playbook doesn't support less than 3 monitors present in the inventory.
Just like the rolling_update playbook, let's fail if less than
3 monitors are present.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2049132

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f08129edf2)
(cherry picked from commit b970ab6691)
2022-05-19 17:53:53 +02:00
Guillaume Abrioux fe4e04f779 dashboard: allow collecting stats from the host
This commit makes podman bindmount `/:/rootfs:ro` so the container can
collect data from the host.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2028775

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0f34cd16d8)
(cherry picked from commit 2e2d893d28)
2022-05-09 13:46:05 +02:00
Guillaume Abrioux dbe940f1a7 purge: ceph-crash purge fixes
This fixes the service file removal and makes the playbook
call `systemctl reset-failed` on the service because in Ceph
Nautilus, ceph-crash doesn't handle `SIGTERM` signal.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2055992

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2f11982590)
(cherry picked from commit 7a570c719e)
2022-05-09 13:45:16 +02:00
Guillaume Abrioux 26b396cb6c common: support setting pg autoscaler to off
The current implementation doesn't allow to disable the pg autoscaler
on created pools. This allows only 'on' or 'warn'.

With this commit, this is now possible to disable it.

Valid values would be ['on', 'yes', 'true', 'off', 'no', 'false']

```
openstack_glance_pool:
  name: "images"
  pg_num: 128
  pgp_num: 128
  rule_name: "replicated_rule"
  type: 1
  application: "rbd"
  size: 3
  pg_autoscale_mode: off
```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2062621

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9d1ff8f236)
2022-05-09 13:44:31 +02:00
Teoman ONAY b22e1b87d1 Turn off SELinux separation for containers MON and RGW
Initially MONs and RGW binded /etc/pki/ca-trust/extracted using the :z flag
(introduced to solve an OSP TripleO issue on RHEL - #3638) but using
this flag prevents local services (like sssd) running on the host from accessing
the certificates/files in that folder.

Signed-off-by: Teoman ONAY <tonay@redhat.com>
(cherry picked from commit 7e8ce2567e)
(cherry picked from commit cf44ad76f6)
2022-05-09 13:44:04 +02:00
Guillaume Abrioux 45b3e50e25 facts: follow up on aa0cc93
when these variables are defined in the inventory host file,
all tasks are skipped then because the node being played isn't
aware about the values from the rgw nodes.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2063029

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 328bd7c975)
2022-04-21 11:34:18 +02:00
Guillaume Abrioux 8d39897f38 facts: fix mon/mgr collocation
`service dump` hangs when no active mgr is available.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 617dce5e10)
2022-04-20 06:47:23 +02:00
Guillaume Abrioux 1453915ed9 dashboard: fix regression
introduced by ceph/ceph-ansible/pull/7150

when no rgw is present, it fails.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2076192

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1a56fd6a21)
2022-04-20 06:47:16 +02:00
Guillaume Abrioux c4ec2c4e54 dashboard: support --limit execution with rgw
When the following conditions are met:

- rgw is deployed,
- dashboard is deployed,
- playbook is called with --limit,
- a node being processed is collocated on either a mon or mgr.

The playbook fails because `rgw_instances` is undefined.
The idea here is to make sure this variable is always defined.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2063029

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit aa0cc9381d)
2022-04-14 08:50:33 +02:00
Guillaume Abrioux 49b8e0d89c dashboard: always set `dashboard_server_addr`
When running the playbook with `--limit`, if the play targeted doesn't match
hosts present in the mgr group the playbook can fail.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2063029

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 72e4654aae)
(cherry picked from commit d1e4b83106)
2022-03-28 10:40:36 +02:00
Guillaume Abrioux 3c05ac6c46 dashboard: fix radosgw system user creation
The radosgw system user creation will fail when `rgw_instances`
is set at the host_var level because this variable won't bet set
on monitor nodes, given that this is where the tasks is delegated, it fails.

The idea here is to check over all rgw instances that are defined and set a
boolean fact in order to check if at least one instance has `rgw_zonemaster` set
to `True`

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2034595

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2022-01-03 10:22:09 +01:00
Guillaume Abrioux de14c6aeb2 validate: fix bug when using vault
since a variable encrypted with vault is no longer a string but a
encrypted object we can't use the filter | length, we have to convert it
to a string before.

Fixes: #6991

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6ad7e52869)
2021-11-29 13:42:43 +01:00
Guillaume Abrioux f5dd0a8c37 mgr: append balancer module to ceph_mgr_modules
otherwise the osd play in rolling_update can fail when it tries to
disable it before upgrading osd nodes.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 45a1d634d8)
2021-11-10 14:10:30 +01:00
Guillaume Abrioux 48a8b1cc34 update: move a set_fact
ceph-facts roles makes decisions based on the fact `rolling_update` so
it must be called before we run this role.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2014304

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e5edcc4214)
2021-11-03 11:50:50 +01:00
Guillaume Abrioux a9a7c35a74 update: support --limit on monitor nodes
Change needed in order to support --limit on mon nodes.
Otherwise, a call to `hostvars[groups[mon_group_name][0]]['_current_monitor_address']`
throws an error:

```
"The error was: 'ansible.vars.hostvars.HostVarsVars object' has no attribute '_current_monitor_address'"
```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2014304#c28

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 82eee4303b)
2021-10-29 01:41:13 +02:00
Guillaume Abrioux b8db1166c5 nfs/rgw: support enforcing keys
if one sets `ceph_nfs_rgw_access_key` and/or `ceph_nfs_rgw_secret_key`,
the nfs/rgw user creation won't take those variables into account and it
will generate a user with automatically generated credentials.
It ends up with a mismatch between what will be set in ganesha.conf and
the created user.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2010754

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-10-26 16:39:58 +02:00
Per Abildgaard Toft c5e4851a3f shrink-osd: fix regression because of a wrong regex
968891f449 introduced a regression.
The regex is wrong because it doesn't allow to shrink osds with id
greater than 9

Fixes: #6950

Signed-off-by: Per Abildgaard Toft <per@minfejl.dk>
(cherry picked from commit 84118a3063)
2021-10-26 16:39:33 +02:00
Guillaume Abrioux 3f4abb09b4 shrink-osd: check osd id format
This adds a check early in order to ensure the format of osd ids passed
is correct.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2005734

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 968891f449)
2021-10-26 16:39:33 +02:00
Guillaume Abrioux 2c9fc7f517 rolling_update: modify default health_osd_check_*
let's do more retries with a shorter delay.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 50a21d695e)
2021-10-26 16:39:09 +02:00
Guillaume Abrioux a15448e5cc tests: add new scenario subset_update
new scenario in order to test the subset upgrade approach using tags.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fb8a66149b)
2021-10-25 23:22:35 +02:00
Guillaume Abrioux 3dd96da652 rolling_update: fix pre and post osd upgrade play
when using --limit osds, the play before and after osd upgrade are
skipped because we use `hosts: "{{ mon_group_name | default('mons') }}[0]"`
using `hosts: "{{ osds_group_name | default('osds') }}" with
`delegate_to` to the first monitor addresses this issue.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fc9f87c45f)
2021-10-25 23:22:35 +02:00
Guillaume Abrioux dc1a4c29ea update: support upgrading a subset of nodes
It can be useful in a large cluster deployment to split the upgrade and
only upgrade a group of nodes at a time.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2014304

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e5cf9db2b0)
2021-10-25 23:22:35 +02:00