Commit Graph

801 Commits (6374849f74b60614fb1ccd5ce59c283c3601c043)

Author SHA1 Message Date
Guillaume Abrioux e30fdeba3c purge-dashboard: check for legacy group name 'grafana-server'
When using the legacy group name 'grafana-server', this playbook will run but
won't properly remove all monitoring resources as expected.

Fixes: #7265

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a9cb444be1)
2022-08-04 07:10:27 +02:00
Guillaume Abrioux 783b9235e5 adopt: fix placement update calls for rgw
The commands called here are not built correctly.
This commit fixes it.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2058038#c27

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 30c7e88d81)
2022-08-04 07:10:09 +02:00
Guillaume Abrioux 32b84e7e8e rbd-mirror: follow up on recent rbd-mirror refactor
- ensure /var/lib/ceph/bootstrap-rbd-mirror exists
- always install ceph-base on rbdmirror nodes (otherwise, ceph-crash
  isn't present)

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 041435e1e3)
(cherry picked from commit b634fb1cb3)
2022-08-04 06:52:06 +02:00
Teoman ONAY 0981158e03 Refresh /etc/ceph/osd json files content before zapping the disks
If the physical disk to device path mapping has changed since the
last ceph-volume simple scan (e.g. addition or removal of disks),
a wrong disk could be deleted.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2071035

Signed-off-by: Teoman ONAY <tonay@redhat.com>
(cherry picked from commit 64e08f2c0b)
2022-07-11 13:43:37 +02:00
Guillaume Abrioux 392ddec2d7 backup-and-restore: use archive/unarchive approach
The current approach is too complex and causes too many permission
issues.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2051640

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit dffe7b47de)
2022-07-07 17:14:53 +02:00
Guillaume Abrioux e1e5cb52f1 update: fix a typo
s/pre-quincy/pre-pacific
s/quincy-only/pacific-only

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2022-07-06 13:09:19 +02:00
Guillaume Abrioux f5020f6130 backup-and-restore: various fixes
- preserve mode and ownership on main directories
- make sure the directories are present prior to restoring files.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2051640

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 047af3a3f6)
2022-07-05 14:45:46 +02:00
Guillaume Abrioux 8d011b4ab8 Revert "upgrade: block upgrade when rgw multisite is active"
This reverts commit 51bc8cb636.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7d848fa19e)
2022-07-03 07:28:14 +02:00
Guillaume Abrioux f28002713f backup-and-restore: fix check on 'target_node' variable
If the user doesn't pass a valid name (present in the inventory)
the playbook will fail as follows:

```
fatal: [localhost -> {{ target_node }}]: FAILED! =>
  msg: |-
    The task includes an option with an undefined variable. The error was: "hostvars['10.70.46.40']" is undefined
```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2051640

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b18a1aa3ca)
2022-06-29 09:09:20 +02:00
Guillaume Abrioux 1db668d95a backup-and-restore: fix check on 'mode' variable
Typical failure:

```
fatal: [localhost]: FAILED! =>
  msg: |-
    The conditional check 'mode not in ['backup', 'restore']' failed. The error was: error while evaluating conditional (mode not in ['backup', 'restore']): 'mode' is undefined
```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2051640

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 848dd03fa6)
2022-06-29 08:52:38 +02:00
Guillaume Abrioux 941102d4e6 purge: reset-failed ceph-crash
This ensures we always reset-failed the ceph-crash service.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2055992

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5ab46f836d)
2022-06-15 21:13:22 +02:00
Guillaume Abrioux c9a81026ea backup-and-restore: fix a typo
Typo introduced during initial implementation.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2051640

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e28c486e52)
2022-06-15 13:02:33 +02:00
Francesco Pantano 2885b6175c Add ceph_infra tag to rolling_update
When the upgrade from Ceph 4 to 5 is performed in the OpenStack context,
ceph-ansible triggers the rolling_update playbook, which is supposed to
roll out new Ceph containers.  The ceph-infra role tries to take care
of firewall, ntp config and logrotate; however, TripleO manages them
through tripleo-heat-templates.  This patch just adds an additional tag
to skip the ceph-infra role in the OpenStack context.

Closes: https://bugzilla.redhat.com/2090456
Signed-off-by: Francesco Pantano <fpantano@redhat.com>
(cherry picked from commit 0e9b3902b0)
2022-06-14 14:39:10 +02:00
Seena Fallah 54aca30a24 ansible: use ansible.utils.ipwrap instead of ansible.netcommon.ipwrap
ansible.netcommon.ipwrap is deprecated and is not being redirected with ansible 2.9.*

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
2022-06-14 09:36:39 +02:00
Guillaume Abrioux c9dd9a09d2 switch to ansible.netcommon.ipwrap
As of 2.10, Ansible moved ipwrap to netcommon collection.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2022-06-07 16:30:18 +02:00
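The two ipwrap commits above (c9dd9a09d2 and 54aca30a24) only change which collection the filter is loaded from. For readers unfamiliar with the filter, here is a minimal Python sketch of the behaviour generally expected from it for a single value: IPv6 addresses get wrapped in brackets, anything else is returned unchanged. The function name `ipwrap_sketch` is hypothetical and this is not the collection's implementation.

```python
import ipaddress


def ipwrap_sketch(value: str) -> str:
    """Approximate ipwrap for one string: bracket IPv6, leave the rest alone."""
    try:
        ip = ipaddress.ip_address(value)
    except ValueError:
        # Not an IP address (e.g. a hostname): return it untouched.
        return value
    return f"[{value}]" if ip.version == 6 else value


if __name__ == "__main__":
    for candidate in ("10.0.0.1", "fd00::1", "mon0.example.com"):
        print(candidate, "->", ipwrap_sketch(candidate))
```

The bracketed form is what lets an IPv6 address be followed by `:port` without ambiguity.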
Guillaume Abrioux 4d3e25c85e cephadm_adopt: set autotune_memory_target_ratio
This adds a task that sets `autotune_memory_target_ratio` depending on the
value of `is_hci`.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2028693

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 41d62596fc)
2022-05-30 16:42:10 +02:00
Guillaume Abrioux 081c170120 cephadm-adopt: remove legacy directory after adoption
When this directory is left after the osd adoption, it leads to the following error:

```
[WRN] CEPHADM_REFRESH_FAILED: failed to probe daemons or devices
    host axdesec2ocs1n002.ecommerce.inditex.grp `cephadm ceph-volume` failed: cephadm exited with an error code: 1, stderr:Inferring config /var/lib/ceph/41555360-e96b-4b16-a37c-873e0c940091/mon.axdesec2ocs1n002/config
ERROR: [Errno 2] No such file or directory: '/var/lib/ceph/41555360-e96b-4b16-a37c-873e0c940091/mon.axdesec2ocs1n002/config'.
```

this is because of an unexpected behavior regarding 'config inferring' when a legacy directory is present in /var/lib/ceph.

Note: this doesn't fix the root cause, this is a workaround.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2075510

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6e2ebe857d)
2022-05-13 06:58:16 +02:00
Guillaume Abrioux c8df6e08eb contrib: add a playbook
this playbook can backup or restore some ceph files.
(/etc/ceph, /var/lib/ceph, ...)

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2051640

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ed0bba4d77)
2022-05-12 17:33:25 +02:00
Teoman ONAY 274a780237 Using another user than root for cephadm ssh connections fails
Fixes commit da42f3d139

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2048734

Signed-off-by: Teoman ONAY <tonay@redhat.com>
(cherry picked from commit f851d3232c)
2022-03-21 09:35:28 +01:00
Guillaume Abrioux f7b7ba30d9 upgrade: block upgrade when rgw multisite is active
With this commit, upgrading a cluster from Nautilus to Pacific with
active rgw multisite replication will be blocked.
This is because a lot of bugs are currently present in Pacific regarding
RGW multisite.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2063702

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 51bc8cb636)
2022-03-21 08:42:55 +01:00
Guillaume Abrioux c618712f14 purge: ceph-crash purge fixes
This fixes the service file removal and makes the playbook
call `systemctl reset-failed` on the service because in Ceph
Nautilus, ceph-crash doesn't handle `SIGTERM` signal.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2055992

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2f11982590)
2022-03-04 12:51:36 +01:00
Guillaume Abrioux bcab0d7a55 adopt: fix node labelling
When using groups of groups, the playbook will apply undesired
labels on nodes.
This commit fixes it by applying only the expected labels.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2057528

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 266b6e739c)
2022-03-03 17:01:58 +01:00
Teoman ONAY 839ad5927d Add cluster custom name support
When using cluster custom names, cephadm commands are executed using
the default admin keyring name which fails.

Signed-off-by: Teoman ONAY <tonay@redhat.com>
(cherry picked from commit f8c6bba657)
2022-03-03 17:01:58 +01:00
Teoman ONAY c3ce6fc41a Enable user to change the account used for ssh connection
By default cephadm uses the root account to connect remotely
to other nodes in the cluster. This change allows choosing
another account.
This commit also allows using a dedicated subnet for cephadm mgmt.

Signed-off-by: Teoman ONAY <tonay@redhat.com>
(cherry picked from commit da42f3d139)
2022-03-03 17:01:58 +01:00
Guillaume Abrioux 8096e4f4ce switch2containers: fail if less than 3 monitors
This playbook doesn't support fewer than 3 monitors present in the inventory.
Just like the rolling_update playbook, let's fail if fewer than
3 monitors are present.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2049132

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f08129edf2)
2022-02-22 09:23:40 +01:00
Guillaume Abrioux 314ba6e3e9 adopt: fix rbd-mirror adoption
We can't use `{{ cephadm_cmd }}` here because the monitors aren't yet adopted.
We must use `{{ ceph_cmd }}` instead.
This also fixes some filters `| default()` (they must be moved before `| from_json()`)

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967440

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 94e51d5c14)
2022-02-10 08:49:43 +01:00
Guillaume Abrioux 371c25f0ef adopt: fix bug in mon_ip_list set_fact
`default('{}')` must be before `| from_json`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f30767432b)
2022-02-09 12:40:09 +01:00
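The filter-ordering fix above (371c25f0ef, and the related note in 314ba6e3e9) can be illustrated outside Ansible. The sketch below is a plain-Python analogy, not Jinja2: an undefined value is modelled as `None`, and `default` and `from_json` are hypothetical stand-ins for the filters. It shows why the fallback has to be applied before parsing: the parser fails on the missing value before a trailing default ever runs.

```python
import json
from typing import Any, Optional


def default(value: Any, fallback: Any) -> Any:
    """Stand-in for a `default` filter; an undefined value is modelled as None."""
    return fallback if value is None else value


def from_json(text: Any) -> Any:
    """Stand-in for a `from_json` filter."""
    return json.loads(text)


def parse_with_default_first(raw: Optional[str]) -> Any:
    # Equivalent in spirit to: raw | default('{}') | from_json
    return from_json(default(raw, "{}"))


def parse_with_default_last(raw: Optional[str]) -> Any:
    # Equivalent in spirit to: raw | from_json | default({})
    # json.loads(None) raises TypeError before default() is reached.
    return default(from_json(raw), {})


if __name__ == "__main__":
    print(parse_with_default_first(None))
    try:
        parse_with_default_last(None)
    except TypeError as exc:
        print("default applied too late:", exc)
```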
Guillaume Abrioux cb197575dd adopt: check for POOL_APP_NOT_ENABLED warning
This commit makes the cephadm-adopt playbook fail if the cluster
has the `POOL_APP_NOT_ENABLED` warning raised.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2040243

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ddae06e1a2)
2022-02-09 12:40:09 +01:00
jowsiewski 3c38b1e410 Remove the remaining packages
Signed-off-by: jowsiewski <owsiewski@gmail.com>
(cherry picked from commit 1dfd195c7e)
2022-02-04 11:14:38 +01:00
Francesco Pantano 8f15179d57 Add with_pkg tag on package related tasks
In the OpenStack context we let the integration tool (TripleO)
deal with repositories and packages.
This change just adds the with_pkg tag to allow TripleO to skip
both the repository and package installation.

Signed-off-by: Francesco Pantano <fpantano@redhat.com>
(cherry picked from commit 12dd8b5df1)
2022-02-04 09:52:07 +01:00
Guillaume Abrioux fa281c7538 adopt: create nfs exports at the user level
The current implementation is wrong.
ceph-ansible lists all existing buckets and tries to create
an export for each of them.
Instead, it's easier to create the export at the user level.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2037691

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7f517cdd22)
2022-01-29 15:25:46 +01:00
Guillaume Abrioux 17d8351971 cephadm-adopt: use named args in rgw export creation
In order to avoid breaking changes, let's use named argument
instead of positional argument syntax in the command line
used to create rgw export.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2037691

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit aee1f06497)
2022-01-06 16:52:05 +01:00
Guillaume Abrioux e676502c8f purge: remove ceph directories on client nodes
Otherwise any ceph directories are left over on client nodes
after the purge.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2024815

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 20035852a4)
2022-01-06 10:33:31 +01:00
Guillaume Abrioux 7791fac222 update: speed up client play
wip

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 817c03bc0e)
2021-12-15 13:48:14 +01:00
Guillaume Abrioux 8a32576d20 cephadm-adopt: ensure /etc/ceph is present on monitoring node
When deploying the monitoring stack on a dedicated node, the directory
`/etc/ceph` has never been created. Therefore, the play for adopting the
monitoring stack fails because it can't write the minimal config file.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2029697

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7ece59b41d)
2021-12-07 23:09:42 +01:00
Guillaume Abrioux b16d9fc289 cephadm-adopt: bindmount /var/lib/ceph with 'ro'
When collocating osds with iscsigw daemons, cephadm bindmounts the
following:

```
-v /var/lib/ceph/6126c064-6a9e-4092-8a64-977930df0843/iscsi.rbd.ceph-ameenasuhani-4fs3bq-node5.vomtqb/configfs:/sys/kernel/config
```

this prevents cephadm-adopt playbook from running container and bindmounting `/var/lib/ceph:/var/lib/ceph:z`

since 'ro' is enough in this playbook, let's replace the ':z' option on
this bindmount with ':ro'

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2027411

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c4fdf956bd)
2021-11-30 21:04:31 +01:00
Guillaume Abrioux 1628347253 adopt: fix ceph_origin and ceph_repository defaults
This is overriding those variables because the precedence at the 'block
var' level is greater than the group_vars/host_vars.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2026861

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e5ea2ece99)
2021-11-30 13:02:24 +01:00
Guillaume Abrioux 6bdaa9e3d5 cephadm: support adding hosts with ipv6
The current implementation doesn't support adding hosts when using ipv6
addresses.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4f2c2af9b4)
2021-11-08 10:36:14 +01:00
Guillaume Abrioux 0097cb09f1 cephadm: use public_network when adding hosts
When adding a host, ansible_facts['default_ipv4']['address'] might
not be on the desired network; we shouldn't enforce the subnet with the
default route.
Let's use the public_network instead.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2006415

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2f34531304)
2021-11-08 10:36:14 +01:00
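The two cephadm host-addition commits above (0097cb09f1 and 6bdaa9e3d5) describe choosing a host address from the public network rather than from the default route, for both IPv4 and IPv6. A minimal Python sketch of that selection follows, assuming the host's addresses are available as a flat list of strings; the function name `address_in_network` and the sample addresses are hypothetical, and the playbook's actual logic is not shown in this log.

```python
import ipaddress
from typing import Iterable, Optional


def address_in_network(addresses: Iterable[str], network: str) -> Optional[str]:
    """Return the first address that belongs to `network`, or None."""
    net = ipaddress.ip_network(network, strict=False)
    for addr in addresses:
        ip = ipaddress.ip_address(addr)
        # Only compare addresses of the same IP version as the network.
        if ip.version == net.version and ip in net:
            return addr
    return None


if __name__ == "__main__":
    host_addresses = ["10.0.0.5", "192.168.1.20", "fd00::10"]
    print(address_in_network(host_addresses, "192.168.1.0/24"))
    print(address_in_network(host_addresses, "fd00::/64"))
```

With this approach the address on the default route (here `10.0.0.5`) is ignored unless it actually falls inside the configured public network.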
Dimitri Savineau 041e8b0eaa cephadm-adopt: remove logrotate configuration
cephadm uses its own logrotate configuration file so ceph-ansible needs
to remove that custom file during the cephadm-adopt playbook.

Closes: #6944

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c41241244e)
2021-11-03 11:51:03 +01:00
Guillaume Abrioux 19dadc98da update: move a set_fact
The ceph-facts role makes decisions based on the fact `rolling_update` so
it must be called before we run this role.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2014304

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e5edcc4214)
2021-11-03 11:50:27 +01:00
Guillaume Abrioux 8f648269ec update: support --limit on monitor nodes
Change needed in order to support --limit on mon nodes.
Otherwise, a call to `hostvars[groups[mon_group_name][0]]['_current_monitor_address']`
throws an error:

```
"The error was: 'ansible.vars.hostvars.HostVarsVars object' has no attribute '_current_monitor_address'"
```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2014304#c28

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 82eee4303b)
2021-11-03 08:48:38 +01:00
Guillaume Abrioux a752edbd29 Revert "update: block upgrade when nfs+rgw is deployed"
This reverts commit 93f1765259.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2017508

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-10-28 08:13:05 +02:00
Guillaume Abrioux f7d67f7669 rolling_update: modify default health_osd_check_*
let's do more retries with a shorter delay.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 50a21d695e)
2021-10-25 21:08:44 +02:00
Guillaume Abrioux e5ef104c57 adopt: fix rbd mirror adoption
The rbd mirroring is broken because cephadm doesn't bindmount /etc/ceph anymore.
It means the keyrings and ceph config file aren't available after the
migration.
The idea here is to remove the current rbd mirror peer and add it back
to the mon config store so we aren't bound to the /etc/ceph directory.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967440

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9c794aa9bc)
2021-10-25 20:14:07 +02:00
Guillaume Abrioux b1bdb708d0 adopt: use mgr/nfs volume
use the mgr 'nfs' module to recreate nfs exports.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1954971

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4257410dcd)
2021-10-25 17:16:15 +02:00
Guillaume Abrioux efc6979db5 rolling_update: fix pre and post osd upgrade play
When using --limit osds, the plays before and after the osd upgrade are
skipped because we use `hosts: "{{ mon_group_name | default('mons') }}[0]"`.
Using `hosts: "{{ osds_group_name | default('osds') }}"` with
`delegate_to` to the first monitor addresses this issue.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fc9f87c45f)
2021-10-25 15:33:18 +02:00
Guillaume Abrioux ca25ebb323 update: support upgrading a subset of nodes
It can be useful in a large cluster deployment to split the upgrade and
only upgrade a group of nodes at a time.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2014304

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e5cf9db2b0)
2021-10-25 15:33:18 +02:00
Per Abildgaard Toft 3edc6ac5f2 shrink-osd: fix regression because of a wrong regex
968891f449 introduced a regression.
The regex is wrong because it doesn't allow shrinking osds with an id
greater than 9.

Fixes: #6950

Signed-off-by: Per Abildgaard Toft <per@minfejl.dk>
(cherry picked from commit 84118a3063)
2021-10-21 12:38:25 +02:00
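The shrink-osd regression above (3edc6ac5f2) is a classic single-digit pattern bug. The regex that the commit actually changed is not shown in this log, so the two patterns below are hypothetical and only illustrate the class of error: a character class without a quantifier accepts ids 0 to 9 and rejects everything from 10 upwards.

```python
import re

# Hypothetical patterns for illustration; not the ones from the playbook.
SINGLE_DIGIT = re.compile(r"osd\.[0-9]")   # exactly one digit: ids 0-9 only
ANY_ID = re.compile(r"osd\.[0-9]+")        # one or more digits: any id


def matches(pattern: "re.Pattern[str]", name: str) -> bool:
    """True when the whole name matches the pattern."""
    return pattern.fullmatch(name) is not None


if __name__ == "__main__":
    for name in ("osd.3", "osd.10", "osd.125"):
        print(name, matches(SINGLE_DIGIT, name), matches(ANY_ID, name))
```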
Seena Fallah fde6354dcd cephadm: set ssh configs at bootstrap step
Add support for ssh_user and ssh_config to the cephadm bootstrap plugin.

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
(cherry picked from commit ae6be71b08)
2021-10-15 16:15:38 +02:00