Commit Graph

69 Commits (9f04949ba0ed029fffe455283bd583e124e4d2fd)

Author SHA1 Message Date
Guillaume Abrioux 8a32576d20 cephadm-adopt: ensure /etc/ceph is present on monitoring node
When deploying the monitoring stack on a dedicated node, the directory
`/etc/ceph` has never been created. Therefore, the play for adopting the
monitoring stack fails because it can't write the minimal config file.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2029697

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7ece59b41d)
2021-12-07 23:09:42 +01:00
Guillaume Abrioux b16d9fc289 cephadm-adopt: bindmount /var/lib/ceph with 'ro'
When collocating osds with iscsigw daemons, cephadm bindmounts the
following:

```
-v /var/lib/ceph/6126c064-6a9e-4092-8a64-977930df0843/iscsi.rbd.ceph-ameenasuhani-4fs3bq-node5.vomtqb/configfs:/sys/kernel/config
```

this prevents cephadm-adopt playbook from running container and bindmounting `/var/lib/ceph:/var/lib/ceph:z`

since 'ro' is enough in this playbook, let's replace the ':z' option on
this bindmount with ':ro'

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2027411

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c4fdf956bd)
2021-11-30 21:04:31 +01:00
Guillaume Abrioux 1628347253 adopt: fix ceph_origin and ceph_repository defaults
This is overriding those variables because the precedence at the 'block
var' level is greater than the group_vars/host_vars.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2026861

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e5ea2ece99)
2021-11-30 13:02:24 +01:00
Guillaume Abrioux 6bdaa9e3d5 cephadm: support adding hosts with ipv6
The current implementation doesn't support adding hosts when using ipv6
addresses.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4f2c2af9b4)
2021-11-08 10:36:14 +01:00
Guillaume Abrioux 0097cb09f1 cephadm: use public_network when adding hosts
When adding host, using ansible_facts['default_ipv4']['address'] might
not be the desired network, we shouldn't enforce the subnet with the
default route.
Let's use the public_network instead.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2006415

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2f34531304)
2021-11-08 10:36:14 +01:00
Dimitri Savineau 041e8b0eaa cephadm-adopt: remove logrotate configuration
cephadm uses its own logrotate configuration file so ceph-ansible needs
to remove that custom file during the cephadm-adopt playbook.

Closes: #6944

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c41241244e)
2021-11-03 11:51:03 +01:00
Guillaume Abrioux e5ef104c57 adopt: fix rbd mirror adoption
The rbd mirroring is broken because cephadm doesn't bindmount /etc/ceph anymore.
It means the keyrings and ceph config file aren't available after the
migration.
The idea here is to remove the current rbd mirror peer and add it back
to the mon config store so we aren't bound to the /etc/ceph directory.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967440

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9c794aa9bc)
2021-10-25 20:14:07 +02:00
Guillaume Abrioux b1bdb708d0 adopt: use mgr/nfs volume
use the mgr 'nfs' module to recreate nfs exports.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1954971

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4257410dcd)
2021-10-25 17:16:15 +02:00
Seena Fallah 7b19748304 cephadm-adopt: configure repository for cephadm installation
Configure repository for cephadm installation and use package install in both containerized and non containerized deployment

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
(cherry picked from commit 339212a7c6)
2021-10-13 08:10:05 +02:00
Francesco Pantano 642a83dc6b Add ceph_nfs_adopt tag to the cephadm-adopt playbook
There are existing OpenStack scenarios where nfs is still not managed
by cephadm. For this reason sometimes is useful skip the nfs part of
the adoption playbook and leave this daemon unmanaged.
The purpose of this patch is providing a tag to enable the OpenStack
operators to skip this playbook section.

Closes: https://bugzilla.redhat.com/2009212
Signed-off-by: Francesco Pantano <fpantano@redhat.com>
(cherry picked from commit b7299f258b)
2021-10-01 23:32:33 +02:00
Guillaume Abrioux d196881ebb cephadm-adopt: add no_log: true
Let's add a `no_log: true` on the `cephadm registry-login` task.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0a3b916ee7)
2021-09-28 21:15:02 +02:00
Guillaume Abrioux a053adbe84 adopt: stop iscsi services in the first place
If old containers are still running, it can make tcmu-runner process
unable to open devices and there's nothing else to do than restarting
the container.

Also, as per discussion with iscsi experts, iscsi should be migrated before
OSDs. (the client should be closed before the server)

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2000412

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d12efa1ab4)
2021-09-28 18:46:49 +02:00
Seena Fallah cb5a675e49 cephadm-adopt: use cephadm_ssh_user for ssh user
Use cephadm_ssh_user to set custom user (not root) for cephadm to ssh to the hosts

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
(cherry picked from commit 67389d08d4)
2021-09-13 16:26:24 +02:00
Daniel Pivonka 969e41fa2e cephadm-adopt: set cephadm registry login info
registry login info needs to be stored in cluster for cephadm and future hosts

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2000103
Signed-off-by: Daniel Pivonka <dpivonka@redhat.com>
(cherry picked from commit 1c50dc29cf)
2021-09-13 16:18:40 +02:00
Dimitri Savineau ac5353a2d8 cephadm-adopt: fix orch host add with FQDN
When a node is configured with FQDN as the hostname value then the
`ceph orch host add` command will fail because the `ansible_hostname` used
by that command contains the short hostname which won't match the current
hostname (FQDN)
Instead we can use the ansible_nodename fact.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1997083

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 2630f8d47a)
2021-08-26 17:10:55 -04:00
Dimitri Savineau e3e849378e cephadm-adopt: remove ceph-nfs.target
This systemd target doesn't exist at all.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 8ba6101bbb)
2021-08-18 15:29:03 -04:00
Guillaume Abrioux d7311aeefc containers: introduce target systemd unit
This adds ceph-*.target systemd unit files support for containerized
deployments.
This also fixes a regression introduced by PR #6719 (rgw and nfs systemd
units not getting purged)

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1962748

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 09ef465f62)
2021-08-18 13:42:50 -04:00
Guillaume Abrioux 6e9cf80747 adopt: import rgw ssl certificate into kv store
Without this, when rgw is managed by cephadm, it fails to start because
the ssl certificate isn't present in the kv store.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1987010
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1988404

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 930fc4c850)
2021-08-05 14:47:47 -04:00
Dimitri Savineau 2377da8f9b infra: use dedicated variables for balancer status
The balancer status is registered during the cephadm-adopt, rolling_update
and swith2container playbooks. But it is also used in the ceph-handler role
which is included in those playbooks too.
Even if the ceph-handler tasks are skipped for rolling_update and
switch2container, the balancer_status variable is erased with the skip task
result.

play1:
  register: balancer_status
play2:
  register: balancer_status <-- skipped
play3:
  when: (balancer_status.stdout | from_json)['active'] | bool

This leads to issue like:

The conditional check '(balancer_status.stdout | from_json)['active'] | bool'
failed. The error was: Unexpected templating type error occurred on
({% if (balancer_status.stdout | from_json)['active'] | bool %} True
{% else %} False {% endif %}): expected string or buffer.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1982054

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 386661699b)
2021-08-04 11:43:47 -04:00
Dimitri Savineau 31cc8bd2aa osds: use osd pool ls instead of osd dump command
The ceph osd pool ls detail command is a subset of the ceph osd dump
command.

$ ceph osd dump --format json|wc -c
10117
$ ceph osd pool ls detail --format json|wc -c
4740

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 06471a4b82)
2021-08-03 13:57:14 -04:00
Benoît Knecht c8348ab0d9 infrastructure-playbooks: Get Ceph info in check mode
In the `set osd flags` block, run the Ceph commands that gather information
from the cluster (and don't make any changes to it) even when running in check
mode.

This allows the tasks that depend on the variables set by those tasks to
succeed in check mode.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit d7653dca95)
2021-08-02 15:53:49 +02:00
Dimitri Savineau e54c8e93ee cephadm-adopt: set application on ganesha pool
Set the nfs application to the ganesha pool.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1956840

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit aeb9f562e5)
2021-07-21 16:22:25 +02:00
Dimitri Savineau 3ec8e90b34 cephadm-adopt: enable osd memory autotune for HCI
This enables the osd_memory_target_autotune option on HCI environment.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1973149

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit a305296384)
2021-07-21 16:22:10 +02:00
Dimitri Savineau cf734e19b7 common: fix py2 pool_list from_json when skipped
When using python 2 and the task with a loop is skipped then it generates
an error.

Unexpected templating type error occurred on
({{ (pool_list.stdout | from_json)['pools'] }}): expected string or buffer

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit cf6e33346e)
2021-07-21 14:00:30 +02:00
Guillaume Abrioux 3cc8c667d0 common: disable/enable pg_autoscaler
The PG autoscaler can disrupt the PG checks so the idea here is to
disable it and re-enable it back after the restart is done.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 13036115e2)
2021-07-20 11:04:25 -04:00
Guillaume Abrioux f80837c23e cephadm_adopt: add any_errors_fatal on play
Add any_errors_fatal: true in cephadm-adopt playbook.
We should stop the playbook execution when a task throws an error.
Otherwise it can lead to unexpected behavior.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1976179

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3b804a61dd)
2021-07-03 11:58:57 +02:00
Guillaume Abrioux 0856d3e47f cephadm-adopt/rgw: add host target in svc_id
If multi-realms were deployed with several instances belonging to the same
realm and zone using the same port on different nodes, the service id
expected by cephadm will be the same and therefore only one service will
be deployed. We need to create a service called
`<node>.<realm>.<zone>.<port>` to be sure the service name will be unique
and well deployed on the expected node in order to preserve backward
compatibility with the rgws instances that were deployed with
ceph-ansible.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967455

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 31311b03ed)
2021-06-29 15:18:49 +02:00
Guillaume Abrioux aa332ac64d cephadm-adopt: support rgw multisite adoption
We need to support rgw multisite deployments.
This commit makes the adoption playbook support this kind of deployment.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967455

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fc784fc44c)
2021-06-24 09:48:27 +02:00
Guillaume Abrioux 17f9780274 cephadm-adopt: fix mgr placement hosts task
When no `[mgrs]` group is defined in the inventory, mgr daemon are
implicitly collocated with monitors.
This task currently relies on the length of the mgr group in order to
tell cephadm to deploy mgr daemons.
If there's no `[mgrs]` group defined in the inventory, it will ask
cephadm to deploy 0 mgr daemon which doesn't make sense and will throw
an error.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1970313

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f9a73149a4)
2021-06-14 13:55:45 +02:00
Guillaume Abrioux 747d259511 cephadm_adopt: fix ceph-crash migration
ceph-ansible leaves a ceph-crash container in containerized deployment.
It means we end up with 2 ceph-crash containers running after the
migration playbook is complete.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1954614

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 22c18e82f0)
2021-04-29 07:14:17 +02:00
Guillaume Abrioux 60c0fb8a7a cephadm_adopt: fix rgw placement task
Due to a recent breaking change in ceph, this command must be modified
to add the <svc_id> parameter.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1f40c12502)
2021-04-27 15:17:28 +02:00
Guillaume Abrioux a1f445cc73 cephadm_adopt: create a 'nfs-ganesha' pool
When migrating from a cluster with no MDS nodes deployed,
`{{ cephfs_data_pool.name }}` doesn't exist so we need to create a pool
for storing nfs export objects.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1950403

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit bb7d37fb6a)
2021-04-27 15:17:28 +02:00
Guillaume Abrioux 6b87d8c95c cephadm_adopt: support nfs-ganesha adoption
This commit adds the nfs-ganesha adoption support in the
`cephadm-adopt.yml` playbook.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1944504

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a9220654f5)
2021-04-12 15:32:22 +02:00
Guillaume Abrioux 5aa9d0dfb4 cephadm_adopt: modify placement policy for rgw
the adoption playbook should use `radosgw_num_instances` in order to
determine how much rgw instance it should set recreate.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1943170

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1ffc4df6b6)
2021-04-12 15:32:22 +02:00
Guillaume Abrioux c2d40d4383 cephadm_adopt: fix a typo
This play doesn't nothing else than stopping/removing rgw daemons.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ee44d86072)
2021-04-12 15:32:22 +02:00
Alex Schultz 56aac327dd Use ansible_facts
It has come to our attention that using ansible_* vars that are
populated with INJECT_FACTS_AS_VARS=True is not very performant.  In
order to be able to support setting that to off, we need to update the
references to use ansible_facts[<thing>] instead of ansible_<thing>.

Related: ansible#73654
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1935406
Signed-off-by: Alex Schultz <aschultz@redhat.com>
(cherry picked from commit a7f2fa73e6)
2021-03-26 00:04:49 +01:00
Guillaume Abrioux 8d25b4305e adopt: convert legacy grafana-server groupname early
This is a follow up on PR #6332

cephadm-adopt.yml playbook is affected by the same bug

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1938658

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit af95595c82)
2021-03-18 08:56:44 +01:00
Guillaume Abrioux c296824ae0 cephadm_adopt: fetch and write ceph minimal config
This commit makes the playbook fetch the minimal current ceph
configuration and write it later on monitoring nodes so `cephadm` can
proceed with the adoption.
When a monitoring stack was deployed on a dedicated node, it means no
`ceph.conf` file was written, `cephadm` requires a `ceph.conf` in order
to adopt the daemon present on the node.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1939887

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b445df0479)
2021-03-18 08:51:59 +01:00
Dimitri Savineau 950a6ae406 cephadm-adopt: remove prometheus workaround
This was fixed by [1][2]

[1] https://tracker.ceph.com/issues/45120
[2] https://github.com/ceph/ceph/commit/252d4b30

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2021-02-10 13:51:41 +01:00
Dimitri Savineau 76a663245d cephadm-adopt: use ceph_osd_flag module
There's no reason to not use the ceph_osd_flag module to set/unset osd
flags.
Also if there's no OSD nodes in the inventory then we don't need to
execute the set/unset play.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2021-02-03 08:29:31 +01:00
Dimitri Savineau 2734a12d44 cephadm-adopt: use radosgw modules for idempotency
When rerunning the cephadm-adopt.yml playbook the radosgw realm,
zonegroup and zone tasks will fail because the task isn't
idempotent.
Using the radosgw ansible modules solves that problem.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2021-01-29 21:07:39 +01:00
Dimitri Savineau 6886700a00 cephadm-adopt: make the playbook idempotent
If the cephadm-adopt.yml fails during the first execution and some
daemons have already been adopted by cephadm then we can't rerun
the playbook because the old container won't exist anymore.

Error: no container with name or ID ceph-mon-xxx found: no such container

If the daemons are adopted then the old systemd unit doesn't exist anymore
so any call to that unit with systemd will fail.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1918424

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2021-01-29 21:07:39 +01:00
Dimitri Savineau 13427eddac cephadm-adopt: add grafana group conversion
The grafana group conversion task wasn't present in the cephadm-adopt.yml
playbook.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1917530

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2021-01-18 20:52:58 +01:00
Dimitri Savineau 5b6f907a72 cephadm: remove loop on host add tasks
Instead of iterate over the host list for adding the node/label to the
host orchestrator configuration then we can do it parallelly.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-12-16 15:14:28 +01:00
Dimitri Savineau 08f118077f library: add cephadm_adopt module
This adds cephadm_adopt ansible module for replacing the command module
usage with the cephadm adopt command.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-12-02 09:15:44 +01:00
Dimitri Savineau cf7345f143 consume ceph_volume module when possible
We should always use the ceph_volume ansible module when possible.
This patch replace the ceph-volume inventory and lvm {list,zap} commands
called via the command/shell modules by the corresponding call with the
ceph_volume module.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-12-01 17:54:10 +01:00
Dimitri Savineau eaf0ebfc85 library: add ceph_mgr_module module
This adds ceph_mgr_module ansible module for replacing the command module
usage with the ceph mgr module enable/disable commands.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-11-30 16:52:02 +01:00
Guillaume Abrioux 195d88fcda lint: ignore 302,303,505 errors
ignore 302,303 and 505 errors

[302] Using command rather than an argument to e.g. file
[303] Using command rather than module
[505] referenced files must exist

they aren't relevant on these tasks.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-11-23 08:33:47 +01:00
Dimitri Savineau 88f91d8c12 monitor: use quorum_status instead of ceph status
The ceph status command returns a lot of information stored in variables
and/or facts which could consume resources for nothing.
When checking the quorum status, we're only using the quorum_names
structure in the ceph status output.
To optimize this, we could use the ceph quorum_status command which contains
the same needed information.
This command returns less information.

$ ceph status -f json  | wc -c
2001
$ ceph quorum_status -f json  | wc -c
957
$ time ceph status -f json > /dev/null

real	0m0.577s
user	0m0.538s
sys	0m0.029s
$ time ceph quorum_status -f json > /dev/null

real	0m0.544s
user	0m0.527s
sys	0m0.016s

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-11-03 09:05:33 +01:00
Dimitri Savineau ee50588590 osds: use pg stat command instead of ceph status
The ceph status command returns a lot of information stored in variables
and/or facts which could consume resources for nothing.
When checking the pgs state, we're using the pgmap structure in the ceph
status output.
To optimize this, we could use the ceph pg stat command which contains
the same needed information.
This command returns less information (only about pgs) and is slightly
faster than the ceph status command.

$ ceph status -f json | wc -c
2000
$ ceph pg stat -f json | wc -c
240
$ time ceph status -f json > /dev/null

real	0m0.529s
user	0m0.503s
sys	0m0.024s
$ time ceph pg stat -f json > /dev/null

real	0m0.426s
user	0m0.409s
sys	0m0.016s

The data returned by the ceph status is even bigger when using the
nautilus release.

$ ceph status -f json | wc -c
35005
$ ceph pg stat -f json | wc -c
240

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-11-03 09:05:33 +01:00