The rbd mirroring is broken because cephadm doesn't bindmount /etc/ceph anymore.
It means the keyrings and ceph config file aren't available after the
migration.
The idea here is to remove the current rbd mirror peer and add it back
to the mon config store so we aren't bound to the /etc/ceph directory.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967440
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9c794aa9bc)
when using --limit osds, the play before and after osd upgrade are
skipped because we use `hosts: "{{ mon_group_name | default('mons') }}[0]"`
using `hosts: "{{ osds_group_name | default('osds') }}" with
`delegate_to` to the first monitor addresses this issue.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fc9f87c45f)
It can be useful in a large cluster deployment to split the upgrade and
only upgrade a group of nodes at a time.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2014304
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e5cf9db2b0)
968891f449 introduced a regression.
The regex is wrong because it doesn't allow to shrink osds with id
greater than 9
Fixes: #6950
Signed-off-by: Per Abildgaard Toft <per@minfejl.dk>
(cherry picked from commit 84118a3063)
Add support ssh_user and ssh_config to cephadm bootstrap plugin
Signed-off-by: Seena Fallah <seenafallah@gmail.com>
(cherry picked from commit ae6be71b08)
Configure repository for cephadm installation and use package install in both containerized and non containerized deployment
Signed-off-by: Seena Fallah <seenafallah@gmail.com>
(cherry picked from commit 339212a7c6)
There are existing OpenStack scenarios where nfs is still not managed
by cephadm. For this reason sometimes is useful skip the nfs part of
the adoption playbook and leave this daemon unmanaged.
The purpose of this patch is providing a tag to enable the OpenStack
operators to skip this playbook section.
Closes: https://bugzilla.redhat.com/2009212
Signed-off-by: Francesco Pantano <fpantano@redhat.com>
(cherry picked from commit b7299f258b)
Use cephadm_ssh_user to set custom user (not root) for cephadm to ssh to the hosts
Signed-off-by: Seena Fallah <seenafallah@gmail.com>
(cherry picked from commit 0b78faa723)
This is needed if you want a copy of the admin keyring on the admin
nodes.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b555f1d1cd)
Let's add a `no_log: true` on the `cephadm registry-login` task.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0a3b916ee7)
If old containers are still running, it can make tcmu-runner process
unable to open devices and there's nothing else to do than restarting
the container.
Also, as per discussion with iscsi experts, iscsi should be migrated before
OSDs. (the client should be closed before the server)
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2000412
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d12efa1ab4)
Use cephadm_ssh_user to set custom user (not root) for cephadm to ssh to the hosts
Signed-off-by: Seena Fallah <seenafallah@gmail.com>
(cherry picked from commit 67389d08d4)
`container_binary` isn't set anymore in the purge osd play because of a
regression introduced by 60aa70a.
The CI didn't catch it because the play purging node-exporter sets this
variable for all nodes before we run the purge osd play.
This commit fixes this regression.
Signed-off-by: Seena Fallah <seenafallah@gmail.com>
(cherry picked from commit a51ce767ca)
When a node is configured with FQDN as the hostname value then the
`ceph orch host add` command will fail because the `ansible_hostname` used
by that command contains the short hostname which won't match the current
hostname (FQDN)
Instead we can use the ansible_nodename fact.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1997083
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 2630f8d47a)
This adds ceph-*.target systemd unit files support for containerized
deployments.
This also fixes a regression introduced by PR #6719 (rgw and nfs systemd
units not getting purged)
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1962748
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 09ef465f62)
The balancer status is registered during the cephadm-adopt, rolling_update
and swith2container playbooks. But it is also used in the ceph-handler role
which is included in those playbooks too.
Even if the ceph-handler tasks are skipped for rolling_update and
switch2container, the balancer_status variable is erased with the skip task
result.
play1:
register: balancer_status
play2:
register: balancer_status <-- skipped
play3:
when: (balancer_status.stdout | from_json)['active'] | bool
This leads to issue like:
The conditional check '(balancer_status.stdout | from_json)['active'] | bool'
failed. The error was: Unexpected templating type error occurred on
({% if (balancer_status.stdout | from_json)['active'] | bool %} True
{% else %} False {% endif %}): expected string or buffer.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1982054
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 386661699b)
The ceph osd pool ls detail command is a subset of the ceph osd dump
command.
$ ceph osd dump --format json|wc -c
10117
$ ceph osd pool ls detail --format json|wc -c
4740
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 06471a4b82)
eec3878 introduced a regression for upgrade scenarios where there's no
monitor nodes at all (like ganesha standalone, external clients, etc..)
TASK [get the ceph release being deployed] ************************************
task path: infrastructure-playbooks/rolling_update.yml:121
Thursday 29 July 2021 15:55:29 +0000 (0:00:00.484) 0:00:15.802 *********
fatal: [client0]: FAILED! =>
msg: '''dict object'' has no attribute ''mons'''
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit e87a47cf0c)
In the `set osd flags` block, run the Ceph commands that gather information
from the cluster (and don't make any changes to it) even when running in check
mode.
This allows the tasks that depend on the variables set by those tasks to
succeed in check mode.
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit d7653dca95)
Check early which Ceph release is going to be deployed and fail if it
doesn't correspond to the ceph-ansible version being used.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1978643
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit eec38784ec)
This adds a task that zaps by osd id so we can support the scenario
where osds were deployed with `osd_auto_discovery` is true.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1876860
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4144074a50)
This refactor merges the two playbooks so we only have to maintain 1
playbook.
(Symlink the old purge-container-cluster.yml playbook for backward
compatibility).
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 17cd83bf3a)
Those variables are useless given this is not possible to override them.
Let's replace them with the hardcoded name instead.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6b50401d0c)
1303611 introduced tasks for disabling the pg_autoscaler on pools and
the balancer but thoses tasks are already executed on the first monitor
node so we don't need to add the run_once statement.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 738fa9428a)
When using python 2 and the task with a loop is skipped then it generates
an error.
Unexpected templating type error occurred on
({{ (pool_list.stdout | from_json)['pools'] }}): expected string or buffer
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit cf6e33346e)
The PG autoscaler can disrupt the PG checks so the idea here is to
disable it and re-enable it back after the restart is done.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 13036115e2)
This commit reindents the playbook.
Also improve readability by adding an extra line between plays.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 60aa70a128)
If one a the monitor is out of the quorum then nothing prevents the upgrade
playbook to run.
We only check if we have at least three monitor nodes but we should also
check if those monitor nodes are correctly present in the quorum.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1952571
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 97148dd58c)
It's better to fail the playbook so the user is aware the straw2
migration has failed.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c396122ad9)
After an upgrade, the presence of straw buckets will produce the
following warning (HEALTH_WARN):
```
crush map has legacy tunables (require firefly, min is hammer)
```
because straw bucket is a firefly feature it needs to be converted to
straw2.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967964
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit eee576477c)
The dashboard/monitoring stack can be deployed via the dashboard_enabled
variable. But there's nothing similar if we can to remove that part only
and keep the ceph cluster up and running.
The current purge playbooks remove everything.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786691
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 8e4ef7d6da)
Add any_errors_fatal: true in cephadm-adopt playbook.
We should stop the playbook execution when a task throws an error.
Otherwise it can lead to unexpected behavior.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1976179
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3b804a61dd)
This adds the monitoring group in the "final cleanup play" so any cid
files generated are well removed when purging the cluster.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1974536
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 037d8cd05e)
There's no benefit to gather facts again on each play in
rolling_update.yml
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2c77d0094c)
Starting RHCS 5, there's no ISO available anymore.
This removes all ISO variables and the ceph_repository_type variable.
Closes: #6626
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit a05730b38a)