Commit Graph

32 Commits (03ed9e111c43816954919ba08b6ec0b6c5ef3e9c)

Author SHA1 Message Date
Dimitri Savineau 1044940304 osds: use osd pool ls instead of osd dump command
The ceph osd pool ls detail command is a subset of the ceph osd dump
command.

$ ceph osd dump --format json|wc -c
10117
$ ceph osd pool ls detail --format json|wc -c
4740

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 06471a4b82)
2021-08-03 14:03:35 -04:00
Benoît Knecht 9668137daf ceph-handler: Fix osd handler in check mode
Run the Ceph commands that only gather information (without making any changes
to the cluster) when running Ansible in check mode.

This allows the tasks that depend on the variables set by those tasks to
succeed in check mode.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit 498acd7527)
2021-08-02 15:54:34 +02:00
Dimitri Savineau 17b9ff03d2 common: fix py2 pool_list from_json when skipped
When using python 2 and the task with a loop is skipped then it generates
an error.

Unexpected templating type error occurred on
({{ (pool_list.stdout | from_json)['pools'] }}): expected string or buffer

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit cf6e33346e)
2021-07-21 09:54:46 -04:00
Guillaume Abrioux f7882bbc02 common: disable/enable pg_autoscaler
The PG autoscaler can disrupt the PG checks so the idea here is to
disable it and re-enable it back after the restart is done.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 13036115e2)
2021-07-21 09:40:18 -04:00
Guillaume Abrioux 611494b88f rolling_update: fix mon+rgw/multisite collocation
When monitors and rgw are collocated with multisite enabled, the
rolling_update playbook fails because during the workflow, we run some
radosgw-admin commands very early on the first mon even though this is
the monitor being upgraded, it means the container doesn't exist since
it was stopped.

This block is relevant only for scaling out rgw daemons or initial
deployment. In rolling_update workflow, it is not needed so let's skip
it.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1970232

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f7166cccbf)
2021-06-14 13:59:16 +02:00
Guillaume Abrioux 3ef9690cd1 docker2podman: skip some role imports from handler
when running docker-to-podman playbook, there's no need to call
`ceph-config` and `ceph-rgw` from the role `ceph-handler`.
It can even have side effects when coming from a baremetal cluster that
was previously migrated using the switch-to-containers playbook. Indeed
it might complain about missing .target systemd unit since they are
removed during that migration.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1944999

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 70f19be367)
2021-04-12 13:30:31 +02:00
Alex Schultz 7ddbe74712 Use ansible_facts
It has come to our attention that using ansible_* vars that are
populated with INJECT_FACTS_AS_VARS=True is not very performant.  In
order to be able to support setting that to off, we need to update the
references to use ansible_facts[<thing>] instead of ansible_<thing>.

Related: ansible#73654
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1935406
Signed-off-by: Alex Schultz <aschultz@redhat.com>
(cherry picked from commit a7f2fa73e6)
2021-03-26 00:16:58 +01:00
Guillaume Abrioux aeee3471e3 rgw: avoid useless call to ceph-rgw
since `ceph-rgw` may be called from `ceph-handler` in some contexts we
should avoid rerunning it unnecessarily.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8617081664)
2021-01-28 16:37:50 -05:00
Guillaume Abrioux 290d3ef369 rgw: support switching from single-site to multisite
When collocating rgw with either a mon, mgr or osd, switching from
single site to a multisite rgw setup failed because of the handlers
triggered between the ansible play of the collocated daemon and the play
of the rgw. Since the multisite changes are not yet applied the handlers
fail.
The idea here is to ensure we run the multisite configuration from the
ceph-handler role before the restart happens, this way it won't complain
because of non existing multisite configuration.

(Note: this is also valid when simply changing a multisite configuration)

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1888630

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 513c8cfe55)
2021-01-06 10:38:50 -05:00
Dimitri Savineau 3f16132e44 library: add ceph_osd_flag module
This adds ceph_osd_flag ansible module for replacing the command module
usage with the ceph osd set/unset commands.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5da593604a)
2020-12-15 17:36:28 +01:00
Guillaume Abrioux 709deb90cc handler: refact check_socket_non_container
the `stat --printf=%n` returns something like following:

```
ok: [osd0] => changed=false
  cmd: |-
    stat --printf=%n /var/run/ceph/ceph-osd*.asok
  delta: '0:00:00.009388'
  end: '2020-10-06 06:18:28.109500'
  failed_when_result: false
  rc: 0
  start: '2020-10-06 06:18:28.100112'
  stderr: ''
  stderr_lines: <omitted>
  stdout: /var/run/ceph/ceph-osd.2.asok/var/run/ceph/ceph-osd.5.asok
  stdout_lines: <omitted>
```

it makes the next task "check if the ceph osd socket is in-use" grep
like this:

```
ok: [osd0] => changed=false
  cmd:
  - grep
  - -q
  - /var/run/ceph/ceph-osd.2.asok/var/run/ceph/ceph-osd.5.asok
  - /proc/net/unix
```

which will obviously fail because this path never exists. It makes the
OSD handler broken.

Let's use `find` module instead.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 46d4d97da9)
2020-10-14 10:31:05 +02:00
Guillaume Abrioux 52826caa51 rgw: fix multi instances scaleout in baremetal
When rgw and osd are collocated, the current workflow prevents from
scaling out the radosgw_num_instances parameter when rerunning the
playbook in baremetal deployments.

When ceph-osd notifies handlers, it means rgw handlers are triggered
too. The issue with this is that they are triggered before the role
ceph-rgw is run.
In the case a scaleout operation is expected on `radosgw_num_instances`
it causes an issue because keyrings haven't been created yet so the new
instances won't start.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1881313

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a802fa2810)
2020-10-06 09:21:58 -04:00
Dimitri Savineau fabaec6351 ceph-handler: set handler on xxx_stat result
In non containerized deployment we check if the service is running
via the socket file presence.
This is done via the xxx_socket_stat variable that check the file
socket in the /var/run/ceph/ directory.
In some scenarios, we could have the socket file still present in
that directory but not used by any process.
That's why we have the xxx_stat variable which clean those leftovers.

The problem here is that we're set the variable for the handlers status
(like handler_mon_status) based on xxx_socket_stat instead of xxx_stat.
That means we will trigger the handlers if there's an old socket file
present on the system without any process associated.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1866834

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 733596582d)
2020-09-29 16:33:08 +02:00
Dimitri Savineau 182319d58c ceph-handler: add missing condition on ceph-crash
The ceph-crash tasks present in the ceph-handler role don't need to be
executed on all nodes.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 18e3c7a0a2)
2020-09-10 20:35:04 -04:00
Guillaume Abrioux 66dde0034b ceph-crash: introduce new role ceph-crash
This commit introduces a new role `ceph-crash` in order to deploy
everything needed for the ceph-crash daemon.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9d2f2108e1)
2020-09-10 20:35:04 -04:00
Dimitri Savineau cce042c65b ceph-handler: remove iscsigws restart scripts
The iscsigws restart scripts for tcmu-runner and rbd-target-{api,gw}
services only call the systemctl restart command.
We don't really need to copy a shell script to do it when we can use
the ansible service module instead.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit cbe79428e6)
2020-07-27 09:33:00 -04:00
Guillaume Abrioux 9fb69e13ed handler: read container_exec_cmd value from first mon
Given that we delegate to the first monitor, we must read the value of
`container_exec_cmd` from this node.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792320

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit eb9112d8fb)
2020-01-23 18:34:14 +01:00
Guillaume Abrioux 1462423059 handler: fix call to container_exec_cmd in handler_osds
When unsetting the noup flag, we must call container_exec_cmd from the
delegated node (first mon member)
Also, adding a `run_once: true` because this task needs to be run only 1
time.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792320

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 22865cde9c)
2020-01-20 12:45:51 -05:00
Guillaume Abrioux 2d85fab02d osd: support scaling up using --limit
This commit lets add-osd.yml in place but mark the deprecation of the
playbook.
Scaling up OSDs is now possible using --limit

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3496a0efa2)
2020-01-14 09:12:34 -05:00
Guillaume Abrioux e001ded6f6 handler: fix bug
411bd07d54 introduced a bug in handlers

using `handler_*_status` instead of `hostvars[item]['handler_*_status']`
causes handlers to be triggered in anycase even though
`handler_*_status` was set to `False` on a specific node.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622688

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 30200802d9)
2020-01-08 19:46:11 -05:00
Dimitri Savineau 86b7137b27 ceph-iscsi: notify rbd target services
When the iscsi gateway or the ceph configuration file change then we
need to notify the rbd target api/gw services to be restarted.
This patch also merges the rbd-target-api and rbd-target-gw handler
into a single file and listen.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit bc701860d5)
2019-10-16 11:34:15 -04:00
Dimitri Savineau 3313bc5c1f ceph-handler: group listen topics and condition
We are using multiple listen topics with the handlers. That means that
we are notifying 4 tasks for each handler.
Instead we can group the listen on an include_tasks and based on the
group condition.

Before:

NOTIFIED HANDLER ceph-handler : set _mon_handler_called before restart for mon0
NOTIFIED HANDLER ceph-handler : copy mon restart script for mon0
NOTIFIED HANDLER ceph-handler : restart ceph mon daemon(s) for mon0
NOTIFIED HANDLER ceph-handler : set _mon_handler_called after restart for mon0
NOTIFIED HANDLER ceph-handler : set _osd_handler_called before restart for mon0
NOTIFIED HANDLER ceph-handler : copy osd restart script for mon0
NOTIFIED HANDLER ceph-handler : restart ceph osds daemon(s) for mon0
NOTIFIED HANDLER ceph-handler : set _osd_handler_called after restart for mon0
NOTIFIED HANDLER ceph-handler : set _mds_handler_called before restart for mon0
NOTIFIED HANDLER ceph-handler : copy mds restart script for mon0
NOTIFIED HANDLER ceph-handler : restart ceph mds daemon(s) for mon0
NOTIFIED HANDLER ceph-handler : set _mds_handler_called after restart for mon0
NOTIFIED HANDLER ceph-handler : set _rgw_handler_called before restart for mon0
NOTIFIED HANDLER ceph-handler : copy rgw restart script for mon0
NOTIFIED HANDLER ceph-handler : restart ceph rgw daemon(s) for mon0
NOTIFIED HANDLER ceph-handler : set _rgw_handler_called after restart for mon0
NOTIFIED HANDLER ceph-handler : set _mgr_handler_called before restart for mon0
NOTIFIED HANDLER ceph-handler : copy mgr restart script for mon0
NOTIFIED HANDLER ceph-handler : restart ceph mgr daemon(s) for mon0
NOTIFIED HANDLER ceph-handler : set _mgr_handler_called after restart for mon0
NOTIFIED HANDLER ceph-handler : set _rbdmirror_handler_called before restart for mon0
NOTIFIED HANDLER ceph-handler : copy rbd mirror restart script for mon0
NOTIFIED HANDLER ceph-handler : restart ceph rbd mirror daemon(s) for mon0
NOTIFIED HANDLER ceph-handler : set _rbdmirror_handler_called after restart for mon0

After:

NOTIFIED HANDLER ceph-handler : mons handler for mon0
NOTIFIED HANDLER ceph-handler : osds handler for mon0
NOTIFIED HANDLER ceph-handler : mdss handler for mon0
NOTIFIED HANDLER ceph-handler : rgws handler for mon0
NOTIFIED HANDLER ceph-handler : mgrs handler for mon0
NOTIFIED HANDLER ceph-handler : rbdmirrors handler for mon0

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit fe9c5b8c68)
2019-10-15 13:29:06 -04:00
Guillaume Abrioux 13f6a0a22a handler: followup on #4519
This commit adds some missing `| bool` filters.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ccc11cfc93)
2019-10-15 13:29:06 -04:00
Guillaume Abrioux fd10fbc047 handlers: refact osd handler
This commit merges the two restart tasks into a single one, this way
it's one task less to notify.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 411bd07d54)
2019-10-15 13:29:06 -04:00
Guillaume Abrioux 857c68087d handler: followup on #4519
This commit adds some missing `| bool` filters.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ccc11cfc93)
2019-10-07 09:09:36 +02:00
Giulio Fidente cb66a62ae2 Look for additional names when checking ceph-nfs container status
Ganesha cannot be operated active/active, in those deployments
where it is managed by pacemaker the container name can be
different than the default.

This change uses "ceph_nfs_service_suffix" where previously
missing to ensure tasks will work with customized names.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1750005
Signed-off-by: Giulio Fidente <gfidente@redhat.com>
(cherry picked from commit d2a2bd7c42)
2019-09-09 16:48:50 -04:00
Dimitri Savineau 3815add534 ceph-handler: replace fuser by /proc/net/unix
We're using fuser command to see if a process is using a ceph unix
socket file. But the fuser command runs through every PID present in
/proc/<PID> to see if one of them is using the file.
On a system running thousands processes, the fuser command can take
a long time to finish.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1717011

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit da9891da1e)
2019-06-12 23:00:36 +02:00
L3D 1daca1ba83 ansible: use 'bool' filter on boolean conditionals
By running ceph-ansible there are a lot ``[DEPRECATION WARNING]`` like these:
```
[DEPRECATION WARNING]: evaluating containerized_deployment as a bare variable,
this behaviour will go away and you might need to add |bool to the expression
in the future. Also see CONDITIONAL_BARE_VARS configuration toggle.. This
feature will be removed in version 2.12. Deprecation warnings can be disabled
by setting deprecation_warnings=False in ansible.cfg.
```

Now appended ``| bool`` on a lot of the affected variables.

Sometimes the coding style from ``variable|bool`` changed to ``variable | bool`` *(with spaces at the pipe)*.

Closes: #4022

Signed-off-by: L3D <l3d@c3woc.de>
(cherry picked from commit ab54fe20ec)
2019-06-07 16:05:51 +02:00
Rishabh Dave 06b3ab2a6b improve coding style
Keywords requiring only one item shouldn't express it by creating a
list with single item.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 739a662c80)

Conflicts:
	roles/ceph-mon/tasks/ceph_keys.yml
	roles/ceph-validate/tasks/check_devices.yml
2019-05-06 15:09:06 +00:00
Sébastien Han a96e910114 Add new container scenario
Test with podman instead of docker and also support for python 3 only.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-11-27 16:47:40 +00:00
Sébastien Han 790f52f934 ceph-handler: change osd container check
Now that the container is named ceph-osd@<id> looking for something that
contains a host is not necessary. This is also backward compatible as it
will continue to match container names with hostname in them.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-10 16:08:41 -04:00
Sébastien Han 4db6a213f7 add ceph-handler role
The role contains all the handlers for Ceph services. We decided to
leave ceph-defaults role with variables and a few facts only. This is
useful when organizing the site.yml files and also adding the known
variables to infrastructure-playbooks.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-09-28 15:15:49 +00:00