`stat --printf=%n` returns something like the following:
```
ok: [osd0] => changed=false
  cmd: |-
    stat --printf=%n /var/run/ceph/ceph-osd*.asok
  delta: '0:00:00.009388'
  end: '2020-10-06 06:18:28.109500'
  failed_when_result: false
  rc: 0
  start: '2020-10-06 06:18:28.100112'
  stderr: ''
  stderr_lines: <omitted>
  stdout: /var/run/ceph/ceph-osd.2.asok/var/run/ceph/ceph-osd.5.asok
  stdout_lines: <omitted>
```
This makes the next task, "check if the ceph osd socket is in-use", run
grep like this:
```
ok: [osd0] => changed=false
  cmd:
  - grep
  - -q
  - /var/run/ceph/ceph-osd.2.asok/var/run/ceph/ceph-osd.5.asok
  - /proc/net/unix
```
which obviously fails because this concatenated path never exists. This
breaks the OSD handler.
Let's use the `find` module instead, which reports each matching file
separately.
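To illustrate the failure mode, here is a minimal Python sketch rather than the actual Ansible tasks. The helper `grep_q`, the function `demo`, the temporary directory, and the fake `/proc/net/unix` lines are all assumptions made for the example. It only shows that a lookup on the concatenated names cannot match, while a lookup per file can.

```python
import glob
import os
import tempfile


def grep_q(needle, text):
    """Rough stand-in for `grep -q needle file` (fixed-string match only)."""
    return any(needle in line for line in text.splitlines())


def demo():
    with tempfile.TemporaryDirectory() as run_dir:
        # Hypothetical stand-ins for /var/run/ceph/ceph-osd.{2,5}.asok
        for osd_id in (2, 5):
            open(os.path.join(run_dir, f"ceph-osd.{osd_id}.asok"), "w").close()
        sockets = sorted(glob.glob(os.path.join(run_dir, "ceph-osd*.asok")))

        # Fake /proc/net/unix content: one line per bound socket path.
        proc_net_unix = "\n".join(
            f"0: 00000002 0 00010000 0001 01 12345 {path}" for path in sockets
        )

        # `stat --printf=%n` prints the names with no separator between them.
        concatenated = "".join(sockets)
        broken = grep_q(concatenated, proc_net_unix)

        # Listing the files individually allows one lookup per socket.
        per_file = [grep_q(path, proc_net_unix) for path in sockets]
        return broken, per_file


broken, per_file = demo()
print(broken, per_file)
```

In this sketch the concatenated lookup finds nothing, while each per-file lookup succeeds.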
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When rgw and osd are collocated, the current workflow prevents scaling
out the radosgw_num_instances parameter when rerunning the playbook in
baremetal deployments.
When ceph-osd notifies handlers, the rgw handlers are triggered too.
The issue is that they are triggered before the ceph-rgw role has run.
When a scale-out of `radosgw_num_instances` is expected, this fails
because the keyrings haven't been created yet, so the new instances
won't start.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1881313
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
In non-containerized deployments we check whether the service is running
via the presence of its socket file.
This is done via the xxx_socket_stat variable, which checks for the
socket file in the /var/run/ceph/ directory.
In some scenarios, the socket file could still be present in
that directory but not used by any process.
That's why we have the xxx_stat variable, which cleans up those leftovers.
The problem here is that we're setting the variable for the handlers status
(like handler_mon_status) based on xxx_socket_stat instead of xxx_stat.
That means we trigger the handlers if there's an old socket file
present on the system without any process associated.
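The difference between the two checks can be sketched in Python as follows. This is an illustration only, not the role's actual logic. The function names and the sample data are hypothetical: the presence check loosely mirrors xxx_socket_stat and the usage check loosely mirrors xxx_stat.

```python
def socket_in_use(path, proc_net_unix_text):
    """True if the path shows up in the given /proc/net/unix-style text."""
    return any(path in line for line in proc_net_unix_text.splitlines())


def handler_status_from_presence(socket_paths):
    """Problematic variant: any socket file on disk counts as 'running'."""
    return len(socket_paths) > 0


def handler_status_from_usage(socket_paths, proc_net_unix_text):
    """Safer variant: a socket counts only if some process has it bound."""
    return any(socket_in_use(path, proc_net_unix_text) for path in socket_paths)


# Hypothetical leftover socket file with no process behind it.
stale_sockets = ["/var/run/ceph/ceph-mon.a.asok"]
proc_net_unix = "0: 00000002 0 00010000 0001 01 12345 /run/systemd/notify"

print(handler_status_from_presence(stale_sockets))
print(handler_status_from_usage(stale_sockets, proc_net_unix))
```

With a stale socket file, the presence-based status would still trigger the handler, while the usage-based status would not.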
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1866834
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This commit introduces a new role `ceph-crash` in order to deploy
everything needed for the ceph-crash daemon.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit merges the two restart tasks into a single one, so there is
one less task to notify.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The role contains all the handlers for Ceph services. We decided to
leave the ceph-defaults role with variables and a few facts only. This is
useful when organizing the site.yml files and when adding the known
variables to infrastructure-playbooks.
Signed-off-by: Sébastien Han <seb@redhat.com>