ceph-ansible/roles/ceph-common/templates/restart_osd_daemon.sh.j2

#!/bin/bash

RETRIES="{{ handler_health_osd_check_retries }}"
DELAY="{{ handler_health_osd_check_delay }}"
CEPH_CLI="--name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/{{ cluster }}.keyring --cluster {{ cluster }}"

check_pgs() {
  # Keep the retry budget local so every OSD gets the full number of attempts
  local retries="$RETRIES"
  while [ "$retries" -ne 0 ]; do
    # Take a single `ceph -s` snapshot and succeed only when the active+clean
    # count equals the total number of PGs (valid in Python 2 and Python 3)
    ceph $CEPH_CLI -s -f json | python -c 'import sys, json; pgmap = json.load(sys.stdin)["pgmap"]; sys.exit(0 if [i["count"] for i in pgmap["pgs_by_state"] if i["state_name"] == "active+clean"] == [pgmap["num_pgs"]] else 1)' && return 0
    sleep "$DELAY"
    retries=$((retries - 1))
  done
  # PGs not clean, exiting with return code 1
  echo "Error while running 'ceph $CEPH_CLI -s', PGs were not reported as active+clean"
  echo "It is possible that the cluster has fewer OSDs than the replica configuration requires"
  echo "Refusing to continue"
  ceph $CEPH_CLI -s
  exit 1
}

for osd_dir in /var/lib/ceph/osd/*; do
  # Skip the unexpanded pattern when no OSD directories exist
  [ -d "$osd_dir" ] || continue
  # OSD data directories are named <cluster>-<id>; keep only the id
  id=${osd_dir##*-}
  # First, restart the daemon
  systemctl restart "ceph-osd@${id}"
  # The admin socket may take some time to appear after the restart,
  # so wait up to COUNT seconds for it before checking PG health
  COUNT=10
  SOCKET=/var/run/ceph/{{ cluster }}-osd.${id}.asok
  while [ "$COUNT" -ne 0 ]; do
    test -S "$SOCKET" && check_pgs && continue 2
    sleep 1
    COUNT=$((COUNT - 1))
  done
  # If we reach this point, it means the socket is not present.
  echo "Socket file ${SOCKET} could not be found, which means the OSD daemon is not running."
  exit 1
done
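The health check in `check_pgs` reduces to one predicate: the `active+clean` PG count must equal `num_pgs`. Below is a minimal standalone sketch of that predicate, assuming `python3` is available. `all_pgs_active_clean` is a hypothetical name, and the JSON field names are the ones the template already reads (other Ceph releases may lay out `ceph -s -f json` differently). It reads one snapshot on stdin, so it can be exercised with canned JSON instead of a live cluster.

```shell
#!/bin/bash
# Hypothetical helper: exit 0 when every PG in a `ceph -s -f json` snapshot
# (read from stdin) is active+clean, 1 otherwise. Field names
# (pgmap.num_pgs, pgmap.pgs_by_state, state_name, count) are taken from the
# template; other Ceph releases may differ.
all_pgs_active_clean() {
  python3 -c '
import json, sys
pgmap = json.load(sys.stdin)["pgmap"]
# Sum in case several entries ever report the same state name
clean = sum(s["count"] for s in pgmap.get("pgs_by_state", [])
            if s["state_name"] == "active+clean")
sys.exit(0 if clean == pgmap["num_pgs"] else 1)
'
}
```

On a node it would be fed with `ceph $CEPH_CLI -s -f json | all_pgs_active_clean`. Because it sums counts, a cluster with zero PGs is treated as clean here, unlike the template's list comparison.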
Change history:

2017-04-04 01:55:11 +08:00: Common: Fix handlers that are not properly triggered. Until now, only the first task were executed. The idea here is to use `listen` statement to be able to notify multiple handler and regroup all of them in `./handlers/main.yml` as notifying an included handler task is not possible. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

2017-04-18 18:40:43 +08:00: Common: Restore check_socket. Restore the check_socket that was removed by `5bec62b`. This commit also improves the logging in `restart_*_daemon.sh` scripts. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

2017-05-12 23:01:03 +08:00: ceph-common: improve error message on restart osd daemon script. Signed-off-by: Alfredo Deza <adeza@redhat.com>

2017-06-12 16:30:22 +08:00: Restart all OSDs and do not stop after the first one. The current handler only restarts one OSD on each OSD server. After the first one the handler stops, not matter what results the checks had. Co-Authored-By: Gaudenz Steinlin (@gaudenz)

2017-06-22 23:42:03 +08:00: Make the new check PGs working with /bin/sh. The new test in the checks PGs are no longer working on distributions where /bin/sh isn't linked to /bin/bash. Fix: #1619. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
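Both loops in the restart script, the PG retry loop in `check_pgs` and the socket wait loop, follow the same poll-until-success pattern. A minimal sketch of factoring that pattern into one function follows; `retry` is a hypothetical helper, not something ceph-ansible ships.

```shell
#!/bin/bash
# Hypothetical helper (not part of ceph-ansible): run "$@" until it succeeds.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
# Returns 0 on the first success, 1 once every attempt has failed.
retry() {
  local attempts=$1 delay=$2
  shift 2
  while [ "$attempts" -gt 0 ]; do
    "$@" && return 0
    attempts=$((attempts - 1))
    # Sleep only when another attempt will follow
    if [ "$attempts" -gt 0 ]; then
      sleep "$delay"
    fi
  done
  return 1
}
```

With such a helper, the socket wait could read `retry 10 1 test -S "$SOCKET"`, and the PG wait could pass `"$RETRIES" "$DELAY"` plus a one-shot PG check. Keeping the counter local to the function also avoids sharing a single retry budget across all OSDs.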