ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Dimitri Savineau	070db68ffd	ceph-handler: don't restart all OSDs with limit When using the ansible --limit option on one or a few OSD nodes, if the handler is triggered then we restart the OSD service on all OSD nodes instead of only the hosts selected by the limit value. Even if the play is limited by the --limit value we are using all OSD nodes from the OSD group. with_items: '{{ groups[osd_group_name] }}' Instead we should iterate only on the nodes present in both the OSD group and the limit list. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `0346871fb5`)	2019-10-04 07:43:17 +02:00
Dimitri Savineau	28009496f6	ceph-handler: Fix osd restart condition In containerized deployment, the restart OSD handler couldn't be triggered in most ansible executions. This is due to the usage of run_once + a condition on the inventory hostname and the last filter. The run_once is triggered first so ansible will pick a node in the osd group to execute the restart task. But if this node isn't the last one in the osd group then the task is ignored. The task is therefore more likely to be ignored than executed. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `5b1c15653f`)	2019-09-10 16:53:38 -04:00
Giulio Fidente	e0e9fa47df	Look for additional names when checking ceph-nfs container status Ganesha cannot be operated active/active, in those deployments where it is managed by pacemaker the container name can be different than the default. This change uses "ceph_nfs_service_suffix" where previously missing to ensure tasks will work with customized names. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1750005 Signed-off-by: Giulio Fidente <gfidente@redhat.com> (cherry picked from commit `d2a2bd7c42`)	2019-09-09 16:48:59 -04:00
Dimitri Savineau	bedc0ab69d	ceph-osd: use OSD id with systemd ceph-disk When using containerized deployment we have to create the systemd service unit based on a template. The current implementation with ceph-disk is using the device name as parameter to the systemd service and for the container name too. $ systemctl start ceph-osd@sdb $ docker ps --filter 'name=ceph-osd-' CONTAINER ID IMAGE NAMES 065530d0a27f ceph/daemon:latest-luminous ceph-osd-strg0-sdb This is the only scenario (compared to non containerized or ceph-volume based deployment) that isn't using the OSD id. $ systemctl start ceph-osd@0 $ docker ps --filter 'name=ceph-osd-' CONTAINER ID IMAGE NAMES d34552ec157e ceph/daemon:latest-luminous ceph-osd-0 Also if the device mapping doesn't persist across system reboots (i.e. sdb might be remapped to sde) then the OSD service won't come back after the reboot. This patch allows using the OSD id with the ceph-osd systemd service but requires activating the OSD manually with ceph-disk first in order to assign the ID to that OSD. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1670734 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-07-26 16:07:22 -04:00
Dimitri Savineau	94cdef2757	ceph-handler: Fix rgw socket in restart script If the SOCKET variable isn't defined in the script then the test command won't fail because the return code is 0 $ test -S $ echo $? 0 There are multiple issues in that script: - The default SOCKET value isn't defined. - Update the wget parameters because the command is doing a loop. We now use the same options as curl. - The check_rest function doesn't test the radosgw at all due to a wrong test command (test against a string) and always returns 0. This needs to use the DOCKER_EXEC variable in order to execute the command. $ test 'wget http://192.168.100.11:8080' $ echo $? 0 Resolves: #3926 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `c90f605b51`)	2019-07-08 10:38:35 -04:00
Dimitri Savineau	9cc5d1e903	ceph-handler: Fix radosgw_address default value The rgw restart script set the RGW_IP variable depending on ansible variables: - radosgw_address - radosgw_address_block - radosgw_interface Those variables have default values defined in ceph-defaults role: radosgw_interface: interface radosgw_address: 0.0.0.0 radosgw_address_block: subnet But in the rgw restart script we always use the radosgw_address value instead of the radosgw_interface when defined because we aren't testing the right default value. As a consequence, the RGW_IP variable will be set to 0.0.0.0 even if the ip address associated to the radosgw_interface variable is set correctly. This causes the check_rest function to fail. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-07-07 07:24:38 +02:00
Dimitri Savineau	2b492e3de1	ceph-handler: Fix OSD restart script There are two big issues with the current OSD restart script. 1/ We try to test if the ceph osd daemon socket exists but we use a wildcard for the socket name : /var/run/ceph/*.asok. This fails because we usually have multiple ceph osd sockets (or other collocated ceph daemons) present in the /var/run/ceph directory. Currently the test fails with: bash: line xxx: [: too many arguments But it doesn't stop the script execution. Instead we can specify the full ceph osd socket name because we already know the OSD id. 2/ The container filter pattern is wrong and could match multiple containers, causing the script to fail. We use the filter with two different patterns. One is with the device name (sda, sdb, ..) and the other one is with the OSD id (ceph-osd-0, ceph-osd-15, ..). In both cases we could match more than needed. $ docker container ls CONTAINER ID IMAGE NAMES 958121a7cc7d ceph-daemon:latest ceph-osd-strg0-sda 589a982d43b5 ceph-daemon:latest ceph-osd-strg0-sdb 46c7240d71f3 ceph-daemon:latest ceph-osd-strg0-sdaa 877985ec3aca ceph-daemon:latest ceph-osd-strg0-sdab $ docker container ls -q -f "name=sda" 958121a7cc7d 46c7240d71f3 877985ec3aca $ docker container ls CONTAINER ID IMAGE NAMES 2db399b3ee85 ceph-daemon:latest ceph-osd-5 099dc13f08f1 ceph-daemon:latest ceph-osd-13 5d0c2fe8f121 ceph-daemon:latest ceph-osd-17 d6c7b89db1d1 ceph-daemon:latest ceph-osd-1 $ docker container ls -q -f "name=ceph-osd-1" 099dc13f08f1 5d0c2fe8f121 d6c7b89db1d1 Adding an extra '$' character at the end of the pattern solves the problem. Finally removing the get_container_osd_id function because it's not used in the script at all. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `45d46541cb`)	2019-06-21 14:49:55 -04:00
Dimitri Savineau	95f3908e44	ceph-handler: replace fuser by /proc/net/unix We're using the fuser command to see if a process is using a ceph unix socket file. But the fuser command runs through every PID present in /proc/<PID> to see if one of them is using the file. On a system running thousands of processes, the fuser command can take a long time to finish. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1717011 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `da9891da1e`)	2019-06-12 23:00:21 +02:00
Dimitri Savineau	bbb8ca6643	mon/rgw: use last ipv6 address When using monitor_address_block or radosgw_address_block variables to configure the mon/rgw address we're getting the first ip address from the ansible facts present in that cidr. When there's a VIP on that network the first filter could return the wrong value. This seems to affect only IPv6 setups because the VIP addresses are added to the ansible facts at the beginning of the list. This is the opposite (at the end) when using IPv4. This causes the mon/rgw processes to bind on the VIP address. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1680155 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-04-09 06:17:27 +02:00
Sébastien Han	2fca8555cc	handler: show unit logs on error This will tremendously help debugging daemons that fail on restart by showing the systemd unit logs. Signed-off-by: Sébastien Han <seb@redhat.com> (cherry picked from commit `a9b337ba66`)	2018-11-27 12:44:15 +00:00
Guillaume Abrioux	b953965399	handler: remove some leftover in restart_*_daemon.sh.j2 Remove some legacy leftovers in those restart scripts. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-16 11:53:55 +00:00
Guillaume Abrioux	60bc1e38db	handler: fix osd containers handler `ceph_osd_container_stat` might not be set on other OSD nodes. We must ensure we are on the last node before trying to evaluate `ceph_osd_container_stat`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-15 10:30:40 +02:00
Guillaume Abrioux	40b7747af7	remove jewel support As of now, we should no longer support Jewel in ceph-ansible. The latest ceph-ansible release supporting Jewel is `stable-3.1`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-12 23:38:17 +00:00
Sébastien Han	2bea8d8ecf	handler: add support for ceph-volume containerized restart The restart script wasn't working with the current new addition of ceph-volume in container where now OSDs have the OSD id name in the container name. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-10-10 16:08:41 -04:00
Sébastien Han	790f52f934	ceph-handler: change osd container check Now that the container is named ceph-osd@<id> looking for something that contains a host is not necessary. This is also backward compatible as it will continue to match container names with hostname in them. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-10-10 16:08:41 -04:00
Sébastien Han	4db6a213f7	add ceph-handler role The role contains all the handlers for Ceph services. We decided to leave ceph-defaults role with variables and a few facts only. This is useful when organizing the site.yml files and also adding the known variables to infrastructure-playbooks. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-09-28 15:15:49 +00:00
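
The `test` pitfalls described in commit 94cdef2757 can be reproduced without a cluster. The sketch below is not the restart script itself; it only shows why an unquoted empty variable and a quoted command string both make `test` succeed. `SOCKET` is set to an empty string here, which expands the same way as an unset variable when unquoted.

```shell
#!/usr/bin/env bash
# Illustration only: SOCKET is intentionally empty to mimic the missing default.
SOCKET=""

# Unquoted: the empty variable vanishes, leaving `test -S`.
# With a single argument, test only checks that the string "-S" is non-empty.
unquoted_rc=0
test -S $SOCKET || unquoted_rc=$?

# Quoted: `test -S ""` really checks for a socket at the empty path and fails.
quoted_rc=0
test -S "$SOCKET" || quoted_rc=$?

# A command wrapped in quotes is just a non-empty string, so test succeeds
# without ever running wget.
string_rc=0
test 'wget http://192.168.100.11:8080' || string_rc=$?

echo "unquoted=$unquoted_rc quoted=$quoted_rc string=$string_rc"
```

Quoting the variable and giving it a default value are two independent safeguards; either one alone would have made the original check fail visibly.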

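The over-matching container filter from commit 2b492e3de1 can be sketched with `grep -E` standing in for the `docker container ls -f "name=..."` filter. This assumes the filter behaves like an unanchored regular expression, which is what the output quoted in that commit suggests. `match_count` is a hypothetical helper, and the names are the ones listed in the commit message.

```shell
#!/usr/bin/env bash
# Container names taken from the commit message; no Docker daemon is needed.
names='ceph-osd-5
ceph-osd-13
ceph-osd-17
ceph-osd-1'

# Hypothetical helper: count the names matched by a pattern.
match_count() {
  printf '%s\n' "$names" | grep -c -E -- "$1"
}

# Unanchored: also matches ceph-osd-13 and ceph-osd-17.
loose=$(match_count 'ceph-osd-1')

# Anchored with a trailing '$': matches ceph-osd-1 only.
anchored=$(match_count 'ceph-osd-1$')

echo "loose=$loose anchored=$anchored"
```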
16 Commits (2d40e3923f61c24985ae1c4e515048a332672c49)
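
Commit 95f3908e44 replaces `fuser` with a lookup in `/proc/net/unix`. A minimal sketch of that idea follows; it is not the handler's actual code. `socket_in_use` is a hypothetical helper that compares the last column of the table with the socket path, and the sample table and socket names are made up for illustration so the example runs without a Ceph daemon.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the path appears as the last field of a
# /proc/net/unix style table. The table file defaults to /proc/net/unix.
socket_in_use() {
  awk -v sock="$1" '$NF == sock { found = 1 } END { exit !found }' \
    "${2:-/proc/net/unix}"
}

# Illustrative sample table (made-up inodes and paths).
sample=$(mktemp)
cat > "$sample" <<'EOF'
Num       RefCount Protocol Flags    Type St Inode Path
0000000000000000: 00000002 00000000 00010000 0001 01 23456 /var/run/ceph/ceph-osd.0.asok
0000000000000000: 00000002 00000000 00000000 0002 01 34567
EOF

if socket_in_use /var/run/ceph/ceph-osd.0.asok "$sample"; then osd0=in-use; else osd0=free; fi
if socket_in_use /var/run/ceph/ceph-osd.1.asok "$sample"; then osd1=in-use; else osd1=free; fi

echo "osd.0: $osd0, osd.1: $osd1"
rm -f "$sample"
```

Reading one kernel table is a single pass over the open unix sockets, whereas `fuser` walks every process under `/proc`, which is the cost the commit message points at.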