ceph-ansible/roles/ceph-osd/tasks
Sébastien Han 5132cc3de4 Do not search osd ids if ceph-volume
Description of problem: The 'get osd id' task retries the full 10 times (with the corresponding timeouts) to make sure that the number of OSDs in the osd directory matches the number of devices.

This always happens, regardless of whether the setup and deployment are correct.

Version-Release number of selected component (if applicable): Presumably the latest, but any ceph-ansible version that contains ceph-volume support is affected.

How reproducible: 100%

Steps to Reproduce:
1. Use ceph-volume (LVM) to deploy OSDs
2. Avoid using anything in the 'devices' section (see the group_vars sketch after these steps)
3. Deploy the cluster
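
A minimal group_vars sketch for such a deployment, assuming the lvm scenario. The volume group, logical volume, and journal device below are placeholders, not values taken from this report:

    # group_vars/osds.yml (hypothetical example)
    osd_scenario: lvm
    osd_objectstore: filestore
    # note: no 'devices' list at all; the OSDs are described via lvm_volumes
    lvm_volumes:
      - data: data-lv1        # placeholder logical volume
        data_vg: vg-osd       # placeholder volume group
        journal: /dev/sdc1    # placeholder journal device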

Actual results:
TASK [ceph-osd : get osd id _uses_shell=True, _raw_params=ls /var/lib/ceph/osd/ | sed 's/.*-//'] **********************************************************************************************************************************************
task path: /Users/alfredo/python/upstream/ceph/src/ceph-volume/ceph_volume/tests/functional/lvm/.tox/xenial-filestore-dmcrypt/tmp/ceph-ansible/roles/ceph-osd/tasks/start_osds.yml:6
FAILED - RETRYING: get osd id (10 retries left).
FAILED - RETRYING: get osd id (9 retries left).
FAILED - RETRYING: get osd id (8 retries left).
FAILED - RETRYING: get osd id (7 retries left).
FAILED - RETRYING: get osd id (6 retries left).
FAILED - RETRYING: get osd id (5 retries left).
FAILED - RETRYING: get osd id (4 retries left).
FAILED - RETRYING: get osd id (3 retries left).
FAILED - RETRYING: get osd id (2 retries left).
FAILED - RETRYING: get osd id (1 retries left).
ok: [osd0] => {
    "attempts": 10,
    "changed": false,
    "cmd": "ls /var/lib/ceph/osd/ | sed 's/.*-//'",
    "delta": "0:00:00.002717",
    "end": "2018-01-21 18:10:31.237933",
    "failed": true,
    "failed_when_result": false,
    "rc": 0,
    "start": "2018-01-21 18:10:31.235216"
}

STDOUT:

0
1
2

Expected results:
There are no (or only a few) timeouts while the OSDs are being found.

Additional info:
This is happening because the check expects the number of OSDs found to match the number of "devices" defined for ceph-disk, which in this case is 0 since the lvm scenario does not define any.
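
For context, the task in question looks roughly like this. This is a sketch reconstructed from the log output above and the condition quoted below, not a verbatim copy of start_osds.yml:

    # roles/ceph-osd/tasks/start_osds.yml (reconstructed sketch)
    - name: get osd id
      shell: |
        ls /var/lib/ceph/osd/ | sed 's/.*-//'
      register: osd_id
      changed_when: false
      failed_when: false
      retries: 10
      # keep retrying until the number of osd ids found on disk matches
      # the number of entries in the 'devices' variable
      until: osd_id.stdout_lines|length == devices|unique|length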

Basically this line:

    until: osd_id.stdout_lines|length == devices|unique|length

means that in this case, with three OSD ids found in the output above, it keeps retrying until the following incorrect condition is met:

    until: 3 == 0
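
In line with the commit title, the straightforward way out is to skip this 'devices'-based count when ceph-volume is in charge of the OSDs. A sketch of such a guard, added to the task above (not necessarily the exact change that was merged):

    # only compare against 'devices' for ceph-disk based scenarios;
    # the lvm/ceph-volume scenario does not populate 'devices' at all
    when: osd_scenario != 'lvm'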

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1537103
2018-01-30 14:44:38 +01:00
docker                            container: trigger handlers on systemd file change          2018-01-10 16:46:42 +01:00
scenarios                         ceph-osd: adds dmcrypt to the lvm scenario                  2018-01-24 14:10:08 +01:00
activate_osds.yml                 osd: skip devices marked as '/dev/dead'                     2018-01-11 17:34:32 +01:00
build_devices.yml                 tests: remove OSD_FORCE_ZAP variable from tests             2017-11-14 17:55:01 +01:00
ceph_disk_cli_options_facts.yml   Remove jinja2 delimiters from `when` keys                   2017-10-12 11:27:42 -05:00
check_gpt.yml                     osd: fix check gpt                                          2017-12-20 17:42:45 +01:00
check_mandatory_vars.yml          ceph-osd lvm scnearios are no longer limited to filestore   2017-10-25 08:23:45 -04:00
copy_configs.yml                  Use check_mode instead of always_run                        2017-10-25 09:53:34 -05:00
main.yml                          osd: ensure a gpt label is set on device                    2017-11-17 17:32:23 +01:00
osd_fragment.yml                  Use check_mode instead of always_run                        2017-10-25 09:53:34 -05:00
pre_requisite.yml                 osd: remove leftover and fix a typo                         2017-11-21 11:11:34 +01:00
start_osds.yml                    Do not search osd ids if ceph-volume                        2018-01-30 14:44:38 +01:00