ceph-ansible/roles/ceph-osd/tasks/start_osds.yml

---
- block:
  # For OpenStack VMs, adjust the mount point below depending on whether the
  # OpenStack VM deploy tool mounts ephemeral disks by default
  - name: umount ceph disk (if on openstack)
    mount:
      name: /mnt
      src: /dev/vdb
      fstype: ext3
      state: unmounted
    when:
      - ceph_docker_on_openstack

  - name: test if the container image has the disk_list function
    command: "{{ container_binary }} run --rm --net=host --entrypoint=stat {{ ceph_docker_registry }}/{{ ceph_docker_image }}:{{ ceph_docker_image_tag }} disk_list.sh"
    changed_when: false
    failed_when: false
    register: disk_list
    when:
      - osd_scenario != 'lvm'

  - name: generate ceph osd docker run script
    become: true
    template:
      src: "{{ role_path }}/templates/ceph-osd-run.sh.j2"
      dest: "{{ ceph_osd_docker_run_script_path }}/ceph-osd-run.sh"
      owner: "root"
      group: "root"
      mode: "0744"
      setype: "bin_t"
    notify:
      - restart ceph osds
  when:
    - containerized_deployment

# This is for ceph-disk OSDs: the ceph-disk command is gone, so we list
# /var/lib/ceph/osd/ and strip everything up to the last '-' to get the ids
- name: get osd ids
  shell: |
    ls /var/lib/ceph/osd/ | sed 's/.*-//'
  changed_when: false
  register: osd_ids_non_container

- name: set_fact docker_exec_start_osd
  set_fact:
    # Build the command by concatenation rather than nesting a template inside a string literal
    docker_exec_start_osd: "{{ container_binary + ' run --rm --net=host --privileged=true -v /var/run/udev/:/var/run/udev/:z -v /run/lvm/:/run/lvm/ -v /etc/ceph:/etc/ceph:z -v /dev:/dev --entrypoint=ceph-volume ' + ceph_docker_registry + '/' + ceph_docker_image + ':' + ceph_docker_image_tag if containerized_deployment else 'ceph-volume' }}"

- name: collect osd ids
  shell: >
    {{ docker_exec_start_osd }} lvm list --format json
  changed_when: false
  failed_when: false
  register: ceph_osd_ids

- name: generate systemd unit file
  become: true
  template:
    src: "{{ role_path }}/templates/ceph-osd.service.j2"
    dest: /etc/systemd/system/ceph-osd@.service
    owner: "root"
    group: "root"
    mode: "0644"
  notify:
    - restart ceph osds
  when:
    - containerized_deployment

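# The unit instance name depends on the scenario:
#   - non-lvm and containerized: entries of `devices`, with '/dev/' stripped
#   - lvm and non-containerized: keys of the `ceph-volume lvm list` JSON output
#   - anything else: ids listed from /var/lib/ceph/osd/
- name: systemd start osd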
  systemd:
    name: ceph-osd@{{ item | regex_replace('/dev/', '') if osd_scenario != 'lvm' and containerized_deployment else item }}
    state: started
    enabled: yes
    daemon_reload: yes
  with_items: "{{ devices if osd_scenario != 'lvm' and containerized_deployment else ((ceph_osd_ids.stdout | from_json).keys() | list) if osd_scenario == 'lvm' and not containerized_deployment else osd_ids_non_container.stdout_lines }}"

- name: ensure systemd service override directory exists
  file:
    state: directory
    path: "/etc/systemd/system/ceph-osd@.service.d/"
  when:
    - ceph_osd_systemd_overrides is defined
    - ansible_service_mgr == 'systemd'

- name: add ceph-osd systemd service overrides
  config_template:
    src: "ceph-osd.service.d-overrides.j2"
    dest: "/etc/systemd/system/ceph-osd@.service.d/ceph-osd-systemd-overrides.conf"
    config_overrides: "{{ ceph_osd_systemd_overrides | default({}) }}"
    config_type: "ini"
  when:
    - ceph_osd_systemd_overrides is defined
    - ansible_service_mgr == 'systemd'