427 Commits (690336aabdd1f25df4616e3bba348f1d3302b4ae)

Author SHA1 Message Date
Andrew Schoen f1e04835f4 rolling_update: ceph commands should use --cluster
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit e2529dcd7f)
2019-04-18 19:12:13 +02:00
Andrew Schoen 545d93aae8 rolling_update: set num_osds to the number of running osds
We do this so that the ceph-config role can most accurately
report the number of osds for the generation of the ceph.conf
file.

We don't want to use ceph-volume to determine the number of
osds because in an upgrade to nautilus ceph-volume won't be able to
accurately count osds created by ceph-disk.
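
A minimal sketch of counting running osds from systemd instead of ceph-volume. The unit pattern `ceph-osd@*` and the registered variable name `running_osds` are assumptions for illustration, not the tasks from this commit:

```yaml
# Sketch only: count the ceph-osd systemd units that are currently running.
- name: count running osds
  shell: systemctl list-units --type=service --state=running --no-legend 'ceph-osd@*' | wc -l
  register: running_osds  # hypothetical variable name
  changed_when: false

# Expose the count under the name the ceph-config role expects.
- name: set num_osds from the running count
  set_fact:
    num_osds: "{{ running_osds.stdout | int }}"
```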

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 67453853ff)
2019-04-18 19:12:13 +02:00
Andrew Schoen c28388bb06 rolling_update: migrate ceph-disk osds to ceph-volume
When upgrading to nautilus, run ``ceph-volume simple scan`` and
``ceph-volume simple activate --all`` to migrate any running
ceph-disk osds to ceph-volume.
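
The two commands above could be wrapped in tasks roughly like this. This is a sketch, not the tasks from this commit; any conditions or privilege escalation the real playbook applies are left out:

```yaml
# Sketch only: migrate running ceph-disk osds to ceph-volume.
- name: scan running ceph-disk osds
  command: ceph-volume simple scan

- name: activate the scanned osds with ceph-volume
  command: ceph-volume simple activate --all
```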

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1656460

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 28c47e4d1b)
2019-04-18 19:12:13 +02:00
Guillaume Abrioux 35afd6a63a update: ensure tasks are executed on an upgraded mon
These tasks must be run from a monitor that has already been upgraded;
otherwise they might fail.
See: https://tracker.ceph.com/issues/39355

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7eb42c9e8e)
2019-04-18 19:10:10 +02:00
Guillaume Abrioux 495711f296 update: ensure ceph command returns 0
These commands could return something other than 0.
Let's ensure all retries have been done before actually failing.
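
A minimal sketch of the retry pattern, using Ansible's `register`/`until`/`retries`/`delay` keywords. The variable name `result` and the retry values are assumptions, not taken from this commit:

```yaml
# Sketch only: the task fails only after every retry has returned non-zero.
- name: set the noout flag, retrying before failing
  command: ceph --cluster ceph osd set noout
  register: result       # hypothetical variable name
  until: result.rc == 0
  retries: 5             # assumed value
  delay: 10              # assumed value, in seconds
```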

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ed84325b1d)
2019-04-18 19:10:10 +02:00
Guillaume Abrioux 4a678ac102 update: set osd flags before upgrading any mon
Typical error:

```
failed: [mon0 -> mon2] (item=noout) => changed=true
  cmd:
  - ceph
  - --cluster
  - ceph
  - osd
  - set
  - noout
  delta: '0:00:00.293756'
  end: '2019-04-17 06:31:57.552386'
  item: noout
  msg: non-zero return code
  rc: 1
  start: '2019-04-17 06:31:57.258630'
  stderr: |-
    Traceback (most recent call last):
      File "/bin/ceph", line 1222, in <module>
        retval = main()
      File "/bin/ceph", line 1146, in main
        sigdict = parse_json_funcsigs(outbuf.decode('utf-8'), 'cli')
      File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 788, in parse_json_funcsigs
        cmd['sig'] = parse_funcsig(cmd['sig'])
      File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 728, in parse_funcsig
        raise JsonFormat(s)
    ceph_argparse.JsonFormat: unknown type CephBool
  stderr_lines:
  - 'Traceback (most recent call last):'
  - '  File "/bin/ceph", line 1222, in <module>'
  - '    retval = main()'
  - '  File "/bin/ceph", line 1146, in main'
  - '    sigdict = parse_json_funcsigs(outbuf.decode(''utf-8''), ''cli'')'
  - '  File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 788, in parse_json_funcsigs'
  - '    cmd[''sig''] = parse_funcsig(cmd[''sig''])'
  - '  File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 728, in parse_funcsig'
  - '    raise JsonFormat(s)'
  - 'ceph_argparse.JsonFormat: unknown type CephBool'
  stdout: ''
  stdout_lines: <omitted>
```

Having mixed versions of monitors seems to cause this error.
Moving these tasks before any monitor gets upgraded seems to be enough
to get around this issue.
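
The ordering described above might look roughly like the following. This is a sketch: the group name `mons`, the `norebalance` flag and the placeholder upgrade task are assumptions, not the layout of the actual playbook:

```yaml
# Sketch only: play 1 sets the flags once, while every monitor still runs
# the old version, so the CLI never talks to mixed-version monitors.
- hosts: mons[0]
  become: true
  tasks:
    - name: set osd flags
      command: "ceph --cluster ceph osd set {{ item }}"
      with_items:
        - noout
        - norebalance

# Play 2 then upgrades the monitors one at a time.
- hosts: mons
  serial: 1
  become: true
  tasks:
    - name: upgrade this monitor (placeholder)
      debug:
        msg: "upgrade {{ inventory_hostname }} here"
```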

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 543d1e2e41)
2019-04-18 19:10:10 +02:00
Rishabh Dave 72309b49fe allow adding a monitor to a deployed cluster
Add a playbook that deploys a new monitor on a new node, adds that node
to the Ceph cluster and the monitor to the quorum and updates the ceph
configuration file on OSD nodes.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit d5967af7fb)
2019-04-16 11:14:21 +02:00
Dimitri Savineau 1c3fbe5a60 purge-cluster: remove python-ceph-argparse package
When using the purge-cluster playbook with nautilus, the
python-ceph-argparse package is still installed on the host, which
prevents reinstalling a ceph cluster with a different version (like
luminous or mimic).
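
A minimal sketch of such a removal task, using the generic `package` module. The task in the actual playbook may be structured differently:

```yaml
# Sketch only: make sure the leftover package is gone after a purge.
- name: remove python-ceph-argparse
  package:
    name: python-ceph-argparse
    state: absent
```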

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit eb658b3af6)
2019-04-15 17:32:22 +02:00
Dimitri Savineau f90c051589 switch-from-non-containerized: stop all osds
e6bfb84 introduced a regression in the switch from non containerized
to container deployment.
We need to stop all previous OSD services. We just don't need the
ceph-disk pattern in the regex.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 150acba8c5)
2019-04-12 00:45:21 +00:00
Guillaume Abrioux f8c544c4a8 purge: remove references to ceph-disk
as of stable-4.0, ceph-disk is no longer supported.
These tasks aren't needed anymore.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a1254d767c)
2019-04-12 00:45:21 +00:00
Guillaume Abrioux f1ede335e4 shrink-osd: remove legacy playbook
as of stable-4.0, ceph-disk is no longer supported.
Let's remove this legacy version of the playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 73aa788459)
2019-04-12 00:45:21 +00:00
Guillaume Abrioux f5478dcc0b switch_to_containers: remove ceph-disk references
as of stable-4.0, ceph-disk is no longer supported.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e6bfb843f4)
2019-04-12 00:45:21 +00:00
Guillaume Abrioux 4a663e1fc0 osd: remove variable osd_scenario
As of stable-4.0, the only valid scenario is `lvm`.
Thus, this makes this variable useless.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4d35e9eeed)
2019-04-12 00:45:21 +00:00
Guillaume Abrioux 2581c4d511 update: fix undefined error when no mgr group is declared
If the mgr group isn't defined in the inventory, that task fails with an
undefined variable error.
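
One way to guard against a missing group is to look it up with a default. This is a sketch, assuming the group is named `mgrs`; it is not necessarily the fix applied in this commit:

```yaml
# Sketch only: groups.get() returns an empty list instead of raising
# an undefined error when the inventory has no mgr group.
- name: show mgr hosts only when the group exists
  debug:
    msg: "mgr hosts: {{ groups.get('mgrs', []) | join(', ') }}"
  when: groups.get('mgrs', []) | length > 0
```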

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c1e4529b0e)
2019-04-11 09:20:22 -04:00
Dimitri Savineau 532d749b2e rolling_update: Remove ceph aliases
ceph aliases have been introduced in stable-3.2 during the ceph
deployment. On master this has been removed but we don't handle
this removal in the upgrade from stable-3.2 to master via the
rolling_update playbook.
Also remove the task from purge-docker-cluster missing from
d9e7835

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 57b4e76d11)
2019-04-10 00:02:35 +00:00
Guillaume Abrioux b723ef3fa2 purge: fix lvm-batch purge osd
`lvm_volumes` and/or `devices` variable(s) can be undefined depending on
the scenario chosen.

These tasks should be run only if these variables are defined, otherwise
it ends up with undefined variable errors.
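
A minimal sketch of such a guard for the `devices` case. The zap command shown is an assumption about what the purge runs, not a copy of the task from this commit:

```yaml
# Sketch only: the default([]) keeps the loop itself from failing,
# and the when clause skips the task when devices is undefined.
- name: zap raw devices only when the variable is defined
  command: "ceph-volume lvm zap --destroy {{ item }}"
  with_items: "{{ devices | default([]) }}"
  when: devices is defined
```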

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1653307

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0180738313)
2019-04-04 03:38:52 +02:00
Guillaume Abrioux f55e2b08be remove all NBSPs on master branch
Similar to #3658

Since there are too many changes between master and stable branches, let's
commit directly in each branch instead of trying to backport this
commit.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-28 11:57:55 +00:00
Dimitri Savineau c8442f3705 rolling_update: Update systemd unit regex for nvme
The systemd unit regex doesn't handle nvme devices (/dev/nvmeXn1).

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1687828

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-03-26 12:01:00 +00:00
Guillaume Abrioux 78aac3e96a update: followup on edfdc49
all rgw instances should be stopped according to the multiple rgw
instances support added in rolling_update.yml

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux f6e0185146 update: add containerized deployment upgrade support (L->N)
Add a couple of fixes so that containerized deployments can be upgraded
from luminous/mimic to nautilus.

- pass CEPH_CONTAINER_IMAGE and CEPH_CONTAINER_BINARY environment
variable to the ceph_key module,
- fix the docker exec command in 'waiting for the containerized monitor
to join the quorum' task according to the `delegate_to` parameter,
- override `docker_exec_cmd` in `ceph-facts` with `mon_host` when
rolling_update is `True`,
- do not run `create_mds_filesystems.yml` unnecessarily when performing an
upgrade.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux 1816b876ee update: add missing hosts in facts gathering
iscsigws were missing.
The 'complete upgrade' couldn't complete because rolling_update was set
to False for iscsigw nodes.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux 45ba90c169 update: remove rbdmirror legacy task
This task is no longer needed for the next release.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux 0ea0adf039 update: show all daemons version at the end
Let's display the version of all daemons at the end of the playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux f31d6d9485 update: enable new nautilus-only functionality
once the cluster is upgraded to nautilus, we can complete the process by
disallowing pre-nautilus OSDs and enabling all new nautilus-only functionality

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux afdaa70a63 update: enable msgr2 protocol
This commit enables the msgr2 protocol when the cluster is fully upgraded
to nautilus.
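
A minimal sketch of the corresponding task, assuming a cluster named `ceph`:

```yaml
# Sketch only: run once, after all monitors have been upgraded to nautilus.
- name: enable the msgr2 protocol
  command: ceph --cluster ceph mon enable-msgr2
  run_once: true
```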

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux ef096dd021 update: ensure mgrs are upgraded after ALL monitors
As of 1c760904b0, ceph-ansible implicitly
bootstraps managers on monitors.
mgrs must be upgraded only after all monitors; therefore, this commit
refactors the way mgrs are upgraded to be sure we don't upgrade a mgr
during the monitors upgrade.

This commit also ensures we handle the case where managers are split onto
dedicated nodes.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux 7fa2434f0f update: ensure /var/lib/ceph/bootstrap-rbd-mirror is present
This directory is created by ceph-config node by node.
In the upgrade context we need it to be created on ALL monitors from the
first iteration, because the task right after it creates and sends
the keyrings to all monitors.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux 82764afe8d update: mask systemd service units during upgrade
This prevents the packaging from restarting services before we need
to restart them in the rolling update sequence.
We want to handle service restarts in the rolling_update playbook.
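
Masking and unmasking a unit could be sketched with the `systemd` module as follows. The unit name `ceph-mon@{{ ansible_hostname }}` is an assumption used for illustration:

```yaml
# Sketch only: a masked unit cannot be started by package scriptlets.
- name: mask the mon unit before upgrading packages
  systemd:
    name: "ceph-mon@{{ ansible_hostname }}"
    enabled: no
    masked: yes

# ... package upgrade tasks would run here ...

# Unmask and start the unit at the point the playbook chooses.
- name: unmask and start the mon unit
  systemd:
    name: "ceph-mon@{{ ansible_hostname }}"
    masked: no
    enabled: yes
    state: started
```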

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux 8add55451c update: set osd flags only once
There is no need to set osd flags (noout, norebalance) each time we
upgrade a mon.

This commit moves up those tasks (before stopping the mon) so we don't need
to delegate them.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux f7c6f4e0b6 update: fix tasks waiting for the node to join the quorum
We actually want to ensure the node being upgraded is joining the quorum
instead of the monitor picked up earlier.

Indeed, the `mon_host` is used only in `delegate_to:` so we can still run ceph
commands while the monitor being upgraded is stopped.
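
A minimal sketch of such a wait task. The command runs on `mon_host` through `delegate_to`, while `ansible_hostname` still refers to the node being upgraded. The variable name `ceph_status` and the retry values are assumptions:

```yaml
# Sketch only: poll cluster status from another monitor until the
# upgraded node shows up in quorum_names.
- name: wait for the upgraded monitor to rejoin the quorum
  command: ceph --cluster ceph -s --format json
  register: ceph_status   # hypothetical variable name
  until: ansible_hostname in ((ceph_status.stdout | default('{}', true) | from_json).get('quorum_names', []))
  retries: 10             # assumed value
  delay: 10               # assumed value, in seconds
  delegate_to: "{{ mon_host }}"
  changed_when: false
```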

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux 32569b79e2 update: remove an old parameter in ceph_key module call
the `containerized` parameter in ceph_key module doesn't exist anymore.
This was making the module fail, but the failure was hidden because of the
`ignore_errors: True`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Dimitri Savineau b23c05ae52 add-osd.yml: Add become flag for ceph-validate
The check_devices task fails if the ceph-validate role isn't executed
as a privileged user (Permission denied).

failed: [osd0] (item=/dev/sdb) => {"changed": false, "err": "Error:
Error opening /dev/sdb: Permission denied\n", "item": "/dev/sdb",
"msg": "Error while getting device information with parted script:
'/sbin/parted -s -m /dev/sdb -- unit 'MiB' print'", "out": "", "rc": 1}
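
A minimal sketch of a play that runs the role with privilege escalation. The group name `osds` and the abbreviated role list are assumptions:

```yaml
# Sketch only: become lets parted open the block devices.
- hosts: osds
  become: true
  roles:
    - ceph-validate
```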

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-03-09 05:54:46 +00:00
Guillaume Abrioux a440878533 add-osd: gather facts in second part of playbook
Otherwise, it ends up with an error like the following:

```
FAILED! => {"msg": "'ansible.vars.hostvars.HostVarsVars object' has no attribute 'ansible_hostname'"}
```

because facts won't have been gathered.
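
A minimal sketch of a play that gathers facts before they are read. The group name `osds` and the debug task are assumptions for illustration, not the play from this commit:

```yaml
# Sketch only: with gather_facts enabled, ansible_hostname is populated
# in hostvars before any task reads it.
- hosts: osds
  gather_facts: true
  become: true
  tasks:
    - name: facts are now available
      debug:
        msg: "{{ hostvars[inventory_hostname]['ansible_hostname'] }}"
```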

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1670663

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-04 14:44:27 +01:00
Guillaume Abrioux 47ebef374f purge: fix rbd-mirror group name
the default is rbdmirrors in ceph-defaults

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-01 20:31:14 +00:00
Guillaume Abrioux a915308477 purge: fix rbd mirror purge
as of b70d54ac80 the service launched isn't
ceph-rbd-mirror@admin.service.

it's now `ceph-rbd-mirror@rbd-mirror.{{ ansible_hostname }}`
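
A purge task targeting the new unit name might look like this. It is a sketch, not the task from this commit:

```yaml
# Sketch only: stop the rbd-mirror unit under its new name.
- name: stop the rbd-mirror service
  service:
    name: "ceph-rbd-mirror@rbd-mirror.{{ ansible_hostname }}"
    state: stopped
```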

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-01 20:31:14 +00:00
Guillaume Abrioux 3849f30f58 purge: do not remove /var/lib/apt/lists/*
Removing the content of this directory seems a bit aggressive and causes a
redeployment to fail after a purge on debian based distributions.

Typical error:
```
fatal: [mon0]: FAILED! => changed=false
  attempts: 3
  msg: No package matching 'ceph' is available
```

The following task will consider the cache is still valid, so apt
doesn't refresh it:
```
- name: update apt cache if cache_valid_time has expired
  apt:
    update_cache: yes
    cache_valid_time: 3600
  register: result
  until: result is succeeded
```

since the task installing ceph packages has a `update_cache: no` it
fails:

```
- name: install ceph for debian
  apt:
    name: "{{ debian_ceph_pkgs | unique }}"
    update_cache: no
    state: "{{ (upgrade_ceph_packages|bool) | ternary('latest','present') }}"
    default_release: "{{ ceph_stable_release_uca | default('') }}{{ ansible_distribution_release ~ '-backports' if ceph_origin == 'distro' and ceph_use_distro_backports else '' }}"
  register: result
  until: result is succeeded
```

/tmp/* isn't specific to ceph either, so we shouldn't remove everything
in this directory.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-01 20:31:14 +00:00
Guillaume Abrioux 89f77589fa purge: fix purge of lvm devices
Using the `shell` module seems to be the only way to make this task work
on both rhel based and debian based distributions.

On ubuntu, using the `command` ansible module fails as follows
(regardless of whether `sudo` is used):
```
ok: [osd1] => changed=false
  cmd: command -v ceph-volume
  failed_when_result: false
  msg: '[Errno 2] No such file or directory: ''command'': ''command'''
  rc: 2
```
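
A minimal sketch of the `shell` variant. `command -v` is a shell builtin, so it needs a shell to interpret it, which the `command` module does not provide. The variable name `ceph_volume_present` is hypothetical:

```yaml
# Sketch only: rc is 0 when ceph-volume is found, non-zero otherwise.
- name: check whether ceph-volume is installed
  shell: command -v ceph-volume
  register: ceph_volume_present   # hypothetical variable name
  changed_when: false
  failed_when: false
```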

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1653307

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-01 20:31:14 +00:00
Guillaume Abrioux 69310a5cd6 switch_to_containers: support multiple rgw instances per host
add multiple rgw instances per host in switch_to_containers playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-02-13 09:42:27 +01:00
Guillaume Abrioux 70f1eea9b2 switch_to_containers: remove non-containerized systemd unit files
remove old systemd unit files (non-containerized) during the
switch_to_containers transition.

We have sometimes seen the old unit being started instead of the
newly generated systemd unit.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-02-13 09:42:27 +01:00
Guillaume Abrioux 4064035a54 switch_to_containers: use ceph binary from container
use the ceph binary from the container instead of the host.
If the ceph CLI version isn't compatible between host and container
image, it can cause the CLI to hang.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-02-13 09:42:27 +01:00
Guillaume Abrioux 7e0a70f7a8 switch_to_containers: do not try to redeploy monitors
`ceph-mon` tries to redeploy monitors because it assumes it was not yet
deployed since `mon_socket_stat` and `ceph_mon_container_stat` are
undefined (indeed, we stop the daemon before calling `ceph-mon` in the
switch_to_containers playbook).

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-02-13 09:42:27 +01:00
John Fulton 37b5d1084a Make python print statements python3 compatible
The restart_osd_daemon.sh generated from the j2 template
contains a python call which uses 'print x' instead of
'print(x)'. Add the missing parentheses to make this call
compatible with both 2 and 3.

Also add parentheses to other python print calls found
in roles/ceph-client/defaults/main.yml and
infrastructure-playbooks/cluster-os-migration.yml.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1671721
Signed-off-by: John Fulton <fulton@redhat.com>
2019-02-01 15:23:27 +00:00
Noah Watkins 9a43674d2e shrink_osd: use cv zap by fsid to remove parts/lvs
Fixes:
  https://bugzilla.redhat.com/show_bug.cgi?id=1569413
  https://bugzilla.redhat.com/show_bug.cgi?id=1572933

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
2019-01-24 16:34:13 +01:00
Guillaume Abrioux edfdc49488 rolling_update: support multiple rgw instance
1ac94c048f introduced support for
multiple rgw instances on a single host, but the feature was not
implemented in rolling_update.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-01-22 13:45:38 +01:00
Giulio Fidente ff8dbe114c Preserve rolling_update backward compatibility with ansible < 2.5
Signed-off-by: Giulio Fidente <gfidente@redhat.com>
2019-01-21 14:05:45 +01:00
guihecheng 1ac94c048f rgw: add support for multiple rgw instances on a single host
With this, we can have multiple rgw instances on a single host
with a single run, without having to use rgw-standalone.yml, which does not
seem able to bind ports separately.
If you want to have multiple rgw instances, just change 'radosgw_instances'
to the number you want; it defaults to 1.
Not compatible with Multi-Site yet.
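
For example, a variables file could request two instances per host. This is a sketch; where the variable is set (group_vars, host_vars or an extra var) is an assumption:

```yaml
# Sketch only: run two rgw instances on each rgw host (the default is 1).
radosgw_instances: 2
```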

Signed-off-by: guihecheng <guihecheng@cmiot.chinamobile.com>
2019-01-18 11:12:28 +01:00
Guillaume Abrioux 268f2cef82 update: do not enforce `serial: 1` on client nodes
There is no need to enforce `serial: 1` on client nodes.
Let's make it configurable by introducing a new *extra* variable
`client_update_batch`. If it is not set, it defaults to
`{{ ansible_forks }}`.

NOTE: this is only usable as an extra variable passed with
`-e client_update_batch=<num>`
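
The play header described above could be sketched as follows. The group name `clients` and the placeholder task are assumptions:

```yaml
# Sketch only: the batch size comes from the extra variable when it is
# passed and falls back to the number of forks otherwise.
- hosts: clients
  serial: "{{ client_update_batch | default(ansible_forks) }}"
  become: true
  tasks:
    - name: upgrade this client (placeholder)
      debug:
        msg: "upgrade {{ inventory_hostname }} here"
```

Passing `-e client_update_batch=5` would then upgrade clients five at a time.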

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1650184

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-01-02 16:55:08 +00:00
Daniel-Pivonka ba149972be Example ceph_add_users_buckets playbook
This example playbook shows how to bulk add rgw users and buckets.

Signed-off-by: Daniel-Pivonka <dpivonka@redhat.com>
2018-12-20 14:23:25 +01:00
Guillaume Abrioux d7e77012ef retry on packages and repositories failures
Add register/until on all packaging related tasks to avoid spurious CI
failures.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-12-19 14:48:27 +00:00
Noah Watkins 110049e825 playbook: report storage device inventory
Signed-off-by: Noah Watkins <nwatkins@redhat.com>
2018-12-18 10:51:31 +01:00