Commit Graph

447 Commits (34f9d51178f4cd37a7df1bb74897dff7eb5c065f)

Author SHA1 Message Date
Dimitri Savineau 44c63903ca purge-cluster: clean all ceph repo files
We currently only purge rh_storage yum repository file but depending
on the ceph_repository value we are using, the ceph repository file
could have a different name.

Resolves: #4056

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-06-07 09:28:14 +02:00
guihecheng 59e702ec39 Add section for purging rgw loadbalancer in purge-cluster.yml
Signed-off-by: guihecheng <guihecheng@cmiot.chinamobile.com>
2019-06-06 17:12:04 +02:00
L3D ab54fe20ec ansible: use 'bool' filter on boolean conditionals
By running ceph-ansible there are a lot ``[DEPRECATION WARNING]`` like these:
```
[DEPRECATION WARNING]: evaluating containerized_deployment as a bare variable,
this behaviour will go away and you might need to add |bool to the expression
in the future. Also see CONDITIONAL_BARE_VARS configuration toggle.. This
feature will be removed in version 2.12. Deprecation warnings can be disabled
by setting deprecation_warnings=False in ansible.cfg.
```

Now appended ``| bool`` on a lot of the affected variables.

Sometimes the coding style from ``variable|bool`` changed to ``variable | bool`` *(with spaces at the pipe)*.

Closes: #4022

Signed-off-by: L3D <l3d@c3woc.de>
2019-06-06 10:21:17 +02:00
Dimitri Savineau 7503098ca0 remove ceph-agent role and references
The ceph-agent role was used only for RHCS 2 (jewel) so it's not
usefull anymore.
The current code will fail on CentOS distribution because the rhscon
package is only avaible on Red Hat with the RHCS 2 repository and
this ceph release is supported on stable-3.0 branch.

Resolves: #4020

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-06-03 13:35:50 +02:00
Guillaume Abrioux 55420d6253 roles: introduce `ceph-container-engine` role
This commit splits the current `ceph-container-common` role.

This introduces a new role `ceph-container-engine` which handles the
tasks specific to the installation of containers tools (docker/podman).

This is needed for the ceph-dashboard implementation for 2 main reasons:

1/ Since the ceph-dashboard stack is only containerized, we must install
everything needed to run containers even in non containerized
deployments. Splitting this role allows us to not have to call the full
`ceph-container-common` role which would run a bunch of unneeded tasks
that would have been skipped anyway.

2/ The current implementation would have required to run
`ceph-container-common` on all ceph-clients nodes which would have been
conflicting with 9d3517c670 (we don't want
to run ceph-container-common on all client nodes, see mentioned commit
for more details)

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-05-22 13:02:10 +02:00
Guillaume Abrioux 72d8315299 switch to ansible 2.8
- remove private attribute with import_role.
- update documentation.
- update rpm spec requirement.
- fix MagicMock python import in unit tests.

Closes: #3765

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-05-20 13:04:58 +02:00
Dimitri Savineau 638604929b purge-docker-cluster: don't remove data on atomic
Because we don't manage the docker service on atomic (yet) via the
ceph-container-common role then we can't stop docker dans remove
the data.
For now let's do that only for non atomic hosts.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-05-17 09:33:50 +02:00
Guillaume Abrioux e74d80e72f rename docker_exec_cmd variable
This commit renames the `docker_exec_cmd` variable to
`container_exec_cmd` so it's more generic.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-05-16 16:39:13 +02:00
Zack Cerza 9b4339a2ba purge-docker-cluster.yml: Default lvm_volumes
We were failing when that variable is unset; purge-cluster.yml contains
this workaround.

Signed-off-by: Zack Cerza <zack@redhat.com>
2019-05-16 16:39:13 +02:00
Boris Ranto 2f141a6e80 Merge cephmetrics/dashboard-ansible repo
This commit will merge dashboard-ansible installation scripts with
ceph-ansible. This includes several new roles to setup ceph-dashboard
and the underlying technologies like prometheus and grafana server.

Signed-off-by: Boris Ranto & Zack Cerza <team-gmeno@redhat.com>
Co-authored-by: Zack Cerza <zcerza@redhat.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-05-16 16:39:13 +02:00
wumingqiao 5320aa11c4 shrink_osd: mark all osd(s) out in one command
Signed-off-by: wumingqiao <wumingqiao@beyondcent.com>
2019-05-15 16:04:28 +02:00
Dimitri Savineau 168d7cd016 purge-docker-cluster: remove docker data
We never clean the content of /var/lib/docker so we can still have
some data present in this directory after run the purge playbook.
Pip isn't used anymore.
Also update the docker package name (especially the python binding
one).

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-05-14 10:55:43 +02:00
Dimitri Savineau ea1f8f551c gather-ceph-logs: fix logs list generation
The shell module doesn't have a stdout_lines attributes. Instead of
using the shell module, we can use the find modules.

Also adding `become: false` to the local tmp directory creation
otherwise we won't have enough right to fetch the files into this
directory.

Resolves: #3966

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-05-13 16:23:05 +02:00
Mike Christie d7ef12910e igw: Fix rolling update service ordering
We must stop tcmu-runner after the other rbd-target-* services
because they may need to interact with tcmu-runner during shutdown.
There is also a bug in some kernels where IO can get stuck in the
kernel and by stopping rbd-target-* first we can make sure all IO is
flushed.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1659611

Signed-off-by: Mike Christie <mchristi@redhat.com>
2019-05-10 09:40:52 +02:00
Rishabh Dave 6e8fb2b3ea remove infrastructure-playbooks/rgw-standalone.yml
We don't need infrastructure-playbooks/rgw-standalone.yml since
site.yml.sample and site-cotainer.yml.sample can add a new RGW node to
an already deployed Ceph cluster.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
2019-05-07 13:05:17 +02:00
letterwuyu d57f6fcdc6 Fix comment content
Signed-off-by: lishuhao letterwuyu@gmail.com
2019-05-07 10:54:44 +02:00
Dimitri Savineau f1048627ea rolling_update: restart all ceph-iscsi services
Currently only rbd-target-gw service is restarted during an update.
We also need to restart tcmu-runner and rbd-target-api services
during the ceph iscsi upgrade.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1659611

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-04-24 07:47:23 +00:00
Rishabh Dave 739a662c80 improve coding style
Keywords requiring only one item shouldn't express it by creating a
list with single item.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
2019-04-23 15:37:07 +02:00
Guillaume Abrioux 7eb42c9e8e update: ensure tasks are executed on an upgraded mon
These tasks must be run from a monitor which is upgraded otherwise it
might fail.
See: https://tracker.ceph.com/issues/39355

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-18 11:16:11 +02:00
Guillaume Abrioux ed84325b1d update: ensure ceph command returns 0
these commands could return something else than 0.
Let's ensure all retries have been done before actually failing.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-18 11:16:11 +02:00
Guillaume Abrioux 543d1e2e41 update: set osd flags before upgrading any mon
Typical error:

```
failed: [mon0 -> mon2] (item=noout) => changed=true
  cmd:
  - ceph
  - --cluster
  - ceph
  - osd
  - set
  - noout
  delta: '0:00:00.293756'
  end: '2019-04-17 06:31:57.552386'
  item: noout
  msg: non-zero return code
  rc: 1
  start: '2019-04-17 06:31:57.258630'
  stderr: |-
    Traceback (most recent call last):
      File "/bin/ceph", line 1222, in <module>
        retval = main()
      File "/bin/ceph", line 1146, in main
        sigdict = parse_json_funcsigs(outbuf.decode('utf-8'), 'cli')
      File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 788, in parse_json_funcsigs
        cmd['sig'] = parse_funcsig(cmd['sig'])
      File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 728, in parse_funcsig
        raise JsonFormat(s)
    ceph_argparse.JsonFormat: unknown type CephBool
  stderr_lines:
  - 'Traceback (most recent call last):'
  - '  File "/bin/ceph", line 1222, in <module>'
  - '    retval = main()'
  - '  File "/bin/ceph", line 1146, in main'
  - '    sigdict = parse_json_funcsigs(outbuf.decode(''utf-8''), ''cli'')'
  - '  File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 788, in parse_json_funcsigs'
  - '    cmd[''sig''] = parse_funcsig(cmd[''sig''])'
  - '  File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 728, in parse_funcsig'
  - '    raise JsonFormat(s)'
  - 'ceph_argparse.JsonFormat: unknown type CephBool'
  stdout: ''
  stdout_lines: <omitted>
```

Having mixed versions of monitors seems to cause this error.
Moving these tasks before any monitor gets upgraded seems to be enough
to get around this issue.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-18 11:16:11 +02:00
Andrew Schoen e2529dcd7f rolling_update: ceph commands should use --cluster
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2019-04-18 10:55:11 +02:00
Andrew Schoen 67453853ff rolling_update: set num_osds to the number of running osds
We do this so that the ceph-config role can most accurately
report the number of osds for the generation of the ceph.conf
file.

We don't want to use ceph-volume to determine the number of
osds because in an upgrade to nautilus ceph-volume won't be able to
accurately count osds created by ceph-disk.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2019-04-18 10:55:11 +02:00
Andrew Schoen 28c47e4d1b rolling_update: migrate ceph-disk osds to ceph-volume
When upgrading to nautlius run ``ceph-volume simple scan`` and
``ceph-volume simple activate --all`` to migrate any running
ceph-disk osds to ceph-volume.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1656460

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2019-04-18 10:55:11 +02:00
Rishabh Dave d5967af7fb allow adding a monitor to a deployed cluster
Add a playbook that deploys a new monitor on a new node, adds that node
to the Ceph cluster and the monitor to the quorum and updates the ceph
configuration file on OSD nodes.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
2019-04-15 10:00:50 +02:00
Dimitri Savineau eb658b3af6 purge-cluster: remove python-ceph-argparse package
When using purge-cluster playbook with nautilus, there's still the
python-ceph-argparse package installed on the host preventing to
reinstall a ceph cluster with a different version (like luminous or
mimic)

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-04-15 09:15:08 +02:00
Dimitri Savineau 150acba8c5 switch-from-non-containerized: stop all osds
e6bfb84 introduced a regression in the switch from non containerized
to container deployment.
We need to stop all previous OSDs services. We just don't need the
ceph-disk pattern in the regex.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-04-11 16:26:53 -04:00
Guillaume Abrioux a1254d767c purge: remove references to ceph-disk
as of stable-4.0, ceph-disk is no longer supported.
These tasks aren't needed anymore.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-11 11:57:02 -04:00
Guillaume Abrioux 73aa788459 shrink-osd: remove legacy playbook
as of stable-4.0, ceph-disk is no longer supported.
Let's remove this legacy version of the playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-11 11:57:02 -04:00
Guillaume Abrioux e6bfb843f4 switch_to_containers: remove ceph-disk references
as of stable-4.0, ceph-disk is no longer supported.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-11 11:57:02 -04:00
Guillaume Abrioux 4d35e9eeed osd: remove variable osd_scenario
As of stable-4.0, the only valid scenario is `lvm`.
Thus, this makes this variable useless.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-11 11:57:02 -04:00
Guillaume Abrioux c1e4529b0e update: fix undefined error when no mgr group is declared
if mgr group isn't defined in inventory, that task will fail with
undefined error.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-11 09:22:35 +02:00
Dimitri Savineau 57b4e76d11 rolling_update: Remove ceph aliases
ceph aliases have been introduced in stable-3.2 during the ceph
deployment. On master this has been removed but we don't handle
this removal in the upgrade from stable-3.2 to master via the
rolling_update playbook.
Also remove the task from purge-docker-cluster missing from
d9e7835

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-04-09 16:50:10 +02:00
Rishabh Dave 192fea0fec add-osds: don't hardcode group names
Instead of hardcoding group names, import ceph-defaults earlier. Also,
rectify a minor mistake in vagrant_varaibles.yml for containerized
version of add_osds.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
2019-04-08 10:09:00 +02:00
Guillaume Abrioux 0180738313 purge: fix lvm-batch purge osd
`lvm_volumes` and/or `devices` variable(s) can be undefined depending on
the scenario chosen.

These tasks should be run only if these variable are defined, otherwise
it ends up with undefined variable errors.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1653307

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-02 14:56:24 +02:00
Dimitri Savineau 7cc626b72d purge-docker-cluster: Remove ceph-osd service
The systemd ceph-osd@.service file used for starting the ceph osd
containers is used in all osd_scenarios.
Currently purging a containerized deployment using the lvm scenario
didn't remove the ceph-osd systemd service.
If the next deployment is a non-containerized deployment, the OSDs
won't be online because the file is still present and override the
one from the package.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-03-28 18:26:38 +00:00
Guillaume Abrioux f55e2b08be remove all NBSPs on master branch
Similar to #3658

Since there's too many changes between master and stable branches let's
commit directly in each branches instead of trying to backport this
commit.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-28 11:57:55 +00:00
Dimitri Savineau c8442f3705 rolling_update: Update systemd unit regex for nvme
The systemd unit regex doesn't handle nvme devices (/dev/nvmeXn1).

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1687828

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-03-26 12:01:00 +00:00
Guillaume Abrioux 78aac3e96a update: followup on edfdc49
all rgw instances should be stopped according to the multiple rgw
instances support added in rolling_update.yml

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux f6e0185146 update: add containerized deployment upgrade support (L->N)
Add a couple of fixes to allow containerized deployments upgrade support
to upgrade from luminous/mimic to nautilus.

- pass CEPH_CONTAINER_IMAGE and CEPH_CONTAINER_BINARY environment
variable to the ceph_key module,
- fix the docker exec command in 'waiting for the containerized monitor
to join the quorum' task according to the `delegate_to` parameter,
- override `docker_exec_cmd` in `ceph-facts` with `mon_host` when
rolling_update is `True`,
- do not run unnecessarily `create_mds_filesystems.yml` when performing an
upgrade.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux 1816b876ee update: add missing hosts in facts gathering
iscsigws were missing.
The 'complete upgrade' couldn't complete because rolling_update was set
to False for iscsigw nodes.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux 45ba90c169 update: remove rbdmirror legacy task
This task is no longer needed for next release.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux 0ea0adf039 update: show all daemons version at the end
Let's display all daemons version at the end of the playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux f31d6d9485 update: enable new nautilus-only functionality
once the cluster is upgraded to nautilus, we can complete the process by
disallowing pre-nautilus OSDs and enabling all new nautilus-only functionality

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux afdaa70a63 update: enable msgr2 protocol
This commit enable the msgr2 protocol when the cluster is fully upgraded
to nautilus

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux ef096dd021 update: ensure mgrs are upgraded after ALL monitors
As of 1c760904b0, ceph-ansible implicitly
bootstrap managers on monitors.
mgrs must be upgraded only after all monitors, therefore, this commit
refact the way mgrs are upgraded to be sure we don't upgrade a mgr
during the monitors upgrade.

This commit also ensure we handle the case were we split managers on
dedicated nodes.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux 7fa2434f0f update: ensure /var/lib/ceph/bootstrap-rbd-mirror is present
This directory is created by ceph-config node by node.
In the upgrade context we need it to be created on ALL monitors as soon
as the first iteration because of the task right after which creates and sends
the keyrings on all monitors.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux 82764afe8d update: mask systemd service units during upgrade
This prevents the packaging from restarting services before we do need
to restart them in the rolling update sequence.
We want to handle services restart at rolling_update playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux 8add55451c update: set osd flags only once
There is no need to set osd flags (noout, norebalance) each time we
upgrade a mon.

This commit moves up those tasks (before stopping the mon) so we don't need
to delegate them.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux f7c6f4e0b6 update: fix tasks waiting for the node to join the quorum
We actually want to ensure the node being upgraded is joining the quorum
instead of the monitor picked up earlier.

Indeed, the `mon_host`is used only in `delegate_to:` so we can still run ceph
commands while the monitor being upgraded is stopped.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00