Commit Graph

483 Commits (4e42d085f7147d38100b7535bc16667e747e9c52)

Author SHA1 Message Date
Dimitri Savineau 5f91be8740 switch_to_containers: umount osd lockbox partition
When switching from a baremetal deployment to a containerized deployment
we only umount the OSD data partition.
If the OSD is encrypted (dmcrypt: true) then there's an additional
partition (part number 5) used for the lockbox and mount in the
/var/lib/ceph/osd-lockbox/ directory.
Because this partition isn't umount then the containerized OSD aren't
able to start. The partition is still mount by the system and can't be
remount from the container.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1616159

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 19edf707a5)
2019-10-08 00:57:05 +00:00
Guillaume Abrioux b325cc386e switch_to_containers: do not re-set `ceph_uid`
This commit refacts the way we set `ceph_uid` fact in `ceph-facts` and
removes all `set_fact` tasks for `ceph_uid` in switch-to-containers playbook
to avoid duplicated code.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fa9b42e98e)
2019-10-07 10:18:17 -04:00
Guillaume Abrioux 468aa5d63b switch_to_containers: optimize ownership change
As per https://github.com/ceph/ceph-ansible/pull/4323#issuecomment-538420164

using `find` command should be faster.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1757400

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-Authored-by: Giulio Fidente <gfidente@redhat.com>
(cherry picked from commit c5d0c90bb7)
2019-10-07 10:18:17 -04:00
Guillaume Abrioux 37fd0b179b update: import ceph-defaults role in first play
Typical error:

```
fatal: [mon0]: FAILED! =>
  msg: |-
    The conditional check 'not delegate_facts_host | bool or inventory_hostname in groups.get(client_group_name, [])' failed. The error was: error while evaluating conditional (not delegate_facts_host | bool or inventory_hostname in groups.get(client_group_name, [])): 'client_group_name' is undefined
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8138d4193c)
2019-10-07 11:21:23 +02:00
Guillaume Abrioux 9a4fcfabe1 main: exclude client nodes from facts gathering when delegate_facts_host
This commit excludes client nodes from facts gathering, they are not
needed and can speed up this task.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 865d2eac9b)
2019-10-07 11:21:23 +02:00
Dimitri Savineau ec1c57f690 dashboard: remove useless block section
The block section were used with the dashboard_enabled condition when
the code was included in the main playbooks.
Because this condition isn't present in the dashboard playbook anymore
we can remove the block section.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit cf47594b47)
2019-10-04 13:28:37 +02:00
Guillaume Abrioux 9a79ed1bf0 rgw: refact tasks directory layout
This commit moves containerized deployment related files to `./tasks/`
directory. This is needed to make `docker-to-podman.yml` working since
we use `tasks_from:` option.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e08194dd67)
2019-10-01 18:50:51 +02:00
Guillaume Abrioux 7f902994b3 rbdmirror: refact tasks directory layout
This commit moves containerized deployment related files to `./tasks/`
directory. This is needed to make `docker-to-podman.yml` working since
we use `tasks_from:` option.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c69816c6b7)
2019-10-01 18:50:51 +02:00
Guillaume Abrioux d7a06c67db iscsigw: refact tasks directory layout
This commit moves containerized deployment related files to `./tasks/
directory. This is needed to make `docker-to-podman.yml` working since
we use `tasks_from:` option.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4636f3f7e2)
2019-10-01 18:50:51 +02:00
Guillaume Abrioux b564c37696 upgrade: add an infra playbook to migrate systemd units to podman
this commit adds a new playbook to force systemd units for containers to
use podman instead of docker.
This is needed in the rhel8 upgrade context so after the base OS is upgraded
containers can be started using podman.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f2017dcda2)
2019-10-01 18:50:51 +02:00
Guillaume Abrioux 4afe1b748c update: reset mon_host after mons upgrade
after all mon are upgraded, let's reset mon_host which is used in the
rest of the playbook for setting `container_exec_cmd` so we are sure to
use the right value.

Typical error:

```
failed: [mds0 -> mon0] (item={u'path': u'/var/lib/ceph/bootstrap-mds/ceph.keyring', u'name': u'client.bootstrap-mds', u'copy_key': True}) => changed=true
  ansible_loop_var: item
  cmd:
  - docker
  - exec
  - ceph-mon-mon2
  - ceph
  - --cluster
  - ceph
  - auth
  - get
  - client.bootstrap-mds
  delta: '0:00:00.016294'
  end: '2019-09-27 13:54:58.828835'
  item:
    copy_key: true
    name: client.bootstrap-mds
    path: /var/lib/ceph/bootstrap-mds/ceph.keyring
  msg: non-zero return code
  rc: 1
  start: '2019-09-27 13:54:58.812541'
  stderr: 'Error response from daemon: No such container: ceph-mon-mon2'
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d84160a170)
2019-09-28 09:01:16 +02:00
Harald Jensås 5fea830414 Replace ipaddr() with ips_in_ranges()
This change implements a filter_plugin that is used in the
ceph-facts, ceph-validate roles and infrastucture-playbooks.
The new filter plugin will return a list of all IP address
that reside in any one of the given IP ranges. The new filter
replaces the use of the ipaddr filter.

ceph.conf already support a comma separated list of CIDRs
for the public_network and cluster_network options.

Changes: [1] and [2] introduced a regression in ceph-ansible
where public_network can no longer be a comma separated list
of cidrs.

With this change a comma separated list of subnet CIDRs can
also be used for monitor_address_block and radosgw_address_block.

[1] commit: d67230b2a2
[2] commit: 20e4852888

Related-To: https://bugs.launchpad.net/tripleo/+bug/1840030
Related-To: https://bugzilla.redhat.com/show_bug.cgi?id=1740283

Closes: #4333
Please backport to stable-4.0

Signed-off-by: Harald Jensås <hjensas@redhat.com>
(cherry picked from commit e695efcaf7)
2019-09-27 17:49:46 +02:00
Sam Choraria 7594bc9181 rolling_update.yml: force ceph-volume scan on osds
The rolling_update.yml playbook fails when scanning ceph-disk osds while
deploying nautilus. The --force flag is required to scan existing osds
and rewrite their json metadata.

Signed-off-by: Sam Choraria <sam.choraria@bbc.co.uk>
(cherry picked from commit 7cc9f93680)
2019-09-26 14:51:59 -04:00
Guillaume Abrioux 96dafd676c infrastructure-playbooks: add filestore-to-bluestore.yml
This playbook helps to migrate all osds on a node from filestore to
bluestore backend.
Note that *ALL* osd on the specified osd nodes will be shrinked and
redeployed.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3f9ccdaa8a)
2019-09-26 16:21:54 +02:00
Guillaume Abrioux 26e0f4db97 lv-create: fix a typo
This commit fixes a typo.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c785ad3637)
2019-09-26 16:21:54 +02:00
Mehdy 8c37894109 shrink-rgw.yml: fix confirmation play's name
the confirmation play's name should confirm removing rgw instead of
monitor

Signed-off-by: Mehdy Khoshnoody <mehdy.khoshnoody@gmail.com>
(cherry picked from commit 9fa98d79fd)
2019-09-25 16:37:44 +02:00
Dimitri Savineau a5775be7c4 shrink-mon: search mon in the quorum_names list
If we're looking at the mon hostname in the ceph status output then
there's some scenarios where this could be true.
If we collocate some services (mons, mgrs, etc..) then the hostname of
the monitor to shrink will still be present in the ceph status (like
in mgrs or other).
Instead we should check the hostame only in the mon part of the output.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 734c0dc310)
2019-09-18 14:47:40 +00:00
Kevin Jones 3a8de9cc36 Set proper ownership command performance improvement
By changing the set ownership command from using the file module in combination with a with_items loop to a raw chown command, we can achieve a 98% performance increase here.

On a ceph cluster with a significant amount of directories and files in /var/lib/ceph, the file module has to run checks on ownership of all those directories and files to determine whether a change is needed.

In this case, we just want to explicitly set the ownership of all these directories and files to the ceph_uid

Added context note to all set proper ownership tasks

Signed-off-by: Kevin Jones <kevinjones@redhat.com>
(cherry picked from commit 47bf47c9d8)
2019-08-22 12:59:58 +02:00
Guillaume Abrioux 236020fb2b shrink-mon: refact 'verify the monitor is out of the cluster' task
use `from_json` filter instead of a `| python` so we can get rid of the
`shell` module usage here.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5573f17e76)
2019-08-19 18:47:14 +00:00
Rishabh Dave b28ed96378 use pre_tasks and post_tasks in shrink-mon.yml too
This commit should've been part of commit
2fb12ae554.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 2034387f57)
2019-08-19 18:47:14 +00:00
Guillaume Abrioux 2f77704591 common: use discovered_interpreter_python fact
in order to use the right binary name when using python cli in command
or shell module.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 13815ad3ca)
2019-08-19 18:47:14 +00:00
Dimitri Savineau f9d9ffac8f dashboard: run dashboard role on mgr/mon nodes
We don't need to execute the ceph-dashboard role on the nodes present
in the grafana-server group. This one is dedicated to the grafana and
prometheus stack.
The ceph-dashboard needs to executed where the ceph-mgr is running. It
is either on the dedicated mgr nodes or if mgr and mon are collocated
implicitly on the mon nodes.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 16939eff9e)
2019-08-08 13:47:09 +02:00
Rishabh Dave 72a062b6fa add a playbook the remove rgw from a given node
Add a playbook named shrink-rgw.yml to infrastructure-playbooks/ that
can remove a RGW from a node in an already deployed Ceph cluster.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 632a44bdf2)
2019-07-31 15:25:15 -04:00
Rishabh Dave 8ca88b41cc infra-playbooks: rewite a condition for better readability
Use facility built-in in Ansible to check whether a command was executed
successfully rather looking at its return value.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 5aecdd3ba6)
2019-07-29 15:52:29 +02:00
Guillaume Abrioux d0ad1cf0f1 dashboard: use dedicated group only
There's no need to add complexity and trying to fallback on other group.
Let's deploy dashboard on all nodes present in grafana-server group.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d67230b2a2)
2019-07-29 15:46:58 +02:00
Dimitri Savineau dd87db70ca dashboard: move code into a dedicated playbook
Move dashboard, grafana/prometheus and node-exporter plays into a
dedicated playbook in infrastructure-playbook directory.
To avoid using 'dashboard_enabled | bool' condition multiple time
in the main playbook we can just import the dashboard playbook or
not.
This patch also allows to use an unique dashboard playbook for
both baremetal and container playbooks.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 43135840b1)
2019-07-29 15:46:58 +02:00
Dimitri Savineau 43d625b59a Remove NBSP characters
Some NBSP are still present in the yaml files.
Adding a test in travis CI.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 07c6695d16)
2019-07-26 16:23:41 -04:00
Guillaume Abrioux bee8a31afe shrink-rbdmirror: check if rbdmirror is well removed from cluster
This commits adds a check to ensure the daemon has been removed from the
cluster.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 916dc1f52f)
2019-07-16 15:02:49 +02:00
Rishabh Dave 0a15d1d112 add a playbook that removes rbd-mirror from a node
Add a playbook named "shrink-rbdmirror.yml" in infrastructure-playbooks/
that removes a RBD Mirror from a node in an already deployed Ceph
cluster.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit c4824acb19)
2019-07-16 15:02:49 +02:00
Rishabh Dave 6197d1c8d9 add a playbook that removes manager from a node
Add a playbook, named "shrink-mgr.yml", in infrastructure-playbooks/
that removes a MGR from a node in an already deployed Ceph cluster.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit f4ea75051b)
2019-07-09 15:00:56 +00:00
Guillaume Abrioux 85a448429d shrink-mds: refact post tasks
This commit refacts the way we check the "mds_to_kill" node is well
stopped.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 7df62fde34)
2019-07-09 12:07:47 +02:00
Rishabh Dave 38c2785e95 add a playbook that removes mds from a node
Add a playbook, named "shrink-mds.yml", in infrastructure-playbooks/
that removes a MDS from a node in an already deployed Ceph cluster.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 235b1fccc6)
2019-07-09 12:07:47 +02:00
Mike Christie cf6050d4e6 igw: Support new ceph-iscsi package during purge
The ceph-iscsi-config and ceph-iscsi-cli packages were combined into
ceph-iscsi and its APIs changed. This fixes up the iscsi purge task to
support the new API and old one.

Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit b163206db7)
2019-07-04 00:04:04 +00:00
Guillaume Abrioux 0a0cdc0963 purge: ensure no ceph kernel thread is present
This tries to first unmount any cephfs/nfs-ganesha mount point on client
nodes, then unmap any mapped rbd devices and finally it tries to remove
ceph kernel modules.
If it fails it means some resources are still busy and should be cleaned
manually before continuing to purge the cluster.
This is done early in the playbook so the cluster stays untouched until
everything is ready for that operation, otherwise if you try to redeploy
a cluster it could end up by getting confused by leftover from previous
deployment.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1337915

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 20e4852888)
2019-06-24 13:20:50 +02:00
Guillaume Abrioux 77d24203fa upgrade: accept HEALTH_OK and HEALTH_WARN as valid state
3a100cfa52 introduced a check which is a
bit too restrictive, let's accept HEALTH_OK and HEALTH_WARN.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6dce51183b)
2019-06-21 15:47:33 +00:00
Dimitri Savineau aa197f77fc remove ceph restapi references
The ceph restapi configuration was only available until Luminous
release so we don't need those leftovers for nautilus+.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit da8b7ab7fb)
2019-06-20 15:15:10 -04:00
Guillaume Abrioux b93064c7c8 rolling_update: fail early if cluster state is not OK
starting an upgrade if the cluster isn't HEALTH_OK isn't a good idea.
Let's check for the cluster status before trying to upgrade.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3a100cfa52)
2019-06-19 08:41:25 +00:00
Guillaume Abrioux 53dd58e84c rolling_update: only mask and stop unit in mgr part
Otherwise it fails like following:

```
fatal: [mon0]: FAILED! => changed=false
  msg: |-
    Unable to enable service ceph-mgr@mon0: Failed to execute operation: Cannot send after transport endpoint shutdown
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 51b2813e04)
2019-06-19 08:41:25 +00:00
Dimitri Savineau 6e565b251d remove ceph-agent role and references
The ceph-agent role was used only for RHCS 2 (jewel) so it's not
usefull anymore.
The current code will fail on CentOS distribution because the rhscon
package is only avaible on Red Hat with the RHCS 2 repository and
this ceph release is supported on stable-3.0 branch.

Resolves: #4020

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 7503098ca0)
2019-06-17 15:56:00 -04:00
L3D 1daca1ba83 ansible: use 'bool' filter on boolean conditionals
By running ceph-ansible there are a lot ``[DEPRECATION WARNING]`` like these:
```
[DEPRECATION WARNING]: evaluating containerized_deployment as a bare variable,
this behaviour will go away and you might need to add |bool to the expression
in the future. Also see CONDITIONAL_BARE_VARS configuration toggle.. This
feature will be removed in version 2.12. Deprecation warnings can be disabled
by setting deprecation_warnings=False in ansible.cfg.
```

Now appended ``| bool`` on a lot of the affected variables.

Sometimes the coding style from ``variable|bool`` changed to ``variable | bool`` *(with spaces at the pipe)*.

Closes: #4022

Signed-off-by: L3D <l3d@c3woc.de>
(cherry picked from commit ab54fe20ec)
2019-06-07 16:05:51 +02:00
Dimitri Savineau 7a384e7ec2 purge-cluster: clean all ceph repo files
We currently only purge rh_storage yum repository file but depending
on the ceph_repository value we are using, the ceph repository file
could have a different name.

Resolves: #4056

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 44c63903ca)
2019-06-07 12:05:40 +00:00
guihecheng a6312ba9bc Add section for purging rgw loadbalancer in purge-cluster.yml
Signed-off-by: guihecheng <guihecheng@cmiot.chinamobile.com>
(cherry picked from commit 59e702ec39)
2019-06-06 19:44:30 +00:00
Guillaume Abrioux 16c6d530c6 roles: introduce `ceph-container-engine` role
This commit splits the current `ceph-container-common` role.

This introduces a new role `ceph-container-engine` which handles the
tasks specific to the installation of containers tools (docker/podman).

This is needed for the ceph-dashboard implementation for 2 main reasons:

1/ Since the ceph-dashboard stack is only containerized, we must install
everything needed to run containers even in non containerized
deployments. Splitting this role allows us to not have to call the full
`ceph-container-common` role which would run a bunch of unneeded tasks
that would have been skipped anyway.

2/ The current implementation would have required to run
`ceph-container-common` on all ceph-clients nodes which would have been
conflicting with 9d3517c670 (we don't want
to run ceph-container-common on all client nodes, see mentioned commit
for more details)

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 55420d6253)
2019-05-22 15:24:11 -04:00
Guillaume Abrioux d83db2c8ed switch to ansible 2.8
- remove private attribute with import_role.
- update documentation.
- update rpm spec requirement.
- fix MagicMock python import in unit tests.

Closes: #3765

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 72d8315299)
2019-05-21 09:17:46 +02:00
Dimitri Savineau 023cdffd95 purge-docker-cluster: don't remove data on atomic
Because we don't manage the docker service on atomic (yet) via the
ceph-container-common role then we can't stop docker dans remove
the data.
For now let's do that only for non atomic hosts.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 638604929b)
2019-05-17 10:44:52 -04:00
Guillaume Abrioux e29fd842a6 rename docker_exec_cmd variable
This commit renames the `docker_exec_cmd` variable to
`container_exec_cmd` so it's more generic.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e74d80e72f)
2019-05-17 16:05:58 +02:00
Zack Cerza 0496ce8e5c purge-docker-cluster.yml: Default lvm_volumes
We were failing when that variable is unset; purge-cluster.yml contains
this workaround.

Signed-off-by: Zack Cerza <zack@redhat.com>
(cherry picked from commit 9b4339a2ba)
2019-05-17 16:05:58 +02:00
Boris Ranto 5ac7559736 Merge cephmetrics/dashboard-ansible repo
This commit will merge dashboard-ansible installation scripts with
ceph-ansible. This includes several new roles to setup ceph-dashboard
and the underlying technologies like prometheus and grafana server.

Signed-off-by: Boris Ranto & Zack Cerza <team-gmeno@redhat.com>
Co-authored-by: Zack Cerza <zcerza@redhat.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2f141a6e80)
2019-05-17 16:05:58 +02:00
wumingqiao 30b1ca9aeb shrink_osd: mark all osd(s) out in one command
Signed-off-by: wumingqiao <wumingqiao@beyondcent.com>
(cherry picked from commit 5320aa11c4)
2019-05-15 21:44:30 -04:00
Dimitri Savineau 1e23d853f9 purge-docker-cluster: remove docker data
We never clean the content of /var/lib/docker so we can still have
some data present in this directory after run the purge playbook.
Pip isn't used anymore.
Also update the docker package name (especially the python binding
one).

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 168d7cd016)
2019-05-14 11:00:30 +02:00