Commit Graph

5081 Commits (87e1f0cc6c790e14893bfc18d9c37daad7126f38)
 

Author SHA1 Message Date
Boris Ranto fda901fff9 dashboard: Add and copy alerting rules
This commit adds a list of alerting rules for ceph-dashboard from the
old cephmetrics project. It also installs the configuration file so that
the rules get recognized by the prometheus server.

Signed-off-by: Boris Ranto <branto@redhat.com>
(cherry picked from commit 8f77caa932)
2019-05-17 16:05:58 +02:00
Zack Cerza 0496ce8e5c purge-docker-cluster.yml: Default lvm_volumes
We were failing when that variable is unset; purge-cluster.yml contains
this workaround.

Signed-off-by: Zack Cerza <zack@redhat.com>
(cherry picked from commit 9b4339a2ba)
2019-05-17 16:05:58 +02:00
Boris Ranto 5ac7559736 Merge cephmetrics/dashboard-ansible repo
This commit will merge dashboard-ansible installation scripts with
ceph-ansible. This includes several new roles to setup ceph-dashboard
and the underlying technologies like prometheus and grafana server.

Signed-off-by: Boris Ranto & Zack Cerza <team-gmeno@redhat.com>
Co-authored-by: Zack Cerza <zcerza@redhat.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2f141a6e80)
2019-05-17 16:05:58 +02:00
wumingqiao 30b1ca9aeb shrink_osd: mark all osd(s) out in one command
Signed-off-by: wumingqiao <wumingqiao@beyondcent.com>
(cherry picked from commit 5320aa11c4)
2019-05-15 21:44:30 -04:00
Guillaume Abrioux 0f9df389df tests: fix a typo in dev_setup.yml
c907ec41ae introduced a typo.
This commit fixes it.

```
[WARNING]: While constructing a mapping from /home/guits/ceph-ansible/tests/functional/dev_setup.yml, line 21, column 9, found a duplicate dict key (replace).
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2798774e96)
2019-05-15 13:35:55 +02:00
Dimitri Savineau 1e23d853f9 purge-docker-cluster: remove docker data
We never clean the content of /var/lib/docker so we can still have
some data present in this directory after run the purge playbook.
Pip isn't used anymore.
Also update the docker package name (especially the python binding
one).

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 168d7cd016)
2019-05-14 11:00:30 +02:00
Dimitri Savineau bd33bcef2b container-common: allow podman for other distros
Currently podman installation is very tied to RHEL 8 even if we're
able to install it on Debian/Ubuntu distribution.
This patch changes the way we are starting or not the (fat) container
daemon. Before the condition was based on the distribution release
and now on the container_service_name variable.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d2ad191eca)
2019-05-13 10:36:22 -04:00
Bruceforce f34c1dcd9d ceph-nfs: fixed with_items
If we do this in one line we get the error described in #3968

fixes #3968

Signed-off-by: Bruceforce <markus.greis@gmx.de>
(cherry picked from commit c3b0ee30a1)
2019-05-13 10:36:12 -04:00
Dimitri Savineau 6814fd5ce5 gather-ceph-logs: fix logs list generation
The shell module doesn't have a stdout_lines attributes. Instead of
using the shell module, we can use the find modules.

Also adding `become: false` to the local tmp directory creation
otherwise we won't have enough right to fetch the files into this
directory.

Resolves: #3966

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ea1f8f551c)
2019-05-13 10:33:26 -04:00
Dimitri Savineau 6a48ff8a37 Update RHCS version with Nautilus
RHCS 4 will be based on Nautilus and only usable on RHEL 8.
Updated the default ceph_rhcs_version to 4 and update the rhcs
repositories to rhcs 4 with RHEL 8.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ba49225eab)
2019-05-13 16:23:24 +02:00
Bruceforce a007be17b7 ceph-nfs: fixed condition for "stable repos specific tasks"
The old condition would resolve to
"when": "nfs_ganesha_stable - ceph_repository == 'community'"

now it is
"when": [
          "nfs_ganesha_stable",
          "ceph_repository == 'community'"
        ]

Please backport to stable-4.0

Signed-off-by: Bruceforce <markus.greis@gmx.de>
(cherry picked from commit 29f2c953b4)
2019-05-13 11:05:40 +02:00
Kevin Coakley e1b5b20111 Set the rgw_create_pools pools application to rgw
Set the application to rgw for pools created from rgw_create_pools. On Ceph Nautilus the heath is set to HEALTH_WARN with the message "application not enabled on X pool(s)" if an application isn't specified for a pool.

Signed-off-by: Kevin Coakley <kcoakley@sdsc.edu>
(cherry picked from commit 381c58ca3e)
2019-05-13 11:05:14 +02:00
Rishabh Dave 8959ed50a5 ceph-mds: group similar tasks in create_mds_filesystem.yml
Group similar tasks together using block keyword.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 1a4dccdbb9)
2019-05-10 15:54:40 +02:00
Rishabh Dave 238a2696a6 ceph-rbd-mirror: refactor tasks/main.yml
Use blocks for similar tasks in main.yml. And move when keywords before
block keywords.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 121b5e4184)
2019-05-10 15:54:16 +02:00
Mike Christie 78a55a3df3 igw: Fix rolling update service ordering
We must stop tcmu-runner after the other rbd-target-* services
because they may need to interact with tcmu-runner during shutdown.
There is also a bug in some kernels where IO can get stuck in the
kernel and by stopping rbd-target-* first we can make sure all IO is
flushed.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1659611

Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit d7ef12910e)
2019-05-10 15:53:44 +02:00
Dimitri Savineau 975987d043 tox: Refact lvm_osds scenario
The current lvm_osds only tests filestore on one OSD node.
We also have bs_lvm_osds to test bluestore and encryption.
Let's use only one scenario to test filestore/bluestore and with or
without dmcrypt on four OSD nodes.
Also use validate_dmcrypt_bool_value instead of types.boolean on
dmcrypt validation via notario.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 52b9f3fb28)
2019-05-09 13:11:33 +02:00
Guillaume Abrioux cc6127d669 facts: fix external cluster bug
running an external ceph cluster deployment with (obviously) no
monitors defined in inventory breaks with an undefined error because
`_monitor_addresses` never get defined.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1707460

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 936c6fca78)
2019-05-09 08:30:33 +02:00
Rishabh Dave 9e6b2e3bc5 don't access other node's docker_exec_cmd variable
Except for some corner case, it's not correct to access some other
node's copy of variable docker_exec_cmd. Therefore replace
"hostvars[groups[mon_group_name][0]]['docker_exec_cmd']" by
"docker_exec_cmd".

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 89748d579a)
2019-05-07 17:56:30 +02:00
Rishabh Dave df95900913 ceph-mgr: create keys for MGRs
Add code in ceph-mgr for creating a keyring for manager in so that
managers can be deployed on a separate node too.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 56bfec7c58)
2019-05-07 15:12:29 +02:00
Rishabh Dave ab50486dd9 allow adding a manager to a deployed cluster
Add a playbook that deploys manager on a new node and adds that node to
the already deployed Ceph cluster.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit d2cfd8b780)
2019-05-07 15:12:29 +02:00
Rishabh Dave a56bbb46fe allow adding a RGW to already deployed cluster
Add a tox scenario that adds a new RGW node as a part of already
deployed Ceph cluster and deploys RGW there.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit f201222447)

Conflicts:
	tox.ini
replaced "dev" and "nautilus" during cherry-pick.
2019-05-07 14:03:29 +02:00
Rishabh Dave b6d5352783 remove infrastructure-playbooks/rgw-standalone.yml
We don't need infrastructure-playbooks/rgw-standalone.yml since
site.yml.sample and site-cotainer.yml.sample can add a new RGW node to
an already deployed Ceph cluster.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 6e8fb2b3ea)
2019-05-07 13:11:48 +02:00
Gaudenz Steinlin 29650e71d8 Fix check mode support
Adds "check_mode: no" to commands which register cluster state in a
variable and don't modify anything. These commands have to run in order
to support running the playbook in check mode.

Signed-off-by: Gaudenz Steinlin <gaudenz.steinlin@cloudscale.ch>
(cherry picked from commit 3c8987c7a5)
2019-05-07 13:07:45 +02:00
Rishabh Dave ae9fa7ca09 allow adding a RBD mirror to already deployed cluster
Add a tox scenario that adds a new RBD mirror node as a part of already
deployed Ceph cluster and deploys RBD mirror there.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 221b2b4988)

Conflicts:
	tox.ini
"dev" was to replaced by "nautilus" in "envlist"
2019-05-07 11:37:53 +02:00
letterwuyu 27a8179cd8 Fix comment content
Signed-off-by: lishuhao letterwuyu@gmail.com
(cherry picked from commit d57f6fcdc6)
2019-05-07 11:11:22 +02:00
Rishabh Dave 06b3ab2a6b improve coding style
Keywords requiring only one item shouldn't express it by creating a
list with single item.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 739a662c80)

Conflicts:
	roles/ceph-mon/tasks/ceph_keys.yml
	roles/ceph-validate/tasks/check_devices.yml
2019-05-06 15:09:06 +00:00
Dimitri Savineau 4752327340 ansible: remove private and static attribute
This will be removed in ansible 2.8 and breaks the playbook execution
with this release.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ae266c6f2b)
2019-05-02 20:21:26 -04:00
Dimitri Savineau 2eb7642ad3 ceph-mds: Increase cpu limit to 4
In containerized deployment the default mds cpu quota is too low
for production environment.
This is causing performance degradation compared to bare-metal.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1695850

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 1999cf3d19)
2019-04-30 12:12:01 -04:00
Dimitri Savineau d8688e0eb9 ceph-osd: Increase cpu limit to 4
In containerized deployment the default osd cpu quota is too low
for production environment using NVMe devices.
This is causing performance degradation compared to bare-metal.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1695880

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c17106874c)
2019-04-30 12:11:42 -04:00
Dimitri Savineau 92340d049c rolling_update: restart all ceph-iscsi services
Currently only rbd-target-gw service is restarted during an update.
We also need to restart tcmu-runner and rbd-target-api services
during the ceph iscsi upgrade.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1659611

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f1048627ea)
2019-04-30 12:09:52 -04:00
Dimitri Savineau e29a8a1f31 ceph-iscsi: start tcmu-runner for non-container
Only rbd-target-api and rbd-target-gw were started/enabled for non
containerized deployment.
The issue doesn't happen with containerized setup.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 4ae5ce399b)
2019-04-29 23:03:59 +00:00
Dimitri Savineau 7c62aaa78a tests: group and parametrize tests
Instead of creating a dedicated test and using the same testinfra
module we can group them into a single test to avoid multiple ansible
connections and testinfra module execution.
This patch also adds parametrize pytest decorator when possible.
Finally fixing some flake minor issue.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 564ec9c992)
2019-04-29 23:03:59 +00:00
Rishabh Dave ebd2ae520d ceph-config: remove redundant condition on a block
Signed-off-by: Rishabh Dave <ridave@redhat.com>
2019-04-25 13:51:58 +02:00
Dimitri Savineau 748605293e tox: Remove update scenario reference
update scenario is now handled by tox-update.ini file so we shoudn't
have update reference in tox.ini file.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 8ab6a3391f)
2019-04-24 15:12:24 -04:00
Dimitri Savineau 690336aabd Update group_vars according to defaults
b2f2426 didn't use the generate_group_vars_sample.sh script so we
currently have a difference between the content in group_vars and the
ceph-defaults/defaults directories.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 1eeddc394d)
2019-04-24 15:10:42 -04:00
Rishabh Dave cad35d5c52 "when" keyword should precede "block" keyword
Otherwise the reader is forced to search for "when" when blocks are too
long.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit e0beaf123a)

Conflicts:
	roles/ceph-config/tasks/main.yml
	roles/ceph-container-common/tasks/pre_requisites/prerequisites.yml
	roles/ceph-validate/tasks/check_devices.yml
2019-04-24 16:25:43 +02:00
Guillaume Abrioux f360f8d1a9 validate: fix a typo
5aa2779461 introduced a typo.
This commit fixes it.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d6e28ffd27)
2019-04-23 10:19:42 -04:00
Guillaume Abrioux fa388f51cb validate: fix notario error
Typical error:

```
AttributeError: 'Invalid' object has no attribute 'message'
```

As of python 2.6, `BaseException.message` has been deprecated.
When using python3, it fails because it has been removed.

Let's use `str(error)` instead so we don't hit this error when using
python3.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2326180bf9)
2019-04-23 09:47:42 -04:00
Kyle Bader cd0eddc460 rgw: add cpuset support
1/ The OSD already supports cpuset to be used for containerized deployments
through the use of the ceph_osd_docker_cpuset_cpus variable. This adds similar
support to the RGW service for containerized deployments by setting a new
variable named ceph_rgw_docker_cpuset_cpus. Like the OSD, there are times where
using distinct cores has advantages over using the CFS in kernel scheduler.

ceph_rgw_docker_cpuset_cpus accepts a comma delimited set of CPU ids

2/ Add support for specifying --cpuset-mem variable to restrict the cgroup's memory
allocations to a particular numa node, which should typically correspond with
the cpu ids of that numa node that were provided with --cpuset-cpus. To ensure
the correct cpu ids are used one can run `numactl --hardware`  to list the nodes
and which cpu ids correspond to each.

Signed-off-by: Kyle Bader <kbader@redhat.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0bee90b201)
2019-04-23 09:09:32 +02:00
Radu Toader 6e02e5faae Allow CephFS pool to be created with specific rule_name, erasure_profile just like rbd pools
Signed-off-by: Radu Toader <radu.m.toader@gmail.com>
(cherry picked from commit b2f242660e)
2019-04-20 06:40:08 +00:00
Dimitri Savineau f770917517 ceph-container-common: modify requirement flow
Until now it was not possible to install a specific container package
because it was somehow hardcoded.
This patch allows to override the container package name (docker.io
vs docker-ce) and refacts the package installation. This could be
achieve via the container_package_name variable.
Instead of using one task per distribution we can set the package and
service name in vars. This allows to have a unified package task.
Also refactorize the debian_prerequisites tasks because the content
was outdated.

https://docs.docker.com/install/linux/docker-ce/debian/
https://docs.docker.com/install/linux/docker-ce/ubuntu/

Resolves: #3609

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 8105a1cefb)
2019-04-19 04:07:22 +00:00
Andrew Schoen f1e04835f4 rolling_update: ceph commands should use --cluster
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit e2529dcd7f)
2019-04-18 19:12:13 +02:00
Andrew Schoen 545d93aae8 rolling_update: set num_osds to the number of running osds
We do this so that the ceph-config role can most accurately
report the number of osds for the generation of the ceph.conf
file.

We don't want to use ceph-volume to determine the number of
osds because in an upgrade to nautilus ceph-volume won't be able to
accurately count osds created by ceph-disk.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 67453853ff)
2019-04-18 19:12:13 +02:00
Andrew Schoen 1e0e50fc90 ceph-osd: do not run lvm batch tasks during update
When performing a rolling update do not try to create
any new osds with `ceph-volume lvm batch`. This is troublesome
because when upgrading to nautilus the devices list might contain
devices that are currently being used by ceph-disk and have GPT
headers on them, which will cause ceph-volume to fail when
trying to use such a device. Any devices originally created
by ceph-disk will need to be removed from the devices list
before any new osds can be created.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 5e3dfe5021)
2019-04-18 19:12:13 +02:00
Andrew Schoen ba7d4c4954 tests: adds the migrate_ceph_disk_to_ceph_volume scenario
This test deploys a luminous cluster with ceph-disk created osds
and then upgrades to nautilus and migrates those osds to ceph-volume.
The nodes are then rebooted and cluster state verified.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 399a821439)
2019-04-18 19:12:13 +02:00
Andrew Schoen c28388bb06 rolling_update: migrate ceph-disk osds to ceph-volume
When upgrading to nautlius run ``ceph-volume simple scan`` and
``ceph-volume simple activate --all`` to migrate any running
ceph-disk osds to ceph-volume.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1656460

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 28c47e4d1b)
2019-04-18 19:12:13 +02:00
Dimitri Savineau 2d3c636fa8 ceph-mgr: Add extra module packages
Since Nautilus there's mgr extra modules not present in ceph-mgr
package but in dedicated packages.

Resolves: #3860

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 86315272c7)
2019-04-18 19:10:31 +02:00
Guillaume Abrioux 35afd6a63a update: ensure tasks are executed on an upgraded mon
These tasks must be run from a monitor which is upgraded otherwise it
might fail.
See: https://tracker.ceph.com/issues/39355

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7eb42c9e8e)
2019-04-18 19:10:10 +02:00
Guillaume Abrioux 495711f296 update: ensure ceph command returns 0
these commands could return something else than 0.
Let's ensure all retries have been done before actually failing.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ed84325b1d)
2019-04-18 19:10:10 +02:00
Guillaume Abrioux 4a678ac102 update: set osd flags before upgrading any mon
Typical error:

```
failed: [mon0 -> mon2] (item=noout) => changed=true
  cmd:
  - ceph
  - --cluster
  - ceph
  - osd
  - set
  - noout
  delta: '0:00:00.293756'
  end: '2019-04-17 06:31:57.552386'
  item: noout
  msg: non-zero return code
  rc: 1
  start: '2019-04-17 06:31:57.258630'
  stderr: |-
    Traceback (most recent call last):
      File "/bin/ceph", line 1222, in <module>
        retval = main()
      File "/bin/ceph", line 1146, in main
        sigdict = parse_json_funcsigs(outbuf.decode('utf-8'), 'cli')
      File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 788, in parse_json_funcsigs
        cmd['sig'] = parse_funcsig(cmd['sig'])
      File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 728, in parse_funcsig
        raise JsonFormat(s)
    ceph_argparse.JsonFormat: unknown type CephBool
  stderr_lines:
  - 'Traceback (most recent call last):'
  - '  File "/bin/ceph", line 1222, in <module>'
  - '    retval = main()'
  - '  File "/bin/ceph", line 1146, in main'
  - '    sigdict = parse_json_funcsigs(outbuf.decode(''utf-8''), ''cli'')'
  - '  File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 788, in parse_json_funcsigs'
  - '    cmd[''sig''] = parse_funcsig(cmd[''sig''])'
  - '  File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 728, in parse_funcsig'
  - '    raise JsonFormat(s)'
  - 'ceph_argparse.JsonFormat: unknown type CephBool'
  stdout: ''
  stdout_lines: <omitted>
```

Having mixed versions of monitors seems to cause this error.
Moving these tasks before any monitor gets upgraded seems to be enough
to get around this issue.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 543d1e2e41)
2019-04-18 19:10:10 +02:00