Commit Graph

4715 Commits (15c745d998cc0a3bb8cac6908131aa7671e476b7)
 

Author SHA1 Message Date
Guillaume Abrioux cc6127d669 facts: fix external cluster bug
running an external ceph cluster deployment with (obviously) no
monitors defined in inventory breaks with an undefined error because
`_monitor_addresses` never get defined.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1707460

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 936c6fca78)
2019-05-09 08:30:33 +02:00
Rishabh Dave 9e6b2e3bc5 don't access other node's docker_exec_cmd variable
Except for some corner case, it's not correct to access some other
node's copy of variable docker_exec_cmd. Therefore replace
"hostvars[groups[mon_group_name][0]]['docker_exec_cmd']" by
"docker_exec_cmd".

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 89748d579a)
2019-05-07 17:56:30 +02:00
Rishabh Dave df95900913 ceph-mgr: create keys for MGRs
Add code in ceph-mgr for creating a keyring for manager in so that
managers can be deployed on a separate node too.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 56bfec7c58)
2019-05-07 15:12:29 +02:00
Rishabh Dave ab50486dd9 allow adding a manager to a deployed cluster
Add a playbook that deploys manager on a new node and adds that node to
the already deployed Ceph cluster.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit d2cfd8b780)
2019-05-07 15:12:29 +02:00
Rishabh Dave a56bbb46fe allow adding a RGW to already deployed cluster
Add a tox scenario that adds a new RGW node as a part of already
deployed Ceph cluster and deploys RGW there.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit f201222447)

Conflicts:
	tox.ini
replaced "dev" and "nautilus" during cherry-pick.
2019-05-07 14:03:29 +02:00
Rishabh Dave b6d5352783 remove infrastructure-playbooks/rgw-standalone.yml
We don't need infrastructure-playbooks/rgw-standalone.yml since
site.yml.sample and site-cotainer.yml.sample can add a new RGW node to
an already deployed Ceph cluster.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 6e8fb2b3ea)
2019-05-07 13:11:48 +02:00
Gaudenz Steinlin 29650e71d8 Fix check mode support
Adds "check_mode: no" to commands which register cluster state in a
variable and don't modify anything. These commands have to run in order
to support running the playbook in check mode.

Signed-off-by: Gaudenz Steinlin <gaudenz.steinlin@cloudscale.ch>
(cherry picked from commit 3c8987c7a5)
2019-05-07 13:07:45 +02:00
Rishabh Dave ae9fa7ca09 allow adding a RBD mirror to already deployed cluster
Add a tox scenario that adds a new RBD mirror node as a part of already
deployed Ceph cluster and deploys RBD mirror there.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 221b2b4988)

Conflicts:
	tox.ini
"dev" was to replaced by "nautilus" in "envlist"
2019-05-07 11:37:53 +02:00
letterwuyu 27a8179cd8 Fix comment content
Signed-off-by: lishuhao letterwuyu@gmail.com
(cherry picked from commit d57f6fcdc6)
2019-05-07 11:11:22 +02:00
Rishabh Dave 06b3ab2a6b improve coding style
Keywords requiring only one item shouldn't express it by creating a
list with single item.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 739a662c80)

Conflicts:
	roles/ceph-mon/tasks/ceph_keys.yml
	roles/ceph-validate/tasks/check_devices.yml
2019-05-06 15:09:06 +00:00
Dimitri Savineau 4752327340 ansible: remove private and static attribute
This will be removed in ansible 2.8 and breaks the playbook execution
with this release.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ae266c6f2b)
2019-05-02 20:21:26 -04:00
Dimitri Savineau 2eb7642ad3 ceph-mds: Increase cpu limit to 4
In containerized deployment the default mds cpu quota is too low
for production environment.
This is causing performance degradation compared to bare-metal.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1695850

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 1999cf3d19)
2019-04-30 12:12:01 -04:00
Dimitri Savineau d8688e0eb9 ceph-osd: Increase cpu limit to 4
In containerized deployment the default osd cpu quota is too low
for production environment using NVMe devices.
This is causing performance degradation compared to bare-metal.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1695880

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c17106874c)
2019-04-30 12:11:42 -04:00
Dimitri Savineau 92340d049c rolling_update: restart all ceph-iscsi services
Currently only rbd-target-gw service is restarted during an update.
We also need to restart tcmu-runner and rbd-target-api services
during the ceph iscsi upgrade.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1659611

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f1048627ea)
2019-04-30 12:09:52 -04:00
Dimitri Savineau e29a8a1f31 ceph-iscsi: start tcmu-runner for non-container
Only rbd-target-api and rbd-target-gw were started/enabled for non
containerized deployment.
The issue doesn't happen with containerized setup.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 4ae5ce399b)
2019-04-29 23:03:59 +00:00
Dimitri Savineau 7c62aaa78a tests: group and parametrize tests
Instead of creating a dedicated test and using the same testinfra
module we can group them into a single test to avoid multiple ansible
connections and testinfra module execution.
This patch also adds parametrize pytest decorator when possible.
Finally fixing some flake minor issue.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 564ec9c992)
2019-04-29 23:03:59 +00:00
Rishabh Dave ebd2ae520d ceph-config: remove redundant condition on a block
Signed-off-by: Rishabh Dave <ridave@redhat.com>
2019-04-25 13:51:58 +02:00
Dimitri Savineau 748605293e tox: Remove update scenario reference
update scenario is now handled by tox-update.ini file so we shoudn't
have update reference in tox.ini file.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 8ab6a3391f)
2019-04-24 15:12:24 -04:00
Dimitri Savineau 690336aabd Update group_vars according to defaults
b2f2426 didn't use the generate_group_vars_sample.sh script so we
currently have a difference between the content in group_vars and the
ceph-defaults/defaults directories.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 1eeddc394d)
2019-04-24 15:10:42 -04:00
Rishabh Dave cad35d5c52 "when" keyword should precede "block" keyword
Otherwise the reader is forced to search for "when" when blocks are too
long.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit e0beaf123a)

Conflicts:
	roles/ceph-config/tasks/main.yml
	roles/ceph-container-common/tasks/pre_requisites/prerequisites.yml
	roles/ceph-validate/tasks/check_devices.yml
2019-04-24 16:25:43 +02:00
Guillaume Abrioux f360f8d1a9 validate: fix a typo
5aa2779461 introduced a typo.
This commit fixes it.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d6e28ffd27)
2019-04-23 10:19:42 -04:00
Guillaume Abrioux fa388f51cb validate: fix notario error
Typical error:

```
AttributeError: 'Invalid' object has no attribute 'message'
```

As of python 2.6, `BaseException.message` has been deprecated.
When using python3, it fails because it has been removed.

Let's use `str(error)` instead so we don't hit this error when using
python3.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2326180bf9)
2019-04-23 09:47:42 -04:00
Kyle Bader cd0eddc460 rgw: add cpuset support
1/ The OSD already supports cpuset to be used for containerized deployments
through the use of the ceph_osd_docker_cpuset_cpus variable. This adds similar
support to the RGW service for containerized deployments by setting a new
variable named ceph_rgw_docker_cpuset_cpus. Like the OSD, there are times where
using distinct cores has advantages over using the CFS in kernel scheduler.

ceph_rgw_docker_cpuset_cpus accepts a comma delimited set of CPU ids

2/ Add support for specifying --cpuset-mem variable to restrict the cgroup's memory
allocations to a particular numa node, which should typically correspond with
the cpu ids of that numa node that were provided with --cpuset-cpus. To ensure
the correct cpu ids are used one can run `numactl --hardware`  to list the nodes
and which cpu ids correspond to each.

Signed-off-by: Kyle Bader <kbader@redhat.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0bee90b201)
2019-04-23 09:09:32 +02:00
Radu Toader 6e02e5faae Allow CephFS pool to be created with specific rule_name, erasure_profile just like rbd pools
Signed-off-by: Radu Toader <radu.m.toader@gmail.com>
(cherry picked from commit b2f242660e)
2019-04-20 06:40:08 +00:00
Dimitri Savineau f770917517 ceph-container-common: modify requirement flow
Until now it was not possible to install a specific container package
because it was somehow hardcoded.
This patch allows to override the container package name (docker.io
vs docker-ce) and refacts the package installation. This could be
achieve via the container_package_name variable.
Instead of using one task per distribution we can set the package and
service name in vars. This allows to have a unified package task.
Also refactorize the debian_prerequisites tasks because the content
was outdated.

https://docs.docker.com/install/linux/docker-ce/debian/
https://docs.docker.com/install/linux/docker-ce/ubuntu/

Resolves: #3609

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 8105a1cefb)
2019-04-19 04:07:22 +00:00
Andrew Schoen f1e04835f4 rolling_update: ceph commands should use --cluster
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit e2529dcd7f)
2019-04-18 19:12:13 +02:00
Andrew Schoen 545d93aae8 rolling_update: set num_osds to the number of running osds
We do this so that the ceph-config role can most accurately
report the number of osds for the generation of the ceph.conf
file.

We don't want to use ceph-volume to determine the number of
osds because in an upgrade to nautilus ceph-volume won't be able to
accurately count osds created by ceph-disk.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 67453853ff)
2019-04-18 19:12:13 +02:00
Andrew Schoen 1e0e50fc90 ceph-osd: do not run lvm batch tasks during update
When performing a rolling update do not try to create
any new osds with `ceph-volume lvm batch`. This is troublesome
because when upgrading to nautilus the devices list might contain
devices that are currently being used by ceph-disk and have GPT
headers on them, which will cause ceph-volume to fail when
trying to use such a device. Any devices originally created
by ceph-disk will need to be removed from the devices list
before any new osds can be created.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 5e3dfe5021)
2019-04-18 19:12:13 +02:00
Andrew Schoen ba7d4c4954 tests: adds the migrate_ceph_disk_to_ceph_volume scenario
This test deploys a luminous cluster with ceph-disk created osds
and then upgrades to nautilus and migrates those osds to ceph-volume.
The nodes are then rebooted and cluster state verified.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 399a821439)
2019-04-18 19:12:13 +02:00
Andrew Schoen c28388bb06 rolling_update: migrate ceph-disk osds to ceph-volume
When upgrading to nautlius run ``ceph-volume simple scan`` and
``ceph-volume simple activate --all`` to migrate any running
ceph-disk osds to ceph-volume.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1656460

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 28c47e4d1b)
2019-04-18 19:12:13 +02:00
Dimitri Savineau 2d3c636fa8 ceph-mgr: Add extra module packages
Since Nautilus there's mgr extra modules not present in ceph-mgr
package but in dedicated packages.

Resolves: #3860

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 86315272c7)
2019-04-18 19:10:31 +02:00
Guillaume Abrioux 35afd6a63a update: ensure tasks are executed on an upgraded mon
These tasks must be run from a monitor which is upgraded otherwise it
might fail.
See: https://tracker.ceph.com/issues/39355

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7eb42c9e8e)
2019-04-18 19:10:10 +02:00
Guillaume Abrioux 495711f296 update: ensure ceph command returns 0
these commands could return something else than 0.
Let's ensure all retries have been done before actually failing.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ed84325b1d)
2019-04-18 19:10:10 +02:00
Guillaume Abrioux 4a678ac102 update: set osd flags before upgrading any mon
Typical error:

```
failed: [mon0 -> mon2] (item=noout) => changed=true
  cmd:
  - ceph
  - --cluster
  - ceph
  - osd
  - set
  - noout
  delta: '0:00:00.293756'
  end: '2019-04-17 06:31:57.552386'
  item: noout
  msg: non-zero return code
  rc: 1
  start: '2019-04-17 06:31:57.258630'
  stderr: |-
    Traceback (most recent call last):
      File "/bin/ceph", line 1222, in <module>
        retval = main()
      File "/bin/ceph", line 1146, in main
        sigdict = parse_json_funcsigs(outbuf.decode('utf-8'), 'cli')
      File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 788, in parse_json_funcsigs
        cmd['sig'] = parse_funcsig(cmd['sig'])
      File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 728, in parse_funcsig
        raise JsonFormat(s)
    ceph_argparse.JsonFormat: unknown type CephBool
  stderr_lines:
  - 'Traceback (most recent call last):'
  - '  File "/bin/ceph", line 1222, in <module>'
  - '    retval = main()'
  - '  File "/bin/ceph", line 1146, in main'
  - '    sigdict = parse_json_funcsigs(outbuf.decode(''utf-8''), ''cli'')'
  - '  File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 788, in parse_json_funcsigs'
  - '    cmd[''sig''] = parse_funcsig(cmd[''sig''])'
  - '  File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 728, in parse_funcsig'
  - '    raise JsonFormat(s)'
  - 'ceph_argparse.JsonFormat: unknown type CephBool'
  stdout: ''
  stdout_lines: <omitted>
```

Having mixed versions of monitors seems to cause this error.
Moving these tasks before any monitor gets upgraded seems to be enough
to get around this issue.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 543d1e2e41)
2019-04-18 19:10:10 +02:00
Guillaume Abrioux b4377f6163 update: refact msgr2 migration
this commit refact the msgr2 protocol introduction.

If it's a fresh install, let's go with v2 only.
If we upgrade to nautilus, we should go with v2+v1 syntax to ensure
nothing breaks.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a4bc7bda51)
2019-04-18 19:10:10 +02:00
Dimitri Savineau 84d6bb226b ceph-iscsi-gw: Remove library directory
The library directory that contain the custom ceph modules in present
in the ceph-ansible root directory.
All igw_* mocules are already present there so we don't need the one
present in roles/ceph-iscsi-gw/library.
Also remove the associated spec file.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c8814d1331)
2019-04-18 16:32:58 +02:00
Guillaume Abrioux 6b5487d1e5 mds: remove legacy task
this task has nothing to do in stable-4.0 and after.
Let's remove it since stable-4.0 and after aren't intended to deploy
luminous.

Closes: #3873

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 58f3851573)
2019-04-18 10:15:43 -04:00
Dimitri Savineau e83f02ad71 test_mons: test mon listening on port 3300
Since nautilus and msgr2 the monitors also bind on port 3300 in
addition of 6789.
This patch updates test_mons to reflect that change.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c84a74592a)
2019-04-18 09:28:23 +02:00
Dimitri Savineau 85489315fa test_osds: remove scenario leftover
Since there's only only scenario available we don't need lvm_scenario
and no_lvm_scenario.
Also add missing assert for ceph-volume tests.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f601549a8a)
2019-04-18 09:27:34 +02:00
Dimitri Savineau 8edb064606 allow using ansible 2.8
Currently we only support ansible 2.7
We plan to use 2.8 when it will be release so we have to support both
2.7 and 2.8.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1700548

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit e471bce76b)
2019-04-17 18:14:58 +02:00
Guillaume Abrioux 3787c9b7ad defaults: refact package dependencies installation.
Because 5c98e361df could be seen as a non
backward compatible change this commit reverts it and bring back package
dependencies installation support.
Let's just modify the default value instead.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit edfa4310d3)
2019-04-16 12:06:25 -04:00
Guillaume Abrioux 5aca0996ed defaults: remove some package dependencies
These packages aren't needed anymore.
They were needed for ceph-init-detect buti as of ceph-init-detect doesn't exist
anymore.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1683885

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5c98e361df)
2019-04-16 12:06:25 -04:00
Rishabh Dave 72309b49fe allow adding a monitor to a deployed cluster
Add a playbook that deploys a new monitor on a new node, adds that node
to the Ceph cluster and the monitor to the quorum and updates the ceph
configuration file on OSD nodes.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit d5967af7fb)
2019-04-16 11:14:21 +02:00
Rishabh Dave a3e4bf3796 check if mon daemon is installed before restarting it
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 96c180cc0e)
2019-04-16 11:14:21 +02:00
Guillaume Abrioux f8b69694cc mon: check if an initial monitor keyring already exists
When adding a new monitor, we must reuse the existing initial monitor
keyring. Otherwise, the new monitor will issue its 'mkfs' with a new
monitor keyring and it will result with a mismatch between them. The
new monitor will be unable to join the quorum in the end.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit edf1ee2073)
2019-04-16 11:14:21 +02:00
Dimitri Savineau 1c3fbe5a60 purge-cluster: remove python-ceph-argparse package
When using purge-cluster playbook with nautilus, there's still the
python-ceph-argparse package installed on the host preventing to
reinstall a ceph cluster with a different version (like luminous or
mimic)

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit eb658b3af6)
2019-04-15 17:32:22 +02:00
Dimitri Savineau 02c63e8d45 docs: Update ceph.conf supported section
[rgw] isn't a valide section.
[client.rgw.{instance_name] should be used instead.

Resolves: #3841

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 2c8b585edb)
2019-04-15 07:11:46 +00:00
Dimitri Savineau f90c051589 switch-from-non-containerized: stop all osds
e6bfb84 introduced a regression in the switch from non containerized
to container deployment.
We need to stop all previous OSDs services. We just don't need the
ceph-disk pattern in the regex.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 150acba8c5)
2019-04-12 00:45:21 +00:00
Guillaume Abrioux f8c544c4a8 purge: remove references to ceph-disk
as of stable-4.0, ceph-disk is no longer supported.
These tasks aren't needed anymore.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a1254d767c)
2019-04-12 00:45:21 +00:00
Guillaume Abrioux f1ede335e4 shrink-osd: remove legacy playbook
as of stable-4.0, ceph-disk is no longer supported.
Let's remove this legacy version of the playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 73aa788459)
2019-04-12 00:45:21 +00:00