4711 Commits (ee29f7370add57bb016b693f662b1a34a0a7c607)
 

Author SHA1 Message Date
Dimitri Savineau c17106874c ceph-osd: Increase cpu limit to 4
In containerized deployments, the default osd cpu quota is too low
for production environments using NVMe devices.
This causes performance degradation compared to bare-metal.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1695880

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-04-24 17:59:42 +02:00
Jugwan Eom 6c4f48812a validate: check custom repository config options
This adds missing configuration options when the 'custom'
repository is used.

Signed-off-by: Jugwan Eom <zugwan@gmail.com>
2019-04-24 11:34:12 +02:00
Dimitri Savineau 4ae5ce399b ceph-iscsi: start tcmu-runner for non-container
Only rbd-target-api and rbd-target-gw were started/enabled for non
containerized deployment.
The issue doesn't happen with containerized setup.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-04-24 10:03:25 +02:00
Dimitri Savineau 564ec9c992 tests: group and parametrize tests
Instead of creating a dedicated test and using the same testinfra
module we can group them into a single test to avoid multiple ansible
connections and testinfra module execution.
This patch also adds parametrize pytest decorator when possible.
Finally, fix some minor flake issues.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-04-24 10:03:25 +02:00
Dimitri Savineau 8ab6a3391f tox: Remove update scenario reference
The update scenario is now handled by the tox-update.ini file, so we shouldn't
have an update reference in the tox.ini file.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-04-24 10:02:45 +02:00
Dimitri Savineau 1eeddc394d Update group_vars according to defaults
b2f2426 didn't use the generate_group_vars_sample.sh script so we
currently have a difference between the content in group_vars and the
ceph-defaults/defaults directories.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-04-24 09:48:25 +02:00
Dimitri Savineau f1048627ea rolling_update: restart all ceph-iscsi services
Currently only rbd-target-gw service is restarted during an update.
We also need to restart tcmu-runner and rbd-target-api services
during the ceph iscsi upgrade.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1659611

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-04-24 07:47:23 +00:00
Guillaume Abrioux d6e28ffd27 validate: fix a typo
5aa2779461 introduced a typo.
This commit fixes it.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-23 10:18:05 -04:00
Rishabh Dave 739a662c80 improve coding style
Keywords requiring only one item shouldn't express it by creating a
list with a single item.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
2019-04-23 15:37:07 +02:00
Guillaume Abrioux 2326180bf9 validate: fix notario error
Typical error:

```
AttributeError: 'Invalid' object has no attribute 'message'
```

As of python 2.6, `BaseException.message` has been deprecated.
When using python3, it fails because it has been removed.

Let's use `str(error)` instead so we don't hit this error when using
python3.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-23 09:36:19 -04:00
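The change described in the commit above can be illustrated with a minimal sketch. The `Invalid` class below is a hypothetical stand-in, not notario's real exception class; the sketch only shows that `str(error)` yields the error text on Python 3, where exceptions no longer carry a `.message` attribute.

```python
class Invalid(Exception):
    """Hypothetical stand-in for a validation error (not notario's real class)."""


def error_text(error):
    # str() works on any exception in Python 2 and Python 3, whereas
    # the `.message` attribute does not exist on Python 3 exceptions.
    return str(error)


try:
    raise Invalid("osd_objectstore is required")
except Invalid as error:
    text = error_text(error)
    has_message_attr = hasattr(error, "message")

print(text)
print(has_message_attr)
```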
Radu Toader b2f242660e Allow CephFS pool to be created with specific rule_name, erasure_profile just like rbd pools
Signed-off-by: Radu Toader <radu.m.toader@gmail.com>
2019-04-20 02:26:05 +00:00
Dimitri Savineau 8105a1cefb ceph-container-common: modify requirement flow
Until now it was not possible to install a specific container package
because the name was effectively hardcoded.
This patch allows overriding the container package name (docker.io
vs docker-ce) via the container_package_name variable and refactors
the package installation.
Instead of using one task per distribution, we can set the package and
service name in vars, which allows a unified package task.
Also refactor the debian_prerequisites tasks because their content
was outdated.

https://docs.docker.com/install/linux/docker-ce/debian/
https://docs.docker.com/install/linux/docker-ce/ubuntu/

Resolves: #3609

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-04-18 16:18:01 +02:00
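The override logic described in the commit above might look roughly like the following sketch. It is written in Python for illustration only (the role itself is Ansible), and the per-distribution defaults in `DEFAULT_CONTAINER_PACKAGE` are assumptions, not the values the role actually ships. Only the `container_package_name` variable and the docker.io / docker-ce package names come from the commit message.

```python
# Hypothetical per-distribution defaults; the real role may use other values.
DEFAULT_CONTAINER_PACKAGE = {
    "Debian": "docker.io",
    "Ubuntu": "docker.io",
}


def resolve_container_package(distribution, container_package_name=None):
    """Return the package to install, honouring a user-supplied override."""
    if container_package_name:
        return container_package_name
    return DEFAULT_CONTAINER_PACKAGE[distribution]


default_pkg = resolve_container_package("Ubuntu")
override_pkg = resolve_container_package("Ubuntu", "docker-ce")
print(default_pkg, override_pkg)
```

Keeping the names in one mapping is what lets a single install task serve every distribution.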
Florian Haas 37962fec92 doc: update index.rst with current information for stable-4.0
With the stable-4.0 branch nearing release, update
docs/source/index.rst with current information about which Ceph
releases are supported, and which Ansible versions are required, for
each branch.

Signed-off-by: Florian Haas <florian@citynetwork.eu>
2019-04-18 16:16:46 +02:00
Guillaume Abrioux 58f3851573 mds: remove legacy task
This task has no purpose in stable-4.0 and later.
Let's remove it, since stable-4.0 and later aren't intended to deploy
luminous.

Closes: #3873

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-18 15:55:45 +02:00
Kyle Bader 0bee90b201 rgw: add cpuset support
1/ The OSD already supports cpuset for containerized deployments
through the ceph_osd_docker_cpuset_cpus variable. This adds similar
support to the RGW service for containerized deployments by setting a new
variable named ceph_rgw_docker_cpuset_cpus. Like the OSD, there are times where
using distinct cores has advantages over relying on the in-kernel CFS scheduler.

ceph_rgw_docker_cpuset_cpus accepts a comma-delimited set of CPU ids.

2/ Add support for specifying the --cpuset-mems option to restrict the cgroup's memory
allocations to a particular NUMA node, which should typically correspond to
the CPU ids of that NUMA node that were provided with --cpuset-cpus. To ensure
the correct CPU ids are used, one can run `numactl --hardware` to list the nodes
and which CPU ids correspond to each.

Signed-off-by: Kyle Bader <kbader@redhat.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-18 15:55:19 +02:00
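As a rough illustration of matching CPU ids to a NUMA node, the sketch below parses `numactl --hardware` output and builds the corresponding Docker options. The sample output is made up for the example (not captured from a real host), and the helper names `cpus_by_node` and `cpuset_options` are hypothetical. `--cpuset-cpus` and `--cpuset-mems` are the Docker options for CPU and memory-node pinning.

```python
import re

# Illustrative sample of `numactl --hardware` output (not from a real host).
SAMPLE = """\
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3
node 0 size: 65536 MB
node 0 free: 60000 MB
node 1 cpus: 4 5 6 7
node 1 size: 65536 MB
node 1 free: 61000 MB
"""


def cpus_by_node(numactl_output):
    """Map each NUMA node id to the list of CPU ids it owns."""
    nodes = {}
    for line in numactl_output.splitlines():
        match = re.match(r"node (\d+) cpus:(.*)", line)
        if match:
            nodes[int(match.group(1))] = [int(c) for c in match.group(2).split()]
    return nodes


def cpuset_options(node, nodes):
    """Build Docker options pinning CPUs and memory to the same NUMA node."""
    cpus = ",".join(str(c) for c in nodes[node])
    return ["--cpuset-cpus=" + cpus, "--cpuset-mems=" + str(node)]


nodes = cpus_by_node(SAMPLE)
options = cpuset_options(1, nodes)
print(options)
```

The comma-delimited CPU string is the form that ceph_rgw_docker_cpuset_cpus is described as accepting.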
Dimitri Savineau 86315272c7 ceph-mgr: Add extra module packages
Since Nautilus, some mgr extra modules are not shipped in the ceph-mgr
package but in dedicated packages.

Resolves: #3860

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-04-18 15:31:22 +02:00
Guillaume Abrioux 7eb42c9e8e update: ensure tasks are executed on an upgraded mon
These tasks must be run from an upgraded monitor; otherwise they
might fail.
See: https://tracker.ceph.com/issues/39355

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-18 11:16:11 +02:00
Guillaume Abrioux ed84325b1d update: ensure ceph command returns 0
These commands could return something other than 0.
Let's ensure all retries have been done before actually failing.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-18 11:16:11 +02:00
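The retry semantics described in the commit above (only fail once every retry has been used) can be sketched as follows. The playbook itself is Ansible, so this Python version is only an illustration: `run_with_retries` and the `flaky` fake runner are hypothetical names, and the command is the one shown in the playbook output elsewhere in this log.

```python
import subprocess
import time


def run_with_retries(cmd, retries=5, delay=0.0, runner=subprocess.call):
    """Run cmd until it returns 0 or the retries are exhausted.

    Returns the last return code, so a non-zero result means that
    every attempt failed.
    """
    rc = None
    for attempt in range(retries):
        rc = runner(cmd)
        if rc == 0:
            break
        if attempt < retries - 1:
            time.sleep(delay)
    return rc


# Fake runner for demonstration: fails twice, then succeeds.
attempts = []


def flaky(cmd):
    attempts.append(cmd)
    return 0 if len(attempts) >= 3 else 1


final_rc = run_with_retries(
    ["ceph", "--cluster", "ceph", "osd", "set", "noout"], runner=flaky
)
print(final_rc, len(attempts))
```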
Guillaume Abrioux 543d1e2e41 update: set osd flags before upgrading any mon
Typical error:

```
failed: [mon0 -> mon2] (item=noout) => changed=true
  cmd:
  - ceph
  - --cluster
  - ceph
  - osd
  - set
  - noout
  delta: '0:00:00.293756'
  end: '2019-04-17 06:31:57.552386'
  item: noout
  msg: non-zero return code
  rc: 1
  start: '2019-04-17 06:31:57.258630'
  stderr: |-
    Traceback (most recent call last):
      File "/bin/ceph", line 1222, in <module>
        retval = main()
      File "/bin/ceph", line 1146, in main
        sigdict = parse_json_funcsigs(outbuf.decode('utf-8'), 'cli')
      File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 788, in parse_json_funcsigs
        cmd['sig'] = parse_funcsig(cmd['sig'])
      File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 728, in parse_funcsig
        raise JsonFormat(s)
    ceph_argparse.JsonFormat: unknown type CephBool
  stderr_lines:
  - 'Traceback (most recent call last):'
  - '  File "/bin/ceph", line 1222, in <module>'
  - '    retval = main()'
  - '  File "/bin/ceph", line 1146, in main'
  - '    sigdict = parse_json_funcsigs(outbuf.decode(''utf-8''), ''cli'')'
  - '  File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 788, in parse_json_funcsigs'
  - '    cmd[''sig''] = parse_funcsig(cmd[''sig''])'
  - '  File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 728, in parse_funcsig'
  - '    raise JsonFormat(s)'
  - 'ceph_argparse.JsonFormat: unknown type CephBool'
  stdout: ''
  stdout_lines: <omitted>
```

Having mixed versions of monitors seems to cause this error.
Moving these tasks before any monitor gets upgraded seems to be enough
to get around this issue.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-18 11:16:11 +02:00
Guillaume Abrioux a4bc7bda51 update: refact msgr2 migration
This commit refactors the msgr2 protocol introduction.

If it's a fresh install, let's go with v2 only.
If we upgrade to nautilus, we should go with v2+v1 syntax to ensure
nothing breaks.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-18 11:16:11 +02:00
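A minimal sketch of the two address forms follows. It assumes the bracketed `v2:`/`v1:` address syntax used for monitor addresses since nautilus and the default ports 3300 (msgr2) and 6789 (msgr1). The function name `mon_address` and the example IP are hypothetical, and the exact strings the role renders may differ.

```python
def mon_address(ip, fresh_install):
    """Format one monitor address.

    Fresh installs get the v2-only form; upgrades keep v1 alongside v2
    so that clients still speaking the old protocol are not cut off.
    """
    if fresh_install:
        return "[v2:{0}:3300]".format(ip)
    return "[v2:{0}:3300,v1:{0}:6789]".format(ip)


fresh = mon_address("192.0.2.10", fresh_install=True)
upgraded = mon_address("192.0.2.10", fresh_install=False)
print(fresh)
print(upgraded)
```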
Andrew Schoen e2529dcd7f rolling_update: ceph commands should use --cluster
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2019-04-18 10:55:11 +02:00
Andrew Schoen 67453853ff rolling_update: set num_osds to the number of running osds
We do this so that the ceph-config role can most accurately
report the number of osds for the generation of the ceph.conf
file.

We don't want to use ceph-volume to determine the number of
osds because in an upgrade to nautilus ceph-volume won't be able to
accurately count osds created by ceph-disk.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2019-04-18 10:55:11 +02:00
Andrew Schoen 5e3dfe5021 ceph-osd: do not run lvm batch tasks during update
When performing a rolling update do not try to create
any new osds with `ceph-volume lvm batch`. This is troublesome
because when upgrading to nautilus the devices list might contain
devices that are currently being used by ceph-disk and have GPT
headers on them, which will cause ceph-volume to fail when
trying to use such a device. Any devices originally created
by ceph-disk will need to be removed from the devices list
before any new osds can be created.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2019-04-18 10:55:11 +02:00
Andrew Schoen 399a821439 tests: adds the migrate_ceph_disk_to_ceph_volume scenario
This test deploys a luminous cluster with ceph-disk created osds
and then upgrades to nautilus and migrates those osds to ceph-volume.
The nodes are then rebooted and cluster state verified.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2019-04-18 10:55:11 +02:00
Andrew Schoen 28c47e4d1b rolling_update: migrate ceph-disk osds to ceph-volume
When upgrading to nautilus run ``ceph-volume simple scan`` and
``ceph-volume simple activate --all`` to migrate any running
ceph-disk osds to ceph-volume.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1656460

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2019-04-18 10:55:11 +02:00
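The two-step migration named in the commit above can be sketched like this. Only the two `ceph-volume` command lines come from the commit message. The `migrate` helper and the `dry_run` runner are hypothetical and do not reflect how the playbook is actually written.

```python
def migration_commands():
    """Return the two ceph-volume commands from the commit message, in order."""
    return [
        ["ceph-volume", "simple", "scan"],
        ["ceph-volume", "simple", "activate", "--all"],
    ]


def migrate(runner):
    """Run scan, then activate; stop at the first non-zero return code."""
    for cmd in migration_commands():
        rc = runner(cmd)
        if rc != 0:
            return rc
    return 0


# Dry-run runner: records the commands instead of executing them.
executed = []


def dry_run(cmd):
    executed.append(" ".join(cmd))
    return 0


rc = migrate(dry_run)
print(executed)
```

On a real host, `subprocess.call` could be passed as the runner instead of `dry_run`.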
Dimitri Savineau c8814d1331 ceph-iscsi-gw: Remove library directory
The library directory that contains the custom ceph modules is present
in the ceph-ansible root directory.
All igw_* modules are already present there so we don't need the ones
present in roles/ceph-iscsi-gw/library.
Also remove the associated spec file.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-04-18 10:37:57 +02:00
Dimitri Savineau f601549a8a test_osds: remove scenario leftover
Since there's only one scenario available we don't need lvm_scenario
and no_lvm_scenario.
Also add missing assert for ceph-volume tests.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-04-17 17:28:12 +02:00
Dimitri Savineau e471bce76b allow using ansible 2.8
Currently we only support ansible 2.7.
We plan to use 2.8 once it is released, so we have to support both
2.7 and 2.8.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1700548

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-04-17 16:57:37 +02:00
Dimitri Savineau 9f99f539f7 tests/functional/setup: change mount options
In the CI jobs we can change the mount options of the main partition
to avoid extra operations on disk.
Adding jmespath to tests/requirements.txt due to the json_query
filter usage.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-04-17 08:23:07 +02:00
Dimitri Savineau c84a74592a test_mons: test mon listening on port 3300
Since nautilus and msgr2, the monitors also bind to port 3300 in
addition to 6789.
This patch updates test_mons to reflect that change.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-04-17 08:19:48 +02:00
Guillaume Abrioux edfa4310d3 defaults: refact package dependencies installation.
Because 5c98e361df could be seen as a non-backward-compatible change,
this commit reverts it and brings back package
dependencies installation support.
Let's just modify the default value instead.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-16 11:07:59 -04:00
Guillaume Abrioux 83df60cbc3 defaults: remove some package dependencies
These packages aren't needed anymore.
They were needed for ceph-init-detect, but ceph-init-detect doesn't exist
anymore.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1683885

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-15 11:28:58 -04:00
Rishabh Dave d5967af7fb allow adding a monitor to a deployed cluster
Add a playbook that deploys a new monitor on a new node, adds that node
to the Ceph cluster and the monitor to the quorum and updates the ceph
configuration file on OSD nodes.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
2019-04-15 10:00:50 +02:00
Rishabh Dave 96c180cc0e check if mon daemon is installed before restarting it
Signed-off-by: Rishabh Dave <ridave@redhat.com>
2019-04-15 10:00:50 +02:00
Guillaume Abrioux edf1ee2073 mon: check if an initial monitor keyring already exists
When adding a new monitor, we must reuse the existing initial monitor
keyring. Otherwise, the new monitor will issue its 'mkfs' with a new
monitor keyring, which will result in a mismatch between them. The
new monitor will then be unable to join the quorum.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Rishabh Dave <ridave@redhat.com>
2019-04-15 10:00:50 +02:00
Dimitri Savineau eb658b3af6 purge-cluster: remove python-ceph-argparse package
When using the purge-cluster playbook with nautilus, the
python-ceph-argparse package is still installed on the host, which prevents
reinstalling a ceph cluster with a different version (like luminous or
mimic).

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-04-15 09:15:08 +02:00
Dimitri Savineau 2c8b585edb docs: Update ceph.conf supported section
[rgw] isn't a valid section.
[client.rgw.{instance_name}] should be used instead.

Resolves: #3841

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-04-15 09:09:02 +02:00
Dimitri Savineau 150acba8c5 switch-from-non-containerized: stop all osds
e6bfb84 introduced a regression in the switch from non containerized
to container deployment.
We need to stop all previous OSD services. We just don't need the
ceph-disk pattern in the regex.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-04-11 16:26:53 -04:00
Guillaume Abrioux a1254d767c purge: remove references to ceph-disk
as of stable-4.0, ceph-disk is no longer supported.
These tasks aren't needed anymore.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-11 11:57:02 -04:00
Guillaume Abrioux 73aa788459 shrink-osd: remove legacy playbook
as of stable-4.0, ceph-disk is no longer supported.
Let's remove this legacy version of the playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-11 11:57:02 -04:00
Guillaume Abrioux e6bfb843f4 switch_to_containers: remove ceph-disk references
as of stable-4.0, ceph-disk is no longer supported.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-11 11:57:02 -04:00
Guillaume Abrioux f899da3172 osd: remove legacy file
this file is not used anymore, let's remove it.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-11 11:57:02 -04:00
Guillaume Abrioux 3519281b44 tests: pass osd_scenario value to lvm_setup.yml
we must pass the value of osd_scenario from the stable-3.2 branch which
is used for the initial deployment.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-11 11:57:02 -04:00
Guillaume Abrioux 83e84c6a4a tests: remove test_journal_collocation.py in OSD testing
this test is related to ceph-disk which is dropped as of stable-4.0

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-11 11:57:02 -04:00
Guillaume Abrioux bb15c19519 resync sample file
d17b1b48b6 introduced a change that hasn't been propagated to the sample files.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-11 11:57:02 -04:00
Guillaume Abrioux 4f68462009 osd: remove ceph-disk scenarios files
these files aren't needed anymore since we only use lvm scenario.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-11 11:57:02 -04:00
Guillaume Abrioux f0416c8892 osd: remove dedicated_devices variable
This variable was related to ceph-disk scenarios.
Since we are entirely dropping ceph-disk support as of stable-4.0, let's
remove this variable.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-11 11:57:02 -04:00
Guillaume Abrioux 4d35e9eeed osd: remove variable osd_scenario
As of stable-4.0, the only valid scenario is `lvm`.
Thus, this makes this variable useless.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-11 11:57:02 -04:00
Guillaume Abrioux 4d5637fd8a osd: remove legacy file
ceph_disk_cli_options_facts.yml is not used anymore, let's remove it.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-11 11:57:02 -04:00
Sébastien Han 2888c0825f validate: only check device when they are devices
We only validate the devices that are passed if there is a list of
devices to validate.

Signed-off-by: Sébastien Han <seb@redhat.com>
2019-04-11 11:57:02 -04:00