Commit Graph

4279 Commits (28009496f60fed9fc9123b691c62ba4d2b3e31d1)
 

Author SHA1 Message Date
Dimitri Savineau 8a74928a19 tox: Refact lvm_osds scenario
The current lvm_osds only tests filestore on one OSD node.
We also have bs_lvm_osds to test bluestore and encryption.
Let's use only one scenario to test filestore/bluestore and with or
without dmcrypt on four OSD nodes.
Also use validate_dmcrypt_bool_value instead of types.boolean on
dmcrypt validation via notario.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 52b9f3fb28)
2019-05-10 11:24:32 +02:00
Mike Christie 0a24078bbb igw: Fix rolling update service ordering
We must stop tcmu-runner after the other rbd-target-* services
because they may need to interact with tcmu-runner during shutdown.
There is also a bug in some kernels where IO can get stuck in the
kernel and by stopping rbd-target-* first we can make sure all IO is
flushed.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1659611

Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit d7ef12910e)
2019-05-10 11:12:50 +02:00
Guillaume Abrioux 900244e065 Revert "Revert "cv: support zap by osd fsid""
This reverts commit addcc1e61a.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-05-10 09:13:10 +02:00
Guillaume Abrioux f1b4874176 Revert "Revert "shrink_osd: use cv zap by fsid to remove parts/lvs""
This reverts commit 043ee8c158.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-05-10 09:13:10 +02:00
Guillaume Abrioux 5053f32c15 osds: allow passing devices by path
ceph-volume didn't work when the devices where passed by path.
Since it now support it, let's allow this feature in ceph-ansible

Closes: #3812

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8f2c45dfd3)
2019-05-09 14:21:43 +02:00
Guillaume Abrioux addcc1e61a Revert "cv: support zap by osd fsid"
This reverts commit 8454f0144a.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-25 21:27:37 +02:00
Guillaume Abrioux 043ee8c158 Revert "shrink_osd: use cv zap by fsid to remove parts/lvs"
This reverts commit be59e0b451.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-25 21:27:37 +02:00
Dimitri Savineau 2fa8099fa7 osd: set default bluestore_wal_devices empty
We only need to set the wal dedicated device when there's three tiers
of storage used.
Currently the block.wal partition will also be created on the same
device than block.db.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1685253

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-04-25 07:13:38 +00:00
Dimitri Savineau 9ff19cc604 rolling_update: restart all ceph-iscsi services
Currently only rbd-target-gw service is restarted during an update.
We also need to restart tcmu-runner and rbd-target-api services
during the ceph iscsi upgrade.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1659611

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f1048627ea)
2019-04-24 23:17:41 +00:00
Dimitri Savineau 7418999638 ceph-mds: Increase cpu limit to 4
In containerized deployment the default mds cpu quota is too low
for production environment.
This is causing performance degradation compared to bare-metal.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1695850

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 1999cf3d19)
2019-04-24 21:44:23 +00:00
Dimitri Savineau 54128db5cd ceph-osd: Fix merge conflict from mergify
The PR #3916 was merged automatically by mergify even if there was a
confict in the ceph-osd-run.sh.j2 template.
This commit resolves the conflict.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-04-24 12:41:23 -04:00
Dimitri Savineau 3ae2a687ed ceph-osd: Increase cpu limit to 4
In containerized deployment the default osd cpu quota is too low
for production environment using NVMe devices.
This is causing performance degradation compared to bare-metal.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1695880

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c17106874c)

# Conflicts:
#	roles/ceph-osd/templates/ceph-osd-run.sh.j2
2019-04-24 16:02:28 +00:00
Dimitri Savineau c056ae7b8c ansible.cfg: Add library path to configuration
Ceph module path needs to be configured if we want to avoid issues
like:

no action detected in task. This often indicates a misspelled module
name, or incorrect module path

Currently the ansible-lint command in Travis CI complains about that.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1668478

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit a1a871cade)
2019-04-24 07:49:48 +00:00
Matthew Vernon 1556d802ff ceph-mon: increase timeout waiting for admin and bootstrap keys
With a large and/or busy cluster, it can take significantly more than
30s for a restarted monitor to get to the point where
`ceph-create-keys` returns successfully. A recent upgrade of our
production cluster failed here because it took a couple of minutes for
the newly-upgraded `mon` to be ready. So increase the timeout
significantly.

This patch is applied to stable-3.2, because the affected code is
refactored in stable-4.0 and ceph-create-keys is no longer called.

Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
2019-04-12 17:03:39 +00:00
Dimitri Savineau f3785ef7dd tests: Add debug to ceph-override.json
It's usefull to have logs in debug mode enabled in order to have
more information for developpers.
Also reindent to json file.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d25af1b872)
2019-04-11 15:38:14 +00:00
Dimitri Savineau e3e6285aa9 tests/functional: use ceph-override.json symlink
We don't need to have multiple ceph-override.json copies. We
currently already have symlink to all_daemons/ceph-override.json so
we can do it for all scenarios.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit a19054be18)
2019-04-11 15:38:14 +00:00
Dimitri Savineau 56215d7688 ceph-mds: Set application pool to cephfs
We don't need to use the cephfs variable for the application pool
name because it's always cephfs.
If the cephfs variable is set to something else than the default
value it will break the appplication pool task.

Resolves: #3790

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d2efb7f02b)
2019-04-11 15:38:14 +00:00
Guillaume Abrioux c5c354a61a remove all NBSPs char in stable-3.2 branch
this can cause issues, let's replace all of these chars with real
spaces.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-10 13:27:48 +02:00
Matthew Vernon a8c9b65d13 UCA: Uncomment UCA variables in defaults, fix consequent breakage
The Ubuntu Cloud Archive-related (UCA) defaults in
roles/ceph-defaults/defaults/main.yml were commented out, which means
if you set `ceph_repository` to "uca", you get undefined variable
errors, e.g.

```
The task includes an option with an undefined variable. The error was: 'ceph_stable_repo_uca' is undefined

The error appears to have been in '/nfs/users/nfs_m/mv3/software/ceph-ansible/roles/ceph-common/tasks/installs/debian_uca_repository.yml': line 6, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

- name: add ubuntu cloud archive repository
  ^ here

```

Unfortunately, uncommenting these results in some other breakage,
because further roles were written that use the fact of
`ceph_stable_release_uca` being defined as a proxy for "we're using
UCA", so try and install packages from the bionic-updates/queens
release, for example, which doesn't work. So there are a few `apt` tasks
that need modifying to not use `ceph_stable_release_uca` unless
`ceph_origin` is `repository` and `ceph_repository` is `uca`.

Closes: #3475
Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
(cherry picked from commit 9dd913cf8a)
2019-04-09 16:54:37 +00:00
Dimitri Savineau efa0083f3c ceph-osd: Drop memory flag with bluestore
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit dc1c0dcee2)
2019-04-09 13:26:20 +00:00
Dimitri Savineau bbb8ca6643 mon/rgw: use last ipv6 address
When using monitor_address_block or radosgw_address_block variables
to configure the mon/rgw address we're getting the first ip address
from the ansible facts present in that cidr.
When there's VIP on that network the first filter could return the
wrong value.
This seems to affect only IPv6 setup because the VIP addresses are
added to the ansible facts at the beginning of the list. This is the
opposite (at the end) when using IPv4.
This causes the mon/rgw processes to bind on the VIP address.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1680155

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-04-09 06:17:27 +02:00
Guillaume Abrioux e8a526c5e0 tests: fix update job
jenkins sets CEPH_ANSIBLE_BRANCH to stable-3.2, this makes all
nightly job failing.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-08 09:32:43 -04:00
Ali Maredia e943288cae rgw multisite: add more than 1 rgw to the master or secondary zone
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1664869

Signed-off-by: Ali Maredia <amaredia@redhat.com>
(cherry picked from commit 37f46a8c5d)
2019-04-06 08:50:30 +00:00
Guillaume Abrioux f567f66085 tests: run lvm_setup.yml on secondary cluster
otherwise ceph-osd fails:

```
ceph-volume lvm prepare: error: Unable to proceed with non-existing device: test_group/data-lv2
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-06 08:44:53 +02:00
Dimitri Savineau d1b3d18af1 radosgw: Raise cpu limit to 8
In containerized deployment the default radosgw quota is too low
for production environment.
This is causing performance degradation compared to bare-metal.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1680171

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d3ae9fd05f)
2019-04-04 19:14:28 +02:00
Guillaume Abrioux aba3d64b87 tests: do not deploy ceph@master in rgw_multisite
deploying ceph@master in stable-3.2 is not possible.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-04 17:52:19 +02:00
Guillaume Abrioux 82ed220367 tests: add back testinfra testing
136bfe0 removed testinfra testing on all scenario excepted all_daemons

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8d106c2c58)
2019-04-04 10:36:34 +00:00
Guillaume Abrioux 68a832e3c8 tests: pin pytest-xdist to 1.27.0
looks like newer version of pytest-xdist requires pytest>=4.4.0

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ba0a95211c)
2019-04-04 10:36:34 +00:00
Guillaume Abrioux 7136f1734e purge: fix lvm-batch purge osd
`lvm_volumes` and/or `devices` variable(s) can be undefined depending on
the scenario chosen.

These tasks should be run only if these variable are defined, otherwise
it ends up with undefined variable errors.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1653307

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0180738313)
2019-04-03 08:48:39 +02:00
Guillaume Abrioux 3421cb08d9 tests: test idempotency only on all_daemons job
there's no need to test this on all scenarios.
testing idempotency on all_daemons should be enough and allow us to save
precious resources for the CI.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 136bfe096c)
2019-04-02 15:28:28 +00:00
Dimitri Savineau fa6d9c940a rolling_update: Update systemd unit regex for nvme
The systemd unit regex doesn't handle nvme devices (/dev/nvmeXn1).

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1687828

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c8442f3705)
2019-04-01 15:22:24 +00:00
Guillaume Abrioux f200f1ca87 tests: refact update scenario (stable-3.2)
refact the update scenario like it has been made in master.
(see f0e616962)

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-01 16:35:24 +02:00
Dimitri Savineau 8e2cfd9d24 purge-docker-cluster: Remove ceph-osd service
The systemd ceph-osd@.service file used for starting the ceph osd
containers is used in all osd_scenarios.
Currently purging a containerized deployment using the lvm scenario
didn't remove the ceph-osd systemd service.
If the next deployment is a non-containerized deployment, the OSDs
won't be online because the file is still present and override the
one from the package.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 7cc626b72d)
2019-04-01 09:10:29 +00:00
Dimitri Savineau e08846c14c tox: Fix container purge jobs
On containerized CI jobs the playbook executed is purge-cluster.yml
but it should be set to purge-docker-cluster.yml

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit bd0869cd01)
2019-04-01 06:59:15 +00:00
Guillaume Abrioux 005cb09ba9 tests: add mgr and nfs nodes in all_daemons
even not used, we need to fire up those VMs to be able to perform the
upgrade in the CI.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-28 15:40:43 +01:00
Dimitri Savineau e994dabaec Add uca to ceph_repository choices validation
Ubuntu cloud archive is configurable via ceph_repository variable but
the uca choice isn't accepted.
This commit fixes this issue and also validates the associated uca
repository variables.

Resolves: #3739

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 94505a3af2)
2019-03-26 10:27:08 +00:00
Guillaume Abrioux b92c826661 defaults: change default value for ceph_docker_image_tag
Since nautilus has been released, it's now the latest stable release, it
means the tag `latest` now refers to nautilus.

`stable-3.2` isn't intended to deploy nautilus, therefore, we should
change the default value for this variable to the latest release
stable-3.2 is able to deploy (mimic).

Closes: #3734

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-21 18:37:21 +00:00
Dimitri Savineau e4a71eabd9 ceph-osd: Ensure lvm2 is installed
When using osd_scenario lvm, we never check if the lvm2 package is
present on the host.
When using containerized deployment and docker on CentOS/RedHat this
package will be automatically installed as a dependency but not for
Ubuntu distribution.
OSD deployed via ceph-volume require the lvmetad.socket to be active
and running.

Resolves: #3728

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 179fdfbc19)
2019-03-20 22:59:28 +00:00
Bruceforce 567ad1826b ceph_crush: fix rstrip for python 3
Removing bytes literals since rstrip only supports type String or None.

Please backport to stable-3.2

Signed-off-by: Bruceforce <markus.greis@gmx.de>
(cherry picked from commit 6d506dba1a)
2019-03-20 01:20:30 +00:00
Bruceforce 2590d3cfba ceph_volume: fix rstrip for python 3
Removing bytes literals since rstrip only supports type String or None.

Signed-off-by: Bruceforce <markus.greis@gmx.de>
2019-03-19 18:53:50 +00:00
Phuong Nguyen 274bf3e038 Remove trailing forward slash in ceph_docker_registry variable from group_vars/rhcs.yml.sample file.
Also fixed rhcs_edits.txt for variable ceph_docker_registry.

Moved namespace to ceph_docker_image variable.

Signed-off-by: Phuong Nguyen <pnguyen@redhat.com>
(cherry picked from commit 3305309e87)
2019-03-19 14:40:27 +00:00
Guillaume Abrioux d3f6556041 osd: backward compatibility with old disk_list.sh location
Since all files in container image have moved to `/opt/ceph-container`
this check must look for new AND the old path so it's backward
compatible. Otherwise it could end up by templating an inconsistent
`ceph-osd-run.sh`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 987bdac963)
2019-03-18 21:56:53 +00:00
Dimitri Savineau 46e8898093 ceph-validate: fail if there's no ipaddr available in monitor_address_block subnet
When using monitor_address_block to determine the ip address of the
monitor node, we need an ip address available in that cidr to be
present in the ansible facts (ansible_all_ipv[46]_addresses).
Currently we don't check if there's an ip address available during
the ceph-validate role.
As a result, the ceph-config role fails due to an empty list during
ceph.conf template creation but the error isn't explicit.

TASK [ceph-config : generate ceph.conf configuration file] *****
fatal: [0]: FAILED! => {"msg": "No first item, sequence was empty."}

With this patch we will fail before the ceph deployment with an
explicit failure message.

Resolves: rhbz#1673687

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5c39735be5)
2019-03-18 18:31:18 +00:00
Gregory Orange 86e39a29c8 Change docker_container parameter network to network_mode
Addressing "populate kv_store with custom ceph.conf":
Unsupported parameters for (docker_container) module. Looking at
https://docs.ansible.com/ansible/latest/modules/docker_container_module.html
shows that the correct parameter is network_mode, not network.

Signed-off-by: Gregory Orange <gregoryo2014@users.noreply.github.com>
2019-03-18 13:23:10 +00:00
Dimitri Savineau bfa99cdd53 Set the default crush rule in ceph.conf
Currently the default crush rule value is added to the ceph config
on the mon nodes as an extra configuration applied after the template
generation via the ansible ini module.

This implies two behaviors:

1/ On each ceph-ansible run, the ceph.conf will be regenerated via
ceph-config+template and then ceph-mon+ini_file. This leads to a
non necessary daemons restart.

2/ When other ceph daemons are collocated on the monitor nodes
(like mgr or rgw), the default crush rule value will be erased by
the ceph.conf template (mon -> mgr -> rgw).

This patch adds the osd_pool_default_crush_rule config to the ceph
template and only for the monitor nodes (like crush_rules.yml).
The default crush rule id is read (if exist) from the current ceph
configuration.
The default configuration is -1 (ceph default).

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1638092

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d8538ad4e1)
2019-03-14 14:48:03 +00:00
Dimitri Savineau ef9525482b add-osd.yml: Add become flag for ceph-validate
The check_devices task fails if the ceph-validate role isn't executed
as a privileged user (Permission denied).

failed: [osd0] (item=/dev/sdb) => {"changed": false, "err": "Error:
Error opening /dev/sdb: Permission denied\n", "item": "/dev/sdb",
"msg": "Error while getting device information with parted script:
'/sbin/parted -s -m /dev/sdb -- unit 'MiB' print'", "out": "", "rc": 1}

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit b23c05ae52)
2019-03-12 14:48:03 +01:00
Dimitri Savineau 2f3206abeb ceph-osd: Install numactl package when needed
With 3e32dce we can run OSD containers with numactl support.
When using numactl command in a containerized deployment we need to
be sure that the corresponding package is installed on the host.
The package installation is only executed when the
ceph_osd_numactl_opts variable isn't empty.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit b7f4e3e7c7)
2019-03-12 08:14:47 +00:00
Guillaume Abrioux 34086ec233 osd: support numactl options on OSD activate
This commit adds OSD containers activate with numactl support.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1684146

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b3eb9206fa)
2019-03-11 09:50:29 +00:00
Guillaume Abrioux 224bab0d70 tests: add mgrs section in non_container-collocation
No mgrs are deployed in this scenario, causing the testinfra jobs to
fail.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-05 10:49:45 +01:00
Guillaume Abrioux 36fafadc67 tests: fix collocation scenario
ceph_origin and ceph_repository are mandatory variables.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-05 10:49:45 +01:00