Commit Graph

4321 Commits (cb0926262d9beb21f2c7746ca56a1adc6ec987c4)

Author SHA1 Message Date
Dimitri Savineau 9ff19cc604 rolling_update: restart all ceph-iscsi services
Currently only the rbd-target-gw service is restarted during an update.
We also need to restart the tcmu-runner and rbd-target-api services
during a ceph-iscsi upgrade.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1659611

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f1048627ea)
2019-04-24 23:17:41 +00:00
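A minimal sketch of what restarting the full ceph-iscsi stack could look like (the unit list comes from the message above; the task name and structure are assumptions, not the playbook's actual code):

```
# Hypothetical sketch: restart every ceph-iscsi service during the
# rolling update, not just rbd-target-gw.
- name: restart ceph-iscsi services
  systemd:
    name: "{{ item }}"
    state: restarted
  with_items:
    - tcmu-runner
    - rbd-target-gw
    - rbd-target-api
```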
Dimitri Savineau 7418999638 ceph-mds: Increase cpu limit to 4
In containerized deployments the default mds cpu quota is too low
for production environments, causing performance degradation compared
to bare-metal.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1695850

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 1999cf3d19)
2019-04-24 21:44:23 +00:00
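The change amounts to bumping a default; a hedged sketch of the group_vars override (the variable name follows ceph-ansible's `ceph_mds_docker_cpu_limit` naming convention and is an assumption here):

```
# Assumed variable name: raise the MDS container CPU limit to 4 for
# production workloads.
ceph_mds_docker_cpu_limit: 4
```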
Dimitri Savineau 54128db5cd ceph-osd: Fix merge conflict from mergify
PR #3916 was merged automatically by mergify even though there was a
conflict in the ceph-osd-run.sh.j2 template.
This commit resolves the conflict.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-04-24 12:41:23 -04:00
Dimitri Savineau 3ae2a687ed ceph-osd: Increase cpu limit to 4
In containerized deployments the default osd cpu quota is too low
for production environments using NVMe devices, causing performance
degradation compared to bare-metal.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1695880

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c17106874c)

# Conflicts:
#	roles/ceph-osd/templates/ceph-osd-run.sh.j2
2019-04-24 16:02:28 +00:00
Dimitri Savineau c056ae7b8c ansible.cfg: Add library path to configuration
The Ceph module path needs to be configured if we want to avoid issues
like:

no action detected in task. This often indicates a misspelled module
name, or incorrect module path

Currently the ansible-lint command in Travis CI complains about that.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1668478

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit a1a871cade)
2019-04-24 07:49:48 +00:00
Matthew Vernon 1556d802ff ceph-mon: increase timeout waiting for admin and bootstrap keys
With a large and/or busy cluster, it can take significantly more than
30s for a restarted monitor to get to the point where
`ceph-create-keys` returns successfully. A recent upgrade of our
production cluster failed here because it took a couple of minutes for
the newly-upgraded `mon` to be ready. So increase the timeout
significantly.

This patch is applied to stable-3.2, because the affected code is
refactored in stable-4.0 and ceph-create-keys is no longer called.

Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
2019-04-12 17:03:39 +00:00
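A hedged sketch of the kind of task involved (the real task lives in roles/ceph-mon; the `timeout` wrapper and the 300s value are illustrative — the commit only says the old 30s limit was raised significantly):

```
# Illustrative only: give ceph-create-keys several minutes on a large or
# busy cluster instead of killing it after 30 seconds.
- name: collect admin and bootstrap keys
  command: timeout 300 ceph-create-keys --cluster {{ cluster }} -i {{ monitor_name }}
  changed_when: false
```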
Dimitri Savineau f3785ef7dd tests: Add debug to ceph-override.json
It's useful to have debug logging enabled in order to give developers
more information.
Also reindent the json file.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d25af1b872)
2019-04-11 15:38:14 +00:00
Dimitri Savineau e3e6285aa9 tests/functional: use ceph-override.json symlink
We don't need multiple copies of ceph-override.json. We already have a
symlink to all_daemons/ceph-override.json, so we can do the same for
all scenarios.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit a19054be18)
2019-04-11 15:38:14 +00:00
Dimitri Savineau 56215d7688 ceph-mds: Set application pool to cephfs
We don't need to use the cephfs variable for the application pool
name because it's always cephfs.
If the cephfs variable is set to something other than the default
value, it will break the application pool task.

Resolves: #3790

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d2efb7f02b)
2019-04-11 15:38:14 +00:00
Guillaume Abrioux c5c354a61a remove all NBSPs char in stable-3.2 branch
these can cause issues, so let's replace all of these chars with real
spaces.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-10 13:27:48 +02:00
Matthew Vernon a8c9b65d13 UCA: Uncomment UCA variables in defaults, fix consequent breakage
The Ubuntu Cloud Archive-related (UCA) defaults in
roles/ceph-defaults/defaults/main.yml were commented out, which means
if you set `ceph_repository` to "uca", you get undefined variable
errors, e.g.

```
The task includes an option with an undefined variable. The error was: 'ceph_stable_repo_uca' is undefined

The error appears to have been in '/nfs/users/nfs_m/mv3/software/ceph-ansible/roles/ceph-common/tasks/installs/debian_uca_repository.yml': line 6, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

- name: add ubuntu cloud archive repository
  ^ here

```

Unfortunately, uncommenting these results in some other breakage,
because further roles were written that use the fact of
`ceph_stable_release_uca` being defined as a proxy for "we're using
UCA", and so try to install packages from e.g. the
bionic-updates/queens release, which doesn't work. So a few `apt`
tasks need modifying to not use `ceph_stable_release_uca` unless
`ceph_origin` is `repository` and `ceph_repository` is `uca`.

Closes: #3475
Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
(cherry picked from commit 9dd913cf8a)
2019-04-09 16:54:37 +00:00
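A hedged sketch of the guard described above, so the UCA release only feeds into `default_release` when UCA is actually selected (task shape is illustrative):

```
# Illustrative: only use the UCA release when the operator explicitly
# chose the UCA repository.
- name: install ceph for debian
  apt:
    name: "{{ debian_ceph_pkgs | unique }}"
    state: present
    default_release: "{{ ceph_stable_release_uca if ceph_origin == 'repository' and ceph_repository == 'uca' else '' }}"
```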
Dimitri Savineau efa0083f3c ceph-osd: Drop memory flag with bluestore
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit dc1c0dcee2)
2019-04-09 13:26:20 +00:00
Dimitri Savineau bbb8ca6643 mon/rgw: use last ipv6 address
When using the monitor_address_block or radosgw_address_block
variables to configure the mon/rgw address, we take the first ip
address from the ansible facts that is present in that cidr.
When there's a VIP on that network, the first filter can return the
wrong value.
This seems to affect only IPv6 setups, because VIP addresses are added
at the beginning of the ansible facts list; with IPv4 they are added
at the end.
This causes the mon/rgw processes to bind to the VIP address.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1680155

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-04-09 06:17:27 +02:00
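A hedged Jinja sketch of the fix: pick the last rather than the first address matching the configured block (the variable wiring here is illustrative):

```
# Illustrative: IPv6 VIPs sit at the head of the facts list, so take the
# last address inside the configured block instead of the first.
monitor_address: "{{ ansible_all_ipv6_addresses | ipaddr(monitor_address_block) | last }}"
```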
Guillaume Abrioux e8a526c5e0 tests: fix update job
jenkins sets CEPH_ANSIBLE_BRANCH to stable-3.2, which makes all
nightly jobs fail.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-08 09:32:43 -04:00
Ali Maredia e943288cae rgw multisite: add more than 1 rgw to the master or secondary zone
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1664869

Signed-off-by: Ali Maredia <amaredia@redhat.com>
(cherry picked from commit 37f46a8c5d)
2019-04-06 08:50:30 +00:00
Guillaume Abrioux f567f66085 tests: run lvm_setup.yml on secondary cluster
otherwise ceph-osd fails:

```
ceph-volume lvm prepare: error: Unable to proceed with non-existing device: test_group/data-lv2
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-06 08:44:53 +02:00
Dimitri Savineau d1b3d18af1 radosgw: Raise cpu limit to 8
In containerized deployments the default radosgw cpu quota is too low
for production environments, causing performance degradation compared
to bare-metal.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1680171

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d3ae9fd05f)
2019-04-04 19:14:28 +02:00
Guillaume Abrioux aba3d64b87 tests: do not deploy ceph@master in rgw_multisite
deploying ceph@master in stable-3.2 is not possible.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-04 17:52:19 +02:00
Guillaume Abrioux 82ed220367 tests: add back testinfra testing
136bfe0 removed testinfra testing on all scenarios except all_daemons

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8d106c2c58)
2019-04-04 10:36:34 +00:00
Guillaume Abrioux 68a832e3c8 tests: pin pytest-xdist to 1.27.0
It looks like newer versions of pytest-xdist require pytest>=4.4.0.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ba0a95211c)
2019-04-04 10:36:34 +00:00
Guillaume Abrioux 7136f1734e purge: fix lvm-batch purge osd
`lvm_volumes` and/or `devices` variable(s) can be undefined depending on
the scenario chosen.

These tasks should be run only if these variables are defined;
otherwise they end up with undefined variable errors.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1653307

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0180738313)
2019-04-03 08:48:39 +02:00
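A hedged sketch of the guard (the zap command shown is illustrative; the point is the `is defined` condition):

```
# Illustrative: skip the task entirely when the scenario never defined
# lvm_volumes, instead of failing on an undefined variable.
- name: zap lvm_volumes
  command: ceph-volume lvm zap --destroy {{ item.data }}
  with_items: "{{ lvm_volumes }}"
  when: lvm_volumes is defined
```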
Guillaume Abrioux 3421cb08d9 tests: test idempotency only on all_daemons job
there's no need to test this on all scenarios.
testing idempotency on all_daemons should be enough and allow us to save
precious resources for the CI.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 136bfe096c)
2019-04-02 15:28:28 +00:00
Dimitri Savineau fa6d9c940a rolling_update: Update systemd unit regex for nvme
The systemd unit regex doesn't handle nvme devices (/dev/nvmeXn1).

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1687828

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c8442f3705)
2019-04-01 15:22:24 +00:00
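A hedged sketch of a regex that accepts both device naming schemes (the actual pattern in rolling_update.yml may differ):

```
# Illustrative: extract the systemd unit id from a device path,
# accepting /dev/sdX as well as /dev/nvmeXnY.
- name: restart containerized osd unit
  systemd:
    name: "ceph-osd@{{ item | regex_replace('/dev/(nvme[0-9]+n[0-9]+|[a-z]+)$', '\\1') }}"
    state: restarted
  with_items: "{{ devices }}"
```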
Guillaume Abrioux f200f1ca87 tests: refact update scenario (stable-3.2)
refactor the update scenario the way it has been done in master
(see f0e616962).

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-01 16:35:24 +02:00
Dimitri Savineau 8e2cfd9d24 purge-docker-cluster: Remove ceph-osd service
The systemd ceph-osd@.service file used for starting the ceph osd
containers is used in all osd_scenarios.
Currently, purging a containerized deployment using the lvm scenario
doesn't remove the ceph-osd systemd service.
If the next deployment is a non-containerized deployment, the OSDs
won't come online because the file is still present and overrides the
one from the package.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 7cc626b72d)
2019-04-01 09:10:29 +00:00
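A hedged sketch of the purge step (the path follows the standard systemd layout; task names are assumptions):

```
# Illustrative: drop the container unit file so a later
# non-containerized deployment uses the unit shipped by the ceph
# package again.
- name: remove ceph-osd systemd unit file
  file:
    path: /etc/systemd/system/ceph-osd@.service
    state: absent

- name: reload systemd
  systemd:
    daemon_reload: yes
```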
Dimitri Savineau e08846c14c tox: Fix container purge jobs
On containerized CI jobs the playbook executed is purge-cluster.yml,
but it should be purge-docker-cluster.yml.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit bd0869cd01)
2019-04-01 06:59:15 +00:00
Guillaume Abrioux 005cb09ba9 tests: add mgr and nfs nodes in all_daemons
Even if they're not used, we need to fire up those VMs to be able to
perform the upgrade in the CI.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-28 15:40:43 +01:00
Dimitri Savineau e994dabaec Add uca to ceph_repository choices validation
Ubuntu cloud archive is configurable via the ceph_repository variable,
but the uca choice isn't accepted.
This commit fixes this issue and also validates the associated uca
repository variables.

Resolves: #3739

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 94505a3af2)
2019-03-26 10:27:08 +00:00
Guillaume Abrioux b92c826661 defaults: change default value for ceph_docker_image_tag
Since nautilus has been released, it's now the latest stable release,
which means the tag `latest` now refers to nautilus.

`stable-3.2` isn't intended to deploy nautilus, therefore, we should
change the default value for this variable to the latest release
stable-3.2 is able to deploy (mimic).

Closes: #3734

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-21 18:37:21 +00:00
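A hedged sketch of the resulting default (the exact tag string is an assumption; the point is pinning to a mimic tag instead of the moving `latest`):

```
# Assumption: pin the default image tag to the newest release stable-3.2
# can deploy (mimic), instead of `latest`, which now means nautilus.
ceph_docker_image_tag: latest-mimic
```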
Dimitri Savineau e4a71eabd9 ceph-osd: Ensure lvm2 is installed
When using osd_scenario lvm, we never check that the lvm2 package is
present on the host.
In containerized deployments with docker on CentOS/RedHat this package
is automatically installed as a dependency, but not on Ubuntu.
OSDs deployed via ceph-volume require lvmetad.socket to be active and
running.

Resolves: #3728

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 179fdfbc19)
2019-03-20 22:59:28 +00:00
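A hedged sketch of the check (module choice is illustrative):

```
# Illustrative: make sure lvm2, and therefore lvmetad.socket, is present
# before ceph-volume runs, on every distribution.
- name: install lvm2 package
  package:
    name: lvm2
    state: present
  when: osd_scenario == 'lvm'
```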
Bruceforce 567ad1826b ceph_crush: fix rstrip for python 3
Remove bytes literals, since rstrip only supports arguments of type str or None.

Please backport to stable-3.2

Signed-off-by: Bruceforce <markus.greis@gmx.de>
(cherry picked from commit 6d506dba1a)
2019-03-20 01:20:30 +00:00
Bruceforce 2590d3cfba ceph_volume: fix rstrip for python 3
Remove bytes literals, since rstrip only supports arguments of type str or None.

Signed-off-by: Bruceforce <markus.greis@gmx.de>
2019-03-19 18:53:50 +00:00
Phuong Nguyen 274bf3e038 Remove trailing forward slash in ceph_docker_registry variable from group_vars/rhcs.yml.sample file.
Also fixed rhcs_edits.txt for the ceph_docker_registry variable and
moved the namespace into the ceph_docker_image variable.

Signed-off-by: Phuong Nguyen <pnguyen@redhat.com>
(cherry picked from commit 3305309e87)
2019-03-19 14:40:27 +00:00
Guillaume Abrioux d3f6556041 osd: backward compatibility with old disk_list.sh location
Since all files in the container image have moved to
`/opt/ceph-container`, this check must look for both the new AND the
old path so it's backward compatible. Otherwise it could end up
templating an inconsistent `ceph-osd-run.sh`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 987bdac963)
2019-03-18 21:56:53 +00:00
Dimitri Savineau 46e8898093 ceph-validate: fail if there's no ipaddr available in monitor_address_block subnet
When using monitor_address_block to determine the ip address of the
monitor node, we need an ip address available in that cidr to be
present in the ansible facts (ansible_all_ipv[46]_addresses).
Currently we don't check if there's an ip address available during
the ceph-validate role.
As a result, the ceph-config role fails due to an empty list during
ceph.conf template creation but the error isn't explicit.

TASK [ceph-config : generate ceph.conf configuration file] *****
fatal: [0]: FAILED! => {"msg": "No first item, sequence was empty."}

With this patch we will fail before the ceph deployment with an
explicit failure message.

Resolves: rhbz#1673687

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5c39735be5)
2019-03-18 18:31:18 +00:00
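A hedged sketch of the early check (the message text and condition shape are illustrative):

```
# Illustrative: fail with an explicit message before deployment when no
# fact address falls inside monitor_address_block.
- name: check monitor_address_block subnet
  fail:
    msg: "No IP address found in the {{ monitor_address_block }} subnet"
  when:
    - monitor_address_block is defined
    - ansible_all_ipv4_addresses | ipaddr(monitor_address_block) | length == 0
```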
Gregory Orange 86e39a29c8 Change docker_container parameter network to network_mode
Addressing "populate kv_store with custom ceph.conf":
Unsupported parameters for (docker_container) module. Looking at
https://docs.ansible.com/ansible/latest/modules/docker_container_module.html
shows that the correct parameter is network_mode, not network.

Signed-off-by: Gregory Orange <gregoryo2014@users.noreply.github.com>
2019-03-18 13:23:10 +00:00
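A hedged sketch of the corrected task (the task name comes from the message; name/image values are illustrative):

```
# Illustrative: docker_container takes network_mode, not network.
- name: populate kv_store with custom ceph.conf
  docker_container:
    name: populate-kv-store
    image: "{{ ceph_docker_registry }}/{{ ceph_docker_image }}:{{ ceph_docker_image_tag }}"
    network_mode: host
```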
Dimitri Savineau bfa99cdd53 Set the default crush rule in ceph.conf
Currently the default crush rule value is added to the ceph config
on the mon nodes as an extra configuration applied after the template
generation via the ansible ini module.

This implies two behaviors:

1/ On each ceph-ansible run, the ceph.conf will be regenerated via
ceph-config+template and then ceph-mon+ini_file. This leads to
unnecessary daemon restarts.

2/ When other ceph daemons are collocated on the monitor nodes
(like mgr or rgw), the default crush rule value will be erased by
the ceph.conf template (mon -> mgr -> rgw).

This patch adds the osd_pool_default_crush_rule config to the ceph
template, and only for the monitor nodes (like crush_rules.yml).
The default crush rule id is read (if it exists) from the current ceph
configuration.
The default configuration is -1 (ceph default).

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1638092

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d8538ad4e1)
2019-03-14 14:48:03 +00:00
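Per the message, the default stays at Ceph's own value; a minimal sketch of the default variable:

```
# -1 keeps Ceph's built-in default; any other value is rendered into
# ceph.conf on the monitor nodes only.
osd_pool_default_crush_rule: -1
```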
Dimitri Savineau ef9525482b add-osd.yml: Add become flag for ceph-validate
The check_devices task fails if the ceph-validate role isn't executed
as a privileged user (Permission denied).

failed: [osd0] (item=/dev/sdb) => {"changed": false, "err": "Error:
Error opening /dev/sdb: Permission denied\n", "item": "/dev/sdb",
"msg": "Error while getting device information with parted script:
'/sbin/parted -s -m /dev/sdb -- unit 'MiB' print'", "out": "", "rc": 1}

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit b23c05ae52)
2019-03-12 14:48:03 +01:00
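A hedged sketch of the play-level fix (host group and role list are illustrative):

```
# Illustrative: parted needs root, so run the validate role privileged.
- hosts: osds
  become: true
  roles:
    - ceph-validate
```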
Dimitri Savineau 2f3206abeb ceph-osd: Install numactl package when needed
With 3e32dce we can run OSD containers with numactl support.
When using the numactl command in a containerized deployment, we need
to be sure that the corresponding package is installed on the host.
The package installation is only executed when the
ceph_osd_numactl_opts variable isn't empty.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit b7f4e3e7c7)
2019-03-12 08:14:47 +00:00
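A hedged sketch of the conditional install (the condition shape is illustrative):

```
# Illustrative: only pull in numactl when the operator set numactl
# options for the OSD containers.
- name: install numactl package
  package:
    name: numactl
    state: present
  when: ceph_osd_numactl_opts != ''
```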
Guillaume Abrioux 34086ec233 osd: support numactl options on OSD activate
This commit adds numactl support when activating OSD containers.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1684146

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b3eb9206fa)
2019-03-11 09:50:29 +00:00
Guillaume Abrioux 224bab0d70 tests: add mgrs section in non_container-collocation
No mgrs are deployed in this scenario, causing the testinfra jobs to
fail.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-05 10:49:45 +01:00
Guillaume Abrioux 36fafadc67 tests: fix collocation scenario
ceph_origin and ceph_repository are mandatory variables.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-05 10:49:45 +01:00
Guillaume Abrioux e548a9ae7c tests: use memory backend for cache fact
force ansible to generate facts for each run.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4a1bafdc21)
2019-03-05 08:40:11 +01:00
Guillaume Abrioux 1209fb1874 tests: pin testinfra version
As of testinfra 2.0.0, the binary name is `py.test`.

But let's pin the version to 1.19.0.
Indeed, migrating to 2.0.0 requires our current testing to be reworked a bit.
Since we don't have the bandwidth ATM for this, it's better to simply
keep testing with testinfra 1.19.0.

Note that I've replaced all `testinfra` occurrences with `py.test` anyway.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b42250332a)
2019-03-04 15:48:44 +00:00
Guillaume Abrioux 4dd46ec396 add-osd: gather facts in second part of playbook
otherwise, it will end up with an error like the following:

```
FAILED! => {"msg": "'ansible.vars.hostvars.HostVarsVars object' has no attribute 'ansible_hostname'"}
```

because facts won't have been gathered.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1670663

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a440878533)
2019-03-04 15:48:44 +00:00
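A hedged sketch of the second play's header (the host group and role list are illustrative):

```
# Illustrative: gather facts explicitly so templating can rely on
# ansible_hostname for every host in the second play.
- hosts: osds
  gather_facts: true
  roles:
    - ceph-osd
```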
Guillaume Abrioux 06ad7e0b57 purge: fix rbd-mirror group name
the default is rbdmirrors in ceph-defaults

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 47ebef374f)
2019-03-01 22:16:19 +00:00
Guillaume Abrioux a8467d8f33 purge: fix rbd mirror purge
as of b70d54ac80 the service launched isn't
ceph-rbd-mirror@admin.service.

it's now `ceph-rbd-mirror@rbd-mirror.{{ ansible_hostname }}`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a915308477)
2019-03-01 22:16:19 +00:00
Guillaume Abrioux 5470e6fa42 purge: do not remove /var/lib/apt/lists/*
removing the content of this directory seems a bit aggressive and
causes a redeployment to fail after a purge on debian-based
distributions.

Typical error:
```
fatal: [mon0]: FAILED! => changed=false
  attempts: 3
  msg: No package matching 'ceph' is available
```

The following task considers the cache to still be valid, so apt
doesn't refresh it:
```
- name: update apt cache if cache_valid_time has expired
  apt:
    update_cache: yes
    cache_valid_time: 3600
  register: result
  until: result is succeeded
```

Since the task installing the ceph packages has `update_cache: no`, it
fails:

```
- name: install ceph for debian
  apt:
    name: "{{ debian_ceph_pkgs | unique }}"
    update_cache: no
    state: "{{ (upgrade_ceph_packages|bool) | ternary('latest','present') }}"
    default_release: "{{ ceph_stable_release_uca | default('') }}{{ ansible_distribution_release ~ '-backports' if ceph_origin == 'distro' and ceph_use_distro_backports else '' }}"
  register: result
  until: result is succeeded
```

/tmp/* isn't specific to ceph either, so we shouldn't remove
everything in that directory.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3849f30f58)
2019-03-01 22:16:19 +00:00
Guillaume Abrioux 255eab59ac purge: fix purge of lvm devices
Using the `shell` module seems to be the only way to make this task
work on both rhel-based AND debian-based distributions: `command -v`
is a shell builtin, not an executable, so the `command` ansible module
can't run it.

On ubuntu, using the `command` ansible module fails as follows
(regardless of `sudo` usage):
```
ok: [osd1] => changed=false
  cmd: command -v ceph-volume
  failed_when_result: false
  msg: '[Errno 2] No such file or directory: ''command'': ''command'''
  rc: 2
```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1653307

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 89f77589fa)
2019-03-01 22:16:19 +00:00
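A hedged sketch of the working form (task and register names are illustrative):

```
# Illustrative: `command -v` is a shell builtin, so it needs the shell
# module; the command module looks for an executable named `command`.
- name: check if ceph-volume is installed
  shell: command -v ceph-volume
  failed_when: false
  register: ceph_volume_installed
```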
VasishtaShastry 2393d82306 Extends check_devices tasks to non-collocated and lvm-batch scenarios
Tuned the name of a task and its error message to make them more
understandable to users.

Fixes BZ 1648168 - ceph-validate : devices are not validated in non-collocated and lvm_batch scenario

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1648168

Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com>
(cherry picked from commit 34c25ef49b)
2019-03-01 04:06:57 +00:00