Commit Graph

4653 Commits (590f6026bb7f284333c1b4d01946e2c3ac388358)
 

Author SHA1 Message Date
Sébastien Han d6d2a3b040 doc: update osd scenario
This commits adds documentation for the lvm scenario and the deprecation
of collocated and non-collocated scenario.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit f8e1542423)
2019-04-12 00:45:21 +00:00
Sébastien Han 343a99c8b7 osd: default osd_scenario to lvm
osd_scenario has become obsolete and defaults to lvm. With lvm there is
no such things has collocated and non-collocated.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 52df15895b)
2019-04-12 00:45:21 +00:00
Sébastien Han 279044155f validate: print a message for old scenarios
ceph-disk is not supported anymore, so all the newly created OSDs will
be configured using ceph-volume.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 9ea1e49407)
2019-04-12 00:45:21 +00:00
Sébastien Han 11c6655f57 osd: remove ceph-disk support
We don't support the preparation of OSD with ceph-disk. ceph-volume is
only supported. However, the start operation of OSD is still supported.
So let's say you change a config option, the handlers will be able to
restart all the OSDs via their respective systemd unit files.

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e2a5aa062e)
2019-04-12 00:45:21 +00:00
Dimitri Savineau c7d0fbbd19 tests: Add debug to ceph-override.json
It's usefull to have logs in debug mode enabled in order to have
more information for developpers.
Also reindent to json file.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d25af1b872)
2019-04-11 17:47:21 +02:00
Dimitri Savineau a1e73e8ff1 tests/functional: use ceph-override.json symlink
We don't need to have multiple ceph-override.json copies. We
currently already have symlink to all_daemons/ceph-override.json so
we can do it for all scenarios.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit a19054be18)
2019-04-11 17:47:21 +02:00
Dimitri Savineau c9a3def3a6 ceph-mds: Set application pool to cephfs
We don't need to use the cephfs variable for the application pool
name because it's always cephfs.
If the cephfs variable is set to something else than the default
value it will break the appplication pool task.

Resolves: #3790

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d2efb7f02b)
2019-04-11 17:47:21 +02:00
Guillaume Abrioux 2581c4d511 update: fix undefined error when no mgr group is declared
if mgr group isn't defined in inventory, that task will fail with
undefined error.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c1e4529b0e)
2019-04-11 09:20:22 -04:00
Guillaume Abrioux f5f8d264e2 osds: allow passing devices by path
ceph-volume didn't work when the devices where passed by path.
Since it now support it, let's allow this feature in ceph-ansible

Closes: #3812

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7e0adca7a4)
2019-04-11 02:25:15 +00:00
Dimitri Savineau 1e944b6022 rgw: change default frontend on nautilus
As discussed in ceph/ceph#26599, beast is now the default frontend
for rados gateway with nautilus release.
Add rgw_thread_pool_size variable with 512 as default value and keep
backward compatibility with num_threads option when using civetweb.
Update radosgw_civetweb_num_threads to reflect rgw_thread_pool_size
change.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d17b1b48b6)
2019-04-10 14:42:33 -04:00
Guillaume Abrioux a718ddec50 mon: remove useless delegate_to
Let's use a condition to run this task only on the first mon.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 631e5d3144)
2019-04-10 09:52:29 +00:00
Matthew Vernon a4d75c6ea6 UCA: Uncomment UCA variables in defaults, fix consequent breakage
The Ubuntu Cloud Archive-related (UCA) defaults in
roles/ceph-defaults/defaults/main.yml were commented out, which means
if you set `ceph_repository` to "uca", you get undefined variable
errors, e.g.

```
The task includes an option with an undefined variable. The error was: 'ceph_stable_repo_uca' is undefined

The error appears to have been in '/nfs/users/nfs_m/mv3/software/ceph-ansible/roles/ceph-common/tasks/installs/debian_uca_repository.yml': line 6, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

- name: add ubuntu cloud archive repository
  ^ here

```

Unfortunately, uncommenting these results in some other breakage,
because further roles were written that use the fact of
`ceph_stable_release_uca` being defined as a proxy for "we're using
UCA", so try and install packages from the bionic-updates/queens
release, for example, which doesn't work. So there are a few `apt` tasks
that need modifying to not use `ceph_stable_release_uca` unless
`ceph_origin` is `repository` and `ceph_repository` is `uca`.

Closes: #3475
Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
(cherry picked from commit 9dd913cf8a)
2019-04-10 03:50:27 +00:00
Dimitri Savineau 4cc318d13c container-common: Enable docker on boot for ubuntu
docker daemon is automatically started during package installation
but the service isn't enabled on boot.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 37816570c6)
2019-04-10 00:02:35 +00:00
Dimitri Savineau 532d749b2e rolling_update: Remove ceph aliases
ceph aliases have been introduced in stable-3.2 during the ceph
deployment. On master this has been removed but we don't handle
this removal in the upgrade from stable-3.2 to master via the
rolling_update playbook.
Also remove the task from purge-docker-cluster missing from
d9e7835

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 57b4e76d11)
2019-04-10 00:02:35 +00:00
Rishabh Dave c60915733a allow adding a MDS to already deployed cluster
Add a tox scenario that adds an new MDS node as a part of already
deployed Ceph cluster and deploys MDS there.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit c0dfa9b61a)
2019-04-09 16:48:59 +02:00
Dimitri Savineau 8715490223 ceph-facts: use last ipv6 address for mon/rgw
When using monitor_address_block or radosgw_address_block variables
to configure the mon/rgw address we're getting the first ip address
from the ansible facts present in that cidr.
When there's VIP on that network the first filter could return the
wrong value.
This seems to affect only IPv6 setup because the VIP addresses are
added to the ansible facts at the beginning of the list. This is the
opposite (at the end) when using IPv4.
This causes the mon/rgw processes to bind on the VIP address.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1680155

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit fd4b0ec7eb)
2019-04-09 10:48:14 -04:00
François Lafont af78673328 ceph-rgw: Fix bad paths which depend on the clustername
The path of the RGW environment file (in the /var/lib/ceph/radosgw/
directory) depends on the Ceph clustername. It was not taken into
account in the Ansible role `ceph-rgw`.

Signed-off-by: flaf <francois.lafont.1978@gmail.com>
(cherry picked from commit 4c3e77d869)
2019-04-09 10:44:45 -04:00
Guillaume Abrioux bf672f14fe mgr: manage mgr modules when mgr and mon are collocated
When mgrs are implicitly collocated on monitors (no mgrs in mgrs group).
That include was skipped because of this condition :

`inventory_hostname == groups[mgr_group_name][0]`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit cbfdbab177)
2019-04-09 10:59:32 +02:00
Guillaume Abrioux 3272c2347f mgr: wait for all mgr to be available
before managing mgr modules, we must ensure all mgr are available
otherwise we can hit failure like following:

```
stdout:Error ENOENT: all mgr daemons do not support module 'restful', pass --force to force enablement
```

It happens because all mgr are not yet available when trying to manage
with mgr modules.

Closes: #3100

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f596cc1711)
2019-04-09 10:59:32 +02:00
Guillaume Abrioux 36f72446a8 tests: fix update job
jenkins sets `CEPH_ANSIBLE_BRANCH` to `stable-4.0`, this makes all
nightly job failing.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-08 09:19:34 -04:00
Ali Maredia 4b35360876 rgw multisite: add more than 1 rgw to the master or secondary zone
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1664869

Signed-off-by: Ali Maredia <amaredia@redhat.com>
(cherry picked from commit 37f46a8c5d)
2019-04-07 10:00:18 +00:00
fpantano f8cbc27a83 Check ceph_health_raw.stdout value as string during mon bootstrap
According to rdo testing https://review.rdoproject.org/r/#/c/18721
a check on the output of the ceph_health value is added to
allow the playbook to make several attempts (according to the
retry/delay variables) when waiting the cluster quorum or
when the container bootstrap is not ended.
It avoids the failure of the command execution when it doesn't
receive a valid json object to decode (because cluster is too
slow to boostrap compared to ceph-ansible task execution).

Signed-off-by: fpantano <fpantano@redhat.com>
(cherry picked from commit afbb90e4ac)
2019-04-04 19:15:55 +02:00
Dimitri Savineau ace23a1479 radosgw: Raise cpu limit to 8
In containerized deployment the default radosgw quota is too low
for production environment.
This is causing performance degradation compared to bare-metal.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1680171

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d3ae9fd05f)
2019-04-04 19:15:01 +02:00
Guillaume Abrioux cd120baaba tests: add back testinfra testing
136bfe0 removed testinfra testing on all scenario excepted all_daemons

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8d106c2c58)
2019-04-04 13:11:33 +00:00
Guillaume Abrioux d3d02ba2b1 tests: pin pytest-xdist to 1.27.0
looks like newer version of pytest-xdist requires pytest>=4.4.0

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ba0a95211c)
2019-04-04 13:11:33 +00:00
Guillaume Abrioux 9074c19e10 tests: fix update scenario (stable-4.0)
perform upgrade from luminous to nautilus.

All dev* references are not needed anymore in stable-4.0

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-04 03:38:52 +02:00
Guillaume Abrioux b723ef3fa2 purge: fix lvm-batch purge osd
`lvm_volumes` and/or `devices` variable(s) can be undefined depending on
the scenario chosen.

These tasks should be run only if these variable are defined, otherwise
it ends up with undefined variable errors.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1653307

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0180738313)
2019-04-04 03:38:52 +02:00
Guillaume Abrioux 3fd4354aaa tests: switch rhel-container-podman to nautilus
in stable-4.0 this should be set to nautilus.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-03 11:27:46 +02:00
Dimitri Savineau 0274b880f1 ceph-volume: Add PYTHONIOENCODING env variable
Since https://github.com/ceph/ceph/commit/77912c0 ceph-volume uses
stdout encoding based on LC_CTYPE and PYTHONIOENCODING environment
variables.
Thoses variables aren't set when using ansible.
Currently this commit breaks non containerized deployment on Ubuntu.

TASK [use ceph-volume to create bluestore osds] ********************
  cmd:
  - ceph-volume
  - --cluster
  - ceph
  - lvm
  - create
  - --bluestore
  - --data
  - /dev/sdb
  rc: 1
  stderr: |-
    Traceback (most recent call last):
    (...)
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in
    position 132: ordinal not in range(128)

Note that the task is failing on ansible side due to the stdout
decoding but the osd creation is successful.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 7e5e4229b7)
2019-04-03 11:27:46 +02:00
Guillaume Abrioux 655ac5eb93 tests: test idempotency only on all_daemons job
there's no need to test this on all scenarios.
testing idempotency on all_daemons should be enough and allow us to save
precious resources for the CI.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 136bfe096c)
2019-04-03 11:27:46 +02:00
Dimitri Savineau 47d6e505a0 tox: Set nautilus as default release
On stable-4.0 branch we don't want to use dev setup but stable
release (nautilus).
Also update the container image tag to reflect this change.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-04-01 17:00:36 +02:00
Guillaume Abrioux f55e2b08be remove all NBSPs on master branch
Similar to #3658

Since there's too many changes between master and stable branches let's
commit directly in each branches instead of trying to backport this
commit.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-28 11:57:55 +00:00
Dimitri Savineau 40a8e1160c container: Add python3-docker on Ubuntu bionic
When installing python-minimal on Ubuntu bionic, this will add the
/usr/bin/python symlink to the default python interpreter.
On bionic, this isn't python2 but python3.

$ /usr/bin/python --version
Python 3.6.7

The python docker library is only installed for python2 which causes
issues when running the purge-docker-cluster playbook. This playbook
uses the ansible docker modules and requires to have python bindings
installed on the remote host.
Without the bindings we can see python error reported by the docker
module.

msg: Failed to import docker or docker-py - No module named 'docker'.
Try `pip install docker` or `pip install docker-py` (Python 2.6)

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-03-28 08:03:58 +00:00
Dimitri Savineau 7b7f79171a tests/functional: Use the ansible reboot module
Ansible 2.7 introduces the reboot module so we don't need to use the
shell/reboot + wait_for tasks.

https://docs.ansible.com/ansible/latest/modules/reboot_module.html

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-03-27 08:30:50 +00:00
Dimitri Savineau bd0869cd01 tox: Fix container purge jobs
On containerized CI jobs the playbook executed is purge-cluster.yml
but it should be set to purge-docker-cluster.yml

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-03-26 21:36:56 +00:00
Dimitri Savineau c8442f3705 rolling_update: Update systemd unit regex for nvme
The systemd unit regex doesn't handle nvme devices (/dev/nvmeXn1).

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1687828

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-03-26 12:01:00 +00:00
Dimitri Savineau 4cca366102 travis: Remove galaxy lint rules repository
The galaxy-lint-rules github repository isn't used anymore and has
been archived.
All the rules are now part of the ansible-lint project.

https://github.com/ansible/galaxy-lint-rules
https://github.com/ansible/ansible-lint

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-03-26 11:08:38 +00:00
Dimitri Savineau 94505a3af2 Add uca to ceph_repository choices validation
Ubuntu cloud archive is configurable via ceph_repository variable but
the uca choice isn't accepted.
This commit fixes this issue and also validates the associated uca
repository variables.

Resolves: #3739

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-03-26 09:03:44 +00:00
Guillaume Abrioux 6f47c20c3a rgw: fix a typo
ee2d52d33d introduced a typo.
This commit fixes it.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux 3c4f464c54 rgw: cleanup legacy task
this task was here for backward compatibility.
It's time to remove it in the next release.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux 9134624578 rgw: add a retry on pool related tasks
sometimes those tasks might fail because of a timeout.
I've been facing this several times in the CI, adding this retry might
help and won't hurt in any case.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux 78aac3e96a update: followup on edfdc49
all rgw instances should be stopped according to the multiple rgw
instances support added in rolling_update.yml

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux f6e0185146 update: add containerized deployment upgrade support (L->N)
Add a couple of fixes to allow containerized deployments upgrade support
to upgrade from luminous/mimic to nautilus.

- pass CEPH_CONTAINER_IMAGE and CEPH_CONTAINER_BINARY environment
variable to the ceph_key module,
- fix the docker exec command in 'waiting for the containerized monitor
to join the quorum' task according to the `delegate_to` parameter,
- override `docker_exec_cmd` in `ceph-facts` with `mon_host` when
rolling_update is `True`,
- do not run unnecessarily `create_mds_filesystems.yml` when performing an
upgrade.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux 1816b876ee update: add missing hosts in facts gathering
iscsigws were missing.
The 'complete upgrade' couldn't complete because rolling_update was set
to False for iscsigw nodes.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux 45ba90c169 update: remove rbdmirror legacy task
This task is no longer needed for next release.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux 0ea0adf039 update: show all daemons version at the end
Let's display all daemons version at the end of the playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux 7386249c71 facts: retrieve fsid during rolling_update playbook
otherwise it generates a new cluster fsid and makes the upgrade failing

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux 5c3ce4ca77 mon: fetch initial keyring even when running rolling_update
otherwise, the task to copy mgr keyring fails during the rolling_update.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux f0e616962d tests: split tox configuration into multiple pieces
This file is becoming too big, let's isolate the update related code in
a dedicated tox configuration file.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00
Guillaume Abrioux f31d6d9485 update: enable new nautilus-only functionality
once the cluster is upgraded to nautilus, we can complete the process by
disallowing pre-nautilus OSDs and enabling all new nautilus-only functionality

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-25 16:02:56 -04:00