Commit Graph

528 Commits (53793b352e31bfc9ee3c0f9dd6c55254c50d46d3)

Author SHA1 Message Date
Guillaume Abrioux 53793b352e tests: (lvm_setup.yml), don't shrink lvol
when rerunning lvm_setup.yml on existing cluster with OSDs already
deployed, it fails like following:

```
fatal: [osd0]: FAILED! => changed=false
  msg: Sorry, no shrinking of data-lv2 to 0 permitted.
```

because we are asking `lvol` module to create a volume on an empty VG
with size extents = `100%FREE`.

The default behavior of `lvol` is to shrink the volume if the LV's current
size is greater than the requested size.

Given the requested size is calculated like this:

`size_requested = size_percent * this_vg['free'] / 100`

in our case, it is similar to:

`size_requested = 100 * 0 / 100` which basically means `0`

So the current LV size is well greater than the requested size which
leads the module to attempt to shrink it to 0 which isn't obviously now
allowed.

Adding `shrink: false` to the module calls fixes this issue.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 218f4ae361)
2020-07-22 18:47:27 -04:00
Dimitri Savineau 056a4fe866 ceph-dashboard: update create/get rgw user tasks
Since [1] if a rgw user already exists then the radosgw-admin user create
command will return an error instead of modifying the current user.
We were already doing separated tasks for create and get operation but
only for multisite configuration but it's not enough.
Instead we should do the get task first and depending on the result
execute the create.
This commit also adds missing run_once and delegate_to statement.

[1] https://github.com/ceph/ceph/commit/269e9b9

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ac0f68ccf0)
2020-07-20 21:21:57 +02:00
Jan Fajerski 14e9672f00 lvm_setup: lookup device from inventory, default to /dev/sd* names
This fixes a long standing fail in ceph-volumes lvm test suite.
Otherwise the default behaviour should not change.

Signed-off-by: Jan Fajerski <jfajerski@suse.com>
(cherry picked from commit 1fe8e819f9)
2020-06-29 10:25:58 +02:00
Dimitri Savineau a99c94ea11 ceph-osd: remove ceph-osd-run.sh script
Since we only have one scenario since nautilus then we can just move
the container start command from ceph-osd-run.sh to the systemd unit
service.
As a result, the ceph-osd-run.sh.j2 template and the
ceph_osd_docker_run_script_path variable are removed.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 829990e60d)
2020-06-23 17:35:01 +02:00
Guillaume Abrioux 8ef3fee41b ceph_volume: make zap function idempotent
This commit makes the zap function idempotent, especially when using
lvm_volumes variable.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1845668

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3f47236470)
2020-06-23 10:49:07 +02:00
Dimitri Savineau a97e24fee9 docker2podman: manage dashboard nodes
The dashboard nodes (alertmanager, grafana, node-exporter, and prometheus)
were not manage during the docker to podman migration.

This adds the systemd container template of those services to a dedicated
file (systemd.yml) in order to include it in the docker2podman playbook.

This also adds the dashboard container images pull from docker to podman.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1829389

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 252e78b4e4)
2020-06-03 13:20:24 -04:00
Dimitri Savineau e34c95d28f tests: update mgr dashboard socket listening test
Since 15ed9ee the ceph-mgr daemon binds on the IP address on the public
network instead of binding on all addresses.
This commit updates the testinfra code to reflect that change.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 0f0a14772c)
2020-04-07 15:25:45 +02:00
Dimitri Savineau 47be2a2719 tests: register mark in pytest configuration
Unregister marks generates warnings like:

PytestUnknownMarkWarning: Unknown pytest.mark.docker - is this a typo?
You can register custom marks to avoid this warning

https://docs.pytest.org/en/latest/mark.html

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ac4f8763aa)
2020-04-07 15:25:45 +02:00
Dimitri Savineau 1cfb84ae94 tests: add dashboard testinfra configuration
This commit adds basic tests for grafana, prometheus, node-exporter and
ceph mgr dashboard services.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f2c6281207)
2020-04-07 15:25:45 +02:00
Guillaume Abrioux 825aed5ec1 ceph_key: remove 'update' state
With this change, the state `present` is enough to update a keyring.
If the keyring already exist, it will be updated if caps or secret
passed to the module are different.
If the keyring doen't exist, it will be created.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1808367

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 553584cbd0)
2020-04-01 18:08:51 -04:00
Guillaume Abrioux 03355aec8c tests: add more coverage in external_clients scenario
Run create_users_keys.yml in external_clients scenario

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8c1c34b201)
2020-03-31 19:42:40 -04:00
Dimitri Savineau dcd02e6494 ceph_volume: fix multiple db/wal devices
When using the lvm batch ceph-volume subcommand with dedicated devices
for bluestore (db/wal) then the list of devices is convert to a string
instead of being extended via an iterable.
This was working with only one dedicated device but starting with more
then the ceph_volume module fails.

TASK [ceph-osd : use ceph-volume lvm batch to create bluestore osds] **
fatal: [xxxxxx]: FAILED! => changed=true
  cmd:
  - ceph-volume
  - --cluster
  - ceph
  - lvm
  - batch
  - --bluestore
  - --yes
  - --prepare
  - --osds-per-device
  - '4'
  - /dev/nvme2n1
  - /dev/nvme3n1
  - /dev/nvme4n1
  - /dev/nvme5n1
  - /dev/nvme6n1
  - --db-devices
  - /dev/nvme0n1 /dev/nvme1n1
  - --report
  - --format=json
  msg: non-zero return code
  rc: 2
  stderr: |2-
     stderr: lsblk: /dev/nvme0n1 /dev/nvme1n1: not a block device
     stderr: error: /dev/nvme0n1 /dev/nvme1n1: No such file or directory
     stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or /sys expected.
    usage: ceph-volume lvm batch [-h] [--db-devices [DB_DEVICES [DB_DEVICES ...]]]
                                 [--wal-devices [WAL_DEVICES [WAL_DEVICES ...]]]
                                 [--journal-devices [JOURNAL_DEVICES [JOURNAL_DEVICES ...]]]
                                 [--no-auto] [--bluestore] [--filestore]
                                 [--report] [--yes] [--format {json,pretty}]
                                 [--dmcrypt]
                                 [--crush-device-class CRUSH_DEVICE_CLASS]
                                 [--no-systemd]
                                 [--osds-per-device OSDS_PER_DEVICE]
                                 [--block-db-size BLOCK_DB_SIZE]
                                 [--block-wal-size BLOCK_WAL_SIZE]
                                 [--journal-size JOURNAL_SIZE] [--prepare]
                                 [--osd-ids [OSD_IDS [OSD_IDS ...]]]
                                 [DEVICES [DEVICES ...]]
    ceph-volume lvm batch: error: Unable to proceed with non-existing device: /dev/nvme0n1 /dev/nvme1n1

So the dedicated device list is considered as a single string.

This commit also adds the block_db_devices and wal_devices documentation
to the ceph_volume module.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1816713

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 760b6cd7b0)
2020-03-30 10:04:26 -04:00
Guillaume Abrioux d682cf6de5 tests: add inventory host for 5.0 upgrade job
This inventory is intended to be used in the upgrade scenario in
stable-5.0 branch.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-26 11:23:23 +01:00
Dimitri Savineau 55c222d088 dashboard: allow to set read-only admin user
This commit allows one to set the role for the admin user as read-only.
This can be controlled via the dashboard_admin_user_ro variable but the
default value is false for backward compatibility.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1810176

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit fb69f6990c)
2020-03-19 13:24:05 -04:00
Guillaume Abrioux c26e80fdbf rgw: add multi-instances support when deploying multisite
This commit adds the multi-instances when deploying rgw multisite

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 60a2e28189)
2020-03-12 19:04:26 -04:00
Dimitri Savineau 3fc4cc9f62 tests/requirements: bump testinfra
3.4 is the latest testinfra release available but python2 is dropped
starting 4.0.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ccec67aa6a)
2020-03-09 13:33:44 +01:00
Guillaume Abrioux b9e397ebaf tests: add more osd nodes in all_daemons scenario
This commit adds more osd nodes in all_daemons scenario in order to test
erasure pool creation.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9f0c6df94f)
2020-03-06 16:10:03 +01:00
Guillaume Abrioux 0800c60721 tests: update ooo job
This commit changes the value passed for the attribute 'rule_name' in
openstack_pools definition. It doesn't make sense to have emptry string
as passed value here.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 248978596a)
2020-03-06 16:10:03 +01:00
Guillaume Abrioux 6cc9c28c5d tests: add erasure pool creation test in CI
This commit makes the CI testing an OSD pool erasure creation due to the
recent refact of the OSD pool creation tasks in the playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8cacba1f54)
2020-03-06 16:10:03 +01:00
Guillaume Abrioux 01559892d1 tests: enable pg autoscaler on 1 pool
This commit enables the pg autoscaler on 1 pool.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a3b797e059)
2020-03-06 16:10:03 +01:00
Ali Maredia 2c440d4427 rgw multisite: enable more than 1 realm per cluster
Make it so that more than one realm, zonegroup,
or zone can be created during a run of the rgw
multisite ansible playbooks.

The rgw hosts now need to be grouped into zones
and realms in the inventory.

.yml files need to be created in group_vars
for the realms and zones. Sample yaml files
are available.

Also remove multsite destroy playbook
and add --cluster before radosgw-admin commands

remove manually added rgw_zone_endpoints var
and have ceph-ansible automatically add the
correct endpoints of all the rgws in a rgw_zone
from the information provided in that rgws hostvars.

Signed-off-by: Ali Maredia <amaredia@redhat.com>
(cherry picked from commit 71f55bd54d)
2020-03-04 14:39:23 -05:00
Guillaume Abrioux 96b7857347 requirements: enforce ansible version requirement
See https://github.com/advisories/GHSA-3m93-m4q6-mc6v

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a2d2e70ac2)
2020-02-27 09:56:55 -05:00
Ali Maredia 7d2a217270 rgw: extend automatic rgw pool creation capability
Add support for erasure code pools.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1731148

Signed-off-by: Ali Maredia <amaredia@redhat.com>
Co-authored-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 1834c1e48d)
2020-02-17 17:44:53 -05:00
Guillaume Abrioux e3cd719ebe tests: add external_clients scenario
This commit adds a new 'external ceph clients' scenario.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 641729357e)
2020-01-31 13:37:10 +01:00
Guillaume Abrioux 2e7d7b70ed tests: set dashboard|grafana_admin_password
Set these 2 variables in all test scenarios where `dashboard_enabled` is
`True`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c040199c8f)
2020-01-29 14:15:41 +01:00
Guillaume Abrioux d5dca5087a tests: add 'all_in_one' scenario
Add new scenario 'all_in_one' in order to catch more collocated related
issues.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3e7dbb4b16)
2020-01-27 17:54:39 -05:00
Dimitri Savineau 0abea70e29 filestore-to-bluestore: fix osd_auto_discovery
When osd_auto_discovery is set then we need to refresh the
ansible_devices fact between after the filestore OSD purge
otherwise the devices fact won't be populated.
Also remove the gpt header on ceph_disk_osds_devices because
the devices is empty at this point for osd_auto_discovery.
Adding the bool filter when needed.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit bb3eae0c80)
2020-01-22 10:06:17 +01:00
Dimitri Savineau e4965e9ea9 filestore-to-bluestore: --destroy with raw devices
We still need --destroy when using a raw device otherwise we won't be
able to recreate the lvm stack on that device with bluestore.

Running command: /usr/sbin/vgcreate -s 1G --force --yes ceph-bdc67a84-894a-4687-b43f-bcd76317580a /dev/sdd
 stderr: Physical volume '/dev/sdd' is already in volume group 'ceph-b7801d50-e827-4857-95ec-3291ad6f0151'
  Unable to add physical volume '/dev/sdd' to volume group 'ceph-b7801d50-e827-4857-95ec-3291ad6f0151'
  /dev/sdd: physical volume not initialized.
--> Was unable to complete a new OSD, will rollback changes

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792227

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f995b079a6)
2020-01-21 18:26:55 +01:00
Guillaume Abrioux fc7212b192 tests: add time command in vagrant_up.sh
monitor how long it takes to get all VMs up and running

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 16bcef4f28)
2020-01-10 17:41:27 +01:00
Guillaume Abrioux 2c96155c32 tests: retry to fire up VMs on vagrant failure
Add a script to retry several times to fire up VMs to avoid vagrant
failures.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 1ecb3a9352)
2020-01-10 17:41:27 +01:00
Guillaume Abrioux 7c2918d684 tests: add a docker2podman scenario
This commit adds a new scenario in order to test docker-to-podman.yml
migration playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit dc672e86ec)
2020-01-10 17:41:27 +01:00
Dimitri Savineau bd016960cf ceph-osd: add device class to crush rules
This adds device class support to crush rules when using the class key
in the rule dict via the create-replicated sub command.
If the class key isn't specified then we use the create-simple sub
command for backward compatibility.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1636508

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ef2cb99f73)
2020-01-10 11:07:25 -05:00
Dimitri Savineau 09ccf22052 tests: use community repository
We don't need to use dev repository on stable branches.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-09 21:39:23 +01:00
Dimitri Savineau d625cefbac tests: use ceph iscsi stable repository
The ceph iscsi repository was still set to dev (shaman) instead of
using the stable ceph-iscsi repository.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-08 19:29:59 +01:00
Guillaume Abrioux 78799ecf55 tests: add filestore_to_bluestore job
This commit adds a new job in order to test the
filestore-to-bluestore.yml infrastructure playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 40de34fb5e)
2019-12-11 16:37:21 +01:00
Dimitri Savineau acf2476f09 tests: reduce max_mds from 3 to 2
Having max_mds value equals to the number of mds nodes generates a
warning in the ceph cluster status:

cluster:
id:     6d3e49a4-ab4d-4e03-a7d6-58913b8ec00a'
health: HEALTH_WARN'
        insufficient standby MDS daemons available'
(...)
services:
  mds:     cephfs:3 {0=mds1=up:active,1=mds0=up:active,2=mds2=up:active}'

Let's use 2 active and 1 standby mds.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 4a6d19dae2)
2019-12-04 17:49:33 -05:00
Guillaume Abrioux 99cdcf9d29 tests: add coverage on purge playbook
This commit adds a playbook to be played before we run purge playbook,
it first creates an rbd image then map an rbd device on client0 so the
purge playbook will try to unmap it.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit db77fbda15)
2019-11-14 10:49:38 -05:00
Dimitri Savineau 54365389f8 tests/requirements: bump testinfra and pytest
The ansible ssh connections are now using the ssh backend instead of
paramiko starting testinfra 3.1 and persistent connections too.
pytest 4.6 is the latest release to be supported by python 2.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 02df2ab5ea)
2019-11-04 12:58:37 -05:00
Dimitri Savineau 359abedbb1 move library/plugins tests files under tests dir
To avoid unnecessary ansible warnings during playbook execution we can
move the library and plugins test files under a different directory.

[WARNING]: Skipping plugin (plugins/filter/test_ipaddrs_in_ranges.py) as
it seems to be invalid:
cannot import name 'ipaddrs_in_ranges'

Closes: #4656

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 6ce4fde820)
2019-10-28 15:54:31 +01:00
Guillaume Abrioux 2fd51f8097 tests: use osd ids instead of device name in ooo_collocation
on master, it doesn't make sense anymore to use device name, we should
use osd id instead.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b5a61fe2e3)
2019-10-23 17:17:24 +02:00
Guillaume Abrioux cd397590f6 tests: fix keyring creation in ooo_collocation
This commit removes the backslash in allow command parameter, this was
needed before the ceph_key module integration.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 384161edcd)
2019-10-23 17:17:24 +02:00
Dimitri Savineau bb2f139a1d tests: update container tag for ooo_collocation
It doesn't make sense to test the old 3.0.x container images with
nautilus+ ceph releases.
Also disable the dashboard deployment and switch to bluestore backend.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 3c2840da03)
2019-10-23 17:17:24 +02:00
Dimitri Savineau f8b84ce108 tests: fix the size on the second data LV
The commit replaces the pv/vg/lv commands used with the ansible command
module by the lvg and lvol modules.
This also fixes the size of the second data LV because we were only using
50% of the remaining space instead of 100%.

With a 50G device, the result was:
  - data-lv1 was 25G
  - data-lv2 was 12.5G
Instead of:
  - data-lv1 was 25G
  - data-lv2 was 25G

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 2c03c6fcd3)
2019-10-18 17:26:18 -04:00
Guillaume Abrioux 9bc7f8a7d7 tests: add multimds coverage
This commit makes the all_daemons scenario deploying 3 mds in order to
cover the multimds case.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 25b98b2ce3)
2019-10-18 22:09:04 +02:00
Dimitri Savineau c6f1ef893d tests: reduce handler mon and osd delay
We don't need to have high handler delay in the CI so reducing to
10 seconds.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 04ec1ad3cc)
2019-10-18 22:09:04 +02:00
Dimitri Savineau 8117ed34d4 Remove validate action and notario dependency
The current ceph-validate role is using both validate action and fail
module tasks to validate the ceph configuration.
The validate action is based on the notario python library. When one of
the notario validation fails then a python stack trace is reported to the
ansible task. This output isn't understandable by users.

This patch removes the validate action and the notario depencendy. The
validation is now done with only fail ansible module.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1654790

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 0f978d969b)
2019-10-15 10:21:54 -04:00
Dimitri Savineau 067aa3aabd tests: fix rgw multisite vagrant variables
The secondary vagrant variables didn't have the grafana vm variable
set which create an vagrant error.

There was an error loading a Vagrantfile. The file being loaded
and the error message are shown below. This is usually caused by
an invalid or undefined variable.

This patch also changes the ssh-extra-args parameter to ssh-common-args
to get the same values for ssh/sftp/scp. Otherwise we can see warnings
from ansible and some tasks are failing.

[WARNING]: sftp transfer mechanism failed on [mon0]. Use ANSIBLE_DEBUG=1
to see detailed information

It also updates the ssh-common-args value for the rgw-multisite scenario
to reflect the ANSIBLE_SSH_ARGS environment variable value.

Finally changing the IP addresses due to the Vagrant refact done in the
commit 778c51a

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 010158ff84)
2019-10-04 16:48:00 -04:00
Guillaume Abrioux 4f806786da tests: remove debug log verbosity
This was added for debugging purpose.
It's generating very large log output, let's remove this now.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 01f6dd52b3)
2019-09-30 09:16:33 +02:00
Guillaume Abrioux 99e6807f51 tests: pin jinja2 version
ensure we get the latest jinja2 version.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 006df148d0)
2019-09-26 16:21:54 +02:00
Guillaume Abrioux b1e61be9c6 tests: set copy_admin_key at group_vars level
setting it at extra vars level prevent from setting it per node.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5bb6a4da42)
2019-09-26 16:21:54 +02:00