Commit Graph

542 Commits (5e7962ccf6c9fa35f5611888826131c4e9e8f043)

Author SHA1 Message Date
Dimitri Savineau 64701437de container: remove ulimit nofile parameter
Since Ceph Octopus is python3 only we don't need to specify the max open
files anymore with the container engine.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-30 09:54:23 +02:00
Dimitri Savineau 760b6cd7b0 ceph_volume: fix multiple db/wal/journal devices
When using the lvm batch ceph-volume subcommand with dedicated devices
for filestore (journal) or bluestore (db/wal) then the list of devices
is convert to a string instead of being extended via an iterable.
This was working with only one dedicated device but starting with more
then the ceph_volume module fails.

TASK [ceph-osd : use ceph-volume lvm batch to create bluestore osds] **
fatal: [xxxxxx]: FAILED! => changed=true
  cmd:
  - ceph-volume
  - --cluster
  - ceph
  - lvm
  - batch
  - --bluestore
  - --yes
  - --prepare
  - --osds-per-device
  - '4'
  - /dev/nvme2n1
  - /dev/nvme3n1
  - /dev/nvme4n1
  - /dev/nvme5n1
  - /dev/nvme6n1
  - --db-devices
  - /dev/nvme0n1 /dev/nvme1n1
  - --report
  - --format=json
  msg: non-zero return code
  rc: 2
  stderr: |2-
     stderr: lsblk: /dev/nvme0n1 /dev/nvme1n1: not a block device
     stderr: error: /dev/nvme0n1 /dev/nvme1n1: No such file or directory
     stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or /sys expected.
    usage: ceph-volume lvm batch [-h] [--db-devices [DB_DEVICES [DB_DEVICES ...]]]
                                 [--wal-devices [WAL_DEVICES [WAL_DEVICES ...]]]
                                 [--journal-devices [JOURNAL_DEVICES [JOURNAL_DEVICES ...]]]
                                 [--no-auto] [--bluestore] [--filestore]
                                 [--report] [--yes] [--format {json,pretty}]
                                 [--dmcrypt]
                                 [--crush-device-class CRUSH_DEVICE_CLASS]
                                 [--no-systemd]
                                 [--osds-per-device OSDS_PER_DEVICE]
                                 [--block-db-size BLOCK_DB_SIZE]
                                 [--block-wal-size BLOCK_WAL_SIZE]
                                 [--journal-size JOURNAL_SIZE] [--prepare]
                                 [--osd-ids [OSD_IDS [OSD_IDS ...]]]
                                 [DEVICES [DEVICES ...]]
    ceph-volume lvm batch: error: Unable to proceed with non-existing device: /dev/nvme0n1 /dev/nvme1n1

So the dedicated device list is considered as a single string.

This commit also adds the journal_devices, block_db_devices and
wal_devices documentation to the ceph_volume module.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1816713

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-30 09:49:54 +02:00
Guillaume Abrioux 83fdf24caf doc/tests: bump to ansible 2.9 on master
Add testing against ansible 2.9 on master branch.
This commit also updates the documentation.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-25 08:01:27 +01:00
Dimitri Savineau 0f0a14772c tests: update mgr dashboard socket listening test
Since 15ed9ee the ceph-mgr daemon binds on the IP address on the public
network instead of binding on all addresses.
This commit updates the testinfra code to reflect that change.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-24 21:11:02 +01:00
Dimitri Savineau ac4f8763aa tests: register mark in pytest configuration
Unregister marks generates warnings like:

PytestUnknownMarkWarning: Unknown pytest.mark.docker - is this a typo?
You can register custom marks to avoid this warning

https://docs.pytest.org/en/latest/mark.html

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-24 15:19:18 +01:00
Dimitri Savineau f2c6281207 tests: add dashboard testinfra configuration
This commit adds basic tests for grafana, prometheus, node-exporter and
ceph mgr dashboard services.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-24 15:19:18 +01:00
Dimitri Savineau df8f853c85 Add pacific release
Add the 16th ceph release: pacific.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-24 09:47:12 +01:00
Dimitri Savineau fb69f6990c dashboard: allow to set read-only admin user
This commit allows one to set the role for the admin user as read-only.
This can be controlled via the dashboard_admin_user_ro variable but the
default value is false for backward compatibility.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1810176

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-19 15:34:41 +01:00
Guillaume Abrioux 60a2e28189 rgw: add multi-instances support when deploying multisite
This commit adds the multi-instances when deploying rgw multisite

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-12 16:44:48 -04:00
Dimitri Savineau e62532de46 update osd pool set size command
Since [1] we can't use osd pool without replicas (size: 1) by default.
We now need to set the mon_allow_pool_size_one flag to true in the ceph
configuration and add the --yes-i-really-mean-it flag to the osd pool
set size cli.

[1] https://github.com/ceph/ceph/commit/21508bd

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-11 11:25:42 +01:00
Dimitri Savineau ccec67aa6a tests/requirements: bump testinfra
3.4 is the latest testinfra release available but python2 is dropped
starting 4.0.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-09 09:46:11 +01:00
Ali Maredia 71f55bd54d rgw multisite: enable more than 1 realm per cluster
Make it so that more than one realm, zonegroup,
or zone can be created during a run of the rgw
multisite ansible playbooks.

The rgw hosts now need to be grouped into zones
and realms in the inventory.

.yml files need to be created in group_vars
for the realms and zones. Sample yaml files
are available.

Also remove multsite destroy playbook
and add --cluster before radosgw-admin commands

remove manually added rgw_zone_endpoints var
and have ceph-ansible automatically add the
correct endpoints of all the rgws in a rgw_zone
from the information provided in that rgws hostvars.

Signed-off-by: Ali Maredia <amaredia@redhat.com>
2020-03-04 12:58:13 -05:00
Guillaume Abrioux 9f0c6df94f tests: add more osd nodes in all_daemons scenario
This commit adds more osd nodes in all_daemons scenario in order to test
erasure pool creation.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-04 09:29:01 -05:00
Guillaume Abrioux 248978596a tests: update ooo job
This commit changes the value passed for the attribute 'rule_name' in
openstack_pools definition. It doesn't make sense to have emptry string
as passed value here.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-04 09:29:01 -05:00
Guillaume Abrioux 8cacba1f54 tests: add erasure pool creation test in CI
This commit makes the CI testing an OSD pool erasure creation due to the
recent refact of the OSD pool creation tasks in the playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-04 09:29:01 -05:00
Guillaume Abrioux a3b797e059 tests: enable pg autoscaler on 1 pool
This commit enables the pg autoscaler on 1 pool.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-04 09:29:01 -05:00
Guillaume Abrioux 896d00b50e tests: add lvm batch filestore testing
This commit adds an OSD node in lvm-batch scenario in order to test
filestore backend.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-03 13:50:19 -05:00
Guillaume Abrioux 0fc99bb6fa tests: increase journal_size value
Looks like we are still seeing issue [1].
Let's increase this value to unlock the CI (however, it still needs to
be investigated).

Typical error (see [1] for further details) :
```
[root@osd2 ~]# ceph-volume --cluster ceph lvm batch --filestore --yes --journal-size '2048' /dev/sda /dev/sdb --journal-devices /dev/sdc
Running command: /sbin/vgcreate --force --yes ceph-journals-817ef90b-77ac-4f52-b8a9-30893849fb78 /dev/sdc
 stdout: Physical volume "/dev/sdc" successfully created.
 stdout: Volume group "ceph-journals-817ef90b-77ac-4f52-b8a9-30893849fb78" successfully created
--> Refusing to continue with configured size for journal
-->  RuntimeError: journal sizes must be larger than 2GB, detected: 1024.00 MB
```

[1] https://tracker.ceph.com/issues/41374

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-03 13:23:57 -05:00
Guillaume Abrioux 0326d992c2 osd: add journal option in ceph_volume call (batch)
This commit adds the journal option to the ceph_volume call when
scenario is lvm batch

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-02-28 17:29:59 -05:00
Guillaume Abrioux a2d2e70ac2 requirements: enforce ansible version requirement
See https://github.com/advisories/GHSA-3m93-m4q6-mc6v

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-02-27 09:28:17 -05:00
Dimitri Savineau ac0f68ccf0 ceph-dashboard: update create/get rgw user tasks
Since [1] if a rgw user already exists then the radosgw-admin user create
command will return an error instead of modifying the current user.
We were already doing separated tasks for create and get operation but
only for multisite configuration but it's not enough.
Instead we should do the get task first and depending on the result
execute the create.
This commit also adds missing run_once and delegate_to statement.

[1] https://github.com/ceph/ceph/commit/269e9b9

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-02-18 10:22:21 +01:00
Ali Maredia 1834c1e48d rgw: extend automatic rgw pool creation capability
Add support for erasure code pools.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1731148

Signed-off-by: Ali Maredia <amaredia@redhat.com>
Co-authored-by: Dimitri Savineau <dsavinea@redhat.com>
2020-02-17 16:07:43 +01:00
Dimitri Savineau 85d7102a95 Revert "vagrant: temp workaround for CentOS 8 cloud image"
The CentOS 8 vagrant image download is now fixed.

This reverts commit a5385e1048.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-02-17 11:30:39 +01:00
Dimitri Savineau 779a4a6d71 tests: don't install s3cmd on containerized setup
The s3cmd package should only be installed on non containerized
deployment.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-02-17 11:27:52 +01:00
Guillaume Abrioux 910fc61fdc tests: remove legacy `osd_scenario` variable
As of stable-4.0 most of these references aren't needed anymore.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-02-04 10:05:33 +01:00
Guillaume Abrioux 641729357e tests: add external_clients scenario
This commit adds a new 'external ceph clients' scenario.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-01-31 12:02:15 +01:00
Guillaume Abrioux c040199c8f tests: set dashboard|grafana_admin_password
Set these 2 variables in all test scenarios where `dashboard_enabled` is
`True`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-01-29 08:45:34 +01:00
Guillaume Abrioux 3e7dbb4b16 tests: add 'all_in_one' scenario
Add new scenario 'all_in_one' in order to catch more collocated related
issues.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-01-27 15:30:45 -05:00
Dimitri Savineau bb3eae0c80 filestore-to-bluestore: fix osd_auto_discovery
When osd_auto_discovery is set then we need to refresh the
ansible_devices fact between after the filestore OSD purge
otherwise the devices fact won't be populated.
Also remove the gpt header on ceph_disk_osds_devices because
the devices is empty at this point for osd_auto_discovery.
Adding the bool filter when needed.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-22 09:36:09 +01:00
Dimitri Savineau f995b079a6 filestore-to-bluestore: --destroy with raw devices
We still need --destroy when using a raw device otherwise we won't be
able to recreate the lvm stack on that device with bluestore.

Running command: /usr/sbin/vgcreate -s 1G --force --yes ceph-bdc67a84-894a-4687-b43f-bcd76317580a /dev/sdd
 stderr: Physical volume '/dev/sdd' is already in volume group 'ceph-b7801d50-e827-4857-95ec-3291ad6f0151'
  Unable to add physical volume '/dev/sdd' to volume group 'ceph-b7801d50-e827-4857-95ec-3291ad6f0151'
  /dev/sdd: physical volume not initialized.
--> Was unable to complete a new OSD, will rollback changes

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792227

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-21 11:37:39 -05:00
Dimitri Savineau a5385e1048 vagrant: temp workaround for CentOS 8 cloud image
The CentOS cloud infrastructure storing the vagrant CentOS 8 image
changed the directory path and remove the old 8.0 image so the vagrant
box add centos/8 fails returning a 404 http error.
As a workaround we can pull the image from CentOS instead of letting
vagrant doing the resolution.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-15 17:52:35 +01:00
Dimitri Savineau 3900527e16 tests/setup: update mount options on EL 8
The nobarrier mount flag doesn't exist anymoer on XFS in the EL 8
kernel. That's why the task wasn't working on those systems.
We can still use the other options instead of skipping the task.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-11 05:33:01 +01:00
Guillaume Abrioux dc672e86ec tests: add a docker2podman scenario
This commit adds a new scenario in order to test docker-to-podman.yml
migration playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-01-10 10:21:29 -05:00
Guillaume Abrioux 4f2baaab8c tests: disable nfs testing
nfs-ganesha makes the CI failing because of issue related to SELinux.

See:
- https://bugzilla.redhat.com/show_bug.cgi?id=1788563
- https://github.com/nfs-ganesha/nfs-ganesha/issues/527

Until we can get this fixed, let's disable nfs-ganesha testing
temporarily.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-01-08 11:13:46 +01:00
Dimitri Savineau 7b3e6b932c tests/functional: change docker to podman
Some docker commands were hardcoded in tests playbooks and some
conditions were not taking care of the containerized_deployment
variable but only the atomic fact.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-08 11:13:46 +01:00
Guillaume Abrioux 217d95abb2 common: add centos8 support
Ceph octopus only supports CentOS 8.

This commit adds CentOS 8 support:
  - update vagrant image in tox configurations.
  - add CentOS 8 repository for el8 dependencies.
  - CentOS 8 container engine is podman (same than RHEL 8).
  - don't use the epel mirror on sepia because it's epel7 only.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-08 11:13:46 +01:00
Guillaume Abrioux 40de34fb5e tests: add filestore_to_bluestore job
This commit adds a new job in order to test the
filestore-to-bluestore.yml infrastructure playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-12-11 09:04:41 -05:00
Dimitri Savineau 4a6d19dae2 tests: reduce max_mds from 3 to 2
Having max_mds value equals to the number of mds nodes generates a
warning in the ceph cluster status:

cluster:
id:     6d3e49a4-ab4d-4e03-a7d6-58913b8ec00a'
health: HEALTH_WARN'
        insufficient standby MDS daemons available'
(...)
services:
  mds:     cephfs:3 {0=mds1=up:active,1=mds0=up:active,2=mds2=up:active}'

Let's use 2 active and 1 standby mds.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-12-04 14:07:29 -05:00
Dimitri Savineau 3f29b243ea tests: fix cluster health status
The current ceph cluster health is in warning state:

health: HEALTH_WARN
        13 pool(s) have no replicas configured
        2 pool(s) have non-power-of-two pg_num

Because we're using only 1 replica then we need to disable the redundancy
check.
The pool pg num should be a power of two number (like 16).

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-11-27 16:20:17 +01:00
Dimitri Savineau ef2cb99f73 ceph-osd: add device class to crush rules
This adds device class support to crush rules when using the class key
in the rule dict via the create-replicated sub command.
If the class key isn't specified then we use the create-simple sub
command for backward compatibility.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1636508

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-11-14 16:25:46 +01:00
Guillaume Abrioux 16bcef4f28 tests: add time command in vagrant_up.sh
monitor how long it takes to get all VMs up and running

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-11-08 15:47:46 +01:00
Guillaume Abrioux db77fbda15 tests: add coverage on purge playbook
This commit adds a playbook to be played before we run purge playbook,
it first creates an rbd image then map an rbd device on client0 so the
purge playbook will try to unmap it.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-11-08 09:06:11 -05:00
Dimitri Savineau 02df2ab5ea tests/requirements: bump testinfra and pytest
The ansible ssh connections are now using the ssh backend instead of
paramiko starting testinfra 3.1 and persistent connections too.
pytest 4.6 is the latest release to be supported by python 2.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-11-04 09:09:49 -05:00
Dimitri Savineau 6ce4fde820 move library/plugins tests files under tests dir
To avoid unnecessary ansible warnings during playbook execution we can
move the library and plugins test files under a different directory.

[WARNING]: Skipping plugin (plugins/filter/test_ipaddrs_in_ranges.py) as
it seems to be invalid:
cannot import name 'ipaddrs_in_ranges'

Closes: #4656

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-10-28 09:23:17 +01:00
Guillaume Abrioux b5a61fe2e3 tests: use osd ids instead of device name in ooo_collocation
on master, it doesn't make sense anymore to use device name, we should
use osd id instead.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-22 13:45:19 +02:00
Guillaume Abrioux 384161edcd tests: fix keyring creation in ooo_collocation
This commit removes the backslash in allow command parameter, this was
needed before the ceph_key module integration.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-22 13:45:19 +02:00
Dimitri Savineau 3c2840da03 tests: update container tag for ooo_collocation
It doesn't make sense to test the old 3.0.x container images with
nautilus+ ceph releases.
Also disable the dashboard deployment and switch to bluestore backend.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-10-22 13:45:19 +02:00
Guillaume Abrioux 25b98b2ce3 tests: add multimds coverage
This commit makes the all_daemons scenario deploying 3 mds in order to
cover the multimds case.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-18 13:43:13 -04:00
Dimitri Savineau 2c03c6fcd3 tests: fix the size on the second data LV
The commit replaces the pv/vg/lv commands used with the ansible command
module by the lvg and lvol modules.
This also fixes the size of the second data LV because we were only using
50% of the remaining space instead of 100%.

With a 50G device, the result was:
  - data-lv1 was 25G
  - data-lv2 was 12.5G
Instead of:
  - data-lv1 was 25G
  - data-lv2 was 25G

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-10-17 15:49:15 -04:00
Dimitri Savineau 0f978d969b Remove validate action and notario dependency
The current ceph-validate role is using both validate action and fail
module tasks to validate the ceph configuration.
The validate action is based on the notario python library. When one of
the notario validation fails then a python stack trace is reported to the
ansible task. This output isn't understandable by users.

This patch removes the validate action and the notario depencendy. The
validation is now done with only fail ansible module.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1654790

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-10-15 11:34:49 +02:00