when rerunning lvm_setup.yml on existing cluster with OSDs already
deployed, it fails like following:
```
fatal: [osd0]: FAILED! => changed=false
msg: Sorry, no shrinking of data-lv2 to 0 permitted.
```
because we are asking `lvol` module to create a volume on an empty VG
with size extents = `100%FREE`.
The default behavior of `lvol` is to shrink the volume if the LV's current
size is greater than the requested size.
Given the requested size is calculated like this:
`size_requested = size_percent * this_vg['free'] / 100`
in our case, it is similar to:
`size_requested = 100 * 0 / 100` which basically means `0`
So the current LV size is well greater than the requested size which
leads the module to attempt to shrink it to 0 which isn't obviously now
allowed.
Adding `shrink: false` to the module calls fixes this issue.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 218f4ae361)
Since [1] if a rgw user already exists then the radosgw-admin user create
command will return an error instead of modifying the current user.
We were already doing separated tasks for create and get operation but
only for multisite configuration but it's not enough.
Instead we should do the get task first and depending on the result
execute the create.
This commit also adds missing run_once and delegate_to statement.
[1] https://github.com/ceph/ceph/commit/269e9b9
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ac0f68ccf0)
This fixes a long standing fail in ceph-volumes lvm test suite.
Otherwise the default behaviour should not change.
Signed-off-by: Jan Fajerski <jfajerski@suse.com>
(cherry picked from commit 1fe8e819f9)
Since we only have one scenario since nautilus then we can just move
the container start command from ceph-osd-run.sh to the systemd unit
service.
As a result, the ceph-osd-run.sh.j2 template and the
ceph_osd_docker_run_script_path variable are removed.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 829990e60d)
This commit makes the zap function idempotent, especially when using
lvm_volumes variable.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1845668
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3f47236470)
The dashboard nodes (alertmanager, grafana, node-exporter, and prometheus)
were not manage during the docker to podman migration.
This adds the systemd container template of those services to a dedicated
file (systemd.yml) in order to include it in the docker2podman playbook.
This also adds the dashboard container images pull from docker to podman.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1829389
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 252e78b4e4)
Since 15ed9ee the ceph-mgr daemon binds on the IP address on the public
network instead of binding on all addresses.
This commit updates the testinfra code to reflect that change.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 0f0a14772c)
Unregister marks generates warnings like:
PytestUnknownMarkWarning: Unknown pytest.mark.docker - is this a typo?
You can register custom marks to avoid this warning
https://docs.pytest.org/en/latest/mark.html
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ac4f8763aa)
With this change, the state `present` is enough to update a keyring.
If the keyring already exist, it will be updated if caps or secret
passed to the module are different.
If the keyring doen't exist, it will be created.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1808367
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 553584cbd0)
When using the lvm batch ceph-volume subcommand with dedicated devices
for bluestore (db/wal) then the list of devices is convert to a string
instead of being extended via an iterable.
This was working with only one dedicated device but starting with more
then the ceph_volume module fails.
TASK [ceph-osd : use ceph-volume lvm batch to create bluestore osds] **
fatal: [xxxxxx]: FAILED! => changed=true
cmd:
- ceph-volume
- --cluster
- ceph
- lvm
- batch
- --bluestore
- --yes
- --prepare
- --osds-per-device
- '4'
- /dev/nvme2n1
- /dev/nvme3n1
- /dev/nvme4n1
- /dev/nvme5n1
- /dev/nvme6n1
- --db-devices
- /dev/nvme0n1 /dev/nvme1n1
- --report
- --format=json
msg: non-zero return code
rc: 2
stderr: |2-
stderr: lsblk: /dev/nvme0n1 /dev/nvme1n1: not a block device
stderr: error: /dev/nvme0n1 /dev/nvme1n1: No such file or directory
stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or /sys expected.
usage: ceph-volume lvm batch [-h] [--db-devices [DB_DEVICES [DB_DEVICES ...]]]
[--wal-devices [WAL_DEVICES [WAL_DEVICES ...]]]
[--journal-devices [JOURNAL_DEVICES [JOURNAL_DEVICES ...]]]
[--no-auto] [--bluestore] [--filestore]
[--report] [--yes] [--format {json,pretty}]
[--dmcrypt]
[--crush-device-class CRUSH_DEVICE_CLASS]
[--no-systemd]
[--osds-per-device OSDS_PER_DEVICE]
[--block-db-size BLOCK_DB_SIZE]
[--block-wal-size BLOCK_WAL_SIZE]
[--journal-size JOURNAL_SIZE] [--prepare]
[--osd-ids [OSD_IDS [OSD_IDS ...]]]
[DEVICES [DEVICES ...]]
ceph-volume lvm batch: error: Unable to proceed with non-existing device: /dev/nvme0n1 /dev/nvme1n1
So the dedicated device list is considered as a single string.
This commit also adds the block_db_devices and wal_devices documentation
to the ceph_volume module.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1816713
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 760b6cd7b0)
This commit allows one to set the role for the admin user as read-only.
This can be controlled via the dashboard_admin_user_ro variable but the
default value is false for backward compatibility.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1810176
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit fb69f6990c)
3.4 is the latest testinfra release available but python2 is dropped
starting 4.0.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ccec67aa6a)
This commit adds more osd nodes in all_daemons scenario in order to test
erasure pool creation.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9f0c6df94f)
This commit changes the value passed for the attribute 'rule_name' in
openstack_pools definition. It doesn't make sense to have emptry string
as passed value here.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 248978596a)
This commit makes the CI testing an OSD pool erasure creation due to the
recent refact of the OSD pool creation tasks in the playbook.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8cacba1f54)
Make it so that more than one realm, zonegroup,
or zone can be created during a run of the rgw
multisite ansible playbooks.
The rgw hosts now need to be grouped into zones
and realms in the inventory.
.yml files need to be created in group_vars
for the realms and zones. Sample yaml files
are available.
Also remove multsite destroy playbook
and add --cluster before radosgw-admin commands
remove manually added rgw_zone_endpoints var
and have ceph-ansible automatically add the
correct endpoints of all the rgws in a rgw_zone
from the information provided in that rgws hostvars.
Signed-off-by: Ali Maredia <amaredia@redhat.com>
(cherry picked from commit 71f55bd54d)
Set these 2 variables in all test scenarios where `dashboard_enabled` is
`True`
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c040199c8f)
Add new scenario 'all_in_one' in order to catch more collocated related
issues.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3e7dbb4b16)
When osd_auto_discovery is set then we need to refresh the
ansible_devices fact between after the filestore OSD purge
otherwise the devices fact won't be populated.
Also remove the gpt header on ceph_disk_osds_devices because
the devices is empty at this point for osd_auto_discovery.
Adding the bool filter when needed.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit bb3eae0c80)
We still need --destroy when using a raw device otherwise we won't be
able to recreate the lvm stack on that device with bluestore.
Running command: /usr/sbin/vgcreate -s 1G --force --yes ceph-bdc67a84-894a-4687-b43f-bcd76317580a /dev/sdd
stderr: Physical volume '/dev/sdd' is already in volume group 'ceph-b7801d50-e827-4857-95ec-3291ad6f0151'
Unable to add physical volume '/dev/sdd' to volume group 'ceph-b7801d50-e827-4857-95ec-3291ad6f0151'
/dev/sdd: physical volume not initialized.
--> Was unable to complete a new OSD, will rollback changes
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792227
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f995b079a6)
Add a script to retry several times to fire up VMs to avoid vagrant
failures.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 1ecb3a9352)
This commit adds a new scenario in order to test docker-to-podman.yml
migration playbook.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit dc672e86ec)
This adds device class support to crush rules when using the class key
in the rule dict via the create-replicated sub command.
If the class key isn't specified then we use the create-simple sub
command for backward compatibility.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1636508
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ef2cb99f73)
The ceph iscsi repository was still set to dev (shaman) instead of
using the stable ceph-iscsi repository.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This commit adds a new job in order to test the
filestore-to-bluestore.yml infrastructure playbook.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 40de34fb5e)
Having max_mds value equals to the number of mds nodes generates a
warning in the ceph cluster status:
cluster:
id: 6d3e49a4-ab4d-4e03-a7d6-58913b8ec00a'
health: HEALTH_WARN'
insufficient standby MDS daemons available'
(...)
services:
mds: cephfs:3 {0=mds1=up:active,1=mds0=up:active,2=mds2=up:active}'
Let's use 2 active and 1 standby mds.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 4a6d19dae2)
This commit adds a playbook to be played before we run purge playbook,
it first creates an rbd image then map an rbd device on client0 so the
purge playbook will try to unmap it.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit db77fbda15)
The ansible ssh connections are now using the ssh backend instead of
paramiko starting testinfra 3.1 and persistent connections too.
pytest 4.6 is the latest release to be supported by python 2.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 02df2ab5ea)
To avoid unnecessary ansible warnings during playbook execution we can
move the library and plugins test files under a different directory.
[WARNING]: Skipping plugin (plugins/filter/test_ipaddrs_in_ranges.py) as
it seems to be invalid:
cannot import name 'ipaddrs_in_ranges'
Closes: #4656
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 6ce4fde820)
on master, it doesn't make sense anymore to use device name, we should
use osd id instead.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b5a61fe2e3)
This commit removes the backslash in allow command parameter, this was
needed before the ceph_key module integration.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 384161edcd)
It doesn't make sense to test the old 3.0.x container images with
nautilus+ ceph releases.
Also disable the dashboard deployment and switch to bluestore backend.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 3c2840da03)
The commit replaces the pv/vg/lv commands used with the ansible command
module by the lvg and lvol modules.
This also fixes the size of the second data LV because we were only using
50% of the remaining space instead of 100%.
With a 50G device, the result was:
- data-lv1 was 25G
- data-lv2 was 12.5G
Instead of:
- data-lv1 was 25G
- data-lv2 was 25G
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 2c03c6fcd3)
This commit makes the all_daemons scenario deploying 3 mds in order to
cover the multimds case.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 25b98b2ce3)
We don't need to have high handler delay in the CI so reducing to
10 seconds.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 04ec1ad3cc)
The current ceph-validate role is using both validate action and fail
module tasks to validate the ceph configuration.
The validate action is based on the notario python library. When one of
the notario validation fails then a python stack trace is reported to the
ansible task. This output isn't understandable by users.
This patch removes the validate action and the notario depencendy. The
validation is now done with only fail ansible module.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1654790
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 0f978d969b)
The secondary vagrant variables didn't have the grafana vm variable
set which create an vagrant error.
There was an error loading a Vagrantfile. The file being loaded
and the error message are shown below. This is usually caused by
an invalid or undefined variable.
This patch also changes the ssh-extra-args parameter to ssh-common-args
to get the same values for ssh/sftp/scp. Otherwise we can see warnings
from ansible and some tasks are failing.
[WARNING]: sftp transfer mechanism failed on [mon0]. Use ANSIBLE_DEBUG=1
to see detailed information
It also updates the ssh-common-args value for the rgw-multisite scenario
to reflect the ANSIBLE_SSH_ARGS environment variable value.
Finally changing the IP addresses due to the Vagrant refact done in the
commit 778c51a
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 010158ff84)
This was added for debugging purpose.
It's generating very large log output, let's remove this now.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 01f6dd52b3)
setting it at extra vars level prevent from setting it per node.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5bb6a4da42)