This commit add new osd scenarios, it aims to simplify the CI setup and
brings a better coverage on the OSD scenarios.
We decided to differentiate between filestore and bluestore, thinking
ahead when filestore won't be supported anymore.
So we now have two classes of tests:
* Filestore
* Bluestore
In each of those classes we have container and non-container.
Then for each we test the following:
* collocated
* collocated dmcrypt
* non-collocated
* non-collocated dmcrypt
* auto discovery collocated
* auto discovery collocated dmcrypt
This gives us a nice coverage and also reduces the footprint on the CI.
We are now up to 4 scenarios, each containing 6 OSD VMs.
Signed-off-by: Sébastien Han <seb@redhat.com>
1. add the variables to docker_collocation
2. trigger the check when a MDS is part of the inventory file, not when
we run on an MDS...
Signed-off-by: Sébastien Han <seb@redhat.com>
vagrant is serialized and takes a lot of time compare to simple reboot.
See the benchmarks below for 3 VMs:
[leseb@rick docker]$ time ANSIBLE_SSH_ARGS="-F
/home/leseb/reproduce-ci/tmp.zgGC7d5mIC/build/workspace/ceph-ansible/tests/functional/centos/7/docker/vagrant_ssh_config" ansible-playbook -i /home/leseb/reproduce-ci/tmp.zgGC7d5mIC/build/workspace/ceph-ansible/tests/functional/centos/7/docker/hosts reboot.yml
PLAY [mons]
****************************************************************************************************************************************************************************************************
TASK [Gathering Facts]
*****************************************************************************************************************************************************************************************
ok: [mon1]
ok: [mon2]
ok: [mon0]
TASK [restart machine]
*****************************************************************************************************************************************************************************************
changed: [mon2]
changed: [mon1]
changed: [mon0]
TASK [wait for server to boot]
*********************************************************************************************************************************************************************************
ok: [mon2 -> localhost]
ok: [mon0 -> localhost]
ok: [mon1 -> localhost]
TASK [uptime]
**************************************************************************************************************************************************************************************************
changed: [mon2]
changed: [mon0]
changed: [mon1]
PLAY RECAP
*****************************************************************************************************************************************************************************************************
mon0 : ok=4 changed=2 unreachable=0
failed=0
mon1 : ok=4 changed=2 unreachable=0
failed=0
mon2 : ok=4 changed=2 unreachable=0
failed=0
real 0m35.112s
user 0m5.737s
sys 0m1.849s
[leseb@rick docker]$ time vagrant reload
==> mon0: Halting domain...
==> mon0: Starting domain.
==> mon0: Waiting for domain to get an IP address...
==> mon0: Waiting for SSH to become available...
==> mon0: Creating shared folders metadata...
==> mon0: Rsyncing folder:
/home/leseb/reproduce-ci/tmp.zgGC7d5mIC/build/workspace/ceph-ansible/tests/functional/centos/7/docker/
=> /home/vagrant/sync
==> mon0: Machine already provisioned. Run `vagrant provision` or use
the `--provision`
==> mon0: flag to force provisioning. Provisioners marked to run always
will still run.
==> mon1: Halting domain...
==> mon1: Starting domain.
==> mon1: Waiting for domain to get an IP address...
==> mon1: Waiting for SSH to become available...
==> mon1: Creating shared folders metadata...
==> mon1: Rsyncing folder:
/home/leseb/reproduce-ci/tmp.zgGC7d5mIC/build/workspace/ceph-ansible/tests/functional/centos/7/docker/
=> /home/vagrant/sync
==> mon1: Machine already provisioned. Run `vagrant provision` or use
the `--provision`
==> mon1: flag to force provisioning. Provisioners marked to run always
will still run.
==> mon2: Halting domain...
==> mon2: Starting domain.
==> mon2: Waiting for domain to get an IP address...
==> mon2: Waiting for SSH to become available...
==> mon2: Creating shared folders metadata...
==> mon2: Rsyncing folder:
/home/leseb/reproduce-ci/tmp.zgGC7d5mIC/build/workspace/ceph-ansible/tests/functional/centos/7/docker/
=> /home/vagrant/sync
==> mon2: Machine already provisioned. Run `vagrant provision` or use
the `--provision`
==> mon2: flag to force provisioning. Provisioners marked to run always
will still run.
real 1m31.850s
user 0m7.387s
sys 0m0.796s
Reboot via Ansible: 0m35.112s
Reboot via vagrant: 1m31.850s
We save 1/3 time.
Signed-off-by: Sébastien Han <seb@redhat.com>
We now have a variable called ceph_pools that is mandatory when
deploying a MDS.
It's a dictionnary that contains a pool name and a PG count. PG count is
mandatory and must be set, the playbook will fail otherwise.
Closes: https://github.com/ceph/ceph-ansible/issues/2017
Signed-off-by: Sébastien Han <seb@redhat.com>
The `always_run` key is deprecated and being removed in Ansible 2.4.
Using it causes a warning to be displayed:
[DEPRECATION WARNING]: always_run is deprecated.
This patch changes all instances of `always_run` to use the `always`
tag, which causes the task to run each time the playbook runs.
- the rbd-mirror unit systemd name is not the same when running jewel vs
luminous.
- servicemap is not available on jewel.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Since we introduced collocation testing scenario, we need to adapt
current tests to this new scenario.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
If we don't do this the client will create pools with a replica 3 since
osd_pool_default_size was gone in ceph-override.json. This was making
switch_to_containers failing.
Signed-off-by: Sébastien Han <seb@redhat.com>
Shared folder is not required for tests.
We should avoid hitting the error :
```
uninitialized constant VagrantPlugins::ProviderLibvirt::Action::ShareFolders
```
Also, disabling it might reduce the needed time in certains cases for the VMs
to be started.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
If two environments are using the same subnet, we will get trouble
because of ips addresses conflicts.
This commit ensures each scenario has a uniq subnet for both public and cluster
network so we can setup several test environment at a time on a same hypervisor.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
the path `/dev/disk/by-path/pci-0000:00:01.1-ata-1.0` doesn't exist.
it has to be changed to `/dev/disk/by-path/pci-0000:00:01.1-ata-1`
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit refacts the code regarding all `set_osd_pool_default_*`
related tasks by avoiding usage of useless `set_fact` to determine
whether a key is present in `ceph_conf_overrides`.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
If we don't bootstrap the mgr after the mon and the osds handler are
called, we will never be able to reach a clean state since the pgs
stats are handled by the mgr. This also happens when doing daemon
collocation.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1493920
Signed-off-by: Sébastien Han <seb@redhat.com>
This test doesn't work at the moment and need to be fixed.
Disabling it temporary to avoid errors in the CI.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
On a container env, machines don't have any ceph binaries so we need to
use a container to run the commands.
Signed-off-by: Sébastien Han <seb@redhat.com>
Delete these before creating them incase they are left around in a purge
cluster testing scenario. The purge-cluster.yml playbook does not
currently remove partitions used for journals.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
The partition only needs created and given a gpt label so that a
PARTUUID will exist on the partition.
This task also makes the purge_lvm_osds scenario fail on the second
deployment after purging.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
Prior to this patch this activation sequence for autodetection was
always skipped because we were asking to activate on device without
partitions, which doesn't make sense.
We also fix the way we lookup for a device, since the data partition is
always numbered 1, we take the min element of the dict.
Closes: https://github.com/ceph/ceph-ansible/issues/1782
Signed-off-by: Sébastien Han <seb@redhat.com>
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>