The jobs launched by the CI are not using 'ansible.cfg'.
That file contains some parameters that should avoid the SSH failures we
have been seeing in the CI so far.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Since we encountered an issue with this on ansible2.2, this commit provides
the ability to enable or disable it depending on which Ansible version we are
running.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Add a missing test `test_rbd_mirror_service_is_running_from_luminous()`.
Also use bash -c "<cmd>" to make testinfra aware that, later in
the upgrade process, we are running the `luminous` Ceph release, so we
must skip the rbd tests related to the `jewel` Ceph release.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
ceph-ansible is now being tested against ansible2.2 and ansible2.4. We
need to update tox.ini so we use the right version of testinfra
for the Ansible version in use.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
- split purge_cluster because we need to test both filestore and bluestore
scenarios.
- clean up some leftovers.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Since it has been decided to stop testing against kraken, we have to
test the upgrade from jewel to luminous instead of kraken.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit adds new OSD scenarios. It aims to simplify the CI setup and
bring better coverage of the OSD scenarios.
We decided to differentiate between filestore and bluestore, looking
ahead to when filestore won't be supported anymore.
So we now have two classes of tests:
* Filestore
* Bluestore
In each of those classes we have container and non-container.
Then for each we test the following:
* collocated
* collocated dmcrypt
* non-collocated
* non-collocated dmcrypt
* auto discovery collocated
* auto discovery collocated dmcrypt
This gives us good coverage and also reduces the footprint on the CI.
We now have 4 scenarios, each containing 6 OSD VMs.
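The matrix described above can be enumerated to check the arithmetic: 2 object stores times 2 deployment types gives 4 scenarios, and each scenario holds one OSD VM per layout. This is a minimal sketch; the string labels are illustrative and are not the actual scenario directory names used in tests/functional.

```python
from itertools import product

# Axes taken from the commit message; labels are illustrative only.
OBJECTSTORES = ["filestore", "bluestore"]
DEPLOYMENTS = ["non-container", "container"]
OSD_LAYOUTS = [
    "collocated",
    "collocated dmcrypt",
    "non-collocated",
    "non-collocated dmcrypt",
    "auto discovery collocated",
    "auto discovery collocated dmcrypt",
]

# One CI scenario per (objectstore, deployment) pair...
scenarios = list(product(OBJECTSTORES, DEPLOYMENTS))
# ...and one OSD VM per layout inside each scenario.
osd_vms = {scenario: list(OSD_LAYOUTS) for scenario in scenarios}

print(len(scenarios))                         # 4
print(sum(len(v) for v in osd_vms.values()))  # 24
```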
Signed-off-by: Sébastien Han <seb@redhat.com>
vagrant is serialized and takes a lot of time compared to a simple reboot.
See the benchmarks below for 3 VMs:
[leseb@rick docker]$ time ANSIBLE_SSH_ARGS="-F /home/leseb/reproduce-ci/tmp.zgGC7d5mIC/build/workspace/ceph-ansible/tests/functional/centos/7/docker/vagrant_ssh_config" ansible-playbook -i /home/leseb/reproduce-ci/tmp.zgGC7d5mIC/build/workspace/ceph-ansible/tests/functional/centos/7/docker/hosts reboot.yml
PLAY [mons] ********************************************************************
TASK [Gathering Facts] *********************************************************
ok: [mon1]
ok: [mon2]
ok: [mon0]
TASK [restart machine] *********************************************************
changed: [mon2]
changed: [mon1]
changed: [mon0]
TASK [wait for server to boot] *************************************************
ok: [mon2 -> localhost]
ok: [mon0 -> localhost]
ok: [mon1 -> localhost]
TASK [uptime] ******************************************************************
changed: [mon2]
changed: [mon0]
changed: [mon1]
PLAY RECAP *********************************************************************
mon0 : ok=4 changed=2 unreachable=0 failed=0
mon1 : ok=4 changed=2 unreachable=0 failed=0
mon2 : ok=4 changed=2 unreachable=0 failed=0
real 0m35.112s
user 0m5.737s
sys 0m1.849s
[leseb@rick docker]$ time vagrant reload
==> mon0: Halting domain...
==> mon0: Starting domain.
==> mon0: Waiting for domain to get an IP address...
==> mon0: Waiting for SSH to become available...
==> mon0: Creating shared folders metadata...
==> mon0: Rsyncing folder: /home/leseb/reproduce-ci/tmp.zgGC7d5mIC/build/workspace/ceph-ansible/tests/functional/centos/7/docker/ => /home/vagrant/sync
==> mon0: Machine already provisioned. Run `vagrant provision` or use the `--provision`
==> mon0: flag to force provisioning. Provisioners marked to run always will still run.
==> mon1: Halting domain...
==> mon1: Starting domain.
==> mon1: Waiting for domain to get an IP address...
==> mon1: Waiting for SSH to become available...
==> mon1: Creating shared folders metadata...
==> mon1: Rsyncing folder: /home/leseb/reproduce-ci/tmp.zgGC7d5mIC/build/workspace/ceph-ansible/tests/functional/centos/7/docker/ => /home/vagrant/sync
==> mon1: Machine already provisioned. Run `vagrant provision` or use the `--provision`
==> mon1: flag to force provisioning. Provisioners marked to run always will still run.
==> mon2: Halting domain...
==> mon2: Starting domain.
==> mon2: Waiting for domain to get an IP address...
==> mon2: Waiting for SSH to become available...
==> mon2: Creating shared folders metadata...
==> mon2: Rsyncing folder: /home/leseb/reproduce-ci/tmp.zgGC7d5mIC/build/workspace/ceph-ansible/tests/functional/centos/7/docker/ => /home/vagrant/sync
==> mon2: Machine already provisioned. Run `vagrant provision` or use the `--provision`
==> mon2: flag to force provisioning. Provisioners marked to run always will still run.
real 1m31.850s
user 0m7.387s
sys 0m0.796s
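Reboot via Ansible: 0m35.112s
Reboot via vagrant: 1m31.850s
Rebooting via Ansible takes about a third of the time (roughly 38%) of a
vagrant reload, so we save roughly 60% of the time.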
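As a quick check of the numbers above, the two `real` timings can be compared directly (values copied from the benchmark output):

```python
ansible_reboot = 35.112       # seconds: "real 0m35.112s" from ansible-playbook
vagrant_reload = 60 + 31.850  # seconds: "real 1m31.850s" from vagrant reload

ratio = ansible_reboot / vagrant_reload
print(f"ansible/vagrant: {ratio:.2f}")      # ansible/vagrant: 0.38
print(f"time saved:      {1 - ratio:.0%}")  # time saved:      62%
```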
Signed-off-by: Sébastien Han <seb@redhat.com>
This patch adds the `profile_tasks` callback plugin to the whitelist
so that we can identify the tasks that take the longest
to run.
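A minimal sketch of checking that `profile_tasks` is whitelisted, assuming an `ansible.cfg` laid out like the hypothetical excerpt below. `callback_whitelist` is the `[defaults]` key used by the Ansible 2.x releases this log targets; later Ansible releases renamed it, so treat the key name as version-dependent.

```python
import configparser

# Hypothetical ansible.cfg excerpt, not the repository's actual file.
ANSIBLE_CFG = """
[defaults]
callback_whitelist = profile_tasks
"""

def whitelisted_callbacks(cfg_text):
    """Return the set of callback plugins listed in an ansible.cfg text."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    # fallback also covers a missing [defaults] section.
    raw = parser.get("defaults", "callback_whitelist", fallback="")
    return {name.strip() for name in raw.split(",") if name.strip()}

print("profile_tasks" in whitelisted_callbacks(ANSIBLE_CFG))  # True
```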
We were not testing server reboots, and a lot of things can happen after one.
So now we deploy, reboot, and then run testinfra.
Signed-off-by: Sébastien Han <seb@redhat.com>
Prior to this patch, this activation sequence for autodetection was
always skipped because we were asking to activate a device without
partitions, which doesn't make sense.
We also fix the way we look up a device: since the data partition is
always numbered 1, we take the min element of the dict.
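The lookup described above can be sketched as follows. The fact data is hypothetical, shaped like Ansible's `ansible_devices` fact (partitions keyed by name); the helper name `data_partition` is invented for illustration. Note that `min()` compares strings, which is enough here because `sdb1` sorts before both `sdb2` and `sdb10`.

```python
# Hypothetical facts shaped like Ansible's `ansible_devices`.
ansible_devices = {
    "sdb": {"partitions": {"sdb2": {"size": "5.00 GB"}, "sdb1": {"size": "95.00 GB"}}},
    "sdc": {"partitions": {}},
}

def data_partition(device):
    """Return the data partition of a prepared OSD disk, or None.

    Per the commit message, the data partition is always numbered 1,
    so the smallest partition name is taken.
    """
    partitions = ansible_devices[device]["partitions"]
    if not partitions:  # a device without partitions has nothing to activate
        return None
    return min(partitions)

print(data_partition("sdb"))  # sdb1
print(data_partition("sdc"))  # None
```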
Closes: https://github.com/ceph/ceph-ansible/issues/1782
Signed-off-by: Sébastien Han <seb@redhat.com>
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
We need to force the value of the `docker` variable, which is initially set
to `false`, since this is a migration from a non-containerized to a
containerized cluster.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
We must mask the image so we are sure that, even if the system reboots,
the OSDs won't start.
Also remove Ceph udev rules if found on the system before deploying
containers. If we don't do this, we are exposed to conflicts between udev
rules and systemd unit files.
The CI will now also test the migration from a non-containerized cluster to a
containerized cluster.
Signed-off-by: Sébastien Han <seb@redhat.com>
The installation process is now described as follows:
* you still have to choose a 'ceph_origin' installation method. The
origin can be 'repository' (adds a new repository), 'distro' (uses
the packages provided by the native repo source of your distribution)
or 'local' (only available on Red Hat systems, it installs locally built
packages). This option is not well tested, so use it carefully.
* if ceph_origin == 'repository', you will have to decide what kind of
repository you want to enable:
- community: corresponds to the stable upstream/community version
- enterprise: corresponds to the stable enterprise/downstream version
(basically, you are a Red Hat customer)
- dev: installs Ceph from packages built out of the GitHub
development branches
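The rules listed above can be summarized as a validation sketch. `check_install_vars`, the `ceph_repository` argument name and the `os_family` parameter are hypothetical names chosen for illustration; only 'ceph_origin' and the allowed values come from the commit message.

```python
VALID_ORIGINS = {"repository", "distro", "local"}
VALID_REPOSITORIES = {"community", "enterprise", "dev"}

def check_install_vars(ceph_origin, ceph_repository=None, os_family="RedHat"):
    """Validate the installation choices described above (hypothetical helper)."""
    if ceph_origin not in VALID_ORIGINS:
        raise ValueError(f"unknown ceph_origin: {ceph_origin!r}")
    if ceph_origin == "local" and os_family != "RedHat":
        raise ValueError("ceph_origin 'local' is only available on Red Hat systems")
    if ceph_origin == "repository" and ceph_repository not in VALID_REPOSITORIES:
        raise ValueError(
            f"ceph_origin 'repository' needs one of {sorted(VALID_REPOSITORIES)}"
        )

check_install_vars("repository", "community")  # accepted
check_install_vars("distro")                   # accepted: distribution packages
```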
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-Authored-by: Guillaume Abrioux <gabrioux@redhat.com>
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
There are only two main scenarios now:
* collocated: everything remains on the same device:
- data, db, wal for bluestore
- data and journal for filestore
* non-collocated: a dedicated device for some of the components
Signed-off-by: Sébastien Han <seb@redhat.com>
If you use the 'dev' factor, the testing scenario will
use repos from shaman.ceph.com. You can define CEPH_DEV_BRANCH
and CEPH_DEV_SHA1 to specify which repo you'd like to test.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
Since we are hitting this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1324587
e.g.:
`failed: internal error: Monitor path /var/lib/libvirt/qemu/domain-bs-docker-cluster-dmcrypt-journal-collocation_mon0_1499294943_ba9faf7bf296533177f6/monitor.sock too big for destination`
and we can't upgrade libvirt in our CI for some reason,
we need to make the directory names shorter in order to work around this
issue.
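The error comes from the size limit on UNIX socket paths: on Linux, `sun_path` in `sockaddr_un` is 108 bytes including the terminating NUL. The sketch below measures the monitor path from the error message; the shorter path in the usage line is hypothetical, just to show that trimming the directory name brings it under the limit.

```python
# Linux sockaddr_un.sun_path is 108 bytes, including the trailing NUL,
# so a UNIX socket path can be at most 107 bytes long.
SUN_PATH_MAX = 108

def fits_in_sun_path(path):
    return len(path.encode()) < SUN_PATH_MAX

long_path = (
    "/var/lib/libvirt/qemu/domain-bs-docker-cluster-dmcrypt-journal-collocation"
    "_mon0_1499294943_ba9faf7bf296533177f6/monitor.sock"
)
print(len(long_path), fits_in_sun_path(long_path))  # 124 False
```

With a shorter (hypothetical) domain directory such as `/var/lib/libvirt/qemu/domain-mon0/monitor.sock`, `fits_in_sun_path` returns True.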
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When we purge a containerized cluster we need to use the correct
playbook when redeploying the cluster.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
rhcs is based off of jewel, so we need to set this var in the tests so
that the ceph-mgr role is skipped.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
I continue to have issues with extra-vars as json. The latest issue
was that the ceph_docker_image_tag config option included in the json
was being ignored. I can't find the root cause, but using the key/value
format seems to work.
I've also removed several options here to simplify the interface. We can
add those back if they become necessary.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
I'm removing this because when we use an 'rhcs' scenario we attempt
to set CEPH_STABLE=false as an environment variable. The issue with that
is that, because the value comes from an environment variable, it is always
treated as a string, and ansible treats that string as a boolean True. I plan to
set the ceph_stable value with our rhcs_setup.yml playbook instead of
relying on --extra-vars and environment variables.
Related ansible issue: https://github.com/ansible/ansible/issues/17193
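The same pitfall can be reproduced in plain Python, as an analogy rather than Ansible's own code: environment values are always strings, and any non-empty string, including "false", is truthy. The `env_flag` helper below is a hypothetical sketch of parsing such a flag explicitly.

```python
import os

def env_flag(name, default=False):
    """Read a boolean flag from the environment (hypothetical helper).

    Environment values are strings; bool("false") is True, so the text
    is parsed explicitly instead of relying on truthiness.
    """
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}

os.environ["CEPH_STABLE"] = "false"
print(bool(os.environ["CEPH_STABLE"]))  # True: the naive conversion
print(env_flag("CEPH_STABLE"))          # False
```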
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
When testing this downstream it makes more sense for this scenario to be
named just 'cluster' because having 'centos7' in the name is misleading.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
This allows for us to have a copy of the existing testing scenarios with
a 'rhcs-' prefix. We can use that in the tox.ini to take actions we need
to properly test Red Hat Ceph Storage.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
This will prevent ansible from misreading any of these values. There
were failures with xenial deployments because the value set for
``ceph_rhcs`` was being treated as a boolean True even though I'd set
the value to false. This is because boolean values passed in with
--extra-vars must use the json format.
The formatting of the json is very important as you need a '\' to escape
the starting and ending json to make tox happy. Also, each line needs to
end with '\' if it's a multi-line command.
Another thing to note is that if you want to use extra vars on the
command line to respond to a vars_prompt, they must be in key/value format.
This is why we have a -e and a --extra-vars on the purge and update
tests.
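A minimal sketch of the difference between the two formats, building the command line from Python. The variable values and the `site.yml` playbook name are hypothetical; the point is that the key=value form turns every value into a string, while the JSON form keeps `false` a boolean. The shell quoting shown is what a plain shell needs; tox.ini adds its own escaping on top, as described above.

```python
import json
import shlex

# Hypothetical values for illustration.
extra_vars = {"ceph_rhcs": False, "ceph_stable": True}

# key=value form: Ansible receives every value as a string.
kv_form = " ".join(f"{k}={str(v).lower()}" for k, v in extra_vars.items())
# JSON form: booleans survive the round trip.
json_form = json.dumps(extra_vars)

cmd = ["ansible-playbook", "site.yml", "--extra-vars", json_form]
print(kv_form)          # ceph_rhcs=false ceph_stable=true
print(shlex.join(cmd))  # ansible-playbook site.yml --extra-vars '{"ceph_rhcs": false, "ceph_stable": true}'
```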
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
When using CEPH_DEV=true you'll need to set CEPH_STABLE=false so that
an upstream repo file doesn't get created.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
Use CEPH_STABLE_RELEASE to set the name of the ceph release you plan to
install. When testing an upgrade scenario you'll also need to set
UPGRADE_CEPH_STABLE_RELEASE.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
To run tests that deploy shaman repos, set CEPH_DEV=true and optionally
use CEPH_DEV_BRANCH and CEPH_DEV_SHA1 to define which branch and sha1 to
test. CEPH_DEV_BRANCH defaults to master and CEPH_DEV_SHA1 defaults to
latest.
For example, this would run the journal_collocation test with the latest
build of the master branch:
CEPH_DEV=true tox -rve ansible2.2-journal_collocation
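The defaults described above can be expressed as a small resolver. This is a sketch only: `shaman_selection` is a hypothetical helper, and how CEPH_DEV is parsed here is an assumption; tox.ini is what actually wires these variables into the run.

```python
import os

def shaman_selection(env=None):
    """Resolve which shaman build to test, or None if CEPH_DEV is not set.

    Hypothetical helper mirroring the defaults: branch 'master', sha1 'latest'.
    """
    env = os.environ if env is None else env
    if env.get("CEPH_DEV", "false").lower() != "true":
        return None
    return {
        "branch": env.get("CEPH_DEV_BRANCH", "master"),
        "sha1": env.get("CEPH_DEV_SHA1", "latest"),
    }

print(shaman_selection({"CEPH_DEV": "true"}))
# {'branch': 'master', 'sha1': 'latest'}
print(shaman_selection({"CEPH_DEV": "true", "CEPH_DEV_BRANCH": "luminous"}))
# {'branch': 'luminous', 'sha1': 'latest'}
```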
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
For example, the following would run the journal collocation test and
would install ceph from the repos already on the nodes:
CEPH_ORIGIN=distro tox -rve ansible2.2-journal_collocation
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
The purge_dmcrypt scenario also tests centos7, so change this one to
xenial so we can have more test coverage.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
This also removes the purge_cluster_collocated scenario as it's not
needed now because of purge_cluster.
Moving all the purge commands into their own section allows for easy
reuse when creating new purge scenarios.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
There is an Ansible bug which makes the playbook fail when we are
running a playbook from a directory other than the git root. The real problem is
that the ansible.cfg is not honoured and we are including variables from
roles/<role>/defaults/main.yml.
The fix is to copy the purge cluster playbook into the git root directory
and execute it.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
This scenario brings up a 1 mon 1 osd cluster using journal collocation,
purges the cluster and then verifies it can redeploy the cluster.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
This commit allows scenarios to pick their own playbook while
defaulting to site.yml.sample.
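The per-scenario override with a default can be sketched as a simple lookup. The scenario table and the `site-docker.yml.sample` override are hypothetical examples; in the repository the selection is driven from tox.ini rather than from Python.

```python
# Hypothetical per-scenario settings: only scenarios that need a
# different playbook override it.
SCENARIOS = {
    "journal_collocation": {},
    "docker_cluster": {"playbook": "site-docker.yml.sample"},
}

DEFAULT_PLAYBOOK = "site.yml.sample"

def playbook_for(scenario):
    """Return the scenario's playbook, falling back to site.yml.sample."""
    return SCENARIOS[scenario].get("playbook", DEFAULT_PLAYBOOK)

print(playbook_for("journal_collocation"))  # site.yml.sample
print(playbook_for("docker_cluster"))       # site-docker.yml.sample
```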
Signed-off-by: Andrew Schoen <aschoen@redhat.com>