Prior to this patch, the certificates where being generated on a single
node only (because of the run_once: true). Thus certificates were not
distributed on all the gateway nodes.
This would require a second ansible run to work. This patches fix the
creation and keys's distribution on all the nodes.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1540845
Signed-off-by: Sébastien Han <seb@redhat.com>
We should stop putting everything in 'all'. This is too easy and this is
error prone as well for those who are separating variables into host
type, things that you should do.
Signed-off-by: Sébastien Han <seb@redhat.com>
We now run tests on the newly created ceph_crush module. Now the CI will
create a specific hierarchy for the OSD.
Signed-off-by: Sébastien Han <seb@redhat.com>
The ceph-ansible upstream CI runs severals tests, including a
'idempotency/handlers' test. It means the playbook is run a first time
and then a second time with an other container image version to ensure the
handlers run properly and the containers are well restarted.
This can cause issues.
For instance, in that specific case which drove me to submit this commit,
I've hit the case where `latest` image ships ceph 12.2.3 while the `stable-3.0`
(which is the image used for the second run) ships ceph 12.2.2.
The goal of this test is not to verify we can upgrade from a specific
version to another but to ensure handlers are working even if it's a valid
failure here.
It should be caught by a test dedicated to that usecase.
We just need to have a container image which has a different id for
the upstream CI, we need the same content in container imagebut a different
image id in the registry since the test relies on image id to decide whether
the container should be restarted.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Since we have a task to test the handlers we can test a new container to
validate the service restart on a new container image.
Signed-off-by: Sébastien Han <seb@redhat.com>
Use a nicer syntax for `local_action` tasks.
We used to have oneliner like this:
```
local_action: wait_for port=22 host={{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }} state=started delay=10 timeout=500 }}
```
The usual syntax:
```
local_action:
module: wait_for
port: 22
host: "{{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }}"
state: started
delay: 10
timeout: 500
```
is nicer and kind of way to keep consistency regarding the whole
playbook.
This also fix a potential issue about missing quotation :
```
Traceback (most recent call last):
File "/tmp/ansible_wQtWsi/ansible_module_command.py", line 213, in <module>
main()
File "/tmp/ansible_wQtWsi/ansible_module_command.py", line 185, in main
rc, out, err = module.run_command(args, executable=executable, use_unsafe_shell=shell, encoding=None, data=stdin)
File "/tmp/ansible_wQtWsi/ansible_modlib.zip/ansible/module_utils/basic.py", line 2710, in run_command
File "/usr/lib64/python2.7/shlex.py", line 279, in split
return list(lex) File "/usr/lib64/python2.7/shlex.py", line 269, in next
token = self.get_token()
File "/usr/lib64/python2.7/shlex.py", line 96, in get_token
raw = self.read_token()
File "/usr/lib64/python2.7/shlex.py", line 172, in read_token
raise ValueError, "No closing quotation"
ValueError: No closing quotation
```
writing `local_action: shell echo {{ fsid }} | tee {{ fetch_directory }}/ceph_cluster_uuid.conf`
can cause trouble because it's complaining with missing quotes, this fix solves this issue.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1510555
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The --crush-device-class flag for ceph-volume is not available in luminous so lets
remove this testing option for now until it's more widely available.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
When deploying Jewel from master we still need to enable this code since
the container image has such check. This check still exists because
ceph-disk is not able to create a GPT label on a drive that does not
have one.
Signed-off-by: Sébastien Han <seb@redhat.com>
the entrypoint to generate users keyring is `ceph-authtool`, therefore,
it can expand the `$(ceph-authtool --gen-print-key)` inside the
container. Users must generate a keyring themselves.
This commit also adds a check to ensure keyring are properly filled when
`user_config: true`.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Add a missing test `test_rbd_mirror_service_is_running_from_luminous()`.
Also using bash -c "<cmd>" to make testinfra aware that later in
the upgrade process we are now running `luminous` ceph release so we
must skip the rbd tests related to `jewel` ceph release.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
ceph-ansible is now being testing against ansible2.2 and ansible2.4. We
need to update tox.ini so we use the right version of testinfra
regarding which ansible version we are using.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit add new osd scenarios, it aims to simplify the CI setup and
brings a better coverage on the OSD scenarios.
We decided to differentiate between filestore and bluestore, thinking
ahead when filestore won't be supported anymore.
So we now have two classes of tests:
* Filestore
* Bluestore
In each of those classes we have container and non-container.
Then for each we test the following:
* collocated
* collocated dmcrypt
* non-collocated
* non-collocated dmcrypt
* auto discovery collocated
* auto discovery collocated dmcrypt
This gives us a nice coverage and also reduces the footprint on the CI.
We are now up to 4 scenarios, each containing 6 OSD VMs.
Signed-off-by: Sébastien Han <seb@redhat.com>
1. add the variables to docker_collocation
2. trigger the check when a MDS is part of the inventory file, not when
we run on an MDS...
Signed-off-by: Sébastien Han <seb@redhat.com>
vagrant is serialized and takes a lot of time compare to simple reboot.
See the benchmarks below for 3 VMs:
[leseb@rick docker]$ time ANSIBLE_SSH_ARGS="-F
/home/leseb/reproduce-ci/tmp.zgGC7d5mIC/build/workspace/ceph-ansible/tests/functional/centos/7/docker/vagrant_ssh_config" ansible-playbook -i /home/leseb/reproduce-ci/tmp.zgGC7d5mIC/build/workspace/ceph-ansible/tests/functional/centos/7/docker/hosts reboot.yml
PLAY [mons]
****************************************************************************************************************************************************************************************************
TASK [Gathering Facts]
*****************************************************************************************************************************************************************************************
ok: [mon1]
ok: [mon2]
ok: [mon0]
TASK [restart machine]
*****************************************************************************************************************************************************************************************
changed: [mon2]
changed: [mon1]
changed: [mon0]
TASK [wait for server to boot]
*********************************************************************************************************************************************************************************
ok: [mon2 -> localhost]
ok: [mon0 -> localhost]
ok: [mon1 -> localhost]
TASK [uptime]
**************************************************************************************************************************************************************************************************
changed: [mon2]
changed: [mon0]
changed: [mon1]
PLAY RECAP
*****************************************************************************************************************************************************************************************************
mon0 : ok=4 changed=2 unreachable=0
failed=0
mon1 : ok=4 changed=2 unreachable=0
failed=0
mon2 : ok=4 changed=2 unreachable=0
failed=0
real 0m35.112s
user 0m5.737s
sys 0m1.849s
[leseb@rick docker]$ time vagrant reload
==> mon0: Halting domain...
==> mon0: Starting domain.
==> mon0: Waiting for domain to get an IP address...
==> mon0: Waiting for SSH to become available...
==> mon0: Creating shared folders metadata...
==> mon0: Rsyncing folder:
/home/leseb/reproduce-ci/tmp.zgGC7d5mIC/build/workspace/ceph-ansible/tests/functional/centos/7/docker/
=> /home/vagrant/sync
==> mon0: Machine already provisioned. Run `vagrant provision` or use
the `--provision`
==> mon0: flag to force provisioning. Provisioners marked to run always
will still run.
==> mon1: Halting domain...
==> mon1: Starting domain.
==> mon1: Waiting for domain to get an IP address...
==> mon1: Waiting for SSH to become available...
==> mon1: Creating shared folders metadata...
==> mon1: Rsyncing folder:
/home/leseb/reproduce-ci/tmp.zgGC7d5mIC/build/workspace/ceph-ansible/tests/functional/centos/7/docker/
=> /home/vagrant/sync
==> mon1: Machine already provisioned. Run `vagrant provision` or use
the `--provision`
==> mon1: flag to force provisioning. Provisioners marked to run always
will still run.
==> mon2: Halting domain...
==> mon2: Starting domain.
==> mon2: Waiting for domain to get an IP address...
==> mon2: Waiting for SSH to become available...
==> mon2: Creating shared folders metadata...
==> mon2: Rsyncing folder:
/home/leseb/reproduce-ci/tmp.zgGC7d5mIC/build/workspace/ceph-ansible/tests/functional/centos/7/docker/
=> /home/vagrant/sync
==> mon2: Machine already provisioned. Run `vagrant provision` or use
the `--provision`
==> mon2: flag to force provisioning. Provisioners marked to run always
will still run.
real 1m31.850s
user 0m7.387s
sys 0m0.796s
Reboot via Ansible: 0m35.112s
Reboot via vagrant: 1m31.850s
We save 1/3 time.
Signed-off-by: Sébastien Han <seb@redhat.com>
We now have a variable called ceph_pools that is mandatory when
deploying a MDS.
It's a dictionnary that contains a pool name and a PG count. PG count is
mandatory and must be set, the playbook will fail otherwise.
Closes: https://github.com/ceph/ceph-ansible/issues/2017
Signed-off-by: Sébastien Han <seb@redhat.com>
The `always_run` key is deprecated and being removed in Ansible 2.4.
Using it causes a warning to be displayed:
[DEPRECATION WARNING]: always_run is deprecated.
This patch changes all instances of `always_run` to use the `always`
tag, which causes the task to run each time the playbook runs.
- the rbd-mirror unit systemd name is not the same when running jewel vs
luminous.
- servicemap is not available on jewel.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Since we introduced collocation testing scenario, we need to adapt
current tests to this new scenario.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
If we don't do this the client will create pools with a replica 3 since
osd_pool_default_size was gone in ceph-override.json. This was making
switch_to_containers failing.
Signed-off-by: Sébastien Han <seb@redhat.com>
Shared folder is not required for tests.
We should avoid hitting the error :
```
uninitialized constant VagrantPlugins::ProviderLibvirt::Action::ShareFolders
```
Also, disabling it might reduce the needed time in certains cases for the VMs
to be started.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
If two environments are using the same subnet, we will get trouble
because of ips addresses conflicts.
This commit ensures each scenario has a uniq subnet for both public and cluster
network so we can setup several test environment at a time on a same hypervisor.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
the path `/dev/disk/by-path/pci-0000:00:01.1-ata-1.0` doesn't exist.
it has to be changed to `/dev/disk/by-path/pci-0000:00:01.1-ata-1`
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit refacts the code regarding all `set_osd_pool_default_*`
related tasks by avoiding usage of useless `set_fact` to determine
whether a key is present in `ceph_conf_overrides`.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
If we don't bootstrap the mgr after the mon and the osds handler are
called, we will never be able to reach a clean state since the pgs
stats are handled by the mgr. This also happens when doing daemon
collocation.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1493920
Signed-off-by: Sébastien Han <seb@redhat.com>
This test doesn't work at the moment and need to be fixed.
Disabling it temporary to avoid errors in the CI.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
On a container env, machines don't have any ceph binaries so we need to
use a container to run the commands.
Signed-off-by: Sébastien Han <seb@redhat.com>
Delete these before creating them incase they are left around in a purge
cluster testing scenario. The purge-cluster.yml playbook does not
currently remove partitions used for journals.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
The partition only needs created and given a gpt label so that a
PARTUUID will exist on the partition.
This task also makes the purge_lvm_osds scenario fail on the second
deployment after purging.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
Prior to this patch this activation sequence for autodetection was
always skipped because we were asking to activate on device without
partitions, which doesn't make sense.
We also fix the way we lookup for a device, since the data partition is
always numbered 1, we take the min element of the dict.
Closes: https://github.com/ceph/ceph-ansible/issues/1782
Signed-off-by: Sébastien Han <seb@redhat.com>
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
we need to force the value of `docker` variable which is initially set
to `false` since it's a migration from non-containerized to
containerized cluster.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The installation process is now described as follow:
* you still have to choose a 'ceph_origin' installation method. The
origin can be a 'repository' (add a new repository), distro (it will use
the packages provided by the native repo source of your distribution),
local (only available on redhat system, it installs locally built
packages). This option is not well tested, so use it carefully
* if ceph_origin == 'repository' you will have to decide what kind of
repository you want to enable:
- community: corresponds to the stable upstream/community version
- enterprise: corresponds to the stable enterprise/downstream version
(basically you are a red hat customer)
- dev: it will install ceph from packages built out of the github
development branches
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-Authored-by: Guillaume Abrioux <gabrioux@redhat.com>
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The lvm_volumes variable is now a list of dictionaries that represent
each OSD you'd like to deploy using ceph-volume. Each dictionary must
have the following keys: data, journal and data_vg. Each dictionary also
can optionaly provide a journal_vg key.
The 'data' key represents the lv name used for the OSD and the 'data_vg'
key is the vg name that the given lv resides on. The 'journal' key is
either an lv, device or partition. The 'journal_vg' key is optional and
must be the vg name for the journal lv if given. This key is mainly used
for purging of the journal lv if purge-cluster.yml is run.
For example:
lvm_volumes:
- data: data_lv1
journal: journal_lv1
data_vg: vg1
journal_vg: vg2
- data: data_lv2
journal: /dev/sdc
data_vg: vg1
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
Resolves issue: Multiple RGW Ceph.conf Issue #1258
In multi-RGW setup, in ceph.conf the RGW sections
contain identical bind IP in civetweb line. So this
modification fixes that issue and puts the right IP
for each RGW.
Signed-off-by: SirishaGuduru SGuduru@walmartlabs.com
Modified ceph-defaults and ran generate_group_vars_sample.sh
group_vars/osds.yml.sample and group_vars/rhcs.yml.sample are
not part of the changes. But they got modified when
generate_group_vars_sample.sh is ran to generate group_vars/
all.yml.sample.
Uncommented added variables in ceph-defaults
Updated tests by adding value for radosgw_interface
Added radosgw_interface to centos cluster tests
Modified ceph-rgw role,rebased and ran generate_group_vars_sample.sh
In ceph-rgw role removed check_mandatory_vars.yml.
Rebased on master.
Ran generate_group_vars_sample.sh and then the below files got
modified.