Bring the recent refactor of `osd_pool_default_pg_num` and
`osd_pool_default_size` into the podman scenario as well.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Since we are now testing on docker and podman, our functional tests must
reflect that. So now, if we detect the podman binary we use it;
otherwise we default to docker.
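
The detection described above could be sketched as follows. This is a minimal illustration, assuming the tests are written in Python; the helper name `container_binary` is hypothetical and not taken from the test suite.

```python
import shutil


def container_binary(which=shutil.which):
    """Return "podman" if a podman binary is on PATH, otherwise "docker".

    `which` is injectable so the PATH lookup can be replaced in tests.
    """
    return "podman" if which("podman") else "docker"
```

A test could then build its command line from `container_binary()` instead of hard-coding `docker`.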
Signed-off-by: Sébastien Han <seb@redhat.com>
We run an initial deployment with `osd_pool_default_size: 1` in
`ceph_conf_overrides`.
When re-running the playbook to test idempotency and handlers, we reset
`ceph_conf_overrides`. We must append the new value instead of just
overwriting it; otherwise, this can lead to errors in the CI.
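
The difference between appending and overwriting can be shown with a small recursive merge. This is only a sketch in Python, not the mechanism the CI uses; `merge_overrides` and the `example_option` key are hypothetical names chosen for illustration.

```python
import copy


def merge_overrides(base, extra):
    """Recursively merge `extra` into a copy of `base` without dropping keys."""
    merged = copy.deepcopy(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are sections: merge them key by key.
            merged[key] = merge_overrides(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Overrides used for the initial deployment.
initial = {"global": {"osd_pool_default_size": 1}}
# Hypothetical extra value added for the second run.
extra = {"global": {"example_option": "value"}}

rerun = merge_overrides(initial, extra)
```

After the merge, `rerun["global"]` still carries `osd_pool_default_size` next to the new key, whereas a plain assignment of `extra` would have dropped it.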
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The `osd_pool_default_pg_num` parameter is set in `ceph-mon`.
When using ceph-ansible with `--limit` on a specific group of nodes, it
will fail when trying to access this variable since it won't be
defined.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1518696
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Setting this setting to 1 lets the CI cover the related code in the
playbook without breaking the upgrade scenarios.
Those scenarios were broken by the `TASK [waiting for clean pgs...]`
check in rolling_update.yml: the pool size for `cephfs_metadata` and
`cephfs_data` is updated to `2` in `ceph-override.json`, and since there
are not enough OSDs to honor this size, some PGs are degraded and the
mentioned check fails.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
** configuration seems to be for filestore:
[ERROR]: [ceph-osd0] Validation failed for variable: lvm_volumes
** Removing `radosgw_interface: eth1` to resolve:
The task includes an option with an undefined variable. The error was:
'ansible.vars.hostvars.HostVarsVars object' has no attribute
u'ansible_eth1'
The error appears to have been in
'/home/nwatkins/src/ceph-ansible/roles/ceph-defaults/tasks/set_radosgw_address.yml':
line 21, column 5, but may be elsewhere in the file depending on the
exact syntax problem.
The offending line appears to be:
- name: set_fact _radosgw_address to radosgw_interface - ipv4
^ here
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Let's test ceph-ansible master against ansible 2.7 to catch any
potential issue with this ansible version early.
Closes: #3148
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Adding more memory to the VMs for the rgw_multisite scenarios could avoid
the error below, which I recently hit in the CI.
(It is worth setting 1024 MB since there are only 2 nodes in those
scenarios.)
```
fatal: [osd0]: FAILED! => {
"changed": false,
"cmd": [
"docker",
"run",
"--rm",
"--entrypoint",
"/usr/bin/ceph",
"docker.io/ceph/daemon:latest-luminous",
"--version"
],
"delta": "0:00:04.799084",
"end": "2018-10-29 17:10:39.136602",
"rc": 1,
"start": "2018-10-29 17:10:34.337518"
}
STDERR:
Traceback (most recent call last):
File "/usr/bin/ceph", line 125, in <module>
import rados
ImportError: libceph-common.so.0: cannot map zero-fill pages: Cannot allocate memory
```
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Add a playbook that uploads a file on the master and then tries to get
info from the secondary node. This way we can check that the replication
is ok.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This will set up 2 clusters with rgw multisite enabled.
The first cluster will act as the 'master', the 2nd will be the secondary
one.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
We do not use @<device> anymore so we don't need to perform the
readlink check anymore.
Also we are making an exception for ooo which is still using ceph-disk.
Signed-off-by: Sébastien Han <seb@redhat.com>
ceph-disk is now deprecated in ceph-ansible so let's convert all the ci
tests to use lvm instead of ceph-disk.
Signed-off-by: Sébastien Han <seb@redhat.com>
Since we are removing the ceph-disk test from the CI in master,
there is no need to keep the functional tests in master anymore.
Signed-off-by: Sébastien Han <seb@redhat.com>
Since we set `configure_firewall: true` in
`ceph-defaults/defaults/main.yml`, there is no need to explicitly set it
in the `centos7_cluster` and `docker_cluster` testing scenarios.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This approach doesn't work with all scenarios because it's comparing a
local OSD number expected to a global OSD number found in the whole
cluster.
This reverts commit b8ad35ceb9.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Let's get the osd tree from mons instead of osds.
This way we don't have to predict an OSD container name.
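
As an illustration, the OSD count can be derived from the JSON printed by `ceph osd tree --format json` on a monitor. The sketch below is in Python and parses a trimmed, hand-written sample rather than real cluster output; `count_up_osds` is a hypothetical helper, not the function used by the tests.

```python
import json


def count_up_osds(osd_tree_json):
    """Count OSDs reported as "up" in `ceph osd tree --format json` output."""
    tree = json.loads(osd_tree_json)
    return sum(
        1
        for node in tree.get("nodes", [])
        if node.get("type") == "osd" and node.get("status") == "up"
    )


# Trimmed, hand-written sample; real output carries more fields per node.
sample = json.dumps({
    "nodes": [
        {"id": -1, "name": "default", "type": "root"},
        {"id": 0, "name": "osd.0", "type": "osd", "status": "up"},
        {"id": 1, "name": "osd.1", "type": "osd", "status": "down"},
    ]
})

up_osds = count_up_osds(sample)
```

Because the tree is cluster-wide, the command can run on any monitor and no OSD container name is needed.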
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Three fixes:
- fix a typo in vagrant_variables that causes a networking issue for the
containerized scenario.
- add containerized_deployment: true
- remove a useless block of code: the fact docker_exec_cmd is already set in
ceph-defaults, which is played right after.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Adding testing scenarios for day-2-operation playbook.
Steps:
- deploy a cluster,
- run testinfra,
- test idempotency,
- add a new osd node,
- run testinfra
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
As of now, we should no longer support Jewel in ceph-ansible.
The latest ceph-ansible release supporting Jewel is `stable-3.1`.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The CI is still running ceph-disk tests upstream. So until
https://github.com/ceph/ceph-ansible/pull/3187 is merged nothing will
pass anymore.
Signed-off-by: Sébastien Han <seb@redhat.com>
This commit does a couple of things:
* Avoid code duplication
* Clarify the code
* add more unit tests
* add myself to the authors of the module
Signed-off-by: Sébastien Han <seb@redhat.com>
Not gathering facts causes the `package` module to fail because it needs to
detect which OS we are running on to select the right package manager.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Get more coverage by adding an RGW daemon collocated on osd0.
In the past we missed a bug that could have been caught earlier in
the CI.
Let's add this additional daemon in order to have better coverage.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
If this is set to anything other than the default value of 1 then the
--osds-per-device flag will be used by the batch command to define how
many osds will be created per device.
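
The rule above can be sketched in Python as a small command builder. This is an assumption-laden illustration, not the module's code: `batch_command` is a hypothetical helper, and any other flags the real module may pass are omitted.

```python
def batch_command(devices, osds_per_device=1):
    """Build a `ceph-volume lvm batch` argument list.

    `--osds-per-device` is only added when the value differs from the
    default of 1, mirroring the behaviour described in the text.
    """
    cmd = ["ceph-volume", "lvm", "batch"]
    if osds_per_device != 1:
        cmd += ["--osds-per-device", str(osds_per_device)]
    return cmd + list(devices)
```

For example, `batch_command(["/dev/sdb"], 2)` includes `--osds-per-device 2`, while the default call leaves the flag out.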
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
There is no point in mixing Atomic and CentOS hosts. So
let's run containerized scenarios on Atomic only.
This solves this error here:
```
fatal: [client2]: FAILED! => {
"failed": true
}
MSG:
The conditional check 'ceph_current_status.rc == 0' failed. The error was: error while evaluating conditional (ceph_current_status.rc == 0): 'dict object' has no attribute 'rc'
The error appears to have been in '/home/jenkins-build/build/workspace/ceph-ansible-nightly-luminous-stable-3.1-ooo_collocation/roles/ceph-defaults/tasks/facts.yml': line 74, column 3, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- name: set_fact ceph_current_status (convert to json)
^ here
```
From https://2.jenkins.ceph.com/view/ceph-ansible-stable3.1/job/ceph-ansible-nightly-luminous-stable-3.1-ooo_collocation/37/consoleFull#1765217701b5dd38fa-a56e-4233-a5ca-584604e56e3a
What's happening here is that all the hosts except the clients are running Atomic, so here: https://github.com/ceph/ceph-ansible/blob/master/site-docker.yml.sample#L62
the condition skips all the nodes except the clients. Thus, when running ceph-defaults, the task "is ceph running already?" is skipped, but the task above needs the rc of the skipped task.
This is not an error from the playbook, it's a CI setup issue.
Signed-off-by: Sébastien Han <seb@redhat.com>
Using an explicitly named testing environment allows us to have a
specific [testenv] block for this test. This greatly simplifies how it will
work, as it doesn't really need anything from the ceph cluster tests.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
This adds the action 'batch' to the ceph-volume module so that we can
run the new 'ceph-volume lvm batch' subcommand. A functional test is
also included.
If devices is defined and osd_scenario is lvm, then the 'ceph-volume lvm
batch' command will be used to create the OSDs.
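
The selection rule above can be expressed as a single predicate. The Python sketch below assumes host variables are available as a plain dict; `use_batch` is a hypothetical name, and the real decision is made in the playbook rather than in code like this.

```python
def use_batch(host_vars):
    """Return True when `ceph-volume lvm batch` should create the OSDs.

    Mirrors the rule in the text: `devices` must be defined (and non-empty
    here, an extra assumption) and `osd_scenario` must be "lvm".
    """
    return host_vars.get("osd_scenario") == "lvm" and bool(host_vars.get("devices"))
```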
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
Jewel used to create a default `rbd` pool in the default crush root
`default`. We need at least 1 OSD to satisfy the PGs of this
pool; otherwise the cluster will be in HEALTH_ERR state because
of `pgs stuck unclean`/`pgs stuck inactive`.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>