Let's deploy mgr on a dedicated node.
This makes the update job fail on the stable-4.0 branch since there's a
mismatch between the two inventories.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The ooo-collocation scenario was still using an old container image that
doesn't meet the requirements of the latest stable-3.2 code. We need to
use at least the container image v3.2.5.
Also update the OSD tests to reflect the changes introduced by commit
bedc0ab, because the OSD systemd unit script no longer uses the device
name.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
nfs-ganesha v2.5 and v2.6 have reached EOL. Install nfs-ganesha v2.7
stable, which is currently maintained.
Signed-off-by: Ramana Raja <rraja@redhat.com>
(cherry picked from commit dfff89ce67)
This was removed because broken repositories made the CI fail.
The removal no longer makes sense, so let's add it back.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
If we don't assign the rbd application tag to this pool,
the cluster will report a `HEALTH_WARN` state like the following:
```
HEALTH_WARN application not enabled on 1 pool(s)
POOL_APP_NOT_ENABLED application not enabled on 1 pool(s)
application not enabled on pool 'rbd'
```
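For reference, here is a minimal Python sketch of how a test could spot this warning in the health output. The helper name `pools_missing_application` is hypothetical and not taken from ceph-ansible, and the parsing assumes the plain-text layout shown above.

```python
import re

# Sample text copied from the health output shown above.
HEALTH_DETAIL = """\
HEALTH_WARN application not enabled on 1 pool(s)
POOL_APP_NOT_ENABLED application not enabled on 1 pool(s)
application not enabled on pool 'rbd'
"""


def pools_missing_application(health_detail):
    """Return the pool names reported as having no application enabled.

    Hypothetical helper: it assumes each affected pool appears on a line of
    the form "application not enabled on pool '<name>'".
    """
    pattern = re.compile(r"application not enabled on pool '([^']+)'")
    return pattern.findall(health_detail)


print(pools_missing_application(HEALTH_DETAIL))
```

The warning itself goes away once the pool is tagged, which on the command line is done with `ceph osd pool application enable <pool> <app>`.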
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4cf17a6fdd)
The current lvm_osds only tests filestore on one OSD node.
We also have bs_lvm_osds to test bluestore and encryption.
Let's use only one scenario to test filestore/bluestore and with or
without dmcrypt on four OSD nodes.
Also use validate_dmcrypt_bool_value instead of types.boolean on
dmcrypt validation via notario.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 52b9f3fb28)
ceph-volume didn't work when the devices were passed by path.
Since it now supports it, let's allow this feature in ceph-ansible.
Closes: #3812
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8f2c45dfd3)
It's useful to have debug-mode logs enabled so that developers get
more information.
Also reindent the json file.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d25af1b872)
We don't need to have multiple ceph-override.json copies. We
already have a symlink to all_daemons/ceph-override.json, so
we can do the same for all scenarios.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit a19054be18)
It looks like newer versions of pytest-xdist require pytest>=4.4.0.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ba0a95211c)
As of testinfra 2.0.0, the binary name is `py.test`.
But let's pin the version to 1.19.0.
Indeed, migrating to 2.0.0 requires our current testing to be reworked a bit.
Since we don't have the bandwidth ATM for this, it's better to simply
keep testing with testinfra 1.19.0.
Note that I've replaced all `testinfra` occurrences with `py.test` anyway.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b42250332a)
** configuration seems to be for filestore:
[ERROR]: [ceph-osd0] Validation failed for variable: lvm_volumes
** Removing `radosgw_interface: eth1` to resolve:
The task includes an option with an undefined variable. The error was:
'ansible.vars.hostvars.HostVarsVars object' has no attribute
u'ansible_eth1'
The error appears to have been in
'/home/nwatkins/src/ceph-ansible/roles/ceph-defaults/tasks/set_radosgw_address.yml':
line 21, column 5, but may be elsewhere in the file depending on the
exact syntax problem.
The offending line appears to be:
- name: set_fact _radosgw_address to radosgw_interface - ipv4
^ here
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
(cherry picked from commit 50255b9640)
Based on https://github.com/ceph/ceph-container/pull/1269 and given
there are no stable packages or reliable repository, we disable nfs
ganesha temporarily.
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 6c3ef90ebe)
The `osd_pool_default_pg_num` parameter is set in `ceph-mon`.
When using ceph-ansible with `--limit` on a specific group of nodes, it
will fail when trying to access this variable since it wouldn't be
defined.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1518696
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d4c0960f04)
We run an initial deployment with `osd_pool_default_size: 1` in
`ceph_conf_overrides`.
When re-running the playbook to test idempotency and handlers, we reset
`ceph_conf_overrides`. We must append a new value instead of just
overwriting it; otherwise, this can lead to errors in the CI.
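The difference between appending and overwriting can be sketched in Python as follows. This is only an illustration: the `global` section name, the `some_new_option` key, and the `merge_overrides` helper are assumptions, not what the playbook actually does.

```python
import copy


def merge_overrides(base, extra):
    """Recursively merge `extra` into a copy of `base` and return the result."""
    merged = copy.deepcopy(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_overrides(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Value from the initial deployment; the "global" section is an assumption.
initial = {"global": {"osd_pool_default_size": 1}}
# Hypothetical option added for the second run.
second_run = {"global": {"some_new_option": "value"}}

overwritten = second_run                          # osd_pool_default_size is lost
appended = merge_overrides(initial, second_run)   # both keys are kept

print(overwritten)
print(appended)
```

With a plain overwrite the second run no longer carries `osd_pool_default_size`, which is the situation the message warns about.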
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f290e49df8)
Setting this to 1 makes the CI cover the related code in the
playbook without breaking the upgrade scenarios.
Those scenarios were broken by the check `TASK [waiting for
clean pgs...]` in rolling_update.yml: since the pool size for
`cephfs_metadata` and `cephfs_data` was updated to `2` in
`ceph-override.json` and there are not enough OSDs to honor this size,
some PGs are degraded and make that check fail.
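As a rough illustration of the constraint, the Python sketch below flags pools whose size cannot be honored. The helper name and the OSD count of 1 are assumptions made for the example, and the check ignores CRUSH failure domains entirely.

```python
def pools_exceeding_osds(pool_sizes, osd_count):
    """Return, sorted, the pools whose replica size exceeds the OSD count.

    Simplification: a real cluster also has to satisfy its CRUSH rule, so
    this is a necessary condition, not a sufficient one.
    """
    return sorted(name for name, size in pool_sizes.items() if size > osd_count)


# Pool names come from the message; the OSD count is an assumed example value.
with_size_2 = {"cephfs_data": 2, "cephfs_metadata": 2}
with_size_1 = {"cephfs_data": 1, "cephfs_metadata": 1}

print(pools_exceeding_osds(with_size_2, osd_count=1))
print(pools_exceeding_osds(with_size_1, osd_count=1))
```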
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3ac6619fb9)
Adding more memory to the VMs for rgw_multisite scenarios could avoid
the error below, which I recently hit in the CI.
(It is worth setting 1024 MB since there are only 2 nodes in those
scenarios.)
```
fatal: [osd0]: FAILED! => {
"changed": false,
"cmd": [
"docker",
"run",
"--rm",
"--entrypoint",
"/usr/bin/ceph",
"docker.io/ceph/daemon:latest-luminous",
"--version"
],
"delta": "0:00:04.799084",
"end": "2018-10-29 17:10:39.136602",
"rc": 1,
"start": "2018-10-29 17:10:34.337518"
}
STDERR:
Traceback (most recent call last):
File "/usr/bin/ceph", line 125, in <module>
import rados
ImportError: libceph-common.so.0: cannot map zero-fill pages: Cannot allocate memory
```
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Add a playbook that uploads a file to the master and then tries to get
info from the secondary node. This way we can check whether replication
is working.
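The idea behind such a check can be sketched in Python as below. Everything here is hypothetical: `fetch_from_secondary` stands in for whatever query is run against the secondary node, and the retry loop is an assumption based on multisite replication being asynchronous.

```python
import time


def wait_for_replication(fetch_from_secondary, expected, retries=5, delay=0.0):
    """Poll the secondary until it returns `expected` or retries run out.

    Returns the 1-based attempt number on success, or None on failure.
    `fetch_from_secondary` is a hypothetical zero-argument callable.
    """
    for attempt in range(1, retries + 1):
        if fetch_from_secondary() == expected:
            return attempt
        time.sleep(delay)
    return None


# Fake secondary that only sees the object on the third query.
responses = iter([None, None, "object-info"])
attempts = wait_for_replication(lambda: next(responses), "object-info")
print(attempts)
```

In a real run the delay would be non-zero; it defaults to zero here only so the sketch finishes instantly.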
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This will set up 2 clusters with rgw multisite enabled.
The first cluster will act as the 'master', the second will be the
secondary one.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
We do not use @<device> anymore so we don't need to perform the
readlink check anymore.
Also we are making an exception for ooo which is still using ceph-disk.
Signed-off-by: Sébastien Han <seb@redhat.com>
ceph-disk is now deprecated in ceph-ansible so let's convert all the ci
tests to use lvm instead of ceph-disk.
Signed-off-by: Sébastien Han <seb@redhat.com>
Since we are removing the ceph-disk test from the CI in master,
there is no need to have the functional tests in master anymore.
Signed-off-by: Sébastien Han <seb@redhat.com>
Since we set `configure_firewall: true` in
`ceph-defaults/defaults/main.yml`, there is no need to explicitly set it
in `centos7_cluster` and `docker_cluster` testing scenarios.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This approach doesn't work with all scenarios because it compares the
OSD count expected locally on a node to the global OSD count found in
the whole cluster.
This reverts commit b8ad35ceb9.
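The mismatch can be illustrated with a small Python sketch. The node names and counts are made up for the example; only the local-versus-global comparison reflects the problem described above.

```python
# Hypothetical inventory: number of OSDs each node is expected to host.
expected_per_node = {"osd0": 2, "osd1": 2, "osd2": 1}
# Number of OSDs reported for the whole cluster (assumed value).
cluster_osd_count = 5


def local_check(node):
    """Flawed comparison: one node's expectation vs the cluster-wide count."""
    return expected_per_node[node] == cluster_osd_count


def global_check():
    """Like-for-like comparison: summed expectations vs the cluster-wide count."""
    return sum(expected_per_node.values()) == cluster_osd_count


print({node: local_check(node) for node in expected_per_node})
print(global_check())
```

With a single OSD node the two checks agree, which is why the flawed comparison only fails in some scenarios.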
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Let's get the osd tree from mons instead of osds.
This way we don't have to predict an OSD container name.
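A test consuming that tree could look roughly like the Python sketch below. The sample is a hand-written, abbreviated stand-in for `ceph osd tree --format json` output, and the helper name `osds_by_status` is hypothetical.

```python
import json

# Abbreviated, hand-written sample; real output carries many more fields.
SAMPLE = json.dumps({
    "nodes": [
        {"id": -1, "name": "default", "type": "root", "children": [-2]},
        {"id": -2, "name": "osd0", "type": "host", "children": [0, 1]},
        {"id": 0, "name": "osd.0", "type": "osd", "status": "up"},
        {"id": 1, "name": "osd.1", "type": "osd", "status": "down"},
    ],
    "stray": [],
})


def osds_by_status(osd_tree_json):
    """Group OSD names by their reported status, ignoring non-OSD nodes."""
    tree = json.loads(osd_tree_json)
    result = {}
    for node in tree.get("nodes", []):
        if node.get("type") == "osd":
            status = node.get("status", "unknown")
            result.setdefault(status, []).append(node["name"])
    return result


print(osds_by_status(SAMPLE))
```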
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Three fixes:
- fix a typo in vagrant_variables that causes a networking issue for
the containerized scenario.
- add containerized_deployment: true
- remove a useless block of code: the fact docker_exec_cmd is set in
ceph-defaults, which is played right after.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Adding testing scenarios for day-2-operation playbook.
Steps:
- deploy a cluster,
- run testinfra,
- test idempotency,
- add a new osd node,
- run testinfra
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
As of now, we should no longer support Jewel in ceph-ansible.
The latest ceph-ansible release supporting Jewel is `stable-3.1`.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The CI is still running ceph-disk tests upstream. So until
https://github.com/ceph/ceph-ansible/pull/3187 is merged nothing will
pass anymore.
Signed-off-by: Sébastien Han <seb@redhat.com>
This commit does a couple of things:
* Avoid code duplication
* Clarify the code
* Add more unit tests
* Add myself as an author of the module
Signed-off-by: Sébastien Han <seb@redhat.com>