If we don't assign the rbd application tag on this pool,
the cluster will go into `HEALTH_WARN` state like the following:
```
HEALTH_WARN application not enabled on 1 pool(s)
POOL_APP_NOT_ENABLED application not enabled on 1 pool(s)
application not enabled on pool 'rbd'
```
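A minimal sketch of the corresponding fix, assuming a task run against a monitor (the task name and delegation are illustrative, not the exact change):
```
- name: enable the rbd application on the rbd pool
  command: ceph osd pool application enable rbd rbd
  run_once: true
  delegate_to: "{{ groups[mon_group_name][0] }}"
```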
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4cf17a6fdd)
The current lvm_osds only tests filestore on one OSD node.
We also have bs_lvm_osds to test bluestore and encryption.
Let's use only one scenario to test filestore/bluestore, with or
without dmcrypt, on four OSD nodes (sketched below).
Also use validate_dmcrypt_bool_value instead of types.boolean for
dmcrypt validation via notario.
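A hedged sketch of the per-host matrix the merged scenario can cover (values are illustrative, not the exact CI host_vars):
```
osd_objectstore: bluestore   # filestore on some of the four OSD nodes
dmcrypt: true                # and false on the others
```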
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 52b9f3fb28)
ceph-volume didn't work when the devices were passed by path.
Since it now supports this, let's allow the feature in ceph-ansible.
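A hedged sketch of what this allows in the OSD group_vars (the paths are examples only):
```
devices:
  - /dev/disk/by-path/pci-0000:00:1f.2-ata-1
  - /dev/disk/by-path/pci-0000:00:1f.2-ata-2
```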
Closes: #3812
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8f2c45dfd3)
It's useful to have logs enabled in debug mode in order to have
more information for developers.
Also reindent the json file.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d25af1b872)
We don't need multiple ceph-override.json copies. We already have a
symlink to all_daemons/ceph-override.json, so we can do the same for
all scenarios.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit a19054be18)
** The configuration seems to be for filestore (see the lvm_volumes sketch after the error output below):
[ERROR]: [ceph-osd0] Validation failed for variable: lvm_volumes
** Removing `radosgw_interface: eth1` to resolve:
The task includes an option with an undefined variable. The error was:
'ansible.vars.hostvars.HostVarsVars object' has no attribute
u'ansible_eth1'
The error appears to have been in
'/home/nwatkins/src/ceph-ansible/roles/ceph-defaults/tasks/set_radosgw_address.yml':
line 21, column 5, but may be elsewhere in the file depending on the
exact syntax problem.
The offending line appears to be:
- name: set_fact _radosgw_address to radosgw_interface - ipv4
^ here
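A hedged illustration of why the lvm_volumes validation tripped (LV/VG names are examples): a filestore-style entry carries a journal, while a bluestore entry does not.
```
lvm_volumes:
  - data: data-lv1
    data_vg: vg_osd
    journal: journal-lv1   # filestore only; drop (or swap for db/wal) when using bluestore
    journal_vg: vg_osd
```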
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
(cherry picked from commit 50255b9640)
Based on https://github.com/ceph/ceph-container/pull/1269, and given
there are no stable packages or reliable repository, we disable nfs-ganesha
temporarily.
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 6c3ef90ebe)
The `osd_pool_default_pg_num` parameter is set in `ceph-mon`.
When using ceph-ansible with `--limit` on a specific group of nodes, it
will fail when trying to access this variable since it wouldn't be
defined.
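A hypothetical illustration of the guard pattern only (the variable name and default are assumptions, not necessarily the actual fix): fall back to a default when the value computed on the ceph-mon side is absent, e.g. on a run limited to another group of nodes.
```
pool_pg_num: "{{ osd_pool_default_pg_num | default(8) }}"
```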
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1518696
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d4c0960f04)
We run an initial deployment with `osd_pool_default_size: 1` in
`ceph_conf_overrides`.
When re-running the playbook to test idempotency and handlers, we reset
`ceph_conf_overrides`; we must append the new value instead of just
overwriting it, otherwise this can lead to errors in the CI.
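A hedged sketch of the idea (task and variable names are illustrative, not the exact CI change): merge the values needed by the re-run into the existing overrides instead of replacing the whole dict, so `osd_pool_default_size: 1` from the initial deployment is preserved.
```
- name: append to ceph_conf_overrides rather than overwrite it
  set_fact:
    ceph_conf_overrides: "{{ ceph_conf_overrides | default({}) | combine(rerun_overrides, recursive=True) }}"
  vars:
    rerun_overrides:
      global:
        osd_pool_default_size: 1
```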
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f290e49df8)
Setting this setting to 1 makes the CI cover the related code in the
playbook without breaking the upgrade scenarios.
Those scenarios were broken because of the `TASK [waiting for clean
pgs...]` check in rolling_update.yml: since the pool size for
`cephfs_metadata` and `cephfs_data` is updated to `2` in
`ceph-override.json` and there are not enough OSDs to honor this size,
some PGs are degraded, which makes the mentioned check fail.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3ac6619fb9)
Adding more memory to the VMs for the rgw_multisite scenarios could avoid
this error I recently hit in the CI.
(It is worth setting 1024MB since there are only 2 nodes in those
scenarios; a sketch of the change follows the error output below.)
```
fatal: [osd0]: FAILED! => {
"changed": false,
"cmd": [
"docker",
"run",
"--rm",
"--entrypoint",
"/usr/bin/ceph",
"docker.io/ceph/daemon:latest-luminous",
"--version"
],
"delta": "0:00:04.799084",
"end": "2018-10-29 17:10:39.136602",
"rc": 1,
"start": "2018-10-29 17:10:34.337518"
}
STDERR:
Traceback (most recent call last):
File "/usr/bin/ceph", line 125, in <module>
import rados
ImportError: libceph-common.so.0: cannot map zero-fill pages: Cannot allocate memory
```
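A hedged sketch of the vagrant_variables.yml tweak (key name as used by ceph-ansible's Vagrant setup):
```
memory: 1024
```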
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Add a playbook that will upload a file to the master, then try to get
info about it from the secondary node; this way we can check that the
replication is working.
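A hypothetical sketch of the idea (the host groups, s3cmd and the bucket are assumptions, not the actual test playbook): push an object through the master zone, then poll the secondary zone until the object shows up.
```
- hosts: rgw_master
  gather_facts: false
  tasks:
    - name: upload an object via the master zone
      command: s3cmd put /etc/hosts s3://replication-test/hosts

- hosts: rgw_secondary
  gather_facts: false
  tasks:
    - name: wait for the object to appear on the secondary zone
      command: s3cmd info s3://replication-test/hosts
      register: replicated
      retries: 10
      delay: 5
      until: replicated.rc == 0
```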
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This will set up 2 clusters with rgw multisite enabled.
The first cluster will act as the 'master', the 2nd will be the
secondary one.
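A hedged sketch of the per-cluster group_vars (variable names follow ceph-ansible's rgw multisite support of that era; the exact set and values are assumptions):
```
# first cluster (master zone)
rgw_multisite: true
rgw_zonemaster: true
rgw_zonesecondary: false
rgw_realm: test_realm
rgw_zonegroup: test_zonegroup
rgw_zone: master_zone
# second cluster: rgw_zonemaster: false, rgw_zonesecondary: true, rgw_zone: secondary_zone,
# plus the endpoint address and system keys needed to pull the realm from the master
```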
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
ceph-disk is now deprecated in ceph-ansible, so let's convert all the CI
tests to use lvm instead of ceph-disk.
Signed-off-by: Sébastien Han <seb@redhat.com>
Since we are removing the ceph-disk tests from the CI in master, there
is no need to have the functional tests in master anymore.
Signed-off-by: Sébastien Han <seb@redhat.com>
Since we set `configure_firewall: true` in
`ceph-defaults/defaults/main.yml`, there is no need to explicitly set it
in the `centos7_cluster` and `docker_cluster` testing scenarios.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This approach doesn't work with all scenarios because it compares an
expected local OSD count to the global OSD count found in the whole
cluster.
This reverts commit b8ad35ceb9.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Let's get the osd tree from the mons instead of the osds.
This way we don't have to predict an OSD container name.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Three fixes:
- fix a typo in vagrant_variables that causes a networking issue for the
containerized scenario.
- add containerized_deployment: true
- remove a useless block of code: the docker_exec_cmd fact is set in
ceph-defaults, which is played right after.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Adding testing scenarios for the day-2-operation playbook.
Steps:
- deploy a cluster,
- run testinfra,
- test idempotency,
- add a new osd node,
- run testinfra
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
As of now, we should no longer support Jewel in ceph-ansible.
The latest ceph-ansible release supporting Jewel is `stable-3.1`.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The CI is still running ceph-disk tests upstream. So until
https://github.com/ceph/ceph-ansible/pull/3187 is merged, nothing will
pass anymore.
Signed-off-by: Sébastien Han <seb@redhat.com>
This commit does a couple of things:
* Avoid code duplication
* Clarify the code
* Add more unit tests
* Add myself to the authors of the module
Signed-off-by: Sébastien Han <seb@redhat.com>
Not gathering facts causes the `package` module to fail because it needs
to detect which OS we are running on to select the right package manager.
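A minimal sketch of the point (the play layout and package name are illustrative): the generic `package` module relies on gathered facts such as `ansible_pkg_mgr` to pick the right package manager.
```
- hosts: all
  gather_facts: true
  tasks:
    - name: install a package with the generic package module
      package:
        name: ceph-common
        state: present
```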
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Get more coverage by adding an RGW daemon collocated on osd0.
We missed a bug in the past which could have been caught earlier in
the CI.
Let's add this additional daemon in order to have better coverage.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
There is no point in mixing hosts running on Atomic AND CentOS hosts, so
let's run the containerized scenarios on Atomic only.
This solves this error:
```
fatal: [client2]: FAILED! => {
"failed": true
}
MSG:
The conditional check 'ceph_current_status.rc == 0' failed. The error was: error while evaluating conditional (ceph_current_status.rc == 0): 'dict object' has no attribute 'rc'
The error appears to have been in '/home/jenkins-build/build/workspace/ceph-ansible-nightly-luminous-stable-3.1-ooo_collocation/roles/ceph-defaults/tasks/facts.yml': line 74, column 3, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- name: set_fact ceph_current_status (convert to json)
^ here
```
From https://2.jenkins.ceph.com/view/ceph-ansible-stable3.1/job/ceph-ansible-nightly-luminous-stable-3.1-ooo_collocation/37/consoleFull#1765217701b5dd38fa-a56e-4233-a5ca-584604e56e3a
What's happening here is that all the hosts except the clients are running Atomic, so the condition here: https://github.com/ceph/ceph-ansible/blob/master/site-docker.yml.sample#L62
skips all the nodes except the clients. Thus, when running ceph-defaults, the task "is ceph running already?" is skipped, but the `set_fact ceph_current_status (convert to json)` task still needs the rc of that skipped task.
This is not an error in the playbook, it's a CI setup issue.
Signed-off-by: Sébastien Han <seb@redhat.com>