The data structure has slightly changed on mimic.
Prior to mimic, it used to be:
```
{
"enabled_modules": [
"status"
],
"disabled_modules": [
"balancer",
"dashboard",
"influx",
"localpool",
"prometheus",
"restful",
"selftest",
"zabbix"
]
}
```
From mimic it looks like this:
```
{
"enabled_modules": [
"status"
],
"disabled_modules": [
{
"name": "balancer",
"can_run": true,
"error_string": ""
},
{
"name": "dashboard",
"can_run": true,
"error_string": ""
}
]
}
```
This means we can't simply check if `item` is in `item in
_ceph_mgr_modules.disabled_modules`
the idea here is to use filter `map(attribute='name')` to build a list
when deploying mimic.
Fixes: #2766
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The container runs for 300 sec, then dies and removes itself thanks to
the '--rm' option, so there is no point of removing it. Also this is
causing failure under some circonstances.
Closing: https://bugzilla.redhat.com/show_bug.cgi?id=1568157
Signed-off-by: Sébastien Han <seb@redhat.com>
We now add a default 'rbd' application type to each pool we create. This
will remove the warning: " application not enabled on N pool(s) "
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1590275
Signed-off-by: Sébastien Han <seb@redhat.com>
Some fixes have gone into
git.openstack.org/openstack/ansible-config_template to deal with a few
bugs we have run into.
This PR brings the ceph-ansible config_template version up to the same
as the ansible-config_template openstack repo.
Closes: #2742
Signed-off-by: Andy McCrae <andy.mccrae@gmail.com>
The script ceph-osd-run.sh holds the config options to start the
container, if one of these options are modified we must restart the
container. This was not the case before becauase the 'notify' flag
wasn't present.
Closing: https://bugzilla.redhat.com/show_bug.cgi?id=1596061
Signed-off-by: Sébastien Han <seb@redhat.com>
When using a module there is no need to apply this Ansible option. The
module will handle the idempotency on its own. So the module decides
wether or not the task has changed during the execution.
Signed-off-by: Sébastien Han <seb@redhat.com>
keyring files in /etc/ceph. Default value is the same as it was (0600),
but this variable allows user to override it (f.e. set it to 0640).
Signed-off-by: George Shuklin <george.shuklin@gmail.com>
During 226f80c22b only Debian package
installs had the correct state set to ensure packages were upgraded when
the "upgrade_ceph_packages" var was set to true.
Signed-off-by: Andy McCrae <andy.mccrae@gmail.com>
--net=host was hardcoded in the startup line so even though
mon_docker_net_host was set to False the net option would always be
activated.
mon_docker_net_host is set to True by default so this commit does not
change the behaviour.
Signed-off-by: Sébastien Han <seb@redhat.com>
avoid duplicating test unnecessarily just because of docker exec syntax.
Using the same logic than in the playbook with `docker_exec_cmd` allow us
to execute the same test on both containerized and non containerized environment.
The idea is to set a variable `docker_exec_cmd` with the
'docker exec <container-name>' string when containerized and
set it to '' when non containerized.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
these tests are skipped on bluestore osds scenarios.
they were going to fail anyway since they are run on mon nodes and
`devices` is defined in inventory for each osd node. It means
`num_devices * num_osd_hosts` returns `0`.
The result is that the test expects to have 0 OSDs up.
The idea here is to move these tests so they are run on OSD nodes.
Each OSD node checks their respective OSD to be UP, if an OSD has 2
devices defined in `devices` variable, it means we are checking for 2
OSD to be up on that node, if each node has all its OSD up, we can say
all OSD are up.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
since ooo_collocation scenario is supposed to be the same scenario than the
one tested by OSP and they are not passing `rgw_create_pools` the test
`test_docker_rgw_tuning_pools_are_set` will fail:
```
> pools = node["vars"]["rgw_create_pools"]
E KeyError: 'rgw_create_pools'
```
skipping this test if `node["vars"]["rgw_create_pools"]` is not defined
fixes this failure.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
At the moment, a lot of tests are skipped when daemons are collocated.
Our tests consider a node belong to only 1 group while it's possible for
certain scenario it can belong to multiple groups.
Also pinning to pytest 3.6.1 so we can use `request.node.iter_markers()`
Co-Authored-by: Alfredo Deza <adeza@redhat.com>
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Mergify automatically merges pull requests when they're ready so you
don't have to. You set the rules, it does the rest.
Signed-off-by: Sébastien Han <seb@redhat.com>
Depending on your setup, ceph-mgr might get restarted multiple times.
When this is done to fast, systemd will prevent further restarts because of
configured limits in the ceph-mgr systemd unit file.
Resetting the failure count will prevent this problem. The reset is done before
the restart so in case of a real problem during the restart it still fails.
Fixes: #2768
Signed-off-by: Christian Zunker <christian.zunker@codecentric.cloud>
that may be helpful to know why a test has been skipped.
from pytest doc:
```
-r chars show extra test summary info as specified by chars
(f)ailed, (E)error, (s)skipped, (x)failed, (X)passed,
(p)passed, (P)passed with output, (a)all except pP.
Warnings are displayed at all times except when
--disable-warnings is set
```
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
It might happen that the list of ips/hosts in following line (ceph.conf)
- `mon initial memebers = <hosts>`
- `mon host = <ips>`
are not ordered the same way depending on deployment.
This patch makes the tests looking for each ip or hostname in respective
lines.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
adding more node in this scenario could help to have a better coverage
so we can catch more potential bugs.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Currently we expect that if configure_firewall is set to True to have
firewalld enabled and running. Let's enforce that.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1589146
Signed-off-by: Sébastien Han <seb@redhat.com>
As discussed with the cores, the current limits are too low and should
be bumped to higher value.
So now by default monitors get 3GB and OSDs get 5GB.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1591876
Signed-off-by: Sébastien Han <seb@redhat.com>
since `latest` points to `mimic`, we need to force the test to keep the
same ceph release when testing anything else than `mimic`.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The 'dummy' container is created only on first client node, it means we
must seek to destroy this container only on this node, otherwise this
can cause failure like following :
```
fatal: [192.168.24.8]: FAILED! => {"changed": false, "cmd": ["docker", "rm",
"-f", "ceph-create-keys"], "delta": "0:00:00.023692", "end": "2018-06-12
20:56:07.261278", "msg": "non-zero return code", "rc": 1, "start":
"2018-06-12 20:56:07.237586", "stderr": "Error response from daemon: No such
container: ceph-create-keys", "stderr_lines": ["Error response from daemon: No
such container: ceph-create-keys"], "stdout": "", "stdout_lines": []}
```
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1590746
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
it looks "ansible~=2.4" install latest ansible release in 2.5 so we must
specify we want latest release but inferior to 2.5.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Prior to this patch if you were running on a Red Hat system,
ceph-ansible would try to configure firewalld for you without the
operators's consent.
Now you can enable or disable the fw configuration by setting
configure_firewall to either true or false.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1589146
Signed-off-by: Sébastien Han <seb@redhat.com>
we see more and more failure like `fatal: [mon0]: UNREACHABLE! => {}` in
`centos7_cluster` scenario, Since we have 30Gb RAM on hypervisors, we
can give monitors a bit more RAM. By the way, nodes on containerized cluster
testing scenario have already 1024Mb memory allocated.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Since latest points to mimic for the ceph container images, we need to
set `CEPH_DOCKER_IMAGE_TAG` to `latest-luminous` when ceph release is
luminous
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This error message may be confusing and need to be more explicit on
where you have to install notario, indeed, people may think this library
must be installed on configured nodes while it must be installed on the
node you are running the playbook.
Fixes: #2649
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The current secure cluster play runs with all the monitors. The rerun
of this task is unnecessary and can be skipped.
Fixes: #2737
Signed-off-by: Vishal Kanaujia <vishal.kanaujia@flipkart.com>
We don't have the same service running on non-container for now, this
will change soon but for let's only run the test on container.
Signed-off-by: Sébastien Han <seb@redhat.com>
combining `run_once: true` with `inventory_hostname ==
groups.get(client_group_name) | first` might cause bug when the only
node being run is not the first in the group.
In a deployment with a single client node it might cause issue because
sometimes keyring won't be created since the task could be definitively
skipped.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1588093
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
For ceph-iscsi-gw and ceph-rbd-mirror roles the group_name are named
differently (by default) than the role name so we have to change the
script to generate the correct name.
Signed-off-by: Sébastien Han <seb@redhat.com>