3907 Commits (4159326a182d15376bf5e5913da4bb6281e27957)
 

Author SHA1 Message Date
Julien Danjou d0cd6de5a5 Use rebase as merge strategy for PR
Signed-off-by: Julien Danjou <julien@danjou.info>
2018-06-26 13:38:40 +02:00
Julien Danjou 824ec6d256 Use rebase as merge strategy for PR
Signed-off-by: Julien Danjou <julien@danjou.info>
2018-06-26 11:10:45 +02:00
Sébastien Han ddb1520c33 add mergify
Mergify automatically merges pull requests when they're ready so you
don't have to. You set the rules, it does the rest.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-25 16:43:54 +02:00
Michel Rode 7774935707 Added 'squash' as a parameter to nfs-ganesha.
Set the default to 'root_squash' - which is the default of nfs-ganesha.

Signed-off-by: Michel Rode <rmichel@devnu11.net>
2018-06-25 09:13:17 +02:00
Guillaume Abrioux 372b4a7ba4 doc: Update CONTRIBUTING.md
Let's add more information that could keep contributors from wasting their time
and CI resources.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-22 14:48:23 +02:00
Yaniv Kaul 888fb4fcdd Fix RHEL based Ansible installation
It is now on its own channel, not extras.

Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
2018-06-21 18:41:20 +02:00
Guillaume Abrioux 775a77dcd9 Revert "tests: add more verbosity in testinfra"
This reverts commit 68eb850b27.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-20 16:56:48 +02:00
Christian Zunker 48394597c9 reset failed count of ceph-mgr
Depending on your setup, ceph-mgr might get restarted multiple times.
When this is done too fast, systemd will prevent further restarts because of
configured limits in the ceph-mgr systemd unit file.

Resetting the failure count will prevent this problem. The reset is done before
the restart so in case of a real problem during the restart it still fails.

Fixes: #2768

Signed-off-by: Christian Zunker <christian.zunker@codecentric.cloud>
2018-06-20 13:59:16 +02:00
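
As an illustration of commit 48394597c9, the ordering it describes (reset the failure counter first, then restart) can be sketched as follows. This is a minimal sketch, not the playbook's actual task: the helper name `restart_commands` and the unit name `ceph-mgr@mgr0` are hypothetical, and the function only builds the command lines rather than running them.

```python
def restart_commands(unit):
    """Return the systemctl invocations in the order described above.

    The reset comes first so that a genuine failure during the restart
    is still reported by systemd.
    """
    return [
        ["systemctl", "reset-failed", unit],  # clear the unit's failure counter
        ["systemctl", "restart", unit],       # then perform the restart
    ]


if __name__ == "__main__":
    # "ceph-mgr@mgr0" is a made-up unit name used only for illustration.
    for command in restart_commands("ceph-mgr@mgr0"):
        print(" ".join(command))
```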
Guillaume Abrioux 68eb850b27 tests: add more verbosity in testinfra
that may be helpful to know why a test has been skipped.

from pytest doc:

```
  -r chars              show extra test summary info as specified by chars
                        (f)ailed, (E)error, (s)skipped, (x)failed, (X)passed,
                        (p)passed, (P)passed with output, (a)all except pP.
                        Warnings are displayed at all times except when
                        --disable-warnings is set
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-20 13:54:18 +02:00
Guillaume Abrioux f68936ca7e tests: fix *_has_correct_value tests
It might happen that the lists of IPs/hosts in the following lines (ceph.conf)
- `mon initial members = <hosts>`
- `mon host = <ips>`

are not ordered the same way depending on the deployment.

This patch makes the tests look for each IP or hostname in the respective
lines.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-20 08:01:57 +02:00
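
To illustrate the idea behind commit f68936ca7e, here is a minimal sketch of an order-independent check. It is not the test code from the repository: the helper names, the sample configuration, and the addresses are all hypothetical.

```python
def parse_list_option(conf_text, option):
    """Return the comma-separated values of `option` as a set."""
    for line in conf_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == option:
            return {item.strip() for item in value.split(",") if item.strip()}
    return set()


def option_has_all(conf_text, option, expected):
    """True if every expected value appears on the option's line, in any order."""
    return set(expected) <= parse_list_option(conf_text, option)


# Hypothetical ceph.conf excerpt; the values are deliberately out of order.
SAMPLE_CONF = """\
[global]
mon initial members = mon2,mon0,mon1
mon host = 192.168.1.12,192.168.1.10,192.168.1.11
"""

if __name__ == "__main__":
    print(option_has_all(SAMPLE_CONF, "mon initial members", ["mon0", "mon1", "mon2"]))
```

Comparing sets rather than whole strings is what makes the check insensitive to the order in which a deployment happens to write the hosts.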
Guillaume Abrioux 481c14455a tests: add more nodes in ooo testing scenario
Adding more nodes to this scenario could give us better coverage
so we can catch more potential bugs.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-18 16:44:23 +02:00
Sébastien Han bea4027f0c common: start firewalld if configure_firewall
Currently, if configure_firewall is set to True, we expect firewalld to be
enabled and running. Let's enforce that.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1589146
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-18 04:02:50 -04:00
Sébastien Han a9ed3579ae mon/osd: bump container memory limit
As discussed with the cores, the current limits are too low and should
be bumped to a higher value.
So now by default monitors get 3GB and OSDs get 5GB.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1591876
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-17 11:20:27 -04:00
Guillaume Abrioux 21894655a7 tests: keep same ceph release during handlers/idempotency test
Since `latest` points to `mimic`, we need to force the test to keep the
same ceph release when testing anything other than `mimic`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-15 11:45:51 -04:00
Guillaume Abrioux 51cf3b7fa0 client: try to kill dummy container only on first client node
The 'dummy' container is created only on the first client node, so we
must try to destroy this container only on that node. Otherwise this
can cause a failure like the following:
```
fatal: [192.168.24.8]: FAILED! => {"changed": false, "cmd": ["docker", "rm",
"-f", "ceph-create-keys"], "delta": "0:00:00.023692", "end": "2018-06-12
20:56:07.261278", "msg": "non-zero return code", "rc": 1, "start":
"2018-06-12 20:56:07.237586", "stderr": "Error response from daemon: No such
container: ceph-create-keys", "stderr_lines": ["Error response from daemon: No
such container: ceph-create-keys"], "stdout": "", "stdout_lines": []}

```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1590746

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-13 16:10:46 +02:00
Patrick Donnelly 9ce81ae845 ceph-mds: do not enable multimds on jewel
Multiple active MDS became stable in Luminous.

Introduced-by: c8573fe0d7
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2018-06-12 10:47:34 +02:00
Guillaume Abrioux 752781e2c9 core: make ansible pinning to latest ansible 2.4
It looks like "ansible~=2.4" installs the latest ansible 2.5 release, so we must
specify that we want the latest release lower than 2.5.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-12 14:26:29 +08:00
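
The behaviour noted in commit 752781e2c9 follows from how the compatible-release operator is defined in PEP 440: `~=2.4` is equivalent to `>=2.4, ==2.*`, so any 2.5.x release satisfies it. The sketch below models that rule for plain numeric versions only. The helper names are hypothetical, and a specifier such as `~=2.4.0` is shown only as one possible way to stay below 2.5, not necessarily the one the commit used.

```python
def parse(version):
    """Split a plain numeric version such as '2.4.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def pad(version, width):
    """Right-pad a version tuple with zeros so tuples compare sensibly."""
    return version + (0,) * (width - len(version))


def compatible_release(spec, candidate):
    """True if `candidate` satisfies `~=spec` for simple numeric versions."""
    base = parse(spec)
    if len(base) < 2:
        raise ValueError("~= needs at least two version components")
    cand = parse(candidate)
    prefix = base[:-1]  # everything but the last component must match exactly
    width = max(len(base), len(cand))
    return pad(cand, width) >= pad(base, width) and cand[:len(prefix)] == prefix


if __name__ == "__main__":
    print(compatible_release("2.4", "2.5.3"))    # a 2.5 release matches ~=2.4
    print(compatible_release("2.4.0", "2.5.3"))  # but not ~=2.4.0
```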
Sébastien Han 2e8412734a common: ability to enable/disable fw configuration
Prior to this patch if you were running on a Red Hat system,
ceph-ansible would try to configure firewalld for you without the
operator's consent.
Now you can enable or disable the fw configuration by setting
configure_firewall to either true or false.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1589146
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-11 21:51:59 +02:00
Guillaume Abrioux bbb8691335 tests: increase memory to 1024Mb for centos7_cluster scenario
We see more and more failures like `fatal: [mon0]: UNREACHABLE! => {}` in the
`centos7_cluster` scenario. Since we have 30Gb RAM on hypervisors, we
can give monitors a bit more RAM. By the way, nodes in the containerized cluster
testing scenario already have 1024Mb of memory allocated.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-11 23:52:15 +08:00
Guillaume Abrioux a351b08726 tests: set CEPH_DOCKER_IMAGE_TAG when ceph release is luminous
Since latest points to mimic for the ceph container images, we need to
set `CEPH_DOCKER_IMAGE_TAG` to `latest-luminous` when ceph release is
luminous

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-11 23:35:56 +08:00
Guillaume Abrioux a4ad2eb27f validate: be more explicit with error msg when notario isn't installed
This error message may be confusing and needs to be more explicit about
where you have to install notario. People may think this library
must be installed on the configured nodes, while it must be installed on the
node you are running the playbook from.

Fixes: #2649

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-11 21:31:34 +08:00
Konstantin Shalygin 3a07568496 ceph-osd: set 'openstack_keys_tmp' only when 'openstack_config' is defined.
If 'openstack_config' is false this task shouldn't be executed.

Signed-off-by: Konstantin Shalygin <k0ste@k0ste.ru>
2018-06-11 13:03:55 +02:00
Vishal Kanaujia 1a610df02b Fix to run secure cluster only once in a run
The current secure cluster play runs with all the monitors. The rerun
of this task is unnecessary and can be skipped.

Fixes: #2737

Signed-off-by: Vishal Kanaujia <vishal.kanaujia@flipkart.com>
2018-06-11 08:37:29 +02:00
Sébastien Han 6035978ed9 test: only on containerized iscsi
We don't have the same service running on non-container for now. This
will change soon, but for now let's only run the test on containers.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-11 08:34:48 +02:00
Guillaume Abrioux 090ecff94e client: keyrings aren't created when single client node
Combining `run_once: true` with `inventory_hostname ==
groups.get(client_group_name) | first` might cause a bug when the only
node being run is not the first in the group.

In a deployment with a single client node this might cause an issue:
sometimes the keyring won't be created because the task could be skipped
entirely.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1588093

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-08 15:05:47 +02:00
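
The interaction described in commit 090ecff94e can be modelled with a small simulation. This is a deliberately simplified sketch of Ansible's behaviour (with `run_once`, the task and its `when` condition are evaluated only for the first host of the current batch). The function name, its parameters, and the host names are hypothetical.

```python
def hosts_that_run(play_hosts, client_group, run_once=True, first_only=True):
    """Return the hosts on which the task would actually execute.

    play_hosts   -- hosts targeted by the current run, in order
    client_group -- all hosts of the client group, in inventory order
    run_once     -- model `run_once: true` (only the first play host is considered)
    first_only   -- model `when: inventory_hostname == <first of client group>`
    """
    candidates = play_hosts[:1] if run_once else play_hosts
    if first_only:
        first_client = client_group[0]
        candidates = [host for host in candidates if host == first_client]
    return list(candidates)


if __name__ == "__main__":
    group = ["client0", "client1"]
    # Only client1 is being run: both restrictions together skip the task.
    print(hosts_that_run(["client1"], group))
    # Dropping the `when` restriction lets the task run on client1.
    print(hosts_that_run(["client1"], group, first_only=False))
```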
Sébastien Han 315ab08b16 contrib: fix generate group_vars samples
For the ceph-iscsi-gw and ceph-rbd-mirror roles, the group names differ
(by default) from the role names, so we have to change the
script to generate the correct name.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-08 10:21:54 +02:00
Sébastien Han 20c8065e48 ceph-iscsi: rename group iscsi_gws
Let's try to avoid using dashes as testinfra needs to be able to read
the groups.
Typically, with iscsi-gws we can't add a marker for these iscsi nodes,
using an underscore fixes the issue.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-08 10:21:54 +02:00
Sébastien Han c00fb12497 ci: add functional tests for iscsi
We test if:

* packages are installed
* services are running
* service units are enabled

Also fix linting issues

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-08 10:21:54 +02:00
Sébastien Han fdeee9eb19 site-docker: add iscsi role
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-08 10:21:54 +02:00
Sébastien Han 5ff2f03e3f ci: add iscsi test
Add iscsi CI coverage, this will now deploy iscsi gateways in container.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-08 10:21:54 +02:00
Sébastien Han 91bf53ee93 ceph-iscsi: support for containerized deployment
We now have the ability to deploy a containerized version of ceph-iscsi.
The result is similar to the non-containerized version, you simply have
3 containers running for the following services:

* rbd-target-api
* rbd-target-gw
* tcmu-runner

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1508144
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-08 10:21:54 +02:00
Andrew Schoen 8363ab43d3 pin version of ansible to 2.4 in requirements.txt
This is the latest version that we support. If we don't pin this we
get a 2.5.x version installed that causes the playbook to fail in
various ways.

Fixes: https://github.com/ceph/ceph-ansible/issues/2631

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-06-08 11:19:31 +08:00
Andrew Schoen 4b5cac2e07 tests: increase ssh timeout and retries in ansible.cfg
We see quite a few failures in the CI related to testing nodes losing
ssh connection. This modification allows ansible to retry more times and
wait longer before timing out. This seems to really affect testing
scenarios that use a large amount of testing nodes. The centos7_cluster
scenario specifically has 12 nodes and suffered from these failures
often.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-06-07 18:00:52 +02:00
Guillaume Abrioux 28d21b4e9c tests: update ooo inventory hostfile
Update the inventory host for the tripleo testing scenario so it uses the same
parameters as the tripleo CI.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-07 17:26:35 +02:00
Guillaume Abrioux 8a653cacd5 client: add a default value for keyring file
Potential error if someone doesn't pass the mode in the `keys` dict for
client nodes:

```
fatal: [client2]: FAILED! => {}

MSG:

The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'mode'

The error appears to have been in '/home/guits/ceph-ansible/roles/ceph-client/tasks/create_users_keys.yml': line 117, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

- name: get client cephx keys
  ^ here

exception type: <class 'ansible.errors.AnsibleUndefinedVariable'>
exception: 'dict object' has no attribute 'mode'

```

Adding a default value keeps the deployment from failing because of this.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-07 17:26:35 +02:00
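
The failure mode in commit 8a653cacd5 (a `keys` entry without a `mode` attribute) and the remedy of falling back to a default can be sketched in Python. This is an analogy, not the role's Jinja2 code: the helper name, the sample entries, and the fallback value `0600` are assumptions, since the commit message does not state which default was chosen.

```python
DEFAULT_KEY_MODE = "0600"  # hypothetical fallback; the real default is not stated above


def keyring_mode(key):
    """Return the file mode for a key entry, falling back to a default.

    Direct access (key["mode"]) would raise KeyError for entries without a
    mode, which mirrors the undefined-variable error quoted in the commit.
    """
    return key.get("mode", DEFAULT_KEY_MODE)


if __name__ == "__main__":
    keys = [
        {"name": "client.test", "mode": "0644"},
        {"name": "client.test2"},  # no mode given
    ]
    for key in keys:
        print(key["name"], keyring_mode(key))
```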
Guillaume Abrioux 5eacc8f8d8 tests: add a dummy value for 'dev' release
Functional tests are broken when testing against 'dev' release (ceph).
Adding a dummy value here will make it possible to run ceph-ansible CI
against dev ceph release.

Typical error:

```
>       if request.node.get_marker("from_luminous") and ceph_release_num[ceph_stable_release] < ceph_release_num['luminous']:
E       KeyError: 'dev'
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fd1487d93f21b609a637053f5b33cd2a4e408d00)
2018-06-07 13:59:17 +02:00
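
The `KeyError` quoted in commit 5eacc8f8d8 comes from looking up a release name that has no entry in the release-number mapping. A minimal sketch of the fix's idea follows. The mapping uses the Ceph major version numbers for the named releases, but the value chosen for `dev` (99) and the function name are assumptions made for illustration.

```python
# Ceph major version per release name; the 'dev' entry is a dummy value
# (assumed here to be 99) so that 'dev' sorts after every named release.
CEPH_RELEASE_NUM = {
    "jewel": 10,
    "kraken": 11,
    "luminous": 12,
    "mimic": 13,
    "dev": 99,
}


def should_skip(has_from_luminous_marker, ceph_stable_release):
    """Mirror the quoted condition: skip 'from_luminous' tests on older releases."""
    return (
        has_from_luminous_marker
        and CEPH_RELEASE_NUM[ceph_stable_release] < CEPH_RELEASE_NUM["luminous"]
    )


if __name__ == "__main__":
    print(should_skip(True, "jewel"))  # older than luminous
    print(should_skip(True, "dev"))    # no KeyError once 'dev' has an entry
```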
Andrew Schoen 24ef47b0e5 ceph-common: move firewall checks after package installation
We need to do this because on dev or rhcs installs ceph_stable_release
is not mandatory and the firewall check tasks have a task that is
conditional based off the installed version of ceph. If we perform those
checks after package install then they will not fail on dev or rhcs
installs.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-06-07 13:59:17 +02:00
Guillaume Abrioux 7b156deb67 client: use dummy created container when there is no mon in inventory
The `docker_exec_cmd` fact set in the client role when there is no monitor
in the inventory is wrong: `ceph-client-{{ hostname }}` is never created, so
it will fail anyway.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-07 16:16:38 +08:00
Guillaume Abrioux c94ada69e8 tests: improve mds tests
The expected number of mds daemons is the number of daemons that are
'up' plus the number of daemons that are 'up:standby'.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-07 14:01:58 +08:00
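
The counting rule from commit c94ada69e8 is simple enough to state as code. In this sketch the state names `up` and `up:standby` come from the commit message, while the shape of the input (a flat mapping from state to count) and the function name are assumptions.

```python
def expected_mds_count(mds_states):
    """Sum the daemons reported as 'up' and those reported as 'up:standby'."""
    return mds_states.get("up", 0) + mds_states.get("up:standby", 0)


if __name__ == "__main__":
    # Hypothetical status excerpt: one active daemon and two standbys.
    print(expected_mds_count({"up": 1, "up:standby": 2}))
```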
Guillaume Abrioux 433ecc7cbc osd: copy openstack keys over to all mon
When configuring openstack, the created keyrings aren't copied over to
all monitor nodes.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1588093

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-07 13:58:57 +08:00
Guillaume Abrioux 232a16d77f rolling_update: fix facts gathering delegation
This is a kind of follow-up to what was done in #2560.
See #2560 and #2553 for details.

Closes: #2708

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-06 16:36:30 +08:00
Patrick Donnelly 91f9da530f change max_mds default to 1
Otherwise, with the removal of mds_allow_multimds, the default of 3 will be set
on every new FS.

Introduced by: c8573fe0d7

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1583020
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2018-06-06 12:16:42 +08:00
Guillaume Abrioux f0cd4b0651 tests: skip disabling fastest mirror detection on atomic host
There is no need to execute this task on atomic hosts.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-05 15:39:37 +02:00
Guillaume Abrioux 47276764f7 tests: fix rgw tests
41b4632 introduced a change in the functional tests.
Since the admin keyring isn't copied to rgw nodes anymore in tests, let's use
the rgw keyring to run them.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-05 15:24:32 +02:00
Vishal Kanaujia 2cdb0d1812 Syntax error fix in rgw multisite role
This check-in fixes a syntax error in the `when` clause of the RGW
multisite role.

Fixes: #2704

Signed-off-by: Vishal Kanaujia <vishal.kanaujia@flipkart.com>
2018-06-05 16:01:07 +05:30
Sébastien Han 41b4632abc test: do not always copy admin key
The admin key must be copied on the osd nodes only when we test the
shrink scenario. Shrink relies on ceph-disk commands that require the
admin key on the node where it's being executed.

Now we only copy the key when running on the shrink-osd scenario.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-05 09:39:30 +02:00
Guillaume Abrioux 2cf06b515f rgw: refact rgw pools creation
Refact of 8704144e31
There is no need to have duplicated tasks for this. The rgw pools
creation should be delegated to a monitor node so we don't have to care
whether the admin keyring is present on the rgw node.
By the way, only one task is needed to create the pools, we just need to
use the `docker_exec_cmd` fact already defined in `ceph-defaults` to
achieve it.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1550281

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-05 15:00:20 +08:00
Ha Phan 1f3c9ce4f3 Use python instead of python2
The initial keyring is generated from ansible server locally and the snippet works well for both v2 and v3 of python.

I don't see any reason why we should explicitly invoke `python2` instead of just `python`.

In some setups, `python2` is not symlinked to `python`; while `python` and `python3` refer to v2 and v3 respectively.

Signed-off-by: Ha Phan <thanhha.work@gmail.com>
2018-06-04 14:24:10 +02:00
Sébastien Han db50aec13d ceph-common: add firewall rules for ceph-mgr
Prior to this commit the firewall tasks were not opening the ceph-mgr
ports. This would lead to unclean configuration since the ceph-mgr
daemons cannot connect to the OSDs.
This commit opens the right ports on the ceph-mgr nodes to talk with the
OSDs.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1526400
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-04 12:11:41 +02:00
Vishal Kanaujia 08d9432454 Rolling upgrades should use norebalance flag for OSDs
The rolling upgrades playbook should set the norebalance flag during
OSD upgrades so that it waits only for recovery.

Fixes: #2657
Signed-off-by: Vishal Kanaujia <vishal.kanaujia@flipkart.com>
2018-06-04 10:59:01 +02:00