Commit Graph

3845 Commits (6423ab4ad377edacc62e69e5e9011917a10a0fb1)
 

Author SHA1 Message Date
Sébastien Han a9ed3579ae mon/osd: bump container memory limit
As discussed with the cores, the current limits are too low and should
be bumped to higher value.
So now by default monitors get 3GB and OSDs get 5GB.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1591876
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-17 11:20:27 -04:00
Guillaume Abrioux 21894655a7 tests: keep same ceph release during handlers/idempotency test
since `latest` points to `mimic`, we need to force the test to keep the
same ceph release when testing anything else than `mimic`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-15 11:45:51 -04:00
Guillaume Abrioux 51cf3b7fa0 client: try to kill dummy container only on first client node
The 'dummy' container is created only on first client node, it means we
must seek to destroy this container only on this node, otherwise this
can cause failure like following :
```
fatal: [192.168.24.8]: FAILED! => {"changed": false, "cmd": ["docker", "rm",
"-f", "ceph-create-keys"], "delta": "0:00:00.023692", "end": "2018-06-12
20:56:07.261278", "msg": "non-zero return code", "rc": 1, "start":
"2018-06-12 20:56:07.237586", "stderr": "Error response from daemon: No such
container: ceph-create-keys", "stderr_lines": ["Error response from daemon: No
such container: ceph-create-keys"], "stdout": "", "stdout_lines": []}

```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1590746

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-13 16:10:46 +02:00
Patrick Donnelly 9ce81ae845 ceph-mds: do not enable multimds on jewel
Multiple active MDS became stable in Luminous.

Introduced-by: c8573fe0d7
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2018-06-12 10:47:34 +02:00
Guillaume Abrioux 752781e2c9 core: make ansible pinning to latest ansible 2.4
it looks "ansible~=2.4" install latest ansible release in 2.5 so we must
specify we want latest release but inferior to 2.5.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-12 14:26:29 +08:00
Sébastien Han 2e8412734a common: ability to enable/disable fw configuration
Prior to this patch if you were running on a Red Hat system,
ceph-ansible would try to configure firewalld for you without the
operators's consent.
Now you can enable or disable the fw configuration by setting
configure_firewall to either true or false.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1589146
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-11 21:51:59 +02:00
Guillaume Abrioux bbb8691335 tests: increase memory to 1024Mb for centos7_cluster scenario
we see more and more failure like `fatal: [mon0]: UNREACHABLE! => {}` in
`centos7_cluster` scenario, Since we have 30Gb RAM on hypervisors, we
can give monitors a bit more RAM. By the way, nodes on containerized cluster
testing scenario have already 1024Mb memory allocated.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-11 23:52:15 +08:00
Guillaume Abrioux a351b08726 tests: set CEPH_DOCKER_IMAGE_TAG when ceph release is luminous
Since latest points to mimic for the ceph container images, we need to
set `CEPH_DOCKER_IMAGE_TAG` to `latest-luminous` when ceph release is
luminous

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-11 23:35:56 +08:00
Guillaume Abrioux a4ad2eb27f validate: be more explicit with error msg when notario isn't installed
This error message may be confusing and need to be more explicit on
where you have to install notario, indeed, people may think this library
must be installed on configured nodes while it must be installed on the
node you are running the playbook.

Fixes: #2649

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-11 21:31:34 +08:00
Konstantin Shalygin 3a07568496 ceph-osd: set 'openstack_keys_tmp' only when 'openstack_config' is defined.
If 'openstack_config' is false this task shouldn't be executed.

Signed-off-by: Konstantin Shalygin <k0ste@k0ste.ru>
2018-06-11 13:03:55 +02:00
Vishal Kanaujia 1a610df02b Fix to run secure cluster only once in a run
The current secure cluster play runs with all the monitors. The rerun
of this task is unnecessary and can be skipped.

Fixes: #2737

Signed-off-by: Vishal Kanaujia <vishal.kanaujia@flipkart.com>
2018-06-11 08:37:29 +02:00
Sébastien Han 6035978ed9 test: only on containerized iscsi
We don't have the same service running on non-container for now, this
will change soon but for let's only run the test on container.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-11 08:34:48 +02:00
Guillaume Abrioux 090ecff94e client: keyrings aren't created when single client node
combining `run_once: true` with `inventory_hostname ==
groups.get(client_group_name) | first` might cause bug when the only
node being run is not the first in the group.

In a deployment with a single client node it might cause issue because
sometimes keyring won't be created since the task could be definitively
skipped.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1588093

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-08 15:05:47 +02:00
Sébastien Han 315ab08b16 contrib: fix generate group_vars samples
For ceph-iscsi-gw and ceph-rbd-mirror roles the group_name are named
differently (by default) than the role name so we have to change the
script to generate the correct name.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-08 10:21:54 +02:00
Sébastien Han 20c8065e48 ceph-iscsi: rename group iscsi_gws
Let's try to avoid using dashes as testinfra needs to be able to read
the groups.
Typically, with iscsi-gws we can't add a marker for these iscsi nodes,
using an underscore fixes the issue.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-08 10:21:54 +02:00
Sébastien Han c00fb12497 ci: add functionnal tests for iscsi
We test if:

* packages are installed
* services are runnning
* service units are enabled

Also fix linting issues

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-08 10:21:54 +02:00
Sébastien Han fdeee9eb19 site-docker: add iscsi role
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-08 10:21:54 +02:00
Sébastien Han 5ff2f03e3f ci: add iscsi test
Add iscsi CI coverage, this will now deploy iscsi gateways in container.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-08 10:21:54 +02:00
Sébastien Han 91bf53ee93 ceph-iscsi: support for containerize deployment
We now have the ability to deploy a containerized version of ceph-iscsi.
The result is similar to the non-containerized version, you simply have
3 containers running for the following services:

* rbd-target-api
* rbd-target-gw
* tcmu-runner

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1508144
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-08 10:21:54 +02:00
Andrew Schoen 8363ab43d3 pin version of ansible to 2.4 in requirements.txt
This is the latest version that we support. If we don't pin this we
get a 2.5.x version installed that causes the playbook to fail in
various ways.

Fixes: https://github.com/ceph/ceph-ansible/issues/2631

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-06-08 11:19:31 +08:00
Andrew Schoen 4b5cac2e07 tests: increase ssh timeout and retries in ansible.cfg
We see quite a few failures in the CI related to testing nodes losing
ssh connection. This modification allows ansible to retry more times and
wait longer before timing out. This seems to really affect testing
scenarios that use a large amount of testing nodes. The centos7_cluster
scenario specifically has 12 nodes and suffered from these failures
often.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-06-07 18:00:52 +02:00
Guillaume Abrioux 28d21b4e9c tests: update ooo inventory hostfile
Update the inventory host for tripleo testing scenario so it's the same
parameters than in tripleo CI.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-07 17:26:35 +02:00
Guillaume Abrioux 8a653cacd5 client: add a default value for keyring file
Potential error if someone doesnt pass the mode in `keys` dict for
client nodes:

```
fatal: [client2]: FAILED! => {}

MSG:

The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'mode'

The error appears to have been in '/home/guits/ceph-ansible/roles/ceph-client/tasks/create_users_keys.yml': line 117, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

- name: get client cephx keys
  ^ here

exception type: <class 'ansible.errors.AnsibleUndefinedVariable'>
exception: 'dict object' has no attribute 'mode'

```

adding a default value will avoid the deployment failing for this.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-07 17:26:35 +02:00
Guillaume Abrioux 5eacc8f8d8 tests: add a dummy value for 'dev' release
Functional tests are broken when testing against 'dev' release (ceph).
Adding a dummy value here will make it possible to run ceph-ansible CI
against dev ceph release.

Typical error:

```
>       if request.node.get_marker("from_luminous") and ceph_release_num[ceph_stable_release] < ceph_release_num['luminous']:
E       KeyError: 'dev'
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fd1487d93f21b609a637053f5b33cd2a4e408d00)
2018-06-07 13:59:17 +02:00
Andrew Schoen 24ef47b0e5 ceph-common: move firewall checks after package installation
We need to do this because on dev or rhcs installs ceph_stable_release
is not mandatory and the firewall check tasks have a task that is
conditional based off the installed version of ceph. If we perform those
checks after package install then they will not fail on dev or rhcs
installs.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-06-07 13:59:17 +02:00
Guillaume Abrioux 7b156deb67 client: use dummy created container when there is no mon in inventory
the `docker_exec_cmd` fact set in client role when there is no monitor
in inventory is wrong, `ceph-client-{{ hostname }}` is never created so
it will fail anyway.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-07 16:16:38 +08:00
Guillaume Abrioux c94ada69e8 tests: improve mds tests
the expected number of mds daemon consist of number of daemons that are
'up' + number of daemons 'up:standby'.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-07 14:01:58 +08:00
Guillaume Abrioux 433ecc7cbc osd: copy openstack keys over to all mon
When configuring openstack, the created keyrings aren't copied over to
all monitors nodes.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1588093

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-07 13:58:57 +08:00
Guillaume Abrioux 232a16d77f rolling_update: fix facts gathering delegation
this is kind of follow up on what has been made in #2560.
See #2560 and #2553 for details.

Closes: #2708

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-06 16:36:30 +08:00
Patrick Donnelly 91f9da530f change max_mds default to 1
Otherwise, with the removal of mds_allow_multimds, the default of 3 will be set
on every new FS.

Introduced by: c8573fe0d7

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1583020
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2018-06-06 12:16:42 +08:00
Guillaume Abrioux f0cd4b0651 tests: skip disabling fastest mirror detection on atomic host
There is no need to execute this task on atomic hosts.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-05 15:39:37 +02:00
Guillaume Abrioux 47276764f7 tests: fix rgw tests
41b4632 has introduced a change in functionnals tests.
Since the admin keyring isn't copied on rgw nodes anymore in tests, let's use
the rgw keyring to achieve them.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-05 15:24:32 +02:00
Vishal Kanaujia 2cdb0d1812 Syntax error fix in rgw multisite role
This checkin fixes a syntax error in RGW multisite role under when
clause.

Fixes: #2704

Signed-off-by: Vishal Kanaujia <vishal.kanaujia@flipkart.com>
2018-06-05 16:01:07 +05:30
Sébastien Han 41b4632abc test: do not always copy admin key
The admin key must be copied on the osd nodes only when we test the
shrink scenario. Shrink relies on ceph-disk commands that require the
admin key on the node where it's being executed.

Now we only copy the key when running on the shrink-osd scenario.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-05 09:39:30 +02:00
Guillaume Abrioux 2cf06b515f rgw: refact rgw pools creation
Refact of 8704144e31
There is no need to have duplicated tasks for this. The rgw pools
creation should be delegated on a monitor node se we don't have to care
if the admin keyring is present on rgw node.
By the way, only one task is needed to create the pools, we just need to
use the `docker_exec_cmd` fact already defined in `ceph-defaults` to
achieve it.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1550281

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-05 15:00:20 +08:00
Ha Phan 1f3c9ce4f3 Use python instead of python2
The initial keyring is generated from ansible server locally and the snippet works well for both v2 and v3 of python.

I don't see any reason why we should explicitly invoke`python2` instead of just `python`.

In some setups, `python2` is not symlinked to `python`; while `python` and `python3` refer to v2 and v3 respectively.

Signed-off-by: Ha Phan <thanhha.work@gmail.com>
2018-06-04 14:24:10 +02:00
Sébastien Han db50aec13d ceph-common: add firewall rules for ceph-mgr
Prior to this commit the firewall tasks were not opening the ceph-mgr
ports. This would lead to unclean configuration since the ceph-mgr
daemons can not connect to the OSDs.
Thi commit opens the right ports on the ceph-mgr nodes to talk with the
OSDs.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1526400
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-04 12:11:41 +02:00
Vishal Kanaujia 08d9432454 Rolling upgrades should use norebalance flag for OSDs
The rolling upgrades playbook should have norebalance flag set for
OSDs upgrades to wait only for recovery.

Fixes: #2657
Signed-off-by: Vishal Kanaujia <vishal.kanaujia@flipkart.com>
2018-06-04 10:59:01 +02:00
Erwan Velu 493f615eae ceph-defaults: Enable local epel repository
During the tests, the remote epel repository is generating a lots of
errors leading to broken jobs (issue #2666)

This patch is about using a local repository instead of a random one.
To achieve that, we make a preliminary install of epel-release, remove
the metalink and enforce a baseurl to our local http mirror.

That should speed up the build process but also avoid the random errors
we face.

This patch is part of a patch series that tries to remove all possible yum failures.

Signed-off-by: Erwan Velu <erwan@redhat.com>
2018-06-04 08:11:35 +02:00
jtudelag 600e1e2c26 rgws: renames create_pools variable with rgw_create_pools.
Renamed to be consistent with the role (rgw) and have a meaningful name.

Signed-off-by: Jorge Tudela <jtudelag@redhat.com>
2018-06-04 06:23:42 +02:00
jtudelag 8704144e31 Adds RGWs pool creation to containerized installation.
ceph command has to be executed from one of the monitor containers
if not admin copy present in RGWs. Task has to be delegated then.

Adds test to check proper RGW pool creation for Docker container scenarios.

Signed-off-by: Jorge Tudela <jtudelag@redhat.com>
2018-06-04 06:23:42 +02:00
Guillaume Abrioux aae37b44f5 mons: move set_fact of openstack_keys in ceph-osd
Since the openstack_config.yml has been moved to `ceph-osd` we must move
this `set_fact` in ceph-osd otherwise the tasks in
`openstack_config.yml` using `openstack_keys` will actually use the
defaults value from `ceph-defaults`.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1585139

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-01 17:12:01 +02:00
Andrew Schoen c2423e2c48 ceph-defaults: add the nautilus 14.x entry to ceph_release_num
The first 14.x tag has been cut so this needs to be added so that
version detection will still work on the master branch of ceph.

Fixes: https://github.com/ceph/ceph-ansible/issues/2671

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-06-01 16:51:23 +02:00
Guillaume Abrioux 9d5265fe11 osds: wait for osds to be up before creating pools
This is a follow up on #2628.
Even with the openstack pools creation moved later in the playbook,
there is still an issue because OSDs are not all UP when trying to
create pools.

Adding a task which checks for all OSDs to be UP with a `retries/until`
condition should definitively fix this issue.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1578086

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-01 15:46:52 +02:00
Guillaume Abrioux 0b67f42feb Makefile: followup on #2585
Fix a typo in `tag` target, double quote are missing here.

Without them, the `make tag` command fails like this:

```
if [[ "v3.0.35" ==  ]]; then \
            echo "e5f2df8 on stable-3.0 is already tagged as v3.0.35"; \
            exit 1; \
        fi
/bin/sh: -c: line 0: unexpected argument `]]' to conditional binary operator
/bin/sh: -c: line 0: syntax error near `;'
/bin/sh: -c: line 0: `if [[ "v3.0.35" ==  ]]; then     echo "e5f2df8 on stable-3.0 is already tagged as v3.0.35";     exit 1; fi'
make: *** [tag] Error 2
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-01 12:50:03 +02:00
Guillaume Abrioux c68126d6fd mdss: do not make pg_num a mandatory params
When playing ceph-mds role, mon nodes have set a fact with the default
pg num for osd pools, we can simply default to this value for cephfs
pools (`cephfs_pools` variable).

At the moment the variable definition for `cephfs_pools` looks like:

```
cephfs_pools:
  - { name: "{{ cephfs_data }}", pgs: "" }
  - { name: "{{ cephfs_metadata }}", pgs: "" }
```

and we have a task in `ceph-validate` to ensure `pgs` has been set to a
valid value.

We could simply avoid this check by setting the default value of `pgs`
to `hostvars[groups[mon_group_name][0]]['osd_pool_default_pg_num']` and
let to users the possibility to override this value.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1581164

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-05-30 16:20:34 +02:00
Guillaume Abrioux 34e646e767 osds: do not set docker_exec_cmd fact
in `ceph-osd` there is no need to set `docker_exec_cmd` since the only
place where this fact is used is in `openstack_config.yml` which
delegate all docker command to a monitor node. It means we need the
`docker_exec_cmd` fact that has been set referring to `ceph-mon-*`
containers, this fact is already set earlier in `ceph-defaults`.

By the way, when collocating an OSD with a MON it fails because the container
`ceph-osd-{{ ansible_hostname }}` doesn't exist.

Removing this task will allow to collocate an OSD with a MON.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1584179

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-05-30 16:17:29 +02:00
Guillaume Abrioux 6f489015e4 tests: fix broken symlink
`requirements2.5.txt` is pointing to `tests/requirements2.4.txt` while
it should point to `requirements2.4.txt` since they are in the same
directory.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-05-30 16:13:47 +02:00
Guillaume Abrioux 34f7042852 tests: resize root partition when atomic host
For a few moment we can see failures in the CI for containerized
scenarios because VMs are running out of space at some point.

The default in the images used is to have only 3Gb for root partition
which doesn't sound like a lot.

Typical error seen:

```
STDERR:

failed to register layer: Error processing tar file(exit status 1): open /usr/share/zoneinfo/Atlantic/Canary: no space left on device
```

Indeed, on the machine we can see:
```
Every 2.0s: df -h                                                                                                                                                                                                                                       Tue May 29 17:21:13 2018
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/atomicos-root  3.0G  3.0G   14M 100% /
```

The idea here is to expand this partition with all the available space
remaining by issuing an `lvresize` followed by an `xfs_growfs`.

```
-bash-4.2# lvresize -l +100%FREE /dev/atomicos/root
  Size of logical volume atomicos/root changed from <2.93 GiB (750 extents) to 9.70 GiB (2484 extents).
  Logical volume atomicos/root successfully resized.
```

```
-bash-4.2# xfs_growfs /
meta-data=/dev/mapper/atomicos-root isize=512    agcount=4, agsize=192000 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0 spinodes=0
data     =                       bsize=4096   blocks=768000, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data blocks changed from 768000 to 2543616
```

```
-bash-4.2# df -h
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/atomicos-root  9.7G  1.4G  8.4G  14% /
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-05-30 10:54:35 +02:00
Erwan Velu 1fca827724 CONTRIBUTING.md: Initial release
As per issue #2623, it is important to define the commit guidelines.
This commit is about adding a first version of it.

Fixes: #2653
Signed-off-by: Erwan Velu <erwan@redhat.com>
2018-05-30 09:38:27 +02:00