Commit Graph

271 Commits (d1c361cdafa6e4a041d0c40cf84682f6c5729349)

Author SHA1 Message Date
Guillaume Abrioux 156daf1018 tests: increase memory to 1024Mb for centos7_cluster scenario
we see more and more failure like `fatal: [mon0]: UNREACHABLE! => {}` in
`centos7_cluster` scenario, Since we have 30Gb RAM on hypervisors, we
can give monitors a bit more RAM. By the way, nodes on containerized cluster
testing scenario have already 1024Mb memory allocated.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit bbb8691335)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-11 17:55:04 +02:00
Guillaume Abrioux fd10fcedff tests: update ooo inventory hostfile
Update the inventory host for tripleo testing scenario so it's the same
parameters than in tripleo CI.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 28d21b4e9c)
2018-06-07 18:27:00 +02:00
Guillaume Abrioux 48e7cc506c tests: improve mds tests
the expected number of mds daemon consist of number of daemons that are
'up' + number of daemons 'up:standby'.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c94ada69e8)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-07 17:45:29 +08:00
Sébastien Han 5199300a6b test: do not always copy admin key
The admin key must be copied on the osd nodes only when we test the
shrink scenario. Shrink relies on ceph-disk commands that require the
admin key on the node where it's being executed.

Now we only copy the key when running on the shrink-osd scenario.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 41b4632abc)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-06 10:50:56 +02:00
Guillaume Abrioux f940163ab5 tests: fix rgw tests
41b4632 has introduced a change in functionnals tests.
Since the admin keyring isn't copied on rgw nodes anymore in tests, let's use
the rgw keyring to achieve them.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 47276764f7)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-06 15:41:13 +08:00
Guillaume Abrioux a558d8aef3 rgw: refact rgw pools creation
Refact of 8704144e31
There is no need to have duplicated tasks for this. The rgw pools
creation should be delegated on a monitor node se we don't have to care
if the admin keyring is present on rgw node.
By the way, only one task is needed to create the pools, we just need to
use the `docker_exec_cmd` fact already defined in `ceph-defaults` to
achieve it.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1550281

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2cf06b515f)
2018-06-06 11:30:29 +08:00
jtudelag 36b2c4a527 rgws: renames create_pools variable with rgw_create_pools.
Renamed to be consistent with the role (rgw) and have a meaningful name.

Signed-off-by: Jorge Tudela <jtudelag@redhat.com>
(cherry picked from commit 600e1e2c26)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-05 18:56:24 +02:00
jtudelag 1d94d12c9f Adds RGWs pool creation to containerized installation.
ceph command has to be executed from one of the monitor containers
if not admin copy present in RGWs. Task has to be delegated then.

Adds test to check proper RGW pool creation for Docker container scenarios.

Signed-off-by: Jorge Tudela <jtudelag@redhat.com>
(cherry picked from commit 8704144e31)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-05 18:56:24 +02:00
Guillaume Abrioux cd6ef8e9ec tests: skip disabling fastest mirror detection on atomic host
There is no need to execute this task on atomic hosts.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f0cd4b0651)
2018-06-05 16:02:54 +02:00
Erwan Velu 51a7eb5a70 ceph-defaults: Enable local epel repository
During the tests, the remote epel repository is generating a lots of
errors leading to broken jobs (issue #2666)

This patch is about using a local repository instead of a random one.
To achieve that, we make a preliminary install of epel-release, remove
the metalink and enforce a baseurl to our local http mirror.

That should speed up the build process but also avoid the random errors
we face.

This patch is part of a patch series that tries to remove all possible yum failures.

Signed-off-by: Erwan Velu <erwan@redhat.com>
(cherry picked from commit 493f615eae)
2018-06-05 16:02:54 +02:00
Guillaume Abrioux 4328e0b42a mdss: do not make pg_num a mandatory params
When playing ceph-mds role, mon nodes have set a fact with the default
pg num for osd pools, we can simply default to this value for cephfs
pools (`cephfs_pools` variable).

At the moment the variable definition for `cephfs_pools` looks like:

```
cephfs_pools:
  - { name: "{{ cephfs_data }}", pgs: "" }
  - { name: "{{ cephfs_metadata }}", pgs: "" }
```

and we have a task in `ceph-validate` to ensure `pgs` has been set to a
valid value.

We could simply avoid this check by setting the default value of `pgs`
to `hostvars[groups[mon_group_name][0]]['osd_pool_default_pg_num']` and
let to users the possibility to override this value.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1581164

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c68126d6fd)
2018-05-30 21:45:03 +02:00
Guillaume Abrioux e1c1017e15 tests: resize root partition when atomic host
For a few moment we can see failures in the CI for containerized
scenarios because VMs are running out of space at some point.

The default in the images used is to have only 3Gb for root partition
which doesn't sound like a lot.

Typical error seen:

```
STDERR:

failed to register layer: Error processing tar file(exit status 1): open /usr/share/zoneinfo/Atlantic/Canary: no space left on device
```

Indeed, on the machine we can see:
```
Every 2.0s: df -h                                                                                                                                                                                                                                       Tue May 29 17:21:13 2018
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/atomicos-root  3.0G  3.0G   14M 100% /
```

The idea here is to expand this partition with all the available space
remaining by issuing an `lvresize` followed by an `xfs_growfs`.

```
-bash-4.2# lvresize -l +100%FREE /dev/atomicos/root
  Size of logical volume atomicos/root changed from <2.93 GiB (750 extents) to 9.70 GiB (2484 extents).
  Logical volume atomicos/root successfully resized.
```

```
-bash-4.2# xfs_growfs /
meta-data=/dev/mapper/atomicos-root isize=512    agcount=4, agsize=192000 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0 spinodes=0
data     =                       bsize=4096   blocks=768000, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data blocks changed from 768000 to 2543616
```

```
-bash-4.2# df -h
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/atomicos-root  9.7G  1.4G  8.4G  14% /
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 34f7042852)
2018-05-30 14:40:50 +02:00
Guillaume Abrioux 92f0de3792 tests: avoid yum failures
In the CI we can see at many times failures like following:

`Failure talking to yum: Cannot find a valid baseurl for repo:
base/7/x86_64`

It seems the fastest mirror detection is sometimes counterproductive and
leads yum to fail.

This fix has been added in the `setup.yml`.
This playbook was used until now only just before playing `testinfra`
and could be used before running ceph-ansible so we can add some
provisionning tasks.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Erwan Velu <evelu@redhat.com>
(cherry picked from commit 98cb6ed8f6)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-05-30 14:40:50 +02:00
Guillaume Abrioux b00a3cf790 tests: move cephfs_pools variable
let's move this variable in group_vars/all.yml in all testing scenarios
accordingly to this commit 1f15a81c48 so
we keep consistency between the playbook and the tests.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a10e73d78d)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-05-24 21:29:42 +02:00
Guillaume Abrioux 873abdbf0c osds: move openstack pools creation in ceph-osd
When deploying a large number of OSD nodes it can be an issue because the
protection check [1] won't pass since it tries to create pools before all
OSDs are active.

The idea here is to move openstack pools creation at the end of `ceph-osd` role.

[1] e59258943b/src/mon/OSDMonitor.cc (L5673)

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1578086

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 564a662baf)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-05-24 21:29:42 +02:00
Luigi Toscano d7f0ea33c9 ceph-radosgw: disable NSS PKI db when SSL is disabled
The NSS PKI database is needed only if radosgw_keystone_ssl
is explicitly set to true, otherwise the SSL integration is
not enabled.

It is worth noting that the PKI support was removed from Keystone
starting from the Ocata release, so some code paths should be
changed anyway.

Also, remove radosgw_keystone, which is not useful anymore.
This variable was used until fcba2c801a.
Now profiles drives the setting of rgw keystone *.

Signed-off-by: Luigi Toscano <ltoscano@redhat.com>
(cherry picked from commit 43e96c1f98)
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-05-24 15:41:42 +02:00
Guillaume Abrioux a68091c923 tests: update the type for the rule used in pools
As of ceph 12.2.5 the type of the parameter `type` is not a name anymore but
an id, therefore an `int` is expected otherwise it will fail with the
following error

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-04-30 08:15:18 +02:00
Sébastien Han 71efa2eaf4 ci: bump client nodes to 2
In order to test the key distribution is correct we must have 2 client
nodes.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-04-23 18:34:58 +02:00
Guillaume Abrioux 77831ccb7a tests: update tests for mds to cover multimds case
in case of multimds we must check for the number of mds up instead of
just checking if the hostname of the node is in the fsmap.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-04-12 18:20:58 +02:00
Sébastien Han 82589021e0 ci: fix tripleO scenario
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-04-11 12:18:34 +02:00
Sébastien Han 2011ec3bcd ci: client copy admin key
If we don't copy the admin key we can't add the key into ceph.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-04-11 12:18:34 +02:00
Sébastien Han cf73647e7a ci: remove useless tests
These are already handled by ceph-client/defaults/main.yml so the keys
will be created once user_config is set to True.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-04-11 12:18:34 +02:00
Andrew Schoen 98e237d234 tests: no need to remove partitions in lvm_setup.yml
Now that we are using ceph_volume_zap the partitions are
kept around and should be able to be reused.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-04-10 14:19:21 +02:00
Sébastien Han f3caee8460 ceph-iscsi: fix certificates generation and distribution
Prior to this patch, the certificates where being generated on a single
node only (because of the run_once: true). Thus certificates were not
distributed on all the gateway nodes.

This would require a second ansible run to work. This patches fix the
creation and keys's distribution on all the nodes.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1540845
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-04-04 09:27:39 +02:00
John Fulton e6e6bd078a Refer to expected-num-ojects as expected_num_objects, not size
Follow up patch to PR 2432 [1] which replaces "size" (sorry if
the original bug used that term, which can be confusing) with
expected_num_objects as is used in the Ceph documentation [2].

[1] https://github.com/ceph/ceph-ansible/pull/2432/files
[2] http://docs.ceph.com/docs/jewel/rados/operations/pools
2018-03-26 15:41:51 +02:00
Sébastien Han 3ab89ab48c ci: re-arrange group_vars files
We should stop putting everything in 'all'. This is too easy and this is
error prone as well for those who are separating variables into host
type, things that you should do.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-14 14:22:00 +01:00
Sébastien Han d5f8cac820 ci: remove left over iscsi_gws file
Wrong file that is not used, only iscsi-ggw that is present is correct.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-14 14:22:00 +01:00
Sébastien Han 8000ae342e remove unsed ceph_rgw_civetweb_port variable
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-14 14:22:00 +01:00
Sébastien Han f119b25bbe client: implement proper pools creation
Just like we did for the monitor and openstack_config we now have the
ability to precisely create pools.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-14 14:22:00 +01:00
Sébastien Han e302c1baae mon: add support for erasure code pool
You can now specify type: erasure and   erasure_profile to use when
declaring the pool dictionnary.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-14 14:22:00 +01:00
Sébastien Han 4806ff4ff8 ci: test pool creation on container
On containerized scenario we also want to test pool creation.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-14 14:22:00 +01:00
Sébastien Han fc0fa48e0d test: add tests for creating crush tree
We now run tests on the newly created ceph_crush module. Now the CI will
create a specific hierarchy for the OSD.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-06 15:24:31 +00:00
Sébastien Han fd94840a6e ci: add copy_admin_key test to container scenario
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-02 20:59:10 +00:00
Sébastien Han 165d9dec10 remove kernel.pid_max
This is now managed by Ceph packages.

See: https://github.com/ceph/ceph/pull/18544/files

http://tracker.ceph.com/issues/21929

Closes: https://github.com/ceph/ceph-ansible/issues/2410

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-02-23 13:57:57 +01:00
Guillaume Abrioux 4a8986459f tests: change ceph_docker_image_tag for 2nd run
The ceph-ansible upstream CI runs severals tests, including a
'idempotency/handlers' test. It means the playbook is run a first time
and then a second time with an other container image version to ensure the
handlers run properly and the containers are well restarted.
This can cause issues.
For instance, in that specific case which drove me to submit this commit,
I've hit the case where `latest` image ships ceph 12.2.3 while the `stable-3.0`
(which is the image used for the second run) ships ceph 12.2.2.

The goal of this test is not to verify we can upgrade from a specific
version to another but to ensure handlers are working even if it's a valid
failure here.
It should be caught by a test dedicated to that usecase.

We just need to have a container image which has a different id for
the upstream CI, we need the same content in container imagebut a different
image id in the registry since the test relies on image id to decide whether
the container should be restarted.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-02-23 13:54:32 +01:00
Guillaume Abrioux 707458c979 ci: add tripleo scenario testing
This should help to see earlier any failure in a tripleo deployment scenario.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-02-23 13:54:32 +01:00
Sébastien Han 7d690878df test: add test for containers resources changes
We change the ceph_mon_docker_memory_limit on the second run, this
should trigger a restart of services.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-02-14 02:01:29 +01:00
Sébastien Han 79864a8936 test: add test for restart on new container image
Since we have a task to test the handlers we can test a new container to
validate the service restart on a new container image.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-02-14 02:01:29 +01:00
Guillaume Abrioux deaf273b25 syntax: change local_action syntax
Use a nicer syntax for `local_action` tasks.
We used to have oneliner like this:
```
local_action: wait_for port=22 host={{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }} state=started delay=10 timeout=500 }}
```

The usual syntax:
```
    local_action:
      module: wait_for
      port: 22
      host: "{{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }}"
      state: started
      delay: 10
      timeout: 500
```
is nicer and kind of way to keep consistency regarding the whole
playbook.

This also fix a potential issue about missing quotation :

```
Traceback (most recent call last):
  File "/tmp/ansible_wQtWsi/ansible_module_command.py", line 213, in <module>
    main()
  File "/tmp/ansible_wQtWsi/ansible_module_command.py", line 185, in main
    rc, out, err = module.run_command(args, executable=executable, use_unsafe_shell=shell, encoding=None, data=stdin)
  File "/tmp/ansible_wQtWsi/ansible_modlib.zip/ansible/module_utils/basic.py", line 2710, in run_command
  File "/usr/lib64/python2.7/shlex.py", line 279, in split
    return list(lex)                                                                                                                                                                                                                                                                                                            File "/usr/lib64/python2.7/shlex.py", line 269, in next
    token = self.get_token()
  File "/usr/lib64/python2.7/shlex.py", line 96, in get_token
    raw = self.read_token()
  File "/usr/lib64/python2.7/shlex.py", line 172, in read_token
    raise ValueError, "No closing quotation"
ValueError: No closing quotation
```

writing `local_action: shell echo {{ fsid }} | tee {{ fetch_directory }}/ceph_cluster_uuid.conf`
can cause trouble because it's complaining with missing quotes, this fix solves this issue.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1510555

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-01-31 10:45:34 +01:00
Andrew Schoen cfb75b8e29 tests: remove crush_device_class from lvm tests
The --crush-device-class flag for ceph-volume is not available in luminous so lets
remove this testing option for now until it's more widely available.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-01-18 15:03:38 +01:00
Andrew Schoen 64f5772140 tests: adds crush_device_class to lvm tests
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-01-17 13:49:29 +01:00
Sébastien Han 39f2bfd5d5 fix jewel scenarios on container
When deploying Jewel from master we still need to enable this code since
the container image has such check. This check still exists because
ceph-disk is not able to create a GPT label on a drive that does not
have one.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-12-20 13:43:19 +01:00
Guillaume Abrioux ab1dd3027a client: don't try to generate keys
the entrypoint to generate users keyring is `ceph-authtool`, therefore,
it can expand the `$(ceph-authtool --gen-print-key)` inside the
container. Users must generate a keyring themselves.
This commit also adds a check to ensure keyring are properly filled when
`user_config: true`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-12-14 17:22:07 +01:00
Guillaume Abrioux aa0b1ed118 tests: remove OSD_FORCE_ZAP variable from tests
according to ceph/ceph-container#840, this variable is no longer needed.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-11-14 17:55:01 +01:00
Sébastien Han d05206236c
Merge pull request #2124 from ceph/lvm-setup-fix
test: when creating the /dev/sdc2 partition specify label as gpt
2017-10-31 16:51:16 +01:00
Andrew Schoen 37a48209cc test: when creating the /dev/sdc2 partition specify label as gpt
ansible==2.4 requires that label be set to gpt, or it will be defaulted
to msdos.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2017-10-31 09:38:47 -05:00
Guillaume Abrioux c28882c1cd tests: add missing test for rbd
Add a missing test `test_rbd_mirror_service_is_running_from_luminous()`.
Also using bash -c "<cmd>" to make testinfra aware that later in
the upgrade process we are now running `luminous` ceph release so we
must skip the rbd tests related to `jewel` ceph release.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-10-30 19:44:56 +01:00
Sébastien Han faccd0acf0 Merge pull request #2100 from ceph/lvm-bluestore
ceph-volume lvm bluestore support
2017-10-27 17:36:16 +02:00
Major Hayden f73232caa4
Use check_mode instead of always_run
This patch changes the `always_run: yes` task option to
`check_mode: no` to avoid Ansible warnings.
2017-10-25 09:53:34 -05:00
Major Hayden c2b5118c1b
Revert "Avoid deprecated always_run"
This reverts commit 620fb37dd4.
2017-10-25 09:48:09 -05:00