ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Guillaume Abrioux	09d795b5b7	tests: add mimic support for test_rbd_mirror_is_up() Prior to mimic, the data structure returned by `ceph -s -f json` used to gather information about rbd-mirror daemons looked like below: ``` "servicemap": { "epoch": 8, "modified": "2018-07-05 13:21:06.207483", "services": { "rbd-mirror": { "daemons": { "summary": "", "ceph-nano-luminous-faa32aebf00b": { "start_epoch": 8, "start_stamp": "2018-07-05 13:21:04.668450", "gid": 14107, "addr": "172.17.0.2:0/2229952892", "metadata": { "arch": "x86_64", "ceph_version": "ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)", "cpu": "Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz", "distro": "centos", "distro_description": "CentOS Linux 7 (Core)", "distro_version": "7", "hostname": "ceph-nano-luminous-faa32aebf00b", "instance_id": "14107", "kernel_description": "#1 SMP Wed Mar 14 15:12:16 UTC 2018", "kernel_version": "4.9.87-linuxkit-aufs", "mem_swap_kb": "1048572", "mem_total_kb": "2046652", "os": "Linux" } } } } } } ``` This part changed in mimic and became: ``` "servicemap": { "epoch": 2, "modified": "2018-07-04 09:54:36.164786", "services": { "rbd-mirror": { "daemons": { "summary": "", "14151": { "start_epoch": 2, "start_stamp": "2018-07-04 09:54:35.541272", "gid": 14151, "addr": "192.168.1.80:0/240942528", "metadata": { "arch": "x86_64", "ceph_release": "mimic", "ceph_version": "ceph version 13.2.0 (79a10589f1f80dfe21e8f9794365ed98143071c4) mimic (stable)", "ceph_version_short": "13.2.0", "cpu": "Intel(R) Xeon(R) CPU X5650 @ 2.67GHz", "distro": "centos", "distro_description": "CentOS Linux 7 (Core)", "distro_version": "7", "hostname": "ceph-rbd-mirror0", "id": "ceph-rbd-mirror0", "instance_id": "14151", "kernel_description": "#1 SMP Wed May 9 18:05:47 UTC 2018", "kernel_version": "3.10.0-862.2.3.el7.x86_64", "mem_swap_kb": "1572860", "mem_total_kb": "1015548", "os": "Linux" } } } } } } ``` This patch modifies the function `test_rbd_mirror_is_up()` in `test_rbd_mirror.py` so it works with `mimic` and keeps backward compatibility with `luminous`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-07-06 14:39:13 +02:00
Guillaume Abrioux	f2e57a56db	tests: factorize docker tests using docker_exec_cmd logic Avoid duplicating tests unnecessarily just because of the docker exec syntax. Using the same logic as in the playbook with `docker_exec_cmd` allows us to execute the same test on both containerized and non-containerized environments. The idea is to set a variable `docker_exec_cmd` to the 'docker exec <container-name>' string when containerized and to '' when non-containerized. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-06-27 07:00:14 +00:00
Guillaume Abrioux	fe79a5d240	tests: refact test_all__osds_are_up_and_in These tests are skipped on bluestore osds scenarios. They were going to fail anyway since they are run on mon nodes and `devices` is defined in inventory for each osd node. It means `num_devices * num_osd_hosts` returns `0`. The result is that the test expects to have 0 OSDs up. The idea here is to move these tests so they are run on OSD nodes. Each OSD node checks that its own OSDs are up: if an OSD node has 2 devices defined in the `devices` variable, we check for 2 OSDs to be up on that node. If each node has all its OSDs up, we can say all OSDs are up. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-06-26 15:23:39 +00:00
Guillaume Abrioux	1c3dae4a90	tests: skip rgw_tuning_pools_are_set when rgw_create_pools is not defined Since the ooo_collocation scenario is supposed to be the same scenario as the one tested by OSP, and they are not passing `rgw_create_pools`, the test `test_docker_rgw_tuning_pools_are_set` will fail: ``` > pools = node["vars"]["rgw_create_pools"] E KeyError: 'rgw_create_pools' ``` Skipping this test if `node["vars"]["rgw_create_pools"]` is not defined fixes this failure. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-06-26 15:23:39 +00:00
Guillaume Abrioux	f68936ca7e	tests: fix *_has_correct_value tests It might happen that the lists of ips/hosts in the following lines (ceph.conf) - `mon initial members = <hosts>` - `mon host = <ips>` are not ordered the same way depending on the deployment. This patch makes the tests look for each ip or hostname in the respective lines. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-06-20 08:01:57 +02:00
Guillaume Abrioux	481c14455a	tests: add more nodes in ooo testing scenario Adding more nodes in this scenario could give better coverage so we can catch more potential bugs. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-06-18 16:44:23 +02:00
Guillaume Abrioux	21894655a7	tests: keep same ceph release during handlers/idempotency test since `latest` points to `mimic`, we need to force the test to keep the same ceph release when testing anything else than `mimic`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-06-15 11:45:51 -04:00
Guillaume Abrioux	bbb8691335	tests: increase memory to 1024Mb for centos7_cluster scenario We see more and more failures like `fatal: [mon0]: UNREACHABLE! => {}` in the `centos7_cluster` scenario. Since we have 30Gb RAM on hypervisors, we can give monitors a bit more RAM. By the way, nodes in the containerized cluster testing scenario already have 1024Mb of memory allocated. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-06-11 23:52:15 +08:00
Sébastien Han	6035978ed9	test: only on containerized iscsi We don't have the same service running on non-container for now. This will change soon, but for now let's only run the test on container. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-06-11 08:34:48 +02:00
Sébastien Han	20c8065e48	ceph-iscsi: rename group iscsi_gws Let's try to avoid using dashes as testinfra needs to be able to read the groups. Typically, with iscsi-gws we can't add a marker for these iscsi nodes, using an underscore fixes the issue. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-06-08 10:21:54 +02:00
Sébastien Han	c00fb12497	ci: add functional tests for iscsi We test if: * packages are installed * services are running * service units are enabled Also fix linting issues Signed-off-by: Sébastien Han <seb@redhat.com>	2018-06-08 10:21:54 +02:00
Sébastien Han	5ff2f03e3f	ci: add iscsi test Add iscsi CI coverage, this will now deploy iscsi gateways in container. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-06-08 10:21:54 +02:00
Guillaume Abrioux	28d21b4e9c	tests: update ooo inventory hostfile Update the inventory host for the tripleo testing scenario so it uses the same parameters as in tripleo CI. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-06-07 17:26:35 +02:00
Guillaume Abrioux	c94ada69e8	tests: improve mds tests The expected number of mds daemons consists of the number of daemons that are 'up' plus the number of daemons that are 'up:standby'. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-06-07 14:01:58 +08:00
Guillaume Abrioux	f0cd4b0651	tests: skip disabling fastest mirror detection on atomic host There is no need to execute this task on atomic hosts. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-06-05 15:39:37 +02:00
Guillaume Abrioux	47276764f7	tests: fix rgw tests `41b4632` has introduced a change in functional tests. Since the admin keyring isn't copied on rgw nodes anymore in tests, let's use the rgw keyring to run them. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-06-05 15:24:32 +02:00
Sébastien Han	41b4632abc	test: do not always copy admin key The admin key must be copied on the osd nodes only when we test the shrink scenario. Shrink relies on ceph-disk commands that require the admin key on the node where it's being executed. Now we only copy the key when running on the shrink-osd scenario. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-06-05 09:39:30 +02:00
Guillaume Abrioux	2cf06b515f	rgw: refact rgw pools creation Refact of `8704144e31` There is no need to have duplicated tasks for this. The rgw pools creation should be delegated to a monitor node so we don't have to care whether the admin keyring is present on the rgw node. By the way, only one task is needed to create the pools, we just need to use the `docker_exec_cmd` fact already defined in `ceph-defaults` to achieve it. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1550281 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-06-05 15:00:20 +08:00
Erwan Velu	493f615eae	ceph-defaults: Enable local epel repository During the tests, the remote epel repository is generating a lot of errors leading to broken jobs (issue #2666). This patch is about using a local repository instead of a random one. To achieve that, we make a preliminary install of epel-release, remove the metalink and enforce a baseurl to our local http mirror. That should speed up the build process but also avoid the random errors we face. This patch is part of a patch series that tries to remove all possible yum failures. Signed-off-by: Erwan Velu <erwan@redhat.com>	2018-06-04 08:11:35 +02:00
jtudelag	600e1e2c26	rgws: renames create_pools variable with rgw_create_pools. Renamed to be consistent with the role (rgw) and have a meaningful name. Signed-off-by: Jorge Tudela <jtudelag@redhat.com>	2018-06-04 06:23:42 +02:00
jtudelag	8704144e31	Adds RGWs pool creation to containerized installation. ceph command has to be executed from one of the monitor containers if not admin copy present in RGWs. Task has to be delegated then. Adds test to check proper RGW pool creation for Docker container scenarios. Signed-off-by: Jorge Tudela <jtudelag@redhat.com>	2018-06-04 06:23:42 +02:00
Guillaume Abrioux	c68126d6fd	mdss: do not make pg_num a mandatory params When playing ceph-mds role, mon nodes have set a fact with the default pg num for osd pools, we can simply default to this value for cephfs pools (`cephfs_pools` variable). At the moment the variable definition for `cephfs_pools` looks like: ``` cephfs_pools: - { name: "{{ cephfs_data }}", pgs: "" } - { name: "{{ cephfs_metadata }}", pgs: "" } ``` and we have a task in `ceph-validate` to ensure `pgs` has been set to a valid value. We could simply avoid this check by setting the default value of `pgs` to `hostvars[groups[mon_group_name][0]]['osd_pool_default_pg_num']` and leave users the possibility to override this value. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1581164 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-05-30 16:20:34 +02:00
Guillaume Abrioux	34f7042852	tests: resize root partition when atomic host For some time now we have seen failures in the CI for containerized scenarios because VMs are running out of space at some point. The default in the images used is to have only 3Gb for root partition which doesn't sound like a lot. Typical error seen: ``` STDERR: failed to register layer: Error processing tar file(exit status 1): open /usr/share/zoneinfo/Atlantic/Canary: no space left on device ``` Indeed, on the machine we can see: ``` Every 2.0s: df -h Tue May 29 17:21:13 2018 Filesystem Size Used Avail Use% Mounted on /dev/mapper/atomicos-root 3.0G 3.0G 14M 100% / ``` The idea here is to expand this partition with all the available space remaining by issuing an `lvresize` followed by an `xfs_growfs`. ``` -bash-4.2# lvresize -l +100%FREE /dev/atomicos/root Size of logical volume atomicos/root changed from <2.93 GiB (750 extents) to 9.70 GiB (2484 extents). Logical volume atomicos/root successfully resized. ``` ``` -bash-4.2# xfs_growfs / meta-data=/dev/mapper/atomicos-root isize=512 agcount=4, agsize=192000 blks = sectsz=512 attr=2, projid32bit=1 = crc=1 finobt=0 spinodes=0 data = bsize=4096 blocks=768000, imaxpct=25 = sunit=0 swidth=0 blks naming =version 2 bsize=4096 ascii-ci=0 ftype=1 log =internal bsize=4096 blocks=2560, version=2 = sectsz=512 sunit=0 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 data blocks changed from 768000 to 2543616 ``` ``` -bash-4.2# df -h Filesystem Size Used Avail Use% Mounted on /dev/mapper/atomicos-root 9.7G 1.4G 8.4G 14% / ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-05-30 10:54:35 +02:00
Guillaume Abrioux	98cb6ed8f6	tests: avoid yum failures In the CI we often see failures like the following: `Failure talking to yum: Cannot find a valid baseurl for repo: base/7/x86_64` It seems the fastest mirror detection is sometimes counterproductive and leads yum to fail. This fix has been added in the `setup.yml`. This playbook was used until now only just before playing `testinfra` and could be used before running ceph-ansible so we can add some provisioning tasks. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> Co-authored-by: Erwan Velu <evelu@redhat.com>	2018-05-28 22:04:35 +02:00
Guillaume Abrioux	a10e73d78d	tests: move cephfs_pools variable Let's move this variable to group_vars/all.yml in all testing scenarios, in accordance with commit `1f15a81c48`, so we keep consistency between the playbook and the tests. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-05-24 09:39:38 -07:00
Guillaume Abrioux	564a662baf	osds: move openstack pools creation in ceph-osd When deploying a large number of OSD nodes it can be an issue because the protection check [1] won't pass since it tries to create pools before all OSDs are active. The idea here is to move openstack pools creation at the end of `ceph-osd` role. [1] `e59258943b/src/mon/OSDMonitor.cc (L5673)` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1578086 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-05-24 09:39:38 -07:00
Luigi Toscano	43e96c1f98	ceph-radosgw: disable NSS PKI db when SSL is disabled The NSS PKI database is needed only if radosgw_keystone_ssl is explicitly set to true, otherwise the SSL integration is not enabled. It is worth noting that the PKI support was removed from Keystone starting from the Ocata release, so some code paths should be changed anyway. Also, remove radosgw_keystone, which is not useful anymore. This variable was used until `fcba2c801a`. Now profiles drives the setting of rgw keystone *. Signed-off-by: Luigi Toscano <ltoscano@redhat.com>	2018-05-23 23:24:09 -07:00
Guillaume Abrioux	a68091c923	tests: update the type for the rule used in pools As of ceph 12.2.5 the type of the parameter `type` is not a name anymore but an id, therefore an `int` is expected otherwise it will fail with the following error Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-04-30 08:15:18 +02:00
Sébastien Han	71efa2eaf4	ci: bump client nodes to 2 In order to test the key distribution is correct we must have 2 client nodes. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-23 18:34:58 +02:00
Guillaume Abrioux	77831ccb7a	tests: update tests for mds to cover multimds case in case of multimds we must check for the number of mds up instead of just checking if the hostname of the node is in the fsmap. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-04-12 18:20:58 +02:00
Sébastien Han	82589021e0	ci: fix tripleO scenario Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-11 12:18:34 +02:00
Sébastien Han	2011ec3bcd	ci: client copy admin key If we don't copy the admin key we can't add the key into ceph. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-11 12:18:34 +02:00
Sébastien Han	cf73647e7a	ci: remove useless tests These are already handled by ceph-client/defaults/main.yml so the keys will be created once user_config is set to True. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-11 12:18:34 +02:00
Andrew Schoen	98e237d234	tests: no need to remove partitions in lvm_setup.yml Now that we are using ceph_volume_zap the partitions are kept around and should be able to be reused. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-04-10 14:19:21 +02:00
Sébastien Han	f3caee8460	ceph-iscsi: fix certificates generation and distribution Prior to this patch, the certificates were being generated on a single node only (because of the run_once: true). Thus certificates were not distributed on all the gateway nodes. This would require a second ansible run to work. This patch fixes the creation and the keys' distribution on all the nodes. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1540845 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-04 09:27:39 +02:00
John Fulton	e6e6bd078a	Refer to expected-num-objects as expected_num_objects, not size Follow up patch to PR 2432 [1] which replaces "size" (sorry if the original bug used that term, which can be confusing) with expected_num_objects as is used in the Ceph documentation [2]. [1] https://github.com/ceph/ceph-ansible/pull/2432/files [2] http://docs.ceph.com/docs/jewel/rados/operations/pools	2018-03-26 15:41:51 +02:00
Sébastien Han	3ab89ab48c	ci: re-arrange group_vars files We should stop putting everything in 'all'. This is too easy and this is error prone as well for those who are separating variables into host type, things that you should do. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-14 14:22:00 +01:00
Sébastien Han	d5f8cac820	ci: remove left over iscsi_gws file Wrong file that is not used, only iscsi-ggw that is present is correct. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-14 14:22:00 +01:00
Sébastien Han	8000ae342e	remove unused ceph_rgw_civetweb_port variable Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-14 14:22:00 +01:00
Sébastien Han	f119b25bbe	client: implement proper pools creation Just like we did for the monitor and openstack_config we now have the ability to precisely create pools. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-14 14:22:00 +01:00
Sébastien Han	e302c1baae	mon: add support for erasure code pool You can now specify type: erasure and erasure_profile to use when declaring the pool dictionary. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-14 14:22:00 +01:00
Sébastien Han	4806ff4ff8	ci: test pool creation on container On containerized scenario we also want to test pool creation. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-14 14:22:00 +01:00
Sébastien Han	fc0fa48e0d	test: add tests for creating crush tree We now run tests on the newly created ceph_crush module. Now the CI will create a specific hierarchy for the OSD. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-06 15:24:31 +00:00
Sébastien Han	fd94840a6e	ci: add copy_admin_key test to container scenario Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-02 20:59:10 +00:00
Sébastien Han	165d9dec10	remove kernel.pid_max This is now managed by Ceph packages. See: https://github.com/ceph/ceph/pull/18544/files http://tracker.ceph.com/issues/21929 Closes: https://github.com/ceph/ceph-ansible/issues/2410 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-02-23 13:57:57 +01:00
Guillaume Abrioux	4a8986459f	tests: change ceph_docker_image_tag for 2nd run The ceph-ansible upstream CI runs several tests, including an 'idempotency/handlers' test. It means the playbook is run a first time and then a second time with another container image version to ensure the handlers run properly and the containers are well restarted. This can cause issues. For instance, in that specific case which drove me to submit this commit, I've hit the case where `latest` image ships ceph 12.2.3 while the `stable-3.0` (which is the image used for the second run) ships ceph 12.2.2. The goal of this test is not to verify we can upgrade from a specific version to another but to ensure handlers are working even if it's a valid failure here. It should be caught by a test dedicated to that usecase. We just need to have a container image which has a different id for the upstream CI, we need the same content in container image but a different image id in the registry since the test relies on image id to decide whether the container should be restarted. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-02-23 13:54:32 +01:00
Guillaume Abrioux	707458c979	ci: add tripleo scenario testing This should help to see earlier any failure in a tripleo deployment scenario. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-02-23 13:54:32 +01:00
Sébastien Han	7d690878df	test: add test for containers resources changes We change the ceph_mon_docker_memory_limit on the second run, this should trigger a restart of services. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-02-14 02:01:29 +01:00
Sébastien Han	79864a8936	test: add test for restart on new container image Since we have a task to test the handlers we can test a new container to validate the service restart on a new container image. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-02-14 02:01:29 +01:00
Guillaume Abrioux	deaf273b25	syntax: change local_action syntax Use a nicer syntax for `local_action` tasks. We used to have one-liners like this: ``` local_action: wait_for port=22 host={{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }} state=started delay=10 timeout=500 }} ``` The usual syntax: ``` local_action: module: wait_for port: 22 host: "{{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }}" state: started delay: 10 timeout: 500 ``` is nicer and helps keep the whole playbook consistent. This also fixes a potential issue with missing quotation marks: ``` Traceback (most recent call last): File "/tmp/ansible_wQtWsi/ansible_module_command.py", line 213, in <module> main() File "/tmp/ansible_wQtWsi/ansible_module_command.py", line 185, in main rc, out, err = module.run_command(args, executable=executable, use_unsafe_shell=shell, encoding=None, data=stdin) File "/tmp/ansible_wQtWsi/ansible_modlib.zip/ansible/module_utils/basic.py", line 2710, in run_command File "/usr/lib64/python2.7/shlex.py", line 279, in split return list(lex) File "/usr/lib64/python2.7/shlex.py", line 269, in next token = self.get_token() File "/usr/lib64/python2.7/shlex.py", line 96, in get_token raw = self.read_token() File "/usr/lib64/python2.7/shlex.py", line 172, in read_token raise ValueError, "No closing quotation" ValueError: No closing quotation ``` Writing `local_action: shell echo {{ fsid }} | tee {{ fetch_directory }}/ceph_cluster_uuid.conf` can cause trouble because it complains about missing quotes; this fix solves the issue. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1510555 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-31 10:45:34 +01:00
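
The servicemap change described in commit 09d795b5b7 can be illustrated with a short Python sketch. This is not the code from `test_rbd_mirror.py`; it is a minimal example, assuming that the `metadata.hostname` field shown in both quoted structures is a reliable way to identify a daemon regardless of whether the daemon key is a hostname (luminous) or a gid (mimic). The function names and the trimmed sample data are hypothetical.

```python
import json

# Trimmed excerpts of the two structures quoted in the commit message;
# only the fields used below are kept (sample data for illustration).
LUMINOUS_STATUS = json.dumps({
    "servicemap": {"services": {"rbd-mirror": {"daemons": {
        "summary": "",
        "ceph-nano-luminous-faa32aebf00b": {
            "gid": 14107,
            "metadata": {"hostname": "ceph-nano-luminous-faa32aebf00b"},
        },
    }}}}
})

MIMIC_STATUS = json.dumps({
    "servicemap": {"services": {"rbd-mirror": {"daemons": {
        "summary": "",
        "14151": {
            "gid": 14151,
            "metadata": {"hostname": "ceph-rbd-mirror0"},
        },
    }}}}
})


def rbd_mirror_hostnames(status_json):
    """Return the sorted hostnames of rbd-mirror daemons in the servicemap."""
    status = json.loads(status_json)
    daemons = status["servicemap"]["services"]["rbd-mirror"]["daemons"]
    hostnames = []
    for key, daemon in daemons.items():
        # "summary" is a plain string, not a daemon entry, in both releases.
        if key == "summary":
            continue
        # Read the hostname from metadata instead of relying on the key,
        # which is a hostname on luminous but a gid on mimic.
        hostnames.append(daemon["metadata"]["hostname"])
    return sorted(hostnames)


def rbd_mirror_is_up(status_json, hostname):
    """Hypothetical helper: True if a daemon reports the given hostname."""
    return hostname in rbd_mirror_hostnames(status_json)
```

Calling `rbd_mirror_is_up(MIMIC_STATUS, "ceph-rbd-mirror0")` and `rbd_mirror_is_up(LUMINOUS_STATUS, "ceph-nano-luminous-faa32aebf00b")` both succeed with the same code path, which is the backward compatibility the commit message asks for.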

282 Commits (09d795b5b737a05164772f5e3ba469577d605344)
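
The order-insensitive check described in commit f68936ca7e could look roughly like the following Python sketch. It is an assumption about the approach, not the actual test code: the helper name `conf_line_has_values` and the sample hostnames and addresses are hypothetical, and the `ceph.conf` text is supplied as a string rather than read from a node.

```python
def conf_line_has_values(conf_text, key, expected):
    """Return True if every expected value appears in the `key = a,b,c` line.

    The comparison ignores the order of the comma-separated values.
    """
    for line in conf_text.splitlines():
        name, sep, value = line.partition("=")
        # Skip lines without "=" (such as section headers) and other keys.
        if sep and name.strip() == key:
            found = {item.strip() for item in value.split(",")}
            return set(expected) <= found
    return False


# Hypothetical ceph.conf content with hosts listed in a non-sorted order.
CEPH_CONF = """\
[global]
mon initial members = mon1,mon0,mon2
mon host = 192.168.1.12,192.168.1.10,192.168.1.11
"""
```

Checking each expected value for membership, rather than comparing the whole line against one fixed string, keeps the test independent of the ordering produced by a given deployment.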