ceph-ansible

Author	SHA1	Message	Date
Neha Ojha	27027a17d3	osd: add osd memory target option BlueStore's cache is sized conservatively by default, so that it does not overwhelm under-provisioned servers. The default is 1G for HDD, and 3G for SSD. To replace the page cache, as much memory as possible should be given to BlueStore. This is required for good performance. Since ceph-ansible knows how much memory a host has, it can set `bluestore cache size = max(total host memory / num OSDs on this host * safety factor, 1G)` Due to fragmentation and other memory use not included in bluestore's cache, a safety factor of 0.5 for dedicated nodes and 0.2 for hyperconverged nodes is recommended. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1595003 Signed-off-by: Neha Ojha <nojha@redhat.com> Co-Authored-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-09-18 10:12:46 +00:00
Guillaume Abrioux	57f0b6a476	shrink-osd: follow up on `36fb3cde` - Adds a loop in bash to satisfy the 1:n relation between `osd_hosts` and the different device lists. - Fixes some container names that were using the host's hostname instead of the actual container name. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-09-18 07:27:41 +00:00
Guillaume Abrioux	98c210d757	site-docker: fix undefined variable error `mon_group_name` isn't defined here, we must hardcode it. Typical error: ``` The task includes an option with an undefined variable. The error was: 'mon_group_name' is undefined ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-09-18 07:27:41 +00:00
Sébastien Han	735e1917db	shrink-osd: purge dedicated devices Once the OSD is destroyed, we also have to purge the associated devices, which means purging the journal, db, and wal partitions too. This now works for both container and non-container deployments. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1572933 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-09-18 07:27:41 +00:00
Mike Christie	8fcd63cc50	igw: enable and start rbd-target-api The commit: commit `1164cdc002` Author: Guillaume Abrioux <gabrioux@redhat.com> Date: Thu Aug 2 11:58:47 2018 +0200 iscsigw: install ceph-iscsi-cli package installs the cli package but does not start and enable the rbd-target-api daemon needed for gwcli to communicate with the igw nodes. This patch just enables and starts it for the non-container setup. The container setup is already doing this. This fixes bz https://bugzilla.redhat.com/show_bug.cgi?id=1613963 Signed-off-by: Mike Christie <mchristi@redhat.com>	2018-09-13 19:35:45 +00:00
Guillaume Abrioux	3382c5226c	tests: fix monitor_address for shrink_osd scenario `b89cc1746` introduced a typo. This commit fixes it Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-09-13 18:14:01 +02:00
Guillaume Abrioux	4159326a18	shrink-osd: fix purge osd on containerized deployment `ce1dd8d` introduced OSD purging on containers, but it was incorrect: the `resolve parent device` and `zap ceph osd disks` tasks must be delegated to their respective OSD nodes. They were run on the Ansible node, so parent devices were resolved on that node instead of on the OSD nodes. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1612095 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-09-13 18:14:01 +02:00
Guillaume Abrioux	7a61771539	doc: update lvm doc As of `e3820a2` the creation of logical volumes is now supported by ceph-ansible. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-09-13 15:29:24 +00:00
Guillaume Abrioux	a6f77340fd	nfs: ignore error on semanage command for ganesha_t As of RHEL 7.6, it has been decided that it no longer makes sense to confine `ganesha_t`, so this domain no longer exists. Let's add `failed_when: false` so that the deployment does not fail when running this command. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1626070 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-09-13 13:06:47 +02:00
Guillaume Abrioux	8f2c660d25	tests: pin sphinx version to 1.7.9 using sphinx 1.8.0 breaks our doc test CI job. Typical error: ``` Exception occurred: File "/home/jenkins-build/build/workspace/ceph-ansible-docs-pull-requests/docs/.tox/docs/lib/python2.7/site-packages/sphinx/highlighting.py", line 26, in <module> from sphinx.ext import doctest SyntaxError: unqualified exec is not allowed in function 'run' it contains a nested function with free variables (doctest.py, line 97) ``` See: https://github.com/sphinx-doc/sphinx/issues/5417 Pinning to 1.7.9 to fix our CI. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-09-13 13:05:43 +02:00
Andrew Schoen	b36f3e06b5	ceph_volume: adds the osds_per_device parameter If this is set to anything other than the default value of 1 then the --osds-per-device flag will be used by the batch command to define how many osds will be created per device. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-09-12 20:27:14 +00:00
Guillaume Abrioux	1c88c444a3	mon: fix `ExecStartPre` option in systemd unit file This command line is not supported. According to official documentation: ``` Note that shell command lines are not directly supported. If shell command lines are to be used, they need to be passed explicitly to a shell implementation of some kind. ``` We must run this using /bin/sh instead. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-09-11 10:48:21 +02:00
Guillaume Abrioux	9ff26e80f2	defaults: add a default value to rgw_hostname Let's add ansible_hostname as the default value for rgw_hostname if no hostname in the servicemap matches ansible_fqdn. Fixes: #3063 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622505 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-09-10 12:07:44 +02:00
Guillaume Abrioux	6954ac184f	tests: do not upgrade ceph release for switch_to_containers scenario Using `UPDATE_*` environment variables here upgrades the Ceph release when running the switch_to_containers scenario, which is not correct. E.g.: if Ceph luminous was deployed first, we should switch to Ceph luminous containers, not to mimic. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-09-09 14:25:14 +02:00
Guillaume Abrioux	ecbd3e4558	Revert "client: add quotes to the dict values" This commit adds quotes that make the keyring unusable, e.g.: ``` client.john key: AQAN0RdbAAAAABAAH5D3WgMN9Rxw3M8jkpMIfg== caps: [mds] '' caps: [mgr] 'allow *' caps: [mon] 'allow rw' caps: [osd] 'allow rw' ``` Trying to import and use such a keyring results in: ``` Error EACCES: access denied ``` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1623417 This reverts commit `424815501a`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-09-07 17:21:55 +00:00
Tom Barron	bf8f589958	run rados cmd in container if containerized deployment When ceph-nfs is deployed containerized and ceph-common is not installed on the host the start_nfs task fails because the rados command is missing on the host. Run rados commands from a ceph container instead so that they will succeed. Signed-off-by: Tom Barron <tpb@dyncloud.net>	2018-09-03 17:06:00 +00:00
Markos Chandras	217f35dbdb	roles: ceph-rgw: Enable the ceph-radosgw target If the ceph-radosgw target is not enabled, then enabling the ceph-radosgw@ service has no effect since nothing will pull it on the next reboot. As such, we need to ensure that the target is enabled. Signed-off-by: Markos Chandras <mchandras@suse.de>	2018-09-03 15:48:58 +02:00
Sébastien Han	38dc20e74b	purge: only purge /var/lib/ceph content Sometimes /var/lib/ceph is mounted on a device, so we can't remove it (device busy); let's remove only its content. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1615872 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-09-03 10:51:24 +02:00
Alfredo Deza	58b2308036	tests: use new 'num_osds' variable in tests Signed-off-by: Alfredo Deza <adeza@redhat.com>	2018-08-31 21:23:20 +00:00
Alfredo Deza	e5fcb0d2d2	tests: allow defining arbitrary number of OSDs Some tests might want to set this since number of devices will not necessarily map to number of OSDs Signed-off-by: Alfredo Deza <adeza@redhat.com>	2018-08-31 21:23:20 +00:00
Andy McCrae	772e6b9be2	Don't run client dummy container on non-x86_64 hosts The dummy client container currently won't work on non-x86_64 hosts. This PR creates a filtered client group that contains only x86_64 hosts, which can then be the group to run the dummy container against. This is for the specific case of a containerized_deployment where there is a mixture of non-x86_64 and x86_64 hosts. As such, the filtered group will contain all hosts when running with containerized_deployment: false. Currently ppc64le is not supported for Ceph server components. Signed-off-by: Andy McCrae <andy.mccrae@gmail.com>	2018-08-31 11:34:00 +00:00
Ali Maredia	561ec9203d	infrastructure-playbooks: add comments for lv_vars.yml Add comments telling users that devices used in playbooks must not have GPT/FS/RAID signatures Signed-off-by: Ali Maredia <amaredia@redhat.com>	2018-08-29 21:10:20 +00:00
Ali Maredia	77eb459a88	infrastructure playbooks: remove lv-create error msg remove error message when PV creation fails Signed-off-by: Ali Maredia <amaredia@redhat.com>	2018-08-29 21:10:20 +00:00
Sébastien Han	124fc727f4	doc: remove old statement We have been supporting multiple devices for journals in containerized deployments for a while now and forgot about this. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622393 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-08-28 13:31:57 -07:00
Sébastien Han	9ba670567e	remove warning for unsupported variables As promised, these will go unsupported for 3.1 so let's actually remove them :). Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622729 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-08-28 13:31:57 -07:00
Ali Maredia	e1ff438800	infrastructure-playbooks: failure msg for pvcreate Add a message for when PV creation fails. This message alerts users that FS/GPT/RAID signatures could still be on the device and be the reason for the failure. `wipefs -a $device` needs to be run to fix this issue. Signed-off-by: Ali Maredia <amaredia@redhat.com>	2018-08-28 20:21:42 +00:00
Sébastien Han	ae5ebeeb00	sites: fix conditional Same problem again... ceph_release_num[ceph_release] is only set in the ceph-docker-common/common roles, so putting the condition on that role will never work. Removing the condition. The downside is that we will install packages and then skip the role on the node. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622210 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-08-27 22:11:15 +02:00
Sébastien Han	30cfeb5427	site-docker.yml: remove useless condition If we play site-docker.yml, we are already in a containerized_deployment. So the condition is not needed. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-08-23 16:13:54 +02:00
Sébastien Han	7012835d2b	ci: stop using different images on the same run There is no point in using hosts running Atomic AND CentOS hosts. So let's run containerized scenarios on Atomic only. This solves this error here: ``` fatal: [client2]: FAILED! => { "failed": true } MSG: The conditional check 'ceph_current_status.rc == 0' failed. The error was: error while evaluating conditional (ceph_current_status.rc == 0): 'dict object' has no attribute 'rc' The error appears to have been in '/home/jenkins-build/build/workspace/ceph-ansible-nightly-luminous-stable-3.1-ooo_collocation/roles/ceph-defaults/tasks/facts.yml': line 74, column 3, but may be elsewhere in the file depending on the exact syntax problem. The offending line appears to be: - name: set_fact ceph_current_status (convert to json) ^ here ``` From https://2.jenkins.ceph.com/view/ceph-ansible-stable3.1/job/ceph-ansible-nightly-luminous-stable-3.1-ooo_collocation/37/consoleFull#1765217701b5dd38fa-a56e-4233-a5ca-584604e56e3a What's happening here is that all the hosts except the clients are running Atomic, so here: https://github.com/ceph/ceph-ansible/blob/master/site-docker.yml.sample#L62 the condition skips all the nodes except the clients. Thus, when running ceph-defaults, the task "is ceph running already?" is skipped, but the task above needs the rc of the skipped task. This is not an error in the playbook; it's a CI setup issue. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-08-23 16:13:54 +02:00
Sébastien Han	6d7fa99ff7	defaults: fix rgw_hostname A couple of things were wrong in the initial commit: * ceph_release_num[ceph_release] >= ceph_release_num['luminous'] will never work since the ceph_release fact is set later, by either ceph-common or ceph-docker-common * we can easily re-use the initial command that checks whether a cluster is running; it's more elegant than running it twice * set the fact rgw_hostname on rgw nodes only Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1618678 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-08-22 17:46:00 +02:00
Sébastien Han	0d448da695	vagrant: move variable samples to contrib Let's clean up the root of the repo a bit Signed-off-by: Sébastien Han <seb@redhat.com>	2018-08-21 23:54:24 +02:00
Sébastien Han	a2ad2fb3d5	rm ceph-aio-no-vagrant.sh The script is outdated. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-08-21 23:54:24 +02:00
Sébastien Han	017aa6ef20	remove monitor_keys_example file This file is not needed, if you want to generate a key you can run: python -c "import os ; import struct ; import time; import base64 ; key = os.urandom(16) ; header = struct.pack('<hiih',1,int(time.time()),0,len(key)) ; print(base64.b64encode(header + key).decode())" Signed-off-by: Sébastien Han <seb@redhat.com>	2018-08-21 23:54:24 +02:00
Andy McCrae	18684b7209	Sync config_template with base plugin The config_template plugin exists in the ceph-common role so that config_template will still work with ansible galaxy. This PR syncs the config_template module from the base of the repo in plugins/actions to the ceph-common role. Signed-off-by: Andy McCrae <andy.mccrae@gmail.com>	2018-08-21 16:10:33 +00:00
Sébastien Han	2e6e885bb7	rolling_upgrade: set sortbitwise properly Running 'osd set sortbitwise' when we detect a version 12 of Ceph is wrong. When OSDs are being updated, even though the package is updated, they won't report their updated version (12) and will stick with 10 until the command is applied. So we have to check whether OSDs are reporting version 10 and, if so, run the command to unlock the OSDs. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600943 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-08-21 12:22:32 +00:00
Sébastien Han	77a3a682f3	iscsi group name preserve backward compatibility Recently we renamed the iSCSI group_name to iscsigws; previously it was named iscsi-gws. Existing deployments whose hosts file has an iscsi-gws section must continue to work. This commit keeps the old group name for backward compatibility. No error from Ansible should be expected: if the host group is not found, nothing is played. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1619167 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-08-20 23:52:19 +02:00
Sébastien Han	8c70a5b197	osd: fix ceph_release We need ceph_release in the condition, not ceph_stable_release Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1619255 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-08-20 20:14:56 +02:00
Sébastien Han	b738706810	take-over-existing-cluster: do not call var_files We were using var_files long ago, when default variables were not in ceph-defaults; now that the role exists, this is not needed. Moreover, having these two var files added: - roles/ceph-defaults/defaults/main.yml - group_vars/all.yml will create collisions and override necessary variables. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1555305 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-08-20 14:47:04 +02:00
Markos Chandras	126e2e3f92	roles: ceph-defaults: Check if 'rgw' attribute exists for rgw_hostname If there are no services on the cluster, then the 'rgw' attribute could be missing and the task fails with the following problem: msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'rgw' We fix this by checking the existence of the 'rgw' attribute. If it's missing, we skip the task since the role already contains code to set a good default rgw_hostname. Signed-off-by: Markos Chandras <mchandras@suse.de>	2018-08-20 11:37:45 +02:00
Markos Chandras	37e50114de	roles: ceph-defaults: Delegate cluster information task to monitor node Since commit `f422efb1d6` ("config: ensure rgw section has the correct name") we observe the following failures in new Ceph deployments with OpenStack-Ansible: fatal: [aio1_ceph-rgw_container-fc588f0a]: FAILED! => {"changed": false, "cmd": "ceph --cluster ceph -s -f json", "msg": "[Errno 2] No such file or directory" This is because the task executes 'ceph' but at this point no package installation has happened. Packages are normally installed in the 'ceph-common' role, which runs after the 'ceph-defaults' one. Since we are looking to obtain cluster information, the task should be delegated to a monitor node, similar to other tasks in that role. Signed-off-by: Markos Chandras <mchandras@suse.de>	2018-08-20 11:37:45 +02:00
Dardo D Kleiner	f6519e4003	mgr: improve/fix disabled modules check Follow up on `36942af698` "disabled_modules" is always a list, it's the items in the list that can be dicts in mimic. Many ways to fix this, here's one. Signed-off-by: Dardo D Kleiner <dardokleiner@gmail.com>	2018-08-20 11:23:58 +02:00
Andrew Schoen	04df3f0802	lv-create: use copy instead of the template module The copy module does in fact do variable interpolation so we do not need to use the template module or keep a template in the source. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-08-16 16:38:23 +02:00
Andrew Schoen	f5a4c89869	tests: cat the contents of lv-create.log in infra_lv_create Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-08-16 16:38:23 +02:00
Andrew Schoen	131796f275	lv-create: add an example logfile_path config option in lv_vars.yml Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-08-16 16:38:23 +02:00
Andrew Schoen	810cc47892	tests: adds a testing scenario for lv-create and lv-teardown Using an explicitly named testing environment allows us to have a specific [testenv] block for this test. This greatly simplifies how it works, as it doesn't really need anything from the ceph cluster tests. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-08-16 16:38:23 +02:00
Andrew Schoen	b0bfc17351	lv-teardown: fail silently if lv_vars.yml is not found This allows user to opt out of using lv_vars.yml and load configuration from other sources. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-08-16 16:38:23 +02:00
Andrew Schoen	8424858b40	lv-teardown: set become: true at the playbook level Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-08-16 16:38:23 +02:00
Andrew Schoen	e43eec57bb	lv-create: fail silently if lv_vars.yml is not found If a user decides not to use the lv_vars.yml file, then it should fail silently so that configuration can be picked up from other places. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-08-16 16:38:23 +02:00
Andrew Schoen	fde47be13c	lv-create: set become: true at the playbook level Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-08-16 16:38:23 +02:00
Andrew Schoen	35301b35af	lv-create: use the template module to write log file The copy module will not expand the template and render the variables included, so we must use template. Creating a temp file and using it locally means that you must run the playbook with sudo privileges, which I don't think we want to require. This introduces a logfile_path variable that the user can use to control where the logfile is written to, defaulting to the cwd. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-08-16 16:38:23 +02:00
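
Commit 27027a17d3 sizes the BlueStore cache per OSD as `max(total host memory / num OSDs on this host * safety factor, 1G)`, with a safety factor of 0.5 for dedicated nodes and 0.2 for hyperconverged nodes. The following is a minimal Python sketch of that arithmetic only, not the Jinja2 expression ceph-ansible actually uses; the function name `bluestore_cache_size` and the `hyperconverged` flag are hypothetical, and "1G" is assumed to mean 1 GiB.

```python
GIB = 1024 ** 3

# Safety factors recommended in commit 27027a17d3.
SAFETY_FACTOR_DEDICATED = 0.5
SAFETY_FACTOR_HYPERCONVERGED = 0.2

# Floor from the formula; "1G" is assumed to mean 1 GiB.
MIN_CACHE_BYTES = 1 * GIB


def bluestore_cache_size(total_mem_bytes: int, num_osds: int,
                         hyperconverged: bool = False) -> int:
    """Hypothetical helper: per-OSD cache size in bytes.

    Implements max(total memory / num OSDs * safety factor, 1G).
    """
    if num_osds <= 0:
        raise ValueError("num_osds must be a positive integer")
    factor = (SAFETY_FACTOR_HYPERCONVERGED if hyperconverged
              else SAFETY_FACTOR_DEDICATED)
    per_osd = int(total_mem_bytes / num_osds * factor)
    return max(per_osd, MIN_CACHE_BYTES)


if __name__ == "__main__":
    # 64 GiB dedicated host with 8 OSDs: 64 / 8 * 0.5 = 4 GiB per OSD.
    print(bluestore_cache_size(64 * GIB, 8) / GIB)
    # Small hyperconverged host: 8 / 12 * 0.2 is below the floor, so 1 GiB.
    print(bluestore_cache_size(8 * GIB, 12, hyperconverged=True) / GIB)
```

The floor keeps densely packed or small hosts from ending up with a cache smaller than the conservative default.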
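
Commit b36f3e06b5 only passes `--osds-per-device` to the batch command when `osds_per_device` differs from its default of 1. A minimal Python sketch of that conditional argument building, assuming a bare `ceph-volume lvm batch` invocation; the helper `build_batch_command` is hypothetical, and the real module also passes options (cluster name, objectstore, and so on) that are omitted here.

```python
from typing import List


def build_batch_command(devices: List[str], osds_per_device: int = 1) -> List[str]:
    """Hypothetical helper: argv for `ceph-volume lvm batch`.

    --osds-per-device is added only when the value differs from the
    default of 1, as described in commit b36f3e06b5.
    """
    if osds_per_device < 1:
        raise ValueError("osds_per_device must be >= 1")
    cmd = ["ceph-volume", "lvm", "batch"]
    if osds_per_device != 1:
        cmd.extend(["--osds-per-device", str(osds_per_device)])
    cmd.extend(devices)
    return cmd


if __name__ == "__main__":
    print(build_batch_command(["/dev/sdb", "/dev/sdc"], osds_per_device=2))
```

Leaving the flag out for the default keeps the generated command identical to what it was before the parameter existed.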
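
The systemd documentation quoted in commit 1c88c444a3 says shell command lines are not directly supported in unit files, so shell syntax has to be handed to an explicit shell. A minimal Python sketch of that workaround, rendering an `Exec*=` line that runs through `/bin/sh -c`; the helper `shell_exec_line` and the `docker rm` sample command are hypothetical and not taken from the ceph-mon unit template.

```python
def shell_exec_line(directive: str, command: str,
                    ignore_failure: bool = False) -> str:
    """Hypothetical helper: render a systemd Exec*= line that runs via /bin/sh.

    systemd does not interpret shell syntax such as pipes, '||' or
    redirections in Exec*= lines, so the whole command is passed to
    /bin/sh -c as a single single-quoted argument.
    """
    if "'" in command:
        # Keep quoting simple: nested single quotes would need escaping
        # rules that differ between systemd and sh.
        raise ValueError("command must not contain single quotes")
    # A leading '-' on the executable tells systemd to ignore a failure exit status.
    prefix = "-" if ignore_failure else ""
    return f"{directive}={prefix}/bin/sh -c '{command}'"


if __name__ == "__main__":
    print(shell_exec_line("ExecStartPre", "docker rm -f ceph-mon-node1 || true"))
```

Rejecting embedded single quotes is a deliberate simplification for the sketch; a real template would need a quoting strategy valid for both systemd and sh.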
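
Commits 9ff26e80f2 and 126e2e3f92 together describe how rgw_hostname is chosen: use the servicemap entry matching ansible_fqdn when one exists, otherwise fall back to ansible_hostname, and do not fail when the cluster has no 'rgw' service at all. A minimal Python sketch of that selection over parsed `ceph -s -f json` output, assuming RGW daemon names live under servicemap, services, rgw, daemons and are compared directly with ansible_fqdn; the function `resolve_rgw_hostname` is hypothetical.

```python
from typing import Any, Dict


def resolve_rgw_hostname(ceph_status: Dict[str, Any], ansible_fqdn: str,
                         ansible_hostname: str) -> str:
    """Hypothetical helper: pick rgw_hostname from parsed `ceph -s -f json`.

    Assumes daemon names are keys of servicemap -> services -> rgw -> daemons.
    """
    services = ceph_status.get("servicemap", {}).get("services", {})
    # A missing 'rgw' service (e.g. a fresh cluster) yields an empty dict
    # instead of raiding a KeyError (the 126e2e3f92 failure mode).
    daemons = services.get("rgw", {}).get("daemons", {})
    if ansible_fqdn in daemons:
        return ansible_fqdn
    # No servicemap entry matches the FQDN: default to the short hostname.
    return ansible_hostname


if __name__ == "__main__":
    status = {"servicemap": {"services": {"rgw": {"daemons": {
        "summary": "", "rgw0.example.com": {}}}}}}
    print(resolve_rgw_hostname(status, "rgw0.example.com", "rgw0"))
    print(resolve_rgw_hostname({}, "rgw1.example.com", "rgw1"))
```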
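
Commit 38dc20e74b switches the purge from deleting /var/lib/ceph to emptying it, because removing a directory that is a mount point fails with "device busy". A minimal Python sketch of "remove the contents, keep the directory", not the playbook task itself; the helper name `purge_directory_contents` is hypothetical.

```python
import os
import shutil


def purge_directory_contents(path: str) -> None:
    """Hypothetical helper: delete everything inside `path`, keep `path`.

    If `path` is a mount point, removing the directory itself fails with
    EBUSY ("Device or resource busy"), but emptying it works.
    """
    with os.scandir(path) as entries:
        for entry in entries:
            # Symlinks are unlinked, never followed into their targets.
            if entry.is_dir(follow_symlinks=False):
                shutil.rmtree(entry.path)
            else:
                os.unlink(entry.path)
```

Checking `is_dir(follow_symlinks=False)` matters here: a symlink pointing at another directory is removed as a link rather than having its target's contents deleted.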
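
Commit 772e6b9be2 builds a filtered client group for the dummy container: only x86_64 hosts in a containerized deployment, and every client otherwise. A minimal Python sketch of that selection rule only, assuming a mapping from inventory hostname to its `ansible_architecture` fact; the helper `dummy_container_hosts` is hypothetical and does not reproduce how the playbook actually creates the group.

```python
from typing import Dict, List


def dummy_container_hosts(host_arch: Dict[str, str],
                          containerized_deployment: bool) -> List[str]:
    """Hypothetical helper: clients that should run the dummy container.

    host_arch maps inventory hostname -> ansible_architecture value.
    """
    if not containerized_deployment:
        # Non-containerized: the filtered group contains all hosts.
        return sorted(host_arch)
    return sorted(host for host, arch in host_arch.items() if arch == "x86_64")


if __name__ == "__main__":
    clients = {"client0": "x86_64", "client1": "ppc64le", "client2": "x86_64"}
    print(dummy_container_hosts(clients, containerized_deployment=True))
```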
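
Commit 017aa6ef20 quotes a Python one-liner for generating a key. Here is the same logic expanded into a function with comments. The reading of the four header fields (type, creation seconds, nanoseconds, key length) is an interpretation of the `'<hiih'` struct format, and the function name `generate_ceph_key` is hypothetical.

```python
import base64
import os
import struct
import time


def generate_ceph_key() -> str:
    """Expanded form of the one-liner quoted in commit 017aa6ef20."""
    secret = os.urandom(16)  # 16 random bytes of key material
    # 12-byte little-endian header: type (1), creation time in seconds,
    # nanoseconds (0), and length of the secret.
    header = struct.pack("<hiih", 1, int(time.time()), 0, len(secret))
    # 12 + 16 = 28 bytes, which base64-encodes to 40 characters.
    return base64.b64encode(header + secret).decode()


if __name__ == "__main__":
    print(generate_ceph_key())
```

Because the type field is always 1, every key produced this way starts with `AQ`, which matches the shape of the `client.john` key shown in commit ecbd3e4558.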
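
Commit f6519e4003 notes that `disabled_modules` is always a list, but in mimic its items can be dicts rather than plain strings. A minimal Python sketch that normalizes both shapes to module names, assuming dict items carry the module name under a `"name"` key; the helper `disabled_module_names` is hypothetical.

```python
from typing import Any, Dict, List


def disabled_module_names(mgr_module_ls: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: names from the 'disabled_modules' list.

    Items may be plain strings (older releases) or dicts with a 'name'
    key (assumed mimic format).
    """
    names = []
    for entry in mgr_module_ls.get("disabled_modules", []):
        if isinstance(entry, dict):
            names.append(entry["name"])
        else:
            names.append(entry)
    return names


if __name__ == "__main__":
    mimic_style = {"disabled_modules": [{"name": "dashboard", "can_run": True}]}
    print(disabled_module_names(mimic_style))
```

Normalizing the items once lets a membership check such as `"dashboard" in names` work the same way on every release.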

4063 Commits (abdc245cebf2d2a74ee395c43ba9290deb792eb3)