ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Ali Maredia	01c58695fc	nfs: ensure nfs-server server is stopped NFS-ganesha cannot start if the nfs-server service is running. This commit stops nfs-server in case it is running on a (debian, redhat, suse) node before the nfs-ganesha service starts up fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1508506 Signed-off-by: Ali Maredia <amaredia@redhat.com>	2018-04-11 14:00:48 +02:00
Ramana Raja	4a430ae29a	ceph-nfs: allow disabling ganesha caching Add a variable, ceph_nfs_disable_caching, that if set to true disables ganesha's directory and attribute caching as much as possible. Also, disable caching done by ganesha, when 'nfs_file_gw' variable is true, i.e., when Ganesha is used as CephFS's gateway. This is the recommended Ganesha setting as libcephfs already caches information. And doing so helps avoid cache incoherency issues especially with clustered ganesha over CephFS. Fixes: https://tracker.ceph.com/issues/23393 Signed-off-by: Ramana Raja <rraja@redhat.com>	2018-04-11 13:56:40 +02:00
Sébastien Han	82ccbdafbc	ceph-defaults: bring backward compatibility for old syntax If people keep on using the mon_cap, osd_cap etc the playbook will translate this old syntax on the fly. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-11 12:18:34 +02:00
Sébastien Han	9657e4d6fa	ceph_key: use ceph_key in the playbook Replaced all the occurrences of raw commands using the 'command' module with the ceph_key module instead. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-11 12:18:34 +02:00
Guillaume Abrioux	66c4118dcd	defaults: fix backward compatibility backward compatibility with `ceph_mon_docker_interface` and `ceph_mon_docker_subnet` was not working since there wasn't lookup on `monitor_interface` and `public_network` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-04-10 00:19:11 +02:00
Ken Dreyer	3752cc6f38	common: upgrade/install ceph-test RPM first Prior to this change, if a user had ceph-test-12.2.1 installed, and upgraded to ceph v12.2.3 or newer, the RPM upgrade process would fail. The problem is that the ceph-test RPM did not depend on an exact version of ceph-common until v12.2.3. In Ceph v12.2.3, ceph-{osdomap,kvstore,monstore}-tool binaries moved from ceph-test into ceph-base. When ceph-test is not yet up-to-date, Yum encounters package conflicts between the older ceph-test and newer ceph-base. When all users have upgraded beyond Ceph < 12.2.3, this is no longer relevant.	2018-04-09 18:09:52 +02:00
Sébastien Han	bb60f2fea4	ceph-defaults: fix ceph_uid for container image tag latest According to our recent change, we now use "CentOS" as a latest container image. We need to reflect this on the ceph_uid. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-09 13:54:55 +02:00
Zack Cerza	0123d790cd	Use the CentOS repo for Red Hat dev packages No use even trying to use something that doesn't exist. Signed-off-by: Zack Cerza <zack@redhat.com>	2018-04-09 10:05:57 +02:00
Attila Fazekas	ecd3563c21	Deploying without managed monitors failed TripleO deployment failed when the monitors were not managed by TripleO itself, with: FAILED! => {"msg": "list object has no element 0"} The failing play item was introduced by `f46217b69a` . fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1552327 Signed-off-by: Attila Fazekas <afazekas@redhat.com>	2018-04-04 18:16:46 +02:00
Guillaume Abrioux	dcf6a246a4	defaults: remove `run_once: true` when creating fetch_directory because of `serial: 1`, it can be an issue when the playbook is being run on client nodes. Since the refact of `ceph-client` we skip the role `ceph-defaults` on every node except the first client node, it means that the task is not going to be played because of `run_once: true`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-04-04 10:51:17 +02:00
Guillaume Abrioux	18c0c7a508	config: use fact `ceph_uid` Use fact `ceph_uid` in the task which ensures `/etc/ceph` exists in containerized deployments. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-04-04 10:51:17 +02:00
Guillaume Abrioux	9c979c6390	clients: refact `ceph-clients` role This commit refacts this role so we don't have to pull container image on client nodes just to create pools and keys. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1550977 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-04-04 10:51:17 +02:00
Guillaume Abrioux	cefd471967	client: remove legacy code This seems to be a leftover. This commit removes an unnecessary 'set linux permissions' on `/var/lib/ceph` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-04-04 10:51:17 +02:00
Guillaume Abrioux	cf27c5e941	move selinux check to `ceph-defaults` This check is alone in `ceph-docker-common` since a previous code refactor. Moving this check in `ceph-defaults` allows us to run `ceph-clients` without having to run `ceph-docker-common` even in non-containerized deployment. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-04-04 10:51:17 +02:00
Sébastien Han	f3caee8460	ceph-iscsi: fix certificates generation and distribution Prior to this patch, the certificates were being generated on a single node only (because of the run_once: true). Thus certificates were not distributed on all the gateway nodes. This would require a second ansible run to work. This patch fixes the creation and the keys' distribution on all the nodes. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1540845 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-04 09:27:39 +02:00
Randy J. Martinez	ca572a11f1	ceph-mds: delete duplicate tasks which cause multimds container deployments to fail. This update will resolve error['cephfs' is undefined.] in multimds container deployments. See: roles/ceph-mon/tasks/create_mds_filesystems.yml. The same last two tasks are present there, and actually need to happen in that role since "{{ cephfs }}" gets defined in roles/ceph-mon/defaults/main.yml, and not roles/ceph-mds/defaults/main.yml. Signed-off-by: Randy J. Martinez <ramartin@redhat.com>	2018-03-29 09:32:40 +02:00
Alfredo Deza	3fcf966803	ceph-osd note that some scenarios use ceph-disk vs. ceph-volume Signed-off-by: Alfredo Deza <adeza@redhat.com>	2018-03-29 09:11:33 +02:00
John Fulton	e6e6bd078a	Refer to expected-num-objects as expected_num_objects, not size Follow up patch to PR 2432 [1] which replaces "size" (sorry if the original bug used that term, which can be confusing) with expected_num_objects as is used in the Ceph documentation [2]. [1] https://github.com/ceph/ceph-ansible/pull/2432/files [2] http://docs.ceph.com/docs/jewel/rados/operations/pools	2018-03-26 15:41:51 +02:00
Ning Yao	691ddf5349	cleanup osd.conf.j2 in ceph-osd osd crush location is set by ceph_crush in the library, osd.conf.j2 is not used any more. Signed-off-by: Ning Yao <yaoning@unitedstack.com>	2018-03-26 15:57:37 +08:00
Patrick Donnelly	7f91547304	setup cephx keys when not nfs_obj_gw Copy the admin key when configured nfs_file_gw (but not nfs_obj_gw). Also, copy/setup RGW related directories only when configured as nfs_obj_gw. Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>	2018-03-22 14:01:08 +01:00
Andrew Schoen	6cffbd5409	ceph-defaults: set is_atomic variable This variable is needed for containerized clusters and is required for the ceph-docker-common role. Typically the is_atomic variable is set in site-docker.yml.sample though so if ceph-docker-common is used outside of that playbook it needs set in another way. Moving the creation of the variable inside this role means playbooks don't need to worry about setting it. fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1558252 Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-03-21 19:16:11 +01:00
Andy McCrae	388562a4af	Simplify ceph.conf generation Since the approach to creating a ceph.conf file has changed, and now no-longer relies on assembling config file fragments in /etc/ceph/ceph.d we can avoid the conf_overrides rendering on the local host and skip out the tasks related to that, instead using just the config_template task to configure the file directly.	2018-03-15 15:47:41 +01:00
Sébastien Han	e3275c1ca1	osd: add fs.aio-max-nr tuning The number of osds per nodes is limited by aio-max-nr, default is low, so we need to increase it. Full story: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-August/020408.html Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1553407 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-15 14:06:26 +01:00
Sébastien Han	f432819c1e	osd: apply sysctl right away Without sysctl_set: yes the sysctl tuning will only get applied on the sysctl.conf but not on the fly. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-15 14:06:26 +01:00
Sébastien Han	0f8a4251ba	move system tuning to osd role The changes from these tasks only apply to osd nodes so there is no reason to have them in ceph-common. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-15 14:06:26 +01:00
Sébastien Han	f119b25bbe	client: implement proper pools creation Just like we did for the monitor and openstack_config we now have the ability to precisely create pools. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-14 14:22:00 +01:00
Sébastien Han	e302c1baae	mon: add support for erasure code pool You can now specify type: erasure and erasure_profile to use when declaring the pool dictionary. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-14 14:22:00 +01:00
Sébastien Han	277d885bc9	mon: add support for pgp, pool type and rule name When creating pools, it's crucial to expose all the options available as part of the pool creation command. As explained in: http://docs.ceph.com/docs/jewel/rados/operations/pools/ Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-14 14:22:00 +01:00
Sébastien Han	26bc00fb74	mon: fail if pool creation fails There is no reason to continue the deployment if these tasks fail. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1546185 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-14 14:22:00 +01:00
Sébastien Han	0011edd2bc	mon: add support for expected-num-objects This commit adds the support for expected-num-objects when creating a pool. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1541520 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-14 14:22:00 +01:00
Sébastien Han	18402b636f	defaults: add useful info if daemon are not restarted properly If OSDs don't restart normally we now also dump info of the crush map, crush rules, crush tree and pools. If the monitors don't restart normally we also print the socket status by calling mon_status and quorum_status. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-14 14:22:00 +01:00
jtudelag	691f7c5146	Adds handy ceph aliases for containerized installations. Same approach as openshift-ansible etcdctl: * https://github.com/openshift/openshift-ansible/blob/release-3.7/roles/etcd/tasks/auxiliary/drop_etcdctl.yml * https://github.com/openshift/openshift-ansible/blob/release-3.7/roles/etcd/etcdctl.sh	2018-03-08 13:56:39 +01:00
Guillaume Abrioux	9181c94adf	client: fix pgs num for client pool creation The `pools` dict defined in `roles/ceph-client/defaults/main.yml` shouldn't have `{{ ceph_conf_overrides.global.osd_pool_default_pg_num }}` as default value for `pgs` keys. For instance, if you want some pools to be created but without explicitly specifying the pgs for these pools (it means you want to use the `osd_pool_default_pg_num`), you will be obliged to define `{{ ceph_conf_overrides.global.osd_pool_default_pg_num }}` anyway while you wanted to use the current default value already defined in the cluster which is retrieved early in the playbook and stored in the `{{ osd_pool_default_pg_num }}` fact. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-03-07 11:18:04 +01:00
Sébastien Han	96c049be5b	common: run updatedb task on debian systems only The command doesn't exist on Red Hat systems so it's better to skip it instead of ignoring the error. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-06 15:24:31 +00:00
Sébastien Han	a52ed43093	mon: fix osd_pool_default_crush_rule persistence and effectiveness Running the last portion (insert new default and add new default crush tasks) of crush_rules.yml only on the last monitor is wrong since ceph CLI calls usually end up on the master having the quorum, which is by default the one with the lower IP. So if we run the command and end up on another mon the creation will happen on the default crush rule because the particular mon hasn't been updated. To fix this we remove the \|last on the include and use run_once: true on certain tasks, then we let the final two tasks run on all the monitors. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-06 15:24:31 +00:00
Sébastien Han	47cef7a41d	mon: fix set crush default rule On releases after jewel the option 'osd_pool_default_crush_replicated_ruleset' does not exist anymore, it's called osd_pool_default_crush_rule. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-06 15:24:31 +00:00
Sébastien Han	3261ab23b8	osd: remove old crush_location implementation This was causing a lot of pain with the handlers. Also the implementation was not ideal since we were assembling files. Everything can now be done with the ceph_crush module so let's remove that. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-06 15:24:31 +00:00
Sébastien Han	73c4846744	mon: use ceph_crush module in the playbook Instead of creating the CRUSH hierarchy with Ansible tasks using the command module we now rely on the ceph_crush module. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-06 15:24:31 +00:00
Greg Charot	78c1f1938f	mons: Current crush_rule playbook does not work if there is no default rule defined (default: true). One could want to add new crush rules while keeping his current default rule. Fixed it so that it works with all rules defined as "default: false". If multiple rules are defined as default (should not be) then the last rule listed in "crush_rules" is taken as default.	2018-03-06 15:24:31 +00:00
Greg Charot	77f9c1df10	no reason the ceph-ansible ansible default provided crush_rule_hdd rule should be set as rack root + default ruleset	2018-03-06 15:24:31 +00:00
Greg Charot	50afc3fbf3	We don't want to automatically move the rbd pool to the new default crush rule. This operation shall be performed by the cluster operator.	2018-03-06 15:24:31 +00:00
Andy McCrae	04ca685ba7	Remove vars that are no longer used As part of `fcba2c801a` these vars were removed and no longer do anything: radosgw_dns_name radosgw_resolve_cname This patch removes them from the group_vars files and defaults/main.yml	2018-03-06 09:16:25 +01:00
jtudelag	c3267b77b7	Makes use of docker_exec_cmd in ceph-mon role. Keeps consistency inside the role and among roles. Makes the code more readable.	2018-03-05 12:48:35 +00:00
Sébastien Han	cb0f598965	common: run updatedb task on debian systems only The command doesn't exist on Red Hat systems so it's better to skip it instead of ignoring the error. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-02 20:59:10 +00:00
Sébastien Han	7f19df8196	rgw: add cluster name option to the handler If the cluster name is different than 'ceph', the command will fail so we need to pass the cluster name. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-02 20:59:10 +00:00
Sébastien Han	9c85280602	rgw: ability to copy ceph admin key on containerized If we now set copy_admin_key while running a containerized scenario, the ceph admin key will be copied on the node. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-02 20:59:10 +00:00
Sébastien Han	67f46d8ec3	rgw: run the handler on a mon host In case the admin wasn't copied over to the node this command would fail. So it's safer to run it from a monitor directly. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-02 20:59:10 +00:00
Guillaume Abrioux	6d35bc9bde	client: use `ceph_uid` fact to set uid/gid on admin key That task is failing on containerized deployment because `ceph:ceph` doesn't exist. The idea here is to use the `{{ ceph_uid }}` to set the ownerships for the admin keyring when containerized_deployment. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1540578 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-02-26 15:52:05 +01:00
Grant Slater	1e1b26ca4d	mds: fix ansible_service_mgr typo This commit fixes a typo introduced by `4671b9e74e`	2018-02-26 13:05:14 +01:00
Andy McCrae	c33dae7509	Revert "[TEST] Test setting up correct systemd file for nfs-ganesha" The nfs-ganesha package has been fixed as part of this commit: `963b6681df` Once the package is rebuilt this should be good to merge. This reverts commit `e88af3c4cb`.	2018-02-26 10:23:42 +01:00
Giulio Fidente	a83e1aeea3	Make rule_name optional when defining items in openstack_pools Previously it was necessary to provide a value (eventually an empty string) for the "rule_name" key for each item in openstack_pools. This change makes that optional and defaults to empty string when not given.	2018-02-23 15:11:53 +01:00
Sébastien Han	165d9dec10	remove kernel.pid_max This is now managed by Ceph packages. See: https://github.com/ceph/ceph/pull/18544/files http://tracker.ceph.com/issues/21929 Closes: https://github.com/ceph/ceph-ansible/issues/2410 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-02-23 13:57:57 +01:00
Andy McCrae	2779d2a850	Adjust /etc/updatedb.conf to not parse /var/lib/ceph Using updatedb -e doesn't make a permanent change, but will run updatedb without the passed path. To make this change more permanent we should update the /etc/updatedb.conf file to include /var/lib/ceph.	2018-02-20 11:32:56 +01:00
Andy McCrae	e88af3c4cb	[TEST] Test setting up correct systemd file for nfs-ganesha Don't merge this. Test to see if we copy over the nfs-ganesha-lock.service.debian8 file properly, whether the Xenial CI job will work. The upstream download.ceph.com nfs-ganesha package should be fixed for xenial (which is in progress).	2018-02-20 10:49:37 +01:00
Paul Bourke	463b5c6b22	Remove redundant task to check if atomic This fact is already set in site-docker.yml so there's no need to check it again in ceph-docker-common Signed-off-by: Paul Bourke <paul.bourke@oracle.com>	2018-02-19 10:10:46 +01:00
Andy McCrae	59a4335a56	Restart services if handler called This patch fixes an issue where if hosts have different service lists, it will prevent restarting changes on services that run later on. For example, hostA in the mons and rgws group would initiate a config change and restart of services on all mons and rgws hosts, even though a separate hostB (which is only in the rgws group) has not had its configuration changed yet. Additionally, when the second host has its configuration changed as part of the ceph-rgw role, it will not initiate a restart since its inventory name != the first host's. To fix this we should run the restart once (using run_once: True) as long as the host has called the handler. This will ensure that even if only 1 host has called the handler it will initiate a restart on all hosts that have called the handler. Additionally, we add a var that is set when the handler runs, this will ensure that only hosts that have called the handler get restarted. Includes minor fix to remove unrequired "inventory_hostname in play_hosts" when: clause. This is no longer required since the handlers were changed. The host calling the handler will be in play_hosts already.	2018-02-16 10:40:20 +01:00
Sébastien Han	c816a9282c	container: osd remove run_once When used along with delegate, run_once does not belong well. Thus, using \| last always brings the desired result. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-02-14 02:01:29 +01:00
Sébastien Han	d47d02a5eb	docker-common: fix container restart on new image We now look for any existing containers; if any, we compare their running image with the latest pulled container image. For OSDs, we iterate over the list of running OSDs, this handles the case where the first OSD of the list has been updated (runs the new image) and not the others. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1526513 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-02-14 02:01:29 +01:00
Sébastien Han	ebc195487c	default: remove duplicate code This is already defined in ceph-defaults. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-02-14 02:01:29 +01:00
Caleb Boylan	0be60456ce	osd: Add support for multipath disks Multipath disks have partitions with a different format than what ceph-ansible currently supports, this update makes ceph-ansible aware of that format so multipath disks can be used as OSDs Signed-off-by: Caleb Boylan <caleb.boylan@ormuco.com>	2018-02-09 18:06:25 +01:00
Andy McCrae	b4dbc862d6	Set application for OpenStack pools Since Luminous we need to set the application tag for each pool, otherwise a CEPH_WARNING is generated when the pools are in use. We should assign the OpenStack pools to their default which would be "rbd". When updating to Luminous this would happen automatically to the vms, images, backups and volumes pools, but for new deploys this is not the case.	2018-02-09 17:15:55 +01:00
Sébastien Han	22f843e3d4	default: define 'osd_scenario' variable osd_scenario does not exist in the ceph-default role so if we try to play ceph-default on an OSD node, the playbook will fail with undefined variable. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-02-08 17:42:12 +01:00
Guillaume Abrioux	e537779bb3	osd: fix osd restart when dmcrypt This commit fixes a bug that occurs especially for dmcrypt scenarios. There is an issue where the 'disk_list' container can't reach the ceph cluster because it's not launched with `--net=host`. If this container can't reach the cluster, it will hang on this step (when trying to retrieve the dm-crypt key) : ``` +common_functions.sh:448: open_encrypted_part(): ceph --cluster abc12 --name \ client.osd-lockbox.9138767f-7445-49e0-baad-35e19adca8bb --keyring \ /var/lib/ceph/osd-lockbox/9138767f-7445-49e0-baad-35e19adca8bb/keyring \ config-key get dm-crypt/osd/9138767f-7445-49e0-baad-35e19adca8bb/luks +common_functions.sh:452: open_encrypted_part(): base64 -d +common_functions.sh:452: open_encrypted_part(): cryptsetup --key-file \ -luksOpen /dev/sdb1 9138767f-7445-49e0-baad-35e19adca8bb ``` It means the `ceph-run-osd.sh` script won't be able to start the `osd_disk_activate` process in ceph-container because he won't have filled the `$DOCKER_ENV` environment variable properly. Adding `--net=host` to the 'disk_list' container fixes this issue. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1543284 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-02-08 15:45:13 +01:00
Giulio Fidente	bdcc52b96d	Check for docker sockets named after both _hostname or _fqdn While hostname -f will always return an hostname including its domain part and -s without the domain part, the behavior when no arguments are given can include or not include the domain part depending on how the system is configured; the socket name might not match the instance name then.	2018-02-06 14:16:54 +01:00
Greg Charot	a6d1922a2e	mon: Fixed crush_rule_config for containerised deployment. Was called too early, container was not yet started so the commands failed. Moved the section after include docker/main.yml Signed-off-by: Greg Charot <gcharot@redhat.com>	2018-02-06 05:12:59 +01:00
Guillaume Abrioux	dd0c98c5a2	common: do not use `shell` module when it is not needed There is no need here to use `shell` instead of `command` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-31 10:45:34 +01:00
Guillaume Abrioux	deaf273b25	syntax: change local_action syntax Use a nicer syntax for `local_action` tasks. We used to have oneliner like this: ``` local_action: wait_for port=22 host={{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }} state=started delay=10 timeout=500 }} ``` The usual syntax: ``` local_action: module: wait_for port: 22 host: "{{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }}" state: started delay: 10 timeout: 500 ``` is nicer and kind of way to keep consistency regarding the whole playbook. This also fix a potential issue about missing quotation : ``` Traceback (most recent call last): File "/tmp/ansible_wQtWsi/ansible_module_command.py", line 213, in <module> main() File "/tmp/ansible_wQtWsi/ansible_module_command.py", line 185, in main rc, out, err = module.run_command(args, executable=executable, use_unsafe_shell=shell, encoding=None, data=stdin) File "/tmp/ansible_wQtWsi/ansible_modlib.zip/ansible/module_utils/basic.py", line 2710, in run_command File "/usr/lib64/python2.7/shlex.py", line 279, in split return list(lex) File "/usr/lib64/python2.7/shlex.py", line 269, in next token = self.get_token() File "/usr/lib64/python2.7/shlex.py", line 96, in get_token raw = self.read_token() File "/usr/lib64/python2.7/shlex.py", line 172, in read_token raise ValueError, "No closing quotation" ValueError: No closing quotation ``` writing `local_action: shell echo {{ fsid }} \| tee {{ fetch_directory }}/ceph_cluster_uuid.conf` can cause trouble because it's complaining with missing quotes, this fix solves this issue. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1510555 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-31 10:45:34 +01:00
Sébastien Han	6f9dd26caa	config: remove any spaces in public_network or cluster_network With two public networks configured - we found that with "NETWORK_ADDR_1, NETWORK_ADDR_2" install process consistently became broken, trying to find docker registry on second network, and not finding mon container. but without spaces "NETWORK_ADDR_1,NETWORK_ADDR_2" install succeeds so, containerized install is more peculiar with formatting of this line Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1534003 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-01-30 17:47:15 +01:00
Sébastien Han	5132cc3de4	Do not search osd ids if ceph-volume Description of problem: The 'get osd id' task goes through all the 10 times (and its respective timeouts) to make sure that the number of OSDs in the osd directory match the number of devices. This happens always, regardless if the setup and deployment is correct. Version-Release number of selected component (if applicable): Surely the latest. But any ceph-ansible version that contains ceph-volume support is affected. How reproducible: 100% Steps to Reproduce: 1. Use ceph-volume (LVM) to deploy OSDs 2. Avoid using anything in the 'devices' section 3. Deploy the cluster Actual results: TASK [ceph-osd : get osd id _uses_shell=True, _raw_params=ls /var/lib/ceph/osd/ \| sed 's/.*-//'] ********************************************************************************************************************************************* task path: /Users/alfredo/python/upstream/ceph/src/ceph-volume/ceph_volume/tests/functional/lvm/.tox/xenial-filestore-dmcrypt/tmp/ceph-ansible/roles/ceph-osd/tasks/start_osds.yml:6 FAILED - RETRYING: get osd id (10 retries left). FAILED - RETRYING: get osd id (9 retries left). FAILED - RETRYING: get osd id (8 retries left). FAILED - RETRYING: get osd id (7 retries left). FAILED - RETRYING: get osd id (6 retries left). FAILED - RETRYING: get osd id (5 retries left). FAILED - RETRYING: get osd id (4 retries left). FAILED - RETRYING: get osd id (3 retries left). FAILED - RETRYING: get osd id (2 retries left). FAILED - RETRYING: get osd id (1 retries left). ok: [osd0] => { "attempts": 10, "changed": false, "cmd": "ls /var/lib/ceph/osd/ \| sed 's/.*-//'", "delta": "0:00:00.002717", "end": "2018-01-21 18:10:31.237933", "failed": true, "failed_when_result": false, "rc": 0, "start": "2018-01-21 18:10:31.235216" } STDOUT: 0 1 2 Expected results: There aren't any (or just a few) timeouts while the OSDs are found Additional info: This is happening because the check is mapping the number of "devices" defined for ceph-disk (in this case it would be 0) to match the number of OSDs found. Basically this line: until: osd_id.stdout_lines\|length == devices\|unique\|length Means in this 2 OSD case it is trying to ensure the following incorrect condition: until: 2 == 0 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1537103	2018-01-30 14:44:38 +01:00
Andy McCrae	481173f203	Add default for radosgw_keystone_ssl This should default to False. The default for Keystone is not to use PKI keys, additionally, anybody using this setting had to have been manually setting it before. Fixes: #2111	2018-01-30 11:30:23 +01:00
Guillaume Abrioux	f1232b33fd	Revert "monitor_interface: document need to use monitor_address when using IPv6" This reverts commit `10b91661ce`. This reverts also the same comment added in `1359869497`	2018-01-29 14:43:24 +01:00
Eduard Egorov	93e9f3723b	config: add host-specific ceph_conf_overrides evaluation and generation. This allows us to use host-specific variables in ceph_conf_overrides variable. For example, this fixes usage of such variables (e.g. 'nss db path' having {{ ansible_hostname }} inside) in ceph_conf_overrides for rados gateway configuration (see profiles/rgw-keystone-v3) - issue #2157. Signed-off-by: Eduard Egorov <eduard.egorov@icl-services.com>	2018-01-26 10:15:03 +01:00
Guillaume Abrioux	ec16cbdb1a	defaults: avoid getting stuck (ceph --connect-timeout) Sometimes the playbook gets stuck because even with the `--connect-timeout=` option, the connection to the existing ceph cluster never times out. As a workaround, using the `timeout` command provided by coreutils will actually time out if we can't connect to the cluster. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1537003 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-25 10:15:59 +01:00
Andrew Schoen	79473badfe	ceph-osd: adds dmcrypt to the lvm scenario Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-01-24 14:10:08 +01:00
Guillaume Abrioux	9306a1789c	osds: change default value for `dedicated_devices` This is to keep backward compatibility with stable-2.2 and satisfy the check "verify dedicated devices have been provided" in `check_mandatory_vars.yml`. This check is looking for `dedicated_devices` so we need to default its value to `raw_journal_devices` when `raw_multi_journal` is set to `True`. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1536098 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-22 18:02:51 +01:00
Sébastien Han	f88795e843	rgw: disable legacy unit Some systems that were deployed with old tools can leave units named "ceph-radosgw@radosgw.gateway.service". As a consequence, they will prevent the new unit to start. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1509584 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-01-18 14:12:18 +01:00
Andrew Schoen	fb4a6dc9a4	docs for the crush_device_class option of lvm_volumes Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-01-17 13:49:29 +01:00
Andrew Schoen	6cbb56a3b6	ceph-osd: adds the crush_device_class param to the lvm scenario Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-01-17 13:49:29 +01:00
Eduard Egorov	7d7080df6c	crush: create rack type buckets and build crush tree according to {{ osd_crush_location }}. Currently, we can define crush location for each host but only crush roots and crush rules are created. This commit automates other routines for a complete solution: 1) Creates rack type crush buckets defined in {{ ceph_crush_rack }} of each osd host. If it's not defined by user then a rack named 'default_rack_{{ ceph_crush_root }}' would be added and used in next steps. 2) Move rack type crush buckets defined in {{ ceph_crush_rack }} into crush roots defined in {{ ceph_crush_root }} of each osd host. 3) Move hosts defined in {{ ceph_crush_rack }} into crush roots defined in {{ ceph_crush_root }} of each osd host. Signed-off-by: Eduard Egorov <eduard.egorov@icl-services.com>	2018-01-11 17:42:18 +01:00
Sébastien Han	6db4aea453	osd: skip devices marked as '/dev/dead' On a non-collocated scenario, if a drive is faulty we can't really remove it from the list of 'devices' without messing up or having to re-arrange the order of the 'dedicated_devices'. We want to keep this device list ordered. This will prevent the activation failing on a device that we know is failing but we can't remove it yet to not mess up the dedicated_devices mapping with devices. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-01-11 17:34:32 +01:00
Guillaume Abrioux	70401f955b	container: trigger handlers on systemd file change When a systemd unit file is changed we should trigger handlers to restart the services. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-10 16:46:42 +01:00
Guillaume Abrioux	b29a42cba6	handlers: avoid duplicate handler Having handlers in both ceph-defaults and ceph-docker-common roles can make the playbook restart services twice. Handlers can be triggered a first time because of a change in ceph.conf and a second time because a new image has been pulled. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-10 16:46:42 +01:00
Sébastien Han	8a19a83354	container: restart container when there is a new image There was no good way to implement this. We had several options and none of them were ideal since handlers cannot be triggered across roles. We could have achieved that by doing: * option 1 was to add a dependency in the meta of the ceph-docker-common role. We had that long ago and we decided to stop so everything is managed via site.yml * option 2 was to import files from another role. This is messy and we don't do that anywhere in the current code base. We will continue to avoid it. There is option 3 where we pull the image from the ceph-config role. This is not suitable either since the docker command won't be available unless you run an Atomic distro. This would also mean that you're trying to pull twice: a first time in ceph-config, a second time in ceph-docker-common. The only option I came up with was to duplicate a bit of the ceph-config handlers code. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1526513 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-01-10 16:46:42 +01:00
Guillaume Abrioux	900f447c82	containers: fix bug when looking for existing cluster In a containerized deployment, `docker_exec_cmd` is not set before the task that tries to retrieve the current fsid is played, so it considers that there is no existing fsid and tries to generate a new one. Typical error: ``` ok: [mon0 -> mon0] => { "changed": false, "cmd": [ "ceph", "--connect-timeout", "3", "--cluster", "test", "fsid" ], "delta": "0:00:00.179909", "end": "2018-01-09 10:36:58.759846", "failed": false, "failed_when_result": false, "rc": 1, "start": "2018-01-09 10:36:58.579937" } STDERR: Error initializing cluster client: Error('error calling conf_read_file: errno EINVAL',) ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-10 16:23:18 +01:00
Sébastien Han	c2e04623a5	container: change the way we force no logs inside the container Previously we were using ceph_conf_overrides; however, this doesn't play nicely with software like TripleO that uses ceph_conf_overrides inside its own code. For now, and since this is the only occurrence of this, we can ensure no logs through the ceph conf template. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1532619 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-01-10 16:21:47 +01:00
Guillaume Abrioux	acfbebe67e	defaults: rename check_socket files for containers In a containerized deployment, we are not looking for a socket but for a running container. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-10 15:44:47 +01:00
Sébastien Han	f0787e64da	mon: use crush rules for non-container too There is no reason why we can't use crush rules when deploying containers. So moving the include into the main.yml so it can be called. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-01-10 15:21:36 +01:00
Sébastien Han	97f520bc74	containers: bump memory limit A default value of 4GB for MDS is more appropriate and 3GB for OSD also. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1531607 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-01-09 11:26:50 +01:00
Sébastien Han	0b55abe3d0	mon: always run ceph-create-keys ceph-create-keys is idempotent so it's not an issue to run it each time we play ansible. This also fixes issues where the 'creates' arg skips the task and no keys get generated on a newer version, e.g. during an upgrade. Closes: https://github.com/ceph/ceph-ansible/issues/2228 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-12-21 13:50:01 +01:00
Sébastien Han	ad54e19262	rgw: disable legacy rgw service unit When upgrading from OSP11 to OSP12 container, ceph-ansible attempts to disable the RGW service provided by the overcloud image. The task attempts to stop/disable ceph-rgw@{{ ansible-hostname }} and ceph-radosgw@{{ ansible-hostname }}.service. The actual service name is ceph-radosgw@radosgw.$name Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1525209 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-12-21 13:48:42 +01:00
Guillaume Abrioux	895949d6c4	osd: fix check gpt the gpt label creation doesn't work even with parted module. This commit fixes the gpt label creation by using parted command instead. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-12-20 17:42:45 +01:00
Sébastien Han	bbc79765f3	osd: best effort if no device is found during activation We have a scenario when we switch from non-container to containers. This means we don't know anything about the ceph partitions associated to an OSD. Normally in a containerized context we have files containing the preparation sequence. From these files we can get the capabilities of each OSD. As a last resort we use a ceph-disk call inside a dummy bash container to discover the ceph journal on the current osd. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1525612 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-12-19 14:40:48 +01:00
Sébastien Han	dfbef8361d	nfs: fix package install for debian/suse systems This resolves the following error: E: There were unauthenticated packages and -y was used without --allow-unauthenticated Signed-off-by: Sébastien Han <seb@redhat.com>	2017-12-19 13:30:49 +01:00
Christian Berendt	50a848dc40	Rename fact docker_version to ceph_docker_version The name docker_version is very generic and is also used by other roles. As a result, there may be name conflicts. To avoid this a ceph_ prefix should be used for this fact. Since it is an internal fact renaming is not a problem.	2017-12-15 20:12:21 +01:00
Markos Chandras	162b7d2b23	roles: ceph-mgr: Install the ceph-mgr package on SUSE The ceph-mgr package name is identical to RedHat so add the SUSE family to the existing task.	2017-12-15 09:22:14 +01:00
Guillaume Abrioux	a24fd1cfd9	client: don't make `osd_pool_default_pg_num` mandatory Making `osd_pool_default_pg_num` mandatory is a bit aggressive and is irrelevant when you just want to create user keyrings. Closes: #2241 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-12-14 17:22:07 +01:00
Guillaume Abrioux	ab1dd3027a	client: don't try to generate keys The entrypoint to generate user keyrings is `ceph-authtool`; therefore, it can't expand the `$(ceph-authtool --gen-print-key)` inside the container. Users must generate a keyring themselves. This commit also adds a check to ensure keyrings are properly filled when `user_config: true`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-12-14 17:22:07 +01:00
Guillaume Abrioux	26afe46e13	docker: add missing condition for selinux tasks On the `client` and `mds` roles, it tries to set selinux even on non-RHEL-based distributions. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-12-14 17:00:14 +01:00
Sébastien Han	7eaf444328	default: look for the right return code on socket stat in-use As reported in https://github.com/ceph/ceph-ansible/issues/2254, the check with fuser is not ideal. If fuser is not available the return code is 127. Here we want to make sure that we are looking for the correct return code, so 1. Closes: https://github.com/ceph/ceph-ansible/issues/2254 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-12-14 16:59:14 +01:00
John Fulton	8cba44262c	Add flags for OSD 'docker run --cpuset-{cpus,mems}' Add the variables ceph_osd_docker_cpuset_cpus and ceph_osd_docker_cpuset_mems, so that a user may specify the CPUs and memory nodes of NUMA systems on which OSD containers are run. Provides an example in osds.yaml.sample to guide the user based on sample `lscpu` output, since cpuset-mems refers to the memory by NUMA node only while cpuset-cpus can refer to individual vCPUs within a NUMA node.	2017-12-14 16:39:35 +01:00

1781 Commits (4baa8389e03d9014a44f53137207b9560546511e)