ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Guillaume Abrioux	a68091c923	tests: update the type for the rule used in pools As of ceph 12.2.5 the type of the parameter `type` is not a name anymore but an id, therefore an `int` is expected otherwise it will fail with the following error Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-04-30 08:15:18 +02:00
Sébastien Han	12eebc31fb	mon/client: honor key mode when copying it to other nodes The last mon creates the keys with a particular mode, while copying them to the other mons (first and second) we must re-use the mode that was set. The same applies for the client node, the slurp preserves the initial 'item' so we can get the mode for the copy. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-23 18:34:58 +02:00
Sébastien Han	74494253fa	mon: remove redundant copy task We had twice the same task, also one was overriding the mode. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-23 18:34:58 +02:00
Sébastien Han	85732d11b9	mon/client: remove acl code Applying ACL on the keyrings is not used anymore so let's remove this code. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-23 18:34:58 +02:00
Sébastien Han	cfe8e51d99	mon/client: apply mode from ceph_key Do not use a dedicated task for this but use the ceph_key module capability to set file mode. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-23 18:34:58 +02:00
Di Xu	113eb25424	add AArch64 to supported architecture works on AArch64 platform	2018-04-23 10:23:21 +02:00
Sébastien Han	949507d304	mon: remove mgr key from ceph_config_keys This key is created after the last mon is up so there is no need to try to push it from the first mon. The initial mon container is not creating the mgr key, ansible does. So this key will never exist. The key will go into the fetch dir once the last mon is up, then when the ceph-mgr plays it will try to get it from the fetch directory. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-23 10:17:24 +02:00
Sébastien Han	35c1eb7183	mon: remove mon map from ceph_config_keys During the initial bootstrap of the first mon, the monmap file is destroyed so it's not available and ansible will never find it. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-23 10:17:24 +02:00
Sébastien Han	65ba85aff6	Expose /var/run/ceph Useful for software that does data collection/monitoring like collectd. It can connect to the socket and then retrieve information. Even though the sockets are exposed now, I'm keeping the docker exec to check the socket, this will allow newer versions of ceph-ansible to work with older versions. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1563280 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-20 15:48:32 +02:00
Sébastien Han	bf1e70e8cf	default: extend ceph_uid and gid We now have the ability to detect the uid/gid of the ceph user depending on the distribution we are running on and so we are doing non-container deployments. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-20 15:48:32 +02:00
Sébastien Han	f3656ad167	move create ceph initial directories to default This is needed for both non-container and container deployments. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-20 15:48:32 +02:00
Sébastien Han	641f141c0f	selinux: remove chcon calls We now bind-mount with the :z option at the end of the -v command so this will basically run the exact same command as we used to run. So to speak: chcon -Rt svirt_sandbox_file_t /var/lib/ceph Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-19 14:59:37 +02:00
Sébastien Han	90e47c5fb0	client: add a --rm option to run the container This fixes the case where the playbook died and never removed the container. So now, once the container exits it will remove itself from the container list. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1568157 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-19 14:59:37 +02:00
Sébastien Han	6c742376fd	client: import the key in ceph if copy_admin_key is true If the user has set copy_admin_key to true we assume he/she wants to import the key in Ceph and not only create the key on the filesystem. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-18 17:46:54 +02:00
Sébastien Han	424815501a	client: add quotes to the dict values ceph-authtool does not support raw arguments so we have to quote caps declaration like this allow 'bla bla' instead of allow bla bla Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1568157 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-18 17:46:54 +02:00
Sébastien Han	d2a2793cb0	refactor the way we copy keys This commit does a couple of things: * use a common.yml file that contains things that can be played on both container and non-container * refactor the ability to copy the admin key to the nodes Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-18 16:46:33 +02:00
Randy J. Martinez	127a643fd0	ceph-defaults: fix ceph_uid fact on container deployments Red Hat is now using tags[3,latest] for image rhceph/rhceph-3-rhel7. Because of this, the ceph_uid conditional passes for Debian when 'ceph_docker_image_tag: latest' on RH deployments. I've added an additional task to check for rhceph image specifically, and also updated the RH family task for ceph/daemon [centos\|fedora]tags. Signed-off-by: Randy J. Martinez <ramartin@redhat.com>	2018-04-17 16:54:51 +02:00
Sébastien Han	a98885a71e	rhcs: re-add apt-pinning When installing rhcs on Debian systems the red hat repos must have the highest priority so we avoid package conflicts and install the rhcs version. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1565850 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-17 16:07:06 +02:00
Guillaume Abrioux	899b0eb451	defaults: check only 1 time if there is a running cluster There is no need to check for a running cluster n*nodes time in `ceph-defaults` so let's add a `run_once: true` to save some resources and time. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-04-16 11:23:00 +02:00
Sébastien Han	5bbbce527e	osd: do not do anything if the dev has a partition Regardless if the partition is 'ceph' or something else, we don't want to be as strict as checking for a particular partition. If the drive has a partition, we just don't do anything. This solves the case where the server reboots, disks get a different /dev/sda (node) allocation. In this case, prior to restarting the server /dev/sda was an OSD, but now it's /dev/sdb and the other way around. In such a scenario, we will try to prepare the OSD and create a new partition, so let's not mess around with devices that have partitions. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1498303 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-13 19:11:15 +02:00
Sébastien Han	37117071eb	common: add tools repo for iscsi gw To install iscsi gw packages we need to enable the tools repo. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1547849 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-12 13:38:34 +02:00
Douglas Fuller	c8573fe0d7	Remove deprecated allow_multimds allow_multimds will be officially deprecated in Mimic, specify it only for all versions of Ceph where it was declared stable. Going forward, specify only max_mds. Signed-off-by: Douglas Fuller <dfuller@redhat.com>	2018-04-12 10:29:17 +02:00
vasishta p shastry	020e66c1b4	Fixed a typo (extra space)	2018-04-11 14:21:15 +02:00
vasishta p shastry	e1a1f81b6f	osd: to support copy_admin_key	2018-04-11 14:21:15 +02:00
vasishta p shastry	db3a5ce6d9	mds: to support copy_admin_keyring	2018-04-11 14:21:15 +02:00
vasishta p shastry	6b59416f75	nfs: to support copy_admin_key - containerized	2018-04-11 14:21:15 +02:00
Ali Maredia	01c58695fc	nfs: ensure nfs-server is stopped NFS-ganesha cannot start if the nfs-server service is running. This commit stops nfs-server in case it is running on a (debian, redhat, suse) node before the nfs-ganesha service starts up fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1508506 Signed-off-by: Ali Maredia <amaredia@redhat.com>	2018-04-11 14:00:48 +02:00
Ramana Raja	4a430ae29a	ceph-nfs: allow disabling ganesha caching Add a variable, ceph_nfs_disable_caching, that if set to true disables ganesha's directory and attribute caching as much as possible. Also, disable caching done by ganesha, when 'nfs_file_gw' variable is true, i.e., when Ganesha is used as CephFS's gateway. This is the recommended Ganesha setting as libcephfs already caches information. And doing so helps avoid cache incoherency issues especially with clustered ganesha over CephFS. Fixes: https://tracker.ceph.com/issues/23393 Signed-off-by: Ramana Raja <rraja@redhat.com>	2018-04-11 13:56:40 +02:00
Sébastien Han	82ccbdafbc	ceph-defaults: bring backward compatibility for old syntax If people keep on using the mon_cap, osd_cap etc the playbook will translate this old syntax on the fly. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-11 12:18:34 +02:00
Sébastien Han	9657e4d6fa	ceph_key: use ceph_key in the playbook Replaced all occurrences of raw commands using the 'command' module with the ceph_key module instead. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-11 12:18:34 +02:00
Guillaume Abrioux	66c4118dcd	defaults: fix backward compatibility backward compatibility with `ceph_mon_docker_interface` and `ceph_mon_docker_subnet` was not working since there wasn't lookup on `monitor_interface` and `public_network` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-04-10 00:19:11 +02:00
Ken Dreyer	3752cc6f38	common: upgrade/install ceph-test RPM first Prior to this change, if a user had ceph-test-12.2.1 installed, and upgraded to ceph v12.2.3 or newer, the RPM upgrade process would fail. The problem is that the ceph-test RPM did not depend on an exact version of ceph-common until v12.2.3. In Ceph v12.2.3, ceph-{osdomap,kvstore,monstore}-tool binaries moved from ceph-test into ceph-base. When ceph-test is not yet up-to-date, Yum encounters package conflicts between the older ceph-test and newer ceph-base. When all users have upgraded beyond Ceph < 12.2.3, this is no longer relevant.	2018-04-09 18:09:52 +02:00
Sébastien Han	bb60f2fea4	ceph-defaults: fix ceph_uid for container image tag latest According to our recent change, we now use "CentOS" as a latest container image. We need to reflect this on the ceph_uid. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-09 13:54:55 +02:00
Zack Cerza	0123d790cd	Use the CentOS repo for Red Hat dev packages No use even trying to use something that doesn't exist. Signed-off-by: Zack Cerza <zack@redhat.com>	2018-04-09 10:05:57 +02:00
Attila Fazekas	ecd3563c21	Deploying without managed monitors failed Tripleo deployment failed when the monitors were not managed by tripleo itself, with: FAILED! => {"msg": "list object has no element 0"} The failing play item was introduced by `f46217b69a` . fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1552327 Signed-off-by: Attila Fazekas <afazekas@redhat.com>	2018-04-04 18:16:46 +02:00
Guillaume Abrioux	dcf6a246a4	defaults: remove `run_once: true` when creating fetch_directory because of `serial: 1`, it can be an issue when the playbook is being run on client nodes. Since the refact of `ceph-client` we skip the role `ceph-defaults` on every node except the first client node, it means that the task is not going to be played because of `run_once: true`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-04-04 10:51:17 +02:00
Guillaume Abrioux	18c0c7a508	config: use fact `ceph_uid` Use fact `ceph_uid` in the task which ensures `/etc/ceph` exists in containerized deployments. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-04-04 10:51:17 +02:00
Guillaume Abrioux	9c979c6390	clients: refact `ceph-clients` role This commit refacts this role so we don't have to pull container image on client nodes just to create pools and keys. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1550977 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-04-04 10:51:17 +02:00
Guillaume Abrioux	cefd471967	client: remove legacy code This seems to be a leftover. This commit removes an unnecessary 'set linux permissions' on `/var/lib/ceph` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-04-04 10:51:17 +02:00
Guillaume Abrioux	cf27c5e941	move selinux check to `ceph-defaults` This check is alone in `ceph-docker-common` since a previous code refactor. Moving this check in `ceph-defaults` allows us to run `ceph-clients` without having to run `ceph-docker-common` even in non-containerized deployment. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-04-04 10:51:17 +02:00
Sébastien Han	f3caee8460	ceph-iscsi: fix certificates generation and distribution Prior to this patch, the certificates were being generated on a single node only (because of the run_once: true). Thus certificates were not distributed on all the gateway nodes. This would require a second ansible run to work. This patch fixes the creation and the keys' distribution on all the nodes. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1540845 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-04 09:27:39 +02:00
Randy J. Martinez	ca572a11f1	ceph-mds: delete duplicate tasks which cause multimds container deployments to fail. This update will resolve error['cephfs' is undefined.] in multimds container deployments. See: roles/ceph-mon/tasks/create_mds_filesystems.yml. The same last two tasks are present there, and actually need to happen in that role since "{{ cephfs }}" gets defined in roles/ceph-mon/defaults/main.yml, and not roles/ceph-mds/defaults/main.yml. Signed-off-by: Randy J. Martinez <ramartin@redhat.com>	2018-03-29 09:32:40 +02:00
Alfredo Deza	3fcf966803	ceph-osd note that some scenarios use ceph-disk vs. ceph-volume Signed-off-by: Alfredo Deza <adeza@redhat.com>	2018-03-29 09:11:33 +02:00
John Fulton	e6e6bd078a	Refer to expected-num-objects as expected_num_objects, not size Follow up patch to PR 2432 [1] which replaces "size" (sorry if the original bug used that term, which can be confusing) with expected_num_objects as is used in the Ceph documentation [2]. [1] https://github.com/ceph/ceph-ansible/pull/2432/files [2] http://docs.ceph.com/docs/jewel/rados/operations/pools	2018-03-26 15:41:51 +02:00
Ning Yao	691ddf5349	cleanup osd.conf.j2 in ceph-osd osd crush location is set by ceph_crush in the library, osd.conf.j2 is not used any more. Signed-off-by: Ning Yao <yaoning@unitedstack.com>	2018-03-26 15:57:37 +08:00
Patrick Donnelly	7f91547304	setup cephx keys when not nfs_obj_gw Copy the admin key when configured nfs_file_gw (but not nfs_obj_gw). Also, copy/setup RGW related directories only when configured as nfs_obj_gw. Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>	2018-03-22 14:01:08 +01:00
Andrew Schoen	6cffbd5409	ceph-defaults: set is_atomic variable This variable is needed for containerized clusters and is required for the ceph-docker-common role. Typically the is_atomic variable is set in site-docker.yml.sample though so if ceph-docker-common is used outside of that playbook it needs to be set in another way. Moving the creation of the variable inside this role means playbooks don't need to worry about setting it. fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1558252 Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-03-21 19:16:11 +01:00
Andy McCrae	388562a4af	Simplify ceph.conf generation Since the approach to creating a ceph.conf file has changed, and now no longer relies on assembling config file fragments in /etc/ceph/ceph.d we can avoid the conf_overrides rendering on the local host and skip out the tasks related to that, instead using just the config_template task to configure the file directly.	2018-03-15 15:47:41 +01:00
Sébastien Han	e3275c1ca1	osd: add fs.aio-max-nr tuning The number of osds per nodes is limited by aio-max-nr, default is low, so we need to increase it. Full story: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-August/020408.html Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1553407 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-15 14:06:26 +01:00
Sébastien Han	f432819c1e	osd: apply sysctl right away Without sysctl_set: yes the sysctl tuning will only get applied to sysctl.conf but not on the fly. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-15 14:06:26 +01:00
Sébastien Han	0f8a4251ba	move system tuning to osd role The changes from these tasks only apply to osd nodes so there is no reason to have them in ceph-common. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-15 14:06:26 +01:00
Sébastien Han	f119b25bbe	client: implement proper pools creation Just like we did for the monitor and openstack_config we now have the ability to precisely create pools. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-14 14:22:00 +01:00
Sébastien Han	e302c1baae	mon: add support for erasure code pool You can now specify type: erasure and erasure_profile to use when declaring the pool dictionary. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-14 14:22:00 +01:00
Sébastien Han	277d885bc9	mon: add support for pgp, pool type and rule name When creating pools, it's crucial to expose all the options available as part of the pool creation command. As explained in: http://docs.ceph.com/docs/jewel/rados/operations/pools/ Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-14 14:22:00 +01:00
Sébastien Han	26bc00fb74	mon: fail if pool creation fails There is no reason to continue the deployment if these tasks fail. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1546185 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-14 14:22:00 +01:00
Sébastien Han	0011edd2bc	mon: add support for expected-num-objects This commit adds the support for expected-num-objects when creating a pool. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1541520 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-14 14:22:00 +01:00
Sébastien Han	18402b636f	defaults: add useful info if daemons are not restarted properly If OSDs don't restart normally we now also dump info of the crush map, crush rules, crush tree and pools. If the monitors don't restart normally we also print the socket status by calling mon_status and quorum_status. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-14 14:22:00 +01:00
jtudelag	691f7c5146	Adds handy ceph aliases for containerized installations. Same approach as openshift-ansible etcdctl: * https://github.com/openshift/openshift-ansible/blob/release-3.7/roles/etcd/tasks/auxiliary/drop_etcdctl.yml * https://github.com/openshift/openshift-ansible/blob/release-3.7/roles/etcd/etcdctl.sh	2018-03-08 13:56:39 +01:00
Guillaume Abrioux	9181c94adf	client: fix pgs num for client pool creation The `pools` dict defined in `roles/ceph-client/defaults/main.yml` shouldn't have `{{ ceph_conf_overrides.global.osd_pool_default_pg_num }}` as default value for `pgs` keys. For instance, if you want some pools to be created but without explicitly specifying the pgs for these pools (it means you want to use the `osd_pool_default_pg_num`), you will be obliged to define `{{ ceph_conf_overrides.global.osd_pool_default_pg_num }}` anyway while you wanted to use the current default value already defined in the cluster which is retrieved early in the playbook and stored in the `{{ osd_pool_default_pg_num }}` fact. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-03-07 11:18:04 +01:00
Sébastien Han	96c049be5b	common: run updatedb task on debian systems only The command doesn't exist on Red Hat systems so it's better to skip it instead of ignoring the error. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-06 15:24:31 +00:00
Sébastien Han	a52ed43093	mon: fix osd_pool_default_crush_rule persistence and effectiveness Running the last portion (insert new default and add new default crush tasks) of crush_rules.yml only on the last monitor is wrong since ceph CLI calls usually end up on the master having the quorum, which is by default the one with the lower IP. So if we run the command and end up on another mon the creation will happen on the default crush rule because the particular mon hasn't been updated. To fix this we remove the \|last on the include and use run_once: true on certain tasks, then we let the final two tasks run on all the monitors. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-06 15:24:31 +00:00
Sébastien Han	47cef7a41d	mon: fix set crush default rule On releases after jewel the option 'osd_pool_default_crush_replicated_ruleset' does not exist anymore, it's called osd_pool_default_crush_rule. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-06 15:24:31 +00:00
Sébastien Han	3261ab23b8	osd: remove old crush_location implementation This was causing a lot of pain with the handlers. Also the implementation was not ideal since we were assembling files. Everything can now be done with the ceph_crush module so let's remove that. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-06 15:24:31 +00:00
Sébastien Han	73c4846744	mon: use ceph_crush module in the playbook Instead of creating the CRUSH hierarchy with Ansible tasks using the command module we now rely on the ceph_crush module. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-06 15:24:31 +00:00
Greg Charot	78c1f1938f	mons: Current crush_rule playbook does not work if there is no default rule defined (default: true). One could want to add new crush rules while keeping his current default rule. Fixed it so that it works with all rules defined as "default: false". If multiple rules are defined as default (should not be) then the last rule listed in "crush_rules" is taken as default.	2018-03-06 15:24:31 +00:00
Greg Charot	77f9c1df10	no reason the ceph-ansible ansible default provided crush_rule_hdd rule should be set as rack root + default ruleset	2018-03-06 15:24:31 +00:00
Greg Charot	50afc3fbf3	We don't want to automatically move the rbd pool to the new default crush rule. This operation shall be performed by the cluster operator.	2018-03-06 15:24:31 +00:00
Andy McCrae	04ca685ba7	Remove vars that are no longer used As part of `fcba2c801a` these vars were removed and no longer do anything: radosgw_dns_name radosgw_resolve_cname This patch removes them from the group_vars files and defaults/main.yml	2018-03-06 09:16:25 +01:00
jtudelag	c3267b77b7	Makes use of docker_exec_cmd in ceph-mon role. Keeps consistency inside the role and among roles. Makes the code more readable.	2018-03-05 12:48:35 +00:00
Sébastien Han	cb0f598965	common: run updatedb task on debian systems only The command doesn't exist on Red Hat systems so it's better to skip it instead of ignoring the error. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-02 20:59:10 +00:00
Sébastien Han	7f19df8196	rgw: add cluster name option to the handler If the cluster name is different than 'ceph', the command will fail so we need to pass the cluster name. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-02 20:59:10 +00:00
Sébastien Han	9c85280602	rgw: ability to copy ceph admin key on containerized If we now set copy_admin_key while running a containerized scenario, the ceph admin key will be copied on the node. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-02 20:59:10 +00:00
Sébastien Han	67f46d8ec3	rgw: run the handler on a mon host In case the admin wasn't copied over to the node this command would fail. So it's safer to run it from a monitor directly. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-02 20:59:10 +00:00
Guillaume Abrioux	6d35bc9bde	client: use `ceph_uid` fact to set uid/gid on admin key That task is failing on containerized deployment because `ceph:ceph` doesn't exist. The idea here is to use the `{{ ceph_uid }}` to set the ownerships for the admin keyring when containerized_deployment. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1540578 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-02-26 15:52:05 +01:00
Grant Slater	1e1b26ca4d	mds: fix ansible_service_mgr typo This commit fixes a typo introduced by `4671b9e74e`	2018-02-26 13:05:14 +01:00
Andy McCrae	c33dae7509	Revert "[TEST] Test setting up correct systemd file for nfs-ganesha" The nfs-ganesha package has been fixed as part of this commit: `963b6681df` Once the package is rebuilt this should be good to merge. This reverts commit `e88af3c4cb`.	2018-02-26 10:23:42 +01:00
Giulio Fidente	a83e1aeea3	Make rule_name optional when defining items in openstack_pools Previously it was necessary to provide a value (possibly an empty string) for the "rule_name" key for each item in openstack_pools. This change makes that optional and defaults to empty string when not given.	2018-02-23 15:11:53 +01:00
Sébastien Han	165d9dec10	remove kernel.pid_max This is now managed by Ceph packages. See: https://github.com/ceph/ceph/pull/18544/files http://tracker.ceph.com/issues/21929 Closes: https://github.com/ceph/ceph-ansible/issues/2410 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-02-23 13:57:57 +01:00
Andy McCrae	2779d2a850	Adjust /etc/updatedb.conf to not parse /var/lib/ceph Using updatedb -e doesn't make a permanent change, but will run updatedb without the passed path. To make this change more permanent we should update the /etc/updatedb.conf file to include /var/lib/ceph.	2018-02-20 11:32:56 +01:00
Andy McCrae	e88af3c4cb	[TEST] Test setting up correct systemd file for nfs-ganesha Don't merge this. Test to see if we copy over the nfs-ganesha-lock.service.debian8 file properly, whether the Xenial CI job will work. The upstream download.ceph.com nfs-ganesha package should be fixed for xenial (which is in progress).	2018-02-20 10:49:37 +01:00
Paul Bourke	463b5c6b22	Remove redundant task to check if atomic This fact is already set in site-docker.yml so there's no need to check it again in ceph-docker-common Signed-off-by: Paul Bourke <paul.bourke@oracle.com>	2018-02-19 10:10:46 +01:00
Andy McCrae	59a4335a56	Restart services if handler called This patch fixes an issue where if hosts have different service lists, it will prevent restarting changes on services that run later on. For example, hostA in the mons and rgws group would initiate a config change and restart of services on all mons and rgws hosts, even though a separate hostB (which is only in the rgws group) has not had its configuration changed yet. Additionally, when the second host has its configuration changed as part of the ceph-rgw role, it will not initiate a restart since its inventory name != the first host's. To fix this we should run the restart once (using run_once: True) as long as the host has called the handler. This will ensure that even if only 1 host has called the handler it will initiate a restart on all hosts that have called the handler. Additionally, we add a var that is set when the handler runs, this will ensure that only hosts that have called the handler get restarted. Includes minor fix to remove unrequired "inventory_hostname in play_hosts" when: clause. This is no longer required since the handlers were changed. The host calling the handler will be in play_hosts already.	2018-02-16 10:40:20 +01:00
Sébastien Han	c816a9282c	container: osd remove run_once When used along with delegate, run_once does not behave well. Thus, using \| last always brings the desired result. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-02-14 02:01:29 +01:00
Sébastien Han	d47d02a5eb	docker-common: fix container restart on new image We now look for any existing containers, if any we compare their running image with the latest pulled container image. For OSDs, we iterate over the list of running OSDs, this handles the case where the first OSD of the list has been updated (runs the new image) and not the others. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1526513 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-02-14 02:01:29 +01:00
Sébastien Han	ebc195487c	default: remove duplicate code This is already defined in ceph-defaults. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-02-14 02:01:29 +01:00
Caleb Boylan	0be60456ce	osd: Add support for multipath disks Multipath disks have partitions with a different format than what ceph-ansible currently supports, this update makes ceph-ansible aware of that format so multipath disks can be used as OSDs Signed-off-by: Caleb Boylan <caleb.boylan@ormuco.com>	2018-02-09 18:06:25 +01:00
Andy McCrae	b4dbc862d6	Set application for OpenStack pools Since Luminous we need to set the application tag for each pool, otherwise a CEPH_WARNING is generated when the pools are in use. We should assign the OpenStack pools to their default which would be "rbd". When updating to Luminous this would happen automatically to the vms, images, backups and volumes pools, but for new deploys this is not the case.	2018-02-09 17:15:55 +01:00
Sébastien Han	22f843e3d4	default: define 'osd_scenario' variable osd_scenario does not exist in the ceph-default role so if we try to play ceph-default on an OSD node, the playbook will fail with undefined variable. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-02-08 17:42:12 +01:00
Guillaume Abrioux	e537779bb3	osd: fix osd restart when dmcrypt This commit fixes a bug that occurs especially for dmcrypt scenarios. There is an issue where the 'disk_list' container can't reach the ceph cluster because it's not launched with `--net=host`. If this container can't reach the cluster, it will hang on this step (when trying to retrieve the dm-crypt key) : ``` +common_functions.sh:448: open_encrypted_part(): ceph --cluster abc12 --name \ client.osd-lockbox.9138767f-7445-49e0-baad-35e19adca8bb --keyring \ /var/lib/ceph/osd-lockbox/9138767f-7445-49e0-baad-35e19adca8bb/keyring \ config-key get dm-crypt/osd/9138767f-7445-49e0-baad-35e19adca8bb/luks +common_functions.sh:452: open_encrypted_part(): base64 -d +common_functions.sh:452: open_encrypted_part(): cryptsetup --key-file \ -luksOpen /dev/sdb1 9138767f-7445-49e0-baad-35e19adca8bb ``` It means the `ceph-run-osd.sh` script won't be able to start the `osd_disk_activate` process in ceph-container because it won't have filled the `$DOCKER_ENV` environment variable properly. Adding `--net=host` to the 'disk_list' container fixes this issue. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1543284 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-02-08 15:45:13 +01:00
Giulio Fidente	bdcc52b96d	Check for docker sockets named after both _hostname or _fqdn While hostname -f will always return a hostname including its domain part and -s without the domain part, the behavior when no arguments are given can include or not include the domain part depending on how the system is configured; the socket name might not match the instance name then.	2018-02-06 14:16:54 +01:00
Greg Charot	a6d1922a2e	mon: Fixed crush_rule_config for containerised deployment. Was called too early, container was not yet started so the commands failed. Moved the section after include docker/main.yml Signed-off-by: Greg Charot <gcharot@redhat.com>	2018-02-06 05:12:59 +01:00
Guillaume Abrioux	dd0c98c5a2	common: do not use `shell` module when it is not needed There is no need here to use `shell` instead of `command` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-31 10:45:34 +01:00
Guillaume Abrioux	deaf273b25	syntax: change local_action syntax Use a nicer syntax for `local_action` tasks. We used to have oneliner like this: ``` local_action: wait_for port=22 host={{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }} state=started delay=10 timeout=500 }} ``` The usual syntax: ``` local_action: module: wait_for port: 22 host: "{{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }}" state: started delay: 10 timeout: 500 ``` is nicer and kind of way to keep consistency regarding the whole playbook. This also fix a potential issue about missing quotation : ``` Traceback (most recent call last): File "/tmp/ansible_wQtWsi/ansible_module_command.py", line 213, in <module> main() File "/tmp/ansible_wQtWsi/ansible_module_command.py", line 185, in main rc, out, err = module.run_command(args, executable=executable, use_unsafe_shell=shell, encoding=None, data=stdin) File "/tmp/ansible_wQtWsi/ansible_modlib.zip/ansible/module_utils/basic.py", line 2710, in run_command File "/usr/lib64/python2.7/shlex.py", line 279, in split return list(lex) File "/usr/lib64/python2.7/shlex.py", line 269, in next token = self.get_token() File "/usr/lib64/python2.7/shlex.py", line 96, in get_token raw = self.read_token() File "/usr/lib64/python2.7/shlex.py", line 172, in read_token raise ValueError, "No closing quotation" ValueError: No closing quotation ``` writing `local_action: shell echo {{ fsid }} \| tee {{ fetch_directory }}/ceph_cluster_uuid.conf` can cause trouble because it's complaining with missing quotes, this fix solves this issue. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1510555 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-31 10:45:34 +01:00
Sébastien Han	6f9dd26caa	config: remove any spaces in public_network or cluster_network With two public networks configured, we found that with "NETWORK_ADDR_1, NETWORK_ADDR_2" the install process consistently broke, trying to find the docker registry on the second network and not finding the mon container, but without spaces ("NETWORK_ADDR_1,NETWORK_ADDR_2") the install succeeds. So the containerized install is more particular about the formatting of this line. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1534003 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-01-30 17:47:15 +01:00
Sébastien Han	5132cc3de4	Do not search osd ids if ceph-volume Description of problem: The 'get osd id' task goes through all the 10 times (and its respective timeouts) to make sure that the number of OSDs in the osd directory match the number of devices. This happens always, regardless if the setup and deployment is correct. Version-Release number of selected component (if applicable): Surely the latest. But any ceph-ansible version that contains ceph-volume support is affected. How reproducible: 100% Steps to Reproduce: 1. Use ceph-volume (LVM) to deploy OSDs 2. Avoid using anything in the 'devices' section 3. Deploy the cluster Actual results: TASK [ceph-osd : get osd id _uses_shell=True, _raw_params=ls /var/lib/ceph/osd/ | sed 's/.*-//'] ********************************************************************************************************************************************* task path: /Users/alfredo/python/upstream/ceph/src/ceph-volume/ceph_volume/tests/functional/lvm/.tox/xenial-filestore-dmcrypt/tmp/ceph-ansible/roles/ceph-osd/tasks/start_osds.yml:6 FAILED - RETRYING: get osd id (10 retries left). FAILED - RETRYING: get osd id (9 retries left). FAILED - RETRYING: get osd id (8 retries left). FAILED - RETRYING: get osd id (7 retries left). FAILED - RETRYING: get osd id (6 retries left). FAILED - RETRYING: get osd id (5 retries left). FAILED - RETRYING: get osd id (4 retries left). FAILED - RETRYING: get osd id (3 retries left). FAILED - RETRYING: get osd id (2 retries left). FAILED - RETRYING: get osd id (1 retries left). 
ok: [osd0] => { "attempts": 10, "changed": false, "cmd": "ls /var/lib/ceph/osd/ | sed 's/.*-//'", "delta": "0:00:00.002717", "end": "2018-01-21 18:10:31.237933", "failed": true, "failed_when_result": false, "rc": 0, "start": "2018-01-21 18:10:31.235216" } STDOUT: 0 1 2 Expected results: There aren't any (or just a few) timeouts while the OSDs are found Additional info: This is happening because the check is mapping the number of "devices" defined for ceph-disk (in this case it would be 0) to match the number of OSDs found. Basically this line: until: osd_id.stdout_lines|length == devices|unique|length Means in this 2 OSD case it is trying to ensure the following incorrect condition: until: 2 == 0 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1537103	2018-01-30 14:44:38 +01:00
Andy McCrae	481173f203	Add default for radosgw_keystone_ssl This should default to False. The default for Keystone is not to use PKI keys; additionally, anybody using this setting must already have been setting it manually. Fixes: #2111	2018-01-30 11:30:23 +01:00
Guillaume Abrioux	f1232b33fd	Revert "monitor_interface: document need to use monitor_address when using IPv6" This reverts commit `10b91661ce`. This also reverts the same comment added in `1359869497`	2018-01-29 14:43:24 +01:00
Eduard Egorov	93e9f3723b	config: add host-specific ceph_conf_overrides evaluation and generation. This allows us to use host-specific variables in ceph_conf_overrides variable. For example, this fixes usage of such variables (e.g. 'nss db path' having {{ ansible_hostname }} inside) in ceph_conf_overrides for rados gateway configuration (see profiles/rgw-keystone-v3) - issue #2157. Signed-off-by: Eduard Egorov <eduard.egorov@icl-services.com>	2018-01-26 10:15:03 +01:00
Guillaume Abrioux	ec16cbdb1a	defaults: avoid getting stuck (ceph --connect-timeout) Sometimes the playbook gets stuck because, even with the `--connect-timeout=` option, the connection to the existing ceph cluster never times out. As a workaround, using the `timeout` command provided by coreutils will actually time out if we can't connect to the cluster. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1537003 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-25 10:15:59 +01:00
Andrew Schoen	79473badfe	ceph-osd: adds dmcrypt to the lvm scenario Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-01-24 14:10:08 +01:00


1807 Commits (8704144e3157aa253fb7563fe701d9d434bf2f3e)