ceph-ansible


Author	SHA1	Message	Date
Artur Fijalkowski	52d9d406b1	Fix regular expression matching OSD ID on non-containerized deployment. restart_osd_daemon.sh is used to discover and restart all OSDs on a host. To do so, the script loops over the list of ceph-osd@ services on the system. This commit fixes a bug in the regular expression responsible for extracting OSD IDs: the previous version used the `[0-9]{1,2}` expression, which ignored all OSDs whose IDs are greater than 99 (i.e. longer than 2 digits). The fix removes the upper limit on the number of digits. This problem existed in two places in the script. Closes: #2964 Signed-off-by: Artur Fijalkowski <artur.fijalkowski@ing.com>	2018-08-06 15:53:49 +00:00
Guillaume Abrioux	1164cdc002	iscsigw: install ceph-iscsi-cli package Install ceph-iscsi-cli in order to provide the `gwcli` command tool. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1602785 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-08-06 14:11:52 +02:00
Guillaume Abrioux	0a6ff6bbf8	defaults: backward compatibility with fqdn deployments This commit ensures we are backward compatible with FQDN deployments. Since ceph-container enforces deployment with short hostnames, we must keep backward compatibility with clusters already deployed with an FQDN configuration. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-08-06 10:14:58 +00:00
Sébastien Han	ea9e60d48d	config: enforce socket name This was introduced by `59ee2e8d3b` and made our socket checks impossible to run: the PID could be found, but the cctid could not. This happens during upgrades to mimic and on clusters running mimic. So let's force the admin socket name back to what it was, so we can properly check for existing instances. Also, the `$cluster-$name.$pid.$cctid.asok` form is only needed when running multiple instances of the same daemon, which ceph-ansible cannot do at the time of writing. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1610220 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-07-31 10:58:04 +02:00
Mike Christie	6f72f96dad	igw: do not fail purge on rbd removal errors Instead of failing the entire purge operation when the rbd command fails, just log an error. This allows the higher-level target and config cleanup to complete, and the user only has to delete the rbd images manually. Signed-off-by: Mike Christie <mchristi@redhat.com>	2018-07-31 10:08:26 +02:00
Mike Christie	d572a9a602	igw: fix image removal during purge We were not passing the ceph conf info to the rbd image removal command, so if the cluster name was not the default, igw purge would fail because the `rbd rm` command failed. This fixes the bug by passing in the ceph conf info, which contains the cluster name to use. This fixes Red Hat bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1601949 Signed-off-by: Mike Christie <mchristi@redhat.com>	2018-07-31 10:08:26 +02:00
Sébastien Han	2ca8c51906	osd: do not remove expose_partition container The container runs with --rm which means it will be deleted by Docker when exiting. Also 'docker rm -f' is not idempotent and returns 1 if the container does not exist. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1609007 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-07-30 10:38:15 +02:00
Guillaume Abrioux	1ecbbbdcfa	rbd-mirror: bring back compatibility with jewel deployment rbd-mirror can't start when deploying jewel because it needs the admin keyring. Bringing this task back restores backward compatibility with jewel deployments. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-07-26 18:47:10 +00:00
Guillaume Abrioux	053709da97	ceph-osds: backward compatibility with jewel for osp pools creation If we want to be backward compatible with releases prior to luminous, we have to set the rule name according to the default values used in jewel. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-07-26 18:47:10 +00:00
Guillaume Abrioux	2597a557c5	client: fix an incorrect title in a task This task runs on both containerized and non-containerized deployments. Let's give it a proper title to avoid confusion. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-07-26 15:57:41 +02:00
Sébastien Han	e2ea5bac51	rgw: add more config option for civetweb frontend In containerized deployments we now inherit the radosgw_civetweb_options options when bootstrapping the container. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1582411 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-07-25 13:19:14 +00:00
Giulio Fidente	e85e5ea781	Run creation of empty rados index object to first monitor When distributing the ceph-nfs role, creation of the rados index object fails because it assumes client.admin is available locally. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1607970 Signed-off-by: Giulio Fidente <gfidente@redhat.com>	2018-07-25 11:40:11 +02:00
Sébastien Han	235d1b3f55	validate: add checks for interfaces Check that the provided interface: * exists in the gathered facts (thus on the system) * is active * has an IP address (depending on ip_version) Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600227 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-07-24 17:59:30 +02:00
Guillaume Abrioux	af82e7523d	tests: test master against ansible 2.6 Ansible 2.4 is currently end-of-life. Ansible 2.5 will go end-of-life after Ansible 2.7 is released. Fixes: #2901 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-07-23 11:59:15 +00:00
Sébastien Han	7fc13bc9d5	validate: only run osd test on osd node Do not run device validation on every host, only on OSD nodes. Signed-off-by: Sébastien Han <seb@redhat.com> Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-07-19 12:46:18 +00:00
Sébastien Han	cf01e596b6	validate: improve device check We now make sure that: * devices are actually block special files * the length of dedicated_device is identical to that of devices Signed-off-by: Sébastien Han <seb@redhat.com>	2018-07-18 14:26:22 +00:00
Guillaume Abrioux	1a626d3c61	nfs: change default stable branch for nfs-ganesha repo Since `V2.6-stable` is available and has packages for `mimic`, let's update this default value accordingly so nfs nodes can be deployed with mimic. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-07-13 08:20:27 +00:00
Sébastien Han	e61ca882a1	validate: force ansible version We currently only support Ansible 2.4.X so let's fail if the version is different. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-07-13 07:52:56 +00:00
Guillaume Abrioux	5ef5fcd0b6	client: do not rely on copy_admin_key to import keys Relying on `copy_admin_key` to import created keys on client nodes forces us to copy the admin key to those nodes, which we might not want. We should use the fact `condition_copy_admin_key` instead. It is set to `True` when the delegated node is a mon, which means we can import keys without having to handle the admin keyring. Fixes: #2867 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-07-13 06:52:00 +00:00
Guillaume Abrioux	ce5ac930c5	mgr: fix condition to add modules to ceph-mgr Follow-up on #2784. We must check the generated fact `_disabled_ceph_mgr_modules` to enable disabled mgr modules. Otherwise, this task is skipped because it compares against the wrong list. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600155 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-07-12 21:04:01 +00:00
Guillaume Abrioux	9f54b3b4a7	mon: ensure socket is purged when mon is stopped On containerized deployments, if a mon is stopped, the socket is not purged and can cause failures when a cluster is redeployed after the purge playbook has been run. Typical error: ``` fatal: [osd0]: FAILED! => {} MSG: 'dict object' has no attribute 'osd_pool_default_pg_num' ``` The fact is not set because of this earlier failure: ``` ok: [mon0] => { "changed": false, "cmd": "docker exec ceph-mon-mon0 ceph --cluster test daemon mon.mon0 config get osd_pool_default_pg_num", "delta": "0:00:00.217382", "end": "2018-07-09 22:25:53.155969", "failed_when_result": false, "rc": 22, "start": "2018-07-09 22:25:52.938587" } STDERR: admin_socket: exception getting command descriptions: [Errno 111] Connection refused MSG: non-zero return code ``` This failure happens when the ceph-mon service is stopped: since the socket isn't purged, the leftover socket confuses the process. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-07-10 20:08:07 +00:00
Guillaume Abrioux	d0746e0858	common: switch from docker module to docker_container As of Ansible 2.4, the `docker` module has been removed (it had been deprecated since Ansible 2.1). We must switch to `docker_container` instead. See: https://docs.ansible.com/ansible/latest/modules/docker_module.html#docker-module Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-07-10 20:08:07 +00:00
Shilpa Jagannath	07852ed039	Remove zone from zonegroup and update period before deleting the zone to avoid inconsistent period information across other zones. When you delete a zone without removing it from the zonegroup, the period update fails, since that command needs to load the zone and zonegroup to be able to update the master. The period update fails with an error like this: `radosgw-admin period update --commit` -1 Cannot find zone id= (name=), switching to local zonegroup configuration -1 Cannot find zone id= (name=) Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>	2018-07-09 12:27:24 +00:00
Sébastien Han	b9f7df7ba2	common: remove hdparm As of Kraken, the journal code does not use the hdparm command anymore so we can remove it from our package dependency list. Fixes: https://github.com/ceph/ceph-ansible/issues/1402 Signed-off-by: Sébastien Han <seb@redhat.com> (cherry picked from commit f6910efa24389c264062963b2054c7cd29ffebb3)	2018-07-07 08:53:47 +00:00
Sébastien Han	713b9fcf9b	ceph-config: do not log cluster log on container The container image recently merged the cluster and mon logs into a single stream. Following this, we now see this warning coming from the container image: 2018-06-19 13:44:01.542990 7ff75b024700 1 mon.vm02@1(peon).log v57928205 unable to write to '/var/log/ceph/ceph.log' for channel 'cluster': (2) No such file or directory So we now tell the mon not to write the cluster log to the filesystem. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1591771 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-07-05 15:11:45 +00:00
Sébastien Han	fcf11ecc35	ceph-common: fix rhcs condition We forgot to add mgr_group_name when checking for the mon repo, thus the conditional on the next task was failing. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1598185 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-07-04 17:17:21 +02:00
Guillaume Abrioux	3abc253fec	mgr: fix enabling of mgr module on mimic The data structure changed slightly in mimic. Prior to mimic, it used to be: ``` { "enabled_modules": [ "status" ], "disabled_modules": [ "balancer", "dashboard", "influx", "localpool", "prometheus", "restful", "selftest", "zabbix" ] } ``` From mimic on, it looks like this: ``` { "enabled_modules": [ "status" ], "disabled_modules": [ { "name": "balancer", "can_run": true, "error_string": "" }, { "name": "dashboard", "can_run": true, "error_string": "" } ] } ``` This means we can't simply check `item in _ceph_mgr_modules.disabled_modules`; the idea here is to use the `map(attribute='name')` filter to build a list when deploying mimic. Fixes: #2766 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-07-03 21:19:16 +00:00
Sébastien Han	63658c05c7	ceph-client: do not kill the dummy container The container runs for 300 sec, then dies and removes itself thanks to the '--rm' option, so there is no point in removing it. Also, this causes failures under some circumstances. Closing: https://bugzilla.redhat.com/show_bug.cgi?id=1568157 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-07-03 16:09:52 +00:00
Sébastien Han	a629408967	ceph-mds: enable application pool We now enable the application type 'cephfs' for each cephfs pool we create. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1590275 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-07-02 10:28:34 +00:00
Sébastien Han	103c279c21	ceph-defaults: add default application to pool We now add a default 'rbd' application type to each pool we create. This removes the warning: "application not enabled on N pool(s)" Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1590275 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-07-02 10:28:34 +00:00
Vasu Kulkarni	1d454b611f	Enable monitor repo for mgr nodes and Tools repo for iscsi/nfs/clients Signed-off-by: Vasu Kulkarni <vasu@redhat.com>	2018-06-29 18:09:26 +00:00
Sébastien Han	abdb53e16a	ceph-osd: trigger osd container restart on script change The script ceph-osd-run.sh holds the config options to start the container; if one of these options is modified, we must restart the container. This was not the case before because the 'notify' flag wasn't present. Closing: https://bugzilla.redhat.com/show_bug.cgi?id=1596061 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-06-28 17:54:13 +02:00
Sébastien Han	f623997271	systemd: remove changed_when: false When using a module there is no need to apply this Ansible option. The module handles idempotency on its own, so it decides whether or not the task has changed during execution. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-06-28 17:54:13 +02:00
George Shuklin	653b483fc3	Add ceph_keyring_permissions variable to control permissions for keyring files in /etc/ceph. The default value is unchanged (0600), but this variable allows users to override it (e.g. set it to 0640). Signed-off-by: George Shuklin <george.shuklin@gmail.com>	2018-06-28 15:48:39 +00:00
Ha Phan	a7b7735b6f	ceph-mon: Generate initial keyring Minor fix so that initial keyring can be generated using python3. Signed-off-by: Ha Phan <thanhha.work@gmail.com>	2018-06-28 10:39:56 +02:00
Ha Phan	b7b8aba47b	Generate a copy of ceph.conf locally Refers to #2697 This change creates a copy of `ceph.conf` on the Ansible server. Signed-off-by: Ha Phan <thanhha.work@gmail.com>	2018-06-28 07:39:30 +00:00
Andy McCrae	a4a3d9a01b	Fix package state for upgrades on SuSE/RHEL In `226f80c22b`, only Debian package installs had the correct state set to ensure packages were upgraded when the "upgrade_ceph_packages" var was set to true. Signed-off-by: Andy McCrae <andy.mccrae@gmail.com>	2018-06-27 18:55:22 +00:00
Sébastien Han	322e2de7d2	mon: honour mon_docker_net_host option --net=host was hardcoded in the startup line, so even when mon_docker_net_host was set to False the net option was always activated. mon_docker_net_host is set to True by default, so this commit does not change the default behaviour. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-06-27 13:44:41 +00:00
Michel Rode	7774935707	Added 'squash' as a parameter to nfs-ganesha. Set the default to 'root_squash', which is also nfs-ganesha's own default. Signed-off-by: Michel Rode <rmichel@devnu11.net>	2018-06-25 09:13:17 +02:00
Christian Zunker	48394597c9	reset failed count of ceph-mgr Depending on your setup, ceph-mgr might get restarted multiple times. When this happens too fast, systemd prevents further restarts because of the limits configured in the ceph-mgr systemd unit file. Resetting the failure count prevents this problem. The reset is done before the restart, so a real problem during the restart still causes a failure. Fixes: #2768 Signed-off-by: Christian Zunker <christian.zunker@codecentric.cloud>	2018-06-20 13:59:16 +02:00
Sébastien Han	bea4027f0c	common: start firewalld if configure_firewall Currently, if configure_firewall is set to True, we expect firewalld to be enabled and running. Let's enforce that. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1589146 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-06-18 04:02:50 -04:00
Sébastien Han	a9ed3579ae	mon/osd: bump container memory limit As discussed with the core developers, the current limits are too low and should be bumped to higher values. So now, by default, monitors get 3GB and OSDs get 5GB. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1591876 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-06-17 11:20:27 -04:00
Guillaume Abrioux	51cf3b7fa0	client: try to kill dummy container only on first client node The 'dummy' container is created only on the first client node. This means we must try to destroy it only on that node; otherwise this can cause failures like the following: ``` fatal: [192.168.24.8]: FAILED! => {"changed": false, "cmd": ["docker", "rm", "-f", "ceph-create-keys"], "delta": "0:00:00.023692", "end": "2018-06-12 20:56:07.261278", "msg": "non-zero return code", "rc": 1, "start": "2018-06-12 20:56:07.237586", "stderr": "Error response from daemon: No such container: ceph-create-keys", "stderr_lines": ["Error response from daemon: No such container: ceph-create-keys"], "stdout": "", "stdout_lines": []} ``` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1590746 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-06-13 16:10:46 +02:00
Patrick Donnelly	9ce81ae845	ceph-mds: do not enable multimds on jewel Multiple active MDS became stable in Luminous. Introduced-by: `c8573fe0d7` Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>	2018-06-12 10:47:34 +02:00
Sébastien Han	2e8412734a	common: ability to enable/disable fw configuration Prior to this patch, if you were running on a Red Hat system, ceph-ansible would try to configure firewalld for you without the operator's consent. Now you can enable or disable the firewall configuration by setting configure_firewall to either true or false. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1589146 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-06-11 21:51:59 +02:00
Konstantin Shalygin	3a07568496	ceph-osd: set 'openstack_keys_tmp' only when 'openstack_config' is defined. If 'openstack_config' is false this task shouldn't be executed. Signed-off-by: Konstantin Shalygin <k0ste@k0ste.ru>	2018-06-11 13:03:55 +02:00
Vishal Kanaujia	1a610df02b	Fix to run secure cluster only once in a run The current secure cluster play runs on all the monitors. Re-running this task is unnecessary and can be skipped. Fixes: #2737 Signed-off-by: Vishal Kanaujia <vishal.kanaujia@flipkart.com>	2018-06-11 08:37:29 +02:00
Guillaume Abrioux	090ecff94e	client: keyrings aren't created when single client node Combining `run_once: true` with `inventory_hostname == groups.get(client_group_name) | first` might cause a bug when the only node being run is not the first in the group. In a deployment with a single client node this might cause issues: the keyring is sometimes not created because the task can end up skipped entirely. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1588093 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-06-08 15:05:47 +02:00
Sébastien Han	20c8065e48	ceph-iscsi: rename group iscsi_gws Let's avoid using dashes, as testinfra needs to be able to read the groups. For example, with iscsi-gws we can't add a marker for these iscsi nodes; using an underscore fixes the issue. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-06-08 10:21:54 +02:00
Sébastien Han	91bf53ee93	ceph-iscsi: support for containerized deployment We now have the ability to deploy a containerized version of ceph-iscsi. The result is similar to the non-containerized version: you simply have 3 containers running for the following services: * rbd-target-api * rbd-target-gw * tcmu-runner Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1508144 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-06-08 10:21:54 +02:00
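
Commit 52d9d406b1 changes the OSD-ID regular expression in restart_osd_daemon.sh from `[0-9]{1,2}` to one with no upper limit on digits. This log does not show the script's actual expression or its surrounding pipeline, so the Python snippet below is only an illustration. The `ceph-osd@<id>.service` unit-name pattern and the sample unit list are assumptions. The snippet shows how a bounded quantifier silently drops OSD 123 when the pattern must match the whole unit name.

```python
import re

# Hypothetical unit names, as a host with three OSDs might list them.
UNITS = ["ceph-osd@7.service", "ceph-osd@42.service", "ceph-osd@123.service"]

OLD_PATTERN = re.compile(r"ceph-osd@([0-9]{1,2})\.service")  # before the fix: at most 2 digits
NEW_PATTERN = re.compile(r"ceph-osd@([0-9]+)\.service")       # after the fix: no upper limit


def osd_ids(units, pattern):
    """Return the OSD IDs that `pattern` extracts from whole unit names."""
    ids = []
    for unit in units:
        match = pattern.fullmatch(unit)  # the whole name must match
        if match:
            ids.append(int(match.group(1)))
    return ids


print(osd_ids(UNITS, OLD_PATTERN))  # [7, 42]
print(osd_ids(UNITS, NEW_PATTERN))  # [7, 42, 123]
```

Anchoring matters when reasoning about this kind of bug. An unanchored search with the old pattern would match the `12` inside `123` and return a wrong ID instead of skipping the unit.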

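Commit cf01e596b6 adds two device checks. Every entry in `devices` must be a block special file, and the dedicated-device list must have as many entries as `devices`. In ceph-ansible these checks are Ansible validation tasks, which are not reproduced here. The Python function below is a sketch of the same two rules using `os.stat` and `stat.S_ISBLK`. The function name `validate_devices` and its parameter names are hypothetical.

```python
import os
import stat


def validate_devices(devices, dedicated_devices=None):
    """Hypothetical sketch of the two device checks; raises ValueError on failure."""
    for dev in devices:
        try:
            mode = os.stat(dev).st_mode
        except FileNotFoundError:
            raise ValueError(f"{dev}: no such device") from None
        # Reject anything that is not a block device (regular files, directories, ...).
        if not stat.S_ISBLK(mode):
            raise ValueError(f"{dev} is not a block special file")
    # The commit requires both lists to have the same length.
    if dedicated_devices is not None and len(dedicated_devices) != len(devices):
        raise ValueError(
            f"dedicated devices list has {len(dedicated_devices)} entries, "
            f"devices has {len(devices)}"
        )


# "/" exists but is a directory, so the first check rejects it.
try:
    validate_devices(["/"])
except ValueError as err:
    print(err)  # / is not a block special file
```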

1868 Commits (50be3fd9e8c0944cdddbd88bc8287e65765b0c63)
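
Commit 3abc253fec explains that the `disabled_modules` entry reported by ceph-mgr changed shape in mimic, from a list of names to a list of objects with a `name` key. The playbook handles this with the Jinja2 `map(attribute='name')` filter when deploying mimic. The Python function below, with the hypothetical name `disabled_module_names`, is only a sketch of the same normalization. It uses the two structures quoted in the commit as sample data. It detects the shape of each entry instead of checking the release, which is a choice made for this example, not one taken from the commit.

```python
# The two shapes quoted in the commit message.
PRE_MIMIC = {
    "enabled_modules": ["status"],
    "disabled_modules": ["balancer", "dashboard", "influx", "localpool",
                         "prometheus", "restful", "selftest", "zabbix"],
}

MIMIC = {
    "enabled_modules": ["status"],
    "disabled_modules": [
        {"name": "balancer", "can_run": True, "error_string": ""},
        {"name": "dashboard", "can_run": True, "error_string": ""},
    ],
}


def disabled_module_names(mgr_modules):
    """Return disabled mgr module names for either output format."""
    names = []
    for entry in mgr_modules.get("disabled_modules", []):
        # mimic and later: a dict such as {"name": "balancer", ...}
        # before mimic: the module name itself, as a string
        names.append(entry["name"] if isinstance(entry, dict) else entry)
    return names


# A plain membership test against the raw mimic list fails,
# while the normalized list works for both formats.
print("dashboard" in MIMIC["disabled_modules"])         # False
print("dashboard" in disabled_module_names(MIMIC))      # True
print("dashboard" in disabled_module_names(PRE_MIMIC))  # True
```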