ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Guillaume Abrioux	678e155328	infra: fix a typo in filename configure_firewall is missing its dot. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-10 12:39:04 -04:00
Guillaume Abrioux	f666902d52	infra: add tags for each subcomponent This way we can skip one specific component if needed. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-10 15:44:33 +00:00
Guillaume Abrioux	f8a7ffb085	infra: add firewall configuration for containerized deployment firewalld is available on atomic so there is no reason to not apply firewall configuration. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-10 15:44:33 +00:00
Guillaume Abrioux	0fb8812e47	infra: update firewall rules, add cluster_network for osds At the moment, all daemons accept connections from 0.0.0.0. We should at least restrict to public_network and add cluster_network for OSDs. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1541840 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-10 15:44:33 +00:00
Guillaume Abrioux	b3a71eeb08	ceph-infra: add new role ceph-infra this role manages ceph infra services such as ntp, firewall, ... Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-10 15:44:33 +00:00
Noah Watkins	8dcc8d1434	Stringify ceph_docker_image_tag This could be a numeric input, but is treated like a string leading to runtime errors. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1635823 Signed-off-by: Noah Watkins <nwatkins@redhat.com>	2018-10-10 04:26:33 +00:00
Noah Watkins	306e308f13	Avoid using tests as filter Fixes the deprecation warning: [DEPRECATION WARNING]: Using tests as filters is deprecated. Instead of using `result|search` use `result is search`. Signed-off-by: Noah Watkins <nwatkins@redhat.com>	2018-10-10 04:26:33 +00:00
Guillaume Abrioux	cc6f41f76a	tests: fix lvm2 setup issue not gathering fact causes `package` module to fail because it needs to detect which OS we are running on to select the right package manager. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-09 16:12:54 -04:00
Ramana Raja	ce8e740f62	docs: Correct mandatory config options 'radosgw_interface' or 'radosgw_address' config option does not need to be set for all ceph-ansible deployments. Closes: https://github.com/ceph/ceph-ansible/issues/3143 Signed-off-by: Ramana Raja <rraja@redhat.com>	2018-10-09 15:15:49 -04:00
Alfredo Deza	3e488e8298	tests: install lvm2 before setting up ceph-volume/LVM tests Signed-off-by: Alfredo Deza <adeza@redhat.com>	2018-10-09 13:48:50 -04:00
Andrew Schoen	ada03d064d	ceph-validate: remove versions checks for bluestore and lvm scenario These checks will never pass unless ceph_stable_release is passed and ceph-defaults is run before ceph-validate. Additionally, we don't want to support deploying jewel upstream at ceph-ansible master. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1637537 Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-10-09 13:30:42 -04:00
Andrew Schoen	436dc8c5e1	ceph-config: allow the batch --report to fail when getting the OSD num Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-10-09 10:09:50 -04:00
Andrew Schoen	a63ca220e6	ceph-volume: if --report fails to load json, fail with better info This handles the case gracefully where --report does not return any JSON because a validator might have failed. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-10-09 10:09:50 -04:00
Andrew Schoen	a68c680225	tests: remove journal_size from lvm-batch testing scenario Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-10-09 10:09:50 -04:00
Andrew Schoen	5ee305d1a0	ceph-volume: make the batch action idempotent The command is run with --report first to see if any OSDs will be created or not. If they will be, then the command is run. If not, then changed is set to False and the module exits. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-10-09 10:09:50 -04:00
Andrew Schoen	40f82319dd	ceph-config: use 'lvm list' to find num_osds for an existing cluster This makes finding num_osds idempotent for clusters that were deployed using 'lvm batch'. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-10-09 10:09:50 -04:00
Andrew Schoen	2ffad1b43a	ceph-volume: adds `lvm list` support to the ceph_volume module Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-10-09 10:09:50 -04:00
Andrew Schoen	8afef3d0de	ceph-config: use the ceph_volume module to get num_osds for lvm batch This gives us an accurate number of how many osds will be created. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-10-09 10:09:50 -04:00
Andrew Schoen	07a384ba56	ceph_volume: adds the report parameter Will pass the --report command to ceph-volume lvm batch. Results will be returned in json format. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-10-09 10:09:50 -04:00
Andrew Schoen	c453ea25c0	ceph-osd: use journal_size and block_db_size for lvm batch Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-10-09 10:09:50 -04:00
Andrew Schoen	71ce539da5	ceph-defaults: add the block_db_size option This is used in the lvm osd scenario for the 'lvm batch' subcommand of ceph-volume. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-10-09 10:09:50 -04:00
Andrew Schoen	8bb131c712	ceph-volume: add the journal_size and block_db_size options These can be used for the --journal-size and --block-db-size options of `lvm batch`. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-10-09 10:09:50 -04:00
Sébastien Han	82ec5a29f2	site: use default value for 'cluster' variable If someone's cluster name is 'ceph' then the playbook will fail (with no errors because of ignore_errors) saying it can not find the variable. So let's declare the default. If the cluster name is different then it'll be in group_vars and thus there won't be any failure. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1636962 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-10-08 20:31:32 +00:00
Sébastien Han	9180f6a277	rhcs: add helpers for the containerized deployment We give more assistance to consultants deploying by setting the registry and the image name. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-10-08 09:50:49 -04:00
Guillaume Abrioux	3e2cdcc735	common: remove check_firewall code Check firewall isn't working as expected and might break deployments. This part of the code will be reworked soon. Let's focus on configure_firewall code for now. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1541840 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-06 14:32:17 +02:00
Guillaume Abrioux	be31c15ccd	follow up on `b5d2ea2` Add some missed statements Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-06 14:32:17 +02:00
Guillaume Abrioux	79bd06ad28	rolling_update: add ceph-handler role since the introduction of ceph-handler, it has to be added in rolling_update playbook as well Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-05 13:48:04 +00:00
Rishabh Dave	b5d2ea269f	don't use "static" field while including tasks Instead used "import_tasks" and "include_tasks" to tell whether tasks must be included statically or dynamically. Fixes: https://github.com/ceph/ceph-ansible/issues/2998 Signed-off-by: Rishabh Dave <ridave@redhat.com>	2018-10-04 07:44:28 +00:00
Sébastien Han	bae0f41705	switch: copy initial mon keyring We need to copy this key into /etc/ceph so when ceph-docker-common runs it can fetch it to the ansible server. Previously the task wasn't failing because `fail_on_missing` was False before 2.5, so now it's True hence the failure. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-10-03 13:58:53 +00:00
Guillaume Abrioux	03e76af7b4	switch: add missing call to ceph-handler role Add missing call the ceph-handler role, otherwise we can't have reference to variable registered from ceph-handler from other roles. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-03 13:58:53 +00:00
Guillaume Abrioux	54b02fe187	switch: support migration when cluster is scrubbing Similar to `c13a3c3` we must allow scrubbing when running this playbook. In cluster with a large number of PGs, it can be expected some of them scrubbing, it's a normal operation. Preventing from scrubbing operation force to set noscrub flag. This commit allows to switch from non containerized to containerized environment even while PGs are scrubbing. Closes: #3182 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-03 13:58:53 +00:00
Guillaume Abrioux	6130bc841d	config: look up for monitor_address_block in hostvars `monitor_address_block` should be read from hostvars[host] instead of current node being played. eg: Let's assume we have: ``` [mons] ceph-mon0 monitor_address=192.168.1.10 ceph-mon1 monitor_interface=eth1 ceph-mon2 monitor_address_block=192.168.1.0/24 ``` the ceph.conf generation task will end up with: ``` fatal: [ceph-mon0]: FAILED! => {} MSG: 'ansible.vars.hostvars.HostVarsVars object' has no attribute u'ansible_interface' ``` the reason is that it will assume `monitor_address_block` isn't defined even on ceph-mon2 because looking for `monitor_address_block` instead of `hostvars[host]['monitor_address_block']`, therefore it enters in the condition as default value: ``` {%- else -%} {% set interface = 'ansible_' + (monitor_interface | replace('-', '_')) %} {% if ip_version == 'ipv4' -%} {{ hostvars[host][interface][ip_version]['address'] }} {%- elif ip_version == 'ipv6' -%} [{{ hostvars[host][interface][ip_version][0]['address'] }}] {%- endif %} {%- endif %} ``` `monitor_interface` is set with default value `'interface'` so the `interface` variable is built with 'ansible_' + 'interface'. It makes ansible throwing a confusing message about `'ansible_interface'`. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1635303 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-02 22:41:05 +02:00
Benjamin Cherian	85071e6e53	Add support for different NTP daemons Allow user to choose between timesyncd, chronyd and ntpd Installation will default to timesyncd since it is distributed as part of the systemd installation for most distros. Added note indicating NTP daemon type is not used for containerized deployments. Fixes issue #3086 on Github Signed-off-by: Benjamin Cherian <benjamin_cherian@amat.com>	2018-10-02 13:18:08 +00:00
Mike Christie	eddb95941b	igw: valid client CHAP settings. The linux kernel target layer, LIO, does not support the iscsi target to mix ACLs that have chap enabled and disabled under the same tpg. This patch adds a check and fails if this type of setup is detected. This fixes Red Hat BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1615088 Signed-off-by: Mike Christie <mchristi@redhat.com>	2018-10-01 18:23:03 +02:00
Alfredo Deza	54adb6d894	doc: redo lvm scenario documentation, improved wording and config descriptions Signed-off-by: Alfredo Deza <adeza@redhat.com>	2018-10-01 11:48:11 +00:00
Sébastien Han	4db6a213f7	add ceph-handler role The role contains all the handlers for Ceph services. We decided to leave ceph-defaults role with variables and a few facts only. This is useful when organizing the site.yml files and also adding the known variables to infrastructure-playbooks. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-09-28 15:15:49 +00:00
Andrew Schoen	9747f3dbd5	purge-cluster: zap devices used with the lvm scenario Fixes: https://github.com/ceph/ceph-ansible/issues/3156 Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-09-28 14:49:56 +02:00
wumingqiao	5da71e1ca1	purge-cluster: recursively remove ceph-related files, symlinks and directories under /etc/systemd/system. fix: https://github.com/ceph/ceph-ansible/issues/3166 Signed-off-by: wumingqiao <wumingqiao@beyondcent.com>	2018-09-28 14:49:22 +02:00
Sébastien Han	9fe86c2268	test: use osd_objecstore default value Do not force filestore on our test but whatever is the default of osd_objecstore. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-09-27 21:23:49 +00:00
Sébastien Han	145aef9fed	defaults: do not disable THP on bluestore As per #1013 it appears that BS will soon use THP to lower TLB misses, also disabling THP hasn't demonstrated any gains so far. Closes: https://github.com/ceph/ceph-ansible/issues/1013 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-09-27 21:23:49 +00:00
Sébastien Han	dc3319c3c4	default: use bluestore as default object store All tooling in Ceph is defaulting to use the bluestore objectstore for provisioning OSDs, there is no good reason for ceph-ansible to continue to default to filestore. Closes: https://github.com/ceph/ceph-ansible/issues/3149 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1633508 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-09-27 21:23:49 +00:00
Rishabh Dave	380168dadc	don't use "include" to include tasks Use "import_tasks" or "include_tasks" instead. Signed-off-by: Rishabh Dave <ridave@redhat.com>	2018-09-27 17:53:40 +02:00
Guillaume Abrioux	144c92b21f	purge: actually remove of /var/lib/ceph/* `38dc20e74b` introduced a bug in the purge playbooks because using `` in `command` module doesn't work. `/var/lib/ceph/` files are not purged it means there is a leftover. When trying to redeploy a cluster, it failed because monitor daemon was detecting existing keyring, therefore, it assumed a cluster already existed. Typical error (from container output): ``` Sep 26 13:18:16 mon0 docker[31316]: 2018-09-26 13:18:16 /entrypoint.sh: Existing mon, trying to rejoin cluster... Sep 26 13:18:16 mon0 docker[31316]: 2018-09-26 13:18:16.9323937f15b0d74700 -1 auth: unable to find a keyring on /etc/ceph/test.client.admin.keyring,/etc/ceph/test.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,:(2) No such file or directory Sep 26 13:18:23 mon0 docker[31316]: 2018-09-26 13:18:23 /entrypoint.sh: SUCCESS ``` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1633563 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-09-27 17:45:21 +02:00
Guillaume Abrioux	179c4d00d7	rolling_update: ensure pgs_by_state has at least 1 entry Previous commit `c13a3c3` has removed a condition. This commit brings back this condition which is essential to ensure we won't hit a false positive result in the `when` condition for the check PGs task. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-09-25 14:58:54 +00:00
Guillaume Abrioux	c13a3c3492	upgrade: consider all 'active+clean' states as valid pgs In cluster with a large number of PGs, it can be expected some of them scrubbing, it's a normal operation. Preventing from scrubbing operation force to set noscrub flag before a rolling update which is a problem because it pauses an important data integrity operation until the end of the rolling upgrade. This commit allows an upgrade even while PGs are scrubbing. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1616066 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-09-25 12:12:06 +00:00
Andrew Schoen	4cd675e7ec	docs: supported validation by the ceph-validate role List the osd_scenarios and install options that are validated by the ceph-validate role in the documentation. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-09-24 14:53:58 +00:00
Guillaume Abrioux	3285b47703	tests: add an RGW node on osd0 for ooo-collocation get more coverage by adding an RGW daemon collocated on osd0. We've missed a bug in the past which could have been caught earlier in the CI. Let's add this additional daemon in order to have a better coverage. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-09-24 14:35:25 +02:00
Giulio Fidente	6126210e0e	Fix version check in ceph.conf template We need to look for ceph_release when comparing with release names, not ceph_version. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1631789 Signed-off-by: Giulio Fidente <gfidente@redhat.com>	2018-09-24 13:08:27 +02:00
Matthew Vernon	806461ac6e	restart_osd_daemon.sh.j2 - use `+` rather than `{1,}` in regex `+` is more idiomatic for "one or more" in a regex than `{1,}`; the latter was introduced in a previous fix for an incorrect `{1,2}` restriction. Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>	2018-09-24 10:33:46 +00:00
Matthew Vernon	04f4991648	restart_osd_daemon.sh.j2 - consider active+clean+* pgs as OK After restarting each OSD, restart_osd_daemon.sh checks that the cluster is in a good state before moving on to the next one. One of the checks it does is that the number of pgs in the state "active+clean" is equal to the total number of pgs in the cluster. On large clusters (e.g. we have 173,696 pgs), it is likely that at least one pg will be scrubbing and/or deep-scrubbing at any one time. These pgs are in state "active+clean+scrubbing" or "active+clean+scrubbing+deep", so the script was erroneously not including them in the "good" count. Similar concerns apply to "active+clean+snaptrim" and "active+clean+snaptrim_wait". Fix this by considering as good any pg whose state contains active+clean. Do this as an integer comparison to num_pgs in pgmap. (could this be backported to at least stable-3.0 please?) Closes: #2008 Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>	2018-09-24 10:33:46 +00:00
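
The PG health check described in commits 04f4991648, c13a3c3492 and 179c4d00d7 can be sketched in Python as follows. This is only an illustration, not the code from the repository (which lives in a shell template and in playbook `when` conditions). It assumes the cluster status is available as JSON with a `pgmap` object holding `num_pgs` and a `pgs_by_state` list of `state_name`/`count` entries; the function name `all_pgs_active_clean` is hypothetical.

```python
import json


def all_pgs_active_clean(status_json: str) -> bool:
    """Return True when every PG is in a state containing 'active+clean'.

    Assumes (not taken from the commits) a status document shaped like
    {"pgmap": {"num_pgs": N, "pgs_by_state": [{"state_name": ..., "count": ...}]}}.
    """
    pgmap = json.loads(status_json)["pgmap"]
    states = pgmap.get("pgs_by_state", [])

    # Require at least one entry so an empty list cannot yield a
    # false positive (the condition restored by 179c4d00d7).
    if not states:
        return False

    # Count PGs whose state contains "active+clean", so that
    # "active+clean+scrubbing", "active+clean+scrubbing+deep",
    # "active+clean+snaptrim", etc. are treated as good.
    good = sum(
        int(entry["count"])
        for entry in states
        if "active+clean" in entry["state_name"]
    )

    # Integer comparison against the total number of PGs.
    return good == int(pgmap["num_pgs"])


if __name__ == "__main__":
    sample = json.dumps({
        "pgmap": {
            "num_pgs": 10,
            "pgs_by_state": [
                {"state_name": "active+clean", "count": 8},
                {"state_name": "active+clean+scrubbing+deep", "count": 2},
            ],
        }
    })
    print(all_pgs_active_clean(sample))
```

With this rule a cluster that has some PGs scrubbing still counts as healthy, so there is no need to set the noscrub flag before a rolling update or OSD restart.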


4519 Commits (02c63e8d45cf0a5791ee49d7114f98febdbdc55a)
