ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Dimitri Savineau	0d55eeba79	tests: use a single grafana node on podman We don't use multiple grafana nodes for the moment in the other scenarios and I don't think this is supposed to work. We often see grafana failures in that scenario. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `825045f6b4`)	2019-08-28 17:48:12 +00:00
Guillaume Abrioux	a3cbb59c05	lint: fix error [301], add `changed_when: false` when needed This commit fixes the error [301]: `[301] Commands should not change things if nothing needs doing` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `327d564106`)	2019-08-28 11:22:47 -04:00
Guillaume Abrioux	8f781198d6	lint: fix error [306], add pipefail on shell command using pipe This commit fixes the error [306]: `[306] Shells that use pipes should set the pipefail option` using `/bin/bash` as executable because Debian/Ubuntu systems use `dash` by default which doesn't have the `-o pipefail`. (See: https://github.com/ansible/ansible-lint/issues/497#issue-424623501) Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `102edaeb61`)	2019-08-28 11:22:47 -04:00
Dimitri Savineau	364951ce2f	ceph-mon: Bind mount the ca-trust directory On containerized deployment, the mon container sometimes needs access to the radosgw endpoint (via the radosgw-admin command). When using TLS on the radosgw with self-signed certificates, the mon container needs access to the CA certificate. The CA certificate needs to be added on the host and then the directory will be bind mounted in the container. Resolves: #4358 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `2b0616ecca`)	2019-08-28 09:44:34 -04:00
Dimitri Savineau	1fbfa1ce1a	ceph-client: Use profile rbd in keyring caps Like the OpenStack keyrings, we can use the profile rbd for the clients keyring (both mon and osd). Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `49aa05b96c`)	2019-08-28 09:42:03 -04:00
Dimitri Savineau	4df8de8f7b	Revert "osd: add 'osd blacklist' cap for osp keyrings" This reverts commit `2d955757ee`. The "osd blacklist" isn't an osd caps but should be used with mon caps. Also the correct caps for this is: 'allow command "osd blacklist"'. The current change is breaking the openstack and clients keyrings. By using the profile rbd (which is already used) we already rely on the ability to blacklist dead client. Resolves: #4385 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `717af83475`)	2019-08-28 09:42:03 -04:00
Johannes Kastl	3bfa1c50de	set discovered_python_interpreter if ansible_python_interpreter is defined If the user has set the `ansible_python_interpreter`, ansible will not try to discover python, so `discovered_python_interpreter` will not be set. Solution: Set `discovered_python_interpreter` to `ansible_python_interpreter` if `ansible_python_interpreter` is defined Signed-off-by: Johannes Kastl <kastl@b1-systems.de> (cherry picked from commit `bd507fa147`)	2019-08-27 21:06:43 +00:00
guihecheng	196e70a75a	rgw/multisite: assign 'rgw_zone' to the exact section in ceph.conf since the following commit: commit `1ac94c048f` rgw: add support for multiple rgw instances on a single host we have multi-instance rgw support on a single host and the config section name of the rgw changed from [client.rgw.$(hostname)] -> [client.rgw.$(hostname).rgwX] when X is the sequence number: 0,1,2,... So we should assign 'rgw_zone' item to the exact rgw instance config section in ceph.conf Signed-off-by: guihecheng <guihecheng@cmiot.chinamobile.com> (cherry picked from commit `a0590cae9d`)	2019-08-23 15:56:15 +02:00
Artur Fijalkowski	27014df45e	global: make directories mode parameterizable This commit makes it possible to parameterize the ceph directory modes. It changes the hardcoded mode for ceph-related directories from 0755 to a value customizable with the `ceph_directories_mode` variable. Closes: #2920 Signed-off-by: Artur Fijalkowski <artur.fijalkowski@ing.com> Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `011270ca69`)	2019-08-23 11:39:23 +00:00
Dimitri Savineau	500c59c648	ceph-osd: Add ulimit nofile on container start On containerized deployment, the OSD entrypoint runs some ceph-volume commands (lvm/simple scan and/or activate) which perform badly without the ulimit option. This option was added for all previous ceph-volume commands but not on the ceph-osd container startup. Also updating hard limit value to 4096 to reflect default baremetal value. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `9a4ac46d19`)	2019-08-22 22:50:17 +00:00
Kevin Coakley	c7950d5539	ceph-config: Set changed_when to false on fact gathering statements The "run 'ceph-volume lvm batch --report' to see how many osds are to be created" and "run 'ceph-volume lvm list' to see how many osds have already been created" statements only register the lvm_batch_report and lvm_list variables. Running those ceph-volume commands should never produce a change on the system. Adding changed_when: false prevents irrelevant change messages from Ansible. Signed-off-by: Kevin Coakley <kcoakley@sdsc.edu> (cherry picked from commit `e11cbbbcb1`)	2019-08-22 20:36:39 +02:00
Johannes Kastl	3e17c458d0	facts: fix a typo This commit fixes a typo in roles/ceph-facts/tasks/facts.yml Signed-off-by: Johannes Kastl <kastl@b1-systems.de> (cherry picked from commit `e1b9312084`)	2019-08-22 18:11:18 +02:00
Kevin Jones	3a8de9cc36	Set proper ownership command performance improvement By changing the set ownership command from using the file module in combination with a with_items loop to a raw chown command, we can achieve a 98% performance increase here. On a ceph cluster with a significant amount of directories and files in /var/lib/ceph, the file module has to run checks on ownership of all those directories and files to determine whether a change is needed. In this case, we just want to explicitly set the ownership of all these directories and files to the ceph_uid Added context note to all set proper ownership tasks Signed-off-by: Kevin Jones <kevinjones@redhat.com> (cherry picked from commit `47bf47c9d8`)	2019-08-22 12:59:58 +02:00
Johannes Kastl	82ede0afdb	ceph-nfs: fail on openSUSE Leap using distro packages roles/ceph-validate/tasks/check_nfs.yml: fail on openSUSE Leap using `ceph_origin = distro`, as the ganesha packages are not available from the distribution repositories Fixes: #4342 Signed-off-by: Johannes Kastl <kastl@b1-systems.de> (cherry picked from commit `11aa5dbb58`)	2019-08-21 15:40:22 +02:00
Guillaume Abrioux	fcf571430b	handler: do not validate the server certificate against the CA Otherwise rgw handler ends up with an error when using https. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `9329bbb3af`)	2019-08-21 15:40:07 +02:00
Johannes Kastl	15646d1030	install ceph-mds packages on SUSE/openSUSE install packages on SUSE/openSUSE distributions, using the same logic as on RedHat-based distributions Fixes #4340 Signed-off-by: Johannes Kastl <kastl@b1-systems.de> (cherry picked from commit `c721cb99cb`)	2019-08-21 09:54:09 +00:00
Johannes Kastl	34783253a5	remove duplicate task installing suse dependencies roles/ceph-common/tasks/installs/install_on_suse.yml: remove the task that installs the dependencies, as this is done later in install_suse_packages.yml Signed-off-by: Johannes Kastl <kastl@b1-systems.de> (cherry picked from commit `504017d562`)	2019-08-20 14:36:15 +02:00
Guillaume Abrioux	642851fa5d	osd: add 'osd blacklist' cap for osp keyrings This commit adds the `osd blacklist` cap on all OSP clients keyrings. Fixes: #2296 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `2d955757ee`)	2019-08-20 13:09:05 +02:00
Guillaume Abrioux	3fc880ee7a	validate: do not validate devices or lvm_volumes in osd_auto_discovery case we shouldn't validate these two variables when `osd_auto_discovery` is set. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1644623 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `243edfbc96`)	2019-08-20 11:09:05 +02:00
Johannes Kastl	6fa0eb90a2	only support openSUSE Leap 15.x, fail on 42.x openSUSE switched from 'openSUSE 13.x' to 'openSUSE Leap 42.x' and then to 'openSUSE Leap 15.x' to align with SLES15 development. The previous logic did not correctly allow the current release, as 15.x matched the 'less than 42.3' condition. For now only support openSUSE Leap 15.x, and extend support once 16.x is released (or whatever the exact version will be) Signed-off-by: Johannes Kastl <kastl@b1-systems.de> (cherry picked from commit `5ee3d96fb4`)	2019-08-20 09:37:29 +02:00
Guillaume Abrioux	19c7b650db	osd: remove useless condition just like `ceph_osd_pool_default_size`, a pool size might change after an initial deployment. Having this condition prevents from customizing the pool in that case. This is not needed so let's remove it. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `70cf2a5846`)	2019-08-20 09:13:15 +02:00
Guillaume Abrioux	6d90dbc3c0	common: replace shell module there is no need to use `shell` in these tasks. Let's use `command`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `4df92152c0`)	2019-08-19 18:47:14 +00:00
Guillaume Abrioux	236020fb2b	shrink-mon: refact 'verify the monitor is out of the cluster' task use `from_json` filter instead of a `| python` so we can get rid of the `shell` module usage here. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `5573f17e76`)	2019-08-19 18:47:14 +00:00
Rishabh Dave	b28ed96378	use pre_tasks and post_tasks in shrink-mon.yml too This commit should've been part of commit `2fb12ae554`. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit `2034387f57`)	2019-08-19 18:47:14 +00:00
Guillaume Abrioux	f08408bf5c	osd: refact 'wait for all osd to be up' task let's use `until` instead of doing test in bash using python oneliner also, use `command` instead of `shell`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `687087fd43`)	2019-08-19 18:47:14 +00:00
Guillaume Abrioux	2f77704591	common: use discovered_interpreter_python fact in order to use the right binary name when using python cli in command or shell module. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `13815ad3ca`)	2019-08-19 18:47:14 +00:00
Guillaume Abrioux	4b2d13995d	refact python installation This commit refactors the python installation when it is not available. In order to avoid generating errors, we check for each package manager to detect which system we are running on. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `d3fa3c2d72`)	2019-08-19 18:47:14 +00:00
Guillaume Abrioux	0f90ffe9df	mgr: refact 'wait for all mgr to be up' task There's no need to use `shell` module here. Instead of using `| python -c`, let's use `from_json` filter. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `5b9b841108`)	2019-08-08 15:57:54 +02:00
Dimitri Savineau	d4348da7a1	mgr/dashboard: Fix grafana/prometheus url config When configuring grafana/prometheus embed in the mgr/dashboard, we need to use the address of the grafana-server node and not the current hostname because mgr/dashboard and grafana/prometheus could be present on different hosts. We should instead rely on the grafana_server_addr variable and remove the dashboard_url. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `4c6ec1dccb`)	2019-08-08 13:47:09 +02:00
Dimitri Savineau	f9d9ffac8f	dashboard: run dashboard role on mgr/mon nodes We don't need to execute the ceph-dashboard role on the nodes present in the grafana-server group. This one is dedicated to the grafana and prometheus stack. The ceph-dashboard role needs to be executed where the ceph-mgr is running. It is either on the dedicated mgr nodes or, if mgr and mon are collocated, implicitly on the mon nodes. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `16939eff9e`)	2019-08-08 13:47:09 +02:00
Dimitri Savineau	cf82ac5590	ceph-dashboard: Add run_once on delegate tasks Because we need to execute commands from a monitor node (the first one in the mons list) we are using delegate_to option. If there's multiple nodes running the ceph-dashboard role then the delegated task will be executed multiple times. Also remove a mgr config-key option not present for nautilus+ releases. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `f545b5be0d`)	2019-08-08 13:47:09 +02:00
Dimitri Savineau	8bb1be30fa	ceph-infra: Apply firewall rules with container We don't have a reason to not apply firewall rules on the host when using a containerized deployment. The TripleO environments already manage the ceph firewall rules outside ceph-ansible and set the configure_firewall variable to false. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1733251 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `771f25b1f8`)	2019-08-07 10:41:47 +02:00
Guillaume Abrioux	7550f47661	dashboard: do not deploy on Debian based OS/non-containerized in non-containerized deployment, we can't deploy dashboard on Debian based distribution since the package `ceph-grafana-dashboards` isn't available. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `dc7eb535b6`)	2019-08-07 10:41:24 +02:00
Dimitri Savineau	308e5fe9f4	ceph-grafana: Set grafana uid/gid on files We don't need to create a grafana system user (in fact we don't even set the right uid for this user) because we're using a container setup. Instead we just need to be sure to set the owner/group to 472 (grafana user/group from the container) like we do for ceph/167. We don't need to set the user/group recursively on the /etc/grafana directory in a dedicated task. Also, on Ubuntu systems the ceph-grafana-dashboards package isn't present, so on non-containerized deployment we won't have the /etc/grafana/dashboards/ceph-dashboard directory (which comes with the package), so we need to be sure it exists. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `34036c667c`)	2019-08-07 10:41:03 +02:00
Dimitri Savineau	6a5308fa7f	tests/shrink_rgw: Disable dashboard The shrink_rgw scenario was merged just after the PR enabling the ceph dashboard by default. So right now the shrink_rgw scenario doesn't have nodes in the grafana group and fails. We just need to set dashboard_enabled to false. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `867583d5dd`)	2019-07-31 15:25:15 -04:00
Rishabh Dave	06c0a06122	tests/functional: add a test for shrink-rgw.yml Add a new functional test that deploys a Ceph cluster with three nodes for MON, OSD and RGW and then runs shrink-rgw.yml to test it. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit `236b081a3a`) # Conflicts: # tox.ini	2019-07-31 15:25:15 -04:00
Rishabh Dave	72a062b6fa	add a playbook the remove rgw from a given node Add a playbook named shrink-rgw.yml to infrastructure-playbooks/ that can remove a RGW from a node in an already deployed Ceph cluster. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431 Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit `632a44bdf2`)	2019-07-31 15:25:15 -04:00
Dimitri Savineau	36e18e20d1	ceph-osd: check container engine rc for pools When creating OpenStack pools, we only check if the return code from the pool list command isn't 0 (ie: if it doesn't exist). In that case, the return code will be 2. That's why the next condition is rc != 0 for the pool creation. But in containerized deployment, the return code could be different if there's a failure on the container engine command (like container not running). In that case, the return code could be either 1 (docker) or 125 (podman) so we should fail at this point and not in the next tasks. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1732157 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `d549fffdd2`)	2019-07-31 14:07:41 -04:00
Guillaume Abrioux	d2ef85b615	tests: add more memory in podman job Typical error : ``` fatal: [mon1 -> mon0]: FAILED! => changed=true cmd: - podman - exec - ceph-mon-mon0 - ceph - config - set - mgr - mgr/dashboard/ssl - 'false' delta: '0:00:00.644870' end: '2019-07-30 10:17:32.715639' msg: non-zero return code rc: 1 start: '2019-07-30 10:17:32.070769' stderr: |- Traceback (most recent call last): File "/usr/bin/ceph", line 140, in <module> import rados ImportError: libceph-common.so.0: cannot map zero-fill pages: Cannot allocate memory Error: exit status 1 stderr_lines: <omitted> stdout: '' stdout_lines: <omitted> ``` Let's add more memory to get around this issue. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `0f620b2584`)	2019-07-30 15:08:46 +02:00
Guillaume Abrioux	d7d661d5d7	tests: deploy dashboard on mons there's no dedicated nodes for mgr, let's use monitor nodes. The mgr0 instance spawned isn't used, so if this node is part of the inventory for this scenario, testinfra will complain because there's no ceph.conf on this node. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `d649e00893`)	2019-07-30 15:08:46 +02:00
Guillaume Abrioux	51af74face	dashboard: fix timeout usage on rgw user creation command For some reason, this is making the playbook failing like following: ``` TASK [ceph-dashboard : create radosgw system user] ********************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************** task path: /home/guits/ceph-ansible/roles/ceph-dashboard/tasks/configure_dashboard.yml:106 Tuesday 30 July 2019 10:04:54 +0200 (0:00:01.910) 0:11:22.319 ******** FAILED - RETRYING: create radosgw system user (3 retries left). FAILED - RETRYING: create radosgw system user (2 retries left). FAILED - RETRYING: create radosgw system user (1 retries left). fatal: [mgr0 -> mon0]: FAILED! => changed=true attempts: 3 cmd: timeout 20 podman exec ceph-mon-mon0 radosgw-admin user create --uid=ceph-dashboard --display-name='Ceph dashboard' --system delta: '0:00:20.021973' end: '2019-07-30 08:06:32.656066' msg: non-zero return code rc: 124 start: '2019-07-30 08:06:12.634093' stderr: 'exec failed: container_linux.go:336: starting container process caused "process_linux.go:82: copying bootstrap data to pipe caused \"write init-p: broken pipe\""' stderr_lines: <omitted> stdout: '' stdout_lines: <omitted> ``` using `timeout -f -s KILL` fixes this issue. Also, there is no need to use `shell` module here, let's switch to `command`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `c9d80af4e0`)	2019-07-30 15:08:46 +02:00
Dimitri Savineau	5e273a9072	library/ceph_volume.py: remove six dependency The ceph nodes couldn't have the python six library installed which could lead to error during the ceph_volume custom module execution. ImportError: No module named six The six library isn't useful in this module if we're sure that all action variables passed to the build_ceph_volume_cmd function are a list and not a string. Resolves: #4071 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `a64a61429d`)	2019-07-29 15:53:18 +02:00
Rishabh Dave	8ca88b41cc	infra-playbooks: rewite a condition for better readability Use the facility built into Ansible to check whether a command was executed successfully rather than looking at its return value. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit `5aecdd3ba6`)	2019-07-29 15:52:29 +02:00
Guillaume Abrioux	432257b6dd	tests: test dashboard deployment with podman scenario This commit adds a grafana-server section in order to test dashboard deployment with podman. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `3c2fd337d9`)	2019-07-29 15:46:58 +02:00
Guillaume Abrioux	ea44783f3d	validate: add checks for grafana-server group definition this commit adds two checks: - check that the `[grafana-server]` group is defined - check that the `[grafana-server]` contains at least one node. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `02beb00916`)	2019-07-29 15:46:58 +02:00
Guillaume Abrioux	e2b41a17c0	mgr: fix a typo this task isn't using the right container_exec_cmd, so it delegates to the wrong node. Let's use the right fact to fix this command. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `ec33ee7574`)	2019-07-29 15:46:58 +02:00
Guillaume Abrioux	1a9043128c	dashboard: remove cfg80211 module installation According to this comment [1], this seems to be needed to detect wifi devices. In node exporter we can see this: ``` --collector.wifi Enable the wifi collector (default: disabled). ``` since it's enabled by default and we don't even change this in our systemd templates for node-exporter, we can easily assume in the end it's not needed. Therefore, let's remove this. [1] `dbf81b6b5b (diff-961545214e21efed3b84a9e178927a08L21-L23)` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `b9cdf341be`)	2019-07-29 15:46:58 +02:00
Guillaume Abrioux	d0ad1cf0f1	dashboard: use dedicated group only There's no need to add complexity and trying to fallback on other group. Let's deploy dashboard on all nodes present in grafana-server group. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `d67230b2a2`)	2019-07-29 15:46:58 +02:00
Dimitri Savineau	dd87db70ca	dashboard: move code into a dedicated playbook Move dashboard, grafana/prometheus and node-exporter plays into a dedicated playbook in the infrastructure-playbooks directory. To avoid using the 'dashboard_enabled | bool' condition multiple times in the main playbook we can just import the dashboard playbook or not. This patch also allows using a single dashboard playbook for both baremetal and container playbooks. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `43135840b1`)	2019-07-29 15:46:58 +02:00
Guillaume Abrioux	93826e061d	dashboard: enable dashboard by default This commit enables dashboard deployment by default. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1726739 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `fb1b5b3251`) # Conflicts: # tox-dashboard.ini	2019-07-29 15:46:58 +02:00

4817 Commits (b998fb339e09e98004eccf96c64351f1e11a9908)
