ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Giulio Fidente	cb66a62ae2	Look for additional names when checking ceph-nfs container status Ganesha cannot be operated active/active, in those deployments where it is managed by pacemaker the container name can be different than the default. This change uses "ceph_nfs_service_suffix" where previously missing to ensure tasks will work with customized names. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1750005 Signed-off-by: Giulio Fidente <gfidente@redhat.com> (cherry picked from commit `d2a2bd7c42`)	2019-09-09 16:48:50 -04:00
Dimitri Savineau	3fded4b8ec	rbd-mirror: configure pool and peer The rbd mirror configuration was only available for non containerized deployment and was also incomplete. We now enable the mirroring on the pool and add the remote peer in both scenarios. The default mirroring mode is set to 'pool' but can be configured via the ceph_rbd_mirror_mode variable. This commit also fixes an issue on the rbd mirror command if the ceph cluster name isn't using the default value (ceph) due to a missing --cluster parameter to the command. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1665877 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `7e5e21741e`)	2019-09-09 16:05:56 +00:00
Boris Ranto	51f18f9076	rhcs: Pin downstream containers We should pin down the versions of downstream container for dashboard instead of using upstream containers. Signed-off-by: Boris Ranto <branto@redhat.com> (cherry picked from commit `79fdf125c7`)	2019-09-05 13:45:18 -04:00
fmount	65a01036c2	Fix discovered_interpreter_python variable This change fixes the discovered_interpreter_python variable name that was "discovered_python_interpreter" and caused a failure in OSP deployments. Signed-off-by: fmount <fpantano@redhat.com> (cherry picked from commit `81eb091533`)	2019-09-04 14:16:57 -04:00
Johannes Kastl	781ab4ad62	openSUSE OBS repo using ceph_stable_release Instead of hardcoding `luminous`, use the `ceph_stable_release` variable to point to the correct repository. This is now uncommented in roles/ceph-defaults/defaults/main.yml to be available, as it is only used if ceph_repository is set to 'obs'. group_vars/*.sample files have been regenerated using the ./generate_group_vars_sample.sh script. Signed-off-by: Johannes Kastl <kastl@b1-systems.de> (cherry picked from commit `0cedc4d303`)	2019-08-30 09:04:24 -04:00
fmount	159db72269	Add http_addr option to grafana config We have no reason to make grafana container listen on *:<port>, so this change adds the http_addr option to the grafana config file and adds the related option on the wait_for tasks. Since grafana_server_addr should exist, we shouldn't rely on the _current_monitor_addr default on prometheus/grafana templates. This change also removes this default value, which is not necessary anymore. Signed-off-by: fmount <fpantano@redhat.com> (cherry picked from commit `8a666bfd15`)	2019-08-30 09:04:16 -04:00
Dimitri Savineau	ab67c6bd76	lint: fix error [201,206] [201] Trailing whitespace [206] Variables should have spaces before and after: {{ var_name }} Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `42082c0a27`)	2019-08-30 09:04:00 -04:00
Johannes Kastl	64b11ab2b9	fix openSUSE OBS repo creation roles/ceph-common/tasks/installs/suse_obs_repository.yml: ansible's zypper_repository module does not know a parameter 'uri', this is called 'repo' instead Signed-off-by: Johannes Kastl <kastl@b1-systems.de> (cherry picked from commit `4711a7d626`)	2019-08-29 16:31:40 +00:00
Nick Erdmann	e8e1f310dd	ceph-infra: open ceph iscsi/prometheus port Signed-off-by: Nick Erdmann <n@nirf.de> (cherry picked from commit `7953ee1b81`)	2019-08-29 10:22:28 -04:00
Dimitri Savineau	0d55eeba79	tests: use a single grafana node on podman We don't use multiple grafana nodes for the moment on the other scenarios and I don't think this is supposed to be working. We can often see failures on grafana on that scenario. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `825045f6b4`)	2019-08-28 17:48:12 +00:00
Guillaume Abrioux	a3cbb59c05	lint: fix error [301], add `changed_when: false` when needed This commit fixes the error [301]: `[301] Commands should not change things if nothing needs doing` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `327d564106`)	2019-08-28 11:22:47 -04:00
Guillaume Abrioux	8f781198d6	lint: fix error [306], add pipefail on shell command using pipe This commit fixes the error [306]: `[306] Shells that use pipes should set the pipefail option` using `/bin/bash` as executable because Debian/Ubuntu systems use `dash` by default which doesn't have the `-o pipefail`. (See: https://github.com/ansible/ansible-lint/issues/497#issue-424623501) Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `102edaeb61`)	2019-08-28 11:22:47 -04:00
Dimitri Savineau	364951ce2f	ceph-mon: Bind mount the ca-trust directory On containerized deployment, the mon container sometimes needs to access the radosgw endpoint (via the radosgw-admin command). When using TLS on the radosgw with self-signed certificates, we need to access the CA certificate from the mon container. The CA certificate needs to be added on the host and then the directory will be bind mounted in the container. Resolves: #4358 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `2b0616ecca`)	2019-08-28 09:44:34 -04:00
Dimitri Savineau	1fbfa1ce1a	ceph-client: Use profile rbd in keyring caps Like the OpenStack keyrings, we can use the profile rbd for the clients keyring (both mon and osd). Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `49aa05b96c`)	2019-08-28 09:42:03 -04:00
Dimitri Savineau	4df8de8f7b	Revert "osd: add 'osd blacklist' cap for osp keyrings" This reverts commit `2d955757ee`. The "osd blacklist" isn't an osd caps but should be used with mon caps. Also the correct caps for this is: 'allow command "osd blacklist"'. The current change is breaking the openstack and clients keyrings. By using the profile rbd (which is already used) we already rely on the ability to blacklist dead client. Resolves: #4385 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `717af83475`)	2019-08-28 09:42:03 -04:00
Johannes Kastl	3bfa1c50de	set discovered_python_interpreter if ansible_python_interpreter is defined If the user has set the `ansible_python_interpreter`, ansible will not try to discover python, so `discovered_python_interpreter` will not be set. Solution: Set `discovered_python_interpreter` to `ansible_python_interpreter` if `ansible_python_interpreter` is defined Signed-off-by: Johannes Kastl <kastl@b1-systems.de> (cherry picked from commit `bd507fa147`)	2019-08-27 21:06:43 +00:00
guihecheng	196e70a75a	rgw/multisite: assign 'rgw_zone' to the exact section in ceph.conf since the following commit: commit `1ac94c048f` rgw: add support for multiple rgw instances on a single host we have multi-instance rgw support on a single host and the config section name of the rgw changed from [client.rgw.$(hostname)] -> [client.rgw.$(hostname).rgwX] when X is the sequence number: 0,1,2,... So we should assign 'rgw_zone' item to the exact rgw instance config section in ceph.conf Signed-off-by: guihecheng <guihecheng@cmiot.chinamobile.com> (cherry picked from commit `a0590cae9d`)	2019-08-23 15:56:15 +02:00
Artur Fijalkowski	27014df45e	global: make directories mode parameterizable This commit makes it possible to parametrize the ceph directories modes. So it changes the hardcoded mode for ceph related directories from 0755 to customizable with the `ceph_directories_mode` variable. Closes: #2920 Signed-off-by: Artur Fijalkowski <artur.fijalkowski@ing.com> Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `011270ca69`)	2019-08-23 11:39:23 +00:00
Dimitri Savineau	500c59c648	ceph-osd: Add ulimit nofile on container start On containerized deployment, the OSD entrypoint runs some ceph-volume commands (lvm/simple scan and/or activate) which perform badly without the ulimit option. This option was added for all previous ceph-volume commands but not on the ceph-osd container startup. Also updating hard limit value to 4096 to reflect default baremetal value. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `9a4ac46d19`)	2019-08-22 22:50:17 +00:00
Kevin Coakley	c7950d5539	ceph-config: Set changed_when to false on fact gathering statements The "run 'ceph-volume lvm batch --report' to see how many osds are to be created" and "run 'ceph-volume lvm list' to see how many osds have already been created" statements only register the lvm_batch_report and lvm_list variables. Running those ceph-volume commands should never produce a change on the system. Adding changed_when: false prevents irrelevant change messages from Ansible. Signed-off-by: Kevin Coakley <kcoakley@sdsc.edu> (cherry picked from commit `e11cbbbcb1`)	2019-08-22 20:36:39 +02:00
Johannes Kastl	3e17c458d0	facts: fix a typo This commit fixes a typo in roles/ceph-facts/tasks/facts.yml Signed-off-by: Johannes Kastl <kastl@b1-systems.de> (cherry picked from commit `e1b9312084`)	2019-08-22 18:11:18 +02:00
Kevin Jones	3a8de9cc36	Set proper ownership command performance improvement By changing the set ownership command from using the file module in combination with a with_items loop to a raw chown command, we can achieve a 98% performance increase here. On a ceph cluster with a significant amount of directories and files in /var/lib/ceph, the file module has to run checks on ownership of all those directories and files to determine whether a change is needed. In this case, we just want to explicitly set the ownership of all these directories and files to the ceph_uid Added context note to all set proper ownership tasks Signed-off-by: Kevin Jones <kevinjones@redhat.com> (cherry picked from commit `47bf47c9d8`)	2019-08-22 12:59:58 +02:00
Johannes Kastl	82ede0afdb	ceph-nfs: fail on openSUSE Leap using distro packages roles/ceph-validate/tasks/check_nfs.yml: fail on openSUSE Leap using `ceph_origin = distro`, as the ganesha packages are not available from the distribution repositories Fixes: #4342 Signed-off-by: Johannes Kastl <kastl@b1-systems.de> (cherry picked from commit `11aa5dbb58`)	2019-08-21 15:40:22 +02:00
Guillaume Abrioux	fcf571430b	handler: do not validate the server certificate against the CA Otherwise rgw handler ends up with an error when using https. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `9329bbb3af`)	2019-08-21 15:40:07 +02:00
Johannes Kastl	15646d1030	install ceph-mds packages on SUSE/openSUSE install packages on SUSE/openSUSE distributions, using the same logic as on RedHat-based distributions Fixes #4340 Signed-off-by: Johannes Kastl <kastl@b1-systems.de> (cherry picked from commit `c721cb99cb`)	2019-08-21 09:54:09 +00:00
Johannes Kastl	34783253a5	remove duplicate task installing suse dependencies roles/ceph-common/tasks/installs/install_on_suse.yml: remove the task that installs the dependencies, as this is done later in install_suse_packages.yml Signed-off-by: Johannes Kastl <kastl@b1-systems.de> (cherry picked from commit `504017d562`)	2019-08-20 14:36:15 +02:00
Guillaume Abrioux	642851fa5d	osd: add 'osd blacklist' cap for osp keyrings This commit adds the `osd blacklist` cap on all OSP clients keyrings. Fixes: #2296 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `2d955757ee`)	2019-08-20 13:09:05 +02:00
Guillaume Abrioux	3fc880ee7a	validate: do not validate devices or lvm_volumes in osd_auto_discovery case we shouldn't validate these two variables when `osd_auto_discovery` is set. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1644623 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `243edfbc96`)	2019-08-20 11:09:05 +02:00
Johannes Kastl	6fa0eb90a2	only support openSUSE Leap 15.x, fail on 42.x openSUSE switched from 'openSUSE 13.x' to 'openSUSE Leap 42.x' and then to 'openSUSE Leap 15.x' to align with SLES15 development. The previous logic did not correctly allow the current release, as 15.x matched the 'less than 42.3' condition. For now only support openSUSE Leap 15.x, and extend support once 16.x is released (or whatever the exact version will be) Signed-off-by: Johannes Kastl <kastl@b1-systems.de> (cherry picked from commit `5ee3d96fb4`)	2019-08-20 09:37:29 +02:00
Guillaume Abrioux	19c7b650db	osd: remove useless condition just like `ceph_osd_pool_default_size`, a pool size might change after an initial deployment. Having this condition prevents from customizing the pool in that case. This is not needed so let's remove it. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `70cf2a5846`)	2019-08-20 09:13:15 +02:00
Guillaume Abrioux	6d90dbc3c0	common: replace shell module there is no need to use `shell` in these tasks. Let's use `command`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `4df92152c0`)	2019-08-19 18:47:14 +00:00
Guillaume Abrioux	236020fb2b	shrink-mon: refact 'verify the monitor is out of the cluster' task use `from_json` filter instead of a `| python` so we can get rid of the `shell` module usage here. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `5573f17e76`)	2019-08-19 18:47:14 +00:00
Rishabh Dave	b28ed96378	use pre_tasks and post_tasks in shrink-mon.yml too This commit should've been part of commit `2fb12ae554`. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit `2034387f57`)	2019-08-19 18:47:14 +00:00
Guillaume Abrioux	f08408bf5c	osd: refact 'wait for all osd to be up' task let's use `until` instead of doing test in bash using python oneliner also, use `command` instead of `shell`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `687087fd43`)	2019-08-19 18:47:14 +00:00
Guillaume Abrioux	2f77704591	common: use discovered_interpreter_python fact in order to use the right binary name when using python cli in command or shell module. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `13815ad3ca`)	2019-08-19 18:47:14 +00:00
Guillaume Abrioux	4b2d13995d	refact python installation This commit refacts the python installation when python is not available. In order to avoid generating errors, we check for each package manager to detect which system we are running on. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `d3fa3c2d72`)	2019-08-19 18:47:14 +00:00
Guillaume Abrioux	0f90ffe9df	mgr: refact 'wait for all mgr to be up' task There's no need to use `shell` module here. Instead of using `| python -c`, let's use `from_json` filter. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `5b9b841108`)	2019-08-08 15:57:54 +02:00
Dimitri Savineau	d4348da7a1	mgr/dashboard: Fix grafana/prometheus url config When configuring grafana/prometheus embed in the mgr/dashboard, we need to use the address of the grafana-server node and not the current hostname because mgr/dashboard and grafana/prometheus could be present on different hosts. We should instead rely on the grafana_server_addr variable and remove the dashboard_url. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `4c6ec1dccb`)	2019-08-08 13:47:09 +02:00
Dimitri Savineau	f9d9ffac8f	dashboard: run dashboard role on mgr/mon nodes We don't need to execute the ceph-dashboard role on the nodes present in the grafana-server group. This one is dedicated to the grafana and prometheus stack. The ceph-dashboard needs to be executed where the ceph-mgr is running. It is either on the dedicated mgr nodes or, if mgr and mon are collocated, implicitly on the mon nodes. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `16939eff9e`)	2019-08-08 13:47:09 +02:00
Dimitri Savineau	cf82ac5590	ceph-dashboard: Add run_once on delegate tasks Because we need to execute commands from a monitor node (the first one in the mons list) we are using delegate_to option. If there's multiple nodes running the ceph-dashboard role then the delegated task will be executed multiple times. Also remove a mgr config-key option not present for nautilus+ releases. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `f545b5be0d`)	2019-08-08 13:47:09 +02:00
Dimitri Savineau	8bb1be30fa	ceph-infra: Apply firewall rules with container We don't have a reason to not apply firewall rules on the host when using a containerized deployment. The TripleO environments already manage the ceph firewall rules outside ceph-ansible and set the configure_firewall variable to false. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1733251 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `771f25b1f8`)	2019-08-07 10:41:47 +02:00
Guillaume Abrioux	7550f47661	dashboard: do not deploy on Debian based OS/non-containerized in non-containerized deployment, we can't deploy dashboard on Debian based distribution since the package `ceph-grafana-dashboards` isn't available. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `dc7eb535b6`)	2019-08-07 10:41:24 +02:00
Dimitri Savineau	308e5fe9f4	ceph-grafana: Set grafana uid/gid on files We don't need to create a grafana system user (in fact we don't even set the right uid to this user) because we're using a container setup. Instead we just need to be sure to set the owner/group to 472 (grafana user/group from the container) like we do for ceph/167. We don't need to set the user/group recursively on /etc/grafana directory in a dedicated task. Also on Ubuntu system, the ceph-grafana-dashboards isn't present so on non containerized deployment we won't have the /etc/grafana/dashboards/ceph-dashboard directory present (coming with the package) so we need to be sure it exists. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `34036c667c`)	2019-08-07 10:41:03 +02:00
Dimitri Savineau	6a5308fa7f	tests/shrink_rgw: Disable dashboard The shrink_rgw scenario was merged just after the PR that enabled the ceph dashboard by default. So right now the shrink_rgw scenario doesn't have nodes in the grafana group and fails. We just need to set dashboard_enabled to false. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `867583d5dd`)	2019-07-31 15:25:15 -04:00
Rishabh Dave	06c0a06122	tests/functional: add a test for shrink-rgw.yml Add a new functional test that deploys a Ceph cluster with three nodes for MON, OSD and RGW and then runs shrink-rgw.yml to test it. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit `236b081a3a`) # Conflicts: # tox.ini	2019-07-31 15:25:15 -04:00
Rishabh Dave	72a062b6fa	add a playbook the remove rgw from a given node Add a playbook named shrink-rgw.yml to infrastructure-playbooks/ that can remove a RGW from a node in an already deployed Ceph cluster. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431 Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit `632a44bdf2`)	2019-07-31 15:25:15 -04:00
Dimitri Savineau	36e18e20d1	ceph-osd: check container engine rc for pools When creating OpenStack pools, we only check if the return code from the pool list command isn't 0 (ie: if it doesn't exist). In that case, the return code will be 2. That's why the next condition is rc != 0 for the pool creation. But in containerized deployment, the return code could be different if there's a failure on the container engine command (like container not running). In that case, the return code could be either 1 (docker) or 125 (podman) so we should fail at this point and not in the next tasks. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1732157 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `d549fffdd2`)	2019-07-31 14:07:41 -04:00
Guillaume Abrioux	d2ef85b615	tests: add more memory in podman job Typical error : ``` fatal: [mon1 -> mon0]: FAILED! => changed=true cmd: - podman - exec - ceph-mon-mon0 - ceph - config - set - mgr - mgr/dashboard/ssl - 'false' delta: '0:00:00.644870' end: '2019-07-30 10:17:32.715639' msg: non-zero return code rc: 1 start: '2019-07-30 10:17:32.070769' stderr: |- Traceback (most recent call last): File "/usr/bin/ceph", line 140, in <module> import rados ImportError: libceph-common.so.0: cannot map zero-fill pages: Cannot allocate memory Error: exit status 1 stderr_lines: <omitted> stdout: '' stdout_lines: <omitted> ``` Let's add more memory to get around this issue. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `0f620b2584`)	2019-07-30 15:08:46 +02:00
Guillaume Abrioux	d7d661d5d7	tests: deploy dashboard on mons there's no dedicated nodes for mgr, let's use monitor nodes. The mgr0 instance spawned isn't used, so if this node is part of the inventory for this scenario, testinfra will complain because there's no ceph.conf on this node. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `d649e00893`)	2019-07-30 15:08:46 +02:00
Guillaume Abrioux	51af74face	dashboard: fix timeout usage on rgw user creation command For some reason, this is making the playbook failing like following: ``` TASK [ceph-dashboard : create radosgw system user] ********************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************** task path: /home/guits/ceph-ansible/roles/ceph-dashboard/tasks/configure_dashboard.yml:106 Tuesday 30 July 2019 10:04:54 +0200 (0:00:01.910) 0:11:22.319 ******** FAILED - RETRYING: create radosgw system user (3 retries left). FAILED - RETRYING: create radosgw system user (2 retries left). FAILED - RETRYING: create radosgw system user (1 retries left). fatal: [mgr0 -> mon0]: FAILED! => changed=true attempts: 3 cmd: timeout 20 podman exec ceph-mon-mon0 radosgw-admin user create --uid=ceph-dashboard --display-name='Ceph dashboard' --system delta: '0:00:20.021973' end: '2019-07-30 08:06:32.656066' msg: non-zero return code rc: 124 start: '2019-07-30 08:06:12.634093' stderr: 'exec failed: container_linux.go:336: starting container process caused "process_linux.go:82: copying bootstrap data to pipe caused \"write init-p: broken pipe\""' stderr_lines: <omitted> stdout: '' stdout_lines: <omitted> ``` using `timeout -f -s KILL` fixes this issue. Also, there is no need to use `shell` module here, let's switch to `command`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `c9d80af4e0`)	2019-07-30 15:08:46 +02:00
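
Two of the fixes listed above describe decision logic that may be easier to follow as code: the openSUSE Leap version check (`6fa0eb90a2`) and the pool return-code handling (`36e18e20d1`). The sketch below is illustrative Python only, not the Ansible conditions from the actual commits, which this log does not show. The function names `parse_version`, `old_check_rejects`, `is_supported_leap`, and `pool_action` are hypothetical, and the shape of the old version check is an assumption based on the commit message's remark that 15.x matched the "less than 42.3" condition.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '15.1' into (15, 1)."""
    return tuple(int(part) for part in version.split("."))


def old_check_rejects(version: str) -> bool:
    """Assumed shape of the earlier logic: reject anything below 42.3."""
    return parse_version(version) < (42, 3)


def is_supported_leap(version: str) -> bool:
    """Assumed shape of the fixed logic: accept only openSUSE Leap 15.x."""
    return parse_version(version)[0] == 15


def pool_action(rc: int) -> str:
    """Map the return code of the pool lookup to the next step.

    Per the commit message, 0 means the pool exists and 2 means it does
    not. Any other code (for example 1 from docker or 125 from podman)
    is treated as a container engine failure and raised immediately
    instead of surfacing in a later task.
    """
    if rc == 0:
        return "skip"
    if rc == 2:
        return "create"
    raise RuntimeError(f"container engine command failed with rc={rc}")


if __name__ == "__main__":
    # Leap 15.1 is numerically below 42.3, so the old comparison rejected it.
    print(old_check_rejects("15.1"))
    print(is_supported_leap("15.1"), is_supported_leap("42.3"))
    print(pool_action(0), pool_action(2))
```

The version example shows why a plain "less than" comparison stopped working once openSUSE numbering went from 42.x back down to 15.x. The return-code example shows why `rc != 0` alone is too coarse as a trigger for pool creation.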

4926 Commits (78799ecf553f26b3608d46f51bdfdc4b08c083f9)