ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Dimitri Savineau	f545b5be0d	ceph-dashboard: Add run_once on delegate tasks Because we need to execute commands from a monitor node (the first one in the mons list) we are using delegate_to option. If there's multiple nodes running the ceph-dashboard role then the delegated task will be executed multiple times. Also remove a mgr config-key option not present for nautilus+ releases. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-08-06 09:34:20 +02:00
Johannes Kastl	5ee3d96fb4	only support openSUSE Leap 15.x, fail on 42.x openSUSE switched from 'openSUSE 13.x' to 'openSUSE Leap 42.x' and then to 'openSUSE Leap 15.x' to align with SLES15 development. The previous logic did not correctly allow the current release, as 15.x matched the 'less than 42.3' condition. For now only support openSUSE Leap 15.x, and extend support once 16.x is released (or whatever the exact version will be) Signed-off-by: Johannes Kastl <kastl@b1-systems.de>	2019-08-05 09:46:31 -04:00
Dimitri Savineau	771f25b1f8	ceph-infra: Apply firewall rules with container We don't have a reason to not apply firewall rules on the host when using a containerized deployment. The TripleO environments already manage the ceph firewall rules outside ceph-ansible and set the configure_firewall variable to false. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1733251 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-08-01 15:16:49 +02:00
Dimitri Savineau	34036c667c	ceph-grafana: Set grafana uid/gid on files We don't need to create a grafana system user (in fact we don't even set the right uid to this user) because we're using a container setup. Instead we just need to be sure to set the owner/group to 472 (grafana user/group from the container) like we do for ceph/167. We don't need to set the user/group recursively on /etc/grafana directory in a dedicated task. Also on Ubuntu systems, the ceph-grafana-dashboards package isn't present so on non containerized deployment we won't have the /etc/grafana/dashboards/ceph-dashboard directory present (coming with the package) so we need to be sure it exists. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-08-01 10:10:56 +02:00
Guillaume Abrioux	c9d80af4e0	dashboard: fix timeout usage on rgw user creation command For some reason, this is making the playbook failing like following: ``` TASK [ceph-dashboard : create radosgw system user] ********************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************** task path: /home/guits/ceph-ansible/roles/ceph-dashboard/tasks/configure_dashboard.yml:106 Tuesday 30 July 2019 10:04:54 +0200 (0:00:01.910) 0:11:22.319 ******** FAILED - RETRYING: create radosgw system user (3 retries left). FAILED - RETRYING: create radosgw system user (2 retries left). FAILED - RETRYING: create radosgw system user (1 retries left). fatal: [mgr0 -> mon0]: FAILED! => changed=true attempts: 3 cmd: timeout 20 podman exec ceph-mon-mon0 radosgw-admin user create --uid=ceph-dashboard --display-name='Ceph dashboard' --system delta: '0:00:20.021973' end: '2019-07-30 08:06:32.656066' msg: non-zero return code rc: 124 start: '2019-07-30 08:06:12.634093' stderr: 'exec failed: container_linux.go:336: starting container process caused "process_linux.go:82: copying bootstrap data to pipe caused \"write init-p: broken pipe\""' stderr_lines: <omitted> stdout: '' stdout_lines: <omitted> ``` using `timeout -f -s KILL` fixes this issue. Also, there is no need to use `shell` module here, let's switch to `command`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-07-30 13:52:44 +02:00
Guillaume Abrioux	2d955757ee	osd: add 'osd blacklist' cap for osp keyrings This commit adds the `osd blacklist` cap on all OSP clients keyrings. Fixes: #2296 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-07-29 09:57:25 -04:00
Dimitri Savineau	d549fffdd2	ceph-osd: check container engine rc for pools When creating OpenStack pools, we only check if the return code from the pool list command isn't 0 (ie: if it doesn't exist). In that case, the return code will be 2. That's why the next condition is rc != 0 for the pool creation. But in containerized deployment, the return code could be different if there's a failure on the container engine command (like container not running). In that case, the return code could be either 1 (docker) or 125 (podman) so we should fail at this point and not in the next tasks. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1732157 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-07-29 15:55:04 +02:00
Guillaume Abrioux	02beb00916	validate: add checks for grafana-server group definition this commit adds two checks: - check that the `[grafana-server]` group is defined - check that the `[grafana-server]` contains at least one node. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-07-29 14:42:45 +02:00
Guillaume Abrioux	ec33ee7574	mgr: fix a typo this task isn't using the right container_exec_cmd, that's delegating to the wrong node. Let's use the right fact to fix this command. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-07-29 14:42:45 +02:00
Guillaume Abrioux	b9cdf341be	dashboard: remove cfg80211 module installation According to this comment [1], this seems to be needed to detect wifi devices. In node exporter we can see this: ``` --collector.wifi Enable the wifi collector (default: disabled). ``` since it's disabled by default and we don't even change this in our systemd templates for node-exporter, we can easily assume in the end it's not needed. Therefore, let's remove this. [1] `dbf81b6b5b (diff-961545214e21efed3b84a9e178927a08L21-L23)` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-07-29 14:42:45 +02:00
Guillaume Abrioux	d67230b2a2	dashboard: use dedicated group only There's no need to add complexity and trying to fallback on other group. Let's deploy dashboard on all nodes present in grafana-server group. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-07-29 14:42:45 +02:00
Guillaume Abrioux	fb1b5b3251	dashboard: enable dashboard by default This commit enables dashboard deployment by default. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1726739 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-07-29 14:42:45 +02:00
Dimitri Savineau	07c6695d16	Remove NBSP characters Some NBSP are still present in the yaml files. Adding a test in travis CI. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-07-26 16:09:23 -04:00
Guillaume Abrioux	19950b5170	container: rename docker directories Those 2 directories should be renamed to be more generic (docker vs. podman). Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-07-24 16:31:46 +02:00
fmount	fac1b030cb	Avoid to setup provisioners in a fully containerized environment This commit adds a when clause to avoid the setup of grafana provisioners in a fully containerized scenario. This is needed when the ceph-grafana-dashboards package is not installed and this task could result in a wrong grafana configuration that makes the container crash. Signed-off-by: fmount <fpantano@redhat.com>	2019-07-23 09:06:50 +02:00
Giulio Fidente	edd1420217	Fix backward compat with old cephfs_pools format Previously cephfs_pools items used to have a pgs: key but not pgp_num: nor pg_num: Signed-off-by: Giulio Fidente <gfidente@redhat.com>	2019-07-19 11:56:58 -04:00
Guillaume Abrioux	618dbf271d	handler: fix bug in osd handlers `fbf4ed42ae` introduced a bug when container binary is podman. podman doesn't support ps -f using regular expression, the container id is never set in the restart script causing the handler to fail. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1721536 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-07-18 16:22:51 +02:00
Guillaume Abrioux	487d701685	validate: fail if gpt header found on unprepared devices ceph-volume will complain if gpt headers are found on devices. This commit checks whether a gpt header is present on devices passed in `devices` variable and fails early. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1730541 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-07-18 07:43:55 +02:00
Dimitri Savineau	5383c2f7f3	ceph-dashboard: enable rgw options conditionally The dashboard rgw frontend options only need to be applied when there's some nodes present in the rgw ansible group. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-07-18 07:22:13 +02:00
Dimitri Savineau	8ab9b719fa	dashboard: use variables for port value The current port value for alertmanager, grafana, node-exporter and prometheus is hardcoded in the roles so it's not possible to change the port binding of those services. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-07-18 07:22:13 +02:00
Dimitri Savineau	0ae0193144	ceph-infra: update handler with daemon variable Both ntp and chrony daemon use variable for the service name because it could be different depending on the GNU/Linux distribution. This has been updated in `9d88d3199` for chrony but only for the start part not for the handler. The commit fixes this for both ntp and chrony. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-07-12 09:14:33 -04:00
Dimitri Savineau	41b44dde85	ceph-infra: Open prometheus port The Prometheus port 9090 isn't open in the firewall configuration. Also the dashboard task on the grafana node was not required because it's already present on the mgr node. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-07-11 13:40:22 +02:00
Guillaume Abrioux	ee29f7370a	handler: remove legacy condition since everything is already in a block with the same condition, it's not needed to leave all of them on these tasks. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-07-10 09:42:00 -04:00
Guillaume Abrioux	e6dc3ebd8c	validate: improve message printed in check_devices.yml The message prints the whole content of the registered variable in the playbook, this is not needed and makes the message pretty unclear and unreadable. ``` "msg": "{'_ansible_parsed': True, 'changed': False, '_ansible_no_log': False, u'err': u'Error: Could not stat device /dev/sdf - No such file or directory.\\n', 'item': u'/dev/sdf', '_ansible_item_result': True, u'failed': False, '_ansible_item_label': u'/dev/sdf', u'msg': u\"Error while getting device information with parted script: '/sbin/parted -s -m /dev/sdf -- unit 'MiB' print'\", u'rc': 1, u'invocation': {u'module_args': {u'part_start': u'0%', u'part_end': u'100%', u'name': None, u'align': u'optimal', u'number': None, u'label': u'msdos', u'state': u'info', u'part_type': u'primary', u'flags': None, u'device': u'/dev/sdf', u'unit': u'MiB'}}, 'failed_when_result': False, '_ansible_ignore_errors': None, u'out': u''} is not a block special file!" ``` Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1719023 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-07-10 09:32:11 -04:00
Dimitri Savineau	1f2a4f1910	ceph-iscsi: Update gateway config/template - Remove gateway_keyring from the configuration file because it's not used in ceph-iscsi 3.x release. - Use config_template instead of template module for iscsi-gateway configuration file. Because the file is an ini file and we might want to override more parameters than those present in ceph-ansible. - Because we can now set the pool name in the configuration, we should use a variable for that. This is refactored with the iscsi_pool_* variables also used to configure the pool size. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-07-10 09:44:40 +02:00
Dimitri Savineau	5413274412	ceph-dashboard: remove bool filter for rgw vars Some dashboard_rgw_api_* variables are using the bool filter but those variables are strings with an empty string as default value. So we should test the variable against an empty string instead of a bool. dashboard_rgw_api_host: '' dashboard_rgw_api_port: '' dashboard_rgw_api_scheme: '' dashboard_rgw_api_admin_resource: '' Resolves: #4179 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-07-10 09:42:37 +02:00
Boris Ranto	21758fcee8	dashboard: Use upstream default port We are currently using incorrect dashboard default port. The upstream uses 8443 instead of 8234 by default. This should get us closer to the upstream project. Signed-off-by: Boris Ranto <branto@redhat.com>	2019-07-10 09:17:36 +02:00
Dimitri Savineau	de7f948b75	ceph-handler: fix cluster name in socket path `c90f605b5` introduces the default ceph cluster name value in the rgw socket path for the rgw restart script. But this should use the `cluster` variable instead. This commit also fixes this in the osd restart script. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-07-08 13:55:35 -04:00
fmount	95bd002b35	Add package-install tag on ceph-grafana-dashboard pkg install. According to the OSP pattern, we need the package-install tag to control what is installed on the host. This commit just adds the missing tag to meet the TripleO requirements. See: /issues/4197 for details Fixes: #4197 Signed-off-by: fmount <fpantano@redhat.com>	2019-07-08 10:54:30 +02:00
Dimitri Savineau	91bef94b6c	ceph-iscsi-gw: Update log directories bind mount On containerized deployment we need to bind mount the ceph-iscsi directory to avoid writing the logs in the container. The /var/log/ceph directory isn't used by rbd-target-api/gw services because they have their own log directories. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-07-07 07:25:33 +02:00
ilyashestopalov	904532c5e2	ceph-mon: Fix cluster name parameter The ability to add nodes with the monitor role to an existing cluster whose name differs from the default name is fixed. Signed-off-by: ilyashestopalov <usr.tester@yandex.ru>	2019-07-07 07:21:29 +02:00
Guillaume Abrioux	a781ce881c	iscsi: refact deprecated variables This commit moves some old variables into ceph-defaults so we can move the `use_new_ceph_iscsi` fact in ceph-facts role in order. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-07-03 22:13:19 +02:00
Mike Christie	08a6d10c32	igw: Add check for missing iqn If the user is still using the older packages and does not setup the target iqn you will just get a vague error message later on. This adds a check during the validate task, so it is clear to the user. Signed-off-by: Mike Christie <mchristi@redhat.com>	2019-07-03 22:13:19 +02:00
Mike Christie	75fee55d19	igw: Update iscsigws.yml.sample for ceph-iscsi support Update iscsigws.yml.sample to document that we cannot use ansible to setup iSCSI objects and use the new ceph-iscsi package. Signed-off-by: Mike Christie <mchristi@redhat.com>	2019-07-03 22:13:19 +02:00
Mike Christie	cbe66cec52	igw: Support ceph-iscsi package for install This adds support for the ceph-iscsi package during install. ceph-iscsi does not support setting up targets/gws, luns and clients with the current library/igw_* code. Going forward those tasks should be done with gwcli or dashboard. ceph-iscsi will only be used if the user has no iscsi objects setup so we do not break existing setups. The next patch will update the iscsigws.yml.sample to document that users must not setup any iscsi object if they want to use the new package and tools. Signed-off-by: Mike Christie <mchristi@redhat.com>	2019-07-03 22:13:19 +02:00
Mike Christie	b7b2213be1	igw: drop gateway_ip_list for container setups The gateway_ip_list is not used in container setups, so drop it for that case. Signed-off-by: Mike Christie <mchristi@redhat.com>	2019-07-03 22:13:19 +02:00
Mike Christie	d89d3e7cd6	igw: move gateway_ip_list check to validate role Signed-off-by: Mike Christie <mchristi@redhat.com>	2019-07-03 22:13:19 +02:00
Dimitri Savineau	c90f605b51	ceph-handler: Fix rgw socket in restart script Since Mimic the radosgw socket has two extra fields in the socket name (before the .asok suffix): <pid>.<ctid> Before: /var/run/ceph/ceph-client.rgw.cephaio-1.asok After: /var/run/ceph/ceph-client.rgw.cephaio-1.16913.23928832.asok The radosgw restart script doesn't handle this and could fail during an upgrade. If the SOCKETS variable isn't defined in the script then the test command won't fail because the return code is 0 $ test -S $ echo $? 0 There are multiple issues in that script: - The default SOCKETS value isn't defined due to a typo SOCKET vs SOCKETS. - Because the socket name uses the pid then we need to check the socket name after the service restart. - After restarting the radosgw service we need to wait a few seconds otherwise the socket won't be created. - Update the wget parameters because the command is doing a loop. We now use the same options as curl. - The check_rest function doesn't test the radosgw at all due to a wrong test command (test against a string) and always returns 0. This needs to use the DOCKER_EXECS variable in order to execute the command. $ test 'wget http://192.168.100.11:8080' $ echo $? 0 Also remove the test based on the ansible_fqdn because we only use the ansible_hostname + rgw instance name. Finally group all for loops into a single one. Resolves: #3926 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-07-03 09:30:33 +02:00
Giulio Fidente	d526803c6c	Add radosgw_frontend_ssl_certificate parameter This is necessary when configuring RGW with SSL because in addition to passing specific frontend options, civetweb appends the 's' character to the binding port and beast uses ssl_endpoint instead of endpoint. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1722071 Signed-off-by: Giulio Fidente <gfidente@redhat.com>	2019-07-02 14:14:37 -04:00
Guillaume Abrioux	b725b3077e	nfs: clean template remove legacy options ``` ganesha.nfsd-115[main] config_errs_to_log :CONFIG :WARN :Config File (/etc/ganesha/ganesha.conf:13): Unknown parameter (Dir_Max) ganesha.nfsd-115[main] config_errs_to_log :CONFIG :WARN :Config File (/etc/ganesha/ganesha.conf:14): Unknown parameter (Cache_FDs) ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-28 15:09:19 -04:00
Guillaume Abrioux	33eed78d17	containers: improve logging bindmount /var/log/ceph on all containers so it's possible to retrieve logs from the host. related ceph-container PR: ceph/ceph-container#1408 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1710548 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-28 13:30:36 -04:00
Dimitri Savineau	02fbe76e62	ceph-osd: Add CONTAINER_IMAGE env variable This environment variable was added in `cb381b4` but was removed in `4d35e9e`. This commit reintroduces the change. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-27 16:38:02 +02:00
fmount	e655038743	Set grafana_server_addr fact for ipv6 scenarios. As the bz1721914 describes, the grafana_server_addr fact is not defined if ip_version used is ipv6. This commit adds the ip_version condition to set correctly this fact. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1721914 Signed-off-by: fmount <fpantano@redhat.com>	2019-06-26 15:47:22 +02:00
Guillaume Abrioux	366b309c12	facts: fix bug in grafana_server_addr fact setting If no grafana-server group is defined while an mgr group is, that task will fail because `hostvars[groups[grafana_server_group_name][0]` can't return anything since `groups['grafana-server']` will be a non existing key. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-26 10:49:30 +02:00
Guillaume Abrioux	2b9fb377a8	nfs: add missing | bool filters To address this warning: ``` [DEPRECATION WARNING]: evaluating nfs_ganesha_dev as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the future ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-26 08:58:51 +02:00
Guillaume Abrioux	edb8d42596	nfs: remove duplicate task This task is already present in pre_requisite_non_container.yml Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-26 08:58:51 +02:00
Dimitri Savineau	45d46541cb	ceph-handler: Fix OSD restart script There are two big issues with the current OSD restart script. 1/ We try to test if the ceph osd daemon socket exists but we use a wildcard for the socket name : /var/run/ceph/*.asok. This fails because we usually have multiple ceph osd sockets (or other ceph daemon collocated) present in /var/run/ceph directory. Currently the test fails with: bash: line xxx: [: too many arguments But it doesn't stop the script execution. Instead we can specify the full ceph osd socket name because we already know the OSD id. 2/ The container filter pattern is wrong and could match multiple containers, causing the script to fail. We use the filter with two different patterns. One is with the device name (sda, sdb, ..) and the other one is with the OSD id (ceph-osd-0, ceph-osd-15, ..). In both cases we could match more than needed. $ docker container ls CONTAINER ID IMAGE NAMES 958121a7cc7d ceph-daemon:latest ceph-osd-strg0-sda 589a982d43b5 ceph-daemon:latest ceph-osd-strg0-sdb 46c7240d71f3 ceph-daemon:latest ceph-osd-strg0-sdaa 877985ec3aca ceph-daemon:latest ceph-osd-strg0-sdab $ docker container ls -q -f "name=sda" 958121a7cc7d 46c7240d71f3 877985ec3aca $ docker container ls CONTAINER ID IMAGE NAMES 2db399b3ee85 ceph-daemon:latest ceph-osd-5 099dc13f08f1 ceph-daemon:latest ceph-osd-13 5d0c2fe8f121 ceph-daemon:latest ceph-osd-17 d6c7b89db1d1 ceph-daemon:latest ceph-osd-1 $ docker container ls -q -f "name=ceph-osd-1" 099dc13f08f1 5d0c2fe8f121 d6c7b89db1d1 Adding an extra '$' character at the end of the pattern solves the problem. Finally removing the get_container_osd_id function because it's not used in the script at all. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-21 19:54:15 +02:00
Dimitri Savineau	dc187ea6fa	Change ansible_lsb by ansible_distribution_release The ansible_lsb fact is based on the lsb package (lsb-base, lsb-release or redhat-lsb-core). If the package isn't installed on the remote host then the fact isn't populated. -------- "ansible_lsb": {}, -------- Switching to the ansible_distribution_release fact instead. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-21 11:55:05 -04:00
fpantano	ba73dc7b21	Add higher retry/delay defaults to check the quorum status. As per bz1718981, this commit adds higher values to check the quorum status. This is helpful for several OSP deployments that fail during the scale up. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1718981 Signed-off-by: fpantano <fpantano@redhat.com>	2019-06-20 22:39:57 +02:00
Dimitri Savineau	b987534881	ceph-volume: Set max open files limit on container The ceph-volume lvm list command takes ages to complete when having a lot of LV devices on containerized deployment. For instance, with 25 OSDs on a node it takes 3 mins 44s to list the OSD. Adding the max open files limit to the container engine cli when executing the ceph-volume command seems to improve the execution time a lot (~30s). This was impacting the OSDs creation with ceph-volume (both filestore and bluestore) when using multiple LV devices. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1702285 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-20 22:37:40 +02:00
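
Commit 45d46541cb in the list above explains that an unanchored container name filter matches more containers than intended, and that appending '$' fixes it. The following is a minimal Python sketch of that behaviour, assuming the container engine's name filter acts like an unanchored regular expression search (which is what the output quoted in the commit suggests). The helper `filter_by_name` is hypothetical and only simulates the filter; it does not call docker or podman.

```python
import re


def filter_by_name(names, pattern):
    """Simulate a name filter as an unanchored regex search (assumption)."""
    rx = re.compile(pattern)
    return [name for name in names if rx.search(name)]


# Container names taken from the commit message of 45d46541cb.
device_containers = [
    "ceph-osd-strg0-sda",
    "ceph-osd-strg0-sdb",
    "ceph-osd-strg0-sdaa",
    "ceph-osd-strg0-sdab",
]
id_containers = ["ceph-osd-5", "ceph-osd-13", "ceph-osd-17", "ceph-osd-1"]

# Without the trailing '$', "sda" also matches sdaa and sdab.
print(filter_by_name(device_containers, "sda"))
# With the trailing '$', only the exact device suffix matches.
print(filter_by_name(device_containers, "sda$"))

# The same problem with OSD ids: "ceph-osd-1" also matches 13 and 17.
print(filter_by_name(id_containers, "ceph-osd-1"))
print(filter_by_name(id_containers, "ceph-osd-1$"))
```

The anchored patterns return a single container in both cases, which is what a restart script needs in order to set exactly one container id.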

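Commit 5ee3d96fb4 in the list above notes that openSUSE Leap 15.x was rejected because it compared as "less than 42.3". The sketch below illustrates that pitfall in Python. Both check functions are hypothetical reconstructions for illustration only and are not the actual ceph-ansible conditions, which are written as Ansible `when` clauses.

```python
def parse_version(version):
    """Turn a dotted version string such as '15.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def old_check_is_supported(version):
    # Hypothetical old logic: anything below 42.3 is treated as too old.
    # Leap 15.x is newer than 42.x but numerically smaller, so it is rejected.
    return parse_version(version) >= (42, 3)


def new_check_is_supported(version):
    # Hypothetical new logic: only the 15.x series is accepted.
    return parse_version(version)[0] == 15


for release in ("42.3", "15.0", "15.1"):
    print(release, old_check_is_supported(release), new_check_is_supported(release))
```

Matching on the major version avoids relying on numeric ordering, which stopped reflecting release order when the numbering went from 42.x back to 15.x.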

2343 Commits (16939eff9efc680aebaafa12f2a25a804400b5e9)
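
Commit d549fffdd2 in the list above distinguishes three outcomes of the pool list command: return code 0 (pool exists), 2 (pool does not exist), and 1 or 125 (docker or podman failure in containerized deployments). A minimal Python sketch of that decision follows. The function name, the returned labels, and the handling of any other return code are assumptions made for illustration; the real logic lives in Ansible tasks.

```python
# Return codes named in the commit message of d549fffdd2.
POOL_EXISTS_RC = 0
POOL_MISSING_RC = 2
CONTAINER_ENGINE_FAILURE_RCS = {1, 125}  # 1: docker, 125: podman


def classify_pool_list_rc(rc, containerized):
    """Map a pool list return code to an action (hypothetical labels)."""
    if rc == POOL_EXISTS_RC:
        return "skip"  # pool already exists, nothing to create
    if containerized and rc in CONTAINER_ENGINE_FAILURE_RCS:
        return "fail"  # engine problem: stop here, not in later tasks
    if rc == POOL_MISSING_RC:
        return "create"  # pool is absent, create it
    return "unknown"  # assumption: anything else needs investigation


print(classify_pool_list_rc(0, containerized=True))
print(classify_pool_list_rc(2, containerized=True))
print(classify_pool_list_rc(125, containerized=True))
```

Checking the engine failure codes before the "missing" case is the point of the commit: a plain `rc != 0` test would treat a stopped container as a missing pool.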