ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Dimitri Savineau	36e18e20d1	ceph-osd: check container engine rc for pools When creating OpenStack pools, we only check if the return code from the pool list command isn't 0 (i.e. if the pool doesn't exist). In that case, the return code will be 2. That's why the next condition is rc != 0 for the pool creation. But in a containerized deployment, the return code could be different if there's a failure on the container engine command (like container not running). In that case, the return code could be either 1 (docker) or 125 (podman) so we should fail at this point and not in the next tasks. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1732157 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `d549fffdd2`)	2019-07-31 14:07:41 -04:00
Guillaume Abrioux	51af74face	dashboard: fix timeout usage on rgw user creation command For some reason, this is making the playbook failing like following: ``` TASK [ceph-dashboard : create radosgw system user] ********************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************** task path: /home/guits/ceph-ansible/roles/ceph-dashboard/tasks/configure_dashboard.yml:106 Tuesday 30 July 2019 10:04:54 +0200 (0:00:01.910) 0:11:22.319 ******** FAILED - RETRYING: create radosgw system user (3 retries left). FAILED - RETRYING: create radosgw system user (2 retries left). FAILED - RETRYING: create radosgw system user (1 retries left). fatal: [mgr0 -> mon0]: FAILED! => changed=true attempts: 3 cmd: timeout 20 podman exec ceph-mon-mon0 radosgw-admin user create --uid=ceph-dashboard --display-name='Ceph dashboard' --system delta: '0:00:20.021973' end: '2019-07-30 08:06:32.656066' msg: non-zero return code rc: 124 start: '2019-07-30 08:06:12.634093' stderr: 'exec failed: container_linux.go:336: starting container process caused "process_linux.go:82: copying bootstrap data to pipe caused \"write init-p: broken pipe\""' stderr_lines: <omitted> stdout: '' stdout_lines: <omitted> ``` using `timeout -f -s KILL` fixes this issue. Also, there is no need to use `shell` module here, let's switch to `command`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `c9d80af4e0`)	2019-07-30 15:08:46 +02:00
Guillaume Abrioux	ea44783f3d	validate: add checks for grafana-server group definition this commit adds two checks: - check that the `[grafana-server]` group is defined - check that the `[grafana-server]` contains at least one node. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `02beb00916`)	2019-07-29 15:46:58 +02:00
Guillaume Abrioux	e2b41a17c0	mgr: fix a typo this task isn't using the right container_exec_cmd, that's delegating to the wrong node. Let's use the right fact to fix this command. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `ec33ee7574`)	2019-07-29 15:46:58 +02:00
Guillaume Abrioux	1a9043128c	dashboard: remove cfg80211 module installation According to this comment [1], this seems to be needed to detect wifi devices. In node exporter we can see this: ``` --collector.wifi Enable the wifi collector (default: disabled). ``` since it's disabled by default and we don't even change this in our systemd templates for node-exporter, we can easily assume in the end it's not needed. Therefore, let's remove this. [1] `dbf81b6b5b (diff-961545214e21efed3b84a9e178927a08L21-L23)` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `b9cdf341be`)	2019-07-29 15:46:58 +02:00
Guillaume Abrioux	d0ad1cf0f1	dashboard: use dedicated group only There's no need to add complexity by trying to fall back on another group. Let's deploy dashboard on all nodes present in grafana-server group. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `d67230b2a2`)	2019-07-29 15:46:58 +02:00
Guillaume Abrioux	93826e061d	dashboard: enable dashboard by default This commit enables dashboard deployment by default. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1726739 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `fb1b5b3251`) # Conflicts: # tox-dashboard.ini	2019-07-29 15:46:58 +02:00
Dimitri Savineau	43d625b59a	Remove NBSP characters Some NBSP are still present in the yaml files. Adding a test in travis CI. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `07c6695d16`)	2019-07-26 16:23:41 -04:00
Guillaume Abrioux	6ef73b59d2	container: rename docker directories Those 2 directories should be renamed to be more generic (docker vs. podman). Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `19950b5170`)	2019-07-25 13:40:40 +02:00
fmount	15c745d998	Avoid to setup provisioners in a fully containerized environment This commit adds a when clause to avoid the setup of grafana provisioners in a fully containerized scenario. This is needed when the ceph-grafana-dashboards package is not installed and this task could result in a wrong grafana configuration that makes the container crash. Signed-off-by: fmount <fpantano@redhat.com> (cherry picked from commit `fac1b030cb`)	2019-07-24 14:16:55 +02:00
Dimitri Savineau	367dce2894	ceph-dashboard: enable rgw options conditionally The dashboard rgw frontend options only need to be applied when there are nodes present in the rgw ansible group. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `5383c2f7f3`)	2019-07-19 20:33:42 +00:00
Dimitri Savineau	87db5aa55c	dashboard: use variables for port value The current port value for alertmanager, grafana, node-exporter and prometheus is hardcoded in the roles so it's not possible to change the port binding of those services. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `8ab9b719fa`)	2019-07-19 20:33:42 +00:00
Giulio Fidente	985165dbf7	Fix backward compat with old cephfs_pools format Previously cephfs_pools items used to have a pgs: key but not pgp_num: nor pg_num: Signed-off-by: Giulio Fidente <gfidente@redhat.com> (cherry picked from commit `edd1420217`)	2019-07-19 17:50:57 +00:00
Guillaume Abrioux	bbfd6965e0	handler: fix bug in osd handlers `fbf4ed42ae` introduced a bug when container binary is podman. podman doesn't support ps -f using regular expression, the container id is never set in the restart script causing the handler to fail. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1721536 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `618dbf271d`)	2019-07-18 16:49:14 +00:00
Guillaume Abrioux	4aa4496fc1	validate: fail if gpt header found on unprepared devices ceph-volume will complain if gpt headers are found on devices. This commit checks whether a gpt header is present on devices passed in `devices` variable and fail early. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1730541 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `487d701685`)	2019-07-18 10:32:53 +02:00
Dimitri Savineau	2d8ed4cc52	ceph-infra: update handler with daemon variable Both the ntp and chrony daemons use a variable for the service name because it can differ depending on the GNU/Linux distribution. This was updated in `9d88d3199` for chrony, but only for the start part, not for the handler. This commit fixes this for both ntp and chrony. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `0ae0193144`)	2019-07-15 16:36:49 +00:00
Dimitri Savineau	b87c189299	ceph-infra: Open prometheus port The Prometheus port 9090 isn't open in the firewall configuration. Also the dashboard task on the grafana node was not required because it's already present on the mgr node. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `41b44dde85`)	2019-07-11 13:41:58 +00:00
Guillaume Abrioux	2742063aee	handler: remove legacy condition since everything is already in a block with the same condition, it's not needed to leave all of them on these tasks. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `ee29f7370a`)	2019-07-10 15:53:26 +00:00
Guillaume Abrioux	bca8ac39c2	validate: improve message printed in check_devices.yml The message prints the whole content of the registered variable in the playbook, this is not needed and makes the message pretty unclear and unreadable. ``` "msg": "{'_ansible_parsed': True, 'changed': False, '_ansible_no_log': False, u'err': u'Error: Could not stat device /dev/sdf - No such file or directory.\\n', 'item': u'/dev/sdf', '_ansible_item_result': True, u'failed': False, '_ansible_item_label': u'/dev/sdf', u'msg': u\"Error while getting device information with parted script: '/sbin/parted -s -m /dev/sdf -- unit 'MiB' print'\", u'rc': 1, u'invocation': {u'module_args': {u'part_start': u'0%', u'part_end': u'100%', u'name': None, u'align': u'optimal', u'number': None, u'label': u'msdos', u'state': u'info', u'part_type': u'primary', u'flags': None, u'device': u'/dev/sdf', u'unit': u'MiB'}}, 'failed_when_result': False, '_ansible_ignore_errors': None, u'out': u''} is not a block special file!" ``` Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1719023 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `e6dc3ebd8c`)	2019-07-10 09:37:01 -04:00
Boris Ranto	5d5e7d59fd	dashboard: Use upstream default port We are currently using an incorrect default dashboard port. The upstream uses 8443 instead of 8234 by default. This should get us closer to the upstream project. Signed-off-by: Boris Ranto <branto@redhat.com> (cherry picked from commit `21758fcee8`)	2019-07-10 11:49:35 +02:00
Dimitri Savineau	3bdcbb005f	ceph-dashboard: remove bool filter for rgw vars Some dashboard_rgw_api_* variables are using the bool filter but those variables are strings with an empty string as default value. So we should test the variable against an empty string instead of a bool. dashboard_rgw_api_host: '' dashboard_rgw_api_port: '' dashboard_rgw_api_scheme: '' dashboard_rgw_api_admin_resource: '' Resolves: #4179 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `5413274412`)	2019-07-10 11:48:58 +02:00
Dimitri Savineau	c040c34d97	ceph-iscsi: Update gateway config/template - Remove gateway_keyring from the configuration file because it's not used in ceph-iscsi 3.x release. - Use config_template instead of template module for iscsi-gateway configuration file. Because the file is an ini file and we might want to override more parameters than those present in ceph-ansible. - Because we can now set the pool name in the configuration, we should use a variable for that. This is refact with the iscsi_pool_* variables also used to configure the pool size. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `1f2a4f1910`)	2019-07-10 09:35:21 +00:00
Dimitri Savineau	f13e6642a4	ceph-handler: fix cluster name in socket path `c90f605b5` introduces the default ceph cluster name value in the rgw socket path for the rgw restart script. But this should use the `cluster` variable instead. This commit also fixes this in the osd restart script. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `de7f948b75`)	2019-07-08 19:57:08 +00:00
ilyashestopalov	5c6a9e1a96	ceph-mon: Fix cluster name parameter The ability to add nodes with the monitor role to an existing cluster whose name differs from the default name is fixed. Signed-off-by: ilyashestopalov <usr.tester@yandex.ru> (cherry picked from commit `904532c5e2`)	2019-07-08 09:12:37 -04:00
fmount	ca378f1da0	Add package-install tag on ceph-grafana-dashboard pkg install. According to the OSP pattern, we need the package-install tag to control what is installed on the host. This commit just adds the missing tag to meet the TripleO requirements. See: /issues/4197 for details Fixes: #4197 Signed-off-by: fmount <fpantano@redhat.com> (cherry picked from commit `95bd002b35`)	2019-07-08 10:42:41 +00:00
Dimitri Savineau	cd7156efee	ceph-iscsi-gw: Update log directories bind mount On containerized deployments we need to bind mount the ceph-iscsi directory to avoid writing the logs in the container. The /var/log/ceph directory isn't used by the rbd-target-api/gw services because they have their own log directories. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `91bef94b6c`)	2019-07-07 07:09:42 +00:00
Guillaume Abrioux	689605b084	iscsi: refact deprecated variables This commit moves some old variables into ceph-defaults so we can move the `use_new_ceph_iscsi` fact in ceph-facts role in order. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `a781ce881c`)	2019-07-04 00:04:04 +00:00
Mike Christie	ce62ac7beb	igw: Add check for missing iqn If the user is still using the older packages and does not setup the target iqn you will just get a vague error message later on. This adds a check during the validate task, so it is clear to the user. Signed-off-by: Mike Christie <mchristi@redhat.com> (cherry picked from commit `08a6d10c32`)	2019-07-04 00:04:04 +00:00
Mike Christie	cb8bab06d8	igw: Update iscsigws.yml.sample for ceph-iscsi support Update iscsigws.yml.sample to document that we cannot use ansible to setup iSCSI objects and use the new ceph-iscsi package. Signed-off-by: Mike Christie <mchristi@redhat.com> (cherry picked from commit `75fee55d19`)	2019-07-04 00:04:04 +00:00
Mike Christie	6872f7ee95	igw: Support ceph-iscsi package for install This adds support for the ceph-iscsi package during install. ceph-iscsi does not support setting up targets/gws, luns and clients with the current library/igw_* code. Going forward those tasks should be done with gwcli or dashboard. ceph-iscsi will only be used if the user has no iscsi objects setup so we do not break existing setups. The next patch will update the iscsigws.yml.sample to document that users must not setup any iscsi object if they want to use the new package and tools. Signed-off-by: Mike Christie <mchristi@redhat.com> (cherry picked from commit `cbe66cec52`)	2019-07-04 00:04:04 +00:00
Mike Christie	f180eccb84	igw: drop gateway_ip_list for container setups The gateway_ip_list is not used in container setups, so drop it for that case. Signed-off-by: Mike Christie <mchristi@redhat.com> (cherry picked from commit `b7b2213be1`)	2019-07-04 00:04:04 +00:00
Mike Christie	f984db5544	igw: move gateway_ip_list check to validate role Signed-off-by: Mike Christie <mchristi@redhat.com> (cherry picked from commit `d89d3e7cd6`)	2019-07-04 00:04:04 +00:00
Dimitri Savineau	d4a3e26534	ceph-handler: Fix rgw socket in restart script Since Mimic the radosgw socket has two extra fields in the socket name (before the .asok suffix): <pid>.<ctid> Before: /var/run/ceph/ceph-client.rgw.cephaio-1.asok After: /var/run/ceph/ceph-client.rgw.cephaio-1.16913.23928832.asok The radosgw restart script doesn't handle this and could fail during an upgrade. If the SOCKETS variable isn't defined in the script then the test command won't fail because the return code is 0 $ test -S $ echo $? 0 There are multiple issues in that script: - The default SOCKETS value isn't defined due to a typo SOCKET vs SOCKETS. - Because the socket name uses the pid, we need to check the socket name after the service restart. - After restarting the radosgw service we need to wait a few seconds, otherwise the socket won't be created. - Update the wget parameters because the command is doing a loop. We now use the same option as curl. - The check_rest function doesn't test the radosgw at all due to a wrong test command (test against a string) and always returns 0. This needs to use the DOCKER_EXECS variable in order to execute the command. $ test 'wget http://192.168.100.11:8080' $ echo $? 0 Also remove the test based on the ansible_fqdn because we only use the ansible_hostname + rgw instance name. Finally group all for loops into a single one. Resolves: #3926 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `c90f605b51`)	2019-07-03 15:08:35 +00:00
Giulio Fidente	72e0ac1f44	Add radosgw_frontend_ssl_certificate parameter This is necessary when configuring RGW with SSL because in addition to passing specific frontend options, civetweb appends the 's' character to the binding port and beast uses ssl_endpoint instead of endpoint. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1722071 Signed-off-by: Giulio Fidente <gfidente@redhat.com> (cherry picked from commit `d526803c6c`)	2019-07-02 20:13:09 +00:00
Guillaume Abrioux	2295a4cf0a	containers: improve logging bindmount /var/log/ceph on all containers so it's possible to retrieve logs from the host. related ceph-container PR: ceph/ceph-container#1408 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1710548 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `33eed78d17`)	2019-07-02 11:27:34 -04:00
Guillaume Abrioux	381358f439	nfs: clean template remove legacy options ``` ganesha.nfsd-115[main] config_errs_to_log :CONFIG :WARN :Config File (/etc/ganesha/ganesha.conf:13): Unknown parameter (Dir_Max) ganesha.nfsd-115[main] config_errs_to_log :CONFIG :WARN :Config File (/etc/ganesha/ganesha.conf:14): Unknown parameter (Cache_FDs) ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `b725b3077e`)	2019-07-02 11:01:07 +02:00
Dimitri Savineau	109883e7a5	ceph-osd: Add CONTAINER_IMAGE env variable This environment variable was added in `cb381b4` but was removed in `4d35e9e`. This commit reintroduces the change. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `02fbe76e62`)	2019-06-27 17:34:24 -04:00
Guillaume Abrioux	bcfed47009	dashboard: move ceph-grafana-dashboards package installation This commit moves the package installation into ceph-dashboard role. This is needed to install ceph dashboard json file in `/etc/grafana/dashboards/ceph-dashboard/`. Closes: #4026 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `6e2e30db54`)	2019-06-26 12:03:21 -04:00
Guillaume Abrioux	df0d146166	infra: refact dashboard firewall rules - There is no need to open ports 3000, 8234, 9283 on all nodes. - Add missing rule for alertmanager (port 9093) Closes: #4023 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `14f5fc3c86`)	2019-06-26 12:03:21 -04:00
Guillaume Abrioux	28e1ce0d8c	dashboard: append mgr modules to ceph_mgr_modules when `dashboard_enabled` is `True`, let's append `dashboard` and `prometheus` modules to `ceph_mgr_modules` so they are automatically loaded. Closes: #4026 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `a2b6f44665`)	2019-06-26 12:03:21 -04:00
fmount	5c009d01b6	Set grafana_server_addr fact for ipv6 scenarios. As the bz1721914 describes, the grafana_server_addr fact is not defined if ip_version used is ipv6. This commit adds the ip_version condition to set correctly this fact. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1721914 Signed-off-by: fmount <fpantano@redhat.com> (cherry picked from commit `e655038743`)	2019-06-26 12:02:29 -04:00
Guillaume Abrioux	b9c49227bb	facts: fix bug in grafana_server_addr fact setting If no grafana-server group is defined while an mgr group is, that task will fail because `hostvars[groups[grafana_server_group_name][0]]` can't return anything since `groups['grafana-server']` will be a non-existent key. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `366b309c12`)	2019-06-26 15:08:44 +02:00
Guillaume Abrioux	115b457731	nfs: add missing | bool filters To address this warning: ``` [DEPRECATION WARNING]: evaluating nfs_ganesha_dev as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the future ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `2b9fb377a8`)	2019-06-26 13:13:11 +02:00
Guillaume Abrioux	bf61b5e823	nfs: remove duplicate task This task is already present in pre_requisite_non_container.yml Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `edb8d42596`)	2019-06-26 13:13:11 +02:00
Dimitri Savineau	fbf4ed42ae	ceph-handler: Fix OSD restart script There are two big issues with the current OSD restart script. 1/ We try to test if the ceph osd daemon socket exists but we use a wildcard for the socket name : /var/run/ceph/*.asok. This fails because we usually have multiple ceph osd sockets (or other collocated ceph daemons) present in the /var/run/ceph directory. Currently the test fails with: bash: line xxx: [: too many arguments But it doesn't stop the script execution. Instead we can specify the full ceph osd socket name because we already know the OSD id. 2/ The container filter pattern is wrong and could match multiple containers, causing the script to fail. We use the filter with two different patterns. One is with the device name (sda, sdb, ..) and the other one is with the OSD id (ceph-osd-0, ceph-osd-15, ..). In both cases we could match more than needed. $ docker container ls CONTAINER ID IMAGE NAMES 958121a7cc7d ceph-daemon:latest ceph-osd-strg0-sda 589a982d43b5 ceph-daemon:latest ceph-osd-strg0-sdb 46c7240d71f3 ceph-daemon:latest ceph-osd-strg0-sdaa 877985ec3aca ceph-daemon:latest ceph-osd-strg0-sdab $ docker container ls -q -f "name=sda" 958121a7cc7d 46c7240d71f3 877985ec3aca $ docker container ls CONTAINER ID IMAGE NAMES 2db399b3ee85 ceph-daemon:latest ceph-osd-5 099dc13f08f1 ceph-daemon:latest ceph-osd-13 5d0c2fe8f121 ceph-daemon:latest ceph-osd-17 d6c7b89db1d1 ceph-daemon:latest ceph-osd-1 $ docker container ls -q -f "name=ceph-osd-1" 099dc13f08f1 5d0c2fe8f121 d6c7b89db1d1 Adding an extra '$' character at the end of the pattern solves the problem. Finally removing the get_container_osd_id function because it's not used in the script at all. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `45d46541cb`)	2019-06-21 14:51:29 -04:00
Dimitri Savineau	6fd4902b55	Change ansible_lsb by ansible_distribution_release The ansible_lsb fact is based on the lsb package (lsb-base, lsb-release or redhat-lsb-core). If the package isn't installed on the remote host then the fact isn't populated. -------- "ansible_lsb": {}, -------- Switching to the ansible_distribution_release fact instead. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `dc187ea6fa`)	2019-06-21 13:36:15 -04:00
fpantano	c03a1e49dd	Add higher retry/delay defaults to check the quorum status. As per bz1718981, this commit adds higher values to check the quorum status. This is helpful for several OSP deployments that fail during the scale up. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1718981 Signed-off-by: fpantano <fpantano@redhat.com> (cherry picked from commit `ba73dc7b21`)	2019-06-20 20:03:19 -04:00
Dimitri Savineau	62d98971f2	ceph-volume: Set max open files limit on container The ceph-volume lvm list command takes ages to complete when having a lot of LV devices on containerized deployment. For instance, with 25 OSDs on a node it takes 3 mins 44s to list the OSD. Adding the max open files limit to the container engine cli when executing the ceph-volume command seems to improve the execution time a lot (~30s). This was impacting the OSDs creation with ceph-volume (both filestore and bluestore) when using multiple LV devices. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1702285 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `b987534881`)	2019-06-20 20:00:53 -04:00
Dimitri Savineau	590f6026bb	roles: Remove useless become (true) flag We already set the become flag to true at a play level in the site* playbooks so we don't need to set it at a task level. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `7c3640177b`)	2019-06-20 22:00:27 +00:00
Guillaume Abrioux	52ff9ce5d1	facts: add a retry on get current fsid task sometimes it can happen the following task fails: ``` TASK [ceph-facts : get current fsid] ***************************************** task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-dev-centos-container-update/roles/ceph-facts/tasks/facts.yml:78 Wednesday 19 June 2019 18:12:49 +0000 (0:00:00.203) 0:02:39.995 **** fatal: [mon2 -> mon1]: FAILED! => changed=true cmd: - timeout - --foreground - -s - KILL - 600s - docker - exec - ceph-mon-mon1 - ceph - --cluster - ceph - daemon - mon.mon1 - config - get - fsid delta: '0:00:00.239339' end: '2019-06-19 18:12:49.812099' msg: non-zero return code rc: 22 start: '2019-06-19 18:12:49.572760' stderr: 'admin_socket: exception getting command descriptions: [Errno 2] No such file or directory' stderr_lines: <omitted> stdout: '' stdout_lines: <omitted> ``` not sure exactly why since just before this task, mon1 seems to be well UP otherwise it wouldn't have passed the task `waiting for the containerized monitor to join the quorum`. As a quick fix/workaround, let's add a retry which allows us to get around this situation: ``` TASK [ceph-facts : get current fsid] *************************************** task path: /home/jenkins-build/build/workspace/ceph-ansible-scenario/roles/ceph-facts/tasks/facts.yml:78 Thursday 20 June 2019 15:35:07 +0000 (0:00:00.201) 0:03:47.288 ******* FAILED - RETRYING: get current fsid (3 retries left). 
changed: [mon2 -> mon1] => changed=true attempts: 2 cmd: - timeout - --foreground - -s - KILL - 600s - docker - exec - ceph-mon-mon1 - ceph - --cluster - ceph - daemon - mon.mon1 - config - get - fsid delta: '0:00:00.290252' end: '2019-06-20 15:35:13.960188' rc: 0 start: '2019-06-20 15:35:13.669936' stderr: '' stderr_lines: <omitted> stdout: |- { "fsid": "153e159d-7ade-42a7-842c-4d04348b901e" } stdout_lines: <omitted> ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `46a2683944`)	2019-06-20 14:01:33 -04:00
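
Two of the restart-script fixes listed above (d4a3e26534 and fbf4ed42ae) come down to shell pitfalls that are easy to reproduce in isolation. The following is only a minimal sketch, not the actual ceph-ansible restart script: the container names are the sample names quoted in the fbf4ed42ae message, and `grep` is used as a stand-in for the container engine's name filter (an assumption made so the sketch runs without docker or podman).

```shell
#!/bin/sh
# Pitfall 1 (d4a3e26534): with an empty, unquoted variable, `test -S $SOCKETS`
# collapses to `test -S`. A single non-empty argument is always true, so the
# check passes even though no socket was tested.
SOCKETS=""
unquoted_rc=0
test -S $SOCKETS || unquoted_rc=$?
quoted_rc=0
test -S "$SOCKETS" || quoted_rc=$?
echo "unquoted rc=$unquoted_rc, quoted rc=$quoted_rc"

# Pitfall 2 (fbf4ed42ae): an unanchored name pattern matches more than the
# intended container. grep stands in for the engine's name filter here
# (assumption); the names are the samples from the commit message.
names="ceph-osd-5
ceph-osd-13
ceph-osd-17
ceph-osd-1"
unanchored=$(printf '%s\n' "$names" | grep -c 'ceph-osd-1')
anchored=$(printf '%s\n' "$names" | grep -c 'ceph-osd-1$')
echo "unanchored matches=$unanchored, anchored matches=$anchored"
```

Quoting the variable makes the empty case fail as intended, and the trailing `$` restricts the match to the single container whose name ends with the OSD id.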


2338 Commits (6a5308fa7f267de2b93efd1106ef0c965f163a2c)