ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Guillaume Abrioux	7a8a719e75	rgw: add retry/until on pools tasks Sometimes, these tasks can time out for some reason. Adding these retries can help to avoid unexpected failures. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-03-06 08:55:13 -05:00
Guillaume Abrioux	eac207091b	client: skip create_users_keys.yml when rolling_update There's no need to run this part of the role when upgrading client nodes. Let's skip it when rolling_update.yml is being run. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-03-04 13:06:32 -05:00
Ali Maredia	71f55bd54d	rgw multisite: enable more than 1 realm per cluster Make it so that more than one realm, zonegroup, or zone can be created during a run of the rgw multisite ansible playbooks. The rgw hosts now need to be grouped into zones and realms in the inventory. .yml files need to be created in group_vars for the realms and zones. Sample yaml files are available. Also remove the multisite destroy playbook and add --cluster before radosgw-admin commands. Remove the manually added rgw_zone_endpoints var and have ceph-ansible automatically add the correct endpoints of all the rgws in a rgw_zone from the information provided in those rgws' hostvars. Signed-off-by: Ali Maredia <amaredia@redhat.com>	2020-03-04 12:58:13 -05:00
Guillaume Abrioux	e17c79b871	osd: do not change pool size on erasure pool This commit adds condition in order to not try to customize pools size when its type is erasure. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-03-04 09:29:01 -05:00
Guillaume Abrioux	47adc2bb08	osd: add pg autoscaler support This commit adds the pg autoscaler support. The structure for pool definition now has two additional attributes `pg_autoscale_mode` and `target_size_ratio`, e.g.: ``` test: name: "test" pg_num: "{{ osd_pool_default_pg_num }}" pgp_num: "{{ osd_pool_default_pg_num }}" rule_name: "replicated_rule" application: "rbd" type: 1 erasure_profile: "" expected_num_objects: "" size: "{{ osd_pool_default_size }}" min_size: "{{ osd_pool_default_min_size }}" pg_autoscale_mode: False target_size_ratio: 0.1 ``` When `pg_autoscale_mode` is `True`, the user has to set a decent value in `target_size_ratio`. Given that it's a new feature, it's still disabled by default. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1782253 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-03-04 09:29:01 -05:00
Guillaume Abrioux	bf1f125d71	osd: refact osd pool creation Currently, the command executed is wrong, eg: ``` cmd: - podman - exec - ceph-mon-controller-0 - ceph - --cluster - ceph - osd - pool - create - volumes - '32' - '32' - replicated_rule - '1' delta: '0:00:01.625525' end: '2020-02-27 16:41:05.232705' item: ``` From documentation, the osd pool creation command is : ``` ceph osd pool create {pool-name} {pg-num} [{pgp-num}] [replicated] \ [crush-rule-name] [expected-num-objects] ceph osd pool create {pool-name} {pg-num} {pgp-num} erasure \ [erasure-code-profile] [crush-rule-name] [expected_num_objects] ``` it means we pass '1' (from item.type) as value for `expected_num_objects` by default which is very likely not what we want. Also, this commit modifies the default value when no `rule_name` is set to use the existing variable `osd_pool_default_crush_rule` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1808495 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-03-04 09:29:01 -05:00
Dimitri Savineau	be8b315102	ceph-validate: add key format validation If the user manually provides the key value for a specific keyring then there's no validation of the content, which could lead to unexpected failures in the ceph_key module. Closes: #5104 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-03-03 10:01:58 +01:00
Dimitri Savineau	9d3b49293d	purge: stop rgw instances by iteration It looks like the service module doesn't support wildcards anymore for stopping/disabling multiple services. fatal: [rgw0]: FAILED! => changed=false msg: 'This module does not currently support using glob patterns, found '''' in service name: ceph-radosgw@' ...ignoring Instead we should iterate over the rgw_instances list. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-03-02 16:32:06 +01:00
Dimitri Savineau	90b1fc8fe9	ceph-infra: install firewalld python bindings When using the firewalld ansible module we need to be sure that the python bindings are installed. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-03-02 16:32:06 +01:00
Dimitri Savineau	45fb9241c0	ceph-infra: split firewalld tasks Since ansible 2.9 the firewalld task can no longer be used with service and source at the same time. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-03-02 16:32:06 +01:00
Dimitri Savineau	aefba82a2e	Add ansible 2.9 support This commit adds ansible 2.9 support in addition to 2.8. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-03-02 16:32:06 +01:00
Guillaume Abrioux	0326d992c2	osd: add journal option in ceph_volume call (batch) This commit adds the journal option to the ceph_volume call when scenario is lvm batch Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-02-28 17:29:59 -05:00
Guillaume Abrioux	a084a2a347	common: support OSDs with more than 2 digits When running environment with OSDs having ID with more than 2 digits, some tasks don't match the system units and therefore, playbook can fail. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1805643 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-02-27 09:48:36 +01:00
Dimitri Savineau	44e750ee5d	ceph-rgw: increase connection timeout to 10 5s as a connection timeout could be too low in some setups. Let's increase it to 10s. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-24 16:01:36 +01:00
Francesco Pantano	15ed9eebf1	Configure ceph dashboard backend and dashboard_frontend_vip This change introduces a new set of tasks to configure the ceph dashboard backend and listen just on the mgr related subnet (and not on '*'). For the same reason the proper server address is added in both prometheus and alertmanager systemd units. This patch also adds the "dashboard_frontend_vip" parameter to make sure we're able to support the HA model when multiple grafana instances are deployed. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792230 Signed-off-by: Francesco Pantano <fpantano@redhat.com>	2020-02-19 17:52:53 -05:00
Dimitri Savineau	ac0f68ccf0	ceph-dashboard: update create/get rgw user tasks Since [1] if a rgw user already exists then the radosgw-admin user create command will return an error instead of modifying the current user. We were already using separate tasks for the create and get operations, but only for multisite configurations, which isn't enough. Instead we should do the get task first and depending on the result execute the create. This commit also adds missing run_once and delegate_to statements. [1] https://github.com/ceph/ceph/commit/269e9b9 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-18 10:22:21 +01:00
Sam Choraria	2a2656a985	ceph-rgw: allow SSL certificate content to be supplied Allow SSL certificate & key contents to be written to the path specified by radosgw_frontend_ssl_certificate. This permits a certificate to be deployed & renewal of expired certificates through ceph-ansible. Signed-off-by: Sam Choraria <sam.choraria@bbc.co.uk>	2020-02-17 16:22:11 +01:00
Dimitri Savineau	c644ea9041	ceph-defaults: remove bootstrap_dirs_xxx vars Both bootstrap_dirs_owner and bootstrap_dirs_group variables aren't used anymore in the code. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-17 16:17:40 +01:00
Ali Maredia	1834c1e48d	rgw: extend automatic rgw pool creation capability Add support for erasure code pools. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1731148 Signed-off-by: Ali Maredia <amaredia@redhat.com> Co-authored-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-17 16:07:43 +01:00
Florian Faltermeier	9d081e2453	ceph-rgw-loadbalancer: Fix SSL newline issue The `ad7a5da` commit introduced a regression when using TLS on haproxy via the haproxy_frontend_ssl_certificate variable. This causes the "stats socket" and the "tune.ssl.default-dh-param" parameters to be on the same line, resulting in haproxy failing to start. [ALERT] 351/140240 (21388) : parsing [xxxxx] : 'stats socket' : unknown keyword 'tune.ssl.default-dh-param'. Registered [ALERT] 351/140240 (21388) : Fatal errors found in configuration. Fixes: #4869 Signed-off-by: Florian Faltermeier <florian.faltermeier@uibk.ac.at>	2020-02-17 16:05:42 +01:00
Dimitri Savineau	16e12bf2bb	rgw: don't create user on secondary zones The rgw user creation for the Ceph dashboard integration shouldn't be created on secondary rgw zones. Closes: #4707 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1794351 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-17 15:08:11 +01:00
John Fulton	e4bf4857f5	The _filtered_clients list should intersect with ansible_play_batch Client configuration with --limit fails without this patch because certain tasks are only done to the first host in the _filtered_clients list and it's likely that first host will not be included in what's specified with --limit. To fix this the _filtered_clients list should be built from all clients in the inventory that are also in the running play. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1798781 Signed-off-by: John Fulton <fulton@redhat.com>	2020-02-17 11:29:18 +01:00
Dimitri Savineau	6dd9b25565	ceph-iscsi: don't use ceph_dev_xxx variables Using ceph_dev_branch and ceph_dev_sha1 for configuring ceph-iscsi repositories from shaman doesn't make sense because the ceph devel branches and sha1 aren't compatible with ceph-iscsi devel. Instead we could rely on the master branch and the latest sha1. Currently it's not possible to use a custom ceph branch/sha1 value with iscsi setup otherwise the repository setup will fail. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-17 10:56:52 +01:00
Dimitri Savineau	10951eeea8	ceph-nfs: fix ceph_nfs_ceph_user variable The ceph_nfs_ceph_user variable is a string for the ceph-nfs role but a list in ceph-client role. `6a6785b` introduced a confusion between both variable type in the ceph-nfs role for external ceph with ganesha. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1801319 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-17 10:56:05 +01:00
Dimitri Savineau	0a3e85e8ca	ceph-nfs: add nfs-ganesha-rados-urls package Since nfs-ganesha 2.8.3 the rados-urls library has been moved to a dedicated package. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-17 10:52:30 +01:00
Dimitri Savineau	1fc6b33714	ceph-{mon,osd}: move default crush variables Since `ed36a11` we moved the crush rules creation code from the ceph-mon to the ceph-osd role. To keep the backward compatibility we kept the possibility to set the crush variables on the mons side but we didn't move the default values. As a result, when crush_rule_config is set to true and the default values for crush_rules are wanted, the crush rule creation task fails. "msg": "'ansible.vars.hostvars.HostVarsVars object' has no attribute 'crush_rules'" This patch moves the default crush variables from ceph-mon to ceph-osd role and also uses those default values when nothing is defined on the mons side. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1798864 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-17 10:50:53 +01:00
Dimitri Savineau	15bd4cd189	ceph-grafana: fix grafana_{crt,key} condition The grafana_{crt,key} aren't boolean variables but strings. The default value is an empty string so we should do the conditional on the string length instead of the bool filter Closes: #5053 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-17 10:49:08 +01:00
Dimitri Savineau	b9d975385c	ceph-prometheus: add alertmanager HA config When using multiple alertmanager nodes (via the grafana-server group) then we need to specify the other peers in the configuration. https://prometheus.io/docs/alerting/alertmanager/#high-availability https://github.com/prometheus/alertmanager#high-availability Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792225 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-17 10:46:21 +01:00
Dimitri Savineau	5a03e0ee1c	containers: add KillMode=none to systemd templates Because we rely on docker|podman to manage containers, we don't need systemd to manage the process (like kill). Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-13 16:11:33 +01:00
Dimitri Savineau	c6e96699f7	dashboard: allow configuring multiple grafana host When using multiple grafana hosts then we set the grafana and prometheus URLs and push the dashboard layout to a single node. grafana_server_addrs is the list of all grafana nodes and used during the ceph-dashboard role (on mgr/mon nodes). grafana_server_addr is the current grafana node used during the ceph-grafana and ceph-prometheus role (on grafana-server nodes). We don't have the grafana_server_addr fact duplication code between external vs collocated nodes. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1784011 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-10 11:18:45 -05:00
Guillaume Abrioux	3700aa5385	switch_to_containers: increase health check values This commit increases the default values for the following variable consumed in switch-from-non-containerized-to-containerized-ceph-daemons.yml playbook. This also moves these variables in `ceph-defaults` role so the user can set different values if needed. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1783223 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-02-07 14:59:14 -05:00
Dimitri Savineau	298ba0bf03	ceph-facts: set devices osd_auto_discovery on OSDs We only need to set the devices fact with osd_auto_discovery on OSD nodes. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-03 16:23:38 +01:00
Dimitri Savineau	ed461544a7	ceph-facts: remove is_podman fact This was used before the CentOS 8 requirement when using CentOS 7 atomic which has both docker and podman installed. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-03 10:11:03 -05:00
Mike Christie	77f3b5d51b	iscsi: Fix crashes during rolling update During a rolling update we will run the ceph iscsigw tasks that start the daemons then run the configure_iscsi.yml tasks which can create iscsi objects like targets, disks, clients, etc. The problem is that once the daemons are started they will accept configuration requests, or may want to update the system themselves. Those operations can then conflict with the configure_iscsi.yml tasks that set up objects and we can end up in crashes due to the kernel being in an unsupported state. This could also happen during creation, but is less likely due to no objects being set up yet, so there are no watchers or users accessing the gws yet. The fix in this patch works for both update and initial setup. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1795806 Signed-off-by: Mike Christie <mchristi@redhat.com>	2020-01-31 11:15:36 -05:00
Dimitri Savineau	9b40a959b9	ceph-common: rhcs 4 repositories for rhel 7 RHCS 4 is available for both RHEL 7 and 8 so we should also enable the cdn repositories for that distribution. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1796853 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-31 09:33:51 -05:00
Guillaume Abrioux	e7bc079405	config: fix external client scenario When no monitor group is present in the inventory, this task fails. This affects only non-containerized deployments. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-31 12:02:15 +01:00
Dimitri Savineau	fa8aa8c864	ceph-container-engine: lvm2 on OSD nodes only Since `de8f2a9` the lvm2 package installation has been moved from ceph-osd role to ceph-container-engine role. But the scope wasn't limited to the OSD nodes only. This commit fixes this behaviour. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-29 14:41:34 +01:00
Dimitri Savineau	2f07b85131	ceph-defaults: remove rgw from ceph_conf_overrides The [rgw] section in the ceph.conf file or via the ceph_conf_overrides variable doesn't exist and has no effect. To apply overrides to all radosgw instances we should use either the [global] or [client] sections. Overrides per radosgw instance should still use the [client.rgw.{instance-name}] section. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1794552 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-29 14:11:14 +01:00
Guillaume Abrioux	8c3759f8ce	dashboard: add quotes when passing password to the CLI Otherwise, if the variable contains a '$' it will be interpreted as a BASH variable. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-29 08:45:34 +01:00
Guillaume Abrioux	99328545de	validate: fail if dashboard|grafana_admin_password aren't set This commit adds a task to make sure the user sets a custom password for the `grafana_admin_password` and `dashboard_admin_password` variables. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1795509 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-29 08:45:34 +01:00
Dimitri Savineau	1fcafffdad	ceph-facts: fix _container_exec_cmd fact value When using different names for the inventory_hostname and the ansible_hostname then the _container_exec_cmd fact will get a wrong value based on the inventory_hostname instead of the ansible_hostname. This happens when the ceph cluster is already running (update/upgrade). Later the container exec commands will fail because the container name is wrong. We should always set the _container_exec_cmd based on the ansible_hostname fact. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1795792 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-29 08:44:59 +01:00
Guillaume Abrioux	2f919f8971	fix calls to `container_exec_cmd` in ceph-osd role We must call `container_exec_cmd` from the right monitor node otherwise the value of the fact might mismatch between the delegated node and the node being played. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1794900 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-27 15:30:45 -05:00
Dmitriy Rabotyagov	0961ab8e60	Ensure that ganesha log directory exists Some ganesha packages do not create the ganesha log directory, although it is expected to exist when its permissions are changed. Additionally, it doesn't make much sense to do that as a separate task, so the directory is now created with the correct permissions along with the rest of the required directories. Signed-off-by: Dmitriy Rabotyagov <drabotyagov@vexxhost.com>	2020-01-24 11:10:08 -05:00
Guillaume Abrioux	eb9112d8fb	handler: read container_exec_cmd value from first mon Given that we delegate to the first monitor, we must read the value of `container_exec_cmd` from this node. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792320 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-23 11:35:57 -05:00
Vytenis Sabaliauskas	ed1eaa1f38	ceph-facts: Fix for 'running_mon is undefined' error, so that fact 'running_mon' is set once 'grep' successfully exits with 'rc == 0' Signed-off-by: Vytenis Sabaliauskas <vytenis.sabaliauskas@protonmail.com>	2020-01-23 16:27:11 +01:00
Guillaume Abrioux	483adb5d79	common: add a default value for ceph_directories_mode Since this variable makes it possible to customize the mode for ceph directories, let's make it a bit more explicit by adding a default value in ceph-defaults. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-22 09:35:35 +01:00
Dimitri Savineau	c9e1fe3d92	ceph-osd: set container objectstore env variables Because we need to manage legacy ceph-disk based OSD with ceph-volume then we need a way to know the osd_objectstore in the container. This was done like this previously with ceph-disk so we should also do it with ceph-volume. Note that this won't have any impact for ceph-volume lvm based OSD. Rename docker_env_args fact to container_env_args and move the container condition on the include_tasks call. Remove OSD_DMCRYPT env variable from the ceph-osd template because it's now included in the container_env_args variable. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792122 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-20 13:59:44 -05:00
Benoît Knecht	3842aa1a30	ceph-rgw: Fix customize pool size "when" condition In `3c31b19ab3`, I fixed the `customize pool size` task by replacing `item.size` with `item.value.size`. However, I missed the same issue in the `when` condition. Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>	2020-01-20 09:26:53 -05:00
Guillaume Abrioux	22865cde9c	handler: fix call to container_exec_cmd in handler_osds When unsetting the noup flag, we must call container_exec_cmd from the delegated node (first mon member) Also, adding a `run_once: true` because this task needs to be run only 1 time. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792320 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-20 09:25:56 -05:00
Dmitriy Rabotyagov	2478a7b948	Fix undefined running_mon Since commit [1] introduced running_mon, it can be undefined, which results in fatal error [2]. This patch defines the default value that was used before patch [1]. Signed-off-by: Dmitriy Rabotyagov <drabotyagov@vexxhost.com> [1] `8dcbcecd71` [2] https://zuul.opendev.org/t/openstack/build/c82a73aeabd64fd583694ed04b947731/log/job-output.txt#14011	2020-01-16 17:03:25 -05:00
Dmitriy Rabotyagov	c81a213a6d	Fix application for openstack_cephfs pools RBD is an invalid application for cephfs pools, so it was changed to cephfs. Signed-off-by: Dmitriy Rabotyagov <drabotyagov@vexxhost.com>	2020-01-16 16:27:53 -05:00
Dimitri Savineau	7f997e623a	ceph-facts: move facts to defaults value There's no need to define a variable via a fact if we can do it via a default value. Using a fact could be interesting to override the default value on some condition. - ceph_uid could be set to 167 by default because it's only different on non containerized deployment on Debian/Ubuntu. - rbd_client_directory_{owner,group,mode} could be set to ceph,ceph,0770 by default instead of null as we are doing in the facts. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-16 13:57:11 -05:00
Dimitri Savineau	e790b0851d	group_vars: remove useless files Delete legacy files that aren't used anymore. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-16 13:53:12 -05:00
Guillaume Abrioux	3e262e072b	containers: use --cpus instead of --cpu-quota When using docker 1.13.1, the current condition: ``` {% if (container_binary == 'docker' and ceph_docker_version.split('.')[0] is version_compare('13', '>=')) or container_binary == 'podman' -%} ``` is wrong because it compares the first digit (1) whereas it should compare the second one. It means we always use `--cpu-quota` although the documentation recommends using `--cpus` when docker version is 1.13.1 or higher. From the doc: > --cpu-quota=<value> Impose a CPU CFS quota on the container. The number of > microseconds per --cpu-period that the container is limited to before > throttled. As such acting as the effective ceiling. > If you use Docker 1.13 or higher, use --cpus instead. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-16 13:51:43 -05:00
Guillaume Abrioux	8dcbcecd71	remove container_exec_cmd_mgr fact Iterating over all monitors in order to delegate a ` {{ container_binary }}` fails when collocating mgrs with mons, because ceph-facts reset `container_exec_cmd` to point to the first member of the monitor group. The idea is to force `container_exec_cmd` to be reset in ceph-mgr. This commit also removes the `container_exec_cmd_mgr` fact. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1791282 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-15 14:03:49 -05:00
Dimitri Savineau	4e7fb5d45a	drop use_fqdn variables This has been deprecated in the previous releases. Let's drop it. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-15 11:32:39 +01:00
Dimitri Savineau	bd87d69183	ceph-iscsi: don't use bracket with trusted_ip_list The trusted_ip_list parameter for the rbd-target-api service doesn't support ipv6 address with bracket. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1787531 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-14 11:32:36 -05:00
Guillaume Abrioux	5558664f37	osd: use _devices fact in lvm batch scenario since `fd1718f379`, we must use `_devices` when deploying with lvm batch scenario. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-14 09:12:03 -05:00
Guillaume Abrioux	2592a1e1e8	facts: fix osp/ceph external use case `d6da508a9b` broke the osp/ceph external use case. We must skip these tasks when no monitor is present in the inventory. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1790508 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-13 12:06:06 -05:00
Dimitri Savineau	f940e695ab	ceph-facts: move grafana fact to dedicated file We don't need to execute the grafana fact every time but only during the dashboard deployment. Especially for ceph-grafana, ceph-prometheus and ceph-dashboard roles. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1790303 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-13 12:05:57 -05:00
Guillaume Abrioux	58e6bfed2d	osd: ensure osd ids collected are well restarted This commit refactors the condition in the loop of that task so all potential osd ids found are properly started. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1790212 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-13 12:05:48 -05:00
Guillaume Abrioux	af6875706a	osd: do not run openstack_config during upgrade There is no need to run this part of the playbook when upgrading the cluster. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-13 09:59:08 -05:00
Guillaume Abrioux	3496a0efa2	osd: support scaling up using --limit This commit leaves add-osd.yml in place but marks the playbook as deprecated. Scaling up OSDs is now possible using --limit Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-13 09:59:08 -05:00
Dimitri Savineau	e4ddcb812b	ceph-validate: fail on CentOS 7 The Ceph Octopus release is only supported on CentOS 8 Closes: #4918 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-10 14:06:02 -05:00
Guillaume Abrioux	fd1718f379	config: exclude ceph-disk prepared osds in lvm batch report We must exclude the devices already used and prepared by ceph-disk when doing the lvm batch report. Otherwise it fails because ceph-volume complains about GPT header. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786682 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-10 00:04:22 +01:00
Guillaume Abrioux	86f3eeb717	mon: support replacing a mon We must pick up a mon which actually exists in ceph-facts in order to detect if a cluster is running. Otherwise, it will state no cluster is already running which will end up deploying a new monitor isolated in a new quorum. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622688 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-09 12:59:12 -05:00
Guillaume Abrioux	30200802d9	handler: fix bug `411bd07d54` introduced a bug in handlers: using `handler_*_status` instead of `hostvars[item]['handler_*_status']` causes handlers to be triggered in any case even though `handler_*_status` was set to `False` on a specific node. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622688 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-08 17:11:42 -05:00
Benoît Knecht	3c31b19ab3	ceph-rgw: Fix custom pool size setting RadosGW pools can be created by setting ```yaml rgw_create_pools: .rgw.root: pg_num: 512 size: 2 ``` for instance. However, doing so would create pools of size `osd_pool_default_size` regardless of the `size` value. This was due to the fact that the Ansible task used ``` {{ item.size | default(osd_pool_default_size) }} ``` as the pool size value, but `item.size` is always undefined; the correct variable is `item.value.size`. Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>	2020-01-08 16:16:38 -05:00
Dimitri Savineau	70eba66182	ceph-iscsi: manage ipv6 in trusted_ip_list Only the ipv4 addresses from the nodes running the dashboard mgr module were added to the trusted_ip_list configuration file on the iscsigws nodes. This also adds the iscsi gateways with ipv6 configuration to the ceph dashboard. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1787531 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-08 13:54:04 -05:00
Guillaume Abrioux	5adb735c78	facts: use correct python interpreter That task is delegated to the first mon, so we should always use the `discovered_interpreter_python` from that node. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-08 10:06:43 -05:00
Guillaume Abrioux	498bc45859	dashboard: use fqdn in external url Force fqdn to be used in external url for prometheus and alertmanager. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1765485 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-08 09:06:49 -05:00
Guillaume Abrioux	fca6f788a0	Revert "nfs: do not run privileged nfs container" This reverts commit `d06158e9d9`. Otherwise ganesha consumers can't dynamically update exports using dbus. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1784562 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-08 14:18:21 +01:00
Dimitri Savineau	254ab54f80	ceph-iscsi: remove python rtslib shaman repository The rtslib python library is now available in the distribution so we shouldn't have to use the shaman repository Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-08 11:13:46 +01:00
Dimitri Savineau	d758125290	ceph-nfs: add ganesha_t type to selinux Since RHEL 8.1 we need to add the ganesha_t type to the permissive SELinux list. Otherwise the nfs-ganesha service won't start. This was done on RHEL 7 previously and part of the nfs-ganesha-selinux package on RHEL 8. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786110 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-08 11:13:46 +01:00
Dimitri Savineau	de8f2a9f83	container: move lvm2 package installation Before this patch, the lvm2 package installation was done during the ceph-osd role. However we were running ceph-volume command in the ceph-config role before ceph-osd. If lvm2 wasn't installed then the ceph-volume command fails: error checking path "/run/lock/lvm": stat /run/lock/lvm: no such file or directory This wasn't visible before because lvm2 was automatically installed as docker dependency but it's not the same for podman on CentOS 8. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-08 11:13:46 +01:00
Dimitri Savineau	d4fd38c967	ceph-nfs: change ganesha CentOS repository Since we don't have nfs-ganesha builds available on CentOS 8 at the moment on shaman then we can use the alternative repository at [1] [1] https://download.nfs-ganesha.org/3/LATEST/CentOS Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-08 11:13:46 +01:00
Guillaume Abrioux	217d95abb2	common: add centos8 support Ceph octopus only supports CentOS 8. This commit adds CentOS 8 support: - update vagrant image in tox configurations. - add CentOS 8 repository for el8 dependencies. - CentOS 8 container engine is podman (same as RHEL 8). - don't use the epel mirror on sepia because it's epel7 only. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> Co-authored-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-08 11:13:46 +01:00
Stanley Lam	2ca3364109	ceph-rgw-loadbalancer: Modify keepalived master selection Currently the keepalived template only works when system hostnames exactly match the Ansible inventory name. If these are different, all generated templates become BACKUP without a MASTER assigned. Using the inventory_hostname in the template file resolves this issue. Signed-off-by: Stanley Lam stanleylam_604@hotmail.com	2020-01-06 09:25:04 -05:00
Dimitri Savineau	2c06678cde	ceph-infra: replace hardcoded grafana group name The grafana-server group name was hardcoded for the grafana/prometheus firewalld tasks condition. We should use the associated variable: grafana_server_group_name Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-12-18 16:09:14 +01:00
Dimitri Savineau	f4c261ef90	ceph-infra: move dashboard into a dedicated file Instead of using multiple dashboard_enabled condition in the configure_firewall file we could just have the condition once and include the dedicated tasks list. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-12-18 16:09:14 +01:00
Dimitri Savineau	4535985188	ceph-infra: open dashboard port on monitor When there's no mgr group defined in the ansible inventory then the mgrs are deployed implicitly on the mons nodes. If the dashboard is enabled then we need to open the dashboard port on the node that is running the ceph mgr process (mgr or mon). The current code only allow to open that port on the mgr nodes when they are present explicitly in the inventory but not implicitly. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1783520 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-12-18 16:09:14 +01:00
Dimitri Savineau	6f0556f015	ceph-defaults: exclude rbd devices from discovery The RBD devices aren't excluded from the devices list in the LVM auto discovery scenario. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1783908 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-12-18 09:03:19 +01:00
Guillaume Abrioux	fc02fc98eb	defaults: change monitor|radosgw_address default values To avoid confusion, let's change the default value from `0.0.0.0` to `x.x.x.x`. Users might think setting `0.0.0.0` will make the daemon bind on all interfaces. Fixes: #4827 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-12-12 09:58:33 +01:00
Philip Brown	9021c29b61	Add comment on auto-SSL cert generation Fixes: #4830 Signed-off-by: Philip Brown <phil@bolthole.com>	2019-12-11 10:57:28 +01:00
Dimitri Savineau	68c6f39349	ceph-facts: set use_new_ceph_iscsi on iscsi nodes We don't need to set the use_new_ceph_iscsi fact on other nodes than those present in the iscsigws group. Also remove the duplicate iscsi_gw_group_name condition already present on the include_task. Finally validate the ansible distribution as the first task. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-12-10 23:57:03 +01:00
Guillaume Abrioux	8d0dc34ebe	defaults: fix a typo s/above/below Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-12-10 09:32:02 -05:00
Guillaume Abrioux	a234338eff	defaults: add a comment This commit isolates and adds an explicit comment about variables not intended to be modified by the user. Fixes: #4828 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-12-09 13:50:43 -05:00
Guillaume Abrioux	d245eb7e7d	dashboard: run node_export as privileged container Typical error: ``` type=AVC msg=audit(1575367499.582:3210): avc: denied { search } for pid=26680 comm="node_exporter" name="1" dev="proc" ino=11528 scontext=system_u:system_r:container_t:s0:c100,c1014 tcontext=system_u:system_r:init_t:s0 tclass=dir permissive=0 ``` node_exporter needs to be run as privileged to avoid avc denied error since it gathers lot of information on the host. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1762168 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-12-09 09:40:13 -05:00
Dimitri Savineau	1a77dd7e91	ceph-validate: start with ansible version test It doesn't make sense to start validating configuration if the ansible version isn't the good one. This commit moves the check_system task as the first task in the ceph-validate role. The ansible version test tasks are moved at the top of this file. Also moving the iscsi kernel tests from check_system to check_iscsi file. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-12-09 09:35:03 +01:00
Dimitri Savineau	12aa8f4025	ceph-facts: move ntp/chrony facts to ceph-infra The ntp/chrony facts are only used in the ceph-infra role so we don't really need to set them in the ceph-facts roles. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-12-05 19:46:59 +01:00
Guillaume Abrioux	0756fa467d	defaults: change default value for dashboard_admin_password A recent change in ceph/ceph prevent from having username in the password: `Error EINVAL: Password cannot contain username.` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-12-05 13:02:06 -05:00
Dimitri Savineau	014f51c2a4	ceph-defaults: exclude md devices from discovery The md devices (RAID software) aren't excluded from the devices list in the auto discovery scenario. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1764601 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-12-05 10:14:25 +01:00
Guillaume Abrioux	a8d76d72d7	dashboard: use fqdn url for active alert When using the shortname, the URL for active alert launches with short hostname and fails to connect to the server. This commit changes the template in order to use the fqdn. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1765485 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-12-03 14:30:32 +01:00
Guillaume Abrioux	fe5ffe589e	facts: isolate container_binary facts in order to be able to call container_binary without having to run the whole ceph-facts role. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-12-03 13:29:52 +01:00
Guillaume Abrioux	d23383a820	purge: remove docker_* task All containers are removed when systemd stops them. There is no need to call this module in purge container playbook. This commit also removes all docker_image task and remove all container images in the final cleanup play. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1776736 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-12-03 13:29:52 +01:00
Stanley Lam	ad7a5dad3f	Add option for HAproxy to act a SSL frontend termination point for loadbalanced RGW instances. Signed-off-by: Stanley Lam <stanleylam_604@hotmail.com>	2019-12-02 16:54:33 -05:00
Guillaume Abrioux	a43a872105	docker2podman: import ceph-handler role This is needed to avoid following error: ``` ERROR! The requested handler 'restart ceph mons' was not found in either the main handlers list nor in the listening handlers list ``` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1777829 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-12-02 09:11:12 -05:00
Dimitri Savineau	5bd1cf40eb	ceph-osd: wait for all osds once `cf8c6a3` moves the 'wait for all osds' task from openstack_config to the main tasks list. But the openstack_config code was executed only on the last OSD node. We don't need to do this check on all OSD node so we need to add set run_once to true on that task. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-11-27 13:05:42 -05:00
Guillaume Abrioux	23b1f43897	facts: avoid duplicated element in devices list When using `osd_auto_discovery`, `devices` is built multiple times due to multiple runs of `ceph-facts` role. It ends up with duplicate instances of the same device in the list. Using `unique` filter when building the list fixes this issue. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-11-27 16:35:41 +01:00
Guillaume Abrioux	cc0c1ce301	dashboard: only print dashboard url of the grafana-server node This commit makes the ceph-dashboard role only printing ceph-dashboard URL of the nodes present in grafana-server group Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1762163 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-11-27 10:28:23 -05:00
Guillaume Abrioux	f19a2aef1a	Revert "tox-podman: use centos 8 vagrant image" This reverts commit `19e9a06ab1`.	2019-11-27 16:19:58 +01:00
Dimitri Savineau	cf8c6a3849	ceph-osd: wait for all osd before crush rules When creating crush rules with device class parameter we need to be sure that all OSDs are up and running because the device class list is populated with this information. This is now enabled for all scenarios, not only openstack_config. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-11-27 07:43:07 +01:00
Dimitri Savineau	55adc10be3	ceph-grafana: remove ipv6 brackets on wait_for The wait_for ansible module doesn't support the brackets on IPv6 address so we need to remove them. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1769710 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-11-26 10:08:17 +01:00
Guillaume Abrioux	33bfb10af9	nfs: remove legacy file this file is provided by the packaging (nfs-ganesha) so there's no need to maintain it in ceph-ansible Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-11-22 05:11:41 +01:00
Guillaume Abrioux	d06158e9d9	nfs: do not run privileged nfs container At the moment, we bindmount the dbus socket from the host, this requires to run the container with --privileged. Since we now run a dedicated dbus daemon inside the same container, we can stop running privileged nfs-ganesha containers Related ceph-container PR : ceph/ceph-container#1517 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1725254 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-11-22 05:11:41 +01:00
Dimitri Savineau	19e9a06ab1	tox-podman: use centos 8 vagrant image Switch the podman scenario from atomic centos 7 to centos 8 (not atomic) Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-11-20 10:34:34 +01:00
VasishtaShastry	72c43cc5d9	Fixes failure of cephfs configuration using --limit Configuration of cephfs with an existing cluster using --limit used to fail at different tasks while running with site-docker.yml This commit addresses both of those tasks Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1773489 Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com>	2019-11-18 16:44:47 +01:00
Dimitri Savineau	ef2cb99f73	ceph-osd: add device class to crush rules This adds device class support to crush rules when using the class key in the rule dict via the create-replicated sub command. If the class key isn't specified then we use the create-simple sub command for backward compatibility. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1636508 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-11-14 16:25:46 +01:00
Dimitri Savineau	ed36a11eab	move crush rule creation from mon to osd role If we want to create crush rules with the create-replicated sub command and device class then we need to have the OSD created before the crush rules otherwise the device classes won't exist. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-11-14 16:25:46 +01:00
Dimitri Savineau	3e29b8d5ff	ceph-defaults: pin prometheus container tags In addition to the grafana container tag change, we need to do the same for the prometheus container stack based on the release present in the OSE 4.1 container image. $ docker run --rm openshift4/ose-prometheus-node-exporter:v4.1 --version node_exporter, version 0.17.0 build user: root@67fee13ed48f build date: 20191023-14:38:12 go version: go1.11.13 $ docker run --rm openshift4/ose-prometheus-alertmanager:4.1 --version alertmanager, version 0.16.2 build user: root@70b79a3f29b6 build date: 20191023-14:57:30 go version: go1.11.13 $ docker run --rm openshift4/ose-prometheus:4.1 --version prometheus, version 2.7.2 build user: root@12da054778a3 build date: 20191023-14:39:36 go version: go1.11.13 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-11-14 16:11:14 +01:00
VasishtaShastry	9a1f1626c3	Evades validation of ceph_repository_type in containerized scenario This will prevent failure of site-docker.yml with configs in doc. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1769760 Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com> Co-Authored-By: Guillaume Abrioux <gabrioux@redhat.com>	2019-11-14 15:53:22 +01:00
Dimitri Savineau	4a065cebd7	ceph-validate: add rbdmirror validation When ceph_rbd_mirror_configure is set to true we need to ensure that the required variables aren't empty. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1760553 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-11-07 08:57:43 -05:00
Dimitri Savineau	60cbfdc2a6	ceph-handler: Use /proc/net/unix for rgw socket If for some reason, there's an old rgw socket file present in the /var/run/ceph/ directory then the test command could fail with test: xxxxxxxxx.asok: binary operator expected $ ls -hl /var/run/ceph/ total 0 srwxr-xr-x. ceph-client.rgw.rgw0.rgw0.68.94153614631472.asok srwxr-xr-x. ceph-client.rgw.rgw0.rgw0.68.94240997655088.asok We can check the radosgw socket in /proc/net/unix to avoid using wildcard in the socket name. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-11-07 14:41:11 +01:00
Dimitri Savineau	ece46d33be	ceph-osd: fix fs.aio-max-nr sysctl condition [1] introduced a regression on the fs.aio-max-nr sysctl value condition. The enable key isn't a boolean but a string because the expression isn't evaluated. This string output "(osd_objectstore == 'bluestore')" is always true because item.enable condition only matches non empty string. So the sysctl value was applied for both filestore and bluestore backends. [2] added the bool filter to the condition but the filter always returns false on string and the sysctl wasn't applied at all. This commit fixes the enable key value by evaluating the value instead of using the string. [1] https://github.com/ceph/ceph-ansible/commit/08a2b58 [2] https://github.com/ceph/ceph-ansible/commit/ab54fe2 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-11-07 13:51:48 +01:00
Dimitri Savineau	2037fb87b6	ceph-defaults: pin grafana container tag to 5.2.4 The latest grafana container tag is using grafana 6.x release which could cause issue with the ceph dashboard integration. Considering that the grafana container in RHCS 3 is based on 5.x then we should use the same version. $ docker run --rm rhceph/rhceph-3-dashboard-rhel7:3 -v Version 5.2.4 (commit: unknown-dev) Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-10-31 18:44:51 -04:00
Dimitri Savineau	9a996aef7f	ceph-osd: Remove ulimit nofile on container start Even if this improves ceph-disk/ceph-volume performances then it also impact the ceph-osd process. The ceph-osd process shouldn't use 1024:4096 value for the max open files. Removing the ulimit option from the container engine and doing this kind of change on the container side [1]. [1] https://github.com/ceph/ceph-container/pull/1497 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1702285 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-10-31 10:42:09 -04:00
fmount	41b8c17356	Set grafana-server user and password in ceph-dashboard role This change adds two tasks to set grafana-api user and password that are required to inject dashboard layouts to the external grafana instance. Without these two parameters the ceph-ansible playbook fails showing an authorization error (HTTPError: 401 Client Error: Unauthorized"). Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1767365 Signed-off-by: fmount <fpantano@redhat.com>	2019-10-31 10:29:57 -04:00
Mihai Plasoianu	d3f67d63ae	ceph-mon: use --admin-daemon to set default crush rule Signed-off-by: Mihai Plasoianu <m.plasoianu@vertical.de>	2019-10-29 20:59:32 -04:00
Radu Toader	f2573c9e6b	nfs: support specific keys for rgw nfs user This brings the possibility to modify the rgw nfs user to use specific keys when those are defined. Signed-off-by: Radu Toader <radu.m.toader@gmail.com>	2019-10-29 14:59:26 -04:00
Dimitri Savineau	15f7c7195a	ceph-nfs: add nfs-ganesha-rados-grace explicitly Since nfs-ganesha V3.0-rc4 and [1] we need to explicitly install the nfs-ganesha-rados-grace package. [1] https://github.com/nfs-ganesha/nfs-ganesha/commit/0fea990 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-10-28 16:27:36 -04:00
Dimitri Savineau	b33c476f16	defaults: add user/pass auth registry variables Add ceph_docker_registry_username and ceph_docker_registry_password variables in ceph-defaults role so they will be present in the group_vars samples but commented. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1763139 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-10-24 15:11:45 -04:00
Guillaume Abrioux	3d28773da5	mon: call mon_status from asok since c09b82a80a392ccd0da7677c7b424ce5cd3fa5d6 in ceph/ceph we must call mon_status from asok instead. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-24 10:19:16 -04:00
Dimitri Savineau	d050391cbb	dashboard: add ceph iscsi management When deploying with ceph-iscsi nodes and dashboard enabled, we need to add the ceph iscsi gateway endpoints to the dashboard configuration and add the mgr ip address in the trusted list in the iscsi gateway configuration file. Closes: #4638 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1764173 https://docs.ceph.com/docs/master/mgr/dashboard/#enabling-iscsi-management Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-10-22 23:24:17 +02:00
Dimitri Savineau	f2cb937193	ceph-iscsi: add ceph-iscsi stable repositories This commit adds the support of the ceph-iscsi stable repository when use ceph_repository community instead of always using the devel repositories. We're still using the devel repositories for rtslib and tcmu-runner in both cases (dev and community). Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-10-22 23:24:17 +02:00
Dimitri Savineau	fd8d47da98	Revert "iscsigw: install python-requests" We don't need this since [1]. Also this was only working for python2 and not supporting python3. [1] https://github.com/ceph/ceph-iscsi/commit/00f198a This reverts commit `167737dd3d`. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-10-22 23:24:17 +02:00
Dimitri Savineau	9ad000618f	container/dashboard: run the registry auth task When deploying with packages then the ceph-container-common role isn't executed so the registry authentication task is ignored. Closes: #4636 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-10-22 23:23:32 +02:00
Guillaume Abrioux	da4215e9c0	validate: fix credentials validation This task is failing when `ceph_docker_registry_auth` is enabled and `ceph_docker_registry_username` is undefined with an ansible error instead of the expected message. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1763139 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-21 13:26:55 -04:00
Dimitri Savineau	3969470fca	travis: fail on ansible-lint errors If ansible-lint reports an error then it's skipped. We should fail in this case. This patch also fixes the pipefail lint in the rbd mirror role [306] Shells that use pipes should set the pipefail option Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-10-21 13:26:02 -04:00
Guillaume Abrioux	4e9504c939	common: do not override ceph_release when using custom repo Otherwise it fails like following: ``` TASK [ceph-mds : allow multimds] ************************************************************************************************************************************************ Monday 22 July 2019 16:37:38 +0800 (0:00:03.269) 0:13:25.651 ********* fatal: [rhel7u6clone1]: FAILED! => {"msg": "The conditional check 'ceph_release_num[ceph_release] == ceph_release_num.luminous' failed. The error was: error while evaluating conditional (ceph_release_num[ceph_release] == ceph_release_num.luminous): 'dict object' has no attribute u'dummy'\n\nThe error appears to have been in '/usr/share/ceph-ansible/roles/ceph-mds/tasks/create_mds_filesystems.yml': line 43, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: allow multimds\n ^ here\n"} ``` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1645379 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-17 22:58:16 +02:00
Mike Christie	ba141298d7	iscsi-gw: Fix rtslib installation When using python3 the name of the rtslib rpm is python3-rtslib. The packages that use rtslib already have code that detects the python version and distro deps, so drop it from the ceph iscsi gw task list and let the ceph-iscsi rpm dependency handle it. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1760930 Signed-off-by: Mike Christie <mchristi@redhat.com>	2019-10-16 12:59:31 -04:00
Guillaume Abrioux	71cebf80a6	update: follow new recommendation to upgrade mds cluster Refactor the mds cluster upgrade code in order to follow the documented recommendation. See: https://github.com/ceph/ceph/blob/master/doc/cephfs/upgrading.rst Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1569689 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-16 11:23:12 -04:00
Guillaume Abrioux	b63bd13073	nfs: remove unnecessary set_fact in main.yml this task is a leftover and no longer needed. It even causes bug when collocating nfs with mon. Closes: #4609 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-16 11:23:02 -04:00
Dimitri Savineau	0b1e9c0737	rbd-mirror: fail if the peer is not added Due to the 'failed_when: false' statement present in the peer task, the playbook continues to run even if the peer task fails (for example, with an incorrect remote peer format): "stderr": "rbd: invalid spec 'admin@cluster1'" This patch adds a task to list the peers present and adds the peer only if it's not already added. With this we don't need the failed_when statement anymore. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1665877 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-10-16 16:27:46 +02:00
Dimitri Savineau	bc701860d5	ceph-iscsi: notify rbd target services When the iscsi gateway or the ceph configuration file change then we need to notify the rbd target api/gw services to be restarted. This patch also merges the rbd-target-api and rbd-target-gw handler into a single file and listen. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-10-16 16:25:40 +02:00
Guillaume Abrioux	cb80231725	mgr: do not copy all keyrings on all mgr There is no need to loop over all mgr nodes to set this fact, it's even breaking deployments because it tries to copy all mgr keyring on all mgr. Closes: #4602 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-15 15:06:46 -04:00
Dimitri Savineau	0f978d969b	Remove validate action and notario dependency The current ceph-validate role is using both validate action and fail module tasks to validate the ceph configuration. The validate action is based on the notario python library. When one of the notario validation fails then a python stack trace is reported to the ansible task. This output isn't understandable by users. This patch removes the validate action and the notario dependency. The validation is now done with only fail ansible module. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1654790 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-10-15 11:34:49 +02:00
Dimitri Savineau	f7fd0b6d4f	lint: fix error [303,602,701,702] [303] mktemp used in place of tempfile module [602] Don't compare to empty string [701] No 'galaxy_info' found [702] Use 'galaxy_tags' rather than 'categories' This patch also changes the ansible log_path value via the ANSIBLE_LOG_PATH environment variable in the travis configuration to avoid warnings. [WARNING]: log file at /home/travis/ansible/ansible.log is not writeable and we cannot create it, aborting Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-10-15 10:07:52 +02:00
Dimitri Savineau	fe9c5b8c68	ceph-handler: group listen topics and condition We are using multiple listen topics with the handlers. That means that we are notifying 4 tasks for each handler. Instead we can group the listen on an include_tasks and based on the group condition. Before: NOTIFIED HANDLER ceph-handler : set _mon_handler_called before restart for mon0 NOTIFIED HANDLER ceph-handler : copy mon restart script for mon0 NOTIFIED HANDLER ceph-handler : restart ceph mon daemon(s) for mon0 NOTIFIED HANDLER ceph-handler : set _mon_handler_called after restart for mon0 NOTIFIED HANDLER ceph-handler : set _osd_handler_called before restart for mon0 NOTIFIED HANDLER ceph-handler : copy osd restart script for mon0 NOTIFIED HANDLER ceph-handler : restart ceph osds daemon(s) for mon0 NOTIFIED HANDLER ceph-handler : set _osd_handler_called after restart for mon0 NOTIFIED HANDLER ceph-handler : set _mds_handler_called before restart for mon0 NOTIFIED HANDLER ceph-handler : copy mds restart script for mon0 NOTIFIED HANDLER ceph-handler : restart ceph mds daemon(s) for mon0 NOTIFIED HANDLER ceph-handler : set _mds_handler_called after restart for mon0 NOTIFIED HANDLER ceph-handler : set _rgw_handler_called before restart for mon0 NOTIFIED HANDLER ceph-handler : copy rgw restart script for mon0 NOTIFIED HANDLER ceph-handler : restart ceph rgw daemon(s) for mon0 NOTIFIED HANDLER ceph-handler : set _rgw_handler_called after restart for mon0 NOTIFIED HANDLER ceph-handler : set _mgr_handler_called before restart for mon0 NOTIFIED HANDLER ceph-handler : copy mgr restart script for mon0 NOTIFIED HANDLER ceph-handler : restart ceph mgr daemon(s) for mon0 NOTIFIED HANDLER ceph-handler : set _mgr_handler_called after restart for mon0 NOTIFIED HANDLER ceph-handler : set _rbdmirror_handler_called before restart for mon0 NOTIFIED HANDLER ceph-handler : copy rbd mirror restart script for mon0 NOTIFIED HANDLER ceph-handler : restart ceph rbd mirror daemon(s) for mon0 NOTIFIED HANDLER ceph-handler : set _rbdmirror_handler_called after restart for mon0 After: NOTIFIED HANDLER ceph-handler : mons handler for mon0 NOTIFIED HANDLER ceph-handler : osds handler for mon0 NOTIFIED HANDLER ceph-handler : mdss handler for mon0 NOTIFIED HANDLER ceph-handler : rgws handler for mon0 NOTIFIED HANDLER ceph-handler : mgrs handler for mon0 NOTIFIED HANDLER ceph-handler : rbdmirrors handler for mon0 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-10-11 15:43:58 -04:00
Guillaume Abrioux	161170524d	mgr: improve mgr keyring creation Delegating on remote node isn't necessary here since we are already iterating over the right nodes. Closes: #4518 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-11 09:40:07 -04:00
Guillaume Abrioux	273413186a	common: do not reset `container_exec_cmd` This commit removes some legacy tasks. These tasks aren't needed, they cause the playbook to fail when collocating daemons. Closes: #4553 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-10 14:38:30 -04:00
Guillaume Abrioux	80e2d00b16	validate: prevent from installing OSD on same disk as the OS This commit adds a validation task to prevent from installing an OSD on the same disk as the OS. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1623580 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-10 08:33:20 -04:00
Dimitri Savineau	3f6ff240b7	dashboard: update layouts before the restart If the mgr dashboard doesn't restart fast enough then the inject dashboard task will fail with a HTTP error 400. Error EINVAL: Traceback (most recent call last): File "/usr/share/ceph/mgr/mgr_module.py", line 914, in _handle_command return self.handle_command(inbuf, cmd) File "/usr/share/ceph/mgr/dashboard/module.py", line 450, in handle_command push_local_dashboards() File "/usr/share/ceph/mgr/dashboard/grafana.py", line 132, in push_local_dashboards retry() File "/usr/share/ceph/mgr/dashboard/grafana.py", line 89, in call result = self.func(self.args, *self.kwargs) File "/usr/share/ceph/mgr/dashboard/grafana.py", line 127, in push grafana.push_dashboard(body) File "/usr/share/ceph/mgr/dashboard/grafana.py", line 54, in push_dashboard response.raise_for_status() File "/usr/lib/python2.7/site-packages/requests/models.py", line 834, in raise_for_status raise HTTPError(http_error_msg, response=self) HTTPError: 400 Client Error: Bad Request Instead we can trigger this task before the module restart. Closes: #4565 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-10-09 09:10:27 +02:00
Guillaume Abrioux	6c6a512a72	nfs: stop nfs server service in all context This commit moves this task in order to stop the nfs server service regardless the deployment type desired (containerized or non containerized). Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1508506 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-07 10:24:33 -04:00
Guillaume Abrioux	47034effe0	nfs: stop nfs server service The syntax here wasn't working; this refactor fixes the task. It also removes the `ignore_errors: true` which was hiding the failure. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1508506 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-07 10:24:33 -04:00
Guillaume Abrioux	fa9b42e98e	switch_to_containers: do not re-set `ceph_uid` This commit refacts the way we set `ceph_uid` fact in `ceph-facts` and removes all `set_fact` tasks for `ceph_uid` in switch-to-containers playbook to avoid duplicated code. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-07 14:15:56 +02:00
Dimitri Savineau	b9e93ad7a6	ceph-dashboard: remove rgw api host,port,scheme We don't need to have dedicated variables for the RGW integration into the Ceph Dashboard and need to be manually filled. Instead we can use the current values from the RGW nodes by using the IP and port from the first RGW instance of the first RGW node via the radosgw_address and radosgw_frontend_port variables. We don't need to specify all RGW nodes, this will be done automatically with one node. The RGW api scheme is using the radosgw_frontend_ssl_certificate variable to determine if the value is http or https. This variable is also reuse as a condition for the ssl verify task. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-10-07 11:22:44 +02:00
Dimitri Savineau	249764047b	ceph-dashboard: Improve https configuration This patch moves the https dashboard configuration into a dedicated block to avoid the multiple occurrences of the dashboard_protocol condition. It also fixes the dashboard certificate and key variables handling in the condition introduced by `ab54fe2`. Those variables aren't boolean but strings so we can test them via the length filter. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-10-07 09:08:16 +02:00
Guillaume Abrioux	ccc11cfc93	handler: followup on #4519 This commit adds some missing `| bool` filters. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-04 10:49:15 -04:00
Dimitri Savineau	dd526cfe4e	ceph-dashboard: add cluster parameter to ceph cmd The ceph dashboard tasks didn't use the cluster option if the cluster name isn't the default value. Closes: #4529 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-10-04 16:10:22 +02:00
Guillaume Abrioux	411bd07d54	handlers: refact osd handler This commit merges the two restart tasks into a single one, this way it's one task less to notify. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-04 09:42:20 -04:00
Dimitri Savineau	0346871fb5	ceph-handler: don't restart all OSDs with limit When using the ansible --limit option on one or few OSD nodes and if the handler is triggered then we will restart the OSD service on all OSDs nodes instead of the hosts limited by the limit value. Even if the play is limited by the --limit value we are using all OSD nodes from the OSD group. with_items: '{{ groups[osd_group_name] }}' Instead we should iterate only on the nodes present in both OSD group and limit list. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-10-03 14:52:27 -04:00
Dimitri Savineau	780cf36a59	ceph-facts: fix _radosgw_address with block `e695efc` introduced a regression in the _radosgw_address fact when using the radosgw_address_block variable. There's no item there because we don't use the items lookup. This is only used for _monitor_address with monitor_address_block. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1758099 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-10-03 16:15:45 +02:00
Guillaume Abrioux	9bad239d77	common: improve keyrings generation There is no need to get n * number of nodes the different keyrings. Adding a `run_once: true` here avoid running a ceph command too many times which could be impacting large cluster deployment. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-02 13:09:50 +02:00
Dimitri Savineau	ec3b687dc4	ceph-facts: use --admin-daemon to get fsid During the rolling_update scenario, the fsid value is retrieved from the current ceph cluster configuration via the ceph daemon config command. This command tries first to resolve the admin socket path via the ceph-conf command. Unfortunately this command won't work if you have a duplicate key in the ceph configuration even if it only produces a warning. As a result the task will fail. Can't get admin socket path: unable to get conf option admin_socket for mon.xxx: warning: line 13: 'osd_memory_target' in section 'osd' redefined Instead of using ceph daemon we can use the --admin-daemon option because we already know the admin socket path from the ceph cluster and mon hostname values. Closes: #4492 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-10-02 10:07:13 +02:00
Guillaume Abrioux	272d16e101	validate: fix gpt header check Check for gpt header when osd scenario is lvm or lvm batch. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-01 11:46:17 -04:00
Guillaume Abrioux	ed8616aa66	rbdmirror: rename a file rename this file to be more generic. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-01 10:27:51 -04:00
Guillaume Abrioux	e08194dd67	rgw: refact tasks directory layout This commit moves containerized deployment related files to `./tasks/` directory. This is needed to make `docker-to-podman.yml` working since we use `tasks_from:` option. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-01 10:27:51 -04:00
Guillaume Abrioux	c69816c6b7	rbdmirror: refact tasks directory layout This commit moves containerized deployment related files to `./tasks/` directory. This is needed to make `docker-to-podman.yml` working since we use `tasks_from:` option. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-01 10:27:51 -04:00
Guillaume Abrioux	4636f3f7e2	iscsigw: refact tasks directory layout This commit moves containerized deployment related files to `./tasks/` directory. This is needed to make `docker-to-podman.yml` working since we use `tasks_from:` option. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-01 10:27:51 -04:00
Guillaume Abrioux	bd64167469	container: isolate systemd tasks This commit isolates the systemd unit files generation for containers into separate yml files in order to be able importing each corresponding roles without playing all tasks. This is needed so we can run ceph-ansible to render systemd unit files so they call podman instead of docker. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-01 10:27:51 -04:00
Dimitri Savineau	20b1a464ec	ceph-facts: update external grafana fact filter `e695efc` hasn't been updated with the changes introduced in `9bb11c7` so the ips_in_ranges filter isn't used for an external grafana instance. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-10-01 10:47:14 +02:00
Guillaume Abrioux	e4444d29e0	Revert "ceph-common: install only necesarry ceph-* packages on debian" This reverts commit `58b27ef0b3`. This is breaking debian based OS deployments. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-09-28 08:04:12 +02:00
Boris Ranto	b96c6da832	ceph-defaults: Change the default prometheus port The old default prometheus port 9090 clashes with cockpit in rhel 8. The 9090 port is reserved for web service administration of machines. We should change the default to something that does not clash with other ports used in rhel 8, at least by default. The port 9092 seems like a good choice in my testing. Signed-off-by: Boris Ranto <branto@redhat.com>	2019-09-28 04:40:42 +02:00
Johannes Kastl	5cf22e9b31	move python-xml to raw_install_python.yml The package python-xml is needed for ansible's zypper module to interact with the zypper package management tool. roles/ceph-defaults/defaults/main.yml: Remove python-xml from variable suse_package_dependencies to only install python-xml on SUSE/openSUSE if python is not found. raw_install_python.yml already contains all the logic needed to check if there is a valid python installation, so this is better suited there. openSUSE Leap 15.x / SLES 15.x do no longer have /usr/bin/python, only /usr/bin/python3, which already contains the xml module, so nothing needs to be installed in that case. Signed-off-by: Johannes Kastl <kastl@b1-systems.de>	2019-09-27 14:19:32 +02:00
Harald Jensås	e695efcaf7	Replace ipaddr() with ips_in_ranges() This change implements a filter_plugin that is used in the ceph-facts, ceph-validate roles and infrastructure-playbooks. The new filter plugin will return a list of all IP addresses that reside in any one of the given IP ranges. The new filter replaces the use of the ipaddr filter. ceph.conf already supports a comma separated list of CIDRs for the public_network and cluster_network options. Changes: [1] and [2] introduced a regression in ceph-ansible where public_network can no longer be a comma separated list of cidrs. With this change a comma separated list of subnet CIDRs can also be used for monitor_address_block and radosgw_address_block. [1] commit: `d67230b2a2` [2] commit: `20e4852888` Related-To: https://bugs.launchpad.net/tripleo/+bug/1840030 Related-To: https://bugzilla.redhat.com/show_bug.cgi?id=1740283 Closes: #4333 Please backport to stable-4.0 Signed-off-by: Harald Jensås <hjensas@redhat.com>	2019-09-27 10:11:53 +02:00
Dimitri Savineau	74ab59c4f3	ceph-dashboard: Add prometheus api host The set-prometheus-api-host ceph dashboard subcommand was missing in ceph-dashboard role. Only grafana and alertmanager were present. This commit also removes the trailing slash at the end of the host/url values. Closes: #4453 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-09-27 09:16:12 +02:00
Anthony Rusdi	58b27ef0b3	ceph-common: install only necesarry ceph-* packages on debian Currently, the ceph package is only a meta-package that does not contain actual software, but simply depends on other packages. It's been a few releases since debian stretch (official), ubuntu bionic (official), ubuntu uca repository and upstream debian-jewel. As we only support nautilus and higher releases for the master branch, I propose to drop the ceph package and use ceph-base instead for repository models other than rhcs so the debian ceph install will be more minimal. Signed-off-by: Anthony Rusdi <33247310+antrusd@users.noreply.github.com>	2019-09-27 01:11:22 +02:00
Dimitri Savineau	ca77d7bd31	ceph-nfs: Allow to configure SecType value Depending on the infrastructure (w/o kerberos auth) then the SecType value could be different. Currently this value is hardcoded in the NFS Ganesha template. Instead we can use a variable. The default value is still the same to avoid breaking the backward compatibility. Closes: #4459 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-09-27 00:33:18 +02:00
liuxu	195f70897c	dashboard: add grafana dashboard support on Debian based OS download grafana dashboard files from github when running on Debian based OS Signed-off-by: liuxu <liuxu623@gmail.com>	2019-09-26 18:49:56 +02:00
fmount	9bb11c7b2a	Inject ceph grafana dashboard layouts This change just adds the task to inject from the ceph dashboard mgr module the required layouts to show all the cluster metrics on the grafana instance. Since we're now able to push grafana layouts through the ceph mgr module command, the dashboards configuration template is no longer needed on containerized environments. This commit also fixes the Vagrantfile IP static assignment in the grafana section because it generates an issue (it's the same of the mgr instance). Finally, considering some deployments that use an external grafana server instance, we reworked the 'grafana_server_addr' assignment to address these requirements. Signed-off-by: fmount <fpantano@redhat.com>	2019-09-26 11:12:20 -04:00
Guillaume Abrioux	167737dd3d	iscsigw: install python-requests Typical error at rbd-target-api startup: ``` Sep 25 12:12:29 iscsi-gw0 rbd-target-api[9959]: Traceback (most recent call last): Sep 25 12:12:29 iscsi-gw0 rbd-target-api[9959]: File "/usr/bin/rbd-target-api", line 39, in <module> Sep 25 12:12:29 iscsi-gw0 rbd-target-api[9959]: from gwcli.utils import (APIRequest, valid_gateway, valid_client, Sep 25 12:12:29 iscsi-gw0 rbd-target-api[9959]: File "/usr/lib/python2.7/site-packages/gwcli/utils.py", line 1, in <module> Sep 25 12:12:29 iscsi-gw0 rbd-target-api[9959]: import requests Sep 25 12:12:29 iscsi-gw0 rbd-target-api[9959]: ImportError: No module named requests ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-09-26 11:35:24 +02:00
Guillaume Abrioux	5bb6a4da42	tests: set copy_admin_key at group_vars level setting it at extra vars level prevent from setting it per node. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-09-26 11:35:24 +02:00
Guillaume Abrioux	ab370b6ad8	global: remove fetch_directory dependency This commit drops the fetch_directory dependency. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622688 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-09-26 11:35:24 +02:00
Guillaume Abrioux	09e04a9197	osd: add wal_devices option support to ceph_volume module This commit adds the `wal_devices` option support to the ceph_volume module. passing a devices list in `bluestore_wal_devices` will make ceph-volume creating 1 vg using these devices to create block.wal partitions. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-09-26 11:35:24 +02:00
Guillaume Abrioux	70f1b37097	osd: update doc text in defaults/main.yml This commit removes ceph-disk references. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-09-26 11:35:24 +02:00
Guillaume Abrioux	7b836eaa47	osd: add block_db_devices option support to ceph_volume module This commit adds the `block_db_devices` option support to the ceph_volume module. passing a devices list in `dedicated_devices` will make ceph-volume creating 1 vg using these devices to create block.db partitions for data devices. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-09-26 11:35:24 +02:00
Guillaume Abrioux	2b97ac921b	validate: check ceph_docker_registry_* length This commit adds a condition to check whether these variables are empty. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-09-18 16:03:18 +02:00
Dimitri Savineau	9f4a99fb24	container: Allow to use registry authentication The registry.redhat.io registry requires authentication so before pulling the RHCS 4 container images from the registry we need to do the login step. This is done via the new ceph_docker_registry_auth variable. The default value is false but true for RHCS setup. When set to true, you need to provide the username and password for the registry via the associated variables. This patch also updates the ceph_docker_registry value for RHCS setup. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1748911 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-09-18 16:03:18 +02:00
Dimitri Savineau	5b1c15653f	ceph-handler: Fix osd restart condition In containerized deployment, the restart OSD handler couldn't be triggered in most ansible execution. This is due to the usage of run_once + a condition on the inventory hostname and the last filter. The run_once is triggered first so ansible will pick a node in the osd group to execute the restart task. But if this node isn't the last one in the osd group then the task is ignored. There's more probability that the task will be ignored than executed. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-09-10 15:56:53 -04:00
Dimitri Savineau	1f505628dd	rbd-mirror: Allow to copy the admin keyring The ceph-rbd-mirror role allows to copy the admin keyring via the copy_admin_key variable but there's actually no task in that role doing the job. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-09-10 15:44:04 -04:00
Dimitri Savineau	a3d36df025	rbd-mirror: Use the rbd mirror client keyring The admin keyring isn't present by default on the rbd mirror nodes so the rbd commands related to the mirroring configuration will fail. Instead we can use the rbd mirror client keyring. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-09-10 15:44:04 -04:00
Giulio Fidente	d2a2bd7c42	Look for additional names when checking ceph-nfs container status Ganesha cannot be operated active/active, in those deployments where it is managed by pacemaker the container name can be different than the default. This change uses "ceph_nfs_service_suffix" where previously missing to ensure tasks will work with customized names. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1750005 Signed-off-by: Giulio Fidente <gfidente@redhat.com>	2019-09-09 15:27:37 -04:00
Harald Jensås	d94229204d	Support comma-delimited subnets in firewall ceph.conf supports a comma separated list of subnet CIDR's for the public_network and the cluster network. ceph-ansible should support setting up the firewall for this configuration. Closes: #4425 Related: #4333 https://docs.ceph.com/docs/nautilus/rados/configuration/network-config-ref/#network-config-settings Signed-off-by: Harald Jensås <hjensas@redhat.com>	2019-09-09 15:20:58 -04:00
Dimitri Savineau	7e5e21741e	rbd-mirror: configure pool and peer The rbd mirror configuration was only available for non containerized deployment and was also incomplete. We now enable the mirroring on the pool and add the remote peer in both scenarios. The default mirroring mode is set to 'pool' but can be configured via the ceph_rbd_mirror_mode variable. This commit also fixes an issue on the rbd mirror command if the ceph cluster name isn't using the default value (ceph) due to a missing --cluster parameter to the command. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1665877 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-09-06 11:00:55 -04:00
fmount	81eb091533	Fix discovered_interpreter_python variable This change fixes the discovered_interpreter_python variable name that was "discovered_python_interpreter" and caused a failure in OSP deployments. Signed-off-by: fmount <fpantano@redhat.com>	2019-09-04 09:55:30 -04:00
Dimitri Savineau	42082c0a27	lint: fix error [201,206] [201] Trailing whitespace [206] Variables should have spaces before and after: {{ var_name }} Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-08-29 14:28:35 -04:00
Dimitri Savineau	65089a7fc3	ceph-common: remove ceph_stable repo on dev When upgrading from stable to devel release with redhat community packages, the rpm packages are not updated due to priority introduced via `a7b1e35` (starting nautilus). We need to remove the ceph stable repositories when configuring the dev repositories. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-08-29 14:05:13 -04:00
Dimitri Savineau	5e5d5c2d87	Add octopus release Add the 15th ceph release: octopus. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-08-29 14:05:13 -04:00
fmount	8a666bfd15	Add http_addr option to grafana config We have no reason to make grafana container listen on *:<port>, so this change adds the http_addr option to the grafana config file and adds the related option on the wait_for tasks. Since grafana_server_addr should exists, we shouldn't rely on the _current_monitor_addr default on prometheus/grafana templates. This change also remove this default value that is not necessary anymore. Signed-off-by: fmount <fpantano@redhat.com>	2019-08-29 13:00:22 -04:00
Anthony Rusdi	4c592066b7	ceph_custom_repo: define apt and rpm key for custom repo This commit also remove the notify on new added debian repo, force update_cache to yes and define sample ceph_custom_key vars. Signed-off-by: Anthony Rusdi <33247310+antrusd@users.noreply.github.com>	2019-08-29 10:25:10 -04:00
Johannes Kastl	0cedc4d303	openSUSE OBS repo using ceph_stable_release Instead of hardcoding `luminous`, use the `ceph_stable_release` variable to point to the correct repository. This is now uncommented in roles/ceph-defaults/defaults/main.yml to be available, as it is only used if ceph_repository is set to 'obs'. group_vars/*.sample files have been regenerated using the ./generate_group_vars_sample.sh script. Signed-off-by: Johannes Kastl <kastl@b1-systems.de>	2019-08-29 10:23:56 -04:00
Johannes Kastl	4711a7d626	fix openSUSE OBS repo creation roles/ceph-common/tasks/installs/suse_obs_repository.yml: ansible's zypper_repository module does not know a parameter 'uri', this is called 'repo' instead Signed-off-by: Johannes Kastl <kastl@b1-systems.de>	2019-08-29 10:23:07 -04:00
Nick Erdmann	7953ee1b81	ceph-infra: open ceph iscsi/prometheus port Signed-off-by: Nick Erdmann <n@nirf.de>	2019-08-28 16:09:55 -04:00
Johannes Kastl	bd507fa147	set discovered_python_interpreter if ansible_python_interpreter is defined If the user has set the `ansible_python_interpreter`, ansible will not try to discover python, so `discovered_python_interpreter` will not be set. Solution: Set `discovered_python_interpreter` to `ansible_python_interpreter` if `ansible_python_interpreter` is defined Signed-off-by: Johannes Kastl <kastl@b1-systems.de>	2019-08-27 20:54:59 +02:00
Dimitri Savineau	2b0616ecca	ceph-mon: Bind mount the ca-trust directory On containerized deployment, the mon container sometimes needs to access to the radosgw endpoint (via the radosgw-admin command). When using TLS on the radosgw with self-signed certificates then we need to access to the CA certification from the mon container. The CA certificate needs to be added on the host and then the directory will be bind mount on the container. Resolves: #4358 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-08-27 20:53:45 +02:00
Dimitri Savineau	49aa05b96c	ceph-client: Use profile rbd in keyring caps Like the OpenStack keyrings, we can use the profile rbd for the clients keyring (both mon and osd). Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-08-27 20:52:23 +02:00
Dimitri Savineau	717af83475	Revert "osd: add 'osd blacklist' cap for osp keyrings" This reverts commit `2d955757ee`. The "osd blacklist" isn't an osd caps but should be used with mon caps. Also the correct caps for this is: 'allow command "osd blacklist"'. The current change is breaking the openstack and clients keyrings. By using the profile rbd (which is already used) we already rely on the ability to blacklist dead client. Resolves: #4385 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-08-27 20:52:23 +02:00
Guillaume Abrioux	5986b26a01	global: add newline at end of file This commit re-add a newline at end of files when it's missing. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-08-23 15:56:47 +02:00
Artur Fijalkowski	011270ca69	global: make directories mode parameterizable This commit makes it possible to parametrize the ceph directories modes. So it changes the hardcoded mode for ceph related directories from 0755 to customizable with `ceph_directories_mode` variable. Closes: #2920 Signed-off-by: Artur Fijalkowski <artur.fijalkowski@ing.com> Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-08-23 09:38:17 +02:00
guihecheng	a0590cae9d	rgw/multisite: assign 'rgw_zone' to the exact section in ceph.conf since the following commit: commit `1ac94c048f` rgw: add support for multiple rgw instances on a single host we have multi-instance rgw support on a single host and the config section name of the rgw changed from [client.rgw.$(hostname)] -> [client.rgw.$(hostname).rgwX] when X is the sequence number: 0,1,2,... So we should assign 'rgw_zone' item to the exact rgw instance config section in ceph.conf Signed-off-by: guihecheng <guihecheng@cmiot.chinamobile.com>	2019-08-23 08:14:10 +02:00
Guillaume Abrioux	327d564106	lint: fix error [301], add `changed_when: false` when needed This commit fixes the error [301]: `[301] Commands should not change things if nothing needs doing` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-08-23 00:23:47 +02:00
Guillaume Abrioux	102edaeb61	lint: fix error [306], add pipefail on shell command using pipe This commit fixes the error [306]: `[306] Shells that use pipes should set the pipefail option` using `/bin/bash` as executable because Debian/Ubuntu systems use `dash` by default which doesn't have the `-o pipefail`. (See: https://github.com/ansible/ansible-lint/issues/497#issue-424623501) Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-08-23 00:23:47 +02:00
Johannes Kastl	efd38ecc88	ceph-validate: Refactor check for installation check on SUSE/openSUSE Move the validation from roles/ceph-common/tasks/installs/install_on_suse.yml to roles/ceph-validate/ and fix the syntax. There are two valid combinations of `ceph_origin` and `ceph_repository` on SUSE/openSUSE: - ceph_origin == 'distro' - ceph_origin == 'repository' and ceph_repository == 'obs' The current when condition would fail even in the valid second combination, as ceph_origin != distro would be true then Fixes: #4362 Signed-off-by: Johannes Kastl <kastl@b1-systems.de>	2019-08-22 20:22:13 +02:00
Johannes Kastl	e1b9312084	facts: fix a typo This commit fixes a typo in roles/ceph-facts/tasks/facts.yml Signed-off-by: Johannes Kastl <kastl@b1-systems.de>	2019-08-22 18:08:28 +02:00
Kevin Coakley	e11cbbbcb1	ceph-config: Set changed_when to false on fact gathering statements The "run 'ceph-volume lvm batch --report' to see how many osds are to be created" and "run 'ceph-volume lvm list' to see how many osds have already been created" statements only register the lvm_batch_report and lvm_list variables. Running those ceph-volume commands should never produce a change on the system. Adding changed_when: false prevents irrelevant change messages from Ansible. Signed-off-by: Kevin Coakley <kcoakley@sdsc.edu>	2019-08-22 17:27:58 +02:00
Johannes Kastl	8e3511ddc7	fix SUSE/openSUSE naming As SUSE 15.x and openSUSE Leap 15.x share the same base, make clear that both are targeted by the respective tasks Signed-off-by: Johannes Kastl <kastl@b1-systems.de>	2019-08-22 17:20:21 +02:00
Johannes Kastl	cdbe958e55	roles/ceph-validate/tasks/check_system.yml: fail on unsupported SUSE versions Fail if SUSE distributions other than 15.x are found, similar to what we have for openSUSE Signed-off-by: Johannes Kastl <kastl@b1-systems.de>	2019-08-22 17:17:21 +02:00
Dimitri Savineau	9a4ac46d19	ceph-osd: Add ulimit nofile on container start On containerized deployment, the OSD entrypoint runs some ceph-volume commands (lvm/simple scan and/or activate) which perform badly without the ulimit option. This option was added for all previous ceph-volume commands but not on the ceph-osd container startup. Also updating hard limit value to 4096 to reflect default baremetal value. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-08-22 16:59:08 +02:00
Johannes Kastl	11aa5dbb58	ceph-nfs: fail on openSUSE Leap using distro packages roles/ceph-validate/tasks/check_nfs.yml: fail on openSUSE Leap using `ceph_origin = distro`, as the ganesha packages are not available from the distribution repositories Fixes: #4342 Signed-off-by: Johannes Kastl <kastl@b1-systems.de>	2019-08-21 09:58:54 +02:00
Johannes Kastl	c721cb99cb	install ceph-mds packages on SUSE/openSUSE install packages on SUSE/openSUSE distributions, using the same logic as on RedHat-based distributions Fixes #4340 Signed-off-by: Johannes Kastl <kastl@b1-systems.de>	2019-08-21 09:57:56 +02:00
Guillaume Abrioux	9329bbb3af	handler: do not validate the server certificate against the CA Otherwise rgw handler ends up with an error when using https. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-08-20 13:52:15 +02:00
Johannes Kastl	504017d562	remove duplicate task installing suse dependencies roles/ceph-common/tasks/installs/install_on_suse.yml: remove the task that installs the dependencies, as this is done later in install_suse_packages.yml Signed-off-by: Johannes Kastl <kastl@b1-systems.de>	2019-08-20 12:59:25 +02:00
Guillaume Abrioux	70cf2a5846	osd: remove useless condition just like `ceph_osd_pool_default_size`, a pool size might change after an initial deployment. Having this condition prevents from customizing the pool in that case. This is not needed so let's remove it. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-08-19 16:17:22 +02:00
Guillaume Abrioux	4df92152c0	common: replace shell module there is no need to use `shell` in these tasks. Let's use `command`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-08-14 16:42:02 +02:00
Guillaume Abrioux	687087fd43	osd: refact 'wait for all osd to be up' task let's use `until` instead of doing test in bash using python oneliner also, use `command` instead of `shell`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-08-14 16:42:02 +02:00
Guillaume Abrioux	13815ad3ca	common: use discovered_interpreter_python fact in order to use the right binary name when using python cli in command or shell module. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-08-14 16:42:02 +02:00
Guillaume Abrioux	a5e359ee80	osd: update the check for 'all osd to be up' the data structure has changed in octopus. eg: the path to `num_osds` is now `["osdmap"]["num_osds"]`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-08-14 16:42:02 +02:00
Guillaume Abrioux	5b9b841108	mgr: refact 'wait for all mgr to be up' task There's no need to use `shell` module here. Instead of using `| python -c`, let's use `from_json` filter. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-08-07 10:33:54 +02:00
Dimitri Savineau	4c6ec1dccb	mgr/dashboard: Fix grafana/prometheus url config When configuring grafana/prometheus embed in the mgr/dashboard, we need to use the address of the grafana-server node and not the current hostname because mgr/dashboard and grafana/prometheus could be present on different hosts. We should instead rely on the grafana_server_addr variable and remove the dashboard_url. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-08-06 09:34:20 +02:00
Dimitri Savineau	f545b5be0d	ceph-dashboard: Add run_once on delegate tasks Because we need to execute commands from a monitor node (the first one in the mons list) we are using delegate_to option. If there's multiple nodes running the ceph-dashboard role then the delegated task will be executed multiple times. Also remove a mgr config-key option not present for nautilus+ releases. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-08-06 09:34:20 +02:00
Johannes Kastl	5ee3d96fb4	only support openSUSE Leap 15.x, fail on 42.x openSUSE switched from 'openSUSE 13.x' to 'openSUSE Leap 42.x' and then to 'openSUSE Leap 15.x' to align with SLES15 development. The previous logic did not correctly allow the current release, as 15.x matched the 'less than 42.3' condition. For now only support openSUSE Leap 15.x, and extend support once 16.x is released (or whatever the exact version will be) Signed-off-by: Johannes Kastl <kastl@b1-systems.de>	2019-08-05 09:46:31 -04:00
Dimitri Savineau	771f25b1f8	ceph-infra: Apply firewall rules with container We don't have a reason to not apply firewall rules on the host when using a containerized deployment. The TripleO environments already manage the ceph firewall rules outside ceph-ansible and set the configure_firewall variable to false. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1733251 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-08-01 15:16:49 +02:00
Dimitri Savineau	34036c667c	ceph-grafana: Set grafana uid/gid on files We don't need to create a grafana system user (in fact we even don't set the right uid to this user) because we're using a container setup. Instead we just need to be sure to set the owner/group to 472 (grafana user/group from the container) like we do for ceph/167. We don't need to set the user/group recursively on /etc/grafana directory in a dedicated task. Also on Ubuntu system, the ceph-grafana-dashboards isn't present so on non containerized deployment we won't have the /etc/grafana/dashboards/ceph-dashboard directory present (coming with the package) so we need to be sure it exists. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-08-01 10:10:56 +02:00
Guillaume Abrioux	c9d80af4e0	dashboard: fix timeout usage on rgw user creation command For some reason, this is making the playbook failing like following: ``` TASK [ceph-dashboard : create radosgw system user] ********************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************** task path: /home/guits/ceph-ansible/roles/ceph-dashboard/tasks/configure_dashboard.yml:106 Tuesday 30 July 2019 10:04:54 +0200 (0:00:01.910) 0:11:22.319 ******** FAILED - RETRYING: create radosgw system user (3 retries left). FAILED - RETRYING: create radosgw system user (2 retries left). FAILED - RETRYING: create radosgw system user (1 retries left). fatal: [mgr0 -> mon0]: FAILED! => changed=true attempts: 3 cmd: timeout 20 podman exec ceph-mon-mon0 radosgw-admin user create --uid=ceph-dashboard --display-name='Ceph dashboard' --system delta: '0:00:20.021973' end: '2019-07-30 08:06:32.656066' msg: non-zero return code rc: 124 start: '2019-07-30 08:06:12.634093' stderr: 'exec failed: container_linux.go:336: starting container process caused "process_linux.go:82: copying bootstrap data to pipe caused \"write init-p: broken pipe\""' stderr_lines: <omitted> stdout: '' stdout_lines: <omitted> ``` using `timeout -f -s KILL` fixes this issue. Also, there is no need to use `shell` module here, let's switch to `command`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-07-30 13:52:44 +02:00
Guillaume Abrioux	2d955757ee	osd: add 'osd blacklist' cap for osp keyrings This commit adds the `osd blacklist` cap on all OSP clients keyrings. Fixes: #2296 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-07-29 09:57:25 -04:00
Dimitri Savineau	d549fffdd2	ceph-osd: check container engine rc for pools When creating OpenStack pools, we only check if the return code from the pool list command isn't 0 (ie: if it doesn't exist). In that case, the return code will be 2. That's why the next condition is rc != 0 for the pool creation. But in containerized deployment, the return code could be different if there's a failure on the container engine command (like container not running). In that case, the return code could be either 1 (docker) or 125 (podman) so we should fail at this point and not in the next tasks. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1732157 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-07-29 15:55:04 +02:00
Guillaume Abrioux	02beb00916	validate: add checks for grafana-server group definition this commit adds two checks: - check that the `[grafana-server]` group is defined - check that the `[grafana-server]` contains at least one node. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-07-29 14:42:45 +02:00
Guillaume Abrioux	ec33ee7574	mgr: fix a typo this task isn't using the right container_exec_cmd, that's delegating to the wrong node. Let's use the right fact to fix this command. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-07-29 14:42:45 +02:00
Guillaume Abrioux	b9cdf341be	dashboard: remove cfg80211 module installation According to this comment [1], this seems to be needed to detect wifi devices. In node exporter we can see this: ``` --collector.wifi Enable the wifi collector (default: disabled). ``` since it's enabled by default and we don't even change this in our systemd templates for node-exporter, we can easily assume in the end it's not needed. Therefore, let's remove this. [1] `dbf81b6b5b (diff-961545214e21efed3b84a9e178927a08L21-L23)` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-07-29 14:42:45 +02:00
Guillaume Abrioux	d67230b2a2	dashboard: use dedicated group only There's no need to add complexity and trying to fallback on other group. Let's deploy dashboard on all nodes present in grafana-server group. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-07-29 14:42:45 +02:00
Guillaume Abrioux	fb1b5b3251	dashboard: enable dashboard by default This commit enables dashboard deployment by default. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1726739 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-07-29 14:42:45 +02:00
Dimitri Savineau	07c6695d16	Remove NBSP characters Some NBSP are still present in the yaml files. Adding a test in travis CI. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-07-26 16:09:23 -04:00
Guillaume Abrioux	19950b5170	container: rename docker directories Those 2 directories should be renamed to be more generic (docker vs. podman). Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-07-24 16:31:46 +02:00
fmount	fac1b030cb	Avoid to setup provisioners in a fully containerized environment This commit adds a when clause to avoid the setup of grafana provisioners in a fully containerized scenario. This is needed when the ceph-grafana-dashboards package is not installed and this task could result in a wrong grafana configuration that let the container crash. Signed-off-by: fmount <fpantano@redhat.com>	2019-07-23 09:06:50 +02:00
Giulio Fidente	edd1420217	Fix backward compat with old cephfs_pools format Previously cephfs_pools items used to have a pgs: key but not pgp_num: nor pg_num: Signed-off-by: Giulio Fidente <gfidente@redhat.com>	2019-07-19 11:56:58 -04:00
Guillaume Abrioux	618dbf271d	handler: fix bug in osd handlers `fbf4ed42ae` introduced a bug when container binary is podman. podman doesn't support ps -f using regular expression, the container id is never set in the restart script causing the handler to fail. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1721536 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-07-18 16:22:51 +02:00
Guillaume Abrioux	487d701685	validate: fail if gpt header found on unprepared devices ceph-volume will complain if gpt headers are found on devices. This commit checks whether a gpt header is present on devices passed in `devices` variable and fail early. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1730541 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-07-18 07:43:55 +02:00
Dimitri Savineau	5383c2f7f3	ceph-dashboard: enable rgw options conditionally The dashboard rgw frontend options only need to be applied when there's some nodes present in the rgw ansible group. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-07-18 07:22:13 +02:00
Dimitri Savineau	8ab9b719fa	dashboard: use variables for port value The current port value for alertmanager, grafana, node-exporter and prometheus is hardcoded in the roles so it's not possible to change the port binding of those services. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-07-18 07:22:13 +02:00
Dimitri Savineau	0ae0193144	ceph-infra: update handler with daemon variable Both ntp and chrony daemons use a variable for the service name because it could be different depending on the GNU/Linux distribution. This has been updated in `9d88d3199` for chrony but only for the start part, not for the handler. The commit fixes this for both ntp and chrony. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-07-12 09:14:33 -04:00
Dimitri Savineau	41b44dde85	ceph-infra: Open prometheus port The Prometheus port 9090 isn't open in the firewall configuration. Also the dashboard task on the grafana node was not required because it's already present on the mgr node. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-07-11 13:40:22 +02:00
Guillaume Abrioux	ee29f7370a	handler: remove legacy condition since everything is already in a block with the same condition, it's not needed to leave all of them on these tasks. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-07-10 09:42:00 -04:00
Guillaume Abrioux	e6dc3ebd8c	validate: improve message printed in check_devices.yml The message prints the whole content of the registered variable in the playbook, this is not needed and makes the message pretty unclear and unreadable. ``` "msg": "{'_ansible_parsed': True, 'changed': False, '_ansible_no_log': False, u'err': u'Error: Could not stat device /dev/sdf - No such file or directory.\\n', 'item': u'/dev/sdf', '_ansible_item_result': True, u'failed': False, '_ansible_item_label': u'/dev/sdf', u'msg': u\"Error while getting device information with parted script: '/sbin/parted -s -m /dev/sdf -- unit 'MiB' print'\", u'rc': 1, u'invocation': {u'module_args': {u'part_start': u'0%', u'part_end': u'100%', u'name': None, u'align': u'optimal', u'number': None, u'label': u'msdos', u'state': u'info', u'part_type': u'primary', u'flags': None, u'device': u'/dev/sdf', u'unit': u'MiB'}}, 'failed_when_result': False, '_ansible_ignore_errors': None, u'out': u''} is not a block special file!" ``` Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1719023 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-07-10 09:32:11 -04:00
Dimitri Savineau	1f2a4f1910	ceph-iscsi: Update gateway config/template - Remove gateway_keyring from the configuration file because it's not used in ceph-iscsi 3.x release. - Use config_template instead of template module for iscsi-gateway configuration file. Because the file is an ini file and we might want to override more parameters than those present in ceph-ansible. - Because we can now set the pool name in the configuration, we should use a variable for that. This is refact with the iscsi_pool_* variables also used to configure the pool size. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-07-10 09:44:40 +02:00
Dimitri Savineau	5413274412	ceph-dashboard: remove bool filter for rgw vars Some dashboard_rgw_api_* variables are using the bool filter but those variables are strings with an empty string as default value. So we should test the variable against an empty string instead of a bool. dashboard_rgw_api_host: '' dashboard_rgw_api_port: '' dashboard_rgw_api_scheme: '' dashboard_rgw_api_admin_resource: '' Resolves: #4179 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-07-10 09:42:37 +02:00
Boris Ranto	21758fcee8	dashboard: Use upstream default port We are currently using incorrect dashboard default port. The upstream uses 8443 instead of 8234 by default. This should get us closer to the upstream project. Signed-off-by: Boris Ranto <branto@redhat.com>	2019-07-10 09:17:36 +02:00
Dimitri Savineau	de7f948b75	ceph-handler: fix cluster name in socket path `c90f605b5` introduces the default ceph cluster name value in the rgw socket path for the rgw restart script. But this should use the `cluster` variable instead. This commit also fixes this in the osd restart script. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-07-08 13:55:35 -04:00
fmount	95bd002b35	Add package-install tag on ceph-grafana-dashboard pkg install. According to the OSP pattern, we need the package-install tag to control what is installed on the host. This commit just adds the missing tag to meet the TripleO requirements. See: /issues/4197 for details Fixes: #4197 Signed-off-by: fmount <fpantano@redhat.com>	2019-07-08 10:54:30 +02:00
Dimitri Savineau	91bef94b6c	ceph-iscsi-gw: Update log directories bind mount On containerized deployment we need to bind mount the ceph-iscsi directory to avoid writing the logs in the container. The /var/log/ceph directory isn't used by rbd-target-api/gw services because they have their own log directories. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-07-07 07:25:33 +02:00
ilyashestopalov	904532c5e2	ceph-mon: Fix cluster name parameter The ability to add nodes with the monitor role to an existing cluster whose name differs from the default name is fixed. Signed-off-by: ilyashestopalov <usr.tester@yandex.ru>	2019-07-07 07:21:29 +02:00
Guillaume Abrioux	a781ce881c	iscsi: refact deprecated variables This commit moves some old variables into ceph-defaults so we can move the `use_new_ceph_iscsi` fact in ceph-facts role in order. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-07-03 22:13:19 +02:00
Mike Christie	08a6d10c32	igw: Add check for missing iqn If the user is still using the older packages and does not setup the target iqn you will just get a vague error message later on. This adds a check during the validate task, so it is clear to the user. Signed-off-by: Mike Christie <mchristi@redhat.com>	2019-07-03 22:13:19 +02:00
Mike Christie	75fee55d19	igw: Update iscsigws.yml.sample for ceph-iscsi support Update iscsigws.yml.sample to document that we cannot use ansible to setup iSCSI objects and use the new ceph-iscsi package. Signed-off-by: Mike Christie <mchristi@redhat.com>	2019-07-03 22:13:19 +02:00
Mike Christie	cbe66cec52	igw: Support ceph-iscsi package for install This adds support for the ceph-iscsi package during install. ceph-iscsi does not support setting up targets/gws, luns and clients with the current library/igw_* code. Going forward those tasks should be done with gwcli or dashboard. ceph-iscsi will only be used if the user has no iscsi objects setup so we do not break existing setups. The next patch will update the iscsigws.yml.sample to document that users must not setup any iscsi object if they want to use the new package and tools. Signed-off-by: Mike Christie <mchristi@redhat.com>	2019-07-03 22:13:19 +02:00
Mike Christie	b7b2213be1	igw: drop gateway_ip_list for container setups The gateway_ip_list is not used in container setups, so drop it for that case. Signed-off-by: Mike Christie <mchristi@redhat.com>	2019-07-03 22:13:19 +02:00
Mike Christie	d89d3e7cd6	igw: move gateway_ip_list check to validate role Signed-off-by: Mike Christie <mchristi@redhat.com>	2019-07-03 22:13:19 +02:00
Dimitri Savineau	c90f605b51	ceph-handler: Fix rgw socket in restart script Since Mimic the radosgw socket has two extra fields in the socket name (before the .asok suffix): <pid>.<ctid> Before: /var/run/ceph/ceph-client.rgw.cephaio-1.asok After: /var/run/ceph/ceph-client.rgw.cephaio-1.16913.23928832.asok The radosgw restart script doesn't handle this and could fail during an upgrade. If the SOCKETS variable isn't defined in the script then the test command won't fail because the return code is 0 $ test -S $ echo $? 0 There are multiple issues in that script: - The default SOCKETS value isn't defined due to a typo SOCKET vs SOCKETS. - Because the socket name uses the pid, we need to check the socket name after the service restart. - After restarting the radosgw service we need to wait a few seconds, otherwise the socket won't be created. - Update the wget parameters because the command is doing a loop. We now use the same options as curl. - The check_rest function doesn't test the radosgw at all due to a wrong test command (test against a string) and always returns 0. This needs to use the DOCKER_EXECS variable in order to execute the command. $ test 'wget http://192.168.100.11:8080' $ echo $? 0 Also remove the test based on the ansible_fqdn because we only use the ansible_hostname + rgw instance name. Finally group all for loops into a single one. Resolves: #3926 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-07-03 09:30:33 +02:00
Giulio Fidente	d526803c6c	Add radosgw_frontend_ssl_certificate parameter This is necessary when configuring RGW with SSL because in addition to passing specific frontend options, civetweb appends the 's' character to the binding port and beast uses ssl_endpoint instead of endpoint. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1722071 Signed-off-by: Giulio Fidente <gfidente@redhat.com>	2019-07-02 14:14:37 -04:00
Guillaume Abrioux	b725b3077e	nfs: clean template remove legacy options ``` ganesha.nfsd-115[main] config_errs_to_log :CONFIG :WARN :Config File (/etc/ganesha/ganesha.conf:13): Unknown parameter (Dir_Max) ganesha.nfsd-115[main] config_errs_to_log :CONFIG :WARN :Config File (/etc/ganesha/ganesha.conf:14): Unknown parameter (Cache_FDs) ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-28 15:09:19 -04:00
Guillaume Abrioux	33eed78d17	containers: improve logging bindmount /var/log/ceph on all containers so it's possible to retrieve logs from the host. related ceph-container PR: ceph/ceph-container#1408 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1710548 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-28 13:30:36 -04:00
Dimitri Savineau	02fbe76e62	ceph-osd: Add CONTAINER_IMAGE env variable This environment variable was added in `cb381b4` but was removed in `4d35e9e`. This commit reintroduces the change. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-27 16:38:02 +02:00
fmount	e655038743	Set grafana_server_addr fact for ipv6 scenarios. As the bz1721914 describes, the grafana_server_addr fact is not defined if ip_version used is ipv6. This commit adds the ip_version condition to set correctly this fact. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1721914 Signed-off-by: fmount <fpantano@redhat.com>	2019-06-26 15:47:22 +02:00
Guillaume Abrioux	366b309c12	facts: fix bug in grafana_server_addr fact setting If no grafana-server group is defined while an mgr group is, that task will fail because `hostvars[groups[grafana_server_group_name][0]` can't return anything since `groups['grafana-server']` will be a non existing key. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-26 10:49:30 +02:00
Guillaume Abrioux	2b9fb377a8	nfs: add missing | bool filters To address this warning: ``` [DEPRECATION WARNING]: evaluating nfs_ganesha_dev as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the future ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-26 08:58:51 +02:00
Guillaume Abrioux	edb8d42596	nfs: remove duplicate task This task is already present in pre_requisite_non_container.yml Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-26 08:58:51 +02:00
Dimitri Savineau	45d46541cb	ceph-handler: Fix OSD restart script There are two big issues with the current OSD restart script. 1/ We try to test if the ceph osd daemon socket exists but we use a wildcard for the socket name : /var/run/ceph/*.asok. This fails because we usually have multiple ceph osd sockets (or other collocated ceph daemons) present in the /var/run/ceph directory. Currently the test fails with: bash: line xxx: [: too many arguments But it doesn't stop the script execution. Instead we can specify the full ceph osd socket name because we already know the OSD id. 2/ The container filter pattern is wrong and could match multiple containers, causing the script to fail. We use the filter with two different patterns. One is with the device name (sda, sdb, ..) and the other one is with the OSD id (ceph-osd-0, ceph-osd-15, ..). In both cases we could match more than needed. $ docker container ls CONTAINER ID IMAGE NAMES 958121a7cc7d ceph-daemon:latest ceph-osd-strg0-sda 589a982d43b5 ceph-daemon:latest ceph-osd-strg0-sdb 46c7240d71f3 ceph-daemon:latest ceph-osd-strg0-sdaa 877985ec3aca ceph-daemon:latest ceph-osd-strg0-sdab $ docker container ls -q -f "name=sda" 958121a7cc7d 46c7240d71f3 877985ec3aca $ docker container ls CONTAINER ID IMAGE NAMES 2db399b3ee85 ceph-daemon:latest ceph-osd-5 099dc13f08f1 ceph-daemon:latest ceph-osd-13 5d0c2fe8f121 ceph-daemon:latest ceph-osd-17 d6c7b89db1d1 ceph-daemon:latest ceph-osd-1 $ docker container ls -q -f "name=ceph-osd-1" 099dc13f08f1 5d0c2fe8f121 d6c7b89db1d1 Adding an extra '$' character at the end of the pattern solves the problem. Finally removing the get_container_osd_id function because it's not used in the script at all. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-21 19:54:15 +02:00
Dimitri Savineau	dc187ea6fa	Change ansible_lsb by ansible_distribution_release The ansible_lsb fact is based on the lsb package (lsb-base, lsb-release or redhat-lsb-core). If the package isn't installed on the remote host then the fact isn't populated. -------- "ansible_lsb": {}, -------- Switching to the ansible_distribution_release fact instead. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-21 11:55:05 -04:00
fpantano	ba73dc7b21	Add higher retry/delay defaults to check the quorum status. As per bz1718981, this commit adds higher values to check the quorum status. This is helpful for several OSP deployments that fail during the scale up. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1718981 Signed-off-by: fpantano <fpantano@redhat.com>	2019-06-20 22:39:57 +02:00
Dimitri Savineau	b987534881	ceph-volume: Set max open files limit on container The ceph-volume lvm list command takes ages to complete when having a lot of LV devices on containerized deployment. For instance, with 25 OSDs on a node it takes 3 mins 44s to list the OSD. Adding the max open files limit to the container engine cli when executing the ceph-volume command seems to improve the execution time a lot (~30s). This was impacting the OSDs creation with ceph-volume (both filestore and bluestore) when using multiple LV devices. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1702285 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-20 22:37:40 +02:00
Guillaume Abrioux	46a2683944	facts: add a retry on get current fsid task sometimes it can happen the following task fails: ``` TASK [ceph-facts : get current fsid] ***************************************** task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-dev-centos-container-update/roles/ceph-facts/tasks/facts.yml:78 Wednesday 19 June 2019 18:12:49 +0000 (0:00:00.203) 0:02:39.995 **** fatal: [mon2 -> mon1]: FAILED! => changed=true cmd: - timeout - --foreground - -s - KILL - 600s - docker - exec - ceph-mon-mon1 - ceph - --cluster - ceph - daemon - mon.mon1 - config - get - fsid delta: '0:00:00.239339' end: '2019-06-19 18:12:49.812099' msg: non-zero return code rc: 22 start: '2019-06-19 18:12:49.572760' stderr: 'admin_socket: exception getting command descriptions: [Errno 2] No such file or directory' stderr_lines: <omitted> stdout: '' stdout_lines: <omitted> ``` not sure exactly why since just before this task, mon1 seems to be well UP otherwise it wouldn't have passed the task `waiting for the containerized monitor to join the quorum`. As a quick fix/workaround, let's add a retry which allows us to get around this situation: ``` TASK [ceph-facts : get current fsid] *************************************** task path: /home/jenkins-build/build/workspace/ceph-ansible-scenario/roles/ceph-facts/tasks/facts.yml:78 Thursday 20 June 2019 15:35:07 +0000 (0:00:00.201) 0:03:47.288 ******* FAILED - RETRYING: get current fsid (3 retries left). changed: [mon2 -> mon1] => changed=true attempts: 2 cmd: - timeout - --foreground - -s - KILL - 600s - docker - exec - ceph-mon-mon1 - ceph - --cluster - ceph - daemon - mon.mon1 - config - get - fsid delta: '0:00:00.290252' end: '2019-06-20 15:35:13.960188' rc: 0 start: '2019-06-20 15:35:13.669936' stderr: '' stderr_lines: <omitted> stdout: \|- { "fsid": "153e159d-7ade-42a7-842c-4d04348b901e" } stdout_lines: <omitted> ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-20 13:13:04 -04:00
Dimitri Savineau	7c3640177b	roles: Remove useless become (true) flag We already set the become flag to true at a play level in the site* playbooks so we don't need to set it at a task level. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-19 10:31:32 +02:00
Guillaume Abrioux	eece362b38	osd: remove legacy task `parted_results` isn't used anymore in the playbook. By the way, `parted` seems to cause issue because it changes the ownership on devices: ``` root@osd0 ~]# ls -l /dev/sdc* brw-rw----. 1 root disk 8, 32 Jun 11 08:53 /dev/sdc brw-rw----. 1 ceph ceph 8, 33 Jun 11 08:53 /dev/sdc1 brw-rw----. 1 ceph ceph 8, 34 Jun 11 08:53 /dev/sdc2 [root@osd0 ~]# parted -s /dev/sdc print Model: ATA QEMU HARDDISK (scsi) Disk /dev/sdc: 53.7GB Sector size (logical/physical): 512B/512B Partition Table: gpt Disk Flags: Number Start End Size File system Name Flags 1 1049kB 1075MB 1074MB ceph block.db 2 1075MB 2149MB 1074MB ceph block.db [root@osd0 ~]# #We can see ownerships have changed from ceph:ceph to root:disk: [root@osd0 ~]# ls -l /dev/sdc* brw-rw----. 1 root disk 8, 32 Jun 11 08:57 /dev/sdc brw-rw----. 1 root disk 8, 33 Jun 11 08:57 /dev/sdc1 brw-rw----. 1 root disk 8, 34 Jun 11 08:57 /dev/sdc2 [root@osd0 ~]# ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-18 12:45:01 -04:00
Dimitri Savineau	34f9d51178	tests: Update ansible ssh_args variable Because we're using vagrant, an ssh config file will be created for each node with options like user, host, port, identity, etc... But via tox we're overriding ANSIBLE_SSH_ARGS to use this file. This removes the default value set in ansible.cfg. Also adding PreferredAuthentications=publickey because CentOS/RHEL servers are configured with GSSAPIAuthentication enabled for the ssh server, forcing the client to make a PTR DNS query. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-17 09:24:24 +02:00
Rishabh Dave	9d88d3199f	ceph-infra: make chronyd default NTP daemon Since timesyncd is not available on RHEL-based OSs, change the default to chronyd for RHEL-based OSs. Also, chronyd is chrony on Ubuntu, so set the Ansible fact accordingly. Fixes: https://github.com/ceph/ceph-ansible/issues/3628 Signed-off-by: Rishabh Dave <ridave@redhat.com>	2019-06-13 14:53:22 -04:00
Rishabh Dave	d1c266e6c7	ceph-infra: update cache for Ubuntu Ubuntu-based CI jobs often fail with error code 404 while installing NTP daemons. Updating cache beforehand should fix the issue. Signed-off-by: Rishabh Dave <ridave@redhat.com>	2019-06-13 14:00:29 +02:00
Rishabh Dave	67071c3169	align cephfs pool creation The definitions of cephfs pools should match openstack pools. Signed-off-by: Rishabh Dave <ridave@redhat.com> Co-Authored-by: Simone Caronni <simone.caronni@teralytics.net>	2019-06-13 09:44:05 +02:00
Guillaume Abrioux	4cf17a6fdd	iscsi: assign application (rbd) to pool 'rbd' if we don't assign the rbd application tag on this pool, the cluster will get `HEALTH_WARN` state like following: ``` HEALTH_WARN application not enabled on 1 pool(s) POOL_APP_NOT_ENABLED application not enabled on 1 pool(s) application not enabled on pool 'rbd' ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-13 07:35:39 +02:00
Dimitri Savineau	da9891da1e	ceph-handler: replace fuser by /proc/net/unix We're using fuser command to see if a process is using a ceph unix socket file. But the fuser command runs through every PID present in /proc/<PID> to see if one of them is using the file. On a system running thousands processes, the fuser command can take a long time to finish. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1717011 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-12 19:31:21 +02:00
Guillaume Abrioux	905c2256bd	mon: enforce mon0 delegation for initial_mon_key register since this task is designed to be always run on the first monitor, let's enforce the container name accordingly otherwise it could fail like following: ``` fatal: [mon1 -> mon0]: FAILED! => changed=true cmd: - docker - exec - ceph-mon-mon1 - ceph - --cluster - ceph - --name - mon. - -k - /var/lib/ceph/mon/ceph-mon0/keyring - auth - get-key - mon. delta: '0:00:00.085025' end: '2019-06-12 06:12:27.677936' msg: non-zero return code rc: 1 start: '2019-06-12 06:12:27.592911' stderr: 'Error response from daemon: No such container: ceph-mon-mon1' stderr_lines: <omitted> stdout: '' stdout_lines: <omitted> ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-12 11:21:19 -04:00
Guillaume Abrioux	27856cc499	dashboard: add allow_embedding support Add a variable to support the allow_embedding support. See ceph/ceph-ansible/issues/4084 for details. Fixes: #4084 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-12 16:00:32 +02:00
Guillaume Abrioux	2c9cd9d9e7	dashboard: fix dashboard_url setting This setting must be set to something resolvable. See: ceph/ceph-ansible/issues/4085 for details Fixes: #4085 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-12 15:59:58 +02:00
Dimitri Savineau	d0840217f3	ceph-node-exporter: Fix systemd template `069076b` introduced a bug in the systemd unit script template. This commit fixes the options used by the node-exporter container. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-11 21:48:40 +02:00
Dimitri Savineau	dbf81b6b5b	ceph-node-exporter: use modprobe ansible module Instead of using the modprobe command from the path in the systemd unit script, we can use the modprobe ansible module. That way we don't have to manage the binary path based on the linux distribution. Resolves: #4072 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-11 21:40:50 +02:00
fmount	069076bbfd	Fix units and add ability to have a dedicated instance Few fixes on systemd unit templates for node_exporter and alertmanager container parameters. Added the ability to use a dedicated instance to deploy the dashboard components (prometheus and grafana). This commit also introduces the grafana_group_name variable to refer grafana group and keep consistency with the other groups. During the integration with TripleO some grafana/prometheus template variables resulted undefined. This commit adds the ability to check if the group exist and create, accordingly, different job groups in prometheus template. Signed-off-by: fmount <fpantano@redhat.com>	2019-06-10 18:18:46 +02:00
Guillaume Abrioux	771648304d	validate: fail in check_devices at the right task see https://bugzilla.redhat.com/show_bug.cgi?id=1648168#c17 for details. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1648168#c17 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-07 16:14:18 +02:00
Dimitri Savineau	f49090df7e	podman: Add systemd dependency on network.target When using podman, the systemd unit scripts don't have a dependency on the network. So we're not sure that the network is up and running when the containers are starting. With docker this behaviour is already handled because the systemd unit scripts depend on docker service which is started after the network. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-07 09:28:58 +02:00
guihecheng	35d40c65f8	Add role definitions of ceph-rgw-loadbalancer This adds support for rgw loadbalancer based on HAProxy and Keepalived. We define a single role ceph-rgw-loadbalancer and include HAProxy and Keepalived configurations all in this. A single haproxy backend is used to balance all RGW instances and a single frontend is exported via a single port, default 80. Keepalived is used to maintain the high availability of all haproxy instances. You are free to use any number of VIPs. A single VIP is shared across all keepalived instances and there will be one master for one VIP, selected sequentially, and others serve as backups. This assumes that each keepalived instance is on the same node as one haproxy instance and we use a simple check script to detect the state of each haproxy instance and trigger the VIP failover upon its failure. Signed-off-by: guihecheng <guihecheng@cmiot.chinamobile.com>	2019-06-06 17:12:04 +02:00
L3D	ab54fe20ec	ansible: use 'bool' filter on boolean conditionals Running ceph-ansible produces a lot of ``[DEPRECATION WARNING]`` messages like these: ``` [DEPRECATION WARNING]: evaluating containerized_deployment as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the future. Also see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg. ``` Now appended ``| bool`` on a lot of the affected variables. Sometimes the coding style from ``variable|bool`` changed to ``variable | bool`` (with spaces at the pipe). Closes: #4022 Signed-off-by: L3D <l3d@c3woc.de>	2019-06-06 10:21:17 +02:00
Dimitri Savineau	518ab794fb	container-common: support podman on Ubuntu Currently we're only able to use podman on ubuntu if podman's installation is done manually before the ceph-ansible execution because the deb package is present in an external repository. We already manage the docker-ce installation via an external repository so we should be able to allow the podman installation with the same mechanism too. https://github.com/containers/libpod/blob/master/install.md#ubuntu Resolves: #3947 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-05 14:07:34 +02:00
Guillaume Abrioux	80875adba7	ceph-osd: do not relabel /run/udev in containerized context Otherwise content in /run/udev is mislabeled and prevent some services like NetworkManager from starting. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-04 11:32:41 -04:00
Guillaume Abrioux	a78fb209b1	tests: test podman against atomic os instead rhel8 the rhel8 image used is an outdated beta version, it is not worth it to maintain this image upstream, since it's possible to test podman with a newer version of centos/atomic-host image. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-04 11:32:41 -04:00
Dimitri Savineau	616c484698	ceph-nfs: use template module for configuration `789cef7` introduces a regression in the ganesha configuration file generation. The new config_template module version broke it. But the ganesha.conf file isn't an ini file and doesn't really need to use the config_template module. Instead we can use the classic template module. Resolves: #4045 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-04 09:11:52 +02:00
Guillaume Abrioux	6e2e30db54	dashboard: move ceph-grafana-dashboards package installation This commit moves the package installation into ceph-dashboard role. This is needed to install ceph dashboard json file in `/etc/grafana/dashboards/ceph-dashboard/`. Closes: #4026 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-03 13:36:38 +02:00
Guillaume Abrioux	14f5fc3c86	infra: refact dashboard firewall rules - There is no need to open ports 3000, 8234, 9283 on all nodes. - Add missing rule for alertmanager (port 9093) Closes: #4023 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-03 13:36:38 +02:00
Guillaume Abrioux	a2b6f44665	dashboard: append mgr modules to ceph_mgr_modules when `dashboard_enabled` is `True`, let's append `dashboard` and `prometheus` modules to `ceph_mgr_modules` so they are automatically loaded. Closes: #4026 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-03 13:36:38 +02:00
Dimitri Savineau	7503098ca0	remove ceph-agent role and references The ceph-agent role was used only for RHCS 2 (jewel) so it's not useful anymore. The current code will fail on CentOS distribution because the rhscon package is only available on Red Hat with the RHCS 2 repository and this ceph release is supported on stable-3.0 branch. Resolves: #4020 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-03 13:35:50 +02:00
Guillaume Abrioux	003aeea45a	validate: add a check for nfs standalone if `nfs_obj_gw` is True when deploying an internal ganesha with an external ceph cluster, `ceph_nfs_rgw_access_key` and `ceph_nfs_rgw_secret_key` must be provided so the ganesha configuration file can be generated. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-03 13:34:38 +02:00
Guillaume Abrioux	6a6785b719	nfs: support internal Ganesha with external ceph cluster This commit allows deploying an internal ganesha with an external ceph cluster. This requires defining `external_cluster_mon_ips` with a comma separated list of external monitors. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1710358 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-03 13:34:38 +02:00
Dimitri Savineau	daf92a9e1f	ceph-facts: generate fsid on mon node The fsid generation is done via a python command. When the ansible controller node only have python3 available (like RHEL 8) then the python command isn't necessarily present causing the fsid generation to fail. We already do some resource creation (like ceph keyring secret) with the python command too but from the mon node so we should do the same for fsid. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1714631 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-03 10:11:32 +02:00
Guillaume Abrioux	55420d6253	roles: introduce `ceph-container-engine` role This commit splits the current `ceph-container-common` role. This introduces a new role `ceph-container-engine` which handles the tasks specific to the installation of containers tools (docker/podman). This is needed for the ceph-dashboard implementation for 2 main reasons: 1/ Since the ceph-dashboard stack is only containerized, we must install everything needed to run containers even in non containerized deployments. Splitting this role allows us to not have to call the full `ceph-container-common` role which would run a bunch of unneeded tasks that would have been skipped anyway. 2/ The current implementation would have required to run `ceph-container-common` on all ceph-clients nodes which would have been conflicting with `9d3517c670` (we don't want to run ceph-container-common on all client nodes, see mentioned commit for more details) Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-05-22 13:02:10 +02:00
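
The container filter problem described in commit 45d46541cb (ceph-handler: Fix OSD restart script) can be illustrated with a small sketch. It assumes the `name=` filter behaves like an unanchored regular-expression search, which is what the output quoted in that commit suggests; the helper `filter_names` is hypothetical and only imitates that behaviour, it does not call docker or podman. The container names are the ones listed in the commit message.

```python
import re


def filter_names(names, pattern):
    """Return the names matched by an unanchored regex search.

    Hypothetical stand-in for `docker container ls -f "name=<pattern>"`;
    the real engine is not invoked here.
    """
    return [name for name in names if re.search(pattern, name)]


# Container names taken from the commit message.
osd_id_containers = ["ceph-osd-5", "ceph-osd-13", "ceph-osd-17", "ceph-osd-1"]
device_containers = [
    "ceph-osd-strg0-sda",
    "ceph-osd-strg0-sdb",
    "ceph-osd-strg0-sdaa",
    "ceph-osd-strg0-sdab",
]

# Without an end anchor, the pattern also matches longer names.
loose_ids = filter_names(osd_id_containers, "ceph-osd-1")
loose_devices = filter_names(device_containers, "sda")

# With a trailing '$', only the exact suffix matches.
strict_ids = filter_names(osd_id_containers, "ceph-osd-1$")
strict_devices = filter_names(device_containers, "sda$")

print(loose_ids, strict_ids)
print(loose_devices, strict_devices)
```

With the unanchored patterns, three containers match in each case, mirroring the three IDs returned in the commit message; the anchored patterns narrow each result to a single container.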

2812 Commits (01256ffe1be8466abbe58e28e52ee7c73009fcba)