ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Guillaume Abrioux	f5ba6d9b01	containers: modify bindmount option This commit changes the bind mount option for the mount point `/var/lib/ceph` in the systemd template for mon and mgr containers. This is needed in case of collocating mon/mgr with osds using dmcrypt scenario. Once mon/mgr got converted to containers, the dmcrypt layer sub mount is still seen in `/var/lib/ceph`. For some reason it makes the corresponding devices busy so any other container can't open/close it. As a result, it prevents osds from starting properly. Since it only happens on the nodes converted before the OSD play, the idea is to bind mount `/var/lib/ceph` on mon and mgr with the `rshared` option so once the sub mount is unmounted, it is propagated inside the container so it doesn't see that mount point. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1896392 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-11-17 09:19:23 -05:00
Guillaume Abrioux	5ba7824c55	container: force rm --storage on ExecStartPre This is a workaround to avoid error like following: ``` Error: error creating container storage: the container name "ceph-mgr-magna022" is already in use by "4a5f674e113f837a0cc561dea5d2cd55d16ca159a647b7794ab06c4c276ef701" ``` that doesn't seem to be 100% reproducible but it shows up after a reboot. The only workaround we came up with at the moment is to run `podman rm --storage <container>` before starting it. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1887716 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-11-16 10:38:40 -05:00
Benoît Knecht	0d76826bbb	ceph-mon: Don't set monitor directory mode recursively After rolling updates performed with `infrastructure-playbooks/rolling_updates.yml`, files located in `/var/lib/ceph/mon/{{ cluster }}-{{ monitor_name }}` had mode 0755 (including the keyring), making them world-readable. This commit separates the task that configured permissions recursively on `/var/lib/ceph/mon/{{ cluster }}-{{ monitor_name }}` into two separate tasks: 1. Set the ownership and mode of the directory itself; 2. Recursively set ownership in the directory, but don't modify the mode. Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>	2020-11-02 17:36:37 +01:00
Dimitri Savineau	59ecddcdd0	keyring: use ceph_key module for auth get command Instead of using ceph auth get command via the ansible command module then we can use the ceph_key module and the info state. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-11-02 17:17:29 +01:00
Dimitri Savineau	16cd183b9c	podman: force log driver to journald Since we've changed to podman configuration using the detach mode and systemd type to forking then the container logs aren't present in the journald anymore. The default conmon log driver is using k8s-file. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1890439 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-11-02 15:49:27 +01:00
Benoît Knecht	8f436ab5d8	ceph-mon: Fix check mode for deploy monitor tasks Skip the `get initial keyring when it already exists` task when both commands whose `stdout` output it requires have been skipped (e.g. when running in check mode). Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>	2020-10-19 20:22:08 +02:00
Benoît Knecht	54ba38e35e	Fix Ansible check mode for site.yml.sample playbook Make sure the `site.yml.sample` playbook can be run in check mode by skipping tasks that try to read the output of commands that have been skipped. Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>	2020-10-07 00:29:44 +02:00
Dimitri Savineau	733596582d	ceph-handler: set handler on xxx_stat result In non containerized deployment we check if the service is running via the socket file presence. This is done via the xxx_socket_stat variable that check the file socket in the /var/run/ceph/ directory. In some scenarios, we could have the socket file still present in that directory but not used by any process. That's why we have the xxx_stat variable which clean those leftovers. The problem here is that we're set the variable for the handlers status (like handler_mon_status) based on xxx_socket_stat instead of xxx_stat. That means we will trigger the handlers if there's an old socket file present on the system without any process associated. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1866834 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-09-29 07:32:10 +02:00
Dimitri Savineau	50104650e7	add missing boolean filter Otherwise this will generate an ansible warning about the missing filter. [DEPRECATION WARNING]: evaluating xxx as a bare variable, this behaviour will go away and you might need to add \|bool to the expression in the future. Also see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-09-28 20:45:01 +02:00
Dimitri Savineau	abb4023d76	ceph_key: set state as optional Most ansible module using a state parameter default to the present value (when available) instead of using it as a mandatory option. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-09-14 14:12:21 -04:00
Dimitri Savineau	47b7c00287	podman: always remove container on start In case of failure, the systemd ExecStop isn't executed so the container isn't removed. After a reboot of a failed node, the container doesn't start because the old container is still present in created state. We should always try to remove the container in ExecStartPre for this situation. A normal reboot doesn't trigger this issue and this also doesn't affect nodes running containers via docker. This behaviour was introduced by `d43769d`. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1858865 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-07-23 17:00:38 +02:00
Dimitri Savineau	d43769dc2a	podman: Add Type and PIDFile value to unit files This changes the way we are running the podman containers via systemd. They are now in dettached mode and Type/PIDFile set. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1834974 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-06-23 09:37:50 +02:00
Dimitri Savineau	bd22f1d1ec	docker: Add Requires on docker service When using docker container engine then the systemd unit scripts only use a dependency on the docker daemon via the After parameter. But if docker is restarted on a live system then the ceph systemd units should wait for the docker daemon to be fully restarted. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1846830 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-06-22 23:08:50 +02:00
Dimitri Savineau	1fc6b33714	ceph-{mon,osd}: move default crush variables Since `ed36a11` we move the crush rules creation code from the ceph-mon to the ceph-osd role. To keep the backward compatibility we kept the possibility to set the crush variables on the mons side but we didn't move the default values. As a result, when using crush_rule_config set to true and wanted to use the default values for crush_rules then the crush rule ansible task creation will fail. "msg": "'ansible.vars.hostvars.HostVarsVars object' has no attribute 'crush_rules'" This patch move the default crush variables from ceph-mon to ceph-osd role but also use those default values when nothing is defined on the mons side. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1798864 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-17 10:50:53 +01:00
Dimitri Savineau	5a03e0ee1c	containers: add KillMode=none to systemd templates Because we are relying on docker\|podman for managing containers then we don't need systemd to manage the process (like kill). Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-13 16:11:33 +01:00
Guillaume Abrioux	483adb5d79	common: add a default value for ceph_directories_mode Since this variable makes it possible to customize the mode for ceph directories, let's make it a bit more explicit by adding a default value in ceph-defaults. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-22 09:35:35 +01:00
Guillaume Abrioux	3e262e072b	containers: use --cpus instead --cpu-quota When using docker 1.13.1, the current condition: ``` {% if (container_binary == 'docker' and ceph_docker_version.split('.')[0] is version_compare('13', '>=')) or container_binary == 'podman' -%} ``` is wrong because it compares the first digit (1) whereas it should compare the second one. It means we always use `--cpu-quota` although documentation recommend using `--cpus` when docker version is 1.13.1 or higher. From the doc: > --cpu-quota=<value> Impose a CPU CFS quota on the container. The number of > microseconds per --cpu-period that the container is limited to before > throttled. As such acting as the effective ceiling. > If you use Docker 1.13 or higher, use --cpus instead. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-16 13:51:43 -05:00
Guillaume Abrioux	86f3eeb717	mon: support replacing a mon We must pick up a mon which actually exists in ceph-facts in order to detect if a cluster is running. Otherwise, it will state no cluster is already running which will end up deploying a new monitor isolated in a new quorum. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622688 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-09 12:59:12 -05:00
Dimitri Savineau	ef2cb99f73	ceph-osd: add device class to crush rules This adds device class support to crush rules when using the class key in the rule dict via the create-replicated sub command. If the class key isn't specified then we use the create-simple sub command for backward compatibility. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1636508 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-11-14 16:25:46 +01:00
Dimitri Savineau	ed36a11eab	move crush rule creation from mon to osd role If we want to create crush rules with the create-replicated sub command and device class then we need to have the OSD created before the crush rules otherwise the device classes won't exist. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-11-14 16:25:46 +01:00
Mihai Plasoianu	d3f67d63ae	ceph-mon: use --admin-daemon to set default crush rule Signed-off-by: Mihai Plasoianu <m.plasoianu@vertical.de>	2019-10-29 20:59:32 -04:00
Guillaume Abrioux	3d28773da5	mon: call mon_status from asok since c09b82a80a392ccd0da7677c7b424ce5cd3fa5d6 in ceph/ceph we must call mon_status from asok instead. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-24 10:19:16 -04:00
Dimitri Savineau	f7fd0b6d4f	lint: fix error [303,602,701,702] [303] mktemp used in place of tempfile module [602] Don't compare to empty string [701] No 'galaxy_info' found [702] Use 'galaxy_tags' rather than 'categories' This patch also changes the ansible log_path value via the ANSIBLE_LOG_PATH environment variable in the travis configuration to avoid warnings. [WARNING]: log file at /home/travis/ansible/ansible.log is not writeable and we cannot create it, aborting Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-10-15 10:07:52 +02:00
Guillaume Abrioux	bd64167469	container: isolate systemd tasks This commit isolates the systemd unit files generation for containers into separate yml files in order to be able importing each corresponding roles without playing all tasks. This is needed so we can run ceph-ansible to render systemd unit files so they call podman instead of docker. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-01 10:27:51 -04:00
Guillaume Abrioux	ab370b6ad8	global: remove fetch_directory dependency This commit drops the fetch_directory dependency. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622688 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-09-26 11:35:24 +02:00
Dimitri Savineau	2b0616ecca	ceph-mon: Bind mount the ca-trust directory On containerized deployment, the mon container sometimes needs to access to the radosgw endpoint (via the radosgw-admin command). When using TLS on the radosgw with self-signed certificates then we need to access to the CA certification from the mon container. The CA certificate needs to be added on the host and then the directory will be bind mount on the container. Resolves: #4358 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-08-27 20:53:45 +02:00
Artur Fijalkowski	011270ca69	global: make directories mode parameterizable This commit makes it possible to parametrize the ceph directories modes. So it changes hardocded mode for ceph related directories from 0755 to customizable with `ceph_directories_mode` variable. Closes: #2920 Signed-off-by: Artur Fijalkowski <artur.fijalkowski@ing.com> Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-08-23 09:38:17 +02:00
Guillaume Abrioux	4df92152c0	common: replace shell module there is no need to use `shell` in these tasks. Let's use `command`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-08-14 16:42:02 +02:00
Guillaume Abrioux	13815ad3ca	common: use discovered_interpreter_python fact in order to use the right binary name when using python cli in command or shell module. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-08-14 16:42:02 +02:00
ilyashestopalov	904532c5e2	ceph-mon: Fix cluster name parameter The ability to add nodes with the monitor role to an existing cluster whose name differs from the default name is fixed. Signed-off-by: ilyashestopalov <usr.tester@yandex.ru>	2019-07-07 07:21:29 +02:00
Guillaume Abrioux	33eed78d17	containers: improve logging bindmount /var/log/ceph on all containers so it's possible to retrieve logs from the host. related ceph-container PR: ceph/ceph-container#1408 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1710548 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-28 13:30:36 -04:00
Dimitri Savineau	7c3640177b	roles: Remove useless become (true) flag We already set the become flag to true at a play level in the site* playbooks so we don't need to set it at a task level. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-19 10:31:32 +02:00
Guillaume Abrioux	905c2256bd	mon: enforce mon0 delegation for initial_mon_key register since this task is designed to be always run on the first monitor, let's enforce the container name accordingly otherwise it could fail like following: ``` fatal: [mon1 -> mon0]: FAILED! => changed=true cmd: - docker - exec - ceph-mon-mon1 - ceph - --cluster - ceph - --name - mon. - -k - /var/lib/ceph/mon/ceph-mon0/keyring - auth - get-key - mon. delta: '0:00:00.085025' end: '2019-06-12 06:12:27.677936' msg: non-zero return code rc: 1 start: '2019-06-12 06:12:27.592911' stderr: 'Error response from daemon: No such container: ceph-mon-mon1' stderr_lines: <omitted> stdout: '' stdout_lines: <omitted> ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-12 11:21:19 -04:00
Dimitri Savineau	f49090df7e	podman: Add systemd dependency on network.target When using podman, the systemd unit scripts don't have a dependency on the network. So we're not sure that the network is up and running when the containers are starting. With docker this behaviour is already handled because the systemd unit scripts depend on docker service which is started after the network. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-07 09:28:58 +02:00
L3D	ab54fe20ec	ansible: use 'bool' filter on boolean conditionals By running ceph-ansible there are a lot ``[DEPRECATION WARNING]`` like these: ``` [DEPRECATION WARNING]: evaluating containerized_deployment as a bare variable, this behaviour will go away and you might need to add \|bool to the expression in the future. Also see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg. ``` Now appended ``\| bool`` on a lot of the affected variables. Sometimes the coding style from ``variable\|bool`` changed to ``variable \| bool`` (with spaces at the pipe). Closes: #4022 Signed-off-by: L3D <l3d@c3woc.de>	2019-06-06 10:21:17 +02:00
Guillaume Abrioux	e74d80e72f	rename docker_exec_cmd variable This commit renames the `docker_exec_cmd` variable to `container_exec_cmd` so it's more generic. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-05-16 16:39:13 +02:00
Rishabh Dave	56bfec7c58	ceph-mgr: create keys for MGRs Add code in ceph-mgr for creating a keyring for manager in so that managers can be deployed on a separate node too. Signed-off-by: Rishabh Dave <ridave@redhat.com>	2019-05-07 14:13:06 +02:00
Rishabh Dave	89748d579a	don't access other node's docker_exec_cmd variable Except for some corner case, it's not correct to access some other node's copy of variable docker_exec_cmd. Therefore replace "hostvars[groups[mon_group_name][0]]['docker_exec_cmd']" by "docker_exec_cmd". Signed-off-by: Rishabh Dave <ridave@redhat.com>	2019-05-07 12:37:48 +02:00
Rishabh Dave	739a662c80	improve coding style Keywords requiring only one item shouldn't express it by creating a list with single item. Signed-off-by: Rishabh Dave <ridave@redhat.com>	2019-04-23 15:37:07 +02:00
Guillaume Abrioux	edf1ee2073	mon: check if an initial monitor keyring already exists When adding a new monitor, we must reuse the existing initial monitor keyring. Otherwise, the new monitor will issue its 'mkfs' with a new monitor keyring and it will result with a mismatch between them. The new monitor will be unable to join the quorum in the end. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> Co-authored-by: Rishabh Dave <ridave@redhat.com>	2019-04-15 10:00:50 +02:00
Guillaume Abrioux	631e5d3144	mon: remove useless delegate_to Let's use a condition to run this task only on the first mon. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-10 00:39:25 +00:00
fpantano	afbb90e4ac	Check ceph_health_raw.stdout value as string during mon bootstrap According to rdo testing https://review.rdoproject.org/r/#/c/18721 a check on the output of the ceph_health value is added to allow the playbook to make several attempts (according to the retry/delay variables) when waiting the cluster quorum or when the container bootstrap is not ended. It avoids the failure of the command execution when it doesn't receive a valid json object to decode (because cluster is too slow to boostrap compared to ceph-ansible task execution). Signed-off-by: fpantano <fpantano@redhat.com>	2019-04-03 20:55:05 +00:00
Rishabh Dave	4241b6403f	merge task blocks if their execution is based on same conditions Signed-off-by: Rishabh Dave <ridave@redhat.com>	2019-03-29 16:16:04 +00:00
Rishabh Dave	e0beaf123a	"when" keyword should precede "block" keyword Otherwise the reader is forced to search for "when" when blocks are too long. Signed-off-by: Rishabh Dave <ridave@redhat.com>	2019-03-29 16:16:04 +00:00
Guillaume Abrioux	5c3ce4ca77	mon: fetch initial keyring even when running rolling_update otherwise, the task to copy mgr keyring fails during the rolling_update. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Guillaume Abrioux	82764afe8d	update: mask systemd service units during upgrade This prevents the packaging from restarting services before we do need to restart them in the rolling update sequence. We want to handle services restart at rolling_update playbook. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Guillaume Abrioux	b4f14aba8e	ceph_key: `lookup_ceph_initial_entities` shouldn't fail on update As of nautilus, the initial keyrings list has changed, it means when upgrading from Luminous or Mimic, it is expected there's a mismatch between what is found on the cluster and the expected initial keyring list hardcoded in ceph_key module. We shouldn't fail when upgrading to nautilus. str_to_bool() took from ceph-volume. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> Co-Authored-by: Alfredo Deza <adeza@redhat.com>	2019-03-25 16:02:56 -04:00
Dimitri Savineau	d8538ad4e1	Set the default crush rule in ceph.conf Currently the default crush rule value is added to the ceph config on the mon nodes as an extra configuration applied after the template generation via the ansible ini module. This implies two behaviors: 1/ On each ceph-ansible run, the ceph.conf will be regenerated via ceph-config+template and then ceph-mon+ini_file. This leads to a non necessary daemons restart. 2/ When other ceph daemons are collocated on the monitor nodes (like mgr or rgw), the default crush rule value will be erased by the ceph.conf template (mon -> mgr -> rgw). This patch adds the osd_pool_default_crush_rule config to the ceph template and only for the monitor nodes (like crush_rules.yml). The default crush rule id is read (if exist) from the current ceph configuration. The default configuration is -1 (ceph default). Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1638092 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-03-14 08:56:52 +00:00
Dimitri Savineau	a089e1ec23	systemd/service: Set docker.service conditionally We don't need to set After=docker.service when the container_binary variable isn't set to docker. It doesn't break anything currently but it could be confusing when using podman. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-03-07 20:56:11 +00:00
Dimitri Savineau	cb381b41fe	Add CONTAINER_IMAGE env var to ceph daemons Ceph daemons will set the CONTAINER_IMAGE environment variable value in the daemon metadata. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-03-05 15:07:05 +00:00

1 2 3 4 5 ...

499 Commits (11b4bf5083639abea66a874ba86ac38a1b706ca6)