ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Guillaume Abrioux	5f6050bed1	container/systemd: ensure /var/log/ceph exists This adds an `ExecStartPre=-/usr/bin/mkdir -p /var/log/ceph` in all systemd service templates for all ceph daemons. This is specific to RHCS after a Leapp upgrade is done. Indeed, the `/var/log/ceph` directory seems to be removed after the upgrade. In order to work around this issue let's ensure the directory is present before trying to start the containers with podman. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1949489 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `bab403b603`)	2021-04-14 20:04:54 +02:00
Alex Schultz	56aac327dd	Use ansible_facts It has come to our attention that using ansible_* vars that are populated with INJECT_FACTS_AS_VARS=True is not very performant. In order to be able to support setting that to off, we need to update the references to use ansible_facts[<thing>] instead of ansible_<thing>. Related: ansible#73654 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1935406 Signed-off-by: Alex Schultz <aschultz@redhat.com> (cherry picked from commit `a7f2fa73e6`)	2021-03-26 00:04:49 +01:00
Dimitri Savineau	3749d297c7	ceph-mon: add ExecStartPre docker stop to systemd We already do that in the other systemd templates (mgr, mds, etc.) and it would prevent having to add workarounds in other orchestration tools. This change is for containerized deployment only. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1882724 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-01-29 09:03:34 +01:00
Guillaume Abrioux	c68b124ba8	container: remove `--ignore` from `podman rm` command As of podman 2.0.5, `--ignore` param conflicts with `--storage`. ``` Nov 30 13:53:10 magna089 podman[164443]: Error: --storage conflicts with --volumes, --all, --latest, --ignore and --cidfile ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-11-30 12:24:11 -05:00
Guillaume Abrioux	f5ba6d9b01	containers: modify bindmount option This commit changes the bind mount option for the mount point `/var/lib/ceph` in the systemd template for mon and mgr containers. This is needed in case of collocating mon/mgr with osds using dmcrypt scenario. Once mon/mgr got converted to containers, the dmcrypt layer sub mount is still seen in `/var/lib/ceph`. For some reason it makes the corresponding devices busy so any other container can't open/close it. As a result, it prevents osds from starting properly. Since it only happens on the nodes converted before the OSD play, the idea is to bind mount `/var/lib/ceph` on mon and mgr with the `rshared` option so once the sub mount is unmounted, it is propagated inside the container so it doesn't see that mount point. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1896392 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-11-17 09:19:23 -05:00
Guillaume Abrioux	5ba7824c55	container: force rm --storage on ExecStartPre This is a workaround to avoid error like following: ``` Error: error creating container storage: the container name "ceph-mgr-magna022" is already in use by "4a5f674e113f837a0cc561dea5d2cd55d16ca159a647b7794ab06c4c276ef701" ``` that doesn't seem to be 100% reproducible but it shows up after a reboot. The only workaround we came up with at the moment is to run `podman rm --storage <container>` before starting it. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1887716 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-11-16 10:38:40 -05:00
Dimitri Savineau	16cd183b9c	podman: force log driver to journald Since we've changed to podman configuration using the detach mode and systemd type to forking then the container logs aren't present in the journald anymore. The default conmon log driver is using k8s-file. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1890439 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-11-02 15:49:27 +01:00
Dimitri Savineau	50104650e7	add missing boolean filter Otherwise this will generate an ansible warning about the missing filter. [DEPRECATION WARNING]: evaluating xxx as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the future. Also see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-09-28 20:45:01 +02:00
Dimitri Savineau	47b7c00287	podman: always remove container on start In case of failure, the systemd ExecStop isn't executed so the container isn't removed. After a reboot of a failed node, the container doesn't start because the old container is still present in created state. We should always try to remove the container in ExecStartPre for this situation. A normal reboot doesn't trigger this issue and this also doesn't affect nodes running containers via docker. This behaviour was introduced by `d43769d`. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1858865 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-07-23 17:00:38 +02:00
Dimitri Savineau	d43769dc2a	podman: Add Type and PIDFile value to unit files This changes the way we are running the podman containers via systemd. They now run in detached mode with Type and PIDFile set. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1834974 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-06-23 09:37:50 +02:00
Dimitri Savineau	bd22f1d1ec	docker: Add Requires on docker service When using docker container engine then the systemd unit scripts only use a dependency on the docker daemon via the After parameter. But if docker is restarted on a live system then the ceph systemd units should wait for the docker daemon to be fully restarted. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1846830 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-06-22 23:08:50 +02:00
Dimitri Savineau	5a03e0ee1c	containers: add KillMode=none to systemd templates Because we are relying on docker|podman for managing containers then we don't need systemd to manage the process (like kill). Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-13 16:11:33 +01:00
Guillaume Abrioux	3e262e072b	containers: use --cpus instead of --cpu-quota When using docker 1.13.1, the current condition: ``` {% if (container_binary == 'docker' and ceph_docker_version.split('.')[0] is version_compare('13', '>=')) or container_binary == 'podman' -%} ``` is wrong because it compares the first digit (1) whereas it should compare the second one. It means we always use `--cpu-quota` although the documentation recommends using `--cpus` when the docker version is 1.13.1 or higher. From the doc: > --cpu-quota=<value> Impose a CPU CFS quota on the container. The number of > microseconds per --cpu-period that the container is limited to before > throttled. As such acting as the effective ceiling. > If you use Docker 1.13 or higher, use --cpus instead. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-16 13:51:43 -05:00
Dimitri Savineau	2b0616ecca	ceph-mon: Bind mount the ca-trust directory On containerized deployment, the mon container sometimes needs access to the radosgw endpoint (via the radosgw-admin command). When using TLS on the radosgw with self-signed certificates then we need access to the CA certificate from the mon container. The CA certificate needs to be added on the host and then the directory will be bind mounted into the container. Resolves: #4358 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-08-27 20:53:45 +02:00
Guillaume Abrioux	33eed78d17	containers: improve logging bindmount /var/log/ceph on all containers so it's possible to retrieve logs from the host. related ceph-container PR: ceph/ceph-container#1408 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1710548 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-06-28 13:30:36 -04:00
Dimitri Savineau	f49090df7e	podman: Add systemd dependency on network.target When using podman, the systemd unit scripts don't have a dependency on the network. So we're not sure that the network is up and running when the containers are starting. With docker this behaviour is already handled because the systemd unit scripts depend on docker service which is started after the network. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-06-07 09:28:58 +02:00
Dimitri Savineau	a089e1ec23	systemd/service: Set docker.service conditionally We don't need to set After=docker.service when the container_binary variable isn't set to docker. It doesn't break anything currently but it could be confusing when using podman. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-03-07 20:56:11 +00:00
Dimitri Savineau	cb381b41fe	Add CONTAINER_IMAGE env var to ceph daemons Ceph daemons will set the CONTAINER_IMAGE environment variable value in the daemon metadata. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-03-05 15:07:05 +00:00
Guillaume Abrioux	8c8ec63633	container: use tmpfiles.d to create /run/ceph Instead of using the `RuntimeDirectory` parameter in systemd unit files, let's use a systemd `tmpfiles.d` to ensure `/run/ceph` exists. Explanation: `podman` doesn't create `/var/run/ceph` if it doesn't exist at the time the container is run, while `docker` used to create it. In case of the `switch_to_containers` scenario, `/run/ceph` gets created by a tmpfiles.d systemd file; when switching to containers, the systemd unit file complains because `/run/ceph` already exists. The better fix would be to ensure `/usr/lib/tmpfiles.d/ceph-common.conf` is removed and only rely on `RuntimeDirectory` from the systemd unit file parameter, but we come from a non-containerized environment which is already running. It means `/run/ceph` is already created and when starting the unit to start the container, systemd will still complain, and we can't simply remove the directory if daemons are collocated. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-02-13 09:42:27 +01:00
Guillaume Abrioux	914d94cae8	set RuntimeDirectory in all systemd unit templates /var/run/ceph resides in a non persistent filesystem (tmpfs) After a reboot, all daemons won't start because this directory will be missing. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-02-05 18:14:28 +01:00
Sébastien Han	fc34fb1bd9	mon: ability to change mon listening port on container You can now use 'ceph_mon_container_listen_port' to change the port the monitor will listen on. Setting the default to 3300 (assigned by IANA) since Nautilus has released the messenger2 transport protocol. Signed-off-by: Sébastien Han <seb@redhat.com>	2019-01-22 13:45:38 +01:00
Sébastien Han	d9e7835086	mon: remove ceph aliases for containers These aliases have led to several issues, making users believe that ceph binaries are actually present on the host when running the command. However it wasn't explicit that the commands were only run inside a container. It has brought too much confusion so we decided to remove them. Closes: https://github.com/ceph/ceph-ansible/issues/3445 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-12-17 11:10:03 +01:00
Guillaume Abrioux	fead0813b4	remove kv store support the next stable release will drop this feature. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-11-30 13:45:12 +00:00
Sébastien Han	80ba45793d	fix template generation Position the right condition on ceph_docker_version, activate it when the container_binary is 'docker'. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-27 16:47:40 +00:00
Sébastien Han	a96e910114	Add new container scenario Test with podman instead of docker and also support for python 3 only. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-27 16:47:40 +00:00
Guillaume Abrioux	74ef7769fb	mon: use `_current_monitor_address` in systemd unit file Let's avoid a jinja loop and use `_current_monitor_address` to get the monitor address. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-31 14:16:10 +01:00
Guillaume Abrioux	a2b2028212	config: remove complex jinja logic in ceph.conf.j2 using consecutive set_fact in the playbook instead of complex jinja syntax makes ceph.conf.j2 more readable. By the way, jinja can be painful to debug at some point. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-31 14:16:10 +01:00
Noah Watkins	306e308f13	Avoid using tests as filter Fixes the deprecation warning: [DEPRECATION WARNING]: Using tests as filters is deprecated. Instead of using `result|search` use `result is search`. Signed-off-by: Noah Watkins <nwatkins@redhat.com>	2018-10-10 04:26:33 +00:00
Guillaume Abrioux	1c88c444a3	mon: fix `ExecStartPre` option in systemd unit file This command line is not supported. According to official documentation: ``` Note that shell command lines are not directly supported. If shell command lines are to be used, they need to be passed explicitly to a shell implementation of some kind. ``` We must run this using /bin/sh instead. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-09-11 10:48:21 +02:00
Guillaume Abrioux	9f54b3b4a7	mon: ensure socket is purged when mon is stopped On containerized deployment, if a mon is stopped, the socket is not purged and can cause failure when a cluster is redeployed after the purge playbook has been run. Typical error: ``` fatal: [osd0]: FAILED! => {} MSG: 'dict object' has no attribute 'osd_pool_default_pg_num' ``` the fact is not set because of this previous failure earlier: ``` ok: [mon0] => { "changed": false, "cmd": "docker exec ceph-mon-mon0 ceph --cluster test daemon mon.mon0 config get osd_pool_default_pg_num", "delta": "0:00:00.217382", "end": "2018-07-09 22:25:53.155969", "failed_when_result": false, "rc": 22, "start": "2018-07-09 22:25:52.938587" } STDERR: admin_socket: exception getting command descriptions: [Errno 111] Connection refused MSG: non-zero return code ``` This failure happens when the ceph-mon service is stopped, indeed, since the socket isn't purged, it's a leftover which is confusing the process. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-07-10 20:08:07 +00:00
Sébastien Han	322e2de7d2	mon: honour mon_docker_net_host option --net=host was hardcoded in the startup line so even though mon_docker_net_host was set to False the net option would always be activated. mon_docker_net_host is set to True by default so this commit does not change the behaviour. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-06-27 13:44:41 +00:00
Sébastien Han	65ba85aff6	Expose /var/run/ceph Useful for softwares that do data collection/monitoring like collectd. They can connect to the socket and then retrieve information. Even though the sockets are exposed now, I'm keeping the docker exec to check the socket, this will allow newer version of ceph-ansible to work with older versions. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1563280 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-20 15:48:32 +02:00
Sébastien Han	641f141c0f	selinux: remove chcon calls We now bindmount with the :z option at the end of the -v command so this will basically run the exact same command as we used to run. So to speak: chcon -Rt svirt_sandbox_file_t /var/lib/ceph Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-19 14:59:37 +02:00
jtudelag	691f7c5146	Adds handy ceph aliases for containerized installations. Same approach as openshift-ansible etcdctl: * https://github.com/openshift/openshift-ansible/blob/release-3.7/roles/etcd/tasks/auxiliary/drop_etcdctl.yml * https://github.com/openshift/openshift-ansible/blob/release-3.7/roles/etcd/etcdctl.sh	2018-03-08 13:56:39 +01:00
Sébastien Han	6f9dd26caa	config: remove any spaces in public_network or cluster_network With two public networks configured - we found that with "NETWORK_ADDR_1, NETWORK_ADDR_2" install process consistently became broken, trying to find docker registry on second network, and not finding mon container. but without spaces "NETWORK_ADDR_1,NETWORK_ADDR_2" install succeeds so, containerized install is more peculiar with formatting of this line Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1534003 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-01-30 17:47:15 +01:00
Christian Berendt	50a848dc40	Rename fact docker_version to ceph_docker_version The name docker_version is very generic and is also used by other roles. As a result, there may be name conflicts. To avoid this a ceph_ prefix should be used for this fact. Since it is an internal fact renaming is not a problem.	2017-12-15 20:12:21 +01:00
Major Hayden	5676fa23b1	Convert interface names to underscores for facts If a deployer uses an interface name with a dash/hyphen in it, such as 'br-storage' for the monitor_interface group_var, the ceph.conf.j2 template fails to find the right facts. It looks for 'ansible_br-storage' but only 'ansible_br_storage' exists. This patch converts the interface name to underscores when the template does the fact lookup.	2017-12-12 09:03:40 +01:00
Guillaume Abrioux	80d32decd3	config: fix config generation The path to the fact is not correct. In any case, we will retrieve the IP address in hostvars, the variable is the way we get the interface name according where it has been set (eg.: inventory host file vs. group_vars/) Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1510906 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-11-09 08:50:57 +01:00
Sébastien Han	ab7eb79212	config: fix monitor_interface when not passed in the inventory file Setting monitor_interface in group_vars/all.yml makes the hostvars[host]['monitor_interface'] non-existing. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1507922 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-11-03 09:25:02 +01:00
Sébastien Han	4413511b66	all: backward compatibility between stable-2.2 and 3.0 stable-3.0 brought numerous changes in ceph-ansible variables, this PR aims to maintain backward compatibility for someone running stable-2.2 upgrading to stable-3.0 but keeps its groups_vars untouched. We will then determine the right options to make sure the upgrade works but we are expecting that new variables should be used. We will drop this in a near future, maybe 3.1 or 3.2. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-20 11:54:10 +02:00
Christian Berendt	cf901f0171	In docker start scripts replace \u00a0 with \u0020 This will solve the following issue when starting docker containers on ubuntu: invalid argument "1\u00a0" for --cpus=1 : failed to parse 1 as a rational number Closes-bug: #2056	2017-10-16 15:16:48 +02:00
Guillaume Abrioux	be757122f1	config: fix path to set `interface` in ceph.conf need to use `hostvars[host]['XXX']` to retrieve the monitor interface and/or radosgw interface. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1493920 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-09-23 14:18:28 +02:00
Sébastien Han	02ba65dbbe	mon: add support for monitor_address block for containers Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-12 16:28:08 -06:00
Sébastien Han	2ea7f287fa	docker: simplify variable declaration Less configuration for the user, the container inherit from the global variables. No more container specific variables. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-09 01:22:06 +02:00
Sébastien Han	2fa151b9e8	container: introduce resource limitation for containers This can be controlled via 2 options: * ceph_$DAEMON_docker_memory_limit * ceph_$DAEMON_docker_cpu_limit All daemons default to 1GB for memory and 1 CPU by default. Recommendations from: https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/2/html/red_hat_ceph_storage_hardware_guide/minimum_recommendations Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-06 14:52:21 +02:00
Andy McCrae	4671b9e74e	Allow ceph service systemd overrides to be specified ceph services can fail to start under certain circumstances (for example, when running in a container) because the default systemd service configuration causes namespace issues. To work around this we can override the system service settings by placing an overrides file in the ceph-<service>@.service.d directory. This can be generic so as to allow any potential changes required to the ceph-<service> service files. The overrides file is only setup when the "ceph_<service>_systemd_overrides" config_template override variable is specified. The available service systemd override files are as follows: ceph_mds_systemd_overrides ceph_mgr_systemd_overrides ceph_mon_systemd_overrides ceph_osd_systemd_overrides ceph_rbd_mirror_systemd_overrides ceph_rgw_systemd_overrides	2017-08-16 17:57:06 +01:00
Guillaume Abrioux	896d62d78b	Refact: remove ceph_mon_docker_interface variable remove `ceph_mon_docker_interface` and use `monitor_interface` instead for both containerized and non-containerized deployment. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-07-04 18:08:59 +02:00
Guillaume Abrioux	88df105d0b	Common: Add ipv6 support `e8187f6` does not fix the ipv6 as expected since `ansible_default_*` are filled with the IP address carried by the network interface used by the default gateway route. By the way, it assumes that the MON_IP address will be this IP address which is not always the case. We need to keep using the previous fact but add some intelligence in the template to determine how to retrieve the ipv4|ipv6 address since the path to the fact in `hostvars` is not the same according to ipv4 vs ipv6 case. Fix: 1569 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-07-04 10:57:26 +02:00
Andrew Schoen	e8187f6a0f	ceph-mon: fix support for ipv6 on containerized mons The fact ['ansible_$interface']['ipv4'] is a dictionary whereas ['ansible_$interface']['ipv6'] is a list. If we use ansible_default_ipv6|ipv4 it is always a dictionary, which allows us to get the ipv6 and ipv4 address without adding more complexity to the template. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-06-05 10:51:47 -05:00
Guillaume Abrioux	ddfe019342	Refact code `ceph-docker-common`: At the moment there is a lot of duplicated tasks in each `./roles/ceph-<role>/tasks/docker/main.yml` that could be refactored in `./roles/ceph-docker-common/tasks/main.yml`. `_containerized_deployment` variables: All `_containerized_deployment` have been refactored to a single variable `containerized_deployment` duplicate `cephx` variables in `group_vars/* have been removed. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-05-24 15:55:41 +02:00
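
The `3e262e072b` entry above describes a version check that looked only at the first component of the Docker version string. The following is a minimal Python sketch of why that check never passes for `1.13.1`, together with one possible corrected comparison. The function names are hypothetical and the actual change made to the Jinja template may differ.

```python
def first_component_check(docker_version: str) -> bool:
    """Mimics the faulty condition: only the first component is compared to 13."""
    return int(docker_version.split(".")[0]) >= 13


def supports_cpus_flag(container_binary: str, docker_version: str = "") -> bool:
    """Hypothetical corrected check: compare (major, minor) against (1, 13)."""
    if container_binary == "podman":
        return True
    if container_binary != "docker":
        return False
    parts = docker_version.split(".")
    major, minor = int(parts[0]), int(parts[1])
    # Tuple comparison orders by major first, then by minor.
    return (major, minor) >= (1, 13)


print(first_component_check("1.13.1"))         # first component is 1, so the check fails
print(supports_cpus_flag("docker", "1.13.1"))  # (1, 13) >= (1, 13)
print(supports_cpus_flag("docker", "1.12.6"))  # (1, 12) < (1, 13)
```

Comparing a `(major, minor)` tuple keeps the check valid for both the old `1.x` numbering and later releases such as `17.03`.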


63 Commits (676aad9ea257dc54b472aee4e269c5c278bde6b7)
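
The `5676fa23b1` entry in the table above says that an interface named `br-storage` is exposed under the fact `ansible_br_storage`, so a lookup built from the raw name fails. A minimal Python sketch of that name conversion follows. The helper name and the sample facts dictionary are assumptions made for illustration, not code from the repository.

```python
def fact_name_for_interface(interface: str) -> str:
    """Build the fact key for an interface, replacing dashes with underscores."""
    return "ansible_" + interface.replace("-", "_")


# Made-up sample facts, shaped like the per-interface facts the commit refers to.
facts = {"ansible_br_storage": {"ipv4": {"address": "192.0.2.10"}}}

naive_key = "ansible_" + "br-storage"
print(naive_key in facts)                              # the dashed key is absent
print(fact_name_for_interface("br-storage") in facts)  # the converted key is present
```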