Commit Graph

19 Commits (657e6c8c3b5bafb96d4179d596de0352479e2a64)

Author SHA1 Message Date
Dimitri Savineau 47b7c00287 podman: always remove container on start
In case of failure, the systemd ExecStop isn't executed so the container
isn't removed. After a reboot of a failed node, the container doesn't
start because the old container is still present in created state.
We should always try to remove the container in ExecStartPre for this
situation.
A normal reboot doesn't trigger this issue and this also doesn't affect
nodes running containers via docker.
This behaviour was introduced by d43769d.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1858865

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-07-23 17:00:38 +02:00
Dimitri Savineau d43769dc2a podman: Add Type and PIDFile value to unit files
This changes the way we are running the podman containers via systemd.
They are now in dettached mode and Type/PIDFile set.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1834974

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-06-23 09:37:50 +02:00
Dimitri Savineau bd22f1d1ec docker: Add Requires on docker service
When using docker container engine then the systemd unit scripts only
use a dependency on the docker daemon via the After parameter.
But if docker is restarted on a live system then the ceph systemd units
should wait for the docker daemon to be fully restarted.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1846830

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-06-22 23:08:50 +02:00
Francesco Pantano 15ed9eebf1 Configure ceph dashboard backend and dashboard_frontend_vip
This change introduces a new set of tasks to configure the
ceph dashboard backend and listen just on the mgr related
subnet (and not on '*'). For the same reason the proper
server address is added in both prometheus and alertmanger
systemd units.
This patch also adds the "dashboard_frontend_vip" parameter
to make sure we're able to support the HA model when multiple
grafana instances are deployed.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792230
Signed-off-by: Francesco Pantano <fpantano@redhat.com>
2020-02-19 17:52:53 -05:00
Dimitri Savineau b9d975385c ceph-prometheus: add alertmanager HA config
When using multiple alertmanager nodes (via the grafana-server group)
then we need to specify the other peers in the configuration.

https://prometheus.io/docs/alerting/alertmanager/#high-availability
https://github.com/prometheus/alertmanager#high-availability

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792225

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-02-17 10:46:21 +01:00
Dimitri Savineau 5a03e0ee1c containers: add KillMode=none to systemd templates
Because we are relying on docker|podman for managing containers then we
don't need systemd to manage the process (like kill).

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-02-13 16:11:33 +01:00
Guillaume Abrioux 498bc45859 dashboard: use fqdn in external url
Force fqdn to be used in external url for prometheus and alertmanager.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1765485

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-01-08 09:06:49 -05:00
Guillaume Abrioux a8d76d72d7 dashboard: use fqdn url for active alert
When using the shortname, the URL for active alert launches with short
hostname and fails to connect to the server.

This commit changes the template in order to use the fqdn.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1765485

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-12-03 14:30:32 +01:00
Guillaume Abrioux d23383a820 purge: remove docker_* task
All containers are removed when systemd stops them.
There is no need to call this module in purge container playbook.

This commit also removes all docker_image task and remove all container
images in the final cleanup play.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1776736

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-12-03 13:29:52 +01:00
fmount 8a666bfd15 Add http_addr option to grafana config
We have no reason to make grafana container
listen on *:<port>, so this change adds the
http_addr option to the grafana config file
and adds the related option on the wait_for
tasks.
Since grafana_server_addr should exists, we
shouldn't rely on the _current_monitor_addr
default on prometheus/grafana templates.
This change also remove this default value
that is not necessary anymore.

Signed-off-by: fmount <fpantano@redhat.com>
2019-08-29 13:00:22 -04:00
Dimitri Savineau 8ab9b719fa dashboard: use variables for port value
The current port value for alertmanager, grafana, node-exporter and
prometheus is hardcoded in the roles so it's not possible to change the
port binding of those services.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-07-18 07:22:13 +02:00
fmount 069076bbfd Fix units and add ability to have a dedicated instance
Few fixes on systemd unit templates for node_exporter and
alertmanager container parameters.
Added the ability to use a dedicated instance to deploy the
dashboard components (prometheus and grafana).
This commit also introduces the grafana_group_name variable
to refer grafana group and keep consistency with the other
groups.
During the integration with TripleO some grafana/prometheus
template variables resulted undefined. This commit adds the
ability to check if the group exist and create, accordingly,
different job groups in prometheus template.

Signed-off-by: fmount <fpantano@redhat.com>
2019-06-10 18:18:46 +02:00
Dimitri Savineau f49090df7e podman: Add systemd dependency on network.target
When using podman, the systemd unit scripts don't have a dependency
on the network. So we're not sure that the network is up and running
when the containers are starting.
With docker this behaviour is already handled because the systemd
unit scripts depend on docker service which is started after the
network.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-06-07 09:28:58 +02:00
Dimitri Savineau 29b0d47c8c ceph-prometheus: fix error in templates
- remove trailing double quotes in jinja templates
- add jinja filename without .j2 suffix

Resolves: #4011

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-05-21 17:53:39 +02:00
Guillaume Abrioux cc285c417a dashboard: align the way containers are managed
This commit aligns the way the different containers are managed with how
it's currently done with the other ceph daemon.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-05-16 16:39:13 +02:00
Guillaume Abrioux 3578d576a4 dashboard: rename template files
add .j2 to all templates file related to dashboard roles.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-05-16 16:39:13 +02:00
Boris Ranto b4d1c3693b dashboard: Support podman
This adds support for podman in dashboard-related roles. It also drops
the creation of custom network for the dashboard-related roles as this
functionality works in a different way with podman.

Signed-off-by: Boris Ranto <branto@redhat.com>
2019-05-16 16:39:13 +02:00
Boris Ranto 8f77caa932 dashboard: Add and copy alerting rules
This commit adds a list of alerting rules for ceph-dashboard from the
old cephmetrics project. It also installs the configuration file so that
the rules get recognized by the prometheus server.

Signed-off-by: Boris Ranto <branto@redhat.com>
2019-05-16 16:39:13 +02:00
Boris Ranto 2f141a6e80 Merge cephmetrics/dashboard-ansible repo
This commit will merge dashboard-ansible installation scripts with
ceph-ansible. This includes several new roles to setup ceph-dashboard
and the underlying technologies like prometheus and grafana server.

Signed-off-by: Boris Ranto & Zack Cerza <team-gmeno@redhat.com>
Co-authored-by: Zack Cerza <zcerza@redhat.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-05-16 16:39:13 +02:00