Commit Graph

2593 Commits (38ce02c2eacee20395b4d7ad6fc5b7b2c4470a30)

Author SHA1 Message Date
Paulo Matias 38ce02c2ea Allow user to specify grafana_server_fqdn
This is needed to get a TLS certificate to validate correctly.

If unspecified, auto-detected grafana_server_addr is used.

Signed-off-by: Paulo Matias <matias@ufscar.br>
2020-04-07 20:51:23 +02:00
Paulo Matias dac8e1d0a9 Prometheus APIs are only available through plain http
Trying to access these APIs through TLS produces "Could not reach
external API" errors in Ceph dashboard.

Signed-off-by: Paulo Matias <matias@ufscar.br>
2020-04-07 20:51:23 +02:00
Matthew Vernon 7963a76c7a Use a tempfile directory to store restart scripts
Make a tempfile directory and copy the restart scripts there (and then
execute them from there), rather than using insecure known filenames
in /tmp/

This is a partial fix for ceph/ceph-ansible#2937

Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
2020-04-06 22:55:51 +02:00
Dimitri Savineau 6617d90733 ceph-mgr: add saml python lib for dashboard SSO
The dashboard SSO mgr module requires the saml python library to be
installed. This is only a valid scenario for RHCS deployment because
the saml python library isn't available in other classic repositories.
This package is present in RHCS Tools repository so we also need to
enable it on the mgr nodes.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1820233

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-04-06 10:11:00 -04:00
Guillaume Abrioux 1bb9860dfd osd: use default crush rule name when needed
When `rule_name` isn't set in `crush_rules` the osd pool creation will
fail.
This commit adds a new fact `ceph_osd_pool_default_crush_rule_name` with
the default crush rule name.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1817586

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-31 14:49:38 -04:00
Guillaume Abrioux 8c1c34b201 tests: add more coverage in external_clients scenario
Run create_users_keys.yml in external_clients scenario

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-31 14:49:38 -04:00
Guillaume Abrioux 5b0476385c osd: support changing default rule even when osd_crush_location isn't defined
Creating crush rules even with no crush hierarchy configuration is a
valid scenario so we shouldn't be bound to the first task result (which
configure crush hierarchy) to be able to add new crush rules.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1816989

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-31 14:26:48 -04:00
Dimitri Savineau 64701437de container: remove ulimit nofile parameter
Since Ceph Octopus is python3 only we don't need to specify the max open
files anymore with the container engine.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-30 09:54:23 +02:00
Dimitri Savineau 4ac99223b2 rhcs: drop debian support
Support for debian with RHCS has been dropped starting RHCS 4

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-27 04:36:36 +01:00
Dimitri Savineau 90ad110861 rhcs: update release to 5 for octopus
RHCS 5 will be based on Ceph Octopus release and only supported on
RHEL 8.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-26 22:00:08 +01:00
Guillaume Abrioux e551b5ba1a defaults: remove legacy comment
This is no longer true, let's remove this comment given that this option
is not ignored in containerized deployments.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-26 09:19:14 -04:00
Guillaume Abrioux b7ada14cf5 defaults: change nfs_ganesha_stable_branch
In master, even though we are using dev repo, the value here should be closer
from the last stable released.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-25 22:30:15 +01:00
Dimitri Savineau 706de944cf ceph-defaults: update ceph_stable_redhat_distro
Since octopus the ceph_stable_redhat_distro variable should be set to
el8 instead of el7.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-25 21:00:24 +01:00
Dimitri Savineau 0487d21938 ceph-facts: fix rgw_instances_all fact
The rgw_instances_all fact is supposed to be the list of all radosgw
instances from all rgw nodes.
But the fact is always using the local rgw_instances variable so this
won't work on multiple nodes.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-25 08:02:13 +01:00
Guillaume Abrioux 83fdf24caf doc/tests: bump to ansible 2.9 on master
Add testing against ansible 2.9 on master branch.
This commit also updates the documentation.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-25 08:01:27 +01:00
Guillaume Abrioux 1b0b7af119 osd: add a default value for 'default' in crush_rules
Let's default to `False` for the `default` attribute in `crush_rules`
variable.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1797774

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-24 08:41:43 -04:00
Dimitri Savineau df8f853c85 Add pacific release
Add the 16th ceph release: pacific.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-24 09:47:12 +01:00
Guillaume Abrioux 1a7f3caecb facts: fix typo
This commit fixes a typo in some task titles

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-23 14:03:52 -04:00
Guillaume Abrioux cc28d9ec26 nfs: fix nfs with external ceph cluster support
This commit refact and fix the nfs deployment with external ceph cluster
support.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1814942

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-19 18:21:16 -04:00
Dimitri Savineau fb69f6990c dashboard: allow to set read-only admin user
This commit allows one to set the role for the admin user as read-only.
This can be controlled via the dashboard_admin_user_ro variable but the
default value is false for backward compatibility.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1810176

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-19 15:34:41 +01:00
Dimitri Savineau 5051e67f8f ceph-defaults: add registry name on dashboard vars
We don't use the registry name when using the community dashboard
container images (grafana, prometheus, alertmanager & node exporter).
This commit adds the docker.io registry explicitly in the default
dashboard container image name values.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-19 14:27:50 +01:00
Dimitri Savineau b97a4d5201 ceph-defaults: update grafana container tag
Since 8e8aa73 we're using grafana 5.4.3 in RHCS 4.1 via [1].
We should also update the grafana container tag from docker.io when
using the community release.

[1] registry.redhat.io/rhceph/rhceph-4-dashboard-rhel8:4

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-17 14:35:06 +01:00
petruha 73b3fadb0e ceph-facts: Fix system_secret_key variable handling
This commit fixes the system_secret_key variable not substitued by the
right value and always using the 'system_secret_key' string instead.

$ egrep 'system_(access|secret)_key' group_vars/all.yml
system_access_key: foofoofoofoofoofoofo
system_secret_key: barbarbarbarbarbarbarbarbarbarbarbarbarb

$ ansible-playbook -vv -i hosts site.yml.sample -e rgw_multisite=true
(...)
  - hostname: storage0
    endpoint: http://192.168.100.42:8080
    instance_name: rgw0
    radosgw_address: 192.168.50.3
    radosgw_frontend_port: 8085
    rgw_realm: canada
    rgw_zone: montreal
    rgw_zone_user: justin.trudeau
    rgw_zone_user_display_name: Justin Trudeau
    rgw_zonegroup: quebec
    system_access_key: foofoofoofoofoofoofo
    system_secret_key: system_secret_key

Fixes https://github.com/ceph/ceph-ansible/issues/5150

Signed-off-by: petruha <5363545+p37ruh4@users.noreply.github.com>
2020-03-16 17:38:52 -04:00
Guillaume Abrioux 152c2caa9f config: remove legacy option in ceph.conf.j2
This option has been deprecated (As of 0.51).
By the way, ceph-ansible already sets the
auth_{service,client,cluster}_required variables.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1623586

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-16 09:49:36 -04:00
Dimitri Savineau 3626c688cf handler: add rgw multi-instances support
This commit adds the rgw multi-instances support in ceph-handler
(restart_rgw_daemons.sh.j2)

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-12 16:44:48 -04:00
Guillaume Abrioux 60a2e28189 rgw: add multi-instances support when deploying multisite
This commit adds the multi-instances when deploying rgw multisite

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-12 16:44:48 -04:00
Dimitri Savineau e8bf0a0cf2 ceph-infra: open radosgw ports for multi instances
When using the radosgw multi instances configuration then the firewall
rules aren't adapted to that setup.
We only open the port according to the radosgw_frontend_port variable
so only the first radosgw instance port will be opened in the firewall
configuration.
We should instead iterate over the rgw_instances list.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-12 16:44:48 -04:00
Dimitri Savineau e62532de46 update osd pool set size command
Since [1] we can't use osd pool without replicas (size: 1) by default.
We now need to set the mon_allow_pool_size_one flag to true in the ceph
configuration and add the --yes-i-really-mean-it flag to the osd pool
set size cli.

[1] https://github.com/ceph/ceph/commit/21508bd

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-11 11:25:42 +01:00
Guillaume Abrioux b3bbd6bb77 rgw: fix a typo in create_realm_zonegroup_zone_lists
This commit fixes a typo.

`s/realms/secondary_realms`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-10 14:13:30 +01:00
Guillaume Abrioux b3d943fe9f infra: add retries/until on firewalld start task
This commit make that task retrying 5 times to start the service
firewalld to avoid failure like following:

```
TASK [ceph-infra : start firewalld] ********************************************
task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-centos-container-purge/roles/ceph-infra/tasks/configure_firewall.yml:22
Monday 09 March 2020  08:58:48 +0000 (0:00:00.963)       0:02:16.457 **********
fatal: [osd4]: FAILED! => changed=false
  msg: |-
    Unable to enable service firewalld: Created symlink from /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service to /usr/lib/systemd/system/firewalld.service.
    Created symlink from /etc/systemd/system/multi-user.target.wants/firewalld.service to /usr/lib/systemd/system/firewalld.service.
    Failed to execute operation: Connection reset by peer
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-09 15:01:34 -04:00
Christian Berendt 608d7188a1 openstack_keys: use openstack_cinder_pool.name
Instead of volumes as a static string the openstack_cinder_pool.name
variable should be used as with the other keys.

Signed-off-by: Christian Berendt <berendt@betacloud-solutions.de>
2020-03-09 08:17:22 -04:00
Guillaume Abrioux 7a8a719e75 rgw: add retry/until on pools tasks
Sometimes, these task can timeout for some reason.
Adding these retries can help to avoid unexcepted failures.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-06 08:55:13 -05:00
Guillaume Abrioux eac207091b client: skip create_users_keys.yml when rolling_update
There's no need to run this part of the role when upgrading clients
node. Let's skip it when rolling_update.yml is being run.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-04 13:06:32 -05:00
Ali Maredia 71f55bd54d rgw multisite: enable more than 1 realm per cluster
Make it so that more than one realm, zonegroup,
or zone can be created during a run of the rgw
multisite ansible playbooks.

The rgw hosts now need to be grouped into zones
and realms in the inventory.

.yml files need to be created in group_vars
for the realms and zones. Sample yaml files
are available.

Also remove multsite destroy playbook
and add --cluster before radosgw-admin commands

remove manually added rgw_zone_endpoints var
and have ceph-ansible automatically add the
correct endpoints of all the rgws in a rgw_zone
from the information provided in that rgws hostvars.

Signed-off-by: Ali Maredia <amaredia@redhat.com>
2020-03-04 12:58:13 -05:00
Guillaume Abrioux e17c79b871 osd: do not change pool size on erasure pool
This commit adds condition in order to not try to customize pools size
when its type is erasure.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-04 09:29:01 -05:00
Guillaume Abrioux 47adc2bb08 osd: add pg autoscaler support
This commit adds the pg autoscaler support.

The structure for pool definition has now two additional attributes
`pg_autoscale_mode` and `target_size_ratio`, eg:

```
test:
  name: "test"
  pg_num: "{{ osd_pool_default_pg_num }}"
  pgp_num: "{{ osd_pool_default_pg_num }}"
  rule_name: "replicated_rule"
  application: "rbd"
  type: 1
  erasure_profile: ""
  expected_num_objects: ""
  size: "{{ osd_pool_default_size }}"
  min_size: "{{ osd_pool_default_min_size }}"
  pg_autoscale_mode: False
  target_size_ratio": 0.1
```

when `pg_autoscale_mode` is `True` user has to set a decent value in
`target_size_ratio`.

Given that it's a new feature, it's still disabled by default.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1782253

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-04 09:29:01 -05:00
Guillaume Abrioux bf1f125d71 osd: refact osd pool creation
Currently, the command executed is wrong, eg:

```
  cmd:
  - podman
  - exec
  - ceph-mon-controller-0
  - ceph
  - --cluster
  - ceph
  - osd
  - pool
  - create
  - volumes
  - '32'
  - '32'
  - replicated_rule
  - '1'
  delta: '0:00:01.625525'
  end: '2020-02-27 16:41:05.232705'
  item:
```

From documentation, the osd pool creation command is :

```
ceph osd pool create {pool-name} {pg-num} [{pgp-num}] [replicated] \
     [crush-rule-name] [expected-num-objects]
ceph osd pool create {pool-name} {pg-num}  {pgp-num}   erasure \
     [erasure-code-profile] [crush-rule-name] [expected_num_objects]
```

it means we pass '1' (from item.type) as value for
`expected_num_objects` by default which is very likely not what we want.

Also, this commit modifies the default value when no `rule_name` is set
to use the existing variable `osd_pool_default_crush_rule`

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1808495

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-04 09:29:01 -05:00
Dimitri Savineau be8b315102 ceph-validate: add key format validation
If the user provides manually the key value for a specific keyring then
there's not valation on the content which could lead to unexpected
failures in the ceph_key module.

Closes: #5104

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-03 10:01:58 +01:00
Dimitri Savineau 9d3b49293d purge: stop rgw instances by iteration
It looks like that the service module doesn't support wildcard anymore
for stopping/disabling multiple services.

fatal: [rgw0]: FAILED! => changed=false
  msg: 'This module does not currently support using glob patterns,
        found ''*'' in service name: ceph-radosgw@*'
...ignoring

Instead we should iterate over the rgw_instances list.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-02 16:32:06 +01:00
Dimitri Savineau 90b1fc8fe9 ceph-infra: install firewalld python bindings
When using the firewalld ansible module we need to be sure that the
python bindings are installed.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-02 16:32:06 +01:00
Dimitri Savineau 45fb9241c0 ceph-infra: split firewalld tasks
Since ansible 2.9 the firewalld task could not be used with service and
source in the same time anymore.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-02 16:32:06 +01:00
Dimitri Savineau aefba82a2e Add ansible 2.9 support
This commit adds ansible 2.9 support in addition of 2.8.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-02 16:32:06 +01:00
Guillaume Abrioux 0326d992c2 osd: add journal option in ceph_volume call (batch)
This commit adds the journal option to the ceph_volume call when
scenario is lvm batch

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-02-28 17:29:59 -05:00
Guillaume Abrioux a084a2a347 common: support OSDs with more than 2 digits
When running environment with OSDs having ID with more than 2 digits,
some tasks don't match the system units and therefore, playbook can fail.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1805643

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-02-27 09:48:36 +01:00
Dimitri Savineau 44e750ee5d ceph-rgw: increase connection timeout to 10
5s as a connection timeout could be low in some setup. Let's increase
it to 10s.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-02-24 16:01:36 +01:00
Francesco Pantano 15ed9eebf1 Configure ceph dashboard backend and dashboard_frontend_vip
This change introduces a new set of tasks to configure the
ceph dashboard backend and listen just on the mgr related
subnet (and not on '*'). For the same reason the proper
server address is added in both prometheus and alertmanger
systemd units.
This patch also adds the "dashboard_frontend_vip" parameter
to make sure we're able to support the HA model when multiple
grafana instances are deployed.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792230
Signed-off-by: Francesco Pantano <fpantano@redhat.com>
2020-02-19 17:52:53 -05:00
Dimitri Savineau ac0f68ccf0 ceph-dashboard: update create/get rgw user tasks
Since [1] if a rgw user already exists then the radosgw-admin user create
command will return an error instead of modifying the current user.
We were already doing separated tasks for create and get operation but
only for multisite configuration but it's not enough.
Instead we should do the get task first and depending on the result
execute the create.
This commit also adds missing run_once and delegate_to statement.

[1] https://github.com/ceph/ceph/commit/269e9b9

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-02-18 10:22:21 +01:00
Sam Choraria 2a2656a985 ceph-rgw: allow SSL certificate content to supplied
Allow SSL certificate & key contents to be written to the path
specified by radosgw_frontend_ssl_certificate. This permits a
certificate to be deployed & renewal of expired certificates
through ceph-ansible.

Signed-off-by: Sam Choraria <sam.choraria@bbc.co.uk>
2020-02-17 16:22:11 +01:00
Dimitri Savineau c644ea9041 ceph-defaults: remove bootstrap_dirs_xxx vars
Both bootstrap_dirs_owner and bootstrap_dirs_group variables aren't
used anymore in the code.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-02-17 16:17:40 +01:00
Ali Maredia 1834c1e48d rgw: extend automatic rgw pool creation capability
Add support for erasure code pools.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1731148

Signed-off-by: Ali Maredia <amaredia@redhat.com>
Co-authored-by: Dimitri Savineau <dsavinea@redhat.com>
2020-02-17 16:07:43 +01:00