Commit Graph

2038 Commits (fe1d09925ae1525e99f22a3eab9ca1823c079bda)

Author SHA1 Message Date
Sébastien Han a882ad7ade lint: use command instead of shell
Use command when the task does not have any pipes or wildcards.
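
A minimal Python sketch of this rule follows. It assumes a task's command line is available as a plain string. `needs_shell` and `pick_module` are hypothetical helpers and are not part of ansible-lint. Besides pipes and wildcards, they also flag redirections, `;`, `&` and `$` expansion, because the `command` module does not process those either.

```python
# Characters that only have a special meaning when a shell interprets the
# command line; the `command` module passes them through literally.
SHELL_ONLY = set("|*?<>;&$")

def needs_shell(cmd: str) -> bool:
    """Return True if `cmd` relies on shell features such as pipes or globs."""
    return any(ch in SHELL_ONLY for ch in cmd)

def pick_module(cmd: str) -> str:
    """Prefer `command`; fall back to `shell` only when it is required."""
    return "shell" if needs_shell(cmd) else "command"

print(pick_module("ceph --cluster ceph health"))    # command
print(pick_module("ceph osd ls | wc -l"))           # shell
print(pick_module("ls /var/lib/ceph/osd/ceph-*"))   # shell
```

The character scan is deliberately naive. A `*` inside a quoted argument would also be flagged, so treat a `shell` result as a prompt to review the task by hand.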

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-31 14:18:36 +01:00
Sébastien Han cfd60411bc lint: skip the linter
Do not run the linter for these three tasks:

* we use latest for the pip docker-py package
* for ssl keys, this is a false positive: the initial command is a
'shell' task, so it will always report a change
* for keystone, we must use shell since the with_items contains pipes

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-31 14:18:36 +01:00
Sébastien Han 55b071a114 lint: remove trailing spaces
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-31 14:18:36 +01:00
Sébastien Han adaa914d8e ceph-common: use yum instead of shell
Use the yum module to list repos and then activate them if needed.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-31 14:18:36 +01:00
Sébastien Han 53cdddf886 ceph-common: use a handler
We need a handler because the task changed; the old implementation was
basically mimicking a handler.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-31 14:18:36 +01:00
Sébastien Han 53972ee672 lint: add changed_when to command
Tasks calling command should set changed_when: false; otherwise, each
time they run they are reported as 'changed', which is irrelevant.
Commands should not report a change if nothing needed doing.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-31 14:18:36 +01:00
Sébastien Han 7dab5d4ac2 lint: name tasks
Tasks must have names.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-31 14:18:36 +01:00
Guillaume Abrioux 74ef7769fb mon: use `_current_monitor_address` in systemd unit file
Let's avoid a jinja loop and use `_current_monitor_address` to get the
monitor address.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-31 14:16:10 +01:00
Guillaume Abrioux 073131d8a6 mon: refact docker/main.yml
Since the jinja logic has been moved into an ansible task, we can simplify
this part of the code and use `_current_monitor_address`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-31 14:16:10 +01:00
Guillaume Abrioux 404712ef01 defaults: add a fact '_current_monitor_address'
So we don't have to loop over `_monitor_addresses` when we need the
monitor address of the current node being played.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-31 14:16:10 +01:00
Guillaume Abrioux a2b2028212 config: remove complex jinja logic in ceph.conf.j2
Using consecutive set_fact tasks in the playbook instead of complex jinja
syntax makes ceph.conf.j2 more readable.
Besides, complex jinja can be painful to debug.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-31 14:16:10 +01:00
Guillaume Abrioux f7d4651186 playbook: remove jinja syntax in when statement
This syntax is deprecated.

Closes: #3281

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-31 13:45:41 +01:00
Rishabh Dave e2be5cf58c Revert "DNM: use ansible 2.7 for testing this PR"
This reverts commit 162010d90e.
2018-10-31 11:54:57 +01:00
Rishabh Dave 162010d90e DNM: use ansible 2.7 for testing this PR
Signed-off-by: Rishabh Dave <ridave@redhat.com>
2018-10-31 09:38:59 +01:00
Rishabh Dave 8edbda96df use blocks directives to group tasks
Using block directives simplifies the playbooks and makes them more
readable.

Fixes: https://github.com/ceph/ceph-ansible/issues/2835
Signed-off-by: Rishabh Dave <ridave@redhat.com>
2018-10-31 09:37:43 +01:00
Guillaume Abrioux 34275ac847 rgw: move multisite default variables in ceph-defaults
Move all rgw multisite variables into ceph-defaults so ceph-validate can
go through them.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-30 17:41:05 +01:00
Guillaume Abrioux 17ffb792e0 tests: fail if ansible version is not 2.7
The latest ansible version at the moment is 2.7.

We should explicitly require 2.7 only on the master branch.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-30 17:07:05 +01:00
Guillaume Abrioux 62c314e2ba tests: test master against ansible 2.7
Let's test ceph-ansible master against ansible 2.7 to catch any
potential issues with this ansible version early.

Closes: #3148

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-30 17:07:05 +01:00
Sébastien Han 8843f48222 iscsi more linting
Make flake8 happy

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-30 13:47:37 +00:00
Sébastien Han fd72f1dd0d iscsi module linting
Fix linter issues on iscsi modules.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-30 14:41:36 +01:00
Sébastien Han d209fc9d02 lint yaml
Fix [error] too many blank lines (1 > 0) (empty-lines)

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-30 14:41:36 +01:00
Guillaume Abrioux d8d3e55006 remove restapi role
As of `mimic`, restapi is no longer available because of the manager daemon.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-30 14:19:13 +01:00
Guillaume Abrioux 547e90f281 rgw: move multisite related tasks after docker/main.yml
We must play these tasks after the container has started; otherwise,
the rgw_multisite tasks will fail.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-30 14:00:28 +01:00
Guillaume Abrioux 710e11668d rgw: add rgw_multisite for containerized deployments
Run commands in the containers for containerized deployments.
(At the moment, all commands are run on the host only.)

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-30 14:00:28 +01:00
Guillaume Abrioux fe88c89c9c validate: remove check on rgw_multisite_endpoint_addr definition
Since `rgw_multisite_endpoint_addr` defaults to
`{{ ansible_fqdn }}`, it shouldn't be mandatory to set this variable.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-30 14:00:28 +01:00
Ali Maredia 59e6d04f9b rgw: add ceph-validate tasks for multisite, other fixes
- updated README-MULTISITE
- re-added destroy.yml
- added tasks in ceph-validate to make sure the
rgw multisite vars are set

Signed-off-by: Ali Maredia <amaredia@redhat.com>
2018-10-30 14:00:28 +01:00
Guillaume Abrioux 77d5d128c3 rgw: add a dedicated variable for multisite endpoint
We should let users set the IP they want as the
multisite endpoint, with `{{ ansible_fqdn }}` as the default value so
they are not forced to set this variable.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-30 14:00:28 +01:00
Ali Maredia 474f151450 rgw: update rgw multisite tasks
- remove destroy tasks
- cleanup conditionals and syntax
- remove unnecessary realm pulls
- enable multisite to be tested in automated
testing infra
- add multisite related vars to main.yml and
group_vars
- update README-MULTISITE
- ensure all `radosgw-admin` commands are being run
on a mon

Signed-off-by: Ali Maredia <amaredia@redhat.com>
2018-10-30 14:00:28 +01:00
Guillaume Abrioux 748342f5b6 roles: fix *_docker_memory_limit default value
Append the 'm' suffix to specify the unit used in all
`*_docker_memory_limit` values.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-29 14:59:09 +01:00
Neha Ojha b7e4d4eb84 roles: do not limit docker_memory_limit for various daemons
Since we do not have enough data to put valid upper bounds for the memory
usage of these daemons, do not put artificial limits by default. This will
help us avoid failures like OOM kills due to low default values.

Whenever required, these limits can be manually enforced by the user.

More details in
https://bugzilla.redhat.com/show_bug.cgi?id=1638148

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1638148
Signed-off-by: Neha Ojha <nojha@redhat.com>
2018-10-29 14:59:09 +01:00
Sébastien Han 0e63f0f3c9 Merge branch 'master' into wip-rm-calamari
2018-10-29 14:50:37 +01:00
Sébastien Han 5ab90b358c nfs: do not create the nfs user if already present
Check if the user exists and skip its creation if true.

Closes: https://github.com/ceph/ceph-ansible/issues/3254
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-26 16:24:38 +00:00
Guillaume Abrioux 4d698ce831 ceph-infra: reload firewall after rules are added
We ensure that firewalld is installed and running before adding any
rule, so it no longer makes sense not to reload firewalld once the rules
are added.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-23 09:53:09 +00:00
Rishabh Dave ee2d52d33d allow custom pool size
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1596339
Signed-off-by: Rishabh Dave <ridave@redhat.com>
2018-10-22 16:00:21 +02:00
Guillaume Abrioux 48cfc60722 defaults: set default `configure_firewall` to `True`
Let's configure firewalld by default.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1526400

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-19 15:12:45 +02:00
Guillaume Abrioux 8fa437b7bd iscsi: fix networking issue on containerized env
The iscsi-gw containers can't reach monitors without `--net=host`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-19 00:12:43 +00:00
Guillaume Abrioux e77c36ad17 infra: move restart fw handler in ceph-infra role
Move the handler to restart firewall in ceph-infra role.

Closes: #3243

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-19 00:12:43 +00:00
Sébastien Han fbd878c8d5 infra: rename osd-configure to add-osd and improve it
The playbook has various improvements:

* run ceph-validate role before doing anything
* run ceph-fetch-keys only on the first monitor of the inventory list
* set noup flag so PGs get distributed once all the new OSDs have been
added to the cluster and unset it when they are up and running

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1624962
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-17 11:26:11 +00:00
Sébastien Han 680574ed4c ceph-fetch-keys: refact
This commit simplifies the usage of the ceph-fetch-keys role. The role
now has a nicer way to find various ceph keys and fetch them to the
ansible server.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1624962
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-17 11:26:11 +00:00
Andy McCrae 3e0fa3bc18 Add ability to use a different client container
Currently a throw-away container is built to run ceph client
commands to set up users, pools & auth keys. This utilises
the same base ceph container which has all the ceph services
inside it.

This PR allows the use of a separate container if the deployer
wishes - but defaults to use the same full ceph container.

This can be used for different architectures or distributions,
which may support the Ceph client, but not the Ceph server,
and allows the deployer to build and specify a separate client
container if need be.

Signed-off-by: Andy McCrae <andy.mccrae@gmail.com>
2018-10-16 23:28:35 +00:00
Guillaume Abrioux f0b2d82695 infra: fix wrong condition on firewalld start task
A non-skipped task won't have the `skipped` attribute, so the `start
firewalld` task will complain about it.
Indeed, the `skipped` and `rc` attributes won't exist since the first task,
`check firewalld installation on redhat or suse`, won't be skipped in
case of a non-containerized deployment.

Fixes: #3236
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1541840

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-16 16:24:42 +00:00
Christian Berendt ac37a0d0cd ceph-defaults: set ceph_stable_openstack_release_uca to queens
Liberty is no longer available in the UCA. The last available release there
is currently Queens.

Signed-off-by: Christian Berendt <berendt@betacloud-solutions.de>
2018-10-16 12:56:32 +00:00
Guillaume Abrioux b953965399 handler: remove some leftover in restart_*_daemon.sh.j2
Remove some leftovers in those restart scripts.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-16 11:53:55 +00:00
Nan Li 55334baa0c docker-ce is used in aarch64 instead of docker engine
Signed-off-by: Nan Li <herbert.nan@linaro.org>
2018-10-15 18:38:40 +02:00
Guillaume Abrioux 60bc1e38db handler: fix osd containers handler
`ceph_osd_container_stat` might not be set on other osd nodes.
We must ensure we are on the last node before trying to evaluate
`ceph_osd_container_stat`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-15 10:30:40 +02:00
Guillaume Abrioux 40b7747af7 remove jewel support
As of now, we should no longer support Jewel in ceph-ansible.
The latest ceph-ansible release supporting Jewel is `stable-3.1`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-12 23:38:17 +00:00
Sébastien Han 31a0438cb2 ceph_volume: refactor
This commit does a couple of things:

* Avoid code duplication
* Clarify the code
* add more unit tests
* add myself to the authors of the module

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-10 16:08:41 -04:00
Sébastien Han bfe689094e osd: do not run when lvm scenario
This task was created for ceph-disk based deployments, so it's not needed
when OSDs are prepared with ceph-volume.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-10 16:08:41 -04:00
Sébastien Han 2bea8d8ecf handler: add support for ceph-volume containerized restart
The restart script wasn't working with the recent addition of
ceph-volume in containers, where OSD containers now have the OSD id in
the container name.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-10 16:08:41 -04:00
Sébastien Han 790f52f934 ceph-handler: change osd container check
Now that the container is named ceph-osd@<id> looking for something that
contains a host is not necessary. This is also backward compatible as it
will continue to match container names with hostname in them.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-10 16:08:41 -04:00
Sébastien Han 0580328340 validate: add warning for ceph-disk
ceph-disk will be removed in 3.3, and we encourage users to start using
ceph-volume as of 3.2.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-10 16:08:41 -04:00
Sébastien Han a948677de1 osd: ceph-volume activate, just pass the OSD_ID
We don't need to pass the device and discover the OSD ID. We have a
task that gathers all the OSD IDs present on that machine, so we simply
re-use them and activate them. This also handles the situation when you
have multiple OSDs running on the same device.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-10 16:08:41 -04:00
Sébastien Han 5f35910ee1 osd: change unit template for ceph-volume container
We don't need to include the hostname in the container name; we can keep
it simple and just call it ceph-osd-$id.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-10 16:08:41 -04:00
Sébastien Han ece9e9812e osd: do not use expose_partitions on lvm
expose_partitions is only needed on ceph-disk OSDs so we don't need to
activate this code when running lvm prepared OSDs.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-10 16:08:41 -04:00
Sébastien Han e39fc4f6ce ceph_volume: add container support for batch command
The batch option was recently added; while rebasing this patch, it was
necessary to implement it. So now, the batch option can work in
containerized environments.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1630977
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-10 16:08:41 -04:00
Sébastien Han 3ddcc9af16 ceph_volume: try to get rid of the dummy container
If we run on a containerized deployment we pass an env variable which
contains the container image.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-10 16:08:41 -04:00
Sébastien Han aa2c1b27e3 ceph-osd: ceph-volume container support
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-10 16:08:41 -04:00
Guillaume Abrioux 678e155328 infra: fix a typo in filename
configure_firewall is missing its dot.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-10 12:39:04 -04:00
Guillaume Abrioux f666902d52 infra: add tags for each subcomponent
This way we can skip one specific component if needed.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-10 15:44:33 +00:00
Guillaume Abrioux f8a7ffb085 infra: add firewall configuration for containerized deployment
firewalld is available on Atomic, so there is no reason not to apply
the firewall configuration.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-10 15:44:33 +00:00
Guillaume Abrioux 0fb8812e47 infra: update firewall rules, add cluster_network for osds
At the moment, all daemons accept connections from 0.0.0.0.
We should at least restrict to public_network and add
cluster_network for OSDs.
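
The intended restriction can be sketched in Python with the standard `ipaddress` module. This only models the decision; the role itself applies it through firewalld rules. The daemon names and example networks below are assumptions for illustration.

```python
import ipaddress

def allowed_sources(daemon, public_network, cluster_network):
    """Source networks a daemon should accept connections from (sketch)."""
    nets = [ipaddress.ip_network(public_network)]
    if daemon == "osd":
        # OSDs also talk to each other over the cluster network.
        nets.append(ipaddress.ip_network(cluster_network))
    return nets

def is_allowed(src, daemon, public_network, cluster_network):
    """True if `src` falls inside one of the daemon's allowed networks."""
    ip = ipaddress.ip_address(src)
    return any(ip in net for net in allowed_sources(daemon, public_network, cluster_network))

PUBLIC, CLUSTER = "192.168.1.0/24", "10.0.0.0/24"
print(is_allowed("192.168.1.20", "mon", PUBLIC, CLUSTER))  # True
print(is_allowed("10.0.0.7", "mon", PUBLIC, CLUSTER))      # False
print(is_allowed("10.0.0.7", "osd", PUBLIC, CLUSTER))      # True
```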

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1541840

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-10 15:44:33 +00:00
Guillaume Abrioux b3a71eeb08 ceph-infra: add new role ceph-infra
this role manages ceph infra services such as ntp, firewall, ...

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-10 15:44:33 +00:00
Noah Watkins 8dcc8d1434 Stringify ceph_docker_image_tag
This could be a numeric input, but it is treated like a string, leading
to runtime errors.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1635823

Signed-off-by: Noah Watkins <nwatkins@redhat.com>
2018-10-10 04:26:33 +00:00
Noah Watkins 306e308f13 Avoid using tests as filter
Fixes the deprecation warning:

  [DEPRECATION WARNING]: Using tests as filters is deprecated. Instead of
  using `result|search` use `result is search`.

Signed-off-by: Noah Watkins <nwatkins@redhat.com>
2018-10-10 04:26:33 +00:00
Andrew Schoen ada03d064d ceph-validate: remove versions checks for bluestore and lvm scenario
These checks will never pass unless ceph_stable_release is passed and
ceph-defaults is run before ceph-validate. Additionally, we don't want
to support deploying jewel upstream at ceph-ansible master.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1637537

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-10-09 13:30:42 -04:00
Andrew Schoen 436dc8c5e1 ceph-config: allow the batch --report to fail when getting the OSD num
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-10-09 10:09:50 -04:00
Andrew Schoen 40f82319dd ceph-config: use 'lvm list' to find num_osds for an existing cluster
This makes finding num_osds idempotent for clusters that were deployed
using 'lvm batch'.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-10-09 10:09:50 -04:00
Andrew Schoen 8afef3d0de ceph-config: use the ceph_volume module to get num_osds for lvm batch
This gives us an accurate number of how many osds will be created.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-10-09 10:09:50 -04:00
Andrew Schoen c453ea25c0 ceph-osd: use journal_size and block_db_size for lvm batch
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-10-09 10:09:50 -04:00
Andrew Schoen 71ce539da5 ceph-defaults: add the block_db_size option
This is used in the lvm osd scenario for the 'lvm batch' subcommand
of ceph-volume.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-10-09 10:09:50 -04:00
Guillaume Abrioux 3e2cdcc735 common: remove check_firewall code
Check firewall isn't working as expected and might break deployments.
This part of the code will be reworked soon.

Let's focus on configure_firewall code for now.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1541840

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-06 14:32:17 +02:00
Guillaume Abrioux be31c15ccd follow up on b5d2ea2
Add some missed statements

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-06 14:32:17 +02:00
Rishabh Dave b5d2ea269f don't use "static" field while including tasks
Use "import_tasks" and "include_tasks" instead, to tell whether tasks
must be included statically or dynamically.

Fixes: https://github.com/ceph/ceph-ansible/issues/2998
Signed-off-by: Rishabh Dave <ridave@redhat.com>
2018-10-04 07:44:28 +00:00
Guillaume Abrioux 6130bc841d config: look up for monitor_address_block in hostvars
`monitor_address_block` should be read from hostvars[host] instead of
the current node being played.

e.g.:

Let's assume we have:

```
[mons]
ceph-mon0 monitor_address=192.168.1.10
ceph-mon1 monitor_interface=eth1
ceph-mon2 monitor_address_block=192.168.1.0/24
```

the ceph.conf generation task will end up with:

```
fatal: [ceph-mon0]: FAILED! => {}

MSG:

'ansible.vars.hostvars.HostVarsVars object' has no attribute u'ansible_interface'
```

The reason is that it assumes `monitor_address_block` isn't defined, even on
ceph-mon2, because it looks up `monitor_address_block` instead of
`hostvars[host]['monitor_address_block']`; therefore, it falls into the default branch of the condition:

```
    {%- else -%}
      {% set interface = 'ansible_' + (monitor_interface | replace('-', '_')) %}
      {% if ip_version == 'ipv4' -%}
        {{ hostvars[host][interface][ip_version]['address'] }}
      {%- elif ip_version == 'ipv6' -%}
        [{{ hostvars[host][interface][ip_version][0]['address'] }}]
      {%- endif %}
    {%- endif %}
```

`monitor_interface` is set to its default value, `'interface'`, so the `interface`
variable is built as 'ansible_' + 'interface'. This makes ansible throw a
confusing message about `'ansible_interface'`.
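
The lookup order can be modelled in Python to show why reading from `hostvars[host]` matters. `HOSTVARS` is a simplified, hypothetical stand-in for Ansible's hostvars; real facts such as `ansible_all_ipv4_addresses` carry much more. Picking the first address that falls inside `monitor_address_block` is an assumption about how the template chooses an address.

```python
import ipaddress

# Simplified, hypothetical hostvars mirroring the inventory above.
HOSTVARS = {
    "ceph-mon0": {"monitor_address": "192.168.1.10", "monitor_interface": "interface"},
    "ceph-mon1": {"monitor_interface": "eth1",
                  "ansible_eth1": {"ipv4": {"address": "192.168.1.11"}}},
    "ceph-mon2": {"monitor_address_block": "192.168.1.0/24",
                  "monitor_interface": "interface",
                  "ansible_all_ipv4_addresses": ["10.0.0.5", "192.168.1.12"]},
}

def monitor_address(host, hostvars):
    """Resolve a monitor address from hostvars[host], never from the current node."""
    hv = hostvars[host]
    if "monitor_address" in hv:
        return hv["monitor_address"]
    if "monitor_address_block" in hv:
        block = ipaddress.ip_network(hv["monitor_address_block"])
        # Pick the first address of this host that falls inside the block.
        for addr in hv["ansible_all_ipv4_addresses"]:
            if ipaddress.ip_address(addr) in block:
                return addr
        raise ValueError(f"{host}: no address in {block}")
    # Default branch: build the fact name from monitor_interface.
    interface = "ansible_" + hv["monitor_interface"].replace("-", "_")
    return hv[interface]["ipv4"]["address"]

for host in HOSTVARS:
    print(host, monitor_address(host, HOSTVARS))
```

Reading from the wrong node is effectively the same as removing `monitor_address_block` from ceph-mon2's entry. That sends the host into the default branch, which fails with `KeyError: 'ansible_interface'`: the same confusing name as in the error above.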

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1635303

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-02 22:41:05 +02:00
Benjamin Cherian 85071e6e53 Add support for different NTP daemons
Allow the user to choose between timesyncd, chronyd and ntpd.
Installation will default to timesyncd since it is distributed as
part of the systemd installation for most distros.
Added a note indicating that the NTP daemon type is not used for
containerized deployments.

Fixes issue #3086 on GitHub

Signed-off-by: Benjamin Cherian <benjamin_cherian@amat.com>
2018-10-02 13:18:08 +00:00
Mike Christie eddb95941b igw: valid client CHAP settings.
The Linux kernel target layer, LIO, does not allow the iscsi target to
mix ACLs that have CHAP enabled and disabled under the same tpg. This
patch adds a check and fails if this type of setup is detected.
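
A minimal sketch of such a check in Python follows. It assumes each client entry is a dict with a `client` IQN and an optional `chap` value, as in a `client_connections`-style list. These field names are assumptions and may not match the role's exact schema.

```python
def validate_chap(clients):
    """Raise if CHAP is enabled for some clients but disabled for others."""
    with_chap = [c["client"] for c in clients if c.get("chap")]
    without_chap = [c["client"] for c in clients if not c.get("chap")]
    if with_chap and without_chap:
        raise ValueError(
            "LIO cannot mix CHAP and non-CHAP ACLs under the same tpg: "
            f"with CHAP {with_chap}, without CHAP {without_chap}"
        )

validate_chap([
    {"client": "iqn.1994-05.com.redhat:client1", "chap": "user1/pass1"},
    {"client": "iqn.1994-05.com.redhat:client2", "chap": "user2/pass2"},
])  # passes: every ACL uses CHAP
```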

This fixes Red Hat BZ:
https://bugzilla.redhat.com/show_bug.cgi?id=1615088

Signed-off-by: Mike Christie <mchristi@redhat.com>
2018-10-01 18:23:03 +02:00
Sébastien Han 4db6a213f7 add ceph-handler role
The role contains all the handlers for Ceph services. We decided to
leave the ceph-defaults role with only variables and a few facts. This is
useful when organizing the site.yml files and also adding the known
variables to infrastructure-playbooks.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-09-28 15:15:49 +00:00
Sébastien Han 145aef9fed defaults: do not disable THP on bluestore
As per #1013, it appears that BlueStore will soon use THP to lower TLB
misses; also, disabling THP hasn't demonstrated any gains so far.

Closes: https://github.com/ceph/ceph-ansible/issues/1013
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-09-27 21:23:49 +00:00
Sébastien Han dc3319c3c4 default: use bluestore as default object store
All tooling in Ceph is defaulting to use the bluestore objectstore for provisioning OSDs, there is no good reason for ceph-ansible to continue to default to filestore.

Closes: https://github.com/ceph/ceph-ansible/issues/3149
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1633508
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-09-27 21:23:49 +00:00
Rishabh Dave 380168dadc don't use "include" to include tasks
Use "import_tasks" or "include_tasks" instead.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
2018-09-27 17:53:40 +02:00
Giulio Fidente 6126210e0e Fix version check in ceph.conf template
We need to look for ceph_release when comparing with release names,
not ceph_version.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1631789
Signed-off-by: Giulio Fidente <gfidente@redhat.com>
2018-09-24 13:08:27 +02:00
Matthew Vernon 806461ac6e restart_osd_daemon.sh.j2 - use `+` rather than `{1,}` in regex
`+` is more idiomatic for "one or more" in a regex than `{1,}`; the
latter was introduced in a previous fix for an incorrect `{1,2}`
restriction.
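
The three variants can be compared with Python's `re` module. For these simple patterns, `re` behaves like the extended regular expressions a shell script would use. The `ceph-osd-<id>` names are hypothetical, and the script's actual pattern may differ.

```python
import re

OLD = re.compile(r"^ceph-osd-[0-9]{1,2}$")     # the original bug: at most two digits
BRACES = re.compile(r"^ceph-osd-[0-9]{1,}$")   # the earlier fix
PLUS = re.compile(r"^ceph-osd-[0-9]+$")        # idiomatic "one or more"

for name in ("ceph-osd-7", "ceph-osd-42", "ceph-osd-128"):
    print(name, bool(OLD.match(name)), bool(BRACES.match(name)), bool(PLUS.match(name)))
# ceph-osd-7 True True True
# ceph-osd-42 True True True
# ceph-osd-128 False True True
```

`{1,}` and `+` accept exactly the same names. Only the `{1,2}` form rejects OSD ids of 100 and above.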

Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
2018-09-24 10:33:46 +00:00
Matthew Vernon 04f4991648 restart_osd_daemon.sh.j2 - consider active+clean+* pgs as OK
After restarting each OSD, restart_osd_daemon.sh checks that the
cluster is in a good state before moving on to the next one. One of
the checks it does is that the number of pgs in the state
"active+clean" is equal to the total number of pgs in the cluster.

On large clusters (e.g. we have 173,696 pgs), it is likely that at
least one pg will be scrubbing and/or deep-scrubbing at any one
time. These pgs are in state "active+clean+scrubbing" or
"active+clean+scrubbing+deep", so the script was erroneously not
including them in the "good" count. Similar concerns apply to
"active+clean+snaptrim" and "active+clean+snaptrim_wait".

Fix this by considering as good any pg whose state contains
active+clean. Do this as an integer comparison to num_pgs in pgmap.
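
The counting rule can be sketched in Python. The sketch assumes that the JSON output of `ceph -s -f json` exposes `pgmap.pgs_by_state` as a list of `{"state_name", "count"}` entries, alongside `pgmap.num_pgs`. The sample numbers are made up.

```python
import json

def pgs_ok(status_json):
    """True when every pg is in a state containing 'active+clean'."""
    pgmap = json.loads(status_json)["pgmap"]
    good = sum(s["count"] for s in pgmap["pgs_by_state"]
               if "active+clean" in s["state_name"])
    # Integer comparison against the total number of pgs.
    return good == pgmap["num_pgs"]

sample = json.dumps({"pgmap": {"num_pgs": 100, "pgs_by_state": [
    {"state_name": "active+clean", "count": 97},
    {"state_name": "active+clean+scrubbing", "count": 2},
    {"state_name": "active+clean+scrubbing+deep", "count": 1},
]}})
print(pgs_ok(sample))  # True
```

A pg in `active+undersized+degraded` does not contain the substring, so it still keeps the check from passing.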

(could this be backported to at least stable-3.0 please?)

Closes: #2008
Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
2018-09-24 10:33:46 +00:00
Matthew Vernon aa97ecf048 restart_osd_daemon.sh.j2 - Reset RETRIES between calls of check_pgs
Previously RETRIES was set (by default to 40) once at the start of the
script; this meant that it would only ever wait for up to 40 lots of
30s across *all* the OSDs on a host before bombing out. In fact, we
want to be prepared to wait for the same amount of time after each OSD
restart for the cluster's pgs to be happy again before continuing.
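
The difference can be sketched in Python. Here the retry budget is handed to `check_pgs` afresh for every OSD instead of being decremented from one script-wide counter. `restart` and `is_healthy` are hypothetical callables standing in for the systemd restart and the pg check. The defaults of 40 retries and 30 s match the values mentioned above.

```python
import time

def check_pgs(is_healthy, retries, delay):
    """Poll is_healthy() up to `retries` times, sleeping `delay` seconds in between."""
    for _ in range(retries):
        if is_healthy():
            return True
        time.sleep(delay)
    return False

def restart_all(osd_ids, restart, is_healthy, retries=40, delay=30.0):
    """Restart OSDs one by one, waiting for healthy pgs after each restart."""
    for osd_id in osd_ids:
        restart(osd_id)
        # A fresh retry budget per OSD: a slow recovery after one restart
        # no longer shortens the wait allowed after the next one.
        if not check_pgs(is_healthy, retries, delay):
            raise RuntimeError(f"pgs not healthy after restarting osd.{osd_id}")
```

Suppose three OSDs each need 30 polls. With a shared counter, the budget of 40 runs out during the second restart. With the per-OSD budget, each restart gets its full 40 attempts.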

Closes: #3154
Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
2018-09-24 08:20:32 +00:00
John Spray 26bfef4107 Remove Calamari-related pieces
...with the exception of the purge operation, since
removing Calamari would still be useful for an old
cluster.

Signed-off-by: John Spray <john.spray@redhat.com>
2018-09-21 11:00:18 +01:00
Andrew Schoen 16ccac83fe ceph-config: calculate num_osds for the lvm batch scenario
For now our best guess is to count the number of devices and multiply
by osds_per_device. Ideally we'd like to run ceph-volume lvm batch
--report and get the number of OSDs that way, but currently we need
a ceph.conf in place already before we can do that. There is a tracker
ticket that would allow us to get around the need for a ceph.conf:
http://tracker.ceph.com/issues/36088
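
The best-guess calculation amounts to a single multiplication, shown here as a Python sketch. The function name and device paths are hypothetical. The result is only an estimate; the `--report` approach mentioned above is what would give the exact count.

```python
def estimate_num_osds(devices, osds_per_device=1):
    """Best guess for lvm batch: number of devices times osds_per_device."""
    return len(devices) * osds_per_device

# Hypothetical device list.
print(estimate_num_osds(["/dev/sdb", "/dev/sdc", "/dev/sdd"], osds_per_device=2))  # 6
```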

Fixes: https://github.com/ceph/ceph-ansible/issues/3135

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-09-20 15:41:52 +00:00
Guillaume Abrioux 6d6fd514e0 config: set default _rgw_hostname value to respective host
The default value for _rgw_hostname was taken from the current node being
played, while it should be taken from the respective node in the loop.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622505

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-09-18 20:10:34 +02:00
Andrew Schoen 8afad35f5a ceph-config: default devices and lvm_volumes when setting num_osds
This avoids errors when the osd scenario chosen does not require
setting devices or lvm_volumes. The default values for these are not
set because they exist in the ceph-osd role, not ceph-defaults.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-09-18 17:02:33 +00:00
Neha Ojha 27027a17d3 osd: add osd memory target option
BlueStore's cache is sized conservatively by default, so that it does
not overwhelm under-provisioned servers. The default is 1G for HDD, and
3G for SSD.

To replace the page cache, as much memory as possible should be given to
BlueStore. This is required for good performance. Since ceph-ansible
knows how much memory a host has, it can set

`bluestore cache size = max(total host memory / num OSDs on this host * safety factor, 1G)`

Due to fragmentation and other memory use not included in bluestore's
cache, a safety factor of 0.5 for dedicated nodes and 0.2 for
hyperconverged nodes is recommended.
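
The formula can be checked numerically with a small Python sketch. It treats 1G as 1 GiB and uses the safety factors quoted above: 0.5 for dedicated nodes and 0.2 for hyperconverged ones. The host sizes and OSD counts are made-up examples.

```python
GiB = 1024 ** 3

def bluestore_cache_size(total_mem_bytes, num_osds, hyperconverged=False):
    """max(total host memory / num OSDs * safety factor, 1G), in bytes."""
    safety_factor = 0.2 if hyperconverged else 0.5
    return max(int(total_mem_bytes / num_osds * safety_factor), 1 * GiB)

print(round(bluestore_cache_size(128 * GiB, 12) / GiB, 2))                       # 5.33
print(round(bluestore_cache_size(128 * GiB, 12, hyperconverged=True) / GiB, 2))  # 2.13
print(bluestore_cache_size(64 * GiB, 24, hyperconverged=True) / GiB)             # 1.0 (floor)
```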

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1595003

Signed-off-by: Neha Ojha <nojha@redhat.com>
Co-Authored-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-09-18 10:12:46 +00:00
Mike Christie 8fcd63cc50 igw: enable and start rbd-target-api
The commit:

commit 1164cdc002
Author: Guillaume Abrioux <gabrioux@redhat.com>
Date:   Thu Aug 2 11:58:47 2018 +0200

    iscsigw: install ceph-iscsi-cli package

installs the cli package but does not start and enable the
rbd-target-api daemon needed for gwcli to communicate with the igw
nodes. This patch just enables and starts it for the non-container
setup. The container setup is already doing this.

This fixes bz https://bugzilla.redhat.com/show_bug.cgi?id=1613963

Signed-off-by: Mike Christie <mchristi@redhat.com>
2018-09-13 19:35:45 +00:00
Guillaume Abrioux a6f77340fd nfs: ignore error on semanage command for ganesha_t
As of RHEL 7.6, it has been decided that it doesn't make sense to confine
`ganesha_t` anymore. This means the domain won't exist anymore.

Let's add a `failed_when: false` so that the deployment does not
fail when trying to run this command.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1626070

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-09-13 13:06:47 +02:00
Andrew Schoen b36f3e06b5 ceph_volume: adds the osds_per_device parameter
If this is set to anything other than the default value of 1 then the
--osds-per-device flag will be used by the batch command to define how
many osds will be created per device.
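
A minimal Python sketch of how the flag could be added to the batch command line follows. Of the flags used here, `--osds-per-device` comes from this commit and `--report` from another commit in this log. The function name and device paths are hypothetical, and the real `ceph_volume` module passes further options.

```python
def batch_command(devices, osds_per_device=1, report=False):
    """Build a `ceph-volume lvm batch` argument list (sketch)."""
    cmd = ["ceph-volume", "lvm", "batch"]
    if osds_per_device != 1:
        # Only passed when it differs from the default of 1.
        cmd += ["--osds-per-device", str(osds_per_device)]
    if report:
        cmd.append("--report")
    return cmd + list(devices)

print(batch_command(["/dev/sdb", "/dev/sdc"], osds_per_device=2))
# ['ceph-volume', 'lvm', 'batch', '--osds-per-device', '2', '/dev/sdb', '/dev/sdc']
```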

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-09-12 20:27:14 +00:00
Guillaume Abrioux 1c88c444a3 mon: fix `ExecStartPre` option in systemd unit file
This command line is not supported.
According to the official systemd documentation:

```
Note that shell command lines are not directly supported.
If shell command lines are to be used,
they need to be passed explicitly to a shell implementation of some kind.
```

We must run this using /bin/sh instead.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-09-11 10:48:21 +02:00
Guillaume Abrioux 9ff26e80f2 defaults: add a default value to rgw_hostname
let's add ansible_hostname as a default value for rgw_hostname if no
hostname in servicemap matches ansible_fqdn.

Fixes: #3063
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622505

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-09-10 12:07:44 +02:00
Guillaume Abrioux ecbd3e4558 Revert "client: add quotes to the dict values"
This commit adds quotes that make the keyring unusable.

e.g.:

```
client.john
        key: AQAN0RdbAAAAABAAH5D3WgMN9Rxw3M8jkpMIfg==
        caps: [mds] ''
        caps: [mgr] 'allow *'
        caps: [mon] 'allow rw'
        caps: [osd] 'allow rw'
```

Trying to import such a keyring and use it results in:

```
Error EACCES: access denied
```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1623417

This reverts commit 424815501a.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-09-07 17:21:55 +00:00
Tom Barron bf8f589958 run rados cmd in container if containerized deployment
When ceph-nfs is deployed containerized and ceph-common is not
installed on the host the start_nfs task fails because the rados
command is missing on the host.

Run rados commands from a ceph container instead so that
they will succeed.

Signed-off-by: Tom Barron <tpb@dyncloud.net>
2018-09-03 17:06:00 +00:00
Markos Chandras 217f35dbdb roles: ceph-rgw: Enable the ceph-radosgw target
If the ceph-radosgw target is not enabled, then enabling the
ceph-radosgw@ service has no effect since nothing will pull
it on the next reboot. As such, we need to ensure that the
target is enabled.

Signed-off-by: Markos Chandras <mchandras@suse.de>
2018-09-03 15:48:58 +02:00
Andy McCrae 772e6b9be2 Dont run client dummy container on non-x86_64 hosts
The dummy client container currently won't work on non-x86_64 hosts.
This PR creates a filtered client group that contains only hosts
that are x86_64 - which can then be the group to run the
dummy container against.

This is for the specific case of a containerized_deployment where
there is a mixture of non-x86_64 hosts and x86_64 hosts. As such
the filtered group will contain all hosts when running with
containerized_deployment: false.

Currently ppc64le is not supported for Ceph server components.
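
The group filtering can be sketched in Python as below. Hosts are modelled as dicts whose `architecture` key stands in for the `ansible_architecture` fact. The function name and host names are hypothetical.

```python
def dummy_container_hosts(hosts, containerized_deployment):
    """Members of the filtered client group (sketch)."""
    if not containerized_deployment:
        # Non-containerized runs keep every client host in the group.
        return [h["name"] for h in hosts]
    return [h["name"] for h in hosts if h["architecture"] == "x86_64"]

HOSTS = [
    {"name": "client0", "architecture": "x86_64"},
    {"name": "client1", "architecture": "ppc64le"},
]
print(dummy_container_hosts(HOSTS, containerized_deployment=True))   # ['client0']
print(dummy_container_hosts(HOSTS, containerized_deployment=False))  # ['client0', 'client1']
```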

Signed-off-by: Andy McCrae <andy.mccrae@gmail.com>
2018-08-31 11:34:00 +00:00
Sébastien Han 9ba670567e remove warning for unsupported variables
As promised, these will go unsupported for 3.1 so let's actually remove
them :).

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622729
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-28 13:31:57 -07:00
Sébastien Han 6d7fa99ff7 defaults: fix rgw_hostname
A couple of things were wrong in the initial commit:

* ceph_release_num[ceph_release] >= ceph_release_num['luminous'] will
never work since the ceph_release fact is set later, by either the
ceph-common or the ceph-docker-common role

* we can easily re-use the initial command to check if a cluster is
running; it's more elegant than running it twice.

* set the fact rgw_hostname on rgw nodes only

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1618678
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-22 17:46:00 +02:00