Commit Graph

1958 Commits (c5e4e62ab5b836fc40cc24b9ac3402ff2828743d)

Author SHA1 Message Date
Giulio Fidente 6126210e0e Fix version check in ceph.conf template
We need to look for ceph_release when comparing with release names,
not ceph_version.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1631789
Signed-off-by: Giulio Fidente <gfidente@redhat.com>
2018-09-24 13:08:27 +02:00
Matthew Vernon 806461ac6e restart_osd_daemon.sh.j2 - use `+` rather than `{1,}` in regex
`+` is more idiomatic for "one or more" in a regex than `{1,}`; the
latter was introduced in a previous fix for an incorrect `{1,2}`
restriction.

Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
2018-09-24 10:33:46 +00:00
Matthew Vernon 04f4991648 restart_osd_daemon.sh.j2 - consider active+clean+* pgs as OK
After restarting each OSD, restart_osd_daemon.sh checks that the
cluster is in a good state before moving on to the next one. One of
the checks it does is that the number of pgs in the state
"active+clean" is equal to the total number of pgs in the cluster.

On large clusters (e.g. we have 173,696 pgs), it is likely that at
least one pg will be scrubbing and/or deep-scrubbing at any one
time. These pgs are in state "active+clean+scrubbing" or
"active+clean+scrubbing+deep", so the script was erroneously not
including them in the "good" count. Similar concerns apply to
"active+clean+snaptrim" and "active+clean+snaptrim_wait".

Fix this by considering as good any pg whose state contains
active+clean. Do this as an integer comparison to num_pgs in pgmap.
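
A minimal Python sketch of the counting rule described above. It assumes the status JSON exposes `pgmap.num_pgs` and a `pgmap.pgs_by_state` list of `{state_name, count}` entries. The real check lives in the shell template, and the function name here is hypothetical.

```python
import json


def all_pgs_clean(status_json: str) -> bool:
    """Return True if every PG is in a state containing 'active+clean'.

    Assumed layout: {"pgmap": {"num_pgs": N,
    "pgs_by_state": [{"state_name": ..., "count": ...}, ...]}}
    """
    pgmap = json.loads(status_json)["pgmap"]
    # Substring match, so active+clean+scrubbing(+deep), +snaptrim, etc. count as good.
    clean = sum(
        entry["count"]
        for entry in pgmap["pgs_by_state"]
        if "active+clean" in entry["state_name"]
    )
    # Integer comparison against the total number of PGs.
    return clean == pgmap["num_pgs"]


sample = json.dumps({
    "pgmap": {
        "num_pgs": 100,
        "pgs_by_state": [
            {"state_name": "active+clean", "count": 97},
            {"state_name": "active+clean+scrubbing", "count": 2},
            {"state_name": "active+clean+scrubbing+deep", "count": 1},
        ],
    }
})
print(all_pgs_clean(sample))  # True
```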

(could this be backported to at least stable-3.0 please?)

Closes: #2008
Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
2018-09-24 10:33:46 +00:00
Matthew Vernon aa97ecf048 restart_osd_daemon.sh.j2 - Reset RETRIES between calls of check_pgs
Previously RETRIES was set (by default to 40) once at the start of the
script; this meant that it would only ever wait for up to 40 lots of
30s across *all* the OSDs on a host before bombing out. In fact, we
want to be prepared to wait for the same amount of time after each OSD
restart for the cluster's pgs to be happy again before continuing.
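
The per-OSD retry budget can be sketched in Python as follows. `restart` and `pgs_ok` are hypothetical callbacks standing in for the script's restart command and `check_pgs`. The defaults mirror the 40 retries of 30s mentioned above.

```python
import time
from typing import Callable, Iterable


def restart_osds(osd_ids: Iterable[int],
                 restart: Callable[[int], None],
                 pgs_ok: Callable[[], bool],
                 retries: int = 40,
                 delay: float = 30.0) -> None:
    """Restart OSDs one by one, waiting for healthy PGs after each restart."""
    for osd_id in osd_ids:
        restart(osd_id)
        # The retry budget starts fresh for every OSD instead of being
        # shared across all OSDs on the host.
        for _ in range(retries):
            if pgs_ok():
                break
            time.sleep(delay)
        else:
            raise RuntimeError(f"PGs not healthy after restarting osd.{osd_id}")
```

With a shared counter, a host with many OSDs can exhaust the budget even when each OSD recovers well within the limit on its own.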

Closes: #3154
Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
2018-09-24 08:20:32 +00:00
John Spray 26bfef4107 Remove Calamari-related pieces
...with the exception of the purge operation, since
removing Calamari would still be useful for an old
cluster.

Signed-off-by: John Spray <john.spray@redhat.com>
2018-09-21 11:00:18 +01:00
Andrew Schoen 16ccac83fe ceph-config: calculate num_osds for the lvm batch scenario
For now our best guess is to count the number of devices and multiply
by osds_per_device. Ideally we'd like to run ceph-volume lvm batch
--report and get the number of OSDs that way, but currently we need
a ceph.conf in place already before we can do that. There is a tracker
ticket that would allow us to get around the need for a ceph.conf:
http://tracker.ceph.com/issues/36088
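
The estimate is a single multiplication. Here is a Python sketch; the function name and device paths are illustrative, not taken from the role.

```python
def estimate_num_osds(devices, osds_per_device=1):
    """Best-guess OSD count for the lvm batch scenario: devices x osds_per_device."""
    return len(devices) * osds_per_device


print(estimate_num_osds(["/dev/sdb", "/dev/sdc", "/dev/sdd"], osds_per_device=2))  # 6
```

This guess can diverge from what `ceph-volume lvm batch` actually creates, which is why the `--report` approach is preferred once it no longer needs a ceph.conf.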

Fixes: https://github.com/ceph/ceph-ansible/issues/3135

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-09-20 15:41:52 +00:00
Guillaume Abrioux 6d6fd514e0 config: set default _rgw_hostname value to respective host
the default value for _rgw_hostname was taken from the current node being
played while it should be taken from the respective node in the loop.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622505

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-09-18 20:10:34 +02:00
Andrew Schoen 8afad35f5a ceph-config: default devices and lvm_volumes when setting num_osds
This avoids errors when the osd scenario chosen does not require
setting devices or lvm_volumes. The default values for these are not
set because they exist in the ceph-osd role, not ceph-defaults.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-09-18 17:02:33 +00:00
Neha Ojha 27027a17d3 osd: add osd memory target option
BlueStore's cache is sized conservatively by default, so that it does
not overwhelm under-provisioned servers. The default is 1G for HDD, and
3G for SSD.

To replace the page cache, as much memory as possible should be given to
BlueStore. This is required for good performance. Since ceph-ansible
knows how much memory a host has, it can set

`bluestore cache size = max(total host memory / num OSDs on this host * safety factor, 1G)`

Due to fragmentation and other memory use not included in bluestore's
cache, a safety factor of 0.5 for dedicated nodes and 0.2 for
hyperconverged nodes is recommended.
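
The sizing rule above, transcribed into a small Python function as a sketch. Memory is in bytes, and "1G" is taken as 1 GiB, which is an assumption about the unit.

```python
GiB = 1024 ** 3


def bluestore_cache_size(total_host_memory, num_osds, safety_factor):
    """max(total host memory / num OSDs on this host * safety factor, 1G)."""
    per_osd = total_host_memory / num_osds * safety_factor
    return max(int(per_osd), 1 * GiB)


# Dedicated node: 128 GiB RAM, 12 OSDs, safety factor 0.5 -> about 5.3 GiB per OSD.
dedicated = bluestore_cache_size(128 * GiB, 12, 0.5)
# Hyperconverged node: 32 GiB RAM, 12 OSDs, safety factor 0.2 -> clamped to 1 GiB.
hci = bluestore_cache_size(32 * GiB, 12, 0.2)
print(round(dedicated / GiB, 2), hci // GiB)  # 5.33 1
```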

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1595003

Signed-off-by: Neha Ojha <nojha@redhat.com>
Co-Authored-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-09-18 10:12:46 +00:00
Mike Christie 8fcd63cc50 igw: enable and start rbd-target-api
The commit:

commit 1164cdc002
Author: Guillaume Abrioux <gabrioux@redhat.com>
Date:   Thu Aug 2 11:58:47 2018 +0200

    iscsigw: install ceph-iscsi-cli package

installs the cli package but does not start and enable the
rbd-target-api daemon needed for gwcli to communicate with the igw
nodes. This patch just enables and starts it for the non-container
setup. The container setup is already doing this.

This fixes bz https://bugzilla.redhat.com/show_bug.cgi?id=1613963

Signed-off-by: Mike Christie <mchristi@redhat.com>
2018-09-13 19:35:45 +00:00
Guillaume Abrioux a6f77340fd nfs: ignore error on semanage command for ganesha_t
As of RHEL 7.6, it has been decided that it no longer makes sense to
confine `ganesha_t`, which means this domain won't exist anymore.

Let's add a `failed_when: false` so that the deployment does not fail
when this command is run.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1626070

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-09-13 13:06:47 +02:00
Andrew Schoen b36f3e06b5 ceph_volume: adds the osds_per_device parameter
If this is set to anything other than the default value of 1 then the
--osds-per-device flag will be used by the batch command to define how
many osds will be created per device.
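
How the flag ends up on the command line can be sketched in Python. The argument order and the function name are assumptions for illustration, not the module's exact invocation.

```python
def build_batch_cmd(devices, osds_per_device=1):
    """Build a `ceph-volume lvm batch` argv, adding the flag only when needed."""
    cmd = ["ceph-volume", "lvm", "batch"]
    if osds_per_device != 1:
        # Only non-default values are passed through.
        cmd += ["--osds-per-device", str(osds_per_device)]
    return cmd + list(devices)


print(build_batch_cmd(["/dev/sdb", "/dev/sdc"], osds_per_device=2))
```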

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-09-12 20:27:14 +00:00
Guillaume Abrioux 1c88c444a3 mon: fix `ExecStartPre` option in systemd unit file
This command line is not supported.
According to the official documentation:

```
Note that shell command lines are not directly supported.
If shell command lines are to be used,
they need to be passed explicitly to a shell implementation of some kind.
```

We must run this using /bin/sh instead.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-09-11 10:48:21 +02:00
Guillaume Abrioux 9ff26e80f2 defaults: add a default value to rgw_hostname
let's add ansible_hostname as a default value for rgw_hostname if no
hostname in servicemap matches ansible_fqdn.
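
A Python sketch of the fallback. It assumes the service map is shaped like `{"services": {"rgw": {"daemons": {...}}}}` with daemon names as keys, and the helper name is hypothetical. The `.get` chain also tolerates a missing `rgw` entry.

```python
def pick_rgw_hostname(servicemap, ansible_fqdn, ansible_hostname):
    """Use the fqdn if rgw registered under it, else fall back to the short name."""
    # Assumed layout: {"services": {"rgw": {"daemons": {"<name>": {...}}}}}
    daemons = servicemap.get("services", {}).get("rgw", {}).get("daemons", {})
    if ansible_fqdn in daemons:
        return ansible_fqdn
    return ansible_hostname


smap = {"services": {"rgw": {"daemons": {"rgw0.example.com": {}}}}}
print(pick_rgw_hostname(smap, "rgw0.example.com", "rgw0"))  # rgw0.example.com
print(pick_rgw_hostname({}, "rgw1.example.com", "rgw1"))     # rgw1
```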

Fixes: #3063
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622505

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-09-10 12:07:44 +02:00
Guillaume Abrioux ecbd3e4558 Revert "client: add quotes to the dict values"
This commit adds quotes that make the keyring unusable

eg:

```
client.john
        key: AQAN0RdbAAAAABAAH5D3WgMN9Rxw3M8jkpMIfg==
        caps: [mds] ''
        caps: [mgr] 'allow *'
        caps: [mon] 'allow rw'
        caps: [osd] 'allow rw'
```

Trying to import such a keyring and use it will result in:

```
Error EACCES: access denied
```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1623417

This reverts commit 424815501a.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-09-07 17:21:55 +00:00
Tom Barron bf8f589958 run rados cmd in container if containerized deployment
When ceph-nfs is deployed containerized and ceph-common is not
installed on the host the start_nfs task fails because the rados
command is missing on the host.

Run rados commands from a ceph container instead so that
they will succeed.

Signed-off-by: Tom Barron <tpb@dyncloud.net>
2018-09-03 17:06:00 +00:00
Markos Chandras 217f35dbdb roles: ceph-rgw: Enable the ceph-radosgw target
If the ceph-radosgw target is not enabled, then enabling the
ceph-radosgw@ service has no effect since nothing will pull
it on the next reboot. As such, we need to ensure that the
target is enabled.

Signed-off-by: Markos Chandras <mchandras@suse.de>
2018-09-03 15:48:58 +02:00
Andy McCrae 772e6b9be2 Don't run client dummy container on non-x86_64 hosts
The dummy client container currently won't work on non-x86_64 hosts.
This PR creates a filtered client group that contains only hosts
that are x86_64 - which can then be the group to run the
dummy container against.

This is for the specific case of a containerized_deployment where
there is a mixture of non-x86_64 hosts and x86_64 hosts. As such
the filtered group will contain all hosts when running with
containerized_deployment: false.
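
The group filtering can be sketched in Python as follows, with `hosts` a hypothetical mapping of inventory hostname to its `ansible_architecture` fact.

```python
def dummy_container_hosts(hosts, containerized_deployment):
    """Return the hosts the dummy client container should run on."""
    if not containerized_deployment:
        # Non-containerized: the filtered group keeps every client host.
        return list(hosts)
    return [name for name, arch in hosts.items() if arch == "x86_64"]


clients = {"client0": "x86_64", "client1": "ppc64le"}
print(dummy_container_hosts(clients, containerized_deployment=True))   # ['client0']
print(dummy_container_hosts(clients, containerized_deployment=False))  # ['client0', 'client1']
```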

Currently ppc64le is not supported for Ceph server components.

Signed-off-by: Andy McCrae <andy.mccrae@gmail.com>
2018-08-31 11:34:00 +00:00
Sébastien Han 9ba670567e remove warning for unsupported variables
As promised, these will go unsupported for 3.1 so let's actually remove
them :).

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622729
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-28 13:31:57 -07:00
Sébastien Han 6d7fa99ff7 defaults: fix rgw_hostname
A couple of things were wrong in the initial commit:

* ceph_release_num[ceph_release] >= ceph_release_num['luminous'] will
never work since the ceph_release fact is only set later, by the roles
that run afterwards (either ceph-common or ceph-docker-common sets it)

* we can easily re-use the initial command to check if a cluster is
running, it's more elegant than running it twice.

* set the fact rgw_hostname on rgw nodes only

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1618678
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-22 17:46:00 +02:00
Andy McCrae 18684b7209 Sync config_template with base plugin
The config_template plugin exists in the ceph-common role so that
config_template will still work with ansible galaxy.

This PR syncs the config_template module from the base of the repo in
plugins/actions to the ceph-common role.

Signed-off-by: Andy McCrae <andy.mccrae@gmail.com>
2018-08-21 16:10:33 +00:00
Sébastien Han 8c70a5b197 osd: fix ceph_release
We need ceph_release in the condition, not ceph_stable_release

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1619255
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-20 20:14:56 +02:00
Markos Chandras 126e2e3f92 roles: ceph-defaults: Check if 'rgw' attribute exists for rgw_hostname
If there are no services on the cluster, then the 'rgw' key could be missing
and the task is failing with the following problem:

msg": "The task includes an option with an undefined variable.
The error was: 'dict object' has no attribute 'rgw'

We fix this by checking the existence of the 'rgw' attribute. If it's
missing, we skip the task since the role already contains code to set
a good default rgw_hostname.

Signed-off-by: Markos Chandras <mchandras@suse.de>
2018-08-20 11:37:45 +02:00
Markos Chandras 37e50114de roles: ceph-defaults: Delegate cluster information task to monitor node
Since commit f422efb1d6 ("config: ensure
rgw section has the correct name") we observe the following failures in
new Ceph deployments with OpenStack-Ansible:

fatal: [aio1_ceph-rgw_container-fc588f0a]: FAILED! => {"changed": false,
"cmd": "ceph --cluster ceph -s -f json", "msg": "[Errno 2] No such file
or directory"

This is because the task executes 'ceph' but at this point no package
installation has happened. Packages are normally installed in the
'ceph-common' role which runs after the 'ceph-defaults' one.

Since we are looking to obtain cluster information, the task should be
delegated to a monitor node, similar to other tasks in that role.

Signed-off-by: Markos Chandras <mchandras@suse.de>
2018-08-20 11:37:45 +02:00
Dardo D Kleiner f6519e4003 mgr: improve/fix disabled modules check
Follow up on 36942af698

"disabled_modules" is always a list, it's the items in the list that
can be dicts in mimic.  Many ways to fix this, here's one.

Signed-off-by: Dardo D Kleiner <dardokleiner@gmail.com>
2018-08-20 11:23:58 +02:00
Sébastien Han 3149b2564f Revert "osd: generate device list for osd_auto_discovery on rolling_update"
This reverts commit e84f11e99e.

This commit was causing a new failure later during the rolling_update
process. Basically, it was modifying the list of devices and started
impacting ceph-osd itself. The modification to accommodate the
osd_auto_discovery parameter should happen outside of ceph-osd.

Also we are trying to not play ceph-osd role during the rolling_update
process so we can speed up the upgrade.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-16 11:13:12 +02:00
Markos Chandras 7172737f13 roles: ceph-defaults: Set ceph_uid on SUSE distributions
The ceph_uid is also '167' on SUSE systems so extend the existing task.

Signed-off-by: Markos Chandras <mchandras@suse.de>
2018-08-13 19:02:57 +00:00
Guillaume Abrioux 36942af698 mgr: backward compatibility for module management
Follow up on 3abc253fec

The structure had even changed within `luminous` release.
It was first:

```
{
    "enabled_modules": [
        "balancer",
        "dashboard",
        "restful",
        "status"
    ],
    "disabled_modules": [
        "influx",
        "localpool",
        "prometheus",
        "selftest",
        "zabbix"
    ]
}
```
Then it changed to:

```
{
  "enabled_modules": [
      "status"
  ],
  "disabled_modules": [
      "balancer",
      "dashboard",
      "influx",
      "localpool",
      "prometheus",
      "restful",
      "selftest",
      "zabbix"
  ]
}
```

and finally:
```
{
  "enabled_modules": [
      "status"
  ],
  "disabled_modules": [
      {
          "name": "balancer",
          "can_run": true,
          "error_string": ""
      },
      {
          "name": "dashboard",
          "can_run": true,
          "error_string": ""
      }
  ]
}
```
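
One way to accept all three layouts is to normalise `disabled_modules` to a list of names before comparing. Here is a Python sketch; the function name is hypothetical.

```python
def disabled_module_names(mgr_modules):
    """Return disabled mgr module names whether entries are strings or dicts."""
    return [
        item["name"] if isinstance(item, dict) else item
        for item in mgr_modules.get("disabled_modules", [])
    ]


old_style = {"disabled_modules": ["influx", "zabbix"]}
new_style = {"disabled_modules": [{"name": "balancer", "can_run": True, "error_string": ""}]}
print(disabled_module_names(old_style))  # ['influx', 'zabbix']
print(disabled_module_names(new_style))  # ['balancer']
```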

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-08-13 13:25:06 +00:00
Guillaume Abrioux 8b5e3cd999 validate: fail if fqdn deployment attempted
The option to deploy with fqdn caused a lot of trouble: it adds a lot
of complexity because of the multiple cases to handle and the relation
between ceph-ansible and ceph-container. Moreover, there is no benefit
to such a feature.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1613155

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-08-13 10:04:24 +02:00
Guillaume Abrioux f422efb1d6 config: ensure rgw section has the correct name
the ceph.conf.j2 always assumes the hostname used to register the
radosgw in the servicemap is equivalent to `{{ ansible_hostname }}`
which returns the shortname form.

We need to detect which form of the hostname was used in the case of an
already deployed cluster and update the ceph.conf accordingly.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1580408

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-08-13 10:04:24 +02:00
Guillaume Abrioux db29b5b84d config: clean template, remove useless conditions
there is no need to have all these conditions.

for instance, assuming `mds_group_name` is set to 'mdss':

  - `if groups[mds_group_name] is defined` checks if `'mdss'` is present in `{{ groups }}`

  - `if {{ mds_group_name }} in group_names` checks if the current node is part
  of the group `'mdss'`

  - `if inventory_hostname in groups.get(mds_group_name, [])` checks if
  the current node is part of the group 'mdss'

The third condition is enough to cover the need of ensuring we are
running on a mds node.
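
The surviving condition, expressed in Python with made-up inventory data (`groups` mirrors Ansible's magic variable of the same name):

```python
def is_mds_node(inventory_hostname, groups, mds_group_name="mdss"):
    """Equivalent of `inventory_hostname in groups.get(mds_group_name, [])`."""
    return inventory_hostname in groups.get(mds_group_name, [])


groups = {"mons": ["mon0"], "mdss": ["mds0"]}
print(is_mds_node("mds0", groups))              # True
print(is_mds_node("mon0", groups))              # False
print(is_mds_node("mds0", {"mons": ["mon0"]}))  # False: group absent, no error
```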

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-08-13 10:04:24 +02:00
Sébastien Han 4c9e24a90f mon: fix calamari initialisation
If calamari is already installed and ceph has been upgraded to a higher
version, the initialisation will fail later. So if we detect that
calamari-server is too old compared to ceph_rhcs_version, we try to
update it.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1601755
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-10 14:14:23 +02:00
Andrew Schoen 6423ab4ad3 lvm: fix condition when selecting which scenario to run
devices and lvm_volumes will always be defined, so we need to instead
check their length before deciding to run the scenario.
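
The difference between a definedness test and a length test, in Python terms. The values are illustrative and stand in for the role's defaults.

```python
devices = []  # always defined (e.g. by a role default), but empty here
lvm_volumes = [{"data": "data-lv1", "data_vg": "data-vg1"}]

# A definedness check passes even for an empty list, so it cannot pick a scenario.
print(devices is not None)  # True
# A length check only passes when there is something to deploy.
run_devices_scenario = len(devices) > 0
run_lvm_volumes_scenario = len(lvm_volumes) > 0
print(run_devices_scenario, run_lvm_volumes_scenario)  # False True
```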

This fixes the failure here:
https://2.jenkins.ceph.com/job/ceph-ansible-prs-luminous-bluestore_lvm_osds/86/consoleFull#1667273050b5dd38fa-a56e-4233-a5ca-584604e56e3a

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-08-10 11:46:12 +02:00
Sébastien Han e84f11e99e osd: generate device list for osd_auto_discovery on rolling_update
rolling_update relies on the list of devices when performing the restart
of the OSDs. The task that builds the devices list out of the
ansible_devices dict only runs when there are no partitions on the
drives. However, during an upgrade the OSDs are already configured: they
have been prepared and have partitions, so this task won't run and thus
the devices list will be empty, skipping the restart during
rolling_update. We now run the same task under different requirements
when rolling_update is true and build a list when:

* osd_auto_discovery is true
* rolling_update is true
* ansible_devices exists
* no dm/lv are part of the discovery
* the device is not removable
* the device has more than 1 sector
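
The selection criteria above can be sketched in Python over an `ansible_devices`-style dict. The assumptions here are that each entry reports `removable` and `sectors` as strings, as Ansible's Linux hardware facts usually do, and that device-mapper devices are named `dm-*`. The function name is hypothetical.

```python
def auto_discovered_devices(ansible_devices, osd_auto_discovery=True,
                            rolling_update=True):
    """Build a /dev/* list from ansible_devices-style facts (sketch)."""
    if not (osd_auto_discovery and rolling_update and ansible_devices):
        return []
    devices = []
    for name, info in ansible_devices.items():
        if name.startswith("dm-"):
            # Skip device-mapper (dm/LV) devices.
            continue
        if str(info.get("removable", "0")) == "1":
            # Skip removable media.
            continue
        if int(info.get("sectors", 0)) <= 1:
            # Keep only devices with more than 1 sector.
            continue
        devices.append("/dev/" + name)
    return devices


facts = {
    "sda": {"removable": "0", "sectors": "976773168"},
    "dm-0": {"removable": "0", "sectors": "104857600"},
    "sr0": {"removable": "1", "sectors": "2097151"},
    "loop0": {"removable": "0", "sectors": "0"},
}
print(auto_discovered_devices(facts))  # ['/dev/sda']
```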

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1613626
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-10 09:19:40 +02:00
Andrew Schoen 3592c68cca ceph-osd: adds crush_device_class config option
This is used with the lvm osd scenario. When using devices you need the
option to set the crush device class for all of the OSDs that are
created from those devices.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-08-09 09:41:58 -04:00
Andrew Schoen 6d431ec22d ceph-volume: implement the 'lvm batch' subcommand
This adds the action 'batch' to the ceph-volume module so that we can
run the new 'ceph-volume lvm batch' subcommand. A functional test is
also included.

If devices is defined and osd_scenario is lvm then the 'ceph-volume lvm
batch' command will be used to create the OSDs.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-08-09 09:41:58 -04:00
Sébastien Han 4d64dd4686 rgw: ability to use ceph-ansible vars into containers
Since the container now simply reads the ceph.conf, we remove all the
unnecessary options.

Also this PR is the foundation to support multiple backends, such as the
new 'beast' from Ceph Mimic.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1582411
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-09 14:13:17 +02:00
Sébastien Han 3bce117de2 rgw: remove unused file
copy_configs.yml was not included anywhere and is a leftover, so let's remove it.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-09 14:13:17 +02:00
Sébastien Han 5a89479abe rgw: remove useless condition
The include does not need a condition on containerized_deployment since
we are already in an include that has the same condition.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-09 14:13:17 +02:00
Graeme Gillies a46025820d Allow mgr bootstrap keyring to be defined
In environments where we wish to have manual/greater control over
how the bootstrap keyrings are used, we need to be able to externally
define what the mgr keyring secret will be and have ceph-ansible
use it, instead of it being autogenerated.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1610213

Signed-off-by: Graeme Gillies <ggillies@akamai.com>
2018-08-08 19:09:01 +00:00
Artur Fijalkowski 52d9d406b1 Fix regular expression matching OSD IDs on non-containerized deployments
restart_osd_daemon.sh is used to discover and restart all OSDs on a
host. To do so, the script loops over the list of ceph-osd@ services on
the system. This commit fixes a bug in the regular expression responsible
for extracting the OSD IDs: the prior version used the `[0-9]{1,2}`
expression, which ignores all OSDs whose IDs are greater than 99 (i.e.
longer than 2 digits). The fix removes the upper limit on the number of
digits. This problem existed in two places in the script.
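
A quick Python `re` illustration of the bug and of the fix, including the later switch from `{1,}` to the equivalent `+`. The fully anchored unit-name patterns are hypothetical simplifications, not the script's actual expressions.

```python
import re

# Hypothetical, fully anchored unit-name patterns for illustration.
old = re.compile(r"ceph-osd@([0-9]{1,2})\.service")
fixed = re.compile(r"ceph-osd@([0-9]{1,})\.service")
idiomatic = re.compile(r"ceph-osd@([0-9]+)\.service")

units = ["ceph-osd@5.service", "ceph-osd@99.service", "ceph-osd@100.service"]

print([bool(old.fullmatch(u)) for u in units])           # [True, True, False]
print([fixed.fullmatch(u).group(1) for u in units])      # ['5', '99', '100']
print([idiomatic.fullmatch(u).group(1) for u in units])  # ['5', '99', '100']
```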

Closes: #2964

Signed-off-by: Artur Fijalkowski <artur.fijalkowski@ing.com>
2018-08-06 15:53:49 +00:00
Guillaume Abrioux 1164cdc002 iscsigw: install ceph-iscsi-cli package
Install ceph-iscsi-cli in order to provide the `gwcli` command tool.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1602785

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-08-06 14:11:52 +02:00
Guillaume Abrioux 0a6ff6bbf8 defaults: backward compatibility with fqdn deployments
This commit ensures we are backward compatible with fqdn deployments.
Since ceph-container enforces deployment to be done with shortname, we
must keep backward compatibility with clusters already deployed with
fqdn configuration.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-08-06 10:14:58 +00:00
Sébastien Han ea9e60d48d config: enforce socket name
This was introduced by
59ee2e8d3b
and made our socket checks impossible to run. The PID can be found,
but the cctid cannot.
This happens during upgrades to mimic and on clusters running mimic.

So let's force the admin socket name back to what it was, so we can
properly check for existing instances. Also, the
$cluster-$name.$pid.$cctid.asok form is only needed when running multiple
instances of the same daemon, something ceph-ansible cannot do at the
time of writing.
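
A Python sketch of why the fixed name matters: the plain form can be computed from the cluster and daemon name alone, while the per-instance form also needs the pid and the cctid, which are not known in advance. The `/var/run/ceph` directory and the helper name are assumptions.

```python
def admin_socket(cluster, name, run_dir="/var/run/ceph", pid=None, cctid=None):
    """Build an admin socket path; the per-instance form needs pid and cctid."""
    if pid is None or cctid is None:
        # Predictable: $cluster-$name.asok
        return f"{run_dir}/{cluster}-{name}.asok"
    # Per-instance: $cluster-$name.$pid.$cctid.asok
    return f"{run_dir}/{cluster}-{name}.{pid}.{cctid}.asok"


print(admin_socket("ceph", "mon.mon0"))  # /var/run/ceph/ceph-mon.mon0.asok
```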

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1610220
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-07-31 10:58:04 +02:00
Mike Christie 6f72f96dad igw: do not fail purge on rbd removal errors
Instead of failing the entire purge operation when the rbd command fails
just log an error. This will allow the higher level target and config
cleanup to complete, and the user only has to manually delete the rbd
images.

Signed-off-by: Mike Christie <mchristi@redhat.com>
2018-07-31 10:08:26 +02:00
Mike Christie d572a9a602 igw: fix image removal during purge
We were not passing in the ceph conf info into the rbd image removal
command, so if the clustername was not the default igw purge would fail
due to the rbd rm command failing.

This just fixes the bug by passing in the ceph conf info which has the
clustername to use.

This fixes Red Hat bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=1601949

Signed-off-by: Mike Christie <mchristi@redhat.com>
2018-07-31 10:08:26 +02:00
Sébastien Han 2ca8c51906 osd: do not remove expose_partition container
The container runs with --rm which means it will be deleted by Docker
when exiting. Also 'docker rm -f' is not idempotent and returns 1 if the
container does not exist.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1609007
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-07-30 10:38:15 +02:00
Guillaume Abrioux 1ecbbbdcfa rbd-mirror: bring back compatibility with jewel deployment
rbd-mirror can't start when deploying jewel because it needs the admin
keyring.
Bringing back this task restores backward compatibility for jewel
deployments.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-26 18:47:10 +00:00
Guillaume Abrioux 053709da97 ceph-osds: backward compatibility with jewel for osp pools creation
If we want to be backward compatible with releases prior to luminous, we
have to set the rule name according to the default values used in jewel.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-26 18:47:10 +00:00
Guillaume Abrioux 2597a557c5 client: fix an incorrect title in a task
This task runs on both containerized *and* non-containerized
deployments.
Let's have a proper title to avoid confusion.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-26 15:57:41 +02:00
Sébastien Han e2ea5bac51 rgw: add more config option for civetweb frontend
In containerized deployments we now inherit from the
radosgw_civetweb_options options when bootstrapping the container.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1582411
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-07-25 13:19:14 +00:00
Giulio Fidente e85e5ea781 Run creation of empty rados index object to first monitor
When distributing the ceph-nfs role, creation of the rados index object
fails because it assumes client.admin is available locally.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1607970
Signed-off-by: Giulio Fidente <gfidente@redhat.com>
2018-07-25 11:40:11 +02:00
Sébastien Han 235d1b3f55 validate: add checks for interfaces
Check if the interface provided:

* exists in the gathered facts (thus on the system)
* is active
* has an IP address (depending on ip_version)
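
The three checks can be sketched in Python over gathered facts. The assumptions are that interface facts live under `ansible_<iface>`, that `active` is a boolean, and that `ipv4` is a dict while `ipv6` is a list. The function name is hypothetical.

```python
def interface_problems(facts, iface, ip_version="ipv4"):
    """Return a list of problems found for `iface` (empty list means OK)."""
    info = facts.get("ansible_" + iface)
    if info is None:
        return [f"{iface} not found in gathered facts"]
    problems = []
    if not info.get("active", False):
        problems.append(f"{iface} is not active")
    # ipv4 is assumed to be a dict and ipv6 a list; both are falsy when empty.
    if not info.get(ip_version):
        problems.append(f"{iface} has no {ip_version} address")
    return problems


facts = {"ansible_eth0": {"active": True, "ipv4": {"address": "192.168.0.10"}, "ipv6": []}}
print(interface_problems(facts, "eth0"))          # []
print(interface_problems(facts, "eth0", "ipv6"))  # ['eth0 has no ipv6 address']
```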

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600227
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-07-24 17:59:30 +02:00
Guillaume Abrioux af82e7523d tests: test master against ansible 2.6
Ansible 2.4 is currently end-of-life.
Ansible 2.5 will go end-of-life after Ansible 2.7 is released.

Fixes: #2901

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-23 11:59:15 +00:00
Sébastien Han 7fc13bc9d5 validate: only run osd test on osd node
Do not run device validation on every host, only on OSD nodes.

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-19 12:46:18 +00:00
Sébastien Han cf01e596b6 validate: improve device check
We now make sure that:

* devices are actually block special files
* length of dedicated_device is identical to devices
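
Both checks can be sketched in Python with `os.stat` and `stat.S_ISBLK`. The function and parameter names are hypothetical, not the role's.

```python
import os
import stat


def device_problems(devices, dedicated_devices=None):
    """Check that devices are block special files and lists have equal length."""
    problems = []
    for dev in devices:
        try:
            mode = os.stat(dev).st_mode
        except FileNotFoundError:
            problems.append(f"{dev} does not exist")
            continue
        if not stat.S_ISBLK(mode):
            problems.append(f"{dev} is not a block special file")
    if dedicated_devices is not None and len(dedicated_devices) != len(devices):
        problems.append("dedicated devices and devices differ in length")
    return problems


print(device_problems(["/"]))  # ['/ is not a block special file']
```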

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-07-18 14:26:22 +00:00
Guillaume Abrioux 1a626d3c61 nfs: change default stable branch for nfs-ganesha repo
Since `V2.6-stable` is available and has packages for `mimic`, let's
update this default value accordingly so nfs nodes can be deployed with
mimic.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-13 08:20:27 +00:00
Sébastien Han e61ca882a1 validate: force ansible version
We currently only support Ansible 2.4.X so let's fail if the version is
different.
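
A Python sketch of the version gate. It assumes a dotted numeric version string such as the `full` field of Ansible's `ansible_version` variable, and the function name is hypothetical.

```python
def ansible_version_supported(version: str) -> bool:
    """Accept only Ansible 2.4.x."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) == (2, 4)


for v in ("2.4.6.0", "2.5.1", "2.6.0"):
    print(v, ansible_version_supported(v))
```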

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-07-13 07:52:56 +00:00
Guillaume Abrioux 5ef5fcd0b6 client: do not rely on copy_admin_key to import keys
Relying on `copy_admin_key` to import created keys on client nodes forces
us to copy the admin key onto those nodes, which we might not
want.
We should use the fact `condition_copy_admin_key` which will be set to
`True` when the delegated node is a mon which means we can import keys
without taking care of admin keyring.

Fixes: #2867

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-13 06:52:00 +00:00
Guillaume Abrioux ce5ac930c5 mgr: fix condition to add modules to ceph-mgr
Follow up on #2784

We must check the generated fact `_disabled_ceph_mgr_modules` when
enabling a disabled mgr module.
Otherwise, this task will be skipped because it's not comparing the
right list.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600155

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-12 21:04:01 +00:00
Guillaume Abrioux 9f54b3b4a7 mon: ensure socket is purged when mon is stopped
On containerized deployment, if a mon is stopped, the socket is not
purged and can cause failure when a cluster is redeployed after the
purge playbook has been run.

Typical error:

```
fatal: [osd0]: FAILED! => {}

MSG:

'dict object' has no attribute 'osd_pool_default_pg_num'
```

The fact is not set because of this earlier failure:

```
ok: [mon0] => {
    "changed": false,
    "cmd": "docker exec ceph-mon-mon0 ceph --cluster test daemon mon.mon0 config get osd_pool_default_pg_num",
    "delta": "0:00:00.217382",
    "end": "2018-07-09 22:25:53.155969",
    "failed_when_result": false,
    "rc": 22,
    "start": "2018-07-09 22:25:52.938587"
}

STDERR:

admin_socket: exception getting command descriptions: [Errno 111] Connection refused

MSG:

non-zero return code
```

This failure happens when the ceph-mon service has been stopped: since
the socket isn't purged, the leftover socket confuses the process.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-10 20:08:07 +00:00
Guillaume Abrioux d0746e0858 common: switch from docker module to docker_container
As of ansible 2.4, `docker` module has been removed (was deprecated
since ansible 2.1).
We must switch to `docker_container` instead.

See: https://docs.ansible.com/ansible/latest/modules/docker_module.html#docker-module

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-10 20:08:07 +00:00
Shilpa Jagannath 07852ed039 Remove zone from zonegroup and update period before deleting the zone to avoid inconsistent period information across other zones.
When you delete a zone without removing it from the zonegroup, the period update
fails, since that command needs to load the zone and zonegroup to be able to
update the master. The period update fails with an error like this:

radosgw-admin period update --commit
-1 Cannot find zone id= (name=), switching to local zonegroup configuration
-1 Cannot find zone id= (name=)

Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
2018-07-09 12:27:24 +00:00
Sébastien Han b9f7df7ba2 common: remove hdparm
As of Kraken, the journal code does not use the hdparm command anymore
so we can remove it from our package dependency list.

Fixes: https://github.com/ceph/ceph-ansible/issues/1402
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit f6910efa24389c264062963b2054c7cd29ffebb3)
2018-07-07 08:53:47 +00:00
Sébastien Han 713b9fcf9b ceph-config: do not log cluster log on container
The container image recently merged both cluster and mon log into a
single stream. Following this, we now see this warning coming from the
container image:

2018-06-19 13:44:01.542990 7ff75b024700  1 mon.vm02@1(peon).log
v57928205 unable to write to '/var/log/ceph/ceph.log' for channel
'cluster': (2) No such file or directory

So we now tell the mon to not log cluster log on the filesystem.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1591771
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-07-05 15:11:45 +00:00
Sébastien Han fcf11ecc35 ceph-common: fix rhcs condition
We forgot to add mgr_group_name when checking for the mon repo, thus the
conditional on the next task was failing.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1598185
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-07-04 17:17:21 +02:00
Guillaume Abrioux 3abc253fec mgr: fix enabling of mgr module on mimic
The data structure has slightly changed on mimic.

Prior to mimic, it used to be:

```
{
    "enabled_modules": [
        "status"
    ],
    "disabled_modules": [
        "balancer",
        "dashboard",
        "influx",
        "localpool",
        "prometheus",
        "restful",
        "selftest",
        "zabbix"
    ]
}
```

From mimic it looks like this:

```
{
    "enabled_modules": [
        "status"
    ],
    "disabled_modules": [
        {
            "name": "balancer",
            "can_run": true,
            "error_string": ""
        },
        {
            "name": "dashboard",
            "can_run": true,
            "error_string": ""
        }
    ]
}
```

This means we can't simply check `item in
_ceph_mgr_modules.disabled_modules`.

The idea here is to use the filter `map(attribute='name')` to build a list
when deploying mimic.

Fixes: #2766

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-03 21:19:16 +00:00
Sébastien Han 63658c05c7 ceph-client: do not kill the dummy container
The container runs for 300 sec, then dies and removes itself thanks to
the '--rm' option, so there is no point in removing it. Also, removing it
causes failures under some circumstances.

Closing: https://bugzilla.redhat.com/show_bug.cgi?id=1568157
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-07-03 16:09:52 +00:00
Sébastien Han a629408967 ceph-mds: enable application pool
We now enable the application type 'cephfs' for each cephfs pool we
create.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1590275
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-07-02 10:28:34 +00:00
Sébastien Han 103c279c21 ceph-defaults: add default application to pool
We now add a default 'rbd' application type to each pool we create. This
removes the warning: "application not enabled on N pool(s)"
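The commit does not show the task itself. A minimal sketch, assuming the standard `ceph osd pool application enable <pool> <app>` CLI, of the command such a task would run. The helper name and the optional container prefix are hypothetical; the same shape would apply to the 'cephfs' application from the ceph-mds commit.

```python
from typing import List, Optional


def pool_application_enable_cmd(
    pool: str,
    application: str = "rbd",
    cluster: str = "ceph",
    exec_prefix: Optional[List[str]] = None,
) -> List[str]:
    """Build the argv that tags a pool with an application type."""
    # exec_prefix would hold something like a "docker exec <container>" prefix when containerized.
    cmd = list(exec_prefix or [])
    cmd += ["ceph", "--cluster", cluster, "osd", "pool", "application", "enable", pool, application]
    return cmd


if __name__ == "__main__":
    print(" ".join(pool_application_enable_cmd("volumes")))
    print(" ".join(pool_application_enable_cmd("cephfs_data", application="cephfs")))
```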

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1590275
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-07-02 10:28:34 +00:00
Vasu Kulkarni 1d454b611f Enable monitor repo for mgr nodes and Tools repo for iscsi/nfs/clients
Signed-off-by: Vasu Kulkarni <vasu@redhat.com>
2018-06-29 18:09:26 +00:00
Sébastien Han abdb53e16a ceph-osd: trigger osd container restart on script change
The script ceph-osd-run.sh holds the config options used to start the
container; if one of these options is modified, we must restart the
container. This was not the case before because the 'notify' flag
wasn't present.

Closing: https://bugzilla.redhat.com/show_bug.cgi?id=1596061
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-28 17:54:13 +02:00
Sébastien Han f623997271 systemd: remove changed_when: false
When using a module there is no need to apply this Ansible option: the
module handles idempotency on its own and decides whether or not the
task has changed during execution.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-28 17:54:13 +02:00
George Shuklin 653b483fc3 Add ceph_keyring_permissions variable to control permissions for
keyring files in /etc/ceph. The default value is unchanged (0600),
but this variable allows the user to override it (e.g. set it to 0640).
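A minimal Python sketch of what the variable controls, assuming the mode is kept as a string (so the leading zero survives) and interpreted as octal. `apply_keyring_permissions` is a hypothetical helper, not the role's implementation, which presumably passes the value to an Ansible file `mode` instead.

```python
import os
import stat
import tempfile


def apply_keyring_permissions(path: str, mode: str = "0600") -> int:
    """Apply an octal mode string such as "0600" or "0640" and return the resulting mode bits."""
    os.chmod(path, int(mode, 8))
    return stat.S_IMODE(os.stat(path).st_mode)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(suffix=".keyring") as keyring:
        print(oct(apply_keyring_permissions(keyring.name)))          # default mode 0600
        print(oct(apply_keyring_permissions(keyring.name, "0640")))  # overridden mode 0640
```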

Signed-off-by: George Shuklin <george.shuklin@gmail.com>
2018-06-28 15:48:39 +00:00
Ha Phan a7b7735b6f ceph-mon: Generate initial keyring
Minor fix so that the initial keyring can be generated using python3.
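The commit does not show the snippet. A hedged sketch of a generator that runs on Python 3 (and, having no annotations, on Python 2), assuming the usual Ceph secret layout: a little-endian header holding key type 1 (AES), creation seconds, nanoseconds and secret length, followed by 16 random bytes, all base64-encoded. On Python 3 `base64.b64encode` returns bytes, so the result is decoded before printing, which is the kind of difference a python3 fix has to handle.

```python
import base64
import os
import struct
import time


def generate_initial_keyring_secret():
    """Return a base64 secret suitable for a [mon.] keyring entry (sketch)."""
    key = os.urandom(16)
    # Header: key type (1 = AES), created seconds, created nanoseconds, secret length.
    header = struct.pack("<hiih", 1, int(time.time()), 0, len(key))
    # b64encode returns bytes on Python 3; decode so we print "AQ..." rather than "b'AQ...'".
    return base64.b64encode(header + key).decode()


if __name__ == "__main__":
    print(generate_initial_keyring_secret())
```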

Signed-off-by: Ha Phan <thanhha.work@gmail.com>
2018-06-28 10:39:56 +02:00
Ha Phan b7b8aba47b Generate a copy of ceph.conf locally
Refers to #2697

This change creates a copy of `ceph.conf` on the Ansible server.

Signed-off-by: Ha Phan <thanhha.work@gmail.com>
2018-06-28 07:39:30 +00:00
Andy McCrae a4a3d9a01b Fix package state for upgrades on SuSE/RHEL
In 226f80c22b, only Debian package installs had the correct state set
to ensure packages were upgraded when the "upgrade_ceph_packages" var was
set to true. This applies the same behaviour to SuSE and RHEL.

Signed-off-by: Andy McCrae <andy.mccrae@gmail.com>
2018-06-27 18:55:22 +00:00
Sébastien Han 322e2de7d2 mon: honour mon_docker_net_host option
--net=host was hardcoded in the startup line, so even when
mon_docker_net_host was set to False the net option was always
activated.
mon_docker_net_host is set to True by default, so this commit does not
change the default behaviour.
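The fix amounts to making the flag conditional. A simplified Python sketch of that logic; the real startup line is a Jinja2 template, and the helper, container name and image below are placeholders.

```python
from typing import List


def mon_docker_run_args(image: str, mon_docker_net_host: bool = True) -> List[str]:
    """Build a simplified, hypothetical monitor container start command."""
    args = ["docker", "run", "--rm", "--name", "ceph-mon"]
    # Only request host networking when the option is enabled, instead of hardcoding it.
    if mon_docker_net_host:
        args.append("--net=host")
    args.append(image)
    return args


if __name__ == "__main__":
    print(" ".join(mon_docker_run_args("ceph/daemon")))
    print(" ".join(mon_docker_run_args("ceph/daemon", mon_docker_net_host=False)))
```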

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-27 13:44:41 +00:00
Michel Rode 7774935707 Added 'squash' as a parameter to nfs-ganesha.
Set the default to 'root_squash', which is the nfs-ganesha default.
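A hedged Python sketch of how a `squash` parameter could flow into an nfs-ganesha EXPORT block. The helper and example paths are hypothetical, and the block is trimmed: a working export also needs an FSAL section and access settings.

```python
def render_export_block(export_id: int, path: str, pseudo: str, squash: str = "root_squash") -> str:
    """Render a trimmed EXPORT block with a configurable Squash option."""
    lines = [
        "EXPORT",
        "{",
        f"    Export_ID = {export_id};",
        f'    Path = "{path}";',
        f'    Pseudo = "{pseudo}";',
        # Defaults to root_squash, matching nfs-ganesha's own default.
        f"    Squash = {squash};",
        "}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_export_block(1, "/", "/cephfs", squash="no_root_squash"))
```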

Signed-off-by: Michel Rode <rmichel@devnu11.net>
2018-06-25 09:13:17 +02:00
Christian Zunker 48394597c9 reset failed count of ceph-mgr
Depending on your setup, ceph-mgr might get restarted multiple times.
When this is done too fast, systemd will prevent further restarts because of
the limits configured in the ceph-mgr systemd unit file.

Resetting the failure count will prevent this problem. The reset is done before
the restart so in case of a real problem during the restart it still fails.
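A minimal sketch of the ordering described above, assuming the unit is named `ceph-mgr@<hostname>`; the helper names are hypothetical. `systemctl reset-failed` clears a unit's failed state and flushes its start rate counter, and the restart follows so a real failure still surfaces.

```python
import subprocess
from typing import List


def mgr_restart_commands(hostname: str) -> List[List[str]]:
    """Commands to clear systemd's failure counter, then restart ceph-mgr."""
    unit = f"ceph-mgr@{hostname}"
    return [
        # Flush the failed state so repeated restarts do not hit the start limit.
        ["systemctl", "reset-failed", unit],
        # A genuine failure during this restart is still reported.
        ["systemctl", "restart", unit],
    ]


def restart_mgr(hostname: str) -> None:
    """Run the commands in order, raising CalledProcessError if one fails."""
    for cmd in mgr_restart_commands(hostname):
        subprocess.run(cmd, check=True)
```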

Fixes: #2768

Signed-off-by: Christian Zunker <christian.zunker@codecentric.cloud>
2018-06-20 13:59:16 +02:00
Sébastien Han bea4027f0c common: start firewalld if configure_firewall
Currently, if configure_firewall is set to True, we expect firewalld to be
enabled and running. Let's enforce that.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1589146
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-18 04:02:50 -04:00
Sébastien Han a9ed3579ae mon/osd: bump container memory limit
As discussed with the core developers, the current limits are too low and
should be bumped to higher values.
So now, by default, monitors get 3GB and OSDs get 5GB.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1591876
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-17 11:20:27 -04:00
Guillaume Abrioux 51cf3b7fa0 client: try to kill dummy container only on first client node
The 'dummy' container is created only on the first client node, so we
must only try to destroy it on that node; otherwise this
can cause failures like the following:
```
fatal: [192.168.24.8]: FAILED! => {"changed": false, "cmd": ["docker", "rm",
"-f", "ceph-create-keys"], "delta": "0:00:00.023692", "end": "2018-06-12
20:56:07.261278", "msg": "non-zero return code", "rc": 1, "start":
"2018-06-12 20:56:07.237586", "stderr": "Error response from daemon: No such
container: ceph-create-keys", "stderr_lines": ["Error response from daemon: No
such container: ceph-create-keys"], "stdout": "", "stdout_lines": []}

```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1590746

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-13 16:10:46 +02:00
Patrick Donnelly 9ce81ae845 ceph-mds: do not enable multimds on jewel
Multiple active MDS became stable in Luminous.

Introduced-by: c8573fe0d7
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2018-06-12 10:47:34 +02:00
Sébastien Han 2e8412734a common: ability to enable/disable fw configuration
Prior to this patch, if you were running on a Red Hat system,
ceph-ansible would try to configure firewalld for you without the
operator's consent.
Now you can enable or disable the firewall configuration by setting
configure_firewall to either true or false.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1589146
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-11 21:51:59 +02:00
Konstantin Shalygin 3a07568496 ceph-osd: set 'openstack_keys_tmp' only when 'openstack_config' is defined.
If 'openstack_config' is false this task shouldn't be executed.

Signed-off-by: Konstantin Shalygin <k0ste@k0ste.ru>
2018-06-11 13:03:55 +02:00
Vishal Kanaujia 1a610df02b Fix to run secure cluster only once in a run
The current secure cluster play runs on all the monitors. Rerunning
this task is unnecessary, so it can be skipped.

Fixes: #2737

Signed-off-by: Vishal Kanaujia <vishal.kanaujia@flipkart.com>
2018-06-11 08:37:29 +02:00
Guillaume Abrioux 090ecff94e client: keyrings aren't created when single client node
Combining `run_once: true` with `inventory_hostname ==
groups.get(client_group_name) | first` might cause a bug when the only
node being run is not the first in the group.

In a deployment with a single client node, this can cause issues because
the keyring sometimes won't be created, since the task could be skipped
entirely.
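A simplified Python model of the failure, not Ansible's implementation. It assumes that a `run_once` task is evaluated (including its `when`) only on the first host of the current batch, and that `groups` still lists the full inventory when the play runs on a subset of hosts (for example with `--limit`). Host and group names are hypothetical.

```python
from typing import Dict, List


def run_once_host(play_hosts: List[str]) -> str:
    """Simplified model: a run_once task executes on the first host of the batch."""
    return play_hosts[0]


def keyring_task_executes(
    play_hosts: List[str], groups: Dict[str, List[str]], client_group_name: str = "clients"
) -> bool:
    """Would a run_once task guarded by 'inventory_hostname == groups[...] | first' run?"""
    members = groups.get(client_group_name) or []
    if not play_hosts or not members:
        return False
    # The condition is only checked for the run_once host; if that host is not
    # the first group member, the task is skipped for every host.
    return run_once_host(play_hosts) == members[0]


if __name__ == "__main__":
    groups = {"clients": ["client0", "client1"]}
    print(keyring_task_executes(["client0", "client1"], groups))  # full run
    print(keyring_task_executes(["client1"], groups))             # only client1 in the play
```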

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1588093

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-08 15:05:47 +02:00
Sébastien Han 20c8065e48 ceph-iscsi: rename group iscsi_gws
Let's try to avoid using dashes, as testinfra needs to be able to read
the groups.
For instance, with iscsi-gws we can't add a marker for these iscsi nodes;
using an underscore fixes the issue.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-08 10:21:54 +02:00
Sébastien Han 91bf53ee93 ceph-iscsi: support for containerized deployment
We now have the ability to deploy a containerized version of ceph-iscsi.
The result is similar to the non-containerized version: you simply have
3 containers running for the following services:

* rbd-target-api
* rbd-target-gw
* tcmu-runner

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1508144
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-08 10:21:54 +02:00
Guillaume Abrioux 8a653cacd5 client: add a default value for keyring file
A potential error occurs if someone doesn't pass the mode in the `keys` dict for
client nodes:

```
fatal: [client2]: FAILED! => {}

MSG:

The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'mode'

The error appears to have been in '/home/guits/ceph-ansible/roles/ceph-client/tasks/create_users_keys.yml': line 117, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

- name: get client cephx keys
  ^ here

exception type: <class 'ansible.errors.AnsibleUndefinedVariable'>
exception: 'dict object' has no attribute 'mode'

```

Adding a default value prevents the deployment from failing for this reason.
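In the role the fix is presumably an `item.mode | default(...)` expression; a hedged Python equivalent using `dict.get` is below. The fallback value is not stated in the commit, so `0600` (the keyring default mentioned in the `ceph_keyring_permissions` commit in this log) is an assumption, as are the example key entries and the helper name.

```python
from typing import Any, Dict

DEFAULT_KEYRING_MODE = "0600"  # hypothetical fallback chosen for this sketch


def keyring_mode(key: Dict[str, Any], default: str = DEFAULT_KEYRING_MODE) -> str:
    """Mirror Jinja2's `default` filter: fall back only when 'mode' is missing."""
    return key.get("mode", default)


if __name__ == "__main__":
    keys = [
        {"name": "client.test", "caps": {"mon": "allow r"}},
        {"name": "client.test2", "caps": {"mon": "allow r"}, "mode": "0640"},
    ]
    for key in keys:
        print(key["name"], keyring_mode(key))
```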

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-07 17:26:35 +02:00
Guillaume Abrioux 5eacc8f8d8 tests: add a dummy value for 'dev' release
Functional tests are broken when testing against 'dev' release (ceph).
Adding a dummy value here will make it possible to run ceph-ansible CI
against dev ceph release.

Typical error:

```
>       if request.node.get_marker("from_luminous") and ceph_release_num[ceph_stable_release] < ceph_release_num['luminous']:
E       KeyError: 'dev'
```
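A minimal sketch of the mapping the test indexes, assuming the usual major version numbers for the named releases. The value the commit actually uses for 'dev' is not shown here, so 99 is a placeholder that simply makes 'dev' sort as newer than every release.

```python
ceph_release_num = {
    "jewel": 10,
    "kraken": 11,
    "luminous": 12,
    "mimic": 13,
    # Hypothetical dummy value: larger than every release so 'dev' sorts as newest.
    "dev": 99,
}


def is_older_than_luminous(ceph_stable_release: str) -> bool:
    """Mirror the marker check; without a 'dev' key this raises KeyError: 'dev'."""
    return ceph_release_num[ceph_stable_release] < ceph_release_num["luminous"]


if __name__ == "__main__":
    for release in ("jewel", "luminous", "dev"):
        print(release, is_older_than_luminous(release))
```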

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fd1487d93f21b609a637053f5b33cd2a4e408d00)
2018-06-07 13:59:17 +02:00
Andrew Schoen 24ef47b0e5 ceph-common: move firewall checks after package installation
We need to do this because on dev or rhcs installs ceph_stable_release
is not mandatory, and the firewall checks include a task that is
conditional on the installed version of ceph. If we perform those
checks after package installation, they will not fail on dev or rhcs
installs.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-06-07 13:59:17 +02:00
Guillaume Abrioux 7b156deb67 client: use dummy created container when there is no mon in inventory
The `docker_exec_cmd` fact set in the client role when there is no monitor
in the inventory is wrong: `ceph-client-{{ hostname }}` is never created, so
it will fail anyway.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-07 16:16:38 +08:00
Guillaume Abrioux 433ecc7cbc osd: copy openstack keys over to all mon
When configuring openstack, the created keyrings aren't copied over to
all monitor nodes.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1588093

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-07 13:58:57 +08:00
Patrick Donnelly 91f9da530f change max_mds default to 1
Otherwise, with the removal of mds_allow_multimds, the default of 3 will be set
on every new FS.

Introduced by: c8573fe0d7

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1583020
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2018-06-06 12:16:42 +08:00
Vishal Kanaujia 2cdb0d1812 Syntax error fix in rgw multisite role
This check-in fixes a syntax error in a `when` clause of the RGW
multisite role.

Fixes: #2704

Signed-off-by: Vishal Kanaujia <vishal.kanaujia@flipkart.com>
2018-06-05 16:01:07 +05:30
Guillaume Abrioux 2cf06b515f rgw: refact rgw pools creation
Refactor of 8704144e31.
There is no need to have duplicated tasks for this. The rgw pools
creation should be delegated to a monitor node so we don't have to care
whether the admin keyring is present on the rgw node.
Also, only one task is needed to create the pools; we just need to
use the `docker_exec_cmd` fact already defined in `ceph-defaults` to
achieve it.
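A hedged Python sketch of the single-task idea: the same command is built whether or not the cluster is containerized, with the `docker_exec_cmd` prefix prepended when set. The helper is hypothetical, and the container name `ceph-mon-mon0`, the pool name and the pg count are placeholders.

```python
import shlex
from typing import List, Optional


def rgw_pool_create_cmd(
    pool: str, pg_num: int, cluster: str = "ceph", docker_exec_cmd: Optional[str] = None
) -> List[str]:
    """Build one pool-creation command to run on a monitor, containerized or not."""
    # When containerized, docker_exec_cmd holds a prefix like "docker exec <mon container>";
    # on bare metal it is empty and the ceph CLI runs directly.
    prefix = shlex.split(docker_exec_cmd) if docker_exec_cmd else []
    return prefix + ["ceph", "--cluster", cluster, "osd", "pool", "create", pool, str(pg_num)]


if __name__ == "__main__":
    print(" ".join(rgw_pool_create_cmd("default.rgw.buckets.data", 8)))
    print(" ".join(rgw_pool_create_cmd("default.rgw.buckets.data", 8,
                                       docker_exec_cmd="docker exec ceph-mon-mon0")))
```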

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1550281

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-05 15:00:20 +08:00
Ha Phan 1f3c9ce4f3 Use python instead of python2
The initial keyring is generated locally on the Ansible server, and the snippet works with both Python 2 and Python 3.

I don't see any reason why we should explicitly invoke `python2` instead of just `python`.

In some setups there is no `python2` symlink, while `python` and `python3` refer to v2 and v3 respectively.

Signed-off-by: Ha Phan <thanhha.work@gmail.com>
2018-06-04 14:24:10 +02:00
Sébastien Han db50aec13d ceph-common: add firewall rules for ceph-mgr
Prior to this commit, the firewall tasks were not opening the ceph-mgr
ports. This would lead to an unclean configuration since the ceph-mgr
daemons cannot connect to the OSDs.
This commit opens the right ports on the ceph-mgr nodes to talk with the
OSDs.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1526400
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-04 12:11:41 +02:00