Commit Graph

3530 Commits (ecd3563c2128553d4145a2f9c940ff31458c33b4)
 

Author SHA1 Message Date
Attila Fazekas ecd3563c21 Deploying without managed monitors failed
Tripleo deployment failed when the monitors not manged
by tripleo itself with:
    FAILED! => {"msg": "list object has no element 0"}

The failing play item was introduced by
 f46217b69a .

fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1552327

Signed-off-by: Attila Fazekas <afazekas@redhat.com>
2018-04-04 18:16:46 +02:00
Guillaume Abrioux dcf6a246a4 defaults: remove `run_once: true` when creating fetch_directory
because of `serial: 1`, it can be an issue when the playbook is being
run on client nodes.
Since the refact of `ceph-client` we skip the role `ceph-defaults` on
every node except the first client node, it means that the task is not
going to be played because of `run_once: true`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-04-04 10:51:17 +02:00
Guillaume Abrioux 18c0c7a508 config: use fact `ceph_uid`
Use fact `ceph_uid` in the task which ensures `/etc/ceph` exists in
containerized deployments.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-04-04 10:51:17 +02:00
Guillaume Abrioux 9c979c6390 clients: refact `ceph-clients` role
This commit refacts this role so we don't have to pull container image
on client nodes just to create pools and keys.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1550977

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-04-04 10:51:17 +02:00
Guillaume Abrioux cefd471967 client: remove legacy code
This seems to be a leftover.
This commit removes an unnecessary 'set linux permissions' on
`/var/lib/ceph`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-04-04 10:51:17 +02:00
Guillaume Abrioux 9d3517c670 container: play docker-common only on first client node
This commit aims to set the default behavior to play
`ceph-docker-common` only on first node in clients group.

Currently, we play docker-common to pull container image so we can run
ceph commands in order to generate keys or create pools.
On a cluster with a large number of client nodes this can be time consuming
to proceed this way. An alternative would be to pull container image
only a first node and then copy keys on other nodes.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-04-04 10:51:17 +02:00
Guillaume Abrioux cf27c5e941 move selinux check to `ceph-defaults`
This check is alone in `ceph-docker-common` since a previous code
refactor.
Moving this check in `ceph-defaults` allows us to run `ceph-clients`
without having to run `ceph-docker-common` even in non-containerized
deployment.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-04-04 10:51:17 +02:00
Sébastien Han f3caee8460 ceph-iscsi: fix certificates generation and distribution
Prior to this patch, the certificates where being generated on a single
node only (because of the run_once: true). Thus certificates were not
distributed on all the gateway nodes.

This would require a second ansible run to work. This patches fix the
creation and keys's distribution on all the nodes.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1540845
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-04-04 09:27:39 +02:00
Guillaume Abrioux 5b73be254d do not delegate facts on client nodes
This commit is a workaround for
https://bugzilla.redhat.com/show_bug.cgi?id=1550977

We iterate over all nodes on each node and we delegate the facts gathering.
This is high memory consuming when having a large number of nodes in the
inventory.
That way of gathering is not necessary for clients node so we can simply
gather local facts for these nodes.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-04-04 09:23:03 +02:00
Guillaume Abrioux e32a177af8 purge-docker: remove redundant task
The `remove_packages` prompt is redundant to the `ireallymeanit` prompt
since it does exactly the same thing. I guess the only goal of this task
was to make a break to warn user about `--skip-tags=with_pkg` feature.
This warning should be part of the first prompt.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-04-03 11:54:42 +02:00
Randy J. Martinez ca572a11f1 ceph-mds: delete duplicate tasks which cause multimds container deployments to fail.
This update will resolve error['cephfs' is undefined.] in multimds container deployments.
See: roles/ceph-mon/tasks/create_mds_filesystems.yml. The same last two tasks are present there, and actully need to happen in that role since "{{ cephfs }}" gets defined in
roles/ceph-mon/defaults/main.yml, and not roles/ceph-mds/defaults/main.yml.

Signed-off-by: Randy J. Martinez <ramartin@redhat.com>
2018-03-29 09:32:40 +02:00
Alfredo Deza 3fcf966803 ceph-osd note that some scenarios use ceph-disk vs. ceph-volume
Signed-off-by: Alfredo Deza <adeza@redhat.com>
2018-03-29 09:11:33 +02:00
John Fulton e6e6bd078a Refer to expected-num-ojects as expected_num_objects, not size
Follow up patch to PR 2432 [1] which replaces "size" (sorry if
the original bug used that term, which can be confusing) with
expected_num_objects as is used in the Ceph documentation [2].

[1] https://github.com/ceph/ceph-ansible/pull/2432/files
[2] http://docs.ceph.com/docs/jewel/rados/operations/pools
2018-03-26 15:41:51 +02:00
Ning Yao 691ddf5349 cleanup osd.conf.j2 in ceph-osd
osd crush location is set by ceph_crush in the library,
osd.conf.j2 is not used any more.

Signed-off-by: Ning Yao <yaoning@unitedstack.com>
2018-03-26 15:57:37 +08:00
Patrick Donnelly 7f91547304 setup cephx keys when not nfs_obj_gw
Copy the admin key when configured nfs_file_gw (but not nfs_obj_gw). Also,
copy/setup RGW related directories only when configured as nfs_obj_gw.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2018-03-22 14:01:08 +01:00
Andrew Schoen 6cffbd5409 ceph-defaults: set is_atomic variable
This variable is needed for containerized clusters and is required for
the ceph-docker-common role. Typically the is_atomic variable is set in
site-docker.yml.sample though so if ceph-docker-common is used outside
of that playbook it needs set in another way. Moving the creation of
the variable inside this role means playbooks don't need to worry
about setting it.

fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1558252

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-03-21 19:16:11 +01:00
Andy McCrae fe4ba9d135 Fix config_template to consistently order sections
In ec042219e6 we added OrderedDict and
sorted to be able to preserve order for config_template k,v pairs inside
a section.

This patch adds a similar ordering for the sections themselves, which
could still change order and intiiate handler restarts.

OrderedDict isn't needed because we use .items() to return a list that
can then be sorted().
2018-03-16 23:24:28 +01:00
Andy McCrae 388562a4af Simplify ceph.conf generation
Since the approach to creating a ceph.conf file has changed, and now
no-longer relies on assembling config file fragments in /etc/ceph/ceph.d
we can avoid the conf_overrides rendering on the local host and skip out
the tasks related to that, instead using just the config_template task
to configure the file directly.
2018-03-15 15:47:41 +01:00
Sébastien Han e3275c1ca1 osd: add fs.aio-max-nr tuning
The number of osds per nodes is limited by aio-max-nr, default is low,
so we need to increase it.

Full story:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-August/020408.html

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1553407
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-15 14:06:26 +01:00
Sébastien Han f432819c1e osd: apply systcl right away
Without     sysctl_set: yes the sysctm tuning will only get applied on
the systctl.conf but not on the fly.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-15 14:06:26 +01:00
Sébastien Han 0f8a4251ba move system tuning to osd role
The changes from these tasks only apply to osd nodes so there is no
reason to have them in ceph-common.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-15 14:06:26 +01:00
Sébastien Han 3ab89ab48c ci: re-arrange group_vars files
We should stop putting everything in 'all'. This is too easy and this is
error prone as well for those who are separating variables into host
type, things that you should do.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-14 14:22:00 +01:00
Sébastien Han d5f8cac820 ci: remove left over iscsi_gws file
Wrong file that is not used, only iscsi-ggw that is present is correct.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-14 14:22:00 +01:00
Sébastien Han 8000ae342e remove unsed ceph_rgw_civetweb_port variable
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-14 14:22:00 +01:00
Sébastien Han f119b25bbe client: implement proper pools creation
Just like we did for the monitor and openstack_config we now have the
ability to precisely create pools.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-14 14:22:00 +01:00
Sébastien Han e302c1baae mon: add support for erasure code pool
You can now specify type: erasure and   erasure_profile to use when
declaring the pool dictionnary.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-14 14:22:00 +01:00
Sébastien Han 277d885bc9 mon: add support for pgp, pool type and rule name
When creating pools, it's crucial to expose all the options available as
part of the pool creation command. As explained in:
http://docs.ceph.com/docs/jewel/rados/operations/pools/

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-14 14:22:00 +01:00
Sébastien Han 4806ff4ff8 ci: test pool creation on container
On containerized scenario we also want to test pool creation.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-14 14:22:00 +01:00
Sébastien Han 26bc00fb74 mon: fail if pool creation fails
There is no reason to continue the deployment if these tasks fail.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1546185
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-14 14:22:00 +01:00
Sébastien Han 0011edd2bc mon: add support for expected-num-objects
This commit adds the support for expected-num-objects when creating a pool.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1541520
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-14 14:22:00 +01:00
Sébastien Han 18402b636f defaults: add useful info if daemon are not restarted properly
If OSDs don't restart normally we now also dump info of the crush map,
crush rules, crush tree and pools.

If the monitors don't restart normally we also print the socket status
by calling mon_status and quorum_status.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-14 14:22:00 +01:00
jtudelag 3a9d0c5535 Tune ansible.cfg
Based on the OpenShift one:
https://docs.openshift.com/container-platform/3.7/scaling_performance/install_practices.html#scaling-performance-install-optimization

* Increases number of forks.
* Disables host_key_checking
* Smart gathering facts
* Fact caching jsonfile
* Enables profile_tasks callback
* Mutliplexes ssh connections (ControlMaster)
* Enables pipelining
2018-03-14 13:51:13 +01:00
Andy McCrae 60d4b75f51 Cleanup plugins directories and references
Having callback_plugins, and action plugins in random locations causes
a lot of disparity.

We should centralize this into one place in the plugins directory and
fix up the ansible.cfg to reflect this.

Additionally, since the ansible.cfg already reflects action_plugins, we
don't need a link to action_plugins in the base of the repository.
2018-03-14 11:15:39 +01:00
jtudelag 691f7c5146 Adds handy ceph aliases whe containerized installations.
Same approach as openshift-ansible etcdctl:

* https://github.com/openshift/openshift-ansible/blob/release-3.7/roles/etcd/tasks/auxiliary/drop_etcdctl.yml
* https://github.com/openshift/openshift-ansible/blob/release-3.7/roles/etcd/etcdctl.sh
2018-03-08 13:56:39 +01:00
Guillaume Abrioux 9181c94adf client: fix pgs num for client pool creation
The `pools` dict defined in `roles/ceph-client/defaults/main.yml`
shouldn't have `{{ ceph_conf_overrides.global.osd_pool_default_pg_num
}}` as default value for `pgs` keys.

For instance, if you want some pools to be created but without explicitely
specifying the pgs for these pools (it means you want to use the
`osd_pool_default_pg_num`), you will be obliged to define
`{{ ceph_conf_overrides.global.osd_pool_default_pg_num }}` anyway while you
wanted to use the current default value already defined in the cluster which is
retrieved early in the playbook and stored in the
`{{ osd_pool_default_pg_num }}` fact.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-03-07 11:18:04 +01:00
Sébastien Han 96c049be5b common: run updatedb task on debian systems only
The command doesn't exist on Red Hat systems so it's better to skip it
instead of ignoring the error.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-06 15:24:31 +00:00
Sébastien Han a52ed43093 mon: fix osd_pool_default_crush_rule persistence and effectiveness
Running the last portion (insert new default and add new default crush
tasks) of crush_rules.yml only on the last monitor is
wrong since ceph CLI calls usually end up on the master having the
quorum, which is by default the one with the lower IP.
So if we run the  command and end up on another mon the creation will
happen on the default crush rule because the particular mon hasn't been
updated.
To fix this we remove the |last on the include and use run_once: true on
 certain tasks, then we let the final two tasks run on all the monitors.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-06 15:24:31 +00:00
Sébastien Han 47cef7a41d mon: fix set crush default rule
On releases after jewel the option
'osd_pool_default_crush_replicated_ruleset' does not exist anymore, it's
called osd_pool_default_crush_rule.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-06 15:24:31 +00:00
Sébastien Han 3261ab23b8 osd: remove old crush_location implementation
This was causing a lot of pain with the handlers. Also the
implementation was not ideal since we were assembling files. Everything
can now be done with the ceph_crush module so let's remove that.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-06 15:24:31 +00:00
Sébastien Han fc0fa48e0d test: add tests for creating crush tree
We now run tests on the newly created ceph_crush module. Now the CI will
create a specific hierarchy for the OSD.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-06 15:24:31 +00:00
Sébastien Han 73c4846744 mon: use ceph_crush module in the playbook
Instead of creating the CRUSH hierarchy with Ansible tasks using the
command module we now rely on the ceph_crush module.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-06 15:24:31 +00:00
Sébastien Han 5fac3784f7 add ceph_crush module
This module allows us to create Ceph CRUSH hierarchy. The module works
with
hostvars from individual OSD hosts.
Here is an example of the expected configuration in the inventory file:

[osds]
ceph-osd-01 osd_crush_location="{ 'root': 'mon-roottt', 'rack':
'mon-rackkkk', 'pod': 'monpod', 'host': 'localhost' }"  # valid case

Then, if create_crush_tree is enabled the module will create the
appropriate CRUSH buckets and their types in Ceph.

Some pre-requesites:

* a 'host' bucket must be defined
* at least two buckets must be defined (this includes the 'host')

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-06 15:24:31 +00:00
Greg Charot 78c1f1938f mons: Current crush_rule playbook does not work if there is no default rule defined (default: true).
One could want to add new crush rules while keeping his current default rule.
Fixed it so that it works with all rules defined as "default: false". If multiple rules are defined as default (should not be) then the last rule listed in "crush_rules" is taken as default.
2018-03-06 15:24:31 +00:00
Greg Charot 77f9c1df10 no reason the ceph-ansible ansible default provided crush_rule_hdd rule should be set as rack root + default ruleset 2018-03-06 15:24:31 +00:00
Greg Charot 50afc3fbf3 We don't want to automatically move the rbd pool to the new default crush rule. This operation shall be performed by the cluster operator. 2018-03-06 15:24:31 +00:00
Sébastien Han f2e0ceed78 add support for installation checkpoint
This was taken from the openshift ansible repository here:
https://github.com/leseb/openshift-ansible/tree/master/roles/installer_checkpoint

Rationale:

A complete OpenShift cluster installation is comprised of many different
components which can take 30 minutes to several hours to complete. If
the installation should fail, it could be confusing to understand at
which component the failure occurred. Additionally, it may be desired to
re-run only the component which failed instead of starting over from the
beginning. Components which came after the failed component would also
need to be run individually.

Ceph has a similar situation so we can benefit from that
callback_plugin.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-06 15:21:40 +00:00
Andy McCrae 04ca685ba7 Remove vars that are no longer used
As part of fcba2c801a these vars were
removed and no longer do anything:

radosgw_dns_name
radosgw_resolve_cname

This patch removes them from the group_vars files and defaults/main.yml
2018-03-06 09:16:25 +01:00
jtudelag c3267b77b7 Makes use of docker_exec_cmd in ceph-mon role.
Keeps consistency inside the role and among roles.
Makes the code more readable.
2018-03-05 12:48:35 +00:00
Sébastien Han cb0f598965 common: run updatedb task on debian systems only
The command doesn't exist on Red Hat systems so it's better to skip it
instead of ignoring the error.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-02 20:59:10 +00:00
Sébastien Han 7f19df8196 rgw: add cluster name option to the handler
If the cluster name is different than 'ceph', the command will fail so
we need to pass the cluster name.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-02 20:59:10 +00:00