3719 Commits (c94ada69e80d7a1ddfbd2de2b13086d57a6fdfcd)
 

Author SHA1 Message Date
Alfredo Deza 3fcf966803 ceph-osd note that some scenarios use ceph-disk vs. ceph-volume
Signed-off-by: Alfredo Deza <adeza@redhat.com>
2018-03-29 09:11:33 +02:00
John Fulton e6e6bd078a Refer to expected-num-objects as expected_num_objects, not size
Follow up patch to PR 2432 [1] which replaces "size" (sorry if
the original bug used that term, which can be confusing) with
expected_num_objects as is used in the Ceph documentation [2].

[1] https://github.com/ceph/ceph-ansible/pull/2432/files
[2] http://docs.ceph.com/docs/jewel/rados/operations/pools
2018-03-26 15:41:51 +02:00
Ning Yao 691ddf5349 cleanup osd.conf.j2 in ceph-osd
The OSD CRUSH location is now set by ceph_crush in the library, so
osd.conf.j2 is not used any more.

Signed-off-by: Ning Yao <yaoning@unitedstack.com>
2018-03-26 15:57:37 +08:00
Patrick Donnelly 7f91547304 setup cephx keys when not nfs_obj_gw
Copy the admin key when configured as nfs_file_gw (but not nfs_obj_gw). Also,
copy/setup RGW related directories only when configured as nfs_obj_gw.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2018-03-22 14:01:08 +01:00
Andrew Schoen 6cffbd5409 ceph-defaults: set is_atomic variable
This variable is needed for containerized clusters and is required by
the ceph-docker-common role. The is_atomic variable is typically set in
site-docker.yml.sample, so if ceph-docker-common is used outside of that
playbook it needs to be set in another way. Moving the creation of the
variable inside this role means playbooks don't need to worry about
setting it.

fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1558252

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-03-21 19:16:11 +01:00
Andy McCrae fe4ba9d135 Fix config_template to consistently order sections
In ec042219e6 we added OrderedDict and
sorted to be able to preserve order for config_template k,v pairs inside
a section.

This patch adds a similar ordering for the sections themselves, which
could still change order and initiate handler restarts.

OrderedDict isn't needed because we use .items() to return a list that
can then be sorted().
2018-03-16 23:24:28 +01:00
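The ordering idea in the commit above can be sketched in a few lines of Python. This is a minimal illustration, not the actual config_template plugin code: the `render_ini` helper and the sample section names are hypothetical, and it only assumes the config is a plain mapping of sections to key/value pairs.

```python
def render_ini(config):
    """Render {section: {key: value}} with sections and keys in sorted order.

    Hypothetical helper for illustration only; it is not part of ceph-ansible.
    """
    lines = []
    # .items() returns pairs that sorted() can order, so no OrderedDict is needed.
    for section, options in sorted(config.items()):
        lines.append("[{}]".format(section))
        for key, value in sorted(options.items()):
            lines.append("{} = {}".format(key, value))
        lines.append("")
    return "\n".join(lines)


# Two mappings with the same content but different insertion order.
first = {"osd": {"osd_journal_size": 1024},
         "global": {"fsid": "abc", "cluster": "ceph"}}
second = {"global": {"cluster": "ceph", "fsid": "abc"},
          "osd": {"osd_journal_size": 1024}}

print(render_ini(first) == render_ini(second))
```

Because both renderings are byte-identical, a task that writes the file would report no change on the second run, so no restart handler would be notified.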
Andy McCrae 388562a4af Simplify ceph.conf generation
Since the approach to creating a ceph.conf file has changed and no
longer relies on assembling config file fragments in /etc/ceph/ceph.d,
we can avoid rendering conf_overrides on the local host and skip the
tasks related to that, instead using just the config_template task
to configure the file directly.
2018-03-15 15:47:41 +01:00
Sébastien Han e3275c1ca1 osd: add fs.aio-max-nr tuning
The number of OSDs per node is limited by aio-max-nr. The default is low,
so we need to increase it.

Full story:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-August/020408.html

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1553407
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-15 14:06:26 +01:00
Sébastien Han f432819c1e osd: apply sysctl right away
Without sysctl_set: yes the sysctl tuning will only get written to
sysctl.conf and not applied on the fly.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-15 14:06:26 +01:00
Sébastien Han 0f8a4251ba move system tuning to osd role
The changes from these tasks only apply to osd nodes so there is no
reason to have them in ceph-common.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-15 14:06:26 +01:00
Sébastien Han 3ab89ab48c ci: re-arrange group_vars files
We should stop putting everything in 'all'. It is too easy, and it is
also error prone for those who separate variables by host type, which is
what you should do.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-14 14:22:00 +01:00
Sébastien Han d5f8cac820 ci: remove left over iscsi_gws file
This file is wrong and not used; only the iscsi-ggw file that is present is correct.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-14 14:22:00 +01:00
Sébastien Han 8000ae342e remove unused ceph_rgw_civetweb_port variable
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-14 14:22:00 +01:00
Sébastien Han f119b25bbe client: implement proper pools creation
Just like we did for the monitor and openstack_config we now have the
ability to precisely create pools.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-14 14:22:00 +01:00
Sébastien Han e302c1baae mon: add support for erasure code pool
You can now specify type: erasure and the erasure_profile to use when
declaring the pool dictionary.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-14 14:22:00 +01:00
Sébastien Han 277d885bc9 mon: add support for pgp, pool type and rule name
When creating pools, it's crucial to expose all the options available as
part of the pool creation command. As explained in:
http://docs.ceph.com/docs/jewel/rados/operations/pools/

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-14 14:22:00 +01:00
Sébastien Han 4806ff4ff8 ci: test pool creation on container
In the containerized scenario we also want to test pool creation.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-14 14:22:00 +01:00
Sébastien Han 26bc00fb74 mon: fail if pool creation fails
There is no reason to continue the deployment if these tasks fail.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1546185
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-14 14:22:00 +01:00
Sébastien Han 0011edd2bc mon: add support for expected-num-objects
This commit adds the support for expected-num-objects when creating a pool.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1541520
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-14 14:22:00 +01:00
Sébastien Han 18402b636f defaults: add useful info if daemons are not restarted properly
If OSDs don't restart normally we now also dump info of the crush map,
crush rules, crush tree and pools.

If the monitors don't restart normally we also print the socket status
by calling mon_status and quorum_status.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-14 14:22:00 +01:00
jtudelag 3a9d0c5535 Tune ansible.cfg
Based on the OpenShift one:
https://docs.openshift.com/container-platform/3.7/scaling_performance/install_practices.html#scaling-performance-install-optimization

* Increases number of forks.
* Disables host_key_checking
* Smart gathering facts
* Fact caching jsonfile
* Enables profile_tasks callback
* Multiplexes ssh connections (ControlMaster)
* Enables pipelining
2018-03-14 13:51:13 +01:00
Andy McCrae 60d4b75f51 Cleanup plugins directories and references
Having callback_plugins, and action plugins in random locations causes
a lot of disparity.

We should centralize this into one place in the plugins directory and
fix up the ansible.cfg to reflect this.

Additionally, since the ansible.cfg already reflects action_plugins, we
don't need a link to action_plugins in the base of the repository.
2018-03-14 11:15:39 +01:00
jtudelag 691f7c5146 Adds handy ceph aliases for containerized installations.
Same approach as openshift-ansible etcdctl:

* https://github.com/openshift/openshift-ansible/blob/release-3.7/roles/etcd/tasks/auxiliary/drop_etcdctl.yml
* https://github.com/openshift/openshift-ansible/blob/release-3.7/roles/etcd/etcdctl.sh
2018-03-08 13:56:39 +01:00
Guillaume Abrioux 9181c94adf client: fix pgs num for client pool creation
The `pools` dict defined in `roles/ceph-client/defaults/main.yml`
shouldn't have `{{ ceph_conf_overrides.global.osd_pool_default_pg_num
}}` as default value for `pgs` keys.

For instance, if you want some pools to be created without explicitly
specifying their pgs (meaning you want to use
`osd_pool_default_pg_num`), you are obliged to define
`{{ ceph_conf_overrides.global.osd_pool_default_pg_num }}` anyway, even though
you wanted to use the current default value already defined in the cluster,
which is retrieved early in the playbook and stored in the
`{{ osd_pool_default_pg_num }}` fact.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-03-07 11:18:04 +01:00
Sébastien Han 96c049be5b common: run updatedb task on debian systems only
The command doesn't exist on Red Hat systems so it's better to skip it
instead of ignoring the error.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-06 15:24:31 +00:00
Sébastien Han a52ed43093 mon: fix osd_pool_default_crush_rule persistence and effectiveness
Running the last portion (insert new default and add new default crush
tasks) of crush_rules.yml only on the last monitor is
wrong since ceph CLI calls usually end up on the master having the
quorum, which is by default the one with the lowest IP.
So if we run the command and end up on another mon, the creation will
happen on the default crush rule because that particular mon hasn't been
updated.
To fix this we remove the |last on the include and use run_once: true on
certain tasks, then we let the final two tasks run on all the monitors.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-06 15:24:31 +00:00
Sébastien Han 47cef7a41d mon: fix set crush default rule
On releases after jewel the option
'osd_pool_default_crush_replicated_ruleset' does not exist anymore, it's
called osd_pool_default_crush_rule.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-06 15:24:31 +00:00
Sébastien Han 3261ab23b8 osd: remove old crush_location implementation
This was causing a lot of pain with the handlers. Also the
implementation was not ideal since we were assembling files. Everything
can now be done with the ceph_crush module so let's remove that.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-06 15:24:31 +00:00
Sébastien Han fc0fa48e0d test: add tests for creating crush tree
We now run tests on the newly created ceph_crush module. Now the CI will
create a specific hierarchy for the OSD.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-06 15:24:31 +00:00
Sébastien Han 73c4846744 mon: use ceph_crush module in the playbook
Instead of creating the CRUSH hierarchy with Ansible tasks using the
command module we now rely on the ceph_crush module.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-06 15:24:31 +00:00
Sébastien Han 5fac3784f7 add ceph_crush module
This module allows us to create Ceph CRUSH hierarchy. The module works
with hostvars from individual OSD hosts.
Here is an example of the expected configuration in the inventory file:

[osds]
ceph-osd-01 osd_crush_location="{ 'root': 'mon-roottt', 'rack': 'mon-rackkkk', 'pod': 'monpod', 'host': 'localhost' }"  # valid case

Then, if create_crush_tree is enabled the module will create the
appropriate CRUSH buckets and their types in Ceph.

Some prerequisites:

* a 'host' bucket must be defined
* at least two buckets must be defined (this includes the 'host')

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-06 15:24:31 +00:00
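The two prerequisites listed in the commit above can be expressed as a small check. This is only a sketch under stated assumptions: `parse_crush_location` is a hypothetical helper, not a function from the ceph_crush module, and it assumes the inventory value is a Python-style dict literal like the one in the example.

```python
import ast


def parse_crush_location(raw):
    """Parse an osd_crush_location string and check the two prerequisites.

    Hypothetical helper for illustration; the real module may validate differently.
    """
    location = ast.literal_eval(raw.strip())
    if not isinstance(location, dict):
        raise ValueError("osd_crush_location must be a dictionary")
    if "host" not in location:
        raise ValueError("a 'host' bucket must be defined")
    if len(location) < 2:
        # The 'host' bucket counts towards the minimum of two buckets.
        raise ValueError("at least two buckets must be defined")
    return location


valid = "{ 'root': 'mon-roottt', 'rack': 'mon-rackkkk', 'pod': 'monpod', 'host': 'localhost' }"
print(sorted(parse_crush_location(valid)))
```

A location that only contains `'host'`, or one that omits it, raises `ValueError` in this sketch, which mirrors the "valid case" comment in the inventory example.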
Greg Charot 78c1f1938f mons: Current crush_rule playbook does not work if there is no default rule defined (default: true).
One may want to add new crush rules while keeping the current default rule.
Fixed it so that it works with all rules defined as "default: false". If multiple rules are defined as default (which should not happen) then the last rule listed in "crush_rules" is taken as default.
2018-03-06 15:24:31 +00:00
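The selection behaviour described in the commit above (no default when every rule has "default: false", last one wins when several are flagged) can be sketched as follows. The `pick_default_rule` function is hypothetical, and the rule names and the `name` key are assumptions made for the example; only the `default` key and the `crush_rules` list come from the commit message.

```python
def pick_default_rule(crush_rules):
    """Return the name of the rule to use as default, or None if none is flagged.

    Hypothetical helper; the rule dictionaries are assumed to carry a 'name' key.
    """
    chosen = None
    for rule in crush_rules:
        # A later rule flagged as default overrides an earlier one.
        if rule.get("default", False):
            chosen = rule["name"]
    return chosen


crush_rules = [
    {"name": "rule_a", "default": False},
    {"name": "rule_b", "default": True},
    {"name": "rule_c", "default": True},
]
print(pick_default_rule(crush_rules))
```

When the function returns `None`, a caller would leave the cluster's current default rule untouched.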
Greg Charot 77f9c1df10 no reason the ceph-ansible ansible default provided crush_rule_hdd rule should be set as rack root + default ruleset
2018-03-06 15:24:31 +00:00
Greg Charot 50afc3fbf3 We don't want to automatically move the rbd pool to the new default crush rule. This operation shall be performed by the cluster operator.
2018-03-06 15:24:31 +00:00
Sébastien Han f2e0ceed78 add support for installation checkpoint
This was taken from the openshift ansible repository here:
https://github.com/leseb/openshift-ansible/tree/master/roles/installer_checkpoint

Rationale:

A complete OpenShift cluster installation is comprised of many different
components which can take 30 minutes to several hours to complete. If
the installation should fail, it could be confusing to understand at
which component the failure occurred. Additionally, it may be desired to
re-run only the component which failed instead of starting over from the
beginning. Components which came after the failed component would also
need to be run individually.

Ceph has a similar situation so we can benefit from that
callback_plugin.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-06 15:21:40 +00:00
Andy McCrae 04ca685ba7 Remove vars that are no longer used
As part of fcba2c801a these vars were
removed and no longer do anything:

radosgw_dns_name
radosgw_resolve_cname

This patch removes them from the group_vars files and defaults/main.yml
2018-03-06 09:16:25 +01:00
jtudelag c3267b77b7 Makes use of docker_exec_cmd in ceph-mon role.
Keeps consistency inside the role and among roles.
Makes the code more readable.
2018-03-05 12:48:35 +00:00
Sébastien Han cb0f598965 common: run updatedb task on debian systems only
The command doesn't exist on Red Hat systems so it's better to skip it
instead of ignoring the error.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-02 20:59:10 +00:00
Sébastien Han 7f19df8196 rgw: add cluster name option to the handler
If the cluster name is different than 'ceph', the command will fail so
we need to pass the cluster name.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-02 20:59:10 +00:00
Sébastien Han fd94840a6e ci: add copy_admin_key test to container scenario
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-02 20:59:10 +00:00
Sébastien Han 9c85280602 rgw: ability to copy ceph admin key on containerized
If we now set copy_admin_key while running a containerized scenario, the
ceph admin key will be copied to the node.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-02 20:59:10 +00:00
Sébastien Han 67f46d8ec3 rgw: run the handler on a mon host
In case the admin key wasn't copied over to the node this command would
fail. So it's safer to run it from a monitor directly.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-02 20:59:10 +00:00
Guillaume Abrioux 1e283bf69b tests: make CI jobs use 'ansible.cfg'
The jobs launched by the CI are not using 'ansible.cfg'.
It contains some parameters that should avoid the SSH failures we have
been seeing in the CI so far.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-02-28 13:51:52 +01:00
Guillaume Abrioux 6d35bc9bde client: use `ceph_uid` fact to set uid/gid on admin key
That task is failing on containerized deployment because `ceph:ceph`
doesn't exist.
The idea here is to use the `{{ ceph_uid }}` to set the ownerships for
the admin keyring when containerized_deployment.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1540578

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-02-26 15:52:05 +01:00
Grant Slater 1e1b26ca4d mds: fix ansible_service_mgr typo
This commit fixes a typo introduced by 4671b9e74e
2018-02-26 13:05:14 +01:00
Andy McCrae c33dae7509 Revert "[TEST] Test setting up correct systemd file for nfs-ganesha"
The nfs-ganesha package has been fixed as part of this commit:
963b6681df

Once the package is rebuilt this should be good to merge.

This reverts commit e88af3c4cb.
2018-02-26 10:23:42 +01:00
Giulio Fidente a83e1aeea3 Make rule_name optional when defining items in openstack_pools
Previously it was necessary to provide a value (possibly an
empty string) for the "rule_name" key for each item in
openstack_pools. This change makes that optional and defaults to
an empty string when not given.
2018-02-23 15:11:53 +01:00
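The defaulting described in the commit above amounts to a lookup with a fallback. The sketch below is illustrative only: `normalize_pool` is a hypothetical helper, and the pool names and the `pg_num` key are assumptions for the example, not values taken from ceph-ansible.

```python
def normalize_pool(pool):
    """Return a copy of a pool item with rule_name defaulting to an empty string.

    Hypothetical helper; the real playbook applies the default in its templating.
    """
    normalized = dict(pool)
    normalized.setdefault("rule_name", "")
    return normalized


openstack_pools = [
    {"name": "images", "pg_num": 8},
    {"name": "volumes", "pg_num": 8, "rule_name": "replicated_hdd"},
]
for pool in openstack_pools:
    print(normalize_pool(pool)["rule_name"] or "(cluster default rule)")
```

Items that already set `rule_name` keep their value, so existing definitions behave as before.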
Sébastien Han 165d9dec10 remove kernel.pid_max
This is now managed by Ceph packages.

See: https://github.com/ceph/ceph/pull/18544/files

http://tracker.ceph.com/issues/21929

Closes: https://github.com/ceph/ceph-ansible/issues/2410

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-02-23 13:57:57 +01:00
Guillaume Abrioux 4a8986459f tests: change ceph_docker_image_tag for 2nd run
The ceph-ansible upstream CI runs several tests, including an
'idempotency/handlers' test. It means the playbook is run a first time
and then a second time with another container image version to ensure the
handlers run properly and the containers are correctly restarted.
This can cause issues.
For instance, in the specific case which drove me to submit this commit,
the `latest` image shipped ceph 12.2.3 while the `stable-3.0` image
(which is the one used for the second run) shipped ceph 12.2.2.

The goal of this test is not to verify that we can upgrade from one
specific version to another but to ensure handlers are working, even if
it's a valid failure here.
It should be caught by a test dedicated to that use case.

For the upstream CI we just need a container image with the same content
but a different image id in the registry, since the test relies on the
image id to decide whether the container should be restarted.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-02-23 13:54:32 +01:00
Guillaume Abrioux 707458c979 ci: add tripleo scenario testing
This should help catch failures in a tripleo deployment scenario earlier.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-02-23 13:54:32 +01:00