166 Commits (d0515cb70487cb803dc3d4509db3c9005d4bf974)

Author SHA1 Message Date
Sébastien Han 0205f6d645 rolling_update: nicer way to set osd flags
Prior to this patch, we were applying the osd flags like this:

"
General pre tasks
Set flags
Upgrade OSDs on a host
Unset flags <-- this triggers pending scrub to start
Set flags
Upgrade OSDs on a host
Unset flags <-- this triggers pending scrub to start
.
.
.
General post tasks
"

Now instead, we apply the flags once before starting the OSD update and
unset them once the last OSD is finished.

"
General pre tasks
Set flags and wait for any scrubs to finish
Upgrade OSDs on a host
Upgrade OSDs on a host
.
.
.
Unset flags
General post tasks
"

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1450754
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-Authored-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-08-25 18:21:28 +02:00
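
The difference between the two orderings in commit 0205f6d645 can be sketched as a small Python model. This is only an illustration of the step ordering described in the message, not the playbook itself; the function names and host names are hypothetical.

```python
def old_plan(osd_hosts):
    """Ordering before the patch: flags are set and unset around every host."""
    steps = ["general pre tasks"]
    for host in osd_hosts:
        steps.append("set flags")
        steps.append(f"upgrade OSDs on {host}")
        # Each unset lets pending scrubs start before the next host is handled.
        steps.append("unset flags")
    steps.append("general post tasks")
    return steps


def new_plan(osd_hosts):
    """Ordering after the patch: flags are set once and unset once."""
    steps = ["general pre tasks", "set flags and wait for any scrubs to finish"]
    for host in osd_hosts:
        steps.append(f"upgrade OSDs on {host}")
    steps.append("unset flags")
    steps.append("general post tasks")
    return steps
```

With three hosts, `old_plan` contains three "unset flags" steps (each a window in which scrubs can start), while `new_plan` contains exactly one, placed after the last host.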
Sébastien Han 4a4a20f07d rolling update: skip pg check if num_pgs = 0
In our test case we don't have any pgs, thus the check fails. The check
always returns an empty array, which makes the comparison fail.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-08-24 08:50:49 +02:00
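
A minimal sketch of the idea in commit 4a4a20f07d, written in Python rather than as the playbook task. It assumes the cluster status is available as JSON (as produced by `ceph -s -f json`) and that it carries a `pgmap` object with `num_pgs` and `pgs_by_state` fields. Those field names are an assumption here, not something the commit message states, and `pgs_are_clean` is a hypothetical helper name.

```python
import json


def pgs_are_clean(status_json):
    """Decide whether the PG check passes for a JSON status document."""
    pgmap = json.loads(status_json).get("pgmap", {})
    num_pgs = pgmap.get("num_pgs", 0)
    if num_pgs == 0:
        # No PGs exist (for example a test cluster without pools),
        # so there is nothing to wait for and the check is skipped.
        return True
    # Count the PGs reported in the active+clean state.
    clean = sum(
        state["count"]
        for state in pgmap.get("pgs_by_state", [])
        if state["state_name"] == "active+clean"
    )
    return clean == num_pgs
```

Without the early return, an empty `pgs_by_state` list would never compare equal to the expected value, which matches the failure the commit describes.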
Alfredo Deza e651469a2a Merge pull request #1797 from ceph/purge-lvm
adds purge support for the lvm_osds osd scenario
2017-08-23 14:28:29 -04:00
Sébastien Han f2499ff5ac Merge pull request #1788 from ceph/improve-switch
switch-from-non-containerized-to-containerized: simplify
2017-08-23 19:47:26 +02:00
Sébastien Han 4f0ecb7f30 switch-from-non-containerized-to-containerized: simplify
This commit eases the use of the
infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml
playbook. We basically run it with a couple of pre-tasks and then we let
the playbook run the docker roles.

It obviously expects proper variables to be configured in order to
work.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-08-23 18:39:45 +02:00
Andrew Schoen bed57572cc purge-cluster: adds support for purging lvm osds
This also adds a new testing scenario for purging lvm osds

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2017-08-23 10:33:35 -05:00
Sébastien Han 1ac0969c28 Merge pull request #1778 from ceph/fix-1770
purge: add ability to purge bluestore osd
2017-08-22 23:56:36 +02:00
Giulio Fidente 2c01de4350 Default cluster to ceph in switch to containers 2017-08-22 13:13:36 +02:00
Giulio Fidente f0423b1804 Parse ceph_docker_registry in switch to containers
Defaults to docker.io, as before, for backward compatibility.
2017-08-22 13:11:27 +02:00
Giulio Fidente a59b84d5c9 Assume mon_docker_privileged false in switch to containers 2017-08-22 13:01:25 +02:00
Giulio Fidente 0106fa6835 Consume public_network vs ceph_mon_docker_subnet
In the switch to containers migration there were broken references
to ceph_mon_docker_subnet variable, replaced with public_network.

Also fixes references to ceph_mon_docker_extra_env, setting a default
for it as it could be undefined.
2017-08-21 18:34:24 +02:00
Giulio Fidente 386303d42e Extend set_uid fact to support RH Ceph images 2017-08-21 18:32:08 +02:00
Sébastien Han 9c824b9818 purge: add ability to purge bluestore osd
We now purge block db and/or wal partitions if we find any.

Closes: https://github.com/ceph/ceph-ansible/issues/1770
Signed-off-by: Sébastien Han <seb@redhat.com>
2017-08-21 18:08:18 +02:00
Andrew Schoen d2f4d3666f Merge pull request #1725 from ceph/simplify-osd-scenario
osd: simplify osd scenario declaration
2017-08-03 09:31:57 -05:00
Sébastien Han 671f2cd4bc Merge pull request #1738 from yanyixing/nvmepart
fix for nvme part path
2017-08-03 13:37:10 +02:00
yanyx d506fad056 fix for nvme part path 2017-08-03 17:37:52 +08:00
Sébastien Han 30991b1c0a osd: simplify scenarios
There are only two main scenarios now:

* collocated: everything remains on the same device:
  - data, db, wal for bluestore
  - data and journal for filestore
* non-collocated: dedicated device for some of the components

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-08-03 10:20:39 +02:00
Sébastien Han fdc6aebd62 infrastructure-playbooks: update with ceph-defaults roles
Signed-off-by: Sébastien Han <seb@redhat.com>
2017-08-02 17:12:20 +02:00
Guillaume Abrioux 7a333d05ce Add handlers for containerized deployment
Until now, there were no handlers for containerized deployments.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-08-02 17:12:20 +02:00
Guillaume Abrioux 5adbf0fdaa Move role dependencies in site.yml/site-docker.yml
This will give us more flexibility and avoid a lot of useless `when`
conditions when skipping all tasks from a non-desired role.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-08-02 17:12:14 +02:00
Guillaume Abrioux 206c7a16d0 rolling_update: refact code
Refact rolling_update playbook.
Add ceph-client upgrade.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-08-02 11:10:51 +02:00
yanyx d0a17b11b2 change the partition's ownership 2017-07-27 11:55:30 +08:00
Sébastien Han fad9d0caec Merge pull request #1690 from yanyixing/master
fix: when osd device is a disk partition
2017-07-26 15:55:29 +02:00
yanyx 2e6233271e fix: when osd device is a disk partition 2017-07-25 21:39:43 +08:00
Sébastien Han 0c18cf199e purge: remove leftover unit files
Closes https://github.com/ceph/ceph-ansible/issues/1672

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-07-25 13:26:28 +02:00
Guillaume Abrioux 828f88403e Update: Avoid screen scraping in rolling update
Since Luminous has revamped the `ceph -s` output, we need to avoid screen
scraping.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-07-12 15:02:39 +02:00
Guillaume Abrioux 896d62d78b Refact: remove ceph_mon_docker_interface variable
remove `ceph_mon_docker_interface` and use `monitor_interface` instead
for both containerized and non-containerized deployment.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-07-04 18:08:59 +02:00
Guillaume Abrioux 73141118d0 Make the new check PGs working with /bin/sh
The new test in the PG check no longer works on distributions
where /bin/sh isn't linked to /bin/bash.

Fix: #1619
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-06-22 17:59:38 +02:00
David Galloway 127b5ad9b4 infra: Create a backup of ceph.conf when taking over existing cluster
Signed-off-by: David Galloway <dgallowa@redhat.com>
2017-06-21 09:53:09 -04:00
David Galloway 40ed2d7be6 infra: Fix ceph.conf creation when taking over existing cluster
Fixes bug introduced in https://github.com/ceph/ceph-ansible/pull/1330

The "stat ceph.conf" task was basically using the stat module on a
string instead of the ceph.conf filename.  This caused the "generate
ceph configuration file" task to fail.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1463382

Signed-off-by: David Galloway <dgallowa@redhat.com>
2017-06-21 09:52:01 -04:00
Andrew Schoen e2104acb62 rolling_update: set health_mon_check_delay to 15
The old value of 10 did not give enough time for a containerized mon to
pass the health check.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2017-06-13 08:56:44 -05:00
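
The role of a per-attempt delay such as `health_mon_check_delay` in commit e2104acb62 can be illustrated with a generic polling loop. This is a Python analogy under stated assumptions, not the playbook's task: the retry count, the `check` callable and the function name are hypothetical, and only the delay values 10 and 15 come from the commit message.

```python
import time


def wait_for_health(check, retries, delay, sleep=time.sleep):
    """Poll check() up to `retries` times, sleeping `delay` seconds between tries.

    Returns the 1-based attempt number that succeeded, or None on timeout.
    """
    for attempt in range(1, retries + 1):
        if check():
            return attempt
        if attempt < retries:
            # A longer delay gives a slow-starting daemon more time per attempt.
            sleep(delay)
    return None
```

For a fixed number of retries, raising the delay from 10 to 15 seconds lengthens the total time the loop is willing to wait before giving up.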
Guillaume Abrioux 5af9bb432c rewrite check pgs clean tasks
Avoid screen scraping by rewriting the `waiting for clean pgs` tasks, as was
done in 304de48.

Use the json output returned by `ceph -s` instead

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-06-13 09:48:56 +02:00
Andrew Schoen 59992c54cc purge-docker-cluster: include ceph_docker_registry
We need to include ceph_docker_registry when removing containers/images
because if we don't it will assume docker.io which is not always where
the image originated from, causing the playbook to fail.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2017-06-02 09:49:17 -05:00
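
Why the registry matters in commit 59992c54cc can be shown with a small sketch of how a fully qualified image reference is assembled. The helper name, the image name and the tag below are hypothetical. Only the `ceph_docker_registry` variable and the `docker.io` default appear in this log.

```python
def image_reference(image, tag, registry="docker.io"):
    """Build a registry-qualified image reference such as docker.io/name:tag."""
    return f"{registry}/{image}:{tag}"
```

An image pulled from a private registry is stored under that registry's prefix, so a removal that assumes `docker.io` would look for a reference that does not exist locally.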
Sébastien Han fdc7866072 Merge pull request #1469 from ceph/refact_code
Docker: Refact code
2017-06-02 12:40:25 +02:00
Andrew Schoen f7677e4393 purge-docker-cluster: pip is only used on Debian
We only need to purge packages installed by pip on Debian systems.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2017-05-31 09:03:44 -05:00
Andrew Schoen 8e322d4825 purge-docker-cluster: default raw_journal_devices to []
If we're purging a containerized cluster that did not use the
raw_multi_journal OSD scenario then raw_journal_devices will not be
defined which causes the playbook to fail.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1455187

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2017-05-25 07:30:25 -05:00
Guillaume Abrioux ddfe019342 Refact code
`ceph-docker-common`:
  At the moment there are a lot of duplicated tasks in each
  `./roles/ceph-<role>/tasks/docker/main.yml` that could be refactored in
  `./roles/ceph-docker-common/tasks/main.yml`.

`*_containerized_deployment` variables:
  All `*_containerized_deployment` have been refactored to a single
  variable `containerized_deployment`

duplicate `cephx` variables in `group_vars/*` have been removed.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-05-24 15:55:41 +02:00
Sébastien Han 90389864d8 rolling-update: set/unset flags on the right container
Problem: we are delegating the set/unset flag to a monitor node but we
try to call an osd container

Solution: use the right container name.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-05-22 09:38:08 +02:00
Sébastien Han b93ffe637b Merge pull request #1476 from WingkaiHo/improve-shrink-osd.yml
improve shrink-osd.yml can shrink osd when disk damage
2017-04-27 11:01:27 +02:00
WingkaiHo 0b9f322ca0 improve shrink-osd.yml can shrink osd when disk damage 2017-04-27 10:26:26 +08:00
Andrew Schoen 5a3f95dfc1 purge-cluster: check for any running ceph process after purge
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2017-04-25 09:30:22 -05:00
Andrew Schoen 26bdd59f5d purge-cluster: we don't support sysv or upstart anymore
Now that ceph-ansible only supports > jewel we don't need
to bother with sysv or upstart

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2017-04-21 15:14:38 -07:00
Andrew Schoen 7ca2bddcce purge-cluster: do not need to check for running ceph processes
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2017-04-21 15:12:46 -07:00
Andrew Schoen aac79df3b3 purge-cluster: no need to remove ceph.target
The package uninstalls will stop ceph.target

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2017-04-21 15:11:03 -07:00
Sébastien Han dfd8f4d96e test: add mgr section to the host inventory file
Without this, we don't test the mgr role so we need to add it.

Co-Authored-by: Guillaume Abrioux <gabrioux@redhat.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
2017-04-15 00:16:10 +02:00
Sébastien Han 17ac1fd464 Merge pull request #1443 from WingkaiHo/osds-journal-migrate
Migrate osd(s) journal to ssd
2017-04-13 16:45:57 +02:00
WingkaiHo 9fba41b4ce Migrate osd(s) journal to ssd 2017-04-13 11:05:58 +08:00
Daniel Lupescu d5e56c481a purge-cluster: fix grep match for NVMe and HP Smart Array devices
raw_device would return invalid block device names for NVMe and HPSA
devices which would cause sgdisk partition deletion to fail

$ echo /dev/nvme1n1p3 | egrep -o '/dev/([hsv]d[a-z]{1,2}|cciss/c[0-9]d[0-9]p|nvme[0-9]n[0-9]p){1,2}'
/dev/nvme1n1p

$ echo /dev/cciss/c0d0p2 |  egrep -o '/dev/([hsv]d[a-z]{1,2}|cciss/c[0-9]d[0-9]p|nvme[0-9]n[0-9]p){1,2}'
/dev/cciss/c0d0p
2017-04-11 16:13:28 +03:00
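
The two matches quoted in commit d5e56c481a can be reproduced with the same pattern in Python. This is a sketch for experimenting with the expression, not the playbook code. `raw_device` is a hypothetical helper, and Python's `re` engine uses leftmost-first alternation rather than POSIX leftmost-longest, which gives the same result for the inputs shown here but is not guaranteed to match `egrep` on every input.

```python
import re

# Pattern copied verbatim from the commit message.
RAW_DEVICE_RE = re.compile(
    r"/dev/([hsv]d[a-z]{1,2}|cciss/c[0-9]d[0-9]p|nvme[0-9]n[0-9]p){1,2}"
)


def raw_device(partition_path):
    """Return the first match of the pattern (like `egrep -o`), or None."""
    match = RAW_DEVICE_RE.search(partition_path)
    return match.group(0) if match else None
```

Calling `raw_device("/dev/nvme1n1p3")` yields `/dev/nvme1n1p` and `raw_device("/dev/cciss/c0d0p2")` yields `/dev/cciss/c0d0p`, the same strings the commit message lists.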
Sébastien Han c37aaa41f4 playbook: homogenize the way list osd ids
Problem: too many different commands to do the same thing. The 'cut'
command on infrastructure-playbooks/purge-cluster.yml was also wrong.
This sed command from osixia in ceph-docker
https://github.com/ceph/ceph-docker/pull/580/ addresses all the
scenarios.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-03-30 11:51:38 +02:00
Sébastien Han 35a90ae283 Merge pull request #1386 from WingkaiHo/master
Create recover-osds-after-ssd-journal-failure.yml
2017-03-28 09:50:39 +02:00