Commit Graph

508 Commits (dd97353574720dccf1d9170c6250e6d032d70f02)

Author SHA1 Message Date
Giulio Fidente f0423b1804 Parse ceph_docker_registry in switch to containers
Defaults it to docker.io as it was for backward compatibility.
2017-08-22 13:11:27 +02:00
Giulio Fidente a59b84d5c9 Assume mon_docker_privileged false in switch to containers 2017-08-22 13:01:25 +02:00
Giulio Fidente 0106fa6835 Consume public_network vs ceph_mon_docker_subnet
In the switch to containers migration there were broken references
to ceph_mon_docker_subnet variable, replaced with public_network.

Also fixes references to ceph_mon_docker_extra_env setting for it
a default as it could be undefined.
2017-08-21 18:34:24 +02:00
Giulio Fidente 386303d42e Extend set_uid fact to support RH Ceph images 2017-08-21 18:32:08 +02:00
Sébastien Han 9c824b9818 purge: add ability to purge bluestore osd
We now purge block db and/or wal partitions if we find any.

Closes: https://github.com/ceph/ceph-ansible/issues/1770
Signed-off-by: Sébastien Han <seb@redhat.com>
2017-08-21 18:08:18 +02:00
Andrew Schoen d2f4d3666f Merge pull request #1725 from ceph/simplify-osd-scenario
osd: simply osd scenario declaration
2017-08-03 09:31:57 -05:00
Sébastien Han 671f2cd4bc Merge pull request #1738 from yanyixing/nvmepart
fix for nvme part path
2017-08-03 13:37:10 +02:00
yanyx d506fad056 fix for nvme part path 2017-08-03 17:37:52 +08:00
Sébastien Han 30991b1c0a osd: simplify scenarios
There is only two main scenarios now:

* collocated: everything remains on the same device:
  - data, db, wal for bluestore
  - data and journal for filestore
* non-collocated: dedicated device for some of the component

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-08-03 10:20:39 +02:00
Sébastien Han fdc6aebd62 infrastructure-playbooks: update with ceph-defaults roles
Signed-off-by: Sébastien Han <seb@redhat.com>
2017-08-02 17:12:20 +02:00
Guillaume Abrioux 7a333d05ce Add handlers for containerized deployment
Until now, there is no handlers for containerized deployments.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-08-02 17:12:20 +02:00
Guillaume Abrioux 5adbf0fdaa Move role dependencies in site.yml/site-docker.yml
This will give us more flexibility and avoid a lot of useless when
skipping all tasks from a non-desired role.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-08-02 17:12:14 +02:00
Guillaume Abrioux 206c7a16d0 rolling_update: refact code
Refact rolling_update playbook.
Add ceph-client upgrade.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-08-02 11:10:51 +02:00
yanyx d0a17b11b2 change the partition's ownership 2017-07-27 11:55:30 +08:00
Sébastien Han fad9d0caec Merge pull request #1690 from yanyixing/master
fix: when osd device is a disk partition
2017-07-26 15:55:29 +02:00
yanyx 2e6233271e fix: when osd device is a disk partition 2017-07-25 21:39:43 +08:00
Sébastien Han 0c18cf199e purge: remove leftover unit files
Closes https://github.com/ceph/ceph-ansible/issues/1672

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-07-25 13:26:28 +02:00
Guillaume Abrioux 828f88403e Update: Avoid screen scraping in rolling update
since luminous has revamped the `ceph -s` output, we need to avoid screen
scraping.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-07-12 15:02:39 +02:00
Guillaume Abrioux 896d62d78b Refact: remove ceph_mon_docker_interface variable
remove `ceph_mon_docker_interface` and use `monitor_interface` instead
for both containerized and non-containerized deployment.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-07-04 18:08:59 +02:00
Guillaume Abrioux 73141118d0 Make the new check PGs working with /bin/sh
The new test in the checks PGs are no longer working on distributions
where /bin/sh isn't linked to /bin/bash.

Fix: #1619
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-06-22 17:59:38 +02:00
David Galloway 127b5ad9b4 infra: Create a backup of ceph.conf when taking over existing cluster
Signed-off-by: David Galloway <dgallowa@redhat.com>
2017-06-21 09:53:09 -04:00
David Galloway 40ed2d7be6 infra: Fix ceph.conf creation when taking over existing cluster
Fixes bug introduced in https://github.com/ceph/ceph-ansible/pull/1330

The "stat ceph.conf" task was basically using the stat module on a
string instead of the ceph.conf filename.  This caused the "generate
ceph configuration file" task to fail.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1463382

Signed-off-by: David Galloway <dgallowa@redhat.com>
2017-06-21 09:52:01 -04:00
Andrew Schoen e2104acb62 rolling_update: set health_mon_check_delay to 15
The old value of 10 did not give enough time for a containerized mon to
pass the health check.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2017-06-13 08:56:44 -05:00
Guillaume Abrioux 5af9bb432c rewrite check pgs clean tasks
Avoid screen scrapping by rewriting `waiting for clean pgs` tasks like it is
done in 304de48.

Use the json output returned by `ceph -s` instead

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-06-13 09:48:56 +02:00
Andrew Schoen 59992c54cc purge-docker-cluster: include ceph_docker_registry
We need to include ceph_docker_registry when removing containers/images
because if we don't it will assume docker.io which is not always where
the image originated from, causing the playbook to fail.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2017-06-02 09:49:17 -05:00
Sébastien Han fdc7866072 Merge pull request #1469 from ceph/refact_code
Docker: Refact code
2017-06-02 12:40:25 +02:00
Andrew Schoen f7677e4393 purge-docker-cluster: pip is only used on Debian
We only need to purge packages installed by pip on Debian systems.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2017-05-31 09:03:44 -05:00
Andrew Schoen 8e322d4825 purge-docker-cluster: default raw_journal_devices to []
If we're purging a containerized cluster that did not use the
raw_multi_journal OSD scenario then raw_journal_devices will not be
defined which causes the playbook to fail.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1455187

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2017-05-25 07:30:25 -05:00
Guillaume Abrioux ddfe019342 Refact code
`ceph-docker-common`:
  At the moment there is a lot of duplicated tasks in each
  `./roles/ceph-<role>/tasks/docker/main.yml` that could be refactored in
  `./roles/ceph-docker-common/tasks/main.yml`.

`*_containerized_deployment` variables:
  All `*_containerized_deployment` have been refactored to a single
  variable `containerized_deployment`

duplicate `cephx` variables in `group_vars/* have been removed.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-05-24 15:55:41 +02:00
Sébastien Han 90389864d8 rolling-update: set/unset flags on the right container
Problem: we are delegating the set/unset flag to a monitor node but we
try to call an osd container

Solution: use the right container name.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-05-22 09:38:08 +02:00
Sébastien Han b93ffe637b Merge pull request #1476 from WingkaiHo/improve-shrink-osd.yml
improve shrink-osd.yml can shrink osd when disk damage
2017-04-27 11:01:27 +02:00
WingkaiHo 0b9f322ca0 improve shrink-osd.yml can shrink osd when disk damage 2017-04-27 10:26:26 +08:00
Andrew Schoen 5a3f95dfc1 purge-cluster: check for any running ceph process after purge
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2017-04-25 09:30:22 -05:00
Andrew Schoen 26bdd59f5d purge-cluster: we don't support sysv or upstart anymore
Now that ceph-ansible only supports > jewel we don't need
to bother with sysv or upstart

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2017-04-21 15:14:38 -07:00
Andrew Schoen 7ca2bddcce purge-cluster: do not need to check for running ceph processes
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2017-04-21 15:12:46 -07:00
Andrew Schoen aac79df3b3 purge-cluster: no need to remove ceph.target
The package uninstalls will stop ceph.target

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2017-04-21 15:11:03 -07:00
Sébastien Han dfd8f4d96e test: add mgr section to the host inventory file
Without this, we don't test the mgr role so we need to add it.

Co-Authored-by: Guillaume Abrioux <gabrioux@redhat.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
2017-04-15 00:16:10 +02:00
Sébastien Han 17ac1fd464 Merge pull request #1443 from WingkaiHo/osds-journal-migrate
Migrate osd(s) journal to ssd
2017-04-13 16:45:57 +02:00
WingkaiHo 9fba41b4ce Migrate osd(s) journal to ssd 2017-04-13 11:05:58 +08:00
Daniel Lupescu d5e56c481a purge-cluster: fix grep match for NVMe and HP Smart Array devices
raw_device would return invalid block device names for NVMe and HPSA
devices which would cause sgdisk partition deletion to fail

$ echo /dev/nvme1n1p3 | egrep -o '/dev/([hsv]d[a-z]{1,2}|cciss/c[0-9]d[0-9]p|nvme[0-9]n[0-9]p){1,2}'
/dev/nvme1n1p

$ echo /dev/cciss/c0d0p2 |  egrep -o '/dev/([hsv]d[a-z]{1,2}|cciss/c[0-9]d[0-9]p|nvme[0-9]n[0-9]p){1,2}'
/dev/cciss/c0d0p
2017-04-11 16:13:28 +03:00
Sébastien Han c37aaa41f4 playbook: homogenize the way list osd ids
Problem: too many different commands to do the same thing. The 'cut'
command on infrastructure-playbooks/purge-cluster.yml was also wrong.
This sed command from osixia in ceph-docker
https://github.com/ceph/ceph-docker/pull/580/ addresses all the
scenarios.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-03-30 11:51:38 +02:00
Sébastien Han 35a90ae283 Merge pull request #1386 from WingkaiHo/master
Create recover-osds-after-ssd-journal-failure.yml
2017-03-28 09:50:39 +02:00
Konstantin Shalygin 1662976fc0
Resolve issues when groups names not in default value. 2017-03-27 21:44:30 +07:00
WingkaiHo ac1498b0d7 Merge https://github.com/ceph/ceph-ansible 2017-03-27 10:50:38 +08:00
WingkaiHo ebb56ccebf command module instead shell 2017-03-23 17:38:41 +08:00
WingkaiHo 2d44c1cee6 remove service enable 2017-03-23 15:28:14 +08:00
WingkaiHo 14c189fee5 break it into lines since you already use the string block synta and fix disable it here and enable again in later task 2017-03-23 14:49:10 +08:00
WingkaiHo 62c37042fe remove this detection and simply rely on {{ cluster }} 2017-03-23 09:22:06 +08:00
WingkaiHo 3d10c5981e fix some pelling mistakes and wirting format, use full device path for device name 2017-03-22 17:48:34 +08:00
WingkaiHo 1e670bdeb0 This assumes ceph as a cluster name. We need detect the name of the cluster 2017-03-22 10:09:06 +08:00
WingkaiHo 83a1ac0c67 This assumes ceph as a cluster name. We need detect the name of the cluster 2017-03-22 10:06:11 +08:00
WingkaiHo 19f9e200d7 Add auto detect the ceph cluster name 2017-03-22 10:00:44 +08:00
WingkaiHo 8602166f6e Ansible will include host_vars/ansible_hostname.yml itself, no need this task IMO. 2017-03-21 13:50:27 +08:00
WingkaiHo 55725fd01d fix some syntax error 2017-03-21 11:19:25 +08:00
WingKai Ho 7445113dc4 Create recover-osds-after-ssd-journal-failure.yml
This playbook use to recover Ceph OSDs after ssd journal failure.
2017-03-21 11:08:25 +08:00
Anthony D'Atri 6c4911276e Enhance clean PG check to catch active+clean+scrubbing and active+clean+scrubbing+deep
Signed-off-by: Anthony D'Atri <anthony.datri@gmail.com>
2017-03-19 00:23:26 -07:00
Daniel Marks 77edd3d40a Fixing tabs that are breaking the syntax check
With the merge of PR #1336 the syntax check fails. This commit replaces
the tabs with proper indentation.
2017-03-15 14:15:15 +01:00
Sébastien Han 38ab6de602 Merge pull request #1336 from WingkaiHo/master
Load a variable file for devices partition
2017-03-15 11:55:26 +01:00
Sébastien Han 8320c14191 Merge pull request #1317 from ibotty/harmonize-docker-names
harmonize docker names
2017-03-14 18:20:20 +01:00
Andrew Schoen e81d690aa0 switch-to-containers: do not include group vars or role defaults
Doing so will override any values set for these in the group_vars
directory relative to the users inventory.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2017-03-08 08:57:09 -06:00
Andrew Schoen cf702b05cf purge-docker-cluster: do not include role defaults or group vars
Doing so at playbook level overrides whatever values might be set for
these in the user's group_vars directory that's relative to their
inventory.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2017-03-08 08:57:09 -06:00
Andrew Schoen aef54d89d9 switch-to-containers: do not set group name vars at playbook level
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2017-03-08 08:57:09 -06:00
Andrew Schoen 7289acb6b3 purge-docker-cluster: do not set group names vars at playbook level
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2017-03-08 08:57:08 -06:00
Andrew Schoen 46f26bec13 rolling-update: do not set group name vars at playbook level
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2017-03-08 08:57:08 -06:00
Andrew Schoen 4fe6607004 purge-cluster: do not set group name vars at playbook level
This has the behavior of overriding custom values set in group_vars.
I've added defaults to the rest of the group names so that if they are
not overridden in group_vars then defaults will be used.

See: https://bugzilla.redhat.com/show_bug.cgi?id=1354700

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2017-03-08 08:57:08 -06:00
WingKai Ho 0d134b4ad9 Update make-osd-partitions.yml
change
2017-03-08 17:46:37 +08:00
WingKai Ho e2d06068f4 Update make-osd-partitions.yml
When ansible do not load the file host_vars/{{ ansible_hostname }}.yml and host_vars/default.yml it will show syntactic, so keyword "skip" to fix it. 
Exit the playbook if the user not define devices  in both  host_vars/{{ ansible_hostname }}.yml and host_vars/default.yml
2017-03-06 15:43:09 +08:00
WingKai Ho 2861a483d7 Update make-osd-partitions.yml
When ansible do not load the file host_vars/{{ ansible_hostname }}.yml and host_vars/default.yml it will show syntactic err, so add keyword "skip" to fix it. 

Exit the playbook if the user not define devices  in both  host_vars/{{ ansible_hostname }}.yml and host_vars/default.yml
host_vars/default.yml
2017-03-06 10:33:22 +08:00
WingKai Ho 4cc489f2ba Update make-osd-partitions.yml
fix syntactic error
2017-03-03 17:26:53 +08:00
WingKai Ho 102befa927 Update make-osd-partitions.yml
Remove capital `L`
2017-03-02 14:06:41 +08:00
WingKai Ho c3f170e758 Update make-osd-partitions.yml
there is an extra space between 'custom' and 'layout'
2017-03-02 12:24:44 +08:00
WingKai Ho 2967772f6a Load a variable file for devices parrition
load device partition file in directory host_vars

1) if the user define host_vars/hostname.yml load the devices  partition on this file.
2) otherwise load host_vars/default.yml for default
2017-03-01 17:27:57 +08:00
yangyimincn 8b36cbac64 Update rolling_update.yml
The task waiting for the monitor to join the quorum... , the result for ceph -s | grep monmap only contain monmap, not included quorum:

# ceph -s --cluster ceph | grep monmap
     monmap e1: 3 mons at {sh-office-ceph-1=10.12.10.34:6789/0,sh-office-ceph-2=10.12.10.35:6789/0,sh-office-ceph-3=10.12.10.36:6789/0}

If want to get monitor, should use this:

# ceph -s --cluster ceph | grep election
            election epoch 80, quorum 0,1 sh-office-ceph-1,sh-office-ceph-2

ceph verison: 10.2.5
2017-02-28 16:56:02 +08:00
Sébastien Han 4639d89231 infra: fix cluster name detection
The previous command was returning /etc/ceph/ceph.conf, we only need
'ceph' to be returned.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-02-23 15:40:34 -05:00
Tobias Florek 931027e6f7 harmonize docker names
Created containers now are named more or less in the form of

    <ansible role>-<ansible_hostname>
2017-02-23 09:15:05 +01:00
Sébastien Han 3b633d5ddc purge-docker: re-implement zap devices
We now run the container and waits until it dies. Prior to this we were
stopping it before completion so not all the devices where zapped.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-02-21 15:56:09 -05:00
Sébastien Han a002508a91 purge-docker: also purge journal devices
Signed-off-by: Sébastien Han <seb@redhat.com>
2017-02-21 15:54:36 -05:00
Andrew Schoen 5622c94e8b rolling-update: do not use upstart to stop mons when using systemd
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2017-02-21 12:31:26 -06:00
Shengjing Zhu 32923fd217 fix grep match pattern for osd ids
Some playbooks use [0-9]*, others use \d+$
The latter is more correct since cluster name may contain numbers.

Signed-off-by: Shengjing Zhu <zsj950618@gmail.com>
2017-02-20 16:35:56 +08:00
Andrew Schoen 22f52a9dc6 purge-cluster: also purge dmcrypt dedicated journals
See: https://bugzilla.redhat.com/show_bug.cgi?id=1414647

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2017-02-15 10:27:17 -06:00
Andrew Schoen 3964929a56 rgw-standalone: also fetch keys from mons
This is to allow for ceph-installer usage of this playbook and
to ensure that you have the correct keys locally when bootstrapping.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2017-02-14 16:12:59 -06:00
Andrew Schoen c5f561a4e9 purge-cluster: remove calamari-server package
See: https://bugzilla.redhat.com/show_bug.cgi?id=1422134

Signed-off-by: Andrew Schoen <aschoen@redhat.com>

Resolves rhbz#1422134
2017-02-14 09:24:02 -06:00
Sébastien Han c2f1dca823 docker: use a better method to pull images
We changed the way we declare image.
Prior to this patch we must have a "user/image:tag"
format, which is incompatible with non docker-hub registry where you
usually don't have a "user". On the docker hub a "user" is also
identified as a namespace, so for Ceph the user was "ceph".

Variables have been simplified with only:

* ceph_docker_image
* ceph_docker_image_tag

1. For docker hub images: ceph_docker_name: "ceph/daemon" will give
you the 'daemon' image of the 'ceph' user.

2. For non docker hub images: ceph_docker_name: "daemon" will simply
give you the "daemon" image.

Infrastructure playbooks have been modified as well.
The file group_vars/all.docker.yml.sample has been removed as well.
It is hard to maintain since we have to generate it manually. If
you want to configure specific variables for a specific daemon simply
edit group_vars/$DAEMON.yml

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1420207
Signed-off-by: Sébastien Han <seb@redhat.com>
2017-02-09 17:57:18 +01:00
Andrew Schoen 5ddfc4f85c Merge pull request #1284 from ceph/BZ-1418980
purge-cluster: do not use ceph-detect-init
2017-02-08 08:46:03 -06:00
Andrew Schoen 4ff5908758 Merge pull request #1289 from ceph/fix-1286
rolling-update: detect init system properly
2017-02-08 06:31:30 -06:00
Andrew Schoen 865b4500dc purge-cluster: set a default value for fetch_directory if not defined
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2017-02-08 06:25:43 -06:00
Andrew Schoen adf6aee643 purge-cluster: remove all include tasks
Including variables from role defaults or files in a group_vars
directory relative to the playbook is a bad practice. We don't want to
do this because including these defaults at the task level overrides
values that would be set in a group_vars directory relative to the
inventory file, which is the correct usage if you wish to override
those default values.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2017-02-08 06:25:43 -06:00
Andrew Schoen 0476b24af1 purge-cluster: do not use ceph-detect-init
We can not always ensure that ceph-detect-init will be
present on the system.

See: https://bugzilla.redhat.com/show_bug.cgi?id=1418980

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2017-02-08 06:24:44 -06:00
Sébastien Han 8f94bfb498 rolling-update: detect init system properly
Simply use the ansible_service_mgr fact.

Closes: #1286

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-02-08 08:52:05 +01:00
Sébastien Han c34d0a9d28 purge-docker: force image deletion
even if non-runnin containers are using this image as a reference.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-02-07 22:14:21 +01:00
Sébastien Han 72cd9199ac purge: ability to purge client role
Signed-off-by: Sébastien Han <seb@redhat.com>
2017-02-07 22:14:18 +01:00
Guillaume Abrioux 76ddcbc271 Remove support of releases prior to Jewel.
According to #1216, we need to simply the code by removing the
support of anything before Jewel.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-01-31 11:00:54 +01:00
Sébastien Han d5dd658cfa purge: do not stop ceph.target on each daemon
Doing this cause some all the daemons to go down at the same time. In a
scenario where we colocate a monitor and an osd, this osds will take
some time to go down which will make the 'umount' task fail.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-01-30 14:31:56 +01:00
Sébastien Han cb57a359ba purge: do not fail on purge ceph files
On systems running docker there is an issue with lxfs that results in
the find command returning 1 but actually did the job.
e.g: on a system with docker runnning find /var will give us the
following error:

find:
'/var/lib/lxcfs/cgroup/devices/lxc/x1/system.slice/systemd-update-utmp.service/devices.deny':
Permission denied
find:
'/var/lib/lxcfs/cgroup/devices/lxc/x1/system.slice/dev-random.mount/devices.allow':
Permission denied
...
...

However ceph files got deleted so we ignore the error.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-01-30 14:31:56 +01:00
Sébastien Han e371bd591c purge: fix ubuntu purge when not using systemd
We now rely on the cli tool ceph-detect-init which will tell us the init
system in used on the distribution. We do this instead of the previous
lookup for systemd unit files to call the right task depending on the
init system.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-01-30 14:31:56 +01:00
Sébastien Han 0e2e270ab2 purge: allow purge to run multiple times
with_items is evaluated before the when so in a second run where the
variable is empty if will fail with "'dict object' has no attribute
'stdout_lines'". To fix this we had a default array so with_items does
not fail and the task is skipped with the when.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-01-30 14:31:56 +01:00
Sébastien Han 0d2e580768 Merge pull request #1250 from ceph/new-tests
CI testing updates
2017-01-27 14:30:45 +01:00
Andrew Schoen d3cb8dba4e purge-cluster: fix failure when raw_multi_journal is not defined
Because the purge-cluster.yml playbook does not have access to the roles
default vars then we can be sure that raw_multi_journal is defined. For
example, if this was purging a dmcrypt journal then raw_multi_journal
might not be defined at all in group_vars/all.yml or
group_vars/osds.yml.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2017-01-27 05:23:17 -06:00
Ivan Font 0298354137 Update to use consistent docker extra env vars
This playbook was still referencing the old version of the
ceph_*_docker_extra_env but only for Ceph MONs and Ceph NFS. This
playbook was not kept up-to-date when updating the
ceph_*_docker_extra_env variables to add the '-e' option to docker.
That's because the addition of '-e' breaks this playbook as it requires
a comma separated list of variables for the 'env:' docker module
parameter. Therefore this change just makes the playbook consistently
broken by referencing the same variable throughout.
2017-01-26 15:57:34 -08:00
Andrew Schoen b2a6f095f1 purge-cluster: fix syntax when deleting dmcrypt devices
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2017-01-26 11:28:30 -06:00
Sébastien Han 73ca1a7a00 purge: remove dm-crypt devices
When running encrypted OSDs, an encrypted device mapper is used (because
created by the crypsetup tool). So before attempting to remove all the
partitions on a device we must delete all the encrypted device mappers,
then we can delete all the partitions.

Signed-off-by: Sébastien Han <seb@redhat.com>

 Please enter the commit message for your changes. Lines starting
2017-01-25 22:32:46 +01:00
Sébastien Han adeb3decf3 purge: remove zap_block_devs variable
The name of this variable was a bit confusing since its activation will
zap all the block devices no matter which osd scenario we are using.
Removing this variable and applying a condition on the OSD scenario is
now feasible and easier since we import group_vars variable files for
OSDs.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-01-18 10:55:01 +01:00
Sébastien Han b7fcbe5ca2 purge: cosmetic cleanup
Just applying our writing syntax convention in the playbook.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-01-18 10:53:21 +01:00
Andrew Schoen dd8389cdf7 purge-cluster: do not include ceph-osd and ceph-common defaults for osds
When purging OSDs we do not need to include these defaults as nothing in
the following tasks uses them. Also, it has the side effect of
overwriting any variables defined in group_vars files that are relative
to the inventory you are using with the default values. That behavior
was causing the CI tests to fail.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2017-01-10 16:57:58 -06:00
Andrew Schoen 321cea8ba9 purge-cluster: get journal partitions after zapping osd disks
In my testing zapping the osd disks deleted the journal
partitions, making the 'zap ceph journal partitions' task fail because
the partitions it found previously do not exist anymore.

This moves the task that finds the journal partitions after 'zap osd disks'
to catch any partitions ceph-disk might have missed.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2017-01-03 15:57:17 -06:00
Andrew Schoen c9e5914377 purge-cluster: use ignore_errors: true when including group_vars files
Using failed_when will still throw an exception and stop the playbook if
the file you're trying to include doesn't exist.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2017-01-03 15:57:17 -06:00
Sébastien Han cb1c06901e Merge pull request #1171 from cbodley/wip-libcephfs2
bump package version to libcephfs2
2017-01-03 10:48:56 +01:00
Shengjing Zhu 2dc2e1d48c infrastructure playbook: add make osd partition
Signed-off-by: Shengjing Zhu <zsj950618@gmail.com>
2016-12-15 22:03:38 +08:00
Casey Bodley acaf01ac17 purge-cluster: add new version of libcephfs2
the libcephfs version was bumped to 2, so we need to check for that as
well when we're removing all ceph packages

Signed-off-by: Casey Bodley <cbodley@redhat.com>
2016-12-09 16:54:06 -05:00
Sébastien Han 9dac195200 take-over: use more precise ceph.conf detection
Prior to this patch we were just looking for any *.conf file which
sometimes could results in multiple matches. The new command looks for a
.conf file that must contain [global] and 'fsid' patterns. This will
definitely get us the ceph.conf file. We can not directly use ceph.conf
because of a different cluster name.

Signed-off-by: Sébastien Han <seb@redhat.com>
2016-12-06 16:02:48 +01:00
Sébastien Han 4444d7d78e git: update gitignore
* ignore yml files in general
* refactor based on commit f8e043b6ea5ac4e886532d4f2f675c507b44b955 that
changed directory layouts

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit ec5c6f5da566611c4e0b88f925cbd26dc90368d6)
2016-12-06 10:18:19 +01:00
chenyanshan 7eab2529ed this patch fix the regex pattern in infrastructure-playbooks/shrink-osd.yml when the osd's pid num is bigger than 9999
Signed-off-by: chenyanshan <yanshanchen@139.com>
2016-12-05 13:40:38 +08:00
Guillaume Abrioux b2b7222b3a [shrink-mon]: force playbook to fail if there is only one mon
The playbook will fail if only 1 mon is in the cluster
and advise to use the `purge-cluster` playbook instead.

Fix #1083
2016-11-25 11:20:11 +01:00
Guillaume Abrioux a680707f6f All `include_vars` need to have `*.yml`, `*.yaml` or `*.json` extension.
As introduced in the following PR:
- https://github.com/ansible/ansible/pull/17207
we need to refactor our code.
2016-11-24 14:03:49 +01:00
Sébastien Han 829e2b6598 Merge pull request #1077 from font/rolling_update
Support containerized rolling update
2016-11-22 16:56:46 +01:00
Sébastien Han 38e846e542 rolling_update: clarify "serial" usage
Prior to this commit the serial variable was poorly documented. Now we
are making clear that this value should be left untouched as the rolling
update mechanism should happen serially.

Solves: bz-1396742
Signed-off-by: Sébastien Han <seb@redhat.com>
2016-11-21 14:42:46 +01:00
Ken Dreyer adfdf6871e remove apache support for RGW
libfcgi is dead upstream (http://tracker.ceph.com/issues/16784)

The RGW developers intend to remove libfcgi support entirely before the
Luminous release.

Since libfcgi gets little-to-no developer attention or testing, remove
it entirely from ceph-ansible.
2016-11-18 13:13:12 -07:00
Ivan Font 255e816e28 Rolling update changes for containerized deployments
Separate out systemd restart tasks for containerized and
non-containerized deployments

Signed-off-by: Ivan Font <ifont@redhat.com>
2016-11-17 11:25:25 -08:00
Ivan Font e72f08080d Warn user when upgrading cluster with only one mon
Signed-off-by: Ivan Font <ifont@redhat.com>
2016-11-17 11:25:25 -08:00
Ivan Font 3ff17f1c8f Support containerized rolling update
- Update rolling update playbook to support containerized deployments
  for mons, osds, mdss, and rgws
- Skip checking if existing cluster is running when performing a rolling
  update
- Fixed bug where we were failing to start the mds container because it
  was missing the admin keyring. The admin keyring was missing because
  it was not being pushed from the mon host to the ansible host due to
  the keyring not being available before running the copy_configs.yml
  task include file. Now we forcefully wait for the admin keyring to be
  generated before continuing with the copy_configs.yml task include file
- Skip pre_requisite.yml when running on atomic host. This technically
  no longer requires specifying to skip tasks containing the with_pkg tag
- Add missing variables to all.docker.sample
- Misc. cleanup

Signed-off-by: Ivan Font <ifont@redhat.com>
2016-11-17 11:25:25 -08:00
Alfredo Deza 60ce2311b8 rolling_update: bump retries for osd_check/retries to 20 minutes
Signed-off-by: Alfredo Deza <adeza@redhat.com>

Resolves: rhbz#1395073
2016-11-17 10:43:58 -05:00
Sébastien Han 81a72cb85d Merge pull request #1068 from ceph/v2.2
moving to ansible v2.2 compatibility
2016-11-16 16:33:40 +01:00
Andrew Schoen 5f44b118b8 rolling update: stop RGWs before upgrade and start afterwards
Signed-off-by: Andrew Schoen <aschoen@redhat.com>

Resolves: rhbz#1394929
2016-11-14 14:47:12 -06:00
Andrew Schoen ded9d9dfd3 rolling update: stop MDSs before upgrading and start afterwards
Signed-off-by: Andrew Schoen <aschoen@redhat.com>

Resolves: rhbz#1394929
2016-11-14 14:47:12 -06:00
Andrew Schoen 5429c5f8c5 rolling update: stop MONs before upgrading and start afterwards
Signed-off-by: Andrew Schoen <aschoen@redhat.com>

Resolves: rhbz#1394929
2016-11-14 14:47:12 -06:00
Andrew Schoen 66f09bdac4 rolling update: stop OSDs before upgrading
This avoids a bug where OSDs are sometimes restarted twice on
upgrades which leaves the OSD process running but not marked up.

See:

https://bugzilla.redhat.com/show_bug.cgi?id=1394928
https://bugzilla.redhat.com/show_bug.cgi?id=1391675
https://bugzilla.redhat.com/show_bug.cgi?id=1394929

Signed-off-by: Andrew Schoen <aschoen@redhat.com>

Resolves: rhbz#1394929
2016-11-14 14:46:58 -06:00
Sébastien Han 991341f525 rolling_update: add variable to upgrade ceph
My stupid self removed this crucial variable here: 217ce3ca thinking it
was another hard coded variable import where this is actually the
trigger for the upgrade.

Closes: #1071

Signed-off-by: Sébastien Han <seb@redhat.com>
2016-11-04 17:31:02 +01:00
Sébastien Han a2fcd222d2 moving to ansible v2.2 compatibility
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-Authored-By: Julien Francoz julien@francoz.net
2016-11-04 10:09:38 +01:00
Andrew Schoen 8262ce5e40 rolling update: fix restarts of radosgw
Signed-off-by: Andrew Schoen <aschoen@redhat.com>

Resolves: rhbz#1391675
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2016-11-03 14:36:42 -05:00
Eduard Egorov ab5c9f2a67 Adjust 'devices' list check for being not defined in purge-cluster playbook (see PR #1024)
Signed-off-by: Eduard Egorov <eduard.egorov@icl-services.com>
2016-11-03 06:36:42 +00:00
Leseb 899c8b309f Merge pull request #1024 from eduardegorov/egorove_make_devices_optional
Make {{ devices }} list optional
2016-11-02 15:12:02 +01:00
Eduard Egorov e5473ee565 Fix typos
Signed-off-by: Eduard Egorov <eduard.egorov@icl-services.com>
2016-11-01 12:29:21 +00:00
Eduard Egorov 3652bb708b Fix rbd-mirrors group name
Signed-off-by: Eduard Egorov <eduard.egorov@icl-services.com>
2016-11-01 12:21:47 +00:00
Eduard Egorov 645b5efebf Fix hard-coded host group names in include tasks for group variables' file paths.
Signed-off-by: Eduard Egorov <eduard.egorov@icl-services.com>
2016-11-01 12:21:40 +00:00
Eduard Egorov f33c1cd2d2 Make {{ devices }} list optional: define it as empty list by default, remove unneccessary 'default([])' checks
Signed-off-by: Eduard Egorov <eduard.egorov@icl-services.com>
2016-11-01 09:57:25 +00:00
Andrew Schoen 0897c965ff rolling_update: define mon_group_name when upgrading the mons
see: https://bugzilla.redhat.com/show_bug.cgi?id=1389456

Signed-off-by: Andrew Schoen <aschoen@redhat.com>

Resolves: rhbz#1389456
2016-10-27 14:17:56 -05:00
Sébastien Han b0989c700f rolling_update: fix wrong indent
Fixing: https://bugzilla.redhat.com/show_bug.cgi?id=1388295
Also add some notes in the README on how to run infrastructure
playbooks.

Signed-off-by: Sébastien Han <seb@redhat.com>
2016-10-26 12:51:08 -05:00
Ivan Font 534b188396 Update for infrastructure-playbooks execution
- Updates to allow running infrastructure-playbooks both from within its
  directory or root directory of ceph-ansible.

Signed-off-by: Ivan Font <ifont@redhat.com>
2016-10-26 09:43:37 -07:00
Andrew Schoen bebf412c92 infrastructure-playbooks: fix syntax errors in all playbooks
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2016-10-25 16:56:58 -05:00
Sébastien Han f49bf2832c rolling_update: improve variables import
we now have pointer to default role so we don't miss any of the
variables defined.

Signed-off-by: Sébastien Han <seb@redhat.com>
2016-10-06 14:08:04 +02:00
Ivan Font 5a5e185e11 Reworked purge cluster playbook
- Separated out one large playbook into multiple playbooks to run
  host-type by host-type i.e. mdss, rgws, rbdmirrors, nfss, osds, mons.
- Combined common tasks into one shared task for all hosts where
  applicable
- Fixed various bugs

Signed-off-by: Ivan Font <ivan.font@redhat.com>
2016-10-05 21:32:38 -07:00
Leseb 598d78cef3 Merge pull request #961 from ceph/fix-purge
purge: only purge ceph partitions
2016-10-04 18:03:21 +02:00
Sébastien Han e81ec9c138 purge: only purge ceph partitions
Prior to this change we were purging all the partitions on the device
when using the raw_journal_devices scenario.
This was breaking deployments where other partitions are used for other
purposes (ie: OS system).

Signed-off-by: Sébastien Han <seb@redhat.com>
2016-10-04 17:58:53 +02:00
Leseb 4bf7e8355a Merge pull request #953 from jsaintrocc/hammerfix
Fixes for Hammer install and added numerical release checks
2016-10-04 11:34:26 +02:00
Sébastien Han ac2cb9ac2c upgrade: add custom timeout options
This commit introduces the ability to configure delays and retries for
cluster health checks, for both monitors and OSDs.

Signed-off-by: Sébastien Han <seb@redhat.com>
2016-10-03 11:27:02 +02:00
James Saint-Rossy 982c44d41c Rebased with upstream master 2016-09-25 23:22:16 -04:00
Sébastien Han b8158a6554 ability to switch from bare metal to containerized daemons
Signed-off-by: Sébastien Han <seb@redhat.com>
2016-09-21 18:07:50 +02:00
Leseb 517196ed66 Merge pull request #977 from ceph/switch-bare-metal-to-container
ability to switch from bare metal to containerized daemons
2016-09-21 15:04:06 +02:00
Sébastien Han 5bfa1b0d24 ability to switch from bare metal to containerized daemons
Signed-off-by: Sébastien Han <seb@redhat.com>
2016-09-21 14:46:57 +02:00
Sébastien Han 21356c653f rolling updates: remove mon compact command
Users have reported this task to hang. Since this command is not
required to perform the upgrade, we remove it.

Signed-off-by: Sébastien Han <seb@redhat.com>
2016-09-13 10:09:07 +02:00
James Saint-Rossy 666637f715 Replaced is_before is_after is_ booleans with numerical version dictionary 2016-09-09 17:34:26 -04:00
Rachana Patel ad5805f03e rolling_update.yml will not work if cluster name is not 'ceph'. Adding --cluster will solve this problem
Fixes issue #969

Signed-off-by: Rachana Patel <rachana83.patel@gmail.com>
2016-09-07 15:38:58 -04:00
James Saint-Rossy f52be23770 Prevent local_action from requiring root 2016-09-02 19:31:59 -04:00
Ivan Font 05c5d1ea91 Update relative path to include vars
Signed-off-by: Ivan Font <ivan.font@redhat.com>
2016-08-24 00:27:54 -07:00
Ivan Font 7c9cb0993e Include group_vars files in purge cluster playbook
- Add all relevant group_vars files in containerized purge cluster
  playbook and ignore errors if file may not exist.
- Also fixing indentation issues.

Signed-off-by: Ivan Font <ivan.font@redhat.com>
2016-08-19 09:11:56 -07:00
Ivan Font c1905bfa23 Update for containerized purge cluster playbook
- Added support for purging containerized rbd-mirror node

Signed-off-by: Ivan Font <ivan.font@redhat.com>
2016-08-19 09:11:56 -07:00
James Saint-Rossy 449d456086 Rebased and moved multisite/rgw playbooks to infrastructure-playbooks 2016-08-17 13:28:01 -04:00
Sébastien Han fde819d1a8 create a directory for infrastructure playbooks
Since we have a couple of infrastructure related playbooks
(additionnally to the roles we are using to deploy Ceph), it makes sense
to have them located in a separate directory.

Signed-off-by: Sébastien Han <seb@redhat.com>
2016-08-17 11:53:34 +02:00