This adds a script, generate_group_vars_sample.sh, that generates
group_vars/*.sample from roles/ceph-*/defaults/main.yml to avoid
discrepancies between the sets of files. It also converts the line
endings in the various main.yml from DOS to Unix, since generating the
samples was spreading the line ending plague around to more files.
0644 should never be a directory mode. 1777 makes it so that any user
can create a ceph client, not just root. (This is helpful if, for
instance, nova-compute is running as non-root.)
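To illustrate why the two modes behave so differently on a directory, here is a minimal sketch that renders them symbolically with Python's standard `stat.filemode`. The helper name `describe_dir_mode` is made up for this example and is not part of the role.

```python
import stat


def describe_dir_mode(mode: int) -> str:
    """Render a directory's permission bits the way `ls -l` would show them."""
    return stat.filemode(stat.S_IFDIR | mode)


# 0644 has no execute (search) bit, so non-root users cannot traverse the directory.
print(describe_dir_mode(0o644))
# 1777 is world-writable with the sticky bit set, like /tmp.
print(describe_dir_mode(0o1777))
```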
Previously, creating pools was skipped if cephx was disabled; instead,
we should only skip key creation if cephx is disabled, and create
pools any time openstack_config is true.
If using another method to generate a consistent fsid, then we can
skip creation of an (unused) cluster UUID file. If cephx is disabled
as well, we can skip creation of the fetch directory entirely.
Skip a number of ceph keyring-related tasks (or remove the keyring
portion of some tasks) when cephx is disabled. Specifically, avoid
generating the initial keyring, which only clutters up the ansible
repo if cephx is not in use.
This commit allows you to set a new variable to 'true' if you want to
have the ceph admin key copied to other kinds of hosts such as MDS,
OSD, and RGW. To enable this, just set `copy_admin_key` to true.
Closes: #555
Signed-off-by: Sébastien Han <seb@redhat.com>
When autodiscovering disks, disks can be skipped if either they are
removable, or if they have partitions on them. Skipped actions have no
'rc' attribute, though, so the 'ceph prepare' conditional fails unless
we first check to ensure that the results were not skipped before
checking the return value.
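The ordering of the two checks can be sketched in plain Python. The dictionaries below only imitate the shape of registered results; the function name and the expected return code of 0 are assumptions for illustration, not the actual conditional used in the role.

```python
def prepare_check_passed(result: dict, expected_rc: int = 0) -> bool:
    """Return True only for results that actually ran and returned expected_rc.

    Skipped results carry no 'rc' key, so 'skipped' must be tested first.
    """
    if result.get("skipped", False):
        return False
    return result["rc"] == expected_rc


results = [
    {"item": "/dev/sdb", "rc": 0},
    {"item": "/dev/sdc", "skipped": True},  # e.g. a removable or partitioned disk
]
print([r["item"] for r in results if prepare_check_passed(r)])
```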
The firewall checks can fail for any number of reasons -- e.g., the
ceph cluster hostnames are unresolvable from the ansible host, or the
ports are filtered by some intermediate hop, etc. Make two changes to
make those checks better:
* Set pipefail when running the checks, so if nmap itself fails the
command will be marked as 'failed'. Specifically, this fixes the
case where the hostnames cannot be resolved.
* Add a new variable, check_firewall, which can be used to disable
checks entirely. Specifically, this fixes the case where some
intermediate firewall filters the ports, so nmap returns "filtered".
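The effect of pipefail can be seen with a small Python sketch that runs a pipeline under bash. It assumes bash is on the PATH; `false` stands in for a failing nmap and `cat` for whatever filter follows it in the real check.

```python
import subprocess


def pipeline_rc(script: str, pipefail: bool) -> int:
    """Run a shell pipeline under bash and return its exit status."""
    prefix = "set -o pipefail; " if pipefail else ""
    return subprocess.run(["bash", "-c", prefix + script]).returncode


# Without pipefail the status of the last command (cat) hides the failure.
print(pipeline_rc("false | cat", pipefail=False))
# With pipefail the failing first command makes the whole pipeline fail.
print(pipeline_rc("false | cat", pipefail=True))
```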
If cephx is set to false, the "set keys permissions" task fails with:
file ({# ceph_keys.stdout_lines #}) is absent, cannot continue
This skips that step when cephx is false.
Installs on RHEL with ceph_origin set to distro previously would fail
because no packages would get installed, but all of the checks passed
fine. This adds support for ceph_origin: distro, simply installing the
packages using yum/dnf and assuming that the sysadmin has provided a
repository containing them.
This also supports the use case where Satellite or a similar local
mirror is in use, and the admin does not or cannot use the additional
repositories the role would otherwise add.
The purpose of this is so we can connect to the mons and gather the keys
needed to configure an OSD or additional MON without having to reconfigure
the existing mons at the same time.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
In our use case we might only be configuring mons and not osds in the
same call, so we don't want to check variables needed for osds when they
are not needed to configure a mon.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
As stated in https://github.com/ansible/ansible/issues/4297,
if we register a variable twice and even if a task is skipped the
register will not get overwritten... So we use the fact variant as
mentioned in the Ansible issue.
Signed-off-by: Sébastien Han <seb@redhat.com>
While this is not widely used (AFAIK :p) the feature was broken. Thanks
to @zmc for reporting it. You can now set `osd_auto_discovery` to
true in your group_vars/osd and it will go through all the devices
available and will make them OSDs.
Signed-off-by: Sébastien Han <seb@redhat.com>
Currently deploying a MON fails with "bad symbolic permission for mode"
errors due to the file/directory modes not being interpreted as octal
values. This commit updates roles/ceph-common/tasks/main.yml to set
the file/directory modes to strings so they can be interpreted
correctly.
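A small sketch of the underlying ambiguity, not of Ansible's own parsing: a quoted mode such as "0644" can be read unambiguously as octal, while a bare number like 644 denotes a different set of permission bits. The helper name `parse_mode` is hypothetical.

```python
def parse_mode(mode: str) -> int:
    """Interpret a quoted mode string such as "0644" as an octal number."""
    return int(mode, 8)


print(oct(parse_mode("0644")))  # the intended permission bits
print(oct(644))                 # decimal 644 is a different set of bits
```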
Closes issue #525
Run containerized daemons in virtual machines.
To enable it, simply do:
`cp site-docker.yml.sample site-docker.yml`
and set `docker: true` in `vagrant_variables.yml`
Signed-off-by: Sébastien Han <seb@redhat.com>
At the moment, all the tasks using the file module are duplicated to have different ownerships depending on the fact `is_ceph_infernalis`.
The goal of this commit is to introduce new logic for this:
- First set facts depending on the `is_ceph_infernalis` fact
- Create the files or directories using the previously set facts for ownership.
We have a requirement to install the packages first without
configuration. These tags should allow us to target the tasks needed to do
that.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
As reported in #510, some systems don't have uuidgen installed, so we
should use a more widely available way to generate it. Python
should be available in cases where uuidgen is not.
Otherwise we will have to find another way :)
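For reference, a minimal sketch of generating a UUID with nothing but the Python standard library. The function name is an assumption for illustration; the exact one-liner used in the role may differ.

```python
import uuid


def generate_uuid() -> str:
    """Return a random (version 4) UUID as a string."""
    return str(uuid.uuid4())


print(generate_uuid())
```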
Closes: #510
Signed-off-by: Sébastien Han <seb@redhat.com>
Currently, all the ceph package installation resources use
"state=latest", which means subsequent runs of the ceph playbooks
could result in ceph being upgraded if there are package updates
available in the selected repo.
This commit adds a new variable to ceph-common called
'upgrade_ceph_packages' which defaults to False. This variable is used
in the package installation resources for ceph packages to determine if
the resource should use "state=present" or "state=latest". If the
variable gets set to True, "state=latest" will be used.
Additionally, we update rolling_update.yml to override
upgrade_ceph_packages to true to permit package upgrades in this
context specifically.
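The selection described above boils down to a two-way choice. This Python sketch only mirrors that logic; the function name `package_state` is hypothetical, and in the role the decision is made inside the package tasks themselves.

```python
def package_state(upgrade_ceph_packages: bool = False) -> str:
    """Pick the package state: upgrade only when explicitly requested."""
    return "latest" if upgrade_ceph_packages else "present"


print(package_state())                            # default playbook run
print(package_state(upgrade_ceph_packages=True))  # e.g. during a rolling update
```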
Closes issue #506
It seems that in ansible 2.0 even if a task is skipped by its `when`
clause not evaluating to true the variables in the play are still
rendered. Because these were not defined in defaults/main.yml ansible
was failing in installs/install_on_redhat where those variables are
being used in a `with_items` stanza.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
This change allows for configurable Ceph Conf Directory permissions. This
is required for integrators of Ceph, like OpenStack Cinder, which needs to
read from /etc/ceph for operation.
Use command module instead of shell since we do not do anything fancy
here. Remove the duplicate register.
Signed-off-by: Sébastien Han <seb@redhat.com>
As raised in #466 it is important in order to avoid unnecessary
troubleshooting to check that ceph ports are allowed on the platform.
The check runs a nmap command from the host running Ansible
to all the ceph nodes with their respective ports.
Signed-off-by: Sébastien Han <seb@redhat.com>
Thanks to @cloudnull's great patch at
https://github.com/ansible/ansible/pull/12555
we now have the ability to add more configuration options instead of
having to push a PR to add a new option to the template. So you can
dynamically add and remove flags.
To use it, edit `ceph_conf_overrides` in `group_vars/all` like so:
```
ceph_conf_overrides:
  global:
    foo: 12345
    bar: 6789
```
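To give a feel for what merging such overrides into an INI-style file involves, here is a rough sketch using Python's standard `configparser`. It is not how the Ansible plugin is implemented; the function name `apply_overrides` and the sample base configuration are assumptions made for this example.

```python
import configparser
import io


def apply_overrides(base_conf: str, overrides: dict) -> str:
    """Merge a {section: {option: value}} dict into an INI-style config."""
    parser = configparser.ConfigParser()
    parser.read_string(base_conf)
    for section, options in overrides.items():
        if not parser.has_section(section):
            parser.add_section(section)
        for key, value in options.items():
            # configparser only stores strings, so convert numbers explicitly.
            parser.set(section, key, str(value))
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


base = "[global]\nfsid = example\n"
print(apply_overrides(base, {"global": {"foo": 12345, "bar": 6789}}))
```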
Signed-off-by: Sébastien Han <seb@redhat.com>
Because of some permission issue, likely due to the recent ceph user, if
80 is used for civetweb we get:
set_ports_option: cannot bind to 80: 13 (Permission denied)
Changing the port to 8080 until this gets solved.
Signed-off-by: Sébastien Han <seb@redhat.com>
I changed the argument used for starting the mds server. (pre
infernalis)
```
service ceph start mds
```
errors, while
```
service ceph start mds.$hostname
```
correctly starts the service.
I changed the mds directory ownership from ceph:ceph to root:root
again, for pre-infernalis.
And finally, add the ceph_stable_releases checks for the upstart
activation task 'for or after infernalis release'.
I have seen a number of failures on this task due to mismatch of
checksum of source file and destination. I suspect this is due to a
race condition caused by several hosts simultaneously copying the same
file to single location on the deployment server.
This change simply updates the 'copy keys to the ansible server' task
by adding 'run_once', which limits the task to being run on a single
MON host.
Closes issue #410
Verify that partitions (for both osd disks and journal disks) are sane
before attempting to prepare the device. Fail if parted fails for
whatever reason.
Closes: #437
Signed-off-by: Sébastien Han <seb@redhat.com>
Since we renamed the variables and removed the old 'docker' variable we
can now collocate container daemons with standard bare metal deployment.
For instance, monitors can be containerized but osds can be deployed
traditionally.
Signed-off-by: Sébastien Han <seb@redhat.com>
It should be used to disable health warnings about number of PGs
being too low if some pools have very few objects bringing down
the average number of objects per pool. This happens when running RadosGW.
The default is 10 and since the warnings only occur with some use cases,
the default here is 10 as well. Set to 20 or more to silence the warnings.
Currently, the fetch directory is created in your working directory
(where ansible is run from). We prefer to not keep any state in this
directory and would prefer to have the fetch directory configurable so
we can store it outside of our code checkout.
This commit creates a new variable in each role called
`fetch_directory` (defaulting to the previous value of 'fetch/'), and
then updates each reference to 'fetch' to use the new variable instead.
Closes issue #383
When multiple monitor hosts attempt to create the fetch directory there
is the potential for the task to fail with:
"OSError: [Errno 17] File exists: 'fetch'"
This appears to be an issue with the file module trying to create the
same directory at the same time when the task has been delegated to a
single host.
This commit enables run_once on the affected task which should address
the issue.
This is a rare case but it happens. Since we're just calling
`monitor_interface` and not `hostvars[host]['monitor_interface']`,
an error may occur when the current host's interface does not
exist on the other hosts. (eg. eth0 exists for node0, but it does
not exist on node1 and node2)
Fix for this is to use hostvars[host]['monitor_interface']
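The difference between the two lookups can be sketched with a plain dictionary standing in for hostvars. The host names and interface names below are invented for illustration.

```python
def monitor_interfaces(hostvars: dict, mon_hosts: list) -> dict:
    """Map each monitor host to its own monitor_interface value."""
    return {host: hostvars[host]["monitor_interface"] for host in mon_hosts}


hostvars = {
    "node0": {"monitor_interface": "eth0"},
    "node1": {"monitor_interface": "eth1"},
    "node2": {"monitor_interface": "eth1"},
}
# Looking the value up per host avoids assuming node0's eth0 exists everywhere.
print(monitor_interfaces(hostvars, ["node0", "node1", "node2"]))
```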
I'm removing the ceph partition check from `activate osd(s) when device
is a disk` because the ceph partition does not exist when parted was
registered (on a fresh install). This was causing the activate step to
be skipped.
Currently the OpenStack pools that get created use the default pg_num.
This commit updates the ceph-mon role to allow the pg_num for each pool
to be customised.
Fix back the rolling update playbook.
However, every single time the playbook runs it will check for new
packages and install the latest ones. I don't think this is always the
desired behaviour. We need to find a way to reconcile both...
Signed-off-by: Sébastien Han <seb@redhat.com>
Fix the logic for the mandatory devices check so that it applies to
raw_multi_journal and journal_collocation scenarios separately.
This fails otherwise because whichever var is "first" in the or is most
likely undefined.
This will likely one day or another break something. If ceph-disk
complains about a disk just use the purge-cluster.yml playbook first as
it will wipe all the devices.
Signed-off-by: Sébastien Han <seb@redhat.com>
I'm currently getting a KeyError due to missing 'dependencies' on this
role when I attempt to install it with ansible-galaxy (ansible 1.9.2).
This commit simply defines an empty dependencies list so that
ansible-galaxy executes correctly.
Cool stuff :). We don't need to specify an initial monitor key anymore.
A key will automatically be generated.
The default key can always be overridden with the `monitor_secret`
variable.
Signed-off-by: leseb <seb@redhat.com>
We don't always have a dedicated cluster network so we can by default
re-use the public network value.
This is just laziness :).
Signed-off-by: leseb <seb@redhat.com>
While re-running the playbook we do not want to check for new packages.
We shouldn't perform upgrades, we leave this to the operators.
Signed-off-by: leseb <seb@redhat.com>
Prior to this change, the zap was executed during every play, this was
not ideal. Now we do check if there is a 'ceph' partition. If so we skip
the zap.
Signed-off-by: leseb <seb@redhat.com>
Feel so bad about this one...
Now it's fixed, the rgw section will be activated once the rgws hosts
are part of the inventory.
Signed-off-by: leseb <seb@redhat.com>
Even if the subscription command is idempotent, it takes around 15-16 sec
to complete, whereas the simple yum check lowers this to
3 sec.
Signed-off-by: leseb <seb@redhat.com>
Since the command is idempotent we don't need to check if the repo is
enabled as it will likely take twice the time.
Signed-off-by: leseb <seb@redhat.com>
We want to force the user to only enable the options they need. Thus
they shouldn't have to enable one option and then disable another.
Signed-off-by: leseb <seb@redhat.com>
Now we don't need to activate the services through a variable. If the
role is activated in the inventory, actions will occur automatically.
Fixing the repo creation for red hat storage too.
Signed-off-by: leseb <seb@redhat.com>
The new product version has just come out. ICE doesn't exist anymore and
Red Hat Storage is the name of the new product.
Signed-off-by: leseb <seb@redhat.com>
The logic was broken here for repeated runs. We only want to run
'ceph-disk prepare' when the disk does not contain a ceph partition, is
not a partition, and raw_multi_journal is set. Previously it would
attempt to run 'ceph-disk prepare' when there was a ceph partition
because the second half of the 'or' was still true since it isn't a
partition.
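The corrected condition can be written out as a small truth function. This is only a sketch of the logic as described in this message; the parameter names are hypothetical and the real task expresses this in its `when` clause.

```python
def should_prepare(has_ceph_partition: bool, is_partition: bool,
                   raw_multi_journal: bool) -> bool:
    """Run 'ceph-disk prepare' only when all three conditions hold."""
    return (not has_ceph_partition) and (not is_partition) and raw_multi_journal


# Repeated run on a whole disk that already has a ceph partition: must not prepare.
print(should_prepare(has_ceph_partition=True, is_partition=False,
                     raw_multi_journal=True))
# Fresh whole disk in the raw_multi_journal scenario: prepare it.
print(should_prepare(has_ceph_partition=False, is_partition=False,
                     raw_multi_journal=True))
```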
Following the best practice, we don't create a key from the monitor but
we rely on the initial keys created by the mons to bootstrap each
daemon.
Signed-off-by: Sébastien Han <seb@redhat.com>
This branch has been sitting on my local repo for a while. I guess I had
time to spend on a plane :).
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
* fix the Vagrantfile ruby check
* fix the variable positions
Bring more mandatory variables and try to separate Vagrant vars from the
playbook vars.
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
Once again and hopefully final commit to rework the support of both
upstart and sysvinit. As from now, Ubuntu systems will use upstart and
the others will use sysvinit.
A later commit might include the support of systemd as the unit files
come out. This will be for Hammer so probably soon.
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
Depending on the distro, init scripts will look for different files to
be available on the ceph data dir.
Fixing the upstart support here.
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
If the distribution wasn't Ubuntu, the check wasn't performed so the
evaluation in the task later wasn't possible.
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
Now the Ceph REST API can be deployed.
Default implementation deploys it on the same nodes as the monitors
which should be fine.
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
Fix the usage of Upstart for Ubuntu machines instead of the init.d
script.
Note that because of the way upstart init script looks at the radosgw id
the command 'start radosgw id=' is broken, you should use 'start
radosgw-all' instead.
Keep backward compatibility with the radosgw init script as well by using
client prefixed by 'client.radosgw'.
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
The ceph fs new command was introduced in Ceph 0.84. Prior to this
release, no manual steps are required to create a filesystem, and pools
named data and metadata exist by default.
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
If we use the hostname, the radosgw will look up the wrong secret.
Using the same name for all the gateways.
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
Use hostname in socket and log.
Improve jinja template so when a var doesn't exist we don't indent the
next line.
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
We isolated the key operations into a file and modified the fetch
function to collect all the new keys.
In the meantime, fixed the pool creation since the command is not
idempotent.
Renamed the rgw key to work with the key collection.
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
Without this plugin if a Ceph version is present in a repo (let's say
epel) it will install the epel version and not the ICE version.
We install yum-plugin-priorities.noarch to honor the 'priority=1' flag.
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
In storage world it's often recommended to disable transparent hugepages
as they will tend to lower performance.
Note that this change won't survive reboot. There are several ways to
disable this permanently such as:
* rc.local
* grub boot line
It's a bit tricky to do this in Ansible since it really depends on the
OS you're running on.
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
The default behavior is to fail if a variable is not declared; however, this
can be disabled in your ansible.cfg, so we force this variable to be
mandatory.
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
Still WIP, @mwheckmann feel free to test
As requested by #162
Current known issue: ceph.conf gets modified during every single
run (at the end, during the merge), so this will restart the ceph daemons.
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
Depending on the OS you are running on you should be able to configure
these values.
Re-ordering file for clarity as well.
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
Big clusters will easily reach the default limit so we need to increase
it and make it configurable.
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
We remove all the partitions, label and re-create something clean prior
to prepare the design. This will help solving many issues with existing
disks or while scratching/deploy test environments often.
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
MDS and RGW are not deployed often (RGW more), so we disable them from
the default deployment to only get MONs and OSDs.
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
With the appropriate subscription details you will be able to use the
Inktank Ceph Enterprise version of Ceph running on RHEL7.
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
It has become really annoying to manually generate an fsid prior to the
initial bootstrap. This commit introduces a method that auto-generates an
fsid. If for whatever reason you want to force your own fsid you can
simply edit these 3 files and override the fsid variable:
- roles/ceph-common/vars/main.yml
- roles/ceph-mon/vars/main.yml
- roles/ceph-osd/vars/main.yml
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
- We don’t need ceph-extra for trusty
- Enable multiverse repo for access to libapache2-mod-fastcgi
- Update cache before attempting to install packages to register
multiverse repo and only refresh cache once an hour to avoid delays in
the playbook
- Add wildcard to disabling default site as on Ubuntu it is 000_default
by…default
When running big boxes with 72 disks it's easy to run out of PIDs for
all the threads needed by Ceph. Increasing the default value removes
this limitation.
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
In ceph-common you load {{ ansible_managed }} at the top of the main
config file - this will trigger handlers on that file whenever an
Ansible run is made.
I'd suggest replacing it with a vanilla text comment 'managed by
Ansible' to warn
admins but avoid unnecessary cluster bounces.
fixes: #125
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
The ceph.conf.j2 template currently always uses the current host facts
to get the IP address of each host in the mon loop. This is not the
expected behavior. This patch uses the correct facts to get the IP.
Recovery and/or re-balancing decrease performance, adding more options
might help tweaking this behavior.
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
Since 192.168.0.0/24 is very common and might overlap with some
existing networks on your laptop, using another subnet like '42' makes
an overlap less likely.
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
Previously we used osd_crush_update_on_start: true, this was interpreted
by Ansible as a boolean and appeared as 'True' inside the Ceph configuration
file. However, Ceph's init script is looking for 'true'.
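The capitalisation issue is easy to reproduce in Python, which is where the 'True' spelling comes from. The helper name `render_ceph_bool` is made up for this sketch; it is not the fix applied in the template.

```python
def render_ceph_bool(value: bool) -> str:
    """Render a boolean in the lowercase form the init script looks for."""
    return str(value).lower()


print(str(True))               # how a Python boolean is stringified by default
print(render_ceph_bool(True))  # lowercase form
```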
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
This commit introduces support for the development branches of
Ceph. You can now install Ceph from master.
The behavior is done through 2 new options:
* ceph_stable: true will use the stable branch
* ceph_dev: true will use the dev branch
For the dev packages don't forget to set the branch that you want to
use.
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
Prior to this patch, the first match was winning and the playbook did not
distinguish between the two "restart ceph" definitions; adding a distro
filter fixes this.
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
It was reported a couple of months ago by Dan van der Ster from
CERN that updatedb was consuming 100% of CPU while parsing the system's
directories. Indeed the process was parsing the OSD PG directories that
might contain billions of objects.
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
After a change is made on the configuration file we must restart the
Ceph services. I also added a check that verifies if a socket exists
because during the first play there are no services running. We check if
a socket exists, if not we don't try to restart the services, if it
exists we can restart them.
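The guard can be sketched as a simple existence test. The default directory `/var/run/ceph`, the `*.asok` pattern and the sample socket file name are assumptions for this example; the role's actual check may look at a different path.

```python
import tempfile
from pathlib import Path


def has_ceph_socket(socket_dir: str = "/var/run/ceph") -> bool:
    """Return True if at least one *.asok file exists in socket_dir."""
    directory = Path(socket_dir)
    return directory.is_dir() and any(directory.glob("*.asok"))


with tempfile.TemporaryDirectory() as tmp:
    print(has_ceph_socket(tmp))  # first play, nothing running: skip the restart
    Path(tmp, "ceph-mon.node0.asok").touch()
    print(has_ceph_socket(tmp))  # a socket is present: restarting makes sense
```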
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
I added a 'ceph-' prefix to all the roles related to Ceph. Since we are
about to push the roles to Ansible Galaxy, this will make it easier
to use these roles in a larger environment with other roles.
Fixes: #94
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
This commit implements a fourth scenario where we can directly use a
directory instead of a block device for the OSDs. The purpose of this
scenario is more testing-oriented. Please note that we do not check
the filesystem underneath the directory so it is really up to you to
configure this properly. Declaring more than one directory on the
same filesystem will confuse Ceph.
Fixes: #14
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
This commit introduces a new config option 'osd crush chooseleaf type'.
With the help of this option and by setting it to '0' we tell Ceph to
store all the replicas on a single host. Basically we tell CRUSH to
iterate over disk and not over host.
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
since we're now using fsid for the directory name, it should be safe to
just copy the keys from all mon hosts. Once they are copied, the rest of
the hosts will just skip copying. :)
The mon_initial_members is not used since we declare the mon section in
the ceph.conf file. Later, we could reduce the ceph.conf file by only
using the mon_host flag instead of all the mon sections.
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
We introduced a key generation mechanism that aimed to ease deployment.
In the end, it brought more complexity to the playbook and doesn't
scale.
Reverting the auto generation commit and instructing users to generate
their own keys.
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
Currently everything lives in main.yml, the file has become difficult to
read at some point and can be a real mess since we keep adding new
scenarios.
I think we should separate the scenarios into dedicated files and just
do includes in the main.yml file.
Closes: #16
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
As mentioned in issue 24, it's not really safe to store a default
fsid nor a monitor key. Thus the commit brings the auto-generation of
the initial monitor key. However it is quite complex to do the same for
the fsid, so I leave this to the person in charge of the deployment to
generate one and edit group_vars/all accordingly. The default fsid has
been removed as well.
Close: #24
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
Even if MDS are not configured in site.yml the playbook has a
dependency on the ceph.conf template.
This disables the mds section from the ceph.conf file.
Closes: #21
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
Now the playbook is able to install Ceph on RedHat systems.
This has been tested on CentOS 6.
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
This commit brings support for multiple journals where each journal
points to a specific OSD and vice-versa. The commit also clarifies the
usage of multi scenarios for both journal and osd_data.
In the meantime, it fixes the collocation scenario.
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
This brings the support of heterogeneous hardware. Not all the servers
are identical; some have more or fewer disks than the others. Prior to this
commit, the 'parted' command was hanging, now the command simply exits 1
if the device doesn't exist, same for the 'egrep' piped command after.
Then we skip these errors and continue to run. So now, you can specify
multiple devices in group_vars/osds that don't exist on all the
servers.
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>
Since the fetch module takes care of the permissions it is not necessary
to set them with another module. The second command is useless.
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>