Updating the PyTorch version may break the Torch-MLIR build, as it did
recently, since the PyTorch update caused the shape library to change,
but the shape library was not updated in the commit for updating
PyTorch.
This patch introduces a new default-off environment variable to the
build_linux_packages.sh script called `TM_UPDATE_ODS_AND_SHAPE_LIB`
which instructs the script to run the update_torch_ods.sh and
update_shape_lib.sh scripts.
However, running these scripts requires an in-tree build and the tests
that run for an in-tree build of Torch-MLIR are more comprehensive than
those that run for an out-of-tree build, so this patch also swaps out
the out-of-tree build for an in-tree build.
Prior to this patch, the release process (`pip wheel`) retrieved
dependencies from the pyproject.toml file, which specified a version of
PyTorch that defaulted to the most recent nightly release. Instead, we
want the release process to use the same pinned PyTorch version as the
development build of PyTorch.
Since TOML files can't reference the pytorch-requirements.txt file, this
patch puts the dependencies from pyproject.toml into
whl-requirements.txt, which references pytorch-requirements.txt.
`git diff` does not work by default on untracked files. Since the
function `_check_file_not_changed_by` stores the new generated file in
an untracked file, `git diff` was not catching any modifications in
the new generated file. This commit adds the flag `--no-index` to make
`git diff` work with untracked files.
This patch fetches the most recent nightly (binary) build of PyTorch,
before pinning it in pytorch-requirements.txt, which is referenced in
the top-level requirements.txt file. This way, end users will continue
to be able to run `pip -r requirements.txt` without worrying whether
doing so will break their Torch-MLIR build.
This patch also fetches the git commit hash that corresponds to the
nightly release, and this hash is passed to the out-of-tree build so
that it can build PyTorch from source.
If we were to sort the torch versions as numbers (in the usual
descending order), then 1.9 appears before 1.13. To fix this problem,
we use the `--version-sort` flag (along with `--reverse` for specifying
a descending order). We also filter out lines that don't contain
version numbers by only considering lines that start with a digit.
As a matter of slight clarity, this patch renames the variable
`torch_from_src` to `torch_from_bin`, since that variable is initialized
to `TM_USE_PYTORCH_BINARY`.
Co-authored-by: powderluv <powderluv@users.noreply.github.com>
This adds a very long and obnoxious option to disable crashing tests.
The right fix here is to use the right multiprocessing techniques to
ensure that segfaulting tests can be XFAILed like normal tests, but we
currently don't know how to implement "catch a segfault" in Python
(patches or even just ideas welcome).
Motivated by #1361, where we ended up removing two tests from *all*
backends due to a failure in one backend, which is undesirable.
We added both ipc=host and explicit ulimits. This _may_ be causing slow downs on GHA. Remove the ulimit setting still passes all the CI tests locally. `--ipc=host` is still required.
The new logic has the following benefits:
1. It does not clobber the working tree state. We expect testing to not
change the work tree.
2. It correctly handles the case where a user has changes to the
generated files, but hasn't checked them in yet (this happens
frequently when adding new ops).
Gets both CI and Release builds integrated in one workflow.
Mount ccache and pip cache as required for fast iterative builds
Current Release docker builds still run with root perms, fix it
in the future to run as the same user.
There may be some corner cases left especially when switching
build types etc.
Docker build TEST plan:
tl;dr:
Build everythin: Releases (Python 3.8, 3.9, 3.10) and CIs.
TM_PACKAGES="torch-mlir out-of-tree in-tree"
2.57s user 2.49s system 0% cpu 30:33.11 total
Out of Tree + PyTorch binaries:
Fresh build (purged cache):
TM_PACKAGES="out-of-tree"
0.47s user 0.51s system 0% cpu 5:24.99 total
Incremental with ccache:
TM_PACKAGES="out-of-tree"
0.09s user 0.08s system 0% cpu 34.817 total
Out of Tree + PyTorch from source
Incremental
TM_PACKAGES="out-of-tree" TM_USE_PYTORCH_BINARY=OFF
1.58s user 1.81s system 2% cpu 1:59.61 total
In-Tree + PyTorch binaries:
Fresh build and tests: (purge ccache)
TM_PACKAGES="in-tree"
0.53s user 0.49s system 0% cpu 6:23.35 total
Fresh build/ but with prior ccache
TM_PACKAGES="in-tree"
0.45s user 0.66s system 0% cpu 3:57.47 total
Incremental in-tree with all tests and regression tests
TM_PACKAGES="in-tree"
0.16s user 0.09s system 0% cpu 2:18.52 total
In-Tree + PyTorch from source
Fresh build and tests: (purge ccache)
TM_PACKAGES="in-tree" TM_USE_PYTORCH_BINARY=OFF
2.03s user 2.28s system 0% cpu 11:11.86 total
Fresh build/ but with prior ccache
TM_PACKAGES="in-tree" TM_USE_PYTORCH_BINARY=OFF
1.58s user 1.88s system 1% cpu 4:53.15 total
Incremental in-tree with all tests and regression tests
TM_PACKAGES="in-tree" TM_USE_PYTORCH_BINARY=OFF
1.09s user 1.10s system 1% cpu 3:29.84 total
Incremental without tests
TM_PACKAGES="in-tree" TM_USE_PYTORCH_BINARY=OFF TM_SKIP_TESTS=ON
1.52s user 1.42s system 3% cpu 1:15.82 total
In-tree+out-of-tree + Pytorch Binaries
TM_PACKAGES="out-of-tree in-tree"
0.25s user 0.18s system 0% cpu 3:01.91 total
To clear all artifacts:
rm -rf build build_oot llvm-build libtorch docker_venv
externals/pytorch/build
On my local machine, `unzip` didn't exist (producing a "command not
found" error), but CMake ignored the error. Although the build did
succeed (because it found a previously-built version of libtorch), it
seems better to abort builds on such failures, so this patch checks the
return code of all external process invocations.
Along similar lines, this patch also updates the shell scripts in
`build_tools` to extensively use double-quoting to prevent unintentional
word splitting or globbing. Since some of the scripts execute `rm`
while using shell variables, this patch also adds the preamble `set -u`
to abort execution if an undefined variable is referenced, so that we
reduce the chances of executing `rm -rf /` if the path expression
happens to refer to an undefined variable.
* Add oneshot release snapshot for test/ondemand
Add some build scripts to test new release flow based on IREE.
Wont affect current builds, once this works well we can plumb it
in.
Build with manylinux docker
* Fixes a few issues found when debugging powderluv's setup.
* It is optional to link against Python3_LIBRARIES. Check that and don't do it if they don't exist for this config.
* Clean and auditwheel need to operate on sanitized package names. So "torch_mlir" vs "torch-mlir".
* Adds a pyproject.toml file that pins the build dependencies needed to detect both Torch and Python (the MLIR Python build was failing to detect because Numpy wasn't in the pip venv).
* Commented out auditwheel: These wheels are not PyPi compliant since they weak link to libtorch at runtime. However, they should be fine to deploy to users.
* Adds the --extra-index-url to the pip wheel command, allowing PyTorch to be found.
* Hack setup.py to remove the _mlir_libs dir before building. This keeps back-to-back versions from accumulating in the wheels for subsequent versions. IREE has a more principled way of doing this, but what I have here should work.
Co-authored-by: Stella Laurenzo <stellaraccident@gmail.com>