We can route the torch tests via `onnx` using the `torch.onnx.export`
tooling. We can then reimport, lower to torch, and compile to linalg to
validate the onnx path is working correctly.
The current implementation exposes some failures in the `onnx` path so
we cannot enable the onnx test suite yet due to segmentation faults.
Required some massaging of LTC to make it warning clean, and I had to
manually disable some warnings on the generated source files (which we
don't control).
The project is warning clean now.
The `-Werror` flag is disabled by default as we can't control everywhere
people will try to build/install. The CI enables it via
-DTORCH_MLIR_ENABLE_WERROR_FLAG=ON.
Per the RFC and numerous conversations on Discord, this rebuilds the
torch-mlir CI and discontinues the infra and coupling to the binary
releases
(https://discourse.llvm.org/t/rfc-discontinuing-pytorch-1-binary-releases/76371).
I iterated on this to get latency back to about what it was with the old
(much larger and non-ephemeral) runners: About 4m - 4.5m for an
incremental change.
Behind the scenes changes:
* Uses a new runner pool operated by AMD. It is currently set to manual
scaling and has two runners (32-core, 64GiB RAM) while we get some
traction. We can either fiddle with some auto-scaling or use a schedule
to give it an increase during certain high traffic hours.
* Builds are now completely isolated and cannot have run-to-run
interference like we were getting before (i.e. lock file/permissions
stuff).
* The GHA runner is installed directly into a manylinux 2.28 container
with upgraded dev tools. This eliminates the need to do sub-invocations
of docker on Linux in order to run on the same OS that is used to build
wheels.
* While not using it now, this setup was cloned from another project
that posts the built artifacts to the job and fans out testing. Might be
useful here later.
* Uses a special git cache that lets us have ephemeral runners and still
check out the repo and deps (incl. llvm) in ~13s.
* Running in an Azure VM Scale Set.
In-repo changes:
* Disables (but does not yet delete):
* Old buildAndTest.yml jobs
* releaseSnapshotPackage.yml
* Adds a new `ci.yml` pipeline and scripts the steps in `build_tools/ci`
(by decomposing the existing `build_linux_packages.sh` for in-tree
builds and modularizing it a bit better).
* Test framework changes:
* Adds a `TORCH_MLIR_TEST_CONCURRENCY` env var that can be used to bound
the multiprocess concurrency. Ended up not using this in the final
version but is useful to have as a knob.
* Changes the default concurrency to `nproc * 0.8 + 1` vs `nproc * 1.1`.
We're running on systems with significantly less virtual memory and I
did a bit of fiddling to find a good tradeoff.
* Changed multiprocess mode to spawn instead of fork. Otherwise, I was
getting instability (as discussed on discord).
* Added MLIR configuration to disable multithreaded contexts globally
for the project. Constantly spawning `nproc * nproc` threads (more than
that actually) was OOM'ing.
* Added a test timeout of 5 minutes. If a multiprocess worker crashes,
the framework can get wedged indefinitely (and then will just be reaped
after multiple hours). We should fix this, but this at least keeps the
CI pool from wedging with stuck jobs.
Functional changes needing followup:
* No matter what I did, I couldn't get the LTC tests to work, and I'm
not 100% sure they were being run in the old setup as the scripts were a
bit twisty. I disabled them and left a comment.
* Dropped out-of-tree build variants. These were not providing much
signal and increase CI needs by 50%.
* Dropped MacOS and Windows builds. Now that we are "just a library" and
not building releases, there is less pressure to test these commit by
commit. Further, since we bump torch-mlir to known good commits on these
platforms, it has been a long time since either of these jobs have
provided much signal (and they take ~an hour+ to run). We can add them
back later post-submit if ever needed.
This is part 1 of 2, which will also include upstreaming the FX
importer. I started with ONNX because it forces some project layout
updates and is more self contained/easier as a first step.
Deviating somewhat from the RFCs on project layout, I made the following
decisions:
* Locating the `onnx_importer.py` into `torch_mlir.extras` as Maks
already has opened up that namespace and it seemed to fit. Better to
have fewer things at that level.
* Setup the build so that the root project only contains MLIR Python and
pure Python deps (like the importers), but this can be augmented with
the `projects/` adding more depending on which features are enabled.
* The default build continues to build everything whereas in
`TORCH_MLIR_ENABLE_ONLY_MLIR_PYTHON_BINDINGS=1` mode, it builds a
`torch-mlir-core` wheel with the pure contents only.
`onnx_importer.py` and `importer_smoke_test.py` are almost verbatim
copies from SHARK-Turbine. I made some minor local alterations to adapt
to paths and generalize the way they interact with the outer project. I
expect I can copy these back to Turbine verbatim from here. I also
updated the license boilerplate (they have the same license but slightly
different project norms for the headers) but retained the correct
copyright.
Other updates:
* Added the ONNX importer unit test (which also can generate test data)
in lit, conditioned on the availability of the Python `onnx` package. In
a followup once I know everything is stable, I'll add another env var
that the CI can set to always enable this so we know conclusively if
tests pass.
* Moved the ONNX conversion readme to `docs/`.
* Renamed CMake option `TORCH_MLIR_ENABLE_ONLY_MLIR_PYTHON_BINDINGS` ->
`TORCH_MLIR_ENABLE_PYTORCH_EXTENSIONS` and inverted the sense. Made the
JitIR importer and LTC options `cmake_dependent_options` for robustness.
Some of docs referred to old file paths that no longer exists. This
patch updates some of the instructions that I happened to notice were
out of date. This is not a full update
This lifts the core of the jit_ir_importer and ltc out of the pt1
project, making them peers to it. As a side-effect of this layering, now
the "MLIR bits" (dialects, etc) are not commingled with the various
parts of the pt1 project, allowing pt1 and ltc to overlay cleanly onto a
more fundamental "just MLIR" Python core. Prior to this, the Python
namespace was polluted to the point that this could not happen.
That "just MLIR" Python core will be introduced in a followup, which
will create the space to upstream the FX and ONNX pure Python importers.
This primary non-NFC change to the API is:
* `torch_mlir.dialects.torch.importer.jit_ir` ->
`torch_mlir.jit_ir_importer`.
The rest is source code layering so that we can make the pt1 project
optional without losing the other features.
Progress on #2546.
This is a first step towards the structure we discussed here:
https://gist.github.com/stellaraccident/931b068aaf7fa56f34069426740ebf20
There are two primary goals:
1. Separate the core project (C++ dialects and conversions) from the
hard PyTorch dependencies. We move all such things into projects/pt1 as
a starting point since they are presently entangled with PT1-era APIs.
Additional work can be done to disentangle components from that
(specifically LTC is identified as likely ultimately living in a
`projects/ltc`).
2. Create space for native PyTorch2 Dynamo-based infra to be upstreamed
without needing to co-exist with the original TorchScript path.
Very little changes in this path with respect to build layering or
options. These can be updated in a followup without commingling
directory structure changes.
This also takes steps toward a couple of other layering enhancements:
* Removes the llvm-external-projects/torch-mlir-dialects sub-project,
collapsing it into the main tree.
* Audits and fixes up the core C++ build to account for issues found
while moving things. This is just an opportunistic pass through but
roughly ~halves the number of build actions for the project from the
high 4000's to the low 2000's.
It deviates from the discussed plan by having a `projects/` tree instead
of `compat/`. As I was thinking about it, this will better accommodate
the follow-on code movement.
Once things are roughly in place and the CI passing, followups will
focus on more in-situ fixes and cleanups.
We just have to do this: I ran into an issue today where I needed to make a one line patch to stablehlo to work around a compiler issue, and it is completely unapparent how to do so given that the mlir-hlo repo is a read-only export and is at the tail end of a multi-week integration chain from the open-source stablehlo repo.
We've discussed this often enough and gotten +1 from everyone that they are ok with taking the e2e testing hit if it becomes necessary: It is necessary as the current situation is unmanageable.
Looking at it, I expect it wouldn't actually be very difficult to build a little runner binary out of the stablehlo interpreter and subprocess call that in order to get the testing coverage back. I leave that as an exercise to the users of this part of the stack and recommend following the breadcrumbs from the deleted python/torch_mlir_e2e_test/stablehlo_backends/linalg_on_tensors.py file and the main.py changes.
Note that I am pointing us at a stablehlo fork for the moment until it is apparent that we don't need to carry any local patches to it. We can update this in a few days if everything is clear.
* Tensor[]? support operands type support using partial codegen
* aten.index.Tensor support via partial codegen
* Add torch.index_put tracing support
* Added optional tensor list type support for LTC/TorchMLIR lowering
* Added comments
Co-authored-by: Gleb Kazantaev <gleb.kazantaev@cerebras.net>
* LTC/TorchMLIR multi-output operations support
* Update torch-mlir jit lowering to support ops with dynamic number of outputs
* Added support for aten::split_copy, aten::split_with_sizes_copy
* Fix native function for aten::split; cleanup code
* Fix TorchMlirTensorList lowering
* Remove xfails
* feat: split pytorch requirements into stable and nightly
* fix: add true to tests to see full output
* refactor: add comments to explain true statement
* feat: move some tests to experimental mode
* refactor: refactor pipeline into more fine grained difference
* feat: add version differentiation for some tests
* feat: activate more configs
* refactor: change implementation to use less requirement files
* refactor: remove contraints used for testing
* fix: revert some requirement file names
* refactor: remove unnecessary ninja install
* fix: fix version parsing
* refactor: remove dependency on torchvision in main requirements file
* refactor: remove index url
* style: remove unnecesary line switch
* fix: readd index url
Creates a build_linux_arm64 job that builds the release on an arm64 self-hosted runner.
Drop Python 3.10 support
Pass TM_TORCH_VERSION to choose the Stable PyTorch version (since arm64 doesn't have nightly builds)
Borrows nightly / stable Pytorch switch from the WIP
https://github.com/llvm/torch-mlir/pull/2038
This patch, by itself, doesn't fix caching on Windows, but once a new
release of ccache is available, caching for Windows builds should start
working again (validated by building ccache from source and using it
with LLVM builds).
Ccache rejects caching when either the `/Zi` or `/ZI` flags are used
during compilation on Windows, since these flags tell the compiler to
embed debug information in a PDB file (separate from the object file
produced by the compiler). In particular, our CI builds add the `/Zi`
flag, making ccache mark these compiler invocations as uncacheable.
But what caused our CI to add debug flags, especially when we specified
`-DCMAKE_BUILD_TYPE=Release`? On Windows, unless we specify the
`--config Release` flag during the CMake build step, CMake assumes a
debug build. So all this while, we had been producing debug builds of
torch-mlir for every PR! No doubt it took so long to build the Windows
binaries.
The reason for having to specify the configuration during the _build_
step (as opposed to the _configure_ step) of CMake on Windows is that
CMake's Visual Studio generators will produce _both_ Release and Debug
profiles during the CMake configure step (thus requiring a build-time
value that tells CMake whether to build in Release or Debug mode).
Luckily, on Linux and macOS, the `--config` flag seems to be simply
ignored, instead of causing build errors.
Strangely, based on cursory tests, it seems like on Windows we need to
specify the Relase configuration as both `-DCMAKE_BUILD_TYPE=Release` as
well as `--config Release`. Dropping either made my build switch to a
Debug configuration.
Additionally, there is a bug in ccache v4.8 (although this is addressed
in trunk) that causes ccache to reject caching if the compiler
invocation includes any flag that starts with `/Z`, including /`Zc`,
which is added by LLVM's HandleLLVMOptions.cmake and which isn't related
to debug info or PDB files. The next release of ccache should include
the fix, which is to reject caching only for `/Zi` and `/ZI` flags and
not all flags that start with `/Z`.
As a side note, debugging this problem was possible because of ccache's
log file, which is enabled by: `ccache --set-config="log_file=log.txt"`.
We want to ensure that pip packages required for building torch-mlir
should be included in the dependencies of torch-mlir, but we don't want
the pip packages required for _testing_ of torch-mlir to be included
among the dependencies. To be able to specify and install one set of
dependencies and not the other, this patch separates the pip packages
into two files: build-requirements.txt and test-requirements.txt.
This patch also updates references to the requirements.txt file so that
CI builds that run end-to-end tests install test-related pip
dependencies while everything else (including WHL builds) sticks to just
the build-related pip dependencies.
Despite this change, this patch should not affect a torch-mlir
developer's workflow. More precisely, since this patch makes the
top-level requirements.txt file refer to both build-requirements.txt and
test-requirements.txt files, a torch-mlir developer should be able to
continue referring to the requirements.txt file without any impact.
This patch replaces all MHLO operations with their StableHLO
counterparts and adds a validation pass to ensure that no MHLO operations
remain before translating all Stablehlo operations to the MHLO dialect
for further lowering to the Linalg dialect.
This patch also updates all lit tests so that they refer to the
`convert-torch-to-stablehlo` pass and so that they check for StableHLO
operations.
Previously, torchvision had not released WHL files for Python v3.8,
causing failures in torch-mlir python package builds, so we had disabled
building for Python v3.8.
Now that the WHL files are back, this patch re-enables v3.8 builds.
- Use v3 of actions/checkout, since the version we use (v2) uses
Node.js 12, which is deprecated by GitHub.
- Source the PowerShell venv sctipt (instead of the bash sript) since
the calling script is a PowerShell script. Without this, the build
doesn't use venv at all.
- Make the build dependencies in whl-requirements.txt (used by
setup.py) match those in requirements.txt. To that end, this patch
creates a build-requirements.txt that is referenced by
requirements.txt and whl-requirements.txt.
* [custom op] Generalize shape library logic to work with dtypes
This commit generalizes the shape library logic, so that dtype rules
for ops can also be expressed using the same mechanism. In other
words, each op can now have a shape function and a dtype function
specified in Python that is imported during lowering to calculate the
shapes and dtypes throught a program. For more information about how
to specify a dtype function, see the updated
`docs/adding_a_shape_and_dtype_function.md`.
For those not familiar with how the shape library works, the file
`docs/calculations_lib.md` provides an overview.
This was an experimental attempt at rolling out own op-by-op executor
with `__torch_dispatch__`, but it proved difficult to make it robust.
Op-by-op execution is very easy to implement robustly now with the
PyTorch 2.0 stack, so we don't need eager_mode.
Downstream users were using eager_mode to implement lockstep numerical
accuracy debuggers. We implemented the same functionality with
TorchDynamo in https://github.com/llvm/torch-mlir/pull/1681 so now there
is not much reason to continue maintaining it.
This more accurately reflects what it is. The previous name was
conflating the use of RefBackend (which `linalg`, `tosa`, and `mhlo`
configs all use) with the use of the linalg backend (e.g. TorchToLinalg).
This conflation was artifically giving the linalg backend a "privileged"
position, which we want to avoid. We still keep it as the default
backend, and it remains the most complete, but at least there's not
artificial boosting.
This adds a basic e2e Config for TorchDynamo using
Linalg-on-Tensors/RefBackend.
But TorchDynamo is pretty orthogonal to
various other pieces, so it should compose nicely with variations like:
- Switching out all the backends (Linalg-on-Tensors, TOSA, MHLO)
- PyTorch functionalization and decompositions
- Taking the example inputs and compiling with all dynamic or all static
shapes without duplicating tests.
This adds it to the CI, but there are still a lot of XFAIL's.
This also adds a helper `from torch_mlir.dynamo import
make_simple_dynamo_backend` which simplifies some of the steps for
making a Torch-MLIR-based TorchDynamo backend. We include "simple" in
the name because we are going to be exploring various things next from
the long-term roadmap.
The next steps are:
- Burn down all the XFAIL's.
- Start working on the pieces from the [long-term roadmap](https://github.com/llvm/torch-mlir/blob/main/docs/long_term_roadmap.md).
- Add functionalization/decompositions into the TorchDynamo flow and
remove reliance on the current Torch-MLIR "frontend".
- Write a pure-Python direct FX->MLIR importer.
- Hook up the new PyTorch symbolic shape stuff.
- Explore PrimTorch decompositions for simplifying backends.
We want each build to be reproducible regardless of prior builds and
prior package installations, but pip, by default, uses cached packages
from previous invocations of `pip install`. As a result, the incorrect
dependencies downloaded in the RollPyTorch workflow in the main
repository cannot be reproduced in private forks of the repository. To
resolve this problem, this patch adds a `--no-cache-dir` flag to pip, so
that it fetches and inspects each requested package independent or prior
installations.
Before this patch, the update_shape_lib.sh and update_torch_ods.sh
scripts only worked on in-tree builds, which implied that the
RollPyTorch action was forced to run the longer-running in-tree build.
As a result of this patch, we should be able to run through the basic
checks in the RollPyTorch action faster, while running the full suite of
tests off the critical path.
The key change in this patch is that the update scripts now look for the
directory that is most recently modified between in-tree or out-of-tree
build directories. The change also correctly handles the case when only
one of the two directories exists.