We just have to do this: today I needed to make a one-line patch to stablehlo to work around a compiler issue, and it is not at all apparent how to do so, given that the mlir-hlo repo is a read-only export that sits at the tail end of a multi-week integration chain from the open-source stablehlo repo.
We've discussed this often enough, and everyone has given a +1 to taking the e2e testing hit if it becomes necessary. It is necessary: the current situation is unmanageable.
Looking at it, I expect it wouldn't actually be very difficult to build a little runner binary out of the stablehlo interpreter and call it via subprocess in order to get the testing coverage back. I leave that as an exercise to the users of this part of the stack and recommend following the breadcrumbs from the deleted python/torch_mlir_e2e_test/stablehlo_backends/linalg_on_tensors.py file and the main.py changes.
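As a rough starting point for the runner idea above, here is a minimal sketch of the subprocess side only. It assumes the interpreter is available as a command-line tool that takes an MLIR file path as its last argument; the function name `run_interpreter`, the `command` parameter, and the temp-file layout are all hypothetical and not part of torch-mlir or stablehlo.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Sequence


def run_interpreter(mlir_text: str, command: Sequence[str], timeout_s: float = 60.0) -> str:
    """Write `mlir_text` to a temp file, run `command + [path]`, and return stdout.

    `command` is whatever invokes the interpreter binary (an assumption of
    this sketch); the path of the temp file is appended as the last argument.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "module.mlir"
        path.write_text(mlir_text)
        result = subprocess.run(
            [*command, str(path)],
            capture_output=True,  # collect stdout/stderr instead of inheriting them
            text=True,
            timeout=timeout_s,
        )
    if result.returncode != 0:
        raise RuntimeError(
            f"interpreter failed with exit code {result.returncode}: {result.stderr}"
        )
    return result.stdout
```

The caller would pass the actual interpreter invocation as `command`, for example something along the lines of `["stablehlo-translate", "--interpret"]`; the exact binary name and flags are an assumption here and should be checked against the stablehlo build in use. Parsing the printed results and comparing them against PyTorch outputs is left out of this sketch.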
Note that I am pointing us at a stablehlo fork for the moment until it is apparent that we don't need to carry any local patches to it. We can update this in a few days if everything is clear.
This patch replaces all MHLO operations with their StableHLO
counterparts and adds a validation pass to ensure that no MHLO operations
remain before translating all StableHLO operations to the MHLO dialect
for further lowering to the Linalg dialect.
This patch also updates all lit tests so that they refer to the
`convert-torch-to-stablehlo` pass and so that they check for StableHLO
operations.
* [custom op] Generalize shape library logic to work with dtypes
This commit generalizes the shape library logic, so that dtype rules
for ops can also be expressed using the same mechanism. In other
words, each op can now have a shape function and a dtype function
specified in Python that is imported during lowering to calculate the
shapes and dtypes throughout a program. For more information about how
to specify a dtype function, see the updated
`docs/adding_a_shape_and_dtype_function.md`.
For those not familiar with how the shape library works, the file
`docs/calculations_lib.md` provides an overview.
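To give a feel for what "a shape function and a dtype function specified in Python" means, here is a toy sketch for an elementwise unary op. The function names, the signatures, and the use of plain strings for dtypes are all illustrative assumptions; the real library has its own naming scheme and signatures, which are described in `docs/adding_a_shape_and_dtype_function.md`.

```python
from typing import List

# Hypothetical names and signatures, for illustration only. They are not
# the ones used by the torch-mlir library.


def unary_elementwise_shape(self_shape: List[int]) -> List[int]:
    """Shape rule: an elementwise unary op preserves the input shape."""
    return list(self_shape)


def float_unary_dtype(self_dtype: str) -> str:
    """Dtype rule: bool and integer inputs promote to float32; others pass through."""
    if self_dtype in ("bool", "int8", "int32", "int64"):
        return "float32"
    return self_dtype
```

The point of the design is that both rules live side by side and are expressed the same way, so one mechanism can import either kind of function during lowering.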
Bazel LIT test support was added in https://github.com/llvm/torch-mlir/pull/1585. This PR enables the tests in CI.
```
INFO: Build completed successfully, 254 total actions
@torch-mlir//test/Conversion:TorchToArith/basic.mlir.test PASSED in 0.3s
@torch-mlir//test/Conversion:TorchToLinalg/basic.mlir.test PASSED in 0.5s
@torch-mlir//test/Conversion:TorchToLinalg/elementwise.mlir.test PASSED in 0.3s
@torch-mlir//test/Conversion:TorchToLinalg/flatten.mlir.test PASSED in 0.3s
@torch-mlir//test/Conversion:TorchToLinalg/pooling.mlir.test PASSED in 0.3s
@torch-mlir//test/Conversion:TorchToLinalg/unsqueeze.mlir.test PASSED in 0.2s
@torch-mlir//test/Conversion:TorchToLinalg/view.mlir.test PASSED in 0.3s
@torch-mlir//test/Conversion:TorchToMhlo/basic.mlir.test PASSED in 0.5s
@torch-mlir//test/Conversion:TorchToMhlo/elementwise.mlir.test PASSED in 0.9s
@torch-mlir//test/Conversion:TorchToMhlo/gather.mlir.test PASSED in 0.3s
@torch-mlir//test/Conversion:TorchToMhlo/linear.mlir.test PASSED in 0.6s
@torch-mlir//test/Conversion:TorchToMhlo/pooling.mlir.test PASSED in 0.3s
@torch-mlir//test/Conversion:TorchToMhlo/reduction.mlir.test PASSED in 0.4s
@torch-mlir//test/Conversion:TorchToMhlo/view_like.mlir.test PASSED in 0.6s
@torch-mlir//test/Conversion:TorchToSCF/basic.mlir.test PASSED in 0.2s
@torch-mlir//test/Conversion:TorchToTosa/basic.mlir.test PASSED in 1.1s
@torch-mlir//test/Dialect:Torch/GlobalizeObjectGraph/basic.mlir.test PASSED in 0.2s
@torch-mlir//test/Dialect:Torch/GlobalizeObjectGraph/error.mlir.test PASSED in 0.2s
@torch-mlir//test/Dialect:Torch/GlobalizeObjectGraph/free-functions.mlir.test PASSED in 0.2s
@torch-mlir//test/Dialect:Torch/GlobalizeObjectGraph/initializers.mlir.test PASSED in 0.2s
@torch-mlir//test/Dialect:Torch/GlobalizeObjectGraph/methods.mlir.test PASSED in 0.2s
@torch-mlir//test/Dialect:Torch/GlobalizeObjectGraph/module-uses-error.mlir.test PASSED in 0.2s
@torch-mlir//test/Dialect:Torch/GlobalizeObjectGraph/module-uses.mlir.test PASSED in 0.2s
@torch-mlir//test/Dialect:Torch/GlobalizeObjectGraph/multiple-instances-error.mlir.test PASSED in 0.2s
@torch-mlir//test/Dialect:Torch/GlobalizeObjectGraph/multiple-instances-multiple-module-args.mlir.test PASSED in 0.2s
@torch-mlir//test/Dialect:Torch/GlobalizeObjectGraph/multiple-instances.mlir.test PASSED in 0.2s
@torch-mlir//test/Dialect:Torch/GlobalizeObjectGraph/submodules.mlir.test PASSED in 0.3s
@torch-mlir//test/Dialect:Torch/GlobalizeObjectGraph/visibility.mlir.test PASSED in 0.2s
@torch-mlir//test/Dialect:Torch/adjust-calling-conventions.mlir.test PASSED in 0.2s
@torch-mlir//test/Dialect:Torch/canonicalize.mlir.test PASSED in 0.4s
@torch-mlir//test/Dialect:Torch/decompose-complex-ops-legal.mlir.test PASSED in 0.2s
@torch-mlir//test/Dialect:Torch/decompose-complex-ops.mlir.test PASSED in 0.9s
@torch-mlir//test/Dialect:Torch/drop-shape-calculations.mlir.test PASSED in 0.4s
@torch-mlir//test/Dialect:Torch/erase-module-initializer.mlir.test PASSED in 0.2s
@torch-mlir//test/Dialect:Torch/inline-global-slots-analysis.mlir.test PASSED in 0.3s
@torch-mlir//test/Dialect:Torch/inline-global-slots-transform.mlir.test PASSED in 0.2s
@torch-mlir//test/Dialect:Torch/invalid.mlir.test PASSED in 0.4s
@torch-mlir//test/Dialect:Torch/lower-to-backend-contract-error.mlir.test PASSED in 17.3s
@torch-mlir//test/Dialect:Torch/maximize-value-semantics.mlir.test PASSED in 0.3s
@torch-mlir//test/Dialect:Torch/ops.mlir.test PASSED in 0.3s
@torch-mlir//test/Dialect:Torch/prepare-for-globalize-object-graph.mlir.test PASSED in 0.2s
@torch-mlir//test/Dialect:Torch/promote-types.mlir.test PASSED in 0.3s
@torch-mlir//test/Dialect:Torch/reduce-op-variants-error.mlir.test PASSED in 0.2s
@torch-mlir//test/Dialect:Torch/reduce-op-variants.mlir.test PASSED in 0.2s
@torch-mlir//test/Dialect:Torch/refine-public-return.mlir.test PASSED in 0.2s
@torch-mlir//test/Dialect:Torch/refine-types-branch.mlir.test PASSED in 0.3s
@torch-mlir//test/Dialect:Torch/refine-types-ops.mlir.test PASSED in 0.6s
@torch-mlir//test/Dialect:Torch/refine-types.mlir.test PASSED in 0.4s
@torch-mlir//test/Dialect:Torch/reify-shape-calculations.mlir.test PASSED in 2.9s
@torch-mlir//test/Dialect:Torch/simplify-shape-calculations.mlir.test PASSED in 0.3s
@torch-mlir//test/Dialect:Torch/torch-function-to-torch-backend-pipeline.mlir.test PASSED in 0.6s
@torch-mlir//test/Dialect:TorchConversion/canonicalize.mlir.test PASSED in 0.2s
@torch-mlir//test/Dialect:TorchConversion/finalizing-backend-type-conversion.mlir.test PASSED in 0.3s
@torch-mlir//test/Dialect:TorchConversion/func-backend-type-conversion.mlir.test PASSED in 0.2s
@torch-mlir//test/Dialect:TorchConversion/ops.mlir.test PASSED in 0.3s
@torch-mlir//test/Dialect:TorchConversion/verify-linalg-on-tensors-backend-contract.mlir.test PASSED in 0.3s
@torch-mlir//test/Dialect:TorchConversion/verify-tosa-backend-contract.mlir.test PASSED in 0.2s
@torch-mlir//test/RefBackend:insert-rng-globals.mlir.test PASSED in 0.2s
@torch-mlir//test/RefBackend:munge-calling-conventions.mlir.test PASSED in 0.2s
Executed 59 out of 59 tests: 59 tests pass.
```
GHA workflow: https://github.com/sjain-stanford/torch-mlir/actions/runs/3476816449/jobs/5812368489
I was helping an engineer the other day who was attempting to use the Docker flow for interactive development and ran into countless issues. Add a note that it is not recommended for interactive development, and also move the Docker section down to avoid positioning it as the "default" that people should be using.
Gets both CI and Release builds integrated in one workflow.
Mount ccache and pip cache as required for fast iterative builds
Current Release Docker builds still run with root perms; fix this
in the future so that they run as the same user.
There may be some corner cases left especially when switching
build types etc.
Docker build TEST plan:
tl;dr:
Build everything: Releases (Python 3.8, 3.9, 3.10) and CIs.
TM_PACKAGES="torch-mlir out-of-tree in-tree"
2.57s user 2.49s system 0% cpu 30:33.11 total
Out of Tree + PyTorch binaries:
Fresh build (purged cache):
TM_PACKAGES="out-of-tree"
0.47s user 0.51s system 0% cpu 5:24.99 total
Incremental with ccache:
TM_PACKAGES="out-of-tree"
0.09s user 0.08s system 0% cpu 34.817 total
Out of Tree + PyTorch from source
Incremental
TM_PACKAGES="out-of-tree" TM_USE_PYTORCH_BINARY=OFF
1.58s user 1.81s system 2% cpu 1:59.61 total
In-Tree + PyTorch binaries:
Fresh build and tests: (purge ccache)
TM_PACKAGES="in-tree"
0.53s user 0.49s system 0% cpu 6:23.35 total
Fresh build/ but with prior ccache
TM_PACKAGES="in-tree"
0.45s user 0.66s system 0% cpu 3:57.47 total
Incremental in-tree with all tests and regression tests
TM_PACKAGES="in-tree"
0.16s user 0.09s system 0% cpu 2:18.52 total
In-Tree + PyTorch from source
Fresh build and tests: (purge ccache)
TM_PACKAGES="in-tree" TM_USE_PYTORCH_BINARY=OFF
2.03s user 2.28s system 0% cpu 11:11.86 total
Fresh build/ but with prior ccache
TM_PACKAGES="in-tree" TM_USE_PYTORCH_BINARY=OFF
1.58s user 1.88s system 1% cpu 4:53.15 total
Incremental in-tree with all tests and regression tests
TM_PACKAGES="in-tree" TM_USE_PYTORCH_BINARY=OFF
1.09s user 1.10s system 1% cpu 3:29.84 total
Incremental without tests
TM_PACKAGES="in-tree" TM_USE_PYTORCH_BINARY=OFF TM_SKIP_TESTS=ON
1.52s user 1.42s system 3% cpu 1:15.82 total
In-Tree + Out of Tree + PyTorch binaries:
TM_PACKAGES="out-of-tree in-tree"
0.25s user 0.18s system 0% cpu 3:01.91 total
To clear all artifacts:
rm -rf build build_oot llvm-build libtorch docker_venv externals/pytorch/build
We use it for more than TorchScript testing now. This is a purely
mechanical change to adjust some file paths to remove "torchscript".
The most perceptible change here is that now e2e tests are run with
```
./tools/e2e_test.sh
```
instead of:
```
./tools/torchscript_e2e_test.sh
```