This documents which CMake options must be set to be able to use
`torch_mlir_e2e_test`, required e.g. for
`projects/pt1/tools/e2e_test.sh`.
Makes progress on #3696.
Closes#3719.
This bump triggered an upstream assert. Includes a WAR for #3506.
Also includes several things I needed to do to repro:
* When TORCH_MLIR_TEST_CONCURRENCY=1, test runs will be printed.
* Added TORCH_MLIR_TEST_VERBOSE=1 handling to enable verbose mode
(useful on CI).
---------
Co-authored-by: Stella Laurenzo <stellaraccident@gmail.com>
@kuhar mentioned in the previous PR that we should use ld.lld. I kept
using ld because for my LLD version, it worked.
After updating to a new LLD version, that became necessary.
Users can run via `pre-commit run` or set up a hook as described in the
instructions: https://pre-commit.com/
The CI is set to only run pre-commit on files changed in the patch. We
will run with `--all-files` in a separate patch.
As this
issuecomment(https://github.com/llvm/torch-mlir/pull/3021#issuecomment-2031248199)
suggests, `setup.py` should only be used for building Python packages,
so:
* disabled the develop command
* refactor the environment variable parameters
* add more doc for the usage and env var of setup.py
Link to related RFC:
https://discourse.llvm.org/t/rfc-rename-torch-mlir-compile-apis-and-introduce-fx-based-analogs/76646
This commit updates the documentation, tests, CMake files, and API for
the proposed changes in the RFC. There is a new torch_mlir/fx.py for
user level APIs related to importing modules and a corresponding test
for this path can be found at test/python/fx_importer/basic_test.py.
---------
Co-authored-by: MaheshRavishankar <mravisha@amd.com>
Adding the `--progress` flag shows the same output as what `git clone`
would show. This is very nice for slow connections. Without it, the
command may run for many minutes without providing any indication that
it is still doing something.
For `--depth=1`, I think it should be safe as most people have new
enough git versions nowadays, but let's be safe and make it an optional
suggestion. I ran all the tests fine with `--depth=1`, but I don't know
whether things will keep working when the submodules get updated for
systems with old git versions.
Some of docs referred to old file paths that no longer exists. This
patch updates some of the instructions that I happened to notice were
out of date. This is not a full update
This is a first step towards the structure we discussed here:
https://gist.github.com/stellaraccident/931b068aaf7fa56f34069426740ebf20
There are two primary goals:
1. Separate the core project (C++ dialects and conversions) from the
hard PyTorch dependencies. We move all such things into projects/pt1 as
a starting point since they are presently entangled with PT1-era APIs.
Additional work can be done to disentangle components from that
(specifically LTC is identified as likely ultimately living in a
`projects/ltc`).
2. Create space for native PyTorch2 Dynamo-based infra to be upstreamed
without needing to co-exist with the original TorchScript path.
Very little changes in this path with respect to build layering or
options. These can be updated in a followup without commingling
directory structure changes.
This also takes steps toward a couple of other layering enhancements:
* Removes the llvm-external-projects/torch-mlir-dialects sub-project,
collapsing it into the main tree.
* Audits and fixes up the core C++ build to account for issues found
while moving things. This is just an opportunistic pass through but
roughly ~halves the number of build actions for the project from the
high 4000's to the low 2000's.
It deviates from the discussed plan by having a `projects/` tree instead
of `compat/`. As I was thinking about it, this will better accommodate
the follow-on code movement.
Once things are roughly in place and the CI passing, followups will
focus on more in-situ fixes and cleanups.
We just have to do this: I ran into an issue today where I needed to make a one line patch to stablehlo to work around a compiler issue, and it is completely unapparent how to do so given that the mlir-hlo repo is a read-only export and is at the tail end of a multi-week integration chain from the open-source stablehlo repo.
We've discussed this often enough and gotten +1 from everyone that they are ok with taking the e2e testing hit if it becomes necessary: It is necessary as the current situation is unmanageable.
Looking at it, I expect it wouldn't actually be very difficult to build a little runner binary out of the stablehlo interpreter and subprocess call that in order to get the testing coverage back. I leave that as an exercise to the users of this part of the stack and recommend following the breadcrumbs from the deleted python/torch_mlir_e2e_test/stablehlo_backends/linalg_on_tensors.py file and the main.py changes.
Note that I am pointing us at a stablehlo fork for the moment until it is apparent that we don't need to carry any local patches to it. We can update this in a few days if everything is clear.
This patch replaces all MHLO operations with their StableHLO
counterparts and adds a validation pass to ensure that no MHLO operations
remain before translating all Stablehlo operations to the MHLO dialect
for further lowering to the Linalg dialect.
This patch also updates all lit tests so that they refer to the
`convert-torch-to-stablehlo` pass and so that they check for StableHLO
operations.
* [custom op] Generalize shape library logic to work with dtypes
This commit generalizes the shape library logic, so that dtype rules
for ops can also be expressed using the same mechanism. In other
words, each op can now have a shape function and a dtype function
specified in Python that is imported during lowering to calculate the
shapes and dtypes throught a program. For more information about how
to specify a dtype function, see the updated
`docs/adding_a_shape_and_dtype_function.md`.
For those not familiar with how the shape library works, the file
`docs/calculations_lib.md` provides an overview.
Bazel LIT test support was added in https://github.com/llvm/torch-mlir/pull/1585. This PR enables the tests in CI.
```
INFO: Build completed successfully, 254 total actions
@torch-mlir//test/Conversion:TorchToArith/basic.mlir.test PASSED in 0.3s
@torch-mlir//test/Conversion:TorchToLinalg/basic.mlir.test PASSED in 0.5s
@torch-mlir//test/Conversion:TorchToLinalg/elementwise.mlir.test PASSED in 0.3s
@torch-mlir//test/Conversion:TorchToLinalg/flatten.mlir.test PASSED in 0.3s
@torch-mlir//test/Conversion:TorchToLinalg/pooling.mlir.test PASSED in 0.3s
@torch-mlir//test/Conversion:TorchToLinalg/unsqueeze.mlir.test PASSED in 0.2s
@torch-mlir//test/Conversion:TorchToLinalg/view.mlir.test PASSED in 0.3s
@torch-mlir//test/Conversion:TorchToMhlo/basic.mlir.test PASSED in 0.5s
@torch-mlir//test/Conversion:TorchToMhlo/elementwise.mlir.test PASSED in 0.9s
@torch-mlir//test/Conversion:TorchToMhlo/gather.mlir.test PASSED in 0.3s
@torch-mlir//test/Conversion:TorchToMhlo/linear.mlir.test PASSED in 0.6s
@torch-mlir//test/Conversion:TorchToMhlo/pooling.mlir.test PASSED in 0.3s
@torch-mlir//test/Conversion:TorchToMhlo/reduction.mlir.test PASSED in 0.4s
@torch-mlir//test/Conversion:TorchToMhlo/view_like.mlir.test PASSED in 0.6s
@torch-mlir//test/Conversion:TorchToSCF/basic.mlir.test PASSED in 0.2s
@torch-mlir//test/Conversion:TorchToTosa/basic.mlir.test PASSED in 1.1s
@torch-mlir//test/Dialect:Torch/GlobalizeObjectGraph/basic.mlir.test PASSED in 0.2s
@torch-mlir//test/Dialect:Torch/GlobalizeObjectGraph/error.mlir.test PASSED in 0.2s
@torch-mlir//test/Dialect:Torch/GlobalizeObjectGraph/free-functions.mlir.test PASSED in 0.2s
@torch-mlir//test/Dialect:Torch/GlobalizeObjectGraph/initializers.mlir.test PASSED in 0.2s
@torch-mlir//test/Dialect:Torch/GlobalizeObjectGraph/methods.mlir.test PASSED in 0.2s
@torch-mlir//test/Dialect:Torch/GlobalizeObjectGraph/module-uses-error.mlir.test PASSED in 0.2s
@torch-mlir//test/Dialect:Torch/GlobalizeObjectGraph/module-uses.mlir.test PASSED in 0.2s
@torch-mlir//test/Dialect:Torch/GlobalizeObjectGraph/multiple-instances-error.mlir.test PASSED in 0.2s
@torch-mlir//test/Dialect:Torch/GlobalizeObjectGraph/multiple-instances-multiple-module-args.mlir.test PASSED in 0.2s
@torch-mlir//test/Dialect:Torch/GlobalizeObjectGraph/multiple-instances.mlir.test PASSED in 0.2s
@torch-mlir//test/Dialect:Torch/GlobalizeObjectGraph/submodules.mlir.test PASSED in 0.3s
@torch-mlir//test/Dialect:Torch/GlobalizeObjectGraph/visibility.mlir.test PASSED in 0.2s
@torch-mlir//test/Dialect:Torch/adjust-calling-conventions.mlir.test PASSED in 0.2s
@torch-mlir//test/Dialect:Torch/canonicalize.mlir.test PASSED in 0.4s
@torch-mlir//test/Dialect:Torch/decompose-complex-ops-legal.mlir.test PASSED in 0.2s
@torch-mlir//test/Dialect:Torch/decompose-complex-ops.mlir.test PASSED in 0.9s
@torch-mlir//test/Dialect:Torch/drop-shape-calculations.mlir.test PASSED in 0.4s
@torch-mlir//test/Dialect:Torch/erase-module-initializer.mlir.test PASSED in 0.2s
@torch-mlir//test/Dialect:Torch/inline-global-slots-analysis.mlir.test PASSED in 0.3s
@torch-mlir//test/Dialect:Torch/inline-global-slots-transform.mlir.test PASSED in 0.2s
@torch-mlir//test/Dialect:Torch/invalid.mlir.test PASSED in 0.4s
@torch-mlir//test/Dialect:Torch/lower-to-backend-contract-error.mlir.test PASSED in 17.3s
@torch-mlir//test/Dialect:Torch/maximize-value-semantics.mlir.test PASSED in 0.3s
@torch-mlir//test/Dialect:Torch/ops.mlir.test PASSED in 0.3s
@torch-mlir//test/Dialect:Torch/prepare-for-globalize-object-graph.mlir.test PASSED in 0.2s
@torch-mlir//test/Dialect:Torch/promote-types.mlir.test PASSED in 0.3s
@torch-mlir//test/Dialect:Torch/reduce-op-variants-error.mlir.test PASSED in 0.2s
@torch-mlir//test/Dialect:Torch/reduce-op-variants.mlir.test PASSED in 0.2s
@torch-mlir//test/Dialect:Torch/refine-public-return.mlir.test PASSED in 0.2s
@torch-mlir//test/Dialect:Torch/refine-types-branch.mlir.test PASSED in 0.3s
@torch-mlir//test/Dialect:Torch/refine-types-ops.mlir.test PASSED in 0.6s
@torch-mlir//test/Dialect:Torch/refine-types.mlir.test PASSED in 0.4s
@torch-mlir//test/Dialect:Torch/reify-shape-calculations.mlir.test PASSED in 2.9s
@torch-mlir//test/Dialect:Torch/simplify-shape-calculations.mlir.test PASSED in 0.3s
@torch-mlir//test/Dialect:Torch/torch-function-to-torch-backend-pipeline.mlir.test PASSED in 0.6s
@torch-mlir//test/Dialect:TorchConversion/canonicalize.mlir.test PASSED in 0.2s
@torch-mlir//test/Dialect:TorchConversion/finalizing-backend-type-conversion.mlir.test PASSED in 0.3s
@torch-mlir//test/Dialect:TorchConversion/func-backend-type-conversion.mlir.test PASSED in 0.2s
@torch-mlir//test/Dialect:TorchConversion/ops.mlir.test PASSED in 0.3s
@torch-mlir//test/Dialect:TorchConversion/verify-linalg-on-tensors-backend-contract.mlir.test PASSED in 0.3s
@torch-mlir//test/Dialect:TorchConversion/verify-tosa-backend-contract.mlir.test PASSED in 0.2s
@torch-mlir//test/RefBackend:insert-rng-globals.mlir.test PASSED in 0.2s
INFO: Build completed successfully, 2[54](https://github.com/sjain-stanford/torch-mlir/actions/runs/3476816449/jobs/5812368489#step:7:55) total actions
@torch-mlir//test/RefBackend:munge-calling-conventions.mlir.test PASSED in 0.2s
Executed [59](https://github.com/sjain-stanford/torch-mlir/actions/runs/3476816449/jobs/5812368489#step:7:60) out of 59 tests: 59 tests pass.
```
GHA workflow: https://github.com/sjain-stanford/torch-mlir/actions/runs/3476816449/jobs/5812368489
I was helping an engineer the other day who was attempting to use the Docker flow for interactive development and ran into countless issues. Add a note that it is not recommended for interactive development, and also move the Docker section down to avoid positioning it as the "default" that people should be using.
Gets both CI and Release builds integrated in one workflow.
Mount ccache and pip cache as required for fast iterative builds
Current Release docker builds still run with root perms, fix it
in the future to run as the same user.
There may be some corner cases left especially when switching
build types etc.
Docker build TEST plan:
tl;dr:
Build everythin: Releases (Python 3.8, 3.9, 3.10) and CIs.
TM_PACKAGES="torch-mlir out-of-tree in-tree"
2.57s user 2.49s system 0% cpu 30:33.11 total
Out of Tree + PyTorch binaries:
Fresh build (purged cache):
TM_PACKAGES="out-of-tree"
0.47s user 0.51s system 0% cpu 5:24.99 total
Incremental with ccache:
TM_PACKAGES="out-of-tree"
0.09s user 0.08s system 0% cpu 34.817 total
Out of Tree + PyTorch from source
Incremental
TM_PACKAGES="out-of-tree" TM_USE_PYTORCH_BINARY=OFF
1.58s user 1.81s system 2% cpu 1:59.61 total
In-Tree + PyTorch binaries:
Fresh build and tests: (purge ccache)
TM_PACKAGES="in-tree"
0.53s user 0.49s system 0% cpu 6:23.35 total
Fresh build/ but with prior ccache
TM_PACKAGES="in-tree"
0.45s user 0.66s system 0% cpu 3:57.47 total
Incremental in-tree with all tests and regression tests
TM_PACKAGES="in-tree"
0.16s user 0.09s system 0% cpu 2:18.52 total
In-Tree + PyTorch from source
Fresh build and tests: (purge ccache)
TM_PACKAGES="in-tree" TM_USE_PYTORCH_BINARY=OFF
2.03s user 2.28s system 0% cpu 11:11.86 total
Fresh build/ but with prior ccache
TM_PACKAGES="in-tree" TM_USE_PYTORCH_BINARY=OFF
1.58s user 1.88s system 1% cpu 4:53.15 total
Incremental in-tree with all tests and regression tests
TM_PACKAGES="in-tree" TM_USE_PYTORCH_BINARY=OFF
1.09s user 1.10s system 1% cpu 3:29.84 total
Incremental without tests
TM_PACKAGES="in-tree" TM_USE_PYTORCH_BINARY=OFF TM_SKIP_TESTS=ON
1.52s user 1.42s system 3% cpu 1:15.82 total
In-tree+out-of-tree + Pytorch Binaries
TM_PACKAGES="out-of-tree in-tree"
0.25s user 0.18s system 0% cpu 3:01.91 total
To clear all artifacts:
rm -rf build build_oot llvm-build libtorch docker_venv
externals/pytorch/build
We use it for more than TorchScript testing now. This is a purely
mechanical change to adjust some file paths to remove "torchscript".
The most perceptible change here is that now e2e tests are run with
```
./tools/e2e_test.sh
instead of:
./tools/torchscript_e2e_test.sh
```