When we renamed the directory containing submodules from `external` to
`externals`, we accidentally left the original name in the Github
workflow. This patch fixes the problem.
My earlier[ PR](https://github.com/llvm/torch-mlir/pull/1213) had (among other things) decoupled ubuntu and macos builds into separate matrix runs. This is not working well due to limited number of MacOS GHA VMs causing long queue times and backlog. There are two reasons causing this backlog:
1. macos arm64 builds with pytorch source are getting erratically cancelled due to resource / network constraints. This is addressed with this: https://github.com/llvm/torch-mlir/pull/1215
> "macos-arm64 (in-tree, OFF) The hosted runner: GitHub Actions 3 lost communication with the server. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error."
2. macos runs don't fail-fast when ubuntu runs fail due to being in separate matrix setups. This PR couples them again.
* mac m1 cross compile
Add support for M1 cross compile
* Remove redundant ExecutionEngine
It is registered as part of RegisterEverything
* nuke non-universal zstd
disable LTC
At the moment we don't gate torch-mlir PRs with bazel builds. This means bazel builds don't get run on open PRs, and so there's no good way to validate a fix PR which is meant to fix a broken bazel build. This option allows a bazel build to be manually triggered as needed on open PRs.
* Replace CHECK_EQ with TORCH_CHECK_EQ
* Check value of TORCH_MLIR_USE_INSTALLED_PYTORCH during LTC build
* Update LTC XFAIL with NewZerosModule ops
* Explicitly blacklist _like ops
* Automatically blacklist new_/_like ops
* Prune away unused Python dependencies from LTC
* Add flag to disable LTC
* Autogen dummy _REFERENCE_LAZY_BACKEND library when LTC is disabled
* Implement compute_shape_var
* Removed Var tests from XFAIL Set
* XFAIL tests using _local_scalar_dense or index.Tensor
* Add StdDim tests to XFAIL set
* Autogen aten::cat
* Added e2e LTC Torch MLIR tests
* Fix seed for reproducability
* Check if computation is None before getting debug string
* Updated unit tests, and added numeric tests
* Print name of the model layer that fails numeric validation
* Run LTC e2e test with CI/CD
* Set seed in main function, instead of beginning of execution
* Add comment to specify number of digits of precision
* Fixed typo
* Remove tests for LTC example models
* Added LTC option to torchscript e2e
* Implement compile and run for LTC e2e test
* xfail all tests that use ops that aren't currently supported
* Update buildAndTest.yml
test with fast-fail matrix builds
* Remove redundant and statement
* Downgrade to 20.04
Until upstream PyTorch FBGEMM is fixed to compile with clang+14+ https://github.com/pytorch/pytorch/pull/82396
* Update buildAndTest.yml
run tests on only the binary config.
This enables building Pytorch from source in the CI.
The build should mostly hit the ccache.
Release builds will follow once we have some runtime on the CI.
* Add oneshot release snapshot for test/ondemand
Add some build scripts to test new release flow based on IREE.
Wont affect current builds, once this works well we can plumb it
in.
Build with manylinux docker
* Fixes a few issues found when debugging powderluv's setup.
* It is optional to link against Python3_LIBRARIES. Check that and don't do it if they don't exist for this config.
* Clean and auditwheel need to operate on sanitized package names. So "torch_mlir" vs "torch-mlir".
* Adds a pyproject.toml file that pins the build dependencies needed to detect both Torch and Python (the MLIR Python build was failing to detect because Numpy wasn't in the pip venv).
* Commented out auditwheel: These wheels are not PyPi compliant since they weak link to libtorch at runtime. However, they should be fine to deploy to users.
* Adds the --extra-index-url to the pip wheel command, allowing PyTorch to be found.
* Hack setup.py to remove the _mlir_libs dir before building. This keeps back-to-back versions from accumulating in the wheels for subsequent versions. IREE has a more principled way of doing this, but what I have here should work.
Co-authored-by: Stella Laurenzo <stellaraccident@gmail.com>
Since they run in distinct jobs, using the same ccache would
cause one job to overwrite the cache of the other.
See https://github.com/ljfitz/torch-mlir/pull/16 for a proof
that this works. The first build takes a long time but ccache
takes over in the dummy commit.
As per the docs on:
https://github.com/eregon/publish-release
> Note that the release must *not be marked as prerelease* for this to work.
For some reason, we were marking the release as pre-release before and
this was working, but the docs here seem pretty clear, so I'm going to
try it.
I am investigating the breakage.
Also, fix "externals" rename in setup.py and some cases where we weren't
using `requirements.txt` consistently.
Also, fix a case where the packaging script would get confused due to
".." in the path name.