93 Commits (41aa562fb48aa11f1d30badf278cc9fb00ff80bd)

Author SHA1 Message Date
Ashay Rane 606f4d2c0e
build: streamline options for enabling LTC and MHLO (#1221) 2022-08-12 23:49:28 -07:00
Sambhav Jain 34478ab1c7
[Build] Add concurrency groups to address long queue times (#1219)
We're seeing large CI queue times ([example](https://discord.com/channels/636084430946959380/742573221882364009/1007631811184164944)), especially with macOS VMs on GHA. Part of the problem is follow-on commits to the same branch, which trigger new runs while the previous runs are still in progress, hogging the scarce VMs.

This PR adds concurrency groups to the GHA workflow which ensures that only a single job or workflow using the same concurrency group will run at a time. This would cancel any in-progress jobs in the same github workflow and github ref (e.g. `refs/heads/main` or `refs/pull/<pr_number>/merge`).
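
A minimal sketch of such a concurrency group, modelled on the GitHub Actions documentation listed under References. The exact group key used in the torch-mlir workflow is not quoted in this message, so the key here is an assumption:

```yaml
# Top-level key in a GitHub Actions workflow file.
concurrency:
  # One group per workflow and git ref
  # (e.g. refs/heads/main or refs/pull/<pr_number>/merge).
  group: ${{ github.workflow }}-${{ github.ref }}
  # When a new run starts in the same group, cancel the run already in progress.
  cancel-in-progress: true
```

With `cancel-in-progress: true`, a follow-on commit to the same ref replaces the earlier run instead of queueing behind it.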

As discussed in this Discord [thread](https://discord.com/channels/636084430946959380/1007787336848912386/1007787338895740928), once this lands we may have to closely monitor the workflows to check that it didn't introduce unintended consequences. If it did, we could either revert or selectively cancel particular runs (e.g. macOS only, which is the main bottleneck right now) instead of the entire workflow.

This will also require some expectation management. If you see a failure marker on the main branch, it does not necessarily mean things broke; it could mean the run was killed by a more recent run. This makes it a bit harder to trace a failure back to a commit in a sequence of commits (those builds would have to be run again).

Thanks @powderluv for the proposal and pointer to this! It should help with the scarce VMs on GHA and save on queue time. 

References:
* https://docs.github.com/en/actions/using-jobs/using-concurrency#example-only-cancel-in-progress-jobs-or-runs-for-the-current-workflow
* https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#example-only-cancel-in-progress-jobs-or-runs-for-the-current-workflow
2022-08-12 17:38:48 -07:00
Ashay Rane 1581d6a84c
build: fix typo in path (#1218)
When we renamed the directory containing submodules from `external` to
`externals`, we accidentally left the original name in the Github
workflow.  This patch fixes the problem.
2022-08-12 15:38:25 -07:00
Sambhav Jain aed0ec3a2c
Merge matrix runs to fail fast globally (#1216)
My earlier [PR](https://github.com/llvm/torch-mlir/pull/1213) had (among other things) decoupled ubuntu and macos builds into separate matrix runs. This is not working well because the limited number of macOS GHA VMs causes long queue times and a backlog. There are two reasons for this backlog:

1. macos arm64 builds with pytorch source are getting erratically cancelled due to resource / network constraints. This is addressed by https://github.com/llvm/torch-mlir/pull/1215

> "macos-arm64 (in-tree, OFF) The hosted runner: GitHub Actions 3 lost communication with the server. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error."

2. macos runs don't fail-fast when ubuntu runs fail, because they are in separate matrix setups. This PR couples them again.
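
As a rough illustration of the coupling in point 2: `fail-fast` is scoped to a single job's matrix, so both platforms need to be entries of the same matrix for a failure on one to cancel the other. The job name, matrix keys, and runner labels below are assumptions, not the actual torch-mlir configuration:

```yaml
jobs:
  build-test:
    strategy:
      # The first failing matrix entry cancels the remaining in-progress
      # and queued entries of this same matrix.
      fail-fast: true
      matrix:
        include:
          - os-arch: ubuntu-x86_64
            runner: ubuntu-latest   # hypothetical runner label
          - os-arch: macos-arm64
            runner: macos-latest    # hypothetical runner label
    runs-on: ${{ matrix.runner }}
    steps:
      - uses: actions/checkout@v3
      - run: echo "building for ${{ matrix.os-arch }}"
```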
2022-08-12 11:30:09 -07:00
Sambhav Jain b8bd0a46cc
use pytorch binary for macos-arm64 builds (#1215) 2022-08-12 06:33:57 -07:00
Sambhav Jain f00ca91db0
Simplify matrix configuration for CI workflows (#1213)
Addresses https://github.com/llvm/torch-mlir/issues/1207. 

#### Provisioned jobs:
```
# ubuntu - x86_64 - llvm in-tree     - pytorch binary - build+test    # most used dev flow and fastest signal
# ubuntu - x86_64 - llvm out-of-tree - pytorch source - build+test    # most elaborate build
# macos  - arm64  - llvm in-tree     - pytorch source - build only    # cross compile, can't test arm64
```

#### Main changes
- [x] Spawn macos builds from a separate matrix (in the same workflow). It made sense to do this as they are fairly different from ubuntu (cross compile, use a different cmake configuration). This simplifies the matrix configuration and exclusions quite a bit, and makes the workflow a bit more tractable and maintenance friendly.
- [x] Remove the submodule md5sum step for ccache config. This has been [broken](https://github.com/llvm/torch-mlir/runs/7779288734?check_suite_focus=true#step:3:145) for a while now.
- [x] Remove unused matrix options - `os`, `targetarch`, `python-version`, `llvmtype`.
- [x] Address ZSTD [comment](https://github.com/llvm/torch-mlir/pull/1204#discussion_r942349282) on @powderluv's cross compile [PR](https://github.com/llvm/torch-mlir/pull/1204). 

#### Further improvements (to be addressed in follow-on):
* ubuntu-x86_64 out-of-tree integration tests fail ([error](https://github.com/sjain-stanford/torch-mlir/runs/7781264029?check_suite_focus=true)); only run unit tests for now (tests are excluded in current CI too)

#### Passing workflow:
https://github.com/sjain-stanford/torch-mlir/actions/runs/2840676309
![image](https://user-images.githubusercontent.com/19234106/184194535-f3807991-401a-4cb9-b030-0ee8c334eba3.png)
2022-08-11 16:35:15 -07:00
powderluv 2342456356
mac m1 cross compile (#1204)
* mac m1 cross compile

Add support for M1 cross compile

* Remove redundant ExecutionEngine

It is registered as part of RegisterEverything

* nuke non-universal zstd

disable LTC
2022-08-10 08:48:39 -07:00
powderluv 9cf0b6e8ff
Disable out-of-tree and PyTorch binary (#1206) 2022-08-09 18:18:12 -07:00
Sambhav Jain d41c7becf5
[Bazel] Allow workflow_dispatch manual trigger on bazel workflow (#1203)
At the moment we don't gate torch-mlir PRs with bazel builds. This means bazel builds don't get run on open PRs, and so there's no good way to validate a fix PR which is meant to fix a broken bazel build. This option allows a bazel build to be manually triggered as needed on open PRs.
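
A minimal sketch of what enabling the manual trigger looks like in a workflow file. The `push` trigger is a hypothetical stand-in for whatever automatic triggers the bazel workflow already has:

```yaml
on:
  # Hypothetical existing automatic trigger.
  push:
    branches: [main]
  # Allows the workflow to be started by hand from the Actions tab
  # (or via the API) against a chosen branch.
  workflow_dispatch:
```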
2022-08-09 13:28:21 -07:00
Sambhav Jain b696362b7d
Enable OOT builds in CI (#1188) 2022-08-09 12:13:16 -07:00
Henry Tu 3e97a33c80
Revert "Reenable LTC in out-of-tree build (#1177)" (#1183)
This reverts commit f85ae9c685.
2022-08-08 18:58:35 -07:00
Henry Tu f85ae9c685
Reenable LTC in out-of-tree build (#1177) 2022-08-08 17:35:22 -04:00
Henry Tu e322f6a878
Update LTC CMake hack documentation (#1155)
* Update CMakeLists.txt

* Update CMakeLists.txt

* Update CMakeLists.txt

* Update CMakeLists.txt

* Update buildAndTest.yml

* Update setup.py

* Address review comments
2022-08-05 14:12:20 -04:00
powderluv 37a229cffc
Update buildAndTest.yml (#1145) 2022-08-03 12:50:54 -07:00
powderluv 0d25b6f10e
Fix cache-suffix name bug (#1138)
This should enable better caching of builds.
2022-08-03 07:53:01 -07:00
Henry Tu 2c3b3606d0 Resolve remaining LTC CI failures (#1110)
* Replace CHECK_EQ with TORCH_CHECK_EQ

* Check value of TORCH_MLIR_USE_INSTALLED_PYTORCH during LTC build

* Update LTC XFAIL with NewZerosModule ops

* Explicitly blacklist _like ops

* Automatically blacklist new_/_like ops

* Prune away unused Python dependencies from LTC

* Add flag to disable LTC

* Autogen dummy _REFERENCE_LAZY_BACKEND library when LTC is disabled

* Implement compute_shape_var

* Removed Var tests from XFAIL Set

* XFAIL tests using _local_scalar_dense or index.Tensor

* Add StdDim tests to XFAIL set

* Autogen aten::cat
2022-07-30 09:40:02 -04:00
Antonio Kim de6c135dc3 Fix LTC autogen for CI with nightly PyTorch
- Update llvm-project pin to match main
2022-07-30 09:40:02 -04:00
Henry Tu dfcc26556a Added e2e LTC tests (#916)
* Added e2e LTC Torch MLIR tests

* Fix seed for reproducibility

* Check if computation is None before getting debug string

* Updated unit tests, and added numeric tests

* Print name of the model layer that fails numeric validation

* Run LTC e2e test with CI/CD

* Set seed in main function, instead of beginning of execution

* Add comment to specify number of digits of precision

* Fixed typo

* Remove tests for LTC example models

* Added LTC option to torchscript e2e

* Implement compile and run for LTC e2e test

* xfail all tests that use ops that aren't currently supported
2022-07-30 09:40:02 -04:00
Jae Hoon (Antonio) Kim 2f22e2ef40 Add initial LTC backend (#610)
* Add initial LTC backend skeleton

* Disable CI build and move TorchMLIRPyTorch.cmake
2022-07-30 09:40:02 -04:00
powderluv db4a6991a0
buildAndTest.yml for matrix builds (#1098)
* Update buildAndTest.yml

test with fast-fail matrix builds

* Remove redundant and statement

* Downgrade to 20.04

Until upstream PyTorch FBGEMM is fixed to compile with clang+14+ https://github.com/pytorch/pytorch/pull/82396

* Update buildAndTest.yml

run tests on only the binary config.
2022-07-29 10:52:46 -07:00
powderluv 31fd812acf
Add linux and macOS source builds in CI (#1070)
This enables building Pytorch from source in the CI.
The build should mostly hit the ccache.
Release builds will follow once we have some runtime on the CI.
2022-07-21 14:16:03 -07:00
Ziheng Jiang c61c99e887
[MHLO] Init MHLO integration. (#1083)
Co-authored-by: Bairen Yi <yibairen.byron@bytedance.com>
Co-authored-by: Jiawei Wu <xremold@gmail.com>
Co-authored-by: Tianyou Guo <tianyou.gty@alibaba-inc.com>
Co-authored-by: Xu Yan <yancey.yx@alibaba-inc.com>
Co-authored-by: Ziheng Jiang <ziheng.jiang@bytedance.com>
2022-07-20 16:18:16 -07:00
powderluv a1947c7bd1
Update oneshotSnapshotPackage.yml 2022-07-02 10:00:52 -07:00
powderluv 2f0b1d0b08
bump macOS builds to Python 3.10 2022-06-04 22:44:32 -07:00
powderluv b14c5d619d
Build the nightly package only once a day/night
No need to ship two releases a day; our supported packages and binaries have grown.
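
For illustration, a once-a-day schedule in GitHub Actions looks roughly like this. The hour is hypothetical; the actual time used by the release workflow is not stated in this message:

```yaml
on:
  schedule:
    # Five-field cron expression, evaluated in UTC: once a day at 12:00.
    - cron: '0 12 * * *'
```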
2022-06-04 22:40:53 -07:00
Maksim Levental cec5aeedb0
add ci tests (#754) 2022-05-25 14:59:59 -05:00
powderluv cfc1a6515c
build only Python3.9 to avoid timeout
GH runner times out when building 3.9 and 3.10 on macOS.
2022-05-13 00:07:55 -07:00
powderluv 2877a37ac6
Update buildRelease.yml
Fix filename change missed in code review.
2022-04-25 17:00:31 -07:00
powderluv 6d09c98b2f
Fix version information in Release builds (#788)
env vars seem to be lost in manylinux docker.
Use a version file like IREE does.
2022-04-25 14:13:17 -07:00
Ahmed S. Taei 6b3d0b7e7a
Add bazel build support (2/N) (#744)
- Add bazel GitHub actions.
2022-04-25 12:33:15 -07:00
powderluv 0f751498a7
Update releaseSnapshotPackage.yml 2022-04-22 15:38:36 -07:00
powderluv d789aee11e
Only upload torch*.whl (#786)
Only upload torch*.whl to unblock OSX build failures during upload. We have to move to svenstaro/upload-release-action.
2022-04-22 15:17:09 -07:00
powderluv cbf158f069
Update buildRelease.yml
Update artifact directory to ./build_tools/python_deploy/wheelhouse/*.whl
2022-04-21 19:57:27 -07:00
powderluv 9f2184da98
Update oneshotSnapshotPackage.yml
remove now deprecated inputs to build and test
2022-04-21 19:12:42 -07:00
powderluv 8003b92fa7
Delete releasePackage.yml 2022-04-21 18:54:01 -07:00
powderluv c1026fa95b
Switch to using the new Release builds (#780) 2022-04-21 18:46:34 -07:00
powderluv 4ef61aa27f
Minor buildsystem fixes (#778)
Sets up auto-pinning of latest torch-nightly
2022-04-21 15:53:00 -07:00
powderluv 0257d91a21
Update buildManylinux.yml
use sudo for mac OS
2022-04-21 11:06:02 -07:00
powderluv 299c1bbe6d
Update buildManylinux.yml
fix build naming
2022-04-21 10:55:40 -07:00
powderluv b03eac4224
Enable OSX (Intel, Apple Silicon Builds) (#776)
Update pinned pytorch version. Will submit a follow on PR to bump.
Also update artifacts directory
2022-04-21 10:47:28 -07:00
powderluv cc3a4a58ef
Add oneshot release snapshot for test/ondemand (#768)
* Add oneshot release snapshot for test/ondemand

Add some build scripts to test new release flow based on IREE.
Won't affect current builds, once this works well we can plumb it
in.

Build with manylinux docker

* Fixes a few issues found when debugging powderluv's setup.

* It is optional to link against Python3_LIBRARIES. Check that and don't do it if they don't exist for this config.
* Clean and auditwheel need to operate on sanitized package names. So "torch_mlir" vs "torch-mlir".
* Adds a pyproject.toml file that pins the build dependencies needed to detect both Torch and Python (the MLIR Python build was failing to detect because Numpy wasn't in the pip venv).
* Commented out auditwheel: These wheels are not PyPI compliant since they weak link to libtorch at runtime. However, they should be fine to deploy to users.
* Adds the --extra-index-url to the pip wheel command, allowing PyTorch to be found.
* Hack setup.py to remove the _mlir_libs dir before building. This keeps back-to-back versions from accumulating in the wheels for subsequent versions. IREE has a more principled way of doing this, but what I have here should work.

Co-authored-by: Stella Laurenzo <stellaraccident@gmail.com>
2022-04-21 02:19:12 -07:00
Clément Fournier 578d0ec292 Review comments 2022-04-19 15:11:17 -07:00
Clément Fournier 3e0c1cf6af Change cache suffix to not invalidate existing caches 2022-04-19 15:11:17 -07:00
Clément Fournier 566650c5ae Use distinct ccaches
Since they run in distinct jobs, using the same ccache would
cause one job to overwrite the cache of the other.
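
A sketch of how per-job cache keys can keep the ccaches apart, assuming the `actions/cache` action. The path and the `in-tree` suffix are hypothetical and not taken from the actual workflow:

```yaml
# Step inside the in-tree build job. The out-of-tree job would use its own
# suffix (e.g. "out-of-tree") so the two jobs never write the same cache entry.
- name: Cache ccache directory
  uses: actions/cache@v3
  with:
    path: ${{ github.workspace }}/ccache   # hypothetical ccache location
    key: ${{ runner.os }}-ccache-in-tree-${{ github.sha }}
    # Fall back to the most recent cache that shares this job's prefix.
    restore-keys: |
      ${{ runner.os }}-ccache-in-tree-
```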

See https://github.com/ljfitz/torch-mlir/pull/16 for a proof
that this works. The first build takes a long time but ccache
takes over in the dummy commit.
2022-04-19 15:11:17 -07:00
Clément Fournier f9d5201ae6 address PR review 2022-04-19 15:11:17 -07:00
Clément Fournier 4a2535a86d Add build-out-of-tree job 2022-04-19 15:11:17 -07:00
Clément Fournier 37087ccd5f Refactor current CI workflow into composable jobs 2022-04-19 15:11:17 -07:00
Sean Silva 8250f50c81 Attempt to set Python package version to the snapshot identifier.
This should make the releases sort properly when `pip`'s
`-f`/`--find-links` argument is used.
2022-03-30 17:54:11 +00:00
Sean Silva 4f61b1fce1 Try to get the release packages publishing again.
As per the docs on:
https://github.com/eregon/publish-release

> Note that the release must *not be marked as prerelease* for this to work.

For some reason, we were marking the release as pre-release before and
this was working, but the docs here seem pretty clear, so I'm going to
try it.
2022-03-30 00:35:02 +00:00
Sean Silva 3a96078571 Pin the CI to the latest working PyTorch.
I am investigating the breakage.

Also, fix "externals" rename in setup.py and some cases where we weren't
using `requirements.txt` consistently.

Also, fix a case where the packaging script would get confused due to
".." in the path name.
2022-03-29 15:02:17 -07:00