torch-mlir

Commit Graph

Author	SHA1	Message	Date
Stella Laurenzo	4a4d80a6ad	[ci] Add lint job and enable yaml linting of GH files. (#2819)	2024-01-27 15:48:06 -08:00
Sambhav Jain	49fdc1a8a6	Add bazel targets for TorchOnnxToTorch conversion passes (#2596) Adapts to the TorchOnnxToTorch changes from https://github.com/llvm/torch-mlir/pull/2585. Also restores the bazel builds in post-merge CI that were disabled in `2148c4cd0d`. Bazel workflow: https://github.com/sjain-stanford/torch-mlir/actions/runs/7023912962	2023-11-28 13:06:35 -08:00
Stella Laurenzo	53fc995639	Run CI on all main/postsubmit commits. Prior to this, the concurrency rules for presubmits (which cancel eagerly) were being applied to main. The result was that landing a second patch would cancel the CI on the one prior.	2023-11-22 18:05:18 -08:00
Stella Laurenzo	2148c4cd0d	Temporarily disable bazel build until fixed.	2023-11-22 18:00:39 -08:00
Sambhav Jain	facce24ae3	[Bazel] Fix broken Bazel build (#2252) Bazel GHA run: https://github.com/sjain-stanford/torch-mlir/actions/runs/5408580473	2023-06-29 08:45:35 -07:00
powderluv	0497f0b08d	Revert "CI: drop deletion of workspace and limit submodule fetch concurrency (#1921)" (#2007) This reverts commit `07f5f042c7`.	2023-04-06 10:36:30 -07:00
Ashay Rane	07f5f042c7	CI: drop deletion of workspace and limit submodule fetch concurrency (#1921) Despite using sudo to delete the workspace directory, we still occasionally run into checkout errors. This patch thus drops the deletion of the workspace prior to checkout. It also restricts the number of parallel jobs in the submodule fetch step to just one, to try and resolve the checkout issue ("index.lock: File exists.").	2023-04-04 12:58:52 -05:00
Ashay Rane	987d5ab335	CI: use `sudo` to remove Docker-created files (#1905)	2023-02-27 17:44:50 -06:00
Ashay Rane	ea00371d85	CI: clear workspace directory before checkout (#1900) We have recently started seeing errors like: ``` Synchronizing submodule url for 'externals/llvm-project' Synchronizing submodule url for 'externals/mlir-hlo' /usr/bin/git -c protocol.version=2 submodule update --init --force --depth=1 Error: fatal: Unable to create '/home/anush/actions-runner/_work/torch-mlir/torch-mlir/.git/modules/externals/llvm-project/index.lock': File exists. ``` As a workaround, this patch removes the workspace directory before the checkout step.	2023-02-24 14:44:35 -06:00
Sambhav Jain	109c91ae9b	[CI] Verify bazel buildifier is run and changes committed (#1700) Ensures the buildifier (linter for bazel build files) is run and changes are pushed.	2022-12-08 15:56:57 -08:00
Sambhav Jain	ba5b90ee27	Enable bazel LIT tests in CI (#1596) Bazel LIT test support was added in https://github.com/llvm/torch-mlir/pull/1585. This PR enables the tests in CI. ``` INFO: Build completed successfully, 254 total actions @torch-mlir//test/Conversion:TorchToArith/basic.mlir.test PASSED in 0.3s @torch-mlir//test/Conversion:TorchToLinalg/basic.mlir.test PASSED in 0.5s @torch-mlir//test/Conversion:TorchToLinalg/elementwise.mlir.test PASSED in 0.3s @torch-mlir//test/Conversion:TorchToLinalg/flatten.mlir.test PASSED in 0.3s @torch-mlir//test/Conversion:TorchToLinalg/pooling.mlir.test PASSED in 0.3s @torch-mlir//test/Conversion:TorchToLinalg/unsqueeze.mlir.test PASSED in 0.2s @torch-mlir//test/Conversion:TorchToLinalg/view.mlir.test PASSED in 0.3s @torch-mlir//test/Conversion:TorchToMhlo/basic.mlir.test PASSED in 0.5s @torch-mlir//test/Conversion:TorchToMhlo/elementwise.mlir.test PASSED in 0.9s @torch-mlir//test/Conversion:TorchToMhlo/gather.mlir.test PASSED in 0.3s @torch-mlir//test/Conversion:TorchToMhlo/linear.mlir.test PASSED in 0.6s @torch-mlir//test/Conversion:TorchToMhlo/pooling.mlir.test PASSED in 0.3s @torch-mlir//test/Conversion:TorchToMhlo/reduction.mlir.test PASSED in 0.4s @torch-mlir//test/Conversion:TorchToMhlo/view_like.mlir.test PASSED in 0.6s @torch-mlir//test/Conversion:TorchToSCF/basic.mlir.test PASSED in 0.2s @torch-mlir//test/Conversion:TorchToTosa/basic.mlir.test PASSED in 1.1s @torch-mlir//test/Dialect:Torch/GlobalizeObjectGraph/basic.mlir.test PASSED in 0.2s @torch-mlir//test/Dialect:Torch/GlobalizeObjectGraph/error.mlir.test PASSED in 0.2s @torch-mlir//test/Dialect:Torch/GlobalizeObjectGraph/free-functions.mlir.test PASSED in 0.2s @torch-mlir//test/Dialect:Torch/GlobalizeObjectGraph/initializers.mlir.test PASSED in 0.2s @torch-mlir//test/Dialect:Torch/GlobalizeObjectGraph/methods.mlir.test PASSED in 0.2s @torch-mlir//test/Dialect:Torch/GlobalizeObjectGraph/module-uses-error.mlir.test PASSED in 0.2s @torch-mlir//test/Dialect:Torch/GlobalizeObjectGraph/module-uses.mlir.test PASSED in 0.2s @torch-mlir//test/Dialect:Torch/GlobalizeObjectGraph/multiple-instances-error.mlir.test PASSED in 0.2s @torch-mlir//test/Dialect:Torch/GlobalizeObjectGraph/multiple-instances-multiple-module-args.mlir.test PASSED in 0.2s @torch-mlir//test/Dialect:Torch/GlobalizeObjectGraph/multiple-instances.mlir.test PASSED in 0.2s @torch-mlir//test/Dialect:Torch/GlobalizeObjectGraph/submodules.mlir.test PASSED in 0.3s @torch-mlir//test/Dialect:Torch/GlobalizeObjectGraph/visibility.mlir.test PASSED in 0.2s @torch-mlir//test/Dialect:Torch/adjust-calling-conventions.mlir.test PASSED in 0.2s @torch-mlir//test/Dialect:Torch/canonicalize.mlir.test PASSED in 0.4s @torch-mlir//test/Dialect:Torch/decompose-complex-ops-legal.mlir.test PASSED in 0.2s @torch-mlir//test/Dialect:Torch/decompose-complex-ops.mlir.test PASSED in 0.9s @torch-mlir//test/Dialect:Torch/drop-shape-calculations.mlir.test PASSED in 0.4s @torch-mlir//test/Dialect:Torch/erase-module-initializer.mlir.test PASSED in 0.2s @torch-mlir//test/Dialect:Torch/inline-global-slots-analysis.mlir.test PASSED in 0.3s @torch-mlir//test/Dialect:Torch/inline-global-slots-transform.mlir.test PASSED in 0.2s @torch-mlir//test/Dialect:Torch/invalid.mlir.test PASSED in 0.4s @torch-mlir//test/Dialect:Torch/lower-to-backend-contract-error.mlir.test PASSED in 17.3s @torch-mlir//test/Dialect:Torch/maximize-value-semantics.mlir.test PASSED in 0.3s @torch-mlir//test/Dialect:Torch/ops.mlir.test PASSED in 0.3s @torch-mlir//test/Dialect:Torch/prepare-for-globalize-object-graph.mlir.test PASSED in 0.2s @torch-mlir//test/Dialect:Torch/promote-types.mlir.test PASSED in 0.3s @torch-mlir//test/Dialect:Torch/reduce-op-variants-error.mlir.test PASSED in 0.2s @torch-mlir//test/Dialect:Torch/reduce-op-variants.mlir.test PASSED in 0.2s @torch-mlir//test/Dialect:Torch/refine-public-return.mlir.test PASSED in 0.2s @torch-mlir//test/Dialect:Torch/refine-types-branch.mlir.test PASSED in 0.3s @torch-mlir//test/Dialect:Torch/refine-types-ops.mlir.test PASSED in 0.6s @torch-mlir//test/Dialect:Torch/refine-types.mlir.test PASSED in 0.4s @torch-mlir//test/Dialect:Torch/reify-shape-calculations.mlir.test PASSED in 2.9s @torch-mlir//test/Dialect:Torch/simplify-shape-calculations.mlir.test PASSED in 0.3s @torch-mlir//test/Dialect:Torch/torch-function-to-torch-backend-pipeline.mlir.test PASSED in 0.6s @torch-mlir//test/Dialect:TorchConversion/canonicalize.mlir.test PASSED in 0.2s @torch-mlir//test/Dialect:TorchConversion/finalizing-backend-type-conversion.mlir.test PASSED in 0.3s @torch-mlir//test/Dialect:TorchConversion/func-backend-type-conversion.mlir.test PASSED in 0.2s @torch-mlir//test/Dialect:TorchConversion/ops.mlir.test PASSED in 0.3s @torch-mlir//test/Dialect:TorchConversion/verify-linalg-on-tensors-backend-contract.mlir.test PASSED in 0.3s @torch-mlir//test/Dialect:TorchConversion/verify-tosa-backend-contract.mlir.test PASSED in 0.2s @torch-mlir//test/RefBackend:insert-rng-globals.mlir.test PASSED in 0.2s INFO: Build completed successfully, 254 total actions @torch-mlir//test/RefBackend:munge-calling-conventions.mlir.test PASSED in 0.2s Executed 59 out of 59 tests: 59 tests pass. ``` GHA workflow: https://github.com/sjain-stanford/torch-mlir/actions/runs/3476816449/jobs/5812368489	2022-11-16 11:59:33 -08:00
Sambhav Jain	4aa1e90b34	Fix cache bug with Bazel builds in CI (#1593) Some time ago, bazel builds in CI were being sped up fine with caching. However, over time the cache got stale because `actions/cache@v3` apparently doesn't update caches when it "hits" unless it is configured to do so specifically. This requires using a unique per-commit `key` (to force it to update the cache after each successful run) and a relaxed `restore-keys` which is not unique per-commit, so newer commits can restore from the nearest hit. Test GHA run 1 (no cache hit): [1h 1m 52s](https://github.com/sjain-stanford/torch-mlir/actions/runs/3474770334/usage) Test GHA run 2 (cache hit, same commit): [5m 14s](https://github.com/sjain-stanford/torch-mlir/actions/runs/3475132135/usage) Test GHA run 3 (cache hit, different commit): [6m 6s](https://github.com/sjain-stanford/torch-mlir/actions/runs/3475161009/usage)	2022-11-15 18:48:31 -08:00
Sambhav Jain	99ec6039f6	Fix bazel CI (#1591) I accidentally broke bazel CI by forgetting to update the GHA workflow in my [previous PR](https://github.com/llvm/torch-mlir/pull/1587). This should get it back to green; my apologies. Qualifying CI run: https://github.com/sjain-stanford/torch-mlir/actions/runs/3472523982	2022-11-15 09:51:52 -08:00
Ashay Rane	f847642495	CI script improvements (#1547) * ci: update versions of external actions Node.js 12 actions are deprecated and will eventually go away, so this patch bumps the old actions to their latest versions that use Node.js 16. * ci: replace deprecated action with bash commands The llvm/actions/install-ninja action uses Node.js 12, which is deprecated. Since that action is not updated to work with Node.js 16, this patch replaces that action with equivalent bash commands to install Ninja. * ci: use smaller ccache artifacts to reduce evictions Over time, our ccache sizes have grown quite large (some as large as 1.3 GB), which results in us routinely exceeding GitHub's limits, thus triggering frequent cache evictions. As a result, cache downloads and uploads take unnecessarily long, in addition to fewer cache entries being available. Based on experiments on a clean cache state, it appears that we need less than 300 MB of (compressed) ccache artifacts for each build type. Anything larger than that will accrue changes from the past that aren't needed. To alleviate the cache burden, this patch sets the maximum ccache size to be 300 MB. This change should not affect the success or failure of our builds. I will monitor the build times to check whether this change causes any performance degradation. * ci: use consistent platform identifiers Prior to this patch, some of our builds ran on `ubuntu-latest`, while some others ran on `ubuntu-20.04` and others ran on `ubuntu-22.04`, with similar situations for macOS and Windows. This patch instead sets all Linux builds to run on `ubuntu-latest`, all macOS builds to run on `macos-latest`, and all Windows builds to run on `windows-latest`, to make debugging future CI failures a little easier.	2022-11-02 21:37:01 -05:00
Sambhav Jain	114f48e96c	[Bazel] Check cache directory exists before changing owners (#1241) This fixes a seeding issue with the [previous PR](https://github.com/llvm/torch-mlir/pull/1240) where bazel build's GHA cache is not present to begin with and one of the commands (chown) fails on it. Should get the Bazel build back to green.	2022-08-17 17:04:50 -07:00
Sambhav Jain	9c8b962720	Dockerize and Cache Bazel {Local, CI} Builds (#1240) This PR adds: - A minimal docker wrapper to the bazel GHA workflow to make it reproducible locally - Bazel cache to speed up GHA workflows (down to ~5 minutes from ~40+ minutes) This is a no-op for non-bazel workflows and an incremental improvement.	2022-08-17 12:46:17 -07:00
Sambhav Jain	34478ab1c7	[Build] Add concurrency groups to address long queue times (#1219) We're seeing long CI queue times ([example](https://discord.com/channels/636084430946959380/742573221882364009/1007631811184164944)), especially with macOS VMs on GHA. Part of the problem is follow-on commits to the same branch, which trigger new runs while the previous runs are still in progress, hogging the scarce VMs. This PR adds concurrency groups to the GHA workflow, which ensures that only a single job or workflow using the same concurrency group will run at a time. This would cancel any in-progress jobs in the same github workflow and github ref (e.g. `refs/heads/main` or `refs/pull/<pr_number>/merge`). As discussed in this Discord [thread](https://discord.com/channels/636084430946959380/1007787336848912386/1007787338895740928), once this lands we may have to closely monitor the workflows to check that this didn't introduce unintended consequences. If it did, we could either revert, or decide to selectively cancel particular runs (e.g. macos only, which is the main bottleneck right now) instead of the entire workflow. This will also require some expectation management. As in, if you see an ❌ on the main branch, it may not necessarily mean things broke; it could mean the run was killed by a more recent run. This makes it a bit harder to trace a failure back to a commit in a sequence of commits (those builds would need to be run again). Thanks @powderluv for the proposal and pointer to this! It should help with the scarce VMs on GHA and save on queue time. References: * https://docs.github.com/en/actions/using-jobs/using-concurrency#example-only-cancel-in-progress-jobs-or-runs-for-the-current-workflow * https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#example-only-cancel-in-progress-jobs-or-runs-for-the-current-workflow	2022-08-12 17:38:48 -07:00
Sambhav Jain	f00ca91db0	Simplify matrix configuration for CI workflows (#1213) Addresses https://github.com/llvm/torch-mlir/issues/1207. #### Provisioned jobs: ``` # ubuntu - x86_64 - llvm in-tree - pytorch binary - build+test # most used dev flow and fastest signal # ubuntu - x86_64 - llvm out-of-tree - pytorch source - build+test # most elaborate build # macos - arm64 - llvm in-tree - pytorch source - build only # cross compile, can't test arm64 ``` #### Main changes - [x] Spawn macos builds from a separate matrix (in the same workflow). It made sense to do this as they are fairly different from ubuntu (cross compile, use a different cmake configuration). This simplifies the matrix configuration and exclusions quite a bit, and makes the workflow a bit more tractable and maintenance friendly. - [x] Remove the submodule md5sum step for ccache config. This has been [broken](https://github.com/llvm/torch-mlir/runs/7779288734?check_suite_focus=true#step:3:145) for a while now. - [x] Remove unused matrix options - `os`, `targetarch`, `python-version`, `llvmtype`. - [x] Address ZSTD [comment](https://github.com/llvm/torch-mlir/pull/1204#discussion_r942349282) on @powderluv's cross compile [PR](https://github.com/llvm/torch-mlir/pull/1204). #### Further improvements (to be addressed in follow-on): * ubuntu-x86_64 out-of-tree integration tests fail ([error](https://github.com/sjain-stanford/torch-mlir/runs/7781264029?check_suite_focus=true)); only run unit tests for now (tests are excluded in current CI too) #### Passing workflow: https://github.com/sjain-stanford/torch-mlir/actions/runs/2840676309 ![image](https://user-images.githubusercontent.com/19234106/184194535-f3807991-401a-4cb9-b030-0ee8c334eba3.png)	2022-08-11 16:35:15 -07:00
Sambhav Jain	d41c7becf5	[Bazel] Allow workflow_dispatch manual trigger on bazel workflow (#1203) At the moment we don't gate torch-mlir PRs with bazel builds. This means bazel builds don't get run on open PRs, and so there's no good way to validate a fix PR which is meant to fix a broken bazel build. This option allows a bazel build to be manually triggered as needed on open PRs.	2022-08-09 13:28:21 -07:00
Ahmed S. Taei	6b3d0b7e7a	Add bazel build support (2/N) (#744) - Add bazel GitHub actions.	2022-04-25 12:33:15 -07:00

20 Commits (c0ec22df2ca9bc1e0d645304e8e46cff84728c45)
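
Illustrative sketch (not taken from the repository): the messages for `4aa1e90b34`, `34478ab1c7`, and `53fc995639` describe two GitHub Actions settings, a per-commit cache `key` with a relaxed `restore-keys` prefix, and concurrency groups that cancel eagerly on pull requests without cancelling runs on main. A minimal workflow fragment along those lines might look like the following. The workflow name, cache path, key prefix, and group expression are assumptions chosen for illustration; the actual torch-mlir workflow files may differ.

```yaml
# Hypothetical workflow fragment; names, paths, and key prefixes are
# illustrative assumptions, not copied from the torch-mlir repository.
name: Bazel Build and Test

on:
  push:
    branches: [main]
  pull_request:
  workflow_dispatch:   # manual trigger, as described in d41c7becf5

# Pull requests share one group per PR number, so a newer push cancels the
# older run. Pushes to main fall back to the commit SHA, so every commit gets
# its own group and is never cancelled by the next landing (see 53fc995639).
concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.sha }}
  cancel-in-progress: true

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      # The key is unique per commit, so an exact hit is rare and a fresh
      # cache is saved after each successful run. restore-keys is a prefix
      # shared by all commits, so a new commit restores the most recent
      # matching cache (the pattern described in 4aa1e90b34).
      - uses: actions/cache@v3
        with:
          path: ~/.cache/bazel   # assumed cache location
          key: bazel-${{ runner.os }}-${{ github.sha }}
          restore-keys: |
            bazel-${{ runner.os }}-

      - name: Build and test
        run: echo "bazel build/test commands go here"   # placeholder step
```

With a per-commit `key`, the cache entry grows by one per successful run, so older entries are eventually evicted by GitHub's storage limits; the `restore-keys` prefix is what keeps later commits warm.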