Commit Graph

176 Commits (b75c208f4e3343c997bc995a6284ffdd92476851)

Author SHA1 Message Date
Sambhav Jain 4aa1e90b34
Fix cache bug with Bazel builds in CI (#1593)
Some time ago, bazel builds in CI were being sped up fine with caching. However, over time the cache got stale because `actions/cache@v3` apparently doesn't update caches when it "hits" unless it is configured to do so specifically. This requires using a uniqued per-commit `key` (to force it to update cache after each successful run) and a relaxed `restore-keys` which is not unique per-commit so newer commits can restore from the nearest hit.

Test GHA run 1 (no cache hit): [1h 1m 52s](https://github.com/sjain-stanford/torch-mlir/actions/runs/3474770334/usage)
Test GHA run 2 (cache hit, same commit): [5m 14s](https://github.com/sjain-stanford/torch-mlir/actions/runs/3475132135/usage)
Test GHA run 3 (cache hit, different commit): [6m 6s](https://github.com/sjain-stanford/torch-mlir/actions/runs/3475161009/usage)
2022-11-15 18:48:31 -08:00
Sambhav Jain 99ec6039f6
Fix bazel CI (#1591)
I accidentally broke bazel CI by forgetting to update the GHA workflow in my [previous PR](https://github.com/llvm/torch-mlir/pull/1587). This should get it back to green, my apologies.

Qualifying CI run: https://github.com/sjain-stanford/torch-mlir/actions/runs/3472523982
2022-11-15 09:51:52 -08:00
Ashay Rane f1ef5681cc
build: pin torchvision to latest nightly (#1584)
We currently pin the `torch` package to the latest nightly version, but
since `torchvision` depends on the `torch` package, the pip resolver
then has to run through an extensive list of `torchvision` packages that
can be installed with the pinned `torch` package.  This search fails in
the RollPyTorch action, causing pip to settle on an old version of
`torchvision` that does not work with our tests.  In reality, we are
only interested in a specific version of the `torchvision` package.

To make the dependency explicit and to prevent test failures because of
incorrect package installations, this patch makes two key changes:

1. `torchvision` is now pinned to the latest nightly release in
   pytorch-requirements.txt along with the version of `torch` that is
   necessary to install the requested `torchvision` package

2. The RollPyTorch action now looks for the latest `torchvision` package
   instead of the latest `torch` package before writing the version
   numbers for pinning in pytorch-requirements.txt
2022-11-14 15:56:02 -06:00
Ashay Rane 2846776897
ci: enable ccache on Windows (#1548)
This patch makes a few small, but key, changes to enable ccache on
Windows.  First, it replaces the hendrikmuhs/ccache-action action with
command line invocations to the ccache binary, since the action has two
bugs, one of which causes CI to refer to different ccache artifacts
before versus after the build on Windows whereas the other bug can
sometimes cause the action to incorrectly infer that the cache is empty.

Second, this patch slightly alters the cache key, so that our old cache
artifacts, which have grown too big, are eventually discarded in favor
of the new, smaller cache artifacts.  Along the way, this patch also
keeps the RollPyTorch's cache artifact separate from the regular build's
cache artifact so as to keep these artifacts small, and also because the
RollPyTorch action is off the critical path for most contributors.

Finally, this patch makes small changes to the CMake file so that on
Windows, the ccache binary is added as a prefix, as recommended on the
[ccache Wiki](https://github.com/ccache/ccache/wiki/MS-Visual-Studio).
2022-11-03 12:17:22 -05:00
Ashay Rane f847642495
CI script improvements (#1547)
* ci: update versions of external actions

Node.js 12 actions are deprecated and will eventually go away, so this
patch bumps the old actions to their latest versions that use Node.js
16.

* ci: replace deprecated action with bash commands

The llvm/actions/install-ninja action uses Node.js 12, which is
deprecated.  Since that action is not updated to work with Node.js 16,
this patch replaces that action with equivalent bash commands to install
Ninja.

* ci: use smaller ccache artifacts to reduce evictions

Over time, our ccache sizes have grown quite large (some as large as
1.3 GB), which results in us routinely exceeding GitHub's limits, thus
triggering frequent cache evictions.  As a result, cache downloads and
uploads take unnecessary long, in addition to fewer cache entries being
available.

Based on experiments on a clean cache state, it appears that we need
less than 300 MB of (compressed) ccache artifacts for each build type.
Anything larger than that will accrue changes from the past that aren't
needed.

To alleviate the cache burden, this patch sets the maximum ccache size
to be 300 MB.  This change should not affect the success or failure of
our builds.  I will monitor the build times to check whether this change
causes any performance degradation.

* ci: use consistent platform identifiers

Prior to this patch, some of our builds ran on `ubuntu-latest`, while
some others ran on `ubuntu-20.04` and others ran on `ubuntu-22.04`, with
similar situations for macOS and windows.  This patch instead sets all
Linux builds to run on `ubuntu-latest`, all macOS builds to run on
`macos-latest`, and all Windows builds to run on `windows-latest`, to
make debugging future CI failures a little easier.
2022-11-02 21:37:01 -05:00
Ashay Rane 031d127940
ci: introduce read-only and read-write PyTorch build caches (#1546)
Until recently, we had to either risk feature branches creating PyTorch
build caches (which were unusable by the main branch or other parallel
feature branches because of GitHub's rules around sharing caches among
branches) or we had to limit the PyTorch build caches to only the main
branch, causing CI runs on feature branches to be terribly slow because
they had to rebuild PyTorch each time.

This patch enables the best of both worlds, by using a fork
(github.com/ashay/cache) of the GitHub's cache action, where the fork
adds an option (called `save`) which, when set, uploads a new cache
entry.  We thus set this `save` flag only when we're building PyTorch
from source in Torch-MLIR's main branch, whereas all other builds set
this `save` flag to `false`.

The ability to conditionally update the cache has been an oft-requested
feature on the original (github.com/actions/cache) repository and
multiple unmerged PRs exist to allow conditional cache updates, so it is
likely that using the fork is only a temporary solution.
2022-11-01 23:26:17 -07:00
powderluv 1a33577860
remove spurious ref in publish pages (#1536)
We don't need to pass in optional tag information.
2022-11-01 09:42:21 -07:00
Ashay Rane a8970101dc
pytorch: rename pytorch-version.txt to pytorch-hash.txt (#1541)
This patch is part of a larger set of improvements to the CI/build
system.  In the code, we refer to the version as the string that
contains the release identifier such as 1.14.0.dev20221028, so calling
the file that contains the commit hash as pytorch-version.txt creates
confusion.  For the sake of simplicity, this patch renames that file to
be pytorch-hash.txt.
2022-10-31 22:03:05 -05:00
Ashay Rane 2cf1092d4d
ci: restrict PyTorch cache to just the main branch (#1540)
If PyTorch build caches are created on a branch other than the main
branch, then GitHub does not share those caches with the main branch,
making every CI run that runs for each PR slow.  This patch resolves the
problem by letting only the main branch create and use PyTorch build
caches.
2022-10-31 15:14:53 -05:00
powderluv 87ab714ed6
Update buildRelease.yml (#1535) 2022-10-30 00:14:54 -07:00
powderluv bbde4e163f
Add Windows Builder (#1521)
Add a powershell script to build windows .whl packages
Disable LTC as it doesn't build on Windows.
Add GHA hooks
Use Python 3.10.8
2022-10-25 16:13:31 -07:00
Ashay Rane 801452b2f4
ci: make RollPyTorch run only on the Torch-MLIR repo (#1516) 2022-10-25 17:56:59 -05:00
Ashay Rane a9942f343a
Cache PyTorch source builds to reduce CI time (#1500)
* ci: cache PyTorch source builds

This patch reduces the time spent in regular CI builds by caching
PyTorch source builds.  Specifically, this patch:

1. Makes CI lookup the cache entry for the PyTorch commit hash in
   pytorch-version.txt
2. If lookup was successful, CI fetches the previously-generated WHL
   file into the build_tools/python/wheelhouse directory
3. CI sets the `TM_PYTORCH_INSTALL_WITHOUT_REBUILD` variable to `true`
4. The build_libtorch.sh script then uses the downloaded WHL file
   instead of rebuilding PyTorch

* ci: warm up PyTorch source cache during daily RollPyTorch action

This patch makes the RollPyTorch action write the updated WHL file to
the cache, so that it can be later retrieved by CI that runs for each
PR.  We deliberately add the caching step to the end of the action since
the RollPyTorch action never needs to read from the cache, although
executing this step earlier in the process should not cause problems
either.
2022-10-18 00:42:42 -05:00
Ashay Rane 0374a6da4e
ci: re-enable auto execution of the RollPyTorch action (#1488)
Now that the RollPyTorch action seems to have become stable, this patch
enables that action to be run at around 4am Pacific Time every day.
2022-10-12 19:18:54 -05:00
Daniel Ellis 67a0fb14ef Fix build release workflow. 2022-10-11 15:19:53 -04:00
Daniel Ellis 2e0d806bf7 Publish releases page after both mac and linux builds.
Mac was finishing first, causing linux releases to be lagged a day behind.
2022-10-10 10:37:02 -04:00
Ashay Rane 8a8e779529
Disable auto-update of PyTorch version until CI script stabilizes (#1456)
Instead of letting the auto-update script either fail because of script
errors or letting it commit bad versions, this patch makes the update
process manual, for now.  Once the script stabilizes, I will its
re-enable periodic execution.
2022-10-04 03:02:44 -05:00
Ashay Rane da02390188
build: update ODS and shape library when updating PyTorch (#1450)
Updating the PyTorch version may break the Torch-MLIR build, as it did
recently, since the PyTorch update caused the shape library to change,
but the shape library was not updated in the commit for updating
PyTorch.

This patch introduces a new default-off environment variable to the
build_linux_packages.sh script called `TM_UPDATE_ODS_AND_SHAPE_LIB`
which instructs the script to run the update_torch_ods.sh and
update_shape_lib.sh scripts.

However, running these scripts requires an in-tree build and the tests
that run for an in-tree build of Torch-MLIR are more comprehensive than
those that run for an out-of-tree build, so this patch also swaps out
the out-of-tree build for an in-tree build.
2022-10-02 18:02:34 -05:00
Ashay Rane 005d40f4d7
build: check exit code without causing script to fail (#1447)
A bug in the CI script caused the entire script to fail if the exit code
of the command for comparing with the existing hash returned a non-zero
exit status.  The non-zero exit status for this comparison does not
imply failed execution, since it only indicates that the hash has
changed.
2022-10-02 16:04:26 -05:00
Ashay Rane b3345e69e2
build: miscellaneous performance improvements (#1443) 2022-09-30 12:47:43 -05:00
Ashay Rane cf41a2582e
Final changes necessary to auto-update PyTorch version (#1438)
* build: push directly from CI to main branch

This avoids the need to create, approve, and merge a separate PR, in
addition to avoiding unnecessary CI runs for the PyTorch version update.

* build: schedule cronjob to run RollPyTorch action

This patch schedules the RollPyTorch action to be run at noon UTC, which
roughly corresponds to 4am Pacific Time.  We pick this time since the
commit for PyTorch nightly releases are picked just after midnight
Pacific Time and the nightly release artifacts are produced in about 2
to 3 hours after the commit is picked.
2022-09-29 17:15:32 -05:00
powderluv da584fbb73
Update releases pages after release builds (#1432)
* Update buildRelease.yml

Update Releases right after a Release build.

* Move gh-page update after release builds

This removes the periodic update and updates after a release build.
2022-09-29 12:49:41 -07:00
Ashay Rane 8f608c048d
build: use Github Actions for creating PR (#1433) 2022-09-29 07:09:16 -05:00
Ashay Rane 53e76b8ab6
build: create RollPyTorch to update PyTorch version in Torch-MLIR (#1419)
This patch fetches the most recent nightly (binary) build of PyTorch,
before pinning it in pytorch-requirements.txt, which is referenced in
the top-level requirements.txt file.  This way, end users will continue
to be able to run `pip -r requirements.txt` without worrying whether
doing so will break their Torch-MLIR build.

This patch also fetches the git commit hash that corresponds to the
nightly release, and this hash is passed to the out-of-tree build so
that it can build PyTorch from source.

If we were to sort the torch versions as numbers (in the usual
descending order), then 1.9 appears before 1.13.  To fix this problem,
we use the `--version-sort` flag (along with `--reverse` for specifying
a descending order).  We also filter out lines that don't contain
version numbers by only considering lines that start with a digit.

As a matter of slight clarity, this patch renames the variable
`torch_from_src` to `torch_from_bin`, since that variable is initialized
to `TM_USE_PYTORCH_BINARY`.

Co-authored-by: powderluv <powderluv@users.noreply.github.com>
2022-09-28 15:38:30 -05:00
Daniel Ellis 1dfe5efe9e Create github action for creating pip-compatible releases index 2022-09-23 15:26:19 -04:00
powderluv e6528f701a
Move CIs to use docker builds (#1316)
* Move CIs to use docker builds

Now that #1234 has landed and anyone can run CI / Release builds locally move GHA to use the same flow.

* update names

* Update comments
2022-09-02 18:35:40 -07:00
Sean Silva e16b43e20b Remove "torchscript" association from the e2e framework.
We use it for more than TorchScript testing now. This is a purely
mechanical change to adjust some file paths to remove "torchscript".

The most perceptible change here is that now e2e tests are run with

```
./tools/e2e_test.sh
instead of:
./tools/torchscript_e2e_test.sh
```
2022-08-29 14:10:03 -07:00
Sean Silva bcccf41d96 Add CI for generated files.
This ensures that they are always up to date.

This also updates the shape lib to make the new CI actually pass :)
2022-08-29 12:07:16 -07:00
powderluv c0630da678
Disable LTC by default until upstream revert relands (#1303)
* Disable LTC by default until upstream revert relands

Tracked with the WIP https://github.com/llvm/torch-mlir/pull/1292

* Disable LTC e2e tests temporarily

* Update setup.py

Disable LTC in setup.py temporarily until upstream is fixed.
2022-08-28 19:11:40 -07:00
Tanyo Kwok 2374098d71
[MHLO] Init end to end unit tests (#1223) 2022-08-23 16:47:21 +08:00
Henry Tu ba17a4d6c0
Reenable LTC in out-of-tree build (for real this time) (#1205)
* Fix OOT LTC CI build failure

* Disable LTC during macOS package gen

* Add more details about static TorchMLIRJITIRImporter library
2022-08-19 15:25:00 -04:00
Sambhav Jain 114f48e96c
[Bazel] Check cache directory exists before changing owners (#1241)
This fixes a seeding issue with the [previous PR](https://github.com/llvm/torch-mlir/pull/1240) where bazel build's GHA cache is not present to begin with and one of the commands (chown) fails on it. Should get the Bazel build back to green.
2022-08-17 17:04:50 -07:00
Sambhav Jain 9c8b962720
Dockerize and Cache Bazel {Local, CI} Builds (#1240)
This PR adds:

- A minimal docker wrapper to the bazel GHA workflow to make it reproducible locally
- Bazel cache to speed up GHA workflows (down to ~5 minutes from ~40+minutes)

This is a no-op for non-bazel workflows and an incremental improvement.
2022-08-17 12:46:17 -07:00
Ashay Rane 606f4d2c0e
build: streamline options for enabling LTC and MHLO (#1221) 2022-08-12 23:49:28 -07:00
Sambhav Jain 34478ab1c7
[Build] Add concurrency groups to address long queue times (#1219)
We're seeing large CI queue times ([example](https://discord.com/channels/636084430946959380/742573221882364009/1007631811184164944)) especially with MacOS VMs on GHA. Part of the problem is follow-on commits to the same branch which trigger new runs while the previous runs are still in-progress, hogging on the scarce VMs.

This PR adds concurrency groups to the GHA workflow which ensures that only a single job or workflow using the same concurrency group will run at a time. This would cancel any in-progress jobs in the same github workflow and github ref (e.g. `refs/heads/main` or `refs/pull/<pr_number>/merge`).

As discussed on discord [thread](https://discord.com/channels/636084430946959380/1007787336848912386/1007787338895740928), once this lands we may have to closely monitor the workflows to see this didn't introduce unintended consequences. If so, we could either revert, or decide to selectively cancel particular runs (e.g. macos only which is the main bottleneck right now) instead of entire workflow.

This will also require some expectation management. As in, if you see an  on the main branch, it may not necessarily mean things broke, it could mean the run was killed by a more recent run. Making it a bit harder to traceback a failure to a commit in a sequence of commits (requiring to run those builds again).

Thanks @powderluv for the proposal and pointer to this! It should help with the scarce VMs on GHA and save on queue time. 

References:
* https://docs.github.com/en/actions/using-jobs/using-concurrency#example-only-cancel-in-progress-jobs-or-runs-for-the-current-workflow
* https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#example-only-cancel-in-progress-jobs-or-runs-for-the-current-workflow
2022-08-12 17:38:48 -07:00
Ashay Rane 1581d6a84c
build: fix typo in path (#1218)
When we renamed the directory containing submodules from `external` to
`externals`, we accidentally left the original name in the Github
workflow.  This patch fixes the problem.
2022-08-12 15:38:25 -07:00
Sambhav Jain aed0ec3a2c
Merge matrix runs to fail fast globally (#1216)
My earlier[ PR](https://github.com/llvm/torch-mlir/pull/1213) had (among other things) decoupled ubuntu and macos builds into separate matrix runs. This is not working well due to limited number of MacOS GHA VMs causing long queue times and backlog. There are two reasons causing this backlog: 

1. macos arm64 builds with pytorch source are getting erratically cancelled due to resource / network constraints. This is addressed with this: https://github.com/llvm/torch-mlir/pull/1215

> "macos-arm64 (in-tree, OFF) The hosted runner: GitHub Actions 3 lost communication with the server. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error."

2. macos runs don't fail-fast when ubuntu runs fail due to being in separate matrix setups. This PR couples them again.
2022-08-12 11:30:09 -07:00
Sambhav Jain b8bd0a46cc
use pytorch binary for macos-arm64 builds (#1215) 2022-08-12 06:33:57 -07:00
Sambhav Jain f00ca91db0
Simplify matrix configuration for CI workflows (#1213)
Addresses https://github.com/llvm/torch-mlir/issues/1207. 

#### Provisioned jobs:
```
# ubuntu - x86_64 - llvm in-tree     - pytorch binary - build+test    # most used dev flow and fastest signal
# ubuntu - x86_64 - llvm out-of-tree - pytorch source - build+test    # most elaborate build
# macos  - arm64  - llvm in-tree     - pytorch source - build only    # cross compile, can't test arm64
```

#### Main changes
- [x] Spawn macos builds from a separate matrix (in the same workflow). It made sense to do this as they are fairly different from ubuntu (cross compile, use a different cmake configuration). This simplifies the matrix configuration and exclusions quite a bit, and makes the workflow a bit more tractable and maintenance friendly.
- [x] Remove the submodule md5sum step for ccache config. This was [broken](https://github.com/llvm/torch-mlir/runs/7779288734?check_suite_focus=true#step:3:145) for a while now.
- [x] Removes unused matrix options - `os`, `targetarch`, `python-version`, `llvmtype`.
- [x] Address ZSTD [comment](https://github.com/llvm/torch-mlir/pull/1204#discussion_r942349282) on @powderluv's cross compile [PR](https://github.com/llvm/torch-mlir/pull/1204). 

#### Further improvements (to be addressed in follow-on):
* ubuntu-x86_64 out-of-tree integration tests fail ([error](https://github.com/sjain-stanford/torch-mlir/runs/7781264029?check_suite_focus=true)); only run unit tests for now (tests are excluded in current CI too)

#### Passing workflow:
https://github.com/sjain-stanford/torch-mlir/actions/runs/2840676309
![image](https://user-images.githubusercontent.com/19234106/184194535-f3807991-401a-4cb9-b030-0ee8c334eba3.png)
2022-08-11 16:35:15 -07:00
powderluv 2342456356
mac m1 cross compile (#1204)
* mac m1 cross compile

Add support for M1 cross compile

* Remove redundant ExecutionEngine

It is registered as part of RegisterEverything

* nuke non-universal zstd

disable LTC
2022-08-10 08:48:39 -07:00
powderluv 9cf0b6e8ff
Disable out-of-tree and PyTorch binary (#1206) 2022-08-09 18:18:12 -07:00
Sambhav Jain d41c7becf5
[Bazel] Allow workflow_dispatch manual trigger on bazel workflow (#1203)
At the moment we don't gate torch-mlir PRs with bazel builds. This means bazel builds don't get run on open PRs, and so there's no good way to validate a fix PR which is meant to fix a broken bazel build. This option allows a bazel build to be manually triggered as needed on open PRs.
2022-08-09 13:28:21 -07:00
Sambhav Jain b696362b7d
Enable OOT builds in CI (#1188) 2022-08-09 12:13:16 -07:00
Henry Tu 3e97a33c80
Revert "Reenable LTC in out-of-tree build (#1177)" (#1183)
This reverts commit f85ae9c685.
2022-08-08 18:58:35 -07:00
Henry Tu f85ae9c685
Reenable LTC in out-of-tree build (#1177) 2022-08-08 17:35:22 -04:00
Henry Tu e322f6a878
Update LTC CMake hack documentation (#1155)
* Update CMakeLists.txt

* Update CMakeLists.txt

* Update CMakeLists.txt

* Update CMakeLists.txt

* Update buildAndTest.yml

* Update setup.py

* Address review comments
2022-08-05 14:12:20 -04:00
powderluv 37a229cffc
Update buildAndTest.yml (#1145) 2022-08-03 12:50:54 -07:00
powderluv 0d25b6f10e
Fix cache-suffix name bug (#1138)
This should enabling better caching of builds.
2022-08-03 07:53:01 -07:00
Henry Tu 2c3b3606d0 Resolve remaining LTC CI failures (#1110)
* Replace CHECK_EQ with TORCH_CHECK_EQ

* Check value of TORCH_MLIR_USE_INSTALLED_PYTORCH during LTC build

* Update LTC XFAIL with NewZerosModule ops

* Explicitly blacklist _like ops

* Automatically blacklist new_/_like ops

* Prune away unused Python dependencies from LTC

* Add flag to disable LTC

* Autogen dummy _REFERENCE_LAZY_BACKEND library when LTC is disabled

* Implement compute_shape_var

* Removed Var tests from XFAIL Set

* XFAIL tests using _local_scalar_dense or index.Tensor

* Add StdDim tests to XFAIL set

* Autogen aten::cat
2022-07-30 09:40:02 -04:00
Antonio Kim de6c135dc3 Fix LTC autogen for CI with nightly PyTorch
- Update llvm-project pin to match main
2022-07-30 09:40:02 -04:00
Henry Tu dfcc26556a Added e2e LTC tests (#916)
* Added e2e LTC Torch MLIR tests

* Fix seed for reproducability

* Check if computation is None before getting debug string

* Updated unit tests, and added numeric tests

* Print name of the model layer that fails numeric validation

* Run LTC e2e test with CI/CD

* Set seed in main function, instead of beginning of execution

* Add comment to specify number of digits of precision

* Fixed typo

* Remove tests for LTC example models

* Added LTC option to torchscript e2e

* Implement compile and run for LTC e2e test

* xfail all tests that use ops that aren't currently supported
2022-07-30 09:40:02 -04:00
Jae Hoon (Antonio) Kim 2f22e2ef40 Add initial LTC backend (#610)
* Add initial LTC backend skeleton

* Disable CI build and move TorchMLIRPyTorch.cmake
2022-07-30 09:40:02 -04:00
powderluv db4a6991a0
buildAndTest.yml for matrix builds (#1098)
* Update buildAndTest.yml

test with fast-fail matrix builds

* Remove redundant and statement

* Downgrade to 20.04

Until upstream PyTorch FBGEMM is fixed to compile with clang+14+ https://github.com/pytorch/pytorch/pull/82396

* Update buildAndTest.yml

run tests on only the binary config.
2022-07-29 10:52:46 -07:00
powderluv 31fd812acf
Add linux and macOS source builds in CI (#1070)
This enables building Pytorch from source in the CI.
The build should mostly hit the ccache.
Release builds will follow once we have some runtime on the CI.
2022-07-21 14:16:03 -07:00
Ziheng Jiang c61c99e887
[MHLO] Init MHLO integration. (#1083)
Co-authored-by: Bairen Yi <yibairen.byron@bytedance.com>
Co-authored-by: Jiawei Wu <xremold@gmail.com>
Co-authored-by: Tianyou Guo <tianyou.gty@alibaba-inc.com>
Co-authored-by: Xu Yan <yancey.yx@alibaba-inc.com>
Co-authored-by: Ziheng Jiang <ziheng.jiang@bytedance.com>
2022-07-20 16:18:16 -07:00
powderluv a1947c7bd1
Update oneshotSnapshotPackage.yml 2022-07-02 10:00:52 -07:00
powderluv 2f0b1d0b08
bump macOS builds to Python 3.10 2022-06-04 22:44:32 -07:00
powderluv b14c5d619d
Build the nightly package only once a day/night
No need to be shipping two releases a day, our supported packages and binaries have grown.
2022-06-04 22:40:53 -07:00
Maksim Levental cec5aeedb0
add ci tests (#754) 2022-05-25 14:59:59 -05:00
powderluv cfc1a6515c
build only Python3.9 to avoid timeout
GH runner times out when building 3.9 and 3.10 on macOS.
2022-05-13 00:07:55 -07:00
powderluv 2877a37ac6
Update buildRelease.yml
Fix filename changed missed in Code Review.
2022-04-25 17:00:31 -07:00
powderluv 6d09c98b2f
Fix version information in Release builds (#788)
env vars seems to be lost in manylinux docker.
Use a version file like IREE does.
2022-04-25 14:13:17 -07:00
Ahmed S. Taei 6b3d0b7e7a
Add bazel build support (2/N) (#744)
- Add bazel GitHub actions.
2022-04-25 12:33:15 -07:00
powderluv 0f751498a7
Update releaseSnapshotPackage.yml 2022-04-22 15:38:36 -07:00
powderluv d789aee11e
Only upload torch*.whl (#786)
only upload torch*.whl to unblock OSX build failures during upload. We have to move to svenstaro/upload-release-action
2022-04-22 15:17:09 -07:00
powderluv cbf158f069
Update buildRelease.yml
Update artifact directory to ./build_tools/python_deploy/wheelhouse/*.whl
2022-04-21 19:57:27 -07:00
powderluv 9f2184da98
Update oneshotSnapshotPackage.yml
remove now deprecated inputs to build and test
2022-04-21 19:12:42 -07:00
powderluv 8003b92fa7
Delete releasePackage.yml 2022-04-21 18:54:01 -07:00
powderluv c1026fa95b
Switch to using the new Release builds (#780) 2022-04-21 18:46:34 -07:00
powderluv 4ef61aa27f
Minor buildsystem fixes (#778)
Sets up auto-pinning of latest torch-nightly
2022-04-21 15:53:00 -07:00
powderluv 0257d91a21
Update buildManylinux.yml
use sudo for mac OS
2022-04-21 11:06:02 -07:00
powderluv 299c1bbe6d
Update buildManylinux.yml
fix build naming
2022-04-21 10:55:40 -07:00
powderluv b03eac4224
Enable OSX (Intel, Apple Silicon Builds) (#776)
Update pinned pytorch version. Will submit a follow on PR to bump.
Also update artifacts directory
2022-04-21 10:47:28 -07:00
powderluv cc3a4a58ef
Add oneshot release snapshot for test/ondemand (#768)
* Add oneshot release snapshot for test/ondemand

Add some build scripts to test new release flow based on IREE.
Wont affect current builds, once this works well we can plumb it
in.

Build with manylinux docker

* Fixes a few issues found when debugging powderluv's setup.

* It is optional to link against Python3_LIBRARIES. Check that and don't do it if they don't exist for this config.
* Clean and auditwheel need to operate on sanitized package names. So "torch_mlir" vs "torch-mlir".
* Adds a pyproject.toml file that pins the build dependencies needed to detect both Torch and Python (the MLIR Python build was failing to detect because Numpy wasn't in the pip venv).
* Commented out auditwheel: These wheels are not PyPi compliant since they weak link to libtorch at runtime. However, they should be fine to deploy to users.
* Adds the --extra-index-url to the pip wheel command, allowing PyTorch to be found.
* Hack setup.py to remove the _mlir_libs dir before building. This keeps back-to-back versions from accumulating in the wheels for subsequent versions. IREE has a more principled way of doing this, but what I have here should work.

Co-authored-by: Stella Laurenzo <stellaraccident@gmail.com>
2022-04-21 02:19:12 -07:00
Clément Fournier 578d0ec292 Review comments 2022-04-19 15:11:17 -07:00
Clément Fournier 3e0c1cf6af Change cache suffix to not invalidate existing caches 2022-04-19 15:11:17 -07:00
Clément Fournier 566650c5ae Use distinct ccaches
Since they run in distinct jobs, using the same ccache would
cause one job to overwrite the cache of the other.

See https://github.com/ljfitz/torch-mlir/pull/16 for a proof
that this works. The first build takes a long time but ccache
takes over in the dummy commit.
2022-04-19 15:11:17 -07:00
Clément Fournier f9d5201ae6 address PR review 2022-04-19 15:11:17 -07:00
Clément Fournier 4a2535a86d Add build-out-of-tree job 2022-04-19 15:11:17 -07:00
Clément Fournier 37087ccd5f Refactor current CI workflow into composable jobs 2022-04-19 15:11:17 -07:00
Sean Silva 8250f50c81 Attempt to set Python package version to the snapshot identifier.
This should make the releases sort properly when `pip`'s
`-f`/`--find-links` argument is used.
2022-03-30 17:54:11 +00:00
Sean Silva 4f61b1fce1 Try to get the release packages publishing again.
As per the docs on:
https://github.com/eregon/publish-release

> Note that the release must *not be marked as prerelease* for this to work.

For some reason, we were marking the release as pre-release before and
this was working, but the docs here seem pretty clear, so I'm going to
try it.
2022-03-30 00:35:02 +00:00
Sean Silva 3a96078571 Pin the CI to the latest working PyTorch.
I am investigating the breakage.

Also, fix "externals" rename in setup.py and some cases where we weren't
using `requirements.txt` consistently.

Also, fix a case where the packaging script would get confused due to
".." in the path name.
2022-03-29 15:02:17 -07:00
Ahmed S. Taei 8383497704
[NFC] Rename external -> externals (#699) 2022-03-26 09:12:27 -07:00
Yi Zhang 869daf3c22 Add TMTensor dialect to torch-mlir
This is intended to explore support for non-structured ops that can't
be modeled by Linalg dialect. `tm_tensor.scan` and `tm_tensor.scatter`
are added as the first such ops. The dialect should aim to be
upstreamed in the future.
2022-02-15 16:45:38 -05:00
Sean Silva c46d48f9f5 Make error reporting a bit better.
- Split out TOSA in the CI.
- Add summary of unexpected test outcomes. This works better when there
  are many XFAIL'ing tests, as it only prints out the error_str on
  FAIL, not on XFAIL. Example here:
  https://gist.github.com/silvasean/c7886ec7b3d35c21563cb09f7c3407da
2021-10-28 13:20:16 -07:00
Stella Laurenzo 92ae692387
Filter checks to only run on push to main branch. (#372)
Keeps redundent pull request and push workflows from running when pushing to branches in the main repo.
2021-10-21 21:23:21 -07:00
Sean Silva 0c5c84d63d Add a basic TOSA E2E backend.
We lower through linalg-on-tensors and use RefBackend to run it.
This adds enough support for a "tanh" op. Adding more ops should be
fairly mechanical now that things are wired up. Run with:
```
./tools/torchscript_e2e_test.sh -c tosa
```

The backend structure is very similar to linalg-on-tensors based E2E
backends and is a nice parallel (see `tosa_backend.py`). Actually, this
forced a nice refactoring to the layering here. We removed
`torchscript-module-to-linalg-on-tensors-backend-pipeline` and instead
require separately running
```
torchscript-function-to-torch-backend-pipeline,torch-backend-to-linalg-on-tensors-backend-pipeline
```
This highlights the step that lowers to the "torch backend contract"
of cleaned up `torch` dialect ops is a critical step in the lowering.
Going forward, that is the key load-bearing contract of the torch-mlir
project, not the linalg-on-tensors backend contract.

Recommended review order:
- `TorchToTosa.cpp` / `TorchToTosa/basic.mlir`
- `python/torch_mlir_e2e_test/torchscript/configs/tosa_backend.py` and
  the new `utils.py` file there.
- `python/torch_mlir_e2e_test/tosa_backends/linalg_on_tensors.py` and
  `abc.py` in that directory for the TOSA backend e2e interface.
- other misc mechanical changes
2021-10-08 09:59:45 -07:00
Sean Silva b6628fe774 Mark releases as "published".
This allows `pip` to see them.
2021-10-06 22:48:21 +00:00
Sean Silva 4a8d05e4a5 Add torch_mlir snapshot packages.
This closely follows IREE's
[schedule_snapshot_release.yml](f2f153d394/.github/workflows/schedule_snapshot_release.yml (L1))
workflow.

The snapshot releases can be installed with:
```
python -m pip install torch_mlir -f "https://github.com/llvm/torch-mlir/releases"
```
2021-10-06 14:50:31 -07:00
Sean Silva e687d39074 Update buildAndTest.yml 2021-09-27 17:11:08 -07:00
Sean Silva 0eb767ea45 Remove frontends/pytorch directory.
It just contained the e2e testing framework. We now fold it into the
main project to reduce complexity.

- `frontends/pytorch/python/` -> `python/torch_support`
- `frontends/pytorch/e2e_testing -> e2e_testing`
- `frontends/pytorch/examples -> examples`
- `frontends/pytorch/test` -> `python/test`
- `torch_mlir_torchscript` python module -> `npcomp_torchscript`
- `torch_mlir_torchscript_e2e_test_configs` python module ->
  `npcomp_torchscript_e2e_test_configs`

This also changes the license of a handful of files from the
"pytorch-style" license to the regular LLVM/npcomp license. The only
people who committed to those files were myself and Yi.
2021-09-17 09:27:49 -07:00
Sean Silva d94d6800fa Bring CI back to life.
This brings back `check-npcomp-all` and the refbackend e2e tests
coverage.
2021-09-16 12:07:32 -07:00
Sean Silva b6be96d722 [torch-mlir earthmoving (2/N)] Python code movement.
This moves the bulk of the Python code (including the Torch interop)
from `frontends/pytorch` into `torch-mlir/TorchPlugin`. This also
required reconciling a bunch of other Python-related stuff, like the
`torch` dialects.

As I did this, it was simpler to just remove all the old numpy/basicpy
stuff because we were going to delete it anyway and it was faster than
debugging an intermediate state that would only last O(days) anyway.

torch-mlir has two top-level python packages (built into the
`python_packages` directory):

- `torch_mlir_dialects`: `torch` dialect Python bindings (does not
  depend on PyTorch). This also involves building the aggregate CAPI for
  `torch-mlir`.
- `torch_mlir`: bindings to the part of the code that links against
  PyTorch (or C++ code that transitively does).

Additionally, there remain two more Python packages in npcomp (but
outside `torch-mlir`):

- `npcomp_torch`: Contains the e2e test framework and testing configs
  that plug into RefBackend and IREE.
- `npcomp_core`: Contains the low-level interfaces to RefBackend and
  IREE that `npcomp_torch` uses, along with its own
  `MLIR_PYTHON_PACKAGE_PREFIX=npcomp.` aggregation of the core MLIR
  python bindings. (all other functionality has been stripped out)

After all the basicpy/numpy deletions, the `npcomp` C++ code is now very
tiny. It basically just contains RefBackend and the `TorchConversion`
dialect/passes (e.g. `TorchToLinalg.cpp`).

Correspondingly, there are now 4 main testing targets paralleling the
Python layering (which is reflective of the deeper underlying dependency
structure)

- `check-torch-mlir`: checks the `torch-mlir` pure MLIR C++ code.
- `check-torch-mlir-plugin`: checks the code in `TorchPlugin` (e.g.
  TorchScript import)
- `check-frontends-pytorch`: Checks the little code we have in
  `frontends/pytorch` -- mainly things related to the e2e framework
  itself.
- `check-npcomp`: Checks the pure MLIR C++ code inside npcomp.

There is a target `check-npcomp-all` that runs all of them.
The `torch-mlir/build_standalone.sh` script does a standalone build of
`torch-mlir`.

The e2e tests (`tools/torchscript_e2e_test.sh`) are working too.

The update_torch_ods script now lives in
`torch-mlir/build_tools/update_torch_ods.sh` and expects a standalone
build.

This change also required a fix upstream related to cross-shlib Python
dependencies, so we also update llvm-project to
8dca953dd39c0cd8c80decbeb38753f58a4de580 to get
https://reviews.llvm.org/D109776 (no other fixes were needed for the
integrate, thankfully).

This completes most of the large source code changes. Next will be
bringing the CI/packaging/examples back to life.
2021-09-15 13:40:30 -07:00
Sean Silva 28762699b3
Comment out the full wheel build
Last commit was only the last step of that.
2021-09-10 21:43:25 -07:00
Sean Silva 0d8af19550
Temporarily disable wheel building
It will be re-enabled after the torch-mlir excision is completed.
2021-09-10 21:40:16 -07:00
Sean Silva 7c788dbfec Remove CI pinning. 2021-08-02 11:07:08 -07:00
Stella Laurenzo 8494455282 Re-enable integration tests in CI. 2021-07-29 22:57:20 -07:00
Stella Laurenzo 445472c51e Build packages for npcomp-torch.
* Adds a minimal setup.py for frontends/pytorch
* Makes npcomp-core export its headers and libraries
* Adds a script to build packages.
* Adds CI step to package and smoke test.
* Will need some more tweaks and coordination prior to deploying (version locking etc).
2021-07-29 19:58:59 -07:00
Stella Laurenzo 2dbab50444
Rework the python build to a static assembly of MLIR+NPCOMP (#251)
* Adapt to python build system updates.

* Bump llvm to 310c9496d80961188e8d8f8ad306cdf44bd7541f (includes python build updates)
* Adds refback C-API.
* Re-layers all python builds.
* Rework CI.
2021-07-27 16:10:10 -07:00