Commit Graph

185 Commits (aa7e95f7c8cde77528d273633baa8887f4795187)

Author SHA1 Message Date
Sambhav Jain 49fdc1a8a6
Add bazel targets for TorchOnnxToTorch conversion passes (#2596)
Adapts to the TorchOnnxToTorch changes from
https://github.com/llvm/torch-mlir/pull/2585.
Also restores the bazel builds in post-merge CI that were disabled in
2148c4cd0d.

Bazel workflow:
https://github.com/sjain-stanford/torch-mlir/actions/runs/7023912962
2023-11-28 13:06:35 -08:00
Stella Laurenzo 53fc995639 Run CI on all main/postsubmit commits.
Prior to this, the concurrency rules for presubmits (which cancel eagerly) were being applied to main. The result was that landing a second patch would cancel the CI on the one prior.
2023-11-22 18:05:18 -08:00
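
The concurrency problem described in 53fc995639 can be modelled in a few lines. This is only a sketch under stated assumptions: the function name, the group naming scheme, and the event strings are hypothetical and are not taken from the workflow file. It shows the intended behaviour: runs for the same presubmit share a group and cancel eagerly, while each commit on main gets its own group so a later landing cannot cancel it.

```python
from typing import NamedTuple, Optional


class Concurrency(NamedTuple):
    group: str
    cancel_in_progress: bool


def concurrency_for(event: str, sha: str, pr_number: Optional[int] = None) -> Concurrency:
    """Hypothetical model: presubmits cancel eagerly, postsubmits never cancel."""
    if event == "pull_request" and pr_number is not None:
        # All runs for one PR share a group, so a new push cancels the older run.
        return Concurrency(f"presubmit-{pr_number}", True)
    # Each commit on main gets a unique group, so a second landing cannot cancel it.
    return Concurrency(f"postsubmit-{sha}", False)
```

Keying the postsubmit group on the commit SHA is one possible way to keep main runs independent; the actual workflow may achieve the same effect differently.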
Stella Laurenzo 2148c4cd0d Temporarily disable bazel build until fixed. 2023-11-22 18:00:39 -08:00
Vivek Khandelwal b26797c20b
Disable torch-mlir-core for release build (#2586) 2023-11-20 19:36:14 -08:00
Stella Laurenzo 5eae0adff1
Breakup python pytorch deps (#2582)
This lifts the core of the jit_ir_importer and ltc out of the pt1
project, making them peers to it. As a side-effect of this layering, now
the "MLIR bits" (dialects, etc) are not commingled with the various
parts of the pt1 project, allowing pt1 and ltc to overlay cleanly onto a
more fundamental "just MLIR" Python core. Prior to this, the Python
namespace was polluted to the point that this could not happen.

That "just MLIR" Python core will be introduced in a followup, which
will create the space to upstream the FX and ONNX pure Python importers.

The primary non-NFC change to the API is:

* `torch_mlir.dialects.torch.importer.jit_ir` ->
`torch_mlir.jit_ir_importer`.

The rest is source code layering so that we can make the pt1 project
optional without losing the other features.

Progress on #2546.
2023-11-19 12:10:19 -08:00
Stella Laurenzo 6961f0a247
Re-organize project structure to separate PyTorch dependencies from core project. (#2542)
This is a first step towards the structure we discussed here:
https://gist.github.com/stellaraccident/931b068aaf7fa56f34069426740ebf20

There are two primary goals:

1. Separate the core project (C++ dialects and conversions) from the
hard PyTorch dependencies. We move all such things into projects/pt1 as
a starting point since they are presently entangled with PT1-era APIs.
Additional work can be done to disentangle components from that
(specifically LTC is identified as likely ultimately living in a
`projects/ltc`).
2. Create space for native PyTorch2 Dynamo-based infra to be upstreamed
without needing to co-exist with the original TorchScript path.

Very little changes in this path with respect to build layering or
options. These can be updated in a followup without commingling
directory structure changes.

This also takes steps toward a couple of other layering enhancements:

* Removes the llvm-external-projects/torch-mlir-dialects sub-project,
collapsing it into the main tree.
* Audits and fixes up the core C++ build to account for issues found
while moving things. This is just an opportunistic pass through but
roughly halves the number of build actions for the project, from the
high 4000s to the low 2000s.

It deviates from the discussed plan by having a `projects/` tree instead
of `compat/`. As I was thinking about it, this will better accommodate
the follow-on code movement.

Once things are roughly in place and the CI passing, followups will
focus on more in-situ fixes and cleanups.
2023-11-02 19:45:55 -07:00
Vivek Khandelwal d10a86f51c Disable LTC for arm release
Also, revert https://github.com/llvm/torch-mlir/pull/2488.
Disabling LTC based on the discussion here:
https://discord.com/channels/636084430946959380/742573221882364009/1156272667813494824
2023-10-02 22:22:07 +05:30
Vivek Khandelwal 8abfa5b196
Use PyTorch nightly for Arm release build (#2488)
The LTC backend has drifted from being able to pass tests on the stable
PyTorch version, so pinning to nightly on ARM.

Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
2023-09-27 09:40:32 -07:00
Ashay Rane 5f772e8cb4
CI: reconcile differences between RollPyTorch and pre-merge checks (#2482) 2023-09-23 07:00:16 -07:00
powderluv cfd70dfb0d
Update merge-rollpytorch.yml with approved actors (#2446) 2023-09-08 05:00:09 -07:00
Ashay Rane 8f28d933e1
CI: disable LTC e2e tests in stable PyTorch builds (#2414)
This way, we can keep CI green without being forced to ignore _all_
errors that arise in stable PyTorch builds
2023-08-23 11:11:17 -05:00
Ramiro Leal-Cavazos c0d8248ec7
Prevent failed stable CI job from cancelling nightly jobs (#2373)
The CI jobs that use stable PyTorch are currently not required to pass
in order for a patch to get merged in `main`. This commit makes sure
that if a CI job for stable PyTorch fails, it does not cancel the
other required jobs.
2023-08-03 14:58:57 -07:00
Sambhav Jain facce24ae3
[Bazel] Fix broken Bazel build (#2252)
Bazel GHA run: https://github.com/sjain-stanford/torch-mlir/actions/runs/5408580473
2023-06-29 08:45:35 -07:00
Ashay Rane c202cb5263
CI: Checkout repo so that gh knows where to look for the PR (#2223)
Without this patch, the gh command (for merging the PR) doesn't know
which repo we're referring to.
2023-06-09 21:50:19 -05:00
Ashay Rane 33ac7c3ad1
CI: Use GitHub token when calling gh for merging RollPyTorch PR (#2220) 2023-06-08 15:07:43 -05:00
Ashay Rane 3c1a796f7e
CI: Merge RollPyTorch PR upon successful completion (#2218)
This patch removes the mock commands, so that once the Build And Test
workflow has successfully completed on the RollPyTorch action, the PR is
merged and the branch is deleted.
2023-06-07 14:06:50 -05:00
Ashay Rane 2480cb7a51
CI: Update script to (mock) merge of RollPyTorch PRs (#2213)
Before enabling the actual merge, this patch dumps to the console the
bash commands that it plans to execute.
2023-06-06 12:38:16 -05:00
Ashay Rane 173050ec8a
CI: Fix yaml syntax in merge-rollpytorch.yml (#2201)
This patch fixes the indentation in the yaml file.
2023-06-05 09:43:00 -05:00
Ashay Rane c804dac925
CI: Introduce workflow to auto-merge RollPyTorch updates (#2196)
This patch adds a new workflow that runs when an update to the
rollpytorch branch by silvasean (in whose name the RollPyTorch action
runs) causes the regular CI build to complete without errors.  Upon
execution, this workflow currently just prints the PR number(s) of the
PR created by the RollPyTorch action, but once this is working as
expected, we will add the step to merge the PR changes.
2023-06-05 08:48:20 -05:00
Ashay Rane 755d0c46da
CI: Spot fixes related to nightly and stable PyTorch builds (#2190)
* CI: Skip (redundant) libtorch build when using stable PyTorch version

When we use PyTorch stable builds, there is no need to build libtorch
from source, making the stable-pytorch-with-torch-binary-OFF
configuration redundant with stable-pytorch-with-torch-binary-ON.  This
patch drops the redundant configuration from CI.

* CI: Simplify guard conditions for creating and using libtorch cache

Whether libtorch is enabled or not is predicated on a host of conditions
such as the platform, in-tree versus out-of-tree build, and stable
versus nightly PyTorch builds.  Instead of repeating these conditions to
guard whether to create or use the libtorch cache artifacts (and getting
them almost incorrect), this patch predicates the relevant pipeline
steps to whether libtorch is enabled, thus making the conditions far
simpler.
2023-06-01 22:58:25 -07:00
maxbartel db3f2e3fde
Add Stable PyTorch CI Pipeline (#2038)
* feat: split pytorch requirements into stable and nightly

* fix: add true to tests to see full output

* refactor: add comments to explain true statement

* feat: move some tests to experimental mode

* refactor: refactor pipeline into more fine grained difference

* feat: add version differentiation for some tests

* feat: activate more configs

* refactor: change implementation to use less requirement files

* refactor: remove constraints used for testing

* fix: revert some requirement file names

* refactor: remove unnecessary ninja install

* fix: fix version parsing

* refactor: remove dependency on torchvision in main requirements file

* refactor: remove index url

* style: remove unnecessary line switch

* fix: readd index url
2023-05-30 12:16:24 -07:00
powderluv 2f02ae1ebe
Delete another spurious pip (#2173) 2023-05-26 00:02:21 -07:00
powderluv 9b7909b599
Add ARM64 release builds (#2159)
Creates a build_linux_arm64 job that builds the release on an arm64 self-hosted runner.
Drop Python 3.10 support
Pass TM_TORCH_VERSION to choose the Stable PyTorch version (since arm64 doesn't have nightly builds)

Borrows nightly / stable PyTorch switch from the WIP
https://github.com/llvm/torch-mlir/pull/2038
2023-05-25 20:39:19 -07:00
powderluv f5e0287aaa
Remove spurious pip in Release builds (#2172)
(left over from a previous commit that was approved and landed in a branch by accident)
2023-05-25 18:59:21 -07:00
powderluv 0cdc03d8fe
only setup python for non-docker platforms (#2171)
The original PR was accidentally merged to a branch. Re-landing the same PR to main now.
2023-05-25 16:27:06 -07:00
Ashay Rane 9f65a8a961
CI: disable caching for release builds (#2168)
This patch adds a (default-true) input called `cache-enabled` to the
setup-build action, so that when the input is false, ccache is not setup
on the host machine.  This patch also sets the input to be false for the
release builds.
2023-05-25 11:01:46 -05:00
Ashay Rane 558f12f05c
CI: Use GitHub app token for creating PRs (#2137)
Since PRs created by the GitHub action bot cannot trigger workflows (and
thus build tests), this patch uses the token for a GitHub app that was
specifically created for the RollPyTorch action.
2023-05-19 23:18:03 -05:00
Ashay Rane 19a08d51f3
CI: [nfc] Use actions/cache instead of modified fork (#2124)
We previously used a fork of the action/cache repository for the PyTorch
cache since the actions/cache repo did not support read-only caches.
Now that actions/cache supports separate read and write steps, this
patch switches back to the actions/cache repo.
2023-05-12 23:25:17 -05:00
Ashay Rane 28bb866260
CI: prepare CI for ccache updates for MSVC/Windows (#2120)
This patch, by itself, doesn't fix caching on Windows, but once a new
release of ccache is available, caching for Windows builds should start
working again (validated by building ccache from source and using it
with LLVM builds).

Ccache rejects caching when either the `/Zi` or `/ZI` flags are used
during compilation on Windows, since these flags tell the compiler to
embed debug information in a PDB file (separate from the object file
produced by the compiler).  In particular, our CI builds add the `/Zi`
flag, making ccache mark these compiler invocations as uncacheable.

But what caused our CI to add debug flags, especially when we specified
`-DCMAKE_BUILD_TYPE=Release`?  On Windows, unless we specify the
`--config Release` flag during the CMake build step, CMake assumes a
debug build.  So all this while, we had been producing debug builds of
torch-mlir for every PR!  No wonder it took so long to build the Windows
binaries.

The reason for having to specify the configuration during the _build_
step (as opposed to the _configure_ step) of CMake on Windows is that
CMake's Visual Studio generators will produce _both_ Release and Debug
profiles during the CMake configure step (thus requiring a build-time
value that tells CMake whether to build in Release or Debug mode).
Luckily, on Linux and macOS, the `--config` flag seems to be simply
ignored, instead of causing build errors.

Strangely, based on cursory tests, it seems like on Windows we need to
specify the Release configuration as both `-DCMAKE_BUILD_TYPE=Release` as
well as `--config Release`.  Dropping either made my build switch to a
Debug configuration.
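
The two-step configuration described above can be sketched as a helper that builds both CMake command lines. This is a minimal illustration, not the CI script itself: the function name and the directory arguments are hypothetical, and only the `-DCMAKE_BUILD_TYPE` and `--config` flags come from the text.

```python
from typing import List, Tuple


def cmake_commands(source_dir: str, build_dir: str,
                   build_type: str = "Release") -> Tuple[List[str], List[str]]:
    """Return (configure, build) argument lists that both name the build type."""
    # Configure step: single-config generators read CMAKE_BUILD_TYPE here.
    configure = ["cmake", "-S", source_dir, "-B", build_dir,
                 f"-DCMAKE_BUILD_TYPE={build_type}"]
    # Build step: multi-config generators (Visual Studio) pick the profile here.
    build = ["cmake", "--build", build_dir, "--config", build_type]
    return configure, build
```

Each list can be passed to `subprocess.run(..., check=True)`. Specifying the build type in both steps mirrors the observation that dropping either one produced a Debug build on Windows.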

Additionally, there is a bug in ccache v4.8 (although this is addressed
in trunk) that causes ccache to reject caching if the compiler
invocation includes any flag that starts with `/Z`, including `/Zc`,
which is added by LLVM's HandleLLVMOptions.cmake and which isn't related
to debug info or PDB files.  The next release of ccache should include
the fix, which is to reject caching only for `/Zi` and `/ZI` flags and
not all flags that start with `/Z`.
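
The difference between the two behaviours can be modelled as two predicates over a compiler command line. This is a simplified sketch of the logic as described in this message, not ccache's actual code: the function names are hypothetical, and `/Zc:inline` is used only as an example of a `/Zc` flag.

```python
from typing import Iterable

# Flags that send debug info to a separate PDB file.
PDB_DEBUG_FLAGS = {"/Zi", "/ZI"}


def uncacheable_v48(flags: Iterable[str]) -> bool:
    """Model of the v4.8 behaviour: any flag starting with /Z blocks caching."""
    return any(flag.startswith("/Z") for flag in flags)


def uncacheable_fixed(flags: Iterable[str]) -> bool:
    """Model of the fixed behaviour: only /Zi and /ZI block caching."""
    return any(flag in PDB_DEBUG_FLAGS for flag in flags)
```

With a Release command line such as `["/O2", "/Zc:inline"]`, the first predicate rejects caching while the second allows it; both reject a command line containing `/Zi`.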

As a side note, debugging this problem was possible because of ccache's
log file, which is enabled by: `ccache --set-config="log_file=log.txt"`.
2023-05-12 12:45:01 -05:00
Ashay Rane e161f2511a
CI: let GitHub action create commit (#2114)
The GitHub action for creating the PR expects that either the changes
are not committed (in which case it commits them with the specified
commit message) or that the commit exists but that it is also pushed to
remote.

Prior to this patch, we created the commit but did not push it to
remote, causing failures.  This patch leaves the changes uncommitted so
that they're committed and pushed to remote as part of the PR creation.
2023-05-11 19:19:32 -05:00
Ashay Rane 377720af87
CI: create PR for RollPyTorch updates (#2106)
Currently, we run just the Linux in-tree tests when the RollPyTorch
workflow runs, but this is insufficient since WHL files for macOS or
Windows are sometimes not uploaded by PyTorch, causing the RollPyTorch
action to pass but all subsequent torch-mlir CI tests to fail because of
the broken build.

The easiest way to validate the RollPyTorch action on all platforms is
to run the standard set of tests that we run for each submitted PR, so
this patch makes the RollPyTorch action submit a PR instead of
committing the changes to the main branch directly.  The PR is assigned
to a handful of folks for review, although this can be changed in the
future.
2023-05-10 09:25:59 -05:00
powderluv 0a3ab07c8f
Set fetch-depth 0 for CI builds too (#2034) 2023-04-14 11:36:41 -07:00
powderluv 6cab740603
Set fetch-depth 0 (#2009)
This is to potentially work around the index.lock issue in git when we check out new depth-1 submodules of the recently updated mhlo.
2023-04-06 14:29:59 -07:00
powderluv 0497f0b08d
Revert "CI: drop deletion of workspace and limit submodule fetch concurrency (#1921)" (#2007)
This reverts commit 07f5f042c7.
2023-04-06 10:36:30 -07:00
Ashay Rane 07f5f042c7
CI: drop deletion of workspace and limit submodule fetch concurrency (#1921)
Despite using sudo to delete the workspace directory, we still
occasionally run into checkout errors.  This patch thus drops the
deletion of the workspace prior to checkout.  It also restricts the
number of parallel jobs in the submodule fetch step to just one, to try
and resolve the checkout issue ("index.lock: File exists.").
2023-04-04 12:58:52 -05:00
powderluv f83c516b15
Update RollPyTorch.yml to use Python 3.11 (#1999) 2023-04-03 23:53:26 -07:00
Maksim Levental c718f87c5d
- rename no-jit -> core (#1920)
- add windows release
2023-03-07 00:20:06 -06:00
Maksim Levental ac1f03e6f7
add jit,no-jit release matrix (#1916) 2023-03-05 22:13:33 -08:00
Maksim Levental 415265a64c
Add `torch-mlir-no-jit-importer` build case for mac os wheels (#1902)
* add flags to setup.py for out-of-tree build

* - fix build_ext bug
- add wheels script cases for mac wheels
2023-03-05 12:23:43 -06:00
Ashay Rane 987d5ab335
CI: use `sudo` to remove Docker-created files (#1905) 2023-02-27 17:44:50 -06:00
Ashay Rane ea00371d85
CI: clear workspace directory before checkout (#1900)
We have recently started seeing errors like:

```
  Synchronizing submodule url for 'externals/llvm-project'
  Synchronizing submodule url for 'externals/mlir-hlo'
  /usr/bin/git -c protocol.version=2 submodule update --init --force --depth=1
  Error: fatal: Unable to create '/home/anush/actions-runner/_work/torch-mlir/torch-mlir/.git/modules/externals/llvm-project/index.lock': File exists.
```

As a workaround, this patch removes the workspace directory before the
checkout step.
2023-02-24 14:44:35 -06:00
Ashay Rane 268364e061
CI: install `unzip` before using it (#1893)
The RollPyTorch action needs the `unzip` command to peek into WHL files
for fetching metadata.  This patch makes sure that the command is
installed before referencing it.
2023-02-19 17:49:08 -06:00
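
Commit 268364e061 relies on the fact that a WHL file is a zip archive whose metadata lives in a `*.dist-info/METADATA` file. The sketch below shows the same peek using only the Python standard library instead of the `unzip` command. Which fields the RollPyTorch action actually reads is not stated, so `Name` and `Version` are assumptions, and both function names and the demo wheel are hypothetical.

```python
import io
import zipfile
from email.parser import Parser


def wheel_metadata(whl) -> dict:
    """Read Name and Version from the METADATA file inside a wheel (path or file object)."""
    with zipfile.ZipFile(whl) as zf:
        meta_name = next(n for n in zf.namelist()
                         if n.endswith(".dist-info/METADATA"))
        text = zf.read(meta_name).decode("utf-8")
    # METADATA uses RFC 822-style headers, which the email parser understands.
    headers = Parser().parsestr(text, headersonly=True)
    return {"Name": headers["Name"], "Version": headers["Version"]}


def make_demo_wheel() -> io.BytesIO:
    """Build a tiny in-memory stand-in for a wheel, for illustration only."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("demo-1.2.3.dist-info/METADATA",
                    "Metadata-Version: 2.1\nName: demo\nVersion: 1.2.3\n")
    buf.seek(0)
    return buf
```

Calling `wheel_metadata(make_demo_wheel())` exercises the lookup without downloading anything; a real wheel path can be passed in the same way.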
powderluv 5710871f4f
Update buildAndTest.yml (#1881)
* Update buildAndTest.yml

* Update oneshotSnapshotPackage.yml

* Update buildRelease.yml

* Update RollPyTorch.yml

* Update oneshotSnapshotPackage.yml

* Update buildAndTest.yml
2023-02-15 09:17:12 -08:00
Ashay Rane 67ab708b63
python: separate build- and test-related pip dependencies (#1874)
We want to ensure that the pip packages required for building torch-mlir
are included in the dependencies of torch-mlir, but we don't want
the pip packages required for _testing_ torch-mlir to be included
among the dependencies.  To be able to specify and install one set of
dependencies and not the other, this patch separates the pip packages
into two files: build-requirements.txt and test-requirements.txt.

This patch also updates references to the requirements.txt file so that
CI builds that run end-to-end tests install test-related pip
dependencies while everything else (including WHL builds) sticks to just
the build-related pip dependencies.

Despite this change, this patch should not affect a torch-mlir
developer's workflow.  More precisely, since this patch makes the
top-level requirements.txt file refer to both build-requirements.txt and
test-requirements.txt files, a torch-mlir developer should be able to
continue referring to the requirements.txt file without any impact.
2023-02-13 21:22:09 -06:00
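
The layering in 67ab708b63 works because a pip requirements file can include another one with a `-r` line. The sketch below flattens such includes to show what each entry point would install. The file names come from the commit message; the package names (`pkg-build`, `pkg-test`) are placeholders, since the real file contents are not shown here, and the function is a hypothetical helper that handles only the `-r <file>` form.

```python
from typing import Dict, List

# Placeholder contents; only the file names are taken from the commit message.
EXAMPLE_FILES: Dict[str, str] = {
    "requirements.txt": "-r build-requirements.txt\n-r test-requirements.txt\n",
    "build-requirements.txt": "# needed to build\npkg-build\n",
    "test-requirements.txt": "# needed only for tests\npkg-test\n",
}


def flatten_requirements(name: str, files: Dict[str, str]) -> List[str]:
    """Expand '-r <file>' includes into a flat list of requirement lines."""
    resolved: List[str] = []
    for raw in files[name].splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("-r "):
            # Recurse into the included file, preserving order.
            resolved.extend(flatten_requirements(line[3:].strip(), files))
        else:
            resolved.append(line)
    return resolved
```

Flattening `requirements.txt` yields both sets, which is why a developer's workflow is unchanged, while a WHL build that points at `build-requirements.txt` sees only the build-related packages.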
powderluv 320e67ff34
Python 3.11 support (#1848)
* Python 3.11 support

* test without torchvision

* Update pytorch-requirements.txt

* Update buildRelease.yml

* Update action.yml

* Update install_macos_deps.sh

* Update build_macos_packages.sh
2023-02-10 07:16:37 -08:00
Ashay Rane 711646d095
mhlo: migrate conversion to stablehlo (#1840)
This patch replaces all MHLO operations with their StableHLO
counterparts and adds a validation pass to ensure that no MHLO operations
remain before translating all StableHLO operations to the MHLO dialect
for further lowering to the Linalg dialect.

This patch also updates all lit tests so that they refer to the
`convert-torch-to-stablehlo` pass and so that they check for StableHLO
operations.
2023-02-02 07:29:47 -06:00
Ashay Rane a897c49803
CI: miscellaneous fixes for Release builds (#1781)
- Use v3 of actions/checkout, since the version we use (v2) uses
   Node.js 12, which is deprecated by GitHub.

 - Source the PowerShell venv script (instead of the bash script) since
   the calling script is a PowerShell script.  Without this, the build
   doesn't use venv at all.

 - Make the build dependencies in whl-requirements.txt (used by
   setup.py) match those in requirements.txt.  To that end, this patch
   creates a build-requirements.txt that is referenced by
   requirements.txt and whl-requirements.txt.
2023-01-06 20:41:43 -06:00
Ashay Rane f6b6069a34
ci: post comment on RollPyTorch tracker issue upon build failure (#1730)
Now that the RollPyTorch tracker issue exists, we can automate the job
of notifying folks of failures instead of having to do it manually.
This patch adds a step to the workflow to post such a message.
2022-12-18 13:45:30 -06:00
powderluv cd90c0aaf5
Update buildAndTest.yml (#1723) 2022-12-15 05:42:01 -08:00
Ashay Rane 64f9a0e978
ci: print ccache statistics and configuration at end of CI run (#1719)
There appear to be two problems with the caching layer in our CI runs:
(a) the sizes of some of the caches have grown to multiples of the
300 MB limit and (b) caching on Windows seems to provide little to no
benefit.

To help understand the reasons for these problems, this patch adds a
line item to the list of steps run in CI to dump the ccache
configuration and statistics just prior to uploading the cache artifact.
2022-12-14 09:50:43 -06:00