torch-mlir

Commit Graph

Author	SHA1	Message	Date
Sambhav Jain	34478ab1c7	[Build] Add concurrency groups to address long queue times (#1219 ) We're seeing large CI queue times ([example](https://discord.com/channels/636084430946959380/742573221882364009/1007631811184164944)) especially with MacOS VMs on GHA. Part of the problem is follow-on commits to the same branch which trigger new runs while the previous runs are still in progress, hogging the scarce VMs. This PR adds concurrency groups to the GHA workflow, which ensures that only a single job or workflow using the same concurrency group will run at a time. This would cancel any in-progress jobs in the same github workflow and github ref (e.g. `refs/heads/main` or `refs/pull/<pr_number>/merge`). As discussed on discord [thread](https://discord.com/channels/636084430946959380/1007787336848912386/1007787338895740928), once this lands we may have to closely monitor the workflows to check that this didn't introduce unintended consequences. If it did, we could either revert, or decide to selectively cancel particular runs (e.g. macos only, which is the main bottleneck right now) instead of the entire workflow. This will also require some expectation management. As in, if you see an ❌ on the main branch, it may not necessarily mean things broke; it could mean the run was killed by a more recent run. This makes it a bit harder to trace a failure back to a commit in a sequence of commits (requiring those builds to be run again). Thanks @powderluv for the proposal and pointer to this! It should help with the scarce VMs on GHA and save on queue time. References: * https://docs.github.com/en/actions/using-jobs/using-concurrency#example-only-cancel-in-progress-jobs-or-runs-for-the-current-workflow * https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#example-only-cancel-in-progress-jobs-or-runs-for-the-current-workflow	2022-08-12 17:38:48 -07:00
Ashay Rane	1581d6a84c	build: fix typo in path (#1218 ) When we renamed the directory containing submodules from `external` to `externals`, we accidentally left the original name in the Github workflow. This patch fixes the problem.	2022-08-12 15:38:25 -07:00
Sambhav Jain	aed0ec3a2c	Merge matrix runs to fail fast globally (#1216 ) My earlier [PR](https://github.com/llvm/torch-mlir/pull/1213) had (among other things) decoupled ubuntu and macos builds into separate matrix runs. This is not working well due to the limited number of MacOS GHA VMs causing long queue times and backlog. There are two reasons causing this backlog: 1. macos arm64 builds with pytorch source are getting erratically cancelled due to resource / network constraints. This is addressed with this: https://github.com/llvm/torch-mlir/pull/1215 > "macos-arm64 (in-tree, OFF) The hosted runner: GitHub Actions 3 lost communication with the server. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error." 2. macos runs don't fail-fast when ubuntu runs fail due to being in separate matrix setups. This PR couples them again.	2022-08-12 11:30:09 -07:00
Sambhav Jain	b8bd0a46cc	use pytorch binary for macos-arm64 builds (#1215 )	2022-08-12 06:33:57 -07:00
Sambhav Jain	f00ca91db0	Simplify matrix configuration for CI workflows (#1213 ) Addresses https://github.com/llvm/torch-mlir/issues/1207. #### Provisioned jobs: ``` # ubuntu - x86_64 - llvm in-tree - pytorch binary - build+test # most used dev flow and fastest signal # ubuntu - x86_64 - llvm out-of-tree - pytorch source - build+test # most elaborate build # macos - arm64 - llvm in-tree - pytorch source - build only # cross compile, can't test arm64 ``` #### Main changes - [x] Spawn macos builds from a separate matrix (in the same workflow). It made sense to do this as they are fairly different from ubuntu (cross compile, use a different cmake configuration). This simplifies the matrix configuration and exclusions quite a bit, and makes the workflow a bit more tractable and maintenance friendly. - [x] Remove the submodule md5sum step for ccache config. This was [broken](https://github.com/llvm/torch-mlir/runs/7779288734?check_suite_focus=true#step:3:145) for a while now. - [x] Removes unused matrix options - `os`, `targetarch`, `python-version`, `llvmtype`. - [x] Address ZSTD [comment](https://github.com/llvm/torch-mlir/pull/1204#discussion_r942349282) on @powderluv's cross compile [PR](https://github.com/llvm/torch-mlir/pull/1204). #### Further improvements (to be addressed in follow-on): * ubuntu-x86_64 out-of-tree integration tests fail ([error](https://github.com/sjain-stanford/torch-mlir/runs/7781264029?check_suite_focus=true)); only run unit tests for now (tests are excluded in current CI too) #### Passing workflow: https://github.com/sjain-stanford/torch-mlir/actions/runs/2840676309 ![image](https://user-images.githubusercontent.com/19234106/184194535-f3807991-401a-4cb9-b030-0ee8c334eba3.png)	2022-08-11 16:35:15 -07:00
Renato Golin	51bfe25c89	Add PyYaml to requirements.txt (#1174 ) Building on a fresh environment + virtualenv + in-tree build errors out because PyYaml isn't installed. Adding to requirements.txt fixes that. Fixes #1173	2022-08-11 17:59:39 +01:00
Prashant Kumar	b1a506624c	Add decomposition of `aten.masked.tensor` op. `aten.masked.tensor` op has been decomposed to `aten.masked.scalar` op.	2022-08-11 07:48:04 +05:30
Yan Xu	d96ec64be1	remove torch dialect from legal list (#1192 )	2022-08-11 09:22:41 +08:00
Vidush Singhal	dd2da5a038	E2E support for AtenRemainderScalarOp (#1200 )	2022-08-10 20:02:06 -04:00
gpetters94	79b9cf9468	Add lowering for aten.to.device (#1107 )	2022-08-10 19:24:02 -04:00
Ramana Radhakrishnan	b8d51a74d9	Update TorchToStd to TorchToArith in bazel files too. (#1210 ) The CI didn't catch the missing rename of TorchToArith until the merge had happened. This is following up from #1163	2022-08-10 14:51:13 -07:00
Renato Golin	71c240a6fa	Add note about MLIR compiled outputs in dev docs (#1195 ) Adding an example on how to extract MLIR output from the compilation process in various different formats to the development documentation. This should help developers trying to either debug torch_mlir or use it for the purpose of extracting MLIR outputs for other testing. Fixes #1175	2022-08-10 20:14:41 +01:00
Ramana Radhakrishnan	738f4fe96a	Rename TorchToStd pass as TorchToArith (#1163 ) All the converters in this pass appear to create ops from the arith dialect. Hence the full rename. Fix GH Issue #409.	2022-08-10 20:12:51 +01:00
powderluv	2342456356	mac m1 cross compile (#1204 ) * mac m1 cross compile Add support for M1 cross compile * Remove redundant ExecutionEngine It is registered as part of RegisterEverything * nuke non-universal zstd disable LTC	2022-08-10 08:48:39 -07:00
武家伟	87562773f8	[MHLO] Add AtenCatOp conversion pattern to MHLO (#1208 ) Co-authored-by: Bairen Yi <yibairen.byron@bytedance.com> Co-authored-by: Jiawei Wu <xremold@gmail.com> Co-authored-by: Tianyou Guo <tianyou.gty@alibaba-inc.com> Co-authored-by: Xu Yan <yancey.yx@alibaba-inc.com> Co-authored-by: Ziheng Jiang <ziheng.jiang@bytedance.com> Co-authored-by: Vremold <xremold@gamil.com>	2022-08-09 22:12:34 -07:00
powderluv	9cf0b6e8ff	Disable out-of-tree and PyTorch binary (#1206 )	2022-08-09 18:18:12 -07:00
Sambhav Jain	072c2b5aaf	[Bazel] Add EraseModuleInitializer to TorchMLIRTorchPasses library (#1202 ) The torch-mlir bazel build is [failing](https://github.com/llvm/torch-mlir/runs/7737425906?check_suite_focus=true) since [this commit](`504de5e701`) due to a linker failure (undefined symbol: `mlir::torch::Torch::createEraseModuleInitializerPass()`). ``` ERROR: /home/runner/.cache/bazel/_bazel_runner/db599744cd37f8c161e5034d9b9cd520/external/torch-mlir/BUILD.bazel:845:10: Linking external/torch-mlir/torch-mlir-opt failed: (Exit 1): clang failed: error executing command /usr/lib/llvm-11/bin/clang @bazel-out/k8-fastbuild/bin/external/torch-mlir/torch-mlir-opt-2.params Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging ld.lld: error: undefined symbol: mlir::torch::Torch::createEraseModuleInitializerPass() >>> referenced by Passes.cpp >>> bazel-out/k8-fastbuild/bin/external/torch-mlir/_objs/TorchMLIRTorchPasses/Passes.pic.o:(mlir::torch::Torch::createTorchFunctionToTorchBackendPipeline(mlir::OpPassManager&, mlir::torch::Torch::TorchLoweringPipelineOptions const&)) >>> referenced by Passes.cpp >>> bazel-out/k8-fastbuild/bin/external/torch-mlir/_objs/TorchMLIRTorchPasses/Passes.pic.o:((anonymous namespace)::registerEraseModuleInitializerPass()::'lambda'()::operator()() const) clang: error: linker command failed with exit code 1 (use -v to see invocation) ``` This PR adds `lib/Dialect/Torch/Transforms/EraseModuleInitializer.cpp` to `TorchMLIRTorchPasses` library.	2022-08-09 13:34:59 -07:00
Sambhav Jain	d41c7becf5	[Bazel] Allow workflow_dispatch manual trigger on bazel workflow (#1203 ) At the moment we don't gate torch-mlir PRs with bazel builds. This means bazel builds don't get run on open PRs, and so there's no good way to validate a fix PR which is meant to fix a broken bazel build. This option allows a bazel build to be manually triggered as needed on open PRs.	2022-08-09 13:28:21 -07:00
Sambhav Jain	b696362b7d	Enable OOT builds in CI (#1188 )	2022-08-09 12:13:16 -07:00
Jacques Pienaar	e75c7c5292	Flip to C++17 (#1198 ) LLVM now uses C++17.	2022-08-09 08:38:30 -07:00
Marius Brehler	747356186f	Don't set MLIR_TABLEGEN_EXE (#1197 ) With llvm/llvm-project@112499f landed, `MLIR_TABLEGEN_EXE` is given as a cache variable in the MLIR core project. Other external projects, such as TORCH-MLIR, should not set the variable as this breaks cross-compilation.	2022-08-09 16:06:12 +02:00
Marius Brehler	202076c6e3	Add CMake dep to Func dialect (#1196 ) The Torch dialect has an include to `mlir/Dialect/Func/IR/FuncOps.h` and should therefore have a CMake dependency to the MLIRFuncDialect. Otherwise, the build can fail since it may occur that `mlir/Dialect/Func/IR/FuncOps.h.inc` isn't generated yet.	2022-08-09 06:54:30 -07:00
Yan Xu	f83a905856	[MHLO] Fix lowering failure on reduction op with i32 shape (#1185 ) Fixes the lowering failure on torch::max.dim when the shape type is i32	2022-08-09 17:02:50 +08:00
powderluv	e55fc4deb5	Revert "E2E support for AtenRemainderScalarOp (#1119 )" (#1190 ) This reverts commit `34e207eeb5`.	2022-08-08 22:59:57 -07:00
Ashay Rane	bb47c166a0	llvm: update tag to 061e0189 (#1180 ) Summary of changes: - Switch to C++17 (similar to https://reviews.llvm.org/D131348) - Update MHLO to build with LLVM commit hash 061e0189 - Replace deprecated `hasValue()` and `getValue()` with `has_value()` and `value()` respectively (https://reviews.llvm.org/D131349) - Use `TypedAttr` (https://reviews.llvm.org/D130092) - Use updated assembly format of `mhlo.compare` op (commit d03ef01e70fbf9afd0fa1976fbb7ed31838929b3 in MHLO repo)	2022-08-08 20:17:35 -07:00
Henry Tu	3e97a33c80	Revert "Reenable LTC in out-of-tree build (#1177 )" (#1183 ) This reverts commit `f85ae9c685`.	2022-08-08 18:58:35 -07:00
武家伟	351f15424e	[MHLO] Add transposed convolution conversion pattern (#1171 ) Co-authored-by: Bairen Yi <yibairen.byron@bytedance.com> Co-authored-by: Jiawei Wu <xremold@gmail.com> Co-authored-by: Tianyou Guo <tianyou.gty@alibaba-inc.com> Co-authored-by: Xu Yan <yancey.yx@alibaba-inc.com> Co-authored-by: Ziheng Jiang <ziheng.jiang@bytedance.com>	2022-08-09 09:50:07 +08:00
Sean Silva	504de5e701	Rework how global slot initializers work. Rather than a per-global-slot initializer region, we now have one for the whole module. For example, it might look like this: ``` torch.global_slot "private" @tensor : !torch.tensor torch.global_slot "private" @list : !torch.list<tensor> torch.global_slot.module_initializer { %0 = torch.tensor.literal(dense<0.0> : tensor<f32>) : !torch.tensor %1 = torch.prim.ListConstruct %0 : (!torch.tensor) -> !torch.list<tensor> torch.initialize.global_slots [ @tensor(%0 : !torch.tensor) @list(%1 : !torch.list<tensor>) ] } ``` This new structure allows GlobalizeObjectGraph to create the initializer in a much simpler way, avoiding the need to reason about whether different slots alias each other. Reasoning about whether slots alias each other now is the responsibility of InlineGlobalSlots, which has to do a much more complicated analysis, implemented using MLIR's dataflow analysis framework. Recommended review order: - Check out the new IR constructs in the .mlir files of various passes - Op definitions (*.td) - Changes to GlobalizeObjectGraph pass. - InlineGlobalSlots pass (~total rewrite) - Misc changes: - Moving torchMlirAdjustStaticInformation for sharing with C++ code. - EraseModuleInitializer pass To make this a bit nicer, it would be good to have a `torch.module` op with an initializer region attached. That would be more invasive though. This change has highlighted certain aspects of our project layering which are worth calling out. None of our backends can handle global slots, so we enforce that there are no global slots before backend lowering. At an earlier stage in the project, we had aspirations of transparently handling mutable global state and such, but for reasons described below, that is no longer a goal. So really global slots should be seen as a progressive lowering step as part of inlining all the IValue's in the original program (GlobalizeObjectGraph is also one such step). 
Over time, with insights from work like IREE-JAX, it has become clear that there isn't a reliable programming model we can compile for users where we just transparently handle mutable global state (and some other things, like lists and dictionaries). There is a need for an "outer program" that orchestrates more restricted subroutines of the kind we can handle in our compile flow here. The benefit of that is that it decouples considerations like shapes, dtypes, etc. from the program constructs used in the outer program. As long as the outer program can efficiently invoke (pipelining/async/etc.) high-performance data-parallel numerical subroutines of the kind we compile in our flow here, then there is a complete programming model. This is also consistent with the direction of upstream PyTorch which is becoming more tracing-based (which inherently loses a lot of program structure, which then has to be applied back with an "outer program" orchestrating the traced subroutines).	2022-08-08 18:12:06 -07:00
Vidush Singhal	34e207eeb5	E2E support for AtenRemainderScalarOp (#1119 ) * E2E support for AtenRemainderScalarOp	2022-08-08 20:02:52 -04:00
Vidush Singhal	b70548edff	Add decomposition and E2E support for Aten_EmbeddingBag (#1137 ) * Add decomposition and E2E support for Aten_EmbeddingBag	2022-08-08 18:56:49 -04:00
Henry Tu	f85ae9c685	Reenable LTC in out-of-tree build (#1177 )	2022-08-08 17:35:22 -04:00
Tanyo Kwok	290d7755fb	importer: add initial support for loading Float16 tensors (#1169 ) follow up #761: This patch updates the `torch_mlir::convertTensorToMlirElementsAttr()` method to enable the creation of tensors whose base type is Float16. This patch also adds a test to validate the IR generation, and it updates the test for importing tensors of various types.	2022-08-08 12:37:31 +08:00
Tanyo Kwok	1ee865983b	[MHLO] fix tensor mode aten.div op pattern (#1160 ) * [MHLO] fix tensor mode aten.div op pattern See RFC #999 Co-authored-by: Bairen Yi <yibairen.byron@bytedance.com> Co-authored-by: Jiawei Wu <xremold@gmail.com> Co-authored-by: Tianyou Guo <tianyou.gty@alibaba-inc.com> Co-authored-by: Xu Yan <yancey.yx@alibaba-inc.com> Co-authored-by: Ziheng Jiang <ziheng.jiang@bytedance.com>	2022-08-06 23:38:06 +08:00
Sean Silva	5618890ca0	development.md: Avoid name collisions with PYTORCH_ variables	2022-08-05 19:41:08 -07:00
Sean Silva	1fdaf2faa0	development.md: How to enable ASan	2022-08-05 17:37:09 -07:00
Henry Tu	e322f6a878	Update LTC CMake hack documentation (#1155 ) * Update CMakeLists.txt * Update CMakeLists.txt * Update CMakeLists.txt * Update CMakeLists.txt * Update buildAndTest.yml * Update setup.py * Address review comments	2022-08-05 14:12:20 -04:00
Sean Silva	8ce5d3f12c	E2E framework: Report tensor dtype in summary This helps to triage issues related to backends that don't support all dtypes.	2022-08-05 10:05:18 -07:00
Vivek Khandelwal	c129a6de93	[MLIR][TORCH] Add support for dim=None to Aten[Var|Std]DimOp PyTorch recently added support for `dim=None` in the `torch.var` (`5ca9b2b6fa`) and `torch.std` (`eb0e30e0bc`) ops. This commit adds the corresponding support in torch-mlir. Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>	2022-08-05 20:28:56 +05:30
Sean Silva	31727f81d8	torch_mlir.compile: Allow ignoring traced shapes In some cases, users know that a traced graph is valid for a wider set of shapes than they originally traced it with. Provide an option for users to ignore the shapes in the traced graph when they know it is legal. Fixes #997	2022-08-04 10:18:34 -07:00
Sean Silva	6484776a25	Make numerical stability test more perverse To test the summation stability of `torch.aten.var`, add a large constant to it, which increases the effective precision requirements.	2022-08-04 10:04:38 -07:00
武家伟	c94431f71c	[MHLO] Add convolution op pattern (#1152 ) Co-authored-by: Bairen Yi <yibairen.byron@bytedance.com> Co-authored-by: Jiawei Wu <xremold@gmail.com> Co-authored-by: Tianyou Guo <tianyou.gty@alibaba-inc.com> Co-authored-by: Xu Yan <yancey.yx@alibaba-inc.com> Co-authored-by: Ziheng Jiang <ziheng.jiang@bytedance.com>	2022-08-04 00:41:35 -07:00
gpetters94	08fc2d89bb	Add non-unit groups support to aten.convolution (#858 )	2022-08-04 02:18:38 -04:00
武家伟	d030591df9	[MHLO] Init MHLO pooling-like op conversion (#1141 ) * [MHLO] Init MHLO pooling-like op conversion and remove 'op' suffix in filenames Co-authored-by: Bairen Yi <yibairen.byron@bytedance.com> Co-authored-by: Jiawei Wu <xremold@gmail.com> Co-authored-by: Tianyou Guo tianyou.gty@alibaba-inc.com Co-authored-by: Xu Yan <yancey.yx@alibaba-inc.com> Co-authored-by: Ziheng Jiang <ziheng.jiang@bytedance.com> See RFC #999	2022-08-04 12:34:22 +08:00
Tanyo Kwok	f0a24f59f6	[MHLO] Init MHLO linear op patterns (#1132 ) See RFC https://github.com/llvm/torch-mlir/issues/999 Co-authored-by: Bairen Yi yibairen.byron@bytedance.com Co-authored-by: Jiawei Wu xremold@gmail.com Co-authored-by: Tianyou Guo tianyou.gty@alibaba-inc.com Co-authored-by: Xu Yan yancey.yx@alibaba-inc.com Co-authored-by: Ziheng Jiang ziheng.jiang@bytedance.com	2022-08-03 19:10:54 -07:00
Ahmed S. Taei	48ec300586	[Fix bazel] Add recently added torch->mhlo conversion pass to bazel (#1148 )	2022-08-03 14:12:07 -07:00
powderluv	37a229cffc	Update buildAndTest.yml (#1145 )	2022-08-03 12:50:54 -07:00
powderluv	0d25b6f10e	Fix cache-suffix name bug (#1138 ) This should enable better caching of builds.	2022-08-03 07:53:01 -07:00
Vivek Khandelwal	f2a0e32127	[MLIR][TORCH] Fix CI failure This commit fixes the CI failure by temporarily adding the failing test to xfail set. Signed-Off By: Vivek Khandelwal<vivek@nod-labs.com>	2022-08-03 20:07:56 +05:30
武家伟	636f5acb10	[MHLO] Init MHLO reduce-like op conversion (#1133 ) * [MHLO] init reduce-like op conversion from Torch to MHLO Co-authored-by: Bairen Yi <yibairen.byron@bytedance.com> Co-authored-by: Jiawei Wu <xremold@gmail.com> Co-authored-by: Tianyou Guo <tianyou.gty@alibaba-inc.com> Co-authored-by: Xu Yan <yancey.yx@alibaba-inc.com> Co-authored-by: Ziheng Jiang <ziheng.jiang@bytedance.com>	2022-08-03 10:47:52 +08:00
Tanyo Kwok	0b23af27d3	[MHLO] support non-constant torch scalar in BasicOps (#1134 ) See RFC https://github.com/llvm/torch-mlir/issues/999 Co-authored-by: Bairen Yi yibairen.byron@bytedance.com Co-authored-by: Jiawei Wu xremold@gmail.com Co-authored-by: Tianyou Guo tianyou.gty@alibaba-inc.com Co-authored-by: Xu Yan yancey.yx@alibaba-inc.com Co-authored-by: Ziheng Jiang ziheng.jiang@bytedance.com	2022-08-03 08:16:31 +08:00
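
Note on commit 6484776a25 ("Make numerical stability test more perverse"): the message says that adding a large constant to the input raises the effective precision requirements of a variance computation. The sketch below illustrates that effect in plain Python. It is not the torch-mlir test itself; the function names, the offset of `1e9`, and the sample size are assumptions chosen only for illustration.

```python
import random


def naive_variance(xs):
    """One-pass textbook formula E[x^2] - E[x]^2 (hypothetical helper).

    Subtracting two nearly equal large numbers causes cancellation.
    """
    n = len(xs)
    mean = sum(xs) / n
    mean_sq = sum(x * x for x in xs) / n
    return mean_sq - mean * mean


def two_pass_variance(xs):
    """Two-pass formula: compute the mean first, then average squared deviations."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


random.seed(0)
base = [random.random() for _ in range(1000)]

# Assumed offset: shifting the data does not change its true variance,
# but it makes every intermediate value much larger than the spread.
OFFSET = 1e9
shifted = [x + OFFSET for x in base]

reference = two_pass_variance(base)
naive_error = abs(naive_variance(shifted) - reference)
two_pass_error = abs(two_pass_variance(shifted) - reference)

print("reference variance:", reference)
print("naive formula error on shifted data:", naive_error)
print("two-pass formula error on shifted data:", two_pass_error)
```

With the shift applied, the one-pass formula loses essentially all significant digits, while the two-pass formula stays close to the reference. A test input built this way therefore separates stable from unstable summation strategies.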


1288 Commits (34478ab1c702bc7e229708eb5fc6b0b1c6aada70)
