torch-mlir

Commit Graph

Author	SHA1	Message	Date
Stella Laurenzo	032f225fa5	[ci] Allow long line in YAML	2024-01-27 19:43:41 -08:00
Stella Laurenzo	6b3ebb237f	[ci] Use a different cache key for torch nightly vs stable.	2024-01-27 19:42:29 -08:00
Stella Laurenzo	4513c3ca87	[ci] Add step to run unit tests. (#2820 )	2024-01-27 19:35:48 -08:00
Stella Laurenzo	77c14ab22b	[ci] Upgrade to new runners and disable unsupported jobs. (#2818 ) Per the RFC and numerous conversations on Discord, this rebuilds the torch-mlir CI and discontinues the infra and coupling to the binary releases (https://discourse.llvm.org/t/rfc-discontinuing-pytorch-1-binary-releases/76371). I iterated on this to get latency back to about what it was with the old (much larger and non-ephemeral) runners: About 4m - 4.5m for an incremental change. Behind the scenes changes: * Uses a new runner pool operated by AMD. It is currently set to manual scaling and has two runners (32-core, 64GiB RAM) while we get some traction. We can either fiddle with some auto-scaling or use a schedule to give it an increase during certain high traffic hours. * Builds are now completely isolated and cannot have run-to-run interference like we were getting before (i.e. lock file/permissions stuff). * The GHA runner is installed directly into a manylinux 2.28 container with upgraded dev tools. This eliminates the need to do sub-invocations of docker on Linux in order to run on the same OS that is used to build wheels. * While not using it now, this setup was cloned from another project that posts the built artifacts to the job and fans out testing. Might be useful here later. * Uses a special git cache that lets us have ephemeral runners and still check out the repo and deps (incl. llvm) in ~13s. * Running in an Azure VM Scale Set. In-repo changes: * Disables (but does not yet delete): * Old buildAndTest.yml jobs * releaseSnapshotPackage.yml * Adds a new `ci.yml` pipeline and scripts the steps in `build_tools/ci` (by decomposing the existing `build_linux_packages.sh` for in-tree builds and modularizing it a bit better). * Test framework changes: * Adds a `TORCH_MLIR_TEST_CONCURRENCY` env var that can be used to bound the multiprocess concurrency. Ended up not using this in the final version but is useful to have as a knob. 
* Changes the default concurrency to `nproc * 0.8 + 1` vs `nproc * 1.1`. We're running on systems with significantly less virtual memory and I did a bit of fiddling to find a good tradeoff. * Changed multiprocess mode to spawn instead of fork. Otherwise, I was getting instability (as discussed on discord). * Added MLIR configuration to disable multithreaded contexts globally for the project. Constantly spawning `nproc * nproc` threads (more than that actually) was OOM'ing. * Added a test timeout of 5 minutes. If a multiprocess worker crashes, the framework can get wedged indefinitely (and then will just be reaped after multiple hours). We should fix this, but this at least keeps the CI pool from wedging with stuck jobs. Functional changes needing followup: * No matter what I did, I couldn't get the LTC tests to work, and I'm not 100% sure they were being run in the old setup as the scripts were a bit twisty. I disabled them and left a comment. * Dropped out-of-tree build variants. These were not providing much signal and increase CI needs by 50%. * Dropped MacOS and Windows builds. Now that we are "just a library" and not building releases, there is less pressure to test these commit by commit. Further, since we bump torch-mlir to known good commits on these platforms, it has been a long time since either of these jobs have provided much signal (and they take ~an hour+ to run). We can add them back later post-submit if ever needed.	2024-01-27 18:35:45 -08:00
Stella Laurenzo	4a4d80a6ad	[ci] Add lint job and enable yaml linting of GH files. (#2819 )	2024-01-27 15:48:06 -08:00
MaheshRavishankar	28c7051ceb	Bump LLVM to llvm/llvm-project@5fcf907b34 (#2810 )	2024-01-26 18:38:44 -08:00
Aart Bik	46a25d7241	[torch-mlir][sparse] preserve sparsity during lowering torch to linalg (#2809 ) This preserves sparsity at the most obvious places of lowering TORCH tensors to MLIR RankedTensorType tensors. Other places are marked for audit. With some initial lowering tests.	2024-01-26 10:54:59 -08:00
Vivek Khandelwal	da7c6d2c16	[MLIR][TORCH] Add support for dynamic shape for Onnx.Transpose op (#2803 ) Signed-Off By: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>	2024-01-26 09:46:54 -08:00
Phaneesh Barwaria	4964977e85	[ONNX][MLIR] support constantOfShape op (#2747 )	2024-01-26 09:36:39 -08:00
Yuanqiang Liu	e73c5368fb	[FxImporter] make FxImporter to fit python<=3.9 (#2802 ) This is needed because torch with py3.9 is also widely used.	2024-01-26 09:01:47 +08:00
Rob Suderman	2ef228328f	[torch] `torch.dequantize` for per channel tensors to `linalg` (#2769 ) Support a lowering for dequantization for per channel tensors from `torch` dialect to a linalg decomposition. Tested via a numerical `torch` test.	2024-01-25 16:40:21 -08:00
Aart Bik	0aed231e21	[torch-mlir][conversion-test] cleanup trailing whitespace in mlir files (#2807 )	2024-01-25 14:24:28 -08:00
Aart Bik	fe836ceebf	[torch-mlir][test] cleanup trailing whitespace in mlir files (#2806 )	2024-01-25 14:24:13 -08:00
Aart Bik	dc9c624a29	[torch-mlir][sparse] provide a bazel build (#2805 )	2024-01-25 12:54:40 -08:00
Aart Bik	e824fbc65c	[torch-mlir][torch] add encoding field to torch type (#2799 ) This adds an encoding field to the torch type, using the interfaces for printing, parsing, and verification. Note that although this change prepares adding sparsity to the torch type (as illustrated by the round trip and invalid tests), nothing in this change depends on the actual contents of the encoding field!	2024-01-25 10:04:04 -08:00
lonely eagle	e581b33f96	[Stablehlo]fix CumsumInputDtypeInt32Module_basic on stablehlo backend. (#2797 ) Code used for testing. For the location of CumsumInputDtypeInt32Module in the repo you can see [here](`311b6b0286/projects/pt1/python/torch_mlir_e2e_test/test_suite/basic.py (L4148)`). ```python import torch import torch_mlir class CumsumInputDtypeInt32Module(torch.nn.Module): def __init__(self): super().__init__() def forward(self, val): return torch.ops.aten.cumsum(val, 1) module = torch_mlir.compile(CumsumInputDtypeInt32Module(), [torch.randn(2, 7, 4).to(torch.int32)], output_type="stablehlo") print(module.operation.get_asm()) ``` After fixing the bugs. ``` module attributes {torch.debug_module_name = "CumsumInputDtypeInt32Module"} { func.func @forward(%arg0: tensor<2x7x4xi32>) -> tensor<2x7x4xi64> { %0 = stablehlo.constant dense<0> : tensor<i64> %1 = stablehlo.convert %arg0 : (tensor<2x7x4xi32>) -> tensor<2x7x4xi64> %2 = "stablehlo.reduce_window"(%1, %0) ({ ^bb0(%arg1: tensor<i64>, %arg2: tensor<i64>): %3 = stablehlo.add %arg1, %arg2 : tensor<i64> stablehlo.return %3 : tensor<i64> }) {padding = dense<[[0, 0], [6, 0], [0, 0]]> : tensor<3x2xi64>, window_dilations = dense<1> : tensor<3xi64>, window_dimensions = dense<[1, 7, 1]> : tensor<3xi64>, window_strides = dense<1> : tensor<3xi64>} : (tensor<2x7x4xi64>, tensor<i64>) -> tensor<2x7x4xi64> return %2 : tensor<2x7x4xi64> } } ```	2024-01-25 10:44:08 +08:00
Rob Suderman	f6f890520b	[torch][quant] Quantized `torch.mm` for linalg with end-to-end test (#2750 ) This includes custom op matching for decomposed operations and fusing dequantization into dense operations. As a validation we compare to the dequant+mm torch implementation.	2024-01-24 14:02:50 -08:00
Rob Suderman	60bf6c25af	[onnx] Lower `onnx.QLinearMatMul` lowering to `torch` operators (#2776 ) We can plumb the linear matmul into pytorch using its quantized types with side channel information. To handle the final int8 operation we dequantize and requantize.	2024-01-24 12:28:48 -08:00
Vivek Khandelwal	894805dd5e	[MLIR][TORCH] Support for `onnx.LayerNormalization` (#2789 ) Signed-Off By: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>	2024-01-24 11:08:20 -08:00
Gaurav Shukla	12f123eff8	[ONNX][MLIR] Add support for pad op in the onnx pipeline (#2738 ) This commit adds mapping from `onnx.pad` op to `torch.pad` op. Currently it does not support `axes` parameter of `onnx.pad` op. Signed-off-by: Gaurav Shukla <gaurav.shukla@amd.com>	2024-01-25 00:33:37 +05:30
Phaneesh Barwaria	ac8975ea12	[MLIR] [ONNX] lowering for onnx tile op and sign op (#2725 )	2024-01-24 22:56:21 +05:30
zjgarvey	c531f5495b	AtenAdaptiveMaxPool2d Conversion to Linalg (#2779 ) The logic here is very similar to the conversion for AdaptiveAvgPool1d #2661 with a few modifications: 1. buffVal = -inf instead of 0 2. the main linalg generic op accumulates a max, instead of a sum, to the first output tensor 3. avg pooling requires dividing the sum pool by the kernel width, which we stored as an auxiliary tensor (kSizeTensor). Here, the auxiliary tensor will be recording the indices. Strangely enough, the only signature available for this function is to return indices, and it appears that they must be computed whether the user desires them or not. See [pytorch/torch/nn/functional.py](https://github.com/pytorch/pytorch/blob/main/torch/nn/functional.py#L1174). Before writing other adaptive pooling conversions, the logic of this decomposition should be rolled into a helper function that will work for both max and avg pooling ops. Even the auxiliary tensor should likely be automated. This code was written in a slightly more tedious way than strictly necessary (often using loops to fill SmallVectors up to rank-2, which is only two in this case), in order to more easily facilitate the transition to a helper function.	2024-01-24 09:09:56 -08:00
Vivek Khandelwal	311b6b0286	CI: Fix Roll PyTorch CI failure at determining commit hash (#2796 ) Signed-Off By: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>	2024-01-24 15:55:12 +05:30
Xida Ren (Cedar)	ccaac85788	implement aten.conv1d, aten.conv3d, and aten.conv_tbc (#2757 ) convolution with [time,batch,channel] ordering, as opposed to the default [batch, channel, time]. Currently implementing by transposing the input and output, but may need to get its own implementation in the future because this is supposed to be an op that gives a speedup. This is used by fairseq (https://github.com/facebookresearch/fairseq/issues/172). (in case you were wondering like me, this is different from transposed convolution. Transposed convolution has fractional strides). --------- Co-authored-by: Xida Ren <xida.ren.dev@gmail.com> Co-authored-by: Frederik Harwath <frederik.harwath@amd.com>	2024-01-23 21:30:03 -08:00
Chi_Liu	77ae56337d	[ONNX][MLIR] Add support for onnx.Exp op (#2792 ) https://github.com/nod-ai/SHARK-Turbine/issues/312	2024-01-23 13:45:00 -08:00
James Newling	dc056e58e6	[MLIR][TORCH] Add onnx.cast cases used by OPT-1.25M (#2787 )	2024-01-23 21:06:25 +05:30
Vivek Khandelwal	c9d8ffb414	build: manually update PyTorch version (#2788 ) Set PyTorch and TorchVision version to nightly release 2024-01-22. Signed-Off By: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>	2024-01-23 21:05:19 +05:30
Gaurav Shukla	b7a0329676	[ONNX][MLIR] Fix padding size constraint for onnx.maxpool op (#2782 ) Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>	2024-01-23 19:23:01 +05:30
Dave Liddell	d452c4f4c0	Fix onnx importer to treat Constant values as static (#2780 ) Fixes https://github.com/llvm/torch-mlir/issues/2764 In the case of OPT, there are ConstantOfShape ops whose input shape is not static (that is, an initializer), but rather comes from a Constant op. The importer can't handle such non-static input shapes. The fix here is to create initializers for a subset of Constant ops (ones with "value" attributes), so that their outputs can be used statically. Additionally, there was no case for creating a splat of int64, so I added that as well. --------- Co-authored-by: Dave Liddell <dliddell@xilinx.com>	2024-01-22 13:00:05 -08:00
Chi_Liu	cad98e8113	[ONNX][TORCH-MLIR] Add TopK support (#2774 ) https://github.com/nod-ai/SHARK-Turbine/issues/331	2024-01-22 12:56:39 -08:00
Ramiro Leal-Cavazos	5883ef0f21	Fix unused variable warnings (#2775 )	2024-01-22 11:05:55 -08:00
Srinath Avadhanula	73b30604da	Do not try to legalize transposed convolution (#2721 ) Currently transposed convolution is not handled correctly by `TorchToTosa`. This PR allows transposed convolutions to pass through the conversion so that they can be handled by other conversion passes later in a pipeline. An example input which produces a compilation error is: ``` func.func @forward(%input: !torch.vtensor<[1,64,1,100],f32>) -> !torch.vtensor<[1,64,2,200],f32> { %true = torch.constant.bool true %int1 = torch.constant.int 1 %int2 = torch.constant.int 2 %weight = torch.vtensor.literal(dense<0.0> : tensor<64x64x3x3xf32>) : !torch.vtensor<[64,64,3,3],f32> %bias = torch.vtensor.literal(dense<0.0> : tensor<64xf32>) : !torch.vtensor<[64],f32> %stride = torch.prim.ListConstruct %int2, %int2 : (!torch.int, !torch.int) -> !torch.list<int> %int1x1 = torch.prim.ListConstruct %int1, %int1 : (!torch.int, !torch.int) -> !torch.list<int> %output = torch.aten.convolution %input, %weight, %bias, %stride, %int1x1, %int1x1, %true, %int1x1, %int1 : !torch.vtensor<[1,64,1,100],f32>, !torch.vtensor<[64,64,3,3],f32>, !torch.vtensor<[64],f32>, !torch.list<int>, !torch.list<int>, !torch.list<int>, !torch.bool, !torch.list<int>, !torch.int -> !torch.vtensor<[1,64,2,200],f32> return %output : !torch.vtensor<[1,64,2,200],f32> } ``` This MLIR produces an error about a cast operation with a size mismatch when passed through `torch-to-tosa`: ``` error: 'tensor.cast' op operand type 'tensor<1x64x1x50xf32>' and result type 'tensor<1x64x2x200xf32>' are cast incompatible ``` --------- Co-authored-by: Srinath Avadhanula <srinath.avadhanula@getcruise.com>	2024-01-22 10:57:56 -08:00
Franz Haniel	b9806cfa38	[TorchToLinalg] Add lowering for torch.aten.diagonal (#2632 )	2024-01-22 12:47:13 -05:00
James Newling	50ac3b1912	g++ build fix (#2778 ) Introduced in `704cfdaf08` of @wu-s-john g++ compiler error: Pooling.cpp:177:13: error: explicit specialization in non-namespace scope ‘class Design looks good, g++ is just freaking out for no good reason. Un-nesting the template classes fixes the error. We don't have g++ CI. This hopefully happens infrequently enough that we can just fix manually. My service to those folks who really like building with g++... :)	2024-01-19 19:12:29 -08:00
Dave Liddell	2f4924015d	[onnx] Added flatten (#2760 ) https://github.com/nod-ai/SHARK-Turbine/issues/328 --------- Co-authored-by: Dave Liddell <dliddell@xilinx.com>	2024-01-19 16:18:16 -08:00
Scott Todd	b3a3ad4e2a	Generalize install instructions to not exclude Windows. (#2771 ) Overly specific docs can get stale easily. It looks like https://llvm.github.io/torch-mlir/package-index/ has included Windows packages since around https://github.com/llvm/torch-mlir/pull/1521. Here's an example release: https://github.com/llvm/torch-mlir/releases/tag/snapshot-20240118.1087 ``` torch-2.3.0.dev20240109+cpu-cp311-cp311-linux_x86_64.whl torch-2.3.0.dev20240109+cpu-cp311-cp311-win_amd64.whl torch-2.3.0.dev20240109+cpu-cp38-cp38-linux_x86_64.whl torch-2.3.0.dev20240109-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl torch-2.3.0.dev20240109-cp311-none-macosx_10_9_x86_64.whl torch_mlir-20240118.1087-cp311-cp311-linux_aarch64.whl torch_mlir-20240118.1087-cp311-cp311-linux_x86_64.whl torch_mlir-20240118.1087-cp311-cp311-macosx_11_0_universal2.whl torch_mlir-20240118.1087-cp311-cp311-win_amd64.whl torch_mlir-20240118.1087-cp38-cp38-linux_x86_64.whl ```	2024-01-19 15:13:32 -08:00
Xida Ren (Cedar)	18669b38cb	Create add_ops.md (#2770 )	2024-01-19 10:44:45 -08:00
Gaurav Shukla	3b85c70748	[ONNX][MLIR] Add support for onnx.gather op (#2726 ) This commit adds support for gather op in the onnx pipeline. https://github.com/nod-ai/SHARK-Turbine/issues/242 Signed-off-by: Gaurav Shukla <gaurav.shukla@amd.com>	2024-01-19 21:58:29 +05:30
John Wu	704cfdaf08	Add aten.pool_max3d support to torch-to-linalg (#2735 ) Added verification logic to the abstract_interpreter_lib_gen.py Also made some unit tests Initially, I thought we can use `linalg::pooling_ndhwc_max` to help implement this problem. However, on a 5-dimensional matrix it does the pooling on dimensions (2, 3, 4) which is not what we want. We want pooling on dimensions (3, 4, 5). To achieve this, we would need to lower our code using the `linalg` dialect. Turns out the pooling code in `linalg` looks like this. ``` func @max_pooling_ncdhw(%I: memref<?x?x?x?x?xf32>, %K: memref<3xindex>, %O: memref<?x?x?x?x?xf32>, %strides: memref<3xindex>, %dilations: memref<3xindex>) { %c0 = arith.constant 0 : index %c1 = arith.constant 1 : index %N = memref.dim %I, %c0 : memref<?x?x?x?x?xf32> %C = memref.dim %I, %c1 : memref<?x?x?x?x?xf32> %D = memref.dim %I, 2 : memref<?x?x?x?x?xf32> %H = memref.dim %I, 3 : memref<?x?x?x?x?xf32> %W = memref.dim %I, 4 : memref<?x?x?x?x?xf32> %kernel_d = memref.load %K[%c0] : memref<3xindex> %kernel_h = memref.load %K[%c1] : memref<3xindex> %kernel_w = memref.load %K[2] : memref<3xindex> %stride_d = memref.load %strides[%c0] : memref<3xindex> %stride_h = memref.load %strides[%c1] : memref<3xindex> %stride_w = memref.load %strides[2] : memref<3xindex> %dilation_d = memref.load %dilations[%c0] : memref<3xindex> %dilation_h = memref.load %dilations[%c1] : memref<3xindex> %dilation_w = memref.load %dilations[2] : memref<3xindex> linalg.generic { indexing_maps = [ affine_map<(n, c, d, h, w, kd, kh, kw) -> (n, c, d * %stride_d + kd * %dilation_d, h * %stride_h + kh * %dilation_h, w * %stride_w + kw * %dilation_w)>, // Map for input tensor affine_map<(n, c, d, h, w, kd, kh, kw) -> (kd, kh, kw)>, // Map for kernel tensor affine_map<(n, c, d, h, w, kd, kh, kw) -> (n, c, d, h, w)> // Map for output tensor ], iterator_types = ["parallel", "parallel", "parallel", "parallel", "parallel", "reduction", "reduction", "reduction"], doc = "3D Max 
Pooling NCDHW with Strides, Dilations, and Kernel Size" } ins(%I, %K : memref<?x?x?x?x?xf32>, memref<3xindex>) outs(%O : memref<?x?x?x?x?xf32>) { ^bb0(%input_elem: f32, %kernel_elem: index, %output_elem: f32): %max_val = arith.maxf %input_elem, %output_elem : f32 linalg.yield %max_val : f32 } return } ``` This was implemented based on its source code with the adjustments mentioned above: `4ca1b5e094/mlir/include/mlir/Dialect/Linalg/IR/LinalgNamedStructuredOps.yaml (L5647)` Issues related to this can be found here https://github.com/nod-ai/SHARK-Turbine/issues/324	2024-01-19 21:09:46 +05:30
Ilija Kalinić	faa4517e83	Implement lowering of torch.aten.remainder.Tensor (#2763 ) Closes nod-ai/SHARK-Turbine#349	2024-01-19 18:09:08 +05:30
Andreas Falkenberg	4de4d38b87	Initial commit of NonZero op (#2766 )	2024-01-18 15:23:13 -10:00
Rob Suderman	b5387c0f29	[onnx] Lowering `onnx.dequantize_linear` to `torch` (#2759 ) We can map the per-tensor version of the operation to the dequantize operation via marking with the make quantized tensor component. This introduces the `qint` and `quint` tensor type that can be lowered to the appropriate dequantization behavior during the torch-to-linalg conversion.	2024-01-18 16:47:21 -08:00
Rob Suderman	bd11877f6f	[onnx] Support lowering quantize linear to `torch` (#2751 ) We can map the per_tensor case to the `torch.aten.quantize_per_linear` operation. In this case we extract the `scale` and `zeropoint` values and directly invoke the quantization, then return the integer representation value.	2024-01-18 16:33:10 -08:00
Ze Zhang	77a03f2069	torch-to-tosa lowering support for AtenLinalgVectorNormOp (#2734 ) This PR adds torch-to-tosa lowering support for AtenLinalgVectorNormOp e2e test: python -m e2e_testing.main --config=tosa LIT tests: cmake --build build --target tools/torch-mlir/all --------- Co-authored-by: Ze Zhang <ze.zhang@getcruise.com>	2024-01-18 12:32:23 -08:00
Phaneesh Barwaria	eed144bfbc	[ONNX][MLIR] add Identity op support (#2754 )	2024-01-16 19:06:54 +05:30
Sungsoon Cho	a8538e1e3f	Decompose AtenNormalFunctionalOp into AtenRandn* and other arithmetic. (#2737 )	2024-01-15 22:49:29 -08:00
lonely eagle	f85e5c932b	[Torch Dialect] support aten.isneginf, aten.isposinf, aten.nan_to_num (#2743 )	2024-01-16 14:29:34 +08:00
James Newling	f78ec78ac8	Adjust bound check to be the same as PyTorch native (i.e. stricter) (#2755 ) prims.expand expects the start and end dimensions to be strictly less than the rank of the tensor.	2024-01-15 11:44:45 -08:00
kumardeepakamd	87389f0762	[ONNXToTorch] Add conversion for Onnx range (#2752 ) Implemented ONNX.Range. Per the spec, start, limit, and delta are 0-D tensors whose data type can be double, float, int16, int32, or int64. All int types are mapped to !torch.int and all float types are mapped to !torch.float --------- Co-authored-by: Kumar Deepak <kumar@xilinx.com>	2024-01-15 14:26:46 -05:00
lisaliu1	09421b1cf3	[TorchToLinalg] Add lowering for aten.replication_pad2d (#2715 ) Co-authored-by: Lisa Liu <lingl@xilinx.com>	2024-01-15 14:02:27 -05:00
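
Commit 77c14ab22b describes three test-framework knobs: a default concurrency of `nproc * 0.8 + 1`, a `TORCH_MLIR_TEST_CONCURRENCY` environment variable that bounds it, and a switch from fork to spawn. The sketch below shows how those pieces could fit together. It is not the torch-mlir implementation: the function names are hypothetical, and treating the variable as an upper bound on the default is an assumption drawn from the word "bound" in the commit message.

```python
import multiprocessing as mp
import os

# 5-minute test timeout mentioned in the commit message.
TEST_TIMEOUT_SECONDS = 5 * 60


def default_concurrency(nproc: int) -> int:
    """Default worker count from the commit message: nproc * 0.8 + 1."""
    return int(nproc * 0.8 + 1)


def resolve_concurrency(nproc: int, env=os.environ) -> int:
    """Apply TORCH_MLIR_TEST_CONCURRENCY as an upper bound (assumed semantics)."""
    workers = default_concurrency(nproc)
    override = env.get("TORCH_MLIR_TEST_CONCURRENCY")
    if override:
        # Never go below one worker, never above the computed default.
        workers = max(1, min(workers, int(override)))
    return workers


def spawn_context():
    """Return a multiprocessing context that uses spawn instead of fork."""
    return mp.get_context("spawn")


if __name__ == "__main__":
    nproc = os.cpu_count() or 1
    print("workers:", resolve_concurrency(nproc))
    print("start method:", spawn_context().get_start_method())
```

On a 32-core runner like the ones described in the commit, the default works out to 26 workers, compared with 35 under the previous `nproc * 1.1` rule.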

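The StableHLO output in commit e581b33f96 expresses `cumsum` along a dimension of size 7 as a `reduce_window` with an add reduction, window size 7, and padding `[6, 0]` on that dimension. A minimal one-dimensional sketch of why that works follows. It uses plain Python lists, and the helper name `cumsum_via_window` is hypothetical and does not appear in the commit.

```python
from itertools import accumulate


def cumsum_via_window(values):
    """Cumulative sum computed as a sliding-window sum over a left-padded input.

    For an input of length n, pad n - 1 zeros on the low side and none on the
    high side, then sum each window of size n with stride 1.
    """
    n = len(values)
    padded = [0] * (n - 1) + list(values)
    return [sum(padded[i:i + n]) for i in range(n)]


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7]  # length 7, so padding is 6 and window is 7
    # Compare against the standard-library running sum.
    assert cumsum_via_window(data) == list(accumulate(data))
    print(cumsum_via_window(data))
```

Each output position `i` sees exactly the inputs at positions `0..i`, because the remaining window slots fall on the zero padding, which is the identity for addition.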
2721 Commits (e4b11a0ab4cfcb9bc6e2665d761e181a27097184)

All Branches