torch-mlir

Commit Graph

Author	SHA1	Message	Date
Rob Suderman	25a5a22cbd	[torch] Support `torch.convolution` quantized lowering to `linalg` (#2811 ) Linalg has quantization-specific operations. We can lower to these operations when there are known zero-point and scale operations. This allows the `convolution` to occur at lower bitwidths, improving overall performance.	2024-01-30 13:46:47 -08:00
Rob Suderman	2ef228328f	[torch] `torch.dequantize` for per channel tensors to `linalg` (#2769 ) Support a lowering for dequantization for per channel tensors from `torch` dialect to a linalg decomposition. Tested via a numerical `torch` test.	2024-01-25 16:40:21 -08:00
Rob Suderman	f6f890520b	[torch][quant] Quantized `torch.mm` for linalg with end-to-end test (#2750 ) This includes custom op matching for decomposed operations and fusing dequantization into dense operations. As a validation we compare to the dequant+mm torch implementation.	2024-01-24 14:02:50 -08:00
Xida Ren (Cedar)	ccaac85788	implement aten.conv1d, aten.conv3d, and aten.conv_tbc (#2757 ) convolution with [time,batch,channel] ordering, as opposed to the default [batch, channel, time]. Currently implementing by transposing the input and output, but may need to get its own implementation in the future because this is supposed to be an op that gives a speedup. This is used by fairseq (https://github.com/facebookresearch/fairseq/issues/172). (in case you were wondering like me, this is different from transposed convolution. Transposed convolution has fractional strides). --------- Co-authored-by: Xida Ren <xida.ren.dev@gmail.com> Co-authored-by: Frederik Harwath <frederik.harwath@amd.com>	2024-01-23 21:30:03 -08:00
John Wu	704cfdaf08	Add aten.pool_max3d support to torch-to-linalg (#2735 ) Added verification logic to the abstract_interpreter_lib_gen.py. Also made some unit tests. Initially, I thought we could use `linalg::pooling_ndhwc_max` to help implement this problem. However, on a 5-dimensional matrix it does the pooling on dimensions (2, 3, 4) which is not what we want. We want pooling on dimensions (3, 4, 5). To achieve this, we would need to lower our code using the `linalg` dialect. Turns out the pooling code in `linalg` looks like this.
```
func @max_pooling_ncdhw(%I: memref<?x?x?x?x?xf32>, %K: memref<3xindex>, %O: memref<?x?x?x?x?xf32>,
                        %strides: memref<3xindex>, %dilations: memref<3xindex>) {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %N = memref.dim %I, %c0 : memref<?x?x?x?x?xf32>
  %C = memref.dim %I, %c1 : memref<?x?x?x?x?xf32>
  %D = memref.dim %I, 2 : memref<?x?x?x?x?xf32>
  %H = memref.dim %I, 3 : memref<?x?x?x?x?xf32>
  %W = memref.dim %I, 4 : memref<?x?x?x?x?xf32>
  %kernel_d = memref.load %K[%c0] : memref<3xindex>
  %kernel_h = memref.load %K[%c1] : memref<3xindex>
  %kernel_w = memref.load %K[2] : memref<3xindex>
  %stride_d = memref.load %strides[%c0] : memref<3xindex>
  %stride_h = memref.load %strides[%c1] : memref<3xindex>
  %stride_w = memref.load %strides[2] : memref<3xindex>
  %dilation_d = memref.load %dilations[%c0] : memref<3xindex>
  %dilation_h = memref.load %dilations[%c1] : memref<3xindex>
  %dilation_w = memref.load %dilations[2] : memref<3xindex>
  linalg.generic {
    indexing_maps = [
      affine_map<(n, c, d, h, w, kd, kh, kw) -> (n, c, d * %stride_d + kd * %dilation_d, h * %stride_h + kh * %dilation_h, w * %stride_w + kw * %dilation_w)>, // Map for input tensor
      affine_map<(n, c, d, h, w, kd, kh, kw) -> (kd, kh, kw)>, // Map for kernel tensor
      affine_map<(n, c, d, h, w, kd, kh, kw) -> (n, c, d, h, w)> // Map for output tensor
    ],
    iterator_types = ["parallel", "parallel", "parallel", "parallel", "parallel", "reduction", "reduction", "reduction"],
    doc = "3D Max Pooling NCDHW with Strides, Dilations, and Kernel Size"
  } ins(%I, %K : memref<?x?x?x?x?xf32>, memref<3xindex>) outs(%O : memref<?x?x?x?x?xf32>) {
  ^bb0(%input_elem: f32, %kernel_elem: index, %output_elem: f32):
    %max_val = arith.maxf %input_elem, %output_elem : f32
    linalg.yield %max_val : f32
  }
  return
}
```
This was implemented based on its source code with the adjustments mentioned above: `4ca1b5e094/mlir/include/mlir/Dialect/Linalg/IR/LinalgNamedStructuredOps.yaml (L5647)` Issues related to this can be found here https://github.com/nod-ai/SHARK-Turbine/issues/324	2024-01-19 21:09:46 +05:30
Ze Zhang	77a03f2069	torch-to-tosa lowering support for AtenLinalgVectorNormOp (#2734 ) This PR adds torch-to-tosa lowering support for AtenLinalgVectorNormOp e2e test: python -m e2e_testing.main --config=tosa LIT tests: cmake --build build --target tools/torch-mlir/all --------- Co-authored-by: Ze Zhang <ze.zhang@getcruise.com>	2024-01-18 12:32:23 -08:00
Sungsoon Cho	a8538e1e3f	Decompose AtenNormalFunctionalOp into AtenRandn* and other arithmetic. (#2737 )	2024-01-15 22:49:29 -08:00
lonely eagle	f85e5c932b	[Torch Dialect] support aten.isneginf, aten.isposinf, aten.nan_to_num (#2743 )	2024-01-16 14:29:34 +08:00
Rob Suderman	dc37616d67	[torch][quant] Support quantize and dequantize for torch (#2731 ) Handle both `torch.dequantize` and `torch.quantize_per_tensor` including the op based quantization parameter tracking. This includes adding `qint32` to torch types as it was missing during the initial type inclusion. For testing we only have `torch.int8` and `torch.float` types on function boundaries as the `qint8` types require passing the scale and zero point quantization information which is not supported yet.	2024-01-12 19:11:14 -08:00
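For readers unfamiliar with the per-tensor scheme that `torch.quantize_per_tensor` and `torch.dequantize` refer to, here is a minimal sketch in plain Python of the usual affine mapping between floats and integers. The helper names and the signed 8-bit default range are illustrative assumptions and are not taken from the torch-mlir sources.

```python
def quantize_per_tensor(values, scale, zero_point, qmin=-128, qmax=127):
    """Affine quantization: q = clamp(round(x / scale) + zero_point, qmin, qmax).

    Hypothetical helper; qmin/qmax default to a signed 8-bit range (assumption).
    """
    quantized = []
    for x in values:
        q = round(x / scale) + zero_point  # Python rounds halves to even
        quantized.append(max(qmin, min(qmax, q)))
    return quantized


def dequantize(qvalues, scale, zero_point):
    """Inverse mapping: x is approximately (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in qvalues]


q = quantize_per_tensor([0.0, 0.5, -1.0, 100.0], scale=0.5, zero_point=10)
x = dequantize(q, scale=0.5, zero_point=10)
```

Values that fall outside the representable range, such as 100.0 here, are clamped, so dequantizing them does not recover the original float.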
Ze Zhang	670a99ae19	Handle torch.none type in tosa.clamp op (#2739 ) This PR updates the torch-to-tosa conversion with the following changes: - Support torch.none as min/max input argument for tosa.clamp op - Support negative value as start index for tosa.slice op - Add tosa.logical_or lowering support e2e test: python -m e2e_testing.main --config=tosa LIT tests: cmake --build build --target tools/torch-mlir/all --------- Co-authored-by: Ze Zhang <ze.zhang@getcruise.com>	2024-01-11 10:36:48 -08:00
Ilija Kalinić	e1a86e480a	Implement lowering of torch.aten.logit (#2697 ) Closes nod-ai/SHARK-Turbine#290	2024-01-11 20:25:42 +05:30
zjgarvey	07d0645f64	[RFC] general support for Adaptive Pooling Ops (#2661 ) Adaptive pooling ops can only be decomposed into their non-adaptive counterparts in trivial cases. For example, the current decomposition for AtenAdaptiveAvgPool1dOp in DecomposeComplexOps.cpp supports outSize = inSize (i.e., do literally nothing), and outSize = 1 (i.e., do a batched average). The reason adaptive pooling ops are difficult to lower to linalg is that they are not constantly strided. They are computed by taking an input tensor of shape (N, C, Hin), and an output size Hout, and computing the output tensor at position (n, c, h) in the following way: 1. compute st(h) = (h*Hin)//Hout 2. compute en(h) = 1 + ((h+1)*Hin - 1)//Hout 3. apply a computation (max or avg) to the slice: INPUT[n, c, st(h):en(h)] The provided sample implementation (for ConvertAtenAdaptiveAvgPool1dOp) uses tensor.extract to access the input tensor inside the payload of a linalg generic op. This is likely an unattractive use of linalg generic ops, which is why I am asking for some more targeted feedback on the validity of this approach before attempting to support the many other adaptive pooling ops. Specifically: - Is the performance of this implementation bad enough to warrant targeting different dialects entirely? e.g. TMtensor/linalg ext/ etc. - If the provided implementation is of acceptable performance to the community, then is it permissible to remove the Adaptive pooling decompositions from DecomposeComplexOps.cpp? Based on the current structure of the -torch-decompose-complex-ops pass, it does not seem possible to only decompose the adaptive ops in special cases (it seems to get stuck in an infinite loop on a match failure). I would be happy to instead incorporate the case logic into the conversion directly, and remove the decompositions once they are rendered completely obsolete. As long as this approach is acceptable, I can clean up the implementation with some helper functions, and quickly add support for each of the remaining Adaptive pooling ops.	2024-01-09 11:14:10 -08:00
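The window computation described in the adaptive pooling RFC can be sketched in plain Python for the 1-D average case. This is only an illustration of the start/end arithmetic, written with explicit multiplication as st(h) = (h * Hin) // Hout and en(h) = 1 + ((h + 1) * Hin - 1) // Hout; the function names are hypothetical and this is not the linalg lowering itself.

```python
def adaptive_window(h, in_size, out_size):
    """Return the half-open input range [start, end) for output position h."""
    start = (h * in_size) // out_size
    end = 1 + ((h + 1) * in_size - 1) // out_size
    return start, end


def adaptive_avg_pool1d(row, out_size):
    """Average each variable-width window of a single 1-D row."""
    in_size = len(row)
    out = []
    for h in range(out_size):
        start, end = adaptive_window(h, in_size, out_size)
        window = row[start:end]
        out.append(sum(window) / len(window))
    return out


windows = [adaptive_window(h, 5, 3) for h in range(3)]
pooled = adaptive_avg_pool1d([1, 2, 3, 4, 5], 3)
```

With Hin = 5 and Hout = 3 the windows overlap and start at offsets 0, 1, and 3, which shows why a single constant stride cannot describe them.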
Vivek Khandelwal	690827fe52	build: manually update PyTorch version Set PyTorch and TorchVision version to nightly release 2024-01-02. Signed-Off By: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>	2024-01-03 11:47:12 +05:30
Xida Ren (Cedar)	6660a26594	lower torch.aten.isinf to linalg (#2638 ) Co-authored-by: Rob Suderman <rob.suderman@gmail.com>	2023-12-28 17:20:32 -08:00
Sungsoon Cho	8e389ff2ff	Implement lowering of torch.aten.exponential (#2680 ) https://github.com/llvm/torch-mlir/issues/2646 Decompose aten.exponential() into: -log(1-x)/lambda, where x is drawn uniformly from [0, 1)	2023-12-27 20:33:18 -08:00
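As background for the exponential decomposition, here is a minimal sketch of inverse transform sampling, the standard way to turn uniform samples into exponential ones. The exponential CDF is F(t) = 1 - exp(-lambda * t), so solving F(t) = u gives t = -ln(1 - u) / lambda. The function name and the use of Python's `random` module are illustrative assumptions, not the torch-mlir implementation.

```python
import math
import random


def sample_exponential(lambd, rng):
    """Draw one exponential sample with rate lambd via inverse transform sampling."""
    u = rng.random()  # uniform in [0, 1), so 1 - u is in (0, 1] and log is defined
    return -math.log(1.0 - u) / lambd


rng = random.Random(0)  # fixed seed so the sketch is reproducible
samples = [sample_exponential(2.0, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)  # should be close to 1 / lambd = 0.5
```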
JianzheXiao	6ddeb1a6ef	[torch] Add support for aten.selu (#2640 ) Add `aten.selu` operation to `torch` dialect.	2023-12-13 20:28:08 -08:00
JianzheXiao	7cf52ae73f	[Torch Dialect]Add Support for AtenGroupNormOp and AtenNativeGroupNormOp (#2591 ) Co-authored-by: LiuYuanqiang <liuyuanqiang.yqliu@bytedance.com>	2023-12-13 11:05:12 +08:00
Eric Kunze	f67249d34f	Sort the TOSA passing test list (#2630 ) For easier tracking of issues, sort the TOSA passing list. It is still significantly smaller than the XFAIL list would be. Resolves #2620, at least until the xfail list gets smaller than the passing list. Signed-off-by: Eric Kunze <eric.kunze@arm.com>	2023-12-12 14:22:25 -08:00
JianzheXiao	96fcde4d77	[Torch Dialect] Support Einsum Op (#2230 ) As title, support torch.aten.einsum op. Only static shapes are supported right now because of a known issue; the fix is here: https://github.com/llvm/torch-mlir/pull/2154 Co-authored-by: Jiawei Wu <wujiawei.aml@bytedance.com>	2023-12-10 12:30:37 +08:00
Felix Schneider	fb21a85874	[TorchToLinalg] Lower grouped conv2d to linalg Op with correct dimension ordering (#2623 ) The linalg Op `linalg.conv_2d_ngchw_fgchw` had a bug where 1. Weights were accessed as G,F,C,H,W instead of as F,G,C,H,W 2. Output was accessed as N,F,G,H,W instead of as N,G,F,H,W Now this has been fixed in https://github.com/llvm/llvm-project/pull/73855 which broke the torch-mlir lowering to that Op. This patch switches lowering in torch-mlir to the newly introduced `linalg.conv_2d_ngchw_gfchw` op which accesses weights in an order that is compatible with PyTorch's memory layout. Fix https://github.com/llvm/torch-mlir/issues/2622	2023-12-08 14:18:23 +01:00
Stella Laurenzo	8252656b6d	Advance llvm-project and stablehlo. (#2619 ) llvm-project: bbd2b08b95fe76bea138c1b03c1cd42ed3ee04df stablehlo: ab709fe48de88c67717abfbd7ef17425eb95ddaf These commits were chosen in order to account for an MLIR API break from `3dbac2c007` which required a patch to stablehlo. We integrate a bit beyond that commit to deal with some revert/reapply cycles in the intervening range which were discovered in another downstream. Further, it requires adaptation to the stablehlo API breaks introduced from https://github.com/openxla/stablehlo/pull/1872 which are along for the ride. Since some stablehlo builders were changed to directly take int64_t array refs, also traced that up some call stacks to eliminate some signed/unsigned mismatches that result. Also adds a few TOSA tests to the passing set that seem to work now.	2023-12-07 23:13:42 -08:00
Frederik Harwath	6248216dca	Add aten.min.dim to linalg lowering (#2600 )	2023-12-05 07:16:35 -08:00
Vivek Khandelwal	10b5432e7d	build: manually update PyTorch version Set PyTorch and TorchVision version to nightly release 2023-12-04. Signed-Off By: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>	2023-12-05 13:18:47 +05:30
James Newling	1b7d6f2af9	Improve decomposition of pixel_shuffle (support dynamic shapes) (#2590 ) The aten.reshape ops in the decomposition are replaced with prims.collapse and prims.split_dim ops, which avoids the cases where lowering reshape from torch to linalg is not supported. Essentially, by using the collapse and split_dim ops instead of the reshape ops, we are not "losing" the information that the reshapes do not arbitrarily mix dimensions, which makes lowering easy. 3 additional tests added: - fully dynamic, - dynamic only in the spatial dimensions, - dynamic only in the non-spatial dimensions.	2023-11-22 12:31:06 -08:00
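To illustrate why split and collapse steps carry more information than a generic reshape, the sketch below tracks only the shapes through a pixel_shuffle-style decomposition: [N, C*r*r, H, W] is split to [N, C, r, r, H, W], permuted to [N, C, H, r, W, r], and collapsed to [N, C, H*r, W*r]. The helpers are hypothetical shape-only stand-ins for prims.split_dim and prims.collapse; treating `end` as inclusive and using an upscale factor of 3 on the [3, 18, 2, 2] test shape are assumptions made for the example.

```python
def split_dim(shape, dim, outer):
    """Split shape[dim] into (outer, shape[dim] // outer); no other dim is touched."""
    assert shape[dim] % outer == 0
    return shape[:dim] + [outer, shape[dim] // outer] + shape[dim + 1:]


def collapse(shape, start, end):
    """Merge the adjacent dims start..end (inclusive, by assumption) into one."""
    merged = 1
    for size in shape[start:end + 1]:
        merged *= size
    return shape[:start] + [merged] + shape[end + 1:]


def permute(shape, order):
    return [shape[i] for i in order]


def pixel_shuffle_shape(shape, r):
    """Shape-level walk through the decomposition for upscale factor r."""
    channels = shape[1] // (r * r)
    s = split_dim(shape, 1, channels)   # [N, C, r*r, H, W]
    s = split_dim(s, 2, r)              # [N, C, r, r, H, W]
    s = permute(s, [0, 1, 4, 2, 5, 3])  # [N, C, H, r, W, r]
    s = collapse(s, 2, 3)               # [N, C, H*r, W, r]
    s = collapse(s, 3, 4)               # [N, C, H*r, W*r]
    return s


result = pixel_shuffle_shape([3, 18, 2, 2], 3)
```

Each step names exactly which dimensions are split or merged, which is the information a plain reshape to the final shape would not record.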
James Newling	03e8f99730	Lowering to linalg of prims split_dim op (#2576 ) Adds support for lowering the prims split_dim op to linalg. Similar design to collapse op lowering in https://github.com/llvm/torch-mlir/pull/2572, with some small differences, because the split_dim op (in pytorch) is view-changing whereas the collapse is not. The difference means that 1) it must be registered in the function Torch::isViewLikeOp 2) it must be added to the "expected fail" set for the torch dynamo backend.	2023-11-21 07:56:09 -08:00
Yuanqiang Liu	facbe5d96b	[Torch Dialect] support AtenArangeStartOutOp in ReduceOpVariants like… (#2563 ) … AtenBernoulli_FloatOp This fixes cases like: `%2110 = torch.aten.arange.start_out %int1, %int1517, %int1, %2109 : !torch.int, !torch.int, !torch.int, !torch.tensor -> !torch.tensor`. `aten.arange.start_out` does not have value semantics either, which means `%2110` is an alias for `%2109`. So I decompose it to `aten.arange.start` + `torch.contents.overwrite`. The complex decomposition logic is meant to handle cases like view and dtype cast, which I added in e2e tests.	2023-11-17 00:51:55 +08:00
James Newling	e81282ae8f	Support for prims collapse op (lowering to linalg) (#2572 ) Steps taken: 1) add generator code to torch_ods_gen.py, run update_torch_ods.sh 2) add (custom) shape and type inference generator code to abstract_interp_lib_gen.py, run update_abstract_interp_lib.sh 3) Implement lowering to tensor.collapse_dims. Requires the `start` and `end` values to be constant, else lowering fails 4) Update xfail_sets.py (append to LTC_XFAIL_SET) after running /tools/e2e_test.sh --filter Collapse --verbose -c XX for all supported backends (XX). Motivation: - Supporting the collapse operation will be useful for lowering of pixel_shuffle (see Issue #2559)	2023-11-15 08:34:38 -08:00
Yuanqiang Liu	3ab790c50a	[Torch Dialect] add canonicalize for aten.numel (#2562 )	2023-11-11 12:16:53 +08:00
James Newling	b6e551c7b8	Decomposition of aten.pixel_shuffle with static input shape (#2550 ) For static tests (that is, when the shape is known), for example: ``` @annotate_args([None, ([3, 18, 2, 2], torch.float32, True)]) ``` The e2e passes. But only if the replacement op's return type is set as undefined (optional shape and type must be explicitly made unset), otherwise there's an error about the function return type. For dynamic cases, for example if the above is replaced with ``` @annotate_args([None, ([-1, -1, -1, -1], torch.float32, True)]) ``` There is a failure to lower to linalg from torch ("view op explicitly labelled as illegal"). This seems to be because the support for lowering from torch to linalg with dynamic shapes is limited.	2023-11-08 08:52:44 -05:00
JianzheXiao	a42d4c18ff	[Torch Dialect]Support aten.cosine_similarity (#2364 ) As title, add support for aten.cosine_similarity, support broadcast inputA/inputB to the same shape	2023-11-08 15:28:30 +08:00
Jiawei Wu	d5ee8ee73a	[Torch Dialect] emit aten.reshape_as op and add decomposition pattern. (#2553 )	2023-11-05 11:38:36 +08:00
Yuanqiang Liu	0378da0abd	[Torch Dialect] support aten.isinf (#2544 ) Also fix linalg lowering from `UEQ` to `OEQ`. I will check other comparisons' lowering later.	2023-11-04 22:26:01 +08:00
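The difference between the `UEQ` and `OEQ` floating-point predicates matters for NaN inputs. An unordered comparison is true whenever either operand is NaN, while an ordered comparison is false in that case. The sketch below models both predicates in plain Python and assumes, purely for illustration, that isinf is computed as abs(x) == inf; the function names are hypothetical.

```python
import math


def oeq(a, b):
    """Ordered equal: false whenever either operand is NaN."""
    return not (math.isnan(a) or math.isnan(b)) and a == b


def ueq(a, b):
    """Unordered equal: true when either operand is NaN, or when a == b."""
    return math.isnan(a) or math.isnan(b) or a == b


def isinf_via(compare, x):
    """Assumed formulation of isinf: compare abs(x) against +inf."""
    return compare(abs(x), math.inf)


nan_with_ueq = isinf_via(ueq, math.nan)  # NaN is wrongly reported as infinite
nan_with_oeq = isinf_via(oeq, math.nan)  # NaN is correctly reported as not infinite
```

Both predicates agree on ordinary and infinite inputs; they differ only when a NaN is involved.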
Stella Laurenzo	6961f0a247	Re-organize project structure to separate PyTorch dependencies from core project. (#2542 ) This is a first step towards the structure we discussed here: https://gist.github.com/stellaraccident/931b068aaf7fa56f34069426740ebf20 There are two primary goals: 1. Separate the core project (C++ dialects and conversions) from the hard PyTorch dependencies. We move all such things into projects/pt1 as a starting point since they are presently entangled with PT1-era APIs. Additional work can be done to disentangle components from that (specifically LTC is identified as likely ultimately living in a `projects/ltc`). 2. Create space for native PyTorch2 Dynamo-based infra to be upstreamed without needing to co-exist with the original TorchScript path. Very little changes in this path with respect to build layering or options. These can be updated in a followup without commingling directory structure changes. This also takes steps toward a couple of other layering enhancements: * Removes the llvm-external-projects/torch-mlir-dialects sub-project, collapsing it into the main tree. * Audits and fixes up the core C++ build to account for issues found while moving things. This is just an opportunistic pass through but roughly ~halves the number of build actions for the project from the high 4000's to the low 2000's. It deviates from the discussed plan by having a `projects/` tree instead of `compat/`. As I was thinking about it, this will better accommodate the follow-on code movement. Once things are roughly in place and the CI passing, followups will focus on more in-situ fixes and cleanups.	2023-11-02 19:45:55 -07:00


183 Commits (99511cef82997fd9faf6c5d2be4b932a37ec0f96)