torch-mlir

Commit Graph

Author	SHA1	Message	Date
zjgarvey	c26ca8b94d	Fix a bug for large models in onnx importer. (#3875 ) The `onnx.load_external_data_for_model` function does not accept `pathlib.Path` as an input.	2024-11-14 23:20:11 -06:00
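A minimal sketch of the kind of workaround this fix implies, assuming the caller holds the model directory as a `pathlib.Path`: convert it to a plain string before passing it on. The helper name `as_base_dir` is hypothetical, and the ONNX call is shown only as a comment so that the snippet runs without the `onnx` package.

```python
import os
import pathlib


def as_base_dir(path):
    """Return a plain `str` for a directory given as `str` or `pathlib.Path`."""
    return os.fspath(path)


model_dir = pathlib.Path("models") / "large_model"
base_dir = as_base_dir(model_dir)

# Assumed usage (needs the `onnx` package and a loaded model; not executed here):
#   onnx.load_external_data_for_model(model, base_dir)
print(type(base_dir).__name__)
```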
Vivek Khandelwal	0eba539ef7	Bump LLVM to 813f7c3 (#3873 ) This commit bumps the llvm-project to `813f7c3820`. This commit also updates the usage of `APInt` in `unpack-quant-tensor` pass by explicitly setting the `implicitTrunc` arg to be `True` whose default value was changed from True to False here `3494ee9590`. Signed-off-by: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>	2024-11-15 10:36:55 +05:30
Vivek Khandelwal	fe2f64919d	[ONNX] Remove kernel shape and weight shape equivalence check from Onnx.Conv lowering (#3869 ) This commit removes the equivalence check for kernel shape and weight shape from the Onnx.conv lowering since those checks seem to be of no use (not sure why they were part of the lowering in the first place). Signed-Off By: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>	2024-11-15 10:36:41 +05:30
giacs-epic	06d17897f0	[Torch Dialect] Allow simplification of shape calculations of aten.tile, col2im, aten.stft (#3785 ) - Add `aten.mul.left_t` (+ canonicalizer) to allow simplification of aten.tile. - Change syntax of the computation of col2im shape to allow the use of an already existing canonicalization pattern (for `aten.add.t`) for its simplification. - Add `aten.eq.bool` ( + folder) to allow simplification of aten.stft.	2024-11-14 15:14:39 -06:00
zjgarvey	1201babb9f	[ONNX] rework some reduction op lowerings (#3870 ) - Refactors more "onnx.ReduceXXX" patterns through helper function. - Fixes bug with iterating unconditionally on `output_dim == 1` during `dimList` inference. This change results in passes for the following 11 models: crossvit_15_240 crossvit_15_dagger_240 crossvit_15_dagger_408 crossvit_18_240 crossvit_18_dagger_240 crossvit_18_dagger_408 crossvit_9_240 crossvit_9_dagger_240 crossvit_base_240 crossvit_small_240 crossvit_tiny_240 --------- Co-authored-by: Vinayak Dev <104419489+vinayakdsci@users.noreply.github.com>	2024-11-14 16:25:28 +00:00
Hanumanth	30c519369e	Support default padding case for tosa::AvgPool in the presence of count_include_pad (#3868 ) Essentially, as part of my earlier [change](`7f9f99c6f8`) , I didn't consider the `padding` value while erroring out for unsupported `count_include_pad` during `torch-to-tosa` lowering for AvgPool2d. The fix captured in this change addresses this. Please see [issue](https://github.com/llvm/torch-mlir/issues/3862) for more details on this. Co-authored-by: Hanumanth Hanumantharayappa <hhanuman@ah-hhanuman-l.dhcp.mathworks.com>	2024-11-12 13:48:20 -08:00
zjgarvey	cd38ecf6c2	Add Scalarization Patterns for `AtenToDtypeOp`, `AtenNegOp`, `AtenRemainderTensorOp` (#3861 ) 1. adds a lowering for `aten.neg.int` and `aten.remainder.int` to arith. 2. adds a scalarization pattern for `aten.neg` and `aten.remainder.Tensor` ops. 3. improves folding of `aten.mul.int` 4. adds a scalarization pattern for `aten.to.dtype` which relies on scalar cast ops and basic C++ casting between `double` and `int64_t`. 5. improves rank-0 case handling for `FoldAtenSplatPattern` 6. removes a bug with `aten.unflatten.int` decomposition incorrectly generating a constant size int from a dynamic shape. 7. simplifies the dim list for `aten.unflatten.int` ops generated from the `aten.view` canonicalization in scalarize shapes. All of these changes were necessary to unblock <https://github.com/iree-org/iree/issues/18899>.	2024-11-12 14:25:02 -06:00
aldesilv	889a836b3d	OnnxToTorch bicubic interpolation (#3802 ) (https://github.com/nod-ai/SHARK-TestSuite/pull/391) Repro (using SHARK TestSuite): 1. `python run.py --torchtolinalg -m cl-onnx-iree -t cubic_test` --------- Co-authored-by: zjgarvey <zjgarvey@gmail.com>	2024-11-12 12:54:29 -06:00
Vivek Khandelwal	17c1985c4d	build: manually update PyTorch version (#3863 ) This commit sets the PyTorch and TorchVision version to nightly release 2024-11-07. This commit also updates the dtype check for the `aten.fake_quantize_per_tensor_affine` and `aten.fake_quantize_per_tensor_affine_cachemask` op since the op now supports bfloat16 input. Signed-Off By: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>	2024-11-11 21:26:56 +05:30
Justin Ngo	8eb34dae78	[TOSA] Add promote type to unary ops and aten.cat lowering (#3860 ) Change-Id: I2699bf9007723fe629edb1c524c10ef8142e0234 Signed-off-by: Justin Ngo <justin.ngo@arm.com>	2024-11-08 11:23:39 -08:00
Justin Ngo	b6f04fa32b	[TOSA] Fix rsub; add clamp.Tensor, avg_pool1d, max_pool1d, prims.collapse (#3855 ) - Fix aten.rsub.Scalar legalization with appropriate type casting - Add legalization for aten.clamp.Tensor - Resolve some unexpected test failures from PyTorch update by adding legalization for the following ops: + aten.avg_pool1d + aten.max_pool1d + torch.prims.collapse - Update xfail_sets with new e2e results - Add new LIT tests to basic.mlir Change-Id: I9762c7d36ca0b0f75ca68d0c71d7f5d5309a96ad --------- Signed-off-by: Justin Ngo <justin.ngo@arm.com>	2024-11-07 14:09:43 -08:00
zjgarvey	8519ecc4d7	Generalize `aten.view` pattern in scalarize shapes (#3856 ) Extends the existing pattern to allow finding matching dims from the back as well as the front.	2024-11-07 15:26:07 -06:00
yyp0	7058f456b8	[Stablehlo] support aten.isfinite (#3850 )	2024-11-07 16:52:39 +08:00
yyp0	dda65b196d	[Torch] support float_power and threshold ops (#3854 )	2024-11-07 16:27:51 +08:00
yyp0	2f33f31724	[Torch] support AtenNllLossForwardOp decomposition (#3833 )	2024-11-06 11:34:48 +08:00
Yuanqiang Liu	70e089802a	[Torch] emit and lower frac, signbit, ldexp, copysign ops (#3851 ) Also fixes `aten.exp2` with integer type.	2024-11-06 10:21:37 +08:00
Ian Wood	e88faf08ff	Create scatter op with unique indices (#3853 ) For the op `index_put_`, if accumulate == false, the behavior is undefined if the indices aren't unique (https://pytorch.org/docs/stable/generated/torch.Tensor.index_put_.html). So, when converting `AtenIndexPutHackedTwinOp` to a TMTensor scatter op, mark the indices as unique when `accumulate == false`. This should have no functional effect (unless users are relying on UB), and assuming unique indices has the benefit of unlocking better optimizations in further compiler stages. Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>	2024-11-05 12:48:34 -08:00
Jiawei Wu	b75d0e3f8b	[stablehlo] fix: enhance torch's index-like op lowering to stablehlo's gather/scatter (#3829 ) In torch.index_put like ops, `values` is only required to be broadcastable to `input[indices]`, rather than an exact dimension match. This patch fixes the problem by adding an additional stablehlo.dynamic_broadcast_in_dim before creating the stablehlo.scatter op. This patch also enhances the `getBroadcastResultShape` utility in the hlo namespace.	2024-11-05 19:15:11 +08:00
Justin Ngo	4c1518d365	[TOSA] Add legalization for aten.as_strided (#3848 ) - Add Torch to TOSA legalization for aten.as_strided op - Update xfail_sets with the following: + New aten.as_strided results + Changes from this commit: `7f9f99c6f8` + Failed tests from new PyTorch version update - Add new LIT test to basic.mlir Change-Id: I6f471ea116ca47f2bf9537b62950fce75a2c624f Signed-off-by: Justin Ngo <justin.ngo@arm.com>	2024-11-04 09:57:59 -08:00
jinchen	6aa46967b6	Add tosa::getConstTensor with int8_t template (#3845 ) Add tosa::getConstTensor with int8_t template used in https://github.com/llvm/torch-mlir/pull/3827	2024-11-01 21:22:27 +00:00
zjgarvey	3104b66560	Fix Slice Folder OOB Crash and onnx.Shape lowering (#3843 ) 1. Clamps OOB start index to 0 in slice folder 2. Adds a more descriptive `emitError` in slice folder if the creation of the `DenseElementsAttr` would fail due to a bad result shape. 3. Fixes the `onnx.Shape` lowering to default to `inputRank` for `end` instead of `-1`. When `end==-1` the last element was missing when slicing.	2024-11-01 15:33:21 -05:00
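The third point can be illustrated with ordinary Python list slicing, used here only as a stand-in for the slice the lowering generates: an `end` of `-1` stops one element short, while an `end` equal to the rank keeps the whole shape. The function name `shape_slice` is hypothetical.

```python
def shape_slice(shape, start=0, end=None):
    """Slice a shape list; a missing `end` defaults to the input rank."""
    rank = len(shape)
    if end is None:
        end = rank  # default to inputRank rather than -1
    return shape[start:end]


shape = [2, 3, 4]
with_minus_one = shape[0:-1]     # the last element is dropped
with_rank = shape_slice(shape)   # the full shape is kept
print(with_minus_one, with_rank)
```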
zjgarvey	738d45d3bb	add scalarization patterns to support dynamic pytorch pad exports (#3838 ) 1. Adds case handling for `aten.slice.tensor` shape inference with negative strides. This is not technically allowed by native pytorch, but it is useful for ONNX ingest. We were getting some incorrect shapes for these negative strided slice ops. 2. Adds scalarization support for ops seen in pytorch pad exports to ONNX. These are typically `aten.view` `aten.transpose.int` and `aten.slice.Tensor` with negative strides (and rank 2). 3. Allows view op `self` to be added to the worklist conditionally, based on whether the view op actually occurs as a middle point in a shape computation.	2024-11-01 14:56:48 -05:00
jinchen	39d69db5ca	Cast static/dynamic shape for onnx.If branches to match result type (#3828 )	2024-11-01 12:10:59 -07:00
zjgarvey	3cfb7c8df6	Add an info cast to `prims.squeeze` decomposition (#3844 ) The onnx ingest sometimes has poorly propagated shape information. E.g.: ```mlir ... %9020 = torch.prims.squeeze %9010#1, %9019 : !torch.vtensor<[?,384,1],f32>, !torch.list<int> -> !torch.vtensor<[1,384],f32> return %9015, %9020 : !torch.vtensor<[1,384],f32>, !torch.vtensor<[1,384],f32> } } ``` This occurred at the boundary of the onnx model `migraphx_bert__bert-large-uncased`. Evidently, the output value tensor info had more information than could be propagated forward. The `PrimsSqueeze` lowering was returning a `!torch.vtensor<[?,384],f32>` which was causing a type mismatch with the `func.return`.	2024-11-01 12:10:47 -05:00
zjgarvey	a82ba1c422	[TorchToArith] add lowerings for some scalar bool binary ops (#3823 ) Added lit tests since these scalar operations don't trace well through the `fx_importer` route. `XOR` and `NE` are equivalent binary operators, so `aten.ne.bool` is lowered to `arith.xori`.	2024-11-01 10:40:20 -05:00
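The equivalence of `XOR` and `NE` on booleans can be checked exhaustively over the four possible input pairs. This is a plain-Python sanity check, not the lowering itself, and the function name is hypothetical.

```python
from itertools import product


def ne_matches_xor():
    """Return True if `a != b` equals `a ^ b` for every pair of booleans."""
    return all((a != b) == (a ^ b) for a, b in product([False, True], repeat=2))


print(ne_matches_xor())
```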
Xinyu Yang	3dbeda9082	[Stablehlo] fix template typo (#3842 ) I think we should use template parameters. @yyp0 @qingyunqu	2024-11-01 21:10:38 +08:00
Hanumanth	7f9f99c6f8	Fix torchToTosa lowering for avgpool2d to handle unsupported parameters (#3822 ) The existing TorchToTosa lowering logic for `torch.aten.avg_pool2d` doesn't handle some unsupported properties well, leading to a silent wrong answer(SWA) when we go through `torch-backend-to-tosa-backend-pipeline.` For instance, with the existing TOSA avgpool2d specification, we can not represent `count_include_pad` and `divisor_override,` so during TorchToTosa lowering, we silently ignore these properties which leads to SWA in some cases—the fix captured in this change errors for unsupported scenarios. For details on `count_include_pad` and `divisor_override,` please see the below link. https://pytorch.org/docs/stable/generated/torch.nn.AvgPool2d.html --------- Co-authored-by: Hanumanth Hanumantharayappa <hhanuman@ah-hhanuman-l.dhcp.mathworks.com>	2024-11-01 08:25:59 -04:00
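To see why silently ignoring `count_include_pad` and `divisor_override` changes results, here is a simplified one-dimensional average pool in plain Python. It is a sketch under stated assumptions (zero padding, no `ceil_mode`, hypothetical function name `avg_pool1d`), not the PyTorch or TOSA implementation.

```python
def avg_pool1d(values, kernel, stride, pad,
               count_include_pad=True, divisor_override=None):
    """Average-pool a 1-D list with zero padding on both sides."""
    padded = [0.0] * pad + list(values) + [0.0] * pad
    n = len(values)
    out = []
    for start in range(0, len(padded) - kernel + 1, stride):
        window_sum = sum(padded[start:start + kernel])
        if divisor_override is not None:
            divisor = divisor_override
        elif count_include_pad:
            divisor = kernel
        else:
            # Count only the positions that fall inside the original input.
            divisor = sum(1 for i in range(start, start + kernel)
                          if pad <= i < pad + n)
        out.append(window_sum / divisor)
    return out


data = [2.0, 4.0]
print(avg_pool1d(data, kernel=2, stride=2, pad=1))
print(avg_pool1d(data, kernel=2, stride=2, pad=1, count_include_pad=False))
print(avg_pool1d(data, kernel=2, stride=2, pad=1, divisor_override=4))
```

The three calls give different answers for the same input, which is the kind of mismatch an unconditional lowering would hide.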
jinchen	032a636c35	Fix onnx.If lowering with scalar condition tensor (#3846 ) Fixes https://github.com/nod-ai/SHARK-ModelDev/issues/696#issuecomment-2442016530	2024-10-31 20:34:50 -07:00
Rob Suderman	25738b8c19	[linalg] Broadcast batch for mask on sdpa lowering (#3824 ) Attention often broadcasts a mask across the batch dimension as masking is usually performed the same across attention heads. Added this materialization to the mask dimensions optionally.	2024-10-31 17:59:24 -07:00
Rob Suderman	5aa323dd29	[linalg] Fix torch.aten.add of `torch.bool` (#3820 ) Addition of bools saturates, which equates to an `or` operator. Updated to avoid some downstream failures that were noticed.	2024-10-31 17:37:25 -07:00
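A small plain-Python check of the claim that saturating boolean addition behaves like `or`. The helper `saturating_bool_add` is hypothetical and models a 1-bit value that clamps at 1 instead of wrapping around.

```python
from itertools import product


def saturating_bool_add(a, b):
    """Add two booleans as integers and clamp the result to 1."""
    return min(int(a) + int(b), 1) == 1


matches_or = all(saturating_bool_add(a, b) == (a or b)
                 for a, b in product([False, True], repeat=2))
print(matches_or)
```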
Stephen Baione	9c1e3b8154	support `aten._trilinear` and improve `einsum` decomposition (#3784 ) # Tracking [Issue](https://github.com/nod-ai/SHARK-ModelDev/issues/848) [TorchToLinalg Op Support](https://github.com/nod-ai/SHARK-ModelDev/issues/347) # Description Aten_TrilinearOp is an implementation of a "trilinear Einstein sum". Essentially, just an einsum across 3 tensors. There are a few inputs: ## Tensor Inputs - i1, i2, i3 - The three input tensors for the _trilinear op. ## Expands These inputs allow you to unsqueeze an input tensor at the specified dims as a pre-processing step to make the shapes compatible for the rest of the op: - expand1: List[int], expand2: List[int], expand3: List[int] ## sumdim - sumdim: List[int] - After applying element-wise multiplication, the values in sumdim denote where to collapse a dimension by summing over it ## unroll_dim - unroll_dim: int - In the PyTorch implementation, this specifies a dimension where you could slice the input tensors, multiply and sum them, then concatenate the results in an output tensor. This complicates the implementation significantly, but doesn't change the result, so I opted against it. Along with that, a previously accepted path for solving this involved reusing the AtenEinsumOp, which would also ignore this input. # Solution After trying a bunch of more complicated approaches for it, this op actually ended up being quite simple: [See _trilinear](https://dev-discuss.pytorch.org/t/defining-the-core-aten-opset/1464) `_trilinear = (i1.unsqueeze(expand1) * i2.unsqueeze(expand2) * i3.unsqueeze(expand3)).sum(sumdim)` Wish I saw this earlier, but watcha gonna do: 🙃 ## Not Reusing AtenEinsumOp Frankly, I found multiple cases where valid inputs would have numerical mismatches for EinsumOp, even when running tests against EinsumOp directly. I think it has something to do with the singleton dimensions. 
Will need to look into this further, but once I realized the simplified approach, it appeared to be more reliable and much simpler. Either way (credit to @zjgarvey), there are improvements to the einsum op here. When I was originally trying to use the op, intermediate tensors were being flattened properly, but then its 0th dimension was being cast from a static dim to a dynamic dim due to integers not folding correctly in the MLIR. Figured it's worth keeping these improvements for future reusers of EinsumOp. # The zero'd out dim "bug" For some reason, if you specify a dimension in all `expands`, ```i.e. [expand1=[0], expand2=[0], expand3=[0]], [expand1=[1], expand2=[1], expand3=[1]] ``` The _trilinear op would specify `0` for that dimension in the output shape, unless it was also included in `sumdim`. This goes against the implementation of torch.einsum: ``` >>> a, b, c = [torch.rand(1, 3, 3, 3) for i in range(3)] # Simulate expand at dim=0 for all input tensors >>> torch.einsum('abcd,abcd,abcd->abcd', a, b, c).shape torch.Size([1, 3, 3, 3]) ``` And is just straight up incorrect mathematically. I considered "replacing" singleton dims with zeroed out dims, but that seemed like carrying over a bug. Instead, I included a test for the case, verified that the singleton dimensions were handled the way that torch.einsum handles it, instead of torch._trilinear, and xfailed it with a note as to why.	2024-10-31 14:30:40 -05:00
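The simplified formula quoted in this commit message can be sketched with NumPy as a stand-in for torch. This assumes the `expand` lists name axis positions in the expanded result, which is how `np.expand_dims` interprets a tuple of axes; the function name `trilinear` is hypothetical and `unroll_dim` is ignored, as in the commit.

```python
import numpy as np


def trilinear(i1, i2, i3, expand1, expand2, expand3, sumdim):
    """Unsqueeze each input at its expand dims, multiply, then sum over sumdim."""
    a = np.expand_dims(i1, tuple(expand1))
    b = np.expand_dims(i2, tuple(expand2))
    c = np.expand_dims(i3, tuple(expand3))
    return (a * b * c).sum(axis=tuple(sumdim))


rng = np.random.default_rng(0)
x = rng.random((2, 3))   # becomes (2, 3, 1)
y = rng.random((3, 4))   # becomes (1, 3, 4)
z = rng.random((2, 4))   # becomes (2, 1, 4)

result = trilinear(x, y, z, [2], [0], [1], [1])
reference = np.einsum("ab,bc,ac->ac", x, y, z)
print(result.shape, np.allclose(result, reference))
```

For these shapes the result agrees with the equivalent `einsum` contraction over the middle axis.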
Dixin Zhou	8f52f5a4ed	[Fx Importer] fix mutation importer with non persistent buffer (#3798 ) A non-persistent buffer will not be a part of this module’s `state_dict`. Hence, when `experimental_support_mutation=True` is set and the module has a non-persistent buffer, the current fx importer fails to retrieve a value from `state_dict` and produces `torch.constant.none` to represent the buffer. This fix gets the value of a non-persistent buffer from the module's `constants`. --------- Co-authored-by: Dixin Zhou <dzhou@vdi-ahddp-020.dhcp.mathworks.com>	2024-11-01 02:20:32 +08:00
yyp0	9ce2a69703	[Torch] support AtenExp2Op (#3832 ) - support AtenExp2Op by decomposing it to aten.pow.scalar - refine stablehlo pow.scalar pow.Tensor_Scalar pow.Tensor_Tensor lowering according to https://github.com/llvm/torch-mlir/pull/2983 - Close https://github.com/llvm/torch-mlir/pull/2983	2024-10-31 19:14:05 +08:00
Justin Ngo	4dd213b042	[TOSA] Expand Torch to TOSA legalization coverage (#3827 ) - Add/Extend Torch to TOSA legalization for the following ops: + Add aten.threshold_backward + Fix aten.threshold + Re-implement aten.broadcast_to using tosa.reshape and tosa.tile + Add support for rank 0 index for aten.index_select + Fix aten.index_put.hacked_twin + Add aten.uniform + Add aten.logical_and - Update xfail_sets.py with new e2e results - Add LIT tests to basic.mlir for newly added ops Change-Id: I8910564a049d18293284fe2e55e82bc1d2cf10e3 Signed-off-by: Justin Ngo <justin.ngo@arm.com>	2024-10-30 16:26:10 -07:00
Rob Suderman	a6292f38ca	[bazel] Fix missing dependency in the build (#3826 )	2024-10-30 13:47:04 -05:00
Max191	8b0bf2e293	Bump LLVM to llvm/llvm-project@6c64c8a6f3 (#3818 ) - bumps llvm-project to `6c64c8a6f3` - bumps stablehlo to `6e403b1aa6` - Updates type conversion materialization functions to return Value after API change in llvm-project. --------- Signed-off-by: Max Dawkins <max.dawkins@gmail.com>	2024-10-30 11:38:51 -04:00
Max191	6b58c89914	Remove variable used for only assertion (#3837 ) Removes a boolean variable that is used only for an assertion, and inlines the condition into the assertion. Signed-off-by: Max Dawkins <max.dawkins@gmail.com>	2024-10-30 10:51:06 -04:00
Vivek Khandelwal	16b3bd6e6c	build: manually update PyTorch version and fix CI failure (#3830 ) This commit sets the PyTorch and TorchVision version to nightly release 2024-10-29. This commit also fixes the CI failure after this commit `54d9e24013` got merged. The issue was that the CI checks in the PR were run before the previous roll pytorch update but the PR was actually merged after the roll pytorch update. Hence, the failure was not caught before merging the PR. While exporting the fx_graph through fx_importer for `rrelu` and `rrelu_with_noise` op for train mode, it decomposes the `aten.rrelu_with_noise` op based on the PyTorch decomposition which is the default behavior. However, the decomposition contains an input mutation specifically here `9bbe4a67ad/torch/_decomp/decompositions.py (L325)`, resulting in the runtime failure. This issue would probably be fixed by https://github.com/pytorch/pytorch/pull/138503. Until then, the failing tests are added to the xfail set. Also, after the roll pytorch update following tests started passing for fx_importer, and fx_importer_stablehlo config. - "ElementwiseRreluTrainModule_basic" - "ElementwiseRreluTrainStaticModule_basic" - "ElementwiseRreluWithNoiseTrainModule_basic" - "ElementwiseRreluWithNoiseTrainStaticModule_basic" This commit also updates the dtype check for the `aten.linear` op since the op now expects both the input tensors to have the same dtype. Signed-Off By: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>	2024-10-30 18:56:01 +05:30
Yuanqiang Liu	9ab2a150f2	[Torch] emit upsample_bilinear2d(.vec) ops (#3834 )	2024-10-30 20:18:24 +08:00
Sayan Saha	2b01f8b7f3	[Tosa] : Add support for negative indices in index.tensor and index.Tensor_hacked_twin for TorchToTosa lowering. (#3790 ) 1. Negative indices for tensor indexing is handled by wrapping around the index values by checking their values at run time. Without the fix, there was a runtime error. 2. Added a lit test to lock down the behavior. 3. Updated the `xfails_set` for `fx_importer_tosa` config to lockdown the behavior with e2e test as well.	2024-10-25 15:37:19 -07:00
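The wrap-around described in point 1 amounts to adding the dimension size to any negative index. A minimal plain-Python sketch, with a hypothetical helper name and no relation to the actual TOSA ops emitted:

```python
def wrap_index(index, dim_size):
    """Map a possibly negative index into the range [0, dim_size)."""
    return index + dim_size if index < 0 else index


data = [10, 20, 30, 40]
picked = [data[wrap_index(i, len(data))] for i in [-1, 0, -4, 2]]
print(picked)
```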
Andrija Bosnjakovic	54d9e24013	[TorchToLinalg] Implement lowering of torch.aten.rrelu_with_noise and torch.aten.rrelu_with_noise_backward ops (fix) (#3748 )	2024-10-25 21:31:05 +05:30
Dmitry Babokin	ad9dfe974e	Fix clang warning about printf format (#3814 ) Compiling with clang 16.0 on macOS I have warnings about incorrect printf format (see below). Values to be printed are `int64_t`, but they are printed with `%zu` and `%ld`, which are not portable way to print this type. ``` <...>/torch-mlir/test/CAPI/torch.c:52:3: warning: format specifies type 'size_t' (aka 'unsigned long') but the argument has type 'int64_t' (aka 'long long') [-Wformat] 52 \| DEFINE_CHECK(NonValueTensor) \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ <...>/torch-mlir/test/CAPI/torch.c:37:13: note: expanded from macro 'DEFINE_CHECK' 36 \| fprintf(stderr, #TTT "Type %s rank: %zu\n", testName, \ \| ~~~ 37 \| torchMlirTorch##TTT##TypeGetRank(TTT##Type)); \ \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <scratch space>:78:1: note: expanded from here 78 \| torchMlirTorchNonValueTensorTypeGetRank \| ^ <...>/torch-mlir/test/CAPI/torch.c:52:3: warning: format specifies type 'long' but the argument has type 'int64_t' (aka 'long long') [-Wformat] 52 \| DEFINE_CHECK(NonValueTensor) \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ <...>/torch-mlir/test/CAPI/torch.c:42:15: note: expanded from macro 'DEFINE_CHECK' 41 \| fprintf(stderr, #TTT "Type %s pos %d size: %ld\n", testName, i, \ \| ~~~ 42 \| TTT##Sizes[i]); \ \| ^~~~~~~~~~~~~ <scratch space>:85:1: note: expanded from here 85 \| NonValueTensorSizes \| ^ <...>/torch-mlir/test/CAPI/torch.c:53:3: warning: format specifies type 'size_t' (aka 'unsigned long') but the argument has type 'int64_t' (aka 'long long') [-Wformat] 53 \| DEFINE_CHECK(ValueTensor) \| ^~~~~~~~~~~~~~~~~~~~~~~~~ <...>/torch-mlir/test/CAPI/torch.c:37:13: note: expanded from macro 'DEFINE_CHECK' 36 \| fprintf(stderr, #TTT "Type %s rank: %zu\n", testName, \ \| ~~~ 37 \| torchMlirTorch##TTT##TypeGetRank(TTT##Type)); \ \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <scratch space>:112:1: note: expanded from here 112 \| torchMlirTorchValueTensorTypeGetRank \| ^ <...>/torch-mlir/test/CAPI/torch.c:53:3: 
warning: format specifies type 'long' but the argument has type 'int64_t' (aka 'long long') [-Wformat] 53 \| DEFINE_CHECK(ValueTensor) \| ^~~~~~~~~~~~~~~~~~~~~~~~~ <...>/torch-mlir/test/CAPI/torch.c:42:15: note: expanded from macro 'DEFINE_CHECK' 41 \| fprintf(stderr, #TTT "Type %s pos %d size: %ld\n", testName, i, \ \| ~~~ 42 \| TTT##Sizes[i]); \ \| ^~~~~~~~~~~~~ <scratch space>:119:1: note: expanded from here 119 \| ValueTensorSizes \| ^ 4 warnings generated. ```	2024-10-25 15:42:08 +08:00
Felix Schneider	76209db5a5	Update quantized matmul tests to DQ/Q format supported by fx_importer (#3815 ) Continuation of https://github.com/llvm/torch-mlir/pull/3809 for the matmul tests.	2024-10-24 21:59:58 +02:00
zjgarvey	1259e8a00a	Add Some Folders For Small Reshape Ops (#3813 ) ### Changes 1. Folders for view-like ops: `aten.view`, `aten.flatten.using_ints`, and `aten.unflatten.int` 2. Folder for transpose 3. Extended support for the `aten.slice.Tensor` op folder to include negative strides. ### Motivation The biggest motivation for this patch is to fold the extremely convoluted ir that gets generated when exporting a pytorch model with an `aten.pad` op to ONNX, then re-importing and lowering back to torch. For example, the verbose output of the e2e test `PadModule_basic` with `-c onnx`: ```mlir module { func.func @main_graph(%arg0: !torch.vtensor<[?,?,?,?],f32>) -> !torch.vtensor<[?,?,?,?],f32> attributes {torch.onnx_meta.ir_version = 9 : si64, torch.onnx_meta.opset_version = 20 : si64, torch.onnx_meta.producer_name = "pytorch", torch.onnx_meta.producer_version = "2.5.0"} { %none = torch.constant.none %0 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<_> : tensor<1xsi64>} : () -> !torch.vtensor<[1],si64> %1 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<__1> : tensor<4xsi64>} : () -> !torch.vtensor<[4],si64> %2 = torch.operator "onnx.ConstantOfShape"(%0) {torch.onnx.value = dense_resource<__2> : tensor<1xsi64>} : (!torch.vtensor<[1],si64>) -> !torch.vtensor<[4],si64> %3 = torch.operator "onnx.Concat"(%1, %2) {torch.onnx.axis = 0 : si64} : (!torch.vtensor<[4],si64>, !torch.vtensor<[4],si64>) -> !torch.vtensor<[8],si64> %4 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<__3> : tensor<2xsi64>} : () -> !torch.vtensor<[2],si64> %5 = torch.operator "onnx.Reshape"(%3, %4) {torch.onnx.allowzero = 0 : si64} : (!torch.vtensor<[8],si64>, !torch.vtensor<[2],si64>) -> !torch.vtensor<[4,2],si64> %6 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<__4> : tensor<1xsi64>} : () -> !torch.vtensor<[1],si64> %7 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<__5> : tensor<1xsi64>} : 
() -> !torch.vtensor<[1],si64> %8 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<__6> : tensor<1xsi64>} : () -> !torch.vtensor<[1],si64> %9 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<__7> : tensor<1xsi64>} : () -> !torch.vtensor<[1],si64> %10 = torch.operator "onnx.Slice"(%5, %7, %8, %6, %9) : (!torch.vtensor<[4,2],si64>, !torch.vtensor<[1],si64>, !torch.vtensor<[1],si64>, !torch.vtensor<[1],si64>, !torch.vtensor<[1],si64>) -> !torch.vtensor<[4,2],si64> %11 = torch.operator "onnx.Transpose"(%10) {torch.onnx.perm = [1 : si64, 0 : si64]} : (!torch.vtensor<[4,2],si64>) -> !torch.vtensor<[2,4],si64> %12 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<__8> : tensor<1xsi64>} : () -> !torch.vtensor<[1],si64> %13 = torch.operator "onnx.Reshape"(%11, %12) {torch.onnx.allowzero = 0 : si64} : (!torch.vtensor<[2,4],si64>, !torch.vtensor<[1],si64>) -> !torch.vtensor<[8],si64> %14 = torch.operator "onnx.Cast"(%13) {torch.onnx.to = 7 : si64} : (!torch.vtensor<[8],si64>) -> !torch.vtensor<[8],si64> %15 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<__9> : tensor<f32>} : () -> !torch.vtensor<[],f32> %16 = torch.operator "onnx.Pad"(%arg0, %14, %15) {torch.onnx.mode = "constant"} : (!torch.vtensor<[?,?,?,?],f32>, !torch.vtensor<[8],si64>, !torch.vtensor<[],f32>) -> !torch.vtensor<[?,?,?,?],f32> return %16 : !torch.vtensor<[?,?,?,?],f32> } } {-# dialect_resources: { builtin: { _: "0x080000000400000000000000", __1: "0x080000000000000000000000010000000000000002000000000000000300000000000000", __2: "0x080000000000000000000000", __3: "0x08000000FFFFFFFFFFFFFFFF0200000000000000", __4: "0x080000000000000000000000", __5: "0x08000000FFFFFFFFFFFFFFFF", __6: "0x080000000100000000000080", __7: "0x08000000FFFFFFFFFFFFFFFF", __8: "0x08000000FFFFFFFFFFFFFFFF", __9: "0x080000000000C03F" } } #-} ``` Get's converted to the torch IR: ```mlir module { func.func @main_graph(%arg0: !torch.vtensor<[?,?,?,?],f32>) 
-> !torch.vtensor<[?,?,?,?],f32> attributes {torch.onnx_meta.ir_version = 9 : si64, torch.onnx_meta.opset_version = 20 : si64, torch.onnx_meta.producer_name = "pytorch", torch.onnx_meta.producer_version = "2.5.0"} { %float1.500000e00 = torch.constant.float 1.500000e+00 %int-9223372036854775807 = torch.constant.int -9223372036854775807 %int-1 = torch.constant.int -1 %int7 = torch.constant.int 7 %int6 = torch.constant.int 6 %int5 = torch.constant.int 5 %int3 = torch.constant.int 3 %int8 = torch.constant.int 8 %int1 = torch.constant.int 1 %int2 = torch.constant.int 2 %int4 = torch.constant.int 4 %int0 = torch.constant.int 0 %0 = torch.vtensor.literal(dense<[0, 1, 2, 3, 0, 0, 0, 0]> : tensor<8xsi64>) : !torch.vtensor<[8],si64> %1 = torch.prim.ListConstruct %int4, %int2 : (!torch.int, !torch.int) -> !torch.list<int> %2 = torch.aten.view %0, %1 : !torch.vtensor<[8],si64>, !torch.list<int> -> !torch.vtensor<[4,2],si64> %3 = torch.aten.slice.Tensor %2, %int0, %int-1, %int-9223372036854775807, %int-1 : !torch.vtensor<[4,2],si64>, !torch.int, !torch.int, !torch.int, !torch.int -> !torch.vtensor<[4,2],si64> %4 = torch.aten.transpose.int %3, %int0, %int1 : !torch.vtensor<[4,2],si64>, !torch.int, !torch.int -> !torch.vtensor<[2,4],si64> %5 = torch.prim.ListConstruct %int-1 : (!torch.int) -> !torch.list<int> %6 = torch.aten.view %4, %5 : !torch.vtensor<[2,4],si64>, !torch.list<int> -> !torch.vtensor<[8],si64> %7 = torch.aten.slice.Tensor %6, %int0, %int0, %int1, %int1 : !torch.vtensor<[8],si64>, !torch.int, !torch.int, !torch.int, !torch.int -> !torch.vtensor<[1],si64> %8 = torch.aten.item %7 : !torch.vtensor<[1],si64> -> !torch.int %9 = torch.aten.slice.Tensor %6, %int0, %int1, %int2, %int1 : !torch.vtensor<[8],si64>, !torch.int, !torch.int, !torch.int, !torch.int -> !torch.vtensor<[1],si64> %10 = torch.aten.item %9 : !torch.vtensor<[1],si64> -> !torch.int %11 = torch.aten.slice.Tensor %6, %int0, %int2, %int3, %int1 : !torch.vtensor<[8],si64>, !torch.int, !torch.int, 
!torch.int, !torch.int -> !torch.vtensor<[1],si64> %12 = torch.aten.item %11 : !torch.vtensor<[1],si64> -> !torch.int %13 = torch.aten.slice.Tensor %6, %int0, %int3, %int4, %int1 : !torch.vtensor<[8],si64>, !torch.int, !torch.int, !torch.int, !torch.int -> !torch.vtensor<[1],si64> %14 = torch.aten.item %13 : !torch.vtensor<[1],si64> -> !torch.int %15 = torch.aten.slice.Tensor %6, %int0, %int4, %int5, %int1 : !torch.vtensor<[8],si64>, !torch.int, !torch.int, !torch.int, !torch.int -> !torch.vtensor<[1],si64> %16 = torch.aten.item %15 : !torch.vtensor<[1],si64> -> !torch.int %17 = torch.aten.slice.Tensor %6, %int0, %int5, %int6, %int1 : !torch.vtensor<[8],si64>, !torch.int, !torch.int, !torch.int, !torch.int -> !torch.vtensor<[1],si64> %18 = torch.aten.item %17 : !torch.vtensor<[1],si64> -> !torch.int %19 = torch.aten.slice.Tensor %6, %int0, %int6, %int7, %int1 : !torch.vtensor<[8],si64>, !torch.int, !torch.int, !torch.int, !torch.int -> !torch.vtensor<[1],si64> %20 = torch.aten.item %19 : !torch.vtensor<[1],si64> -> !torch.int %21 = torch.aten.slice.Tensor %6, %int0, %int7, %int8, %int1 : !torch.vtensor<[8],si64>, !torch.int, !torch.int, !torch.int, !torch.int -> !torch.vtensor<[1],si64> %22 = torch.aten.item %21 : !torch.vtensor<[1],si64> -> !torch.int %23 = torch.prim.ListConstruct %14, %22, %12, %20, %10, %18, %8, %16 : (!torch.int, !torch.int, !torch.int, !torch.int, !torch.int, !torch.int, !torch.int, !torch.int) -> !torch.list<int> %24 = torch.aten.constant_pad_nd %arg0, %23, %float1.500000e00 : !torch.vtensor<[?,?,?,?],f32>, !torch.list<int>, !torch.float -> !torch.vtensor<[?,?,?,?],f32> return %24 : !torch.vtensor<[?,?,?,?],f32> } } ``` *All of these operations are useless*. It is literally the result of needing to reverse (and change the lexicographic order hierarchy of) padding ints provided via torch vs. ONNX pad ops, which is then subsequently UNDONE by our ONNX->Torch lowering (represented in the ordering of the generated list construct). 
With the added folders in this patch, the torch IR becomes: ``` module { func.func @main_graph(%arg0: !torch.vtensor<[?,?,?,?],f32>) -> !torch.vtensor<[?,?,?,?],f32> attributes {torch.onnx_meta.ir_version = 9 : si64, torch.onnx_meta.opset_version = 20 : si64, torch.onnx_meta.producer_name = "pytorch", torch.onnx_meta.producer_version = "2.5.0"} { %float1.500000e00 = torch.constant.float 1.500000e+00 %int0 = torch.constant.int 0 %int2 = torch.constant.int 2 %int3 = torch.constant.int 3 %int1 = torch.constant.int 1 %0 = torch.prim.ListConstruct %int0, %int1, %int2, %int3, %int0, %int0, %int0, %int0 : (!torch.int, !torch.int, !torch.int, !torch.int, !torch.int, !torch.int, !torch.int, !torch.int) -> !torch.list<int> %1 = torch.aten.constant_pad_nd %arg0, %0, %float1.500000e00 : !torch.vtensor<[?,?,?,?],f32>, !torch.list<int>, !torch.float -> !torch.vtensor<[?,?,?,?],f32> return %1 : !torch.vtensor<[?,?,?,?],f32> } } ```	2024-10-24 12:09:00 -05:00
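The reordering that this commit folds away can be reproduced in plain Python. Torch lists padding as (begin, end) pairs starting from the last dimension, while ONNX lists all begins followed by all ends in dimension order. The two helper names are hypothetical; the values are the ones from the `PadModule_basic` example above.

```python
def torch_pads_to_onnx(torch_pads):
    """Convert torch-style pads (last dim first, begin/end pairs) to ONNX order."""
    pairs = [torch_pads[i:i + 2] for i in range(0, len(torch_pads), 2)]
    pairs.reverse()  # dimension 0 first
    begins = [p[0] for p in pairs]
    ends = [p[1] for p in pairs]
    return begins + ends


def onnx_pads_to_torch(onnx_pads):
    """Convert ONNX-style pads (all begins, then all ends) back to torch order."""
    rank = len(onnx_pads) // 2
    begins, ends = onnx_pads[:rank], onnx_pads[rank:]
    out = []
    for dim in reversed(range(rank)):
        out += [begins[dim], ends[dim]]
    return out


torch_pads = [0, 1, 2, 3, 0, 0, 0, 0]
onnx_pads = torch_pads_to_onnx(torch_pads)
print(onnx_pads)
print(onnx_pads_to_torch(onnx_pads))
```

The round trip returns the original list, which matches the folded `torch.prim.ListConstruct` operands in the final IR.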
Sriram Kumar	d6feb2179c	Added support for Maxpool (Autopad) (#3774 ) Added autopad and passed 3 tests: test_maxpool_2d_precomputed_same_upper, test_maxpool_2d_same_lower, test_maxpool_2d_same_upper. Addresses: https://github.com/nod-ai/SHARK-ModelDev/issues/843 2 attributes yet to be completed: storage_order, indices output	2024-10-23 13:04:50 +00:00
lingzhiz1998	2f9a68cc1e	Add canonicalization pattern for maxpool3d with indices op (#3704 ) As discussed in https://github.com/llvm/torch-mlir/pull/3652, we should replace maxpool3dwithindices with maxpool3d if indices have no user.	2024-10-23 18:31:20 +05:30
zjgarvey	55ff110dc2	[MLIR][TORCH] Only unroll prim loop-like ops within a `torch.shape.calculate` region (#3812 ) Reports a match failure for the pattern `FullyUnrollPrimLoop` when the loop op is not in a region defined by a `torch.shape.calculate` op. This is needed to avoid unrolling prim loops generated by ONNX IR, since we are applying shape refinement in the `torch-onnx-to-torch-backend-pipeline` introduced in `fa4794d` . See also the discussion in <https://github.com/iree-org/iree/pull/18867#discussion_r1811101655>	2024-10-23 13:38:55 +05:30
Felix Schneider	aca33f1742	[TorchToLinalg] Use Op with native channel order for quantized conv2d (#3807 ) I've upstreamed the necessary quantized linalg Op with the "channel-first" ordering used by torch (https://github.com/llvm/llvm-project/pull/107740) for 2d convolution. This patch changes the lowering for the quantized 2d case of `aten.convolution` accordingly, which saves three transpositions per convolution (input, weights, result) and therefore removes the requirement to try to optimize these away in downstream passes.	2024-10-22 20:26:16 +02:00
Felix Schneider	42ba541c68	[fx] Fix importing and tests for quantized conv (#3809 ) The fx tracer does not support tracing "real" quantized tensors currently. A "real" quantized tensor here means a tensor that is created using a method like `torch.quantize_per_tensor()` and carries the quantization parameters (scale, zero_point, scheme) in the object.
However, it seems like the DQ-Q type fake quantization is now commonly used as a high level representation of quantized operators and is only lowered to native quantized ops (if available) in the respective hardware backend. Quantization of floating point modules in PyTorch is recently also performed as a graph transformation after exporting/tracing the original module.
```python
import torch

# Examples of "real"/native quantization
tens = torch.randint(-127, 127, (1,), dtype=torch.int8)
torch._make_per_tensor_quantized_tensor(tens, 1, 0)
# tensor([90.], size=(1,), dtype=torch.qint8,
#        quantization_scheme=torch.per_tensor_affine, scale=1.0, zero_point=0)
tens = torch.rand((1,))
torch.quantize_per_tensor(tens, 1, 0, torch.qint8)
# tensor([1.], size=(1,), dtype=torch.qint8,
#        quantization_scheme=torch.per_tensor_affine, scale=1.0, zero_point=0)

# Example of DQ/Q quantization
# Importing this module registers the `quantized_decomposed` ops.
import torch.ao.quantization.fx._decomposed
tens = torch.rand((1,))
torch.ops.quantized_decomposed.quantize_per_tensor.default(tens, 1, 0, -128, 127, torch.int8)
# tensor([1], dtype=torch.int8)
```
This means that a typical import flow for a quantized network into/through torch-mlir would look like this: `torch.export() -> quantization transformations on fx graph -> fx_importer`, where the tensors in the graph are normal float/int tensors and the quantization parameters are carried by the DQ/Q ops. These kinds of graphs can be traced without issues.
Currently, our quantized convolution tests use the "real" quantized tensors. This means that with the retirement of the `jit_ir_importer`, these tests cannot be imported any longer.
In summary, I see no reason to stick to the "real" quantization in these tests, since both PyTorch 2.0 and our linalg backend use DQ/Q quantization. This patch updates our quantized convolution tests to use the DQ-Q quantization with the ops from `torch.ops.quantized_decomposed`.
Note: For future reference, there seems to be an ongoing consolidation of the ops for the DQ/Q scheme on the PyTorch side (https://github.com/pytorch/ao/issues/986#issuecomment-2390296826).	2024-10-22 18:37:57 +02:00
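To make the DQ/Q idea concrete, the following is a torch-free sketch of per-tensor affine quantization on plain Python floats. It assumes the common scheme (scale by `1/scale`, round, add the zero point, clamp to `[quant_min, quant_max]`, and the reverse for dequantization); the helper names mirror the `quantized_decomposed` ops only for readability and are not the PyTorch implementations.

```python
def quantize_per_tensor(values, scale, zero_point, quant_min, quant_max):
    """Map floats to integers; the result is an ordinary list of ints."""
    inv_scale = 1.0 / scale
    quantized = []
    for v in values:
        # Python's round() rounds halves to even.
        q = round(v * inv_scale) + zero_point
        quantized.append(min(max(q, quant_min), quant_max))
    return quantized


def dequantize_per_tensor(quantized, scale, zero_point):
    """Map integers back to floats using the same scale and zero point."""
    return [(q - zero_point) * scale for q in quantized]


# The quantization parameters travel with the op calls, not with the data.
q = quantize_per_tensor([0.26, -1.0, 100.0], 0.5, 0, -128, 127)
dq = dequantize_per_tensor(q, 0.5, 0)
```

Here `q` is a plain integer list and `dq` a plain float list, which matches the point made in the commit message: in the DQ/Q representation the tensors are ordinary int/float tensors and the scale and zero point are arguments of the ops.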
zjgarvey	140cad5659	Add More Scalarize Shapes Patterns (#3810 )
### new patterns:
1. Propagates `aten.broadcast_to` ops of a single value to an `aten.full` op
2. Propagates arithmetic operations through a templated class which associates some tensor arithmetic ops to their integer-scalar counterparts. These are a major blocker right now, since some models have a bunch of rank 0 arithmetic being done with tensor ops. See the lit test for an interesting example that pads an input to the smallest shape which will become divisible by twelve in `dim0`. If you think this is convoluted, you haven't been staring at ONNX generated IR long enough.
3. Adds a stronger folder for `aten.eq.int` to fold `size.int == 0` to `false`. See the comment in that conversion pattern for more justification as to why it is acceptable to make this assumption here. This is another major blocker for models, since this lack of folding propagates to lack of folding for subsequent `where.self` operations.
4. Add `AtenSqueezeDim` to the existing `FoldAtenSqueezeOpPattern`
### other changes:
1. Add two new anchor ops: `AtenArangeStartStepOp` and `Torch::RuntimeAssertOp`. I've checked all possible sources of the runtime assert ops and it is always shape related. The Arange op only takes int inputs, and these are all shape related. Adds a size check to getting a list from literal ops.
2. Improved folders for int arithmetic ops to fold some common patterns.
3. adds the ability to get some values from scalar-tensor ops to getListFromTensor.
4. further cleans up getListFromTensor for readability.
### points to scrutinize:
1. I made the choice to scalarize `div.Tensor` (int dtype result) to `floordiv.int`. This is because our shape computations involving this kind of arithmetic are never negative in practice, and we don't have a "round towards zero" scalar int divide counterpart.
2. Anchoring on `RuntimeAssertOp` sounds really suspicious, and if someone happens to add a runtime assert in the future that doesn't boil down to shapes, then it would add to the worklist considerably. We might be able to get around this by adding "NoMemoryEffect" to ops which are "ReadOnly" so that the inputs for the runtime asserts get cse'd with existing elements of the worklist before we even get to this pass.	2024-10-21 19:42:39 -05:00
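The first point to scrutinize rests on floor division and round-towards-zero division agreeing for non-negative operands. A minimal sketch in plain Python integers (not torch-mlir code; `floordiv` and `truncdiv` are hypothetical helper names) shows where the two agree and where they differ:

```python
def floordiv(a, b):
    """Integer division rounding towards negative infinity."""
    return a // b


def truncdiv(a, b):
    """Integer division rounding towards zero."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


# For non-negative dividends and positive divisors, as in shape arithmetic,
# both roundings give the same result.
agree_on_shapes = all(
    floordiv(a, b) == truncdiv(a, b)
    for a in range(0, 50)
    for b in range(1, 13)
)

# With a negative operand the two roundings diverge.
floor_result = floordiv(-7, 2)
trunc_result = truncdiv(-7, 2)
```

The substitution is therefore only safe under the assumption stated in the commit message, namely that the values involved in these shape computations are never negative.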


3242 Commits (c26ca8b94d4b1020b9be58d9c525964dd0bd79fb)
