Summary of changes:
- LLVM now includes <optional> instead of "llvm/ADT/Optional.h" in most
(although not all) places
(https://reviews.llvm.org/rG541ef3d61e9341cd38420c0dbca9250c4d0ea04c).
This patch replaces the affected instances of `llvm::Optional` with
`std::optional`.
- In the usages of llvm::Optional that remain, llvm::Optional::value()
is deprecated, so this patch replaces them with a dereference.
Functions like `getTypeForScalarType` that do a mapping from one set
of types to another should not fail, and if they do it
should be obvious to the developer that that function has an
unhandled case.
Instead of silently failing when encountering an unsupported type,
this commit adds a `report_fatal_error` at the end, similar to other
type translation functions in this file.
In order to verify if a given IR satisfies the backend contract, the
verifier needs to know if decompositions took place, and if so, which
ops were decomposed and which were not.
This commit adds two arguments to `verifyBackendContractPass` to
specify if decompositions took place and which ops to consider backend
legal, similar to the arguments of `LowerToBackendContractPass`.
Summary of changes:
- Replace `llvm::None` with `std::nullopt`, since the former is deprecated
(https://reviews.llvm.org/D139763)
- Use setter for symbol visibility instead of passing string attribute when
creating FuncOp
* [custom op] Generalize shape library logic to work with dtypes
This commit generalizes the shape library logic, so that dtype rules
for ops can also be expressed using the same mechanism. In other
words, each op can now have a shape function and a dtype function
specified in Python that is imported during lowering to calculate the
shapes and dtypes throught a program. For more information about how
to specify a dtype function, see the updated
`docs/adding_a_shape_and_dtype_function.md`.
For those not familiar with how the shape library works, the file
`docs/calculations_lib.md` provides an overview.
Currently `getTensorRank` returns -1 if it was unable to get the rank
of the tensor. However, not every use in the codebase was checking the
return value, and in some cases, the return value was casted to
unsigned leading to some infinte loops when an unranked tensor reached
a decomposition.
This commit changes the return of `getTensorRank` to
`Optional<unsigned>` to make it clear to the user that the function
can fail.
This commit also changes a couple of for loops that iterate a vector
in reverse order that can potentially become infinite loops into
range-based for loops.
A circular dependency was introduced in e7edcc62fd.
Specifically, the `makeShapeLLVMCompatible` and `makeShapeTorchCompatible` utilities were being called from `lib/Dialect/Torch/IR/TorchTypes.cpp` and `lib/Dialect/Torch/IR/TorchOps.cpp` defined under the `:TorchMLIRTorchDialect` bazel target, leading it to take a dependency on `:TorchMLIRConversionUtils` which already depends on `:TorchMLIRTorchDialect`, hence creating a circular dependency.
This commit resolves the same by moving said utilities from `lib/Conversion/Utils/Utils.cpp` to `lib/Dialect/Torch/Utils/Utils.cpp`. Please LMK if there's a better way to fix this and I will update the code.
This commit also adds the required targets to support building the new conversions from Torch to ML Program dialect that was introduced in f416953600.
Bazel build GHA triggered manually to verify: https://github.com/sjain-stanford/torch-mlir/actions/runs/3645944517
The current implementation of `DecomposeComplexOps` fails if an op
expected to be decomposed does not get decomposed in the first
iteration of the `createTorchSimplificationPipeline` in
`LowerToBackendContractPass`. However, some graphs require multiple
iterations of `createTorchSimplificationPipeline` to fully propagate
all statically knowable information, such as dtypes and shapes, to the
entire graph, sometimes resulting in the need to run
`DecomposeComplexOps` more than once.
This commit changes `DecomposeComplexOps` to use a greedy algorithm
for pattern application and moves the legalization check of ops to the
`LowerToBackendContractPass` to allow for the `DecomposeComplexOps` to
run more than once.
- Support for non-prefixed accessors has been removed. See:
https://reviews.llvm.org/D136727
- Rename `operands` to `methodOperands` in `prim.CallMethod` since the
name `operands` overlaps with a builtin method name. See:
https://reviews.llvm.org/D136727
- Add passes in refbackend to lower memref.subview. See:
https://reviews.llvm.org/D136377
- Replace `CopyToValueTensorOps` first in `RewriteViewLikeSubgraph` in
maximize-value-semantics.
The current implementation of the `RewriteViewLikeSubgraph` pass in
maximize-value-semantics creates temporarily invalid IR. In
particular, given a forward slice starting from a
`CopyToNonValueTensorOp` and ending in `CopyToValueTensorOp`s, the
pass first replaces all uses of the `CopyToNonValueTensorOp` with
its operand, which results in all the `CopyToValueTensorOp` users
having their operand have type `!torch.vtensor`, which is invalid.
The correct way to do things is to first replace all the
`CopyToValueTensorOp`s with their operand, and then replace all uses
of the `CopyToNonValueTensorOp` with its operand.
This only started failing now because the generated accessor
`getOperand` for the `CopyToValueTensorOp` now returns a
`TypedValue<NonValueTensorType>`, which has an assert checking that
the value returned is of the expected type.
This commit changes the `InsertRngGlobalsPass` to `TorchConversionToMLProgram`
pass. This commit also adds the `MLProgramBufferize` pass for the
bufferization of ml_program dialect ops to run on refbackend.
Signed-Off By: Vivek Khandelwal<vivek@nod-labs.com>
Summary of changes:
- Change ShapedType::kDynamicSize -> ShapedType::kDynamic
- llvm::NoneType has been deprecated, change convertScalarToDtype to use llvm::None
This commit replaces the LCG algorithm that was being used by the
`TorchToLinalg` lowering of `AtenUniformOp` to generate random numbers
with the `squares64` algorithm, for the LCG algorithm was producing
tensors that were highly correlated with one another.
Squares64 algorithm: https://arxiv.org/abs/2004.06278
Closes https://github.com/llvm/torch-mlir/issues/1608
Summary of changes:
- Replace call to `MemoryEffectOpInterface::hasNoEffect`
with `isMemoryEffectFree`.
- Make fix for the dynamic dims, since
`kDynamicSize` value changed to
`std::numeric_limits<int64_t>::min()` from `-1` in llvm
- `makeShapeLLVMCompatible` and `makeShapeTorchCompatible`
utilities convert shapes in order to remain consistent
with the Torch and MLIR semantics.
- Update tags
llvm: 147fe9de29dc13c14835127b35280c4d95c8e8ba
mhlo: 1944b5fa6062ec4c065d726c9c5d64f1487ee8c5
Signed-Off By: Vivek Khandelwal<vivek@nod-labs.com>
The current implementation sets the `nextSeed` value to `temp & 127`,
which is wrong. The last step of the LCG algorithm for the multiplier
and increment chosen should be `temp % 2^{64} = temp & (1 <<
63)`. However, because we are dealing with i64 values, the modulus
operation happens automatically, so it is not needed.
See Donald Knuth's values for LCG here:
https://en.wikipedia.org/wiki/Linear_congruential_generator
-- This commit fixes a bug in computeReductionType API.
-- The bug pertains to removal of `dim` from the `sizes` array.
Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
-- This commit adds decompose logic for `aten._softmax` when
`half_to_float` is `True`.
-- An e2e test case will be added once support for half to float conversion for
`aten._softmax` is added upstream.
Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
This commit fixes the aten.mean and aten.mean.dim op decomposition
for supporting large-sized inputs.
This commit also fixes the formatting for the file stats.py
Signed-Off By: Vivek Khandelwal<vivek@nod-labs.com>
-- aten.upsample_nearest2d.vec op is not present
owing to https://github.com/pytorch/pytorch/pull/85638
-- So this commit adds a lowering on aten.upsample_nearest2d.
Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
This commit renames the patterns used to match on lists of constant
values to `m_TorchListOfConstant{valueType}s`. This is needed to avoid
ambiguity for when `valueType` has `Optional` in it. In particular, it
makes it clear whether the values in the list are optional or the list
itself is optional.
lib/Dialect/Torch/Utils/Utils.cpp includes TorchOps.h, which, by way of
included header files, refers to both TorchOps.h.inc as well as
TorchTypes.h.inc. However, the build rules do not specify the
dependency of the `TorchMLIRTorchUtils` target on the TableGen generated
header files, causing spurious build errors.
This patch fixes the problem by adding `MLIRTorchOpsIncGen` and
`MLIRTorchTypesIncGen` to the list of dependencies of
`TorchMLIRTorchUtils`.
* build: update llvm tag to 74fb770d
This commit makes the following changes needed to update bump LLVM:
+ replace usages of `tensor::createPadScalarOp`, see https://reviews.llvm.org/D136493
+ Update file checks
The parameter "supportFPInputOnly" of function createPoolingOp() is
supposed to be "supportNonFPInput", which was added to distinguish
between "MaxPool2d" and "AvgPool2d" op in #718
This commit removes almost all of the valsem ops, since the value
semantics version of the ops now exist in PyTorch. The only op missing
is `aten.bernoulli_.float`. In addition, this commit also simplifies
the implementation of `aten.fill.Scalar` by moving it to the pattern
that converts elementwise ops.
This commit makes the following changes needed to update bump LLVM:
- Replace `linalg.init_tensor` with `tensor.empty` (see:
https://reviews.llvm.org/D135129)
- Replace `NoSideEffect` with `Pure` (see
https://reviews.llvm.org/D135505)
- Replace `body` region accessor for `ReduceOp` and `ReduceWindowOp`
with `getBody`
- Fix incorrect use of `tosa::ReduceSumOp` in `AtenNativeLayerNormOp`
conversion pattern. The result type of `tosa::ReduceSumOp` must have
the same rank as the input type. (see:
https://www.mlplatform.org/tosa/tosa_spec.html#_reduce_sum)
Co-authored-by: Ashay Rane <ashay@users.noreply.github.com>
Co-authored-by: Ashay Rane <ashay@users.noreply.github.com>
This commit removes the `weight` tensor from the inputs of one of the
`linalg.generic` ops generated by the `aten.convolution` linalg
lowering, since the indexed values are not actually used by the body
of the `linalg.generic`. Moreover, in general the `weight` tensor does
not have the same shape as the output tensor of the `linalg.generic`,
so both tensors being indexed by the same indexing maps is wrong.
-- This commit adds e2e support for `aten.Mish` op.
-- `aten.Mish` op is decomposed as following :-
Mish(x) = x * Tanh(Softplus(x))
Signed-off-by: Abhishek Varma <avarma094@gmail.com>
Signed-off-by: Abhishek Varma <avarma094@gmail.com>
This commit adds lowering of `aten.div.int` and `aten.bitwise_or.Tensor`
ops. Both these ops are required in order to support bloom_560m model.
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
This commit updates the linalg conversion of `AtenMaxDimOp` to use
`arith.maxf` instead of `arith.select` to calculate the maximum. This
allows better vectorization further downstream, since the operation
can be converted to a simple max reduction when the `indices` result
is not used. See: https://github.com/iree-org/iree/issues/10666.
Summary of changes:
- Updated references to the Arith dialect
(https://reviews.llvm.org/D134762)
- Switched to prefixed accessors for MemRef dialect
(https://reviews.llvm.org/D134995)
- Fixed warnings about signed/unsigned comparisons, ignored return
values, and unused variables
* Fix c10::prim::Constant conversion; Added CAPI for passes; Added passes to base lazy backend
* Update ivalue_importer to use ImportOptions; Added tests for non-value/value tensor types
* Added tests for scalar Constant import; Updated MB::importFunction to use ImportOptions
* Test updates
* Move back module variable name
* Remove RefineTypes from TorchMlirLoweringContext::Build()
* Rename pass; Remove passes from base lazy backend
* Rename pass to VerifyBackendContractPass
* Aligned cmd pass name; Fixed TorchConversion passes registration
The auto-update of the PyTorch version broke the Torch-MLIR build
because it did not update the shape library. Going forward, we should
add the shape library update to the PyTorch version update action.
This commit adds support for TorchToTosa lowering of
`aten.broadcast_to` op for cases:
1.) When the rank of input and output tensor is equal.
2.) When the rank of input tensor is zero.
Signed-Off By: Vivek Khandelwal<vivek@nod-labs.com>
Summary of changes:
- Renamed OptionalArrayRefParameter since the name conflicts with an
upstream symbol that has a different meaning
(https://reviews.llvm.org/D133819)
- Removed extraneous dependency between TorchMLIRTorchToMhlo and
ChloOps, since the existing dependency on MhloDialect is sufficient
- Fixed code to prevent warnings related to comparisons between signed
and unsigned values
Strength the shape inference for aten.arange-like op by
1. registering aten.sub and aten.ceil.Scalar op and design folders for them.
2. register a new constant-like op: Torch::ConstantNumberOp and design canonicalizer for it.
This PR adds an `AllowedInModuleInitializer` trait to keep track of ops that are permitted in the module initializer. We have a handful of such ops that are produced by the IValue importer, and so this change avoids maintaining a list of ops in `TorchOps.cpp` that could lead to spurious merge conflicts, and help us integrate torch-mlir in our downstream compiler better. Please let me know if you'd prefer a better name for the trait itself. Feedback is welcome!
As @oroppas identified, literal strings that are over 16,380 characters
cause the MSVC compiler to throw an error (C2026), eventually causing
the Windows build of Torch-MLIR to fail because the length of the
generated MLIR for the shape library crosses the allowed threshold.
This patch fixes the problem by making the Python script generate one
literal string per line to satisfy the MSVC compiler.
Thanks to @oroppas for the bulk of the effort required to resolve this!
Summary of changes:
- Updated emitAccessorPrefix since the default value has changed
(https://reviews.llvm.org/D133179)
- Updated RefineTypes pass since Lattice::isUninitialized() is removed
(https://reviews.llvm.org/D132800)
- Updated MHLO tag so that it builds with the updated LLVM tag
- Disabled two tests that cause segfaults in the TOSA backend (see Issue
#1361)
* Add aten.frobenius_norm.dim op and init its conversion pattern to linalg and MHLO,
* run symbolic-shape-optimization before hlo-legalize-to-linalg to fit more mhlo e2e tests.
Summary of changes:
- Update the dataflow analysis in RefineTypes.cpp
- Add tosa-to-arith pass after tosa-to-linalg pass, since
tosa-to-linalg (and canonicalizations) can produce tosa.const() ops
- Fixed warning about not making `matchAndRewrite` as override
This commit adds decomposition of `aten.linear` op. Due to limited
support at tosa backend in case of dynamic dimensions, this
decomposition is currently disabled for tosa backend.
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
- Update MHLO commit to build with LLVM commit hash 00d648bd
- Update TorchToMhlo code to work with Stablehlo
- Re-enabled two failing TOSA tests, thus resolving Github Issue #1231
Caught in the wild here:
https://github.com/llvm/torch-mlir/runs/8046660640?check_suite_focus=true
It is common for a missing dependency to only surface as an issue on the
CI machines since they have fewer cores which prevents a "race" that
happens to cause the dependency to be built before the dependent.
An earlier patch (bb47c166) incorrectly replaced the now-dropped
`OpaqueElementsAttr` with `SparseElementsAttr` in one place and with
`DenseElementsAttr` in another. This patch fixes the problem by making
both replacements use the dense-equivalent type.
We were already hitting many cases where backends different in terms of
the legal ops that they wanted. This caused unnecessary coupling between
the backends. Examples:
- https://github.com/llvm/torch-mlir/pull/1161
- https://github.com/llvm/torch-mlir/pull/862
This PR centralizes all compilation to go through `torch_mlir.compile`
so that we can keep the logic centralized there. We should move these
lists closer to each backend. Especially cases like
https://github.com/llvm/torch-mlir/pull/862 where blocking a
decomposition is necessary to avoid a crash emphasize that the set of
decompositions is tightly coupled to the backend, and should be
"controlled by the backend" and not something arbitrarily tweakable.
Also:
- Fix a small bug in the way we passed through the backendLegalOps
option.
- Add better error messages in `torch_mlir.compile` for import errors.
One of the simplifications made by the pass `RefinePublicReturn`
currently only happens if the tensor in question only has one
user. However, the current method of checking this does not correctly
handle the case of a user having multiple uses of the same
tensor. This commit makes sure only unique users are considered.
This is a first step towards formalizing the set of ops in our backend
contract. The goal is to eventually formalize `torch` dialect ops into 3
categories:
1. Legal in backend contract
2. Illegal in backend contract
3. Conditionally legal in backend contract
The "conditionally legal" set are the ops that we can optionally
decompose for backends.
This patch adds relevant pass options for this throughout the compiler,
in preparation for a new set of traits which will formalize this
classification.
I recently fixed the handling of the `dim` argument in
`sum_mean_dim` (59fccab857). Therefore,
the checks that the `dim` input is `None` or `[]` are no longer needed.
This introduces a new pass LowerToBackendContract (better name very
welcome) which performs the bulk of the simplifications that we do,
such as
- shape refinement
- dtype refinement
- maximizing value semantics
- inlining global slots
- decomposing complex ops
The key difference from before is that it iterates the set of
transformations, which can help to break a number of "catch-22" issues
where one simplification depends on another, the latest example being
here:
https://github.com/llvm/torch-mlir/issues/1131
This also exposed that RefineTypes was sometimes crashing/asserting for
certain inputs. This commit hardens it a bit.
Bumps the shape library:
- Updates the function signature for aten.arange.start_step
- upstream_shape_functions.mean_dim -> upstream_shape_functions.sum_mean_dim
The Torch dialect has an include to `mlir/Dialect/Func/IR/FuncOps.h` and
should therefore have a CMake dependency to the MLIRFuncDialect.
Otherwise, the build can fail since it may occur that
`mlir/Dialect/Func/IR/FuncOps.h.inc` isn't generated yet.
Summary of changes:
- Switch to C++17 (similar to https://reviews.llvm.org/D131348)
- Update MHLO to build with LLVM commit hash 061e0189
- Replace deprecated `hasValue()` and `getValue()` with `has_value()`
and `value()` respectively (https://reviews.llvm.org/D131349)
- Use `TypedAttr` (https://reviews.llvm.org/D130092)
- Use updated assembly format of `mhlo.compare` op (commit
d03ef01e70fbf9afd0fa1976fbb7ed31838929b3 in MHLO repo)
Rather than a per-global-slot initializer region, we now have one for
the whole module. For example, it might look like this:
```
torch.global_slot "private" @tensor : !torch.tensor
torch.global_slot "private" @list : !torch.list<tensor>
torch.global_slot.module_initializer {
%0 = torch.tensor.literal(dense<0.0> : tensor<f32>) : !torch.tensor
%1 = torch.prim.ListConstruct %0 : (!torch.tensor) -> !torch.list<tensor>
torch.initialize.global_slots [
@tensor(%0 : !torch.tensor)
@list(%1 : !torch.list<tensor>)
]
}
```
This new structure allows GlobalizeObjectGraph to create the initializer in a
much simpler way, avoiding the need to reason about whether different slots
alias each other. Reasoning about whether slots alias each other now is the
responsibility of InlineGlobalSlots, which has to do a much more complicated
analysis, implemented using MLIR's dataflow analysis framework.
Recommended review order:
- Check out the new IR constructs in the .mlir files of various passes
- Op definitions (*.td)
- Changes to GlobalizeObjectGraph pass.
- InlineGlobalSlots pass (~total rewrite)
- Misc changes:
- Moving torchMlirAdjustStaticInformation for sharing with C++ code.
- EraseModuleInitializer pass
To make this a bit nicer, it would be good to have a `torch.module` op
with an initializer region attached. That would be more invasive though.
This change has highlighted certain aspects of our project layering
which are worth calling out. None of our backends can handle global
slots, so we enforce that there are no global slots before backend
lowering. At an earlier stage in the project, we had aspirations of
transparently handling mutable global state and such, but for reasons
described below, that is no longer a goal. So really global slots should
be seen as a progressive lowering step as part of inlining all the
IValue's in the original program (GlobalizeObjectGraph is also one such
step).
Over time, with insights from work like IREE-JAX, it has become clear
that there isn't a reliable programming model we can compile for users
where we just transparently handle mutable global state (and some other
things, like lists and dictionaries). There is a need for an "outer
program" that orchestrates more restricted subroutines of the kind we
can handle in our compile flow here. The benefit of that is that it
decouples considerations like shapes, dtypes, etc. from the program
constructs used in the outer program. As long as the outer program can
efficiently invoke (pipelining/async/etc.) high-performance
data-parallel numerical subroutines of the kind we compile in our flow
here, then there is a complete programming model. This is also
consistent with the direction of upstream PyTorch which is becoming more
tracing-based (which inherently loses a lot of program structure, which
then has to be applied back with an "outer program" orchestrating the
traced subroutines).
follow up #761:
This patch updates the `torch_mlir::convertTensorToMlirElementsAttr()`
method to enable the creation of tensors whose base type is Float16.
This patch also adds a test to validate the IR generation, and it
updates the test for importing tensors of various types.
PyTorch recently added support for `dim=None` in the `torch.var`
(5ca9b2b6fa)
and `torch.std`op (eb0e30e0bc).
This commit adds the corresponding support in torch-mlir.
Signed-Off By: Vivek Khandelwal<vivek@nod-labs.com>
* [MHLO] Support for dynamic shape in basic op conversion by introducing CHLO dialect
Co-authored-by: Bairen Yi <yibairen.byron@bytedance.com>
Co-authored-by: Jiawei Wu <xremold@gmail.com>
Co-authored-by: Tianyou Guo <tianyou.gty@alibaba-inc.com>
Co-authored-by: Xu Yan <yancey.yx@alibaba-inc.com>
Co-authored-by: Ziheng Jiang <ziheng.jiang@bytedance.com>
* [MHLO] Support I32 as shape tensor dtype
* [NFC] Add a 'TODO' annotation
* Assume zero rank tensors are scalar
* Run RefineTypes pass on JIT Graph
* Rollback assumption that zero rank tensors are scalar
* Set numSizes to -1 for non-ranked tensors
* Rename RefineTypes to RefineTupleTypes
This commit fixes the shape calculation for:
1.) aten.mean.dim
2.) aten.var.dim
3.) aten.sum.dim_IntList op
Also, it fixes the lowering of `aten.mean.dim` and
`aten.sum.dim_IntList` for handling the cases of empty dim list.
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com
- Includes a canonicalizer for `aten.add.t`needed for successfully lowering the shape function
- Only offers support for statically sized index tensors when there is more than one
- Dynamic shape support remains for single indexing tensors
This commit adds verifiers to the ops `ToBuiltinTensorOp` and
`FromBuiltinTensorOp` that make sure that the input and output have
the same shape and data type.
In the interest of merging upstream LLVM quickly, a previous patch
(7f08169) updated the torch-mlir build to register all dialects and
passes through Python bindings. This patch limits the dialects and
passes to only those that are used in torch-mlir.
Key to this change are the removal of
`MLIRPythonExtension.RegisterEverything` and the introduction of a new
Python module (`_mlir_libs/_site_initialize_0.py`), where we register
the dialects and passes used by torch-mlir.
- Supports cases where the view op expands and collapses dims
simulataneously. This does not handle the case where it is neither
expanding nor collapsing (e.g. [2, 3] -> [3, 2])
- Additionally fixes a previous bug with adding 1-sized dims on both
sides of a tensor with aten.view
An upstream MLIR bug (that was recently fixed) caused the result to be
ignored for Region- and Block-visitor functions. Now that the bug is
fixed, we don't need an auxiliary boolean to track whether the visitor
function has succeeded.
This commit adds the support for negative dim cases for `aten.cat`,
`aten.slice.Tensor` and `aten.slice_scatter` op.
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
emitError is intended for error cases and not match failures of
patterns. notifyMatchFailure is intended where pattern reports reason
for not matching.
Op verification should also not happen inside patterns but as part of
verify/verification, but left ones that were obviously verification to
emitError inside patterns to keep this change small.
The biggest change here is to upgrade RefineTypes to the new sparse
dataflow framework.
Smaller changes:
- minor changes to type parsing
- suppress warnings in e2e tests
The original conversion pattern for `AtenBatchNormOp` required that
the input rank be greater than 2; however, the only
expectation in the conversion pattern and in Pytorch is that the input
rank is greater than 1, since the second dimension of the input must
match the size of the `weight`, `bias`, `runningMean`, and
`runningVar` inputs. This commit fixes the `inputRank` check.
This commit adds the decomposition for `aten.var.dim` op.
This commit also make changes in the decomposition for `aten.var` op.
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
This patch adds a new pass `torch-verify-conversion-to-value-semantics`,
which looks for non-value semantics tensors to catch such tensors early
during compilation.
This pass requires `torch-refine-public-return` pass to ensure that
return operations are updated to use value tensors, followed by the
canonicalize pass to remove any dead ops that may use or produce
non-value tensors.
lowering.
This commit addresses the remaining comments on lowering of
slice_scatter and select_scatter.
Signed-Off-By: Prateek Gupta <gprateek93@gmail.com>
Prior to this patch, the canonicalizers for `AtenSizeOp` and
`AtenSizeIntOp` succeeded only if the tensor operand's type information
included the size of the requested dimension(s). We can extend the set
of optimizable cases by propagating types across operations whose result
type matches the input tensor type.
Specifically, this patch enables the canonicalizers for `AtenSizeOp` and
`AtenSizeIntOp` to see past `tensor_static_info_cast`,
`copy.to_vtensor`, and `copy.to_tensor` ops until it reaches the first
op whose result type contains size information for the requested
dimensions, with a maximum bound of 6 parent lookups to avoid indefinite
compilation times. All other encountered ops cause the canonicalizer to
give up.
Prior to this patch, the code in the `torch-simplify-shape-calculations`
pass iterated on the uses of an op's result while also modifying the
value. This caused the iterator to get invalidated, thus terminating
the loop early and producing incorrect IR. This patch makes use of
`llvm::make_early_inc_range()` to ensure that the iterator is not
invalidated while executing the loop body.
This commit does three things:
1. Reverts some of the shape lib changes merged in
https://github.com/llvm/torch-mlir/pull/844
2. Updates the signature of `aten.sum_dim_IntList` that was recently
updated in
23bdb570cf
3. Replaces `aten.zero.functional` with `aten.zero`, updated in 960758b0b7
`aten.select_scatter` op.
This commit adds:
1. Lowering of `aten.slice_scatter` op into `tensor.insert_slice`
op.
2. Decomposes the `aten.select_scatter` op into `aten.slice_scater`
op.
Signed-Off-By: Prateek Gupta <gprateek93@gmail.com>
The canonicalizer converts `torch.prim.dtype` ops into integer constants
for valid types, but the type may not be known until type refinement is
complete. However, type refinement cannot make progress until
`torch.prim.dtype` ops have been resolved to their corresponding integer
constants, thus creating a circular dependency.
This patch creates a tight coupling between type refinement and the
lowering of `torch.prim.dtype` ops by handling such ops as they are
encountered during type refinement. The unit test in this patch aims to
check whether the type refinement pass can now handle chains of
operations that alternate between type construction and type refinement.
This patch replaces the use of raw integers like 6, 4, etc. (that
represent PyTorch's scalar types) with named values from the ScalarType
enum (e.g. `ScalarType::Float`, `ScalarType::Long`, etc.) in code for
folding `prim.dtype` ops into numeric constants.
This patch isn't strictly a non-functional change, since its use of
`Torch::getScalarTypeForType()` implies that the input type has to be
one among the supported types, otherwise compilation will abort, whereas
previously, compilation proceeded without folding the unsupported data
type into a numeric constant.
A prior patch (63538de2) that added support for bfloat16 type did not
add the canonicalization pattern to fold `torch.prim.dtype` operations
on bfloat16 tensors into the integer constant 15. This patch fixes the
problem.
A previous fix to the handling of size-1 dims in
`aten.view` (https://github.com/llvm/torch-mlir/pull/962) resulted in
the wrong grouping of dimensions when size-1 dims where between two
dims of size greater than 1. This commit fixes that.
In the `pyhpc_turbulent_kinetic_energy` TorchBench benchmark, the shape
calculation occurs inside loops, but because `DropShapeCalculationsPass`
does not explicitly mark the Torch dialect as legal, the pass execution
fails.
This patch adds Torch to the list of legal dialects, and adds a test to
validate the translation.
This commit lowers `aten.matmul` to `linalg.BatchMatmul` under the
following conditions:
1. The result of matrix multiplication must have batch dimensions,
i.e., rank greater than 2.
2. The resultant matrix must have at most 1 dynamic batch dimension.
It also handles broadcasting of batch dimensions when batch dimensions
of the matrices are broadcastable.
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
This commit fixes the shape function for `index.Tensor`, adding
support for multiple index tensors and `None`s in the indices
list. This commit also adds support for input tensors of rank greater
than 1. The lowering for `index.Tensor` still has the the limitation
that only a single index tensor along the first dimension of the input
tensor is supported.
Prior to this patch, the torch dialect included `AtenTriuOp` for
computing the upper triangular part of the input matrix, but there was
no code for lowering the op to the linalg dialect.
This patch adds code to generate a `linalg.generic` operation that
compares indices (computed using `linalg.index`) to choose between zero
or the original value (using `arith.select`). The lowering fails if the
number of dimensions are less than two. This patch also adds a few
end-to-end tests.
* [MLIR][TORCH] Add folder for torch_c.from_i64 & torch_c.to_i64
* add unit tests for each individual fold
* fix failure of NumelZeroRankModule & TestMultipleTensorAndPrimitiveTypesReturn
The MacOS builders are having linking trouble with the extension library.
Until it's fixed, all support for op extensions is disabled. It should be
easy to restore once the issue is resolved.
The function `AffineMap::inferFromExprList` does not work if the first
vector of expressions is empty, because it uses these expressions to
obtain the context. This prevented `aten.permute` from working for
inputs of 0-rank. This commit adds support for 0-rank inputs.
PyTorch allows new operators to be registered dynamically in modules.
Torch-mlir already makes it fairly straightforward to add support for
new operators, and this commit just extends that support to allow new
PyTorch ops to come from a external module.
This does *not* allow ops to be dynamically loaded into torch-mlir.
Torch-mlir must still be compiled with support built-in.
Add a `_torch_mlir_custom_op_example` subpackage to `torch_mlir` which
registers an demonstration op. It will not be imported by default when
importing torch_mlir. It's strictly for testing and documentation.
Adds an end-to-end test for the `torch_mlir_custom_op_example::identity` op.
With all these changes, we should now be actively testing PyTorch extension
support with all future patches.
Now that upstream exposes them nicely, we can use them.
I noticed that we had added stuff into the upstream_shape_helpers.py
file (which was supposed to stay pristine), so some more shape functions
need to be upstreamed.
Going forward, all shape functions should be upstreamed similar to
https://github.com/pytorch/pytorch/pull/76889 instead of added in this
file.
This commit adds lowering of `aten.div.Tensor_mode` op.
This commit also fixes formatting for the test file elementwise.py.
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
This commit decomposes `aten.baddbmm` op into `aten.bmm`,
`aten.mul.Scalar`, and `aten.add.Tensor` op.
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
The patch bumped up the LLVM tag made manual fixes to the code in
`ShapeLibrary.cpp`. However, since that file is generated by the
`update_shape_lib.sh` script, its contents were reverted each time the
script was run. This patch fixes the problem by removing the manual
changes to that file.
This commit adds the decomposition of `aten.adaptive_avg_pool2d` op into
`aten.avg_pool2d` op. The current decomposition only supports cases where
input size is equal to the output size.
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
When compiling without assertions (i.e. in `NDEBUG` mode), a handful of
statements turn to NOPs, which results in warnings such as missing
return statement or unused variables and function. This patch replaces
such statements with `llvm_unreachable()`, which informs the compiler
about program termination regardless of the `NDEBUG` mode. This also
enables torch-mlir to be compiled using the flags `-Wall`, `-Wextra`,
`-Wpedantic`, and `-Werror`.
This patch adds support for the torch.linalg.vector_norm op to the torch
dialect, including the necessary shape function. It also extends the
conversion of reduction operators to support lowering of
AtenLinalgVectorNormOp, in addition to adding a handful of end-to-end
tests to validate the lowering.
There exist several opportunities to make this lowering optimal and
robust. For instance, in its current form, the translation does not
support ord = 0, +inf, or -inf. For L1 norms, we don't need to raise
each element to the power 1.0. Similarly, L2 norms could benefit from
strength reduction. Since the canonicalization pass is not able to
apply these optimizations, we should consider applying them during the
linalg lowering itself.
In addition to updating the llvm-project submodule, this patch also:
1. updates shape functions and tests so that `func` and `call`
operations refer to the `func` dialect
2. avoid duplicate registration of dialects
The op `aten.rand_like` was missing a shape function, unit tests, and
the `dtype` argument was being ignored in its decomposition. This
commit fixes all three things.
This commit adds support for aten.max_pool2d, aten.max_pool2d_with_indices,
and aten.avg_pool2d op for the cases where ceil_mode = true.
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
The preserve memory specifies that `If any of the input tensors is in channels_last format,
operator output should be in channels_last format` and hence can be
added as is in aten_empty_like op.
Fix the type promotion code for scalar only operation to return
TorchType which is the type tracked in ValueKnowledge.scalarType.
- Fix `getPromotedResultScalarType` to return Torch type.
- Add `getBuiltInTypeForTorchScalar` helper to convert scalar type
to builtin type before passing to the next level type promotion
helper `updateResultTypeState`.
- Add `setScalarType` helper to make setting ValueKnowledge.scalarType
easier.
This commit adds lowering of `aten.ge.float`, `aten.ge.float_int`,
`aten.ne.float_int`, `aten.gt.float_int` and `aten.ceil.float` op.
This commit also fixes formatting for the file scalar.py and scalar_comparison.py.
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
The main changes are:
- Added `ValueKnowledge.scalarType` to track scalar type information.
- Added `ValueKnowledge.kind` to indicate the value kind.
- Modified the meet and join helper functions. The ValueKnowledge has
slightly more complicated state now so the meet and join function need
to look at the `kind` field in addition to just the type field.
- This commit adds support for `aten.mean.dim` op.
- It also adds a new test script `stats.py` for statistics related ops.
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
This also has a fix for the adjustment of types of TupleConstruct
inputs, which I found when using this new functionality on a model.
Some scenarios in tracing create situations where the output of
TupleConstruct has a more refined type than the inputs.
This introduces a helper `adjustStaticInformationForValues` which
subsumes the `derefineValues` helper and the tensor static information
adjustment we were doing.
This commit decomposes `aten.to.dtype_layout` op into `aten.to.dtype` op.
This commit also fixes the formatting for the file type_conversion.py.
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
This commit adds lowering of `aten.masked_fill.Scalar` op.
This commit also fixes the formatting of the file constant_alloc.py.
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
This commit fixes the `ConstantPad2dStaticModule` test case by adding
the lowering of `aten.pad` operation. Previously the test case
mapped to `aten.constant_pad_nd` operation.
The `aten.pad` now decomposes into `aten.constant_pad_nd` operation.
Signed-Off-By: Prateek Gupta <prateek@nod-labs.com>
This patch updates the `torch_mlir::convertTensorToMlirElementsAttr()`
method to enable the creation of tensors whose base type is BFloat16.
This patch also adds a test to validate the IR generation, and it
updates the test for importing tensors of various types.
1. This commit adds lowering of "while-like" prim loop to scf.while
operation.
2. Adds lowering of "for-like" prim loops to scf.for operation.
Signed-Off-By: Prateek Gupta <prateek@nod-labs.com>
This commit adds lowering of `aten.ceil.float` op.
This commit also fixes formatting for the file scalar.py.
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
The updated LLVM code includes a patch to create bfloat16 array
attributes, thus enabling a different patch to torch-mlir to flesh out
support for the bfloat16 type.
Prior to this patch, the result type for several tensor operations could
only be float32, float64, or null. This patch adds bf16 to the list of
allowed result types.
Added the dynamic registration of return function to the execution
engine. This makes sure that different/multiple return types are supported.
Also, updated the .style.yapf indentation to 4.
* shape: add shape transfer function for aten.neg
Prior to this patch, the list of shape transfer functions did not
include `aten.neg`, which resulted in errors like below.
```
error: unsupported by backend lowering: tensor with unknown rank or dtype
note: see current operation: %0 = "torch.aten.neg"(%arg0) :
(!torch.vtensor<[256,256],f32>) -> !torch.vtensor<*,f32>
note: this is likely due to a missing shape transfer function in shape_lib_gen.py
```
This patch fixes the problem by adding a shape transfer function to
reflect the point-wise nature of this operation.
* linalg: add translation of aten.neg operation
This patch adds a translation rule to lower `aten.neg` operations on
tensors to an `arith.negf` operation wrapped inside a `linalg.generic`
operation. This patch also adds a rudimentary test.
This commit adds lowering of `aten::max_pool2d_with_indices_backward` op.
This commit also fixes formatting issues in basic.py.
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
This commit adds the following support to the op `nll_loss_backward`:
- `input` tensor can be rank-1
- `weight` parameter
- `reduction` parameter
- `target`, `grad_output`, `total_weight` can be rank-0
- Checks that input tensors are of the expected type
This commit adds support for multi-dimensional tensors as input to the
`_index_put_impl_` op. The support was to some degree already there,
since `ScatterOp` already supports multi-dimensional tensors. This
commit also adds a bit more error checking to `index_put` and
refactors the code for creating `ScatterOp`s to mimic the way one
would make a `Linalg::GenericOp`.
The issue was in the canonicalizer for torch.aten.ge.int -- in cases
where the operands were swapped, it would miscompile. This issue is
fixed and folding support generalized to `torch.aten.size.int < 0` as
well.
Fixes#716
This commit decomposes different variants of `aten.where.*` op into
`aten.where.Self` op. It covers `aten.where.Scalar`,
`aten.where.ScalarSelf` and `aten.where.ScalarOther` ops.
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
This commit decomposes `aten.new_empty` op into `aten.empty.memory_format` op.
This commit also made a dtype fix to the constant tensor allocation like ops.
Earlier the dtype for the result was inferred from the result type; now, it's
being evaluated as per the original definition of the op.
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
A recent PyTorch commit made ConstantPad2d call a helper function with a
`Union[int, float]` type annotated. This commit adds minimal support for
representing and dealing with that.
https://github.com/pytorch/pytorch/pull/73287
Changes:
- Adding support for `!torch.union<T1, T2, T3>`/`Torch::UnionType`,
along with the importer and CAPI code.
- Add support in isValidSubtype for union types.
- Adding a canonicalizer for `torch.derefine` to help simplify some code
that derefines to a UnionType (this also fixes#664).
There is still more work to do for really supporting UnionType well,
such as canonicalizing UnionType's so that they can be compared with
pointer equality.
The reified code to compute the shape of torch.aten.constant_pad_nd
uses negative indices when setting list elements. This was not
converted to a positive offset in one place in SimplifyShapeCalculations
which prevented computation of the static shape.
The logic in the rewriting phase had a bug in case of a read-only op
coming before mutation ops. The logic would use the op itself as the
"latest literal", but that is not correct, because later on we replace
the op itself with the *final* "latest literal", assuming that all uses
of the op have been rewritten -- that was working in general, except for
any read-only ops at the beginning.
Big thanks to @ljfitz for the tiny reproducer!
Fixes#704
This commit adds support for the cases of view op where the rank and
the shapes of the input and result are equal.
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
In order to make sure that the TorchToLinalg conversions leave the
graph in a valid state, the final result of the conversion has to be
casted to the result type of the op. This commit adds this cast to ops
that did not have it.
- This commit adds decomposition of `aten.dropout` op. It also covers the
training mode of the same op.
- It also adds lowering of `aten.sub.float` op.
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
The `assemblyFormat` stuff (which generates unrolled, per-op C++ code)
was taking up a lot of compile time, and all the ops are essentially
printed with the same logic. So this PR makes them all call the same
helper function. This is done by using
`let hasCustomAssemblyFormat = 1` and then implementing `FooOp::parse`
and `FooOp::print`.
Additionally, the `Generated*Ops.td` files are all collapsed into just
`GeneratedTorchOps.td` (there is no reason to have the files separate,
since the files are very large anyway so one is always having to search
within them -- editors don't care that the file to search is now a bit
bigger :) ).
This reduces TorchOpsODSGenerated.cpp compile time (which is now
GeneratedTorchOps.cpp) from 39 to 31 seconds on my machine. This is
actually less than I expected, but this PR is an overall cleanup to the
code anyway. The next step will be to introduce (better) functionality
upstream for sharding the TorchOps.cpp.inc file, so that we can truly
parallelize the O(#ops) costs. This is also necessary, because after
this PR, TorchDialect.cpp is now the slowest file to compile, due to the
`addOperations<... all the ops ...>` call, which needs to be shareded
too.
This commit adds the op `ValsemVariantAtenCopyOp` that represents
`AtenCopy_Op` without the underscore. This is needed to make sure
that the `ReduceOpVariants` pass turns the in-place op into an op
that takes value tensors as inputs, otherwise the
`MaximizeValueSemantics` pass will not be able to add value
semantics correctly.
This commit also adds the lowering of `ValsemVariantAtenCopyOp`.
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
This commit adds support for type refinement when
`torch.tensor_static_info_cast`s are involved, even when there are
users of the casted tensor that don't allow type refinements.
Originally the canonicalization pattern for
`torch.tensor_static_info_cast` would check if all the users of the
casted tensor allowed type refinements before making any changes. This
means that if at least one of the users did not allow type
refinements, the pattern would fail. This becomes an issue when doing
shape calculations because the calculations need the shape information
of each input tensor to be available before the calculation can be
simplified.
This commit fixes the 2nd and 3rd return types of the `aten.native_layer_norm`.
Previously the mean and rSTD were returned with reduction dims removed.
This commit fixes this and keeps the reduction dims of the results.
Signed-Off-By: Prateek Gupta <prateek@nord-labs.com>
The ODS-generated code included via the `TorchOps.cpp.inc` file takes a
very long time to compile. This PR isolates it into its own file so that
the build system can cache it.
This PR creates a new file `TorchOpsODSGenerated.cpp` just to include
the `TorchOps.cpp.inc` file. Doing so required moving to the "new" way
to define verifiers, since the static `verify` free functions in
TorchOps.cpp weren't accessible from the .inc file after it was moved to
`TorchOpsODSGenerated.cpp`.
On my machine, this drops the build time of TorchOps.cpp (such as when
iterating on a canonicalizer) from >40 seconds to <10 seconds.
10 seconds still isn't great though, but at least it isn't "go get a
coffee" type of waiting.
This commit adds the op `ValsemVariantAtenIndexPutImplOp` that represents
`Aten_IndexPutImpl_Op` without the underscore. This is needed to
make sure that the `ReduceOpVariants` pass turns the in-place op
into an op that takes value tensors as inputs, otherwise the
`MaximizeValueSemantics` pass will not be able to add value
semantics correctly.
This commit also adds the lowering of `ValsemVariantAtenIndexPutImplOp` op.
This commit also updates the `torch.bincount` op test cases.
The term "pseudo" is very vague and was getting confusing (I felt I had
to explain it in every comment referencing it). Instead, rework the
"pseudo" ops to instead be named:
- MLIR Syntax: `torch.valsem.*`
- C++ / ODS: `ValsemVariant*Op`
This makes it clear what the concept is, and avoids confusion with other
things that might be called "pseudo", since these are very specific and
should be 100% consistently named w.r.t. the non-valsem-variant ops that
they correspond to.
This is code that we always want to treat as "foreign" and not get too
comfortable using in many functions. One way to accomplish that is to
make it a bit clunkier to use.
Also, fix Utils.cpp to match the LLVM/MLIR coding conventions (don't
define functions inside namespaces -- prefer `using` and explicit
qualification).
This leads to much more succinct types in many cases:
```
!torch.list<!torch.int>
!torch.list<int>
!torch.tuple<!torch.list<!torch.int>, !torch.list<!torch.int>>
!torch.tuple<list<int>, list<int>>
!torch.optional<!torch.list<!torch.int>>
!torch.optional<list<int>>
!torch.list<list<list<tensor>>>
!torch.list<!torch.list<!torch.list<!torch.tensor>>>
```
I would like to take this further and allow omitting the `!torch.`
prefix in all cases, but that's harder -- for example, we currently use
`FuncOp` for functions, and so I don't think we can customize the
printing there. It seems like it will be a longer road to getting that
level of customization.
See the documentation in `docs/shape_lib.md` and
`docs/adding_a_shape_function.md` for an overview of the system.
This completely overhauls how we represent shape functions. In
particular, RefineTypes does not infer shapes anymore (only dtypes).
Shape functions are now written in (TorchScript'able) Python.
Recommended review order:
1. Read `docs/shape_lib.md` and `docs/adding_a_shape_function.md`.
1. Code and tests for ReifyShapeCalculations, DropShapeCalculations.
1. Code and tests for SimplifyShapeCalculations.
1. shape_lib_gen.py
1. Code and tests for new RefineTypes pass.
1. Random folders/canonicalizers in TorchOps.cpp and associated test in
`canonicalize.mlir`.
1. New ReadOnly trait inferred from the registry.
1. Any miscellaneous remaining stuff.
Example `-print-ir-after-all` for ElementwiseUnaryModule:
[IR lowering dump](https://gist.github.com/silvasean/e4dc8cbc8d00aac7819602e3cbd8e212).
Example `-print-ir-after-all` for ElementwiseBinaryModule:
[IR lowering dump](https://gist.github.com/silvasean/daf6860ecced732af3568af6b1899113).
This helps keep things organized and also exposes more parallelism to
the build system. It seems though that most of the compile time is
actually spent in the headers though, so the wall time doesn't decrease
as much as I had hoped (and now that the headers are being included
multiple times, the cpu time actually increases a lot, sadly -- will try
to dig into this).
This commit replaces the two rewrite patterns of
maximize-value-semantics with a single pattern that captures the
behavior of both as well as other edge cases previously not
supported. The new pattern works by first performing alias analysis on
a subgraph to see if pattern is applicable, then rewriting all
non-value tensors to value tensors in a single go.
This pass is added to lower ops, which can not be lowered
via the TorchToLinalg pass, such as `torch.bincount` op.
This pass also uses torch-mlir's TMTensor Dialect to lower the
complex ops.
Also add torch.bincount op lowering with the help of TMTensor dialect
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
This commit moves the helper function which are common across
different torch-mlir conversion passes into a common directory
Utils.
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
This commit adds support for integer type inputs for
`AtenMaxOp`, `AtenSumOp`, `AtenSumDimIntListOp`.
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
- This commit adds E2E support for `aten.rand_like` and
`aten.bernoulli_.Tensor` ops.
- The `aten.bernoulli(x)` was implemented as:
`aten.bernoulli(x) = rand_like(x) < 0.5`, assuming 0.5 as default
probability, whereas according to the pytorch documentation:
https://pytorch.org/docs/stable/generated/torch.bernoulli.html#torch.bernoulli
the input x in `aten.bernoulli(x)` is itself a tensor containing
probabilities to be used for drawing the binary random number.
- So this commit fixes the `aten.bernoulli(x)` implementation as:
`aten.bernoulli(x) = rand_like(x) < x`.
- It also fixes the case where the input to `aten.bernoulli_.float` is
an integer tensor. In this case the input must be casted to float type
before passing it as operand to `aten.rand_like` op.
`aten.bernoulli_.float(x, p) = rand_like(float(x)) < p`.
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
The view op allows for the new shape argument to have a -1 value for
one of the dimensions, and the op is expected to deduce the size of
that dimension by looking at the sizes of the other dimensions and
comparing it to the total number of elements in the original
tensor. This commit adds this functionality.