This is a first step towards formalizing the set of ops in our backend
contract. The goal is to eventually formalize `torch` dialect ops into 3
categories:
1. Legal in backend contract
2. Illegal in backend contract
3. Conditionally legal in backend contract
The "conditionally legal" set are the ops that we can optionally
decompose for backends.
This patch adds relevant pass options for this throughout the compiler,
in preparation for a new set of traits which will formalize this
classification.
This introduces a new pass LowerToBackendContract (better name very
welcome) which performs the bulk of the simplifications that we do,
such as
- shape refinement
- dtype refinement
- maximizing value semantics
- inlining global slots
- decomposing complex ops
The key difference from before is that it iterates the set of
transformations, which can help to break a number of "catch-22" issues
where one simplification depends on another, the latest example being
here:
https://github.com/llvm/torch-mlir/issues/1131
This also exposed that RefineTypes was sometimes crashing/asserting for
certain inputs. This commit hardens it a bit.
Summary of changes:
- Switch to C++17 (similar to https://reviews.llvm.org/D131348)
- Update MHLO to build with LLVM commit hash 061e0189
- Replace deprecated `hasValue()` and `getValue()` with `has_value()`
and `value()` respectively (https://reviews.llvm.org/D131349)
- Use `TypedAttr` (https://reviews.llvm.org/D130092)
- Use updated assembly format of `mhlo.compare` op (commit
d03ef01e70fbf9afd0fa1976fbb7ed31838929b3 in MHLO repo)
Rather than a per-global-slot initializer region, we now have one for
the whole module. For example, it might look like this:
```
torch.global_slot "private" @tensor : !torch.tensor
torch.global_slot "private" @list : !torch.list<tensor>
torch.global_slot.module_initializer {
%0 = torch.tensor.literal(dense<0.0> : tensor<f32>) : !torch.tensor
%1 = torch.prim.ListConstruct %0 : (!torch.tensor) -> !torch.list<tensor>
torch.initialize.global_slots [
@tensor(%0 : !torch.tensor)
@list(%1 : !torch.list<tensor>)
]
}
```
This new structure allows GlobalizeObjectGraph to create the initializer in a
much simpler way, avoiding the need to reason about whether different slots
alias each other. Reasoning about whether slots alias each other now is the
responsibility of InlineGlobalSlots, which has to do a much more complicated
analysis, implemented using MLIR's dataflow analysis framework.
Recommended review order:
- Check out the new IR constructs in the .mlir files of various passes
- Op definitions (*.td)
- Changes to GlobalizeObjectGraph pass.
- InlineGlobalSlots pass (~total rewrite)
- Misc changes:
- Moving torchMlirAdjustStaticInformation for sharing with C++ code.
- EraseModuleInitializer pass
To make this a bit nicer, it would be good to have a `torch.module` op
with an initializer region attached. That would be more invasive though.
This change has highlighted certain aspects of our project layering
which are worth calling out. None of our backends can handle global
slots, so we enforce that there are no global slots before backend
lowering. At an earlier stage in the project, we had aspirations of
transparently handling mutable global state and such, but for reasons
described below, that is no longer a goal. So really global slots should
be seen as a progressive lowering step as part of inlining all the
IValue's in the original program (GlobalizeObjectGraph is also one such
step).
Over time, with insights from work like IREE-JAX, it has become clear
that there isn't a reliable programming model we can compile for users
where we just transparently handle mutable global state (and some other
things, like lists and dictionaries). There is a need for an "outer
program" that orchestrates more restricted subroutines of the kind we
can handle in our compile flow here. The benefit of that is that it
decouples considerations like shapes, dtypes, etc. from the program
constructs used in the outer program. As long as the outer program can
efficiently invoke (pipelining/async/etc.) high-performance
data-parallel numerical subroutines of the kind we compile in our flow
here, then there is a complete programming model. This is also
consistent with the direction of upstream PyTorch which is becoming more
tracing-based (which inherently loses a lot of program structure, which
then has to be applied back with an "outer program" orchestrating the
traced subroutines).
PyTorch recently added support for `dim=None` in the `torch.var`
(5ca9b2b6fa)
and `torch.std`op (eb0e30e0bc).
This commit adds the corresponding support in torch-mlir.
Signed-Off By: Vivek Khandelwal<vivek@nod-labs.com>
- Pruned number of xfailed e2e LTC tests from 305 to 134
- Reviewed every failure to ensure the error genuinely warrants an xfail
- Fixed bug where non-tensor outputs of LTC computation had `.to('cpu')` called, which caused a failure and inflated the xfail count
- Fixed bug with `HBC_basic` test where a constant tensor was created in its constructor without being declared as a buffer, which prevented the device from being updated when the parent `torch.nn.Module` got moved to the `lazy` device
- Note that this test is still xfail'd due to some unsupported ops. Left a comment about some potential issues that may arise if it gets reenabled in the future
- Updated autogen `GeneratedTorchOps.td` to reflect the latest set of supported ops
- Renamed `aten.zero.functionalization` to `aten.zero` to reflect upstream PyTorch changes
* Assume zero rank tensors are scalar
* Run RefineTypes pass on JIT Graph
* Rollback assumption that zero rank tensors are scalar
* Set numSizes to -1 for non-ranked tensors
* Rename RefineTypes to RefineTupleTypes
- Includes a canonicalizer for `aten.add.t`needed for successfully lowering the shape function
- Only offers support for statically sized index tensors when there is more than one
- Dynamic shape support remains for single indexing tensors
This commit adds verifiers to the ops `ToBuiltinTensorOp` and
`FromBuiltinTensorOp` that make sure that the input and output have
the same shape and data type.
In the interest of merging upstream LLVM quickly, a previous patch
(7f08169) updated the torch-mlir build to register all dialects and
passes through Python bindings. This patch limits the dialects and
passes to only those that are used in torch-mlir.
Key to this change are the removal of
`MLIRPythonExtension.RegisterEverything` and the introduction of a new
Python module (`_mlir_libs/_site_initialize_0.py`), where we register
the dialects and passes used by torch-mlir.
This patch makes some rudimentary changes to torch-mlir's use of MLIR
Python bindings to work with the most recent LLVM code. We can perhaps
do better by being more selective in what we link against, instead of
using `MLIRPythonExtension.RegisterEverything`.
This commit adds the decomposition for `aten.var.dim` op.
This commit also make changes in the decomposition for `aten.var` op.
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
This patch adds a new pass `torch-verify-conversion-to-value-semantics`,
which looks for non-value semantics tensors to catch such tensors early
during compilation.
This pass requires `torch-refine-public-return` pass to ensure that
return operations are updated to use value tensors, followed by the
canonicalize pass to remove any dead ops that may use or produce
non-value tensors.
This commit does three things:
1. Reverts some of the shape lib changes merged in
https://github.com/llvm/torch-mlir/pull/844
2. Updates the signature of `aten.sum_dim_IntList` that was recently
updated in
23bdb570cf
3. Replaces `aten.zero.functional` with `aten.zero`, updated in 960758b0b7
`aten.select_scatter` op.
This commit adds:
1. Lowering of `aten.slice_scatter` op into `tensor.insert_slice`
op.
2. Decomposes the `aten.select_scatter` op into `aten.slice_scater`
op.
Signed-Off-By: Prateek Gupta <gprateek93@gmail.com>
TorchScript nodes like `prim::Load` and `prim::Store` aren't supported
in torch-mlir because they can't be lowered to backends, but such nodes
can occur in the TorchScript IR.
This patch adds a rudimentary translation from such nodes to
corresponding ops in the Torch dialect. Since we expected such nodes to
go away during lowering because of the SymbolDCE pass, this patch does
not add code to lower these ops beyond the Torch dialect.
* [MLIR][TORCH] Add folder for torch_c.from_i64 & torch_c.to_i64
* add unit tests for each individual fold
* fix failure of NumelZeroRankModule & TestMultipleTensorAndPrimitiveTypesReturn
The MacOS builders are having linking trouble with the extension library.
Until it's fixed, all support for op extensions is disabled. It should be
easy to restore once the issue is resolved.
PyTorch allows new operators to be registered dynamically in modules.
Torch-mlir already makes it fairly straightforward to add support for
new operators, and this commit just extends that support to allow new
PyTorch ops to come from a external module.
This does *not* allow ops to be dynamically loaded into torch-mlir.
Torch-mlir must still be compiled with support built-in.
Add a `_torch_mlir_custom_op_example` subpackage to `torch_mlir` which
registers an demonstration op. It will not be imported by default when
importing torch_mlir. It's strictly for testing and documentation.
Adds an end-to-end test for the `torch_mlir_custom_op_example::identity` op.
With all these changes, we should now be actively testing PyTorch extension
support with all future patches.
This commit adds lowering of `aten.div.Tensor_mode` op.
This commit also fixes formatting for the test file elementwise.py.
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
This commit decomposes `aten.baddbmm` op into `aten.bmm`,
`aten.mul.Scalar`, and `aten.add.Tensor` op.
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
This commit adds the decomposition of `aten.adaptive_avg_pool2d` op into
`aten.avg_pool2d` op. The current decomposition only supports cases where
input size is equal to the output size.
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
This patch adds support for the torch.linalg.vector_norm op to the torch
dialect, including the necessary shape function. It also extends the
conversion of reduction operators to support lowering of
AtenLinalgVectorNormOp, in addition to adding a handful of end-to-end
tests to validate the lowering.
There exist several opportunities to make this lowering optimal and
robust. For instance, in its current form, the translation does not
support ord = 0, +inf, or -inf. For L1 norms, we don't need to raise
each element to the power 1.0. Similarly, L2 norms could benefit from
strength reduction. Since the canonicalization pass is not able to
apply these optimizations, we should consider applying them during the
linalg lowering itself.
Fix the type promotion code for scalar only operation to return
TorchType which is the type tracked in ValueKnowledge.scalarType.
- Fix `getPromotedResultScalarType` to return Torch type.
- Add `getBuiltInTypeForTorchScalar` helper to convert scalar type
to builtin type before passing to the next level type promotion
helper `updateResultTypeState`.
- Add `setScalarType` helper to make setting ValueKnowledge.scalarType
easier.
This commit adds lowering of `aten.ge.float`, `aten.ge.float_int`,
`aten.ne.float_int`, `aten.gt.float_int` and `aten.ceil.float` op.
This commit also fixes formatting for the file scalar.py and scalar_comparison.py.
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
The main changes are:
- Added `ValueKnowledge.scalarType` to track scalar type information.
- Added `ValueKnowledge.kind` to indicate the value kind.
- Modified the meet and join helper functions. The ValueKnowledge has
slightly more complicated state now so the meet and join function need
to look at the `kind` field in addition to just the type field.
This also has a fix for the adjustment of types of TupleConstruct
inputs, which I found when using this new functionality on a model.
Some scenarios in tracing create situations where the output of
TupleConstruct has a more refined type than the inputs.
This introduces a helper `adjustStaticInformationForValues` which
subsumes the `derefineValues` helper and the tensor static information
adjustment we were doing.
This commit decomposes `aten.to.dtype_layout` op into `aten.to.dtype` op.
This commit also fixes the formatting for the file type_conversion.py.
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
This commit fixes the `ConstantPad2dStaticModule` test case by adding
the lowering of `aten.pad` operation. Previously the test case
mapped to `aten.constant_pad_nd` operation.
The `aten.pad` now decomposes into `aten.constant_pad_nd` operation.
Signed-Off-By: Prateek Gupta <prateek@nod-labs.com>
This commit adds lowering of `aten.ceil.float` op.
This commit also fixes formatting for the file scalar.py.
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
The updated LLVM code includes a patch to create bfloat16 array
attributes, thus enabling a different patch to torch-mlir to flesh out
support for the bfloat16 type.
This commit adds lowering of `aten::max_pool2d_with_indices_backward` op.
This commit also fixes formatting issues in basic.py.
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
This commit adds support for multi-dimensional tensors as input to the
`_index_put_impl_` op. The support was to some degree already there,
since `ScatterOp` already supports multi-dimensional tensors. This
commit also adds a bit more error checking to `index_put` and
refactors the code for creating `ScatterOp`s to mimic the way one
would make a `Linalg::GenericOp`.
This commit decomposes different variants of `aten.where.*` op into
`aten.where.Self` op. It covers `aten.where.Scalar`,
`aten.where.ScalarSelf` and `aten.where.ScalarOther` ops.
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
This commit decomposes `aten.new_empty` op into `aten.empty.memory_format` op.
This commit also made a dtype fix to the constant tensor allocation like ops.
Earlier the dtype for the result was inferred from the result type; now, it's
being evaluated as per the original definition of the op.
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
A recent PyTorch commit made ConstantPad2d call a helper function with a
`Union[int, float]` type annotated. This commit adds minimal support for
representing and dealing with that.
https://github.com/pytorch/pytorch/pull/73287
Changes:
- Adding support for `!torch.union<T1, T2, T3>`/`Torch::UnionType`,
along with the importer and CAPI code.
- Add support in isValidSubtype for union types.
- Adding a canonicalizer for `torch.derefine` to help simplify some code
that derefines to a UnionType (this also fixes#664).
There is still more work to do for really supporting UnionType well,
such as canonicalizing UnionType's so that they can be compared with
pointer equality.
The reified code to compute the shape of torch.aten.constant_pad_nd
uses negative indices when setting list elements. This was not
converted to a positive offset in one place in SimplifyShapeCalculations
which prevented computation of the static shape.
- This commit adds decomposition of `aten.dropout` op. It also covers the
training mode of the same op.
- It also adds lowering of `aten.sub.float` op.
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
The `assemblyFormat` stuff (which generates unrolled, per-op C++ code)
was taking up a lot of compile time, and all the ops are essentially
printed with the same logic. So this PR makes them all call the same
helper function. This is done by using
`let hasCustomAssemblyFormat = 1` and then implementing `FooOp::parse`
and `FooOp::print`.
Additionally, the `Generated*Ops.td` files are all collapsed into just
`GeneratedTorchOps.td` (there is no reason to have the files separate,
since the files are very large anyway so one is always having to search
within them -- editors don't care that the file to search is now a bit
bigger :) ).
This reduces TorchOpsODSGenerated.cpp compile time (which is now
GeneratedTorchOps.cpp) from 39 to 31 seconds on my machine. This is
actually less than I expected, but this PR is an overall cleanup to the
code anyway. The next step will be to introduce (better) functionality
upstream for sharding the TorchOps.cpp.inc file, so that we can truly
parallelize the O(#ops) costs. This is also necessary, because after
this PR, TorchDialect.cpp is now the slowest file to compile, due to the
`addOperations<... all the ops ...>` call, which needs to be shareded
too.
This commit adds the op `ValsemVariantAtenCopyOp` that represents
`AtenCopy_Op` without the underscore. This is needed to make sure
that the `ReduceOpVariants` pass turns the in-place op into an op
that takes value tensors as inputs, otherwise the
`MaximizeValueSemantics` pass will not be able to add value
semantics correctly.
This commit also adds the lowering of `ValsemVariantAtenCopyOp`.
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
The ODS-generated code included via the `TorchOps.cpp.inc` file takes a
very long time to compile. This PR isolates it into its own file so that
the build system can cache it.
This PR creates a new file `TorchOpsODSGenerated.cpp` just to include
the `TorchOps.cpp.inc` file. Doing so required moving to the "new" way
to define verifiers, since the static `verify` free functions in
TorchOps.cpp weren't accessible from the .inc file after it was moved to
`TorchOpsODSGenerated.cpp`.
On my machine, this drops the build time of TorchOps.cpp (such as when
iterating on a canonicalizer) from >40 seconds to <10 seconds.
10 seconds still isn't great though, but at least it isn't "go get a
coffee" type of waiting.
This commit adds the op `ValsemVariantAtenIndexPutImplOp` that represents
`Aten_IndexPutImpl_Op` without the underscore. This is needed to
make sure that the `ReduceOpVariants` pass turns the in-place op
into an op that takes value tensors as inputs, otherwise the
`MaximizeValueSemantics` pass will not be able to add value
semantics correctly.
This commit also adds the lowering of `ValsemVariantAtenIndexPutImplOp` op.
This commit also updates the `torch.bincount` op test cases.
The term "pseudo" is very vague and was getting confusing (I felt I had
to explain it in every comment referencing it). Instead, rework the
"pseudo" ops to instead be named:
- MLIR Syntax: `torch.valsem.*`
- C++ / ODS: `ValsemVariant*Op`
This makes it clear what the concept is, and avoids confusion with other
things that might be called "pseudo", since these are very specific and
should be 100% consistently named w.r.t. the non-valsem-variant ops that
they correspond to.
This is code that we always want to treat as "foreign" and not get too
comfortable using in many functions. One way to accomplish that is to
make it a bit clunkier to use.
Also, fix Utils.cpp to match the LLVM/MLIR coding conventions (don't
define functions inside namespaces -- prefer `using` and explicit
qualification).
This leads to much more succinct types in many cases:
```
!torch.list<!torch.int>
!torch.list<int>
!torch.tuple<!torch.list<!torch.int>, !torch.list<!torch.int>>
!torch.tuple<list<int>, list<int>>
!torch.optional<!torch.list<!torch.int>>
!torch.optional<list<int>>
!torch.list<list<list<tensor>>>
!torch.list<!torch.list<!torch.list<!torch.tensor>>>
```
I would like to take this further and allow omitting the `!torch.`
prefix in all cases, but that's harder -- for example, we currently use
`FuncOp` for functions, and so I don't think we can customize the
printing there. It seems like it will be a longer road to getting that
level of customization.
See the documentation in `docs/shape_lib.md` and
`docs/adding_a_shape_function.md` for an overview of the system.
This completely overhauls how we represent shape functions. In
particular, RefineTypes does not infer shapes anymore (only dtypes).
Shape functions are now written in (TorchScript'able) Python.
Recommended review order:
1. Read `docs/shape_lib.md` and `docs/adding_a_shape_function.md`.
1. Code and tests for ReifyShapeCalculations, DropShapeCalculations.
1. Code and tests for SimplifyShapeCalculations.
1. shape_lib_gen.py
1. Code and tests for new RefineTypes pass.
1. Random folders/canonicalizers in TorchOps.cpp and associated test in
`canonicalize.mlir`.
1. New ReadOnly trait inferred from the registry.
1. Any miscellaneous remaining stuff.
Example `-print-ir-after-all` for ElementwiseUnaryModule:
[IR lowering dump](https://gist.github.com/silvasean/e4dc8cbc8d00aac7819602e3cbd8e212).
Example `-print-ir-after-all` for ElementwiseBinaryModule:
[IR lowering dump](https://gist.github.com/silvasean/daf6860ecced732af3568af6b1899113).
This pass is added to lower ops, which can not be lowered
via the TorchToLinalg pass, such as `torch.bincount` op.
This pass also uses torch-mlir's TMTensor Dialect to lower the
complex ops.
Also add torch.bincount op lowering with the help of TMTensor dialect
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
This commit moves the helper function which are common across
different torch-mlir conversion passes into a common directory
Utils.
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
- This commit adds E2E support for `aten.rand_like` and
`aten.bernoulli_.Tensor` ops.
- The `aten.bernoulli(x)` was implemented as:
`aten.bernoulli(x) = rand_like(x) < 0.5`, assuming 0.5 as default
probability, whereas according to the pytorch documentation:
https://pytorch.org/docs/stable/generated/torch.bernoulli.html#torch.bernoulli
the input x in `aten.bernoulli(x)` is itself a tensor containing
probabilities to be used for drawing the binary random number.
- So this commit fixes the `aten.bernoulli(x)` implementation as:
`aten.bernoulli(x) = rand_like(x) < x`.
- It also fixes the case where the input to `aten.bernoulli_.float` is
an integer tensor. In this case the input must be casted to float type
before passing it as operand to `aten.rand_like` op.
`aten.bernoulli_.float(x, p) = rand_like(float(x)) < p`.
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
This commit adds the invariant to the op `torch.overwrite.tensor.contents` that
both of its operands have the same shape and size. In order to
maintain the invariant, special handling of this op is added to the
`RefineTypes` pass.
- This commit decomposes the `aten.batch_norm` op into the
`aten.native_batch_norm` op, instead of lowering it to the
`linalg.generic` op.
- It also adds run-time asserts in the `aten.native_batch_norm` lowering
to make sure that the shape of the weight, bias, running_mean, and
running_var must match the num of features.
- Since the `aten.native_batch_norm` op is not supported at TOSA backend,
all the modules that are dependent on the `aten.native_batch_norm` op
will fail and therefore they should be removed from the TOSA `passing`
set.
- It also moves `checkNotNone` to utility.
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
This commit adds the op `PseudoAtenFillScalarOp` that represents
`AtenFill_ScalarOp` without the underscore. The approach is the same
as in commit dd998fa4d4.
Adding this op allows for a simpler and more consistent version of the
`empty` and `empty_like` op e2e tests.
- This commit adds lowering of `aten.le.Scalar` and `aten.ge.Scalar` ops
as a part of `convert-torch-to-linalg` pass.
- It also creates a new test script `elementwise_comparison.py` for all
element-wise comparison ops.
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
This commit adds the op `PseudoAtenBernoulliFloatOp` that represents
`AtenBernoulli_FloatOp` without the underscore. This is needed to make
sure that the `ReduceOpVariants` pass turns the in-place op into an op
that takes value tensors as inputs, otherwise the
`MaximizeValueSemantics` pass will not be able to add value semantics
correctly.
- This commit adds lowering of `aten.Bool.Tensor` and
`aten.Float.Tensor` op as a part of `convert-torch-to-linalg` pass.
- It also adds support for returning bool types.
- It also fixes lowering of the `aten.Int.Tensor` op for non-zero rank
input tensors.
- If a scalar number is converted to a 0-d tensor and passed on to the
`aten.Float.Tensor` op, it folds to the scalar number.
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
- This commit adds `aten.assert` op in the Torch dialect.
- The `aten.assert` op is lowered to `mlir::Assert` op.
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
- This commit adds support for `aten.native_batch_norm` operation.
- The current implementation only supports inference mode of
`aten.native_batch_norm` op.
Signed-Off-By: Gaurav Shukla <gaurav@nod-labs.com>
The lowering of aten::nll_loss_backward op has been added
from torch to linalg dialect. The changes has been made as
a part of -torch-convert-to-linalg pass.
Signed-off-by: Prashant Kumar prashant@nod-labs.com
This PR include the following pieces:
- Add torch `Generator` type. `Generator` type is converted to i64 in
refbackend type converter.
- Add seed managment support for the default global generator.
`torch_c.getNextSeed` op is used to get the seed. On refbackend, the
`torch_c.getNextSeed` is lowered to load/store from [0] of global
variable `default_generator` memref<i64> in `InsertRngGlobals` pass.
- Add `aten.uniform_` and testing as an example op for RNG ops. Add
`torch.pseudo.aten.uniform` op. It has the same operands and return as
the `aten.uniform_` from the op registry except for value semantics.
The added e2e maxpool testcase from #545 was not getting a static shape
due to an unfolded prim.If when RefineTypes was called. This was because
of unfolded torch.iaten.__is__ and torch.prim.unchecked_cast operators
with torch.derefine operands.
- Common code as TF repository, being moved to MLIR core.
- Will support further legalizations to be published.
Signed-off-by: Suraj Sudhir <suraj.sudhir@arm.com>
This commit adds the aten ops which do not require random number
support to aten dialect. This commit also adds some of the missing
torch types.
Signed-Off-By: Prateek Gupta <prateek@nod-labs.com>
Note that to enable folding of the code coming from an example
like the ConstantPad2dStaticModule e2e test, support for other
operations had to be added/improved:
- aten::neg.int
- aten::eq.float
- aten::eq.str
- prim::Uninitialized
This commit adds lowering of `aten.threshold` op
This commit adds lowering of `aten.threshold_backward` op
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
- This commit adds E2E support for `aten.ones_like` and
`aten.zeros_like` ops.
- Adds support for non-None `dtype` argument of `aten.empty_like` op.
- All the unit test cases related to constant tensor allocation like ops
are moved to a different file named `constant_alloc.py`.
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
This commit adds lowering of `aten.arange.start_step` op.
This commit decomposes `aten.arange` and `aten.arange.start` into
`aten.arange.start_step` op.
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
- It folds `aten.to.dtype` when the input tensor type and result type
are exactly same.
- It folds `aten.view` when the rank of both the input tensor type and
result type is unity.
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
We only handle the expanding OR collapsing cases, we do not handle
expanding And collapsing happening at the same time or cases where
it's neither collapsing nor expanding like view of [2,3] for
3x2 tensor.
It's assumed that if a shape list element is got from
`aten.size(tensor, dim)` the corresponding dim is not splitted or
collapsed. This assumption makes it easier to deal with dynamic shapes.
- Added E2E support for `aten.eq.Tensor` and `aten.lt.Tensor` ops. Both
the operands are expected to be of the same type, i.e., type promotion
is not addressed as a part of this commit.
- Added E2E support for `aten.eq.Scalar` and `aten.lt.Scalar` ops.
Tensor operand type to Scalar operand type promotion has not been
handled in this commit.
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
This commit fixes the naming of results in the torch ODS generator
when dealing with multiple results. In particular, this commit appends
an index to each result name to guarantee that they are all unique.
This patch adds the where, gt, bucketize and reshape
ops to the Torch dialect. These ops are present in the histogram
calibration module.
TEST: Successfully lowers ops to Torch dialect in histogram module.
This commit adds support for aten.native_layer_norm operation. Here
the previous code for aten.layer_norm is tweaked a little bit to
accomodate both mean and variance values alongwith the layer norm
value. This commit also adds decomposition of aten.layer_norm into
aten.native_layer_norm, which was previously getting lowered directly
to linalg.
Signed-Off-By: Prateek Gupta<prateek@nod-labs.com>
This commit adds lowering of `aten.squeeze.dim` op into
`linalg.TensorCollapseShape` op. Here, the dim(th) dimension of the
input tensor is not supposed to be dynamic.
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
This commit adds lowering of `aten.gt.Scalar` and `aten.where.self` as a
part of element-wise ops lowering.
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
The op lowering has been added as a part of `torch-lower-to-linalg`
pass. This takes care of ignore_index but the weight and reduction
operand is still to be accounted for.
Signed-off-by: Prashant Kumar <prashant@nod-labs.com>
- Supports variants with multiple dims, one dim, all dime
- Leverages legalize_common and legalize_utils code from
TensorFlow-TOSA work
Signed-off-by: Suraj Sudhir <suraj.sudhir@arm.com>
There is an op name change that requires trivial changes.
Also, some of the warning has been fixed.
Signed-off-by: Prashant Kumar <prashant@nod-labs.com>
This commit adds lowering of `aten.Squeeze` op into
`linalg.TensorCollapseShape` op. The size 1 dynamic dimensions are not
handled as a part of this commit.
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
This is to fold the common pattern from Bert inference like:
```
%111 = torch.prim.NumToTensor.Scalar %110 : !torch.int ->
!torch.vtensor<[],si64>
%112 = torch.aten.Int.Tensor %111 : !torch.vtensor<[],si64> ->
!torch.int
```
aten.log_softmax_back_data op lowering and required
tests has been added. Some NFC have also been added.
Signed-off-by: Prashant Kumar prashant@nod-labs.com
This commit adds lowering of `aten.mul.Scalar` and also adds
decomposition of `aten.addmm` to `aten.mul.Scalar`, `aten.add.Tensor`
and `aten.mm` ops.
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
This commit adds new operation `aten.gelu_backward` in the aten
dialect and adds lowering of this operation from aten to linalg.
Signed-Off-By: Prateek Gupta <prateek@nod-labs.com>
This change is to unblock the work of some backprop ops returning more
than one tensors. We will need to think of a more scalable approach
in the future if more flexible return types combinations are needed.