This moves the bulk of the Python code (including the Torch interop)
from `frontends/pytorch` into `torch-mlir/TorchPlugin`. This also
required reconciling a bunch of other Python-related stuff, like the
`torch` dialects.
As I did this, it was simpler to just remove all the old numpy/basicpy
stuff: we were going to delete it anyway, and that was faster than
debugging an intermediate state that would only last O(days).
torch-mlir has two top-level python packages (built into the
`python_packages` directory):
- `torch_mlir_dialects`: `torch` dialect Python bindings (does not
depend on PyTorch). This also involves building the aggregate CAPI for
`torch-mlir`.
- `torch_mlir`: bindings to the part of the code that links against
PyTorch (or C++ code that transitively does).
Additionally, there remain two more Python packages in npcomp (but
outside `torch-mlir`):
- `npcomp_torch`: Contains the e2e test framework and testing configs
that plug into RefBackend and IREE.
- `npcomp_core`: Contains the low-level interfaces to RefBackend and
IREE that `npcomp_torch` uses, along with its own
`MLIR_PYTHON_PACKAGE_PREFIX=npcomp.` aggregation of the core MLIR
python bindings. (all other functionality has been stripped out)
After all the basicpy/numpy deletions, the `npcomp` C++ code is now very
tiny. It basically just contains RefBackend and the `TorchConversion`
dialect/passes (e.g. `TorchToLinalg.cpp`).
Correspondingly, there are now 4 main testing targets paralleling the
Python layering (which reflects the deeper underlying dependency
structure):
- `check-torch-mlir`: checks the `torch-mlir` pure MLIR C++ code.
- `check-torch-mlir-plugin`: checks the code in `TorchPlugin` (e.g.
TorchScript import)
- `check-frontends-pytorch`: Checks the little code we have in
`frontends/pytorch` -- mainly things related to the e2e framework
itself.
- `check-npcomp`: Checks the pure MLIR C++ code inside npcomp.
There is a target `check-npcomp-all` that runs all of them.
The `torch-mlir/build_standalone.sh` script does a standalone build of
`torch-mlir`.
The e2e tests (`tools/torchscript_e2e_test.sh`) are working too.
The update_torch_ods script now lives in
`torch-mlir/build_tools/update_torch_ods.sh` and expects a standalone
build.
This change also required a fix upstream related to cross-shlib Python
dependencies, so we also update llvm-project to
8dca953dd39c0cd8c80decbeb38753f58a4de580 to get
https://reviews.llvm.org/D109776 (no other fixes were needed for the
integrate, thankfully).
This completes most of the large source code changes. Next will be
bringing the CI/packaging/examples back to life.
We were not filling the `outs` with the neutral element of the
reduction, which resulted in reading uninitialized values (we were
getting lucky that sometimes the uninitialized buffers were all zeros).
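To illustrate why the initial value matters, here is a minimal pure-Python sketch. It is not the actual lowering: `reduce_into`, `NEUTRAL`, and the sample buffers are hypothetical names used only for illustration. The one-element list plays the role of `outs`; if it is not first filled with the reduction's neutral element, whatever was left in the buffer leaks into the result.

```python
import operator

# Neutral (identity) elements for a few common reductions.
NEUTRAL = {
    operator.add: 0.0,
    operator.mul: 1.0,
    max: float("-inf"),
}


def reduce_into(out, values, combine, fill=True):
    """Accumulate `values` into the one-element buffer `out`.

    `out` stands in for the `outs` operand. With fill=False its stale
    contents are used as the starting value, mimicking the bug.
    """
    if fill:
        out[0] = NEUTRAL[combine]
    for v in values:
        out[0] = combine(out[0], v)
    return out[0]


stale = [123.0]  # pretend this buffer holds leftover garbage
buggy = reduce_into(list(stale), [1.0, 2.0, 3.0], operator.add, fill=False)
fixed = reduce_into(list(stale), [1.0, 2.0, 3.0], operator.add)
```

With a buffer that happens to contain zeros, the unfilled sum would look correct by accident, which matches the "getting lucky" behavior described above.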
Also,
- Slight tweak to error messages in the e2e framework.
- builder.getSymbolRefAttr is gone.
- OpAsmOpInterface's getAsmResultNames method needs explicit override
- a bunch of churn for builtin.func needing to be made explicit (and
sometimes implicit?)
- operation printers no longer need to print the operation name
themselves.
- snuck in beneficial trivial addition to TmpDeleteDeadIREEListsPass to
test a particular upstream change e2e with my local patchset.
* Parts of the MLIR API are now directly exported under the npcomp module (i.e. `npcomp.ir`, etc).
* Has required fixes for https://reviews.llvm.org/D108489
* Deletes npcomp.tracing vs fixing it because it was a very early experiment that will not be carried forward.
* This makes the npcomp python distribution completely standalone and separate from an mlir installation.
* Makes most of npcomp itself relocatable for future use as a library.
* Most things are a namespace package now. In the future we can s/torch_mlir/npcomp.frontends.torch/ and have it layer properly.
With the following changes the compilation can continue until
RefineTypes pass:
- Add operators without ODS into `torch_ods_gen.py`
- Add some new optional and list types in `TorchTypes.td`
- Add some folders for aten int type comparator ops
- Modify GlobalizeObjectGraph.cpp. For global slots that are not used,
don't check whether an aliased value is stored in more than one global
slot. This works around a failure where the same tensor is stored
in multiple "version" slots which are not used.
Most of the change is in the reporting code to give error messages that
are useful, and adjusting TraceItem to be semantically correct w.r.t.
Python's modeling of return values.
This allows writing a test like `ListLiteralModule_basic` for list
functionality, which we will soon be hooking up to IREE.
The IR for that test currently gets this far:
```
builtin.func @forward(%arg0: f64) -> !torch.list<!torch.float> {
%0 = torch.from_f64 %arg0
%1 = torch.prim.ListConstruct %0, %0 : (!torch.float, !torch.float) -> !torch.list<!torch.float>
return %1 : !torch.list<!torch.float>
}
```
It should be sufficient to just add a conversion of
`torch.prim.ListConstruct` (+ relevant type conversion) to necessary
IREE primitives.
For lists of *tensors* (rather than scalar floats), it gets more
complicated, as we need to deal with changing their element type to
ValueTensorType first (by default, they will all be NonValueTensorType).
It seems that IREE might have a type we can lower into for non-value
tensors as well, TBD.
This includes the following changes to import the MT model into MLIR. There
is still a lot of work to do for actual compilation.
- Add `torch.dict<>`, `torch.any`, `torch.number` types
- Add `torch.prim.DictConstruct` op
- Fix `torch.prim.TupleConstruct` op assembly format to include resulting types
The tests use the same (pure-Python) test framework as the
normal torchscript_e2e_test.sh, but the tests are added in
`build_tools/torchscript_e2e_heavydep_tests` instead of
`frontends/pytorch/e2e_testing/torchscript`. Any needed dependencies can
easily be configured in generate_serialized_tests.sh.
We add an initial machine translation model with a complex set of
dependencies to seed the curriculum there. I verified that this model
gets to the point of MLIR import (it fails there with a segfault due to
not being able to import the "Any" type).
This required moving a few files from the `torch_mlir` Python module
into multiple modules to isolate the code that depends on our C++
extensions (which now live in `torch_mlir` and
`torch_mlir_torchscript_e2e_test_configs`) from the pure Python code
(which now lives in `torch_mlir_torchscript`). This is an entirely
mechanical change, and lots of imports needed to be updated.
The dependency graph is:
```
       torch_mlir_torchscript_e2e_test_configs
                  /              |
                 /               |
                /                |
               V                 V
torch_mlir_torchscript       torch_mlir
```
The `torch_mlir_torchscript_e2e_test_configs` are then dependency-injected
into the `torch_mlir_torchscript` modules to successfully assemble a
working test harness (the code was already structured this way, but this
new file organization allows the isolation from C++ code to actually
happen). This isolation is critical to allowing the serialized programs
to be transported across PyTorch versions and for the test harness to be
used seamlessly to generate the heavydep tests.
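The dependency-injection arrangement can be pictured with a minimal sketch. All names here (`BackendConfig`, `NativeConfig`, `run_tests`) are hypothetical stand-ins, not the framework's actual API. The point is only that the harness code never imports a backend; it receives a config object, so the harness side can stay pure Python.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Tuple


class BackendConfig(ABC):
    """Backend-specific piece; in the real layout this side may need C++ extensions."""

    @abstractmethod
    def compile(self, program: Callable) -> object:
        ...

    @abstractmethod
    def run(self, artifact: object, inputs: tuple) -> object:
        ...


class NativeConfig(BackendConfig):
    """Trivial config that runs the Python callable directly."""

    def compile(self, program):
        return program

    def run(self, artifact, inputs):
        return artifact(*inputs)


def run_tests(tests: Dict[str, Tuple[Callable, tuple, object]],
              config: BackendConfig) -> Dict[str, bool]:
    """Pure-Python harness: depends only on the BackendConfig interface."""
    results = {}
    for name, (program, inputs, expected) in tests.items():
        artifact = config.compile(program)
        results[name] = config.run(artifact, inputs) == expected
    return results


results = run_tests({"add": (lambda a, b: a + b, (1, 2), 3)}, NativeConfig())
```

Swapping `NativeConfig` for a config that wraps a compiled backend would not require any change to `run_tests`.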
Also:
- Extend `_Tracer` class to support nested property (submodule) accesses.
Recommended review order:
- "user-level" docs in README.md
- code in `build_tools/torchscript_e2e_heavydep_tests`.
- changes in `torch_mlir_torchscript/e2e_test/framework.py`
- misc mechanical changes.
- Add support for "expected failures" in test reporting. The new error
reports look like
[this](https://gist.github.com/silvasean/6ffd95e1d55302b699673da201da210d).
- We will now be able to put these tests into CI, since the harness
understands which tests are expected to pass and fail.
- Refactor RefBackendTestConfig to NpcompBackendTestConfig which
supports both RefBackend and IREE.
- Add instructions for installing IREE dependencies (both from packages
and for local builds of IREE)
- Add `tools/torchscript_e2e_test.sh` for invoking the e2e test
harness (this makes invoking a bit easier, as it doesn't rely on a
loose Python invocation).
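As a rough illustration of the "expected failures" idea, the sketch below classifies each result against a set of tests that are expected to fail. The function name `summarize`, the status labels, and the choice to treat an unexpected pass as an error are assumptions made for this example; the real report format is the one in the linked gist.

```python
from typing import Dict, List, Set, Tuple


def summarize(results: Dict[str, bool],
              expected_failures: Set[str]) -> Tuple[List[str], bool]:
    """Return report lines plus an overall ok flag suitable for CI."""
    lines = []
    overall_ok = True
    for name in sorted(results):
        passed = results[name]
        expected_to_fail = name in expected_failures
        if passed and not expected_to_fail:
            status = "PASS"
        elif not passed and expected_to_fail:
            status = "XFAIL"  # failed as expected: not an error
        elif passed:
            status = "XPASS"  # unexpectedly passed (assumed to be an error here)
            overall_ok = False
        else:
            status = "FAIL"
            overall_ok = False
        lines.append(f"{status} - {name}")
    return lines, overall_ok


lines, ok = summarize(
    {"AddModule_basic": True, "ListLiteralModule_basic": False},
    expected_failures={"ListLiteralModule_basic"},
)
```

Because only unexpected outcomes flip the flag, a CI job can gate on `ok` even while some tests are known to fail.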
This op is much better behaved than the `torch.tensor.literal` op
(which is the new name of the `torch.tensor` op). In particular
`torch.vtensor.literal`:
- always has a maximally refined type.
- always has value semantics.
- can be constant folded / CSE'd.
ReduceOpVariants is changed to perform the transformation from
`torch.tensor.literal` to `torch.vtensor.literal` (which in general
involves static information casts and copies).
This new op also allowed tightening up `torch.tensor.literal` to only
accept NonValueTensorType (instead of any tensor type).
This new ".literal" name is more descriptive. It was getting too
confusing seeing an op called just `torch.tensor` (we originally called
it that because that's the name of the similar function in the Torch
Python API, but it just doesn't fit here).
This removes the dependence of the `torch` dialect on the low-level
builtin types.
Now the `torch` dialect is a standalone layer, suitable for targeting
from higher-level Python abstractions without any premature lowering to
primitive types.
This replaces the ad-hoc use of `i64` throughout the Torch layer, and
helps to keep it crystal clear the distinction between `!torch.int`
(which is modeling the Python `int` type) and the various types that
serve as dtypes of tensors, which are a totally different type universe.
Changes:
- `!torch.int` type and C bindings.
- Change `torch.constant.int` parser to not need the `: i64` at the end.
- `m_TorchConstantInt` matcher to aid with matching constants.
- BackendTypeConversion changes for `!torch.int` -> `i64` type
conversion.
- Refactor finalizing patterns in FinalizingBackendTypeConversionPass
(they were getting very repetitive).
- Mechanical rewriting of `i64` to `!torch.int` in all the tests, and
`AnyTorchIntType` to `Torch_IntType` in the `.td` files.
This removes the use of `scf.if`, which required laundering back and
forth between `i1` and `!torch.bool` in the frontend. We will eventually
lower this op to `scf.if`, but this results in a cleaner IR and layering
at the frontend.
This finishes removing the dependence on the basicpy dialect!
Changes:
- Add `!torch.bool` type and replace use of `!basicpy.BoolType` in
Torch-related code.
- Rename BuiltinTensorize to BackendTypeConversion since now it handles
bool conversions (and, when we add !torch.int and !torch.float, it
will handle those as well), and generalize the related utilities (I
also moved them to Torch/Transforms since they aren't really part of
Torch/IR).
- Add `torch.to_i1` and `torch.from_i1` ops for materializations
- [cleanup] Reorganize `torch.constant.*` ops in TorchOps.td
- Remove dependency of `torch` dialect on `basicpy` dialect and also
`std` dialect. For `std`, we use some call related ops, but the
`torch` dialect itself never produces them (we have passes that do
though).
This is fairly mechanical. Recommended review order:
- New stuff in Torch/IR
- New BackendTypeConversion files.
- Mechanical fixups elsewhere.
- Add `torch.constant.none` op to construct it (naming is chosen to be
analogous to Torch's representation of a prim::Constant with
NoneType, rather than using the "singleton" terminology of Basicpy).
This removes our reliance on the numpy dialect and avoids our off-label
use of the builtin tensor type for modeling unknown dtypes. The
`!torch.vtensor` (`ValueTensorType`) type is a value-semantic tensor.
The `!torch.tensor` (`NonValueTensorType`) type is a non-value-semantic
tensor. The new types look as follows syntactically:
```
// Least-static-information, non-value-semantic tensor.
!torch.tensor
// Explicit form of least-static-information variant.
!torch.tensor<*,unk>
// Least-static-information, value-semantic tensor.
!torch.vtensor
// Explicit form of least-static-information variant.
!torch.vtensor<*,unk>
// Fixed-set of allowable element types, with first-class support for
// Torch's frontend signedness semantics.
!torch.tensor<*,si32>
// First-class support for unknown dtypes.
!torch.tensor<[?,?,?],unk>
// Standard MLIR representation of `?` for unknown dimensions.
!torch.tensor<[?,2,?,4],unk>
// Statically shaped / dtyped example.
!torch.vtensor<[1,2,3,4],f32>
```
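To make the syntax above concrete, here is a small Python sketch that parses these type strings into a record. It is a toy written for illustration, not the dialect's actual parser: `TensorType`, `parse_tensor_type`, and the regex are assumptions, and it only handles the forms shown above.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TensorType:
    value_semantic: bool                         # True for !torch.vtensor
    sizes: Optional[Tuple[Optional[int], ...]]   # None = unranked (`*`); None entry = `?`
    dtype: Optional[str]                         # None = unknown (`unk`)


_TYPE_RE = re.compile(r"^!torch\.(v?tensor)(?:<(\*|\[[^\]]*\]),(\w+)>)?$")


def parse_tensor_type(text: str) -> TensorType:
    m = _TYPE_RE.match(text.strip())
    if m is None:
        raise ValueError(f"not a torch tensor type: {text!r}")
    kind, shape, dtype = m.groups()
    if shape is None or shape == "*":
        sizes = None  # least static information about the shape
    else:
        inner = shape[1:-1]
        sizes = tuple(
            None if d == "?" else int(d) for d in inner.split(",")
        ) if inner else ()
    return TensorType(
        value_semantic=(kind == "vtensor"),
        sizes=sizes,
        dtype=None if dtype in (None, "unk") else dtype,
    )


static = parse_tensor_type("!torch.vtensor<[1,2,3,4],f32>")
partial = parse_tensor_type("!torch.tensor<[?,2,?,4],unk>")
```

In this sketch the short form `!torch.tensor` and the explicit form `!torch.tensor<*,unk>` parse to equal records, mirroring the "explicit form of least-static-information variant" comments above.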
This required fairly significant changes throughout the compiler, but
overall it is a big cleanup. We now have a much clearer layering of "the
Torch frontend lowering" vs "lowering to std + linalg + etc.".
At the C++ level, there are `ValueTensorType` and `NonValueTensorType`.
We also have a helper `BaseTensorType` (kind of like ShapedType) which
interoperates with those two.
Included changes:
- New `torch.tensor(dense<0.0> : tensor<5xf32>) : !torch.tensor` op for
creating torch tensor literals in the frontend.
- Consistently use signedness for the types (except i1 which I didn't
touch -- we need to sort out the situation with !basicpy.BoolType
there anyway so will be attending to that soon)
- Frontend can annotate whether an argument to the function has value
semantics. We currently require this, as our backend contract does not
currently allow us to even model the non-value-semantic case. Before,
the value-semantic assumption was randomly injected in the middle of
the pass pipeline.
- Move ArrayToTensor (now called MaximizeValueSemantics) and
RefinePublicReturn passes to torch dialect.
- The TorchToStd and TorchToLinalg passes are now type conversions from
`!torch.vtensor` to `tensor` and use the dialect conversion infra.
The overall conversion pipeline is set up following the best practices
of the "Type Conversions the Not-So-Hard Way" talk. This required
introducing `torch-func-builtin-tensorize` and
`torch-finalizing-builtin-tensorize` passes analogous to the upstream
bufferization passes with the corresponding names (mostly just
copypasta from there).
- Misc Torch-level canonicalizations -- we now cleanly layer the
lowering to std later in the pipeline, so we are gradually lessening
our reliance on random std constant folding before we get to that
point.
Recommended review order:
- New types in TorchTypes.td/TorchTypes.h/TorchDialect.cpp
- New ops in TorchOps.td / TorchOps.cpp
- Less important / more mechanical stuff
- Frontend changes.
- Pass changes/additions in `Torch/Transforms` and `Conversion/`
This now gives [much nicer output](https://gist.github.com/silvasean/f048e0f37b04542dae6469b86802bb3e).
Embarrassingly, we previously couldn't even report failures for two
different tests, and weren't able to report on compilation failures
(besides just crashing).
This is enough to import the program and get it through the compilation
pipeline. It of course fails at the VerifyBackendContract pass since
there is a lot missing, but the final IR for a simple quantized MLP is
looking pretty decent already:
[IR](https://gist.github.com/silvasean/f76bccd76e9b193d396cfb2f9a11f54d)
Main changes:
- Add support for importing torch quantized tensors, including
`torch.per_tensor_affine.create` op and `!torch.qint8` element type.
- Add support for importing `LinearPackedParamsBase` (basically a weight
+ optional bias, but requires `torch.linear_params.create` op +
`!torch.LinearParams` type to model it). This was less painful than I
expected, as it has the necessary methods to opaquely unpack itself. I
factored things so it should be easy to extend to other custom classes
like `ConvPackedParamsBase`.
- Add minimal boilerplate for importing `quantized::*` ops, with
`quantized::linear` being a motivating example.
- Add e2e test with simple quantized MLP (courtesy of @phoenix-meadowlark).
This is somewhat of an abuse of `!numpy.ndarray` / `tensor`, as
really the proper semantics of `!torch.qint8` dtype on a Torch tensor is
"check the quantizer object of the tensor for side data (scale/offset,
possibly per-channel) that defines the full semantics of the tensor". We
don't have any such notion of "side data" for `!numpy.ndarray` /
`tensor`, let alone anything that would have the associated behavior of
keying off the dtype to determine if the side data is present.
This will be fixed by a proper `!torch.tensor` type.
This is a really major and invasive restructuring of the way we get
torch operators (`torch::jit::Operator` / `c10::OperatorHandle`) into
MLIR. Please forgive the challenging review, but due to the sheer
invasiveness, it wasn't really practical to do it in sane smaller
pieces.
This fully replaces everything that was already working on the
TorchScript path (actually, more -- we added tanh support to
TorchToLinalg in order to delete the older code paths). Additionally,
I've kept the lights on for the acap path too, including what little e2e
stuff was working before (for expediency I made a few tiny compromises
along the way that will be easy to undo when we give that path proper
attention).
Overview of the new design:
- The torch operator `somens::someunqualname.someoverloadname` is
imported as `torch.somens.someunqualname.someoverloadname` (skip the
last dotted part if the overload name is empty), OR, if we don't have
such an op registered, it is imported as
`torch.operator "somens.someunqualname.someoverloadname" (...) : ...`.
- The addition of the "overload name" is a critical element here, as
the `(ns,unqual,overload)` triple is unique, which solves a lot of
problems we were having.
- This involves having separate MLIR ops for the `trailing_` and
`.out` variants and all the different overloads. This seemed
necessary, because the set of overloads is so wild and varied and
unstructured. The previous design was leaning into some underlying
structure that just isn't there -- the default situation is
the "random overload that we want to manage on the MLIR side",
rather than that being an exception. E.g. `aten::ne` (not-equal)
has 21 overloads, only 4 of which are c10 dispatcher ops (see
[gist](https://gist.github.com/silvasean/190ba918c550c956260e21254e1b8aa1)),
and the "out" variant is really called `.Tensor_out` instead of
`.out` as it frequently is for other ops.
- Rationale for all being in `torch` namespace: the set of operators
are so varied and unstructured that "dialect per namespace"
doesn't result in anything resembling the typical MLIR dialect
boundary expectations. We could maybe draw the boundary at
dispatcher ops vs non-dispatcher ops, but that doesn't seem to
really result in very much useful structure at this point in time.
- Note: within the torch operator registry, we effectively have a
mini-basicpy subdialect (already type-resolved), which is reasonably
structured.
- The existing Torch op interfaces are also removed -- now that we
track the overload name, we can losslessly find the original
operator.
- Instead of `ATenRecognizeKernelsPass`, we now have a
`ReduceOpVariantsPass` that keys off certain traits (and perhaps
eventually interfaces) to reduce variants of ops to a smaller set,
ideally operating on immutable tensors and using surrounding ops to
model the mutability/aliasing aspects.
- Note: `torch.ns.unqual.overload` ops allow both immutable and
mutable tensors (unlike the previous hard distinction in the common
case). This is a premonition for a future change that will introduce a
bona fide `!torch.tensor` type that will clean up a bunch of stuff.
- `TorchToLinalg` / `TorchToStd` supersede the existing
"ATen->TCF->TCP->Linalg" path.
- The new `torch_ods_gen.py` supersedes `torch_signature_ods_gen.py`.
It should look somewhat familiar, but the benefit of hindsight has
allowed a lot of simplifications.
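The naming rule from the first bullet of the overview can be sketched in a few lines of Python. This is an illustration only: the helper names are hypothetical, `registered` is a plain set standing in for the set of ops known to the dialect, and dropping an empty overload in the `torch.operator` fallback string is an assumption (the text above only states that rule for registered ops).

```python
from typing import Set, Tuple


def parse_qualified(qualified: str) -> Tuple[str, str, str]:
    """Split 'ns::unqual.overload' into its (ns, unqual, overload) triple."""
    ns, _, rest = qualified.partition("::")
    unqual, _, overload = rest.partition(".")
    return ns, unqual, overload


def dotted(*parts: str) -> str:
    """Join non-empty parts with dots (skips an empty overload name)."""
    return ".".join(p for p in parts if p)


def import_name(qualified: str, registered: Set[str]) -> str:
    ns, unqual, overload = parse_qualified(qualified)
    op_name = dotted("torch", ns, unqual, overload)
    if op_name in registered:
        return op_name
    # No registered op: fall back to the generic `torch.operator` form.
    return f'torch.operator "{dotted(ns, unqual, overload)}"'


registered_ops = {"torch.aten.tanh", "torch.aten.ne.Tensor_out"}
known = import_name("aten::ne.Tensor_out", registered_ops)
no_overload = import_name("aten::tanh", registered_ops)
unknown = import_name("somens::someunqualname.someoverloadname", registered_ops)
```

Since the `(ns, unqual, overload)` triple is unique, the mapping in this sketch is reversible, which is what allows the original operator to be found again without extra interfaces.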
The overall trend seems to be to make the `torch` dialect a nice layer
independent of anything else. It feels like as a natural result of
various future changes we will be removing the reliance on basicpy+numpy
dialects and have a nice self-contained type system too that properly
models the TorchScript type system (including proper subtyping,
mutable/immutable tensors, optional dtype, etc.).
Recommended review order:
- Start at some of the new import IR, e.g. in
`frontends/pytorch/test/node_import/prim.py`,
`frontends/pytorch/test/acap_export/test_export_add3.py`, and other
tests.
- `frontends/pytorch/python/torch_mlir_utils/codegen/torch_ods_gen.py`
and associated generated files:
- `include/npcomp/Dialect/Torch/IR/GeneratedAtenOps.td`
- `include/npcomp/Dialect/Torch/IR/GeneratedPrimOps.td`
- Inspect `ReduceOpVariants.cpp` / `reduce-op-variants.mlir` and the new
traits in `include/npcomp/Dialect/Torch/IR/TorchTraits.h`
- Various code changes in the import path in
`frontends/pytorch/csrc/builder`. Probably most interesting is the new
code in `torch_to_mlir_utils.cpp` that has the logic to create the
`torch.operator` ops or `torch.ns.unqual.overload` ops.
This is the [new ResNet IR](https://gist.github.com/silvasean/5407aafb710d07612b7b5b92eabecebe),
just to be able to look at a substantial sample of IR in the new style.
Interestingly, TorchScript has its own op (`torch::jit::Operator`)
registry separate from the dispatcher (it is a superset of the
dispatcher).
This is where the "prim" ops and some "aten" ops (that should probably
be renamed to "prim") live. In particular, `aten::__is__` is in that
latter category of "aten but really prim". This registry is also the
source of truth for what the TorchScript interpreter calls into when it
executes.
The bulk of the "not part of the dispatcher" ops live in
09feb5f579/torch/csrc/jit/runtime/register_prim_ops.cpp (L82)
And the registry itself lives in:
09feb5f579/torch/csrc/jit/runtime/operator.cpp (L196)
This fold further reduces the IR of ResNet by folding away some
more not-taken branches. These not-taken branches in ResNet require
first-class handling of the list type which we don't yet have on any
backend.
These tests pass on the reference backend.
- Add aten.linear op + shape xfer function + ATen->Linalg lowering.
- Note: this needs to be more automated, and needs to cover more cases.
- Cases not currently implemented:
- size-1 broadcasting for bias vector (either static-size-1 or ? case)
- higher-rank aten.linear ops (not produced by torch.nn.Linear though)
- type promotion (still don't even know the exact rules here)
- Add folder for torch.derefine op. Now the inliner can clean it up as
it inlines. (call boundaries are a main place we need to insert
torch.derefine) This is brittle -- the other important case is control
flow which will need to be handled via an extension to
RefineTypes.cpp (as will more robust call handling). River has an
in-flight patch to update it to the new dataflow framework so I didn't
want to do anything intrusive here.
- Also adjust torch.derefine syntax to use the keyword `to` instead of
`->`, as most type-only, cast-like ops do.
- Move frontend lowering pipelines to c++ (this helps with reproducing
failures in npcomp-opt)
- Add debugging printouts when compilation fails on RefBackendTestConfig
The experience when a test fails during MLIR lowering is now like this:
```
NPCOMP TorchScript Object Graph IR -> NPCOMP Backend IR lowering failed with the following diagnostics:
failed to legalize operation 'torch.global_slot'
Module does not conform to npcomp's backend contract. See dialect conversion legality information above.
Error can be reproduced with:
$ npcomp-opt -torchscript-to-npcomp-backend-pipeline /tmp/ResNet18Module.mlir
```
And when TorchScript->MLIR import fails it looks like this:
```
PyTorch TorchScript module -> NPCOMP Object Graph IR import failed with the following diagnostics:
unhandled prim operation: %18 : int = prim::min(%17) # /usr/local/google/home/silvasean/.local/lib/python3.9/site-packages/torch/nn/functional.py:4532:4
```
Also,
- Add `--filter=<regex>` to e2e test harness to filter tests.
- Add a few prim ops that were needed to import ResNet18
- Fix torch.prim.Loop.condition assemblyFormat (it previously would not
round-trip in the case of no loop-carried variables)
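A test filter of this kind can be sketched with the standard library alone. The snippet below is a toy runner, not the harness's actual code: the function names are hypothetical, and using `re.search` (rather than a full match) is an assumption.

```python
import argparse
import re
from typing import List


def parse_args(argv: List[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="toy e2e test runner")
    parser.add_argument("--filter", default=".*",
                        help="regex selecting which tests to run")
    return parser.parse_args(argv)


def select_tests(names: List[str], pattern: str) -> List[str]:
    """Keep the test names in which the regex matches somewhere."""
    regex = re.compile(pattern)
    return [n for n in names if regex.search(n)]


args = parse_args(["--filter=ResNet.*"])
selected = select_tests(["ResNet18Module_basic", "MmModule_basic"], args.filter)
```

Defaulting the pattern to `.*` keeps the behavior unchanged when the flag is omitted.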
The E2E tests can be run with
```
npcpy frontends/pytorch/e2e_testing/torchscript/main.py
```
This commit adds a couple items supporting that end, including new sugar
for annotations (no more raw use of ClassAnnotator!).
Recommended review order:
1. `frontends/pytorch/e2e_testing/torchscript/main.py` for
the harness + `basic.py` in that directory for examples of tests.
2. Annotation sugar in `frontends/pytorch/python/torch_mlir/torchscript/annotations.py`
and unittest in `frontends/pytorch/test/ivalue_import/annotations/sugar.py`
3. Global test registry / sugar in
`frontends/pytorch/python/torch_mlir/torchscript/e2e_test/registry.py`
4. `frontends/pytorch/python/torch_mlir/torchscript/e2e_test/framework.py`
for the meat of the testing framework (start at `run_tests`), and
looking at the backend configs in
`frontends/pytorch/python/torch_mlir/torchscript/e2e_test/configs`
for examples of backends. This is likely the bulk of review time.
5. Unit tests of the framework logic in `frontends/pytorch/test/torchscript_e2e_test`
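For orientation before reading item 3, a global test registry with decorator sugar can be pictured roughly as follows. This is a hedged sketch: `RegisteredTest`, `GLOBAL_TEST_REGISTRY`, `register_test_case`, and the toy `AddOne` module are illustrative names and may not match what `registry.py` actually defines.

```python
from typing import Callable, Dict, NamedTuple


class RegisteredTest(NamedTuple):
    unique_name: str
    module_factory: Callable[[], object]       # builds a fresh module per run
    program_invoker: Callable[[object], object]  # exercises the module


GLOBAL_TEST_REGISTRY: Dict[str, RegisteredTest] = {}


def register_test_case(module_factory: Callable[[], object]):
    """Decorator that records the test under the decorated function's name."""
    def decorator(fn):
        name = fn.__name__
        if name in GLOBAL_TEST_REGISTRY:
            raise ValueError(f"duplicate test name: {name}")
        GLOBAL_TEST_REGISTRY[name] = RegisteredTest(name, module_factory, fn)
        return fn
    return decorator


class AddOne:
    """Stand-in for a torch.nn.Module so the sketch has no dependencies."""
    def forward(self, x):
        return x + 1


@register_test_case(module_factory=AddOne)
def AddOne_basic(module):
    return module.forward(41)


test = GLOBAL_TEST_REGISTRY["AddOne_basic"]
outcome = test.program_invoker(test.module_factory())
```

Keeping the registry as plain data makes it easy for a runner to iterate over the tests and hand each one to whichever backend config is selected.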
There are TODOs scattered throughout, but this seems functional enough to
start pulling stuff into and kicking the tires. A few missing pieces:
1. Marking test expected pass/fail per backend.
2. Figuring out how best to fit this into dev workflows.
3. IREE TestConfig.
Also, forgive this Python newbie... Any advice on Python code structure
/ library design would be much appreciated.
These allow users to annotate a known "type bound" on the argument,
which can seed shape/dtype inference. We don't rewrite the function
types as part of the import process (it will happen in a
yet-to-be-written pass) because:
1. We would need to interprocedurally rewrite all calls to keep the IR
consistent. Currently, we have a place after GlobalizeObjectGraph but
before we convert to tensors where this is convenient to do. Ideally,
we would do this on the object graph representation.
2. We don't necessarily know that adjusting the function type is a legal
calling convention change. The pass will have blessed knowledge (by
the pass pipeline author) that adjusting the argument type based on
the type bound is safe (which it frequently is).
3. Note that in principle, a type bound could be a fairly general thing
(such as maximum sizes of dimensions, unions of multiple concrete
types, etc.). The pass will in principle have logic to interpret the
type bounds and to determine a suitable "best" (and legal) argument
type.
- renames of OwningRewritePatternList -> RewritePatternSet
- also `insert` to `add`
- RewritePatternSet holds a context now
- memref dialect split from std
Tracing now seems to capture a 4-operand version of aten::add instead
of the 3-operand one.
I fixed the tests that made sense. One test was XFAIL'ed, as I don't
have in cache the exact way to fix it yet (requires touching
aten-recognize-kernels stuff). I'll be context switching to work on the
kernel recognition stuff soon, and will fix it then.
In terms of IR structure, TorchScript allows types to vary in many
circumstances where MLIR requires pointer-identical types. In particular,
it is valid to pass any subtype in place of a type. For example, if an
`Optional[int]` is required somewhere in the IR, it is legal to pass a
value of just `int` (but not the other way around; see
`torch.prim.unchecked_cast`). In effect, every *use* can have a different
type.
We introduce a new op `torch.derefine` that models that impedance
mismatch. This op allows casting a value from one type to a type that it
is a subtype of to model this behavior.
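The rule can be sketched as a toy model in Python. This is not the importer's code: types are represented as plain strings, only the `Optional[...]` case of subtyping is modeled, and `is_subtype` / `coerce_use` are hypothetical helpers. The idea is that a use requiring a supertype gets an explicit cast record, while identical types need nothing and the reverse direction is rejected.

```python
from typing import Optional, Tuple


def is_subtype(sub: str, sup: str) -> bool:
    """Toy subtyping: T <: T, T <: Optional[T], NoneType <: Optional[T]."""
    if sub == sup:
        return True
    prefix = "Optional["
    if sup.startswith(prefix) and sup.endswith("]"):
        inner = sup[len(prefix):-1]
        return sub == "NoneType" or is_subtype(sub, inner)
    return False


def coerce_use(value_type: str, required_type: str) -> Optional[Tuple[str, str, str]]:
    """Return a ('derefine', from, to) record if a cast is needed, else None."""
    if value_type == required_type:
        return None  # types already identical, no cast needed
    if is_subtype(value_type, required_type):
        return ("derefine", value_type, required_type)
    # Going from a supertype to a subtype is not a derefinement.
    raise TypeError(f"{value_type} is not a subtype of {required_type}")


cast = coerce_use("int", "Optional[int]")
no_cast = coerce_use("int", "int")
```

In this model, passing `Optional[int]` where `int` is required raises an error, matching the "not the other way around" remark above.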
Recommended review order:
- TorchOps.td for new torch.derefine (and updated docs for
`torch.prim.unchecked_cast`)
- new test code in if.py, loop.py, function-derefine.py
- new code in node_importer.cpp for handling derefinement insertion
- function_importer.cpp and utils changes in torch_to_mlir_utils.cpp
Properly handling derefinement on function boundaries required
relayering the code so that graph_importer.cpp/.h is now
function_importer.cpp/.h because only the `torch::jit::Function`
(actually the `c10::FunctionSchema` it holds) knows the derefined types that are
actually needed at the boundary (see `function-derefine.py` for a test).
Annoyingly, this churns all the functions which are now prefixed with
`__torch__.` but that is more correct anyway (that is their linkage name
in the `torch::jit::CompilationUnit`; the previous `mb.import_function`
was actually buggy in the case of functions calling each other as it
would reference their unqualified name).
With this change, we can import `resnet18` from `torchvision` :)
IR: https://gist.github.com/silvasean/6426a5272d8a6c7caae533fce05ab704
I could not find a corresponding ListIndex in prim, which seems to
translate to a __get_attr__ under the hood. I think the reason a tuple
Index op can exist is that Tuples are supposed to be frozen, whereas
List operands can be mutable.
This arises when casting optionals, which happens a lot especially
around handling of default arguments (python `if arg is None` idiom).
In this case, the offending code for the model is in max_pool2d:
[code link](b3bf08e67f/torch/nn/functional.py (L657))
Used by resnet18.
It seems to originate from a helper `_verify_batch_size`:
[code link](b3bf08e67f/torch/nn/functional.py (L2099)).
I couldn't find a way to test `prim::RaiseException` without also having
`prim::Uninitialized`.
This primarily unlocks proper handling of free functions (that is,
functions that are not methods of any torch.nn.Module).
Recommended review order:
- `ivalue_importer.cpp` + `ivalue_import/functions*.py`
- `GlobalizeObjectGraph.cpp` + test case
- misc other stuff
The `torch::jit::CompilationUnit` is basically a backing store or
"context" holding all the possible functions in the program. The
previous code was not explicitly accessing this data structure, since it
just imported the `torch::jit::Function`'s that it saw attached to
methods.
Subtly, any time a TorchScript module called into a free function, the
free function gets incorporated into the torch::jit::CompilationUnit,
but doesn't show up anywhere when dumping the module, except in the
curious pattern:
```
%5 : Function = prim::Constant[name="adaptive_avg_pool2d"]()
%6 : Tensor = prim::CallFunction(%5, %input.1, %4)
```
That is, calls are indirect calls, and are accessed via `prim::Constant`
materializing a function object. Even stranger, the `name` attribute here
doesn't really even tell the full story -- it doesn't correspond to
anything. It turns out that the c10::FunctionType itself actually holds
a pointer to the `torch::jit::Function` in the compilation unit
directly (so there is actually no indirection in prim::CallMethod,
because any two values of the same FunctionType call the same
function!). E.g. when converting the IR to bytecode, the "name" is
ignored [code link](1d6bd15790/torch/csrc/jit/runtime/interpreter.cpp (L937)).
We do import `prim::CallFunction` as a `std.call_indirect` though
because it's more braindead to do it that way (it gets canonicalized to
a direct call easily).
With this, we can import BERT!
```
pt_util ~/tmp/bert.pt --import --exported-name=forward \
| npcomp-opt -torch-globalize-object-graph -inline -symbol-dce
```
https://gist.github.com/silvasean/fe7735ff5d065cc9216f7b0346d0e977
The test case here is a bit unconventional -- it isn't actually valid
Python. To figure out how to generate it I had to go search the PyTorch
codebase for "NumToTensor" and work backward. In this case I found
this
[code](649760e5f1/torch/csrc/jit/frontend/ir_emitter.cpp (L464))
which via a wild guess I was able to turn into a test case.
In this case it didn't take me too long, but when doing this kind of
"add a bunch of trivial stuff to bring up a real model", I'm starting to
think that we might skimp on test cases when it's fairly trivial and not
obvious how to test with a small test.
- `module_import -> ivalue_import`, as it mainly tests ivalue_importer.cpp
- `graph_import -> node_import`, as it mainly tests node_importer.cpp
- graph_importer.cpp does call into node_importer.cpp, but doesn't do
much.
This was getting pretty confusing. Also add README.md's in each
directory for more clarity.