Commit Graph

1946 Commits (377720af87f8ed1a21a906dc696d65ecc2b9f952)
 

Author SHA1 Message Date
Bryce Arden 4591884d06 [refbackrt] Scalar arg support
* Adds f32 scalar argument support across the ABI boundary.
* Adds support for passing input type / shape information
  across the ABI boundary
* Adds support for parsing / creating input FloatAttr's in
  `npcomp-run-mlir`
2021-03-23 13:16:44 -07:00
Sean Silva 703428eff4 Add support for "trailing_" and "out" variants of various ops.
We already had the `promoteTrailingOutTensor` flag, but weren't using
it. A inplaceVariantKernelName flag needed to be added.

This change is a little dissatisfying, as the conversions done by the
RecognizeKernelsPass are currently non-orthogonal. In particular,
`kDropResultAndAliasArg0` probably won't work as intended if mixed with
these (we probably need to promote kDropResultAndAliasArg0 to not be an
arg-level thing anyway, as we have done with promoteTrailingOutTensor).

This involved adding a new op `numpy.overwrite_array`.

```
numpy.overwrite_array %arg2 overwrites %arg0 : tensor<2x3xf32>, !numpy.ndarray<[2,3]:f32>
```

This models the destructive update behavior. Note that in the above op,
we cannot simply RAUW %arg0 with a suitably conveted %arg2 (for example,
%arg0 might have uses that are not dominated by %arg2, or might have an
alias relation with some other array in the program). In general, we
need a pass analogous to "SSA-formation" which knows how to see through
these to uncover an underlying tensor program.

Also, add tanh_out_e2e.py/div_inplace_e2e.py and fix some bitrot in
refjit.py which is my running example I'm trying to get working.
2021-03-19 10:34:50 -07:00
Sean Silva a53ed850bd Fix signature of unboxed aten::arange for torch HEAD 2021-03-18 17:53:52 -07:00
Bairen Yi 19b9398aee Revert "Skip torchvision 0.9.0 as it is incompatible with torch nightly"
This reverts commit e7b96ebefc.
2021-03-16 19:37:45 -07:00
Bairen Yi fead0312f1 Revert "Also fallback autograd dispatch keys for torchvision::nms"
This reverts commit 30a42dea32.
2021-03-16 19:37:45 -07:00
Sean Silva ba482cbb72 Generate Conv2d definition.
We should generally be using torch_signature_ods_gen.py for generating
these. Somehow this one slipped through manually.

There is no `aten::conv2d_overridable` in the op registry AFAICT so I
removed that alias.
2021-03-16 12:39:28 -07:00
Sean Silva c607efa205 Make ATenOpRegistrations.txt dump more readable.
Also add `is_write` field.
2021-03-16 12:39:28 -07:00
Bairen Yi 30a42dea32 Also fallback autograd dispatch keys for torchvision::nms
Signed-off-by: Bairen Yi <yibairen.byron@bytedance.com>
2021-03-15 17:58:08 -07:00
Bairen Yi e7b96ebefc Skip torchvision 0.9.0 as it is incompatible with torch nightly
torchvision nightly has not bump to 0.10.0 alpha, so pip installs
torchvision==0.9.0 even with the --pre flag.

Signed-off-by: Bairen Yi <yibairen.byron@bytedance.com>
2021-03-15 17:58:08 -07:00
Sean Silva 4cf8aef5d6 Add roadmap doc. 2021-03-15 14:43:51 -07:00
Aaron Arthurs 4fd9b4afb5
Import ATen conv2d conversion and test (#180)
* Import ATen conv2d conversion and test

This is a first attempt at expanding ATen-to-TCF conversion for the
conv2d operator. Eventually, this will come in use when lowering a
high-level conv-based model.
2021-03-12 17:21:16 -08:00
Sean Silva 58c7030104 Support multiple instances of a class in GlobalizeObjectGraph.
This happens in practice with e.g. ResNet from torchvision (multiple
instances of the same BatchNorm class).

The key observation is that for this program, and the expected set of
programs, we can convert the program to the same globalized form with a
bit more static analysis and effort to suitably monomorphize the
program. Though what we are doing here is fairly annoying to implement,
it saves any nontrivial later pass from having to do similar analyses
(or worse). E.g. shape inference would need to be object-graph aware,
mutation/lifetime analyses would have to be aware, etc. Additionally, it
would make us front-load what it means to have a !torch.nn.Module type
on an ABI boundary, which we are just not ready to handle.

I'm really, really hoping that in practice we can get away with
this, otherwise it's going to be really rough designing a representation
(and implementing everything to back it) that is convenient to transform
and gracefully scales from full object graph (in the most dynamic case)
down to a fixed set of global slots like we have here (in the most
static case, which we presume a lot of practical programs fall into).

This also involved introducing a
`torch-prepare-for-globalize-object-graph` pass that does a minimal set of
lowerings to simplify the IR into a more orthogonal and analyzable form,
and a `torch-globalize-pipeline` helper.

Recommended review order:
- updated documentation in Passes.td
- new tests in `globalize-object-graph-multiple-instances*.mlir`
- implementation of GlobalizeObjectGraph.cpp
- PrepareForGlobalizeObjectGraph.cpp + prepare-for-globalize-object-graph.mlir
- misc stuff like torch-globalize-pipeline pipeline definition.

With this, we can import, globalize, and inline resnet18 from
torchvision:
https://gist.github.com/silvasean/821586afc19b67d9fb72030b2e0adeb8
2021-03-11 19:21:07 -08:00
Sean Silva 2750d2084c Add prim::device and handle derefining for prim::CallMethod 2021-03-11 14:10:09 -08:00
Sean Silva 572d198b68 Refactor prim node imports. 2021-03-11 14:10:09 -08:00
Sean Silva 01b8a01e1b prim::dtype op 2021-03-11 14:10:09 -08:00
Bairen Yi 5fed296904 Address missing default label in switch statement
Signed-off-by: Bairen Yi <yibairen.byron@bytedance.com>
2021-03-11 11:55:59 -08:00
Bairen Yi 5315598947 Update .getAttrs to ->getAttrs as it is deprecated.
Signed-off-by: Bairen Yi <yibairen.byron@bytedance.com>
2021-03-11 11:55:59 -08:00
Bryce Arden e7a8fd76e2
[refbackrt] Update Invoke API to support more than just Tensor's (#181) 2021-03-10 15:39:26 -08:00
Bairen Yi 8f9d4f917d Add LLVM_LINK_LLVM_DYLIB=ON and remove LLVM_ENABLE_LLD=ON when building LLVM in GitHub CI
So CI build options are closer to those in `build_tools/install_mlir.sh`.

Also append hash of CI spec file to LLVM commit hash when caching builds.

Signed-off-by: Bairen Yi <yibairen.byron@bytedance.com>
2021-03-10 11:01:16 -08:00
Bairen Yi 53b01cb9ba Bump llvm-project to e31c77b1827fa4dd3511f21af11cfab18ecf6d38
Signed-off-by: Bairen Yi <yibairen.byron@bytedance.com>
2021-03-10 11:01:16 -08:00
stephenneuendorffer 06373dcbbb
Add install options for npcomp libraries and executables (#183) 2021-03-10 07:18:54 -08:00
Bryce Arden b94a859e03
[torch] Add import support for IValue string Type(s) (#179)
* [torch] Add import support for IValue string Type(s)

* [test] Add test for Strings import
2021-03-04 13:08:50 -08:00
Sean Silva a36113e586 Fix recent break due to PyTorch changes.
Tracing seems now now capture a 4-operand version of aten::add instead
of 3-operand.

I fixed the tests that made sense. One test was XFAIL'ed, as I don't
have in cache the exact way to fix it yet (requires touching
aten-recogniz-kernels stuff).  I'll be context switching to work on the
kernel recognition stuff soon, and will fix it then.
2021-03-03 18:35:23 -08:00
Sean Silva 43dba03afd Properly model "derefinement".
In terms of IR structure, TorchScript allows types to vary in many
circumstances where MLIR requires pointer-identical types. In particular,
it is valid to pass any subtype in place of a type. For example, if an
`Optional[int]` is required somewhere in the IR, it is legal to pass a
value of just `int` (but not the other way around; see
`torch.prim.unchecked_cast`). In effect, every *use* can have a different
type.

We introduce a new op `torch.derefine` that models that impedance
mismatch. This op allows casting a value from one type to a type that it
is a subtype of to model this behavior.

Recommended review order:
- TorchOps.td for new torch.derefine (and updated docs for
  `torch.prim.unchecked_cast`)
- new test code in if.py, loop.py, function-derefine.py
- new code in node_importer.cpp for handling derefinement insertion
- function_importer.cpp and utils changes in torch_to_mlir_utils.cpp

Properly handling derefinement on function boundaries required
relayering the code so that graph_importer.cpp/.h is now
function_importer.cpp/.h because only the `torch::jit::Function`
(actually the `c10::FunctionSchema` it holds) knows the derefined types that are
actually needed at the boundary (see `function-derefine.py` for a test).

Annoyingly, this churns all the functions which are now prefixed with
`__torch__.` but that is more correct anyway (that is their linkage name
in the `torch::jit::CompilationUnit`; the previous `mb.import_function`
was actually buggy in the case of functions calling each other as it
would reference their unqualified name).

With this change, we can import `resnet18` from `torchvision` :)
IR: https://gist.github.com/silvasean/6426a5272d8a6c7caae533fce05ab704
2021-03-03 15:09:44 -08:00
Bryce Arden 1736ff0253 [prim] Add TupleIndex support
I could not find a corresponding ListIndex in prim, which seems to
translate to a __get_attr__ under the hood. I think the reason a tuple
Index op can exist is because Tuple's are supposed to be frozen, where
List operands can be mutable.
2021-03-02 17:28:32 -08:00
Bryce Arden 68338eafb7 [chore] Make variable names in prim.py more clear 2021-03-02 17:28:32 -08:00
Bryce Arden ca3a02da28 [prim] Add support for List|TupleUnpack 2021-03-02 17:28:32 -08:00
Sean Silva df4c5764da Add support for `prim::unchecked_cast`.
This arises when casting optionals, which happens a lot especially
around handling of default arguments (python `if arg is None` idiom).

In this case, the offending code for the model is in max_pool2d:
[code link](b3bf08e67f/torch/nn/functional.py (L657))
2021-03-02 16:01:34 -08:00
Sean Silva 939d36906f Add support for prim::Loop op.
This is a funny one. It combines a `for` and `while` loop in one op. We
will need to write some conversions to `scf`.
2021-03-02 16:01:34 -08:00
Sean Silva 7dfd6f697e Add support for prim::RaiseException.
Used by resnet18.

It seems to originate from a helper `_verify_batch_size`:
[code link](b3bf08e67f/torch/nn/functional.py (L2099)).

I couldn't find a way to test `prim::RaiseException` without also having
`prim::Uninitialized`.
2021-03-02 16:01:34 -08:00
Yi Zhang 7bb3b2eb6e Fix the import path in python samples 2021-03-02 13:40:08 -08:00
Sean Silva c837dbb077 Properly import the entire torch::jit::CompilationUnit
This primarily unlocks proper handling of free functions (that is,
functions that are not methods of any torch.nn.Module).

Recommended review order:
- `ivalue_importer.cpp` + `ivalue_import/functions*.py`
- `GlobalizeObjectGraph.cpp` + test case
- misc other stuff

The `torch::jit::CompilationUnit` is basically a backing store or
"context" holding all the possible functions in the program. The
previous code was not explicitly accessing this data structure, since it
just imported the `torch::jit::Function`'s that it saw attached to
methods.

Subtly, any time a TorchScript module called into a free function, the
free function gets incorporated into the torch::jit::CompilationUnit,
but doesn't show up anywhere when dumping the module, except in the
curious pattern:

```
%5 : Function = prim::Constant[name="adaptive_avg_pool2d"]()
%6 : Tensor = prim::CallFunction(%5, %input.1, %4)
```

That is, calls are indirect calls, and are accessed via `prim::Constant`
materializing a function object. Even stranger, the `name` attribute here
doesn't really even tell the full story -- it doesn't correspond to
anything. It turns out that the c10::FunctionType itself actually holds
a pointer to the `torch::jit::Function` in the compilation unit
directly (so there is actually no indirection in prim::CallMethod,
because any two values of the same FunctionType call the same
function!). E.g. when converting the IR to bytecode, the "name" is
ignored [code link](1d6bd15790/torch/csrc/jit/runtime/interpreter.cpp (L937)).
We do import `prim::CallFunction` as a `std.call_indirect` though
because it's more braindead to do it that way (it gets canonicalized to
a direct call easily).
2021-03-01 12:08:01 -08:00
Yi Zhang 2c2286034b
install pybind11 through pip to get version 2.6 (#173) 2021-02-28 16:19:03 -08:00
Sean Silva 79a3f639bf Give torch.global_slot an initializer region.
This is a much simpler representation than the ad-hoc initializer
function we had before. It is also less general, but given the rationale
in Passes.td it seems like the right tradeoff right now.

We can probably carry this representation for quite a while, and when we
can't, it likely means that TorchScript has fixed their object identity
bug and we probably need to just upgrade to a more general object graph
modeling (more general than GlobalizeObjectGraph).

In particular, we don't want to deal with defining and carrying around
this initializer function concept until we need it. For example, if we
want to constant-fold the global slots into uses, this is a much better
representation, and it plays better with symbol-dce (the initializer
function counts as a "use" of the symbol).

(the alternative would have been to write a pass that converts the
initializer function to this form when possible, but I realized that
lots of information had been lost which made that fairly annoying -- it
was all self-inflicted anyway, so best to just go to the source
(GlobalizeObjectGraph) before the information is lost)

Now symbol-dce works nicely (no more "training" bools)
```
pt_util ~/tmp/classifier.pt --import --exported-name forward \
| npcomp-opt -torch-globalize-object-graph -inline -symbol-dce
```
IR: https://gist.github.com/silvasean/8abe63d70d24e29d6db9170ccc8d512b
2021-02-26 16:24:19 -08:00
Sean Silva 59a3f46795 Add support for prim.NumToTensor
With this, we can import BERT!
```
pt_util ~/tmp/bert.pt  --import --exported-name=forward \
| npcomp-opt -torch-globalize-object-graph -inline -symbol-dce
```
https://gist.github.com/silvasean/fe7735ff5d065cc9216f7b0346d0e977

The test case here is a bit unconventional -- it isn't actually valid
Python. To figure out how to generate it I had to go search the PyTorch
codebase for "NumToTensor" and work backward. In this case I found
this
[code](649760e5f1/torch/csrc/jit/frontend/ir_emitter.cpp (L464))
which via a wild guess I was able to turn into a test case.

In this case it didn't take me too long, but when doing this kind of
"add a bunch of trivial stuff to bring up a real model", I'm starting to
think that we might skimp on test cases when it's fairly trivial and not
obvious how to test with a small test.
2021-02-26 10:16:56 -08:00
Sean Silva 7b6fa27838 Rename tests to match the code they test
- `module_import -> ivalue_import`, as it mainly tests ivalue_importer.cpp
- `graph_import -> node_import`, as it mainly tests node_importer.cpp
 - graph_importer.cpp does call into node_importer.cpp, but doesn't do
 much.

This was getting pretty confusing. Also add README.md's in each
directory for more clarity.
2021-02-25 13:31:33 -08:00
Bryce Arden 27a4515de2
Add Conv2D Torchscript Import Support (#167)
Adds support for lowering a torch.nn.Conv2d module to the Torch Dialect through TorchScript import.
Generated IR can be viewed here:
https://gist.github.com/brycearden/6c0f790115c4577249372ef82768e6fd

Required implementing support for tuple in the ivalue importer and list in the node importer.
2021-02-25 12:14:00 -08:00
Sean Silva a375ccf9da Add ability to annotate TorchScript classes.
The first use case is to annotate certain program constructs as either
exported or private. In this commit we plumb it down to
GlobalizeObjectGraph which makes use of this information.

Recommended review order:
1. class_annotator.h/.cpp + `test/module_import/annotations/*`
    - New abstractions to communicate with Python code and annotate.
2. IR changes in TorchOps.td
    - Adding "private" attribute to various things.
3. ivalue_import.cpp changes
    - Module + ClassAnnotator = annotated IR
4. GlobalizeObjectGraph.cpp + tests
    - use new "private" attributes to create "private" IR.
    - also, tweak some of the op deleting mechanics, which was triggering
      some memory errors / assertions

With this, we can run the classifier through and inline it as follows:
```
frontends/pytorch/utils/pt_util.py --import --exported-name forward ~/tmp/classifier.pt \
| npcomp-opt -torch-globalize-object-graph -inline
```
IR: https://gist.github.com/silvasean/32dcad9f6270557f412094a77cecdd69
2021-02-25 11:28:34 -08:00
Sean Silva c424c24ed8 Bump llvm-project to c68d2895a1f4019b387c69d1e5eec31b0eb5e7b0
- dialect registration
- StringAttr::get: order of context arg
- math dialect
- LogicalResult nodiscard
- error message for invalid broadcast
2021-02-22 12:23:24 -08:00
Sean Silva 8486968925 Add trivial inliner interfaces.
With this + manually setting private visibility on everything, a simple
classifier can be reduced to this IR, which is looking pretty lean and
mean:
https://gist.github.com/silvasean/19e7e2e21a61ff197aeac0dd864d188f

Also, include a utility script for importing `.pt` models.

```
pt_util.py --import classifier.pt | npcomp-opt -torch-globalize-object-graph
```
2021-02-22 10:40:38 -08:00
powderluv cecf1fbba5
Add a CI builder with latest pytorch CPU nightly. Also add AArch64 to the build (#166) 2021-02-21 13:36:06 -08:00
Sean Silva 1b769f7841 Extend GlobalizeObjectGraph to handle torch.prim.GetAttr returning NnModuleType
This happens in practice. With this, we can globalize slots for the
non-trivial classifier layer obtained from
https://github.com/NVIDIA/NeMo/blob/main/tutorials/nlp/Joint_Intent_and_Slot_Classification.ipynb

This also adds support for tuple return types, which were needed by that
model.
2021-02-19 10:23:25 -08:00
Bairen Yi b511158141 Add vim swap file in gitignore
Signed-off-by: Bairen Yi <yibairen.byron@bytedance.com>
2021-02-18 19:06:10 -08:00
Sean Silva 158c5c484d Implement GlobalizeObjectGraph transformation.
This required restructuring of how we model TorchScript on import. The
main difference is that now we split out a `torch.class_type` that holds
methods and declarations of the types of each slot. This is more
consistent with TorchScript (our previous representation was
"denormalized").

Recommended reading order:
1. check out the description of `torch.class_type` in `TorchOps.td` and
   look at `test/Dialect/Torch/ops.mlir` and
   `frontends/pytorch/test/module_import/` to familiarize with the new
   representation.
   - Just look at the new IR. The diff between the old names and new
     names is confusing.
2. check out `test/Dialect/Torch/globalize-object-graph*.mlir`
   and read along with the pass description in
   `include/npcomp/Dialect/Torch/Transforms/Passes.td`
3. Read the code in `GlobalizeObjectGraph.cpp` and miscellaneous changes
   in `ivalue_importer.cpp`, `TorchOps.cpp`, etc.
2021-02-18 18:18:47 -08:00
Bairen Yi 99d1db18d2 Add NoneType support for ivalue_importer
PyTorch added a Global variable `_is_full_backward_hook` recently.

See https://github.com/pytorch/pytorch/pull/46163

Signed-off-by: Bairen Yi <yibairen.byron@bytedance.com>
2021-02-18 11:20:29 -08:00
Stanley Winata a38b7b72b2 adapt acap_dispatch to latest pytorch nightly ("1.9.0.dev20210215+cpu")
Modify ACAP_Dispatch to work with latest pytorch
-Remove boxed from convolution's m.impl
-Use redispatch and constrainted keyset to replace deprecated
callwithdispatchkey
2021-02-18 11:13:29 -08:00
Sean Silva 498979ad28 Add MLIR diagnostic handler that prints to `sys.stderr`.
This is needed so that output shows up properly in a Jupyter notebook.
2021-02-17 18:50:05 -08:00
Sean Silva 572163dfde Handle object identity correctly.
This required some careful considerations when defining object identity
for tensors. See the comments for how we do it.

This also tracks some basic information for diagnostics.
2021-02-10 15:15:56 -08:00
Sean Silva 7f7bf39551 Add prim::Print and fix prim::CallMethod
For now, we are treating strings as bytes.
2021-02-10 15:15:56 -08:00
Sean Silva c4e4a11e3f Add support for prim::GetAttr/SetAttr/CallMethod/If
This required some invasive surgery to graph_importer.h/cpp,
specifically moving most of it into node_importer.h/cpp and relayering
it. The abstraction that it had didn't work well in the recursive
setting that happens with prim::If.

The key observation is that torch::jit::Graph doesn't really correspond
directly to anything on the MLIR side. It's a weird combination of a
context, builder, and function and just holds a `torch::jit::Block`. It
is `torch::jit::Node` and `torch::jit::Block` which form the recursive
structure analogous to MLIR's operation/region/block. So
node_importer.h/cpp makes sense as a core building block.

As part of doing this, I did venture a bit into the AcapController code,
and realize now that there is functionality duplicated there with the
ivalue importer. Will refactor that soon.
2021-02-04 17:01:47 -08:00