Commit Graph

129 Commits (b1c49ae6484f9fe704e7c1dbd55b49d9728a9924)

Author SHA1 Message Date
Sean Silva 3a890aa26c Miscellaneous changes while trying to work on ResNet18
- Move frontend lowering pipelines to c++ (this helps with reproducing
  failures in npcomp-opt)
- Add debugging printouts when compilation fails on RefBackendTestConfig

The experience now when a test fails during MLIR lowering is now like this:
```
NPCOMP TorchScript Object Graph IR -> NPCOMP Backend IR lowering failed with the following diagnostics:
failed to legalize operation 'torch.global_slot'
Module does not conform to npcomp's backend contract. See dialect conversion legality information above.

Error can be reproduced with:
$ npcomp-opt -torchscript-to-npcomp-backend-pipeline /tmp/ResNet18Module.mlir
```

And when TorchScript->MLIR import fails it looks like this:
```
PyTorch TorchScript module -> NPCOMP Object Graph IR import failed with the following diagnostics:
unhandled prim operation: %18 : int = prim::min(%17) # /usr/local/google/home/silvasean/.local/lib/python3.9/site-packages/torch/nn/functional.py:4532:4
```

Also,
- Add `--filter=<regex>` to e2e test harness to filter tests.
- Add a few prim ops that were needed to import ResNet18
- Fix torch.prim.Loop.condition assemblyFormat (it previously would not
  round-trip in the case of no loop-carried variables)
2021-04-27 11:51:11 -07:00
Sean Silva 544cb4ef54 Bump llvm-project to 484b6648fdd4b104eaf7a2504dd07b60af2c9f8d
- add_mlir_doc arg order
- fix some dependent dialects on passes that were now causing errors
- "encoding" attribute on mlirRankedTensorTypeGetChecked
2021-04-22 18:12:55 -07:00
Sean Silva fef1733e12 Fix issue with unused functions in torch::jit::CompilationUnit
As described in the code comment:

```
When we import TorchScript IR, we import their entire "compilation unit",
which can contain numerous functions unrelated to the current program,
which breaks torch-globalization-pipeline; for example, there can be
random functions referencing types that haven't been imported
as part of the root `torch.nn.Module` we imported. Those will
be unreferenced private functions which symbol-dce will clean up nicely.
```

This situation is really easy to hit in jupyter notebooks, where the
same cell is evaluated multiple times. That results in the same
class name (at the Python level, e.g. class `Foo` in the top-level
main module). Internally to PyTorch, it handles this situation by
mangling in a unique number to the names of ClassType's and such. When
we import the new ClassType's, we see not just the new
torch::jit::Function's in the CompilationUnit, but, also all the old
ones, which reference ClassType's that are not reachable from the
`torch.nn.Module` that we imported.

Note: there is no way to avoid importing the whole CompilationUnit
(including these old remnants) without doing a fairly complicated call
graph reachability analysis of which functions are reachable from the
methods of the ClassType's we imported. It turns out that once we are
inside MLIR, we model visibility correctly so that `symbol-dce`
"Just Works" for this use case. That is to say, this is not a quick
hack, but rather seems like a totally palatable long-term solution.
2021-04-20 12:00:35 -07:00
Sean Silva f5dfa02523 Add `aten.mm` to linalg lowering.
This is our first op with error semantics, and stresses the system.

There are a few design notes of special interest:
- RefineTypes.cpp's note about shape inference in the presence of code
  that dynamically produces and error, and it is provable statically.
- ATenToLinalg.cpp's notes about future automation of the ATen->linalg
  path.
- The notes in Passes.td about using low-tech `std.assert` ops instead
  of `shape.assuming`.

Note: Doesn't work on IREE yet due to the `std.assert` op (needs to be
lowered to `vm.fail` on the IREE side).
2021-04-16 12:03:31 -07:00
Sean Silva 927546b3c5 Add RefinePublicReturn pass.
This pass allows shape information to be propagated to return types,
which is nontrivial and cannot be cleanly put anywhere else as it
changes the public ABI, which is a concern that we want to keep
concentrated in one place.
2021-04-07 11:06:34 -07:00
Sean Silva 1e357ae680 Add simple type refinement pass.
Currently implemented as a simple intraprocedural dataflow analysis over
a standard ShapedType lattice (hasRank, sizes, and elementType).

It currently hardcodes a few key pieces of information:
- shape transfer functions
- whether it is legal to update the operand type of an op

This needs to be made pluggable obviously and the core propagation logic
moved somewhere agnostic.
2021-04-07 11:06:34 -07:00
Sean Silva 6431b0f11f Add primitive ArrayToTensor (numpy-array-to-tensor) pass.
The current implementation is just sufficient to do a unary aten.tanh
from the e2e spike, and just applies some local rewrite patterns.  I've
sketched out the more full explanation of where this pass eventually
need to go in the pass docs.

Adding this required adding `numpy.tensor_static_info_cast`, which is
the tensor analog of `numpy.static_info_cast`. This op encapsulates the
same numpy-specific "no runtime code" casting semantics, in particular
the interpretation of `!numpy.any_dtype`. The
`numpy.tensor_static_info_cast` I see in practice now are "information
erasing" and will be removed by a later pass that exploits the fact that
aten ops are agnostic to the static info in the operand types (so
substituting a type with more static info is fine).

Side note: we *need* to do dtype and rank inference before aten->tcf
(which will eventually mostly be aten->linalg+guards), because each aten
op is idiosyncratically overloaded based on dtype and rank. Without
copying that idiosyncratic overloading into lower layers (layering
violation), we cannot really lower it to anything until we do that.
2021-04-05 17:56:35 -07:00
Sean Silva 30356c41c8 Add torch-adjust-calling-conventions pass.
This pass incorporates torch.type_bound info and also removes NoneType
returns (eventually it will rewrite tuple types too, but can't yet
because !basicpy.TupleType doesn't track element types).

Recommend looking at adjust-calling-conventions.mlir first to see what
it is doing, and holding your nose for the implementation of the pass.
I decided to implement this with the conversion framework, because it
gives us *some* goodies for type conversion -- mainly avoiding large
amounts of tricky RAUW dances. Unfortunately, the conversion framework
isn't a perfect fit for a couple reasons:
- the incorporation of torch.type_bound is a context-sensitive rewrite
  (requires looking at the arg attr, not just the type).
- NoneType conversion is 1->0, which requires some special handling
- (not implemented yet) 1->N tuple type conversions require special
  handling.
It's a little bit scary, but on balance doing it the other way would
have its own downsides.
2021-04-05 17:56:35 -07:00
Sean Silva 464feacba9 Bump llvm-project to 223dcdcfbe23affdf17ada7f023ee1872fd76160
- ModuleOp no longer has a terminator.
2021-04-05 17:56:35 -07:00
Sean Silva e749074bae Basic infra for annotate shapes and dtypes on arguments.
These allow users to annotate a known "type bound" on the argument,
which can seed shape/dtype inference. We don't rewrite the function
types as part of the import process (it will happen in a
yet-to-be-written pass) because:

1. We would need to interprocedurally rewrite all calls to keep the IR
   consistent. Currently, we have a place after GlobalizeObjectGraph but
   before we convert to tensors where this is convenient to do. Ideally,
   we would do this on the object graph representation.

1. We don't necessarily know that adjusting the function type is a legal
   calling convention change. The pass will have blessed knowledge (by
   the pass pipeline author) that adjusting the argument type based on
   the type bound is safe (which it frequently is).

2. Note that in principle, a type bound could be a fairly general thing
   (such as maximum sizes of dimensions, unions of multiple concrete
   types, etc.). The pass will in principle have logic to interpret the
   type bounds and to determine a suitable "best" (and legal) argument
   type.
2021-04-01 18:40:03 -07:00
Sean Silva 99178a167d Bump llvm-project to 0524a09cc7e1a0797982feacf505825231efbee7
- renames of OwningRewritePatternList -> RewritePatternSet
  - also `insert` to `add`
- RewritePatternSet holds a context now
- memref dialect split from std
2021-03-23 14:29:05 -07:00
Bryce Arden 4591884d06 [refbackrt] Scalar arg support
* Adds f32 scalar argument support across the ABI boundary.
* Adds support for passing input type / shape information
  across the ABI boundary
* Adds support for parsing / creating input FloatAttr's in
  `npcomp-run-mlir`
2021-03-23 13:16:44 -07:00
Sean Silva 703428eff4 Add support for "trailing_" and "out" variants of various ops.
We already had the `promoteTrailingOutTensor` flag, but weren't using
it. A inplaceVariantKernelName flag needed to be added.

This change is a little dissatisfying, as the conversions done by the
RecognizeKernelsPass are currently non-orthogonal. In particular,
`kDropResultAndAliasArg0` probably won't work as intended if mixed with
these (we probably need to promote kDropResultAndAliasArg0 to not be an
arg-level thing anyway, as we have done with promoteTrailingOutTensor).

This involved adding a new op `numpy.overwrite_array`.

```
numpy.overwrite_array %arg2 overwrites %arg0 : tensor<2x3xf32>, !numpy.ndarray<[2,3]:f32>
```

This models the destructive update behavior. Note that in the above op,
we cannot simply RAUW %arg0 with a suitably conveted %arg2 (for example,
%arg0 might have uses that are not dominated by %arg2, or might have an
alias relation with some other array in the program). In general, we
need a pass analogous to "SSA-formation" which knows how to see through
these to uncover an underlying tensor program.

Also, add tanh_out_e2e.py/div_inplace_e2e.py and fix some bitrot in
refjit.py which is my running example I'm trying to get working.
2021-03-19 10:34:50 -07:00
Sean Silva 58c7030104 Support multiple instances of a class in GlobalizeObjectGraph.
This happens in practice with e.g. ResNet from torchvision (multiple
instances of the same BatchNorm class).

The key observation is that for this program, and the expected set of
programs, we can convert the program to the same globalized form with a
bit more static analysis and effort to suitably monomorphize the
program. Though what we are doing here is fairly annoying to implement,
it saves any nontrivial later pass from having to do similar analyses
(or worse). E.g. shape inference would need to be object-graph aware,
mutation/lifetime analyses would have to be aware, etc. Additionally, it
would make us front-load what it means to have a !torch.nn.Module type
on an ABI boundary, which we are just not ready to handle.

I'm really, really hoping that in practice we can get away with
this, otherwise it's going to be really rough designing a representation
(and implementing everything to back it) that is convenient to transform
and gracefully scales from full object graph (in the most dynamic case)
down to a fixed set of global slots like we have here (in the most
static case, which we presume a lot of practical programs fall into).

This also involved introducing a
`torch-prepare-for-globalize-object-graph` pass that does a minimal set of
lowerings to simplify the IR into a more orthogonal and analyzable form,
and a `torch-globalize-pipeline` helper.

Recommended review order:
- updated documentation in Passes.td
- new tests in `globalize-object-graph-multiple-instances*.mlir`
- implementation of GlobalizeObjectGraph.cpp
- PrepareForGlobalizeObjectGraph.cpp + prepare-for-globalize-object-graph.mlir
- misc stuff like torch-globalize-pipeline pipeline definition.

With this, we can import, globalize, and inline resnet18 from
torchvision:
https://gist.github.com/silvasean/821586afc19b67d9fb72030b2e0adeb8
2021-03-11 19:21:07 -08:00
Bairen Yi 5315598947 Update .getAttrs to ->getAttrs as it is deprecated.
Signed-off-by: Bairen Yi <yibairen.byron@bytedance.com>
2021-03-11 11:55:59 -08:00
Bairen Yi 53b01cb9ba Bump llvm-project to e31c77b1827fa4dd3511f21af11cfab18ecf6d38
Signed-off-by: Bairen Yi <yibairen.byron@bytedance.com>
2021-03-10 11:01:16 -08:00
Sean Silva 43dba03afd Properly model "derefinement".
In terms of IR structure, TorchScript allows types to vary in many
circumstances where MLIR requires pointer-identical types. In particular,
it is valid to pass any subtype in place of a type. For example, if an
`Optional[int]` is required somewhere in the IR, it is legal to pass a
value of just `int` (but not the other way around; see
`torch.prim.unchecked_cast`). In effect, every *use* can have a different
type.

We introduce a new op `torch.derefine` that models that impedance
mismatch. This op allows casting a value from one type to a type that it
is a subtype of to model this behavior.

Recommended review order:
- TorchOps.td for new torch.derefine (and updated docs for
  `torch.prim.unchecked_cast`)
- new test code in if.py, loop.py, function-derefine.py
- new code in node_importer.cpp for handling derefinement insertion
- function_importer.cpp and utils changes in torch_to_mlir_utils.cpp

Properly handling derefinement on function boundaries required
relayering the code so that graph_importer.cpp/.h is now
function_importer.cpp/.h because only the `torch::jit::Function`
(actually the `c10::FunctionSchema` it holds) knows the derefined types that are
actually needed at the boundary (see `function-derefine.py` for a test).

Annoyingly, this churns all the functions which are now prefixed with
`__torch__.` but that is more correct anyway (that is their linkage name
in the `torch::jit::CompilationUnit`; the previous `mb.import_function`
was actually buggy in the case of functions calling each other as it
would reference their unqualified name).

With this change, we can import `resnet18` from `torchvision` :)
IR: https://gist.github.com/silvasean/6426a5272d8a6c7caae533fce05ab704
2021-03-03 15:09:44 -08:00
Sean Silva 939d36906f Add support for prim::Loop op.
This is a funny one. It combines a `for` and `while` loop in one op. We
will need to write some conversions to `scf`.
2021-03-02 16:01:34 -08:00
Sean Silva c837dbb077 Properly import the entire torch::jit::CompilationUnit
This primarily unlocks proper handling of free functions (that is,
functions that are not methods of any torch.nn.Module).

Recommended review order:
- `ivalue_importer.cpp` + `ivalue_import/functions*.py`
- `GlobalizeObjectGraph.cpp` + test case
- misc other stuff

The `torch::jit::CompilationUnit` is basically a backing store or
"context" holding all the possible functions in the program. The
previous code was not explicitly accessing this data structure, since it
just imported the `torch::jit::Function`'s that it saw attached to
methods.

Subtly, any time a TorchScript module called into a free function, the
free function gets incorporated into the torch::jit::CompilationUnit,
but doesn't show up anywhere when dumping the module, except in the
curious pattern:

```
%5 : Function = prim::Constant[name="adaptive_avg_pool2d"]()
%6 : Tensor = prim::CallFunction(%5, %input.1, %4)
```

That is, calls are indirect calls, and are accessed via `prim::Constant`
materializing a function object. Even stranger, the `name` attribute here
doesn't really even tell the full story -- it doesn't correspond to
anything. It turns out that the c10::FunctionType itself actually holds
a pointer to the `torch::jit::Function` in the compilation unit
directly (so there is actually no indirection in prim::CallMethod,
because any two values of the same FunctionType call the same
function!). E.g. when converting the IR to bytecode, the "name" is
ignored [code link](1d6bd15790/torch/csrc/jit/runtime/interpreter.cpp (L937)).
We do import `prim::CallFunction` as a `std.call_indirect` though
because it's more braindead to do it that way (it gets canonicalized to
a direct call easily).
2021-03-01 12:08:01 -08:00
Sean Silva 79a3f639bf Give torch.global_slot an initializer region.
This is a much simpler representation than the ad-hoc initializer
function we had before. It is also less general, but given the rationale
in Passes.td it seems like the right tradeoff right now.

We can probably carry this representation for quite a while, and when we
can't, it likely means that TorchScript has fixed their object identity
bug and we probably need to just upgrade to a more general object graph
modeling (more general than GlobalizeObjectGraph).

In particular, we don't want to deal with defining and carrying around
this initializer function concept until we need it. For example, if we
want to constant-fold the global slots into uses, this is a much better
representation, and it plays better with symbol-dce (the initializer
function counts as a "use" of the symbol).

(the alternative would have been to write a pass that converts the
initializer function to this form when possible, but I realized that
lots of information had been lost which made that fairly annoying -- it
was all self-inflicted anyway, so best to just go to the source
(GlobalizeObjectGraph) before the information is lost)

Now symbol-dce works nicely (no more "training" bools)
```
pt_util ~/tmp/classifier.pt --import --exported-name forward \
| npcomp-opt -torch-globalize-object-graph -inline -symbol-dce
```
IR: https://gist.github.com/silvasean/8abe63d70d24e29d6db9170ccc8d512b
2021-02-26 16:24:19 -08:00
Sean Silva a375ccf9da Add ability to annotate TorchScript classes.
The first use case is to annotate certain program constructs as either
exported or private. In this commit we plumb it down to
GlobalizeObjectGraph which makes use of this information.

Recommended review order:
1. class_annotator.h/.cpp + `test/module_import/annotations/*`
    - New abstractions to communicate with Python code and annotate.
2. IR changes in TorchOps.td
    - Adding "private" attribute to various things.
3. ivalue_import.cpp changes
    - Module + ClassAnnotator = annotated IR
4. GlobalizeObjectGraph.cpp + tests
    - use new "private" attributes to create "private" IR.
    - also, tweak some of the op deleting mechanics, which was triggering
      some memory errors / assertions

With this, we can run the classifier through and inline it as follows:
```
frontends/pytorch/utils/pt_util.py --import --exported-name forward ~/tmp/classifier.pt \
| npcomp-opt -torch-globalize-object-graph -inline
```
IR: https://gist.github.com/silvasean/32dcad9f6270557f412094a77cecdd69
2021-02-25 11:28:34 -08:00
Sean Silva c424c24ed8 Bump llvm-project to c68d2895a1f4019b387c69d1e5eec31b0eb5e7b0
- dialect registration
- StringAttr::get: order of context arg
- math dialect
- LogicalResult nodiscard
- error message for invalid broadcast
2021-02-22 12:23:24 -08:00
Sean Silva 8486968925 Add trivial inliner interfaces.
With this + manually setting private visibility on everything, a simple
classifier can be reduced to this IR, which is looking pretty lean and
mean:
https://gist.github.com/silvasean/19e7e2e21a61ff197aeac0dd864d188f

Also, include a utility script for importing `.pt` models.

```
pt_util.py --import classifier.pt | npcomp-opt -torch-globalize-object-graph
```
2021-02-22 10:40:38 -08:00
Sean Silva 1b769f7841 Extend GlobalizeObjectGraph to handle torch.prim.GetAttr returning NnModuleType
This happens in practice. With this, we can globalize slots for the
non-trivial classifier layer obtained from
https://github.com/NVIDIA/NeMo/blob/main/tutorials/nlp/Joint_Intent_and_Slot_Classification.ipynb

This also adds support for tuple return types, which were needed by that
model.
2021-02-19 10:23:25 -08:00
Sean Silva 158c5c484d Implement GlobalizeObjectGraph transformation.
This required restructuring of how we model TorchScript on import. The
main difference is that now we split out a `torch.class_type` that holds
methods and declarations of the types of each slot. This is more
consistent with TorchScript (our previous representation was
"denormalized").

Recommended reading order:
1. check out the description of `torch.class_type` in `TorchOps.td` and
   look at `test/Dialect/Torch/ops.mlir` and
   `frontends/pytorch/test/module_import/` to familiarize with the new
   representation.
   - Just look at the new IR. The diff between the old names and new
     names is confusing.
2. check out `test/Dialect/Torch/globalize-object-graph*.mlir`
   and read along with the pass description in
   `include/npcomp/Dialect/Torch/Transforms/Passes.td`
3. Read the code in `GlobalizeObjectGraph.cpp` and miscellaneous changes
   in `ivalue_importer.cpp`, `TorchOps.cpp`, etc.
2021-02-18 18:18:47 -08:00
Aaron J Arthurs 484fe0d9bd Reformat code 2021-01-28 12:01:35 -08:00
Aaron J Arthurs c0e14da888 Fix TensorFromElementsOp reference 2021-01-28 12:01:35 -08:00
Aaron J Arthurs fc650c9447 Import TCP pad 2021-01-28 12:01:35 -08:00
Sean Silva 689b40c7a6 Add initial TorchScript module importer
It turns out that this was easiest to structure as a general IValue
importer, since torch module are just one of the possible IValue's.

We import the IValue object graph in a braindead fashion into basicpy
ops and a new `torch.nn_module` op that is used to model the
attributes/methods of a torch::jit::Module IValue. See `Torch/ops.mlir`
for an example, and also check out the .py import tests in
`frontends/pytorch/test/module_import`.

As part of this change, a few housekeeping tasks:
- extract some helpers from graph_importer.cpp
- more helpers around the C API
- misc touchups
2021-01-28 11:55:17 -08:00
Sean Silva 1965ac4d67 NFC: mark some methods as `override`
This silences some warnings I was seeing locally.
2021-01-21 11:48:41 -08:00
Sean Silva 97d6d04d41 Bump llvm-project to 16c6e9c58e9ae50a775945e6b407f1891f353d2f
Changes:
- linalg init tensor change (outs+init -> just outs)
- IntegerType::get and other builtin types now take the context as the
  first arg
- LLVMType::* is gone. Now LLVM Types are just regular Type's.
2021-01-05 16:12:11 -08:00
Sean Silva d818043986 Bump llvm-project to d50d7c37a159802c89454a6c53c0ec2e7949d84a
Fixes:
- use `op->(method on Operation)`
- update for MlirIdentifier in signature of mlirNamedAttributeGet
2020-12-14 14:30:51 -08:00
Sean Silva b2077738ca Bump llvm-project to 444822d77a7fea28aa49edf24533c987efa1b2ee
Fixes:
- renames StandardTypes -> BuiltinTypes
- std.extract_element -> tensor.extract
2020-12-11 14:43:38 -08:00
Sean Silva 46aa6d0a24 [RefBackend] Fix leaks related to ABI boundaries.
Best as I can tell (e.g. from LeakSanitizer), this fixes all the leaks
except for those due to buffers created internally to the codegenned
code itself (up next I'll add the buffer deallocation pass to fix
those).

The main change is that instead of attempting to pass `refbackrt::Tensor`
to the codegenned function directly, we make all the ABI types be
UnrankedMemRef which gets passed awkwardly (but workably) as a
`{size_t rank, void *ptrToDescriptor}` on the ABI. The reason why
refbackrt::Tensor wasn't workable is that is that MLIR doesn't really
have a way to deal with the lifetime of unranked memref descriptors that
happen inside the function, which is inevitably what would happen in the
old code that would emit runtime calls to
`refbackrt.to_memref/refbackrt.from_memref` to convert back and forth to
`refbackrt::Tensor` inside the codegenned code.

So, instead of the `refbackrt.to_memref/refbackrt.from_memref` with no
real sound basis for valid lifetime management, we now have a lovely
piece of code in `refbackrt::invoke` in `Runtime.cpp` that just barely
seems to be sound. We rely on the codegenned code having these
properties, which it seems to have:

- it won't free memref descriptors or their backing buffer for arguments
  of UnrankedMemRef type.

- it will allocate a separate memref descriptor for each result
  UnrankedMemRef (which is ensured by having a separate memref_cast for
  each)

- we can sniff the `allocatedPtr`'s (i.e. the backing buffer pointers)
  to avoid double-freeing in the case of aliasing of the backing buffer
  (including backing buffers for arguments feeding into results)

- to catch the case of statically allocated data (which we need to avoid
  passing to `free`) , check if the `allocatedPtr` is (no joke) equal to
  `0xDEADBEEF`, because there is otherwise no way to distinguish
  statically allocated from malloc'ed data...  (std.global_memref lowering
  to LLVM by happenstance sets the allocatedPtr equal to `0xDEADBEEF`,
  presumably mainly as a debugging thing)

Even with all this, we *still* need to (internally to refbackrt::invoke)
make copies of all inputs/outputs! And the details of how the LLVM-level
ABI gets laid out for e.g. function arguments/returns is still super
tricky.

This really highlights how deficient memref is as the general runtime
type for our use case. It's stewing in my mind how best to improve the
situation. My general gut feeling is that IREE's abstractions for this
are "right", but I need to think more how to distill those aspects of
IREE's design in a "reference" way for RefBackend.

Some implementation notes:

- In terms of how this is implemented, this did catch a bug in our ABI
  wrapper functions in LowerToLLVM.cpp, which I had to fix (it happened to
  work before through some combination of npcomprt::Tensor being passed as
  a single pointer + probably me infinite-monkey-ing it until it worked)

- This actually removes 2 out of the 3 compiler runtime functions (the
  only one left is "abort_if". (most of the memref descriptor code moved
  from CopmilerRuntime.cpp to Runtime.cpp)

  - this also means deleting `refbackrt.from_memref` and
  `refbackrt.to_memref`
2020-11-25 13:09:58 -08:00
Stella Laurenzo 3937dd14cb Add basicpy.numeric_constant op.
* Going through TODOs on the PyTorch side, this is a big cause of them (not being able to have constants for signed/unsigned).
* Added complex while in here since we're at the phase where it is better to just have things complete than partially done.
2020-11-24 16:44:40 -08:00
Stella Laurenzo bea0af419d NFC: Prefactor some basicpy ops in advance of more type work.
* Organizes the BasicPyOps.td file by function.
* Renamed `to_boolean` -> `as_predicate_value` (trying to consistently use "predicate" to refer to i1/low-level types and Bool/Boolean to refer to Python bool types).
2020-11-24 15:49:37 -08:00
Stella Laurenzo f03225b1f1 Bump llvm-project to f4f8a67aaf13bc66a2b7d55561b14a3724a5e0de.
* Incorporates source fixes.
* Uses upstream pybind11 detection logic.
* Patches CI.
* This may break the CI, which will need to be fixed manually in a followup.
2020-11-22 13:14:44 -08:00
Sean Silva 1dfcfa9cd1 Add aten.mm op and "test" it e2e.
Note that unlike aten.matmul which has dynamic behavior
depending on the argument ranks (can do matrix-matrix, matrix-vector,
batch matmul, etc.), aten.mm is just a vanilla matrix
multiply, which can be lowered precisely to tcf.matmul.

The "test" is really just an example that I stared at while getting my
feet wet with this. We probably want something that actually tests this
as part of `ninja check-npcomp`.
2020-11-20 17:21:24 -08:00
Sean Silva 32b2dc6ce7 Revert "Bump llvm-project to 369c51a74b5327464e27e0749ca7ac59ac1349ce"
This reverts commit c60d7b4aae.

It seems to have tickled some sort of pybind version issue:
https://github.com/llvm/mlir-npcomp/runs/1433414550?check_suite_focus=true
2020-11-20 15:09:18 -08:00
Sean Silva c60d7b4aae Bump llvm-project to 369c51a74b5327464e27e0749ca7ac59ac1349ce 2020-11-20 13:03:24 -08:00
Sean Silva 358159a6eb [RefBackend] Open-code shape.get_extent as extract_element
It was annoying that we were creating shape.get_extent in the middle of
the bufferization pipeline, as it required running convert-shape-to-std
at an awkward place. To make that cleaner, just open-code the
extract_element ops that shape.get_extent expands into.

This is a little gross, but it helps with the macroscopic pipeline
ordering issues. Anyway, the train is long-gone of trying to treat
shapes as some special data type that should only be operated on with
shape ops.

Also,
- reorder tensor constant bufferize (which is a module pass) to bracket
all the bufferization function passes, to make the parallelism
opportunities there clearer. Now we have a very clean little
bufferization segment of our pipeline construction.
2020-11-17 11:00:38 -08:00
Sean Silva 5227d52c26 [RefBackend] Use std.global_memref instead of homegrown thing
This vastly simplifies our code, allowing deleting multiple ops,
simplifying multiple passes, and removing a whole pass.

Now `refback` dialect is down to one op (refback.alloc_memref, which
simplifies allocations to just take a shape instead of individual
extents).
2020-11-13 18:43:50 -08:00
Sean Silva 1c7c362e29 [TCP] Replace tcp.matmul with linalg.matmul.
This involved adding a `tcp.splatted` op to splat a dynamically sized
init tensor. See rationale in TCPOps.td docs.

One interesting observation is that when lowering tcf.matmul to
linalg.matmul, we need to both 1) create the error checks and 2)
calculate a shape transfer function to create the init tensors.
Previously, 2) was deferred to bufferizing tcp.matmul later. I'm not
sure if this is a conflation of concerns or not. For now, it's not a big
burden.
2020-11-10 18:58:28 -08:00
Sean Silva 0427aacb0b [TCP] Replace elementwise ops with std elementwise ops. 2020-11-10 18:58:28 -08:00
Stella Laurenzo 6c702b149f Add a number of kernels and new patterns.
* convolution, convolution_backward, _log_softmax, _log_softmax_backward_data, nll_loss_forward, nll_loss_backward, nll_loss2d_forward, nll_loss2d_backward, copy_
* Extends the recognition logic and metadata for handling inplace transformations, optional tensors, ints, lists and dropped args.
* The kernel_calls generated by test_conv_nllloss_grads.py now convert to ATen.
* The result *almost* comes out as a pure tensor program with the exception of the copy_ op, which I will do some followup work to deal with.
* More progress on #97
2020-11-04 14:36:59 -08:00
Sean Silva 1874bf5eb1 NFC: Clean up some minor nits
- Remove GreedyPatternRewriteDriver.h from files that don't need it
- fix typo shouldBeCloned -> wouldBeCloned
2020-10-30 18:48:25 -07:00
Stella Laurenzo a3f4db9fe8 Bump llvm-project to c8c07b76b2cf2ada8e7ec132f7f57b97d76743cf.
* Several NFC changes to signatures/includes.
2020-10-29 15:25:55 -07:00
Stella Laurenzo c08935a418 Rewrite ATen ODS code generator to be based on new op registry and new signature recognition system.
* Deletes prior code generator from previous attempt (moved some of it into this one).
* Renames old generated tablegen source to "Legacy".
* Generates ODS and import rules for most binary and unary arithmetic ops.
* Removes old generated ops and integration tests that were testing details of the prior setup.
2020-10-28 10:37:37 -07:00
Aaron J Arthurs 94ea6f7c92 [RefBackend] Support element-wise multiply op
Register the following for the multiply op:
- tcf.mul
- tcp.mul
- TCP->TCP lowering
- Shape transfer, broadcasted multiplicands
- Lower to standard `MulFOp` op
2020-10-27 19:41:23 -07:00
Stella Laurenzo 510f226df2 Expose signature metadata to ops and implement ATenRecognizeKernelsPass pass.
* Two op interfaces, one for querying instance metadata and one for getting static data needed to construct an op from a generic form.
* For torch.generic_kernel ops, metadata is splatted in during capture from Torch (it comes from the op registry, which will work for either device capture or graph import).
* Moved the 'add' out of the generated set so I can experiment on it. It implements the TorchBuildableKernelOpInterface interface which provides its metadata.
* The ATenRecognizeKernelsPass pass generically lowers from a torch.generic_kernel to recognized ops that implement the TorchBuildableKernelOpInterface, handling the various types of transformations that we allow at this stage.
2020-10-26 20:31:45 -07:00