Commit Graph

840 Commits (98cb12e3e57f45398064efb3f57728caa6a6f759)

Author SHA1 Message Date
Sean Silva b7b7fd4959 Rewrite error reporting of e2e tests.
This now gives [much nicer output](https://gist.github.com/silvasean/f048e0f37b04542dae6469b86802bb3e).
Embarrassingly, we previously couldn't even report failures for two
different tests, and weren't able to report on compilation failures
(besides just crashing).
2021-05-20 11:28:20 -07:00
Sean Silva d66e8fe1f8 Get simple quantized model importing.
This is enough to import the program and get it through the compilation
pipeline. It of course fails at the VerifyBackendContract pass since
there is a lot missing, but the final IR for a simple quantized MLP is
looking pretty decent already:
[IR](https://gist.github.com/silvasean/f76bccd76e9b193d396cfb2f9a11f54d)

Main changes:
- Add support for importing torch quantized tensors, including
  `torch.per_tensor_affine.create` op and `!torch.qint8` element type.
- Add support for importing `LinearPackedParamsBase` (basically a weight
  + optional bias, but requires `torch.linear_params.create` op +
  `!torch.LinearParams` type to model it). This was less painful than I
  expected, as it has the necessary methods to opaquely unpack itself. I
  factored things so it should be easy to extend to other custom classes
  like `ConvPackedParamsBase`.
- Add minimal boilerplate for importing `quantized::*` ops, with
  `quantized::linear` being a motivating example.
- Add e2e test with simple quantized MLP (courtesy of @phoenix-meadowlark).

This is somewhat of an abuse of `!numpy.ndarray` / `tensor`, as
really the proper semantics of `!torch.qint8` dtype on a Torch tensor is
"check the quantizer object of the tensor for side data (scale/offset,
possibly per-channel) that defines the full semantics of the tensor". We
don't have any such notion of "side data" for `!numpy.ndarray` /
`tensor`, let alone anything that would have the associated behavior of
keying off the dtype to determine if the side data is present.
This will be fixed by a proper `!torch.tensor` type.
2021-05-20 11:28:20 -07:00
Sean Silva 2efda323ff Significantly restructure torch/aten import design.
This is a really major and invasive restructuring of the way we get
torch operators (`torch::jit::Operator` / `c10::OperatorHandle`) into
MLIR. Please forgive the challenging review, but due to the sheer
invasiveness, it wasn't really practical do do it in sane smaller
pieces.

This fully replaces everything that was already working on the
TorchScript path (actually, more -- we added tanh support to
TorchToLinalg in order to delete the older code paths). Additionally,
I've kept the lights on for the acap path too, including what little e2e
stuff was working before (for expediency I made a few tiny compromises
along the way that will be easy to undo when we give that path proper
attention).

Overview of the new design:
- The torch operator `somens::someunqualname.someoverloadname` is
  imported as `torch.somens.someunqualname.someoverloadname` (skip the
  last dotted part if the overload name is empty), OR, if we don't have
  such an op registered, it is imported as
  `torch.operator "somens.someunqualname.someoverloadname" (...) : ...`.
  - The addition of the "overload name" is a critical element here, as
    the `(ns,unqual,overload)` triple is unique, which solves a lot of
    problems we were having.
  - This involves having separate MLIR ops for the `trailing_` and
    `.out` variants and all the different overloads. This seemed
    necessary, because the set of overloads is so wild and varied and
    unstructured. The previous design was leaning into some underlying
    structure that just isn't there -- the default situation is
    the "random overload that we want to manage on the MLIR side",
    rather than that being an exception. E.g.  `aten::ne` (not-equal)
    has 21 overloads, only 4 of which are c10 dispatcher ops see
    [gist](https://gist.github.com/silvasean/190ba918c550c956260e21254e1b8aa1),
    and the "out" variant is really called `.Tensor_out` instead of
    `.out` as it frequently is for other ops.
  - Rationale for all being in `torch` namespace: the set of operators
    are so varied and unstructured that "dialect per namespace"
    doesn't result in anything resembling the typical MLIR dialect
    boundary expectations. We could maybe draw the boundary at
    dispatcher ops vs non-dispatcher ops, but that doesn't seem to
    really result in very much useful structure at this point in time.
  - Note: within the torch operator registry, we effectively have a
    mini-basicpy subdialect (already type-resolved), which is reasonably
    structured.
  - The existing Torch op interfaces are also removed -- now that we
    track the overload name, we can losslessly find the original
    operator.
- Instead of `ATenRecognizeKernelsPass`, we now have a
  `ReduceOpVariantsPass` that keys off certain traits (and perhaps
  eventually interfaces) to reduce variants of ops to a smaller set,
  ideally operating on immutable tensors and using surrounding ops to
  model the mutability/aliasing aspects.
  - Note: `torch.ns.unqual.overload` ops allow both immutable and
    mutable tensors (unlike the previous hard distinction in the common
    case). This is a premonition for a future change that will introduce a
    bona fide `!torch.tensor` type that will clean up a bunch of stuff.
- `TorchToLinalg` / `TorchToStd` supercede the existing
  "ATen->TCF->TCP->Linalg" path.
- The new `torch_ods_gen.py` supercedes `torch_signature_ods_gen.py`.
  It should look somewhat familiar, but the benefit of hindsight has
  allowed a lot of simplifications.

The overall trend seems to be to make the `torch` dialect a nice layer
independent of anything else. It feels like as a natural result of
various future changes we will be removing the reliance on basicpy+numpy
dialects and have a nice self-contained type system too that properly
models the TorchScript type system (including proper subtyping,
mutable/immutable tensors, optional dtype, etc.).

Recommended review order:
- Start at some of the new import IR, e.g. in
  `frontends/pytorch/test/node_import/prim.py`,
  `frontends/pytorch/test/acap_export/test_export_add3.py`, and other
  tests.
- `frontends/pytorch/python/torch_mlir_utils/codegen/torch_ods_gen.py`
  and associated generated files:
  - `include/npcomp/Dialect/Torch/IR/GeneratedAtenOps.td`
  - `include/npcomp/Dialect/Torch/IR/GeneratedPrimOps.td`
- Inspect `ReduceOpVariants.cpp` / `reduce-op-variants.mlir` and the new
  traits in `include/npcomp/Dialect/Torch/IR/TorchTraits.h`
- Various code changes in the import path in
  `frontends/pytorch/csrc/builder`. Probably most interesting is the new
  code in `torch_to_mlir_utils.cpp` that has the logic to create the
  `torch.operator` ops or `torch.ns.unqual.overload` ops.

This is the [new ResNet IR](https://gist.github.com/silvasean/5407aafb710d07612b7b5b92eabecebe),
just to be able to look at a substantial sample of IR in the new style.
2021-05-19 13:37:39 -07:00
Sean Silva 133bdf4b31 [cleanup] Add materializer for basicpy.singleton
This allows the canonicalizer to coalesce it like other constants.
2021-05-03 09:54:44 -07:00
Sean Silva 3d08c83580 Add flatten op recognition + shape refinement.
This op has complex aliasing semantics, so it is kept mutable for now.

With this, we reduce ResNet18 to a single BB with all aten operators
having rank + dtype:
https://gist.github.com/silvasean/2fcb1c6e4d4ae27461204a43ae9c5031
2021-05-03 09:54:44 -07:00
Sean Silva 122cae2ee3 Add aten::len.t, aten::size, and aten::gt.int primitive ops
Also add some canonicalizations that finally reduce ResNet down to a
single block.
2021-04-30 10:57:02 -07:00
Sean Silva ec6d06aa86 Add some more ResNet ops.
- aten::relu_, aten::max_pool2d, aten::adaptive_avg_pool2d, aten::batch_norm, aten::conv2d

No aten-to-linalg conversion for the latter ones, as they are fairly
substantial. At this point, I'm trying to get shape inference and stuff
working for them and the IR cleaned up.
2021-04-30 10:57:02 -07:00
Sean Silva 9257457d8a Add AllowsTypeRefinement trait and use it to improve RefineTypes
This trait lets us model the semantics of various aten/torch/numpy ops
that are insensitive to type refinements. This replaces
hardcoded/inconsistent checks for this property.

To show usage of this new trait, we fix up some old uses, and improve
RefineTypes to be smarter about rewriting with this trait.
2021-04-30 10:57:02 -07:00
Sean Silva 1c832604d2 Remove old aten-to-std / ATenLowering pass.
It was confusing now that we have `convert-aten-to-std`.
2021-04-30 10:57:02 -07:00
Sean Silva 55c3cc6624 Add recognition/folder/lowering for aten::__is__, aten::ne.int, and aten::dim
Interestingly, TorchScript has its own op (`torch::jit::Operator`)
registry separate from the dispatcher (it is a superset of the
dispatcher).

This is where the "prim" ops and some "aten" ops (that should probably
be renamed to "prim") live. In particular, `aten::__is__` is in that
latter category of "aten but really prim". This registry is also the
source of truth for what the TorchScript interpreter calls into when it
executes.

The bulk of the "not part of the dispatcher" ops live in
09feb5f579/torch/csrc/jit/runtime/register_prim_ops.cpp (L82)

And the registry itself lives in:
09feb5f579/torch/csrc/jit/runtime/operator.cpp (L196)

This fold further reduces the IR of ResNet by folding away some
more not-taken branches. These not-taken branches in ResNet require
first-class handling of the list type which we don't yet have on any
backend.
2021-04-30 10:57:02 -07:00
Sean Silva 7eb36b4ae7 Constant fold through basicpy.bool_cast.
This is the start of a push to getting ResNet running.

This involves throwing in the towel on an O0 pipelinie for now. See note
in the code. We keep an options struct with `optimize` flag, but it
default to true for now.
2021-04-30 10:57:02 -07:00
Sean Silva fb5f149e04 Reformat Passes.cpp and remove torch-globalize-pipeline.
The pipeline is subsumed by our lowering pipelines.
2021-04-30 10:57:02 -07:00
River Riddle 4678a7fedd Refactor RefineTypes to use the upstream ForwardDataFlowAnalysis engine
This removes the need for defining all of the custom propagation logic,
and also adds support for propagating value knowledge across branches,
through regions, and across calls.
2021-04-27 13:17:56 -07:00
Sean Silva 642482429c Bump llvm-project to 12011b5217929ef8a56c2099c6f3233934ea4fbc
- Rename FrozenRewritePatternList -> FrozenRewritePatternSet
2021-04-27 13:12:33 -07:00
Sean Silva 179105ca3e Add basic MLP's to the e2e curriculum.
These tests pass on the reference backend.

- Add aten.linear op + shape xfer function + ATen->Linalg lowering.
 - Note: this needs to be more automated, and needs to cover more cases.
 - Current not implemented caveats:
  - size-1 broadcasting for bias vector (either static-size-1 or ? case)
  - higher-rank aten.linear ops (not produced by torch.nn.Linear though)
  - type promotion (still don't even know the exact rules here)
- Add folder for torch.derefine op. Now the inliner can clean it up as
  it inlines. (call boundaries are a main place we need to insert
  torch.derefine) This is brittle -- the other important case is control
  flow which will need to be handled via an extension to
  RefineTypes.cpp (as will more robust call handling). River has an
  in-flight patch to update it to the new dataflow framework so I didn't
  want to do anything intrusive here.
    - Also adjust torch.derefine syntax to use the keyword `to` instead of
      `->`, as most type-only, cast-like ops do.
2021-04-27 12:18:54 -07:00
Sean Silva 9ba77c6e13 Add InlineGlobalSlots pass.
This inlines global slots if possible. This allows them to participate
in folding, canonicalization, shape inference, etc.

Example use cases:
- inlining weights and biases that are readonly during inference
- inlining the "training" bool to allow stuff to fold away

For training use cases (especially internal training loop), we will need
something smarter to get good performance. That would look like an "SSA
formation" which promotes the global slots to tensors in the program,
flushing them back to the slots at the minimal number of necessary
places. We might want to let backends do that transformation though.
This also interacts with shape inference (type bounds on the slots to
even lower them to backends in the first place).
2021-04-27 12:18:54 -07:00
Sean Silva 3a890aa26c Miscellaneous changes while trying to work on ResNet18
- Move frontend lowering pipelines to c++ (this helps with reproducing
  failures in npcomp-opt)
- Add debugging printouts when compilation fails on RefBackendTestConfig

The experience now when a test fails during MLIR lowering is now like this:
```
NPCOMP TorchScript Object Graph IR -> NPCOMP Backend IR lowering failed with the following diagnostics:
failed to legalize operation 'torch.global_slot'
Module does not conform to npcomp's backend contract. See dialect conversion legality information above.

Error can be reproduced with:
$ npcomp-opt -torchscript-to-npcomp-backend-pipeline /tmp/ResNet18Module.mlir
```

And when TorchScript->MLIR import fails it looks like this:
```
PyTorch TorchScript module -> NPCOMP Object Graph IR import failed with the following diagnostics:
unhandled prim operation: %18 : int = prim::min(%17) # /usr/local/google/home/silvasean/.local/lib/python3.9/site-packages/torch/nn/functional.py:4532:4
```

Also,
- Add `--filter=<regex>` to e2e test harness to filter tests.
- Add a few prim ops that were needed to import ResNet18
- Fix torch.prim.Loop.condition assemblyFormat (it previously would not
  round-trip in the case of no loop-carried variables)
2021-04-27 11:51:11 -07:00
Sean Silva 544cb4ef54 Bump llvm-project to 484b6648fdd4b104eaf7a2504dd07b60af2c9f8d
- add_mlir_doc arg order
- fix some dependent dialects on passes that were now causing errors
- "encoding" attribute on mlirRankedTensorTypeGetChecked
2021-04-22 18:12:55 -07:00
Sean Silva fef1733e12 Fix issue with unused functions in torch::jit::CompilationUnit
As described in the code comment:

```
When we import TorchScript IR, we import their entire "compilation unit",
which can contain numerous functions unrelated to the current program,
which breaks torch-globalization-pipeline; for example, there can be
random functions referencing types that haven't been imported
as part of the root `torch.nn.Module` we imported. Those will
be unreferenced private functions which symbol-dce will clean up nicely.
```

This situation is really easy to hit in jupyter notebooks, where the
same cell is evaluated multiple times. That results in the same
class name (at the Python level, e.g. class `Foo` in the top-level
main module). Internally to PyTorch, it handles this situation by
mangling in a unique number to the names of ClassType's and such. When
we import the new ClassType's, we see not just the new
torch::jit::Function's in the CompilationUnit, but, also all the old
ones, which reference ClassType's that are not reachable from the
`torch.nn.Module` that we imported.

Note: there is no way to avoid importing the whole CompilationUnit
(including these old remnants) without doing a fairly complicated call
graph reachability analysis of which functions are reachable from the
methods of the ClassType's we imported. It turns out that once we are
inside MLIR, we model visibility correctly so that `symbol-dce`
"Just Works" for this use case. That is to say, this is not a quick
hack, but rather seems like a totally palatable long-term solution.
2021-04-20 12:00:35 -07:00
Sean Silva c4123d4d4d Add npcomp-verify-backend-contract pass.
This pass verifies that a given module satisfies the contract that we
have for backends. This is phrased as an "allowlist", because we want to
keep this interface tight. Also, this gives much better diagnostics than
a backend randomly crashing or failing to compile would (though they
could still be improved).

This was especially painful because if we had
`tensor<?x!numpy.any_dtype>` slip through, at some point RefBackend
would convert it to a memref type and trip the "verify type invariants"
assertion which gives no location or anything and crashed the process,
which was very unpleasant.

We implement this with the dialect conversion framework, which works
reasonably well and was quick to put together and familiar, but is still
very "op oriented". We probably want to make this hand-rolled
eventually, especially the error reporting (the most useful kind of
error for a dialect conversion user is not necessarily the best for this
use case). Also, in production, these error will go to users, and need
to be surfaced carefully such as "the compiler needs a type annotation
on this function parameter" which in general requires some special
analysis, wordsmithing, and overall awareness of the e2e use case (such
as how much we can lean into certain source locations) to provide a
meaningful user-level diagnostic.

Also, add `inline` to the current frontend lowering pass pipeline to
allow slightly more complicated programs that otherwise would fail on
shape inference.
2021-04-20 12:00:35 -07:00
Sean Silva f5dfa02523 Add `aten.mm` to linalg lowering.
This is our first op with error semantics, and stresses the system.

There are a few design notes of special interest:
- RefineTypes.cpp's note about shape inference in the presence of code
  that dynamically produces and error, and it is provable statically.
- ATenToLinalg.cpp's notes about future automation of the ATen->linalg
  path.
- The notes in Passes.td about using low-tech `std.assert` ops instead
  of `shape.assuming`.

Note: Doesn't work on IREE yet due to the `std.assert` op (needs to be
lowered to `vm.fail` on the IREE side).
2021-04-16 12:03:31 -07:00
Sean Silva 28a0f02746 Add support for compiling through IREE.
Recommended review order:
- Changes in frontends/pytorch/examples/
- Changes in python/npcomp/compiler/pytorch/backend/
- Boilerplate for the `npcomp-iree-backend-lower-linkage` pass.

This change separates out a
`npcomp.compiler.pytorch.backend.frontend_lowering` module that does the
common lowering for all backends. The individual compiler backends
`npcomp.compiler.pytorch.backend.{refjit,iree}` now accept a loosely
defined "TCP + scalar code" IR mix that will be formalized in the
future as the interface to codegen backends.

This also required adding a small pass
`npcomp-iree-backend-lower-linkage` which adds `iree.module.export` onto
functions, and layering that into the frontend flow. The pass doesn't
require a C++-level dependency on IREE, which is nice for now. TBD how
we are going to handle lists (we hope we can get away with sneakerneting
some td files and relying on loose IR compatibility).

Running through IREE requires the ability to import `iree.compiler` and
`iree.runtime`, which can be obtained as follows:
```
python3 -m pip install iree-compiler-snapshot iree-runtime-snapshot -f https://github.com/google/iree/releases/tag/snapshot-20210406.200
PYTHONPATH="${PYTHONPATH}:${MY_IREE_BUILD}/bindings/python/"
```

This patch makes it painfully clear that we don't have any e2e testing
harness to really plug into, and also don't have a usable Python API to
our compiler stack (something usable in a jupyter notebook).
That will be addressed in subsequent commits. We've been flying by the
seat of our pants with this `examples` directory that isn't subject to
any kind of testing or real usability concerns.
2021-04-09 13:15:07 -07:00
Aaron J Arthurs f9d9518f6e Declare TCP dialect dependency in TCFToTCP conversion 2021-04-07 14:23:56 -07:00
Sean Silva 927546b3c5 Add RefinePublicReturn pass.
This pass allows shape information to be propagated to return types,
which is nontrivial and cannot be cleanly put anywhere else as it
changes the public ABI, which is a concern that we want to keep
concentrated in one place.
2021-04-07 11:06:34 -07:00
Sean Silva 1e357ae680 Add simple type refinement pass.
Currently implemented as a simple intraprocedural dataflow analysis over
a standard ShapedType lattice (hasRank, sizes, and elementType).

It currently hardcodes a few key pieces of information:
- shape transfer functions
- whether it is legal to update the operand type of an op

This needs to be made pluggable obviously and the core propagation logic
moved somewhere agnostic.
2021-04-07 11:06:34 -07:00
Sean Silva 6431b0f11f Add primitive ArrayToTensor (numpy-array-to-tensor) pass.
The current implementation is just sufficient to do a unary aten.tanh
from the e2e spike, and just applies some local rewrite patterns.  I've
sketched out the more full explanation of where this pass eventually
need to go in the pass docs.

Adding this required adding `numpy.tensor_static_info_cast`, which is
the tensor analog of `numpy.static_info_cast`. This op encapsulates the
same numpy-specific "no runtime code" casting semantics, in particular
the interpretation of `!numpy.any_dtype`. The
`numpy.tensor_static_info_cast` I see in practice now are "information
erasing" and will be removed by a later pass that exploits the fact that
aten ops are agnostic to the static info in the operand types (so
substituting a type with more static info is fine).

Side note: we *need* to do dtype and rank inference before aten->tcf
(which will eventually mostly be aten->linalg+guards), because each aten
op is idiosyncratically overloaded based on dtype and rank. Without
copying that idiosyncratic overloading into lower layers (layering
violation), we cannot really lower it to anything until we do that.
2021-04-05 17:56:35 -07:00
Sean Silva 30356c41c8 Add torch-adjust-calling-conventions pass.
This pass incorporates torch.type_bound info and also removes NoneType
returns (eventually it will rewrite tuple types too, but can't yet
because !basicpy.TupleType doesn't track element types).

Recommend looking at adjust-calling-conventions.mlir first to see what
it is doing, and holding your nose for the implementation of the pass.
I decided to implement this with the conversion framework, because it
gives us *some* goodies for type conversion -- mainly avoiding large
amounts of tricky RAUW dances. Unfortunately, the conversion framework
isn't a perfect fit for a couple reasons:
- the incorporation of torch.type_bound is a context-sensitive rewrite
  (requires looking at the arg attr, not just the type).
- NoneType conversion is 1->0, which requires some special handling
- (not implemented yet) 1->N tuple type conversions require special
  handling.
It's a little bit scary, but on balance doing it the other way would
have its own downsides.
2021-04-05 17:56:35 -07:00
Sean Silva 464feacba9 Bump llvm-project to 223dcdcfbe23affdf17ada7f023ee1872fd76160
- ModuleOp no longer has a terminator.
2021-04-05 17:56:35 -07:00
Sean Silva e749074bae Basic infra for annotate shapes and dtypes on arguments.
These allow users to annotate a known "type bound" on the argument,
which can seed shape/dtype inference. We don't rewrite the function
types as part of the import process (it will happen in a
yet-to-be-written pass) because:

1. We would need to interprocedurally rewrite all calls to keep the IR
   consistent. Currently, we have a place after GlobalizeObjectGraph but
   before we convert to tensors where this is convenient to do. Ideally,
   we would do this on the object graph representation.

1. We don't necessarily know that adjusting the function type is a legal
   calling convention change. The pass will have blessed knowledge (by
   the pass pipeline author) that adjusting the argument type based on
   the type bound is safe (which it frequently is).

2. Note that in principle, a type bound could be a fairly general thing
   (such as maximum sizes of dimensions, unions of multiple concrete
   types, etc.). The pass will in principle have logic to interpret the
   type bounds and to determine a suitable "best" (and legal) argument
   type.
2021-04-01 18:40:03 -07:00
Sean Silva 7a4043b7c4 Add ability to compile from object graph ir. 2021-03-31 09:25:13 -07:00
Sean Silva c6d56fed8a Add unary tanh lowering. 2021-03-30 16:39:49 -07:00
Sean Silva 641098be54 Clean up some compiler warnings on my machine. 2021-03-23 14:29:05 -07:00
Sean Silva 99178a167d Bump llvm-project to 0524a09cc7e1a0797982feacf505825231efbee7
- renames of OwningRewritePatternList -> RewritePatternSet
  - also `insert` to `add`
- RewritePatternSet holds a context now
- memref dialect split from std
2021-03-23 14:29:05 -07:00
Bryce Arden 4591884d06 [refbackrt] Scalar arg support
* Adds f32 scalar argument support across the ABI boundary.
* Adds support for passing input type / shape information
  across the ABI boundary
* Adds support for parsing / creating input FloatAttr's in
  `npcomp-run-mlir`
2021-03-23 13:16:44 -07:00
Sean Silva 703428eff4 Add support for "trailing_" and "out" variants of various ops.
We already had the `promoteTrailingOutTensor` flag, but weren't using
it. A inplaceVariantKernelName flag needed to be added.

This change is a little dissatisfying, as the conversions done by the
RecognizeKernelsPass are currently non-orthogonal. In particular,
`kDropResultAndAliasArg0` probably won't work as intended if mixed with
these (we probably need to promote kDropResultAndAliasArg0 to not be an
arg-level thing anyway, as we have done with promoteTrailingOutTensor).

This involved adding a new op `numpy.overwrite_array`.

```
numpy.overwrite_array %arg2 overwrites %arg0 : tensor<2x3xf32>, !numpy.ndarray<[2,3]:f32>
```

This models the destructive update behavior. Note that in the above op,
we cannot simply RAUW %arg0 with a suitably conveted %arg2 (for example,
%arg0 might have uses that are not dominated by %arg2, or might have an
alias relation with some other array in the program). In general, we
need a pass analogous to "SSA-formation" which knows how to see through
these to uncover an underlying tensor program.

Also, add tanh_out_e2e.py/div_inplace_e2e.py and fix some bitrot in
refjit.py which is my running example I'm trying to get working.
2021-03-19 10:34:50 -07:00
Aaron Arthurs 4fd9b4afb5
Import ATen conv2d conversion and test (#180)
* Import ATen conv2d conversion and test

This is a first attempt at expanding ATen-to-TCF conversion for the
conv2d operator. Eventually, this will come in use when lowering a
high-level conv-based model.
2021-03-12 17:21:16 -08:00
Sean Silva 58c7030104 Support multiple instances of a class in GlobalizeObjectGraph.
This happens in practice with e.g. ResNet from torchvision (multiple
instances of the same BatchNorm class).

The key observation is that for this program, and the expected set of
programs, we can convert the program to the same globalized form with a
bit more static analysis and effort to suitably monomorphize the
program. Though what we are doing here is fairly annoying to implement,
it saves any nontrivial later pass from having to do similar analyses
(or worse). E.g. shape inference would need to be object-graph aware,
mutation/lifetime analyses would have to be aware, etc. Additionally, it
would make us front-load what it means to have a !torch.nn.Module type
on an ABI boundary, which we are just not ready to handle.

I'm really, really hoping that in practice we can get away with
this, otherwise it's going to be really rough designing a representation
(and implementing everything to back it) that is convenient to transform
and gracefully scales from full object graph (in the most dynamic case)
down to a fixed set of global slots like we have here (in the most
static case, which we presume a lot of practical programs fall into).

This also involved introducing a
`torch-prepare-for-globalize-object-graph` pass that does a minimal set of
lowerings to simplify the IR into a more orthogonal and analyzable form,
and a `torch-globalize-pipeline` helper.

Recommended review order:
- updated documentation in Passes.td
- new tests in `globalize-object-graph-multiple-instances*.mlir`
- implementation of GlobalizeObjectGraph.cpp
- PrepareForGlobalizeObjectGraph.cpp + prepare-for-globalize-object-graph.mlir
- misc stuff like torch-globalize-pipeline pipeline definition.

With this, we can import, globalize, and inline resnet18 from
torchvision:
https://gist.github.com/silvasean/821586afc19b67d9fb72030b2e0adeb8
2021-03-11 19:21:07 -08:00
Sean Silva 2750d2084c Add prim::device and handle derefining for prim::CallMethod 2021-03-11 14:10:09 -08:00
Bairen Yi 5fed296904 Address missing default label in switch statement
Signed-off-by: Bairen Yi <yibairen.byron@bytedance.com>
2021-03-11 11:55:59 -08:00
Bairen Yi 5315598947 Update .getAttrs to ->getAttrs as it is deprecated.
Signed-off-by: Bairen Yi <yibairen.byron@bytedance.com>
2021-03-11 11:55:59 -08:00
Bryce Arden e7a8fd76e2
[refbackrt] Update Invoke API to support more than just Tensor's (#181) 2021-03-10 15:39:26 -08:00
Bairen Yi 53b01cb9ba Bump llvm-project to e31c77b1827fa4dd3511f21af11cfab18ecf6d38
Signed-off-by: Bairen Yi <yibairen.byron@bytedance.com>
2021-03-10 11:01:16 -08:00
Sean Silva 43dba03afd Properly model "derefinement".
In terms of IR structure, TorchScript allows types to vary in many
circumstances where MLIR requires pointer-identical types. In particular,
it is valid to pass any subtype in place of a type. For example, if an
`Optional[int]` is required somewhere in the IR, it is legal to pass a
value of just `int` (but not the other way around; see
`torch.prim.unchecked_cast`). In effect, every *use* can have a different
type.

We introduce a new op `torch.derefine` that models that impedance
mismatch. This op allows casting a value from one type to a type that it
is a subtype of to model this behavior.

Recommended review order:
- TorchOps.td for new torch.derefine (and updated docs for
  `torch.prim.unchecked_cast`)
- new test code in if.py, loop.py, function-derefine.py
- new code in node_importer.cpp for handling derefinement insertion
- function_importer.cpp and utils changes in torch_to_mlir_utils.cpp

Properly handling derefinement on function boundaries required
relayering the code so that graph_importer.cpp/.h is now
function_importer.cpp/.h because only the `torch::jit::Function`
(actually the `c10::FunctionSchema` it holds) knows the derefined types that are
actually needed at the boundary (see `function-derefine.py` for a test).

Annoyingly, this churns all the functions which are now prefixed with
`__torch__.` but that is more correct anyway (that is their linkage name
in the `torch::jit::CompilationUnit`; the previous `mb.import_function`
was actually buggy in the case of functions calling each other as it
would reference their unqualified name).

With this change, we can import `resnet18` from `torchvision` :)
IR: https://gist.github.com/silvasean/6426a5272d8a6c7caae533fce05ab704
2021-03-03 15:09:44 -08:00
Sean Silva 939d36906f Add support for prim::Loop op.
This is a funny one. It combines a `for` and `while` loop in one op. We
will need to write some conversions to `scf`.
2021-03-02 16:01:34 -08:00
Sean Silva c837dbb077 Properly import the entire torch::jit::CompilationUnit
This primarily unlocks proper handling of free functions (that is,
functions that are not methods of any torch.nn.Module).

Recommended review order:
- `ivalue_importer.cpp` + `ivalue_import/functions*.py`
- `GlobalizeObjectGraph.cpp` + test case
- misc other stuff

The `torch::jit::CompilationUnit` is basically a backing store or
"context" holding all the possible functions in the program. The
previous code was not explicitly accessing this data structure, since it
just imported the `torch::jit::Function`'s that it saw attached to
methods.

Subtly, any time a TorchScript module called into a free function, the
free function gets incorporated into the torch::jit::CompilationUnit,
but doesn't show up anywhere when dumping the module, except in the
curious pattern:

```
%5 : Function = prim::Constant[name="adaptive_avg_pool2d"]()
%6 : Tensor = prim::CallFunction(%5, %input.1, %4)
```

That is, calls are indirect calls, and are accessed via `prim::Constant`
materializing a function object. Even stranger, the `name` attribute here
doesn't really even tell the full story -- it doesn't correspond to
anything. It turns out that the c10::FunctionType itself actually holds
a pointer to the `torch::jit::Function` in the compilation unit
directly (so there is actually no indirection in prim::CallMethod,
because any two values of the same FunctionType call the same
function!). E.g. when converting the IR to bytecode, the "name" is
ignored [code link](1d6bd15790/torch/csrc/jit/runtime/interpreter.cpp (L937)).
We do import `prim::CallFunction` as a `std.call_indirect` though
because it's more braindead to do it that way (it gets canonicalized to
a direct call easily).
2021-03-01 12:08:01 -08:00
Sean Silva 79a3f639bf Give torch.global_slot an initializer region.
This is a much simpler representation than the ad-hoc initializer
function we had before. It is also less general, but given the rationale
in Passes.td it seems like the right tradeoff right now.

We can probably carry this representation for quite a while, and when we
can't, it likely means that TorchScript has fixed their object identity
bug and we probably need to just upgrade to a more general object graph
modeling (more general than GlobalizeObjectGraph).

In particular, we don't want to deal with defining and carrying around
this initializer function concept until we need it. For example, if we
want to constant-fold the global slots into uses, this is a much better
representation, and it plays better with symbol-dce (the initializer
function counts as a "use" of the symbol).

(the alternative would have been to write a pass that converts the
initializer function to this form when possible, but I realized that
lots of information had been lost which made that fairly annoying -- it
was all self-inflicted anyway, so best to just go to the source
(GlobalizeObjectGraph) before the information is lost)

Now symbol-dce works nicely (no more "training" bools)
```
pt_util ~/tmp/classifier.pt --import --exported-name forward \
| npcomp-opt -torch-globalize-object-graph -inline -symbol-dce
```
IR: https://gist.github.com/silvasean/8abe63d70d24e29d6db9170ccc8d512b
2021-02-26 16:24:19 -08:00
Sean Silva a375ccf9da Add ability to annotate TorchScript classes.
The first use case is to annotate certain program constructs as either
exported or private. In this commit we plumb it down to
GlobalizeObjectGraph which makes use of this information.

Recommended review order:
1. class_annotator.h/.cpp + `test/module_import/annotations/*`
    - New abstractions to communicate with Python code and annotate.
2. IR changes in TorchOps.td
    - Adding "private" attribute to various things.
3. ivalue_import.cpp changes
    - Module + ClassAnnotator = annotated IR
4. GlobalizeObjectGraph.cpp + tests
    - use new "private" attributes to create "private" IR.
    - also, tweak some of the op deleting mechanics, which was triggering
      some memory errors / assertions

With this, we can run the classifier through and inline it as follows:
```
frontends/pytorch/utils/pt_util.py --import --exported-name forward ~/tmp/classifier.pt \
| npcomp-opt -torch-globalize-object-graph -inline
```
IR: https://gist.github.com/silvasean/32dcad9f6270557f412094a77cecdd69
2021-02-25 11:28:34 -08:00
Sean Silva c424c24ed8 Bump llvm-project to c68d2895a1f4019b387c69d1e5eec31b0eb5e7b0
- dialect registration
- StringAttr::get: order of context arg
- math dialect
- LogicalResult nodiscard
- error message for invalid broadcast
2021-02-22 12:23:24 -08:00
Sean Silva 8486968925 Add trivial inliner interfaces.
With this + manually setting private visibility on everything, a simple
classifier can be reduced to this IR, which is looking pretty lean and
mean:
https://gist.github.com/silvasean/19e7e2e21a61ff197aeac0dd864d188f

Also, include a utility script for importing `.pt` models.

```
pt_util.py --import classifier.pt | npcomp-opt -torch-globalize-object-graph
```
2021-02-22 10:40:38 -08:00
Sean Silva 1b769f7841 Extend GlobalizeObjectGraph to handle torch.prim.GetAttr returning NnModuleType
This happens in practice. With this, we can globalize slots for the
non-trivial classifier layer obtained from
https://github.com/NVIDIA/NeMo/blob/main/tutorials/nlp/Joint_Intent_and_Slot_Classification.ipynb

This also adds support for tuple return types, which were needed by that
model.
2021-02-19 10:23:25 -08:00
Sean Silva 158c5c484d Implement GlobalizeObjectGraph transformation.
This required restructuring of how we model TorchScript on import. The
main difference is that now we split out a `torch.class_type` that holds
methods and declarations of the types of each slot. This is more
consistent with TorchScript (our previous representation was
"denormalized").

Recommended reading order:
1. check out the description of `torch.class_type` in `TorchOps.td` and
   look at `test/Dialect/Torch/ops.mlir` and
   `frontends/pytorch/test/module_import/` to familiarize with the new
   representation.
   - Just look at the new IR. The diff between the old names and new
     names is confusing.
2. check out `test/Dialect/Torch/globalize-object-graph*.mlir`
   and read along with the pass description in
   `include/npcomp/Dialect/Torch/Transforms/Passes.td`
3. Read the code in `GlobalizeObjectGraph.cpp` and miscellaneous changes
   in `ivalue_importer.cpp`, `TorchOps.cpp`, etc.
2021-02-18 18:18:47 -08:00
Sean Silva 7f7bf39551 Add prim::Print and fix prim::CallMethod
For now, we are treating strings as bytes.
2021-02-10 15:15:56 -08:00
Aaron J Arthurs 484fe0d9bd Reformat code 2021-01-28 12:01:35 -08:00
Aaron J Arthurs c0e14da888 Fix TensorFromElementsOp reference 2021-01-28 12:01:35 -08:00
Aaron J Arthurs fc650c9447 Import TCP pad 2021-01-28 12:01:35 -08:00
Sean Silva 689b40c7a6 Add initial TorchScript module importer
It turns out that this was easiest to structure as a general IValue
importer, since torch module are just one of the possible IValue's.

We import the IValue object graph in a braindead fashion into basicpy
ops and a new `torch.nn_module` op that is used to model the
attributes/methods of a torch::jit::Module IValue. See `Torch/ops.mlir`
for an example, and also check out the .py import tests in
`frontends/pytorch/test/module_import`.

As part of this change, a few housekeeping tasks:
- extract some helpers from graph_importer.cpp
- more helpers around the C API
- misc touchups
2021-01-28 11:55:17 -08:00
Sean Silva 1965ac4d67 NFC: mark some methods as `override`
This silences some warnings I was seeing locally.
2021-01-21 11:48:41 -08:00
Sean Silva 3f4161635c Bump llvm-project to be7352c00d51f4358db3a23ed6a077f7cb48eafd
- TensorFromElementsOp -> tensor::FromElementsOp
- `cmpi "eq", ...` -> `cmpi eq, ...`. Same for `cmpf`
- syntax change for private func ops
- some changes to the python bindings
2021-01-21 11:16:55 -08:00
Sean Silva 6351474382 Bump llvm-project to bc556e5685c0f97e79fb7b3c6f15cc5062db8e36
- `let typeDesription` -> `let description`
- LLVMIntegerType -> IntegerType
2021-01-08 14:18:09 -08:00
Stella Laurenzo 3f706473fd NFC: Delete npcomp python API and switch to upstream.
* Most updates are mechanical except:
  * python/npcomp/__init__.py and python/NpcompModule.cpp: New init/registration bits to replace some automatic things being done in the old bindings. Also an annoying linkage hack that I'll need to triage next.
  * NpcompModule.cpp: New python helpers for custom types and other hard to reach items (for the new bindings).
  * PybindUtils.h: Extended type casting so that the local extension can directly exchange Mlir* C types.
  * python/npcomp/dialects/*: Build support and ODS bindings for local dialects.
  * mlir_utils.py: Defines an ImportContext to replace the old/bad "Helper" class that tracked locations, and insertion points. This has a number of methods on it that would be good candidates to think about better ways to do them upstream.
* Also hoisted a few stand-alone samples to dedicated unit tests as they covered important things.
* More cleanup can be done, but keeping this patch as mechanical as possible to stay in NFC land (this is big enough).
2021-01-08 10:46:24 -08:00
Sean Silva 97d6d04d41 Bump llvm-project to 16c6e9c58e9ae50a775945e6b407f1891f353d2f
Changes:
- linalg init tensor change (outs+init -> just outs)
- IntegerType::get and other builtin types now take the context as the
  first arg
- LLVMType::* is gone. Now LLVM Types are just regular Type's.
2021-01-05 16:12:11 -08:00
powderluv 4237172bbf
Fix OSX builds. (#143)
--version_script doesn't work on OSX.
Shared libs are .dylibs on OSX.

TEST=Build on iMac Pro. M1 has other issues will be fixed later

Change-Id: I2bda46349a878b8265e273c05d8db6b46c0df633
2020-12-28 01:30:45 -08:00
Aaron Arthurs 85898aaf10
Add TCF convolutional op with bias addition (#137) 2020-12-15 12:53:12 -08:00
Sean Silva d818043986 Bump llvm-project to d50d7c37a159802c89454a6c53c0ec2e7949d84a
Fixes:
- use `op->(method on Operation)`
- update for MlirIdentifier in signature of mlirNamedAttributeGet
2020-12-14 14:30:51 -08:00
Sean Silva b2077738ca Bump llvm-project to 444822d77a7fea28aa49edf24533c987efa1b2ee
Fixes:
- renames StandardTypes -> BuiltinTypes
- std.extract_element -> tensor.extract
2020-12-11 14:43:38 -08:00
Sean Silva 251aa6e435 Bump llvm-project to 774f1d3ffd458d6cb82d5039758ef1cf6370957f
Date:   Mon Nov 30 15:20:30 2020 -0800

Changes:
- finalizing-bufferize is stricter now, and we need to pull in a DimOp
  bufferization which was previously working by happenstance. The
  offending DimOp's are actually created by the linalg bufferization
  (which creates dim ops on the original tensor values, not the
  converted memrefs), so the fix is moving std-bufferize after
  linalg-bufferize.
2020-11-30 18:40:13 -08:00
Sean Silva f9b32a99fc Bump llvm-project to 164410324d8bf3b5a99e39f7dfe3c6d6972dab30
Date:   Mon Nov 30 12:44:35 2020 -0800

Fixes:
- func-bufferize is no longer finalizing, so we need to add
  finalizing-bufferize.
2020-11-30 13:58:13 -08:00
Sean Silva 955fd3eeda Add some much-needed comments around refbackrt::invoke.
This code is really tricky, and was not commented.
2020-11-25 15:39:41 -08:00
Sean Silva 46aa6d0a24 [RefBackend] Fix leaks related to ABI boundaries.
Best as I can tell (e.g. from LeakSanitizer), this fixes all the leaks
except for those due to buffers created internally to the codegenned
code itself (up next I'll add the buffer deallocation pass to fix
those).

The main change is that instead of attempting to pass `refbackrt::Tensor`
to the codegenned function directly, we make all the ABI types be
UnrankedMemRef which gets passed awkwardly (but workably) as a
`{size_t rank, void *ptrToDescriptor}` on the ABI. The reason why
refbackrt::Tensor wasn't workable is that is that MLIR doesn't really
have a way to deal with the lifetime of unranked memref descriptors that
happen inside the function, which is inevitably what would happen in the
old code that would emit runtime calls to
`refbackrt.to_memref/refbackrt.from_memref` to convert back and forth to
`refbackrt::Tensor` inside the codegenned code.

So, instead of the `refbackrt.to_memref/refbackrt.from_memref` with no
real sound basis for valid lifetime management, we now have a lovely
piece of code in `refbackrt::invoke` in `Runtime.cpp` that just barely
seems to be sound. We rely on the codegenned code having these
properties, which it seems to have:

- it won't free memref descriptors or their backing buffer for arguments
  of UnrankedMemRef type.

- it will allocate a separate memref descriptor for each result
  UnrankedMemRef (which is ensured by having a separate memref_cast for
  each)

- we can sniff the `allocatedPtr`'s (i.e. the backing buffer pointers)
  to avoid double-freeing in the case of aliasing of the backing buffer
  (including backing buffers for arguments feeding into results)

- to catch the case of statically allocated data (which we need to avoid
  passing to `free`) , check if the `allocatedPtr` is (no joke) equal to
  `0xDEADBEEF`, because there is otherwise no way to distinguish
  statically allocated from malloc'ed data...  (std.global_memref lowering
  to LLVM by happenstance sets the allocatedPtr equal to `0xDEADBEEF`,
  presumably mainly as a debugging thing)

Even with all this, we *still* need to (internally to refbackrt::invoke)
make copies of all inputs/outputs! And the details of how the LLVM-level
ABI gets laid out for e.g. function arguments/returns is still super
tricky.

This really highlights how deficient memref is as the general runtime
type for our use case. It's stewing in my mind how best to improve the
situation. My general gut feeling is that IREE's abstractions for this
are "right", but I need to think more how to distill those aspects of
IREE's design in a "reference" way for RefBackend.

Some implementation notes:

- In terms of how this is implemented, this did catch a bug in our ABI
  wrapper functions in LowerToLLVM.cpp, which I had to fix (it happened to
  work before through some combination of npcomprt::Tensor being passed as
  a single pointer + probably me infinite-monkey-ing it until it worked)

- This actually removes 2 out of the 3 compiler runtime functions (the
  only one left is "abort_if". (most of the memref descriptor code moved
  from CopmilerRuntime.cpp to Runtime.cpp)

  - this also means deleting `refbackrt.from_memref` and
  `refbackrt.to_memref`
2020-11-25 13:09:58 -08:00
Stella Laurenzo 3937dd14cb Add basicpy.numeric_constant op.
* Going through TODOs on the PyTorch side, this is a big cause of them (not being able to have constants for signed/unsigned).
* Added complex while in here since we're at the phase where it is better to just have things complete than partially done.
2020-11-24 16:44:40 -08:00
Stella Laurenzo bea0af419d NFC: Prefactor some basicpy ops in advance of more type work.
* Organizes the BasicPyOps.td file by function.
* Renamed `to_boolean` -> `as_predicate_value` (trying to consistently use "predicate" to refer to i1/low-level types and Bool/Boolean to refer to Python bool types).
2020-11-24 15:49:37 -08:00
Sean Silva 0b7c443256 [RefBackend] Properly initialize refbackrt::Tensor refcount.
Although `refCount` is initialized as `std::atomic<int> refCount{0};` in
the definition of Tensor, our tail-allocating malloc would ignore it,
resulting in bogus values that led to leaks.

Caught with LeakSanitizer, but I added an assertion that the refcount is
non-negative to begin with, which should catch this bug in the future
fairly consistently (assuming the garbage refcount is negative half the
time).
2020-11-24 12:01:35 -08:00
Stella Laurenzo 78a3c90758 Add TorchScript graph importer.
* Does not handle all features yet but should conservatively fail on unsupported things.
* Location tracking is still somewhat mismatched between what TorchScript and MLIR do. Likely need a better heuristic for tracking locations from defs for nodes that do not carry location.
* Sets the ground-work for a specialized/generic split but only implements the generic side.
* Had some evidence that this requires a recent bump of PT nightly (within the last month) to pick up pybind11 2.6, which includes some cross-module symbol fixes (vs the previously sync'd version). No source changes, but older versions fail to cast function types at runtime.
2020-11-23 14:20:09 -08:00
Stella Laurenzo f03225b1f1 Bump llvm-project to f4f8a67aaf13bc66a2b7d55561b14a3724a5e0de.
* Incorporates source fixes.
* Uses upstream pybind11 detection logic.
* Patches CI.
* This may break the CI, which will need to be fixed manually in a followup.
2020-11-22 13:14:44 -08:00
Sean Silva 1dfcfa9cd1 Add aten.mm op and "test" it e2e.
Note that unlike aten.matmul which has dynamic behavior
depending on the argument ranks (can do matrix-matrix, matrix-vector,
batch matmul, etc.), aten.mm is just a vanilla matrix
multiply, which can be lowered precisely to tcf.matmul.

The "test" is really just an example that I stared at while getting my
feet wet with this. We probably want something that actually tests this
as part of `ninja check-npcomp`.
2020-11-20 17:21:24 -08:00
Sean Silva 32b2dc6ce7 Revert "Bump llvm-project to 369c51a74b5327464e27e0749ca7ac59ac1349ce"
This reverts commit c60d7b4aae.

It seems to have tickled some sort of pybind version issue:
https://github.com/llvm/mlir-npcomp/runs/1433414550?check_suite_focus=true
2020-11-20 15:09:18 -08:00
Sean Silva c60d7b4aae Bump llvm-project to 369c51a74b5327464e27e0749ca7ac59ac1349ce 2020-11-20 13:03:24 -08:00
Sean Silva 64a7e83184 [RefBackend] Add refback-tcf-to-tcp-pipeline
This allows invoking TCF to TCP-level conversion more easily, and starts
us towards a path of factoring it out of the RefBackend.
2020-11-17 12:33:37 -08:00
Sean Silva 358159a6eb [RefBackend] Open-code shape.get_extent as extract_element
It was annoying that we were creating shape.get_extent in the middle of
the bufferization pipeline, as it required running convert-shape-to-std
at an awkward place. To make that cleaner, just open-code the
extract_element ops that shape.get_extent expands into.

This is a little gross, but it helps with the macroscopic pipeline
ordering issues. Anyway, the train is long-gone of trying to treat
shapes as some special data type that should only be operated on with
shape ops.

Also,
- reorder tensor constant bufferize (which is a module pass) to bracket
all the bufferization function passes, to make the parallelism
opportunities there clearer. Now we have a very clean little
bufferization segment of our pipeline construction.
2020-11-17 11:00:38 -08:00
Stella Laurenzo a7ff87a922 Sever C++ level depend on IREE and rebase on exe and python interface.
* IREE doesn't have proper install support, so there is some temporary hoaky hacking in our CMakeLists.txt to shuttle some symlinks around.
* Reworked the original numpy e2e with IREE test to pipe through iree-translate.
* Removed all of the C++-level dependencies.
* Will generalize and apply to the PyTorch backend in a followup.
2020-11-16 21:32:56 -08:00
Sean Silva 5227d52c26 [RefBackend] Use std.global_memref instead of homegrown thing
This vastly simplifies our code, allowing deleting multiple ops,
simplifying multiple passes, and removing a whole pass.

Now `refback` dialect is down to one op (refback.alloc_memref, which
simplifies allocations to just take a shape instead of individual
extents).
2020-11-13 18:43:50 -08:00
Sean Silva 32388d938b Make some passes run on FuncOp so they can run in parallel. 2020-11-13 16:12:18 -08:00
Stella Laurenzo b4c7ae1e0c Repurpose numpy-compiler compiler/runtime flow for PyTorch.
* A bit gross because I took the chance to upgrade all of the backend bits to the new MLIR Python bindings and we still co-mingle the old and new for now.
* Since the Python created PassManagers are configured for explicit nesting, I had to upgrade some of the pass pipelines to be explicit.
* The demo in mul_maximum_e2e.py now compiles, runs through PyTorch and through the JIT, prints and asserts the same results.
* I am not claiming that this is the prettiest API in this patch: consider that this is just directly using low-level APIs and there should be an intervening high level API.
2020-11-11 10:38:13 -08:00
Sean Silva 1c7c362e29 [TCP] Replace tcp.matmul with linalg.matmul.
This involved adding a `tcp.splatted` op to splat a dynamically sized
init tensor. See rationale in TCPOps.td docs.

One interesting observation is that when lowering tcf.matmul to
linalg.matmul, we need to both 1) create the error checks and 2)
calculate a shape transfer function to create the init tensors.
Previously, 2) was deferred to bufferizing tcp.matmul later. I'm not
sure if this is a conflation of concerns or not. For now, it's not a big
burden.
2020-11-10 18:58:28 -08:00
Sean Silva 0427aacb0b [TCP] Replace elementwise ops with std elementwise ops. 2020-11-10 18:58:28 -08:00
Stella Laurenzo e60dc2470e Add aten.maximum op and conversions from aten->tcf.
* Conversions are very simple, suporting mul, maximum and add (alpha=1 only).
* Example added with pass pipeline needed to run.
* Much missing off of the golden path but sufficient for such simple cases.
2020-11-04 17:20:54 -08:00
Stella Laurenzo 6c702b149f Add a number of kernels and new patterns.
* convolution, convolution_backward, _log_softmax, _log_softmax_backward_data, nll_loss_forward, nll_loss_backward, nll_loss2d_forward, nll_loss2d_backward, copy_
* Extends the recognition logic and metadata for handling inplace transformations, optional tensors, ints, lists and dropped args.
* The kernel_calls generated by test_conv_nllloss_grads.py now convert to ATen.
* The result *almost* comes out as a pure tensor program with the exception of the copy_ op, which I will do some followup work to deal with.
* More progress on #97
2020-11-04 14:36:59 -08:00
Sean Silva 57e58b9272 [RefBackend] Use upstream func-bufferize pass.
Now, the only bufferization we have left is lowering tensor constants to
memref, which will hopefully proceed soon after Rahul's new
std.global_memref lands + the lowering to LLVM IR. Then I'll port
LowerConstantTensorsToMemref to upstream and we'll be 100% upstream
bufferization, except for our local TCP dialect (which will probably go
away and be replaced by std elementwise + linalg named ops on tensors :)
).
2020-11-02 17:38:33 -08:00
Sean Silva 1874bf5eb1 NFC: Clean up some minor nits
- Remove GreedyPatternRewriteDriver.h from files that don't need it
- fix typo shouldBeCloned -> wouldBeCloned
2020-10-30 18:48:25 -07:00
Sean Silva f9c2f8eb0d [RefBackend] Use upstream SCF bufferization pass. 2020-10-30 18:12:41 -07:00
Stella Laurenzo a3f4db9fe8 Bump llvm-project to c8c07b76b2cf2ada8e7ec132f7f57b97d76743cf.
* Several NFC changes to signatures/includes.
2020-10-29 15:25:55 -07:00
Stella Laurenzo c08935a418 Rewrite ATen ODS code generator to be based on new op registry and new signature recognition system.
* Deletes prior code generator from previous attempt (moved some of it into this one).
* Renames old generated tablegen source to "Legacy".
* Generates ODS and import rules for most binary and unary arithmetic ops.
* Removes old generated ops and integration tests that were testing details of the prior setup.
2020-10-28 10:37:37 -07:00
Aaron J Arthurs 94ea6f7c92 [RefBackend] Support element-wise multiply op
Register the following for the multiply op:
- tcf.mul
- tcp.mul
- TCP->TCP lowering
- Shape transfer, broadcasted multiplicands
- Lower to standard `MulFOp` op
2020-10-27 19:41:23 -07:00
Stella Laurenzo 510f226df2 Expose signature metadata to ops and implement ATenRecognizeKernelsPass pass.
* Two op interfaces, one for querying instance metadata and one for getting static data needed to construct an op from a generic form.
* For torch.generic_kernel ops, metadata is splatted in during capture from Torch (it comes from the op registry, which will work for either device capture or graph import).
* Moved the 'add' out of the generated set so I can experiment on it. It implements the TorchBuildableKernelOpInterface interface which provides its metadata.
* The ATenRecognizeKernelsPass pass generically lowers from a torch.generic_kernel to recognized ops that implement the TorchBuildableKernelOpInterface, handling the various types of transformations that we allow at this stage.
2020-10-26 20:31:45 -07:00
Mehdi Amini f3c75d957b Add missing dependency on NPCOMPCAPI from NPCOMPPythonCommon
Fix linker error:

lib/Python/libNPCOMPPythonCommon.a(MlirInit.cpp.o): in function `mlir::npcomp::python::npcompMlirInitialize()':
mlir-npcomp/build/../lib/Python/MlirInit.cpp:46: undefined reference to `npcompInitializeLLVMCodegen'
2020-10-22 22:44:18 -07:00
Stella Laurenzo 91fc83d2e7 NFC: Transition ATen passes to tablegen registration. 2020-10-22 17:12:44 -07:00
Stella Laurenzo 9618c2dbf7 NFC: Re-organize ATen directory structure and fix warnings.
* Still some more work to do on the Transforms tree to bring it in line with the others (will do that as I add things).
2020-10-22 14:13:26 -07:00
Sean Silva 14470f9ff6 [RefBackend] Use upstream std bufferization.
It now subsumes the one we had.
2020-10-21 16:46:56 -07:00
Sean Silva b6ae53b312 [RefBackend] Use new upstream SCF type conversions. 2020-10-21 16:46:56 -07:00
Stella Laurenzo fe5ceed18d NFC: Format a file that had not been. 2020-10-21 12:47:12 -07:00
Stella Laurenzo 029815152e Add remaining pieces to capture full example models.
* Adds Basicpy List, Tuple, Dict types and plumbs through C API.
* Started debugging the issues around aten::conv2d capture, but a PyTorch bug is suspected.
* Was able to manually verify that the basic conv2d forward test captures correctly with a workaround.
* Need to resolve some printing issues upstream and move these tests to an integration test target (they take ~seconds to run).
2020-10-19 22:16:59 -07:00
Stella Laurenzo 9e52f6235b More progress on PyTorch acap device capture.
* Now gets far enough to capture batch_norm.
* Has some issues still with in-place ops.
* Can materialize constants.
* Includes an upgrade to PyTorch nightly, which has important bug fixes for fallback and boxed kernel dispatch.
* Fixes #78, #79, #80.
* Will do more testing in a follow-up once further bugs are fixed that facilitate getting at the other features.
2020-10-15 21:43:21 -07:00
Sean Silva 06a8ba6900 [RefBackend] Use more idiomatic bufferize pattern for TCP.
The time has come for BypassShapes/LowerShapedResultsToMemref to go away :(
For the reference backend, being consistent with upstream conventions is
the name of the game now.

This is a step down in a number of ways, e.g. test clarity and
separation of concerns. But it is fewer files and fewer tests, and
*does* address the "TODO: This is really fragile". It also eliminates two
more ops from the refback dialect (sadly, they are the
shaped_results/yield that we were getting kind of fond of, but alas).
2020-10-15 20:15:53 -07:00
Sean Silva b6bdc8cc4f [RefBackend] Use upstream BufferizeTypeConverter
Now that it has grown source/target materialization capabilities
(spelled with ops tensor_load/tensor_to_memref), we can use it. We can
also now delete refback.memref_to_tensor/refback.tensor_to_memref.

This is also a first step to reducing the downstream functionality
needed in the refback dialect.
2020-10-15 15:58:51 -07:00
Sean Silva f2d5c26c97 Bump llvm-project to 820e65f9e2369d2990fde4b3e7cfceb64f0df9c8
Date:   Mon Oct 12 11:26:50 2020 -0700
2020-10-12 13:30:22 -07:00
Sean Silva 93fc21dad0 [RefBackend] Split out TCF->TCP conversion.
Now the reference backend is cleanly accepts "TCP"+scalar ops.

We introduce tcf-refback-lowering-pipeline which also does TCF->TCP
conversion for convenience until we have a "target interface".
2020-10-12 11:56:39 -07:00
Stella Laurenzo af4edb63ae Start reworking towards a shared library build.
* Need to have a dag of shared library deps in order to interop across python extensions (as presented in ODM).
* Introduced add_npcomp_library and friends to mirror the MLIR setup.
* Adds a libNPCOMP.so shared library.
* Redirects tools and extensions to link against libNPCOMP.so (instead of static libs).
* Moves all libraries to lib/, all binaries to bin/ and all python extensions to python/. The invariant is that the rpaths are setup to have a one level directory structure.
* Reworks the _torch_mlir extension to build like the others (still need to come up with a consolidated rule to do this instead of open coded).
* Includes an upstream version bump to pick up needed changes.

Sizes with dynamic linking (stripped, release, asserts enabled):
  libNPCOMP.so: 43M (includes much of the underlying LLVM codegen deps)
  libMLIR.so: 31M
  _npcomp.so: 1.6M (python extension)
  _torch_mlir.so: 670K (python extension)
  npcomp-capi-ir-test: 6.3K
  npcomp-opt: 351K
  npcomp-run-mlir: 461K
  mnist-playground: 530K

Still more can be done to normalize and optimize but this gets us structurally to the starting point.
2020-10-09 16:02:58 -07:00
Sean Silva 631c8070df [RefBackend] Put JITModule in refback namsepace. 2020-10-08 09:07:00 -07:00
Sean Silva 7edb5f3641 [RefBackend] Rename RefBackend dialect to Refback
I now realize that VerboseCamelCase is not the best choice for dialect
directory/file names and C++ identifiers (take e.g. "Linalg", "Basicpy",
etc. as prior art here; not LinearAlgebra or BasicPython). If I had to
name the convention it seems to be "Shortword" (or of course just
acronym dialects like LLVM, SCF, etc.).

This rename also has the side benefit of differentiating RefBackend
directories, which now refer to the actual backend itself, from
Refback/Refbackrt, which are the dialects which happen to be used by
that backend.
2020-10-08 09:07:00 -07:00
Sean Silva bf99a82832 [RefBackend] Rename Npcomprt dialect to Refbackrt. 2020-10-08 09:07:00 -07:00
Sean Silva 83ad70ef54 [RefBackend] Move runtime related code under npcomp/RefBackend/
Other than the dialect definitions (which will live in standard Dialect/
subdirectory), the goal here is to keep RefBackend-related code nested
in {include/npcomp,lib,test}/RefBackend.
2020-10-08 09:07:00 -07:00
Sean Silva 21255d5f8e [RefBackend] Rename "E2E" to RefBackend. 2020-10-07 10:29:48 -07:00
Sean Silva 03846ed8e7 Rename a couple CMake targets.
NPCOMPFoo to NPCOMPFooDialect for consistency with others.
2020-10-07 10:29:48 -07:00
Sean Silva 5017430dc7 [RefBackend] Split out RefBackend (refback) dialect from TCP.
This is the first in a patch series that is refactoring the
constellation of things variously called or associated with "E2E",
"RefE2E", "npcomprt", and "TCP" into a more cleanly layered result.

Concretely, this first patch fixes the fact that TCP was basically
acting like a dumping ground needed by the reference backend. This
splits it out, which is fairly mechanical, but touches a lot of lines of
code (basically replacing `tcp` with `refback` and `TCP` with
`RefBackend).

Now, the RefBackend dialect is that dumping ground, which
is slighly better, as it starts allowing TCP to become a nice clean
middle layer that is not related per se to the reference backend.

The previous name RefE2E or "reference e2e flow" was super confusing.
Now that we are seeing more clearly where the "backend" distinction
lies, the [RefBackend] commit tag is born :)
2020-10-07 10:29:48 -07:00
Stella Laurenzo 58b6033537 Bump llvm to ed46e84c7aaffd847656ac559acb06089096ec33.
* Minor change of MLIRStandardOps -> MLIRStandard
2020-10-06 22:02:57 -07:00
Sean Silva 8022dfaf1a [RefE2E] Initialize the linalg matmul accumulator buffer.
I was seeing some miscompiles due to the uninitialized data read here
before. Interestingly, this was masked in some of our previous test
cases, since the uninitialized data "always" was so small that it would
present as a rounding error for the 1.0-10.0 sized values that the
matmul was computing on.
2020-10-02 16:24:52 -07:00
Stella Laurenzo e5433e314f Add capture function arguments.
* Adds at::Tensor -> MlirValue tracking.
* Adds conversions for tensor and scalar types to MLIR types.
* Adds npcomp C APIs for constructing custom types.
* Reworks pybind include so as to get Torch pybind helpers (needed to pass at::Tensor type from Python->C++).
2020-10-01 18:59:58 -07:00
Stella Laurenzo 3d74337be0 Add a torch.kernel_call op and associated predicates. 2020-09-29 15:10:38 -07:00
Stella Laurenzo ba03ecc652 Add public API for constructing a module/function to capture PyTorch ops.
* Uses the MLIR-C API since that will save us a lot of grief down the road (i.e. will give PyTorch and libMLIR/libNPCOMP the ability to skew version-wise).
* Quite a few TODOs and not yet populating the function in any way.
2020-09-29 14:23:22 -07:00
Stella Laurenzo 2c9ca79c89 Add boilerplate for Torch dialect. 2020-09-28 15:26:17 -07:00
Stella Laurenzo fb895173f2 Run format_sources.sh. 2020-09-28 12:04:24 -07:00
Stella Laurenzo b5f010284f Add boilerplate to do device capture (pytorch 1.6).
* Uses the new dispatcher API.
* Just prints to the console for the moment when an op is captured.
* Executes the op through the existing implementation.
2020-09-28 10:30:54 -07:00
Sean Silva 16c26ef57e [RefE2E] Use upstream shape constraint conversion pass.
Now that we upstreamed our pass, we can remove it.
The final pass that landed upstream doesn't do the shape.assuming
canonicalization to legalize that op away, so added a
restricted-canonicalizer pass that allowed to run just shape dialect
canonicalizations, which deletes the shape.assuming.
The pass ended up kind of ugly. See the TODO's on it for some potential
cleaner directions.
2020-09-28 09:34:44 -07:00
Sean Silva 6ea37cfed6 Bump llvm-project to 9ed1e5873c19eb817fb9e36d0262c7effee5d35e
Date:   Fri Sep 18 13:55:52 2020 -0700

- Update to linalg syntax
- New generated builders are better. Custom builder for
tcp.shaped_results is now redundant.
2020-09-28 09:34:44 -07:00
Sean Silva f9b37c55b7 [RefE2E] Add support for unary ops exp and tanh
This is fairly mechanical.
2020-09-24 18:41:30 -07:00
Sean Silva 6b69beae6a [NFC] Remove stray .dump() that snuck in. 2020-09-24 18:41:30 -07:00
Sean Silva c69e9fabc5 [RefE2E] Add support for "max".
This cleans up the lowering pipeline to easily allow extending to
multiple binary ops. It looks fairly repetitive at multiple levels, but
I don't want to prematurely generalize. I think that in principle we
could derive a large swatch of TCF + TCP from a single linalg-style
specification. Another direction is to use an OpInterface (something
like "buildLinalgGenericBody"). I'm keeping my eye on it.

In a subsequent commit, I'll mechanically add a set of binary ops
modeled off of the std arithmetic ops.
2020-09-22 18:38:32 -07:00
Marius Brehler 681c4e1d4a Inject missing dialects in E2E passes 2020-09-22 08:52:23 +02:00
Sean Silva 7b7f35744b [RefE2E] Add interesting control flow example.
This also required adding a lowering for ForOp in our tensor->memref
conversion.
2020-09-21 12:25:24 -07:00
Sean Silva dc8afc9271 [RefE2E] Refactor how tcf.add is lowered.
It was previously going through this awkward route that prematurely
created linalg.generic ops, which was an annoying layering problem since
we can't compute a shape transfer function for linalg.generic in the
general case. Now we pass it through the same path as tcp.matmul, with
the shape transfer function being defined for tcp.add.

This also removed the need for TCPToLinalg (now deleted). The equivalent
of that is happening in lower-shaped-results-to-memref. One interesting
outcome of this: we're basically using linalg as a "Buffer TCP". We
might want to look into using named structured ops for more of TCP, but
that would be a big velocity hit since then any change to the ODS /
verification for those ops would be a change to the upstream structured
op ODS generator. After we have more experience defining this manually,
we should re-evaluate rebasing TCP on generated named linalg ops.
2020-09-18 15:03:53 -07:00
Sean Silva d8675f8ad2 [RefE2E] Add support for matmul.
I'm pretty happy with how this turned out. It looks pretty much like it
should -- one change at each layer. This particular op bottoms out on
linalg which takes care of the rest.

- Add tcf.matmul
- Add tcp.matmul
- Add TCF->TCP lowering
- Add tcp.matmul shape transfer function (BypassShapes.cpp)
- Add tcp.matmul -> linalg.matmul lowering (LowerShapedResultsToMemref.cpp)
- Add support to LowerShapeConstraints for lowering the new
shape.cstr_require

This matmul op is pretty limited in its capabilities. There is no
batching and no multidimensional contraction. Certainly more design work
will be needed to find the right abstractions that aren't too general
but also help to canonicalize many cases from frontends. This is mainly
to show that adding a new op needn't be very "scary" once we have the
e2e infra in place.

Also,
- this clears out some exploratory cruft from the TCF dialect now that
this is starting to become real.
2020-09-18 11:31:01 -07:00
Sean Silva 62738d3641 [RefE2E] Fix nul-termination bug.
I was seeing some of the error messages come out with some garbage at
the end. This fixes it.
2020-09-18 11:31:01 -07:00
Stella Laurenzo 678989a321
Update docker, instructions and some fixes for the pytorch 1.3 build. (#45)
* Includes pybind11 directly (for some reason using the pytorch helper header for this depends on a source file not in the image).
* Installs nnpack into the image.
* Installs new-clang and LLD and configures environment to use it (otherwise, link time is terrible).
* Fixes a gcc compile error (in the off chance you build with default gcc compiler).
* Tests are failing based on some dialect registration stuff that must not have been factored correctly. Will followup with a fix.
2020-09-16 21:57:46 -07:00
Sean Silva 75f57b461e
Totally rework RefE2E tensor to memref flow. (#42)
This now gets the overall "RefE2E" compilation stack to a point that I'm
fairly happy with. We simplify it by mostly embracing the "descriptor"
view of the world.

The overall flow is best understood by reading through the
createE2ELoweringPipeline function in lib/E2E/E2E.cpp
That function creates a pass pipeline that lowers from "TCF" (which is
~numpy level of abstraction) down to LLVM IR.

A brief high-level summary of what happens there:

1. TCF to TCP conversion. This involves reifying error handling in the
form of shape constraints. See test/Conversion/TCFToTCP/basic.mlir

2. Lowering shape constraints. This converts shape constraints into
eager error-handling code. See test/E2E/lower-shape-constraints.mlir
This pass will soon go upstream.
Because this lowers to std.assert, some later passes like
LowerToNpcomprtABI and LowerToLLVM are updated to properly plumb this
through e2e.
See test/npcomp-run-mlir/invalid-broadcast.mlir for an execution test
that properly aborts in case of an error.

3. Lowering tensors to memrefs. This is done via a series of passes
rather than an single mega conversion. Unlike the previous code that
mixed in the npcomprt ABI stuff here, it's now a very clean "pure
memref" conversion.
See test/E2E/lower-*-to-memref.mlir and
lib/E2E/TensorToMemref/
Most of the changes are concentrated here.

4. As part of the above, we use the upstream ConvertShapeToStandard for
lowering shapes.

5. We lower linalg to loops and lower loops to CFG using upstream
passes.

6. Rewrite the "ABI" boundaries of the program to npcomprt data
structures (LowerToNpcomprtABI). This mainly affects ABI boundaries and
how global tensor constants are represented. One of the major
improvements in this commit is that now it's a very clean rewrite that
just replaces memrefs on ABI boundaries with !npcomprt.tensor (before
there was a get_extent function that is not needed).
See test/E2E/lower-to-npcomprt-abi.mlir

7. Lower to LLVM with upstream mlir patterns + some patterns for the
npcomprt lowerings.

One aspect here that is still a remnant of a non-descriptor-based tensor
to memref flow is the BypassShapes + LowerShapedResultsToMemref.
BypassShapes wraps the "tensor compute" ops in a tcp.shaped_results
(basically a "tie_shape" kind of op), and then
LowerShapedResultsToMemref uses those annotations to allocate output
buffers while lowering the "tensor compute ops". Note that there are
very few "tensor compute" ops currently supported (tcp.add +
tcp.broadcast_to), so we just hardcode them in both passes.
Realistically, I expect this to go away as we fully embrace the
descriptor-based approach for simplicity, so don't look too deep into
it.
2020-09-16 17:31:40 -07:00
Marius Brehler d62f8227c2
Bump LLVM to @7d1ed69 and fix namespace handling changed upstream.
* Bump LLVM to llvm/llvm-project@7d1ed69
* Bump MLIR-HLO to tensorflow/mlir-hlo@1880f87
* Adopt to MLIR's changed namespace handling
2020-09-16 15:52:15 -07:00
Stella Laurenzo dd9172fd75 Run clang-format on files that do not comply. 2020-09-15 17:54:58 -07:00
Marius Brehler 843448cde9 Register dialects in E2E passes 2020-09-11 09:33:44 +02:00
Marius Brehler a2fb68059f Remove unused include 2020-09-11 09:33:44 +02:00
Marius Brehler 124bc65a70 Register dialects in ATen lowering pass 2020-09-09 21:55:17 -07:00
Marius Brehler fb2d1a1559 Register dialects in conversion passes 2020-09-09 21:55:17 -07:00
Stella Laurenzo 81dd571c23 Integrate upstream LLVM at 8d9c13f37d2081c11186718ae8b5aef8b507d152.
* mlir-hlo: 062a3ac4a0671d15b5199ed2cd3a9ce02a5bf077

Fixes:

* numInputs() just returns an int instead of requiring a call to .getLimitedValue()
2020-09-08 20:34:31 -07:00
Stella Laurenzo 97d83f786a Bump submodule versions.
* llvm-project: b5924a8e27536d19dd5c4d302db29fb6163d5faa
* mhlo: 848ca244d20f045b7921da55a98a04d95ef94f0e
* Multiple breakages that need to be fixed.

Fixes:
* Refactor dialect registration
* Remove all kindof methods (Casting functionality has been added upstream and is implicitly
available, see https://llvm.discourse.group/t/removing-kinds-from-attributes-and-types/1547.)
* Update dialect registration to comply with https://reviews.llvm.org/D85495.
* Remove type kinds and update some changed dialect signatures.
* Upgrade ATen dialect to match upstream needs.
  * Move dialect registration to tablegen.
  * Register the ListType in tablegen.
  * Change dialect initialization signature.
* Use TypeSwitch in MlirIr location printer.
* Remove global registry depends from npcomp-opt.
* Change LowerToLLVM to pass an MLIRContext vs an LLVMDialect for type creation.
* Remove dep on MLIREDSCInterface that is removed upstream.
* Thread through the DialectRegistry for opt and python-like tools.
* Modernize pass registration (This was forced because the GEN_PASS_REGISTRATION code now generates inline functions vs literal pass registration statements)

Co-authored-by: Marius Brehler <marius.brehler@iml.fraunhofer.de>
2020-09-08 13:26:42 -07:00
Stella Laurenzo fc4f374345 Format sources. 2020-08-27 14:47:49 -07:00
Stella Laurenzo 69cda404ef NFC: Fix extra namespace declaration.
* Was causing build break on GCC9.
2020-08-20 16:22:41 -07:00
stephenneuendorffer bb668e6e26
Add ATen Dialect (#16)
This patch adds a dialect intended to be used as a frontend dialect
to facilitate lowering from "A Tensor Library" in torch/pytorch.

This patch includes several passes that are useful in conjuction with the
dialect:

--aten-layer-name: Generates layer names for each operation, which are not
  present in the original pytorch.
--aten-to-std: Lower the ATen dialect into standard dialect function calls.
--return-elimination-pass: convert functions (primarily the toplevel function)
  to pass return values by reference.  This simplifies pytorch integration.
--aten-op-report: generate a textual report about the model
--liveness-report

Future patches will implement actual integration with the pytorch jit to
intercept and generates MLIR in this dialect, then lower the resulting MLIR
into function calls through aten-layer-name -> aten-to-std ->
return-elimination -> std-to-llvm. The result would then jitted using the LLVM
jit, linked against a runtime library which makes calls back into pytorch to
implement all the layers.

Co-authored-by: Jeff Fifield <jeff.fifield@xilinx.com>

Co-authored-by: Jeff Fifield <jeff.fifield@xilinx.com>
2020-08-12 19:28:04 -07:00
stephenneuendorffer 111ba12e7f
Fix build error (#12)
This debug rule only works with add_mlir_library.
2020-08-05 16:55:42 -07:00
stephenneuendorffer 146ea0a781
Update LLVM to c89e46e76... (#10)
Requires a fixup because BroadcastOp now has a configurable return type.
2020-08-05 14:51:02 -07:00
stephenneuendorffer 44af7a6d30
[cmake] Updates for basic shared library support (#7)
Mostly this is CMake cleanup.  Several library dependencies are missing, which
is often revealed with shared library builds.  Also, it's generally bad to
link directly against LLVM libraries because it fails when using
LLVM_LINK_LLVM_DYLIB.  MLIR will pull in libLLVM.so, and there will be
duplicate linkage with the the explicit libraries.  There may need to be more
refactoring here.
2020-08-05 14:49:18 -07:00
Stella Laurenzo 3efbbe8735 Misc fixes to enable building/testing on manylinux2014 images.
* Since the manylinux images do not hard-link against python libs (resolving them at runtime), the module must be built without resolving undefined references.
* For some reason, builds on this platform are stricter, exposing dependency ordering issues.
* The CMake bits to build the extension are still somewhat of a mess. I have better versions both upstream and in IREE and will be reconciling. For now, don't look too closely.
2020-08-04 17:26:15 -07:00
Stella Laurenzo fc484d1bd8 Rework reference shape lowering based on upstream shape dialect changes.
* Primarily, the upstream shape dialect now uses tensor<?xindex> for non-erroring, immediate shape calculations (and will return this for shape_of of a tensor or memref).
* In addition, upstream passes do not yet exist for fully lowering to standard ops, so the passes here need to be extended to handle this new convention.
* This should be seen as an intermediate state, necessary to integrate a new LLVM version and needs more work and cleanup for generality.
* There is a good deal of awkwardness in these conversions. The hope is that additional upstream work will yield better defined conversion paths once out of this intermediate state.
2020-08-03 13:43:49 -07:00
Stella Laurenzo 9d5d802cc8 Fix compilation issues due to llvm-project version bump.
* Redundant infer type implementations removed.
* Update to the linalg GenericOp build calls.
2020-08-01 15:23:57 -07:00
Sean Silva a9d7610f9d Cleanup after going to `llvm` dialect.
This reduces IR size a lot, which can help when staring at it.
2020-07-13 16:15:42 -07:00
Sean Silva d0f15d2cec Add -optimize flag to npcomp-run-mlir so that it runs optimizations.
My main interest in this is that tweaking the default of this flag is a
quick way to check for miscompiling canonicalizations / op definitions
not annotated properly (e.g. marked NoSideEffect when in fact it is not
safe to do so).
2020-07-13 16:07:44 -07:00
Stella Laurenzo 0356f65dcd Wire through codegen and runtime dependencies.
* Enables e2e test.
* With what I've learned in upstream about test directory layout, I can consolidate most of the separate directories we have for these things. Will do that in a followup.
* Not pleased with the LLVM global initialization depends but serviceable for now.
2020-07-10 22:57:26 -07:00
Stella Laurenzo 9e4a62fc71 Allow JITModule passes to be built separately.
* Re-introduces frontent/backend split.
* Adds a (very) trivial shape refinement pass.
2020-07-10 22:57:26 -07:00
Stella Laurenzo aea05d68d7 Initial python plumbing to interface with the refjit backend. 2020-07-10 22:57:26 -07:00
Sean Silva df0d3fcaff Consolidate LLVM definitions of runtime data structures.
This required making module descriptors hold a FuncDescriptor* instead
of a pointer to array of FuncDescriptors as it previously did, which is
innocuous (just requires an llvm.bitcast after the llvm.mlir.addressof).
2020-07-10 17:50:55 -07:00
Sean Silva e228aa4b11 npcomprt: add support for constants
- create tcp.global + tcp.get_global_memref
- create npcomprt.global + npcomprt.get_global
- LLVM lowering for new npcomprt ops
- Runtime:
 - GlobalDescriptor struct emitted by LLVM lowering
 - implement __npcomp_compiler_rt_get_global

Also,
- cleanly isolate all runtime data structure definitions shared by the
compiler and runtime into lib/runtime/CompilerDataStructures.h
2020-07-10 17:31:24 -07:00
Stella Laurenzo efbcf0aa44 Add NumpyPublicFunctionsToTensor pass.
* Rewrites public function signatures to operate on tensors (vs ndarray).
* Most of our backends presume immutable tensors at public function boundaries.
2020-07-08 22:51:54 -07:00
Stella Laurenzo 5ceb37c19b Add NumpyToTCF conversion.
* Just for numpy.add right now.
2020-07-08 21:03:57 -07:00
Sean Silva f18014f60c LowerRankedShapes: support shape.const_shape op.
Also, the previous code had a special case for deleting this op when it
had no uses. This is subsumed by the change in this commit since now
shape.const_shape is properly lowered.

With this change, the included test case with multiple serially
dependent ops works!
This specific issue was related to the scalar argument to that
function. We needed to compute a broadcast of a scalar shape (which is a
shape.const_shape) with another shape.
2020-07-08 20:12:40 -07:00
Sean Silva b4f0cea8fa Rework e2e flow to use new "npcomprt"
This ~totally reworks the existing "runtime" stuff to be more
principled and usable, such as from Python. It's still not fully
production-quality, mainly in the department of memory management (e.g.
it currently leaks memory; we need to figure out "who frees memrefs" +
the analysis and transformation needed to do that (maybe use upstream
buffer allocation pass?)).

The user API is in include/npcomp/runtime/UserAPI.h, though
include/npcomp/JITRuntime/JITModule.h is a friendlier wrapper.

The stuff under {include,lib}/runtime is totally firewalled from the
compiler and tiny (<6kB, though no attention has gone into optimizing
that size). For example, we don't link in libSupport into the runtime,
instead having our own bare bones replacements for basics like ArrayRef
(the JITRuntime helps with bridging that gap, since it *can* depend on
all common LLVM utilities).

The overall features of npcomprt is that it exposes a module that
with multiple function entry points. Each function has arguments and
results that are tensor-valued, and npcomprt::Tensor is the runtime type
that is used to interact with that (and a npcomprt::Ref<T>
reference-counting wrapper is provided to wrap npcomprt::Tensor in the
common case).

From an implementation perspective, an npcomprt module at the
LLVM/object/binary level exposes a single module descriptor struct that
has pointers to other metadata (currently just a list of function
metadata descriptors). All interactions with the npcomp runtime are
keyed off of that module descriptor, including function lookups and
dispatching. This is done to dodge platform ABI issues and also allow
enough reflection to e.g. verify provided arguments.

Most of the compiler-side work here was in LowerToNpcomprtABI and
LowerToLLVM.

Also,
- Rename npcomp_rt/NpcompRt to npcomprt/Npcomprt; it was getting
annoying to type the underscores/caps.
- misc improvements to bash_helpers.sh
2020-07-08 19:36:19 -07:00
Stella Laurenzo 5aa2f0f9f6 Add a trivial copy elision canonicalization on ndarray->tensor.
* This elides the very common code the compiler adds for chaining otherwise tensor-related numpy ops together.
* More aggressive canonicalizations would require more advanced analysis.
2020-07-05 18:09:43 -07:00
Stella Laurenzo 504e3c4946 Fixup local ndarray<->tensor transforms to preserve shape.
* Preserving shape across the copy ops makes more thing shaped by default.
* Inference of ndarray types will now preserve the shape when specializing the dtype.
2020-07-05 17:45:45 -07:00
Stella Laurenzo fae15ec5e7 Allow the ndarray type to carry a shape. 2020-07-05 17:34:03 -07:00
Stella Laurenzo dc271dfb87 Complete the basic spike to perform dtype inference.
* Correctly infers the unknown dtypes that emit as part of compilation for the simple ufunc case.
* Significant more testing needs to be done on the details now that the pass is minimally functional.
* The actual pass itself is still too hacky/not general, but the underlying analysis is further along.
2020-07-05 16:09:16 -07:00
Stella Laurenzo 86ea90ba84 NFC: Rename Support.(h|cpp) to Types.(h|cpp). 2020-07-04 20:42:37 -07:00
Stella Laurenzo 4a5695ae9c Fix createTensorLikeArrayType() declaration. 2020-07-04 20:37:46 -07:00
Stella Laurenzo 00c791f925 Make common utilities for converting TypeNode <-> IR types.
* Generalizes the conversions from ObjectValueType <-> tensor and ndarray.
* Creates a utility to construct the default type map hook.
2020-07-04 20:33:13 -07:00
Stella Laurenzo 97c92aa264 Remove the existing attached values/ops from CPA types.
This was ad-hoc and needs to be replaced by a more principled track back to the IR.
2020-07-04 17:47:19 -07:00
Stella Laurenzo adb8094108 Fix some compiler option and warning levels. 2020-07-04 17:38:01 -07:00
Stella Laurenzo 48a0b0ec7f NFC: Move CPATypeInference to Typing directory. 2020-07-04 16:56:09 -07:00
Stella Laurenzo 051d088161 NFC: Move CPA typing analysis down a directory. 2020-07-04 16:40:02 -07:00
Stella Laurenzo 6a50efd046 Extend the CPA type inference to work on numpy types/ops.
* Adds an op interface for adding CPA constraints.
* Adds a type conversion hook for handling built-in types (that we can't have adopt our interface).
* Converts tensor<> to object(!Tensor, [e:<type>]) just like NdArray.
* Implement a few numpy ops far enough to do dtype inference for simple sequences.
2020-07-03 18:16:34 -07:00
Stella Laurenzo 34861b18f4 Add NdArray type inference conversion. 2020-07-03 16:38:10 -07:00
Stella Laurenzo 4a2f7c0b5f Add constraint propagation and tracking of node members. 2020-07-03 13:29:52 -07:00
Stella Laurenzo 1a13c38033 More progress on CPA.
* Added transitivity propagation rules.
* Fixed up some copy-n-paste inversions from the old algorithm.
2020-07-02 18:56:05 -07:00
Stella Laurenzo 74b8bed7e3 Unique CPA type and constraints to enable comparison by pointer during propagation. 2020-07-02 17:07:02 -07:00
Stella Laurenzo a257da46e2 Introduce a type interface for mapping to CPA types.
* Currently just simplifies the logic for UnknownType -> TypeVar.
2020-07-02 13:56:27 -07:00
Stella Laurenzo b0604684ba NFC: Move CPA support down into it's own directory. 2020-07-02 11:31:23 -07:00
Stella Laurenzo aeb422b030 Some fixes to get npcomp building and passing on windows.
There is more that can be done here, but this gets it minimally working.
2020-07-01 21:28:04 -07:00
Stella Laurenzo 92190176fb Add skeleton of pass to do modified PCA type inference. 2020-06-30 20:57:09 -07:00
Stella Laurenzo f1b08a0ef0 Add some support classes for implementing a CPA type inference algorithm. 2020-06-30 18:28:39 -07:00
Stella Laurenzo 0962a31ca8 Bump llvm and IREE version to revisions circa 2020/7/29.
* Also fixes a dependency issue that was causing a build race.
2020-06-30 11:22:30 -07:00
Stella Laurenzo 046751254f Refactor old tracing tests and remove deprecated ops.
* Old doctests to run under lit.
* Old custom filecheck tests -> pytest directory (under lit).
* Rename some old ufunc ops in the tracer.
2020-06-29 16:19:03 -07:00
Stella Laurenzo 7ca292ade5 Add partial evaluator for explicit numpy ufuncs.
* This enables emission of "numpy.add(a, b)" and several dozen others.
* Will deprecate original ufunc infra in a follow-on.
2020-06-29 15:27:39 -07:00
Stella Laurenzo a4f3ce1ed3 Add value coding for ndarray.
* This lets us import arrays from the outer environment, which is the first step to actually handling numpy ops.
2020-06-28 18:42:08 -07:00
Stella Laurenzo efe8915901 Add NdArrayType. 2020-06-28 17:37:20 -07:00
Stella Laurenzo 7bd5733d38 Add "template function" ops and importer code.
* This starts to lay down the infra for reasoning about calls
* Adds the importer code to generate IR for function calls of compiler recognized static functions.
2020-06-26 18:36:36 -07:00
Stella Laurenzo fc5f10c5c5 Bump revision and fix issues.
* llvm revision = 4836188ad9b3334b0c1e055d45ccaa54ed797e4b
* iree revision = 091482e8fdf599d6cb5c701d5b3ccb27fc66c014
2020-06-19 10:38:51 -07:00
Stella Laurenzo 529873d13c Wire up IREE compilation and runtime in a new backend test.
* Adds python bindings for invoking flow, HAL, and VM lowering pipelines.
* Adds pythong bindings for translating to VM module flatbuffer.
* Adds a new backend_test/iree directory and configure lit to find the IREE python rt bindings.
* Open code a simple_invoke.py that exercises the whole pipeline (need real APIs for a lot of this).
* Fails when invoking the function because I never implemented argument marshaling for scalars :(
* Plenty of stuff to do tomorrow.
2020-06-19 00:30:34 -07:00
Stella Laurenzo 373878f31f Add _npcomp.backend.iree module.
* Populates with builders for the various path pipelines and translator.
2020-06-18 23:28:30 -07:00
Stella Laurenzo 213041449f Move most python sources to the include and lib tree. 2020-06-18 18:02:39 -07:00
Stella Laurenzo b21b5322f6 Basicpy conversion to IREE+std skeleton and first conversions.
* Conversions to std for numeric binary expressions, numeric to_boolean, and numeric comparisons.
* Added folders to constant ops to comply with requirements of the pass system.
* Extended the frontend with parameter/result annotation processing for primitives (can specify types for function arguments).
* Added (empty) directory/sources for IREEVM conversions. These are only enabled if IREE is enabled.
2020-06-13 23:45:43 -07:00
Stella Laurenzo 2ba8296151 Add script tools/format_source.sh and run it on all python and c++ sources. 2020-06-13 14:53:54 -07:00
Stella Laurenzo 2bc7a77f98 Add conditional registration of IREE passes. 2020-06-11 17:57:10 -07:00
Stella Laurenzo 19196f23e1 Make a real library for InitAll and extend it to conditionally initialize dependencies.
* Conditioned on the top level CMake option to enable IREE.
* There is still some warning flags and such that need triage, but it does build/work.
2020-06-11 17:47:14 -07:00
Stella Laurenzo 308a54c3d0 Bump llvm-project to 52cae05e087b3d4fd02849fc37c387c720055ffb (2020/6/10).
* Fixes compile errors from upstream.
* XFAIL several tests that are now failing to legalize (will hand off to Sean).
2020-06-11 16:10:05 -07:00
Stella Laurenzo 750541e9a9 Extend type inference so that it works across conditional boundaries.
* The implementation is still limited but gives something to build on.
2020-06-10 21:33:17 -07:00
Stella Laurenzo e3fd22a035 Add a (very) basic type inference pass for basicpy.
For simple programs, this gets us enough typing to lower to real backends.
2020-06-10 19:04:05 -07:00
Stella Laurenzo 3e58d8fe37 Add skeleton of type inference pass. 2020-06-10 14:48:22 -07:00
Stella Laurenzo 432e01fe8f Move Basicpy and Numpy dialect IR to IR/ folder. 2020-06-09 19:22:24 -07:00
Stella Laurenzo 340f109742 Add implicit return and expression statements where the value id discarded. 2020-06-09 18:34:07 -07:00
Stella Laurenzo 85b724e70c Adds ODS and import support for binary_expr and binary_compare ops.
* Currently only supports non-short-circuit comparisons.
2020-06-08 13:46:06 -07:00
Stella Laurenzo 72499e0319 Add bytes constants. 2020-06-07 16:00:29 -07:00
Stella Laurenzo f3829b1d4f Add string constants. 2020-06-07 15:46:28 -07:00
Stella Laurenzo 869228e316 Add bool constants. 2020-06-07 15:15:19 -07:00
Stella Laurenzo 0cc0a7165e Add basic AST -> basicpy dialect function extraction.
* Extends the bindings to support locations.
* Various other things necessary to extract a function with simple numeric expressions.
2020-06-06 21:24:28 -07:00
Sean Silva cd7258dbd4 Enable warnings by default.
The secret here is LLVM_ENABLE_WARNINGS=ON.

I also fixed a couple warnings, which gets us to be warning-clean.

I noticed also that npcomp-run-mlir/basic.mlir seems to be failing.
Maybe something since the latest integrate. My next commit (introduce
npcomp mini runtime) will largely rewrite it though, so it'll get fixed
then.
2020-06-03 20:39:34 -07:00
Sean Silva 7b9f0c3364 Add ability to run without optimizations.
The default is to only do the bare minimum needed for correctness, since
that stresses the layering of the system maximally.
2020-06-01 19:33:59 -07:00
Sean Silva e8b1a07ef4 Initial NpcompRt (npcomp_rt) dialect boilerplate. 2020-06-01 19:07:53 -07:00
Sean Silva e7b5a2b8a3 Make LowerRankedShapes clean up shape.from_extents ops.
We were previously relying on a later canonicalization pass to clean
them up, but it is a cleaner invariant if the pass gets rid of them
itself.
2020-05-29 18:00:35 -07:00
Sean Silva 3a09455540 Use upstream shape.from_extents
Replace our local `tcp.shape_from_extents` op with the upstream
`shape.from_extents` op.
2020-05-21 14:51:01 -07:00
Sean Silva 1fed1cb016 Update llvm-project to 753a21928413f8a7e76978cb1354e09150e114e0 2020-05-21 13:09:06 -07:00
Sean Silva 87aa561c69 Remove RtGetTensorExtentOp.
It is unused now, and will be superceded by a proper runtime dialect.
2020-05-21 10:17:49 -07:00
Sean Silva 1d3dbd9d5c Lower to LLVM dialect.
With this commit, we finish conversion to LLVM dialect, and should be
ready for subsequent commits to convert to an LLVM module and let LLVM
codegen to native machine code.

This required a custom "lower to LLVM" pass to support lowering
tcp.abort_if to a runtime call. In the future, this pass will grow to do
type conversions for our own runtime types as we add those.
2020-05-20 18:56:10 -07:00
Sean Silva be1971c4fc Rename tcp.abort_if to tcp.shape_observe_error
This more clearly captures its semantics as a structural "observer" of
code that we currently mark as NoSideEffect but eventually lowers to
eager error handling code.

Also, update LowerRankedShapes to erase it, now that the layering here
is clear. That pass reifies the eager error handling code, so the need
for the dummy op to keep things alive isn't needed.

With this change, we are now ready to start lowering to LLVM!
This is the current print-ir-after-all from e2e-lowering-pipeline:
https://reviews.llvm.org/P8221
2020-05-18 13:38:47 -07:00
Sean Silva 9191efdfc1 Fix typo in LowerLinalgLoopDimOps
The legality condition was reversed. It's unclear to me why this didn't
cause "failed to legalize".
2020-05-18 12:53:31 -07:00
Sean Silva 836a8d4bec Lower tcp.alloc_memref ops to tcp.get_extent + std.alloc.
- tcp.get_extent will be liminated while lowering shapes
- std.alloc is supported by the upstream LLVM lowering.
2020-05-18 12:53:31 -07:00
Sean Silva 9eaab7537c Add a comment about how IREE would layer in.
Thanks to Stella for probing me on this.
2020-05-15 17:22:47 -07:00
Sean Silva 993338a12d Lower to the upstream memref ABI.
Specifically, we use unranked memrefs which get passed as a fixed-size
set of arguments/returns. One big caveat about this is that returning
results isn't going to work. See TODO in LowerTensorLoadOp.

This is far from enough runtime-wise, but it starts to demarcate a
plausible layering. Notice for example how this removes the
runtime-dependence from LowerRankedShapes.

Eventually, we want to have an `npcomp_rt` or `npcomp_hal` dialect with
its own set of runtime types that will supercede this.

See comments in LowerTensorLoadOp for more direction about where this is
going to evolve.
2020-05-15 17:19:57 -07:00
Sean Silva 1b48d0d80b Remove the present tcp.island.
The idea was half-baked and after some deep thought felt like a solution
looking for a problem. What we had here (and is removed in this patch)
just wasn't pulling its weight.

I cannot think of anything we would want to do with tcp.island as it is
removed here beyond just sinking and merging them within a basic block,
such that the witness argument is kind of pointless (only matters for
hoisting).

TCP compute ops like tcp.add and tcp.broadcast_to have the strong
invariant of "pure or undefined behavior", which means they are always
safe to sink. The island concept as removed here conferred no benefit.

Also, I'll note that "islands" are a trick you can only play once in a
system (unless they strictly nest). I have some early-stage thoughs on
having an island concept that helps with modeling tensor shapes
robustly which seems promising (the island would serve a similar role as
tie_shape).
2020-05-14 15:19:37 -07:00
Sean Silva 98a38c3527 Add some anonymous namespaces.
This brings this code in line with the LLVM style guide and avoids
potential ODR issues.
2020-05-14 15:02:46 -07:00
Sean Silva eaeb4011e6 Lower !shape.shape to SSA values.
This uses an approach inspired by what is done in IREE. See comments on
LowerRankedShapes.cpp for how it works.

The basic gist is that we have an op that creates a !shape.shape from a
set of SSA values representing the extents, and then iteratively replace
any op producing a !shape.shape with instances of that op.
2020-05-13 17:20:23 -07:00
Sean Silva ef25428fe3 Add lowering from linalg to loops.
This also adds a small pass to clean up the `dim` ops that linalg
introduces. For now, it only has a trivial pattern that looks for a
`tcp.alloc_memref(%shape)` op to get the shape as we currently have an
invariant that all memrefs are the result of such ops.

But eventually this will need to look through view ops and any other
shape-ish stuff that linalg introduces as it lowers to loops, along with
any slicing ops introduced by buffer allocation.
2020-05-11 18:54:52 -07:00
Sean Silva 174ab19c5f Add some cleanup passes.
This makes the IR more presentable before going to the next phase of
lowering (ops on memref/buffers -> LLVM ops).
2020-05-11 15:28:21 -07:00
Sean Silva 83db558db9 Update llvm-project to 310d32cb80a611e6384a921e85607fea05841f26 2020-05-11 15:12:47 -07:00
Sean Silva 53c17dbed9 "Finish" tensor -> memref conversion.
There's a lot of details to flesh out here, but the basic approach seems
promising (see comments in createE2ELoweringPipeline).

This approach will be put to the test when we try to do our first
fusions since that tickles some of the nasty phase ordering issues
involved here.

But we're not there yet.
2020-05-11 15:00:12 -07:00
Sean Silva fec2ee0072 Avoid introducing DimOp's in LowerBroadcastToToLoops.
This makes sure we stay resonably canonically using the shape machinery.
(In fact, DimOp should probably be in the shape dialect since it hides a
`shape.shape_of` call)
2020-05-11 13:12:16 -07:00
Stella Laurenzo 950ba12426 Bump llvm-project to 3af85fa8f06220b43f03f26de216a67be4568fe7. 2020-05-08 20:42:40 -07:00
Sean Silva e29aef855b Initial TCF/TCP E2E seed.
Very much WIP.

This is enough to get tcf.add down to approximately the "linalg.generic
on buffers" level of abstraction. (but there are nuances)
2020-05-08 20:20:41 -07:00
Stella Laurenzo a91b0bfbe1 Add numpy.get_slice op and wire it up to the tracer. 2020-05-08 16:04:58 -07:00
Stella Laurenzo bc5ef81d68 Add basicpy.SlotObject type and ops to create/index into it.
* This is intended to provide low-level modeling for built-in objects.
* It is now possible to trace slice tuples (which are tuples of NoneType|EllipsisType|SlotObjectType<slice, ...>).
2020-05-05 18:16:01 -07:00
Stella Laurenzo 502ef8f195 Create skeleton for 'Basicpy' dialect.
* It is time to start adding more python mechanisms.
* Running into this for materializing slice() objects.
2020-05-04 17:48:02 -07:00
Stella Laurenzo d3632af675 Add !numpy.any_dtype dialect type. 2020-04-29 18:20:42 -07:00
Stella Laurenzo c4a192d5c9 Rename from npcomp::NUMPY to NPCOMP::numpy to follow IREE convention. 2020-04-29 17:10:10 -07:00
Stella Laurenzo e845db8a20 Add builtin_ufunc and generic_ufunc ops. 2020-04-28 23:51:54 -07:00
Stella Laurenzo 953ef89a30 Add npcomp-opt and lit runner. 2020-04-26 17:55:15 -07:00
Stella Laurenzo d3b6e1767a Add stub numpy dialect. 2020-04-26 17:20:58 -07:00
Stella Laurenzo 9ee2f6ff7f Initial commit of python boiler-plate. 2020-04-26 15:50:23 -07:00