Commit Graph

287 Commits (5c5b8db70ff3c2ce6b231a95b031e7147b289e59)
 

Author SHA1 Message Date
Stella Laurenzo 5c5b8db70f Update test configuration to import mlir from LLVM install location.
* Also adds two lit tests to verify that all of our extensions load without fireworks, which is a good indication that the shared library deps are sane.
* Bumps llvm-project version to use D89167.
2020-10-12 15:25:07 -07:00
Sean Silva f2d5c26c97 Bump llvm-project to 820e65f9e2369d2990fde4b3e7cfceb64f0df9c8
Date:   Mon Oct 12 11:26:50 2020 -0700
2020-10-12 13:30:22 -07:00
Sean Silva 93fc21dad0 [RefBackend] Split out TCF->TCP conversion.
Now the reference backend is cleanly accepts "TCP"+scalar ops.

We introduce tcf-refback-lowering-pipeline which also does TCF->TCP
conversion for convenience until we have a "target interface".
2020-10-12 11:56:39 -07:00
Stella Laurenzo 83342e8715 Add another config var to the CI for shared lib builds.
Missed this one in the previous change.
2020-10-09 16:57:00 -07:00
Stella Laurenzo af4edb63ae Start reworking towards a shared library build.
* Need to have a dag of shared library deps in order to interop across python extensions (as presented in ODM).
* Introduced add_npcomp_library and friends to mirror the MLIR setup.
* Adds a libNPCOMP.so shared library.
* Redirects tools and extensions to link against libNPCOMP.so (instead of static libs).
* Moves all libraries to lib/, all binaries to bin/ and all python extensions to python/. The invariant is that the rpaths are setup to have a one level directory structure.
* Reworks the _torch_mlir extension to build like the others (still need to come up with a consolidated rule to do this instead of open coded).
* Includes an upstream version bump to pick up needed changes.

Sizes with dynamic linking (stripped, release, asserts enabled):
  libNPCOMP.so: 43M (includes much of the underlying LLVM codegen deps)
  libMLIR.so: 31M
  _npcomp.so: 1.6M (python extension)
  _torch_mlir.so: 670K (python extension)
  npcomp-capi-ir-test: 6.3K
  npcomp-opt: 351K
  npcomp-run-mlir: 461K
  mnist-playground: 530K

Still more can be done to normalize and optimize but this gets us structurally to the starting point.
2020-10-09 16:02:58 -07:00
Stella Laurenzo d9dc16a9be Enable libMLIR.so in CI. 2020-10-09 13:57:01 -07:00
Sean Silva d6b05c507a Fix up the docker script / instructions after attempting to use it. 2020-10-09 10:27:06 -07:00
Sean Silva 631c8070df [RefBackend] Put JITModule in refback namsepace. 2020-10-08 09:07:00 -07:00
Sean Silva 7edb5f3641 [RefBackend] Rename RefBackend dialect to Refback
I now realize that VerboseCamelCase is not the best choice for dialect
directory/file names and C++ identifiers (take e.g. "Linalg", "Basicpy",
etc. as prior art here; not LinearAlgebra or BasicPython). If I had to
name the convention it seems to be "Shortword" (or of course just
acronym dialects like LLVM, SCF, etc.).

This rename also has the side benefit of differentiating RefBackend
directories, which now refer to the actual backend itself, from
Refback/Refbackrt, which are the dialects which happen to be used by
that backend.
2020-10-08 09:07:00 -07:00
Sean Silva bf99a82832 [RefBackend] Rename Npcomprt dialect to Refbackrt. 2020-10-08 09:07:00 -07:00
Sean Silva 83ad70ef54 [RefBackend] Move runtime related code under npcomp/RefBackend/
Other than the dialect definitions (which will live in standard Dialect/
subdirectory), the goal here is to keep RefBackend-related code nested
in {include/npcomp,lib,test}/RefBackend.
2020-10-08 09:07:00 -07:00
Stella Laurenzo 51d51241b4 Add scripts/documentation for VSCode setup with a docker dev image.
* Forks a subset of my shell functions into docker_shell_funcs.sh, specifically needed to create docker images that run as yourself.
* Extends the readme with the three command bootstrap to get a dev container running.
* Step by step instructions for configuring VSCode for Intellisense in either npcomp or LLVM.
* Changes LLVM config options to enable tests. This setup is now suitable for upstream changes as well without rebuilding.
2020-10-07 21:27:20 -07:00
Sean Silva ddc2e9de5d [RefBackend] Rename test/E2E.
I missed this in the previous commits.
2020-10-07 15:52:11 -07:00
Sean Silva 21255d5f8e [RefBackend] Rename "E2E" to RefBackend. 2020-10-07 10:29:48 -07:00
Sean Silva 03846ed8e7 Rename a couple CMake targets.
NPCOMPFoo to NPCOMPFooDialect for consistency with others.
2020-10-07 10:29:48 -07:00
Sean Silva 5017430dc7 [RefBackend] Split out RefBackend (refback) dialect from TCP.
This is the first in a patch series that is refactoring the
constellation of things variously called or associated with "E2E",
"RefE2E", "npcomprt", and "TCP" into a more cleanly layered result.

Concretely, this first patch fixes the fact that TCP was basically
acting like a dumping ground needed by the reference backend. This
splits it out, which is fairly mechanical, but touches a lot of lines of
code (basically replacing `tcp` with `refback` and `TCP` with
`RefBackend).

Now, the RefBackend dialect is that dumping ground, which
is slighly better, as it starts allowing TCP to become a nice clean
middle layer that is not related per se to the reference backend.

The previous name RefE2E or "reference e2e flow" was super confusing.
Now that we are seeing more clearly where the "backend" distinction
lies, the [RefBackend] commit tag is born :)
2020-10-07 10:29:48 -07:00
Stella Laurenzo 3ccc2214a7 Set PyTorch captured function return type.
* Resolves various TODOs that required an LLVM change/bump.
* Bumps LLVM to 4aa217160e5f06a96c6effc4950c3b402374de58
2020-10-07 10:14:34 -07:00
Stella Laurenzo 58b6033537 Bump llvm to ed46e84c7aaffd847656ac559acb06089096ec33.
* Minor change of MLIRStandardOps -> MLIRStandard
2020-10-06 22:02:57 -07:00
Stella Laurenzo ad3ddb9edb Implement torch.kernel_call capture.
* Had to stop short of modifying the function return signature because of a missing C-API upstream.
* Committing here is good enough for a test and will resolve the various TODOs about upstream APIs next.
2020-10-06 21:54:28 -07:00
Sean Silva dd1fa2607f Add hopefully short-lived mnist-playground utility.
This unblocks backend progress while the PyTorch frontend work is coming
online. Hopefully we can delete this soon.

See tools/mnist-playground/README.md for more context on what this tool
is for, next steps, and current status.
2020-10-05 13:59:06 -07:00
Sean Silva 8022dfaf1a [RefE2E] Initialize the linalg matmul accumulator buffer.
I was seeing some miscompiles due to the uninitialized data read here
before. Interestingly, this was masked in some of our previous test
cases, since the uninitialized data "always" was so small that it would
present as a rounding error for the 1.0-10.0 sized values that the
matmul was computing on.
2020-10-02 16:24:52 -07:00
Stella Laurenzo e5433e314f Add capture function arguments.
* Adds at::Tensor -> MlirValue tracking.
* Adds conversions for tensor and scalar types to MLIR types.
* Adds npcomp C APIs for constructing custom types.
* Reworks pybind include so as to get Torch pybind helpers (needed to pass at::Tensor type from Python->C++).
2020-10-01 18:59:58 -07:00
Stella Laurenzo 3d74337be0 Add a torch.kernel_call op and associated predicates. 2020-09-29 15:10:38 -07:00
Stella Laurenzo ba03ecc652 Add public API for constructing a module/function to capture PyTorch ops.
* Uses the MLIR-C API since that will save us a lot of grief down the road (i.e. will give PyTorch and libMLIR/libNPCOMP the ability to skew version-wise).
* Quite a few TODOs and not yet populating the function in any way.
2020-09-29 14:23:22 -07:00
Stella Laurenzo 9722a6ce78 Bump LLVM to e72d792c147ee506e337401e20c0f23042cc43fe.
* Does not bump mhlo as an upstream integrate on that project has not taken place and there is not an incompatibility.
2020-09-28 15:34:01 -07:00
Stella Laurenzo 2c9ca79c89 Add boilerplate for Torch dialect. 2020-09-28 15:26:17 -07:00
Stella Laurenzo fb895173f2 Run format_sources.sh. 2020-09-28 12:04:24 -07:00
Stella Laurenzo b5f010284f Add boilerplate to do device capture (pytorch 1.6).
* Uses the new dispatcher API.
* Just prints to the console for the moment when an op is captured.
* Executes the op through the existing implementation.
2020-09-28 10:30:54 -07:00
Sean Silva 16c26ef57e [RefE2E] Use upstream shape constraint conversion pass.
Now that we upstreamed our pass, we can remove it.
The final pass that landed upstream doesn't do the shape.assuming
canonicalization to legalize that op away, so added a
restricted-canonicalizer pass that allowed to run just shape dialect
canonicalizations, which deletes the shape.assuming.
The pass ended up kind of ugly. See the TODO's on it for some potential
cleaner directions.
2020-09-28 09:34:44 -07:00
Sean Silva 6ea37cfed6 Bump llvm-project to 9ed1e5873c19eb817fb9e36d0262c7effee5d35e
Date:   Fri Sep 18 13:55:52 2020 -0700

- Update to linalg syntax
- New generated builders are better. Custom builder for
tcp.shaped_results is now redundant.
2020-09-28 09:34:44 -07:00
Sean Silva f9b37c55b7 [RefE2E] Add support for unary ops exp and tanh
This is fairly mechanical.
2020-09-24 18:41:30 -07:00
Sean Silva 6b69beae6a [NFC] Remove stray .dump() that snuck in. 2020-09-24 18:41:30 -07:00
Stella Laurenzo 0cb28f0b06 Move tests around so we can have dedicated tests for the c10 dispatcher.
* Adds a trivial missing test for _torch_mlir.c10.get_registered_ops()
* Disables the regression tests for now on c10 (until implemented).
2020-09-24 18:28:06 -07:00
Stella Laurenzo 6e6efb2854
Add compatibility notes regarding unpacking quantized weights. (#56)
Co-authored-by: Bryce Arden <arden.bryce@gmail.com>
2020-09-24 17:47:28 -07:00
Stella Laurenzo 47c3a9f461 Add docker image/instructions for building against pytorch 1.6. 2020-09-24 17:40:25 -07:00
Stella Laurenzo 0d91885965
Add initial python bindings for c10 dispatcher internals. (#55)
* Exposes the op registry via a get_registered_ops method.
* Moves the aten dialect generation scripts in prep for integrating them with this facility.
2020-09-24 16:26:29 -07:00
Sean Silva c69e9fabc5 [RefE2E] Add support for "max".
This cleans up the lowering pipeline to easily allow extending to
multiple binary ops. It looks fairly repetitive at multiple levels, but
I don't want to prematurely generalize. I think that in principle we
could derive a large swatch of TCF + TCP from a single linalg-style
specification. Another direction is to use an OpInterface (something
like "buildLinalgGenericBody"). I'm keeping my eye on it.

In a subsequent commit, I'll mechanically add a set of binary ops
modeled off of the std arithmetic ops.
2020-09-22 18:38:32 -07:00
Marius Brehler 681c4e1d4a Inject missing dialects in E2E passes 2020-09-22 08:52:23 +02:00
Sean Silva 7b7f35744b [RefE2E] Add interesting control flow example.
This also required adding a lowering for ForOp in our tensor->memref
conversion.
2020-09-21 12:25:24 -07:00
Stella Laurenzo bc7c852379 Add more ops from the original integration.
* Still need to add a systematic mechanism for discovering gradient ops.
* Work needed on the various _ suffixed inplace ops.
* Other randoms still not mapped.
* Outside of this commit, I do have enough commented/reworked to roughly build but that will take another handful of commits to get going.
2020-09-18 19:11:18 -07:00
Sean Silva 276f5b80ea [RefE2E] Add assemblyFormat for TCF and TCP ops and tidy up. 2020-09-18 15:03:53 -07:00
Sean Silva dc8afc9271 [RefE2E] Refactor how tcf.add is lowered.
It was previously going through this awkward route that prematurely
created linalg.generic ops, which was an annoying layering problem since
we can't compute a shape transfer function for linalg.generic in the
general case. Now we pass it through the same path as tcp.matmul, with
the shape transfer function being defined for tcp.add.

This also removed the need for TCPToLinalg (now deleted). The equivalent
of that is happening in lower-shaped-results-to-memref. One interesting
outcome of this: we're basically using linalg as a "Buffer TCP". We
might want to look into using named structured ops for more of TCP, but
that would be a big velocity hit since then any change to the ODS /
verification for those ops would be a change to the upstream structured
op ODS generator. After we have more experience defining this manually,
we should re-evaluate rebasing TCP on generated named linalg ops.
2020-09-18 15:03:53 -07:00
Sean Silva d8675f8ad2 [RefE2E] Add support for matmul.
I'm pretty happy with how this turned out. It looks pretty much like it
should -- one change at each layer. This particular op bottoms out on
linalg which takes care of the rest.

- Add tcf.matmul
- Add tcp.matmul
- Add TCF->TCP lowering
- Add tcp.matmul shape transfer function (BypassShapes.cpp)
- Add tcp.matmul -> linalg.matmul lowering (LowerShapedResultsToMemref.cpp)
- Add support to LowerShapeConstraints for lowering the new
shape.cstr_require

This matmul op is pretty limited in its capabilities. There is no
batching and no multidimensional contraction. Certainly more design work
will be needed to find the right abstractions that aren't too general
but also help to canonicalize many cases from frontends. This is mainly
to show that adding a new op needn't be very "scary" once we have the
e2e infra in place.

Also,
- this clears out some exploratory cruft from the TCF dialect now that
this is starting to become real.
2020-09-18 11:31:01 -07:00
Sean Silva 62738d3641 [RefE2E] Fix nul-termination bug.
I was seeing some of the error messages come out with some garbage at
the end. This fixes it.
2020-09-18 11:31:01 -07:00
Sean Silva 2284f6b4f1 Bump llvm-project to 7c44651360dd94e17011fd1cd7ec3c755e0363b4
Date:   Thu Sep 17 18:16:41 2020 -0700
2020-09-18 11:31:01 -07:00
Sean Silva 7486befffd Fixes for run_lit.sh
- new build directory layout
- build NPCOMPNativePyExt, now that lit tests use it
2020-09-18 11:31:01 -07:00
Stella Laurenzo 361abebb51 Update README to reference published docker tag. 2020-09-16 23:12:05 -07:00
Stella Laurenzo 8ac29594df
Explicitly load aten and std dialects when constructing a context. (#47)
* This gets the pytorch frontend broadly working and what is left appears to be legitimate failures in 9 tests.
* Errors noted in #46
2020-09-16 23:06:22 -07:00
Stella Laurenzo 678989a321
Update docker, instructions and some fixes for the pytorch 1.3 build. (#45)
* Includes pybind11 directly (for some reason using the pytorch helper header for this depends on a source file not in the image).
* Installs nnpack into the image.
* Installs new-clang and LLD and configures environment to use it (otherwise, link time is terrible).
* Fixes a gcc compile error (in the off chance you build with default gcc compiler).
* Tests are failing based on some dialect registration stuff that must not have been factored correctly. Will followup with a fix.
2020-09-16 21:57:46 -07:00
Sean Silva 75f57b461e
Totally rework RefE2E tensor to memref flow. (#42)
This now gets the overall "RefE2E" compilation stack to a point that I'm
fairly happy with. We simplify it by mostly embracing the "descriptor"
view of the world.

The overall flow is best understood by reading through the
createE2ELoweringPipeline function in lib/E2E/E2E.cpp
That function creates a pass pipeline that lowers from "TCF" (which is
~numpy level of abstraction) down to LLVM IR.

A brief high-level summary of what happens there:

1. TCF to TCP conversion. This involves reifying error handling in the
form of shape constraints. See test/Conversion/TCFToTCP/basic.mlir

2. Lowering shape constraints. This converts shape constraints into
eager error-handling code. See test/E2E/lower-shape-constraints.mlir
This pass will soon go upstream.
Because this lowers to std.assert, some later passes like
LowerToNpcomprtABI and LowerToLLVM are updated to properly plumb this
through e2e.
See test/npcomp-run-mlir/invalid-broadcast.mlir for an execution test
that properly aborts in case of an error.

3. Lowering tensors to memrefs. This is done via a series of passes
rather than an single mega conversion. Unlike the previous code that
mixed in the npcomprt ABI stuff here, it's now a very clean "pure
memref" conversion.
See test/E2E/lower-*-to-memref.mlir and
lib/E2E/TensorToMemref/
Most of the changes are concentrated here.

4. As part of the above, we use the upstream ConvertShapeToStandard for
lowering shapes.

5. We lower linalg to loops and lower loops to CFG using upstream
passes.

6. Rewrite the "ABI" boundaries of the program to npcomprt data
structures (LowerToNpcomprtABI). This mainly affects ABI boundaries and
how global tensor constants are represented. One of the major
improvements in this commit is that now it's a very clean rewrite that
just replaces memrefs on ABI boundaries with !npcomprt.tensor (before
there was a get_extent function that is not needed).
See test/E2E/lower-to-npcomprt-abi.mlir

7. Lower to LLVM with upstream mlir patterns + some patterns for the
npcomprt lowerings.

One aspect here that is still a remnant of a non-descriptor-based tensor
to memref flow is the BypassShapes + LowerShapedResultsToMemref.
BypassShapes wraps the "tensor compute" ops in a tcp.shaped_results
(basically a "tie_shape" kind of op), and then
LowerShapedResultsToMemref uses those annotations to allocate output
buffers while lowering the "tensor compute ops". Note that there are
very few "tensor compute" ops currently supported (tcp.add +
tcp.broadcast_to), so we just hardcode them in both passes.
Realistically, I expect this to go away as we fully embrace the
descriptor-based approach for simplicity, so don't look too deep into
it.
2020-09-16 17:31:40 -07:00