This required making module descriptors hold a FuncDescriptor* instead
of a pointer to array of FuncDescriptors as it previously did, which is
innocuous (just requires an llvm.bitcast after the llvm.mlir.addressof).
Also, the previous code had a special case for deleting this op when it
had no uses. This is subsumed by the change in this commit since now
shape.const_shape is properly lowered.
With this change, the included test case with multiple serially
dependent ops works!
This specific issue was related to the scalar argument to that
function. We needed to compute a broadcast of a scalar shape (which is a
shape.const_shape) with another shape.
This ~totally reworks the existing "runtime" stuff to be more
principled and usable, such as from Python. It's still not fully
production-quality, mainly in the department of memory management (e.g.
it currently leaks memory; we need to figure out "who frees memrefs" +
the analysis and transformation needed to do that (maybe use upstream
buffer allocation pass?)).
The user API is in include/npcomp/runtime/UserAPI.h, though
include/npcomp/JITRuntime/JITModule.h is a friendlier wrapper.
The stuff under {include,lib}/runtime is totally firewalled from the
compiler and tiny (<6kB, though no attention has gone into optimizing
that size). For example, we don't link in libSupport into the runtime,
instead having our own bare bones replacements for basics like ArrayRef
(the JITRuntime helps with bridging that gap, since it *can* depend on
all common LLVM utilities).
The overall features of npcomprt is that it exposes a module that
with multiple function entry points. Each function has arguments and
results that are tensor-valued, and npcomprt::Tensor is the runtime type
that is used to interact with that (and a npcomprt::Ref<T>
reference-counting wrapper is provided to wrap npcomprt::Tensor in the
common case).
From an implementation perspective, an npcomprt module at the
LLVM/object/binary level exposes a single module descriptor struct that
has pointers to other metadata (currently just a list of function
metadata descriptors). All interactions with the npcomp runtime are
keyed off of that module descriptor, including function lookups and
dispatching. This is done to dodge platform ABI issues and also allow
enough reflection to e.g. verify provided arguments.
Most of the compiler-side work here was in LowerToNpcomprtABI and
LowerToLLVM.
Also,
- Rename npcomp_rt/NpcompRt to npcomprt/Npcomprt; it was getting
annoying to type the underscores/caps.
- misc improvements to bash_helpers.sh
With this commit, we finish conversion to LLVM dialect, and should be
ready for subsequent commits to convert to an LLVM module and let LLVM
codegen to native machine code.
This required a custom "lower to LLVM" pass to support lowering
tcp.abort_if to a runtime call. In the future, this pass will grow to do
type conversions for our own runtime types as we add those.
This more clearly captures its semantics as a structural "observer" of
code that we currently mark as NoSideEffect but eventually lowers to
eager error handling code.
Also, update LowerRankedShapes to erase it, now that the layering here
is clear. That pass reifies the eager error handling code, so the need
for the dummy op to keep things alive isn't needed.
With this change, we are now ready to start lowering to LLVM!
This is the current print-ir-after-all from e2e-lowering-pipeline:
https://reviews.llvm.org/P8221
Specifically, we use unranked memrefs which get passed as a fixed-size
set of arguments/returns. One big caveat about this is that returning
results isn't going to work. See TODO in LowerTensorLoadOp.
This is far from enough runtime-wise, but it starts to demarcate a
plausible layering. Notice for example how this removes the
runtime-dependence from LowerRankedShapes.
Eventually, we want to have an `npcomp_rt` or `npcomp_hal` dialect with
its own set of runtime types that will supercede this.
See comments in LowerTensorLoadOp for more direction about where this is
going to evolve.
The idea was half-baked and after some deep thought felt like a solution
looking for a problem. What we had here (and is removed in this patch)
just wasn't pulling its weight.
I cannot think of anything we would want to do with tcp.island as it is
removed here beyond just sinking and merging them within a basic block,
such that the witness argument is kind of pointless (only matters for
hoisting).
TCP compute ops like tcp.add and tcp.broadcast_to have the strong
invariant of "pure or undefined behavior", which means they are always
safe to sink. The island concept as removed here conferred no benefit.
Also, I'll note that "islands" are a trick you can only play once in a
system (unless they strictly nest). I have some early-stage thoughs on
having an island concept that helps with modeling tensor shapes
robustly which seems promising (the island would serve a similar role as
tie_shape).
This uses an approach inspired by what is done in IREE. See comments on
LowerRankedShapes.cpp for how it works.
The basic gist is that we have an op that creates a !shape.shape from a
set of SSA values representing the extents, and then iteratively replace
any op producing a !shape.shape with instances of that op.
This also adds a small pass to clean up the `dim` ops that linalg
introduces. For now, it only has a trivial pattern that looks for a
`tcp.alloc_memref(%shape)` op to get the shape as we currently have an
invariant that all memrefs are the result of such ops.
But eventually this will need to look through view ops and any other
shape-ish stuff that linalg introduces as it lowers to loops, along with
any slicing ops introduced by buffer allocation.
There's a lot of details to flesh out here, but the basic approach seems
promising (see comments in createE2ELoweringPipeline).
This approach will be put to the test when we try to do our first
fusions since that tickles some of the nasty phase ordering issues
involved here.
But we're not there yet.
This makes sure we stay resonably canonically using the shape machinery.
(In fact, DimOp should probably be in the shape dialect since it hides a
`shape.shape_of` call)