Rework e2e flow to use new "npcomprt"
This ~totally reworks the existing "runtime" stuff to be more
principled and usable, such as from Python. It's still not fully
production-quality, mainly in the department of memory management (e.g.
it currently leaks memory; we need to figure out "who frees memrefs" +
the analysis and transformation needed to do that (maybe use upstream
buffer allocation pass?)).
The user API is in include/npcomp/runtime/UserAPI.h, though
include/npcomp/JITRuntime/JITModule.h is a friendlier wrapper.
The stuff under {include,lib}/runtime is totally firewalled from the
compiler and tiny (<6kB, though no attention has gone into optimizing
that size). For example, we don't link in libSupport into the runtime,
instead having our own bare bones replacements for basics like ArrayRef
(the JITRuntime helps with bridging that gap, since it *can* depend on
all common LLVM utilities).
The overall features of npcomprt is that it exposes a module that
with multiple function entry points. Each function has arguments and
results that are tensor-valued, and npcomprt::Tensor is the runtime type
that is used to interact with that (and a npcomprt::Ref<T>
reference-counting wrapper is provided to wrap npcomprt::Tensor in the
common case).
From an implementation perspective, an npcomprt module at the
LLVM/object/binary level exposes a single module descriptor struct that
has pointers to other metadata (currently just a list of function
metadata descriptors). All interactions with the npcomp runtime are
keyed off of that module descriptor, including function lookups and
dispatching. This is done to dodge platform ABI issues and also allow
enough reflection to e.g. verify provided arguments.
Most of the compiler-side work here was in LowerToNpcomprtABI and
LowerToLLVM.
Also,
- Rename npcomp_rt/NpcompRt to npcomprt/Npcomprt; it was getting
annoying to type the underscores/caps.
- misc improvements to bash_helpers.sh
2020-07-09 08:15:40 +08:00
|
|
|
//===----------------------------------------------------------------------===//
|
|
|
|
//
|
|
|
|
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
|
|
|
|
// See https://llvm.org/LICENSE.txt for license information.
|
|
|
|
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
|
|
|
|
//
|
|
|
|
//===----------------------------------------------------------------------===//
|
|
|
|
|
|
|
|
#include "PassDetail.h"
|
2020-10-07 07:14:37 +08:00
|
|
|
#include "npcomp/RefBackend/RefBackend.h"
|
Rework e2e flow to use new "npcomprt"
This ~totally reworks the existing "runtime" stuff to be more
principled and usable, such as from Python. It's still not fully
production-quality, mainly in the department of memory management (e.g.
it currently leaks memory; we need to figure out "who frees memrefs" +
the analysis and transformation needed to do that (maybe use upstream
buffer allocation pass?)).
The user API is in include/npcomp/runtime/UserAPI.h, though
include/npcomp/JITRuntime/JITModule.h is a friendlier wrapper.
The stuff under {include,lib}/runtime is totally firewalled from the
compiler and tiny (<6kB, though no attention has gone into optimizing
that size). For example, we don't link in libSupport into the runtime,
instead having our own bare bones replacements for basics like ArrayRef
(the JITRuntime helps with bridging that gap, since it *can* depend on
all common LLVM utilities).
The overall features of npcomprt is that it exposes a module that
with multiple function entry points. Each function has arguments and
results that are tensor-valued, and npcomprt::Tensor is the runtime type
that is used to interact with that (and a npcomprt::Ref<T>
reference-counting wrapper is provided to wrap npcomprt::Tensor in the
common case).
From an implementation perspective, an npcomprt module at the
LLVM/object/binary level exposes a single module descriptor struct that
has pointers to other metadata (currently just a list of function
metadata descriptors). All interactions with the npcomp runtime are
keyed off of that module descriptor, including function lookups and
dispatching. This is done to dodge platform ABI issues and also allow
enough reflection to e.g. verify provided arguments.
Most of the compiler-side work here was in LowerToNpcomprtABI and
LowerToLLVM.
Also,
- Rename npcomp_rt/NpcompRt to npcomprt/Npcomprt; it was getting
annoying to type the underscores/caps.
- misc improvements to bash_helpers.sh
2020-07-09 08:15:40 +08:00
|
|
|
|
|
|
|
#include "mlir/Dialect/StandardOps/IR/Ops.h"
|
2020-12-12 06:43:38 +08:00
|
|
|
#include "mlir/IR/BuiltinTypes.h"
|
Rework e2e flow to use new "npcomprt"
This ~totally reworks the existing "runtime" stuff to be more
principled and usable, such as from Python. It's still not fully
production-quality, mainly in the department of memory management (e.g.
it currently leaks memory; we need to figure out "who frees memrefs" +
the analysis and transformation needed to do that (maybe use upstream
buffer allocation pass?)).
The user API is in include/npcomp/runtime/UserAPI.h, though
include/npcomp/JITRuntime/JITModule.h is a friendlier wrapper.
The stuff under {include,lib}/runtime is totally firewalled from the
compiler and tiny (<6kB, though no attention has gone into optimizing
that size). For example, we don't link in libSupport into the runtime,
instead having our own bare bones replacements for basics like ArrayRef
(the JITRuntime helps with bridging that gap, since it *can* depend on
all common LLVM utilities).
The overall features of npcomprt is that it exposes a module that
with multiple function entry points. Each function has arguments and
results that are tensor-valued, and npcomprt::Tensor is the runtime type
that is used to interact with that (and a npcomprt::Ref<T>
reference-counting wrapper is provided to wrap npcomprt::Tensor in the
common case).
From an implementation perspective, an npcomprt module at the
LLVM/object/binary level exposes a single module descriptor struct that
has pointers to other metadata (currently just a list of function
metadata descriptors). All interactions with the npcomp runtime are
keyed off of that module descriptor, including function lookups and
dispatching. This is done to dodge platform ABI issues and also allow
enough reflection to e.g. verify provided arguments.
Most of the compiler-side work here was in LowerToNpcomprtABI and
LowerToLLVM.
Also,
- Rename npcomp_rt/NpcompRt to npcomprt/Npcomprt; it was getting
annoying to type the underscores/caps.
- misc improvements to bash_helpers.sh
2020-07-09 08:15:40 +08:00
|
|
|
#include "mlir/IR/Verifier.h"
|
|
|
|
#include "mlir/Transforms/DialectConversion.h"
|
|
|
|
|
2020-10-08 08:30:10 +08:00
|
|
|
#include "npcomp/Dialect/Refback/IR/RefbackOps.h"
|
2020-10-08 08:12:52 +08:00
|
|
|
#include "npcomp/Dialect/Refbackrt/IR/RefbackrtDialect.h"
|
|
|
|
#include "npcomp/Dialect/Refbackrt/IR/RefbackrtOps.h"
|
Rework e2e flow to use new "npcomprt"
This ~totally reworks the existing "runtime" stuff to be more
principled and usable, such as from Python. It's still not fully
production-quality, mainly in the department of memory management (e.g.
it currently leaks memory; we need to figure out "who frees memrefs" +
the analysis and transformation needed to do that (maybe use upstream
buffer allocation pass?)).
The user API is in include/npcomp/runtime/UserAPI.h, though
include/npcomp/JITRuntime/JITModule.h is a friendlier wrapper.
The stuff under {include,lib}/runtime is totally firewalled from the
compiler and tiny (<6kB, though no attention has gone into optimizing
that size). For example, we don't link in libSupport into the runtime,
instead having our own bare bones replacements for basics like ArrayRef
(the JITRuntime helps with bridging that gap, since it *can* depend on
all common LLVM utilities).
The overall features of npcomprt is that it exposes a module that
with multiple function entry points. Each function has arguments and
results that are tensor-valued, and npcomprt::Tensor is the runtime type
that is used to interact with that (and a npcomprt::Ref<T>
reference-counting wrapper is provided to wrap npcomprt::Tensor in the
common case).
From an implementation perspective, an npcomprt module at the
LLVM/object/binary level exposes a single module descriptor struct that
has pointers to other metadata (currently just a list of function
metadata descriptors). All interactions with the npcomp runtime are
keyed off of that module descriptor, including function lookups and
dispatching. This is done to dodge platform ABI issues and also allow
enough reflection to e.g. verify provided arguments.
Most of the compiler-side work here was in LowerToNpcomprtABI and
LowerToLLVM.
Also,
- Rename npcomp_rt/NpcompRt to npcomprt/Npcomprt; it was getting
annoying to type the underscores/caps.
- misc improvements to bash_helpers.sh
2020-07-09 08:15:40 +08:00
|
|
|
|
|
|
|
using namespace mlir;
|
|
|
|
using namespace mlir::NPCOMP;
|
|
|
|
|
2021-03-11 01:53:03 +08:00
|
|
|
// Since input/output shapes are not hyper-rectangular we specify
|
|
|
|
// a maximum rank for each input shape such that shapes are padded
|
|
|
|
// out to kMaxRank at the ABI boundary. That way we can represent
|
|
|
|
// shapes using a traditional DenseElementsAttr.
|
|
|
|
//
|
|
|
|
// NOTE: When changing this parameter, also change the same `kMaxRank`
|
|
|
|
// parameter in `lib/RefBackend/LowerToLLVM.cpp` so that the LLVM lowering
|
|
|
|
// stays consistent.
|
|
|
|
static constexpr int kMaxRank = 6;
|
|
|
|
|
2020-09-17 08:31:40 +08:00
|
|
|
// Get the type used to represent MemRefType `type` on ABI boundaries.
|
|
|
|
// For convenience we do a cast to MemRefType internally.
|
|
|
|
static Type getABIMemrefType(Type type) {
|
|
|
|
return UnrankedMemRefType::get(type.cast<MemRefType>().getElementType(),
|
|
|
|
/*memorySpace=*/0);
|
|
|
|
}
|
|
|
|
|
Rework e2e flow to use new "npcomprt"
This ~totally reworks the existing "runtime" stuff to be more
principled and usable, such as from Python. It's still not fully
production-quality, mainly in the department of memory management (e.g.
it currently leaks memory; we need to figure out "who frees memrefs" +
the analysis and transformation needed to do that (maybe use upstream
buffer allocation pass?)).
The user API is in include/npcomp/runtime/UserAPI.h, though
include/npcomp/JITRuntime/JITModule.h is a friendlier wrapper.
The stuff under {include,lib}/runtime is totally firewalled from the
compiler and tiny (<6kB, though no attention has gone into optimizing
that size). For example, we don't link in libSupport into the runtime,
instead having our own bare bones replacements for basics like ArrayRef
(the JITRuntime helps with bridging that gap, since it *can* depend on
all common LLVM utilities).
The overall features of npcomprt is that it exposes a module that
with multiple function entry points. Each function has arguments and
results that are tensor-valued, and npcomprt::Tensor is the runtime type
that is used to interact with that (and a npcomprt::Ref<T>
reference-counting wrapper is provided to wrap npcomprt::Tensor in the
common case).
From an implementation perspective, an npcomprt module at the
LLVM/object/binary level exposes a single module descriptor struct that
has pointers to other metadata (currently just a list of function
metadata descriptors). All interactions with the npcomp runtime are
keyed off of that module descriptor, including function lookups and
dispatching. This is done to dodge platform ABI issues and also allow
enough reflection to e.g. verify provided arguments.
Most of the compiler-side work here was in LowerToNpcomprtABI and
LowerToLLVM.
Also,
- Rename npcomp_rt/NpcompRt to npcomprt/Npcomprt; it was getting
annoying to type the underscores/caps.
- misc improvements to bash_helpers.sh
2020-07-09 08:15:40 +08:00
|
|
|
//===----------------------------------------------------------------------===//
|
|
|
|
// Creating module metadata.
|
|
|
|
//===----------------------------------------------------------------------===//
|
|
|
|
|
2020-10-08 08:12:52 +08:00
|
|
|
// Returns true if the function signature can be expressed with the refbackrt
|
Rework e2e flow to use new "npcomprt"
This ~totally reworks the existing "runtime" stuff to be more
principled and usable, such as from Python. It's still not fully
production-quality, mainly in the department of memory management (e.g.
it currently leaks memory; we need to figure out "who frees memrefs" +
the analysis and transformation needed to do that (maybe use upstream
buffer allocation pass?)).
The user API is in include/npcomp/runtime/UserAPI.h, though
include/npcomp/JITRuntime/JITModule.h is a friendlier wrapper.
The stuff under {include,lib}/runtime is totally firewalled from the
compiler and tiny (<6kB, though no attention has gone into optimizing
that size). For example, we don't link in libSupport into the runtime,
instead having our own bare bones replacements for basics like ArrayRef
(the JITRuntime helps with bridging that gap, since it *can* depend on
all common LLVM utilities).
The overall features of npcomprt is that it exposes a module that
with multiple function entry points. Each function has arguments and
results that are tensor-valued, and npcomprt::Tensor is the runtime type
that is used to interact with that (and a npcomprt::Ref<T>
reference-counting wrapper is provided to wrap npcomprt::Tensor in the
common case).
From an implementation perspective, an npcomprt module at the
LLVM/object/binary level exposes a single module descriptor struct that
has pointers to other metadata (currently just a list of function
metadata descriptors). All interactions with the npcomp runtime are
keyed off of that module descriptor, including function lookups and
dispatching. This is done to dodge platform ABI issues and also allow
enough reflection to e.g. verify provided arguments.
Most of the compiler-side work here was in LowerToNpcomprtABI and
LowerToLLVM.
Also,
- Rename npcomp_rt/NpcompRt to npcomprt/Npcomprt; it was getting
annoying to type the underscores/caps.
- misc improvements to bash_helpers.sh
2020-07-09 08:15:40 +08:00
|
|
|
// ABI.
|
2020-10-08 08:12:52 +08:00
|
|
|
static bool expressibleWithRefbackrtABI(FunctionType type) {
|
|
|
|
// Currently, only memref types can be exposed at refbackrt ABI boundaries.
|
Rework e2e flow to use new "npcomprt"
This ~totally reworks the existing "runtime" stuff to be more
principled and usable, such as from Python. It's still not fully
production-quality, mainly in the department of memory management (e.g.
it currently leaks memory; we need to figure out "who frees memrefs" +
the analysis and transformation needed to do that (maybe use upstream
buffer allocation pass?)).
The user API is in include/npcomp/runtime/UserAPI.h, though
include/npcomp/JITRuntime/JITModule.h is a friendlier wrapper.
The stuff under {include,lib}/runtime is totally firewalled from the
compiler and tiny (<6kB, though no attention has gone into optimizing
that size). For example, we don't link in libSupport into the runtime,
instead having our own bare bones replacements for basics like ArrayRef
(the JITRuntime helps with bridging that gap, since it *can* depend on
all common LLVM utilities).
The overall features of npcomprt is that it exposes a module that
with multiple function entry points. Each function has arguments and
results that are tensor-valued, and npcomprt::Tensor is the runtime type
that is used to interact with that (and a npcomprt::Ref<T>
reference-counting wrapper is provided to wrap npcomprt::Tensor in the
common case).
From an implementation perspective, an npcomprt module at the
LLVM/object/binary level exposes a single module descriptor struct that
has pointers to other metadata (currently just a list of function
metadata descriptors). All interactions with the npcomp runtime are
keyed off of that module descriptor, including function lookups and
dispatching. This is done to dodge platform ABI issues and also allow
enough reflection to e.g. verify provided arguments.
Most of the compiler-side work here was in LowerToNpcomprtABI and
LowerToLLVM.
Also,
- Rename npcomp_rt/NpcompRt to npcomprt/Npcomprt; it was getting
annoying to type the underscores/caps.
- misc improvements to bash_helpers.sh
2020-07-09 08:15:40 +08:00
|
|
|
return llvm::all_of(
|
|
|
|
llvm::concat<const Type>(type.getInputs(), type.getResults()),
|
2021-03-11 01:53:03 +08:00
|
|
|
[](Type t) {
|
|
|
|
return t.isa<UnrankedMemRefType, MemRefType, FloatType>();
|
|
|
|
});
|
|
|
|
}
|
|
|
|
|
|
|
|
// Returns the integer rerpresentation of the CompilerDataStructures::ABIType
|
|
|
|
// Must stay aligned with CompilerDataStructures::ABIArgType enum
|
|
|
|
static uint32_t getIntReprForABIType(Type type) {
|
|
|
|
if (type.isa<MemRefType>() || type.isa<UnrankedMemRefType>()) {
|
|
|
|
return 1;
|
|
|
|
} else if (auto floatTy = type.dyn_cast<FloatType>()) {
|
|
|
|
switch (floatTy.getWidth()) {
|
|
|
|
case 32:
|
|
|
|
return 2;
|
|
|
|
case 64:
|
|
|
|
return 3;
|
|
|
|
default:
|
|
|
|
assert(false && "Unsupported float bit width");
|
|
|
|
}
|
|
|
|
} else if (auto intTy = type.dyn_cast<IntegerType>()) {
|
|
|
|
}
|
|
|
|
// assert(false && "couldn't get IntReprForABIType");
|
|
|
|
return -1;
|
|
|
|
}
|
|
|
|
|
|
|
|
// Must stay aligned with CompilerDataStructures::ABIElementType enum
|
|
|
|
static uint32_t getIntReprForABIElementType(Type type) {
|
|
|
|
if (auto shapedTy = type.dyn_cast<ShapedType>()) {
|
|
|
|
auto elemTy = shapedTy.getElementType();
|
|
|
|
if (auto floatTy = elemTy.dyn_cast<FloatType>()) {
|
|
|
|
switch (floatTy.getWidth()) {
|
|
|
|
case 32:
|
|
|
|
return 1;
|
|
|
|
default:
|
|
|
|
assert(false && "Unsupported tensor element type");
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static SmallVector<int32_t, kMaxRank>
|
|
|
|
getExtentsForType(Type type, const int32_t maxRank = kMaxRank) {
|
|
|
|
// Extend all shapes out to 4D to make our lives easier at the ABI boundary
|
|
|
|
if (auto shapedType = type.dyn_cast<ShapedType>()) {
|
|
|
|
if (!shapedType.hasRank()) {
|
|
|
|
return {kMaxRank, kMaxRank, kMaxRank, kMaxRank, kMaxRank, kMaxRank};
|
|
|
|
}
|
|
|
|
|
|
|
|
auto shape = shapedType.getShape();
|
|
|
|
auto shapeRank = shapedType.getRank();
|
|
|
|
if (shapeRank <= maxRank) {
|
|
|
|
SmallVector<int32_t, kMaxRank> extendedShape;
|
|
|
|
// Push back all the values of the shape
|
|
|
|
for (auto extentAndIndex : llvm::enumerate(shape)) {
|
|
|
|
auto extent = extentAndIndex.value();
|
|
|
|
auto index = extentAndIndex.index();
|
|
|
|
if (shapedType.isDynamic(index)) {
|
|
|
|
extendedShape.push_back(-1);
|
|
|
|
} else {
|
|
|
|
extendedShape.push_back(extent);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
// Pad whatever is left so we have even vectors
|
|
|
|
auto padRank = maxRank - shapeRank;
|
|
|
|
for (int i = 0; i < padRank; i++)
|
|
|
|
extendedShape.push_back(0xDEAD'BEEF);
|
|
|
|
|
|
|
|
return extendedShape;
|
|
|
|
} else {
|
|
|
|
assert(false && "unsupported rank");
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
// Represent Scalar's as all 1's.
|
|
|
|
return {kMaxRank, kMaxRank, kMaxRank, kMaxRank, kMaxRank, kMaxRank};
|
|
|
|
}
|
|
|
|
|
|
|
|
int32_t getRankForType(Type type) {
|
|
|
|
// Returns a rank of -1 if the tensor is unranked
|
|
|
|
if (auto shapedType = type.dyn_cast<ShapedType>()) {
|
|
|
|
return shapedType.hasRank() ? shapedType.getRank() : -1;
|
|
|
|
}
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
uint32_t hasStaticShape(Type type) {
|
|
|
|
if (auto shapedType = type.dyn_cast<ShapedType>()) {
|
|
|
|
return shapedType.hasStaticShape() ? 1 : 0;
|
|
|
|
}
|
|
|
|
// Assume scalars and non-shaped type things are static
|
|
|
|
return 1;
|
Rework e2e flow to use new "npcomprt"
This ~totally reworks the existing "runtime" stuff to be more
principled and usable, such as from Python. It's still not fully
production-quality, mainly in the department of memory management (e.g.
it currently leaks memory; we need to figure out "who frees memrefs" +
the analysis and transformation needed to do that (maybe use upstream
buffer allocation pass?)).
The user API is in include/npcomp/runtime/UserAPI.h, though
include/npcomp/JITRuntime/JITModule.h is a friendlier wrapper.
The stuff under {include,lib}/runtime is totally firewalled from the
compiler and tiny (<6kB, though no attention has gone into optimizing
that size). For example, we don't link in libSupport into the runtime,
instead having our own bare bones replacements for basics like ArrayRef
(the JITRuntime helps with bridging that gap, since it *can* depend on
all common LLVM utilities).
The overall features of npcomprt is that it exposes a module that
with multiple function entry points. Each function has arguments and
results that are tensor-valued, and npcomprt::Tensor is the runtime type
that is used to interact with that (and a npcomprt::Ref<T>
reference-counting wrapper is provided to wrap npcomprt::Tensor in the
common case).
From an implementation perspective, an npcomprt module at the
LLVM/object/binary level exposes a single module descriptor struct that
has pointers to other metadata (currently just a list of function
metadata descriptors). All interactions with the npcomp runtime are
keyed off of that module descriptor, including function lookups and
dispatching. This is done to dodge platform ABI issues and also allow
enough reflection to e.g. verify provided arguments.
Most of the compiler-side work here was in LowerToNpcomprtABI and
LowerToLLVM.
Also,
- Rename npcomp_rt/NpcompRt to npcomprt/Npcomprt; it was getting
annoying to type the underscores/caps.
- misc improvements to bash_helpers.sh
2020-07-09 08:15:40 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static LogicalResult createModuleMetadata(ModuleOp module) {
|
|
|
|
auto moduleMetadata =
|
|
|
|
OpBuilder::atBlockBegin(module.getBody())
|
2020-10-08 08:12:52 +08:00
|
|
|
.create<refbackrt::ModuleMetadataOp>(module.getLoc());
|
Rework e2e flow to use new "npcomprt"
This ~totally reworks the existing "runtime" stuff to be more
principled and usable, such as from Python. It's still not fully
production-quality, mainly in the department of memory management (e.g.
it currently leaks memory; we need to figure out "who frees memrefs" +
the analysis and transformation needed to do that (maybe use upstream
buffer allocation pass?)).
The user API is in include/npcomp/runtime/UserAPI.h, though
include/npcomp/JITRuntime/JITModule.h is a friendlier wrapper.
The stuff under {include,lib}/runtime is totally firewalled from the
compiler and tiny (<6kB, though no attention has gone into optimizing
that size). For example, we don't link in libSupport into the runtime,
instead having our own bare bones replacements for basics like ArrayRef
(the JITRuntime helps with bridging that gap, since it *can* depend on
all common LLVM utilities).
The overall features of npcomprt is that it exposes a module that
with multiple function entry points. Each function has arguments and
results that are tensor-valued, and npcomprt::Tensor is the runtime type
that is used to interact with that (and a npcomprt::Ref<T>
reference-counting wrapper is provided to wrap npcomprt::Tensor in the
common case).
From an implementation perspective, an npcomprt module at the
LLVM/object/binary level exposes a single module descriptor struct that
has pointers to other metadata (currently just a list of function
metadata descriptors). All interactions with the npcomp runtime are
keyed off of that module descriptor, including function lookups and
dispatching. This is done to dodge platform ABI issues and also allow
enough reflection to e.g. verify provided arguments.
Most of the compiler-side work here was in LowerToNpcomprtABI and
LowerToLLVM.
Also,
- Rename npcomp_rt/NpcompRt to npcomprt/Npcomprt; it was getting
annoying to type the underscores/caps.
- misc improvements to bash_helpers.sh
2020-07-09 08:15:40 +08:00
|
|
|
moduleMetadata.metadatas().push_back(new Block);
|
|
|
|
Block &metadatas = moduleMetadata.metadatas().front();
|
|
|
|
OpBuilder::atBlockEnd(&metadatas)
|
2020-10-08 08:12:52 +08:00
|
|
|
.create<refbackrt::ModuleMetadataTerminatorOp>(module.getLoc());
|
Rework e2e flow to use new "npcomprt"
This ~totally reworks the existing "runtime" stuff to be more
principled and usable, such as from Python. It's still not fully
production-quality, mainly in the department of memory management (e.g.
it currently leaks memory; we need to figure out "who frees memrefs" +
the analysis and transformation needed to do that (maybe use upstream
buffer allocation pass?)).
The user API is in include/npcomp/runtime/UserAPI.h, though
include/npcomp/JITRuntime/JITModule.h is a friendlier wrapper.
The stuff under {include,lib}/runtime is totally firewalled from the
compiler and tiny (<6kB, though no attention has gone into optimizing
that size). For example, we don't link in libSupport into the runtime,
instead having our own bare bones replacements for basics like ArrayRef
(the JITRuntime helps with bridging that gap, since it *can* depend on
all common LLVM utilities).
The overall features of npcomprt is that it exposes a module that
with multiple function entry points. Each function has arguments and
results that are tensor-valued, and npcomprt::Tensor is the runtime type
that is used to interact with that (and a npcomprt::Ref<T>
reference-counting wrapper is provided to wrap npcomprt::Tensor in the
common case).
From an implementation perspective, an npcomprt module at the
LLVM/object/binary level exposes a single module descriptor struct that
has pointers to other metadata (currently just a list of function
metadata descriptors). All interactions with the npcomp runtime are
keyed off of that module descriptor, including function lookups and
dispatching. This is done to dodge platform ABI issues and also allow
enough reflection to e.g. verify provided arguments.
Most of the compiler-side work here was in LowerToNpcomprtABI and
LowerToLLVM.
Also,
- Rename npcomp_rt/NpcompRt to npcomprt/Npcomprt; it was getting
annoying to type the underscores/caps.
- misc improvements to bash_helpers.sh
2020-07-09 08:15:40 +08:00
|
|
|
|
|
|
|
SymbolTable symbolTable(module);
|
|
|
|
auto builder = OpBuilder::atBlockBegin(&metadatas);
|
|
|
|
for (auto func : module.getOps<FuncOp>()) {
|
|
|
|
if (symbolTable.getSymbolVisibility(func) !=
|
|
|
|
SymbolTable::Visibility::Public) {
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
// TODO: Add richer information here such as expected shapes and element
|
|
|
|
// types.
|
2021-03-11 01:53:03 +08:00
|
|
|
SmallVector<uint32_t, 6> inputABIArgTypes;
|
|
|
|
SmallVector<uint32_t, 6> inputABIElementTypes;
|
|
|
|
SmallVector<SmallVector<int32_t, kMaxRank>, 6> inputABIShapes;
|
|
|
|
SmallVector<uint32_t, 6> inputABIRanks;
|
|
|
|
// SmallVector<uint32_t, 6> inputIsStatic;
|
|
|
|
for (const auto &inputArgType : func.getBody().front().getArgumentTypes()) {
|
|
|
|
inputABIArgTypes.push_back(getIntReprForABIType(inputArgType));
|
|
|
|
inputABIElementTypes.push_back(getIntReprForABIElementType(inputArgType));
|
|
|
|
inputABIShapes.push_back(
|
|
|
|
getExtentsForType(inputArgType, /*maxRank=*/kMaxRank));
|
|
|
|
inputABIRanks.push_back(getRankForType(inputArgType));
|
|
|
|
// inputIsStatic.push_back(hasStaticShape(inputArgType));
|
|
|
|
}
|
|
|
|
|
|
|
|
SmallVector<uint32_t, 6> outputABIArgTypes;
|
|
|
|
SmallVector<uint32_t, 6> outputABIElementTypes;
|
|
|
|
SmallVector<SmallVector<int32_t, kMaxRank>, 6> outputABIShapes;
|
|
|
|
SmallVector<uint32_t, 6> outputABIRanks;
|
|
|
|
SmallVector<uint32_t, 6> outputIsStatic;
|
|
|
|
for (const auto &outputArgType : func.getCallableResults()) {
|
|
|
|
outputABIArgTypes.push_back(getIntReprForABIType(outputArgType));
|
|
|
|
outputABIElementTypes.push_back(
|
|
|
|
getIntReprForABIElementType(outputArgType));
|
|
|
|
outputABIShapes.push_back(
|
|
|
|
getExtentsForType(outputArgType, /*maxRank=*/kMaxRank));
|
|
|
|
outputABIRanks.push_back(getRankForType(outputArgType));
|
|
|
|
// outputIsStatic.push_back(hasStaticShape(outputArgType));
|
|
|
|
}
|
|
|
|
|
|
|
|
auto i32Type = builder.getIntegerType(32);
|
|
|
|
auto inputABIDataType =
|
|
|
|
RankedTensorType::get(inputABIArgTypes.size(), i32Type);
|
|
|
|
auto inputABIElementType =
|
|
|
|
RankedTensorType::get(inputABIElementTypes.size(), i32Type);
|
|
|
|
auto inputABIShapesType = RankedTensorType::get(
|
|
|
|
llvm::ArrayRef<int64_t>{static_cast<long>(inputABIShapes.size()) *
|
|
|
|
kMaxRank},
|
|
|
|
i32Type);
|
|
|
|
auto inputABIRanksType =
|
|
|
|
RankedTensorType::get(inputABIRanks.size(), i32Type);
|
|
|
|
// auto inputIsStaticType = RankedTensorType::get(inputIsStatic.size(),
|
|
|
|
// i32Type);
|
|
|
|
auto outputABIDataType =
|
|
|
|
RankedTensorType::get(outputABIArgTypes.size(), i32Type);
|
|
|
|
auto outputABIElementType =
|
|
|
|
RankedTensorType::get(outputABIElementTypes.size(), i32Type);
|
|
|
|
auto outputABIShapesType = RankedTensorType::get(
|
|
|
|
llvm::ArrayRef<int64_t>{static_cast<long>(outputABIShapes.size()) *
|
|
|
|
kMaxRank},
|
|
|
|
i32Type);
|
|
|
|
auto outputABIRanksType =
|
|
|
|
RankedTensorType::get(outputABIRanks.size(), i32Type);
|
|
|
|
// auto outputIsStaticType = RankedTensorType::get(outputIsStatic.size(),
|
|
|
|
// i32Type);
|
|
|
|
|
|
|
|
// TODO(brycearden): I'm sure there's a cleaner way to do this
|
|
|
|
auto flattenABIShapes =
|
|
|
|
[](SmallVector<SmallVector<int32_t, kMaxRank>, 6> shapes) {
|
|
|
|
SmallVector<int32_t, 32> ret;
|
|
|
|
for (auto &shape : shapes)
|
|
|
|
for (auto &dim : shape)
|
|
|
|
ret.push_back(dim);
|
|
|
|
return ret;
|
|
|
|
};
|
|
|
|
|
|
|
|
SmallVector<NamedAttribute, 16> namedAttrs;
|
|
|
|
|
|
|
|
// Add attributes that are valid for every func (funcName, numInputs,
|
|
|
|
// numOutputs)
|
|
|
|
namedAttrs.push_back(
|
|
|
|
std::make_pair(Identifier::get("funcName", module.getContext()),
|
|
|
|
builder.getSymbolRefAttr(func.getName())));
|
|
|
|
namedAttrs.push_back(
|
|
|
|
std::make_pair(Identifier::get("numInputs", module.getContext()),
|
|
|
|
builder.getI32IntegerAttr(func.getNumArguments())));
|
|
|
|
namedAttrs.push_back(
|
|
|
|
std::make_pair(Identifier::get("numOutputs", module.getContext()),
|
|
|
|
builder.getI32IntegerAttr(func.getNumResults())));
|
|
|
|
|
|
|
|
if (inputABIArgTypes.size()) {
|
|
|
|
// Only add input information if there are inputs
|
|
|
|
namedAttrs.push_back(std::make_pair(
|
|
|
|
Identifier::get("inputArgTypes", func.getContext()),
|
|
|
|
DenseIntElementsAttr::get(inputABIDataType,
|
|
|
|
llvm::makeArrayRef(inputABIArgTypes))));
|
|
|
|
namedAttrs.push_back(std::make_pair(
|
|
|
|
Identifier::get("inputElementTypes", func.getContext()),
|
|
|
|
DenseIntElementsAttr::get(inputABIElementType,
|
|
|
|
llvm::makeArrayRef(inputABIElementTypes))));
|
|
|
|
namedAttrs.push_back(std::make_pair(
|
|
|
|
Identifier::get("inputRanks", func.getContext()),
|
|
|
|
DenseIntElementsAttr::get(inputABIRanksType,
|
|
|
|
llvm::makeArrayRef(inputABIRanks))));
|
|
|
|
auto inputShapesFlattened = flattenABIShapes(inputABIShapes);
|
|
|
|
namedAttrs.push_back(std::make_pair(
|
|
|
|
Identifier::get("inputShapes", func.getContext()),
|
|
|
|
DenseIntElementsAttr::get(
|
|
|
|
inputABIShapesType,
|
|
|
|
llvm::makeArrayRef(flattenABIShapes(inputABIShapes)))));
|
|
|
|
}
|
|
|
|
|
|
|
|
if (outputABIArgTypes.size()) {
|
|
|
|
// Only add output information if there are outptus
|
|
|
|
namedAttrs.push_back(std::make_pair(
|
|
|
|
Identifier::get("outputArgTypes", func.getContext()),
|
|
|
|
DenseIntElementsAttr::get(outputABIDataType,
|
|
|
|
llvm::makeArrayRef(outputABIArgTypes))));
|
|
|
|
namedAttrs.push_back(std::make_pair(
|
|
|
|
Identifier::get("outputElementTypes", func.getContext()),
|
|
|
|
DenseIntElementsAttr::get(
|
|
|
|
outputABIElementType,
|
|
|
|
llvm::makeArrayRef(outputABIElementTypes))));
|
|
|
|
namedAttrs.push_back(std::make_pair(
|
|
|
|
Identifier::get("outputRanks", func.getContext()),
|
|
|
|
DenseIntElementsAttr::get(outputABIRanksType,
|
|
|
|
llvm::makeArrayRef(outputABIRanks))));
|
|
|
|
namedAttrs.push_back(std::make_pair(
|
|
|
|
Identifier::get("outputShapes", func.getContext()),
|
|
|
|
DenseIntElementsAttr::get(
|
|
|
|
outputABIShapesType,
|
|
|
|
llvm::makeArrayRef(flattenABIShapes(outputABIShapes)))));
|
|
|
|
}
|
|
|
|
|
|
|
|
builder.create<refbackrt::FuncMetadataOp>(func.getLoc(), ArrayRef<Type>{},
|
|
|
|
ArrayRef<Value>{}, namedAttrs);
|
Rework e2e flow to use new "npcomprt"
This ~totally reworks the existing "runtime" stuff to be more
principled and usable, such as from Python. It's still not fully
production-quality, mainly in the department of memory management (e.g.
it currently leaks memory; we need to figure out "who frees memrefs" +
the analysis and transformation needed to do that (maybe use upstream
buffer allocation pass?)).
The user API is in include/npcomp/runtime/UserAPI.h, though
include/npcomp/JITRuntime/JITModule.h is a friendlier wrapper.
The stuff under {include,lib}/runtime is totally firewalled from the
compiler and tiny (<6kB, though no attention has gone into optimizing
that size). For example, we don't link in libSupport into the runtime,
instead having our own bare bones replacements for basics like ArrayRef
(the JITRuntime helps with bridging that gap, since it *can* depend on
all common LLVM utilities).
The overall features of npcomprt is that it exposes a module that
with multiple function entry points. Each function has arguments and
results that are tensor-valued, and npcomprt::Tensor is the runtime type
that is used to interact with that (and a npcomprt::Ref<T>
reference-counting wrapper is provided to wrap npcomprt::Tensor in the
common case).
From an implementation perspective, an npcomprt module at the
LLVM/object/binary level exposes a single module descriptor struct that
has pointers to other metadata (currently just a list of function
metadata descriptors). All interactions with the npcomp runtime are
keyed off of that module descriptor, including function lookups and
dispatching. This is done to dodge platform ABI issues and also allow
enough reflection to e.g. verify provided arguments.
Most of the compiler-side work here was in LowerToNpcomprtABI and
LowerToLLVM.
Also,
- Rename npcomp_rt/NpcompRt to npcomprt/Npcomprt; it was getting
annoying to type the underscores/caps.
- misc improvements to bash_helpers.sh
2020-07-09 08:15:40 +08:00
|
|
|
|
2020-10-08 08:12:52 +08:00
|
|
|
if (!expressibleWithRefbackrtABI(func.getType()))
|
|
|
|
return func.emitError() << "func not expressible with refbackrt ABI";
|
Rework e2e flow to use new "npcomprt"
This ~totally reworks the existing "runtime" stuff to be more
principled and usable, such as from Python. It's still not fully
production-quality, mainly in the department of memory management (e.g.
it currently leaks memory; we need to figure out "who frees memrefs" +
the analysis and transformation needed to do that (maybe use upstream
buffer allocation pass?)).
The user API is in include/npcomp/runtime/UserAPI.h, though
include/npcomp/JITRuntime/JITModule.h is a friendlier wrapper.
The stuff under {include,lib}/runtime is totally firewalled from the
compiler and tiny (<6kB, though no attention has gone into optimizing
that size). For example, we don't link in libSupport into the runtime,
instead having our own bare bones replacements for basics like ArrayRef
(the JITRuntime helps with bridging that gap, since it *can* depend on
all common LLVM utilities).
The overall features of npcomprt is that it exposes a module that
with multiple function entry points. Each function has arguments and
results that are tensor-valued, and npcomprt::Tensor is the runtime type
that is used to interact with that (and a npcomprt::Ref<T>
reference-counting wrapper is provided to wrap npcomprt::Tensor in the
common case).
From an implementation perspective, an npcomprt module at the
LLVM/object/binary level exposes a single module descriptor struct that
has pointers to other metadata (currently just a list of function
metadata descriptors). All interactions with the npcomp runtime are
keyed off of that module descriptor, including function lookups and
dispatching. This is done to dodge platform ABI issues and also allow
enough reflection to e.g. verify provided arguments.
Most of the compiler-side work here was in LowerToNpcomprtABI and
LowerToLLVM.
Also,
- Rename npcomp_rt/NpcompRt to npcomprt/Npcomprt; it was getting
annoying to type the underscores/caps.
- misc improvements to bash_helpers.sh
2020-07-09 08:15:40 +08:00
|
|
|
}
|
|
|
|
return success();
|
|
|
|
}
|
|
|
|
|
|
|
|
//===----------------------------------------------------------------------===//
|
|
|
|
// Dialect conversion.
|
|
|
|
//===----------------------------------------------------------------------===//
|
|
|
|
|
|
|
|
namespace {
|
2020-09-17 08:31:40 +08:00
|
|
|
class LowerAssertOp : public OpConversionPattern<AssertOp> {
|
Rework e2e flow to use new "npcomprt"
This ~totally reworks the existing "runtime" stuff to be more
principled and usable, such as from Python. It's still not fully
production-quality, mainly in the department of memory management (e.g.
it currently leaks memory; we need to figure out "who frees memrefs" +
the analysis and transformation needed to do that (maybe use upstream
buffer allocation pass?)).
The user API is in include/npcomp/runtime/UserAPI.h, though
include/npcomp/JITRuntime/JITModule.h is a friendlier wrapper.
The stuff under {include,lib}/runtime is totally firewalled from the
compiler and tiny (<6kB, though no attention has gone into optimizing
that size). For example, we don't link in libSupport into the runtime,
instead having our own bare bones replacements for basics like ArrayRef
(the JITRuntime helps with bridging that gap, since it *can* depend on
all common LLVM utilities).
The overall features of npcomprt is that it exposes a module that
with multiple function entry points. Each function has arguments and
results that are tensor-valued, and npcomprt::Tensor is the runtime type
that is used to interact with that (and a npcomprt::Ref<T>
reference-counting wrapper is provided to wrap npcomprt::Tensor in the
common case).
From an implementation perspective, an npcomprt module at the
LLVM/object/binary level exposes a single module descriptor struct that
has pointers to other metadata (currently just a list of function
metadata descriptors). All interactions with the npcomp runtime are
keyed off of that module descriptor, including function lookups and
dispatching. This is done to dodge platform ABI issues and also allow
enough reflection to e.g. verify provided arguments.
Most of the compiler-side work here was in LowerToNpcomprtABI and
LowerToLLVM.
Also,
- Rename npcomp_rt/NpcompRt to npcomprt/Npcomprt; it was getting
annoying to type the underscores/caps.
- misc improvements to bash_helpers.sh
2020-07-09 08:15:40 +08:00
|
|
|
public:
|
|
|
|
using OpConversionPattern::OpConversionPattern;
|
|
|
|
LogicalResult
|
2020-09-17 08:31:40 +08:00
|
|
|
matchAndRewrite(AssertOp op, ArrayRef<Value> operands,
|
Rework e2e flow to use new "npcomprt"
This ~totally reworks the existing "runtime" stuff to be more
principled and usable, such as from Python. It's still not fully
production-quality, mainly in the department of memory management (e.g.
it currently leaks memory; we need to figure out "who frees memrefs" +
the analysis and transformation needed to do that (maybe use upstream
buffer allocation pass?)).
The user API is in include/npcomp/runtime/UserAPI.h, though
include/npcomp/JITRuntime/JITModule.h is a friendlier wrapper.
The stuff under {include,lib}/runtime is totally firewalled from the
compiler and tiny (<6kB, though no attention has gone into optimizing
that size). For example, we don't link in libSupport into the runtime,
instead having our own bare bones replacements for basics like ArrayRef
(the JITRuntime helps with bridging that gap, since it *can* depend on
all common LLVM utilities).
The overall features of npcomprt is that it exposes a module that
with multiple function entry points. Each function has arguments and
results that are tensor-valued, and npcomprt::Tensor is the runtime type
that is used to interact with that (and a npcomprt::Ref<T>
reference-counting wrapper is provided to wrap npcomprt::Tensor in the
common case).
From an implementation perspective, an npcomprt module at the
LLVM/object/binary level exposes a single module descriptor struct that
has pointers to other metadata (currently just a list of function
metadata descriptors). All interactions with the npcomp runtime are
keyed off of that module descriptor, including function lookups and
dispatching. This is done to dodge platform ABI issues and also allow
enough reflection to e.g. verify provided arguments.
Most of the compiler-side work here was in LowerToNpcomprtABI and
LowerToLLVM.
Also,
- Rename npcomp_rt/NpcompRt to npcomprt/Npcomprt; it was getting
annoying to type the underscores/caps.
- misc improvements to bash_helpers.sh
2020-07-09 08:15:40 +08:00
|
|
|
ConversionPatternRewriter &rewriter) const override {
|
2020-09-17 08:31:40 +08:00
|
|
|
AssertOp::Adaptor adaptor(operands);
|
2020-10-08 08:12:52 +08:00
|
|
|
// The refbackrt runtime function aborts if the argument is true, rather
|
|
|
|
// than when it is false as an `assert` does. So negate the predicate (by
|
|
|
|
// xor'ing with 1).
|
2020-09-17 08:31:40 +08:00
|
|
|
auto c1 = rewriter.create<ConstantOp>(
|
|
|
|
op.getLoc(), rewriter.getIntegerAttr(rewriter.getI1Type(),
|
|
|
|
APInt(/*numBits=*/1, /*val=*/1)));
|
|
|
|
Value assertFailed = rewriter.create<XOrOp>(op.getLoc(), adaptor.arg(), c1);
|
2020-10-08 08:12:52 +08:00
|
|
|
rewriter.replaceOpWithNewOp<refbackrt::AbortIfOp>(op, assertFailed,
|
|
|
|
op.msgAttr());
|
Rework e2e flow to use new "npcomprt"
This ~totally reworks the existing "runtime" stuff to be more
principled and usable, such as from Python. It's still not fully
production-quality, mainly in the department of memory management (e.g.
it currently leaks memory; we need to figure out "who frees memrefs" +
the analysis and transformation needed to do that (maybe use upstream
buffer allocation pass?)).
The user API is in include/npcomp/runtime/UserAPI.h, though
include/npcomp/JITRuntime/JITModule.h is a friendlier wrapper.
The stuff under {include,lib}/runtime is totally firewalled from the
compiler and tiny (<6kB, though no attention has gone into optimizing
that size). For example, we don't link in libSupport into the runtime,
instead having our own bare bones replacements for basics like ArrayRef
(the JITRuntime helps with bridging that gap, since it *can* depend on
all common LLVM utilities).
The overall features of npcomprt is that it exposes a module that
with multiple function entry points. Each function has arguments and
results that are tensor-valued, and npcomprt::Tensor is the runtime type
that is used to interact with that (and a npcomprt::Ref<T>
reference-counting wrapper is provided to wrap npcomprt::Tensor in the
common case).
From an implementation perspective, an npcomprt module at the
LLVM/object/binary level exposes a single module descriptor struct that
has pointers to other metadata (currently just a list of function
metadata descriptors). All interactions with the npcomp runtime are
keyed off of that module descriptor, including function lookups and
dispatching. This is done to dodge platform ABI issues and also allow
enough reflection to e.g. verify provided arguments.
Most of the compiler-side work here was in LowerToNpcomprtABI and
LowerToLLVM.
Also,
- Rename npcomp_rt/NpcompRt to npcomprt/Npcomprt; it was getting
annoying to type the underscores/caps.
- misc improvements to bash_helpers.sh
2020-07-09 08:15:40 +08:00
|
|
|
return success();
|
|
|
|
}
|
|
|
|
};
|
|
|
|
} // namespace
|
|
|
|
|
2020-07-11 08:31:24 +08:00
|
|
|
namespace {
|
[RefBackend] Fix leaks related to ABI boundaries.
Best as I can tell (e.g. from LeakSanitizer), this fixes all the leaks
except for those due to buffers created internally to the codegenned
code itself (up next I'll add the buffer deallocation pass to fix
those).
The main change is that instead of attempting to pass `refbackrt::Tensor`
to the codegenned function directly, we make all the ABI types be
UnrankedMemRef which gets passed awkwardly (but workably) as a
`{size_t rank, void *ptrToDescriptor}` on the ABI. The reason why
refbackrt::Tensor wasn't workable is that is that MLIR doesn't really
have a way to deal with the lifetime of unranked memref descriptors that
happen inside the function, which is inevitably what would happen in the
old code that would emit runtime calls to
`refbackrt.to_memref/refbackrt.from_memref` to convert back and forth to
`refbackrt::Tensor` inside the codegenned code.
So, instead of the `refbackrt.to_memref/refbackrt.from_memref` with no
real sound basis for valid lifetime management, we now have a lovely
piece of code in `refbackrt::invoke` in `Runtime.cpp` that just barely
seems to be sound. We rely on the codegenned code having these
properties, which it seems to have:
- it won't free memref descriptors or their backing buffer for arguments
of UnrankedMemRef type.
- it will allocate a separate memref descriptor for each result
UnrankedMemRef (which is ensured by having a separate memref_cast for
each)
- we can sniff the `allocatedPtr`'s (i.e. the backing buffer pointers)
to avoid double-freeing in the case of aliasing of the backing buffer
(including backing buffers for arguments feeding into results)
- to catch the case of statically allocated data (which we need to avoid
passing to `free`) , check if the `allocatedPtr` is (no joke) equal to
`0xDEADBEEF`, because there is otherwise no way to distinguish
statically allocated from malloc'ed data... (std.global_memref lowering
to LLVM by happenstance sets the allocatedPtr equal to `0xDEADBEEF`,
presumably mainly as a debugging thing)
Even with all this, we *still* need to (internally to refbackrt::invoke)
make copies of all inputs/outputs! And the details of how the LLVM-level
ABI gets laid out for e.g. function arguments/returns is still super
tricky.
This really highlights how deficient memref is as the general runtime
type for our use case. It's stewing in my mind how best to improve the
situation. My general gut feeling is that IREE's abstractions for this
are "right", but I need to think more how to distill those aspects of
IREE's design in a "reference" way for RefBackend.
Some implementation notes:
- In terms of how this is implemented, this did catch a bug in our ABI
wrapper functions in LowerToLLVM.cpp, which I had to fix (it happened to
work before through some combination of npcomprt::Tensor being passed as
a single pointer + probably me infinite-monkey-ing it until it worked)
- This actually removes 2 out of the 3 compiler runtime functions (the
only one left is "abort_if". (most of the memref descriptor code moved
from CopmilerRuntime.cpp to Runtime.cpp)
- this also means deleting `refbackrt.from_memref` and
`refbackrt.to_memref`
2020-11-25 09:18:57 +08:00
|
|
|
// At ABI boundaries, convert all memrefs to unranked memrefs so that they have
|
|
|
|
// a fixed ABI.
|
2020-09-17 08:31:40 +08:00
|
|
|
class FuncOpSignatureConversion : public OpConversionPattern<FuncOp> {
|
2020-07-11 08:31:24 +08:00
|
|
|
public:
|
|
|
|
using OpConversionPattern::OpConversionPattern;
|
|
|
|
LogicalResult
|
2020-09-17 08:31:40 +08:00
|
|
|
matchAndRewrite(FuncOp op, ArrayRef<Value> operands,
|
2020-07-11 08:31:24 +08:00
|
|
|
ConversionPatternRewriter &rewriter) const override {
|
2020-09-17 08:31:40 +08:00
|
|
|
FunctionType type = op.getType();
|
|
|
|
|
|
|
|
TypeConverter::SignatureConversion entryConversion(type.getNumInputs());
|
|
|
|
if (failed(typeConverter->convertSignatureArgs(type.getInputs(),
|
|
|
|
entryConversion)))
|
|
|
|
return rewriter.notifyMatchFailure(op, "could not convert inputs");
|
|
|
|
SmallVector<Type, 1> newResultTypes;
|
|
|
|
if (failed(typeConverter->convertTypes(type.getResults(), newResultTypes)))
|
|
|
|
return rewriter.notifyMatchFailure(op, "could not convert outputs");
|
|
|
|
|
|
|
|
rewriter.updateRootInPlace(op, [&] {
|
|
|
|
// Update the function type.
|
2021-01-06 08:12:11 +08:00
|
|
|
op.setType(FunctionType::get(op.getContext(),
|
|
|
|
entryConversion.getConvertedTypes(),
|
|
|
|
newResultTypes));
|
2020-09-17 08:31:40 +08:00
|
|
|
// Rewrite the entry block.
|
|
|
|
Block &oldEntry = op.getBody().front();
|
|
|
|
Block &newEntry =
|
|
|
|
*rewriter.applySignatureConversion(&op.getBody(), entryConversion);
|
|
|
|
OpBuilder::InsertionGuard guard(rewriter);
|
|
|
|
rewriter.setInsertionPointToStart(&newEntry);
|
|
|
|
BlockArgument newArg, oldArg;
|
|
|
|
for (auto newAndOldArg :
|
|
|
|
llvm::zip(newEntry.getArguments(), oldEntry.getArguments())) {
|
|
|
|
std::tie(newArg, oldArg) = newAndOldArg;
|
[RefBackend] Fix leaks related to ABI boundaries.
Best as I can tell (e.g. from LeakSanitizer), this fixes all the leaks
except for those due to buffers created internally to the codegenned
code itself (up next I'll add the buffer deallocation pass to fix
those).
The main change is that instead of attempting to pass `refbackrt::Tensor`
to the codegenned function directly, we make all the ABI types be
UnrankedMemRef which gets passed awkwardly (but workably) as a
`{size_t rank, void *ptrToDescriptor}` on the ABI. The reason why
refbackrt::Tensor wasn't workable is that is that MLIR doesn't really
have a way to deal with the lifetime of unranked memref descriptors that
happen inside the function, which is inevitably what would happen in the
old code that would emit runtime calls to
`refbackrt.to_memref/refbackrt.from_memref` to convert back and forth to
`refbackrt::Tensor` inside the codegenned code.
So, instead of the `refbackrt.to_memref/refbackrt.from_memref` with no
real sound basis for valid lifetime management, we now have a lovely
piece of code in `refbackrt::invoke` in `Runtime.cpp` that just barely
seems to be sound. We rely on the codegenned code having these
properties, which it seems to have:
- it won't free memref descriptors or their backing buffer for arguments
of UnrankedMemRef type.
- it will allocate a separate memref descriptor for each result
UnrankedMemRef (which is ensured by having a separate memref_cast for
each)
- we can sniff the `allocatedPtr`'s (i.e. the backing buffer pointers)
to avoid double-freeing in the case of aliasing of the backing buffer
(including backing buffers for arguments feeding into results)
- to catch the case of statically allocated data (which we need to avoid
passing to `free`) , check if the `allocatedPtr` is (no joke) equal to
`0xDEADBEEF`, because there is otherwise no way to distinguish
statically allocated from malloc'ed data... (std.global_memref lowering
to LLVM by happenstance sets the allocatedPtr equal to `0xDEADBEEF`,
presumably mainly as a debugging thing)
Even with all this, we *still* need to (internally to refbackrt::invoke)
make copies of all inputs/outputs! And the details of how the LLVM-level
ABI gets laid out for e.g. function arguments/returns is still super
tricky.
This really highlights how deficient memref is as the general runtime
type for our use case. It's stewing in my mind how best to improve the
situation. My general gut feeling is that IREE's abstractions for this
are "right", but I need to think more how to distill those aspects of
IREE's design in a "reference" way for RefBackend.
Some implementation notes:
- In terms of how this is implemented, this did catch a bug in our ABI
wrapper functions in LowerToLLVM.cpp, which I had to fix (it happened to
work before through some combination of npcomprt::Tensor being passed as
a single pointer + probably me infinite-monkey-ing it until it worked)
- This actually removes 2 out of the 3 compiler runtime functions (the
only one left is "abort_if". (most of the memref descriptor code moved
from CopmilerRuntime.cpp to Runtime.cpp)
- this also means deleting `refbackrt.from_memref` and
`refbackrt.to_memref`
2020-11-25 09:18:57 +08:00
|
|
|
auto memref = rewriter.create<MemRefCastOp>(op.getLoc(), newArg,
|
2020-09-17 08:31:40 +08:00
|
|
|
oldArg.getType());
|
|
|
|
rewriter.replaceUsesOfBlockArgument(oldArg, memref);
|
|
|
|
}
|
|
|
|
});
|
2020-07-11 08:31:24 +08:00
|
|
|
return success();
|
|
|
|
}
|
|
|
|
};
|
|
|
|
} // namespace
|
|
|
|
|
|
|
|
namespace {
|
[RefBackend] Fix leaks related to ABI boundaries.
Best as I can tell (e.g. from LeakSanitizer), this fixes all the leaks
except for those due to buffers created internally to the codegenned
code itself (up next I'll add the buffer deallocation pass to fix
those).
The main change is that instead of attempting to pass `refbackrt::Tensor`
to the codegenned function directly, we make all the ABI types be
UnrankedMemRef which gets passed awkwardly (but workably) as a
`{size_t rank, void *ptrToDescriptor}` on the ABI. The reason why
refbackrt::Tensor wasn't workable is that is that MLIR doesn't really
have a way to deal with the lifetime of unranked memref descriptors that
happen inside the function, which is inevitably what would happen in the
old code that would emit runtime calls to
`refbackrt.to_memref/refbackrt.from_memref` to convert back and forth to
`refbackrt::Tensor` inside the codegenned code.
So, instead of the `refbackrt.to_memref/refbackrt.from_memref` with no
real sound basis for valid lifetime management, we now have a lovely
piece of code in `refbackrt::invoke` in `Runtime.cpp` that just barely
seems to be sound. We rely on the codegenned code having these
properties, which it seems to have:
- it won't free memref descriptors or their backing buffer for arguments
of UnrankedMemRef type.
- it will allocate a separate memref descriptor for each result
UnrankedMemRef (which is ensured by having a separate memref_cast for
each)
- we can sniff the `allocatedPtr`'s (i.e. the backing buffer pointers)
to avoid double-freeing in the case of aliasing of the backing buffer
(including backing buffers for arguments feeding into results)
- to catch the case of statically allocated data (which we need to avoid
passing to `free`) , check if the `allocatedPtr` is (no joke) equal to
`0xDEADBEEF`, because there is otherwise no way to distinguish
statically allocated from malloc'ed data... (std.global_memref lowering
to LLVM by happenstance sets the allocatedPtr equal to `0xDEADBEEF`,
presumably mainly as a debugging thing)
Even with all this, we *still* need to (internally to refbackrt::invoke)
make copies of all inputs/outputs! And the details of how the LLVM-level
ABI gets laid out for e.g. function arguments/returns is still super
tricky.
This really highlights how deficient memref is as the general runtime
type for our use case. It's stewing in my mind how best to improve the
situation. My general gut feeling is that IREE's abstractions for this
are "right", but I need to think more how to distill those aspects of
IREE's design in a "reference" way for RefBackend.
Some implementation notes:
- In terms of how this is implemented, this did catch a bug in our ABI
wrapper functions in LowerToLLVM.cpp, which I had to fix (it happened to
work before through some combination of npcomprt::Tensor being passed as
a single pointer + probably me infinite-monkey-ing it until it worked)
- This actually removes 2 out of the 3 compiler runtime functions (the
only one left is "abort_if". (most of the memref descriptor code moved
from CopmilerRuntime.cpp to Runtime.cpp)
- this also means deleting `refbackrt.from_memref` and
`refbackrt.to_memref`
2020-11-25 09:18:57 +08:00
|
|
|
// At the return ABI boundaries, convert to the ABI type.
|
2020-09-17 08:31:40 +08:00
|
|
|
// This pattern is needed to trigger the type conversion mechanics to do a
|
|
|
|
// target materialization.
|
|
|
|
class RewriteReturnOp : public OpConversionPattern<ReturnOp> {
|
2020-07-11 08:31:24 +08:00
|
|
|
public:
|
|
|
|
using OpConversionPattern::OpConversionPattern;
|
|
|
|
LogicalResult
|
2020-09-17 08:31:40 +08:00
|
|
|
matchAndRewrite(ReturnOp op, ArrayRef<Value> operands,
|
2020-07-11 08:31:24 +08:00
|
|
|
ConversionPatternRewriter &rewriter) const override {
|
2020-09-17 08:31:40 +08:00
|
|
|
rewriter.replaceOpWithNewOp<ReturnOp>(op, operands);
|
2020-07-11 08:31:24 +08:00
|
|
|
return success();
|
|
|
|
}
|
|
|
|
};
|
|
|
|
} // namespace
|
|
|
|
|
Rework e2e flow to use new "npcomprt"
This ~totally reworks the existing "runtime" stuff to be more
principled and usable, such as from Python. It's still not fully
production-quality, mainly in the department of memory management (e.g.
it currently leaks memory; we need to figure out "who frees memrefs" +
the analysis and transformation needed to do that (maybe use upstream
buffer allocation pass?)).
The user API is in include/npcomp/runtime/UserAPI.h, though
include/npcomp/JITRuntime/JITModule.h is a friendlier wrapper.
The stuff under {include,lib}/runtime is totally firewalled from the
compiler and tiny (<6kB, though no attention has gone into optimizing
that size). For example, we don't link in libSupport into the runtime,
instead having our own bare bones replacements for basics like ArrayRef
(the JITRuntime helps with bridging that gap, since it *can* depend on
all common LLVM utilities).
The overall features of npcomprt is that it exposes a module that
with multiple function entry points. Each function has arguments and
results that are tensor-valued, and npcomprt::Tensor is the runtime type
that is used to interact with that (and a npcomprt::Ref<T>
reference-counting wrapper is provided to wrap npcomprt::Tensor in the
common case).
From an implementation perspective, an npcomprt module at the
LLVM/object/binary level exposes a single module descriptor struct that
has pointers to other metadata (currently just a list of function
metadata descriptors). All interactions with the npcomp runtime are
keyed off of that module descriptor, including function lookups and
dispatching. This is done to dodge platform ABI issues and also allow
enough reflection to e.g. verify provided arguments.
Most of the compiler-side work here was in LowerToNpcomprtABI and
LowerToLLVM.
Also,
- Rename npcomp_rt/NpcompRt to npcomprt/Npcomprt; it was getting
annoying to type the underscores/caps.
- misc improvements to bash_helpers.sh
2020-07-09 08:15:40 +08:00
|
|
|
static LogicalResult doDialectConversion(ModuleOp module) {
|
|
|
|
auto *context = module.getContext();
|
|
|
|
|
2020-09-17 08:31:40 +08:00
|
|
|
TypeConverter typeConverter;
|
|
|
|
typeConverter.addConversion([](Type type) { return type; });
|
[RefBackend] Fix leaks related to ABI boundaries.
Best as I can tell (e.g. from LeakSanitizer), this fixes all the leaks
except for those due to buffers created internally to the codegenned
code itself (up next I'll add the buffer deallocation pass to fix
those).
The main change is that instead of attempting to pass `refbackrt::Tensor`
to the codegenned function directly, we make all the ABI types be
UnrankedMemRef which gets passed awkwardly (but workably) as a
`{size_t rank, void *ptrToDescriptor}` on the ABI. The reason why
refbackrt::Tensor wasn't workable is that is that MLIR doesn't really
have a way to deal with the lifetime of unranked memref descriptors that
happen inside the function, which is inevitably what would happen in the
old code that would emit runtime calls to
`refbackrt.to_memref/refbackrt.from_memref` to convert back and forth to
`refbackrt::Tensor` inside the codegenned code.
So, instead of the `refbackrt.to_memref/refbackrt.from_memref` with no
real sound basis for valid lifetime management, we now have a lovely
piece of code in `refbackrt::invoke` in `Runtime.cpp` that just barely
seems to be sound. We rely on the codegenned code having these
properties, which it seems to have:
- it won't free memref descriptors or their backing buffer for arguments
of UnrankedMemRef type.
- it will allocate a separate memref descriptor for each result
UnrankedMemRef (which is ensured by having a separate memref_cast for
each)
- we can sniff the `allocatedPtr`'s (i.e. the backing buffer pointers)
to avoid double-freeing in the case of aliasing of the backing buffer
(including backing buffers for arguments feeding into results)
- to catch the case of statically allocated data (which we need to avoid
passing to `free`) , check if the `allocatedPtr` is (no joke) equal to
`0xDEADBEEF`, because there is otherwise no way to distinguish
statically allocated from malloc'ed data... (std.global_memref lowering
to LLVM by happenstance sets the allocatedPtr equal to `0xDEADBEEF`,
presumably mainly as a debugging thing)
Even with all this, we *still* need to (internally to refbackrt::invoke)
make copies of all inputs/outputs! And the details of how the LLVM-level
ABI gets laid out for e.g. function arguments/returns is still super
tricky.
This really highlights how deficient memref is as the general runtime
type for our use case. It's stewing in my mind how best to improve the
situation. My general gut feeling is that IREE's abstractions for this
are "right", but I need to think more how to distill those aspects of
IREE's design in a "reference" way for RefBackend.
Some implementation notes:
- In terms of how this is implemented, this did catch a bug in our ABI
wrapper functions in LowerToLLVM.cpp, which I had to fix (it happened to
work before through some combination of npcomprt::Tensor being passed as
a single pointer + probably me infinite-monkey-ing it until it worked)
- This actually removes 2 out of the 3 compiler runtime functions (the
only one left is "abort_if". (most of the memref descriptor code moved
from CopmilerRuntime.cpp to Runtime.cpp)
- this also means deleting `refbackrt.from_memref` and
`refbackrt.to_memref`
2020-11-25 09:18:57 +08:00
|
|
|
typeConverter.addConversion(
|
|
|
|
[](MemRefType type) { return getABIMemrefType(type); });
|
2020-09-17 08:31:40 +08:00
|
|
|
typeConverter.addTargetMaterialization(
|
[RefBackend] Fix leaks related to ABI boundaries.
Best as I can tell (e.g. from LeakSanitizer), this fixes all the leaks
except for those due to buffers created internally to the codegenned
code itself (up next I'll add the buffer deallocation pass to fix
those).
The main change is that instead of attempting to pass `refbackrt::Tensor`
to the codegenned function directly, we make all the ABI types be
UnrankedMemRef which gets passed awkwardly (but workably) as a
`{size_t rank, void *ptrToDescriptor}` on the ABI. The reason why
refbackrt::Tensor wasn't workable is that is that MLIR doesn't really
have a way to deal with the lifetime of unranked memref descriptors that
happen inside the function, which is inevitably what would happen in the
old code that would emit runtime calls to
`refbackrt.to_memref/refbackrt.from_memref` to convert back and forth to
`refbackrt::Tensor` inside the codegenned code.
So, instead of the `refbackrt.to_memref/refbackrt.from_memref` with no
real sound basis for valid lifetime management, we now have a lovely
piece of code in `refbackrt::invoke` in `Runtime.cpp` that just barely
seems to be sound. We rely on the codegenned code having these
properties, which it seems to have:
- it won't free memref descriptors or their backing buffer for arguments
of UnrankedMemRef type.
- it will allocate a separate memref descriptor for each result
UnrankedMemRef (which is ensured by having a separate memref_cast for
each)
- we can sniff the `allocatedPtr`'s (i.e. the backing buffer pointers)
to avoid double-freeing in the case of aliasing of the backing buffer
(including backing buffers for arguments feeding into results)
- to catch the case of statically allocated data (which we need to avoid
passing to `free`) , check if the `allocatedPtr` is (no joke) equal to
`0xDEADBEEF`, because there is otherwise no way to distinguish
statically allocated from malloc'ed data... (std.global_memref lowering
to LLVM by happenstance sets the allocatedPtr equal to `0xDEADBEEF`,
presumably mainly as a debugging thing)
Even with all this, we *still* need to (internally to refbackrt::invoke)
make copies of all inputs/outputs! And the details of how the LLVM-level
ABI gets laid out for e.g. function arguments/returns is still super
tricky.
This really highlights how deficient memref is as the general runtime
type for our use case. It's stewing in my mind how best to improve the
situation. My general gut feeling is that IREE's abstractions for this
are "right", but I need to think more how to distill those aspects of
IREE's design in a "reference" way for RefBackend.
Some implementation notes:
- In terms of how this is implemented, this did catch a bug in our ABI
wrapper functions in LowerToLLVM.cpp, which I had to fix (it happened to
work before through some combination of npcomprt::Tensor being passed as
a single pointer + probably me infinite-monkey-ing it until it worked)
- This actually removes 2 out of the 3 compiler runtime functions (the
only one left is "abort_if". (most of the memref descriptor code moved
from CopmilerRuntime.cpp to Runtime.cpp)
- this also means deleting `refbackrt.from_memref` and
`refbackrt.to_memref`
2020-11-25 09:18:57 +08:00
|
|
|
[](OpBuilder &builder, UnrankedMemRefType type, ValueRange inputs,
|
2020-09-17 08:31:40 +08:00
|
|
|
Location loc) -> Value {
|
|
|
|
assert(inputs.size() == 1);
|
[RefBackend] Fix leaks related to ABI boundaries.
Best as I can tell (e.g. from LeakSanitizer), this fixes all the leaks
except for those due to buffers created internally to the codegenned
code itself (up next I'll add the buffer deallocation pass to fix
those).
The main change is that instead of attempting to pass `refbackrt::Tensor`
to the codegenned function directly, we make all the ABI types be
UnrankedMemRef which gets passed awkwardly (but workably) as a
`{size_t rank, void *ptrToDescriptor}` on the ABI. The reason why
refbackrt::Tensor wasn't workable is that is that MLIR doesn't really
have a way to deal with the lifetime of unranked memref descriptors that
happen inside the function, which is inevitably what would happen in the
old code that would emit runtime calls to
`refbackrt.to_memref/refbackrt.from_memref` to convert back and forth to
`refbackrt::Tensor` inside the codegenned code.
So, instead of the `refbackrt.to_memref/refbackrt.from_memref` with no
real sound basis for valid lifetime management, we now have a lovely
piece of code in `refbackrt::invoke` in `Runtime.cpp` that just barely
seems to be sound. We rely on the codegenned code having these
properties, which it seems to have:
- it won't free memref descriptors or their backing buffer for arguments
of UnrankedMemRef type.
- it will allocate a separate memref descriptor for each result
UnrankedMemRef (which is ensured by having a separate memref_cast for
each)
- we can sniff the `allocatedPtr`'s (i.e. the backing buffer pointers)
to avoid double-freeing in the case of aliasing of the backing buffer
(including backing buffers for arguments feeding into results)
- to catch the case of statically allocated data (which we need to avoid
passing to `free`) , check if the `allocatedPtr` is (no joke) equal to
`0xDEADBEEF`, because there is otherwise no way to distinguish
statically allocated from malloc'ed data... (std.global_memref lowering
to LLVM by happenstance sets the allocatedPtr equal to `0xDEADBEEF`,
presumably mainly as a debugging thing)
Even with all this, we *still* need to (internally to refbackrt::invoke)
make copies of all inputs/outputs! And the details of how the LLVM-level
ABI gets laid out for e.g. function arguments/returns is still super
tricky.
This really highlights how deficient memref is as the general runtime
type for our use case. It's stewing in my mind how best to improve the
situation. My general gut feeling is that IREE's abstractions for this
are "right", but I need to think more how to distill those aspects of
IREE's design in a "reference" way for RefBackend.
Some implementation notes:
- In terms of how this is implemented, this did catch a bug in our ABI
wrapper functions in LowerToLLVM.cpp, which I had to fix (it happened to
work before through some combination of npcomprt::Tensor being passed as
a single pointer + probably me infinite-monkey-ing it until it worked)
- This actually removes 2 out of the 3 compiler runtime functions (the
only one left is "abort_if". (most of the memref descriptor code moved
from CopmilerRuntime.cpp to Runtime.cpp)
- this also means deleting `refbackrt.from_memref` and
`refbackrt.to_memref`
2020-11-25 09:18:57 +08:00
|
|
|
return builder.create<MemRefCastOp>(
|
2020-09-17 08:31:40 +08:00
|
|
|
loc, inputs[0], getABIMemrefType(inputs[0].getType()));
|
|
|
|
});
|
Rework e2e flow to use new "npcomprt"
This ~totally reworks the existing "runtime" stuff to be more
principled and usable, such as from Python. It's still not fully
production-quality, mainly in the department of memory management (e.g.
it currently leaks memory; we need to figure out "who frees memrefs" +
the analysis and transformation needed to do that (maybe use upstream
buffer allocation pass?)).
The user API is in include/npcomp/runtime/UserAPI.h, though
include/npcomp/JITRuntime/JITModule.h is a friendlier wrapper.
The stuff under {include,lib}/runtime is totally firewalled from the
compiler and tiny (<6kB, though no attention has gone into optimizing
that size). For example, we don't link in libSupport into the runtime,
instead having our own bare bones replacements for basics like ArrayRef
(the JITRuntime helps with bridging that gap, since it *can* depend on
all common LLVM utilities).
The overall features of npcomprt is that it exposes a module that
with multiple function entry points. Each function has arguments and
results that are tensor-valued, and npcomprt::Tensor is the runtime type
that is used to interact with that (and a npcomprt::Ref<T>
reference-counting wrapper is provided to wrap npcomprt::Tensor in the
common case).
From an implementation perspective, an npcomprt module at the
LLVM/object/binary level exposes a single module descriptor struct that
has pointers to other metadata (currently just a list of function
metadata descriptors). All interactions with the npcomp runtime are
keyed off of that module descriptor, including function lookups and
dispatching. This is done to dodge platform ABI issues and also allow
enough reflection to e.g. verify provided arguments.
Most of the compiler-side work here was in LowerToNpcomprtABI and
LowerToLLVM.
Also,
- Rename npcomp_rt/NpcompRt to npcomprt/Npcomprt; it was getting
annoying to type the underscores/caps.
- misc improvements to bash_helpers.sh
2020-07-09 08:15:40 +08:00
|
|
|
|
|
|
|
OwningRewritePatternList patterns;
|
|
|
|
ConversionTarget target(*context);
|
2020-10-08 08:12:52 +08:00
|
|
|
target.addLegalDialect<refbackrt::RefbackrtDialect>();
|
2020-09-17 08:31:40 +08:00
|
|
|
target.addLegalDialect<StandardOpsDialect>();
|
Rework e2e flow to use new "npcomprt"
This ~totally reworks the existing "runtime" stuff to be more
principled and usable, such as from Python. It's still not fully
production-quality, mainly in the department of memory management (e.g.
it currently leaks memory; we need to figure out "who frees memrefs" +
the analysis and transformation needed to do that (maybe use upstream
buffer allocation pass?)).
The user API is in include/npcomp/runtime/UserAPI.h, though
include/npcomp/JITRuntime/JITModule.h is a friendlier wrapper.
The stuff under {include,lib}/runtime is totally firewalled from the
compiler and tiny (<6kB, though no attention has gone into optimizing
that size). For example, we don't link in libSupport into the runtime,
instead having our own bare bones replacements for basics like ArrayRef
(the JITRuntime helps with bridging that gap, since it *can* depend on
all common LLVM utilities).
The overall features of npcomprt is that it exposes a module that
with multiple function entry points. Each function has arguments and
results that are tensor-valued, and npcomprt::Tensor is the runtime type
that is used to interact with that (and a npcomprt::Ref<T>
reference-counting wrapper is provided to wrap npcomprt::Tensor in the
common case).
From an implementation perspective, an npcomprt module at the
LLVM/object/binary level exposes a single module descriptor struct that
has pointers to other metadata (currently just a list of function
metadata descriptors). All interactions with the npcomp runtime are
keyed off of that module descriptor, including function lookups and
dispatching. This is done to dodge platform ABI issues and also allow
enough reflection to e.g. verify provided arguments.
Most of the compiler-side work here was in LowerToNpcomprtABI and
LowerToLLVM.
Also,
- Rename npcomp_rt/NpcompRt to npcomprt/Npcomprt; it was getting
annoying to type the underscores/caps.
- misc improvements to bash_helpers.sh
2020-07-09 08:15:40 +08:00
|
|
|
|
2020-09-17 08:31:40 +08:00
|
|
|
patterns.insert<FuncOpSignatureConversion>(typeConverter, context);
|
|
|
|
target.addDynamicallyLegalOp<FuncOp>(
|
|
|
|
[&](FuncOp op) { return typeConverter.isSignatureLegal(op.getType()); });
|
|
|
|
patterns.insert<RewriteReturnOp>(typeConverter, context);
|
|
|
|
target.addDynamicallyLegalOp<ReturnOp>(
|
|
|
|
[&](ReturnOp op) { return typeConverter.isLegal(op); });
|
Rework e2e flow to use new "npcomprt"
This ~totally reworks the existing "runtime" stuff to be more
principled and usable, such as from Python. It's still not fully
production-quality, mainly in the department of memory management (e.g.
it currently leaks memory; we need to figure out "who frees memrefs" +
the analysis and transformation needed to do that (maybe use upstream
buffer allocation pass?)).
The user API is in include/npcomp/runtime/UserAPI.h, though
include/npcomp/JITRuntime/JITModule.h is a friendlier wrapper.
The stuff under {include,lib}/runtime is totally firewalled from the
compiler and tiny (<6kB, though no attention has gone into optimizing
that size). For example, we don't link in libSupport into the runtime,
instead having our own bare bones replacements for basics like ArrayRef
(the JITRuntime helps with bridging that gap, since it *can* depend on
all common LLVM utilities).
The overall features of npcomprt is that it exposes a module that
with multiple function entry points. Each function has arguments and
results that are tensor-valued, and npcomprt::Tensor is the runtime type
that is used to interact with that (and a npcomprt::Ref<T>
reference-counting wrapper is provided to wrap npcomprt::Tensor in the
common case).
From an implementation perspective, an npcomprt module at the
LLVM/object/binary level exposes a single module descriptor struct that
has pointers to other metadata (currently just a list of function
metadata descriptors). All interactions with the npcomp runtime are
keyed off of that module descriptor, including function lookups and
dispatching. This is done to dodge platform ABI issues and also allow
enough reflection to e.g. verify provided arguments.
Most of the compiler-side work here was in LowerToNpcomprtABI and
LowerToLLVM.
Also,
- Rename npcomp_rt/NpcompRt to npcomprt/Npcomprt; it was getting
annoying to type the underscores/caps.
- misc improvements to bash_helpers.sh
2020-07-09 08:15:40 +08:00
|
|
|
|
2020-09-17 08:31:40 +08:00
|
|
|
patterns.insert<LowerAssertOp>(context);
|
|
|
|
target.addIllegalOp<AssertOp>();
|
2020-07-11 08:31:24 +08:00
|
|
|
|
2020-10-30 06:25:55 +08:00
|
|
|
return applyPartialConversion(module, target, std::move(patterns));
|
Rework e2e flow to use new "npcomprt"
This ~totally reworks the existing "runtime" stuff to be more
principled and usable, such as from Python. It's still not fully
production-quality, mainly in the department of memory management (e.g.
it currently leaks memory; we need to figure out "who frees memrefs" +
the analysis and transformation needed to do that (maybe use upstream
buffer allocation pass?)).
The user API is in include/npcomp/runtime/UserAPI.h, though
include/npcomp/JITRuntime/JITModule.h is a friendlier wrapper.
The stuff under {include,lib}/runtime is totally firewalled from the
compiler and tiny (<6kB, though no attention has gone into optimizing
that size). For example, we don't link in libSupport into the runtime,
instead having our own bare bones replacements for basics like ArrayRef
(the JITRuntime helps with bridging that gap, since it *can* depend on
all common LLVM utilities).
The overall features of npcomprt is that it exposes a module that
with multiple function entry points. Each function has arguments and
results that are tensor-valued, and npcomprt::Tensor is the runtime type
that is used to interact with that (and a npcomprt::Ref<T>
reference-counting wrapper is provided to wrap npcomprt::Tensor in the
common case).
From an implementation perspective, an npcomprt module at the
LLVM/object/binary level exposes a single module descriptor struct that
has pointers to other metadata (currently just a list of function
metadata descriptors). All interactions with the npcomp runtime are
keyed off of that module descriptor, including function lookups and
dispatching. This is done to dodge platform ABI issues and also allow
enough reflection to e.g. verify provided arguments.
Most of the compiler-side work here was in LowerToNpcomprtABI and
LowerToLLVM.
Also,
- Rename npcomp_rt/NpcompRt to npcomprt/Npcomprt; it was getting
annoying to type the underscores/caps.
- misc improvements to bash_helpers.sh
2020-07-09 08:15:40 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
namespace {
|
|
|
|
// This pass lowers the public ABI of the module to the primitives exposed by
|
2020-10-08 08:12:52 +08:00
|
|
|
// the refbackrt dialect.
|
|
|
|
class LowerToRefbackrtABI
|
|
|
|
: public LowerToRefbackrtABIBase<LowerToRefbackrtABI> {
|
2020-09-10 15:23:46 +08:00
|
|
|
void getDependentDialects(DialectRegistry ®istry) const override {
|
2020-10-08 08:12:52 +08:00
|
|
|
registry.insert<refbackrt::RefbackrtDialect>();
|
2020-09-10 15:23:46 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
void runOnOperation() override {
|
Rework e2e flow to use new "npcomprt"
This ~totally reworks the existing "runtime" stuff to be more
principled and usable, such as from Python. It's still not fully
production-quality, mainly in the department of memory management (e.g.
it currently leaks memory; we need to figure out "who frees memrefs" +
the analysis and transformation needed to do that (maybe use upstream
buffer allocation pass?)).
The user API is in include/npcomp/runtime/UserAPI.h, though
include/npcomp/JITRuntime/JITModule.h is a friendlier wrapper.
The stuff under {include,lib}/runtime is totally firewalled from the
compiler and tiny (<6kB, though no attention has gone into optimizing
that size). For example, we don't link in libSupport into the runtime,
instead having our own bare bones replacements for basics like ArrayRef
(the JITRuntime helps with bridging that gap, since it *can* depend on
all common LLVM utilities).
The overall features of npcomprt is that it exposes a module that
with multiple function entry points. Each function has arguments and
results that are tensor-valued, and npcomprt::Tensor is the runtime type
that is used to interact with that (and a npcomprt::Ref<T>
reference-counting wrapper is provided to wrap npcomprt::Tensor in the
common case).
From an implementation perspective, an npcomprt module at the
LLVM/object/binary level exposes a single module descriptor struct that
has pointers to other metadata (currently just a list of function
metadata descriptors). All interactions with the npcomp runtime are
keyed off of that module descriptor, including function lookups and
dispatching. This is done to dodge platform ABI issues and also allow
enough reflection to e.g. verify provided arguments.
Most of the compiler-side work here was in LowerToNpcomprtABI and
LowerToLLVM.
Also,
- Rename npcomp_rt/NpcompRt to npcomprt/Npcomprt; it was getting
annoying to type the underscores/caps.
- misc improvements to bash_helpers.sh
2020-07-09 08:15:40 +08:00
|
|
|
ModuleOp module = getOperation();
|
|
|
|
|
|
|
|
// Before we lower anything, capture any needed metadata about the argument
|
|
|
|
// lists that will be needed for safely invoking the raw runtime functions
|
|
|
|
// later. (for example, number of expected arguments/results, types,
|
|
|
|
// etc.)
|
|
|
|
if (failed(createModuleMetadata(module)))
|
|
|
|
return signalPassFailure();
|
|
|
|
|
|
|
|
// Now do the actual conversion / lowering.
|
|
|
|
if (failed(doDialectConversion(module)))
|
|
|
|
return signalPassFailure();
|
|
|
|
}
|
|
|
|
};
|
|
|
|
} // namespace
|
|
|
|
|
|
|
|
std::unique_ptr<OperationPass<ModuleOp>>
|
2020-10-08 08:12:52 +08:00
|
|
|
mlir::NPCOMP::createLowerToRefbackrtABIPass() {
|
|
|
|
return std::make_unique<LowerToRefbackrtABI>();
|
Rework e2e flow to use new "npcomprt"
This ~totally reworks the existing "runtime" stuff to be more
principled and usable, such as from Python. It's still not fully
production-quality, mainly in the department of memory management (e.g.
it currently leaks memory; we need to figure out "who frees memrefs" +
the analysis and transformation needed to do that (maybe use upstream
buffer allocation pass?)).
The user API is in include/npcomp/runtime/UserAPI.h, though
include/npcomp/JITRuntime/JITModule.h is a friendlier wrapper.
The stuff under {include,lib}/runtime is totally firewalled from the
compiler and tiny (<6kB, though no attention has gone into optimizing
that size). For example, we don't link in libSupport into the runtime,
instead having our own bare bones replacements for basics like ArrayRef
(the JITRuntime helps with bridging that gap, since it *can* depend on
all common LLVM utilities).
The overall features of npcomprt is that it exposes a module that
with multiple function entry points. Each function has arguments and
results that are tensor-valued, and npcomprt::Tensor is the runtime type
that is used to interact with that (and a npcomprt::Ref<T>
reference-counting wrapper is provided to wrap npcomprt::Tensor in the
common case).
From an implementation perspective, an npcomprt module at the
LLVM/object/binary level exposes a single module descriptor struct that
has pointers to other metadata (currently just a list of function
metadata descriptors). All interactions with the npcomp runtime are
keyed off of that module descriptor, including function lookups and
dispatching. This is done to dodge platform ABI issues and also allow
enough reflection to e.g. verify provided arguments.
Most of the compiler-side work here was in LowerToNpcomprtABI and
LowerToLLVM.
Also,
- Rename npcomp_rt/NpcompRt to npcomprt/Npcomprt; it was getting
annoying to type the underscores/caps.
- misc improvements to bash_helpers.sh
2020-07-09 08:15:40 +08:00
|
|
|
}
|