This pass incorporates torch.type_bound info and also removes NoneType
returns (eventually it will rewrite tuple types too, but can't yet
because !basicpy.TupleType doesn't track element types).
Recommend looking at adjust-calling-conventions.mlir first to see what
it is doing, and holding your nose for the implementation of the pass.
I decided to implement this with the conversion framework, because it
gives us *some* goodies for type conversion -- mainly avoiding large
amounts of tricky RAUW dances. Unfortunately, the conversion framework
isn't a perfect fit for a couple reasons:
- the incorporation of torch.type_bound is a context-sensitive rewrite
(requires looking at the arg attr, not just the type).
- NoneType conversion is 1->0, which requires some special handling
- (not implemented yet) 1->N tuple type conversions require special
handling.
It's a little bit scary, but on balance doing it the other way would
have its own downsides.
We already had the `promoteTrailingOutTensor` flag, but weren't using
it. A inplaceVariantKernelName flag needed to be added.
This change is a little dissatisfying, as the conversions done by the
RecognizeKernelsPass are currently non-orthogonal. In particular,
`kDropResultAndAliasArg0` probably won't work as intended if mixed with
these (we probably need to promote kDropResultAndAliasArg0 to not be an
arg-level thing anyway, as we have done with promoteTrailingOutTensor).
This involved adding a new op `numpy.overwrite_array`.
```
numpy.overwrite_array %arg2 overwrites %arg0 : tensor<2x3xf32>, !numpy.ndarray<[2,3]:f32>
```
This models the destructive update behavior. Note that in the above op,
we cannot simply RAUW %arg0 with a suitably conveted %arg2 (for example,
%arg0 might have uses that are not dominated by %arg2, or might have an
alias relation with some other array in the program). In general, we
need a pass analogous to "SSA-formation" which knows how to see through
these to uncover an underlying tensor program.
Also, add tanh_out_e2e.py/div_inplace_e2e.py and fix some bitrot in
refjit.py which is my running example I'm trying to get working.