I've upstreamed the necessary quantized linalg Op for 2d convolution
with the "channel-first" ordering used by torch
(https://github.com/llvm/llvm-project/pull/107740).
This patch changes the lowering of the quantized 2d case of
`aten.convolution` accordingly, which saves three transpositions per
convolution (input, weights, result) and therefore removes the need to
optimize them away in downstream passes.
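A minimal sketch of the resulting IR, with illustrative shapes and SSA
names (not taken from the actual tests); the quantized op consumes the
NCHW input and FCHW weights directly, so no transposes are materialized:

```mlir
// Quantized 2d convolution in torch's native layout: NCHW input,
// FCHW weights, i32 zero-points, i32 accumulator.
%conv = linalg.conv_2d_nchw_fchw_q
          {dilations = dense<1> : tensor<2xi64>, strides = dense<1> : tensor<2xi64>}
          ins(%input, %weights, %inputZp, %weightZp
              : tensor<1x3x32x32xi8>, tensor<8x3x3x3xi8>, i32, i32)
          outs(%acc : tensor<1x8x30x30xi32>) -> tensor<1x8x30x30xi32>
```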
This patch adds a test for 638ef14, which uses `linalg.broadcast`
instead of `linalg.generic` for the convolution bias.
Co-authored-by: Rongsheng Gao <gaorongsheng@huawei.com>
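As an illustration of that bias handling (shapes, element type, and SSA
names are assumed for the example), the bias is broadcast into the
convolution accumulator with a dedicated named op rather than a
`linalg.generic`:

```mlir
// Broadcast a [8] bias across the batch and spatial dimensions of an
// NCHW accumulator; `dimensions` lists the dims added by the broadcast.
%biased = linalg.broadcast
            ins(%bias : tensor<8xf32>)
            outs(%init : tensor<1x8x30x30xf32>)
            dimensions = [0, 2, 3]
```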
This patch makes three changes (sketched below):
1. Truncates zero-points to i32.
2. Changes the default accumulator type for i8 inputs from i64 to i32.
3. Uses the input dtype to infer the accumulator dtype.
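A minimal sketch of changes 1 and 2 at the IR level (SSA names and
static shapes are assumed for the example):

```mlir
// 1. Zero-points arrive as i64 scalars and are truncated to i32.
%izp = arith.trunci %inputZp64 : i64 to i32
%wzp = arith.trunci %weightZp64 : i64 to i32
// 2. i8 operands now accumulate into an i32 (previously i64) init tensor.
%c0 = arith.constant 0 : i32
%empty = tensor.empty() : tensor<1x8x30x30xi32>
%acc = linalg.fill ins(%c0 : i32) outs(%empty : tensor<1x8x30x30xi32>)
         -> tensor<1x8x30x30xi32>
```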
The `convertTensorToElementType` function expects its argument to have
a valid tensor type that is not `Torch::NoneType`. This PR checks that
the bias tensor is not of type `Torch::NoneType` before calling
`convertTensorToElementType` on it in the `matchAndRewrite` member
function of the `ConvertAtenConvolutionOp` class.
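For illustration, input along these lines exercises the new check
(shapes and SSA names are assumed); the bias operand is the
`torch.none` value, so the lowering must skip
`convertTensorToElementType` for it:

```mlir
// Convolution with no bias: the third operand is torch.none.
// %input, %weight and the stride/padding/dilation/output_padding lists
// are assumed to be defined above.
%none = torch.constant.none
%false = torch.constant.bool false
%int1 = torch.constant.int 1
%out = torch.aten.convolution %input, %weight, %none, %stride, %padding,
         %dilation, %false, %outpad, %int1
         : !torch.vtensor<[1,3,32,32],f32>, !torch.vtensor<[8,3,3,3],f32>,
           !torch.none, !torch.list<int>, !torch.list<int>, !torch.list<int>,
           !torch.bool, !torch.list<int>, !torch.int
         -> !torch.vtensor<[1,8,30,30],f32>
```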