torch-mlir

Commit Graph

Author	SHA1	Message	Date
zjgarvey	75d1d72059	Generalize Operand Quantization in FuseQuantizeOps (#3327 ) This change enables more customization with operand quantization, and generalizes the patterns QuantizeOperands and QuantizeTransposeOperands to QuantizeOperandsPastCommutingOps. This allows for passing quantization through operations which are functionally unaffected by quantization, such as view-like ops. The purpose of this change is to address a myriad of quantization issues seen in quantized onnx models that have some reshape-like operations sandwiched in between a dequant and something like a matmul (whose other operand is immediately quantizable).	2024-05-12 20:49:59 -07:00
Rob Suderman	e3faef5224	[onnx] Convert `onnx.QLinearConv` to `torch` (#2851 ) Leaning on the QDQ functionality in torch we can support the QLinearConv operation by piggybacking through `torch.Convolution`. This includes some changes such as allowing the `onnx` rewriter to run recursively. Doing so allows `QLinearConv` to decopmose to `onnx.Convolution` which is then lowered to `torch`.	2024-02-05 16:09:41 -08:00
Rob Suderman	25a5a22cbd	[torch] Support `torch.convolution` quantized lowering to `linalg` (#2811 ) Linalg has quantized specific operations. We can lower to these operations when there is a known zeropoint and scale operations. This allows the `convolution` to occur with lower bitwidth's, improving the overall performance.	2024-01-30 13:46:47 -08:00
Rob Suderman	f6f890520b	[torch][quant] Quantized `torch.mm` for linalg with end-to-end test (#2750 ) This includes custom op matching for decomposed operations and fusing dequantization into dense operations. As a validation we compare to the dequant+mm torch implementation.	2024-01-24 14:02:50 -08:00

4 Commits (23b53050deb7eda150716917a0305ae1591f1b44)