torch-mlir/tools/mnist-playground
Sean Silva dd1fa2607f Add hopefully short-lived mnist-playground utility.
This unblocks backend progress while the PyTorch frontend work is coming
online. Hopefully we can delete this soon.

See tools/mnist-playground/README.md for more context on what this tool
is for, next steps, and current status.
2020-10-05 13:59:06 -07:00
CMakeLists.txt
README.md
fc.mlir
mnist-playground.cpp

README.md

mnist-playground

This is intended to be a short-lived "playground" for doing various experiments, guided by a real model use case, for improving the npcomp reference backend.

It's expected that utilities developed here will graduate to a more general utility or that this utility will be obsoleted by Python-driven flows once those come online.

Goals:

  • Obtain a performance-grounded analysis of the TCF/TCP design + reference backend design, and improve the designs.

  • Make forward progress on TCF/TCP + reference backend while the PyTorch frontend is being brought up.

Rough sketch of how we intend to get there:

  1. Link against PyTorch, and write a simple routine to do inference on a simple fully connected (FC) MNIST model.

  2. Write a similar routine in TCF, extending TCF and the reference backend as needed for functional completeness. The PyTorch code serves as a numerical correctness reference.

  3. Run and profile the reference backend and obtain a set of action items for design improvements, for both performance and stability. The PyTorch code serves as a performance baseline.

  4. Implement important action items on a priority basis, and document remaining major design issues that don't make sense to address at this time, along with a justification for why the current design doesn't prevent us from eventually solving them. Iterate on the previous step and this one as needed.

  5. (Stretch) Add support for convolutional MNIST and/or training.
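
For orientation, the computation that steps 1 and 2 both need to express is a fully connected layer: a matrix multiply plus a bias. The following is a minimal sketch in plain C++, not code from mnist-playground.cpp. The name fcForward, the row-major layout, and the [out, in] weight shape (the convention PyTorch's linear layer uses) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: y = x * W^T + b for a single FC layer.
// x is [batch, in], w is [out, in], b is [out]; all row-major.
std::vector<float> fcForward(const std::vector<float>& x,
                             const std::vector<float>& w,
                             const std::vector<float>& b,
                             std::size_t batch, std::size_t in,
                             std::size_t out) {
  assert(x.size() == batch * in);
  assert(w.size() == out * in);
  assert(b.size() == out);
  std::vector<float> y(batch * out);
  for (std::size_t n = 0; n < batch; ++n) {
    for (std::size_t o = 0; o < out; ++o) {
      float acc = b[o];  // start from the bias
      for (std::size_t i = 0; i < in; ++i)
        acc += x[n * in + i] * w[o * in + i];
      y[n * out + o] = acc;
    }
  }
  return y;
}
```

A naive triple loop like this is only useful as a readable statement of what is being computed; it says nothing about how either PyTorch or the reference backend actually executes the layer.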

Current Status

Step 1. DONE

Step 2. MOSTLY DONE. Still need to improve the op set to make the FC MNIST more complete. In particular, reshape and softmax still need to be implemented.

Step 3. STARTING. Initial performance on 10x784x100 (10 FC features, batch 100) is about 66x slower than PyTorch (the sample run below measures 62.6x). No profiling done yet.

Example command line (the .mlir file and -invoke are similar to npcomp-run-mlir):

$ mnist-playground tools/mnist-playground/fc.mlir -invoke fc
PyTorch: numRuns: 16384 nsPerRun: 3.947563e+05
RefE2E: numRuns: 256 nsPerRun: 2.471073e+07
Ratio (RefE2E / PyTorch): 62.5974
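
The numbers in this output can be reproduced with a small timing loop. The sketch below is an assumption about how such a measurement might be structured, not the tool's actual implementation: the power-of-two numRuns values in the output suggest a loop that doubles the run count until a time budget is met, but that is a guess. BenchResult, benchmark, slowdown, minSeconds, and maxRuns are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstdint>

struct BenchResult {
  std::uint64_t numRuns;
  double nsPerRun;
};

// Hypothetical harness: double numRuns until one timed batch takes at
// least minSeconds (or maxRuns is reached), then report the average.
template <typename Fn>
BenchResult benchmark(Fn&& fn, double minSeconds = 1.0,
                      std::uint64_t maxRuns = 1ull << 30) {
  using Clock = std::chrono::steady_clock;
  assert(minSeconds >= 0.0 && maxRuns >= 1);
  std::uint64_t numRuns = 1;
  for (;;) {
    auto start = Clock::now();
    for (std::uint64_t i = 0; i < numRuns; ++i) fn();
    std::chrono::duration<double, std::nano> elapsed = Clock::now() - start;
    if (elapsed.count() >= minSeconds * 1e9 || numRuns >= maxRuns)
      return {numRuns, elapsed.count() / numRuns};
    numRuns *= 2;
  }
}

// Ratio as printed in the last output line: RefE2E time over PyTorch time.
double slowdown(double refNsPerRun, double baselineNsPerRun) {
  return refNsPerRun / baselineNsPerRun;
}
```

Plugging in the two nsPerRun values above, slowdown(2.471073e+07, 3.947563e+05) gives roughly 62.597, which matches the reported ratio.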

There is currently a fragile dependency between the hardcoded at:: function calls in the .cpp file and the TCF code in the .mlir file. A correctness check is done to make sure they agree. Once we have a PyTorch frontend and/or an ATen round-trip backend online, we can avoid this fragility.
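
A correctness check of this kind typically compares the two result buffers elementwise within a tolerance. The sketch below is an assumption about what such a check could look like, not the check the tool performs. The allClose name and the default tolerances are hypothetical; the formula |actual - expected| <= atol + rtol * |expected| is the one used by torch.allclose.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: true if both buffers have the same length and
// every element satisfies |actual - expected| <= atol + rtol * |expected|.
bool allClose(const std::vector<float>& actual,
              const std::vector<float>& expected,
              float rtol = 1e-5f, float atol = 1e-8f) {
  assert(rtol >= 0.0f && atol >= 0.0f);
  if (actual.size() != expected.size()) return false;
  for (std::size_t i = 0; i < actual.size(); ++i) {
    float diff = std::fabs(actual[i] - expected[i]);
    // Written as a negated <= so that NaN in either input fails the check.
    if (!(diff <= atol + rtol * std::fabs(expected[i]))) return false;
  }
  return true;
}
```

Here the PyTorch result would be passed as expected and the reference backend result as actual, so a mismatch between the at:: calls and the .mlir file shows up as a failed comparison rather than a silently wrong benchmark.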