As [@ezyang suggested](https://github.com/pytorch/pytorch/issues/90276#issuecomment-1339791275),
use `torch._dynamo.optimizations.training.aot_autograd` instead of raw
`make_fx`. This is more future-proof and is meant to give us the backward pass
and functionalization. We don't currently get functionalization because of
https://github.com/pytorch/pytorch/issues/90759.

This also incidentally fixes the source location handling, so
`lockstep_basic.py` now gives an accurate source location!

Thanks to TorchDynamo's great layering and design, a basic lockstep debugger
takes only about 100 lines of code.

This should allow us to deprecate eager_mode, since AFAIK the only
interesting use case it really supported was letting downstream users
write lockstep debuggers.

NOTE: The exact reporting and interface here are subject to change. Please
try it out and provide feedback (or patches :) ).

Open items:
- `make_fx` should not drop source locations: https://github.com/pytorch/pytorch/issues/90276
- Report tensors better (huge tensors should be summarized).
- Maybe warn instead of aborting?
- Allow customizing atol/rtol.
- How best to print the failing node? Should we include surrounding graph
  context?
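
To make a few of these items concrete, here is a minimal sketch of what a configurable comparison step could look like. It is not the code in this PR: `check_close`, `summarize`, and all of their parameters are hypothetical names, the default tolerances are placeholders, and it works on flat lists of floats rather than tensors so that it stays dependency-free. The tolerance test has the same form as the one documented for `torch.allclose`, i.e. `|actual - expected| <= atol + rtol * |expected|`.

```python
import warnings
from typing import Sequence


def summarize(values: Sequence[float], max_items: int = 6) -> str:
    """Render a flat sequence compactly, eliding the middle of long ones."""
    n = len(values)
    if n <= max_items:
        return "[" + ", ".join(f"{v:.4g}" for v in values) + "]"
    half = max_items // 2
    head = ", ".join(f"{v:.4g}" for v in values[:half])
    tail = ", ".join(f"{v:.4g}" for v in values[-half:])
    return f"[{head}, ..., {tail}] ({n} elements)"


def check_close(
    node_name: str,
    expected: Sequence[float],
    actual: Sequence[float],
    *,
    rtol: float = 1e-5,   # placeholder default, not taken from the PR
    atol: float = 1e-8,   # placeholder default, not taken from the PR
    abort: bool = True,
) -> bool:
    """Compare two results for one node; raise or warn on mismatch."""
    if len(expected) != len(actual):
        raise ValueError(
            f"node {node_name!r}: length mismatch "
            f"({len(expected)} vs {len(actual)})"
        )
    # `not (diff <= tol)` also flags NaNs, since any comparison with NaN is False.
    bad = [
        i
        for i, (e, a) in enumerate(zip(expected, actual))
        if not abs(a - e) <= atol + rtol * abs(e)
    ]
    if not bad:
        return True
    first = bad[0]
    message = (
        f"lockstep mismatch at node {node_name!r}: "
        f"{len(bad)}/{len(expected)} elements differ "
        f"(first at index {first}: expected {expected[first]:.6g}, "
        f"got {actual[first]:.6g}); "
        f"expected={summarize(expected)} actual={summarize(actual)}"
    )
    if abort:
        raise AssertionError(message)
    # Warn-only mode: report the mismatch but let the run continue.
    warnings.warn(message, RuntimeWarning)
    return False
```

With `abort=False` the same mismatch is reported as a `RuntimeWarning` and execution continues, which is one way the "warn instead of abort" idea could be exposed. Which node name to print and how much graph context to attach are left open here, as in the list above.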