Why did no one tell me about gdb-add-index?
I’ve been exploring TensorFlow internals over the past couple of weeks.
That meant running Python scripts against a debug build of TensorFlow, setting breakpoints, examining values, modifying the script, and repeating. You know, general sleuthing.
TensorFlow’s core functionality is written in C++. The Python package is a wrapper around that C++ code and its libraries, and Python reaches the TensorFlow core by loading a shared object.
Large shared objects can take a while to load into gdb, and a debug build of TensorFlow is at least 1GB.
For example (neural_network_raw.py is a small file that uses TensorFlow):
$ time gdb --ex run --ex quit --args python neural_network_raw.py
real 1m19.613s
user 1m3.667s
sys 0m5.242s
Ugh.
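You can solve the problem by running gdb-add-index on the big shared objects. It builds the symbol index that gdb would otherwise have to compute every time it loads the object, and appends that index to the shared object as a .gdb_index section, so later gdb sessions can read it directly.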
Use it like this:
$ gdb-add-index venv/lib/python3.6/site-packages/tensorflow/libtensorflow_framework.so
$ gdb-add-index venv/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
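If you’d rather not list every shared object by hand, a small shell function can walk the package directory and index whatever isn’t indexed yet. This is a sketch, not part of gdb: the function name index_shared_objects is hypothetical, and it assumes bash, find, readelf (from binutils) and gdb-add-index are all on the PATH. It uses readelf -S to skip objects that already carry a .gdb_index section.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with gdb): add a .gdb_index section to
# every shared object under a directory that does not already have one.
index_shared_objects() {
  local dir=$1
  if [ ! -d "$dir" ]; then
    echo "not a directory: $dir" >&2
    return 1
  fi
  # Match both "foo.so" and versioned names like "libfoo.so.1".
  # -print0 with read -d '' keeps paths containing spaces intact.
  find "$dir" -type f \( -name '*.so' -o -name '*.so.*' \) -print0 |
    while IFS= read -r -d '' so; do
      # readelf -S lists the section headers; skip objects already indexed.
      if readelf -S "$so" 2>/dev/null | grep -q '\.gdb_index'; then
        echo "already indexed: $so"
      else
        echo "indexing: $so"
        gdb-add-index "$so"
      fi
    done
}
```

Usage, with the venv path from above: index_shared_objects venv/lib/python3.6/site-packages/tensorflow. Since gdb-add-index rewrites the files in place, reinstalling or upgrading the package replaces them and the index has to be rebuilt.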
Afterwards:
$ time gdb --ex run --ex quit --args python neural_network_raw.py
real 0m15.974s
user 0m12.916s
sys 0m2.541s
That saves about a minute on every run. Not bad.
How did I just learn about this now?