TensorFlow code optimization strategy
Question: Please excuse the broadness of this question; once I know more, perhaps I can ask something more specific. I have a performance-sensitive piece of TensorFlow code. From the perspective of someone who knows little about GPU programming, I would like to know which guides or strategies would be a good place to start for optimizing my code (single GPU). Perhaps even a readout of how long was spent on each TensorFlow op would be nice... I have a vague understanding that some operations go faster when