Different teams use different libraries to train and run neural networks (caffe, torch, theano...). This makes sharing difficult: each library has its own format to store ne
ONNX is an open-source AI ecosystem that provides a common format for neural networks.
It helps converting a deep learning model to another.
Generally, model conversion takes weeks/months without ONNX:

ONNX offers simpler and faster conversion process:

For all supported conversions, see here.
It makes deployment easier, models stored in a much preferable way: In the above image, ONNX acts as a data ingestion layer, transforms each input model to the same format. Otherwise, all models would be like a bunch of puzzle pieces that do not fit each other.

Let's say you have your Keras model and you want to transform it to ONNX:
model = load_model("keras_model.hdf5") # h5 is also OK!
onnx_model = keras2onnx.convert_keras(model, model.name)
onnx_model_file = 'output_model.onnx'
onnx.save_model(onnx_model, onnx_model_file)
Then load & run saved model :
onnx_model = onnx.load_model('output_model.onnx')
content = onnx_model.SerializeToString()
sess = onnxruntime.InferenceSession(content)
x = x if isinstance(x, list) else [x]
feed = dict([(input.name, x[n]) for n, input in enumerate(sess.get_inputs())])
# Do inference
pred_onnx = sess.run(None, feed)
This example uses keras2onnx to convert the Keras model and onnxruntime to do inference.
Note: There are also lots of pre-trained models in the ONNX format. Check this out!
References:
1. https://towardsdatascience.com/onnx-made-easy-957e60d16e94
2. https://blog.codecentric.de/en/2019/08/portability-deep-learning-frameworks-onnx/
3. http://on-demand.gputechconf.com/gtc/2018/presentation/s8818-onnx-interoperable-deep-learning-presented-by-facebook.pdf
4. https://devblogs.nvidia.com/tensorrt-3-faster-tensorflow-inference/