DEEP LEARNING BEST PRACTICES: CHECKPOINTING YOUR DEEP LEARNING MODEL TRAINING

This article covers one of many best practices in Deep Learning: creating checkpoints while training your model. We will look at what needs to be saved when creating checkpoints, why checkpoints are needed (especially on NUS HPC systems), how to create them in various deep learning frameworks (Keras, TensorFlow, PyTorch), and the benefits they bring.

What Needs to be Saved

A checkpoint should capture everything needed to resume training from the point where it stopped: the model architecture, the trained weights, the optimizer state, and bookkeeping details such as the current epoch and the best validation score seen so far.
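As a concrete illustration, Keras can bundle all of this state into a single HDF5 file. This is a minimal sketch, with 'model.h5' as an illustrative filename rather than one from the original article:

from keras.models import load_model

# model.save() stores the architecture, weights, training configuration
# and optimizer state in one file, so training can be resumed later.
model.save('model.h5')

# Recreate the exact same model, ready to continue training or predict.
model = load_model('model.h5')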

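In Keras, periodic checkpointing during training is handled by the ModelCheckpoint callback, which is passed to model.fit() through the callbacks argument. The snippet below saves the full model whenever the monitored metric (here, validation loss) improves: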
from keras.callbacks import ModelCheckpoint

# Save the full model (architecture + weights) to /tmp/weights.hdf5 after
# every epoch in which the validation loss improves.
checkpointer = ModelCheckpoint(filepath='/tmp/weights.hdf5', monitor='val_loss',
                               verbose=1, save_best_only=True,
                               save_weights_only=False, mode='auto', period=1)

model.fit(x_train, y_train, batch_size=128, epochs=20, verbose=0,
          validation_data=(x_test, y_test), callbacks=[checkpointer])
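In PyTorch, the usual pattern is to serialise a dictionary of training state with torch.save() and restore it with torch.load(). The sketch below is illustrative rather than taken from the original article; the function names save_checkpoint and load_checkpoint and the epoch/loss bookkeeping fields are assumptions:

import torch

def save_checkpoint(model, optimizer, epoch, loss, path='/tmp/checkpoint.pt'):
    # Persist everything needed to resume training: weights, optimizer
    # state, and bookkeeping such as the current epoch and loss.
    torch.save({
        'epoch': epoch,
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        'loss': loss,
    }, path)

def load_checkpoint(model, optimizer, path='/tmp/checkpoint.pt'):
    # Restore saved state into already-constructed model/optimizer objects.
    checkpoint = torch.load(path)
    model.load_state_dict(checkpoint['model_state_dict'])
    optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
    return checkpoint['epoch'], checkpoint['loss']

After loading, call model.train() before resuming training so that layers such as dropout and batch normalisation behave correctly.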
To keep a checkpoint safe on NUS HPC systems, copy it from the temporary working directory to your home directory:

# | Linux Command                      | Description
1 | cp /hpctmp2/a0123456/file2 ~/file2 | Copies file2 from the /hpctmp2 temporary working directory to the user's home directory