
Training Detectron2

Ku Wee Kiat, Research Computing, NUS Information Technology

In this article, we cover how to train a segmentation model with Detectron2 on a COCO-annotated dataset on the NUS HPC GPU cluster. A Singularity container with Detectron2 pre-installed is available on the cluster at the following path:

/app1/common/singularity-img/3.0.0/detectron2_pytorch_1.9_cuda11.1_ubuntu20.04v2.sif
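Before committing to a long training run, it is worth confirming that the container environment imports cleanly. The short check below is a minimal sketch; run it inside the container, for example via singularity exec on a compute node:

# Quick sanity check of the container environment
import torch
import detectron2

print("detectron2:", detectron2.__version__)
print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())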

As with any other deep learning model trained on the cluster, we first need to create a job script. Below is an example we can use:

#!/bin/bash
#PBS -l select=1:mem=90gb:ncpus=10:ngpus=1
#PBS -l walltime=7:00:00
#PBS -q volta_gpu
#PBS -P Volta_gpu_demo
#PBS -N detectron2_coco_train
#PBS -j oe
 
# Singularity image to use
# Other images available in /app1/common/singularity-img/3.0.0/
image="/app1/common/singularity-img/3.0.0/detectron2_pytorch_1.9_cuda11.1_ubuntu20.04v2.sif"
 
# Change to directory where job was submitted
if [ x"$PBS_O_WORKDIR" != x ] ; then
cd "$PBS_O_WORKDIR" || exit $?
fi
 
# Number of CPU slots allocated to the job
np=$(cat $PBS_NODEFILE | wc -l)
  
singularity exec $image bash << EOF > stdout.$PBS_JOBID 2> stderr.$PBS_JOBID
# Commands below run inside the container
python train_cell_complex.py
EOF

We will use the LIVECell dataset to train a cell segmentation model; the dataset is publicly available for download.

The folder structure for the dataset is as follows:

Overall:

/hpctmp/ccekwk/dataset/livecell/LIVECell_dataset_2021/
|-- annotations
|   `-- dataset_size_split
`-- images
    |-- livecell_test_images
    `-- livecell_train_val_images

Annotations directory:

/hpctmp/ccekwk/dataset/livecell/LIVECell_dataset_2021/annotations/
|-- dataset_size_split
|   |-- 0_train2percent.json
|   |-- 1_train4percent.json
|   |-- 2_train5percent.json
|   |-- 3_train25percent.json
|   `-- 4_train50percent.json
|-- livecell_coco_test.json
|-- livecell_coco_train.json
`-- livecell_coco_val.json
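Since the annotations are standard COCO JSON, a split can be sanity-checked before training with pycocotools (bundled with Detectron2). The snippet below is a minimal sketch using the training split path above:

# Inspect one COCO annotation split before training
from pycocotools.coco import COCO

ann_file = "/hpctmp/ccekwk/dataset/livecell/LIVECell_dataset_2021/annotations/livecell_coco_train.json"
coco = COCO(ann_file)

print("images:", len(coco.getImgIds()))
print("annotations:", len(coco.getAnnIds()))
print("categories:", [c["name"] for c in coco.loadCats(coco.getCatIds())])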

The COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x model provided by the Detectron2 model zoo will be fine-tuned on the LIVECell dataset. When downloaded directly, the weights file is named

model_final_f10217.pkl.
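Compute nodes may not have outbound internet access, so it can be convenient to pre-download the weights (for example on a login node) and point the training script at the local copy. A minimal sketch, assuming the ../detectron2_weights directory used by the script below:

# Pre-download the mask_rcnn_R_50_FPN_3x weights for offline use
import os
import urllib.request
from detectron2 import model_zoo

url = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
os.makedirs("../detectron2_weights", exist_ok=True)
urllib.request.urlretrieve(url, "../detectron2_weights/model_final_f10217.pkl")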

The script below demonstrates the simplest way to train a detectron2 segmentation model:

import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()
 
# import some common libraries
import numpy as np
import os, json, cv2, random
 
 
# import some common detectron2 utilities
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data.datasets import register_coco_instances
from detectron2.data import MetadataCatalog, DatasetCatalog
from detectron2.structures import BoxMode
from detectron2.engine import DefaultTrainer
 
weights_path = "../detectron2_weights/model_final_f10217.pkl"
 
# Dataset definition
register_coco_instances("livecell_dataset_train", {}, "/hpctmp/ccekwk/dataset/livecell/LIVECell_dataset_2021/annotations/livecell_coco_train.json", "/hpctmp/ccekwk/dataset/livecell/LIVECell_dataset_2021/images/livecell_train_val_images")
register_coco_instances("livecell_dataset_val", {}, "/hpctmp/ccekwk/dataset/livecell/LIVECell_dataset_2021/annotations/livecell_coco_val.json", "/hpctmp/ccekwk/dataset/livecell/LIVECell_dataset_2021/images/livecell_train_val_images")
 
livecell_metadata = MetadataCatalog.get("livecell_dataset_train")

# Model and training config
 
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("livecell_dataset_train",)
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 4
cfg.MODEL.WEIGHTS = os.path.join(weights_path)
#cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")  # Let training initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH = 1
cfg.SOLVER.BASE_LR = 0.0025  # pick a good LR
cfg.SOLVER.MAX_ITER = 10000    # total number of training iterations; adjust for your dataset size
cfg.SOLVER.STEPS = [2000, 5000]        # iterations at which the learning rate is decayed
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 256   # faster, and good enough for this toy dataset (default: 512)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # only has one class (cell). (see https://detectron2.readthedocs.io/tutorials/datasets.html#update-the-config-for-new-datasets)
# NOTE: this config means the number of classes, but a few popular unofficial tutorials incorrectly use num_classes+1 here.
 
# Start training
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
 
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")  # path to the model we just trained

# Inference
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7   # set a custom testing threshold
predictor = DefaultPredictor(cfg)
 
from detectron2.utils.visualizer import ColorMode
 
dataset_dicts = DatasetCatalog.get("livecell_dataset_val")
for i, d in enumerate(random.sample(dataset_dicts, 3)):
    im = cv2.imread(d["file_name"])
    outputs = predictor(im)  # format is documented at https://detectron2.readthedocs.io/tutorials/models.html#model-output-format
    v = Visualizer(im[:, :, ::-1],
                   metadata=livecell_metadata,
                   scale=0.5,
                   instance_mode=ColorMode.IMAGE_BW   # remove the colors of unsegmented pixels. This option is only available for segmentation models
    )
    out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
    cv2.imwrite(f"prediction_{i}.png", out.get_image()[:, :, ::-1])  # write one image per sample instead of overwriting a single final.png

To load COCO-annotated datasets for use with Detectron2, we use the register_coco_instances function provided by the detectron2.data.datasets module.

Example:

register_coco_instances(
"livecell_dataset_train", 
{}, 
"/hpctmp/ccekwk/dataset/livecell/LIVECell_dataset_2021/annotations/livecell_coco_train.json", 
"/hpctmp/ccekwk/dataset/livecell/LIVECell_dataset_2021/images/livecell_train_val_images"
)

Parameters
•	name (str) – the name that identifies a dataset, e.g. “coco_2014_train”.
•	metadata (dict) – extra metadata associated with this dataset. You can leave it as an empty dict.
•	json_file (str) – path to the json instance annotation file.
•	image_root (str or path-like) – directory which contains all the images.
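Registration is lazy: nothing is read from disk until the dataset is first used, so a quick lookup through the catalogs confirms that the registration and paths are correct. A minimal sketch:

# Confirm the dataset was registered and loads correctly
from detectron2.data import DatasetCatalog, MetadataCatalog

dataset_dicts = DatasetCatalog.get("livecell_dataset_train")  # triggers the actual JSON load
print("records:", len(dataset_dicts))
print("classes:", MetadataCatalog.get("livecell_dataset_train").thing_classes)
print("first image:", dataset_dicts[0]["file_name"])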

In the config (cfg) section, we define the training settings and other configuration items:

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))

cfg.DATASETS.TRAIN = ("livecell_dataset_train",)
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 4

cfg.MODEL.WEIGHTS = os.path.join(weights_path)
cfg.SOLVER.IMS_PER_BATCH = 1

cfg.SOLVER.BASE_LR = 0.0025  # pick a good LR
cfg.SOLVER.MAX_ITER = 10000    # total number of training iterations; adjust for your dataset size
cfg.SOLVER.STEPS = [2000, 5000]        # iterations at which the learning rate is decayed

cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 256   # faster, and good enough for this toy dataset (default: 512)

cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # LIVECell only has one class (cell). (see https://detectron2.readthedocs.io/tutorials/datasets.html#update-the-config-for-new-datasets)
# NOTE: this config means the number of classes, but a few popular unofficial tutorials incorrectly use num_classes+1 here.
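Note that MAX_ITER counts batches, not epochs, so it is usually derived from the dataset size and IMS_PER_BATCH. A minimal sketch of the arithmetic (the image count here is a placeholder; use the length of your registered training set):

# Translate a target number of epochs into solver iterations
num_images = 3000        # placeholder; use len(DatasetCatalog.get("livecell_dataset_train"))
ims_per_batch = 1        # cfg.SOLVER.IMS_PER_BATCH
epochs = 3

iters_per_epoch = num_images // ims_per_batch
max_iter = epochs * iters_per_epoch
steps = [int(max_iter * 0.6), int(max_iter * 0.8)]  # decay the learning rate late in training
print("MAX_ITER:", max_iter, "STEPS:", steps)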

Once all of the above is in place, we can submit the job script to the scheduler (with qsub) for it to be dispatched for training.

When training completes, there will be an output folder containing checkpointed weights, the final weights, and tfevents files for visualising the training metrics in TensorBoard.
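The output folder also contains a metrics.json file written by the default trainer, with one JSON record per logging step. If you prefer not to launch TensorBoard, the snippet below is a minimal sketch for extracting the loss curve from it (assuming the default ./output directory):

# Read training metrics without TensorBoard
import json

with open("./output/metrics.json") as f:
    records = [json.loads(line) for line in f]

for r in records:
    if "total_loss" in r:
        print(r["iteration"], r["total_loss"])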