Ku Wee Kiat, Research Computing, NUS Information Technology
In this article, we will cover how to train a segmentation model using Detectron2 on your COCO-annotated dataset on the NUS HPC GPU cluster. A Singularity container with Detectron2 pre-installed is available on the cluster at the following path:
/app1/common/singularity-img/3.0.0/detectron2_pytorch_1.9_cuda11.1_ubuntu20.04v2.sif
Similar to how we use the cluster to train any other deep learning model, we will have to create a job script. Below is an example of a job script we can use:
#!/bin/bash
#PBS -l select=1:mem=90gb:ncpus=10:ngpus=1
#PBS -l walltime=7:00:00
#PBS -q volta_gpu
#PBS -P Volta_gpu_demo
#PBS -N detectron2_coco_train
#PBS -j oe

# Singularity image to use
# Other images available in /app1/common/singularity-img/3.0.0/
image="/app1/common/singularity-img/3.0.0/detectron2_pytorch_1.9_cuda11.1_ubuntu20.04v2.sif"

# Change to directory where job was submitted
if [ x"$PBS_O_WORKDIR" != x ] ; then
  cd "$PBS_O_WORKDIR" || exit $?
fi

np=`cat $PBS_NODEFILE | wc -l`

singularity exec $image bash << EOF > stdout.$PBS_JOBID 2> stderr.$PBS_JOBID
# insert commands here
python train_cell_complex.py
EOF
We will be using the LIVECell dataset to train a cell segmentation model. The LIVECell dataset is available at the following link.
The folder structure for the dataset will be as follows:
Overall:
/hpctmp/ccekwk/dataset/livecell/LIVECell_dataset_2021/
|-- annotations
|   `-- dataset_size_split
`-- images
    |-- livecell_test_images
    `-- livecell_train_val_images

Annotations directory:

/hpctmp/ccekwk/dataset/livecell/LIVECell_dataset_2021/annotations/
|-- dataset_size_split
|   |-- 0_train2percent.json
|   |-- 1_train4percent.json
|   |-- 2_train5percent.json
|   |-- 3_train25percent.json
|   `-- 4_train50percent.json
|-- livecell_coco_test.json
|-- livecell_coco_train.json
`-- livecell_coco_val.json
The COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x model provided by the Detectron2 model zoo will be fine-tuned on the LIVECell dataset. When downloaded directly, the weights file is named model_final_f10217.pkl.
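If the GPU compute nodes cannot reach the internet, the pretrained weights can be fetched ahead of time, for example on a login node. Below is a minimal sketch; it assumes the ../detectron2_weights/ directory used by the training script later in this article, and uses model_zoo.get_checkpoint_url to resolve the config name to the official download URL:

# Sketch: pre-download the pretrained weights once, e.g. on a login node.
from detectron2 import model_zoo
import os, urllib.request

url = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
os.makedirs("../detectron2_weights", exist_ok=True)  # assumed weights directory
urllib.request.urlretrieve(url, "../detectron2_weights/model_final_f10217.pkl")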
The script below demonstrates a minimal way to train a Detectron2 segmentation model:
import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()

# import some common libraries
import numpy as np
import os, json, cv2, random

# import some common detectron2 utilities
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor, DefaultTrainer
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data.datasets import register_coco_instances
from detectron2.data import MetadataCatalog, DatasetCatalog
from detectron2.structures import BoxMode

weights_path = "../detectron2_weights/model_final_f10217.pkl"

# Dataset definition
register_coco_instances("livecell_dataset_train", {},
    "/hpctmp/ccekwk/dataset/livecell/LIVECell_dataset_2021/annotations/livecell_coco_train.json",
    "/hpctmp/ccekwk/dataset/livecell/LIVECell_dataset_2021/images/livecell_train_val_images")
register_coco_instances("livecell_dataset_val", {},
    "/hpctmp/ccekwk/dataset/livecell/LIVECell_dataset_2021/annotations/livecell_coco_val.json",
    "/hpctmp/ccekwk/dataset/livecell/LIVECell_dataset_2021/images/livecell_train_val_images")
livecell_metadata = MetadataCatalog.get("livecell_dataset_train")

# Model and training config
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("livecell_dataset_train",)
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 4
cfg.MODEL.WEIGHTS = os.path.join(weights_path)
#cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")  # Let training initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH = 1
cfg.SOLVER.BASE_LR = 0.0025   # pick a good LR
cfg.SOLVER.MAX_ITER = 10000   # adjust to your dataset size; larger datasets need more iterations
cfg.SOLVER.STEPS = [2000, 5000]   # decay the learning rate at these iterations
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 256   # fewer ROI proposals per image trains faster (default: 512)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1   # only one class (cell). (see https://detectron2.readthedocs.io/tutorials/datasets.html#update-the-config-for-new-datasets)
# NOTE: this config means the number of classes, but a few popular unofficial tutorials incorrectly use num_classes+1 here.

# Start training
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()

# Inference
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")   # path to the model we just trained
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7   # set a custom testing threshold
predictor = DefaultPredictor(cfg)

from detectron2.utils.visualizer import ColorMode
dataset_dicts = DatasetCatalog.get("livecell_dataset_val")
for i, d in enumerate(random.sample(dataset_dicts, 3)):
    im = cv2.imread(d["file_name"])
    outputs = predictor(im)   # format is documented at https://detectron2.readthedocs.io/tutorials/models.html#model-output-format
    v = Visualizer(im[:, :, ::-1],
                   metadata=livecell_metadata,
                   scale=0.5,
                   instance_mode=ColorMode.IMAGE_BW   # remove the colors of unsegmented pixels. This option is only available for segmentation models
                   )
    out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
    cv2.imwrite(f"final_{i}.png", out.get_image()[:, :, ::-1])   # write each sample to its own file
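Beyond visual checks, you may want COCO-style metrics (AP) on the validation split. Below is a short sketch using Detectron2's standard evaluation utilities; it assumes the cfg, predictor, and dataset names from the script above, and that your Detectron2 version accepts COCOEvaluator(name, output_dir=...) (older releases take a cfg argument instead):

# Sketch: COCO-style evaluation on the validation split.
# Assumes `cfg` and `predictor` from the training script above.
from detectron2.evaluation import COCOEvaluator, inference_on_dataset
from detectron2.data import build_detection_test_loader

evaluator = COCOEvaluator("livecell_dataset_val", output_dir=cfg.OUTPUT_DIR)
val_loader = build_detection_test_loader(cfg, "livecell_dataset_val")
# Runs the model over the whole split and prints bbox/segm AP numbers
print(inference_on_dataset(predictor.model, val_loader, evaluator))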
To load COCO-annotated datasets for use with Detectron2, we need to use the register_coco_instances function provided by the detectron2.data.datasets module.
Example:
register_coco_instances(
    "livecell_dataset_train",
    {},
    "/hpctmp/ccekwk/dataset/livecell/LIVECell_dataset_2021/annotations/livecell_coco_train.json",
    "/hpctmp/ccekwk/dataset/livecell/LIVECell_dataset_2021/images/livecell_train_val_images"
)
Parameters
• name (str) – the name that identifies a dataset, e.g. “coco_2014_train”.
• metadata (dict) – extra metadata associated with this dataset. You can leave it as an empty dict.
• json_file (str) – path to the json instance annotation file.
• image_root (str or path-like) – directory which contains all the images.
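Before launching a long training job, it can help to confirm the registration worked by drawing the ground-truth annotations on a sample image. A minimal sketch (the output filename gt_sample.png is arbitrary):

# Sanity check: visualise ground-truth annotations for one registered sample.
import random, cv2
from detectron2.data import DatasetCatalog, MetadataCatalog
from detectron2.utils.visualizer import Visualizer

dataset_dicts = DatasetCatalog.get("livecell_dataset_train")  # triggers loading of the COCO json
metadata = MetadataCatalog.get("livecell_dataset_train")
d = random.choice(dataset_dicts)
img = cv2.imread(d["file_name"])
vis = Visualizer(img[:, :, ::-1], metadata=metadata, scale=0.5)
cv2.imwrite("gt_sample.png", vis.draw_dataset_dict(d).get_image()[:, :, ::-1])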
In the config (cfg) section, we define the training settings and other configuration items.
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("livecell_dataset_train",)
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 4
cfg.MODEL.WEIGHTS = os.path.join(weights_path)
cfg.SOLVER.IMS_PER_BATCH = 1
cfg.SOLVER.BASE_LR = 0.0025   # pick a good LR
cfg.SOLVER.MAX_ITER = 10000   # adjust to your dataset size; larger datasets need more iterations
cfg.SOLVER.STEPS = [2000, 5000]   # decay the learning rate at these iterations
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 256   # fewer ROI proposals per image trains faster (default: 512)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1   # only one class (cell). (see https://detectron2.readthedocs.io/tutorials/datasets.html#update-the-config-for-new-datasets)
# NOTE: this config means the number of classes, but a few popular unofficial tutorials incorrectly use num_classes+1 here.
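Since these settings determine the run, it can be useful to save the fully resolved config next to the training outputs for reproducibility. A short sketch, assuming the cfg object above (the filename train_config.yaml is arbitrary):

# Save the resolved config for reproducibility; cfg is a yacs CfgNode.
import os
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
with open(os.path.join(cfg.OUTPUT_DIR, "train_config.yaml"), "w") as f:
    f.write(cfg.dump())  # serialises every config value to YAML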
Once all the above are done, we can submit the job script to the scheduler with qsub, and the job will be dispatched to a GPU node for training.
When training completes, there will be an output folder (cfg.OUTPUT_DIR, ./output by default) containing intermediate checkpoints, the final weights (model_final.pth), and tfevents files with the training metrics; point TensorBoard at this folder to visualise the loss curves.