SSD

Grab a hands-on realtime-object-detection tool

Try to get a fast (what I mean is detecting in lesss than 1 second on mainstream CPU) object-detection tool from Github, I experiment with some repositories written by PyTorch (because I am familiar with it). Below are some conclusions:
1. detectron2
This the official tool from Facebook Corporation. I download and installed it successfully. The test python code is:

import detectron2
from detectron2.utils.logger import setup_logger
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor
from detectron2 import model_zoo
setup_logger()
# import some common libraries
import numpy as np
import cv2
import sys
import time
cfg = get_cfg()
# add project-specific config (e.g., TensorMask) here if you're not running a model in detectron2's core library
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # set threshold for this model
# Find a model from detectron2's model zoo. You can use the https://dl.fbaipublicfiles... url as well
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.DEVICE = "cpu"
predictor = DefaultPredictor(cfg)
img = cv2.imread(sys.argv[1])
begin = time.time()
outputs = predictor(img)
print("time:", time.time() - begin)
print(outputs)

Although can’t recognize all birds in below image, it will cost more than 5 seconds on CPU (my MackbookPro). Performance is not as good as my expectation.

2. efficientdet
From the paper, the EfficientDet should be fast and accurate. But after I wrote a test program, it totally couldn’t recognize the object at all. Then I gave up this solution.
3. EfficientDet.Pytorch
Couldn’t download models from it’s model_zoo.
4. ssd.pytorch
Finally, I came to my sweet ssd(Single Shot Detection). Since have studied it for more than half a year, I wrote below snippet quickly:

def base_transform(image, size, mean):
    x = cv2.resize(image, (size, size)).astype(np.float32)
    x -= mean
    x = x.astype(np.float32)
    return x
class BaseTransform:
    def __init__(self, size, mean):
        self.size = size
        self.mean = np.array(mean, dtype=np.float32)
    def __call__(self, image, boxes=None, labels=None):
        return base_transform(image, self.size, self.mean), boxes, labels
def detect(img, net, transform):
    FONT = cv2.FONT_HERSHEY_SIMPLEX
    COLORS = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
    height, width = img.shape[:2]
    x = torch.from_numpy(transform(img)[0]).permute(2, 0, 1)
    x = Variable(x.unsqueeze(0))
    y = net(x)  # forward pass
    detections = y.data[0]
    # scale each detection back up to the image
    scale = torch.Tensor([width, height, width, height])
    for index, loc in enumerate(detections[3]):
        score = loc.numpy()[0]
        if score >= 0.5:
            loc = loc[1:]
            pt = loc * scale
            print(score, pt)
            cv2.rectangle(
                img,
                (int(pt[0]), int(pt[1])),
                (int(pt[2]), int(pt[3])),
                COLORS[index % 3],
                2,
            )
            cv2.putText(
                img,
                str(score),
                (int(pt[0]), int(pt[1])),
                FONT,
                1,
                (255, 255, 255),
                1,
                cv2.LINE_AA,
            )
    return img
img = cv2.imread("bird_matrix.jpg")
net = build_ssd("test", 300, 21)  # initialize SSD
net.load_state_dict(torch.load("ssd300_mAP_77.43_v2.pth", map_location="cpu"))
transform = BaseTransform(net.size, (104 / 256.0, 117 / 256.0, 123 / 256.0))
img = detect(img, net, transform)
cv2.imwrite("result.jpg", img)

The result is not perfect but good enough for my current situation.

Fix Resnet-101 model in example of MXNET

SSD(Single Shot MultiBox Detector) is the fastest method in object-detection task (Another detector YOLO, is a little bit slower than SSD). In the source code of MXNET，there is an example for SSD implementation. I test it by using different models: inceptionv3, resnet-50, resnet-101 etc. and find a weird phenomenon: the size .params file generated by resnet-101 is smaller than resnet-50.

Model	Size of .params file
resnet-50	119MB
resnet-101	69MB

Since deeper network have larger number of parameters, resnet-101 has smaller file size for parameters seems suspicious.
Reviewing the code of example/ssd/symbol/symbol_factory.py:

    elif network == 'resnet50':
        num_layers = 50
        image_shape = '3,224,224'  # resnet require it as shape check
        network = 'resnet'
        from_layers = ['_plus12', '_plus15', '', '', '', '']
        num_filters = [-1, -1, 512, 256, 256, 128]
        strides = [-1, -1, 2, 2, 2, 2]
        pads = [-1, -1, 1, 1, 1, 1]
        sizes = [[.1, .141], [.2,.272], [.37, .447], [.54, .619], [.71, .79], [.88, .961]]
        ratios = [[1,2,.5], [1,2,.5,3,1./3], [1,2,.5,3,1./3], [1,2,.5,3,1./3], \
            [1,2,.5], [1,2,.5]]
        normalizations = -1
        steps = []
        return locals()
    elif network == 'resnet101':
        num_layers = 101
        image_shape = '3,224,224'
        network = 'resnet'
        from_layers = ['_plus12', '_plus15', '', '', '', '']
        num_filters = [-1, -1, 512, 256, 256, 128]
        strides = [-1, -1, 2, 2, 2, 2]
        pads = [-1, -1, 1, 1, 1, 1]
        sizes = [[.1, .141], [.2,.272], [.37, .447], [.54, .619], [.71, .79], [.88, .961]]
        ratios = [[1,2,.5], [1,2,.5,3,1./3], [1,2,.5,3,1./3], [1,2,.5,3,1./3], \
            [1,2,.5], [1,2,.5]]
        normalizations = -1
        steps = []
        return locals()

Why resnet-50 and resnet-101 has the same ‘from_layers’ ? Let’s check these two models:

In resnet-50, the SSD use two layers (as show in red line) to extract features. One from output of stage-3, another from output of stage-4. In resnet-101, it should be the same (as show in blue line), but it incorrectly copy the config code of resnet-50. The correct ‘from_layers’ for resnet-100 is:

from_layers = ['_plus29', '_plus32', '', '', '', '']

This seems like a bug, so I create a pull request to try fixing it.

DCTC 2016 conference

Yesterday I went to attend DCTC(Data Center Technology Conference) 2016 in Beijing. Although it is called “Data Center Technology”, most topics is about storage, because the conference is hold by Memblaze, a famous flash-storage startup company in China.
Xuebing Yin, The CEO of Memblaze, gave the first topic:

2016 is a important year for flash-storage because the revenue of SSD become bigger than Hard-disk for the first time. As we can see, data center will become full silicon (Hard-Disk is the only non-silicon component in servers) in the near future. As the speed of SSD and its interface (from SATA to PCIE) become faster and faster, many old softwares become bottleneck of performance: mysql 5.6 can’t use up the performance of high-speed SSD, but mysql 5.7 could.
Janene Ellefson from NVME organization introduced why we need a standard for high-speed data transfer.

Could use up to 64K queues with 64K command for each queue, NMVE is definitely the most powerful protocol for modern (or future) IO devices.
Xin Wu from GBase introduced the problems they face in using SSD for database

GBase is a serial of database products for OLTP/OLAP and global data storage. The SSD benefited the OLTP application, but for OLAP application the SSD is too expensive because the Hard-disk Array could also provide the same bandwidth. Maybe that’s why AWS released new type of EBS months ago.
Coly Li (Yes, my old friend ^_^) from SUSE Labs showed us the improvements of Linux Soft RAID in recent years. Many years ago, Linux soft RAID was used only for low speed Hard Disks, so the cost of bad software implement is not significant. But recently, the widely use of SSD expose many bottlenecks in soft RAID, and the developers of open source community commit many patches to improve performance. And, many patches came from Shaohua Li, a sophisticate kernel developer.(He worked first for Intel, and then Fusionio, and now Facebook).

In the tea break, I visited the exhibition of Memblaze.

This is a 1U server built by SuperMicro, it use eight NVME SSD by SFF8639 interfaces in the front. Running Mysql at full speed (4600% CPU usage), the wind run out from back is still not hot. Looks the server of SuperMicro is very effective, and cool 🙂

Robin on Linux

SSD