In this challenge, we chose Faster R-CNN as our main framework. First, we analysed the dataset to determine the range of object scales and aspect ratios. We then tuned the algorithm in the following steps. (1) We chose ResNet-101 as the backbone network because it is deep enough yet can be trained in a short time. (2) We chose Faster R-CNN as the main framework because two-stage detectors generally achieve better precision. (3) We used an RoIAlign layer instead of an RoIPooling layer because it improves localization precision. (4) We used multi-scale training and multi-scale testing, which make the model more robust to scale variance. (5) We used a multi-model ensemble: different models perform differently on the same dataset, so in the final step we fused the results generated by different models, which greatly improved our results. This summarizes our work; much remains to be done to improve the results further.
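The description does not say how the results of the different models were fused. One common approach, shown here only as a minimal sketch, is to pool the detections of all models and run class-wise non-maximum suppression over the pooled set. The function names (iou, fuse_detections), the (box, score, label) tuple layout, and the IoU threshold of 0.5 are assumptions made for illustration and are not taken from the submission.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fuse_detections(per_model_dets, iou_thresh=0.5):
    """Pool detections from several models and apply class-wise NMS.

    per_model_dets: one list per model, each holding (box, score, label)
    tuples. This layout is hypothetical, chosen for the sketch.
    """
    # Pool everything and visit detections from highest to lowest score.
    pooled = [det for dets in per_model_dets for det in dets]
    pooled.sort(key=lambda det: det[1], reverse=True)

    kept = []
    for box, score, label in pooled:
        # Keep a detection unless a higher-scoring kept box of the same
        # class overlaps it by at least iou_thresh.
        suppressed = any(
            kept_label == label and iou(box, kept_box) >= iou_thresh
            for kept_box, _, kept_label in kept
        )
        if not suppressed:
            kept.append((box, score, label))
    return kept
```

For example, if one model reports a car at (0, 0, 10, 10) with score 0.9 and a second model reports the same car at (1, 1, 11, 11) with score 0.8, only the higher-scoring box survives, while non-overlapping detections from either model are all kept. The same pooling step could also merge the outputs of multi-scale testing, provided the boxes are first mapped back to the original image coordinates.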
Institute of Computer Science & Technology of Peking University, VDIG
CFENet: Exploiting a Real Effective Single Shot Object Detector with Comprehensive Feature Enhancement module
Karlsruhe Institute of Technology
A Mask R-CNN network with a ResNet-101 backbone. The backbone was initialized with ImageNet-pretrained ResNet-50 weights. The backbone uses batch normalization; all other layers use group normalization. To increase scores for rarer classes, images containing trains were sampled 20 times as often during training, and images containing motorbikes, bicycles or riders were sampled twice as often as other images. The model was trained on 4 GPUs with 2 images per GPU. All weights except those of the heads were frozen for the first 1.5 epochs of training, followed by 0.5 epochs of warmup with the learning rate reduced by a factor of 100. After this warmup phase, training continued normally, with the learning rate decreased by a factor of 10 shortly before the end of training.
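The sampling rule and the training phases described above can be sketched as follows. This is an illustration under stated assumptions, not the submission's code: the names image_weight, sample_images and schedule are hypothetical, an image containing both a train and a 2x class is assumed to take the larger factor, the heads are assumed to train at the base learning rate while the backbone is frozen, and drop_epoch stands in for the unspecified point "shortly before the end of training".

```python
import random

# Class names as written in the description; the dataset's actual label
# strings may differ.
DOUBLE_SAMPLED = {"motorbike", "bicycle", "rider"}


def image_weight(classes_in_image):
    """Relative sampling weight of one image, given its set of classes."""
    if "train" in classes_in_image:
        return 20.0  # assumption: the larger factor wins if both rules apply
    if DOUBLE_SAMPLED & set(classes_in_image):
        return 2.0
    return 1.0


def sample_images(image_classes, k, seed=None):
    """Draw k image indices with replacement, honoring the weights above."""
    rng = random.Random(seed)
    weights = [image_weight(classes) for classes in image_classes]
    return rng.choices(range(len(image_classes)), weights=weights, k=k)


def schedule(epoch, drop_epoch):
    """Return (backbone_frozen, lr_factor) for a fractional epoch index.

    lr_factor multiplies the base learning rate. drop_epoch is a
    hypothetical parameter: the description only says the decrease
    happens shortly before the end of training.
    """
    if epoch < 1.5:
        return True, 1.0   # only the heads train; lr here is an assumption
    if epoch < 2.0:
        return False, 0.01  # 0.5 epochs of warmup at 1/100 of the base rate
    if epoch < drop_epoch:
        return False, 1.0   # normal training
    return False, 0.1       # final decrease by a factor of 10
```

With 4 GPUs and 2 images per GPU, each training step would consume 8 indices drawn by a sampler of this kind.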