In conjunction with this workshop, we will hold four challenges this year.
This track targets learning to perform object semantic segmentation using image-level annotations as supervision [1, 2, 3]. The dataset is built upon the object detection track of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [4], which includes a total of 456,567 training images from 200 categories. We provide pixel-level annotations of 15K images (validation/testing: 5,000/10,000) for evaluation.
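The leaderboard below reports Mean IoU, mean accuracy, and pixel accuracy. For reference, the following is a minimal sketch of how these three metrics are commonly derived from a pixel-level confusion matrix; the class count (200 categories plus background) and the ignore-label handling are assumptions here, and the official evaluation script may differ in detail.

```python
import numpy as np

NUM_CLASSES = 201  # assumption: 200 object categories plus background

def confusion_matrix(gt, pred, num_classes=NUM_CLASSES):
    """Accumulate a pixel-level confusion matrix from label maps."""
    mask = (gt >= 0) & (gt < num_classes)  # skip ignored pixels, if any
    return np.bincount(
        num_classes * gt[mask].astype(int) + pred[mask].astype(int),
        minlength=num_classes ** 2,
    ).reshape(num_classes, num_classes)

def segmentation_metrics(conf):
    """Derive the three leaderboard metrics from a confusion matrix."""
    tp = np.diag(conf).astype(float)
    iou = tp / (conf.sum(1) + conf.sum(0) - tp)  # per-class intersection / union
    acc = tp / conf.sum(1)                       # per-class pixel recall
    return {
        "mean_iou": float(np.nanmean(iou)),
        "mean_accuracy": float(np.nanmean(acc)),
        "pixel_accuracy": float(tp.sum() / conf.sum()),
    }
```

Classes absent from both prediction and ground truth yield undefined per-class scores, which `nanmean` skips, a common convention in segmentation benchmarks.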
Rank | Participant team | Mean IoU (%) | Mean accuracy (%) | Pixel accuracy (%) |
---|---|---|---|---|
1st | Shuo Li, Zehua Hao, Yaoyang Du, Fang Liu#, Licheng Jiao#. Xidian University, IPIU Lab [slide] | 49.06 | 68.1 | 86.64 |
2nd | Junwen Pan, Yongjuan Ma and Pengfei Zhu. Tianjin University [slide] | 49.03 | 67.87 | 87.53 |
3rd | Xun Feng¹, Zhenyuan Chen¹, Zhendong Wang¹, Yibing Zhan², Chen Gong¹. ¹Nanjing University of Science and Technology, ²JD Explore Academy, JD.com [slide] | 39.68 | 53 | 82.18 |
Given a photo containing multiple product instances and a user-provided description, this track aims to detect a bounding box for each product and retrieve the correct single-product image from a gallery. We collected 1,132,830 real-world product photos from an e-commerce website, where each photo contains 2.83 products on average and is paired with a user-provided description, along with a single-product gallery of 40,033 images for evaluating retrieval performance. We hold out 9,220 photos and their corresponding descriptions as the test set and provide product-level bounding boxes for each photo. This new track captures a very common real-world setting (e.g., e-commerce) and provides an interesting testbed for learning from imperfect data, evaluating weakly supervised object retrieval given a caption, fine-grained instance recognition, and cross-modality (i.e., text-image) object-level retrieval. More details of this challenge are provided at https://competitions.codalab.org/competitions/30123
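To make the pipeline concrete, below is a minimal sketch of the retrieval step only, assuming a detector has already produced per-product crops and that some encoder maps both crops and gallery images into a shared embedding space; the encoder and feature shapes are hypothetical and not part of the challenge kit.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale each feature vector to unit length for cosine similarity."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)

def retrieve(crop_features, gallery_features, top_k=5):
    """Rank gallery images by cosine similarity for each detected product crop.

    crop_features:    (num_crops, d) embeddings of detected product boxes
    gallery_features: (num_gallery, d) embeddings of single-product gallery images
    """
    sims = l2_normalize(crop_features) @ l2_normalize(gallery_features).T
    return np.argsort(-sims, axis=1)[:, :top_k]  # top-k gallery indices per crop
```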
Rank | Participant team |
---|---|
1st | Baojun Li, Gengxin Wang, Jiamian Huang, Tao Liu, Zhiwei Shi, Zhimeng Wang. Joyy Al Research [slide] |
2nd | Yanxin Long, Shuai Lin. Sun Yat-sen University |
3rd | Hanyu Zhang, Pengliang Sun, Xing Liu. Chinese University of Hong Kong |
This track aims to equip classification networks with the ability of object localization [7, 8, 9]. The dataset is built upon the image classification/localization track of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), which includes a total of 1.2 million training images from 1,000 categories. We provide pixel-level annotations of 44,271 images (validation/testing: 23,151/21,120) for evaluation.
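The leaderboard reports Peak_IoU and Peak_Threshold. One plausible reading of these metrics (the official evaluation code should be treated as authoritative) is sketched below: binarize each 8-bit localization map at every threshold in [0, 255], aggregate intersection and union over the whole evaluation set, and report the peak IoU together with the threshold at which it occurs.

```python
import numpy as np

def peak_iou(loc_maps, gt_masks, thresholds=range(256)):
    """Sweep a binarization threshold over 8-bit localization maps and return
    the peak dataset-level IoU and the threshold at which it occurs."""
    best_iou, best_t = 0.0, 0
    for t in thresholds:
        inter = union = 0
        for loc, gt in zip(loc_maps, gt_masks):
            pred = loc > t            # foreground at this threshold
            g = gt.astype(bool)
            inter += np.logical_and(pred, g).sum()
            union += np.logical_or(pred, g).sum()
        iou = inter / union if union else 0.0
        if iou > best_iou:
            best_iou, best_t = iou, t
    return best_iou, best_t
```

Aggregating intersection and union over all images before dividing yields a single IoU curve for the whole set, which is consistent with a single Peak_Threshold being reported per submission.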
Rank | Participant team | Peak_IoU | Peak_Threshold |
---|---|---|---|
1st | Xun Feng¹, Zhenyuan Chen¹, Zhendong Wang¹, Yibing Zhan², Chen Gong¹. ¹Nanjing University of Science and Technology, ²JD Explore Academy, JD.com [slide] | 0.697 | 149 |
2nd | Yonsei-CVPR | 0.55 | 41 |
This track aims to recognize human parts (19 semantic categories in total) within high-resolution images by learning from low-resolution ones, a setting that has rarely been explored before. To this end, we annotated 10,500 single-person images (training/validation/testing: 6,000/500/4,000) with an average resolution of 3950×2200. Besides the provided high-resolution images, off-the-shelf low-resolution datasets such as LIP and Pascal-Person-Part are welcome to be adopted for pre-training. This new track poses a new task of learning from imperfect data: transferring knowledge learned from low-resolution images to high-resolution images. More details of this challenge are provided at https://competitions.codalab.org/competitions/30375
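As one plausible baseline for bridging the resolution gap (not an official method of this track), a parser trained on low-resolution crops can be applied to roughly 3950×2200 inputs via sliding-window inference; `model` below is a hypothetical callable that returns per-pixel logits over the 19 part classes for a square crop.

```python
import numpy as np

def _starts(size, crop, stride):
    """Window start positions covering [0, size), clamping the last window to
    the image border (assumes size >= crop, true for this track's images)."""
    xs = list(range(0, size - crop + 1, stride))
    if xs[-1] != size - crop:
        xs.append(size - crop)
    return xs

def sliding_window_parse(image, model, crop=512, stride=256, num_parts=19):
    """Tile a high-resolution image into overlapping crops at the training
    resolution, average the per-pixel logits, and return the label map."""
    h, w = image.shape[:2]
    logits = np.zeros((h, w, num_parts), dtype=np.float32)
    counts = np.zeros((h, w, 1), dtype=np.float32)
    for y in _starts(h, crop, stride):
        for x in _starts(w, crop, stride):
            logits[y:y + crop, x:x + crop] += model(image[y:y + crop, x:x + crop])
            counts[y:y + crop, x:x + crop] += 1
    return (logits / counts).argmax(-1)  # (H, W) map of part labels
```

Overlapping windows (stride smaller than the crop size) smooth predictions at tile borders at the cost of extra forward passes.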
Rank | Participant team | eIoU (%) | mIoU (%) |
---|---|---|---|
1st | Lu Yang¹, Liulei Li²,⁴, Tianfei Zhou³, Wenguan Wang³, Yi Liu⁴, Qing Song¹. ¹BUPT-PRIV, ²BIT, ³ETH Zurich, ⁴Baidu [slide] | 48.29 | 79.32 |
2nd | DeepBlueAI team [slide] | 46.24 | 77.49 |
3rd | DISL | 43.87 | 76.72 |