In conjunction with this workshop, we hold four challenge tracks this year.
This track targets learning to perform object semantic segmentation using only image-level annotations as supervision [1, 2, 3]. The dataset is built upon the image detection track of the ImageNet Large Scale Visual Recognition Competition (ILSVRC), which includes 456,567 training images from 200 categories in total. We provide pixel-level annotations of 15K images (validation/testing: 5,000/10,000) for evaluation.
| Rank | Participant team | Mean IoU | Mean accuracy | Pixel accuracy |
| --- | --- | --- | --- | --- |
| 1st | Shuo Li, Zehua Hao, Yaoyang Du, Fang Liu#, Licheng Jiao#. Xidian University, IPIU Lab [slide] | 49.06 | 68.1 | 86.64 |
| 2nd | Junwen Pan, Yongjuan Ma and Pengfei Zhu. Tianjin University [slide] | 49.03 | 67.87 | 87.53 |
| 3rd | Xun Feng1, Zhenyuan Chen1, Zhendong Wang1, Yibing Zhan2, Chen Gong1. 1Nanjing University of Science and Technology, 2JD Explore Academy, JD.com [slide] | 39.68 | 53 | 82.18 |
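The three leaderboard metrics above (mean IoU, mean class accuracy, and pixel accuracy) can all be derived from one pixel-level confusion matrix. A minimal NumPy sketch of that computation follows; the function names are illustrative, and this is not the official evaluation code:

```python
import numpy as np

def confusion_matrix(gt, pred, num_classes):
    """Accumulate a pixel-level confusion matrix from flat label arrays."""
    mask = (gt >= 0) & (gt < num_classes)  # drop out-of-range / ignore labels
    return np.bincount(
        num_classes * gt[mask].astype(int) + pred[mask],
        minlength=num_classes ** 2,
    ).reshape(num_classes, num_classes)

def segmentation_scores(cm):
    """Return (mean IoU, mean class accuracy, pixel accuracy)."""
    tp = np.diag(cm)
    pixel_acc = tp.sum() / cm.sum()
    class_acc = tp / np.maximum(cm.sum(axis=1), 1)  # per-class recall
    iou = tp / np.maximum(cm.sum(axis=1) + cm.sum(axis=0) - tp, 1)
    return iou.mean(), class_acc.mean(), pixel_acc
```

Confusion matrices from individual images can simply be summed before scoring, so the whole validation set can be evaluated in one streaming pass.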
Given a photo containing multiple product instances and a user-provided description, this track aims to detect the box of each product and retrieve the correct single-product image from a gallery. We collect 1,132,830 real-world product photos from an e-commerce website, where each photo contains 2.83 products on average and corresponds to a user-provided description, together with a single-product gallery of 40,033 images for evaluating retrieval performance. We split off 9,220 photos and their corresponding descriptions as the test set and provide product-level bounding boxes for each photo. This new track poses a very common setting in real-world applications (e.g., e-commerce) and offers an interesting testbed for learning from imperfect data, covering weakly supervised object retrieval given a caption, fine-grained instance recognition, and cross-modality (i.e., text-image) object-level retrieval. More details of this challenge are provided at https://competitions.codalab.org/competitions/30123
| Rank | Participant team |
| --- | --- |
| 1st | Baojun Li, Gengxin Wang, Jiamian Huang, Tao Liu, Zhiwei Shi, Zhimeng Wang. Joyy Al Research [slide] |
| 2nd | Yanxin Long, Shuai Lin. Sun Yat-sen University |
| 3rd | Hanyu Zhang, Pengliang Sun, Xing Liu. Chinese University of Hong Kong |
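The official evaluation protocol lives on the CodaLab page above, but the cross-modal retrieval part of this track can be sanity-checked locally with a simple top-k accuracy over an embedding similarity matrix. A minimal sketch, where the function and the metric choice are illustrative assumptions rather than the official scoring code:

```python
import numpy as np

def top_k_retrieval_accuracy(query_emb, gallery_emb, gt_index, k=1):
    """Fraction of queries whose ground-truth gallery image appears among
    the k most cosine-similar gallery entries."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    g = gallery_emb / np.linalg.norm(gallery_emb, axis=1, keepdims=True)
    sim = q @ g.T                            # (num_queries, num_gallery)
    topk = np.argsort(-sim, axis=1)[:, :k]   # indices of the k best matches
    hits = (topk == np.asarray(gt_index)[:, None]).any(axis=1)
    return hits.mean()
```

Here each query embedding would come from a detected product crop (plus its caption, in a cross-modal model), and each gallery embedding from one single-product image.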
This track aims to equip classification networks with the ability of object localization [7, 8, 9]. The dataset is built upon the image classification/localization track of the ImageNet Large Scale Visual Recognition Competition (ILSVRC), which includes 1.2 million training images from 1,000 categories in total. We provide pixel-level annotations of 44,271 images (validation/testing: 23,151/21,120) for evaluation.
| Rank | Participant team | | |
| --- | --- | --- | --- |
| 1st | Xun Feng1, Zhenyuan Chen1, Zhendong Wang1, Yibing Zhan2, Chen Gong1. 1Nanjing University of Science and Technology, 2JD Explore Academy, JD.com [slide] | 0.697 | 149 |
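A common baseline for this weakly supervised localization setting [7, 8, 9] derives a box by thresholding a class activation map (CAM) produced by the classification network. A minimal sketch, assuming the CAM has already been computed; the threshold value and the bounding-box rule (box of all above-threshold pixels rather than the largest connected component) are illustrative choices, not the challenge's pipeline:

```python
import numpy as np

def cam_to_bbox(cam, threshold=0.2):
    """Derive (x_min, y_min, x_max, y_max) from a class activation map by
    min-max normalizing it and boxing all pixels above `threshold`."""
    cam = (cam - cam.min()) / max(cam.max() - cam.min(), 1e-8)
    ys, xs = np.where(cam > threshold)
    if len(xs) == 0:  # nothing above threshold: fall back to the full image
        return 0, 0, cam.shape[1] - 1, cam.shape[0] - 1
    return xs.min(), ys.min(), xs.max(), ys.max()
```

The box would then be rescaled from CAM resolution to input-image resolution before comparison with the ground-truth annotation.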
This track aims to recognize human parts (19 semantic categories in total) within high-resolution images by learning from low-resolution ones, a setting that has rarely been explored before. To this end, we annotated 10,500 single-person images (training/validation/testing: 6,000/500/4,000) with an average resolution of 3950 by 2200. Besides the provided high-resolution images, off-the-shelf low-resolution datasets such as LIP and Pascal-Person-Part may be adopted for pre-training. This new track poses a new task of learning from imperfect data: transferring knowledge learned from low-resolution images to high-resolution images. More details of this challenge are provided at https://competitions.codalab.org/competitions/30375
| Rank | Participant team | | |
| --- | --- | --- | --- |
| 1st | Lu Yang1, Liulei Li2, 4, Tianfei Zhou3, Wenguan Wang3, Yi Liu4, Qing Song1. 1BUPT-PRIV, 2BIT, 3ETH Zurich, 4Baidu [slide] | 48.29 | 79.32 |
| 2nd | DeepBlueAI team [slide] | 46.24 | 77.49 |
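A simple way to handle the resolution gap this track poses is to predict at the model's native low resolution and then upsample the label map back to the annotation's resolution before scoring. A minimal nearest-neighbor sketch of that upsampling step, as an illustrative helper rather than the official evaluation code:

```python
import numpy as np

def upsample_labels(label_map, out_h, out_w):
    """Nearest-neighbor upsampling of an integer label map, so a prediction
    made at low resolution can be compared against a high-resolution
    ground-truth annotation."""
    in_h, in_w = label_map.shape
    rows = np.arange(out_h) * in_h // out_h  # source row for each output row
    cols = np.arange(out_w) * in_w // out_w  # source col for each output col
    return label_map[rows[:, None], cols[None, :]]
```

Nearest-neighbor interpolation is used (rather than bilinear) because the values are discrete part labels, and averaging neighboring labels would produce meaningless in-between classes.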