In conjunction with this workshop, we will hold four challenges this year.

Track1

Weakly-supervised Semantic Segmentation

This track targets learning to perform object semantic segmentation using image-level annotations as supervision [1, 2, 3]. The dataset is built upon the object detection track of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [4], which includes 456,567 training images from 200 categories in total. We provide pixel-level annotations for 15K images (validation/testing: 5,000/10,000) for evaluation.

Evaluation: Mean Intersection-over-Union (IoU) score over 200 categories.
Download: The training dataset is available at ImageNet DET, Baidu Drive (pwd: 5e7g), and Google Drive; the validation and test datasets are available at Baidu Drive and Google Drive.
Note: The image-level label information can be extracted using the provided scripts.
Submission: https://evalai.cloudcv.org/web/challenges/challenge-page/556/overview
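As an illustration of the mean IoU metric named above, the following is a minimal sketch using NumPy. It is not the official evaluation script: the function name `mean_iou`, the ignore label 255, the choice to skip classes absent from both prediction and ground truth, and the assumption that predictions lie in [0, num_classes) are all assumptions made for this example.

```python
import numpy as np


def mean_iou(pred, gt, num_classes, ignore_label=255):
    """Mean IoU over the classes that occur in the prediction or ground truth.

    Assumptions (not taken from the challenge scripts): labels are integers
    in [0, num_classes), and ground-truth pixels equal to `ignore_label`
    are excluded from scoring.
    """
    # Cast to int64 so that gt * num_classes cannot overflow uint8 masks.
    pred = np.asarray(pred).astype(np.int64).ravel()
    gt = np.asarray(gt).astype(np.int64).ravel()

    keep = gt != ignore_label
    pred, gt = pred[keep], gt[keep]

    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(gt * num_classes + pred, minlength=num_classes ** 2)
    conf = conf.reshape(num_classes, num_classes)

    inter = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - inter

    # Skip classes that never appear, to avoid dividing by zero.
    valid = union > 0
    return float((inter[valid] / union[valid]).mean())


if __name__ == "__main__":
    pred = [[0, 1], [1, 1]]
    gt = [[0, 1], [0, 1]]
    print(mean_iou(pred, gt, num_classes=2))
```

To score a whole dataset, one would typically accumulate the confusion matrix over all images before computing the per-class IoU; whether the challenge server does this or averages per image is not stated here.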

Rank	Participant team	Mean IoU	Mean accuracy	Pixel accuracy
1st	Shuo Li, Zehua Hao, Yaoyang Du, Fang Liu^#, Licheng Jiao^#. Xidian University, IPIU Lab [slide]	49.06	68.1	86.64
2nd	Junwen Pan, Yongjuan Ma and Pengfei Zhu. Tianjin University [slide]	49.03	67.87	87.53
3rd	Xun Feng¹, Zhenyuan Chen¹, Zhendong Wang¹, Yibing Zhan², Chen Gong¹. ¹Nanjing University of Science and Technology, ²JD Explore Academy, JD.com [slide]	39.68	53	82.18

Track2

Weakly-supervised Product Retrieval

Given a photo containing multiple product instances and a user-provided description, this track aims to detect the bounding box of each product and retrieve the correct single-product image from the gallery. We collected 1,132,830 real-world product photos from an e-commerce website, where each photo contains 2.83 products on average and corresponds to a user-provided description, along with a single-product gallery of 40,033 images for evaluating retrieval performance. We split off 9,220 photos and their corresponding descriptions as the test set and provide product-level bounding boxes for each of these photos. This new track poses a setting that is very common in real-world applications (e.g., e-commerce) and offers an interesting testbed for learning from imperfect data, testing weakly-supervised object retrieval given a caption, fine-grained instance recognition, and cross-modality (i.e., text and image) object-level retrieval. More details of this challenge are provided at https://competitions.codalab.org/competitions/30123

Rank	Participant team
1st	Baojun Li, Gengxin Wang, Jiamian Huang, Tao Liu, Zhiwei Shi, Zhimeng Wang. Joyy AI Research [slide]
2nd	Yanxin Long, Shuai Lin. Sun Yat-sen University
3rd	Hanyu Zhang, Pengliang Sun, Xing Liu. Chinese University of Hong Kong

Track3

Weakly-supervised Object Localization

This track targets equipping classification networks with the ability to localize objects [7, 8, 9]. The dataset is built upon the image classification/localization track of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), which includes 1.2 million training images from 1000 categories in total. We provide pixel-level annotations for 44,271 images (validation/testing: 23,151/21,120) for evaluation.

Evaluation: IoU curve. Given the predicted object localization map, we calculate the IoU scores between the foreground pixels and the ground-truth masks under different thresholds. In the ideal curve, the highest IoU score is expected to be close to 1.0. The threshold value corresponding to the highest IoU score is expected to be 255, since higher threshold values reflect a higher contrast between the target object and the background.
Download: The validation dataset, test list, and evaluation scripts are available at Baidu Drive (pwd: z5yp) and Google Drive.
Submission: https://evalai.cloudcv.org/web/challenges/challenge-page/557/overview
Note: The evaluation server has encountered an error. Please send your zipped results to liutingtianna@gmail.com for evaluation.
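The IoU curve described in the evaluation entry above can be sketched as follows for a single image. This is only an illustrative example under stated assumptions, not the released evaluation script: the function names `iou_curve` and `peak` are hypothetical, the localization map is assumed to hold values in 0..255, and a pixel is treated as foreground when its value is greater than or equal to the threshold.

```python
import numpy as np


def iou_curve(loc_map, gt_mask):
    """Return an array of 256 IoU scores, one per threshold 0..255.

    Assumptions (not taken from the official scripts): `loc_map` holds
    values in 0..255, `gt_mask` is nonzero on the object, and foreground
    is defined as loc_map >= threshold.
    """
    loc_map = np.asarray(loc_map)
    gt = np.asarray(gt_mask).astype(bool)

    ious = np.zeros(256)
    for t in range(256):
        fg = loc_map >= t
        inter = np.logical_and(fg, gt).sum()
        union = np.logical_or(fg, gt).sum()
        ious[t] = inter / union if union > 0 else 0.0
    return ious


def peak(ious):
    """Return (peak IoU, threshold at which it is first reached)."""
    t = int(np.argmax(ious))
    return float(ious[t]), t


if __name__ == "__main__":
    loc_map = [[255, 200], [10, 0]]
    gt_mask = [[1, 1], [0, 0]]
    print(peak(iou_curve(loc_map, gt_mask)))
```

The two values returned by `peak` correspond in spirit to the Peak_IoU and Peak_Threshold columns reported for this track; how the official scripts aggregate the curve over the whole test set is not specified here.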

Rank	Participant team	Peak_IoU	Peak_Threshold
1st	Xun Feng¹, Zhenyuan Chen¹, Zhendong Wang¹, Yibing Zhan², Chen Gong¹. ¹Nanjing University of Science and Technology, ²JD Explore Academy, JD.com [slide]	0.697	149
2nd	Yonsei-CVPR	0.55	41

Track4

High-resolution Human Parsing

This track aims to recognize human parts (19 semantic categories in total) within high-resolution images by learning from low-resolution ones, a setting that has rarely been explored before. To this end, we annotated 10,500 single-person images (training/validation/testing: 6,000/500/4,000) with an average resolution of 3950 by 2200. Besides the provided high-resolution images, off-the-shelf low-resolution datasets such as LIP and Pascal-Person-Part may be used for pre-training. This new track poses a new task of learning from imperfect data: transferring knowledge learned from low-resolution images to high-resolution images. More details of this challenge are provided at https://competitions.codalab.org/competitions/30375

Details will be available soon.

Rank	Participant team	eIoU	mIoU
1st	Lu Yang¹, Liulei Li²,⁴, Tianfei Zhou³, Wenguan Wang³, Yi Liu⁴, Qing Song¹. ¹BUPT-PRIV, ²BIT, ³ETH Zurich, ⁴Baidu [slide]	48.29	79.32
2nd	DeepBlueAI team [slide]	46.24	77.49
3rd	DISL	43.87	76.72