In conjunction with this workshop, we will hold three challenges this year.
This track targets learning to perform object semantic segmentation using image-level annotations as supervision [1, 2, 3]. The dataset is built upon the image detection track of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), which includes 456,567 training images from 200 categories in total. We provide pixel-level annotations for 15K images (validation/testing: 5,000/10,000) for evaluation.
Given a photo containing multiple product instances and a user-provided description, this track aims to detect the bounding box of each product and retrieve the matching single-product image from the gallery. We collected 1,132,830 real-world product photos from an e-commerce website, where each photo contains 2.83 products on average and comes with a user-provided description, along with a single-product gallery of 40,033 images for evaluating retrieval performance. We hold out 9,220 photos and their corresponding descriptions as the test set and provide product-level bounding boxes for each photo. This new track reflects a very common setting in real-world applications (e.g., e-commerce) and offers an interesting testbed for learning from imperfect data, one that tests weakly supervised object retrieval given a caption, fine-grained instance recognition, and cross-modality (i.e., text and image) object-level retrieval. More details of this challenge are provided at https://competitions.codalab.org/competitions/30123
This track targets equipping classification networks with the ability to localize objects [7, 8, 9]. The dataset is built upon the image classification/localization track of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), which includes 1.2 million training images from 1,000 categories in total. We provide pixel-level annotations for 44,271 images (validation/testing: 23,151/21,120) for evaluation.
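As a rough illustration of how a classification network can yield localization, the sketch below follows the class activation mapping idea of [7]: the last convolutional feature maps are combined using the classifier weights of the target class, and a box is drawn around the strongly activated cells. It is a minimal, dependency-free sketch and not the challenge baseline. The function names, the nested-list data layout, and the 0.2 threshold ratio are illustrative assumptions, and the box is taken around all above-threshold cells rather than the largest connected region.

```python
def class_activation_map(features, class_weights):
    """Weighted sum of feature maps for one class.

    features: list of K feature maps, each an H x W nested list (assumed layout).
    class_weights: K weights of the target class in the final linear layer.
    Returns an H x W nested list with cam[y][x] = sum_k w_k * f_k[y][x].
    """
    height, width = len(features[0]), len(features[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for weight, fmap in zip(class_weights, features):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * fmap[y][x]
    return cam


def bounding_box(cam, threshold_ratio=0.2):
    """Tightest box (x_min, y_min, x_max, y_max) around cells whose activation
    exceeds threshold_ratio * max activation. The ratio is an assumed value.
    Returns None when no cell passes the threshold."""
    peak = max(max(row) for row in cam)
    cutoff = threshold_ratio * peak
    cells = [(x, y)
             for y, row in enumerate(cam)
             for x, value in enumerate(row)
             if value > cutoff]
    if not cells:
        return None
    xs = [x for x, _ in cells]
    ys = [y for _, y in cells]
    return (min(xs), min(ys), max(xs), max(ys))


# Toy example: two 3x3 feature maps; the first one fires on the lower-right object.
toy_features = [
    [[0, 0, 0], [0, 1, 1], [0, 1, 1]],
    [[1, 0, 0], [0, 0, 0], [0, 0, 0]],
]
toy_weights = [2.0, 0.1]  # hypothetical classifier weights for one class
toy_cam = class_activation_map(toy_features, toy_weights)
toy_box = bounding_box(toy_cam)
print(toy_box)
```

In practice the map would be upsampled to the input resolution before thresholding; that step is omitted here to keep the sketch short.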
This track aims to recognize human parts (19 semantic categories in total) in high-resolution images by learning from low-resolution ones, a setting that has rarely been explored. To this end, we annotated 10,500 single-person images (training/validation/testing: 6,000/500/4,000) with an average resolution of 3950 by 2200. Besides the provided high-resolution images, off-the-shelf low-resolution datasets such as LIP and Pascal-Person-Part may be used for pre-training. This new track poses a new task of learning from imperfect data: transferring knowledge learned from low-resolution images to high-resolution images. More details of this challenge are provided at https://competitions.codalab.org/competitions/30375
This year, we have two strict rules for all competitors.
[1] George Papandreou, Liang-Chieh Chen, Kevin Murphy, and Alan L Yuille. Weakly-and semi-supervised learning of a DCNN for semantic image segmentation. In ICCV, 2015.
[2] Yunchao Wei, Huaxin Xiao, Honghui Shi, Zequn Jie, Jiashi Feng, and Thomas S Huang. Revisiting dilated convolution: A simple approach for weakly-and semi-supervised semantic segmentation. In CVPR, 2018.
[3] Peng-Tao Jiang, Qibin Hou, Yang Cao, Ming-Ming Cheng, Yunchao Wei, and Hong-Kai Xiong. Integral object mining via online attention accumulation. In ICCV, 2019.
[4] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.
[5] Bolei Zhou, Hang Zhao, Xavier Puig, Sanja Fidler, Adela Barriuso, and Antonio Torralba. Scene parsing through ADE20K dataset. In CVPR, 2017.
[6] Rui Qian, Yunchao Wei, Honghui Shi, Jiachen Li, Jiaying Liu, and Thomas Huang. Weakly supervised scene parsing with point-based distance metric learning. In AAAI, 2019.
[7] Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In CVPR, 2016.
[8] Xiaolin Zhang, Yunchao Wei, Jiashi Feng, Yi Yang, and Thomas Huang. Adversarial complementary learning for weakly supervised object localization. In CVPR, 2018.
[9] Xiaolin Zhang, Yunchao Wei, Guoliang Kang, Yi Yang, and Thomas Huang. Self-produced guidance for weakly-supervised object localization. In ECCV, 2018.
[10] Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, and Hartwig Adam. Encoder-decoder with atrous separable convolution for semantic image segmentation. In ECCV, 2018.