2nd International Workshop on 
Compact and Efficient Feature Representation and Learning 
in Computer Vision

September 9th, 2018

Munich, Germany 
in conjunction with ECCV 2018

Feature representation is at the core of many computer vision problems, such as image classification, object detection, tracking and recognition, and image/video retrieval. This rapidly developing field is concerned with how we can best extract meaningful and useful features that support effective machine learning. Over the past two decades we have witnessed remarkable progress in feature representation and learning, from the hand-crafted features that first entered the scene to the deep learning based features that dominate computer vision today. Hand-crafted features are not data-adaptive and are usually labor-intensive to design. Deep Convolutional Neural Networks (DCNNs), in contrast, are hierarchical structures that attempt to learn representations of data with multiple levels of abstraction automatically.

However, existing DCNN based features often rely on computationally expensive deep models, which are too slow for numerous applications. With the number of images and videos growing exponentially, the emerging phenomenon of big dimensionality (millions of dimensions and above) exposes the inadequacies of existing approaches, whether hand-crafted or deep learning based. There is thus a pressing need for new scalable and efficient approaches that can cope with this explosion of dimensionality. In addition, with the prevalence of social media networks and portable/wearable devices, which have limited computational capabilities and storage space, the demand for sophisticated real-time applications that handle large-scale visual data on such devices is rising.

Therefore, there is a growing need for feature descriptors that are fast to compute, memory efficient, and yet exhibit good discriminability and robustness. A number of promising efforts, such as compact binary features, DCNN quantization, simple and efficient neural network architectures, and big dimensionality-oriented feature selection, have appeared in top conferences and journals. The aim of this workshop is to stimulate researchers in computer vision to present high quality work and to provide a cross-fertilization ground for discussions on the next steps in this important research area.
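As one concrete illustration of what "compact binary features" can look like, the sketch below hashes real-valued descriptors to short binary codes using random hyperplane projections and compares them with the Hamming distance. This is a generic LSH-style toy example, not a method from any workshop paper; all names, dimensions, and values are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

def binary_code(features, projections):
    """Map a real-valued feature vector to {0,1} bits via the sign of random projections."""
    return (features @ projections > 0).astype(np.uint8)

def hamming_distance(a, b):
    """Number of differing bits between two binary codes."""
    return int(np.count_nonzero(a != b))

# Compress 128-D descriptors to 32-bit codes.
projections = rng.standard_normal((128, 32))
x = rng.standard_normal(128)
y = x + 0.05 * rng.standard_normal(128)   # a slightly perturbed copy of x
z = rng.standard_normal(128)              # an unrelated descriptor

d_near = hamming_distance(binary_code(x, projections), binary_code(y, projections))
d_far = hamming_distance(binary_code(x, projections), binary_code(z, projections))
print(d_near, d_far)
```

Similar descriptors tend to fall on the same side of most random hyperplanes, so the near pair typically differs in far fewer bits than the unrelated pair, while each 128-float descriptor shrinks to just 4 bytes.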

Important Dates

Paper Submission Deadline: July 15, 2018 (extended from July 8, 2018)
Notification of Acceptance: August 2, 2018 (extended from July 31, 2018)
Camera-Ready Due: September 25, 2018
Workshop: September 9, 2018 (Full Day)

Topics

We encourage researchers to study and develop new compact and efficient feature representations that are fast to compute, memory efficient, and yet exhibit good discriminability and robustness. We also encourage new theories and applications related to features for dealing with these challenges. We are soliciting original contributions that address a wide range of theoretical and practical issues, including but not limited to:

1. New features (handcrafted features, efficient and novel DCNN architectures, and feature learning in a supervised, weakly supervised, or unsupervised manner) that are fast to compute, memory efficient, and suitable for large-scale problems;

2. New compact and efficient features that are suitable for wearable devices (e.g., smart glasses, smart phones, smart watches) with strict requirements for computational efficiency and low power consumption; 

3. Hashing/binary code learning and related applications in different domains, e.g., content-based retrieval;

4. Big dimensionality-oriented dimensionality reduction and feature selection;  

5. Evaluations of current handcrafted descriptors and deep learning based features; 

6. Other applications in different domains, such as robotics and medical image analysis.

Keynote Speakers

Title: Robust and Compact Deep Learning Features for Image Matching and Retrieval

Abstract: Feature representations based on deep learning allow for the study of feature distribution properties that are critical for robust visual matching and retrieval. Various loss functions, such as pairwise, triplet, or global losses, have been developed recently with impressive performance. In this talk, we will present a few ideas for achieving desirable feature distribution properties (spread-out, compact, discriminative) for large-scale applications. We will present a general regularization step that makes features spread out in the feature space. We will present a temperature control mechanism that makes classifier-based features more compact and separable. Finally, we will tackle the large-scale indexing problem: how to preserve the logarithmic search ability of the recently emerged small-world graph indexing scheme when reducing features to compact binary hash codes. (Based on joint work with Xu Zhang and Svebor Karaman.)
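As a generic illustration of the "temperature control" idea mentioned in the abstract (a minimal hypothetical sketch, not the speaker's actual method), dividing classifier logits by a temperature T < 1 sharpens the softmax distribution, which during training encourages more compact, separable feature clusters:

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits / T; a smaller T sharpens the distribution."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()          # subtract the max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

logits = [2.0, 1.0, 0.1]            # made-up class scores
p_default = softmax_with_temperature(logits, temperature=1.0)
p_sharp = softmax_with_temperature(logits, temperature=0.5)
print(p_default, p_sharp)
```

With T = 0.5 the top class receives a noticeably larger share of the probability mass than with T = 1, so the cross-entropy loss pushes features of the same class closer together.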

Biography: Prof. Shih-Fu Chang is the Richard Dicker Professor at Columbia University, with appointments in both Electrical Engineering Department and Computer Science Department. His research is focused on multimedia information retrieval, computer vision, machine learning, and signal processing. A primary goal of his work is to develop intelligent systems that can extract rich information from the vast amount of visual data such as those emerging on the Web, collected through pervasive sensing, or available in gigantic archives. His work on content-based visual search in the early 90's, VisualSEEk and VideoQ, set the foundation of this vibrant area. Over the years, he continued to develop innovative solutions for image/video recognition, multimodal analysis, visual content ontology, image authentication, and compact hashing for large-scale indexing. His scholarly work can be seen in more than 300 peer-reviewed publications, many best paper awards, more than 30 issued patents, and technologies licensed to seven companies. He was listed as the Most Influential Scholar in the field of Multimedia by Aminer in 2016. For his long-term pioneering contributions, he has been awarded the IEEE Signal Processing Society Technical Achievement Award, ACM Multimedia Special Interest Group Technical Achievement Award, Honorary Doctorate from the University of Amsterdam, the IEEE Kiyo Tomiyasu Award, and IBM Faculty Award. He served as Chair of ACM SIGMM (2013-2017), Chair of Columbia Electrical Engineering Department (2007-2010), the Editor-in-Chief of the IEEE Signal Processing Magazine (2006-8), and advisor for several international research institutions and companies. He is a Fellow of the American Association for the Advancement of Science (AAAS), ACM, and IEEE.
Title: Invariance and Stability of Deep Convolutional Representations (Slides)

Abstract: In this work, we study invariant properties of convolutional neural networks, their stability to image deformations, and their model complexity from a kernel point of view. This is achieved by generalizing the multilayer kernel construction introduced in the context of convolutional kernel networks and by studying the geometry of the corresponding reproducing kernel Hilbert space. We show that the signal representation is stable and that models from this functional space, such as a large class of convolutional neural networks with homogeneous activation functions, may enjoy the same stability. In particular, we study the norm of such models, which acts as a measure of complexity that controls both stability and generalization. This is a joint work with Alberto Bietti.
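Schematically, the deformation-stability bounds studied in this line of work take roughly the following form (constants and norms here are indicative of Bietti and Mairal's analysis, not quoted exactly): the representation $\Phi(x)$ of a signal $x$ moves only a little under a smooth deformation $\tau$.

```latex
% L_\tau x(u) = x(u - \tau(u)) denotes the deformed signal;
% \sigma is the scale of the final pooling layer.
\| \Phi(L_\tau x) - \Phi(x) \|
  \;\le\;
  \Big( C_1 \,\|\nabla \tau\|_\infty + C_2 \,\frac{\|\tau\|_\infty}{\sigma} \Big) \,\|x\|
```

A small $\|\nabla\tau\|_\infty$ (a nearly rigid deformation) thus guarantees a small change in the representation, which is the stability property referred to in the abstract.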

Biography: Dr. Julien Mairal is a research scientist at Inria. He received a graduate degree from Ecole Polytechnique, France, in 2005, and a Ph.D. from Ecole Normale Supérieure, Cachan, France, in 2010. He spent two years as a post-doctoral researcher in the statistics department of UC Berkeley before joining Inria in 2012. His research interests include machine learning, computer vision, mathematical optimization, and statistical image and signal processing. In 2013, he received the Cor Baayen prize, awarded every year by ERCIM to a promising young researcher in computer science and applied mathematics. In 2016, he received a Starting Grant from the European Research Council (ERC), and in 2017 he received the IEEE PAMI young researcher award. He has served as an area chair for CVPR, ICCV, ECCV, NIPS, ICLR, and ICML. He is a senior member of the IEEE, and he is an associate editor of the International Journal of Computer Vision (IJCV), the Journal of Mathematical Imaging and Vision (JMIV), and the SIAM Journal on Imaging Sciences (SIIMS).
Title: Learning and Adapting from the Web for Visual Recognition (Slides)

Abstract: Visual recognition is of fundamental importance to computer vision and downstream applications. Substantial research over the last half decade shows that coupling powerful deep learning models with a big manually compiled dataset can yield excellent results for generic object, scene, and human activity recognition. However, it is both costly and laborious to select and label data, not to mention that data is extremely scarce in some domains. As a result, transfer learning, semi-supervised learning, self-supervised learning, webly-supervised learning, etc. have gained increasing interest recently, thanks to their potential to leverage deep models with low-cost data. In this talk, I will progressively present three of our recent works on learning and adapting from Web data for visual recognition. In the first setting, we show how to effectively learn from Google Images with an outlier-resilient semi-supervised learning method. In the second setting, we investigate surprisingly accurate geometric labels encoded by the 3D videos/movies that are widely available on the Web. Finally, we let Web images and Web videos mutually vote for query-relevant subsets in order to automatically harvest a webly labeled, diverse training set for human activity recognition. In parallel with these efforts to exploit Web data, I will also present in depth some of the techniques behind these works (e.g., the multiple shades of dropout, curriculum learning, and kernel mean matching), shedding light on their variations in our other works on WGAN and domain adaptation.

Biography: Dr. Boqing Gong is a Principal Researcher at Tencent AI Lab in Seattle, working on machine learning and computer vision. Before joining Tencent, he was a tenure-track Assistant Professor at the University of Central Florida (UCF). His research at UCF was supported in part by an NSF CRII award (co-PI, received in 2016) and an NSF BIGDATA award (PI, received in 2017), both of which were the first of their kind ever granted to UCF. He actively serves on NSF panels and the program committees of computer vision conferences (CVPR, ICCV, ECCV, etc.) and machine learning conferences (ICML, NIPS, AISTATS, etc.). He was an area chair of IEEE WACV'18 and a mentor of its PhD forum. In 2015, he received a Ph.D. degree in Computer Science from the University of Southern California, where his work was partially supported by the Viterbi Fellowship.

Program

(Location: N1189, Building N1, TU München)
Time Event
09:00 - 09:05 Welcome Introduction
09:05 - 09:40 Invited Talk by Prof. Shih-Fu Chang
09:40 - 10:15 Invited Talk by Dr. Julien Mairal
10:15 - 10:40 Coffee Break
10:40 - 12:00 Oral Session 1 (4 presentations: 20min each)
12:00 - 14:00 Lunch Break
14:00 - 14:30 Invited Talk by Dr. Boqing Gong
14:30 - 15:30 Poster Session
15:30 - 16:50 Oral Session 2 (4 presentations: 20min each)
16:50 - 16:55 Awards announced by the CEO of IIAI, Prof. Ling Shao
16:55 - 17:00 Closing Remarks

Oral Session 1 (10:40 - 12:00)

Oral Session 2 (15:30 - 16:50)

Poster Session (14:30 - 15:30)

Awards

CEFRL 2018 will announce one Best Paper Award and one Best Paper Honorable Mention Award, fully sponsored by the Inception Institute of Artificial Intelligence:

※ CEFRL 2018 Best Paper Award ($1,500)
Compact Deep Aggregation for Set Retrieval, Yujie Zhong, Relja Arandjelović, Andrew Zisserman

※ CEFRL 2018 Best Paper Honorable Mention Award ($500)
DNN Feature Map Compression using Learned Representation over GF(2), Denis A Gudovskiy, Alec Hodgkinson, Luca Rigazio

Submission

All submissions will be handled electronically via the workshop’s CMT Website. Click the following link to go to the submission site:
https://cmt3.research.microsoft.com/CEFRL2018
Example submission paper with detailed instructions: eccv2018submission_workshop.pdf
Papers should describe original and unpublished work on the related topics. Each paper will receive double-blind review, moderated by the workshop chairs. Authors should take into account the following:
- All papers must be written and presented in English.
- All papers must be submitted in PDF format. The workshop paper format guidelines are the same as the Main Conference papers.
- The maximum paper length is 14 pages (excluding references). Note that shorter submissions are also welcome.
- The accepted papers will be published in the ECCV workshop proceedings (as post-proceedings with Springer).

Organizers