Ting Yao

About Me

Ting Yao (姚霆) Google Scholar tingyao.ustc@gmail.com

Ting Yao is currently the Co-Founder and CTO of HiDream.ai, propelling it to become the top generative artificial intelligence company in China. Previously, he was a Principal Researcher with JD AI Research in Beijing, China, and a Researcher with Microsoft Research Asia in Beijing, China. Dr. Yao has co-authored more than 100 peer-reviewed papers in top-tier conferences and journals. His seminal work on the Pseudo-3D network has become one of the standard 3D convolutional neural networks for spatiotemporal data analysis, and his video-to-text dataset (MSR-VTT) has been used by 500+ institutes worldwide. His research has led to several commercial products with millions of daily active users.

Dr. Yao currently serves as an associate editor of IEEE Transactions on Multimedia, Pattern Recognition Letters, and Multimedia Systems, and has frequently served as an area chair and keynote/tutorial speaker at numerous conferences. He has organized 10+ high-quality workshops/challenges at flagship conferences. His work has led to many awards, including the 2015 ACM SIGMM Outstanding Ph.D. Thesis Award, the 2019 ACM SIGMM Rising Star Award, the 2019 IEEE Computer Society TCMC Rising Star Award, the 2022 IEEE ICME Multimedia Star Innovator Award, and 2022 Chinese Intelligent Computing Technology Innovators, as well as 10 championships in international multimedia analytics competitions.

Dr. Yao received the B.Sc. degree in theoretical and applied mechanics, the B.Eng. double degree in electronic information engineering, and the M.Eng. degree in signal and information processing, all from the University of Science and Technology of China, Hefei, China. He completed a Ph.D. in computer science (2014) at the City University of Hong Kong, advised by Prof. Chong-Wah Ngo.

Distinctions

Awards

Chinese Intelligent Computing Technology Innovators, 2022.
First Grade Scientific and Technology Prize of the China Society of Image and Graphics (CSIG), "Key Technologies and Applications of Ultrafine Image Recognition," 2022.
IEEE ICME Multimedia Star Innovator Award, "for outstanding innovative contribution in the area of Multimedia Intelligence," 2022.
Nicolas D. Georganas Best Paper Award, "Smart Director: An Event-Driven Directing System for Live Broadcasting," ACM Transactions on Multimedia Computing, Communications, and Applications, 2022.
IEEE Computer Society TCMC Rising Star Award, "for contributions in video content recognition and description generation," 2019.
ACM SIGMM Rising Star Award, "for contributions in activity recognition and video captioning," 2019.
ACM SIGMM Outstanding Ph.D. Thesis Award, "Multimedia Search by Self, External, and Crowdsourcing Knowledge," 2015.
Best Open Source Award, "X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics," ACM International Conference on Multimedia (ACM MM), 2021.
Outstanding Associate Editor, IEEE Transactions on Multimedia, 2021.
Second Place Best Demo Award, "Animating Your Life: Real-Time Video-to-Animation Translation," ACM International Conference on Multimedia (ACM MM), 2019.
Best Paper Finalist, "Action Recognition by Learning Deep Multi-Granular Spatio-Temporal Video Representation," ACM on International Conference on Multimedia Retrieval (ICMR), 2016.
Best Paper Award, "Click-boosting Random Walk for Image Search Reranking," International Conference on Internet Multimedia Computing and Service (ICIMCS), 2013.

Top-performing Systems in International Competitions

Rank 1, No Interaction Track, the first workshop on Generalizable Policy Learning in the Physical World with International Conference on Learning Representations (ICLR), 2022.
Rank 1, No Restriction Track, the first workshop on Generalizable Policy Learning in the Physical World with International Conference on Learning Representations (ICLR), 2022.
Rank 1, Open-set Image Classification task, Open World Vision Challenge, with IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
Rank 1, Multi-Source Domain Adaptation Track, Visual Domain Adaptation Challenge, with IEEE International Conference on Computer Vision (ICCV), 2019.
Rank 1, Trimmed Activity Recognition (Kinetics), International Challenge on Activity Recognition (ActivityNet), with IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
Rank 1, Open-set Classification Track, Visual Domain Adaptation Challenge, with European Conference on Computer Vision (ECCV), 2018.
Rank 1, Detection Track, Visual Domain Adaptation Challenge, with European Conference on Computer Vision (ECCV), 2018.
Rank 1, Segmentation Track, Visual Domain Adaptation Challenge, with IEEE International Conference on Computer Vision (ICCV), 2017.
Rank 1, Dense-Captioning Events in Videos, International Challenge on Activity Recognition (ActivityNet), with IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
Rank 1, COCO Image Captioning Challenge, 2017.

Professional Activities

Service in Professional Organizations

Elected Member, IEEE Signal Processing Society: Image, Video, and Multidimensional Signal Processing Technical Committee (IVMSP-TC), 2023 – present
Member, IEEE Computer Society: Technical Committee on Multimedia Computing (TCMC), 2019 – present
Member, IEEE Computer Society: Technical Committee on Pattern Analysis and Machine Intelligence (PAMI-TC), 2020 – present

Journal Editorship

Associate Editor, Pattern Recognition Letters, 2022.02 – present
Associate Editor, IEEE Transactions on Multimedia, 2019.11 – 2023.11
Associate Editor, Multimedia Systems, 2018.10 – present
Guest Editor, ACM Transactions on Multimedia Computing, Communications, and Applications, Special Issue on “Deep Learning for Intelligent Multimedia Analytics,” 2019.

Conference/Workshop/Challenge Organizer

Technical Demo and Video Program Chair, ACM International Conference on Multimedia (ACM MM), 2023.
Challenge Lead-organizer, Conversational Head Generation Challenge, with ACM International Conference on Multimedia (ACM MM), 2023 & 2022.
Challenge Lead-organizer, Pre-training for Video Understanding Challenge, with ACM International Conference on Multimedia (ACM MM), 2022 & 2021.
Workshop Lead-organizer, the first International Workshop on Theories, Applications, and Cross Modality for Self-Supervised Learning Models, with International Conference on Pattern Recognition (ICPR), 2022.
Workshop Lead-organizer, the first International Workshop on Deep Learning for Human Centric Activity Understanding, with International Conference on Pattern Recognition (ICPR), 2020.
Challenge Lead-organizer, Pre-training for Video Captioning Challenge, with ACM International Conference on Multimedia (ACM MM), 2020.
Challenge/Workshop Co-organizer, the first Workshop and Challenge on Conceptual Captions, with IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
Challenge Lead-organizer, MSR Video to Language Challenge, with ACM International Conference on Multimedia (ACM MM), 2017 & 2016.
Workshop Lead-organizer, AI Technology for Visual Fashion Computing, with IEEE International Conference on Multimedia & Expo (ICME), 2019.
Workshop Lead-organizer, Deep Learning for Intelligent Multimedia Analytics, with IEEE International Conference on Multimedia & Expo (ICME), 2017.
Special Session Chair, Multimedia Computing for Intelligent Life, with International Conference on Multimedia Modeling (MMM), 2017.

Keynote/Tutorial Speaker

Keynote Speaker, "Key Technologies and Applications of Ultrafine Image Recognition," CSIG Award Forum, with Chinese Congress on Image and Graphics (CCIG), 2023.
Keynote Speaker, "Deep Spatiotemporal Visual Representation Learning and Applications," Forum on Video Action Detection and Recognition, with Chinese Conference on Pattern Recognition and Computer Vision (PRCV), 2022.
Keynote Speaker, "From Visual Representation Learning to Visual-Language Intelligence," Forum on Intelligent Computing Techniques for Vision and Language, with China Multimedia (ChinaMM), 2022.
Keynote Speaker, "Trustworthy Visual Understanding: Generic Representation Learning and Explainable Interpretation," the first International Workshop on Trustworthy AI for Multimedia Computing, with ACM International Conference on Multimedia (ACM MM), 2021.
IEEE TCMC Award Talks Speaker, "Vision to Language: from Independency, Interaction, to Symbiosis," IEEE International Conference on Multimedia Information Processing and Retrieval (MIPR), 2021.
Keynote Speaker, "Vision to Language: from Independency, Interaction, to Symbiosis," the Second Workshop on Multimodal Natural Language Processing, with the CCF International Conference on Natural Language Processing and Chinese Computing (NLPCC), 2021.
Keynote Speaker, "Vision to Language: from Independency, Interaction, to Symbiosis," the International Workshop on Multi-Modal Deep Learning: Challenges and Applications, with International Conference on Pattern Recognition (ICPR), 2020.
ACM SIGMM Award Talks Speaker, "Deep Video Understanding: Action Recognition and Language Generation," ACM International Conference on Multimedia (ACM MM), 2019.
Tutorial Speaker, "Vision and Text: Search, Generation and Translation," IEEE International Conference on Image Processing (ICIP), 2019.
Tutorial Speaker, "Human Behavior Understanding: From Human-Oriented Analysis to Action Recognition," IEEE International Conference on Multimedia & Expo (ICME), 2019.
Tutorial Speaker, "Human Behavior Understanding: From Action Recognition to Complex Event Detection," ACM International Conference on Multimedia (ACM MM), 2018.
Keynote Speaker, "Describing Multimedia by Localization and Generation," 1st Person in Context (PIC) Workshop and Challenge, with European Conference on Computer Vision (ECCV), 2018.
Keynote Speaker, "Describing Multimedia by Localization and Generation," 1st Vision and Learning Seminar (VALSE) Workshop on Vision and Language, 2018.
ACM SIGMM Award Talks Speaker, "Bridging Vision and Text for Multimedia Search," ACM International Conference on Multimedia (ACM MM), 2015.

Area Chair / Senior PC

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023, 2024.
International Conference on Pattern Recognition (ICPR), 2020, 2022.
IEEE International Conference on Image Processing (ICIP), 2019, 2021, 2022, 2023, 2024.
IEEE International Conference on Multimedia & Expo (ICME), 2018, 2019.
ACM International Conference on Multimedia (ACM MM), 2018, 2019, 2023.
International Joint Conference on Artificial Intelligence (IJCAI), 2019, 2020, 2021.
AAAI Conference on Artificial Intelligence (AAAI), 2020, 2021, 2022.

Selected Publications (Full List)

HIRI-ViT: Scaling vision transformer with high resolution inputs

T. Yao, Y. Li, Y. Pan, T. Mei
IEEE TPAMI, 2024

Dual vision transformer

T. Yao, Y. Li, Y. Pan, Y. Wang, X.P. Zhang, T. Mei
IEEE TPAMI, 2023

Bi-calibration networks for weakly-supervised video representation learning

F. Long, T. Yao, Z. Qiu, X. Tian, J. Luo, T. Mei
International Journal of Computer Vision, 2023

Control3d: Towards controllable text-to-3d generation

Y. Chen, Y. Pan, Y. Li, T. Yao, T. Mei
ACM Multimedia, 2023

Wave-vit: Unifying wavelet and transformers for visual representation learning

T. Yao, Y. Pan, Y. Li, C.-W. Ngo, T. Mei
ECCV, 2022

Contextual transformer networks for visual recognition

Y. Li, T. Yao, Y. Pan, T. Mei
IEEE TPAMI, 2022

Seco: Exploring sequence supervision for unsupervised representation learning

T. Yao, Y. Zhang, Z. Qiu, Y. Pan, T. Mei
AAAI, 2021

Smart director: An event-driven directing system for live broadcasting

Y. Pan, Y. Chen, Q. Bao, N. Zhang, T. Yao, J. Liu, T. Mei
ACM TOMM, 2021 (Nicolas D. Georganas Best Paper 2022)

X-modaler: A versatile and high-performance codebase for cross-modal analytics

Y. Li, Y. Pan, J. Chen, T. Yao, T. Mei
ACM Multimedia, 2021 (Best Open Source Award)

X-linear attention networks for image captioning

Y. Pan, T. Yao, Y. Li, T. Mei
CVPR, 2020

Joint contrastive learning with infinite possibilities

Q. Cai, Y. Wang, Y. Pan, T. Yao, T. Mei
NeurIPS, 2020

Learning spatio-temporal representation with local and global diffusion

Z. Qiu, T. Yao, C.-W. Ngo, X. Tian, T. Mei
CVPR, 2019

Hierarchy parsing for image captioning

T. Yao, Y. Pan, Y. Li, T. Mei
ICCV, 2019

Gaussian temporal awareness networks for action localization

F. Long, T. Yao, Z. Qiu, X. Tian, J. Luo, T. Mei
CVPR, 2019

Exploring visual relationship for image captioning

T. Yao, Y. Pan, Y. Li, T. Mei
ECCV, 2018

Fully convolutional adaptation networks for semantic segmentation

Y. Zhang, Z. Qiu, T. Yao, D. Liu, T. Mei
CVPR, 2018

Learning spatio-temporal representation with pseudo-3d residual networks

Z. Qiu, T. Yao, T. Mei
ICCV, 2017

Boosting image captioning with attributes

T. Yao, Y. Pan, Y. Li, Z. Qiu, T. Mei
ICCV, 2017

Video captioning with transferred semantic attributes

Y. Pan, T. Yao, H. Li, T. Mei
CVPR, 2017

Incorporating copying mechanism in image captioning for learning novel objects

T. Yao, Y. Pan, Y. Li, T. Mei
CVPR, 2017

To create what you tell: Generating videos from captions

Y. Pan, Z. Qiu, T. Yao, H. Li, T. Mei
ACM Multimedia, 2017

MSR-VTT: A large video description dataset for bridging video and language

J. Xu, T. Mei, T. Yao, Y. Rui
CVPR, 2016

Jointly modeling embedding and translation to bridge video and language

Y. Pan, T. Mei, T. Yao, H. Li, Y. Rui
CVPR, 2016

Highlight detection with pairwise deep ranking for first-person video summarization

T. Yao, T. Mei, Y. Rui
CVPR, 2016

Action recognition by learning deep multi-granular spatio-temporal video representation

Q. Li, Z. Qiu, T. Yao, T. Mei, Y. Rui, J. Luo
ACM ICMR, 2016 (Best Paper Finalist)

Why fly? when you can walk on the water!
