[Seminar] Vision-and-language Models: Opportunities and Limitations

Wednesday, April 5, 2023

11:00 am - 12:00 pm

Speaker

Vicente Ordóñez

Rice University

Location
PGH 232

Abstract

Training large-scale models that learn about the world purely through language has yielded impressive capabilities. However, models trained on both text and images have also produced an impressive set of recent results. I will summarize the extent to which vision-and-language models have the potential to replace some purely visually trained models, and review the evolution and progress of vision-and-language models over the years. I will also take the opportunity to discuss some recent work from my group in this area, including CLIP-Lite (AISTATS 2023), an effort to investigate how to train CLIP models on limited and scarce data, and Attention-Masking Consistency (AMC) (CVPR 2023), an effort to improve the visual grounding capabilities of vision-language models. Finally, I will discuss the impact of societal biases in these models and efforts to expose and mitigate those biases through systematic benchmarking and model interventions.

About the Speaker

Vicente Ordóñez is an Associate Professor in the Department of Computer Science at Rice University where he directs a research group focusing on computer vision, natural language processing and machine learning. He is also an Amazon Visiting Academic at Amazon Alexa AI. His focus is on building efficient visual recognition models that can perform tasks that leverage both images and text. He is a recipient of a Best Long Paper Award at EMNLP 2017 and the Best Paper Award – Marr Prize at ICCV 2013. He has also been the recipient of an NSF CAREER Award, an IBM Faculty Award, a Google Faculty Research Award, a Facebook Research Award, and a Google Inclusion Research Award. Previously, he was Assistant Professor in the Department of Computer Science at the University of Virginia, and obtained his PhD in Computer Science at the University of North Carolina at Chapel Hill. In the past, he has also been a visiting researcher at the Allen Institute for Artificial Intelligence and a visiting professor at Adobe Research.
