"Acerrimus ex omnibus nostris sensibus est sensus videndi" - 'The keenest of all our senses is the sense of sight'
A picture is worth a thousand words. How many words will a video be worth, then? Can we find the right words to represent a video?
My research, in plain terms: leveraging visual information and its associated text to build machines that learn better from data and match human-level perception and understanding.