Hello, my name is Siva

MS in Computer Science @ University of Virginia



"Acerrimus ex omnibus nostris sensibus est sensus videndi" (The keenest of all our senses is the sense of sight)

My Research Venn Diagram

A picture is worth a thousand words. How many words will a video be worth, then? Can we find the right words to represent a video?

Leverage visual information and associated text to build learning machines on par with human-level perception and understanding.

I work with Prof. Vicente Ordonez as part of the Vision and Language Research Group (VISLANG). I am also advised by Prof. Gabriel Robins.



Raspberry Pi + Amazon Echo = recognize faces at your command!
Code Report


Movie Trailer + Plot Summaries = classify genres using deep learning!
Code Report Demo


Sorting an album of images and associated captions!
Code Report Demo