Xiangyu Jin (Ph.D)

Department of Computer Science
University of Virginia
151 Engineer's Way
P. O. Box 400740
Charlottesville, VA 22904-4740

Email: xiangyu at virginia dot edu

I am currently a senior research engineer at Alibaba Group R&D in Silicon Valley. I work on high-performance search engines and cloud computing infrastructure.

Hello, my name is Xiangyu Jin. Xiangyu is my first name, where "Xiang" means flying and "Yu" means universe in Chinese. My last name, "Jin", means gold in Chinese. So, my name actually means "gold flying in the universe". Wah! What a fantastic world!

Actually, my parents gave me this name in memory of Zhou Enlai (courtesy name Xiangyu), the first Premier of the P.R. China, who passed away in January 1976.

My nickname is D.D.Duck, after a lovely cartoon character. My classmates have called me that since high school and through college. Another reason is that my hometown, Nanjing, is famous for its delicious Nanjing Salted Duck.

I received my B.S. and M.E. from the Department of Computer Science, Nanjing University, in 1999 and 2002, respectively. I am currently a PhD candidate in the Department of Computer Science, University of Virginia.















I am currently working with Prof. James French on Information Retrieval. My recent research focuses on interactive retrieval techniques such as relevance feedback, especially in multimedia retrieval environments. I previously worked on Chinese text mining and sketch recognition. Click here for my publication list.

The following are selected projects that I have worked on. Some projects have their own web pages, which provide data and source code for reproducing our experiments.

Skill Mining (Summer 2006, HP Lab, Palo Alto, CA)

Traditional expert-finding applications based on a register-and-search procedure are both unreliable and costly. We aim to mine people's skills (e.g., programming skills) via content analysis of their computing systems. Signatures captured from the history of the file systems are used to analyze users' up-to-date skills.


Relevance Feedback in Multimedia Retrieval (2004-present, Univ. of Virginia)

The new characteristics introduced by multimedia retrieval applications cause traditional text-oriented relevance feedback approaches to perform poorly. Many new approaches have been proposed to adapt to these characteristics. However, research groups from different backgrounds tend to follow different traditions within their communities and, in particular, they often have different standards and approaches to evaluation. This makes it extremely difficult to compare and evaluate techniques fairly.

We conduct a systematic, cross-media study of relevance feedback techniques in multimedia retrieval applications. Unlike most existing studies, we devote considerable effort to evaluation itself. We study ways to achieve consistent evaluation among systems [AMR05]. We also consider the practical constraints of the user [SIGIR06].

For detailed information, please visit:

Toward Consistent Evaluation (The AFTER Project)

Click here to view the project page

Quantitative Analysis of the Impact of Judging Inconsistency (The HAR05 Project)

Click here to view the project page



Multi-channel CBIR System (2002-present, Univ. of Virginia)

Click here to view the project page

We apply multiple viewpoints in a Content-based Image Retrieval (CBIR) environment. We propose a synthetic retrieval model [MTA05] which integrates multiple image representations [ISSPA03], multiple retrieval systems [MIS03], and multiple query images [MMDB03] to generate different viewpoints (search results) toward the same information need.

We further study the relationships among these information channels and how to combine them effectively. We also perform an empirical study of the scalability of this technique [CIVR04].
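
As a rough illustration of what combining channels can look like, here is a minimal sketch of score-based fusion. It uses CombSUM (summing min-max normalized scores), a standard fusion technique from the IR literature, and is not necessarily the combination method used in this project. The channel names, image ids, and scores are hypothetical.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale one channel's scores to [0, 1] so channels are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_sum(channels):
    """Fuse several {image_id: score} result lists by summing normalized scores."""
    fused = defaultdict(float)
    for scores in channels:
        for doc, s in min_max_normalize(scores).items():
            fused[doc] += s
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical channels: two image representations answering the same query.
color_channel = {"img1": 0.9, "img2": 0.5, "img3": 0.1}
texture_channel = {"img2": 0.8, "img3": 0.6, "img4": 0.2}
ranking = comb_sum([color_channel, texture_channel])
print([doc for doc, _ in ranking])
```

In this toy case an image that both channels rank reasonably well (img2) ends up ahead of an image that only one channel retrieves (img1), which is the basic intuition behind merging viewpoints.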


Sketch Recognition & User Adaptation (2001-2002, MSR-Asia & Nanjing Univ.)

Smart Sketchpad [ICDAR01] is a prototype of an on-line graphics recognition system. Entering graphics is a difficult task on small-screen devices such as PDAs. We employ a sketch-based user interface that gives users a natural, convenient, and efficient way to input graphic objects by quickly drawing part of their sketchy shapes [GREC01]. We predict the user's intention and list candidate graphic objects for the user to select. Smart Sketchpad learns incrementally [IJPRAI04] to adapt to different users' drawing habits. A user study was also performed [APCHI02]. My master's thesis summarizes this work.

Content-based Image Retrieval (2000-2002, Nanjing Univ.)

The semantic gap is a hard nut to crack for Content-based Image Retrieval applications. Instead of directly mapping low-level visual features to high-level semantics, we segment the images into fixed blocks and assign semantic labels to these blocks via pre-trained classifiers. High-level image semantics can then be inferred from these mid-level semantic vectors. This approach avoids possible failures of automatic image segmentation.


We select 34 image categories from the COREL database as the training set. We segment each image into fixed blocks (16*16) and label each block (with a unique word at the semantic level) using semi-automatic tools. This training data is later used to train classifiers over the full COREL database.
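
The block-labeling idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: the image is a plain 2-D grid of grayscale values, the mid-level vector is taken to be the fraction of blocks per label, and `toy_classifier` is a hypothetical stand-in for the pre-trained block classifiers (the real features, labels, and classifiers are not described here).

```python
from collections import Counter


def split_into_blocks(image, rows, cols):
    """Cut a 2-D pixel grid (list of lists) into rows x cols fixed blocks."""
    height, width = len(image), len(image[0])
    bh, bw = height // rows, width // cols
    blocks = []
    for r in range(rows):
        for c in range(cols):
            block = [row[c * bw:(c + 1) * bw]
                     for row in image[r * bh:(r + 1) * bh]]
            blocks.append(block)
    return blocks


def semantic_vector(image, rows, cols, classify_block, vocabulary):
    """Return the fraction of blocks assigned to each label in `vocabulary`."""
    labels = [classify_block(b) for b in split_into_blocks(image, rows, cols)]
    counts = Counter(labels)
    return [counts[label] / len(labels) for label in vocabulary]


def toy_classifier(block):
    """Hypothetical stand-in for a pre-trained classifier: thresholds brightness."""
    pixels = [p for row in block for p in row]
    return "sky" if sum(pixels) / len(pixels) > 127 else "grass"


# A 4x4 grayscale toy image: bright top half, dark bottom half.
image = [[200] * 4, [200] * 4, [50] * 4, [50] * 4]
vector = semantic_vector(image, 2, 2, toy_classifier, ["sky", "grass"])
print(vector)
```

Because the blocks are fixed, nothing depends on an automatic segmentation step succeeding; a higher-level classifier could then be trained on vectors like `vector`.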

Chinese Text Mining for Online Publishing System (1999-2001, Nanjing Univ.)

WinGPS (Web Information Gathering & Publication System) is the prototype system of the project "Multimedia Information Retrieval and Publishing Techniques for Digital Library". My research focused on cooperative retrieval and information filtering [JCST00]. First, the user describes the information need by filling in online forms. Then information of interest to the user is searched for through existing web search engines. Afterwards, the search results are fused and downloaded to a local server. The downloaded documents are further clustered and summarized by sub-topic. Finally, the organized information is packaged and delivered to the user by email.


WIPS is a self-driven project I designed as an integrated online publishing solution for Chinese news media. The huge volume of published Chinese news media documents (in PS files) is segmented, re-combined, clustered, indexed, and summarized for online publishing. A novel Chinese tokenization method is proposed to overcome the limitations of the dictionary-based approach. The first version took part in the 6th National "Challenge Cup Contest". The commercialized product (Da Han Ban Tong) still dominates the online publishing industry in China for major news media.
















Click here to view my BLOG

I have put most of my experiences of coming to the U.S. into web pages. I did not use a commercial BLOG system since I need flexibility, need to move the pages often, and need a lot of space. The disadvantage is that this is not a real social network and there is no place for visitors to leave comments.


Click here to view my photo album


Memory is a treasure. Here are some selected pictures, especially pictures taken when I was a child.




Ancient Civilization

This ranks as my No. 1 hobby. I am curious to know where we came from and where we are going. Knowing the past can help us anticipate the future. It is really amazing that our ancestors created such advanced civilizations as the Sumerian, Persian, Egyptian, and Maya. Many mysteries remain.


Guitar & Choir

I got my first guitar in high school. Now I play a Washburn acoustic-electric. I like the sound of the guitar although I am not good at it, :). I am also a guitar player in the "beginner" band "Second Face". Here are some songs we recently recorded (home, yellow, afterwards, truly love you). Forgive us for the poor recording quality :).

My song book is a very cool place where you can find most of the Guitar Pro files.

I have sung in choirs since elementary school. Now I am a tenor in the PCA choir group. Here are some songs I recorded with an earphone. To capture each part of the choir, I record multiple takes and mix them together afterwards. Some examples: Cinderella, Youth, Edelweiss, Silent night.



I like anime, and I am definitely an OP (One Piece) fan. Its plot is complex, yet the author keeps the story tightly organized: any tiny clue turns out to be useful later. I have never seen a similar work before. Dragon Ball is long, but its plot is "repetitive". OP is not. It is awesome, hilarious, and touching. It encourages us never to give up our dreams and to appreciate our friendships. I strongly recommend it.



I like playing b-ball and currently serve as the coordinator of the CSSS b-ball club. I used to play two or three times per week. However, due to a knee problem, I have to restrict myself in games these days to what I call "walking attack and standing defense".



I like traveling. In the US, I have traveled to Great Smoky Mountains, Assateague, Yellowstone, Grand Teton, Shenandoah, Blue Ridge, Grand Canyon, Painted Desert, Bryce Canyon, Zion, Yosemite, Kings Canyon & Sequoia, Lassen Volcanic, Mt. Rainier, Mt. Hood, and Crater Lake. I have already visited around 20 national parks and I wish I could visit all fifty national parks here one day. I write about most of my travels in my BLOG.

My dream destinations are the Middle East, Egypt, and the Andes, where ancient civilizations reside.