End-to-End Text Recognition with Convolutional Neural Networks

David J. Wu


Advised by Professor Andrew Y. Ng


Full end-to-end text recognition in natural images is a challenging problem that has recently received much attention in computer vision and machine learning. Traditional systems in this area have relied on elaborate models that incorporate carefully hand-engineered features or large amounts of prior knowledge. In this thesis, I describe an alternative approach that combines the representational power of large, multilayer neural networks with recent developments in unsupervised feature learning. This approach enables us to train highly accurate text detection and character recognition modules. Because these detection and recognition modules are highly accurate and robust, it becomes possible to integrate them into a full end-to-end, lexicon-driven, scene text recognition system using only simple off-the-shelf techniques. In doing so, we demonstrate state-of-the-art performance on standard benchmarks in both cropped-word recognition and full end-to-end text recognition.

  author = {David J. Wu},
  title  = {End-to-End Text Recognition with Convolutional Neural Networks},
  misc   = {Stanford Undergraduate Honors Thesis},
  year   = {2013}