Multimedia Analysis with Deep Learning

Enable Deep Learning algorithms to learn fused representations from hybrid resources

(Source: HPI)

Since 2006, "Deep Learning" has attracted increasing attention in both academia and industry. Recently, Deep Learning has delivered record-breaking results in many novel areas, e.g., beating human players in strategy games such as Go (Google's AlphaGo), enabling self-driving cars, and achieving dermatologist-level classification of skin cancer.

The widely accepted success factors of Deep Learning include the rejuvenation of stacked artificial neural networks (which have grown deeper and wider), the availability of large labeled datasets (with millions of training samples), and the growth of computational power (GPU acceleration and distributed computing). Multimedia data is one of the most suitable targets for deep learning research because of its multiple modalities: multimedia combines visual, textual, and auditory content. This property enables Deep Learning algorithms to learn fused representations from hybrid resources that express a common semantic meaning.
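The idea of learning a fused representation from multiple modalities can be sketched as follows. This is a minimal illustration only; the feature dimensions, projection matrices, and the simple concatenation-based fusion are assumptions for the sake of the example, not part of the research presented at this event:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pre-extracted unimodal features for one multimedia item,
# e.g. a CNN image embedding and an averaged word-embedding vector.
visual_feat = rng.standard_normal(512)   # visual modality
textual_feat = rng.standard_normal(300)  # textual modality

# Learnable projection matrices (randomly initialized here for illustration;
# in practice these would be trained end-to-end with the rest of the network).
W_v = rng.standard_normal((128, 512)) * 0.01
W_t = rng.standard_normal((128, 300)) * 0.01

def fuse(v, t):
    """Project both modalities into a shared 128-d space and combine them
    into a single fused representation of the multimedia item."""
    shared_v = np.tanh(W_v @ v)
    shared_t = np.tanh(W_t @ t)
    return np.concatenate([shared_v, shared_t])

fused = fuse(visual_feat, textual_feat)
print(fused.shape)  # (256,)
```

A downstream classifier trained on such fused vectors can then exploit semantic cues from both modalities at once, which is the core appeal of multimodal deep learning.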

On the other hand, "Deep Learning" is a data-driven technology, which makes it highly suitable for processing massive amounts of multimedia data. At this event we will present three current research topics based on deep learning technologies:

  • "SceneTextReg: Real-time Scene Text Recognition Using Deep Neural Networks"
  • "Neural Visual Translator: An Image Captioning Approach"
  • "Binary Neural Network: Enable Deep Neural Networks on Low Power Devices"