Artificial intelligence (AI) is the intelligence exhibited by machines. The term is applied when a machine mimics "cognitive" functions that humans associate with other human minds, such as "learning" and "problem solving". Researchers in AI and machine learning currently aim to train computers to mimic human skills such as "reading", "listening", "writing" and "making inferences". Some AI applications, such as optical character recognition (OCR) and automatic speech recognition (ASR), have recently become routine technologies in industry. Since 2006, "Deep Learning" (DL) has attracted increasing attention in both academia and industry. Deep learning with deep neural networks is a branch of machine learning based on a set of algorithms that attempt to learn representations of data and to model their high-level abstractions. In a deep neural network, there are multiple so-called "neural layers" between the input and the output. The algorithm uses those layers, each composed of linear and non-linear transformations, to learn increasingly abstract representations. Recently, DL has delivered record-breaking results in many new areas, e.g., beating human players in strategy games such as Go (Google’s AlphaGo), self-driving cars, and dermatologist-level classification of skin cancer.
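The layered structure described above can be illustrated with a minimal sketch of a forward pass through a small network. This is only an illustration under stated assumptions: the layer sizes, the random weights, the choice of ReLU as the non-linear transformation, and the helper names (linear, relu, init_layer, forward) are all hypothetical and are not taken from any particular system mentioned in the text. No training is performed; the sketch only shows how an input is passed through several layers that each combine a linear and a non-linear transformation.

```python
import random


def linear(x, weights, bias):
    """Linear transformation: one weighted sum plus bias per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(v):
    """Non-linear transformation (ReLU, an assumed choice), applied element-wise."""
    return [max(0.0, a) for a in v]


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one layer (illustrative, untrained)."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
               for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def forward(x, layers):
    """Pass the input through every layer and keep each intermediate result.

    Hidden layers apply linear + ReLU; the last layer is left linear.
    """
    activations = [x]
    for i, (weights, bias) in enumerate(layers):
        z = linear(activations[-1], weights, bias)
        if i < len(layers) - 1:
            z = relu(z)
        activations.append(z)
    return activations


rng = random.Random(0)            # fixed seed so the run is repeatable
sizes = [4, 8, 8, 3]              # input, two hidden layers, output (assumed sizes)
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

activations = forward([0.5, -1.0, 2.0, 0.0], layers)
output = activations[-1]
print("units per layer:", [len(a) for a in activations])
```

Each entry of activations is the representation of the input at one depth of the network; in a trained network, the deeper entries would correspond to the higher-level abstractions mentioned above.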
Multimedia data is one of the most suitable subjects for deep learning research because of its multiple modalities: multimedia consists of visual, textual and auditory content. This property enables DL algorithms to learn fused representations from heterogeneous sources that express a common semantic meaning. In addition, DL is a data-driven technology, which makes it well suited to processing massive amounts of multimedia data.