Captioning Images to Assist People Who Are Blind

Surveyed attention-based deep learning methods applied to image captioning.

Implemented a model that combines a residual neural network (ResNet) encoder with a soft attention mechanism to extract high-level representative features. These features are then fed into a Long Short-Term Memory (LSTM) decoder to generate a valid English description of the image.
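A minimal sketch of such an encoder, attention, and decoder pipeline in PyTorch is shown below. The class names, dimensions, and the choice of ResNet-101 are illustrative assumptions, not details taken from the project.

```python
# Sketch of a ResNet encoder + soft attention + LSTM decoder for captioning.
# All names and hyperparameters here are hypothetical.
import torch
import torch.nn as nn
import torchvision.models as models


class Encoder(nn.Module):
    """ResNet backbone returning a grid of spatial features."""

    def __init__(self):
        super().__init__()
        resnet = models.resnet101(weights=None)  # pretrained weights would be loaded in practice
        # Drop the average pool and classifier to keep the 2048-channel feature map.
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])

    def forward(self, images):                   # (B, 3, H, W)
        feats = self.backbone(images)            # (B, 2048, h, w)
        B, C, h, w = feats.shape
        return feats.permute(0, 2, 3, 1).reshape(B, h * w, C)  # (B, L, 2048)


class SoftAttention(nn.Module):
    """Bahdanau-style soft attention over encoder locations."""

    def __init__(self, feat_dim, hidden_dim, attn_dim):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, feats, hidden):
        # feats: (B, L, feat_dim); hidden: (B, hidden_dim)
        e = self.score(torch.tanh(
            self.feat_proj(feats) + self.hidden_proj(hidden).unsqueeze(1)
        )).squeeze(-1)                            # (B, L) attention scores
        alpha = torch.softmax(e, dim=1)           # attention weights
        context = (feats * alpha.unsqueeze(-1)).sum(dim=1)  # (B, feat_dim)
        return context, alpha


class Decoder(nn.Module):
    """LSTM that emits one word per step, conditioned on the attended context."""

    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512,
                 feat_dim=2048, attn_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.attention = SoftAttention(feat_dim, hidden_dim, attn_dim)
        self.lstm = nn.LSTMCell(embed_dim + feat_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def step(self, word_ids, feats, h, c):
        # Attend over image regions, then advance the LSTM by one word.
        context, alpha = self.attention(feats, h)
        h, c = self.lstm(torch.cat([self.embed(word_ids), context], dim=1), (h, c))
        return self.out(h), h, c, alpha
```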

Trained the model on the VizWiz dataset and evaluated it with the BLEU metric; the model achieved performance comparable to the state of the art and generated highly descriptive captions that could meaningfully improve the lives of visually impaired people.
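For reference, a hedged sketch of corpus-level BLEU scoring with NLTK is shown below. The example captions and variable names are illustrative; VizWiz supplies multiple crowdsourced reference captions per image.

```python
# Corpus-level BLEU-4 with NLTK; captions here are made-up examples.
from nltk.translate.bleu_score import corpus_bleu

# One list of tokenized reference captions per image.
references = [
    [["a", "can", "of", "soda", "on", "a", "table"],
     ["a", "soda", "can", "sitting", "on", "a", "wooden", "table"]],
]
# One generated caption per image.
hypotheses = [["a", "can", "of", "soda", "on", "a", "wooden", "table"]]

# Default weights average 1- to 4-gram precision (BLEU-4).
print(f"BLEU-4: {corpus_bleu(references, hypotheses):.3f}")
```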

The report can be viewed here and the slides can be viewed here.