Image Captioning to Assist People Who Are Blind
Surveyed attention-based deep learning methods applied to image captioning.
Implemented a model that combines a residual neural network (ResNet) with a soft attention mechanism to extract high-level representative features; these features are then fed into a Long Short-Term Memory (LSTM) network to generate a fluent English description of the image.
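A minimal sketch of this encoder-attention-decoder pipeline in PyTorch is shown below. The layer dimensions, the ResNet-50 variant, and all names (Encoder, SoftAttention, Decoder, FEAT_DIM, etc.) are illustrative assumptions, not the project's exact configuration.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Illustrative sizes; the project's actual hyperparameters may differ.
FEAT_DIM, EMBED_DIM, HIDDEN_DIM, ATTN_DIM = 2048, 256, 512, 256

class Encoder(nn.Module):
    """ResNet backbone that returns a grid of spatial image features."""
    def __init__(self):
        super().__init__()
        resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        # Drop the average-pool and FC head to keep the 7x7 feature map.
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])

    def forward(self, images):                     # (B, 3, 224, 224)
        feats = self.backbone(images)              # (B, 2048, 7, 7)
        return feats.flatten(2).transpose(1, 2)    # (B, 49, 2048)

class SoftAttention(nn.Module):
    """Additive (soft) attention over the 49 spatial regions."""
    def __init__(self):
        super().__init__()
        self.feat_proj = nn.Linear(FEAT_DIM, ATTN_DIM)
        self.hidden_proj = nn.Linear(HIDDEN_DIM, ATTN_DIM)
        self.score = nn.Linear(ATTN_DIM, 1)

    def forward(self, feats, hidden):
        # Score each region against the current decoder state.
        e = self.score(torch.tanh(self.feat_proj(feats)
                                  + self.hidden_proj(hidden).unsqueeze(1)))
        alpha = torch.softmax(e, dim=1)            # (B, 49, 1)
        context = (alpha * feats).sum(dim=1)       # (B, 2048)
        return context, alpha

class Decoder(nn.Module):
    """LSTM that emits one word per step, re-attending to the image each step."""
    def __init__(self, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, EMBED_DIM)
        self.attention = SoftAttention()
        self.lstm = nn.LSTMCell(EMBED_DIM + FEAT_DIM, HIDDEN_DIM)
        self.out = nn.Linear(HIDDEN_DIM, vocab_size)

    def forward(self, feats, captions):            # captions: (B, T) word ids
        B, T = captions.shape
        h = feats.new_zeros(B, HIDDEN_DIM)
        c = feats.new_zeros(B, HIDDEN_DIM)
        logits = []
        for t in range(T):
            context, _ = self.attention(feats, h)
            x = torch.cat([self.embed(captions[:, t]), context], dim=1)
            h, c = self.lstm(x, (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)          # (B, T, vocab_size)
```

At inference time the same decoder would be run autoregressively (feeding back its own previous word, typically with greedy or beam search) rather than with teacher-forced ground-truth captions as in this training-style forward pass.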
Trained the model on the VizWiz dataset and evaluated it with the BLEU metric; it achieved performance comparable to the state of the art and generated highly descriptive captions that could meaningfully improve the lives of people who are blind or visually impaired.
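Below is a minimal sketch of how BLEU-1 through BLEU-4 could be computed with NLTK's corpus_bleu, as commonly reported for captioning. The evaluate_bleu helper, the data layout, and the whitespace tokenization are assumptions for illustration; VizWiz supplies several human reference captions per image.

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

def evaluate_bleu(generated, references):
    """generated:  {image_id: "a generated caption"}
    references: {image_id: ["ref one", "ref two", ...]}"""
    hyps, refs = [], []
    for image_id, caption in generated.items():
        hyps.append(caption.lower().split())
        refs.append([r.lower().split() for r in references[image_id]])
    # Smoothing avoids zero scores when short captions miss an n-gram order.
    smooth = SmoothingFunction().method1
    return {
        f"BLEU-{n}": corpus_bleu(refs, hyps,
                                 weights=tuple([1.0 / n] * n),
                                 smoothing_function=smooth)
        for n in range(1, 5)
    }
```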