Image Captioning to Assist People Who Are Blind
Surveyed attention-based deep learning methods applied to image captioning.
Implemented a model that combines a residual neural network (ResNet) with a soft attention mechanism to extract high-level representative features; these features are then fed into a Long Short-Term Memory (LSTM) network to generate a fluent English description of the image.
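A minimal sketch of this encoder-attention-decoder pipeline in PyTorch is shown below. The layer dimensions, the ResNet-50 variant, and all names (Encoder, SoftAttention, Decoder, FEAT_DIM, etc.) are illustrative assumptions, not the project's exact configuration.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Illustrative sizes; the project's actual hyperparameters may differ.
FEAT_DIM, EMBED_DIM, HIDDEN_DIM, ATTN_DIM = 2048, 256, 512, 256

class Encoder(nn.Module):
    """ResNet backbone that returns a grid of spatial image features."""
    def __init__(self):
        super().__init__()
        resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        # Drop the average-pool and FC head to keep the 7x7 feature map.
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])

    def forward(self, images):                     # (B, 3, 224, 224)
        feats = self.backbone(images)              # (B, 2048, 7, 7)
        return feats.flatten(2).transpose(1, 2)    # (B, 49, 2048)

class SoftAttention(nn.Module):
    """Additive (soft) attention over the 49 spatial regions."""
    def __init__(self):
        super().__init__()
        self.feat_proj = nn.Linear(FEAT_DIM, ATTN_DIM)
        self.hidden_proj = nn.Linear(HIDDEN_DIM, ATTN_DIM)
        self.score = nn.Linear(ATTN_DIM, 1)

    def forward(self, feats, hidden):
        # Score each region against the current decoder state.
        e = self.score(torch.tanh(self.feat_proj(feats)
                                  + self.hidden_proj(hidden).unsqueeze(1)))
        alpha = torch.softmax(e, dim=1)            # (B, 49, 1)
        context = (alpha * feats).sum(dim=1)       # (B, 2048)
        return context, alpha

class Decoder(nn.Module):
    """LSTM that emits one word per step, re-attending to the image each step."""
    def __init__(self, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, EMBED_DIM)
        self.attention = SoftAttention()
        self.lstm = nn.LSTMCell(EMBED_DIM + FEAT_DIM, HIDDEN_DIM)
        self.out = nn.Linear(HIDDEN_DIM, vocab_size)

    def forward(self, feats, captions):            # captions: (B, T) word ids
        B, T = captions.shape
        h = feats.new_zeros(B, HIDDEN_DIM)
        c = feats.new_zeros(B, HIDDEN_DIM)
        logits = []
        for t in range(T):
            context, _ = self.attention(feats, h)
            x = torch.cat([self.embed(captions[:, t]), context], dim=1)
            h, c = self.lstm(x, (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)          # (B, T, vocab_size)
```

At inference time the same decoder would be run autoregressively (feeding back its own previous word, typically with greedy or beam search) rather than with teacher-forced ground-truth captions as in this training-style forward pass.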
Trained the model on the VizWiz dataset and evaluated it with the BLEU metric; it achieved performance comparable to the state of the art and generated highly descriptive captions that could meaningfully improve the lives of people who are blind or visually impaired.
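Below is a minimal sketch of how BLEU-1 through BLEU-4 could be computed with NLTK's corpus_bleu, as commonly reported for captioning. The evaluate_bleu helper, the data layout, and the whitespace tokenization are assumptions for illustration; VizWiz supplies several human reference captions per image.

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

def evaluate_bleu(generated, references):
    """generated:  {image_id: "a generated caption"}
    references: {image_id: ["ref one", "ref two", ...]}"""
    hyps, refs = [], []
    for image_id, caption in generated.items():
        hyps.append(caption.lower().split())
        refs.append([r.lower().split() for r in references[image_id]])
    # Smoothing avoids zero scores when short captions miss an n-gram order.
    smooth = SmoothingFunction().method1
    return {
        f"BLEU-{n}": corpus_bleu(refs, hyps,
                                 weights=tuple([1.0 / n] * n),
                                 smoothing_function=smooth)
        for n in range(1, 5)
    }
```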