This network is used for multilingual text detection. It is composed of a ResNet-FPN feature extractor and a detection predictor, and it is trained on the ICDAR-2017 dataset. The input is an image containing text. The output is a structure that holds the detected words and their positions in the image. The following image shows the result of the Textmountain model.
Figure 1. Textmountain Detection
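To make the shape of the output more concrete, the following is a minimal sketch of how a detection result of this kind could be represented and filtered. It is not the library's actual API: the names `TextRegion`, `DetectionResult`, `confident`, and `bounding_rect` are hypothetical, and the four-corner box layout, the per-region confidence score, and the sample values are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in input-image pixel coordinates


@dataclass
class TextRegion:
    """One detected word (hypothetical layout, not the real result type)."""
    corners: List[Point]  # assumed: four corners of the word's quadrilateral
    score: float          # assumed: detection confidence in [0, 1]

    def bounding_rect(self) -> Tuple[float, float, float, float]:
        """Axis-aligned rectangle (x_min, y_min, x_max, y_max) around the corners."""
        xs = [x for x, _ in self.corners]
        ys = [y for _, y in self.corners]
        return min(xs), min(ys), max(xs), max(ys)


@dataclass
class DetectionResult:
    """All words detected in one input image (hypothetical layout)."""
    width: int   # input image width in pixels
    height: int  # input image height in pixels
    regions: List[TextRegion] = field(default_factory=list)

    def confident(self, threshold: float = 0.5) -> List[TextRegion]:
        """Return only the regions whose score is at least `threshold`."""
        return [r for r in self.regions if r.score >= threshold]


# Made-up sample values, only to show how the structure is used.
result = DetectionResult(
    width=640,
    height=480,
    regions=[
        TextRegion([(10, 20), (110, 22), (109, 52), (9, 50)], score=0.93),
        TextRegion([(200, 300), (260, 300), (260, 320), (200, 320)], score=0.31),
    ],
)

kept = result.confident(0.5)
for region in kept:
    print(region.bounding_rect(), region.score)
```

A quadrilateral is used here instead of an axis-aligned rectangle because scene text is often rotated or skewed; `bounding_rect` shows how a simpler rectangle can be derived when one is enough.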