Abstract:
We present an ensemble-based method for classifying photographs that contain patches of text. In particular, the proposed solution is suited to classifying images of commercial building facades by the type of services provided. Our model is a heterogeneous ensemble that analyzes textual and visual features, together with special visual descriptors for regions containing English text. Our classifier achieves an F1 score of 0.71, compared with 0.43 for the baseline. We also provide our own dataset of 3000 images of facades with signboards as a complete classification benchmark.