Abstract:
This paper presents an approach to speaker diarization based on a step-function representation of the speech signal. Existing solutions to the diarization problem rely on complex neural network models whose training is computationally expensive. The goal of this research is an algorithm that runs with very limited resources, on an ordinary notebook computer, during a discussion among a few people, and that needs minimal time to train for a given set of speakers. We attain this goal by transforming the network's input signal into a step function that takes three values, which makes it possible to use a simple neural network model for end-to-end diarization. For training, we use a segmented speech file in which each segment belongs to a single speaker; the number of speakers is known in advance. Each segment is converted into a step function by applying a threshold value estimated with a fast algorithm developed for this purpose. Because the network is end-to-end, the clustering step of conventional speaker diarization is not needed. Experiments show that the resulting diarization quality is acceptable.
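As a rough illustration of the three-valued step transformation, the following minimal sketch quantizes a sampled signal against a symmetric threshold. The abstract does not state the three levels or how the threshold is estimated, so the levels -1, 0, +1, the symmetric comparison, the function name `to_three_level`, and the hand-picked threshold are all assumptions made for illustration only.

```python
from typing import List, Sequence


def to_three_level(signal: Sequence[float], threshold: float) -> List[int]:
    """Map each sample to -1, 0, or +1.

    Assumption: the levels and the symmetric threshold are illustrative;
    the paper's actual levels and threshold estimator are not given here.
    """
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    steps = []
    for x in signal:
        if x > threshold:
            steps.append(1)       # sample clearly above the threshold
        elif x < -threshold:
            steps.append(-1)      # sample clearly below the negative threshold
        else:
            steps.append(0)       # sample inside the dead zone
    return steps


# Hypothetical samples and a hand-picked threshold (not estimated from data).
samples = [0.02, 0.40, -0.35, 0.10, -0.05, 0.90]
print(to_three_level(samples, threshold=0.25))  # [0, 1, -1, 0, 0, 1]
```

Collapsing the input to three values leaves the network with a much smaller input alphabet, which is one plausible reading of why a simple model suffices.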