Using support vector machines and state-of-the-art algorithms for phonetic alignment to identify cognates in multi-lingual wordlists

J├Ąger, List and Sofroniev

The paper introduces an algorithm for automatic identification of cognate classes that relies on support vector machines to leverage various state-of-the-art methods for phonetic alignment and cognate detection. Training and evaluating an SVM on a typologically broad collection of gold-standard data shows this novel approach to be superior to other known methods.

On this webpage you can explore the datasets used in the paper. Each dataset comprises a multilingual list of glosses grouped by cognate class. The source of the cognate classes is either (1) the manual expert judgements, (2) the output of the LexStat algorithm, or (3) the output of the SVM-powered approach introduced in the paper.

github paper