Journal of Biomedical Informatics: X 2019 • Anna Koroleva, Sanjay Kamath, Patrick Paroubek
Methods: We tested several approaches. These included single measures of similarity (based on strings, stems and lemmas, paths and distances in an ontology, and vector representations of phrases), classifiers that use a combination of single measures as features, and a deep learning approach that consists of fine-tuning pre-trained deep language representations. We tested language models provided by BERT (trained on general-domain texts), BioBERT (trained on biomedical texts) and SciBERT (trained on scientific texts). We explored whether the results could be improved by taking into account variant ways of referring to an outcome (e.g. using the name of a measurement tool instead of the outcome name, or using abbreviations). We release an open corpus of outcome pairs annotated for similarity.
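As an illustration of two of the ideas above, single string-based similarity measures and stacking them as classifier features, here is a minimal sketch that uses only the Python standard library. It is not the authors' implementation. `difflib.SequenceMatcher.ratio` stands in for a character-level string measure and a word-set Jaccard score stands in for a token-level one. The abbreviation table and all function names are hypothetical and exist only to show how an abbreviation variant can be normalized before comparison. Stem/lemma, ontology-path, and BERT-based measures are not shown.

```python
from difflib import SequenceMatcher

# Hypothetical abbreviation table for illustration only; not taken from the paper's corpus.
ABBREVIATIONS = {"qol": "quality of life", "bp": "blood pressure"}


def normalize(phrase: str) -> str:
    """Lowercase, strip edge punctuation from each token, and expand known abbreviations."""
    tokens = [t.strip(".,;:()") for t in phrase.lower().split()]
    expanded = [ABBREVIATIONS.get(t, t) for t in tokens if t]
    return " ".join(expanded)


def char_ratio(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] based on difflib's matching blocks."""
    return SequenceMatcher(None, a, b).ratio()


def token_jaccard(a: str, b: str) -> float:
    """Jaccard overlap of the word sets of two phrases (1.0 if both are empty)."""
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def feature_vector(a: str, b: str) -> list[float]:
    """Stack single measures, computed on raw and normalized phrases, as classifier features."""
    ra, rb = a.lower(), b.lower()
    na, nb = normalize(a), normalize(b)
    return [char_ratio(ra, rb), token_jaccard(ra, rb),
            char_ratio(na, nb), token_jaccard(na, nb)]


if __name__ == "__main__":
    # Raw measures score the abbreviation pair low; normalized measures score it 1.0.
    print(feature_vector("QoL", "quality of life"))
```

The resulting vector could be passed to any off-the-shelf classifier trained on annotated outcome pairs. The sketch separates raw and normalized features so that a classifier can learn how much weight to give to the variant-handling step.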