Joint Entity and Relation Extraction from Scientific Documents: Role of Linguistic Information and Entity Types
Scientific articles contain various types of domain-specific entities and relations between them. The entities and their relations succinctly capture important information about the topic of the document and hence, they are crucial to the understanding and automatic analysis of the documents. In this paper, we aim to automatically extract entities and relations from a scientific abstract using a deep neural model. Given an input sentence, we use a pretrained transformer to produce contextual embeddings of the tokens which are then enriched with embeddings of their part-of-speech (POS) tags. A sequence of enriched token representations forms a span, and entities and relations are jointly learned over spans. Entity logits predicted by the entity classifier are used as features in the relation classifier. Our proposed model improves upon competitive baselines in the literature for entity and relation extraction on SciERC and ADE datasets.
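The pipeline the abstract describes (POS-enriched token embeddings, span pooling, and entity logits reused as relation features) can be sketched as follows. This is a minimal illustration, not the paper's implementation: all dimensions, the max-pooling choice, and the random weights standing in for learned parameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (illustrative assumptions, not from the paper).
HID, POS_DIM, N_ENT, N_REL = 8, 4, 3, 2

def enrich(token_emb, pos_emb):
    # Concatenate each contextual token embedding with its POS-tag embedding.
    return np.concatenate([token_emb, pos_emb], axis=-1)

def span_repr(enriched_tokens):
    # Pool the enriched token vectors of a span into one vector
    # (max-pooling here as a simple stand-in).
    return enriched_tokens.max(axis=0)

# Toy linear classifiers; random weights stand in for trained parameters.
W_ent = rng.normal(size=(HID + POS_DIM, N_ENT))
W_rel = rng.normal(size=(2 * (HID + POS_DIM) + 2 * N_ENT, N_REL))

def entity_logits(span):
    return span @ W_ent

def relation_logits(span_a, span_b):
    # The entity logits of both spans are appended as extra features,
    # mirroring the idea of feeding entity predictions into the
    # relation classifier.
    feats = np.concatenate(
        [span_a, span_b, entity_logits(span_a), entity_logits(span_b)]
    )
    return feats @ W_rel

# A toy 5-token sentence split into two candidate spans (tokens 0-1 and 2-4).
tok = rng.normal(size=(5, HID))   # contextual embeddings from a transformer
pos = rng.normal(size=(5, POS_DIM))  # POS-tag embeddings
enr = enrich(tok, pos)
s1, s2 = span_repr(enr[0:2]), span_repr(enr[2:5])
print(entity_logits(s1).shape, relation_logits(s1, s2).shape)
```

In a trained model the span pooling, classifier weights, and embeddings would be learned jointly; the sketch only shows how the pieces fit together dimensionally.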
Datasets

SciERC, Adverse Drug Events (ADE) Corpus
Results from the Paper
| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Relation Extraction | Adverse Drug Events (ADE) Corpus | SpERT.PL (without overlap and BioBERT) | RE+ Macro F1 | 82.39 | # 5 |
| Relation Extraction | Adverse Drug Events (ADE) Corpus | SpERT.PL (without overlap and BioBERT) | NER Macro F1 | 91.14 | # 5 |
| Relation Extraction | Adverse Drug Events (ADE) Corpus | SpERT.PL (with overlap and BioBERT) | RE+ Macro F1 | 82.03 | # 7 |
| Relation Extraction | Adverse Drug Events (ADE) Corpus | SpERT.PL (with overlap and BioBERT) | NER Macro F1 | 91.17 | # 4 |
| Joint Entity and Relation Extraction | SciERC | SpERT.PL (SciBERT) | Entity F1 | 70.53 | # 1 |
| Joint Entity and Relation Extraction | SciERC | SpERT.PL (SciBERT) | Relation F1 | 51.25 | # 3 |
| Joint Entity and Relation Extraction | SciERC | SpERT.PL (SciBERT) | Cross Sentence | No | # 1 |