Language Inference-based Learning for Low-Resource Chinese Clinical Named Entity Recognition Using Language Model

Electronic health records (EHRs) have been broadly adopted in China along with the development of medical informatics. They encompass crucial patient information such as medical history, diagnoses, medications, treatment plans, immunization dates, allergies, radiological images, and laboratory and test results. These data are of great value for subsequent medical research and for advancing human health. It is therefore extremely important to extract the necessary information from EHRs [1]. Artificial intelligence (AI) has been continuously advancing and finding applications in the medical field. Among these applications, natural language processing (NLP) techniques can achieve the goal of extracting key information from EHRs. Clinical named entity recognition (CNER) is a specific research domain within the field of information extraction. The main objective of CNER is to extract named entities such as diseases, drugs, clinical symptoms, and clinical examinations from EHRs.

In the early stages, researchers performed entity recognition through rule-based matching, relying on expert-defined rules and templates [2], [3]. Although these methods intuitively appear to guarantee good results, the growing volume of data, particularly with the rise of EHRs, has increased both the number of entity types to be recognized and the complexity of recognition scenarios. In addition, the process of defining rules and templates is time-consuming, which hinders long-term development.

In recent years, the emergence of pre-trained language models (PLMs) [4], [5], [6], [7], [8], [9], [10] has provided significant support for CNER tasks. The prevailing paradigm for CNER involves appending various feature extraction networks to a pre-trained language model and fine-tuning its parameters. For instance, incorporating Bidirectional Long Short-Term Memory (BiLSTM) networks or Conditional Random Fields (CRF) [11] became a prominent approach. Dai et al. [12] used a PLM-BiLSTM-CRF network structure for CNER on Chinese EHRs and achieved a 75% F1-score.
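As a concrete illustration of this fine-tuning paradigm, the following sketch stacks a BiLSTM and a CRF on top of a PLM encoder. It is a minimal example assuming the Hugging Face transformers and pytorch-crf libraries; the model name, tag-set size, and hidden size are illustrative placeholders rather than the configuration used in the cited work.

```python
# Minimal sketch of the PLM-BiLSTM-CRF tagging architecture described above.
# Assumes Hugging Face `transformers` and `pytorch-crf`; model name, tag set,
# and hidden size are illustrative placeholders.
import torch.nn as nn
from transformers import AutoModel
from torchcrf import CRF

class PlmBilstmCrf(nn.Module):
    def __init__(self, plm_name="bert-base-chinese", num_tags=9, lstm_hidden=256):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(plm_name)       # PLM encoder
        self.lstm = nn.LSTM(self.encoder.config.hidden_size, lstm_hidden,
                            batch_first=True, bidirectional=True)
        self.emission = nn.Linear(2 * lstm_hidden, num_tags)     # per-token tag scores
        self.crf = CRF(num_tags, batch_first=True)               # label-transition modeling

    def forward(self, input_ids, attention_mask, tags=None):
        hidden = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        hidden, _ = self.lstm(hidden)
        emissions = self.emission(hidden)
        mask = attention_mask.bool()
        if tags is not None:                                      # training: negative log-likelihood
            return -self.crf(emissions, tags, mask=mask, reduction="mean")
        return self.crf.decode(emissions, mask=mask)              # inference: best tag path
```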

The standard fine-tuning approaches for CNER can be considered data-hungry, as they require a large amount of manually annotated data produced by domain specialists. In the practical medical field, however, acquiring such abundant annotated data is not feasible because of its high cost and time-consuming nature; annotating only a small amount of data is feasible in terms of time and resource costs. To address this low-resource challenge, researchers have explored prompt-based learning methods [13], [14]. Prompt-based learning can be seen as a new paradigm for natural language processing. Specifically, it designs prompt text and reconstructs the original sentences, and the pre-trained language model is required to understand the context and infer information from the reconstructed text. The core idea is to convert the NLP task into a cloze-style question-answering task, in which the probability of the word at the masked position is predicted by the PLM's masked language modeling head. In contrast to standard fine-tuning methods, prompt-based methods close the gap between the pre-training task and the downstream task and are more in line with the pre-training process of PLMs. At the same time, they do not require designing elaborate task-specific network layers, thereby avoiding training parameters from scratch. Consequently, prompt-based methods reduce training costs and speed up convergence. For example, previous work [15] investigated cloze-style text classification in low-resource scenarios. However, the CNER task is more complex than text classification, as it requires not only identifying entities within a sentence but also predicting their types. The CNER task therefore involves considering the relationship between entities and their corresponding types within the context [16], [17] during training. For this reason, directly applying the cloze-style approach to predict entities and their types remains a significant challenge.
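The cloze-style idea can be illustrated directly with a masked language model: a prompt containing a mask token is appended to the original sentence, and the PLM's MLM head scores candidate words for the masked slot. The prompt wording and model name below are illustrative assumptions, not the templates studied in the cited work.

```python
# Illustration of the cloze-style formulation: a prompt with a [MASK] slot is
# scored by the masked-language-modeling head of a PLM.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-chinese")
# Reconstructed input: original sentence plus a prompt whose answer is cloze-predicted.
text = "患者主诉头痛三天。头痛是一种[MASK]。"  # "The patient reports a three-day headache. Headache is a [MASK]."
for candidate in fill(text, top_k=3):
    print(candidate["token_str"], round(candidate["score"], 4))
```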

In this paper, a Language Inference-based Learning method (LANGIL) is proposed for the CNER task. As shown in Fig. 1, the traditional CNER task is reformulated as a language inference task, aiming to exploit the capability of a language model to comprehend natural language. Following the prompt-based learning paradigm, the hypothesis text and the premise text are treated as two slots in a pattern, which hold the candidate entity text and the corresponding original sentence, respectively. A mask token connects the hypothesis and premise text, with the hypothesis text preceding the premise text. This configuration constitutes an input instance for the PLM. The PLM then uses contextual semantic information to infer whether the hypothesis is entailed by (i.e., not contradicted by) the premise. If so, the hypothesis is considered valid with respect to the premise, and the entity and type it describes are taken as a result. Conversely, if the inference indicates contradiction, the current instance contains no valid answer. To allow the PLM to focus on more informative gradients and textual cues during training, a decoupled equilibrium sample loss function is designed to help the model learn more semantic information. At the same time, intermediate training is carried out on a data-rich medical dataset to further improve the expressiveness and domain adaptability of the language model in medical scenarios. At inference time, candidate entity spans are enumerated and wrapped into hypothesis texts; these instances are fed to LANGIL to obtain scores, and the type with the highest score is selected as the entity's type. Compared with the standard fine-tuning method, with sample size K=50, LANGIL improves the F1-score by 12.66%, 12.94%, 15.58%, and 16.56% on the CCKS-2017, CCKS-2019, CCKS-2020, and IMCS datasets, respectively.
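The following sketch illustrates this inference procedure in simplified form: candidate spans are enumerated, wrapped into hypothesis texts that are joined to the premise through a mask token, and scored with the MLM head; the highest-scoring type is kept. The template wording, label word, entity types, span length limit, threshold, and model name are assumptions made for illustration, not the exact pattern, verbalizer, or loss used by LANGIL.

```python
# Simplified illustration of hypothesis construction and candidate scoring.
# Template, label word ("是" for entailment), entity types, and threshold are
# illustrative assumptions, not the exact LANGIL configuration.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-chinese")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-chinese").eval()
ENTAIL_ID = tok.convert_tokens_to_ids("是")      # assumed "entailment" label word
TYPES = ["疾病", "药物", "症状"]                  # assumed entity types (disease/drug/symptom)

def score(hypothesis, premise):
    """Return the MLM probability of the entailment word at the mask slot."""
    text = f"{hypothesis}{tok.mask_token}{premise}"   # hypothesis precedes premise
    enc = tok(text, return_tensors="pt")
    with torch.no_grad():
        logits = mlm(**enc).logits
    mask_pos = (enc.input_ids == tok.mask_token_id).nonzero()[0, 1]
    return logits[0, mask_pos].softmax(-1)[ENTAIL_ID].item()

def predict(sentence, max_span_len=6):
    """Enumerate candidate spans, score each (span, type) hypothesis, keep the best."""
    results = []
    for i in range(len(sentence)):
        for j in range(i + 1, min(i + max_span_len, len(sentence)) + 1):
            span = sentence[i:j]
            typed = [(t, score(f"{span}是一种{t}实体", sentence)) for t in TYPES]
            best_type, best_score = max(typed, key=lambda x: x[1])
            if best_score > 0.9:                      # assumed acceptance threshold
                results.append((span, best_type, best_score))
    return results

print(predict("患者自述持续头痛并服用阿司匹林"))
```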

Since the Transformer [18] was proposed, various pre-trained language models have emerged one after another. The Bidirectional Encoder Representations from Transformers (BERT) pre-trained language model has been applied to various downstream NLP tasks [6]. BERT uses two pre-training tasks: Masked Language Model (MLM) and Next Sentence Prediction (NSP). A Robustly Optimized BERT Pretraining Approach (RoBERTa) was later proposed [8], which adjusts and improves the original BERT pre-training process. These modifications include a dynamic masking mechanism and the removal of the NSP task, and RoBERTa surpasses BERT on certain tasks. With the remarkable performance of PLMs, Chinese pre-trained language models also began to flourish. For instance, the joint laboratory of Harbin Institute of Technology and iFLYTEK re-trained popular PLMs on Chinese corpora and released them. In the meantime, a novel model called MacBERT was proposed [5]. It replaces the original MLM task with MLM as correction, turning the prediction of masked tokens into a text-correction task, since mask tokens do not appear in downstream tasks. Similarly, the Enhanced Representation through Knowledge Integration (ERNIE) language model was designed and released by Baidu [19]. It optimizes and further improves BERT's masking process: instead of randomly selecting words in the input, it performs entity-level and phrase-level masking.
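The difference between these masking strategies can be seen in a toy example: random token masking may mask isolated characters inside an entity, whereas entity-level masking removes the whole entity span at once. The sentence, entity span, and masking policies below are simplified assumptions, not the actual pre-training procedures of BERT or ERNIE.

```python
# Toy contrast between random token masking and entity-level masking.
import random

sentence = list("患者被诊断为二型糖尿病")   # character-level tokens
entity_span = (6, 11)                       # span of the entity "二型糖尿病" (assumed known)

def random_token_mask(tokens, ratio=0.15):
    """BERT-style: mask each character independently at a fixed ratio."""
    out = tokens[:]
    for i in range(len(out)):
        if random.random() < ratio:
            out[i] = "[MASK]"
    return "".join(out)

def entity_level_mask(tokens, span):
    """ERNIE-style: mask the whole entity span at once."""
    out = tokens[:]
    for i in range(*span):
        out[i] = "[MASK]"
    return "".join(out)

print(random_token_mask(sentence))
print(entity_level_mask(sentence, entity_span))
```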

The primary objective of CNER is to identify and classify various entities, such as diseases, medications, and clinical symptoms, from electronic health records. In the early stages, dictionary-based methods were a straightforward and effective approach. For instance, experts developed systems such as MedKAT [2] and cTAKES [3] to address the CNER problem. However, due to the complexity of medical scenarios, these methods may fail to match the correct entities in abstract contexts.

The emergence of deep learning and PLMs alleviated this dilemma. Since then, the fine-tuning paradigm based on PLMs has been widely applied to CNER. Specifically, PLMs are used for text feature extraction, followed by various classifiers such as BiLSTM, CRF, or a multi-layer perceptron (MLP) [12], [20], [21]. For example, Zhang et al. [20] used a BERT-BiLSTM-CRF model structure for entity recognition in breast cancer diagnoses and achieved an F1-score of 93.53%. Kim et al. [22] constructed a similar model that achieved an 84% F1-score on a Korean CNER task, outperforming the BiLSTM-CRF network.

With the release of the GPT model [13], prompt-based approaches have been widely studied. These methods mainly compensate for the gap between pre-training tasks and downstream tasks, thus enabling various tasks to be performed in low-resource scenarios [23]. An important step in low-resource learning is the design of templates that provide semantic information to language models. For example, Schick et al. [15] used a manually defined template for text classification. Wang et al. [24] also predefined a template to build a knowledge-guided cloze-style task for a low-resource classification problem. Recently, some work has studied NER in low-resource scenarios [25], [26], [27], [28]. For example, the demonstration-based learning method proposed by Lee et al. [26] was applied to the NER task; by appending contextual prompt information to the original sentence, they improved the F1-score of the baseline model by 4%-17% when training samples were extremely limited. By defining templates and utilizing language models, Cui et al. [25] achieved entity recognition with an F1-score at least 10% higher than that of fine-tuned models in their experiments. In comparison, drawing on the work of Wang et al. [29], the goal of the CNER task is shifted to language inference, for which language models are better suited. This approach does not require complex verbalizer engineering and eliminates the semantic mismatches caused by autoregressive models.
