Ekstraksi Informasi Pada Literatur Ilmiah Mengenai Mutagenesis Ems Berbasis Text Mining Dengan Menggunakan Model Bert

Ibrahim, Maulana Malik (2026) Ekstraksi Informasi Pada Literatur Ilmiah Mengenai Mutagenesis Ems Berbasis Text Mining Dengan Menggunakan Model Bert. Undergraduate thesis, Politeknik Negeri Jember.

[img] Text (Abstract)
ABSTRACT.pdf - Submitted Version
Available under License Creative Commons Attribution Share Alike.

Download (11kB)
[img] Text (Bab 1 Pendahuluan)
BAB 1.pdf - Submitted Version
Available under License Creative Commons Attribution Share Alike.

Download (169kB)
[img] Text (Daftar Pustaka)
DAFTAR PUSTAKA.pdf - Submitted Version
Available under License Creative Commons Attribution Share Alike.

Download (876kB)
[img] Text (Laporan Lengkap)
LAPORAN LENGKAP.pdf
Restricted to Registered users only

Download (4MB) | Request a copy

Abstract

Chemical mutagenesis using Ethyl Methane Sulfonate (EMS) is one of the primary methods in plant breeding programs for inducing controlled genetic diversity. Information regarding EMS treatment parameters, including dosage, target organism, mutated genes, and phenotype, is scattered across thousands of unstructured scientific publications, making it difficult for plant breeders to efficiently access the accumulated knowledge. This study develops an automated information extraction system based on the Bidirectional Encoder Representations from Transformers (BERT) model, specifically BioBERT, fine- tuned for the Named Entity Recognition (NER) task on the EMS mutagenesis domain. To address the data scarcity challenge inherent to the molecular biology domain, this study integrates an Active Learning approach with a Teacher-Student Model pseudo-labeling mechanism. Through eight Active Learning cycles, the training data expanded automatically from 1,252 manually annotated sentences to 12,037 high-quality sentences. The resulting model achieved an overall F1-Score of 93.13% (Precision 92.54%, Recall 93.72%) under the Strict Exact Match evaluation. Per-entity performance yielded F1- Scores of 95% for the GENE class, 93% for ORGANISM, 91% for PHENOTYPE, and 90% for DOSAGE. The system was implemented using a Microservices architecture integrating Laravel as the frontend and Flask API as the artificial intelligence backend, augmented by a Continuous Learning (Data Flywheel) mechanism that enables the system to improve continuously with increased usage. The results demonstrate that a BioBERT-based text mining approach provides a reliable and effective tool for comprehensively synthesizing EMS mutagenesis knowledge from scientific literature without requiring manual reading.

Item Type: Thesis (Undergraduate)
Contributors:
ContributionContributorsNIDN/NIDK
Thesis advisorAtmadji, Ery Setiyawan JullevNIDN0010078903
Uncontrolled Keywords: Active Learning, BioBERT, Ethyl Methane Sulfonate (EMS), Ekstraksi Informasi, Mutagenesis, Named Entity Recognition, Text Mining
Subjects: 410 - Rumpun Ilmu Teknik > 450 - Teknik Elektro dan Informatika > 458 - Teknik Informatika
410 - Rumpun Ilmu Teknik > 450 - Teknik Elektro dan Informatika > 459 - Ilmu Komputer
410 - Rumpun Ilmu Teknik > 450 - Teknik Elektro dan Informatika > 461 - Sistem Informasi
410 - Rumpun Ilmu Teknik > 450 - Teknik Elektro dan Informatika > 462 - Teknologi Informasi
Divisions: Jurusan Teknologi Informasi > Prodi D4 Teknik Informatika > Tugas Akhir
Depositing User: Maulana Malik Ibrahim
Date Deposited: 20 May 2026 04:54
Last Modified: 20 May 2026 04:55
URI: https://sipora.polije.ac.id/id/eprint/55971

Actions (login required)

View Item View Item