
MIL-RBERT: A Data-driven Approach for Noise Reduction in Distantly Supervised Biomedical Relation Extraction (BioNLP @ ACL 2020)

suamin/MIL-RBERT

UMLS-MEDLINE Biomedical Distant RE for Bag-level Multiple Instance Learning

Code for the BioNLP 2020 paper A Data-driven Approach for Noise Reduction in Distantly Supervised Biomedical Relation Extraction.

Model Architecture
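
Bag-level multiple instance learning groups all sentences that mention the same entity pair into one bag and predicts the relation for the bag instead of for each sentence, which dampens the effect of noisy distant-supervision matches. The sketch below shows one common way to pool a bag, selective attention over instances. It is a generic illustration only, not necessarily the aggregation MIL-RBERT uses; the relation query vector and the toy sentence vectors are assumptions standing in for learned parameters and encoder outputs.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bag_representation(instances: List[Vector], relation_query: Vector) -> Vector:
    """Pool the sentence vectors of one bag with attention weights.

    Each sentence vector is scored by its dot product with a (hypothetical)
    relation query vector; sentences that match the relation poorly, such as
    noisy distant-supervision matches, receive small weights.
    """
    scores = [sum(x * q for x, q in zip(vec, relation_query)) for vec in instances]
    weights = softmax(scores)
    dim = len(instances[0])
    # Weighted sum over the instances, computed dimension by dimension.
    return [sum(w * vec[d] for w, vec in zip(weights, instances)) for d in range(dim)]


if __name__ == "__main__":
    # Two toy sentence vectors; the query strongly favours the first one.
    bag = [[1.0, 0.0], [0.0, 1.0]]
    print(bag_representation(bag, relation_query=[10.0, 0.0]))
```

With a zero query every sentence gets the same weight, so the bag vector reduces to the plain average; a query aligned with one sentence pushes almost all the weight onto it.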

Requirements

pip install -r requirements.txt

Data

To run the code, please obtain the data as follows:

UMLS

Install the UMLS tools by following the steps here. Once installed, you will find MRREL.RRF and MRCONSO.RRF under INSTALLED_DIR/2019AB/META; copy both files into data/UMLS.
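
Both files are pipe-delimited RRF tables in which every row ends with a trailing `|`. As a rough sketch of what can be pulled out of them (not the logic of data_utils.process_umls), the code below reads English concept names from MRCONSO.RRF and relation triples from MRREL.RRF. The filters (English, unsuppressed strings; rows with a non-empty RELA) and the helper names are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple

# Column layouts of the two UMLS Metathesaurus files (pipe-delimited RRF).
MRCONSO_FIELDS = ["CUI", "LAT", "TS", "LUI", "STT", "SUI", "ISPREF", "AUI", "SAUI",
                  "SCUI", "SDUI", "SAB", "TTY", "CODE", "STR", "SRL", "SUPPRESS", "CVF"]
MRREL_FIELDS = ["CUI1", "AUI1", "STYPE1", "REL", "CUI2", "AUI2", "STYPE2", "RELA",
                "RUI", "SRUI", "SAB", "SL", "RG", "DIR", "SUPPRESS", "CVF"]


def parse_rrf_line(line: str, fields: List[str]) -> Dict[str, str]:
    values = line.rstrip("\r\n").split("|")
    # Every RRF row ends with a trailing '|', which produces one extra empty value.
    if values and values[-1] == "":
        values.pop()
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} columns, got {len(values)}")
    return dict(zip(fields, values))


def english_names(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each CUI to its unsuppressed English strings (filter choice is an assumption)."""
    names: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        row = parse_rrf_line(line, MRCONSO_FIELDS)
        if row["LAT"] == "ENG" and row["SUPPRESS"] == "N":
            names[row["CUI"]].add(row["STR"])
    return names


def relation_triples(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (CUI1, RELA, CUI2) for rows that carry a specific RELA label."""
    for line in lines:
        row = parse_rrf_line(line, MRREL_FIELDS)
        if row["RELA"]:
            yield row["CUI1"], row["RELA"], row["CUI2"]


# Usage with the real files (streamed line by line, since they are large):
# with open("data/UMLS/MRCONSO.RRF", encoding="utf-8") as f:
#     names = english_names(f)
```

Passing an open file handle keeps memory bounded to the output dictionaries rather than the whole table.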

MEDLINE

Download the MEDLINE abstracts file medline_abs.txt (~24.5 GB) and place it under data/MEDLINE. UPDATE: please follow the discussion here: #2.

Data Creation
  1. From the project base directory, process UMLS with python -m data_utils.process_umls. This creates the object data/umls_vocab.pkl.
  2. Next, run python -m data_utils.extract_unique_sentences_medline. This may take a while; it creates the file data/MEDLINE/medline_unique_sentences.txt.
  3. Link the entities to the text with python -m data_utils.link_entities (see config.py to adjust the linking settings).
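
Steps 2 and 3 can be pictured with the minimal sketch below: stream abstracts, keep each sentence only once, then tag sentences by matching token n-grams against a lowercased name-to-CUI dictionary. The regex sentence splitter, the greedy longest-match linker, and the `max_len` parameter are assumptions chosen for illustration; the repository's scripts and the settings in config.py may work differently.

```python
import hashlib
import re
from typing import Dict, Iterable, Iterator, List, Tuple

# Naive sentence splitter: break after '.', '!' or '?' followed by whitespace.
SENT_SPLIT = re.compile(r"(?<=[.!?])\s+")
TOKEN = re.compile(r"[\w-]+")


def unique_sentences(abstracts: Iterable[str]) -> Iterator[str]:
    """Yield each distinct sentence once, in first-seen order."""
    seen = set()
    for abstract in abstracts:
        for sent in SENT_SPLIT.split(abstract.strip()):
            # Store a 16-byte digest instead of the sentence itself to bound memory.
            key = hashlib.blake2b(sent.encode("utf-8"), digest_size=16).digest()
            if sent and key not in seen:
                seen.add(key)
                yield sent


def link_entities(sentence: str, name_to_cui: Dict[str, str],
                  max_len: int = 5) -> List[Tuple[str, str]]:
    """Greedy left-to-right longest match of token n-grams against lowercased names."""
    tokens = TOKEN.findall(sentence)
    links = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n]).lower()
            if span in name_to_cui:
                links.append((span, name_to_cui[span]))
                i += n
                break
        else:
            # No known name starts at token i; move on.
            i += 1
    return links
```

Preferring the longest match means a multi-word name such as "tension headache" wins over its shorter sub-span "headache".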
Data Splits

To reproduce the data splits reported in the paper for the k-tag setting, run python -m data_utils.create_split with the default options. The first run takes a while because it generates the one-time file data/MEDLINE/linked_sentences_to_groups.jsonl; subsequent runs use the cached version. For the s-tag setting, set the flag k_tag=False in config.py. For s-tag+exprels, additionally set the flag expand_rels=True.
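
The build-once, reuse-afterwards behaviour described above follows a common caching pattern. The sketch below assumes that the cached groups are bags of sentences keyed by an ordered (head, tail) CUI pair and stored one bag per JSONL line; the record layout, the `head`/`tail`/`sentences` keys, and the function names are hypothetical, not the actual format of linked_sentences_to_groups.jsonl.

```python
import json
import os
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical linked-sentence record: (sentence, head_cui, tail_cui).
Linked = Tuple[str, str, str]
Bags = Dict[Tuple[str, str], List[str]]


def group_into_bags(linked: Iterable[Linked]) -> Bags:
    """Collect all sentences that mention the same (head, tail) pair into one bag."""
    bags: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for sentence, head, tail in linked:
        bags[(head, tail)].append(sentence)
    return dict(bags)


def load_or_build_bags(cache_path: str, linked_factory: Callable[[], Iterable[Linked]]) -> Bags:
    """Return cached bags from a JSONL file, building and caching them on first use."""
    if os.path.exists(cache_path):
        cached = {}
        with open(cache_path, encoding="utf-8") as f:
            for line in f:
                rec = json.loads(line)
                cached[(rec["head"], rec["tail"])] = rec["sentences"]
        return cached
    # linked_factory is only called on a cache miss, so the expensive step runs once.
    bags = group_into_bags(linked_factory())
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        for (head, tail), sentences in bags.items():
            f.write(json.dumps({"head": head, "tail": tail, "sentences": sentences}) + "\n")
    # Rename only after a complete write, so an interrupted run leaves no partial cache.
    os.replace(tmp_path, cache_path)
    return bags
```

Passing a factory instead of the linked sentences themselves lets a cache hit skip the linking work entirely. Note that a cache like this is not invalidated automatically, so it would need to be deleted by hand if the upstream linking output changes.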

Features

Run python -m data_utils.features. Running the job with multi-processing will be significantly faster.
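
Featurization is a per-sentence transformation, which is why it parallelizes well. As an illustration only, the sketch below applies the entity-marking step from the original R-BERT paper (the head entity wrapped in `$`, the tail entity in `#`) across a process pool; whether data_utils.features uses these exact markers, and the record layout with character offsets, are assumptions. The `fork` start method is requested where the platform offers it so that workers do not re-import the main module.

```python
import multiprocessing as mp
from typing import List, Sequence, Tuple

# Hypothetical record: (sentence, (head_start, head_end), (tail_start, tail_end)),
# character offsets with exclusive ends; the two spans must not overlap or touch.
Record = Tuple[str, Tuple[int, int], Tuple[int, int]]


def mark_entities(record: Record) -> str:
    """Wrap the head entity in '$' and the tail entity in '#', R-BERT style."""
    text, (hs, he), (ts, te) = record
    markers = [(hs, "$ "), (he, " $"), (ts, "# "), (te, " #")]
    # Insert from the rightmost offset first so the earlier offsets stay valid.
    for pos, marker in sorted(markers, key=lambda m: m[0], reverse=True):
        text = text[:pos] + marker + text[pos:]
    return text


def mark_all(records: Sequence[Record], processes: int = 4, chunksize: int = 64) -> List[str]:
    """Featurize records in a worker pool; output order matches input order."""
    # Prefer 'fork' where available so workers do not re-import the main module.
    ctx = mp.get_context("fork" if "fork" in mp.get_all_start_methods() else None)
    with ctx.Pool(processes=processes) as pool:
        return list(pool.imap(mark_entities, records, chunksize=chunksize))


if __name__ == "__main__":
    rec = ("Aspirin treats headache .", (0, 7), (15, 23))
    print(mark_all([rec] * 3, processes=2))
```

`imap` with a moderate `chunksize` keeps inter-process overhead low for many short sentences while still returning results in input order, so features stay aligned with their labels.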

Train

Run python train.py.

Checkpoint

Download the best model checkpoint here.

Citation

If you use this code for your research, please consider citing:

@inproceedings{amin-etal-2020-data,
    title = "A Data-driven Approach for Noise Reduction in Distantly Supervised Biomedical Relation Extraction",
    author = "Amin, Saadullah and Dunfield, Katherine Ann and Vechkaeva, Anna and Neumann, G{\"u}nter",
    booktitle = "Proceedings of the 19th SIGBioMed Workshop on Biomedical Language Processing",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.bionlp-1.20",
    doi = "10.18653/v1/2020.bionlp-1.20",
    pages = "187--194"
}

Also, check out our follow-up work, MedDistant19, which introduces a new benchmark built from PubMed abstracts and the SNOMED CT knowledge base:

@inproceedings{amin-etal-2022-meddistant19,
    title = "{M}ed{D}istant19: Towards an Accurate Benchmark for Broad-Coverage Biomedical Relation Extraction",
    author = "Amin, Saadullah and Minervini, Pasquale and Chang, David and Stenetorp, Pontus and Neumann, G{\"u}nter",
    booktitle = "Proceedings of the 29th International Conference on Computational Linguistics",
    month = oct,
    year = "2022",
    address = "Gyeongju, Republic of Korea",
    publisher = "International Committee on Computational Linguistics",
    url = "https://aclanthology.org/2022.coling-1.198",
    pages = "2259--2277",
}

Acknowledgements

We thank Qin Dai (daiqin@ecei.tohoku.ac.jp) for guiding us, in private communication, through the steps to obtain the relevant triples from the UMLS.
