Namespace molcrawl.compounds.dataset

Sub-modules

molcrawl.compounds.dataset.dataset_config

Defining and configuring compound datasets …

molcrawl.compounds.dataset.download_chembl

Download ChEMBL database (SQLite) and extract canonical SMILES strings …

molcrawl.compounds.dataset.hf_converter

Conversion to HuggingFace Dataset format …

molcrawl.compounds.dataset.multi_loader

Multi-dataset loader …

molcrawl.compounds.dataset.organix13

molcrawl.compounds.dataset.prepare_chembl

Prepare ChEMBL for GPT-2 / BERT fine-tuning on the compounds domain …

molcrawl.compounds.dataset.prepare_gpt2

molcrawl.compounds.dataset.prepare_gpt2_organix13

molcrawl.compounds.dataset.processor

Processor for individual datasets …

molcrawl.compounds.dataset.tokenizer
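
Example (illustrative only)

The download_chembl summary above mentions extracting canonical SMILES strings from the ChEMBL SQLite database. The following is a minimal sketch of that kind of extraction, assuming the public ChEMBL schema, in which the compound_structures table carries molregno and canonical_smiles columns. The names extract_canonical_smiles and build_demo_db, and the tiny in-memory database, are hypothetical and used only for illustration; they are not taken from molcrawl and may not match how the module is actually implemented.

```python
import sqlite3
from typing import List


def extract_canonical_smiles(conn: sqlite3.Connection, limit: int = 0) -> List[str]:
    """Return non-empty canonical SMILES from a ChEMBL-style database.

    Hypothetical helper for illustration; not molcrawl's actual API.
    Assumes a `compound_structures` table with `molregno` and
    `canonical_smiles` columns, as in the public ChEMBL schema.
    """
    query = (
        "SELECT canonical_smiles FROM compound_structures "
        "WHERE canonical_smiles IS NOT NULL AND canonical_smiles != '' "
        "ORDER BY molregno"
    )
    params: tuple = ()
    if limit > 0:
        # Only cap the result set when a positive limit is requested.
        query += " LIMIT ?"
        params = (limit,)
    return [row[0] for row in conn.execute(query, params)]


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory stand-in for the real ChEMBL SQLite file."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE compound_structures ("
        "molregno INTEGER PRIMARY KEY, canonical_smiles TEXT)"
    )
    conn.executemany(
        "INSERT INTO compound_structures VALUES (?, ?)",
        [(1, "CCO"), (2, None), (3, "c1ccccc1"), (4, "")],
    )
    conn.commit()
    return conn


demo_conn = build_demo_db()
smiles = extract_canonical_smiles(demo_conn)
print(smiles)  # ['CCO', 'c1ccccc1']
```

With a real ChEMBL download, the connection would instead come from sqlite3.connect() pointed at the downloaded database file; rows with missing or empty SMILES are skipped so that downstream tokenization only sees valid strings.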