Namespace molcrawl.compounds.dataset
Sub-modules
molcrawl.compounds.dataset.dataset_config-
Defining and configuring compound datasets …
molcrawl.compounds.dataset.download_chembl-
Download ChEMBL database (SQLite) and extract canonical SMILES strings …
molcrawl.compounds.dataset.hf_converter-
Conversion to HuggingFace Dataset format …
molcrawl.compounds.dataset.multi_loader-
Multi-dataset loader …
molcrawl.compounds.dataset.organix13
molcrawl.compounds.dataset.prepare_chembl-
Prepare ChEMBL for GPT-2 / BERT fine-tuning on the compounds domain …
molcrawl.compounds.dataset.prepare_gpt2
molcrawl.compounds.dataset.prepare_gpt2_organix13
molcrawl.compounds.dataset.processor-
Processor for individual datasets …
molcrawl.compounds.dataset.tokenizer