Module molcrawl.preparation

Data preparation scripts for various datasets.

Sub-modules

molcrawl.preparation.convert_parquet_to_arrow

Convert a combined Parquet file with a split column into separate Arrow files.

molcrawl.preparation.download_guacamol

GuacaMol Dataset Download Script …

molcrawl.preparation.preparation_script_compounds
molcrawl.preparation.preparation_script_genome_sequence
molcrawl.preparation.preparation_script_molecule_related_nat_lang
molcrawl.preparation.preparation_script_protein_sequence
molcrawl.preparation.preparation_script_rna
molcrawl.preparation.test_molecule_nat_lang_compatibility