Module molcrawl.chemberta2.main

ChemBERTa-2 Training Script

RoBERTa-based Transformer training script specialized for SMILES compound data. It trains a model with the ChemBERTa-2 architecture on large-scale compound datasets.
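RoBERTa-style models are usually pretrained with masked language modeling (MLM). RoBERTa uses dynamic masking, which draws a fresh mask every time a sequence is seen. The sketch below shows BERT-style 80/10/10 token masking on a single sequence of token ids. It is illustrative only and is not taken from this module. It assumes an MLM objective, because the training objective is not stated here. The special-token ids (`PAD_ID`, `MASK_ID`) and the helper `mask_tokens` are hypothetical.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical special-token ids; a real tokenizer defines its own.
PAD_ID = 0
MASK_ID = 1
IGNORE_INDEX = -100  # label value conventionally skipped by the MLM loss


def mask_tokens(
    token_ids: List[int],
    vocab_size: int,
    special_ids: frozenset = frozenset({PAD_ID, MASK_ID}),
    mlm_probability: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Apply BERT/RoBERTa-style masking to one sequence (hypothetical helper).

    Each non-special token is selected with probability ``mlm_probability``.
    A selected token is replaced by MASK_ID 80% of the time, by a random
    vocabulary id 10% of the time, and left unchanged 10% of the time.
    Labels hold the original id at selected positions and IGNORE_INDEX
    everywhere else, so the loss only covers selected positions.
    """
    rng = rng or random.Random()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(token_ids)
    for i, tok in enumerate(token_ids):
        # Never mask special tokens; otherwise select with mlm_probability.
        if tok in special_ids or rng.random() >= mlm_probability:
            continue
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK_ID                   # 80%: replace with [MASK]
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: random token
        # remaining 10%: keep the original token
    return inputs, labels
```

Calling `mask_tokens` again on the same sequence in each epoch, instead of caching one masked copy, gives the dynamic-masking behaviour. This is the main change RoBERTa made to BERT's static masking.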

Features:

- Tokenization exclusively for SMILES
- RoBERTa architecture (improved version of BERT)
- Easy transfer learning to compound property prediction
- Efficient batch processing and memory management
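The sketch below illustrates two of the features above. It is not necessarily what this module does. Its tokenizer is an assumption: it uses the widely cited atom-level SMILES regex (as in the Molecular Transformer work) rather than whatever tokenizer ChemBERTa-2 or this script actually uses. It also assumes that batching pads each batch only to its longest sequence ("dynamic padding"), which limits memory spent on padding. The names `tokenize_smiles`, `build_vocab`, `encode_batch` and the special tokens `<pad>`, `<mask>`, `<unk>` are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Atom-level SMILES pattern: bracket atoms, two-letter halogens, ring bonds, etc.
SMILES_REGEX = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>|\*|\$|\%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into atom-level tokens (hypothetical helper)."""
    tokens = SMILES_REGEX.findall(smiles)
    # Reject strings containing characters the pattern does not cover.
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable SMILES: {smiles!r}")
    return tokens


def build_vocab(corpus: Iterable[str],
                specials=("<pad>", "<mask>", "<unk>")) -> Dict[str, int]:
    """Map special tokens first, then every token seen in the corpus, to ids."""
    vocab = {tok: i for i, tok in enumerate(specials)}
    for smi in corpus:
        for tok in tokenize_smiles(smi):
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode_batch(smiles_list: List[str], vocab: Dict[str, int],
                 max_length: int) -> Tuple[List[List[int]], List[List[int]]]:
    """Encode, truncate to max_length, and pad to the longest item in the batch."""
    pad_id, unk_id = vocab["<pad>"], vocab["<unk>"]
    ids = [[vocab.get(t, unk_id) for t in tokenize_smiles(s)][:max_length]
           for s in smiles_list]
    longest = max((len(x) for x in ids), default=0)
    input_ids = [x + [pad_id] * (longest - len(x)) for x in ids]
    # 1 marks real tokens, 0 marks padding the model should ignore.
    attention_mask = [[1] * len(x) + [0] * (longest - len(x)) for x in ids]
    return input_ids, attention_mask
```

For example, `encode_batch(["CCO", "C"], vocab, max_length=10)` pads the second molecule with two `<pad>` ids and returns the attention mask `[[1, 1, 1], [1, 0, 0]]`. A batch of short molecules therefore never pays for the padding of the longest molecule in the whole dataset.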