Module molcrawl.chemberta2.main
ChemBERTa-2 Training Script
Training script for a RoBERTa-based Transformer model specialized for SMILES compound data. It trains on large-scale compound data using the ChemBERTa-2 architecture.
Features:
- Tokenization designed exclusively for SMILES
- RoBERTa architecture (an improved variant of BERT)
- Easy transfer learning to compound property prediction
- Efficient batch processing and memory management
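
To illustrate what SMILES-specific tokenization can look like, here is a minimal sketch using an atom-level regular expression that is common in cheminformatics work. The names `SMILES_PATTERN` and `tokenize_smiles` are hypothetical, and this is not necessarily the tokenizer this module uses; it only shows how a SMILES string might be split into chemically meaningful tokens before being fed to a RoBERTa-style model.

```python
import re
from typing import List

# Atom-level SMILES pattern (an assumption for illustration, not taken from
# this module). Bracket atoms such as [Na+] are kept whole, and two-letter
# halogens (Br, Cl) are matched before the single-letter atoms B and C.
SMILES_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+"
    r"|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into atom-level tokens.

    Raises ValueError if some characters are not covered by the pattern,
    so that unsupported input is not silently truncated.
    """
    tokens = SMILES_PATTERN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"Could not fully tokenize SMILES string: {smiles!r}")
    return tokens


if __name__ == "__main__":
    # Aspirin, split into atoms, bonds, branches, and ring-closure digits.
    print(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O"))
```

The round-trip check (joining the tokens and comparing with the input) is a design choice that makes unsupported characters fail loudly instead of being dropped.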