Examples
========

EvoAug2 provides example scripts that demonstrate different integration approaches and use cases, ranging from basic PyTorch integration to complete PyTorch Lightning workflows.

Available Examples
------------------

**PyTorch Lightning Integration** (:doc:`examples/lightning_module`)
   Complete training script that uses PyTorch Lightning and the two-stage training approach. Shows DataModule creation, checkpoint management, and a performance comparison.

**Vanilla PyTorch Integration** (:doc:`examples/vanilla_pytorch`)
   Basic PyTorch implementation demonstrating the core augmentation functionality. Ideal for users who prefer direct control over the training loop without Lightning abstractions.

Example Categories
------------------

**Training Approaches:**

1. **Two-Stage Training (Recommended)**

   - Stage 1: Train with augmentations to learn robust features
   - Stage 2: Fine-tune on the original data to remove augmentation bias
   - Generally gives the best performance and generalization

2. **Single-Stage Training**

   - Train with augmentations throughout
   - Simpler, but the final model may retain augmentation bias
   - Good for quick prototyping

**Integration Methods:**

1. **PyTorch Lightning**

   - Structured training workflows
   - Built-in logging and checkpointing
   - Easy experiment management
   - Recommended for production use

2. **Vanilla PyTorch**

   - Direct control over the training loop
   - Customizable augmentation strategies
   - Good for research and experimentation

**Augmentation Strategies:**

1. **Stochastic Augmentation**

   - Randomly apply augmentations during training
   - Good for general robustness

2. **Hard Augmentation**

   - Always apply exactly N augmentations per sequence
   - Consistent training signal
   - Used in the EvoAug2 paper

Running the Examples
--------------------

**Prerequisites:**

.. code-block:: bash

   # Install with the examples dependencies
   pip install "evoaug2[examples]"

   # Or install from source
   git clone https://github.com/aduranu/evoaug.git
   cd evoaug
   pip install -e ".[examples]"

**Download Data (for DeepSTARR examples):**

.. code-block:: bash

   # Download the DeepSTARR dataset
   wget https://zenodo.org/record/7265991/files/DeepSTARR_data.h5

   # Or use the provided helper function
   python -c "from evoaug_utils import utils; utils.download_deepstarr_data()"

**Run the Lightning Example:**

.. code-block:: bash

   python example_lightning_module.py

**Run the Vanilla PyTorch Example:**

.. code-block:: bash

   python example_vanilla_pytorch.py

Example Outputs
---------------

**Training Progress:**

- Loss curves for each stage
- Validation metrics
- Augmentation statistics

**Model Checkpoints:**

- Stage 1: Augmented model
- Stage 2: Fine-tuned model
- Control: Model trained without augmentations

**Performance Comparison:**

- Correlation metrics (Pearson, Spearman)
- Visualization plots
- Statistical analysis

**Generated Files:**

- Trained models (``.ckpt`` files)
- Performance plots (``.png`` files)
- Training logs
- Evaluation results

Customizing Examples
--------------------

**Modify Augmentation Parameters:**

.. code-block:: python

   from evoaug.augment import RandomDeletion, RandomMutation, RandomTranslocation

   # Mutate 10% of the positions in each sequence
   RandomMutation(mut_frac=0.1)

   # Delete between 5 and 50 nucleotides
   RandomDeletion(delete_min=5, delete_max=50)

   # Shift (translocate) the sequence by 10 to 30 positions
   RandomTranslocation(shift_min=10, shift_max=30)

**Change Training Parameters:**

.. code-block:: python

   # Adjust learning rates
   learning_rate = 0.0005  # Lower for fine-tuning

   # Modify epochs
   max_epochs = 50  # Fewer epochs for quick testing
   finetune_epochs = 3  # Shorter fine-tuning

   # Change batch size
   batch_size = 64  # Smaller for memory constraints

**Custom Datasets:**

.. code-block:: python

   import numpy as np
   from evoaug_utils import utils

   # Load your own data with a custom H5Dataset
   dataset = utils.H5Dataset(
       filepath='your_data.h5',
       batch_size=32,
       lower_case=False,
       transpose=False
   )

   # Or use NumPy arrays
   sequences = np.load('sequences.npy')
   labels = np.load('labels.npy')

Troubleshooting Examples
------------------------

**Common Issues:**

1. **Memory Errors:**

   - Reduce the batch size
   - Use gradient accumulation
   - Enable mixed precision training

2. **Data Loading Issues:**

   - Check file paths
   - Verify the data format
   - Ensure sufficient disk space

3. **Training Instability:**

   - Adjust the learning rate
   - Check augmentation parameters
   - Verify data preprocessing

**Getting Help:**

- Check the :doc:`user_guide/troubleshooting` section
- Review example logs and error messages
- Consult the `GitHub Issues `_ page

Next Steps
----------

After running the examples:

1. **Modify Parameters:** Experiment with different augmentation settings
2. **Custom Datasets:** Apply EvoAug2 to your own genomic data
3. **Advanced Usage:** Explore the :doc:`user_guide` for detailed explanations
4. **API Reference:** See :doc:`api/evoaug` for all available options

**Example Modifications:**

- Change augmentation types and parameters
- Modify model architectures
- Adjust training schedules
- Add custom evaluation metrics
- Integrate with other frameworks

The examples provide a solid foundation for understanding EvoAug2's capabilities and can be adapted to your specific use cases.
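
The two-stage approach described under **Training Approaches** amounts to running the same training loop twice: first with augmentations enabled, then for a few more epochs with them disabled, continuing from the stage-1 weights. The framework-agnostic sketch below makes that schedule explicit. ``Stage``, ``two_stage_plan``, ``run_plan``, and the ``train_one_epoch(augment, lr)`` callback are hypothetical names introduced for illustration, not part of the EvoAug2 API. The callback is assumed to update one shared model, so that stage 2 fine-tunes the stage-1 weights rather than starting from scratch.

.. code-block:: python

   from dataclasses import dataclass
   from typing import Callable, List


   @dataclass(frozen=True)
   class Stage:
       """Settings for one training stage (hypothetical structure for this sketch)."""
       name: str
       augment: bool         # apply augmentations during this stage?
       epochs: int
       learning_rate: float


   def two_stage_plan(max_epochs: int, finetune_epochs: int,
                      learning_rate: float, finetune_lr: float) -> List[Stage]:
       """Stage 1 trains with augmentations; stage 2 fine-tunes on the original data."""
       return [
           Stage("augmented", True, max_epochs, learning_rate),
           Stage("finetune", False, finetune_epochs, finetune_lr),
       ]


   def run_plan(plan: List[Stage],
                train_one_epoch: Callable[[bool, float], float]) -> List[float]:
       """Call train_one_epoch(augment, lr) once per epoch of every stage, in order.

       The callback is assumed to update the same model in every stage, so the
       fine-tuning stage continues from the augmented model's weights.
       Returns the per-epoch losses reported by the callback.
       """
       losses = []
       for stage in plan:
           for _ in range(stage.epochs):
               losses.append(train_one_epoch(stage.augment, stage.learning_rate))
       return losses


   if __name__ == "__main__":
       # Epoch counts and fine-tuning rate taken from "Change Training Parameters";
       # the stage-1 learning rate of 1e-3 is an assumption for illustration.
       plan = two_stage_plan(max_epochs=50, finetune_epochs=3,
                             learning_rate=1e-3, finetune_lr=5e-4)
       for stage in plan:
           print(stage)

Keeping the schedule as data separates the two-stage logic from the training loop itself, so the same plan can drive a vanilla PyTorch loop or be translated into two consecutive Lightning ``fit`` calls.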
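
The stochastic and hard strategies listed under **Augmentation Strategies** differ mainly in how many augmentations each sequence receives. The plain-Python sketch below contrasts the two. The helper name ``choose_augmentations``, its ``max_augs`` parameter, and the uniform draw of 1 to ``max_augs`` augmentations in stochastic mode are assumptions made for illustration; they are not EvoAug2's API and not necessarily its exact sampling rule.

.. code-block:: python

   import random
   from typing import List, Sequence


   def choose_augmentations(aug_names: Sequence[str], max_augs: int,
                            hard: bool, rng: random.Random) -> List[str]:
       """Pick the augmentations to apply to one sequence (hypothetical helper).

       hard=True:  always exactly max_augs augmentations ("hard" augmentation).
       hard=False: a random count between 1 and max_augs ("stochastic" mode;
                   the uniform draw is an assumption of this sketch).
       """
       if not 1 <= max_augs <= len(aug_names):
           raise ValueError("max_augs must be between 1 and len(aug_names)")
       num_augs = max_augs if hard else rng.randint(1, max_augs)
       # Sample indices without replacement so no augmentation is used twice,
       # then sort them so augmentations run in a fixed order.
       chosen = rng.sample(range(len(aug_names)), num_augs)
       return [aug_names[i] for i in sorted(chosen)]


   if __name__ == "__main__":
       augs = ["deletion", "translocation", "insertion", "mutation", "noise"]
       rng = random.Random(0)
       for hard in (True, False):
           counts = [len(choose_augmentations(augs, 2, hard, rng)) for _ in range(5)]
           print("hard" if hard else "stochastic", counts)

In hard mode every sequence receives the same number of augmentations, which is what gives the consistent training signal; in stochastic mode the count varies from sequence to sequence.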
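
To build intuition for the parameters under **Modify Augmentation Parameters**, the sketch below applies simplified versions of mutation, deletion, and translocation to a DNA string. It is illustrative only: the function names are hypothetical, and the behaviors (mutating exactly ``round(mut_frac * L)`` positions to a different base, padding a deletion with random bases so the length is unchanged, and treating translocation as a circular shift in a random direction) are assumptions, not a reproduction of EvoAug2's implementation, which works on one-hot encoded tensors.

.. code-block:: python

   import random

   BASES = "ACGT"


   def random_mutation(seq: str, mut_frac: float, rng: random.Random) -> str:
       """Hypothetical: substitute round(mut_frac * len(seq)) distinct positions
       with a base different from the original one."""
       num_mut = round(mut_frac * len(seq))
       out = list(seq)
       for i in rng.sample(range(len(seq)), num_mut):
           out[i] = rng.choice([b for b in BASES if b != out[i]])
       return "".join(out)


   def random_deletion(seq: str, delete_min: int, delete_max: int,
                       rng: random.Random) -> str:
       """Hypothetical: delete a contiguous block of delete_min..delete_max bases,
       then pad with random bases (split between both ends) to keep the length."""
       if delete_max > len(seq):
           raise ValueError("delete_max cannot exceed the sequence length")
       n = rng.randint(delete_min, delete_max)
       start = rng.randint(0, len(seq) - n)
       kept = seq[:start] + seq[start + n:]
       pad = "".join(rng.choice(BASES) for _ in range(n))
       left = rng.randint(0, n)  # how much padding goes on the left end
       return pad[:left] + kept + pad[left:]


   def random_translocation(seq: str, shift_min: int, shift_max: int,
                            rng: random.Random) -> str:
       """Hypothetical: circularly shift by shift_min..shift_max positions,
       choosing the direction (left or right) at random."""
       shift = rng.randint(shift_min, shift_max) * rng.choice([-1, 1])
       shift %= len(seq)
       return seq[-shift:] + seq[:-shift] if shift else seq


   if __name__ == "__main__":
       rng = random.Random(7)
       seq = "".join(rng.choice(BASES) for _ in range(200))
       mutated = random_mutation(seq, mut_frac=0.1, rng=rng)
       print(sum(a != b for a, b in zip(seq, mutated)))  # prints 20
       print(len(random_deletion(seq, 5, 50, rng)))  # prints 200
       print(len(random_translocation(seq, 10, 30, rng)))  # prints 200

All three functions preserve the sequence length, so augmented sequences can still be batched with unaugmented ones and fed to a model with a fixed input size.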