Examples
EvoAug2 includes example scripts that demonstrate different integration approaches and use cases, ranging from basic PyTorch integration to a fuller PyTorch Lightning workflow.
Available Examples
- PyTorch Lightning Integration Example
  Complete training script with Lightning integration and two-stage approach. Shows DataModule creation, checkpoint management, and performance comparison.
- Vanilla PyTorch Integration Example
  Basic PyTorch implementation demonstrating core augmentation functionality. Ideal for users who prefer direct PyTorch control without Lightning abstractions.
Example Categories
Training Approaches:
Two-Stage Training (Recommended)
- Stage 1: Train with augmentations for robust feature learning
- Stage 2: Fine-tune on original data to remove augmentation bias
- Best performance and generalization
Single-Stage Training
- Train with augmentations throughout
- Simpler but may have augmentation bias
- Good for quick prototyping
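The two-stage schedule can be sketched in plain PyTorch as follows. This is a minimal illustration and not EvoAug2's own training code: `toy_augment`, `run_epochs`, the small model, the random data, and the learning rates are all hypothetical stand-ins chosen for brevity.

```python
import torch
from torch import nn


def toy_augment(x: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in for an augmentation pipeline: circularly shift the batch."""
    shift = int(torch.randint(0, x.shape[-1], (1,)))
    return torch.roll(x, shifts=shift, dims=-1)


def run_epochs(model, x, y, epochs, lr, augment=None):
    """Run `epochs` full-batch updates, optionally augmenting the inputs first."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    losses = []
    for _ in range(epochs):
        inputs = augment(x) if augment is not None else x
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), y)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses


torch.manual_seed(0)
x = torch.randn(32, 4, 50)  # random placeholder data: (batch, channels, length)
y = torch.randn(32, 1)
model = nn.Sequential(
    nn.Conv1d(4, 8, kernel_size=5),
    nn.ReLU(),
    nn.AdaptiveMaxPool1d(1),
    nn.Flatten(),
    nn.Linear(8, 1),
)

# Stage 1: train with augmentations.
stage1_losses = run_epochs(model, x, y, epochs=5, lr=1e-3, augment=toy_augment)
# Stage 2: fine-tune the same model on the original data with a lower learning rate.
stage2_losses = run_epochs(model, x, y, epochs=2, lr=1e-4, augment=None)
```

Single-stage training corresponds to running only the first call. The second call reuses the trained weights, which is what makes it a fine-tuning step and not a fresh run.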
Integration Methods:
PyTorch Lightning
- Professional training workflows
- Built-in logging and checkpointing
- Easy experiment management
- Recommended for production use
Vanilla PyTorch
- Direct control over training loop
- Customizable augmentation strategies
- Good for research and experimentation
Augmentation Strategies:
Stochastic Augmentation
- Randomly apply augmentations during training
- Good for general robustness
Hard Augmentation
- Always apply exactly N augmentations per sequence
- Consistent training signal
- Used in the EvoAug2 paper
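The difference between the two strategies comes down to how many augmentations each sequence receives. The sketch below uses only the standard library. The helper `choose_augmentations` and the augmentation names are hypothetical. It also assumes that the stochastic mode draws the count uniformly between 1 and N, which this page does not specify.

```python
import random


def choose_augmentations(names, max_augs, hard, rng):
    """Return the augmentation names to apply to a single sequence."""
    if hard:
        num_augs = max_augs                  # hard: always exactly max_augs
    else:
        num_augs = rng.randint(1, max_augs)  # stochastic: assumed uniform in 1..max_augs
    return rng.sample(names, num_augs)       # sample without replacement


rng = random.Random(0)
names = ["deletion", "insertion", "translocation", "mutation", "noise"]
hard_choice = choose_augmentations(names, max_augs=2, hard=True, rng=rng)
stochastic_choice = choose_augmentations(names, max_augs=2, hard=False, rng=rng)
```

With `hard=True` every sequence sees the same number of augmentations, which is where the consistent training signal comes from. With `hard=False` the count varies from sequence to sequence.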
Running the Examples
Prerequisites:
# Install with examples dependencies
pip install "evoaug2[examples]"
# Or install from source
git clone https://github.com/aduranu/evoaug.git
cd evoaug
pip install -e ".[examples]"
Download Data (for DeepSTARR examples):
# Download DeepSTARR dataset
wget https://zenodo.org/record/7265991/files/DeepSTARR_data.h5
# Or use the provided script
python -c "from evoaug_utils import utils; utils.download_deepstarr_data()"
Run Lightning Example:
python example_lightning_module.py
Run Vanilla PyTorch Example:
python example_vanilla_pytorch.py
Example Outputs
Training Progress:
- Loss curves for each stage
- Validation metrics
- Augmentation statistics
Model Checkpoints:
- Stage 1: Augmented model
- Stage 2: Fine-tuned model
- Control: Standard training model
Performance Comparison:
- Correlation metrics (Pearson, Spearman)
- Visualization plots
- Statistical analysis
Generated Files:
- Trained models (.ckpt files)
- Performance plots (.png files)
- Training logs
- Evaluation results
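For readers who want to reproduce the correlation metrics on their own predictions, here is a small NumPy-only sketch. The helper names `pearson_r`, `rank`, and `spearman_r` are hypothetical and the example scripts may compute these metrics differently. The rank helper does not average ties, so for data with repeated values `scipy.stats.pearsonr` and `scipy.stats.spearmanr` are the safer choice.

```python
import numpy as np


def pearson_r(y_true, y_pred):
    """Pearson correlation between two 1-D sequences."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


def rank(values):
    """Ranks 0..n-1 in sorted order (ties are NOT averaged)."""
    order = np.argsort(values)
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks


def spearman_r(y_true, y_pred):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(rank(y_true), rank(y_pred))


y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.0, 4.0, 9.0, 16.0, 25.0]  # monotonic but nonlinear in y_true

pearson = pearson_r(y_true, y_pred)
spearman = spearman_r(y_true, y_pred)
```

Because `y_pred` is a monotonic but nonlinear function of `y_true`, the Spearman value reaches 1 while the Pearson value stays below it, which is why reporting both is informative.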
Customizing Examples
Modify Augmentation Parameters:
# Adjust mutation rate
RandomMutation(mut_frac=0.1) # 10% mutation rate
# Change deletion range
RandomDeletion(delete_min=5, delete_max=50) # 5-50 nucleotides
# Modify translocation range
RandomTranslocation(shift_min=10, shift_max=30) # 10-30 shifts
Change Training Parameters:
# Adjust learning rates
learning_rate = 0.0005 # Lower for fine-tuning
# Modify epochs
max_epochs = 50 # Fewer epochs for quick testing
finetune_epochs = 3 # Shorter fine-tuning
# Change batch size
batch_size = 64 # Smaller for memory constraints
Custom Datasets:
# Load your own data
import numpy as np
from evoaug_utils import utils
# Custom H5Dataset
dataset = utils.H5Dataset(
    filepath='your_data.h5',
    batch_size=32,
    lower_case=False,
    transpose=False
)
# Or use numpy arrays
sequences = np.load('sequences.npy')
labels = np.load('labels.npy')
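Once the arrays are loaded, they still need to be wrapped for batching. One way to do that with standard PyTorch utilities is sketched below. The random arrays stand in for the `.npy` files, and the `(N, 4, L)` sequence layout and two-column labels are assumptions for illustration, not a format required by EvoAug2.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder arrays standing in for np.load('sequences.npy') and np.load('labels.npy').
sequences = np.random.rand(100, 4, 249).astype(np.float32)  # assumed (N, channels, length)
labels = np.random.rand(100, 2).astype(np.float32)          # assumed two regression targets

# Wrap the arrays as tensors and batch them.
dataset = TensorDataset(torch.from_numpy(sequences), torch.from_numpy(labels))
loader = DataLoader(dataset, batch_size=32, shuffle=True)

x_batch, y_batch = next(iter(loader))
```

Casting to `float32` up front avoids dtype mismatches with model weights, which default to single precision in PyTorch.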
Troubleshooting Examples
Common Issues:
Memory Errors:
- Reduce batch size
- Use gradient accumulation
- Enable mixed precision training
Data Loading Issues:
- Check file paths
- Verify data format
- Ensure sufficient disk space
Training Instability:
- Adjust learning rate
- Check augmentation parameters
- Verify data preprocessing
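Gradient accumulation, suggested above for memory errors, trades more forward and backward passes for a smaller per-step memory footprint. A minimal vanilla PyTorch sketch follows. The linear model, random micro-batches, and `accumulation_steps` value are illustrative assumptions.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

accumulation_steps = 4  # effective batch size = 4 micro-batches of 16 samples
micro_batches = [(torch.randn(16, 10), torch.randn(16, 1)) for _ in range(8)]

optimizer_steps = 0
optimizer.zero_grad()
for step, (x, y) in enumerate(micro_batches, start=1):
    # Scale the loss so the accumulated gradient is an average over micro-batches.
    loss = loss_fn(model(x), y) / accumulation_steps
    loss.backward()  # gradients add up in .grad across micro-batches
    if step % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
        optimizer_steps += 1
```

In PyTorch Lightning, the `Trainer` arguments `accumulate_grad_batches` and `precision` cover gradient accumulation and mixed precision without a hand-written loop.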
Getting Help:
- Check the troubleshooting section of the user guide
- Review example logs and error messages
- Consult the GitHub Issues page
Next Steps
After running the examples:
- Modify Parameters: Experiment with different augmentation settings
- Custom Datasets: Apply to your own genomic data
- Advanced Usage: Explore the user guide for detailed explanations
- API Reference: Check the EvoAug2 Core API for all available options
Example Modifications:
- Change augmentation types and parameters
- Modify model architectures
- Adjust training schedules
- Add custom evaluation metrics
- Integrate with other frameworks
The examples provide a solid foundation for understanding EvoAug2’s capabilities and can be easily adapted for your specific use cases.