Examples

EvoAug2 provides example scripts that demonstrate different integration approaches and use cases, ranging from a basic vanilla PyTorch training loop to a PyTorch Lightning workflow.

Available Examples

PyTorch Lightning Integration (see PyTorch Lightning Integration Example): A complete training script that integrates EvoAug2 with Lightning using the two-stage approach. It shows DataModule creation, checkpoint management, and performance comparison.
Vanilla PyTorch Integration (see Vanilla PyTorch Integration Example): A basic PyTorch implementation of the core augmentation functionality, for users who prefer direct control over the training loop without Lightning abstractions.

Example Categories

Training Approaches:

Two-Stage Training (Recommended):
- Stage 1: Train with augmentations to learn robust features
- Stage 2: Fine-tune on the original, unaugmented data to remove augmentation bias
- Generally gives the best performance and generalization
Single-Stage Training:
- Train with augmentations throughout
- Simpler, but the model may retain augmentation bias
- Good for quick prototyping
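The two approaches differ mainly in their training schedule. The framework-agnostic sketch below makes that schedule explicit. It assumes a hypothetical `train_epoch(augment, lr)` callback that you would implement with your own model, optimizer, and data loader; the learning rates are illustrative defaults, not values taken from the EvoAug2 examples. Stage 2 continues from the stage-1 weights, since it is a fine-tuning step.

```python
from typing import Callable, List, Tuple

# Hypothetical callback (not part of EvoAug2): trains the model for one epoch.
# augment=True means sequences are augmented on the fly; lr is the learning rate.
TrainEpoch = Callable[[bool, float], None]


def train_two_stage(
    train_epoch: TrainEpoch,
    aug_epochs: int,
    finetune_epochs: int,
    lr: float = 1e-3,           # illustrative value
    finetune_lr: float = 1e-4,  # illustrative value, smaller for fine-tuning
) -> List[Tuple[str, bool, float]]:
    """Stage 1 (augmented) followed by stage 2 (fine-tune on original data).

    Returns a log of (stage, augment, lr) for every epoch that was run.
    """
    log = []
    # Stage 1: learn robust features from augmented sequences.
    for _ in range(aug_epochs):
        train_epoch(True, lr)
        log.append(("stage1", True, lr))
    # Stage 2: continue from the stage-1 weights on unaugmented data so the
    # model adapts back to the original sequence distribution.
    for _ in range(finetune_epochs):
        train_epoch(False, finetune_lr)
        log.append(("stage2", False, finetune_lr))
    return log


def train_single_stage(train_epoch: TrainEpoch, epochs: int, lr: float = 1e-3):
    """Single-stage baseline: augment in every epoch, no fine-tuning."""
    log = []
    for _ in range(epochs):
        train_epoch(True, lr)
        log.append(("single", True, lr))
    return log
```

Keeping the schedule separate from the model code makes it easy to compare the two approaches with the same `train_epoch` implementation.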

Integration Methods:

PyTorch Lightning:
- Structured training workflows with little boilerplate
- Built-in logging and checkpointing
- Easy experiment management
- Recommended for production use
Vanilla PyTorch:
- Direct control over the training loop
- Customizable augmentation strategies
- Good for research and experimentation

Augmentation Strategies:

Stochastic Augmentation:
- Apply a randomly chosen number of augmentations (up to a maximum) to each sequence during training
- Good for general robustness
Hard Augmentation:
- Always apply exactly N augmentations to each sequence
- Gives a consistent training signal
- Used in the EvoAug2 paper
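The difference between the two strategies comes down to how many augmentations each sequence receives. The sketch below samples per-sequence augmentation combinations under both strategies. The parameter names `max_augs_per_seq` and `hard_aug` are assumptions modeled on the options of the original EvoAug package, and `sample_aug_combos` is a hypothetical helper; check the EvoAug2 Core API for the exact options.

```python
import random
from typing import List, Sequence


def sample_aug_combos(
    augment_list: Sequence[str],
    batch_size: int,
    max_augs_per_seq: int,
    hard_aug: bool,
    rng: random.Random,
) -> List[List[str]]:
    """Pick which augmentations to apply to each sequence in a batch.

    Hypothetical helper for illustration; not the EvoAug2 implementation.
    """
    if not 1 <= max_augs_per_seq <= len(augment_list):
        raise ValueError("max_augs_per_seq must be between 1 and len(augment_list)")
    combos = []
    for _ in range(batch_size):
        if hard_aug:
            # Hard: always exactly max_augs_per_seq augmentations.
            n = max_augs_per_seq
        else:
            # Stochastic: a random count in [1, max_augs_per_seq].
            n = rng.randint(1, max_augs_per_seq)
        # Choose n distinct augmentations, kept in their list order.
        idx = sorted(rng.sample(range(len(augment_list)), n))
        combos.append([augment_list[i] for i in idx])
    return combos
```

Both strategies pick *which* augmentations to apply at random; only the stochastic strategy also randomizes *how many*.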

Running the Examples

Prerequisites:

# Install with examples dependencies
# (the quotes stop shells such as zsh from expanding the brackets)
pip install "evoaug2[examples]"

# Or install from source
git clone https://github.com/aduranu/evoaug.git
cd evoaug
pip install -e ".[examples]"

Download Data (for DeepSTARR examples):

# Download DeepSTARR dataset
wget https://zenodo.org/record/7265991/files/DeepSTARR_data.h5

# Or use the provided script
python -c "from evoaug_utils import utils; utils.download_deepstarr_data()"
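If `wget` is unavailable (for example on Windows), a small standard-library helper can fetch the same file and skip the download when it is already present. This is a sketch, not part of EvoAug2; the function name `download_if_missing` is hypothetical, and the URL is copied from the `wget` command above.

```python
import os
import urllib.request

# URL copied from the wget command above.
DEEPSTARR_URL = "https://zenodo.org/record/7265991/files/DeepSTARR_data.h5"


def download_if_missing(url: str, path: str) -> str:
    """Download url to path unless a non-empty file already exists there.

    Hypothetical helper for illustration; returns the local path.
    """
    if os.path.exists(path) and os.path.getsize(path) > 0:
        return path  # already downloaded: skip the network request
    tmp_path = path + ".part"
    # Write to a temporary file first so an interrupted download
    # never leaves a truncated file at `path`.
    urllib.request.urlretrieve(url, tmp_path)
    os.replace(tmp_path, path)
    return path
```

Usage: `download_if_missing(DEEPSTARR_URL, "DeepSTARR_data.h5")`.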

Run Lightning Example:

python example_lightning_module.py

Run Vanilla PyTorch Example:

python example_vanilla_pytorch.py

Example Outputs

Training Progress:
- Loss curves for each stage
- Validation metrics
- Augmentation statistics

Model Checkpoints:
- Stage 1: Augmented model
- Stage 2: Fine-tuned model
- Control: Model trained without augmentations

Performance Comparison:
- Correlation metrics (Pearson, Spearman)
- Visualization plots
- Statistical analysis

Generated Files:
- Trained models (.ckpt files)
- Performance plots (.png files)
- Training logs
- Evaluation results
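The performance comparison reports Pearson and Spearman correlations between measured and predicted activity. The NumPy sketch below shows how these metrics are computed per output; Spearman is the Pearson correlation of the ranks, with ties given their average rank. The helper names and the task labels in the test are hypothetical; in practice `scipy.stats.pearsonr` and `scipy.stats.spearmanr` compute the same quantities.

```python
import numpy as np


def pearson_r(x, y) -> float:
    """Pearson correlation between two 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))


def rankdata(a) -> np.ndarray:
    """Ranks starting at 1, with tied values given their average rank."""
    a = np.asarray(a, dtype=float)
    order = np.argsort(a, kind="mergesort")  # stable: ties stay adjacent
    ranks = np.empty(len(a), dtype=float)
    ranks[order] = np.arange(1, len(a) + 1)
    for value in np.unique(a):
        mask = a == value
        ranks[mask] = ranks[mask].mean()  # average rank within a tie group
    return ranks


def spearman_rho(x, y) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(rankdata(x), rankdata(y))


def evaluate_per_task(y_true, y_pred, task_names):
    """Per-output correlations for a multi-output model, arrays shaped (N, T)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return {
        name: {
            "pearson": pearson_r(y_true[:, i], y_pred[:, i]),
            "spearman": spearman_rho(y_true[:, i], y_pred[:, i]),
        }
        for i, name in enumerate(task_names)
    }
```

Reporting both metrics is useful because Pearson measures linear agreement while Spearman only checks that the predicted ordering matches the measured one.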

Customizing Examples

Modify Augmentation Parameters:

# Assumes RandomMutation, RandomDeletion and RandomTranslocation are already
# imported from EvoAug2 (see the EvoAug2 Core API for the module path)
augment_list = [
    RandomMutation(mut_frac=0.1),                     # mutate 10% of positions
    RandomDeletion(delete_min=5, delete_max=50),      # delete 5-50 nucleotides
    RandomTranslocation(shift_min=10, shift_max=30),  # shift by 10-30 positions
]
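To build intuition for what these parameters control, the sketch below reimplements simplified versions of the three augmentations on a single one-hot sequence of shape (4, L) with NumPy. It is an illustration only, not EvoAug2's implementation: all function names are hypothetical, deletion here pads the end with all-zero (N) columns to keep the length fixed, and the library's classes may pad, sample, or batch differently.

```python
import numpy as np


def one_hot(seq: str) -> np.ndarray:
    """Encode an ACGT string as a (4, L) one-hot array."""
    lookup = {"A": 0, "C": 1, "G": 2, "T": 3}
    x = np.zeros((4, len(seq)), dtype=np.float32)
    for i, base in enumerate(seq):
        x[lookup[base], i] = 1.0
    return x


def random_mutation(x, mut_frac, rng):
    """Resample round(mut_frac * L) random positions with random bases
    (a resampled base may happen to equal the original one)."""
    x = x.copy()
    L = x.shape[1]
    n_mut = int(round(mut_frac * L))
    pos = rng.choice(L, size=n_mut, replace=False)
    x[:, pos] = 0.0
    x[rng.integers(0, 4, size=n_mut), pos] = 1.0
    return x


def random_translocation(x, shift_min, shift_max, rng):
    """Circularly shift by a random amount in [shift_min, shift_max],
    in a random direction."""
    shift = int(rng.integers(shift_min, shift_max + 1))
    if rng.random() < 0.5:
        shift = -shift
    return np.roll(x, shift, axis=1)


def random_deletion(x, delete_min, delete_max, rng):
    """Delete a random contiguous segment of delete_min to delete_max
    nucleotides, then pad the end with all-zero columns to keep length L."""
    L = x.shape[1]
    size = int(rng.integers(delete_min, delete_max + 1))
    start = int(rng.integers(0, L - size + 1))
    kept = np.concatenate([x[:, :start], x[:, start + size:]], axis=1)
    return np.concatenate([kept, np.zeros((4, size), dtype=x.dtype)], axis=1)
```

Usage: `rng = np.random.default_rng(0)` then `random_mutation(one_hot("ACGT" * 25), 0.1, rng)`.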

Change Training Parameters:

# Learning rate (use a lower value, e.g. for the fine-tuning stage)
learning_rate = 0.0005

# Number of epochs
max_epochs = 50         # fewer epochs for quick testing
finetune_epochs = 3     # shorter fine-tuning stage

# Batch size
batch_size = 64         # smaller batches reduce memory use

Custom Datasets:

# Load your own data
import numpy as np
from evoaug_utils import utils

# Custom H5Dataset
dataset = utils.H5Dataset(
    filepath='your_data.h5',
    batch_size=32,
    lower_case=False,
    transpose=False
)

# Or use NumPy arrays
sequences = np.load('sequences.npy')
labels = np.load('labels.npy')
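When loading your own NumPy arrays, it helps to check the layout and hold out validation data before training. The sketch below assumes the model expects one-hot input shaped (N, 4, L); whether that matches your model, and how it relates to the `transpose` option of `H5Dataset`, should be checked against the EvoAug2 API documentation. Both helper names are hypothetical.

```python
import numpy as np


def to_channels_first(x: np.ndarray) -> np.ndarray:
    """Return one-hot sequences as (N, 4, L), transposing from (N, L, 4) if needed."""
    x = np.asarray(x)
    if x.ndim != 3:
        raise ValueError(f"expected a 3-D array, got shape {x.shape}")
    if x.shape[1] == 4:
        return x  # already (N, 4, L)
    if x.shape[2] == 4:
        return np.transpose(x, (0, 2, 1))  # (N, L, 4) -> (N, 4, L)
    raise ValueError(f"no axis of size 4 in shape {x.shape}")


def train_val_split(x, y, val_frac=0.1, seed=0):
    """Shuffle indices once and split arrays into train and validation parts."""
    if len(x) != len(y):
        raise ValueError("sequences and labels must have the same length")
    idx = np.random.default_rng(seed).permutation(len(x))
    n_val = int(len(x) * val_frac)
    val, train = idx[:n_val], idx[n_val:]
    return (x[train], y[train]), (x[val], y[val])
```

Usage: `(x_train, y_train), (x_val, y_val) = train_val_split(to_channels_first(sequences), labels)`.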

Troubleshooting Examples

Common Issues:

Memory Errors:
- Reduce the batch size
- Use gradient accumulation
- Enable mixed-precision training
Data Loading Issues:
- Check file paths
- Verify the data format
- Ensure sufficient disk space
Training Instability:
- Adjust the learning rate
- Check augmentation parameters
- Verify data preprocessing
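Gradient accumulation trades time for memory: the model processes several small micro-batches, combines their gradients, and steps the optimizer once. The pure-Python sketch below uses a toy one-parameter MSE loss rather than a real model to show the arithmetic: with equal-sized micro-batches and a mean loss, averaging the micro-batch gradients reproduces the full-batch gradient. The helper names are hypothetical. In PyTorch Lightning the same effect comes from the Trainer's `accumulate_grad_batches` argument, and mixed precision from its `precision` argument (e.g. `"16-mixed"` in Lightning 2.x).

```python
import math


def accumulation_steps(target_batch_size: int, micro_batch_size: int) -> int:
    """Micro-batches to accumulate per optimizer step so the effective
    batch size is at least target_batch_size."""
    if target_batch_size <= 0 or micro_batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    return math.ceil(target_batch_size / micro_batch_size)


def mse_grad(w: float, xs, ys) -> float:
    """d/dw of the mean squared error of the toy model y_hat = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w: float, xs, ys, micro_batch_size: int) -> float:
    """Gradient accumulation: average the micro-batch gradients
    (equivalently, scale each micro-batch loss by 1 / steps and sum)."""
    if len(xs) % micro_batch_size != 0:
        raise ValueError("this sketch assumes equal-sized micro-batches")
    steps = len(xs) // micro_batch_size
    total = 0.0
    for s in range(steps):
        lo, hi = s * micro_batch_size, (s + 1) * micro_batch_size
        total += mse_grad(w, xs[lo:hi], ys[lo:hi]) / steps
    return total
```

For example, `accumulation_steps(256, 64)` returns 4: four micro-batches of 64 give the same update as one batch of 256 while holding only 64 sequences in memory at a time.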

Getting Help:

Check the Troubleshooting section of the User Guide
Review example logs and error messages
Consult the GitHub Issues page

Next Steps

After running the examples:

Modify Parameters: Experiment with different augmentation settings
Custom Datasets: Apply to your own genomic data
Advanced Usage: Explore the User Guide for detailed explanations
API Reference: Check the EvoAug2 Core API for all available options

Example Modifications:

Change augmentation types and parameters
Modify model architectures
Adjust training schedules
Add custom evaluation metrics
Integrate with other frameworks

The examples provide a solid foundation for understanding EvoAug2’s capabilities and can be easily adapted for your specific use cases.