
Machine Learning for Biological Data
Interpretable models built by someone who understands the biology.
Machine learning in biology has a credibility problem. Too many models are built to hit an accuracy number without any connection to biological mechanism. I build models differently: classifiers that can tell your PI which genes and cell types are driving the prediction, regression models with feature importance you can validate on the bench, and image analysis tools with decision criteria a pathologist can evaluate. The goal is results that are meaningful, reproducible, and defensible in a manuscript or a board presentation.
Classification
Identify cell types, disease states, treatment responders, or phenotypic clusters from high-dimensional omics or imaging data. I build and validate classifiers using random forests, gradient boosting, SVMs, and neural networks, with rigorous cross-validation and feature importance analysis.
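As a rough illustration of what that validation can look like, here is a minimal sketch using scikit-learn. It assumes a synthetic dataset standing in for a samples-by-genes matrix; the `gene_names` labels, the sample counts, and the choice of a random forest with permutation importance are illustrative assumptions, not a fixed recipe.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Synthetic stand-in for an expression matrix (assumption: 200 samples,
# 50 features, of which the first 5 carry signal because shuffle=False).
X, y = make_classification(
    n_samples=200, n_features=50, n_informative=5, n_redundant=0,
    shuffle=False, random_state=0,
)
gene_names = np.array([f"gene_{i}" for i in range(X.shape[1])])  # hypothetical labels

clf = RandomForestClassifier(n_estimators=200, random_state=0)

# Stratified 5-fold cross-validation preserves class balance in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")

# Permutation importance on held-out samples: how much does shuffling a
# single feature degrade the fitted model's score?
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
clf.fit(X_train, y_train)
perm = permutation_importance(clf, X_test, y_test, n_repeats=10, random_state=0)
top_genes = gene_names[np.argsort(perm.importances_mean)[::-1][:5]]

print(f"ROC AUC: {cv_scores.mean():.2f} +/- {cv_scores.std():.2f}")
print("Top-ranked features:", list(top_genes))
```

On real data, the ranked features are candidates for follow-up, not conclusions; correlated genes can share or mask each other's importance.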
Regression & Predictive Modeling
Predict continuous outcomes: gene expression levels, cytokine concentrations, drug responses, or time-to-event. I handle feature selection, regularization, and model evaluation to ensure predictions are robust and not overfit to training data.
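A minimal sketch of the regularization and evaluation steps, assuming scikit-learn and a synthetic dataset with more features than informative signals. The sample sizes, noise level, and the choice of a cross-validated lasso are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a continuous readout (assumption: 150 samples,
# 100 candidate features, only 8 of which are truly informative).
X, y = make_regression(
    n_samples=150, n_features=100, n_informative=8, noise=10.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Scaling lives inside the pipeline so it is fit on training data only.
# LassoCV chooses the penalty strength by internal 5-fold cross-validation.
model = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=0))
model.fit(X_train, y_train)

lasso = model.named_steps["lassocv"]
n_selected = int(np.sum(lasso.coef_ != 0))  # features kept by the L1 penalty
train_r2 = model.score(X_train, y_train)
test_r2 = model.score(X_test, y_test)

print(f"Features retained: {n_selected} of {X.shape[1]}")
print(f"R^2 train: {train_r2:.2f}, R^2 held-out: {test_r2:.2f}")
```

A large gap between the training and held-out scores is the practical warning sign of overfitting.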
Unsupervised Learning
Discover hidden structure in your data. I use clustering (k-means, hierarchical, DBSCAN, Leiden), dimensionality reduction (PCA, UMAP, t-SNE), and feature extraction to surface patterns you didn't know to look for.
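One way such an exploration might start, sketched with scikit-learn on synthetic data: reduce dimensionality with PCA, cluster with k-means over a range of cluster counts, and compare silhouette scores. The data, the number of components, and the range of `k` are assumptions chosen for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for unlabeled profiles (assumption: 300 samples,
# 30 features, generated from 4 underlying groups).
X, _ = make_blobs(n_samples=300, n_features=30, centers=4, random_state=0)

# Standardize, then project onto the first 10 principal components.
X_scaled = StandardScaler().fit_transform(X)
embedding = PCA(n_components=10, random_state=0).fit_transform(X_scaled)

# Score each candidate cluster count; higher silhouette means tighter,
# better-separated clusters.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(embedding)
    scores[k] = silhouette_score(embedding, labels)

best_k = max(scores, key=scores.get)
print({k: round(float(s), 2) for k, s in scores.items()})
print("Best-scoring k:", best_k)
```

The silhouette score is only one heuristic; a cluster count is worth keeping when the groups also make biological sense.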
Deep Learning for Images & Sequences
Convolutional neural networks for microscopy image classification and segmentation. Transformer-based models and graph neural networks for spatial transcriptomics and sequence-level biological problems. Each project is scoped carefully to confirm that the available training data is sufficient for the model.
When ML is not the answer
Sometimes a well-designed statistical test outperforms a neural network. Part of my job is knowing the difference. If your question is better answered by a DESeq2 analysis than a deep learning model, I will tell you that up front and scope the project accordingly.
What I Deliver
- Trained, validated models with documented performance metrics
- Feature importance and biological interpretation of model outputs
- Reproducible code (Python/scikit-learn/PyTorch) with clear documentation
- Publication-ready visualizations of model results and uncertainty
- Honest assessment of what the data can and cannot support
Wondering if ML is the right approach for your data? Let's talk it through.
Book a Free 30-Minute Call




