Machine Learning for Biological Data

Interpretable models built by someone who understands the biology.

Machine learning in biology has a credibility problem. Too many models are built to hit an accuracy number without any connection to biological mechanism. I build models differently: classifiers that can tell your PI which genes and cell types are driving the prediction, regression models with feature importance you can validate on the bench, and image analysis tools with decision criteria a pathologist can evaluate. The goal is results that are meaningful, reproducible, and defensible in a manuscript or a board presentation.

Classification

Identify cell types, disease states, treatment responders, or phenotypic clusters from high-dimensional omics or imaging data. I build and validate classifiers using random forests, gradient boosting, SVMs, and neural networks, with rigorous cross-validation and feature importance analysis.

Random Forest · Gradient Boosting · SVM · scikit-learn
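Below is a minimal sketch of the cross-validation and feature-importance workflow described above. It assumes scikit-learn and uses synthetic data from `make_classification` in place of a real omics matrix. The feature names are hypothetical placeholders, not genes. Permutation importance is only one of several importance methods. Here it is computed on held-out samples, so it reflects what the model generalizes rather than what it memorized.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Synthetic stand-in: 200 samples x 50 features; with shuffle=False the
# first 5 columns are the informative ones and the rest are noise.
X, y = make_classification(
    n_samples=200, n_features=50, n_informative=5,
    n_redundant=0, shuffle=False, random_state=0,
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

clf = RandomForestClassifier(n_estimators=200, random_state=0)

# Cross-validated performance estimate (ROC AUC) over 5 stratified folds
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
auc_scores = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")
print(f"CV ROC AUC: {auc_scores.mean():.2f} +/- {auc_scores.std():.2f}")

# Permutation importance on a held-out split: how much does shuffling
# each feature degrade performance on samples the model never trained on?
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0,
)
clf.fit(X_train, y_train)
result = permutation_importance(
    clf, X_test, y_test, n_repeats=10, random_state=0, scoring="roc_auc",
)

# Report the five features whose shuffling hurts AUC the most
top = np.argsort(result.importances_mean)[::-1][:5]
for i in top:
    print(feature_names[i], round(result.importances_mean[i], 3))
```

On data like this, the top-ranked features should mostly be the five informative columns. On real data, that ranked list is the starting point for biological interpretation and bench validation.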

Regression & Predictive Modeling

Predict continuous outcomes such as gene expression levels, cytokine concentrations, and drug responses, as well as time-to-event (survival) outcomes. I handle feature selection, regularization, and model evaluation to ensure predictions are robust and not overfit to training data.

Regularization · Feature Selection · Cross-Validation · Python
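One way to combine regularization, feature selection, and honest evaluation is an L1-penalized (lasso) model. An inner cross-validation chooses its penalty strength, and an outer cross-validation wraps the whole procedure. The sketch below assumes scikit-learn and synthetic data with more features than samples, as is common in omics. It illustrates the idea rather than a fixed recipe.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 150 samples, 200 features (p > n), of which
# 10 truly drive the continuous outcome. true_coef marks which ones.
X, y, true_coef = make_regression(
    n_samples=150, n_features=200, n_informative=10,
    noise=5.0, coef=True, random_state=0,
)

# Scaling and the L1-penalized model share one pipeline, so each outer
# fold refits the scaler and picks alpha (inner 5-fold CV) on its own
# training data only.
model = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=0))

# Outer cross-validation gives the performance estimate (nested CV)
outer_cv = KFold(n_splits=5, shuffle=True, random_state=0)
r2 = cross_val_score(model, X, y, cv=outer_cv, scoring="r2")
print(f"Nested CV R^2: {r2.mean():.2f} +/- {r2.std():.2f}")

# Refit on all data to see which features the L1 penalty kept
model.fit(X, y)
lasso = model[-1]
selected = np.flatnonzero(lasso.coef_)
print(f"alpha = {lasso.alpha_:.3f}; {selected.size} features kept")
```

Keeping the scaler and the alpha search inside the pipeline prevents information from a test fold from leaking into preprocessing or model selection. Leakage of this kind is a common source of over-optimistic results.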

Unsupervised Learning

Discover hidden structure in your data. Clustering (k-means, hierarchical, DBSCAN, Leiden), dimensionality reduction (UMAP, PCA, t-SNE), and feature extraction to surface patterns you didn't know to look for.

UMAP · PCA · Leiden · Clustering
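Here is a minimal sketch of a reduce-then-cluster workflow. It assumes scikit-learn and uses synthetic blobs in place of a normalized cells-by-genes matrix. For simplicity it swaps in k-means with a silhouette criterion. UMAP and Leiden clustering live in separate packages (for example `umap-learn` and `leidenalg`) and are not shown here.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 600 "cells" x 100 "genes" drawn from 4 groups.
# Real single-cell data would first need normalization and a log
# transform, which this sketch skips.
X, true_labels = make_blobs(
    n_samples=600, n_features=100, centers=4, cluster_std=5.0, random_state=0,
)

# Standardize features, then keep the top 10 principal components
X_pca = PCA(n_components=10, random_state=0).fit_transform(
    StandardScaler().fit_transform(X)
)

# Compare candidate cluster counts by mean silhouette width
scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_pca)
    scores[k] = silhouette_score(X_pca, labels)

best_k = max(scores, key=scores.get)
print(f"best k by silhouette: {best_k}")
```

Silhouette width is one heuristic among several for choosing a cluster count. In practice, the clusters it suggests still need to be checked against known marker genes.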

Deep Learning for Images & Sequences

Convolutional neural networks for microscopy image classification and segmentation. Transformer-based models and graph neural networks for spatial transcriptomics and sequence-level biological problems. Every project is scoped carefully to confirm there is enough training data to support the model.

CNNs · PyTorch · Transformers · Spatial Transcriptomics
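For illustration only, here is a deliberately small convolutional network in PyTorch that classifies single-channel 64x64 image patches, followed by one training step on random tensors. The architecture, patch size, and class count are hypothetical assumptions. A real microscopy model would be sized to the images and to the amount of labeled training data available.

```python
import torch
from torch import nn


class SmallCNN(nn.Module):
    """Hypothetical minimal CNN for single-channel 64x64 image patches."""

    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # 1x64x64 -> 16x64x64
            nn.ReLU(),
            nn.MaxPool2d(2),                               # -> 16x32x32
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # -> 32x32x32
            nn.ReLU(),
            nn.MaxPool2d(2),                               # -> 32x16x16
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),  # global average pool -> 32x1x1
            nn.Flatten(),             # -> 32
            nn.Linear(32, n_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


torch.manual_seed(0)
model = SmallCNN(n_classes=2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Random tensors stand in for a batch of 8 normalized patches and labels
images = torch.randn(8, 1, 64, 64)
labels = torch.randint(0, 2, (8,))

# One optimization step: forward pass, loss, backpropagation, update
model.train()
logits = model(images)
loss = loss_fn(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(tuple(logits.shape), float(loss))
```

Global average pooling keeps the parameter count small. That matters when labeled microscopy images number in the hundreds rather than the millions.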

When ML is not the answer

Sometimes a well-designed statistical test outperforms a neural network. Part of my job is knowing the difference. If your question is better answered by a DESeq2 analysis than a deep learning model, I will tell you that upfront and scope the project accordingly.

What I Deliver

  • Trained, validated models with documented performance metrics
  • Feature importance and biological interpretation of model outputs
  • Reproducible code (Python/scikit-learn/PyTorch) with clear documentation
  • Publication-ready visualizations of model results and uncertainty
  • Honest assessment of what the data can and cannot support
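One concrete check behind an honest assessment is a label-permutation test. You refit the same cross-validated model on shuffled labels many times and ask whether the real score stands out from the shuffled ones. The sketch below assumes scikit-learn's `permutation_test_score`. It uses two hypothetical synthetic datasets: one with no signal, and one with a modest shift in five features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, permutation_test_score


def permutation_check(X, y, n_permutations=100):
    """Cross-validated accuracy plus a label-permutation p-value."""
    clf = LogisticRegression(max_iter=1000)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    return permutation_test_score(
        clf, X, y, cv=cv, n_permutations=n_permutations,
        scoring="accuracy", random_state=0,
    )


rng = np.random.default_rng(0)
y = np.repeat([0, 1], 40)  # 80 hypothetical samples, balanced classes

# Case 1: pure noise features, unrelated to the labels
X_null = rng.normal(size=(80, 50))

# Case 2: same noise, but the first 5 features shift by 1.5 SD in class 1
X_sig = X_null.copy()
X_sig[:, :5] += 1.5 * y[:, None]

acc_null, perm_null, p_null = permutation_check(X_null, y)
acc_sig, perm_sig, p_sig = permutation_check(X_sig, y)
print(f"noise only:  accuracy {acc_null:.2f}, p = {p_null:.3f}")
print(f"with signal: accuracy {acc_sig:.2f}, p = {p_sig:.3f}")
```

With no real signal, the observed accuracy should typically fall within the spread of permuted scores. The shifted dataset should sit well above it. With 100 permutations, the smallest attainable p-value is 1/101.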

Wondering if ML is the right approach for your data? Let's talk it through.

Book a Free 30-Minute Call