Research

Publications

Google Scholar →

Code

Open source

GitHub →
  • Python

    cloudfit

    Cloud-agnostic machine-type advisor for batch and bioinformatics workloads. Given a workload spec (CPU, RAM, region, and an optimization target of cost, performance, or availability), it returns ranked instance recommendations with transparent per-factor scoring. Ships as a multi-package open-source ecosystem: a scoring engine (cloudfit-core), a GCP provider (cloudfit-provider-gcp), and a stateless FastAPI service (cloudfit-api) with a bundled multi-region snapshot. Built to fill the pre-launch and batch-workload sizing gap that incumbent free tools (AWS Compute Optimizer, GCP Recommender) don't cover.

    $ pip install cloudfit-core cloudfit-provider-gcp
    PyPI FastAPI Multi-cloud Apache 2.0 GitHub ↗ Landing ↗ Live demo ↗
  • Python

    clinops

    Clinical ML pipeline toolkit: MIMIC-IV / FHIR loaders, temporal feature windows, and patient-aware train/test splits that don't leak across cohorts. Distilled from production work in clinical and genomic data engineering.

    $ pip install clinops
    PyPI Healthcare Apache 2.0 GitHub ↗ Docs ↗
  • Python

    samplesheet-parser

    Format-agnostic parser for Illumina SampleSheet.csv files. Auto-detects IEM v1 vs. BCLConvert v2, validates index integrity with Hamming distance checks, and converts, diffs, or merges sheets across mixed sequencing fleets.

    $ pip install samplesheet-parser
    PyPI Bioconda Bioinformatics Apache 2.0 GitHub ↗ Docs ↗ Bioconda ↗
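
To give a flavor of the Hamming distance check mentioned for samplesheet-parser, here is a minimal standalone sketch in plain Python. It is not the library's API: the function names, the sample IDs, the index sequences, and the minimum distance of 3 are all assumptions chosen for illustration. The idea is that two sample indexes differing in too few positions can be confused by a single sequencing error, so each pair is compared and close pairs are flagged.

```python
from itertools import combinations


def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length index sequences differ."""
    if len(a) != len(b):
        raise ValueError("index sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def find_index_collisions(indexes: dict[str, str], min_distance: int = 3):
    """Return (sample_a, sample_b, distance) for every pair that is too similar.

    `min_distance` is a hypothetical threshold, not a value taken from the library.
    """
    collisions = []
    for (s1, i1), (s2, i2) in combinations(indexes.items(), 2):
        d = hamming_distance(i1, i2)
        if d < min_distance:
            collisions.append((s1, s2, d))
    return collisions


# Made-up sample IDs and index sequences, for illustration only.
indexes = {"S1": "ATTACTCG", "S2": "ATTACTCC", "S3": "TCCGGAGA"}
print(find_index_collisions(indexes))  # → [('S1', 'S2', 1)]
```

S1 and S2 differ only in their last base, so they are flagged, while S3 differs from both in every position and passes.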

Writing

Tutorials & articles

All on Medium →

Want to chat about data or research?

Always happy to swap notes on data systems, ML in production, or open-source projects, and to hear feedback on my writing or research.