Posts
Cellular Deconvolution
Cellular deconvolution is an important analytical step for experiments that rely on samples that could comprise several different cell types (e.g. bulk RNA-seq, spatial genomics, etc.). Though these experiments are...
Score and Information - GLMMs
This post is just a catch-all for my derivations for my score test project. Our set-up is as follows. We have $n$ observations coming from $k$ different clusters, each of...
Variant Deconvolution
In the case of a rapidly mutating virus such as SARS-CoV-2, the genetic material in a wastewater sample will be a mixture from different variants. More specifically, the sample will...
Estimation Theory
While working through the proofs in Lin [1] and Moran [2], I’ve run into some confusion about parameter estimation. In this post, I’ll cover the relevant sections in...
Generalized Linear Mixed Models
This post is a primer on generalized linear mixed models (GLMMs) and their associated estimation procedures. Throughout this post, we take derivatives of matrices, vectors, and scalars. We denote all...
Generalized Linear Models
This post is a primer on generalized linear models and their associated estimation procedures. Set-Up: Let’s assume we have a covariate matrix, $\mathbf{X}$, response vector $\mathbf{y}$, and parameter (coefficient) vector, $\beta$,...
Improved Inference For SPLASH
This post builds off my earlier post on concentration inequalities and empirical Bernstein bounds. Here, I’m going to try to apply those ideas to get a better bound on the...
Betting-Based Confidence Sequences
A colleague introduced me to some recent work from Waudby-Smith and Ramdas here at Carnegie Mellon. Since I’ve been working on applications of concentration bounds, it certainly seems important to...
Martingales
This post works through highlights of Aaditya Ramdas’ 2018 minicourse on martingales at Carnegie Mellon University [1], with some supplemental information taken from Durrett [2] and some definitions from Wikipedia. Additional definitions are...
Measure Theory
My work has become much more technical than I am used to, so I thought it would be good to take some notes on basic measure and probability theory in...
SPLASH
There is an exciting new framework for reference-free genomic discovery called SPLASH (Statistically Primary aLignment Agnostic Sequence Homing) from the Salzman Lab at Stanford [1, 2]. It’s super cool, and there...
Concentration Inequalities
In this post, I’m going to take myself through a review of standard concentration inequalities in probability theory with an ultimate goal of exploring empirical Bernstein bounds. Note: Not all...
One-Sided Score Test
In many cases, we may want to test the null hypothesis that a parameter is zero against a one-sided alternative (e.g. the parameter is non-negative). In this setting, we are...
Likelihood and Large-Sample Theory
The score test in non-standard conditions has been the motivation for much of my reading these past few months. However, it has led me to wonder about the small details...
Clustering Stability
My last post related to clustering discussed how to describe a “good” clustering algorithm. One way to measure this is by stability, which I’ll define more rigorously later. The main...
Clustering: An Axiomatic Approach
Though the journey to this point is a bit confusing, I have recently become interested in clustering metrics and evaluation. In this post, I’ll work through a couple of papers on...