❓ what make a topic model (i.e. Latent Dirichlet Allocation)
📚 its general application
🎗️ Its use in cancer research
📊 data, modeling, software
🏥 medical implications


Signatures of Mutational Processes in Human Cancer



[0, 1]




| sample | chr | position | ref | alt |
|---|---|---|---|---|
| sample1 | chr1 | 100 | A | C |
| sample1 | chr2 | 100 | G | T |
| sample2 | chr1 | 300 | T | C |
| sample3 | chr3 | 400 | T | C |

| sample | chr | position | ref | alt |
|---|---|---|---|---|
| sample1 | chr1 | 100 | A | C |
| sample1 | chr2 | 100 | G | T |
| sample2 | chr1 | 300 | T | C |
| sample3 | chr3 | 400 | T | C |
Yang et al. HiLDA: a statistical approach to investigate differences in mutational signatures, 2019







Are they different?

fractions pk and concentration parameters αk p1,⋯,pK∼Dirichlet(α1,⋯,αK) μk=αk∑kαk
To capture the group difference for signature i, Δi=μ(1)i−μ(2)i What we'd like to know, H0:Δi=0,i=1,2,...,K
Why don't we just compare the estimated values?
Δi=^μ(1)i−^μ(2)i

My Journey From Frequentist to Bayesian Statistics

When there is WinBUGS already, why is JAGS necessary?
install.packages("R2jags")library(R2jags)The focus of the analysis presented here is on accurately estimating the mean of IQ using simulated data.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4158865/

https://towardsdatascience.com/visualizing-beta-distribution-7391c18031f1
We hold meetings every month and post it on meetup.com, which tells us how many people've signed up, n. But not everyone show up (i.e., traffic). So, we'd like to know what percentage of people, θ, will show up to order enough pizzas.
Yi|θ∼Binom(ni,θ)
θ∼Beta(a,b)
Weak prior:
Beta(1,1)↔Unif(0,1)
Y <- c(77, 62, 30)N <- c(39, 29, 16)But as you know, someone (@lukanegoita) said
Yi|θ∼Binom(ni,θ)
θ∼Beta(a,b)
In R
model{for(i in 1:3){ Y[i] <- rbinom(1, N[i], theta) } theta <- rbeta(1, 1, 1)}
In JAGS
model{for(i in 1:3){ Y[i] ~ dbinom(theta, N[i])} theta ~ dbeta(1, 1)}
dist instead of rdist for random samplinglibrary(R2jags)model_string<-"model{# Likelihoodfor(i in 1:3){ Y[i] ~ dbinom(theta, N[i])}# Prior theta ~ dbeta(1, 1)}"# DataN <- c(77, 62, 30)Y <- c(39, 29, 16)# specify parameters to be moniteredparams <- c("theta")R2jags and runjags are just wrappers of rjagsrunjags makes it easy to run chains in parallel on multiple processesR2jags allows running until converged and paralleling MCMC chainsR2jagsoutput <- jags(model.file = textConnection(model_string), data = list(Y = Y, N = N), parameters.to.save = params, n.chains = 2, n.iter = 2000)
## Compiling model graph## Resolving undeclared variables## Allocating nodes## Graph information:## Observed stochastic nodes: 3## Unobserved stochastic nodes: 1## Total graph size: 8## ## Initializing model## ## | | | 0% | |++ | 4% | |++++ | 8% | |++++++ | 12% | |++++++++ | 16% | |++++++++++ | 20% | |++++++++++++ | 24% | |++++++++++++++ | 28% | |++++++++++++++++ | 32% | |++++++++++++++++++ | 36% | |++++++++++++++++++++ | 40% | |++++++++++++++++++++++ | 44% | |++++++++++++++++++++++++ | 48% | |++++++++++++++++++++++++++ | 52% | |++++++++++++++++++++++++++++ | 56% | |++++++++++++++++++++++++++++++ | 60% | |++++++++++++++++++++++++++++++++ | 64% | |++++++++++++++++++++++++++++++++++ | 68% | |++++++++++++++++++++++++++++++++++++ | 72% | |++++++++++++++++++++++++++++++++++++++ | 76% | |++++++++++++++++++++++++++++++++++++++++ | 80% | |++++++++++++++++++++++++++++++++++++++++++ | 84% | |++++++++++++++++++++++++++++++++++++++++++++ | 88% | |++++++++++++++++++++++++++++++++++++++++++++++ | 92% | |++++++++++++++++++++++++++++++++++++++++++++++++ | 96% | |++++++++++++++++++++++++++++++++++++++++++++++++++| 100%## | | | 0% | |** | 4% | |**** | 8% | |****** | 12% | |******** | 16% | |********** | 20% | |************ | 24% | |************** | 28% | |**************** | 32% | |****************** | 36% | |******************** | 40% | |********************** | 44% | |************************ | 48% | |************************** | 52% | |**************************** | 56% | |****************************** | 60% | |******************************** | 64% | |********************************** | 68% | |************************************ | 72% | |************************************** | 76% | |**************************************** | 80% | |****************************************** | 84% | |******************************************** | 88% | |********************************************** | 92% | |************************************************ | 96% | |**************************************************| 100%## Compiling model graph## Resolving undeclared variables## Allocating nodes## Graph information:## Observed stochastic nodes: 3## Unobserved stochastic nodes: 1## Total graph size: 9## ## Initializing model## ## | | | 0% | |++ | 4% | |++++ | 8% | |++++++ | 12% | |++++++++ | 16% | |++++++++++ | 20% | |++++++++++++ | 24% | |++++++++++++++ | 28% | |++++++++++++++++ | 32% | |++++++++++++++++++ | 36% | |++++++++++++++++++++ | 40% | |++++++++++++++++++++++ | 44% | |++++++++++++++++++++++++ | 48% | |++++++++++++++++++++++++++ | 52% | |++++++++++++++++++++++++++++ | 56% | |++++++++++++++++++++++++++++++ | 60% | |++++++++++++++++++++++++++++++++ | 64% | |++++++++++++++++++++++++++++++++++ | 68% | |++++++++++++++++++++++++++++++++++++ | 72% | |++++++++++++++++++++++++++++++++++++++ | 76% | |++++++++++++++++++++++++++++++++++++++++ | 80% | |++++++++++++++++++++++++++++++++++++++++++ | 84% | |++++++++++++++++++++++++++++++++++++++++++++ | 88% | |++++++++++++++++++++++++++++++++++++++++++++++ | 92% | |++++++++++++++++++++++++++++++++++++++++++++++++ | 96% | |++++++++++++++++++++++++++++++++++++++++++++++++++| 100%## | | | 0% | |** | 4% | |**** | 8% | |****** | 12% | |******** | 16% | |********** | 20% | |************ | 24% | |************** | 28% | |**************** | 32% | |****************** | 36% | |******************** | 40% | |********************** | 44% | |************************ | 48% | |************************** | 52% | |**************************** | 56% | |****************************** | 60% | |******************************** | 64% | |********************************** | 68% | |************************************ | 72% | |************************************** | 76% | |**************************************** | 80% | |****************************************** | 84% | |******************************************** | 88% | |********************************************** | 92% | |************************************************ | 96% | |**************************************************| 100%plot(as.mcmc(output))

library(mcmcplots)mcmcplot(as.mcmc(output),c("theta"))denplot(as.mcmc(output),c("theta"))

So it might take a longer time for run
output$BUGSoutput$summary
## mean sd 2.5% 25% 50% 75%## deviance 14.6755846 1.3992791 13.6488245 13.7614902 14.1370965 15.0345916## theta 0.4973384 0.0388237 0.4187856 0.4710636 0.4970235 0.5249785## 97.5% Rhat n.eff## deviance 18.7558006 1.017801 220## theta 0.5720824 1.003696 2000If it fails to converge, we can do the following,
update(output, n.iter = 1000)autojags(output, n.iter = 1000, Rhat = 1.05, n.update = 2)output$BUGSoutput$summary
## mean sd 2.5% 25% 50% 75%## deviance 14.6755846 1.3992791 13.6488245 13.7614902 14.1370965 15.0345916## theta 0.4973384 0.0388237 0.4187856 0.4710636 0.4970235 0.5249785## 97.5% Rhat n.eff## deviance 18.7558006 1.017801 220## theta 0.5720824 1.003696 2000output2$BUGSoutput$summary
## mean sd 2.5% 25% 50% 75%## deviance 15.2841452 1.77163845 13.6509996 13.9603596 14.6893500 16.0268236## theta 0.5357841 0.02994589 0.4754806 0.5161242 0.5361084 0.5560317## 97.5% Rhat n.eff## deviance 20.0407688 1.005639 490## theta 0.5934454 1.002685 690denplot(as.mcmc(output), parms = "theta", xlim = c(0.3, 0.7))

denplot(as.mcmc(output2), parms = "theta", xlim = c(0.3, 0.7))

In JAGS
model<-"model{# Likelihoodfor(i in 1:N){ y[i] ~ dnorm(mu[i], tau) mu[i] <- beta0 + beta1*x[i]}# Prior # intercept beta0 ~ dnorm(0, 0.0001) # slope beta1 ~ dnorm(0, 0.0001) # error term tau ~ dgamma(0.01, 0.01) sigma <- 1/sqrt(tau)}"lm_output <- jags(model.file = textConnection(model), data = list(y = y, x = x, N=15), parameters.to.save = c("beta0","beta1", "sigma"), n.chains = 2, n.iter = 2000, progress.bar ="none")
In R
lm(y~x)
In statistics 101
Yi=b0+b1Xi+σ Y∼N(b0+b1X,σ2)
## Compiling model graph## Resolving undeclared variables## Allocating nodes## Graph information:## Observed stochastic nodes: 15## Unobserved stochastic nodes: 3## Total graph size: 70## ## Initializing modelIn JAGS
lm_output$BUGSoutput$summary[1:2, c(3,7)]
## 2.5% 97.5%## beta0 -8.853992 52.131691## beta1 -16.496277 3.19287395% credible interval
In R
confint(lm(y~x))
## 2.5 % 97.5 %## (Intercept) -9.32782 51.322335## x -16.19329 3.48652895% confidence interval
In runjags
model <- template.jags( y ~ x, data, n.chains=2, family='gaussian')
In R
lm(y~x)
RJAGS: how to get started by Rens van de Schoot, retweeted by Martyn Plummer
"perky ears" + "floppy ears" ?= "half-up ears"

Slides: https://lda-mutations.netlify.com/
@zhiiiyang zhiiiyang zyang895@gmail.com
Slides created via the R package xaringan and xaringanthemer
❓ what make a topic model (i.e. Latent Dirichlet Allocation)
📚 its general application
🎗️ Its use in cancer research
📊 data, modeling, software
🏥 medical implications
Keyboard shortcuts
| ↑, ←, Pg Up, k | Go to previous slide |
| ↓, →, Pg Dn, Space, j | Go to next slide |
| Home | Go to first slide |
| End | Go to last slide |
| Number + Return | Go to specific slide |
| b / m / f | Toggle blackout / mirrored / fullscreen mode |
| c | Clone slideshow |
| p | Toggle presenter mode |
| t | Restart the presentation timer |
| ?, h | Toggle this help |
| Esc | Back to slideshow |