Title: | Analysis of Codon Data under Stationarity using a Bayesian Framework |
---|---|
Description: | Is a collection of models to analyze genome scale codon data using a Bayesian framework. Provides visualization routines and checkpointing for model fittings. Currently published models to analyze gene data for selection on codon usage based on Ribosome Overhead Cost (ROC) are: ROC (Gilchrist et al. (2015) <doi:10.1093/gbe/evv087>), and ROC with phi (Wallace & Drummond (2013) <doi:10.1093/molbev/mst051>). In addition 'AnaCoDa' contains three currently unpublished models. The FONSE (First order approximation On NonSense Error) model analyzes gene data for selection on codon usage against of nonsense error rates. The PA (PAusing time) and PANSE (PAusing time + NonSense Error) models use ribosome footprinting data to analyze estimate ribosome pausing times with and without nonsense error rate from ribosome footprinting data. |
Authors: | Authors@R |
Maintainer: | Cedric Landerer <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.1.4.4 |
Built: | 2024-11-06 04:43:46 UTC |
Source: | https://github.com/clandere/anacoda |
Converts one character amino acid code to the set of codon encoding that amino acid
AAToCodon(aa, focal = FALSE)
AAToCodon(aa, focal = FALSE)
aa |
Amino acid in single character notation |
focal |
logical, Include the alphabetically last (focal) codon |
Returns the names of the codon encoding the give amino acid
The function calculates and by defaults plots the acf and estimates the autocorrelation in the trace
acfCSP( parameter, csp = "Mutation", numMixtures = 1, samples = NULL, lag.max = 40, plot = TRUE )
acfCSP( parameter, csp = "Mutation", numMixtures = 1, samples = NULL, lag.max = 40, plot = TRUE )
parameter |
object of class Parameter |
csp |
indicates which parameter to calculate the autocorrelation. Must be Mutation (the default, ROC, FONSE), Selection (ROC, FONSE), Alpha (PA, PANSE), LambdaPrime (PA, PANSE), NSERate (PA, PANSE)" |
numMixtures |
indicates the number of CSP mixtures used |
samples |
number of samples at the end of the trace used to calculate the acf |
lag.max |
Maximum amount of lag to calculate acf. Default is 10*log10(N), where N i the number of observations. |
plot |
logical. If TRUE (default) a plot of the acf is created |
The function calculates and by defaults plots the acf and estimates the autocorrelation in the trace.
acfMCMC(mcmc, type = "LogPosterior", samples = NULL, lag.max = 40, plot = TRUE)
acfMCMC(mcmc, type = "LogPosterior", samples = NULL, lag.max = 40, plot = TRUE)
mcmc |
object of class MCMC |
type |
"LogPosterior" or "LogLikelihood", defaults to "LogPosterior" |
samples |
number of samples at the end of the trace used to calculate the acf |
lag.max |
Maximum amount of lag to calculate acf. Default is 10*log10(N), where N i the number of observations. |
plot |
logical. If TRUE (default) a plot of the acf is created |
addObservedSynthesisRateSet
returns the observed
synthesis rates of the genes within the genome specified.
addObservedSynthesisRateSet( genome, observed.expression.file, match.expression.by.id = TRUE )
addObservedSynthesisRateSet( genome, observed.expression.file, match.expression.by.id = TRUE )
genome |
A genome object initialized with
|
observed.expression.file |
A string containing the location of a file containing empirical expression rates (optional). |
match.expression.by.id |
If TRUE (default) observed expression values will be assigned by matching sequence identifier. If FALSE observed expression values will be assigned by order |
Returns the genome after adding the new gene expression values
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa") expression_file <- system.file("extdata", "expression.csv", package = "AnaCoDa") ## reading genome genome <- initializeGenomeObject(file = genome_file) ## add expression values after the genome was initiallized, ## or adding an additional set of expression values genome <- addObservedSynthesisRateSet(genome = genome, observed.expression.file = expression_file)
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa") expression_file <- system.file("extdata", "expression.csv", package = "AnaCoDa") ## reading genome genome <- initializeGenomeObject(file = genome_file) ## add expression values after the genome was initiallized, ## or adding an additional set of expression values genome <- addObservedSynthesisRateSet(genome = genome, observed.expression.file = expression_file)
Returns a vector of all amino acids
aminoAcids()
aminoAcids()
Returns a vector of all amino acids
initializes the model object.
calculateMarginalLogLikelihood( parameter, mcmc, mixture, n.samples, divisor, warnings = TRUE )
calculateMarginalLogLikelihood( parameter, mcmc, mixture, n.samples, divisor, warnings = TRUE )
parameter |
An object created with |
mcmc |
An object created with |
mixture |
determines for which mixture the marginal log-likelihood should be calculated |
n.samples |
How many samples should be used for the calculation |
divisor |
A value > 1 in order to scale down the tails of the importance distribution |
warnings |
Print warnings such as when the variance of a parameter is 0, which might occur when parameter is fixed |
calculateMarginalLogLikelihood Calculate marginal log-likelihood for calculation of the Bayes factor using a generalized harmonix mean estimator of the marginal likelihood. See Gronau et al. (2017) for details
This function returns the model object created.
## Not run: # Calculate the log-marginal likelihood parameter <- loadParameterObject("parameter.Rda") mcmc <- loadMCMCObject("mcmc.Rda") calculate_marginal_likelihood(parameter, mcmc, mixture = 1, samples = 500, scaling = 1.5) # Calculate the Bayes factor for two models parameter1 <- loadParameterObject("parameter1.Rda") parameter2 <- loadParameterObject("parameter2.Rda") mcmc1 <- loadMCMCObject("mcmc1.Rda") mcmc2 <- loadMCMCObject("mcmc2.Rda") mll1 <- calculate_marginal_likelihood(parameter1, mcmc1, mixture = 1, samples = 500, scaling = 1.5) mll2 <- calculate_marginal_likelihood(parameter2, mcmc2, mixture = 1, samples = 500, scaling = 1.5) cat("Bayes factor: ", mll1 - mll2, "\n") ## End(Not run)
## Not run: # Calculate the log-marginal likelihood parameter <- loadParameterObject("parameter.Rda") mcmc <- loadMCMCObject("mcmc.Rda") calculate_marginal_likelihood(parameter, mcmc, mixture = 1, samples = 500, scaling = 1.5) # Calculate the Bayes factor for two models parameter1 <- loadParameterObject("parameter1.Rda") parameter2 <- loadParameterObject("parameter2.Rda") mcmc1 <- loadMCMCObject("mcmc1.Rda") mcmc2 <- loadMCMCObject("mcmc2.Rda") mll1 <- calculate_marginal_likelihood(parameter1, mcmc1, mixture = 1, samples = 500, scaling = 1.5) mll2 <- calculate_marginal_likelihood(parameter2, mcmc2, mixture = 1, samples = 500, scaling = 1.5) cat("Bayes factor: ", mll1 - mll2, "\n") ## End(Not run)
calculateSCUO
calulates the SCUO value for each gene in genome. Note that if a codon is absent, this will be treated as NA and will be skipped in final calculation
calculateSCUO(genome)
calculateSCUO(genome)
genome |
A genome object initialized with |
returns the SCUO value for each gene in genome
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa") ## reading genome genome <- initializeGenomeObject(file = genome_file) scuo <- calculateSCUO(genome)
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa") ## reading genome genome <- initializeGenomeObject(file = genome_file) scuo <- calculateSCUO(genome)
Returns a vector of all codons
codons()
codons()
Returns a vector of all codons
Translates a given codon into the amino acid encoded by it.
codonToAA(codon)
codonToAA(codon)
codon |
character, codon to translate |
Returns the amino acid encoded by the given codon as character
Convergence Test
convergence.test( object, samples = 10, frac1 = 0.1, frac2 = 0.5, thin = 1, plot = FALSE, what = "Mutation", mixture = 1 )
convergence.test( object, samples = 10, frac1 = 0.1, frac2 = 0.5, thin = 1, plot = FALSE, what = "Mutation", mixture = 1 )
object |
an object of either class Trace or MCMC |
samples |
number of samples at the end of the trace used to determine convergence (< length of trace). Will use as starting point of convergence test. If the MCMC trace is of length x, then starting point for convergence test will be x - samples. |
frac1 |
fraction to use from beginning of samples |
frac2 |
fraction to use from end of samples |
thin |
the thinning interval between consecutive observations, which is used in creating a coda::mcmc object (according to the Coda documentation, users should specify if a MCMC chain has already been thinned using a the thin parameter). This does not further thin the data. |
plot |
(logical) plot result instead of returning an object |
what |
(for Trace Object only) which parameter to calculate convergence.test – current options are Selection, Mutation, MixtureProbability, Sphi, Mphi, ExpectedPhi, and AcceptanceCSP |
mixture |
(for Trace Object only) mixture for which to calculate convergence.test |
Be aware that convergence.test for Trace objects works primarily for Trace objects from the ROC parameter class. Future updates will adapt this function to work for parameters from other models and expression traces
Geweke score object evaluating whether means of two fractions (frac1 and frac2) differ. Convergence occurs when they don't differ significantly, i.e. pnorm(abs(convergence.test(mcmcObj)$a, ,lower.tail=FALSE)*2 > 0.05
## check for convergence after a run: genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa") genome <- initializeGenomeObject(file = genome_file) sphi_init <- c(1,1) numMixtures <- 2 geneAssignment <- c(rep(1,floor(length(genome)/2)),rep(2,ceiling(length(genome)/2))) parameter <- initializeParameterObject(genome = genome, sphi = sphi_init, num.mixtures = numMixtures, gene.assignment = geneAssignment, mixture.definition = "allUnique") samples <- 2500 thinning <- 50 adaptiveWidth <- 25 mcmc <- initializeMCMCObject(samples = samples, thinning = thinning, adaptive.width=adaptiveWidth, est.expression=TRUE, est.csp=TRUE, est.hyper=TRUE, est.mix = TRUE) divergence.iteration <- 10 ## Not run: runMCMC(mcmc = mcmc, genome = genome, model = model, ncores = 4, divergence.iteration = divergence.iteration) # check if posterior trace has converged convergence.test(object = mcmc, samples = 500, plot = TRUE) trace <- getTrace(parameter) # check if Mutation trace has converged convergence.test(object = trace, samples = 500, plot = TRUE, what = "Mutation") # check if Sphi trace has converged convergence.test(object = trace, samples = 500, plot = TRUE, what = "Sphi") # check if ExpectedPhi trace has converged convergence.test(object = trace, samples = 500, plot = TRUE, what = "ExpectedPhi") ## End(Not run)
## check for convergence after a run: genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa") genome <- initializeGenomeObject(file = genome_file) sphi_init <- c(1,1) numMixtures <- 2 geneAssignment <- c(rep(1,floor(length(genome)/2)),rep(2,ceiling(length(genome)/2))) parameter <- initializeParameterObject(genome = genome, sphi = sphi_init, num.mixtures = numMixtures, gene.assignment = geneAssignment, mixture.definition = "allUnique") samples <- 2500 thinning <- 50 adaptiveWidth <- 25 mcmc <- initializeMCMCObject(samples = samples, thinning = thinning, adaptive.width=adaptiveWidth, est.expression=TRUE, est.csp=TRUE, est.hyper=TRUE, est.mix = TRUE) divergence.iteration <- 10 ## Not run: runMCMC(mcmc = mcmc, genome = genome, model = model, ncores = 4, divergence.iteration = divergence.iteration) # check if posterior trace has converged convergence.test(object = mcmc, samples = 500, plot = TRUE) trace <- getTrace(parameter) # check if Mutation trace has converged convergence.test(object = trace, samples = 500, plot = TRUE, what = "Mutation") # check if Sphi trace has converged convergence.test(object = trace, samples = 500, plot = TRUE, what = "Sphi") # check if ExpectedPhi trace has converged convergence.test(object = trace, samples = 500, plot = TRUE, what = "ExpectedPhi") ## End(Not run)
findOptimalCodon
extracrs the optimal codon for each amino acid.
findOptimalCodon(csp)
findOptimalCodon(csp)
csp |
a |
A named list with with optimal codons for each amino acid.
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa") genome <- initializeGenomeObject(file = genome_file) sphi_init <- 1 numMixtures <- 1 geneAssignment <- rep(1, length(genome)) parameter <- initializeParameterObject(genome = genome, sphi = sphi_init, num.mixtures = numMixtures, gene.assignment = geneAssignment, mixture.definition = "allUnique") model <- initializeModelObject(parameter = parameter, model = "ROC") samples <- 2500 thinning <- 50 adaptiveWidth <- 25 mcmc <- initializeMCMCObject(samples = samples, thinning = thinning, adaptive.width=adaptiveWidth, est.expression=TRUE, est.csp=TRUE, est.hyper=TRUE, est.mix = TRUE) divergence.iteration <- 10 ## Not run: runMCMC(mcmc = mcmc, genome = genome, model = model, ncores = 4, divergence.iteration = divergence.iteration) csp_mat <- getCSPEstimates(parameter, CSP="Selection") opt_codons <- findOptimalCodon(csp_mat) ## End(Not run)
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa") genome <- initializeGenomeObject(file = genome_file) sphi_init <- 1 numMixtures <- 1 geneAssignment <- rep(1, length(genome)) parameter <- initializeParameterObject(genome = genome, sphi = sphi_init, num.mixtures = numMixtures, gene.assignment = geneAssignment, mixture.definition = "allUnique") model <- initializeModelObject(parameter = parameter, model = "ROC") samples <- 2500 thinning <- 50 adaptiveWidth <- 25 mcmc <- initializeMCMCObject(samples = samples, thinning = thinning, adaptive.width=adaptiveWidth, est.expression=TRUE, est.csp=TRUE, est.hyper=TRUE, est.mix = TRUE) divergence.iteration <- 10 ## Not run: runMCMC(mcmc = mcmc, genome = genome, model = model, ncores = 4, divergence.iteration = divergence.iteration) csp_mat <- getCSPEstimates(parameter, CSP="Selection") opt_codons <- findOptimalCodon(csp_mat) ## End(Not run)
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Fix the value of selection its current value
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Fix the value of mutation its current value
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Fix the value of s_phi (standard deviation of lognormal for synthesis rates) at its current value
geomMean
will calculate the geometric mean of a list of numerical values.
geomMean(x, rm.invalid = TRUE, default = 1e-05)
geomMean(x, rm.invalid = TRUE, default = 1e-05)
x |
A vector of numerical . |
rm.invalid |
Boolean value for handling 0, negative, or NA values in the vector. Default is TRUE and will not
include these values in the calculation. If FALSE, these values will be replaced by the value give to |
default |
Numerical value that serves as the value to replace 0, negative, or NA values in the calculation when rm.invalid is FALSE. Default is 1e-5. |
This function is a special version of the geometric mean specifically for AnaCoda.
Most models in Anacoda assume a log normal distribution for phi values, thus all values in x
are expectd to be positive.
geomMean returns the geometric mean of a vector and can handle 0, negative, or NA values.
Returns the geometric mean of a vector.
x <- c(1, 2, 3, 4) geomMean(x) y<- c(1, NA, 3, 4, 0, -1) # Only take the mean of non-Na values greater than 0 geomMean(y) # Replace values <= 0 or NAs with a default value 0.001 and then take the mean geomMean(y, rm.invalid = FALSE, default = 0.001)
x <- c(1, 2, 3, 4) geomMean(x) y<- c(1, NA, 3, 4, 0, -1) # Only take the mean of non-Na values greater than 0 geomMean(y) # Replace values <= 0 or NAs with a default value 0.001 and then take the mean geomMean(y, rm.invalid = FALSE, default = 0.001)
Return sample adaptiveWidth value, which is the number of samples (not iterations) between adapting parameter proposal widths
number of sample steps between adapting proposal widths
getCAI
returns the Codon Adaptation Index for a
genome based on a provided reference.
getCAI(referenceGenome, testGenome, default.weight = 0.5)
getCAI(referenceGenome, testGenome, default.weight = 0.5)
referenceGenome |
A genome object initialized with |
testGenome |
A genome object initialized with |
default.weight |
Default weight to use if codon is missing from referenceGenome |
Returns a named vector with the CAI for each gene
genome_file1 <- system.file("extdata", "more_genes.fasta", package = "AnaCoDa") genome_file2 <- system.file("extdata", "genome.fasta", package = "AnaCoDa") ## reading genome referenceGenome <- initializeGenomeObject(file = genome_file1) testGenome <- initializeGenomeObject(file = genome_file2) cai <- getCAI(referenceGenome, testGenome)
genome_file1 <- system.file("extdata", "more_genes.fasta", package = "AnaCoDa") genome_file2 <- system.file("extdata", "genome.fasta", package = "AnaCoDa") ## reading genome referenceGenome <- initializeGenomeObject(file = genome_file1) testGenome <- initializeGenomeObject(file = genome_file2) cai <- getCAI(referenceGenome, testGenome)
getCAIweights
returns the weights for the Codon Adaptation Index
based on a reference genome.
getCAIweights(referenceGenome, default.weight = 0.5)
getCAIweights(referenceGenome, default.weight = 0.5)
referenceGenome |
A genome object initialized with |
default.weight |
Set default weight for any codon not observed in the reference genome |
Returns a named vector with the CAI weights for each codon
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa") ## reading genome referenceGenome <- initializeGenomeObject(file = genome_file) wi <- getCAIweights(referenceGenome)
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa") ## reading genome referenceGenome <- initializeGenomeObject(file = genome_file) wi <- getCAIweights(referenceGenome)
provides the codon counts for a fiven amino acid across all genes
getCodonCounts(genome)
getCodonCounts(genome)
genome |
A genome object from which the counts of each codon can be obtained. |
The returned matrix containes a row for each gene and a column
for each synonymous codon of aa
.
Returns a data.frame storing the codon counts for each amino acid.
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa") ## reading genome genome <- initializeGenomeObject(file = genome_file) counts <- getCodonCounts(genome)
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa") ## reading genome genome <- initializeGenomeObject(file = genome_file) counts <- getCodonCounts(genome)
provides the codon counts for a fiven amino acid across all genes
getCodonCountsForAA(aa, genome)
getCodonCountsForAA(aa, genome)
aa |
One letter code of the amino acid for which the codon counts should be returned |
genome |
A genome object from which the counts of each codon can be obtained. |
The returned matrix containes a row for each gene and a coloumn
for each synonymous codon of aa
.
Returns a data.frame storing the codon counts for the specified amino acid.
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa") ## reading genome genome <- initializeGenomeObject(file = genome_file) counts <- getCodonCountsForAA("A", genome)
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa") ## reading genome genome <- initializeGenomeObject(file = genome_file) counts <- getCodonCountsForAA("A", genome)
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Calculate codon-specific parameter (CSP) posterior mean
mixtureElement |
mixture to calculate CSP posterior mean. Should be between 1 and n, where n is number of mixtures. |
samples |
number of samples to use for calculating posterior mean |
codon |
codon to calculate CSP |
paramType |
CSP to calculate posterior mean for. 0: Mutation (ROC,FONSE) or Alpha (PA, PANSE). 1: Selection (ROC,FONSE), Lambda (PANSE), Lambda^prime (PA). 2: NSERate (PANSE) |
withoutReference |
If model uses reference codon, then ignore this codon (fixed at 0). Should be TRUE for ROC and FONSE. Should be FALSE for PA and PANSE. |
log_scale |
If true, calculate posterior mean on log scale. Should only be used for PA and PANSE. |
posterior mean value for CSP
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Calculate codon-specific parameter (CSP) variance
mixtureElement |
mixture to calculate CSP variance. Should be between 1 and n, where n is number of mixtures. |
samples |
number of samples to use for calculating variance |
codon |
codon to calculate CSP |
paramType |
CSP to calculate variance for. 0: Mutation (ROC,FONSE) or Alpha (PA, PANSE). 1: Selection (ROC,FONSE), Lambda (PANSE), Lambda^prime (PA). 2: NSERate (PANSE) |
unbiased |
If TRUE, should calculate variance using unbiased (N-1). Otherwise, used biased (N) correction |
withoutReference |
If model uses reference codon, then ignore this codon (fixed at 0). Should be TRUE for ROC and FONSE. Should be FALSE for PA and PANSE. |
log_scale |
If true, calculate posterior mean on log scale. Should only be used for PA and PANSE. |
variance over trace for CSP
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Calculate quantiles of CSP traces
mixtureElement |
mixture to calculate CSP variance. Should be between 1 and n, where n is number of mixtures. |
samples |
number of samples to use for calculating variance |
codon |
codon to calculate CSP |
paramType |
CSP to calculate variance for. 0: Mutation (ROC,FONSE) or Alpha (PA, PANSE). 1: Selection (ROC,FONSE), Lambda (PANSE), Lambda^prime (PA). 2: NSERate (PANSE) |
probs |
vector of two doubles between 0 and 1 indicating range over which to calculate quantiles. <0.0275, 0.975> would give 95% quantiles. |
withoutReference |
If model uses reference codon, then ignore this codon (fixed at 0). Should be TRUE for ROC and FONSE. Should be FALSE for PA and PANSE. |
log_scale |
If true, calculate posterior mean on log scale. Should only be used for PA and PANSE. |
vector representing lower and upper bound of quantile
getCSPEstimates
returns the codon specific
parameter estimates for a given parameter and mixture or write it to a csv file.
getCSPEstimates( parameter, filename = NULL, mixture = 1, samples = 10, relative.to.optimal.codon = T, report.original.ref = T, log.scale = F )
getCSPEstimates( parameter, filename = NULL, mixture = 1, samples = 10, relative.to.optimal.codon = T, report.original.ref = T, log.scale = F )
parameter |
parameter an object created by |
filename |
Posterior estimates will be written to file (format: csv). Filename will be in the format <parameter_name>_<filename>.csv. |
mixture |
estimates for which mixture should be returned |
samples |
The number of samples used for the posterior estimates. |
relative.to.optimal.codon |
Boolean determining if parameters should be relative to the preferred codon or the alphabetically last codon (Default=TRUE). Only applies to ROC and FONSE models |
report.original.ref |
Include the original reference codon (Default = TRUE). Note this is only included for the purposes of simulations, which expect the input parameter file to be in a specific format. Later version of AnaCoDa will remove this. |
log.scale |
Calculate posterior means, standard deviation, and posterior probability intervals on the natural log scale. Should be used for PA and PANSE models only. |
returns a list data.frame with the posterior estimates of the models
codon specific parameters or writes it directly to a csv file if filename
is specified
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa") genome <- initializeGenomeObject(file = genome_file) sphi_init <- c(1,1) numMixtures <- 2 geneAssignment <- c(rep(1,floor(length(genome)/2)),rep(2,ceiling(length(genome)/2))) parameter <- initializeParameterObject(genome = genome, sphi = sphi_init, num.mixtures = numMixtures, gene.assignment = geneAssignment, mixture.definition = "allUnique") model <- initializeModelObject(parameter = parameter, model = "ROC") samples <- 2500 thinning <- 50 adaptiveWidth <- 25 mcmc <- initializeMCMCObject(samples = samples, thinning = thinning, adaptive.width=adaptiveWidth, est.expression=TRUE, est.csp=TRUE, est.hyper=TRUE, est.mix = TRUE) divergence.iteration <- 10 ## Not run: runMCMC(mcmc = mcmc, genome = genome, model = model, ncores = 4, divergence.iteration = divergence.iteration) ## return estimates for codon specific parameters csp_mat <- getCSPEstimates(parameter) # write the result directly to the filesystem as a csv file. No values are returned getCSPEstimates(parameter, filename=file.path(tempdir(), "test.csv")) ## End(Not run)
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa") genome <- initializeGenomeObject(file = genome_file) sphi_init <- c(1,1) numMixtures <- 2 geneAssignment <- c(rep(1,floor(length(genome)/2)),rep(2,ceiling(length(genome)/2))) parameter <- initializeParameterObject(genome = genome, sphi = sphi_init, num.mixtures = numMixtures, gene.assignment = geneAssignment, mixture.definition = "allUnique") model <- initializeModelObject(parameter = parameter, model = "ROC") samples <- 2500 thinning <- 50 adaptiveWidth <- 25 mcmc <- initializeMCMCObject(samples = samples, thinning = thinning, adaptive.width=adaptiveWidth, est.expression=TRUE, est.csp=TRUE, est.hyper=TRUE, est.mix = TRUE) divergence.iteration <- 10 ## Not run: runMCMC(mcmc = mcmc, genome = genome, model = model, ncores = 4, divergence.iteration = divergence.iteration) ## return estimates for codon specific parameters csp_mat <- getCSPEstimates(parameter) # write the result directly to the filesystem as a csv file. No values are returned getCSPEstimates(parameter, filename=file.path(tempdir(), "test.csv")) ## End(Not run)
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Get estimated mixture assignment for gene
samples |
number of samples over which to calculate mixture assignment |
geneIndex |
corresponding index of gene in genome. Should be a number between 1 and length(genome). |
returns value between 1 and n, where n is number of mixtures
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Get estimated mixture assignment probabilities for gene
samples |
number of samples over which to calculate mixture assignment probabilities |
geneIndex |
corresponding index of gene in genome. Should be a number between 1 and length(genome). |
returns vector of probabilities representing mixture probabilities for gene
Posterior estimates for the phi value of specified genes
getExpressionEstimates( parameter, gene.index, samples, quantiles = c(0.025, 0.975), genome = NULL )
getExpressionEstimates( parameter, gene.index, samples, quantiles = c(0.025, 0.975), genome = NULL )
parameter |
on object created by |
gene.index |
a integer or vector of integers representing the gene(s) of interesst. |
samples |
number of samples for the posterior estimate |
quantiles |
vector of quantiles, (default: c(0.025, 0.975)) |
genome |
if genome is given, then will include gene ids in output (default is NULL) |
The returned vector is unnamed as gene ids are only stored in the genome
object,
but the gene.index
vector can be used to match the assignment to the genome.
returns a vector with the mixture assignment of each gene corresbonding to gene.index
in the same order as the genome.
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa") genome <- initializeGenomeObject(file = genome_file) sphi_init <- c(1,1) numMixtures <- 2 geneAssignment <- c(rep(1,floor(length(genome)/2)),rep(2,ceiling(length(genome)/2))) parameter <- initializeParameterObject(genome = genome, sphi = sphi_init, num.mixtures = numMixtures, gene.assignment = geneAssignment, mixture.definition = "allUnique") model <- initializeModelObject(parameter = parameter, model = "ROC") samples <- 2500 thinning <- 50 adaptiveWidth <- 25 mcmc <- initializeMCMCObject(samples = samples, thinning = thinning, adaptive.width=adaptiveWidth, est.expression=TRUE, est.csp=TRUE, est.hyper=TRUE, est.mix = TRUE) divergence.iteration <- 10 ## Not run: runMCMC(mcmc = mcmc, genome = genome, model = model, ncores = 4, divergence.iteration = divergence.iteration) # get the estimated expression values for all genes based on the mixture # they are assigned to at each step estimatedExpression <- getExpressionEstimates(parameter, 1:length(genome), 1000) ## End(Not run)
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa") genome <- initializeGenomeObject(file = genome_file) sphi_init <- c(1,1) numMixtures <- 2 geneAssignment <- c(rep(1,floor(length(genome)/2)),rep(2,ceiling(length(genome)/2))) parameter <- initializeParameterObject(genome = genome, sphi = sphi_init, num.mixtures = numMixtures, gene.assignment = geneAssignment, mixture.definition = "allUnique") model <- initializeModelObject(parameter = parameter, model = "ROC") samples <- 2500 thinning <- 50 adaptiveWidth <- 25 mcmc <- initializeMCMCObject(samples = samples, thinning = thinning, adaptive.width=adaptiveWidth, est.expression=TRUE, est.csp=TRUE, est.hyper=TRUE, est.mix = TRUE) divergence.iteration <- 10 ## Not run: runMCMC(mcmc = mcmc, genome = genome, model = model, ncores = 4, divergence.iteration = divergence.iteration) # get the estimated expression values for all genes based on the mixture # they are assigned to at each step estimatedExpression <- getExpressionEstimates(parameter, 1:length(genome), 1000) ## End(Not run)
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Get amino acids (ROC, FONSE) or codons (PA, PANSE) for which parameters will be estimated
returns list of amino acids or codons
Method of MCMC class (access via mcmc$<function name>, where mcmc is an object initialized by initializeMCMCObject). Returns the logLikelihood trace
vector representing logLikelihood trace
Method of MCMC class (access via mcmc$<function name>, where mcmc is an object initialized by initializeMCMCObject). Calculate the mean log posterior probability over the last n samples
samples |
postive value less than total length of the MCMC trace |
mean logPosterior
Method of MCMC class (access via mcmc$<function name>, where mcmc is an object initialized by initializeMCMCObject). Returns the logPosterior trace
vector representing logPosterior trace
Posterior estimates for the mixture assignment of specified genes
getMixtureAssignmentEstimate(parameter, gene.index, samples)
getMixtureAssignmentEstimate(parameter, gene.index, samples)
parameter |
on object created by |
gene.index |
a integer or vector of integers representing the gene(s) of interesst. |
samples |
number of samples for the posterior estimate |
The returned vector is unnamed as gene ids are only stored in the genome
object,
but the gene.index
vector can be used to match the assignment to the genome.
returns a vector with the mixture assignment of each gene corresbonding to gene.index
in the same order as the genome.
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa") genome <- initializeGenomeObject(file = genome_file) sphi_init <- c(1,1) numMixtures <- 2 geneAssignment <- c(rep(1,floor(length(genome)/2)),rep(2,ceiling(length(genome)/2))) parameter <- initializeParameterObject(genome = genome, sphi = sphi_init, num.mixtures = numMixtures, gene.assignment = geneAssignment, mixture.definition = "allUnique") model <- initializeModelObject(parameter = parameter, model = "ROC") samples <- 2500 thinning <- 50 adaptiveWidth <- 25 mcmc <- initializeMCMCObject(samples = samples, thinning = thinning, adaptive.width=adaptiveWidth, est.expression=TRUE, est.csp=TRUE, est.hyper=TRUE, est.mix = TRUE) divergence.iteration <- 10 ## Not run: runMCMC(mcmc = mcmc, genome = genome, model = model, ncores = 4, divergence.iteration = divergence.iteration) # get the mixture assignment for all genes mixAssign <- getMixtureAssignmentEstimate(parameter = parameter, gene.index = 1:length(genome), samples = 1000) # get the mixture assignment for a subsample mixAssign <- getMixtureAssignmentEstimate(parameter = parameter, gene.index = 5:100, samples = 1000) # or mixAssign <- getMixtureAssignmentEstimate(parameter = parameter, gene.index = c(10, 30:50, 3, 90), samples = 1000) ## End(Not run)
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa") genome <- initializeGenomeObject(file = genome_file) sphi_init <- c(1,1) numMixtures <- 2 geneAssignment <- c(rep(1,floor(length(genome)/2)),rep(2,ceiling(length(genome)/2))) parameter <- initializeParameterObject(genome = genome, sphi = sphi_init, num.mixtures = numMixtures, gene.assignment = geneAssignment, mixture.definition = "allUnique") model <- initializeModelObject(parameter = parameter, model = "ROC") samples <- 2500 thinning <- 50 adaptiveWidth <- 25 mcmc <- initializeMCMCObject(samples = samples, thinning = thinning, adaptive.width=adaptiveWidth, est.expression=TRUE, est.csp=TRUE, est.hyper=TRUE, est.mix = TRUE) divergence.iteration <- 10 ## Not run: runMCMC(mcmc = mcmc, genome = genome, model = model, ncores = 4, divergence.iteration = divergence.iteration) # get the mixture assignment for all genes mixAssign <- getMixtureAssignmentEstimate(parameter = parameter, gene.index = 1:length(genome), samples = 1000) # get the mixture assignment for a subsample mixAssign <- getMixtureAssignmentEstimate(parameter = parameter, gene.index = 5:100, samples = 1000) # or mixAssign <- getMixtureAssignmentEstimate(parameter = parameter, gene.index = c(10, 30:50, 3, 90), samples = 1000) ## End(Not run)
returns the identifiers of the genes within the genome specified.
getNames(genome, simulated = FALSE)
getNames(genome, simulated = FALSE)
genome |
A genome object initialized with |
simulated |
A logical value denoting if the gene names to be listed are simulated or not. The default value is FALSE. |
gene.names Returns the names of the genes as a vector of strings.
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa") ## reading genome genome <- initializeGenomeObject(file = genome_file) ## return all gene ids for the genome geneIDs <- getNames(genome, FALSE)
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa") ## reading genome genome <- initializeGenomeObject(file = genome_file) ## return all gene ids for the genome geneIDs <- getNames(genome, FALSE)
getNc
returns the Effective Number of Codons for a genome.
getNc(genome)
getNc(genome)
genome |
A genome object initialized with |
Returns a named vector with the Effective Number of Codons for each gene
genome_file <- system.file("extdata", "more_genes.fasta", package = "AnaCoDa") ## reading genome genome <- initializeGenomeObject(file = genome_file) nc <- getNc(genome)
genome_file <- system.file("extdata", "more_genes.fasta", package = "AnaCoDa") ## reading genome genome <- initializeGenomeObject(file = genome_file) nc <- getNc(genome)
getNcAA
returns the Effective Number of Codons for each Amino Acid.
getNcAA(genome)
getNcAA(genome)
genome |
A genome object initialized with |
Returns an object of type data.frame
with the Effective Number of Codons
for each amino acid in each gene.
genome_file <- system.file("extdata", "more_genes.fasta", package = "AnaCoDa") ## reading genome genome <- initializeGenomeObject(file = genome_file) nc <- getNcAA(genome)
genome_file <- system.file("extdata", "more_genes.fasta", package = "AnaCoDa") ## reading genome genome <- initializeGenomeObject(file = genome_file) nc <- getNcAA(genome)
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Calculate posterior mean of standard deviation parameter of lognormal describing distribution of synthesis rates
index |
mixture index to use. Should be number between 0 and n-1, where n is number of mixtures |
samples |
number of samples over which to calculate posterior mean |
returns posterior mean for standard deviation of lognormal distribution of synthesis rates
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Calculate variance of noise offset parameter used when fitting model with empirical estimates of synthesis rates (ie. withPhi fits)
index |
mixture index to use. Should be number between 0 and n-1, where n is number of mixtures |
samples |
number of samples over which to calculate variance |
unbiased |
If TRUE, should calculate variance using unbiased (N-1). Otherwise, used biased (N) correction |
returns variance for noise offset
getObservedSynthesisRateSet
returns the observed
synthesis rates of the genes within the genome specified.
getObservedSynthesisRateSet(genome, simulated = FALSE)
getObservedSynthesisRateSet(genome, simulated = FALSE)
genome |
A genome object initialized with |
simulated |
A logical value denoting if the synthesis rates to be listed are simulated or not. The default value is FALSE. |
Returns a data.frame with the observed expression values in genome
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa") expression_file <- system.file("extdata", "expression.csv", package = "AnaCoDa") ## reading genome genome <- initializeGenomeObject(file = genome_file) ## return expression values as a data.frame with gene ids in the first column. expressionValues <- getObservedSynthesisRateSet(genome = genome)
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa") expression_file <- system.file("extdata", "expression.csv", package = "AnaCoDa") ## reading genome genome <- initializeGenomeObject(file = genome_file) ## return expression values as a data.frame with gene ids in the first column. expressionValues <- getObservedSynthesisRateSet(genome = genome)
Method of MCMC class (access via mcmc$<function name>, where mcmc is an object initialized by initializeMCMCObject). Return number of samples set for MCMCAlgorithm object
number of samples used during MCMC
getSelectionCoefficients
calculates the selection coefficient of each codon in each gene.
getSelectionCoefficients(genome, parameter, samples = 100)
getSelectionCoefficients(genome, parameter, samples = 100)
genome |
A genome object initialized with
|
parameter |
an object created by |
samples |
The number of samples used for the posterior estimates. |
A matrix with selection coefficients.
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa") genome <- initializeGenomeObject(file = genome_file) sphi_init <- 1 numMixtures <- 1 geneAssignment <- rep(1, length(genome)) parameter <- initializeParameterObject(genome = genome, sphi = sphi_init, num.mixtures = numMixtures, gene.assignment = geneAssignment, mixture.definition = "allUnique") model <- initializeModelObject(parameter = parameter, model = "ROC") samples <- 2500 thinning <- 50 adaptiveWidth <- 25 mcmc <- initializeMCMCObject(samples = samples, thinning = thinning, adaptive.width=adaptiveWidth, est.expression=TRUE, est.csp=TRUE, est.hyper=TRUE, est.mix = TRUE) divergence.iteration <- 10 ## Not run: runMCMC(mcmc = mcmc, genome = genome, model = model, ncores = 4, divergence.iteration = divergence.iteration) ## return estimates for selection coefficients s for each codon in each gene selection.coefficients <- getSelectionCoefficients(genome = genome, parameter = parameter, samples = 1000) ## End(Not run)
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa") genome <- initializeGenomeObject(file = genome_file) sphi_init <- 1 numMixtures <- 1 geneAssignment <- rep(1, length(genome)) parameter <- initializeParameterObject(genome = genome, sphi = sphi_init, num.mixtures = numMixtures, gene.assignment = geneAssignment, mixture.definition = "allUnique") model <- initializeModelObject(parameter = parameter, model = "ROC") samples <- 2500 thinning <- 50 adaptiveWidth <- 25 mcmc <- initializeMCMCObject(samples = samples, thinning = thinning, adaptive.width=adaptiveWidth, est.expression=TRUE, est.csp=TRUE, est.hyper=TRUE, est.mix = TRUE) divergence.iteration <- 10 ## Not run: runMCMC(mcmc = mcmc, genome = genome, model = model, ncores = 4, divergence.iteration = divergence.iteration) ## return estimates for selection coefficients s for each codon in each gene selection.coefficients <- getSelectionCoefficients(genome = genome, parameter = parameter, samples = 1000) ## End(Not run)
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Calculate posterior mean of standard deviation parameter of lognormal describing distribution of synthesis rates
samples |
number of samples over which to calculate posterior mean |
mixture |
mixture index to use. Should be number between 0 and n-1, where n is number of mixtures |
returns posterior mean for standard deviation of lognormal distribution of synthesis rates
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Calculate variance of standard deviation parameter of lognormal describing distribution of synthesis rates
samples |
number of samples over which to calculate variance |
mixture |
mixture index to use. Should be number between 0 and n-1, where n is number of mixtures |
unbiased |
If TRUE, should calculate variance using unbiased (N-1). Otherwise, used biased (N) correction |
returns variance for standard deviation of lognormal distribution of synthesis rates
Method of MCMC class (access via mcmc$<function name>, where mcmc is an object initialized by initializeMCMCObject). Return number of iterations (total iterations = samples * thinning) to allow proposal widths to adapt
number of sample steps to adapt
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Get current synthesis rates for all genes and all mixtures
2 by 2 vector of numeric values
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Get posterior mean synthesis rate value for a gene
samples |
number of samples over which to calculate mean |
geneIndex |
corresponding index of gene in genome for which posterior mean synthesis rate will be calculated. Should be a number between 1 and length(genome) |
log_scale |
Calculate posterior mean on log scale |
posterior mean synthesis rate for gene
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Get synthesis rate variance for a gene
samples |
number of samples over which to calculate variance |
geneIndex |
corresponding index of gene in genome for which synthesis rate variance will be calculated. Should be a number between 1 and length(genome) |
unbiased |
Should calculate variance using unbiased (N-1) or biased (N) correction |
log_scale |
Calculate variance on log scale |
posterior mean synthesis rate for gene
Method of MCMC class (access via mcmc$<function name>, where mcmc is an object initialized by initializeMCMCObject). Return thinning value, which is the number of iterations (total iterations = samples * thinning) not being kept
thinning value used during MCMC
extracts an object of traces from a parameter object.
getTrace(parameter)
getTrace(parameter)
parameter |
A Parameter object that corresponds to one of the model types. |
trace Returns an object of type Trace extracted from the given parameter object
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa") genome <- initializeGenomeObject(file = genome_file) sphi_init <- c(1,1) numMixtures <- 2 geneAssignment <- c(rep(1,floor(length(genome)/2)),rep(2,ceiling(length(genome)/2))) parameter <- initializeParameterObject(genome = genome, sphi = sphi_init, num.mixtures = numMixtures, gene.assignment = geneAssignment, mixture.definition = "allUnique") trace <- getTrace(parameter) # empty trace object since no MCMC was perfomed
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa") genome <- initializeGenomeObject(file = genome_file) sphi_init <- c(1,1) numMixtures <- 2 geneAssignment <- c(rep(1,floor(length(genome)/2)),rep(2,ceiling(length(genome)/2))) parameter <- initializeParameterObject(genome = genome, sphi = sphi_init, num.mixtures = numMixtures, gene.assignment = geneAssignment, mixture.definition = "allUnique") trace <- getTrace(parameter) # empty trace object since no MCMC was perfomed
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Get Trace object stored by a Parameter object. Useful for plotting certain parameter traces.
Trace object
Initialize Covariance Matrices
initializeCovarianceMatrices( parameter, genome, numMixtures, geneAssignment, init.csp.variance = 0.0025 )
initializeCovarianceMatrices( parameter, genome, numMixtures, geneAssignment, init.csp.variance = 0.0025 )
parameter |
A Parameter object that corresponds to one of the model types. Valid values are "ROC", "PA", and "FONSE". |
genome |
An object of type Genome necessary for the initialization of the Parameter object. |
numMixtures |
The number of mixture elements for the underlying mixture distribution (numMixtures > 0). |
geneAssignment |
A vector holding the initial mixture assignment for each gene. The vector length has to equal the number of genes in the genome. Valid values for the vector range from 1 to numMixtures. It is possible but not advised to leave a mixture element empty. |
init.csp.variance |
initial proposal variance for codon specific parameter, default is 0.0025. |
parameter Returns the Parameter argument, now modified with initialized mutation, selection, and covariance matrices.
initializeGenomeObject
initializes the Rcpp Genome object
initializeGenomeObject( file, genome = NULL, observed.expression.file = NULL, fasta = TRUE, positional = FALSE, match.expression.by.id = TRUE, append = FALSE )
initializeGenomeObject( file, genome = NULL, observed.expression.file = NULL, fasta = TRUE, positional = FALSE, match.expression.by.id = TRUE, append = FALSE )
file |
A file of coding sequences in fasta or RFPData format |
genome |
A genome object can be passed in to concatenate the input file to it (optional). |
observed.expression.file |
String containing the location of a file containing empirical expression rates (optional). Default value is NULL. |
fasta |
Boolean value indicating whether |
positional |
Boolean indicating if the positional information in the RFPData file is necessary. Default value is FALSE |
match.expression.by.id |
If TRUE, observed expression values will be assigned by matching sequence identifier. If FALSE, observed expression values will be assigned by order. Default value is TRUE. |
append |
If TRUE, function will read in additional genome data to append to an existing genome. If FALSE, genome data is cleared before reading in data (no preexisting data). Default value is FALSE. |
This function returns the initialized Genome object.
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa") genes_file <- system.file("extdata", "more_genes.fasta", package = "AnaCoDa") expression_file <- system.file("extdata", "expression.csv", package = "AnaCoDa") ## reading genome genome <- initializeGenomeObject(file = genome_file) ## reading genome and observed expression data genome <- initializeGenomeObject(file = genome_file, observed.expression.file = expression_file) ## add aditional genes to existing genome genome <- initializeGenomeObject(file = genome_file) genome <- initializeGenomeObject(file = genes_file, genome = genome, append = TRUE)
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa") genes_file <- system.file("extdata", "more_genes.fasta", package = "AnaCoDa") expression_file <- system.file("extdata", "expression.csv", package = "AnaCoDa") ## reading genome genome <- initializeGenomeObject(file = genome_file) ## reading genome and observed expression data genome <- initializeGenomeObject(file = genome_file, observed.expression.file = expression_file) ## add aditional genes to existing genome genome <- initializeGenomeObject(file = genome_file) genome <- initializeGenomeObject(file = genes_file, genome = genome, append = TRUE)
initializeMCMCObject
initializes a MCMC object to
perform a model fitting for a parameter and model object.
initializeMCMCObject( samples, thinning = 1, adaptive.width = 100, est.expression = TRUE, est.csp = TRUE, est.hyper = TRUE, est.mix = TRUE )
initializeMCMCObject( samples, thinning = 1, adaptive.width = 100, est.expression = TRUE, est.csp = TRUE, est.hyper = TRUE, est.mix = TRUE )
samples |
Number of samples to be produced when running the MCMC algorithm. No default value. |
thinning |
The thinning interval between consecutive observations. If set to 1, every step will be saved as a sample. Default value is 1. |
adaptive.width |
Number that determines how often the acceptance/rejection
window should be altered. Default value is 100 samples.
Proportion of MCMC steps where the proposal distribution is adaptive can be set using |
est.expression |
Boolean that tells whether or not synthesis rate values should be estimated in the MCMC algorithm run. Default value is TRUE. |
est.csp |
Boolean that tells whether or not codon specific values should be estimated in the MCMC algorithm run. Default value is TRUE. |
est.hyper |
Boolean that tells whether or not hyper parameters
should be estimated in the MCMC algorithm run. Default value is TRUE.
Setting for expression noise parameter sepsilon can be overridden by setting |
est.mix |
Boolean that tells whether or not the genes' mixture element should be estimated in the MCMC algorithm run. Default value is TRUE. |
initializeMCMCObject
sets up the MCMC object
(monte carlo markov chain) and returns the object so a model fitting can be done.
It is important to note that est.expression and est.hyper will affect one another
negatively if their values differ.
mcmc Returns an intialized MCMC object.
## initializing an object of type mcmc samples <- 2500 thinning <- 50 adaptiveWidth <- 25 ## estimate all parameter types mcmc <- initializeMCMCObject(samples = samples, thinning = thinning, adaptive.width=adaptiveWidth, est.expression=TRUE, est.csp=TRUE, est.hyper=TRUE, est.mix = TRUE) ## do not estimate expression values, initial conditions will remain constant mcmc <- initializeMCMCObject(samples = samples, thinning = thinning, adaptive.width=adaptiveWidth, est.expression=FALSE, est.csp=TRUE, est.hyper=TRUE, est.mix = TRUE) ## do not estimate hyper parameters, initial conditions will remain constant mcmc <- initializeMCMCObject(samples = samples, thinning = thinning, adaptive.width=adaptiveWidth, est.expression=TRUE, est.csp=TRUE, est.hyper=FALSE, est.mix = TRUE)
## initializing an object of type mcmc samples <- 2500 thinning <- 50 adaptiveWidth <- 25 ## estimate all parameter types mcmc <- initializeMCMCObject(samples = samples, thinning = thinning, adaptive.width=adaptiveWidth, est.expression=TRUE, est.csp=TRUE, est.hyper=TRUE, est.mix = TRUE) ## do not estimate expression values, initial conditions will remain constant mcmc <- initializeMCMCObject(samples = samples, thinning = thinning, adaptive.width=adaptiveWidth, est.expression=FALSE, est.csp=TRUE, est.hyper=TRUE, est.mix = TRUE) ## do not estimate hyper parameters, initial conditions will remain constant mcmc <- initializeMCMCObject(samples = samples, thinning = thinning, adaptive.width=adaptiveWidth, est.expression=TRUE, est.csp=TRUE, est.hyper=FALSE, est.mix = TRUE)
initializes the model object.
initializeModelObject( parameter, model = "ROC", with.phi = FALSE, fix.observation.noise = FALSE, rfp.count.column = 1 )
initializeModelObject( parameter, model = "ROC", with.phi = FALSE, fix.observation.noise = FALSE, rfp.count.column = 1 )
parameter |
An object created with |
model |
A string containing the model to run (ROC, FONSE, or PA), has to match parameter object. |
with.phi |
(ROC only) A boolean that determines whether or not to include empirical phi values (expression rates) for the calculations. Default value is FALSE |
fix.observation.noise |
(ROC only) Allows fixing the noise term sepsilon in the observed expression dataset to its initial condition. This value should override the est.hyper=TRUE setting in |
rfp.count.column |
(PA and PANSE only) A number representing the RFP count column to use. Default value is 1. |
initializeModelObject initializes a model. The type of model is determined based on the string passed to the model
argument.
The Parameter object has to match the model that is initialized. E.g. to initialize a ROC model,
it is required that a ROC parameter object is passed to the function.
This function returns the model object created.
#initializing a model object genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa") expression_file <- system.file("extdata", "expression.csv", package = "AnaCoDa") genome <- initializeGenomeObject(file = genome_file, observed.expression.file = expression_file) sphi_init <- c(1,1) numMixtures <- 2 geneAssignment <- c(rep(1,floor(length(genome)/2)),rep(2,ceiling(length(genome)/2))) parameter <- initializeParameterObject(genome = genome, sphi = sphi_init, num.mixtures = numMixtures, gene.assignment = geneAssignment, mixture.definition = "allUnique") # initializing a model object assuming we have observed expression (phi) # values stored in the genome object. initializeModelObject(parameter = parameter, model = "ROC", with.phi = TRUE) # initializing a model object ignoring observed expression (phi) # values stored in the genome object. initializeModelObject(parameter = parameter, model = "ROC", with.phi = FALSE)
#initializing a model object genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa") expression_file <- system.file("extdata", "expression.csv", package = "AnaCoDa") genome <- initializeGenomeObject(file = genome_file, observed.expression.file = expression_file) sphi_init <- c(1,1) numMixtures <- 2 geneAssignment <- c(rep(1,floor(length(genome)/2)),rep(2,ceiling(length(genome)/2))) parameter <- initializeParameterObject(genome = genome, sphi = sphi_init, num.mixtures = numMixtures, gene.assignment = geneAssignment, mixture.definition = "allUnique") # initializing a model object assuming we have observed expression (phi) # values stored in the genome object. initializeModelObject(parameter = parameter, model = "ROC", with.phi = TRUE) # initializing a model object ignoring observed expression (phi) # values stored in the genome object. initializeModelObject(parameter = parameter, model = "ROC", with.phi = FALSE)
initializeParameterObject
initializes a new parameter object or reconstructs one from a restart file
initializeParameterObject( genome = NULL, sphi = NULL, num.mixtures = 1, gene.assignment = NULL, initial.expression.values = NULL, model = "ROC", split.serine = TRUE, mixture.definition = "allUnique", mixture.definition.matrix = NULL, init.with.restart.file = NULL, mutation.prior.mean = 0, mutation.prior.sd = 0.35, propose.by.prior = FALSE, init.csp.variance = 0.0025, init.sepsilon = 0.1, init.w.obs.phi = FALSE, init.initiation.cost = 4, init.partition.function = 1 )
initializeParameterObject( genome = NULL, sphi = NULL, num.mixtures = 1, gene.assignment = NULL, initial.expression.values = NULL, model = "ROC", split.serine = TRUE, mixture.definition = "allUnique", mixture.definition.matrix = NULL, init.with.restart.file = NULL, mutation.prior.mean = 0, mutation.prior.sd = 0.35, propose.by.prior = FALSE, init.csp.variance = 0.0025, init.sepsilon = 0.1, init.w.obs.phi = FALSE, init.initiation.cost = 4, init.partition.function = 1 )
genome |
An object of type Genome necessary for the initialization of the Parameter object. The default value is NULL. |
sphi |
Initial values for sphi. Expected is a vector of length numMixtures. The default value is NULL. |
num.mixtures |
The number of mixtures elements for the underlying mixture distribution (numMixtures > 0). The default value is 1. |
gene.assignment |
A vector holding the initial mixture assignment for each gene. The vector length has to equal the number of genes in the genome. Valid values for the vector range from 1 to numMixtures. It is possible but not advised to leave a mixture element empty. The default Value is NULL. |
initial.expression.values |
(Optional) A vector with intial phi values. The length of the vector has to equal the number of genes in the Genome object and the order of the genes should match the order of the genes in the Genome. The default value is NULL. |
model |
Specifies the model used. Valid options are "ROC", "PA", "PANSE", or "FONSE". The default model is "ROC". ROC is described in Gilchrist et al. 2015. PA, PANSE and FONSE are currently unpublished. |
split.serine |
Whether serine should be considered as one or two amino acids when running the model. TRUE and FALSE are the only valid values. The default value for split.serine is TRUE. |
mixture.definition |
A string describing how each mixture should be treated with respect to mutation and selection. Valid values consist of "allUnique", "mutationShared", and "selectionShared". The default value for mixture.definition is "allUnique". See details for more information. |
mixture.definition.matrix |
A matrix representation of how the mutation and selection categories correspond to the mixtures. The default value for mixture.definition.matrix is NULL. If provided, the model will use the matrix to initialize the mutation and selection categories instead of the definition listed directly above. See details for more information. |
init.with.restart.file |
File name containing information to reinitialize a previous Parameter object. If given, all other arguments will be ignored. The default value for init.with.restart.file is NULL. |
mutation.prior.mean |
Controlling the mean of the normal prior on mutation paramters. If passed in as single number (default is 0), this will be the mean value for all categories, for all codons. User may also supply a vector with n * 40 values, where n is the number of mutation categories. Future versions will check the number of rows matches the number of mutation categories definded by user. |
mutation.prior.sd |
Controlling the standard deviation of the normal prior on the mutation parameters. If passed in as single number (default is 0.35), this will be the standard deviation value for all categories, for all codons. User may also supply a vector with n * 40 values, where n is the number of mutation categories. Future versions will check the number of rows matches the number of mutation categories definded by user. |
propose.by.prior |
Mutation bias parameters will be proposed based on the means and standard deviations set in mutation.prior.mean and mutation.prior.sd |
init.csp.variance |
specifies the initial proposal width for codon specific parameter (default is 0.0025). The proposal width adapts during the runtime to reach a taget acceptance rate of ~0.25 |
init.sepsilon |
specifies the initial value for sepsilon. default is 0.1 |
init.w.obs.phi |
If TRUE, initialize phi values with observed phi values (data from RNAseq, mass spectrometry, ribosome footprinting) Default is FALSE. If multiple observed phi values exist for a gene, the geometric mean of these values is used as initial phi. When using this function, one should remove any genes with missing phi values, as these genes will not have an initial phi value. |
init.initiation.cost |
FOR FONSE ONLY. Initializes the initiation cost a_1 at this value. |
init.partition.function |
FOR PANSE ONLY. initializes the partition function Z. |
initializeParameterObject
checks the values of the arguments
given to insure the values are valid.
The mixture definition and mixture definition matrix describes how the mutation
and selection categories are set up with respect to the number of mixtures. For
example, if mixture.definition = "allUnique" and numMixtures = 3, a matrix
representation would be matrix(c(1,2,3,1,2,3), ncol=2)
where each row represents a mixture, the first column represents the mutation
category, and the second column represents the selection category.
Another example would be mixture.definition = "selectionShared" and numMixtures = 4 (
matrix(c(1,2,3,4,1,1,1,1), ncol=2)
).
In this case, the selection category is the same for every mixture. If a matrix
is given, and it is valid, then the mutation/selection relationship will be
defined by the given matrix and the keyword will be ignored. A matrix should only
be given in cases where the keywords would not create the desired matrix.
parameter Returns an initialized Parameter object.
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa") restart_file <- system.file("extdata", "restart_file.rst", package = "AnaCoDa") genome <- initializeGenomeObject(file = genome_file) ## initialize a new parameter object sphi_init <- 1 numMixtures <- 1 geneAssignment <- rep(1, length(genome)) parameter <- initializeParameterObject(genome = genome, sphi = sphi_init, num.mixtures = numMixtures, gene.assignment = geneAssignment, mixture.definition = "allUnique") ## re-initialize a parameter object from a restart file. Useful for checkpointing parameter <- initializeParameterObject(init.with.restart.file = restart_file) ## initialize a parameter object with a custon mixture definition matrix def.matrix <- matrix(c(1,1,1,2), ncol=2) geneAssignment <- c(rep(1,floor(length(genome)/2)),rep(2,ceiling(length(genome)/2))) parameter <- initializeParameterObject(genome = genome, sphi = c(0.5, 2), num.mixtures = 2, gene.assignment = geneAssignment, mixture.definition.matrix = def.matrix)
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa") restart_file <- system.file("extdata", "restart_file.rst", package = "AnaCoDa") genome <- initializeGenomeObject(file = genome_file) ## initialize a new parameter object sphi_init <- 1 numMixtures <- 1 geneAssignment <- rep(1, length(genome)) parameter <- initializeParameterObject(genome = genome, sphi = sphi_init, num.mixtures = numMixtures, gene.assignment = geneAssignment, mixture.definition = "allUnique") ## re-initialize a parameter object from a restart file. Useful for checkpointing parameter <- initializeParameterObject(init.with.restart.file = restart_file) ## initialize a parameter object with a custon mixture definition matrix def.matrix <- matrix(c(1,1,1,2), ncol=2) geneAssignment <- c(rep(1,floor(length(genome)/2)),rep(2,ceiling(length(genome)/2))) parameter <- initializeParameterObject(genome = genome, sphi = c(0.5, 2), num.mixtures = 2, gene.assignment = geneAssignment, mixture.definition.matrix = def.matrix)
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Initialize synthesis rates using SCUO values calcuated from the genome
genome |
a Genome object |
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Initialize synthesis rates with values passed in as a list
expression |
a list of values to use as initial synthesis rate values. Should be same size as number of genes in genome. |
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Initialize synthesis rates by drawing a from a lognormal distribution with mean = -(sd_phi)^2/2 and sd = sd_phi
sd_phi |
a positive value which will be the standard deviation of the lognormal distribution |
Initialize values for mutation CSP. File should be of comma-separated with header. Three columns should be of order Amino_acid,Codon,Value
files |
list of files containing starting values. Number of files should equal the number of categories. |
numCategories |
number of mutation categories (should be less than or equal to number of mixtures) |
fix |
Can use this parameter to fix mutation at current values (won't change over course of MCMC run) |
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Initialize values for selection CSP. File should be of comma-separated with header. Three columns should be of order Amino_acid,Codon,Value
files |
list of files containing starting values. Number of files should equal the number of categories. |
numCategories |
number of mutation categories (should be less than or equal to number of mixtures) |
fix |
Can use this parameter to fix selection at current values (won't change over course of MCMC run) |
length
gives the length of a genome
## S3 method for class 'Rcpp_Genome' length(x)
## S3 method for class 'Rcpp_Genome' length(x)
x |
A genome object initialized with |
returns the number of genes in a genome
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa") ## reading genome genome <- initializeGenomeObject(file = genome_file) length(genome) # 10
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa") ## reading genome genome <- initializeGenomeObject(file = genome_file) length(genome) # 10
loadMCMCObject
creates a new MCMC object and fills it with
the information in the file given.
loadMCMCObject(files)
loadMCMCObject(files)
files |
The filenames where the data will be stored. |
This MCMC object is not intended to be used to do another model fitting, only to graph the stored results.
This function has no return value.
## loading mcmc objects from the filesystem ## Not run: # load one mcmc object mcmc <- loadMCMCObject(files = "mcmc.Rda") # load and combine multiple mcmc objects. Useful when using checkpointing mcmc <- loadMCMCObject(files = c("mcmc1.Rda", "mcmc2.Rda")) ## End(Not run)
## loading mcmc objects from the filesystem ## Not run: # load one mcmc object mcmc <- loadMCMCObject(files = "mcmc.Rda") # load and combine multiple mcmc objects. Useful when using checkpointing mcmc <- loadMCMCObject(files = c("mcmc1.Rda", "mcmc2.Rda")) ## End(Not run)
loadParameterObject
will load a parameter object from the filesystem
loadParameterObject(files)
loadParameterObject(files)
files |
A list of parameter filenames to be loaded. If multiple files are given, the parameter objects will be concatenated in the order provided |
The function loads one or multiple files. In the case of multiple file, e.g. due to the use of check pointing, the files will be concatenated to one parameter object. See writeParameterObject for the writing of parameter objects
Returns an initialized Parameter object.
## Not run: # load a single parameter object parameter <- loadParameterObject("parameter.Rda") # load and concatenate multiple parameter object parameter <- loadParameterObject(c("parameter1.Rda", "parameter2.Rda")) ## End(Not run)
## Not run: # load a single parameter object parameter <- loadParameterObject("parameter.Rda") # load and concatenate multiple parameter object parameter <- loadParameterObject(c("parameter1.Rda", "parameter2.Rda")) ## End(Not run)
Plots traces from the model object such as synthesis rates for each gene. Will work regardless of whether or not expression/synthesis rate levels are being estimated. If you wish to plot observed/empirical values, these values MUST be set using the initial.expression.values parameter found in initializeParameterObject. Otherwise, the expression values plotted will just be SCUO values estimated upon initialization of the Parameter object.
## S3 method for class 'Rcpp_FONSEModel' plot( x, genome, samples = 100, mixture = 1, simulated = FALSE, codon.window = NULL, ... )
## S3 method for class 'Rcpp_FONSEModel' plot( x, genome, samples = 100, mixture = 1, simulated = FALSE, codon.window = NULL, ... )
x |
An Rcpp model object initialized with |
genome |
An Rcpp genome object initialized with |
samples |
The number of samples in the trace |
mixture |
The mixture for which to graph values. |
simulated |
A boolean value that determines whether to use the simulated genome. |
codon.window |
A boolean value that determines the codon window to use for calculating codon frequencies. If NULL (the default), use complete sequences. |
... |
Optional, additional arguments. For this function, a possible title for the plot in the form of a list if set with "main". |
This function has no return value.
plot
graphs the mutation or selection parameter for a ROC or FONSE
parameter object for each mixture element.
## S3 method for class 'Rcpp_FONSEParameter' plot( x, what = "Mutation", samples = 100, mixture.name = NULL, with.ci = TRUE, ... )
## S3 method for class 'Rcpp_FONSEParameter' plot( x, what = "Mutation", samples = 100, mixture.name = NULL, with.ci = TRUE, ... )
x |
A parameter object |
what |
Which aspect of the parameter to plot. Default value is "Mutation". |
samples |
Number of samples to plot using the posterior mean. Default value is 100. |
mixture.name |
a vector with names/descriptions of the mixture distributions in the parameter object |
with.ci |
Plot with or without confidence intervals. Default value is TRUE |
... |
Arguments to be passed to methods, such as graphical parameters. |
Graphs are based off the last # samples for the posterior mean.
This function has no return value.
This function will plot the logLikelihood trace, and if the Hmisc package is installed, it will plot a subplot of the logLikelihood trace with the first few samples removed.
## S3 method for class 'Rcpp_MCMCAlgorithm' plot(x, what = "LogPosterior", zoom.window = NULL, ...)
## S3 method for class 'Rcpp_MCMCAlgorithm' plot(x, what = "LogPosterior", zoom.window = NULL, ...)
x |
An Rcpp_MCMC object initialized with |
what |
character defining if log(Posterior) (Default) or log(Likelihood) options are: LogPosterior or logLikelihood |
zoom.window |
A vector describing the start and end of the zoom window. |
... |
Arguments to be passed to methods, such as graphical parameters. |
This function has no return value.
Plots traces from the model object such as synthesis rates for each gene. Will work regardless of whether or not expression/synthesis rate levels are being estimated. If you wish to plot observed/empirical values, these values MUST be set using the initial.expression.values parameter found in initializeParameterObject. Otherwise, the expression values plotted will just be SCUO values estimated upon initialization of the Parameter object.
## S3 method for class 'Rcpp_ROCModel' plot(x, genome = NULL, samples = 100, mixture = 1, simulated = FALSE, ...)
## S3 method for class 'Rcpp_ROCModel' plot(x, genome = NULL, samples = 100, mixture = 1, simulated = FALSE, ...)
x |
An Rcpp model object initialized with |
genome |
An Rcpp genome object initialized with |
samples |
The number of samples in the trace |
mixture |
The mixture for which to graph values. |
simulated |
A boolean value that determines whether to use the simulated genome. |
... |
Optional, additional arguments. For this function, a possible title for the plot in the form of a list if set with "main". |
This function has no return value.
plot
graphs the mutation or selection parameter for a ROC or FONSE
parameter object for each mixture element.
## S3 method for class 'Rcpp_ROCParameter' plot( x, what = "Mutation", samples = 100, mixture.name = NULL, with.ci = TRUE, ... )
## S3 method for class 'Rcpp_ROCParameter' plot( x, what = "Mutation", samples = 100, mixture.name = NULL, with.ci = TRUE, ... )
x |
A parameter object |
what |
Which aspect of the parameter to plot. Default value is "Mutation". |
samples |
Number of samples to plot using the posterior mean. Default value is 100. |
mixture.name |
a vector with names/descriptions of the mixture distributions in the parameter object |
with.ci |
Plot with or without confidence intervals. Default value is TRUE |
... |
Arguments to be passed to methods, such as graphical parameters. |
Graphs are based off the last # samples for the posterior mean.
This function has no return value.
Plots different traces, specified with the what
parameter.
## S3 method for class 'Rcpp_Trace' plot( x, what = c("Mutation", "Selection", "MixtureProbability", "Sphi", "Mphi", "Aphi", "Sepsilon", "ExpectedPhi", "Expression", "NSEProb", "NSERate", "InitiationCost", "PartitionFunction"), geneIndex = 1, mixture = 1, log.10.scale = F, ... )
## S3 method for class 'Rcpp_Trace' plot( x, what = c("Mutation", "Selection", "MixtureProbability", "Sphi", "Mphi", "Aphi", "Sepsilon", "ExpectedPhi", "Expression", "NSEProb", "NSERate", "InitiationCost", "PartitionFunction"), geneIndex = 1, mixture = 1, log.10.scale = F, ... )
x |
An Rcpp trace object initialized with |
what |
A string containing one of the following to graph: |
geneIndex |
When plotting expression, the index of the gene to be plotted. |
mixture |
The mixture for which to plot values. |
log.10.scale |
A logical value determining if figures should be plotted on the log.10.scale (default=F). Should not be applied to mutation and selection parameters estimated by ROC/FONSE. |
... |
Optional, additional arguments. For this function, may be a logical value determining if the trace is ROC-based or not. |
This function has no return value.
Plots acceptance ratios for codon-specific parameters. Will be by amino acid for ROC and FONSE models, but will be by codon for PA and PANSE models. Note assumes estimating parameters for all codons.
plotAcceptanceRatios(trace, main = "CSP Acceptance Ratio Traces")
plotAcceptanceRatios(trace, main = "CSP Acceptance Ratio Traces")
trace |
An Rcpp trace object initialized with |
main |
The title of the plot. |
This function has no return value.
Plots a codon-specific set of traces, specified with the type
parameter.
plotCodonSpecificParameters( trace, mixture, type = "Mutation", main = "Mutation Parameter Traces", ROC.or.FONSE = TRUE, log.10.scale = F )
plotCodonSpecificParameters( trace, mixture, type = "Mutation", main = "Mutation Parameter Traces", ROC.or.FONSE = TRUE, log.10.scale = F )
trace |
An Rcpp trace object initialized with |
mixture |
The mixture for which to plot values. |
type |
A string containing one of the following to graph: |
main |
The title of the plot. |
ROC.or.FONSE |
A logical value determining if the Parameter was ROC/FONSE or not. |
log.10.scale |
A logical value determining if figures should be plotted on the log.10.scale (default=F). Should not be applied to mutation and selection parameters estimated by ROC/FONSE. |
This function has no return value.
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Read synthesis rate values from file. File should be two column file <gene_id,phi> and is expected to have a header row
filename |
name of file to be read |
runMCMC
will run a monte carlo markov chain algorithm
for the given mcmc, genome, and model objects to perform a model fitting.
runMCMC(mcmc, genome, model, ncores = 1, divergence.iteration = 0)
runMCMC(mcmc, genome, model, ncores = 1, divergence.iteration = 0)
mcmc |
MCMC object that will run the model fitting algorithm. |
genome |
Genome that the model fitting will run on. Should be the same genome associated with the parameter and model objects. |
model |
Model to run the fitting on. Should be associated with the given genome. |
ncores |
Number of cores to perform the model fitting with. Default value is 1. |
divergence.iteration |
Number of steps that the initial conditions can diverge from the original conditions given. Default value is 0. |
runMCMC
will run for the number of samples times the number
thinning given when the mcmc object is initialized. Updates are provided every 100
steps, and the state of the chain is saved every thinning steps.
This function has no return value.
#fitting a model to a genome using the runMCMC function genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa") genome <- initializeGenomeObject(file = genome_file) sphi_init <- c(1,1) numMixtures <- 2 geneAssignment <- c(rep(1,floor(length(genome)/2)),rep(2,ceiling(length(genome)/2))) parameter <- initializeParameterObject(genome = genome, sphi = sphi_init, num.mixtures = numMixtures, gene.assignment = geneAssignment, mixture.definition = "allUnique") model <- initializeModelObject(parameter = parameter, model = "ROC") samples <- 2500 thinning <- 50 adaptiveWidth <- 25 mcmc <- initializeMCMCObject(samples = samples, thinning = thinning, adaptive.width=adaptiveWidth, est.expression=TRUE, est.csp=TRUE, est.hyper=TRUE, est.mix = TRUE) divergence.iteration <- 10 ## Not run: runMCMC(mcmc = mcmc, genome = genome, model = model, ncores = 4, divergence.iteration = divergence.iteration) ## End(Not run)
#fitting a model to a genome using the runMCMC function genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa") genome <- initializeGenomeObject(file = genome_file) sphi_init <- c(1,1) numMixtures <- 2 geneAssignment <- c(rep(1,floor(length(genome)/2)),rep(2,ceiling(length(genome)/2))) parameter <- initializeParameterObject(genome = genome, sphi = sphi_init, num.mixtures = numMixtures, gene.assignment = geneAssignment, mixture.definition = "allUnique") model <- initializeModelObject(parameter = parameter, model = "ROC") samples <- 2500 thinning <- 50 adaptiveWidth <- 25 mcmc <- initializeMCMCObject(samples = samples, thinning = thinning, adaptive.width=adaptiveWidth, est.expression=TRUE, est.csp=TRUE, est.hyper=TRUE, est.mix = TRUE) divergence.iteration <- 10 ## Not run: runMCMC(mcmc = mcmc, genome = genome, model = model, ncores = 4, divergence.iteration = divergence.iteration) ## End(Not run)
Method of MCMC class (access via mcmc$<function name>, where mcmc is an object initialized by initializeMCMCObject). Set sample adaptiveWidth value, which is the number of samples (not iterations) between adapting parameter proposal widths
_adaptiveWidth |
postive value |
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Set amino acids (ROC, FONSE) or codons (PA, PANSE) for which parameters will be estimated. Note that non-default groupLists are still in beta testing and should be used with caution.
List |
of strings epresenting groups for parameters to be estimated. Should be one letter amino acid (ROC, FONSE) or list of sense codons (PA, PANSE). |
Method of MCMC class (access via mcmc$<function name>, where mcmc is an object initialized by initializeMCMCObject). Set restart file output name and frequency prior to running MCMC
filename |
name of restart file |
interval |
number of samples (ie. iterations * thinning) between writing new restart file |
multiple |
if true, will output a new restart file at each interval (file name will include sample it was written at) |
setRestartSettings
sets the needed information (what the file
is called, how often the file should be written) to write
information to restart the MCMC algorithm from a given point.
setRestartSettings(mcmc, filename, samples, write.multiple = TRUE)
setRestartSettings(mcmc, filename, samples, write.multiple = TRUE)
mcmc |
MCMC object that will run the model fitting algorithm. |
filename |
Filename for the restart files to be written. |
samples |
Number of samples that should occur before a file is written. |
write.multiple |
Boolean that determines if multiple restart files are written. Default value is TRUE. |
setRestartSettings
writes a restart file every set amount of samples
that occur. Also, if write.multiple is true, instead of overwriting the previous restart
file, the sample number is prepended onto the file name and multiple rerstart files
are generated for a run.
This function has no return value.
## set restart settings for checkpointing samples <- 2500 thinning <- 50 adaptiveWidth <- 25 ## estimate all parameter types mcmc <- initializeMCMCObject(samples = samples, thinning = thinning, adaptive.width=adaptiveWidth, est.expression=TRUE, est.csp=TRUE, est.hyper=TRUE, est.mix = TRUE) # prompts the mcmc to write a restart file every 100 samples during the run. setRestartSettings(mcmc = mcmc, filename = "test_restart", samples = 100) # prompts the mcmc to write a restart file every 100 samples during the run, # but will overwrite it each time. setRestartSettings(mcmc = mcmc, filename = "test_restart", samples = 100, write.multiple = FALSE)
## set restart settings for checkpointing samples <- 2500 thinning <- 50 adaptiveWidth <- 25 ## estimate all parameter types mcmc <- initializeMCMCObject(samples = samples, thinning = thinning, adaptive.width=adaptiveWidth, est.expression=TRUE, est.csp=TRUE, est.hyper=TRUE, est.mix = TRUE) # prompts the mcmc to write a restart file every 100 samples during the run. setRestartSettings(mcmc = mcmc, filename = "test_restart", samples = 100) # prompts the mcmc to write a restart file every 100 samples during the run, # but will overwrite it each time. setRestartSettings(mcmc = mcmc, filename = "test_restart", samples = 100, write.multiple = FALSE)
Method of MCMC class (access via mcmc$<function name>, where mcmc is an object initialized by initializeMCMCObject). Set number of samples set for MCMCAlgorithm object
_samples |
postive value |
Method of MCMC class (access via mcmc$<function name>, where mcmc is an object initialized by initializeMCMCObject). Set number of iterations (total iterations = samples * thinning) to allow proposal widths to adapt
steps |
a postive value |
Set thinning value, which is the number of iterations (total iterations = samples * thinning) not being kept
_thinning |
postive value |
Method of Model class (access via model$<function name>, where model is an object initialized by initializeModelObject). Will simulate a version of the given genome using the current set of parameters stored in the Parameter object. This can be written to a FASTA file using genome$writeFasta(<filename>,simulated = TRUE).
genome |
a Genome object initialized by initializeGenomeObject |
summary
summarizes the description of a genome, such as number of genes and average gene length.
## S3 method for class 'Rcpp_Genome' summary(object, ...)
## S3 method for class 'Rcpp_Genome' summary(object, ...)
object |
A genome object initialized with |
... |
Optional, additional arguments to be passed to the main summary function that affect the summary produced. |
This function returns by default an object of class c("summaryDefault", table").
writeMCMCObject
stores the MCMC information from the
model fitting run in a file.
writeMCMCObject(mcmc, file)
writeMCMCObject(mcmc, file)
mcmc |
MCMC object that has run the model fitting algorithm. |
file |
A filename where the data will be stored. |
This function has no return value.
## saving the MCMC object after model fitting genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa") genome <- initializeGenomeObject(file = genome_file) sphi_init <- c(1,1) numMixtures <- 2 geneAssignment <- c(rep(1,floor(length(genome)/2)),rep(2,ceiling(length(genome)/2))) parameter <- initializeParameterObject(genome = genome, sphi = sphi_init, num.mixtures = numMixtures, gene.assignment = geneAssignment, mixture.definition = "allUnique") samples <- 2500 thinning <- 50 adaptiveWidth <- 25 mcmc <- initializeMCMCObject(samples = samples, thinning = thinning, adaptive.width=adaptiveWidth, est.expression=TRUE, est.csp=TRUE, est.hyper=TRUE, est.mix = TRUE) divergence.iteration <- 10 ## Not run: runMCMC(mcmc = mcmc, genome = genome, model = model, ncores = 4, divergence.iteration = divergence.iteration) writeMCMCObject(mcmc = mcmc, file = file.path(tempdir(), "file.Rda")) ## End(Not run)
## saving the MCMC object after model fitting genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa") genome <- initializeGenomeObject(file = genome_file) sphi_init <- c(1,1) numMixtures <- 2 geneAssignment <- c(rep(1,floor(length(genome)/2)),rep(2,ceiling(length(genome)/2))) parameter <- initializeParameterObject(genome = genome, sphi = sphi_init, num.mixtures = numMixtures, gene.assignment = geneAssignment, mixture.definition = "allUnique") samples <- 2500 thinning <- 50 adaptiveWidth <- 25 mcmc <- initializeMCMCObject(samples = samples, thinning = thinning, adaptive.width=adaptiveWidth, est.expression=TRUE, est.csp=TRUE, est.hyper=TRUE, est.mix = TRUE) divergence.iteration <- 10 ## Not run: runMCMC(mcmc = mcmc, genome = genome, model = model, ncores = 4, divergence.iteration = divergence.iteration) writeMCMCObject(mcmc = mcmc, file = file.path(tempdir(), "file.Rda")) ## End(Not run)
writeParameterObject
will write the parameter object as binary to the filesystem
writeParameterObject(parameter, file)
writeParameterObject(parameter, file)
parameter |
parameter on object created by |
file |
A filename that where the data will be stored. |
As Rcpp object are not serializable with the default R save
function,
therefore this custom save function is provided (see loadParameterObject).
This function has no return value.
## Not run: genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa") genome <- initializeGenomeObject(file = genome_file) sphi_init <- c(1,1) numMixtures <- 2 geneAssignment <- c(rep(1,floor(length(genome)/2)),rep(2,ceiling(length(genome)/2))) parameter <- initializeParameterObject(genome = genome, sphi = sphi_init, num.mixtures = numMixtures, gene.assignment = geneAssignment, mixture.definition = "allUnique") ## writing an empty parameter object as the runMCMC routine was not called yet writeParameterObject(parameter = parameter, file = file.path(tempdir(), "file.Rda")) ## End(Not run)
## Not run: genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa") genome <- initializeGenomeObject(file = genome_file) sphi_init <- c(1,1) numMixtures <- 2 geneAssignment <- c(rep(1,floor(length(genome)/2)),rep(2,ceiling(length(genome)/2))) parameter <- initializeParameterObject(genome = genome, sphi = sphi_init, num.mixtures = numMixtures, gene.assignment = geneAssignment, mixture.definition = "allUnique") ## writing an empty parameter object as the runMCMC routine was not called yet writeParameterObject(parameter = parameter, file = file.path(tempdir(), "file.Rda")) ## End(Not run)