Note: The calculations (especially for 6+ genes) may take up to 2 minutes to complete. If they take longer, please refresh your browser and try again.
This app is best suited for geneticists interested in the evolution of bacterial populations whose phenotypes/phasotypes are dictated by phase variation (PV). Exposure to fluctuating environments is thought to drive the evolution of genes with a binary (ON/OFF) switching property, often called phase variation, in bacterial populations. The evolution of PV is influenced by mutation characteristics, selection coefficients and bottlenecks. In this app, we propose a discrete-time, discrete-space stochastic model for PV which takes into account mutation and selection mechanisms.
Given a starting distribution of phasotypes, the app uses the mutation rates for each gene, together with a selective advantage chosen for particular phasotypes, to find the expected distribution after a given number of estimated generations. In addition, the app finds the expected stationary distribution and the number of generations required to reach it, and produces plots for each of the three distributions featured. The basic features of this interactive model are explained through a step-by-step guide.
As input parameters, the model has (in the order seen in the app) the number of genes with the ON/OFF PV property, the initial distribution of gene states in the population, the number of bacterial generations n, the per-gene mutation rates (given as OFF-to-ON and ON-to-OFF switching probabilities per generation), and a vector of fitness parameters (if selection is turned ON). The output of the model is the distribution of gene states after the selected n generations, together with the stationary distribution and the time taken to reach it. The stationary distribution as n goes to infinity is unique and does not depend on the choice of initial distribution.
The model is derived under the assumption of a near-infinite population size. The derivation is based on the idea of splitting mutation and selection at each time step and averaging over all possible realisations of the evolution of the branching tree. In terms of its properties, the model resembles a nonlinear Markov chain. When all the fitness parameters are equal, the model degenerates to one that considers mutation without selection.
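The splitting of mutation and selection at each time step can be illustrated with a minimal sketch. This is not the app's own code: the function names (`mutate`, `select`, `step`), the order of the two sub-steps (mutation first, then selection) and the assumption that genes switch independently of one another are all illustrative assumptions.

```python
from itertools import product

def mutate(dist, rates):
    """One mutation step (assumed form: genes switch independently).

    dist  : probabilities over phasotypes, ordered 00..0, 00..1, ..., 11..1
    rates : rates[k] = (off_to_on, on_to_off) for gene k
    """
    states = list(product((0, 1), repeat=len(rates)))
    new = [0.0] * len(states)
    for i, s in enumerate(states):
        for j, t in enumerate(states):
            prob = 1.0
            for (a, b), x, y in zip(rates, s, t):
                if x == 0:
                    prob *= a if y == 1 else 1.0 - a
                else:
                    prob *= b if y == 0 else 1.0 - b
            new[j] += dist[i] * prob
    return new

def select(dist, fitness):
    """One selection step: weight each phasotype by its fitness and renormalise."""
    weighted = [p * w for p, w in zip(dist, fitness)]
    total = sum(weighted)
    return [x / total for x in weighted]

def step(dist, rates, fitness):
    """One generation, assuming mutation is applied before selection."""
    return select(mutate(dist, rates), fitness)

# Hypothetical one-gene example: all bacteria start in the OFF state.
print(step([1.0, 0.0], [(0.1, 0.2)], [1.0, 2.0]))
```

Because the selection step divides by a total that depends on the current distribution, the update is not linear in the distribution, which is one way to see why the model only resembles a (nonlinear) Markov chain. With equal fitness values the selection step leaves the distribution unchanged and only mutation acts.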
The idea is that beginners to the app, or those who wish only to see how the model works, can explore the model more easily with pre-loaded data. For those who are more familiar with the app and wish to use their own data, upload buttons are provided on the input page for selection rates, mutation rates, and the initial distribution.
Mutation rates are based on experimental estimates for actual genes, assigned randomly to genes in our app. The initial distribution vectors were sampled once from a Dirichlet distribution for each gene, and the selection rate vectors were sampled once from a uniform distribution per gene, within acceptable selection rate limits based on the gene size. The number of generations n is fixed at 220 regardless of the number of genes chosen.
When the selection-per-gene option is not used, the selection vector has length 2^(genes). This means that fitness values can be chosen per phasotype. If the checkbox to input selection per gene is turned on, a new (genes x 2) matrix appears in which you can amend selection per gene. The app then applies a tensor product to this matrix to produce the values in the selection vector of phasotypes. This can be seen as multiplying together the per-gene values for every phasotype combination (e.g. for three genes, the product of all the 0-state values in the selection-per-gene matrix equals the value seen in the 000 cell of the vector of phasotypes). Selection per gene may be the more desirable option if you have an idea of selective values per gene, or wish to save time on manual input of values. However, the vector of phasotypes is then just a collection of products, so more accurate results may be obtained by refining individual cells in this vector. Hence, the vector of phasotypes can still be edited even after the tensor calculations are made. Regardless of whether the checkbox is checked, the calculations performed when the model is run are always based on the phasotype vector.
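The tensor product described above can be sketched as follows. The function name `phasotype_fitness`, the example values and the convention that the first column of the per-gene matrix holds the 0-state value are assumptions for illustration, not details taken from the app.

```python
from itertools import product

def phasotype_fitness(per_gene):
    """Build the phasotype selection vector from a (genes x 2) matrix.

    per_gene[k] = (value_for_state_0, value_for_state_1) for gene k
    (column order is an assumption). Returns a dict keyed by phasotype
    label such as "000" or "101".
    """
    result = {}
    for states in product((0, 1), repeat=len(per_gene)):
        value = 1.0
        for gene, state in zip(per_gene, states):
            value *= gene[state]
        result["".join(map(str, states))] = value
    return result

# Hypothetical per-gene selection values for three genes.
per_gene = [(1.0, 1.2), (1.0, 0.9), (1.0, 1.1)]
fitness = phasotype_fitness(per_gene)
print(len(fitness))    # 2^3 = 8 phasotypes
print(fitness["000"])  # product of the three 0-state values
```

Any single entry of the resulting vector can then be overwritten by hand, which mirrors the way the phasotype vector stays editable after the tensor calculation.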
The main reasons are either that the number of genes chosen is quite large, or that the model is performing a lot of calculations due to slow convergence to the stationary distribution.
To address the first issue in more detail, the computational cost increases exponentially with the number of genes. This is evident in the vector sizes for the initial distribution and selection: each has length 2^(genes), so for four genes they are length-16 vectors, but a seemingly small change to six genes makes them length-64 vectors. This is quite a large increase and makes the model more expensive to run.
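As a quick illustration of this growth, the following sketch prints the vector length for one to eight genes (the helper name `vector_length` is hypothetical):

```python
def vector_length(genes):
    """Length of the initial distribution and selection vectors."""
    return 2 ** genes

for genes in range(1, 9):
    print(genes, vector_length(genes))
```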
To address the second issue in more detail, we mentioned in the "How does the model work?" question that the model in this app resembles a nonlinear Markov chain. One consequence is that the distribution at time n depends on the distribution at time n-1. Therefore, every generation up to the one chosen on the input page must be calculated, and also the generations beyond that in order to find the time to the stationary distribution. Slow changes in distribution may be due to low mutation rates, low (or no) selection on phasotypes, or high numbers of genes. In this app, the stationary distribution is taken (by default) to be reached when the Euclidean distance between the distribution at time n and the distribution at time n-1 is lower than 10^(-5).
The reason for the large number of generations in the model is explained in the question above. To prevent excessive waiting, the model stops automatically at a default value of 50,000 generations.
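The stopping rule can be sketched as a simple loop. The threshold 10^(-5) and the cap of 50,000 generations come from the text above; the function names and the one-gene, mutation-only update used to exercise the loop are hypothetical stand-ins for the app's own update.

```python
import math

def time_to_stationary(dist, step, tol=1e-5, max_generations=50_000):
    """Iterate `step` until successive distributions are within `tol`.

    Returns (generations_used, final_distribution, converged).
    """
    for n in range(1, max_generations + 1):
        new = step(dist)
        distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(new, dist)))
        dist = new
        if distance < tol:
            return n, dist, True
    return max_generations, dist, False

def one_gene_step(dist, off_to_on=0.01, on_to_off=0.02):
    """Hypothetical one-gene, mutation-only update used as an example."""
    off, on = dist
    return [off * (1 - off_to_on) + on * on_to_off,
            off * off_to_on + on * (1 - on_to_off)]

n, stationary, converged = time_to_stationary([1.0, 0.0], one_gene_step)
print(n, stationary, converged)
```

Lower switching rates in `one_gene_step` shrink the change per generation, so more generations are needed before the distance drops below the threshold, which matches the slow-convergence behaviour described above.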
In relation to the answers from the two previous questions, the following amendments can be made (please run the model again after making amendments):
We first tested the model using experimental Campylobacter jejuni data provided by Dr Bayliss' lab at the University of Leicester. Estimates of the mutation rates are known from specially designed experiments. From in vitro bacterial evolution experiments, we have sample distributions at the start of the experiment (inoculum) and at the end of the experiment. The number of generations n is estimated from the duration of the experiment, and was 220 for our data. The fitness parameters are not directly observable. We developed an efficient method for checking whether the mutation-only model (i.e., when all the fitness parameters are equal) can explain the data. We showed that the behaviour of some of the 28 phase-variable genes of Campylobacter jejuni is consistent with the mutation mechanism, whilst there are other genes whose evolution cannot be explained by mutation alone. For the latter, we need to estimate the fitness parameters of our selection model. The model was applied to the experimental data for those genes which may exhibit selection. With specific choices for the selection vector, the final distribution of the model after 220 generations closely resembled the final distribution observed in the data. This shows that our model provides one possible explanation for the distributions we see in the data.
App Version 1.1
I'm Ryan Howitt, a mathematics research student at the University of Nottingham, belonging to the Scientific Computation group but with a specialism in statistics. My research focuses primarily on biostatistical analysis of the model in this app, using in vitro Campylobacter jejuni genetic data (further details of the model can be found in the FAQ tab). My work is EPSRC-funded research in the School of Mathematical Sciences at the University of Nottingham, in collaboration with the School of Veterinary Medicine at the University of Nottingham and the Department of Genetics (Dr Chris Bayliss' lab) at the University of Leicester.
Bayliss, C.D., Fallaize, C., Howitt, R. and Tretyakov, M.V., 2016. Mutation and selection models for binary switching in bacterial genes. [In preparation]