Modelling

Mutation with selection model of phasotypes

To use this app, follow the step-by-step instructions below. You are currently on the input tab; please begin below.

Selection

OFF

Choose whether you wish selection on phasotypes to be turned on or off.

Choose the desired number of genes. This gives 2^(genes) possible phasotypes, so the initial distribution and selection vectors have length 2^(genes).

Option to upload data

Upload selection rates

Selection per gene?

Choose file to upload

Browse...

The format must be exactly the same as the examples, with column names in row 1, and the vector of values in row 2

The format must be exactly the same as the examples, with column names in row 1, and the vector of values in row 2 to row (genes+1)

Upload mutation rates

Choose file to upload

Browse...

The format must be exactly the same as the examples, with column names in row 1, and the vector of values in row 2 to row (genes+1)

Upload initial distribution

Choose file to upload

Browse...

The format must be exactly the same as the examples, with column names in row 1, and the vector of values in row 2

If desired, you can bypass the preloaded examples and upload your own data as a .csv file. This is only recommended for experienced users who are familiar with this app.

Now press Update to view the tables for your initial distribution, number of generations, mutation rates and selection choice.

Initial distribution

Input initial distribution values for each phasotype. This must sum to 1, which you can check in the grey box above.

Input number of generations.

Mutation rates

Input the mutation rates for each gene (one gene per row). These are the probabilities of a phase variation switch from OFF-to-ON and from ON-to-OFF. Each value must lie strictly between 0 and 1.

Selection rates

Selection per gene?

With selection per gene, you only need to input selection values for each gene rather than for each phasotype (although amendments per phasotype can still be made).

Input selection values for each phasotype. Please input values greater than or equal to 1, where 1 indicates no selection and >1 indicates advantageous selection. For example, 1.02 is a 2% selective advantage.

Press the Run model button, then use the tabs at the top of the webpage to see your output distributions and plots.

Note: The calculations (especially for 6+ genes) may take up to 2 minutes to complete. If it takes longer please refresh your browser and try again.

Final distribution (after the specified number of generations)

File type:

csv

tsv

Download

Here you can download your final distribution output as a .csv or .tsv file.

Stationary distribution

An error message will appear here instead if the stationary distribution is not reached. If this happens, please increase the maximum number of generations or increase the tolerance for stationarity on the FAQ page.

Time to stationary distribution

File type:

csv

tsv

Download

Here you can download your stationary distribution output as a .csv or .tsv file.

Who is the app suitable for?

This app is best suited to geneticists interested in the evolution of bacterial populations whose phenotypes/phasotypes are dictated by phase variation (PV). Exposure to fluctuating environments is thought to drive the evolution of genes with a binary (ON/OFF) switching property, often called phase variation, in bacterial populations. The evolution of PV is influenced by mutation characteristics, selection coefficients and bottlenecks. In this app, we propose a discrete-time, discrete-space stochastic model for PV which takes into account mutation and selection mechanisms.

What are the outcomes of the app?

Given a starting distribution of phasotypes, the app uses the mutation rates for each gene, together with advantageous selection chosen for particular phasotypes, to find the expected distribution after a given (estimated) number of generations. The app also finds the expected stationary distribution and the number of generations required to reach it, and produces plots for each of the three distributions featured (initial, final and stationary). The basic features of this interactive model are explained through a step-by-step guide.

How does the model work?

The input parameters of the model are, in the order they appear in the app: the number of genes with the ON/OFF PV property, the initial distribution of gene states in the population, the number of bacterial generations n, the mutation rates per gene (the OFF-to-ON and ON-to-OFF switching probabilities per generation), and a vector of fitness parameters (if selection is turned ON). The output of the model is the distribution of gene states after the selected n generations, together with the stationary distribution and the time taken to reach it. The stationary distribution, reached as n goes to infinity, is unique and does not depend on the choice of initial distribution.

The model is derived under the assumption of a near-infinite population size. The derivation is based on the idea of splitting mutation and selection at each time step and averaging over all possible realisations of the evolution of the branching tree. In terms of its properties, the model resembles a nonlinear Markov chain. When all the fitness parameters are equal, the model reduces to one that considers mutation without selection.
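As a rough illustration of the splitting idea, the sketch below applies a mutation step followed by a selection step to a distribution over phasotypes. It is not the app's code: the function names, the example rates, the order of the two steps, and the construction of the full mutation matrix as a Kronecker product (which assumes that genes switch independently of each other) are all assumptions made for this example.

```python
import numpy as np

def per_gene_mutation_matrix(off_to_on, on_to_off):
    """2x2 row-stochastic matrix for one gene; state 0 = OFF, state 1 = ON."""
    return np.array([[1.0 - off_to_on, off_to_on],
                     [on_to_off, 1.0 - on_to_off]])

def full_mutation_matrix(rates):
    """Combine per-gene matrices with a Kronecker product.

    Assumption: genes switch independently. Phasotypes are ordered
    00...0, 00...1, ..., 11...1 with the first gene as the leading digit.
    """
    matrix = np.ones((1, 1))
    for off_to_on, on_to_off in rates:
        matrix = np.kron(matrix, per_gene_mutation_matrix(off_to_on, on_to_off))
    return matrix

def step(distribution, mutation, fitness):
    """One generation: mutate, then weight by fitness and renormalise."""
    mutated = distribution @ mutation
    selected = mutated * fitness
    return selected / selected.sum()

# Hypothetical values for two genes (four phasotypes: 00, 01, 10, 11).
rates = [(0.01, 0.02), (0.03, 0.04)]
mutation = full_mutation_matrix(rates)
initial = np.array([0.25, 0.25, 0.25, 0.25])
fitness = np.array([1.0, 1.0, 1.0, 1.02])  # 2% advantage for phasotype 11

after_one_generation = step(initial, mutation, fitness)
print(after_one_generation)
```

The renormalisation in the selection step is what makes the update nonlinear in the distribution. With all fitness values equal, that step cancels out and the update is an ordinary (linear) Markov chain step, matching the mutation-only case described above.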

Why are there specific preset values on the input page?

The idea is that beginners, or those who only wish to see how the model works, can explore it more easily with pre-loaded data. For those who are more familiar with the app and wish to use their own data, upload buttons are provided on the input page for selection rates, mutation rates, and the initial distribution.

How were the specific preset values on the input page chosen?

Mutation rates are based on experimental estimates for actual genes, assigned randomly to genes in our app. The initial distribution vectors were each sampled once from a Dirichlet distribution for each gene, and the selection rate vectors were each sampled once per gene from a uniform distribution, within acceptable selection rate limits based on the gene size. The number of generations n is fixed at 220 regardless of the number of genes chosen.

Why is there an option to input selection per gene?

Without the selection per gene option, the selection vector has length 2^(genes), which means that fitness values can be chosen per phasotype. If the checkbox to input selection per gene is turned on, a new (genes x 2) matrix appears in which you can amend selection per gene. The app then applies a tensor product to this matrix to produce the values in the selection vector of phasotypes. This can be seen as multiplying together the per-gene values for every phasotype combination (e.g. for three genes, the product of all the 0 values in the selection per gene matrix equals the value seen in the 000 cell of the vector of phasotypes). Selection per gene may be the more desirable option if you have an idea of selective values per gene, or wish to save time on manual input of values. However, the vector of phasotypes is then just a collection of products, so better accuracy may be attained by refining individual cells in this vector. Hence, the vector of phasotypes can still be edited even after the tensor calculations are made. Regardless of whether the checkbox is on or off, the calculations performed when the model is run are always based on the phasotype vector.
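The tensor product described above can be sketched with NumPy's Kronecker product. This is an illustration only: the function name, the example values, and the convention that column 0 holds the OFF (0) value and column 1 the ON (1) value for each gene are assumptions.

```python
from functools import reduce

import numpy as np

def phasotype_selection(per_gene):
    """Turn a (genes x 2) selection matrix into a length 2^(genes) vector.

    Assumed layout: row g holds gene g, column 0 the OFF value and
    column 1 the ON value. The result is ordered 00...0 to 11...1 with
    the first gene as the leading digit.
    """
    rows = np.asarray(per_gene, dtype=float)
    return reduce(np.kron, rows)

# Hypothetical per-gene values for three genes.
per_gene = [[1.0, 1.02],
            [1.0, 1.05],
            [1.0, 1.00]]
vector = phasotype_selection(per_gene)

# Entry for phasotype 000 is the product of the three OFF values.
print(vector[0b000])
# Entry for phasotype 110 is ON(gene 1) * ON(gene 2) * OFF(gene 3).
print(vector[0b110])
```

Because every entry is a product of per-gene values, this vector cannot express a fitness effect specific to one combination of genes, which is why the app still allows individual phasotype cells to be edited afterwards.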

Why does the app take a while to run?

The main reasons are either that the number of genes chosen is quite large, or that the model is performing a lot of calculations due to slow convergence to the stationary distribution.

To address the first issue in more detail, the computational cost increases exponentially with the number of genes. This is evident in the sizes of the initial distribution and selection vectors: each has length 2^(genes), so for four genes they are length-16 vectors, but a seemingly small change to six genes makes them length-64 vectors. This is a large increase and makes the model more expensive to run.
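To make the growth concrete, the short sketch below enumerates the phasotype labels for a few gene counts. The function name and the labelling of phasotypes as binary strings are illustrative assumptions.

```python
from itertools import product

def phasotype_labels(genes):
    """All ON/OFF combinations for the given number of genes, as binary strings."""
    return ["".join(bits) for bits in product("01", repeat=genes)]

for genes in (2, 4, 6):
    labels = phasotype_labels(genes)
    # Vector length doubles with every extra gene.
    print(genes, "genes:", len(labels), "phasotypes, from", labels[0], "to", labels[-1])
```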

To address the second issue in more detail, as mentioned under "How does the model work?", the model in this app resembles a nonlinear Markov chain. One consequence is that the distribution at time n depends on the distribution at time n-1. Therefore, every generation up to the one chosen on the input page must be calculated, and also those beyond it in order to find the time to the stationary distribution. Slow changes in the distribution may be due to low mutation rates, low (or no) selection on phasotypes, or a high number of genes. By default, the stationary distribution is taken to be reached when the Euclidean distance between the distribution at time n and the distribution at time n-1 is lower than 10^(-5).

Why does my stationary distribution reach a limit?

The reason the model may need a large number of generations is explained in the question above. To prevent excessive waiting, the model stops automatically at a default value of 50,000 generations.
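The stopping rule and the generation cap can be sketched as a simple loop. The defaults below (tolerance 10^(-5), at most 50,000 generations) are the ones stated above; the function name, the mutation-then-selection update, and the one-gene example values are assumptions for illustration, not the app's implementation.

```python
import numpy as np

def run_to_stationarity(initial, mutation, fitness,
                        tolerance=1e-5, max_generations=50_000):
    """Iterate until successive distributions are closer than `tolerance`.

    Returns the distribution and the generation at which the Euclidean
    distance to the previous generation first fell below the tolerance.
    Raises RuntimeError if the cap is hit first.
    """
    current = np.asarray(initial, dtype=float)
    for generation in range(1, max_generations + 1):
        mutated = current @ mutation          # mutation step
        selected = mutated * fitness          # selection step
        following = selected / selected.sum()
        if np.linalg.norm(following - current) < tolerance:
            return following, generation
        current = following
    raise RuntimeError("stationary distribution not reached within max_generations")

# Hypothetical one-gene example with no selection.
mutation = np.array([[0.9, 0.1],
                     [0.2, 0.8]])
fitness = np.ones(2)
stationary, generations = run_to_stationarity([1.0, 0.0], mutation, fitness)
print(stationary, generations)
```

A smaller tolerance or smaller mutation rates mean more iterations before the distance test passes, which is the slow convergence described in the previous answer.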

How can I change the settings of the app to make the run time shorter/longer?

In relation to the answers from the two previous questions, the following amendments can be made (please run the model again after making amendments):

Tolerance to determine stationary distribution (must be >0):

Decrease number = increase run time (and vice versa)

Maximum generation number of the model (must be >0):

Decrease number = decrease run time (and vice versa)

If the stationary distribution is reached in fewer generations than the number chosen on the input page, the final distribution returned will be the same as the stationary distribution (to save time). Therefore, if the tolerance here is set too high, stationarity may be declared too early, and the final distribution (and hence the stationary distribution) may not be accurate.

If the maximum generation number is less than the number of generations chosen on the input page, the final distribution returned will only be calculated up to the maximum generation number, unless stationarity is achieved before then.

What were your discoveries from the model?

We first tested the model using experimental Campylobacter jejuni data provided by Dr Bayliss' lab at the University of Leicester. Estimates of the mutation rates are known from specially designed experiments. From in vitro bacterial evolution experiments, we have sample distributions at the start of the experiment (the inoculum) and at the end of the experiment. The number of generations n is estimated from the duration of the experiment, and was 220 for our data. The fitness parameters are not directly observable. We developed an efficient method for checking whether the mutation-only model (i.e., when all the fitness parameters are equal) can explain the data. We showed that the behaviour of some of the 28 phase variable genes of Campylobacter jejuni is consistent with the mutation mechanism, whilst there are other genes whose evolution cannot be explained by mutation alone. For the latter, we need to estimate the fitness parameters of our selection model. The model was applied to the experimental data for those genes which may exhibit selection. With specific choices for the selection vector, the final distribution of the model after 220 generations closely resembled the final distribution in the actual data. This shows that our model provides one possible explanation for the distributions we see in the data.

About me

App Version 1.1

Author

I'm Ryan Howitt, a mathematics research student at the University of Nottingham, belonging to the Scientific Computation group but with a specialism in statistics. My research focuses primarily on biostatistical analysis of the model in this app, using in vitro Campylobacter jejuni genetic data (further details of the model can be found in the FAQ tab). My work is EPSRC-funded research in the School of Mathematical Sciences at the University of Nottingham, in collaboration with the School of Veterinary Medicine at the University of Nottingham and the Department of Genetics (Dr Chris Bayliss' lab) at the University of Leicester.

Acknowledgements

Supervisors:

Professor Michael V. Tretyakov
Dr. Christopher J. Fallaize

School of Mathematical Sciences, University of Nottingham, Nottingham, NG7 2RD

Dr. Christopher D. Bayliss

Department of Genetics, University of Leicester, Leicester, LE1 7RH, UK

With special thanks to:

Lingyi Yang

Wellcome Trust internship

The models used are from the following paper:

Bayliss, C.D., Fallaize, C., Howitt, R. and Tretyakov, M.V., 2016. Mutation and selection models for binary switching in bacterial genes. [In preparation]