Software
This list is by no means complete or exhaustive. At the bottom of the page, there are some other lists you may want to consult. New programs appear almost monthly (most published in Molecular Ecology Resources), so stay aware of developments in the field. With all programs, always read the original paper and the manual before use.
General Purpose Programs
These programs are collections of tests and methods commonly used in population genetics.
- Arlequin
- General purpose package that does almost every analysis in the book, and accepts microsatellite, allozyme, sequence and other data. Has an import feature for GENEPOP files, but check carefully as that doesn't always work. Using the Excel Microsatellite Toolkit for input file preparation is a better option. Also watch out for results from data sets with lots of missing data.
- PowerMarker
- Potentially powerful program that calculates all sorts of genetic distances and also does a few other things that no other program does. Accepts input files from Excel (copy&paste) that are relatively easy to prepare.
- Genetix
- Powerful analysis package for population genetics, but you have to understand French. Has nice features such as a PCA on individual genotypes and permutation tests of FST. Imports Genepop files, but make sure that the import worked - sometimes alleles get mixed up.
Estimation and Test of Population Genetic Parameters
- GENEPOP
- Performs exact tests for deviation from Hardy-Weinberg, linkage disequilibrium, population differentiation and isolation by distance. Can be downloaded as a stand-alone (DOS) program or used as a web-based version. Whatever you do, read the manual first.
- FSTAT
- Calculates FST and RST and tests the estimates, among other standard population genetics statistics. Also produces and tests pairwise FST values. A new and useful feature is the estimation of allelic richness corrected for sample size, and tests for differences in genetic diversity between groups of samples.
- ChiFish
- DOS package for testing for significant differentiation using both Fisher's exact test (via Markov chain, equivalent to GENEPOP) and a standard chi-squared test. Probabilities over all loci are calculated by Fisher's combination of probabilities and chi-squared summation, respectively. Have a look at Ryman & Jorde 2001 for a description of the two approaches.
- PowSim
- DOS program to estimate the power of a specific molecular marker set in detecting significant population structure. Read also the associated paper by Ryman & Palm (2006) and the documentation that comes with the program.
- RSTCalc
- DOS program for the analysis of microsatellite data. Calculates and tests by permutation a variety of statistics assuming stepwise mutation models.
- MICROSAT
- Program for microsatellite analysis (distances, Fst, Rst), which has been compiled for both DOS and Mac.
- MSA
- DOS program for calculating genetic distances and obtaining descriptive statistics for microsatellite data. Also converts into a variety of input formats for other programs.
- Spagedi
- Spagedi does permutation tests to compare FST and RST - if the latter is larger than the former, allele size contributes significantly to population differentiation, suggesting some importance of stepwise mutation processes.
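The ChiFish entry above mentions Fisher's combination of probabilities across loci. As a minimal sketch of that idea (not ChiFish's own code; the function name `fisher_combined` is made up for illustration), the per-locus P-values are combined as -2 times the sum of their logarithms, which follows a chi-squared distribution with 2k degrees of freedom for k independent loci:

```python
import math

def fisher_combined(p_values):
    """Combine independent per-locus P-values with Fisher's method.

    Returns (statistic, combined_p). The statistic is -2 * sum(ln p) and is
    compared against a chi-squared distribution with 2k degrees of freedom.
    """
    if not p_values or any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("P-values must lie in the interval (0, 1]")
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    # For an even number of degrees of freedom (2k), the chi-squared upper
    # tail has a closed form: exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = statistic / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total

# Hypothetical P-values from exact tests at four loci
statistic, combined_p = fisher_combined([0.04, 0.20, 0.01, 0.35])
print(f"chi-squared = {statistic:.2f}, combined P = {combined_p:.4f}")
```

The method assumes the loci are independent, so it is worth checking for linkage disequilibrium before trusting the combined value.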
Detection and estimation of genotyping error
- Microchecker
- Program to test for deviations from Hardy-Weinberg proportion, estimate null allele frequencies, and test for genotyping error such as stuttering and large allele drop-out.
- Pedant
- Maximum Likelihood program to estimate genotyping error from repeat genotypes. Also allows simulations to establish how many genotypes have to be repeated for reliable error estimates.
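To show why heterozygote deficiency points towards null alleles, here is a minimal sketch that compares observed and expected heterozygosity at one locus and converts the deficit into a null allele frequency. It uses one simple estimator from the literature, r = (He - Ho) / (1 + He) (Brookfield 1996, estimator 1); this is an assumption chosen for illustration, not necessarily what Microchecker reports, and the function names are made up.

```python
from collections import Counter

def heterozygosities(genotypes):
    """Return (observed, expected) heterozygosity for one locus.

    genotypes is a list of (allele1, allele2) tuples. Expected heterozygosity
    is 1 - sum(p_i^2), without a small-sample correction.
    """
    n = len(genotypes)
    observed = sum(1 for a, b in genotypes if a != b) / n
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = 2 * n
    expected = 1.0 - sum((c / total) ** 2 for c in counts.values())
    return observed, expected

def null_allele_frequency(genotypes):
    """Estimate null allele frequency as (He - Ho) / (1 + He), floored at 0."""
    observed, expected = heterozygosities(genotypes)
    return max(0.0, (expected - observed) / (1.0 + expected))

# Hypothetical locus with a clear heterozygote deficit
sample = [(150, 150)] * 12 + [(154, 154)] * 6 + [(150, 154)] * 2
print(heterozygosities(sample), null_allele_frequency(sample))
```

A heterozygote deficit can also come from population substructure or inbreeding, so an estimate like this is only meaningful if the deficit is locus-specific.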
Coalescence
- Migrate
- Part of a package of programs for computing population parameters (LAMARC), such as population size, population growth rate and migration rates, by using likelihoods over all possible gene genealogies for samples of data (sequences, microsatellites, and electrophoretic polymorphisms) from populations. Includes Coalesce, Fluctuate, Migrate and Recombine.
Population assignment
These programs require samples of baseline populations contributing to a mixture, and estimate either proportions of each baseline population in the mixture or assign individuals to baseline populations, or both. They differ in their approach, but also how they deal with alleles that are found in the mixture but not in the baseline.
Assignment Tests and Mixed Stock Analysis requiring a baseline
- GeneClass2
- Powerful and versatile program to assign individuals to populations and to identify recent immigrants. Also has a test to exclude individuals from specific populations. User-friendly and accepts GENEPOP files.
- WhichRun
- Specifically designed for salmon populations. Fast and user-friendly. Frequency of alleles present in the mixed sample and not present in source populations depends on sample size from source populations - may cause a considerable bias in assignments. Accepts GENEPOP files and prepares input files for SPAM.
- WhichLoci
- Companion program to WhichRun. Uses genotype data to identify the loci most useful for population assignment. Takes GENEPOP files.
- ONCOR
- New program using a partial Bayesian approach to mixture analysis and individual assignment. Uses a new method to simulate mixture samples from existing baselines, avoiding the overoptimistic assessment of power provided by many other programs.
- SPAM
- Statistics Package for Analyzing Mixtures. Advanced mixed stock analysis using Likelihood and Bayesian statistics. Input files are nasty, but can be prepared using WhichRun.
- Bayes
- Fully Bayesian mixture analysis which also allows assignment of individuals. There is also a version in C++ by DFO which is supposed to run faster.
- IMMANC
- A program that calculates the probability that an individual is an immigrant, or has recent immigrant ancestry, using the multilocus genotype.
- AFLPOP
- Allocation of individuals to populations from AFLP (dominant) data
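The basic logic behind frequency-based assignment, and the problem of alleles that occur in the mixture but not in the baseline, can be sketched in a few lines. This is an illustration only and not the algorithm of any program listed above: it assumes Hardy-Weinberg proportions and independent loci, and it gives unseen alleles an arbitrary fixed frequency (the `missing` parameter, a made-up choice). All function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def allele_frequencies(individuals, n_loci):
    """Per-locus allele frequencies from baseline genotypes.

    Each individual is a list with one (allele1, allele2) tuple per locus.
    """
    freqs = []
    for locus in range(n_loci):
        counts = Counter()
        for individual in individuals:
            counts.update(individual[locus])
        total = sum(counts.values())
        freqs.append({allele: c / total for allele, c in counts.items()})
    return freqs

def log_likelihood(genotype, freqs, missing=0.01):
    """Log-likelihood of a multilocus genotype under Hardy-Weinberg.

    Alleles absent from the baseline get the frequency `missing` (an
    assumption) so that a single unseen allele does not force a zero.
    """
    total = 0.0
    for (a, b), locus_freqs in zip(genotype, freqs):
        pa = locus_freqs.get(a, missing)
        pb = locus_freqs.get(b, missing)
        total += math.log(pa * pa if a == b else 2.0 * pa * pb)
    return total

def assign(genotype, baseline):
    """Return (best population, log-likelihood per population)."""
    n_loci = len(genotype)
    scores = {
        pop: log_likelihood(genotype, allele_frequencies(inds, n_loci))
        for pop, inds in baseline.items()
    }
    return max(scores, key=scores.get), scores

# Toy baseline: two populations, two loci
baseline = {
    "River A": [[(100, 100), (200, 202)], [(100, 102), (200, 200)]],
    "River B": [[(104, 104), (206, 206)], [(102, 104), (204, 206)]],
}
print(assign([(100, 102), (200, 200)], baseline))
```

How the unseen alleles are treated can change the outcome, and the effect depends on baseline sample size, which is why the programs above differ on this point.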
Bayesian Clustering
These programs do not require baseline samples, although some can use them.
- Structure
- The first and most widely used program in the series. New version is user-friendly and straightforward. Takes very long to run, but allows batch processing.
- Partition
- Part of the Genetix series. Takes some time to run and has unwieldy output files.
- BAPS
- Much faster than STRUCTURE and Partition, but Robin's paper suggests that it may be less powerful. The original paper describing the method compares it with a dataset that had been analyzed with STRUCTURE and finds not many differences. In my experience, it deals better with larger data sets than STRUCTURE, and if it goes wrong, it's very obvious (unlike STRUCTURE) - see Hauser et al. (2006).
- BayesAss+
- Identifies recent immigrants into populations and estimates migration rates. Assumes that less than 30% of a population are immigrants, but does not assume Hardy-Weinberg equilibrium. Make sure that your acceptance ratio is really between 40% and 60% (see manual), and check for repeatability among runs. There is a user forum run by the author, where you can post questions.
- HWLER
- New program by Michele Masuda and Jerry Pella, who developed Bayes and the precursor of SPAM. Not yet published, but should be soon.
Landscape Genetics
These are programs which explicitly use geographic information in their analyses
Isolation by Distance
See also GENEPOP (Mantel tests) and FSTAT (partial Mantel tests)
- GenAlEx
- Excel Add-In that does spatial autocorrelation, amongst other things (AMOVA, FST, Hardy-Weinberg etc.). Accepts both dominant and co-dominant data and sequences. Claims to import GenePop files, but I never got that to work.
- Spagedi
- Program for isolation by distance calculation and other routines.
- IBDWS
- Web based program to test for isolation by distance and estimate dispersal parameter.
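Isolation by distance is usually tested with a Mantel test, i.e. a permutation test of the correlation between a genetic and a geographic distance matrix. The sketch below shows the principle in plain Python; it is not the implementation used by GENEPOP, FSTAT or IBDWS, the function names are made up, and it runs a one-sided test for a positive correlation.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if no variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def mantel_test(genetic, geographic, permutations=999, seed=0):
    """One-sided Mantel test on two symmetric square distance matrices.

    Population labels of the genetic matrix are shuffled (rows and columns
    together), and the P-value is the share of correlations at least as
    large as the observed one, counting the observed value itself.
    """
    n = len(genetic)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    geo = [geographic[i][j] for i, j in pairs]
    observed = pearson([genetic[i][j] for i, j in pairs], geo)
    rng = random.Random(seed)
    order = list(range(n))
    hits = 1
    for _ in range(permutations):
        rng.shuffle(order)
        permuted = [genetic[order[i]][order[j]] for i, j in pairs]
        if pearson(permuted, geo) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)

# Hypothetical example: six populations along a coastline (positions in km)
positions = [0, 40, 90, 150, 220, 300]
geographic = [[abs(a - b) for b in positions] for a in positions]
genetic = [[0.0002 * abs(a - b) for b in positions] for a in positions]
print(mantel_test(genetic, geographic))
```

Whole populations are permuted rather than individual matrix cells because pairwise distances that share a population are not independent of each other.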
Identification of migration barriers
- Barrier
- Uses genetic and geographic distances to identify regions with disproportionately high genetic divergence, i.e. genetic discontinuities. Make sure you read the manual and the original paper.
- SAMOVA
- Spatial AMOVA - similar to an AMOVA (see Arlequin) but does not require a priori definition of groups.
- Geneland
- Geneland runs under R but has a very user-friendly interface. Essentially it groups individuals into populations which are in Hardy-Weinberg and linkage equilibrium, taking the geographic position of each individual into account.
- BAPS
- Divides collections of individuals into populations that are in Hardy-Weinberg and linkage equilibrium. Can incorporate spatial information.
- Alleles in Space
- Does a range of landscape genetic analyses and produces very nice and intuitive graphs of spatial structure.
- Tess
- Bayesian Clustering using tessellations and Markov models for spatial population genetics
Effective population size
- Ne Estimator
- Uses various methods to estimate effective population size. Can be integrated with three other programs, TM3 (Bayesian), McLeeps (multilocus maximum likelihood) and a method allowing immigration (MLNE).
- Bottleneck
- Program to detect recent population bottlenecks from genetic data
Selection
FDIST2: Relatively old but still widely used program based on Beaumont & Nichols 1996. Uses simulations to derive confidence limits around mean FSTs - outliers from these limits may be under selection.
LOSITAN: Java version of the above program. Runs very much faster and has tricky additions such as repeated simulations to exclude outlier loci from mean FST calculations.
DetSel: Another program that identifies outlier loci by coalescence simulations. Less stringent assumptions than FDIST. See also the publication in the Journal of Heredity.
BayesFST: Bayesian estimation of FST and identification of markers under diversifying or balancing selection.
Phylogeography
- DNASP
- DnaSP, DNA Sequence Polymorphism, is a software package for the analysis of nucleotide polymorphism from aligned DNA sequence data. Lots of pop-ups when I went there.
- TCS
- Java program for calculating and constructing minimum spanning networks. (written by David Posada, University of Vigo, Spain)
- GeoDis
- Java program for the calculation and statistical test of Nested Clade statistics. (written by David Posada, University of Vigo, Spain)
- GeoPhyl
- Excel spreadsheet demonstrating permutation tests of Dc and Dn (average distances within clade, Dc, and within the nested clade, Dn). (written by Anton Weisstein, Truman State University, USA)
- ANeCA
- Java program for Automated Nested Clade Analysis - does all the steps of a nested clade analysis, including construction of a minimum spanning network via TCS, permutation tests of Dc and Dn (via GeoDis) and an automatic implementation of the online inference key on the GeoDis site. This program is still under development, and currently uses old versions of TCS and GeoDis - check the respective websites for bug reports. (written by Mahesh Panchal, University of Reading, UK)
Relatedness and Parentage
- CERVUS
- Likelihood based inference of parentage using co-dominant markers. Includes powerful simulations allowing for genotyping error. Original version was published in 1998, but this is a new version (2006) that removed some of the earlier problems.
- PAPA
- Program allowing parentage assignment in closed systems (i.e. all parents are sampled) with genotyping error (though the error model is very specific). Also does simulations to estimate the power of the approach and identify the most powerful loci.
- PASOS
- Program for parentage assignment in open systems. Assigns parentage and estimates the proportion of unsampled parents.
- KINGROUP
- Java application that does many analyses previously only available in KINSHIP on the Mac platform. Primarily used for pedigree relationship reconstruction and kin group assignments using genetic markers
- COLONY
- Program assigning individuals into full-sib groups nested within half-sib groups using a maximum likelihood approach allowing for genotyping errors. Only one of the two sexes is assumed to be polygamous.
Utilities
- Microsatellite Toolkit
- Excellent Add-In for Excel that produces input files for GENEPOP, Arlequin, FSTAT and others. Unfortunately, it seems to disappear from the web on occasions, so the files are also here:
MS_Tools.xla, MS_Tools.GID, MS_Data.xls, MS_Tools.hlp, Readme.txt.
Copy into a directory and follow the instructions in the readme.txt file.
- Convert
- Conversion utility from Excel format to a range of different programs. Also reads GENEPOP files.
- CREATE
- Conversion
- PopTools
- Want to do tricky permutation tests yourself? Here's an Excel Add-In that allows you to do exactly that! I love it!
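For readers who would rather script such a test than run it in Excel, here is a minimal sketch of a do-it-yourself permutation test. It is not taken from PopTools; the function name and the example values are made up. It asks whether the difference in means between two groups of samples (for example, allelic richness per locus) is larger than expected if group labels were assigned at random.

```python
import random

def permutation_test(group_a, group_b, permutations=9999, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns (observed difference, P-value). The observed value is counted
    as one of the permutations, so the P-value can never be zero.
    """
    def mean(values):
        return sum(values) / len(values)

    observed = mean(group_b) - mean(group_a)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    rng = random.Random(seed)
    hits = 1
    for _ in range(permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = mean(pooled[n_a:]) - mean(pooled[:n_a])
        # Small tolerance so ties are not lost to floating-point rounding
        if abs(diff) >= abs(observed) - 1e-12:
            hits += 1
    return observed, hits / (permutations + 1)

# Hypothetical allelic richness at eight loci in two groups of samples
wild = [7.2, 6.8, 9.1, 5.5, 8.0, 7.7, 6.1, 8.4]
hatchery = [5.9, 5.2, 7.8, 4.9, 6.6, 6.0, 5.1, 7.0]
print(permutation_test(wild, hatchery))
```

The fixed seed only makes the run repeatable; change it or raise `permutations` to check that the P-value is stable.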
Software Lists
- FISH 543
- Selected computer programs for relatedness, population genetics and phylogenetics, courtesy of another of my courses. Some links may be outdated - the page was established in 2004.
- Phylogenetics Software
- Although concentrating on phylogenetic methods (i.e. among species) this is probably the most comprehensive list of computer software, including population genetics. Compiled by Joe Felsenstein of the University of Washington.
- Genetics Software List
- Another exhaustive list of genetics software, this time from Bernie May's lab at UC Davis.