Phylip Software Package


PHYLIP is a free package of programs for inferring phylogenies. It is distributed as source code, documentation files, and a number of different types of executable files.

This limitation does not apply to read.phylip.data, but you might want to follow it if you plan to use PHYLIP. Author(s): Diaz-Uriarte, R., and Garland, T., Jr. Reference: Diaz-Uriarte, R., and Garland, T., Jr., in prep. PHYLOGR: an R package for the analysis of comparative data via Monte Carlo simulations and generalized least squares approaches.

read.phylip accepts both interleaved and sequential PHYLIP files; the number of sequences is identified by parsing the first line of the file. Sequences and their names will be stored in a data frame. If cleanname is TRUE, punctuation characters and white space will be replaced. The definition of punctuation characters can be found at regex.

To build PHYLIP from source, download the source code and documentation package (phylip-3.66.tar.gz) into a suitable folder. Unzip the package with the gzip utility (gzip -d phylip-3.66.tar.gz) and expand the tar ball (tar xvf phylip-3.66.tar). Move to the newly formed folder containing the source code (cd phylip3.6/src). The folder contains a file called Makefile.
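The header-line parsing mentioned above for read.phylip can be sketched in a few lines of Python. This is only an illustration, not the R function's code: the function name `parse_sequential_phylip` is hypothetical, and it assumes the strict sequential layout in which each name occupies the first 10 columns and a sequence may continue over following lines. Interleaved files are not handled.

```python
def parse_sequential_phylip(text, name_width=10):
    """Parse a strict sequential PHYLIP alignment (hypothetical helper).

    Assumes names fill the first `name_width` columns of a record's first line.
    Returns (number of sequences, number of sites, {name: sequence}).
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # The first line holds two integers: sequence count and alignment length.
    n_seqs, n_sites = (int(tok) for tok in lines[0].split()[:2])
    body = iter(lines[1:])
    records = {}
    for _ in range(n_seqs):
        first = next(body)
        name = first[:name_width].strip()
        seq = "".join(first[name_width:].split())
        # Keep reading continuation lines until the record is complete.
        while len(seq) < n_sites:
            seq += "".join(next(body).split())
        if len(seq) != n_sites:
            raise ValueError(f"{name}: expected {n_sites} sites, got {len(seq)}")
        records[name] = seq
    return n_seqs, n_sites, records


# Made-up example data, for illustration only.
EXAMPLE = (
    " 3 12\n"
    "Alpha     ACGTACGTACGT\n"
    "Beta      ACGTAC\n"
    "          GTACGA\n"
    "Gamma     ACG TAC GTA CGT\n"
)
n_seqs, n_sites, records = parse_sequential_phylip(EXAMPLE)
```

Reading the counts first is what lets the parser know when a record that spans several lines is finished.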

PHYLIP version 3.6 is my own package. It is available free, from its Web site, in C source code, or as executables for Windows, Mac OS X, and Mac OS 8 or 9. The C source code can easily be compiled on Unix or Linux systems. It includes programs to carry out parsimony, distance matrix methods, maximum likelihood, and other methods on a variety of types of data, including DNA and RNA sequences, protein sequences, restriction sites, 0/1 discrete characters data, gene frequencies, continuous characters and distance matrices. It may be the most widely-distributed phylogeny package, with about 29,000 registered users, some of them satisfied. It is third after PAUP* and MrBayes in the competition to be the program responsible for the most published trees. It has been distributed since October, 1980 and has celebrated its 30th anniversary, as the oldest distributed phylogeny package. PHYLIP is distributed at the PHYLIP web site at http://evolution.gs.washington.edu/phylip.html. A number of sites offer web-servers that will perform data analyses using PHYLIP.

David Swofford of the School of Computational Science and Information Technology, Florida State University, Tallahassee, Florida has written PAUP* (which originally meant Phylogenetic Analysis Using Parsimony). PAUP* version 4.0beta10 has been released as a provisional version by Sinauer Associates, of Sunderland, Massachusetts. It has Macintosh, PowerMac, Windows, and Unix/OpenVMS versions. PAUP* has many options and close compatibility with MacClade. It includes parsimony, distance matrix, invariants, and maximum likelihood methods and many indices and statistical tests. It is described in a web page at http://paup.csit.fsu.edu/, which also contains links to its web pages at Sinauer Associates. It is available for the following types of systems:

  • For PowerMac and 68k Macintosh Mac OS 9 in a version with full mouse-windows user interface, which can also be run under the Classic environment on Mac OS X,
  • For PowerPC Mac OS X systems (or Intel Mac OS X systems when running under emulation) in a version with a command-line interface,
  • For Windows in a version with a character-based command-line interface (which appears in a Windows window),
  • For DOS or a Windows DOS box in a version which has a command-line interface, and
  • In a Unix/Linux version, with command-line interface, for Alpha Compaq/Digital Unix, Alpha Linux, PowerPC Linux, Intel-compatible Linux, Sun SPARC/UltraSPARC Solaris, and Alpha VMS.
The price is $100 US for the Macintosh and PowerMac executable versions, $85 for the Windows executable version, and $150 for the Unix source code version, plus $20 for shipment. The Beta version comes with a Command Reference Document. Their ISBN numbers are 0-87893-806-0, -807-9, and -804-4. Contact and ordering information will be found at the Sinauer Associates web site. The international distributor for many countries is Palgrave Macmillan, Brunel Road, Houndmills, Basingstoke, Hampshire RG21 6XS, U.K. Tel: +44-1256-329242, Fax: +44-1256-330688. Their e-mail address is lecturerservices(at) palgrave.com. For New Zealand, Korea, Japan, Brazil and Australia see the addresses at this web page.

Derek Sikes of the University of Alaska Museum, Fairbanks, Alaska (ffdss (at) uaf.edu) and Paul Lewis of the Department of Ecology and Evolutionary Biology of the University of Connecticut have produced PAUPRat, a program that generates a text file which can be used as commands by PAUP* to have it implement Kevin Nixon's highly effective tree search method, the Parsimony Ratchet. The input files for PAUPRat can also be modified to implement Rutger Vos's comparable Likelihood Ratchet. It is available as Mac OS, Mac OS X and Linux executables, as a DOS executable that can be run under Windows, and in source code, from its web site at http://users.iab.uaf.edu/~derek_sikes/software2.htm.

MacClade is a pioneering program for interactive analysis of evolution of a variety of character types, including discrete characters and molecular sequences. It works on Macintoshes with Mac OS X, up to and including Snow Leopard, Mac OS X version 10.6 (and also on Mac OS). MacClade enables you to use the mouse-window interface to specify and rearrange phylogenies by hand, and watch the number of character steps and the distribution of states of a given character on the tree change as you do so. It has many other features beyond this, including ability to edit data, print out phylogenies, and even simulate the evolution of data on a tree. MacClade was written by Wayne Maddison (now of the Department of Zoology, University of British Columbia) and David Maddison of the Department of Entomology, University of Arizona. Until 2011 it was distributed commercially by Sinauer Associates of Sunderland, Massachusetts, USA. As MacClade will not function with the forthcoming Mac OS X 10.7 (Lion), the Maddisons have made it available as a free download. It is available at the MacClade web site starting with version 4.08a. It includes a manual. A much earlier and less capable version, 2.1 (which for example cannot read nucleic acid sequences and has many fewer features for discrete characters) is also available as a Mac OS 9 executable from the Indiana and EMBL molecular biology software servers at iubio.bio.indiana.edu and ftp.ebi.ac.uk, in directories molbio/mac and pub/software/mac, respectively, as a BinHexed and squeezed archive (macclade-old.hqx and macclade21.hqx, respectively). A demo version of MacClade 3 that will not save or print files is also available there.

J. S. Farris has produced Hennig86, a fast parsimony program including branch-and-bound search for most parsimonious trees and interactive tree rearrangement. Although complete benchmarks have not been published it is said to be faster than Swofford's PAUP*; both are a great many times faster than the parsimony programs in PHYLIP. The program is distributed in executable object code only and costs $50, plus $5 mailing costs ($10 outside of the U.S.). The user's name should be stated, as copies are personalized as a copy-protection measure. It is distributed by Arnold Kluge, Amphibians and Reptiles, Museum of Zoology, University of Michigan, Ann Arbor, Michigan 48109-1079, U.S.A. (akluge(at) umich.edu) and by Diana Lipscomb at George Washington University (biodl(at) gwuvm.gwu.edu). It runs on PC-compatible microcomputers with at least 512K of RAM and needs no math coprocessor or graphics monitor. It can handle up to 180 taxa and 999 characters. It was described in the paper: Farris, J.S. 1989. Hennig86: a PC-DOS program for phylogenetic analysis. Cladistics 5: 163.

Mark Siddall, Assistant Curator of Annelida at the American Museum of Natural History, New York (siddall(at) amnh.org) has released Random Cladistics, version 4.0.3, a set of programs that can carry out bootstrapping, jackknifing, a variety of kinds of permutation tests, and search for 'islands' of trees, using Hennig86 or NONA to analyze the data. It can also mark ranges of sites for inclusion or exclusion, compare trees from the analyses, compute an index of incongruence between data sets, and do many other operations. To use it you must have a copy of Hennig86 (for whose distribution see above). Random Cladistics will carry out the appropriate transformations of your data and will call Hennig86 and have it analyze them, and then it will summarize the results. Random Cladistics is described by its author as no longer being supported software -- he says that 'Winclada is far superior and provide's a nice interface.' Random Cladistics and associated programs are still distributed by their author from its web site at http://research.amnh.org/~siddall/rc.html as MSDOS executables.

Torsten Eriksson of the Bergius Botanical Garden, Stockholm, Sweden (torsten(at) bergianska.se) has written a program, AutoDecay, which generates Decay Indices from an existing PAUP* 4.0 treefile. It is intended to simplify the task of creating reverse constraint trees in PAUP* 4.0 and subsequent generation of Bremer support values (Bremer, K. 1994. Cladistics 10: 295-304). AutoDecay version 5.06 is written in the scripting language Perl, and runs on most systems that have Perl installed. AutoDecay can be obtained from Eriksson's software web page at http://www.bergianska.se/index_forskning_soft.html.

Doug Eernisse of the California State University, Fullerton (DEernisse(at) fullerton.edu) has constructed DNA Stacks version 1.3.5, a Macintosh HyperCard stack that can carry out a variety of analyses on DNA sequences. It does not do phylogenies itself. It has an alignment editor, and can carry out various kinds of translation, and codon bias analysis. It can write out data sets in PAUP*, Hennig86, and PHYLIP formats. It is included here because in its 'Support Index Blocks...' menu item it is able to prepare jobs for PAUP* to enable Decay Index (Support Index) analysis. It is available by World Wide Web from http://biology.fullerton.edu/deernisse/dnastacks.html.

Michael Sorenson of the Department of Biology, Boston University (msoren(at) bu.edu) has released TreeRot, version 3, a program that helps make Bremer Support Indices ('decay indices') for parsimony analyses. It generates a PAUP* command file with a constraint statement for each node in a given shortest or strict consensus tree and with commands to search for trees inconsistent with each of these constraint statements in turn. For nodes with decay indices of more than a few steps, the constraint statement approach is much more effective than simply finding all trees 1, 2, 3, 4, etc. steps longer than the shortest tree and then examining their strict consensus for which nodes are lost. This version also supports the determination of partitioned Bremer support indices introduced in the paper: Baker, R.H., and R. DeSalle. 1997. Multiple sources of character information and the phylogeny of Hawaiian Drosophilids. Systematic Biology 46: 654-673, and it will also parse the PAUP* log file, automatically calculating the decay index for each node. It is written in the Perl scripting language, and a Mac OS Macintosh executable is also available. Both are distributed at its web site at http://people.bu.edu/msoren/TreeRot.html.
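The arithmetic behind a decay index can be made concrete with a short sketch. It assumes the usual definition of Bremer support: the length of the shortest tree found that lacks a node, minus the length of the most parsimonious tree. The function name `bremer_support` and the tree lengths are hypothetical, and this is not TreeRot's own code, which works by parsing the PAUP* log file.

```python
def bremer_support(shortest_length, constrained_lengths):
    """Compute decay indices from tree lengths (hypothetical helper).

    shortest_length: length of the most parsimonious tree.
    constrained_lengths: {node: length of the shortest tree found under a
    reverse constraint, i.e. among trees that do NOT contain that node}.
    """
    support = {}
    for node, length in constrained_lengths.items():
        if length < shortest_length:
            # A constrained search cannot beat the unconstrained optimum.
            raise ValueError(f"{node}: constrained tree is shorter than the optimum")
        support[node] = length - shortest_length
    return support


# Made-up tree lengths, for illustration only.
example = bremer_support(120, {"node_1": 121, "node_2": 127})
```

A node with index 1 is lost in some tree only one step longer than the optimum, while a node with index 7 survives until trees are seven steps longer.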

J. S. Farris has written RA (Rapid nucleotide Analysis). It features rapid bootstrapping. It is available from Arnold Kluge, Amphibians and Reptiles, Museum of Zoology, University of Michigan, Ann Arbor, Michigan 48109-1079, U.S.A. (akluge(at) umich.edu) and Diana Lipscomb at George Washington University (BIODL(at) gwuvm.gwu.edu) who may be contacted for details. The cost is said to be about $30 US.

Kevin Nixon of the L. H. Bailey Hortorium at Cornell University in Ithaca, New York (kcn2(at) cornell.edu) has written WINCLADA version 0.9.99m24, an interactive program that can read and edit trees and data files, display character state changes inferred by parsimony on diagrams of the trees, and launch runs of the programs NONA, PIWE, and Hennig86. WINCLADA is available as a Windows95/98/NT executable from its web site at http://www.cladistics.com/about_winc.htm. It is available on a shareware basis: the user who downloads it must pay $50 to Kevin Nixon at Winclada/Kevin C. Nixon, 2210 Ellis Hollow Road, Ithaca, New York 14850. There is also a $200-per-class fee for its use in courses. WINCLADA supersedes and combines features of Nixon's earlier programs ClaDOS and DADA, which are no longer distributed.

Pablo Goloboff, of INSUE - Fundación e Instituto Miguel Lillo 205, 4000 S. M. de Tucumán, Argentina (instlillo(at) infovia.com.ar with Subject line 'para Pablo Goloboff') has written NONA (Noname), version 2.0, PiWe (Parsimony with Implied WEights, also written Pee-Wee), and SPA to carry out parsimony including weighted parsimony analyses. NONA searches for most parsimonious trees according to character weights defined by the user a priori. PiWe calculates weights of the characters by a method introduced by Goloboff, a noniterative version of J. S. Farris's 'successive weighting'. It was described in Goloboff's paper in Cladistics 9: 83-91, 1993. SPA is a generalized parsimony program that allows differential weighting of changes between different states. NONA is said to be faster than other parsimony programs. A Windows version of NONA which includes PiWe and SPA is available as freeware from its web page at http://www.cladistics.com/aboutNona.htm.

Pablo Goloboff, of INSUE - Fundación e Instituto Miguel Lillo 205, 4000 S. M. de Tucumán, Argentina (pablogolo (at) csnat.unt.edu.ar), together with J. S. Farris of the Laboratory of Molecular Systematics of the Naturhistoriska Riksmuseet, Stockholm, Sweden and Kevin Nixon of the L. H. Bailey Hortorium, Cornell University, Ithaca, New York, have produced TNT (Tree analysis using New Technology), version of August 2008. This is a parsimony program intended for use on very large data sets. It makes use of the methods for speeding up parsimony searches introduced by Goloboff in the paper: Goloboff, P.A. 1999. Analyzing large data sets in reasonable times: solutions for composite optima. Cladistics 15: 415-428, and the highly effective 'parsimony ratchet' search strategy introduced by Nixon in the paper: Nixon, K.C. 1999. The parsimony ratchet, a new method for rapid parsimony analysis. Cladistics 15: 407-414. It can handle characters with discrete states as well as continuous characters. The program is distributed as Windows, Linux, and both PowerMac and Intel Mac OS X executables. The program and some support files including documentation are available from its web page at http://www.zmuc.dk/public/phylogeny/TNT. It is free, provided you agree to a license with some reasonable limitations.

Frédéric Calendini and Jean-Francois Martin of the Departement Protection des plantes et environnement of the Ecole Nationale Supérieur, Montpellier, France (martinjf (at) ensam.inra.fr) have produced PaupUp version 1.0.3.1, a graphical frontend for the PAUP* DOS software. The PaupUp program provides a user-friendly interface to the phylogenetic program PAUP* on the Windows operating systems. The DOS version of PAUP* is entirely command-line driven and does not provide any graphical interface. PaupUp partly resolves this issue, providing around 80% of the available commands (the most commonly used in our opinion) in a graphical environment comparable to the Mac OS version, while the remaining 20% of commands are still available through direct command-line input in a single integrated design. The programs TreeView and Modeltest can be called from PaupUp. PaupUp is not compatible with the Windows version of PAUP* but is compatible with the DOS version that is distributed with that Windows version. It is available as a Windows executable. It also requires the Microsoft .NET executable framework to be installed. PaupUp can be downloaded from its web site at http://www.agro-montpellier.fr/sppe/Recherche/JFM/PaupUp/

Kai Müller of the Nees-Institut für Biodiversität der Pflanzen of the University of Bonn, Germany (kaimueller (at) uni-bonn.de) has written PRAP (Parsimony Ratchet Analyses using PAUP* and likelihood) version 2.0, a Java program to drive PAUP* in computing Bremer support of groups, and in doing ratchet searches for parsimony or likelihood trees. It allows the user to make PAUP* carry out searches using the 'parsimony ratchet' strategy of Kevin Nixon. In version 2.0 this can be done using either the parsimony criterion or the likelihood criterion (in spite of the name of the search method). It can also do variations on the parsimony ratchet including multiple random addition sequences. It is described in the paper: Müller, K. F. 2004. PRAP - computation of Bremer support for large data sets. Molecular Phylogenetics and Evolution 31: 780-782, and the search strategies it implements are described in the paper: Müller, K. 2005. The efficiency of different search strategies in estimating parsimony jackknife, bootstrap, and Bremer support. BMC Evolutionary Biology 5: 58. It is available as Java executables, as downloads for Windows, Mac OS X, and for Unix. It can be downloaded from its web site at http://systevol.nees.uni-bonn.de/software. The earlier versions 1.0 and 0.99 are also available there.

MEGA (Molecular Evolutionary Genetic Analysis) is produced by Sudhir Kumar of the Center for Evolutionary Functional Genomics of the Biodesign Institute at Arizona State University, Tempe, Arizona (s.kumar(at) asu.edu) together with Joel Dudley of the Stanford Center for Biomedical Informatics Research at Stanford University, Koichiro Tamura of Tokyo Metropolitan University and Masatoshi Nei, of Pennsylvania State University. It carries out parsimony, distance matrix and likelihood methods for molecular data (nucleic acid sequences and protein sequences). It can do bootstrapping, consensus trees, and a variety of distance measures, with Neighbor-Joining, Minimum Evolution, UPGMA, and parsimony tree methods, as well as a large variety of data editing tasks, sequence alignment using an implementation of ClustalW, tests of the molecular clock, and single-branch tests of significance of groups. MEGA4 is the current version. MEGA4 is described in the papers:

  • Kumar, S., J. Dudley, M. Nei and K. Tamura. 2008. MEGA: A biologist-centric software for evolutionary analysis of DNA and protein sequences. Briefings in Bioinformatics 9: 299-306.
  • K. Tamura, J. Dudley, M. Nei, and S. Kumar. 2007. MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Molecular Biology and Evolution 24: 1596-1599.
It is available for free at its web site at http://www.megasoftware.net as Windows executables, with a downloadable manual. Manual web pages are also accessible there. It can be run under Mac OS X and under Linux using Windows emulators, if you have those. In addition, MEGA 4.1 is available as a downloadable beta release. An earlier version, MEGA 1.02, is also available there as a DOS executable. It is downloadable at the MEGA site and that version's manual is also available on line at http://evolgen.biol.metro-u.ac.jp/MEGA/manual/default.html.

Xuhua Xia of the Department of Biology and the Center for Advanced Research in Environmental Genomics (CAREG) of the University of Ottawa, Ontario, Canada (xxia(at) uottawa.ca) has released DAMBE (Data Analysis in Molecular Biology and Evolution), version 5.0.25, a general-purpose package for DNA and protein sequence phylogenies, and also gene frequencies. It can read and convert a number of file formats, and has many features for descriptive statistics. It can compute a number of commonly-used distance matrix measures and infer phylogenies by parsimony, distance, or likelihood methods, including bootstrapping (by sites or by codons) and jackknifing. There are a number of kinds of statistical tests of trees available, and many other features. It can also display phylogenies. DAMBE includes a copy of ClustalW; there is also code from PHYLIP. An interesting feature is a simple web browser that allows sequences to be fetched over the web while running DAMBE. DAMBE is described in two publications, a paper and a book:

  • Xia, X., and Z. Xie. 2001. DAMBE: Data analysis in molecular biology and evolution. Journal of Heredity 92: 371-373.
  • Xia, X. 2000. Data Analysis in Molecular Biology and Evolution. Kluwer Academic Publishers, Boston.
DAMBE consists of Windows executables. It is available for free from its web site at http://dambe.bio.uottawa.ca/dambe.asp.
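Bootstrapping "by sites or by codons", as mentioned in the DAMBE entry, amounts to resampling alignment columns (or blocks of three columns) with replacement. The sketch below illustrates that idea only. It is not DAMBE's code, and the function name `bootstrap_alignment` and its `unit` parameter are assumptions made for this example.

```python
import random


def bootstrap_alignment(alignment, unit=1, rng=None):
    """Return one bootstrap replicate of an alignment (hypothetical helper).

    alignment: {name: sequence}, all sequences of equal length.
    unit: 1 resamples single sites, 3 resamples whole codons.
    rng: optional random.Random instance, for reproducible replicates.
    """
    rng = rng or random.Random()
    n_sites = len(next(iter(alignment.values())))
    if any(len(seq) != n_sites for seq in alignment.values()):
        raise ValueError("sequences must all have the same length")
    if n_sites % unit:
        raise ValueError("alignment length must be a multiple of unit")
    n_blocks = n_sites // unit
    # Draw block indices with replacement; the same draw is used for every
    # sequence so that columns stay intact.
    picks = [rng.randrange(n_blocks) for _ in range(n_blocks)]
    return {
        name: "".join(seq[p * unit:(p + 1) * unit] for p in picks)
        for name, seq in alignment.items()
    }


# Made-up data, for illustration only.
aln = {"A": "ACGTAC", "B": "ACGTTT"}
by_site = bootstrap_alignment(aln, rng=random.Random(1))
by_codon = bootstrap_alignment(aln, unit=3, rng=random.Random(1))
```

Each replicate would then be analyzed with the same tree method as the original data, and the frequency of each group across replicates gives its bootstrap support.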

Matthew Goode, Alexei Drummond, Ed Buckler, and Korbinian Strimmer, together with seven other contributors, have released PAL (Phylogenetic Analysis Library) version 1.5, a free collection of Java classes for use in molecular phylogenetics. The addresses of the four principal contributors are respectively:

  • Matthew Goode (m.goode(at) auckland.ac.nz), Bioinformatics Institute, School of Biological Sciences, University of Auckland, New Zealand.
  • Alexei Drummond (alexei(at) cs.auckland.ac.nz), Department of Computer Science, University of Auckland, New Zealand.
  • Ed Buckler (esb33(at) cornell.edu), Department of Plant Breeding and Genetics, Cornell University, Ithaca, New York.
  • Korbinian Strimmer (strimmer(at) uni-leipzig.de), Institute for Medical Informatics, Statistics and Epidemiology (IMISE) of the University of Leipzig, Germany.
PAL is intended to facilitate the rapid construction of both general applications and special-purpose tools for phylogenetic analysis. It focuses on probabilistic data modelling and provides, e.g., routines for
  • maximum likelihood, neighbor-joining and least squares analysis
  • probability models for nucleotide/amino acid substitution, including constraints for a molecular clock
  • bootstrapping, and the Kishino-Hasegawa-Templeton and Shimodaira-Hasegawa tests
  • simulation of trees and data sets, including coalescent trees with growing populations and serial samples
  • reading and writing trees and alignments
  • adjusting for rate variation among sites
  • obtaining splits from trees and calculating a distance between trees
among many other functions. It currently consists of over 200 components in 16 packages. PAL is described in a paper:
  • Drummond, A., and K. Strimmer. 2001. PAL: An object-oriented programming library for molecular evolution and phylogenetics. Bioinformatics 17: 662-663.
It is available at its web site at http://www.cebl.auckland.ac.nz/pal-project/. Two user interfaces are available which contain application programs written using PAL. They have separate entries in these pages:
  • Vanilla (by Strimmer): A simple text front end
  • Pebble (vCEBL) (by Drummond): A GUI interface to PAL plus a functional command language.
PAL can be run on any machine that has Java, and can also be compiled into native code by the Gnu Compiler for Java (gcj).
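To illustrate what "obtaining splits from trees and calculating a distance between trees" involves, here is a minimal sketch of the Robinson-Foulds (symmetric difference) distance. It is written in Python rather than Java and does not use or mimic PAL's classes. Trees are represented as nested tuples of leaf names, a representation chosen for this example only, and the function names are hypothetical.

```python
def leaf_sets(tree, found):
    """Collect the leaf set below every internal node of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(leaf_sets(child, found) for child in tree))
    found.append(leaves)
    return leaves


def splits(tree):
    """Return the non-trivial splits (bipartitions) of the unrooted tree."""
    found = []
    all_leaves = leaf_sets(tree, found)
    anchor = min(all_leaves)
    result = set()
    for side in found:
        # Represent each split by the side that does not contain the anchor
        # leaf, so the same split looks identical however the tree is rooted.
        if anchor in side:
            side = all_leaves - side
        if 2 <= len(side) <= len(all_leaves) - 2:
            result.add(side)
    return result


def robinson_foulds(tree_a, tree_b):
    """Count the splits present in exactly one of the two trees."""
    return len(splits(tree_a) ^ splits(tree_b))


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
t3 = (("A", "B"), ("C", ("D", "E")))  # same unrooted topology as t1
```

Here `robinson_foulds(t1, t2)` is 2, because each tree has one split the other lacks, while `robinson_foulds(t1, t3)` is 0, because rerooting does not change the splits.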

Korbinian Strimmer, of the Institute for Medical Informatics, Statistics and Epidemiology (IMISE) of the University of Leipzig, Germany (strimmer(at) uni-leipzig.de), has written Vanilla, version 1.2, a character-based interface to the PAL Java classes, which includes a number of programs carrying out different kinds of phylogenetic analysis, including:

  • MLDIST, which computes maximum likelihood distances between DNA sequences, protein sequences, and two-state data, with correction for unequal rates at different sites. It has many different substitution models available. It also computes observed distances and can obtain approximate estimates of unknown model parameters such as the Ts/Tv ratio.
  • MLTREE, which computes the likelihood of a given tree under the same models as MLDIST, allowing branch lengths to be provided or to be estimated by the program, with the possibility of constraining them to be clocklike. If two or more trees are provided it can also compare them using the Kishino-Hasegawa test, the Shimodaira-Hasegawa test, and expected Akaike weights.
  • EVOLVE, which simulates data along a tree using the above models.
  • DISTTREE, which computes least squares branch lengths from distance matrices on a given tree, and can also construct Neighbor-Joining and UPGMA trees.
  • REWRITE, which converts data sets between different formats.
... nucleotides and amino acid data, to estimate maximum-likelihood branch lengths on trees (incl. clock trees and dated tips), for statistical (e.g., Shimodaira-Hasegawa) and topological (Robinson-Foulds) comparison of trees, to infer demographic parameters from trees (based on the coalescent), and also utility programs to reformat and modify alignments.
There are also 6 other programs with a command-line interface which can estimate demographic parameters from coalescent trees, compute distance matrices from trees, reroot trees, and carry out some manipulations of data sets. Vanilla has a menu-based interface. It is written in Java, and is available from its web site at http://strimmerlab.org/software/vanilla/index.html. It can run on Java systems on many machines. Strimmer notes that Vanilla does not provide all the functionality in PAL, and is perhaps most useful as a source of examples on how to use PAL.
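The difference between an observed distance and a maximum likelihood distance, both mentioned for MLDIST above, is easiest to see under the simplest substitution model. The sketch below assumes the Jukes-Cantor model with equal rates at all sites, for which the maximum likelihood distance has the closed form d = -(3/4) ln(1 - 4p/3), where p is the observed proportion of differing sites. It is not MLDIST's code, which supports many other models, and the function names are hypothetical.

```python
import math


def observed_distance(seq_a, seq_b):
    """Proportion of differing sites, ignoring sites with gaps or ambiguity."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (x, y)
        for x, y in zip(seq_a.upper(), seq_b.upper())
        if x in "ACGT" and y in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor_distance(p):
    """Expected substitutions per site under Jukes-Cantor, given p."""
    if p >= 0.75:
        # At or beyond saturation the distance is undefined (infinite).
        raise ValueError("observed distance too large for Jukes-Cantor")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Made-up sequences, for illustration only.
p = observed_distance("ACGTACGT", "ACGTACGA")
d = jukes_cantor_distance(p)
```

The corrected distance `d` is always at least as large as `p`, because it accounts for multiple substitutions at the same site.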

Wayne Maddison of the Departments of Zoology and Botany, University of British Columbia, Vancouver, Canada, and David Maddison of the Department of Entomology, University of Arizona, Tucson, together with Peter Midford, Danny Mandel, and Jeff Oliver have released Mesquite, version 2.5. The project email address is info(at) mesquiteproject.org. Mesquite is a large and varied set of modules in Java to carry out a wide variety of analyses in comparative biology. It is also intended as a framework for other developers to use to add additional functions. Some of the over 500 functions available in the project currently are:

  • Reconstruction of ancestral states by parsimony or likelihood and display of the reconstructed states
  • Tests of process of character evolution, including comparative methods.
  • Simulation of character evolution (for categorical, DNA, or continuous characters)
  • Simulation or testing of tree shapes including the effect of a character on the shape of a tree
  • Inferences of the fit of gene trees to species trees
  • Parametric bootstrapping (with integration with programs such as PAUP* and NONA)
  • Morphometrics (PCA, CVA, geometric morphometrics)
  • Coalescence (simulations, other calculations)
  • Tree comparisons and simulations (tree similarity, Markov speciation models)
  • Search among trees using different tree rearrangement methods as well as exhaustive enumeration
  • Cluster analysis including single linkage and UPGMA methods
  • Trees can be displayed and manipulated
Other Java modules that use Mesquite include Tree Set Viz and a Java version of PDAP. Some Mesquite modules make use of PAL.

Mesquite is available in Java source code and Java executables from its web page at http://mesquiteproject.org. It can run on Mac OS X, Windows, and Linux/Unix systems using recent versions of Java.
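UPGMA, listed above among Mesquite's cluster analysis methods and offered by several other packages on this page, repeatedly joins the two closest clusters and averages their distances to the rest, weighting by cluster size. The following is a minimal sketch of that procedure in Python. It is not Mesquite's implementation (Mesquite is written in Java), it returns only the topology without branch lengths, and the function name and input layout are assumptions made for this example.

```python
from itertools import combinations


def upgma(names, matrix):
    """Build a UPGMA topology as nested tuples (hypothetical helper).

    names: list of leaf names.
    matrix: square, symmetric list of lists of pairwise distances.
    """
    # Active clusters: id -> (nested-tuple tree, number of leaves).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    dist = {
        frozenset((i, j)): matrix[i][j]
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > 1:
        # Pick the closest pair of active clusters.
        a, b = min(combinations(sorted(clusters), 2),
                   key=lambda pair: dist[frozenset(pair)])
        (tree_a, size_a), (tree_b, size_b) = clusters.pop(a), clusters.pop(b)
        # Distance from the merged cluster to each other cluster is the
        # size-weighted average of the two old distances.
        for c in clusters:
            dist[frozenset((next_id, c))] = (
                size_a * dist[frozenset((a, c))]
                + size_b * dist[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree


# Made-up distances, for illustration only.
example_tree = upgma(
    ["A", "B", "C", "D"],
    [[0, 2, 6, 10],
     [2, 0, 6, 10],
     [6, 6, 0, 10],
     [10, 10, 10, 0]],
)
```

Single linkage would differ only in the update step, taking the minimum of the two old distances instead of their weighted average.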

Julien Y. Dutheil, Bastien Boussau, and co-workers of the Institut des Sciences de l'Evolution de Montpellier (ISE-M) of the Université Montpellier 2, France (julien.dutheil (at) univ-montp2.fr) have released Bio++ version 1.8, a set of C++ libraries and programs dedicated to sequence analysis, phylogenetics, molecular evolution and population genetics. The Bio++ project is a collaborative effort to provide reusable implementations of standard phylogenetics and population genetics methods published in the literature, in order to analyze and manipulate sequence data, and with the goal to facilitate the development of new methods. Bio++ is fully object-oriented and documented. Two discussion forums are also available. A non-exhaustive list of available methods includes:

  • sequence and tree manipulations
  • a large set of substitution models (nucleotides, protein, codons)
  • distance estimation and tree reconstruction (by Neighbor Joining, BIONJ and UPGMA)
  • maximum likelihood methods
  • nucleotide diversity estimators
  • tools for drawing phylogenies
Two recent additions also allow you to query sequences from databases and to build GUIs using the Qt libraries. A set of example programs (The Bio++ Program Suite) is also available with examples and a manual. Bio++ contains one of the largest sets of models for phylogenetics, including non-homogeneous models. It also features a very general way to set up your own non-homogeneous model and fit it, for instance assuming a different equilibrium GC content for distinct clades in the phylogeny. Bio++ is distributed as source code on a CVS/SVN server, and stable snapshots are made every six months. In addition to the source code, these stable releases can also be installed as pre-compiled packages for various Linux distributions. It is described in the papers:
  • Dutheil, J., S. Gaillard, E. Bazin, S. Glémin, V. Ranwez, N. Galtier, and K. Belkhir. 2006. Bio++: a set of C++ libraries for sequence analysis, phylogenetics, molecular evolution and population genetics. BMC Bioinformatics 4 (7): 188.
  • Dutheil, J., and B. Boussau. 2008. Non-homogeneous models of sequence evolution in the Bio++ suite of libraries and programs. BMC Evolutionary Biology 22 (8): 255.
It is available as C++ source code, Windows executables, Linux executables, PowerMac Mac OS X executables and Intel Mac OS X executables, and packaged as .deb (Debian, Ubuntu, etc), .rpm (Fedora, Mandriva, etc) packages and a Gentoo overlay. It can be downloaded from its web site at http://kimura.univ-montp2.fr/BioPP/

Jaime Huerta-Cepas, Joaquin Dopazo and Toni Gabaldón of the Comparative Genomics group at the Centre for Genomic Regulation (CRG), Barcelona, Spain (jhuerta (at) crg.es) have released ETE (a python Environment for Tree Exploration), version 2.0. ETE is a Python programming toolkit that assists in the automated manipulation, analysis and visualization of hierarchical trees. It provides a broad range of tree handling options, specific methods to work on phylogenetics and clustering analyses, bindings to phylogenomic databases such as phylomeDB, advanced node annotation, interactive visualization, and a customizable tree drawing engine to create PDF tree images. It also implements methods for orthology and paralogy prediction and topological dating. It is described in the paper: Huerta-Cepas, J., J. Dopazo and T. Gabaldón. 2010. ETE: a python Environment for Tree Exploration. BMC Bioinformatics 11: 24. It is available as C source code, Windows executables, Mac OS X universal executables, and a Python module. It can be downloaded from its web site at http://ete.cgenomics.org

Rutger Vos of the School of Biological Sciences of the University of Reading, United Kingdom (rutgeraldo (at) gmail.com) has released Bio::Phylo (Phyloinformatic analysis using perl), version 0.35, a phylogeny package with tree simulation, topology, visualization, and data conversion functionality. It has modules for simulating tree shapes under various models, computing various tree topology indices, managing and converting data in various formats, and visualizing tree shapes. It is described in the paper: Vos, R. A., J. Caravas, K. Hartmann, M. A. Jensen and C. Miller. 2011. Bio::Phylo - phyloinformatic analysis using Perl. BMC Bioinformatics 12: 63. http://dx.doi.org/10.1186/1471-2105-12-63. It is available as Perl script. It can be downloaded from its web site at http://search.cpan.org/dist/Bio-Phylo/

Gavin Huttley, Rob Knight, and the PyCogent Development Team of the John Curtin School of Medical Research of the Australian National University, Canberra, Australia (gavin.huttley (at) anu.edu.au) have released PyCogent (COmparative GENomics Toolkit, written in Python), version 1.4.1. PyCogent is a software library for genomic biology. It is an integrated framework for controlling third-party applications; devising workflows; querying databases; conducting novel probabilistic analyses of biological sequence evolution; and generating publication quality graphics. It is intended that it be able to carry out a variety of phylogeny methods itself, but for now these have not been implemented. It can, however, be used to submit runs of some existing programs to infer phylogenies, including RAxML, FASTML, and Muscle. It is described in the paper: Knight, R., P. Maxwell, A. Birmingham, J. Carnes, J. G. Caporaso, B. C. Easton et al. 2007. Pycogent: A toolkit for making sense from sequence. Genome Biology 8(8): R171. It is available as C source code, Python script, Linux executables, Intel Mac OS X executables and Mac OS X universal executables. It can be downloaded from its web site at http://pycogent.sourceforge.net/

Jeet Sukumaran and Mark Holder of the Department of Ecology and Evolutionary Biology of the University of Kansas, Lawrence, Kansas (jeet (at) ku.edu) have produced DendroPy version 3.6.1, a phylogenetic computing library. DendroPy is a Python library for phylogenetic computing. It provides classes and functions for the simulation, processing, and manipulation of phylogenetic trees and character matrices, and supports the reading and writing of phylogenetic data in a range of formats, such as NEXUS, Newick, NeXML, Phylip, FASTA, etc. Application scripts for performing some useful phylogenetic operations, such as data conversion and tree posterior distribution summarization, are also distributed and installed as part of the library. DendroPy can thus function as a stand-alone library for phylogenetics, a component of more complex multi-library phyloinformatic pipelines, or as a scripting "glue" that assembles and drives such pipelines. DendroPy's component SumTrees supersedes Sukumaran's previous program bootscore. DendroPy is described in the paper: Sukumaran, J. and Mark T. Holder. 2010. DendroPy: A Python library for phylogenetic computing. Bioinformatics 26: 1569-1571. It is available as Python script. It can be downloaded from its web site at http://packages.python.org/DendroPy/

Jason Evans, of Canonware.com (jasone (at) canonware.com) has released Crux version 1.2.0, a set of Python modules together with code in C, that carries out many methods in phylogeny reconstruction. It can be used to compute distances, likelihoods, and do Bayesian MCMC on phylogenies. It can also find neighbor-joining trees, manipulate trees, and compute Robinson-Foulds distances between trees. Crux is written in Cython, an extension of Python which includes some features of the C language. Evans describes Crux as particularly useful for developing scripts to automate phylogeny tasks. Installing it requires Python and a C compiler. It is available at its web site at http://www.canonware.com/Crux/
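For readers unfamiliar with the Robinson-Foulds distance mentioned above, here is a minimal Python sketch of the idea. It is not Crux's code or API: the nested-tuple tree representation and the function names (leaf_set, bipartitions, robinson_foulds) are assumptions made for illustration. The distance counts the splits of the leaf set that appear in exactly one of the two trees.

```python
def leaf_set(node):
    """Return the set of leaf names below a node (a name or a tuple of nodes)."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def bipartitions(tree):
    """Collect the non-trivial splits of the leaves induced by internal branches."""
    all_leaves = leaf_set(tree)
    reference = min(all_leaves)  # used to pick one canonical side of each split
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return
        side = leaf_set(node)
        if reference in side:
            side = all_leaves - side
        # Splits that isolate zero or one leaf occur in every tree; skip them.
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
        for child in node:
            visit(child)

    visit(tree)
    return splits


def robinson_foulds(tree_a, tree_b):
    """Count splits found in exactly one of two trees on the same leaf set."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must have the same leaves")
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


# Hypothetical five-taxon trees that differ in the placement of B and C.
tree_1 = ((("A", "B"), "C"), ("D", "E"))
tree_2 = ((("A", "C"), "B"), ("D", "E"))
distance = robinson_foulds(tree_1, tree_2)
```

The sketch treats the trees as unrooted and recomputes leaf sets at every node, which is simple but quadratic; a production library would cache them.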

Applied Maths NV of Keistraat 120, 9830 Sint-Martens-Latem, Belgium (info @ applied-maths.com) has released Bionumerics, a program to manage a wide variety of biological data 'from 1D patterns, 2D gels, phenotype arrays, and DNA/protein sequences'. In addition to database and image processing capabilities, it can do clustering and phylogenetic inference. A variety of clustering methods including UPGMA and neighbor-joining distance matrix methods are available, and for inferring phylogenies generalized parsimony and maximum likelihood are described as available. Bootstrap support for groups can also be computed. There are also facilities for plotting the trees. Bionumerics is distributed as Windows executables. Bionumerics is commercial software. Information about it is available at its web site at http://www.applied-maths.com/bn/bn.htm, including requesting a free demo version. For price and ordering information contact them through the web site or by email, or by phone at +32 9 2222 100, or fax them at +32 9 2222 102. Their U.S. Sales Office is at Applied Maths Inc., 13809 Research Blvd, Suite 645, Austin, Texas 78750, phone +1 512-482-9700, fax +1 512-482-9708 (email is info-us @ applied-maths.com).

John Czelusniak, then of the Department of Anatomy and Cell Biology, Wayne State University, Detroit, Michigan wrote sog, a C program demonstrating an algorithm to find the most parsimonious phylogeny along with the parsimony strength of grouping (or Bremer decay index) for nucleotide sequences in one pass of a branch and bound algorithm. This differs from the implementation in PAUP* which uses a separate branch and bound search to find the strength of grouping for each group in the tree, using the tree group exclusion option. John said (some time ago) that 'sog is a rather ugly hack which will be optimized and streamlined. It IS ALPHA SOFTWARE, which means it has not been tested extensively on datasets other than our primate datasets.' It is available at the IUBIO archive at http://iubio.bio.indiana.edu/soft/molbio/evolve/. It is distributed as generic C source code which should be able to compile and run on any system that has a C compiler.

Rino Zandee (rino.zandee (at) gmail.com), formerly of the Institute of Evolutionary and Ecological Science, Van der Klaauw Laboratory, Leiden University, has written CAFCA version 1.5.12, the Collection of APL Functions for Comparative Analysis. It carries out a search for the most parsimonious tree with discrete-character data (either two-state or multistate), using a search for cliques of component compatibility (monothetic subsets) to propose the candidates for most parsimonious trees. The program is written as functions in the APL language, but PowerPC Mac OS (or maybe it's Mac OS X) executables are distributed. The program is free and is available from the CAFCA Web Site at http://www.mzandee.net/~zandee/cafca/.

Valery Zaporozhchenko of the Research Centre for Medical Genetics, Moscow, Russia (valery (at) regmed.ru) has released Murka version 1.2, a phylogeny package for parsimony methods. It constructs median networks and from them finds Steiner trees (estimates of the most parsimonious tree) from biological alignments. The package includes subprograms for building full median networks and their subsets (such as Median Joining and Reduced Median networks), extracting Steiner trees and analyzing results. Murka is a cross-platform command line application with source code distributed under the LGPL license. Documentation can be viewed at the documentation page at its web site. It is available as C++ source code, Windows executables and Linux executables. It can be downloaded from its web site at http://phylomurka.sourceforge.net. For visualization of trees and networks Murka requires that the graph visualization program Graphviz also be installed.


Kai Müller of the Nees-Institut für Biodiversität der Pflanzen of the University of Bonn, Germany (kaimueller (at) uni-bonn.de) has produced SeqState version 1.40. It carries out a variety of primer design functions and also calculates various statistics on aligned DNA sequences. For the purposes of this listing, the relevant feature is that it can be used to implement a number of different kinds of coding of indels (insertions and deletions). It is described in the paper: Müller K. F. 2005. SeqState - primer design and sequence statistics for phylogenetic DNA data sets. Applied Bioinformatics 4: 65-69 and the different indel coding methods are discussed in two other papers:

  • Müller, K. F. 2006. Incorporating information from length-mutational events into phylogenetic analysis. Molecular Phylogenetics and Evolution 38: 667-676.
  • Simmons, M. P., K. F. Müller, A. P. Norton. 2007. The relative performance of indel-coding methods in simulations. Molecular Phylogenetics and Evolution 44: 724-740.
It is available as Java executables for Windows, Mac OS X, and Linux. It can be downloaded from its web site at http://systevol.nees.uni-bonn.de/software/SeqState

Naoko Takezaki, now of the Division of Genome Analysis and Genetic Research, Department of Medicine, Kagawa University, Kagawa, Japan (takezaki (at) med.kagawa-u.ac.jp), has written gmaes, a program that estimates a gamma distribution parameter for rate variation among sites by counting the minimum number of substitutions at each site for a given tree topology. The program is distributed as generic C source code, which can be compiled on any system that has a C compiler, from the IUBIO archive at http://iubio.bio.indiana.edu/soft/molbio/evolve/.

Chris Creevey and James McInerney of the Bioinformatics and Pharmacogenomics Laboratory of the National University of Ireland, Maynooth (chris.creevey (at) may.ie) have released CRANN (an Irish word for 'tree'), version 1.04, a program to detect natural selection using rates of synonymous and nonsynonymous substitutions. Crann takes FASTA format aligned nucleotide sequence files and either infers a tree using neighbor-joining based on nonsynonymous differences, or allows the user to read in a tree. It reconstructs the placements of the synonymous and nonsynonymous substitutions on the tree, and carries out a statistical test for an excess of nonsynonymous changes. It can also calculate synonymous and nonsynonymous differences between all pairs of sequences, and can also do that in a sliding window along the sequences. It is described in the papers:

  • Creevey, C. and J. O. McInerney. 2003. CRANN: Detecting adaptive evolution in protein-coding DNA sequences. Bioinformatics 19: 1726.
  • Creevey, C. and J. O. McInerney. 2002. An algorithm for detecting directional and non-directional positive selection, neutrality and negative selection in protein coding DNA sequences. Gene 300: 43-51.
It is available as Windows executables, Linux executables, Powermac Mac OS X executables and Mac OS 9 executables. It can be downloaded from its web site at http://bioinf.may.ie/crann/

Mathieu Blanchette, of the School of Computer Science, McGill University, Montréal, Québec (blanchem (at) mcb.mcgill.edu), Fei Feng (fei (at) cb.mcgill.ca), of the same school, and Martin Tompa of the Department of Computer Science and Engineering at the University of Washington, Seattle (tompa (at) cs.washington.edu) have released FootPrinter 2.0, a program that uses parsimony scores to carry out 'phylogenetic footprinting' to search for regulatory sequences in the vicinity of genes that have been sequenced in multiple species. The program looks for locations upstream of each gene which, when taken together on a known phylogeny, show the largest amount of conservation by having the smallest number of changes of state along the tree. The method is described in these papers:

  • Blanchette, M. and M. Tompa. 2003. FootPrinter: a program designed for phylogenetic footprinting. Nucleic Acids Research 31: 3840-3842.
  • Blanchette, M. and M. Tompa. 2002. Discovery of regulatory elements by a computational method for phylogenetic footprinting. Genome Research 12: 739-748.
  • Blanchette, M., B. Schwikowski, and M. Tompa. 2002. Algorithms for phylogenetic footprinting. Journal of Computational Biology 9: 211-223.
The program is available as C source code (including some programs from PHYLIP) from a web site at http://bio.cs.washington.edu/software/motif_discovery#Motif%20Discovery. Two web servers are available: one running FootPrinter 3.01, a more recent version, and one, MicroFootPrinter, that searches for prokaryotic sequences that are similar to your sequence and runs FootPrinter 2.0 on that data set.

Daniel Barker (db60 (at) st-andrews.ac.uk) of the University of St. Andrews, Scotland, U.K., has written LVB version 3.1, a program for inferring phylogenies using parsimony and simulated annealing. Simulated annealing is intended to allow searches for most parsimonious trees with large numbers of species. It is described as often giving good results with large matrices. Up to 16383 objects and 32766 characters may be used. Aligned nucleotide sequences with ambiguous nucleotides and/or discrete morphological characters can be used. Bootstrapping of the data is also supported. The program is currently available in ANSI C source code as a Unix tar file, and as executables for Windows, Mac OS X, and Linux. The text of a manual can also be read or downloaded from the web site. LVB is available from its Web site at http://eggg.st-andrews.ac.uk/lvb. It is also available as a Web server from the Institut Pasteur.

Dick Hwang of the Department of Genome Sciences, University of Washington (dhwang (at) u.washington.edu) has written GAPars, a program using a genetic algorithm to search for most parsimonious phylogenies. The program is written in C++ and should compile on Unix C++ compilers and on most other C++ compilers. He describes it as working 'rather inefficiently' and 'not ready for prime-time use'. It can be obtained by emailing Hwang at the address above.

Quinn Snell, Mark Clement, and Hyrum Carroll of the Computational Science Laboratory of the Department of Computer Science at Brigham Young University, Provo, Utah (snell (at) cs.byu.edu) and (clement (at) cs.byu.edu) have written PSODA, a parsimony program for nucleotide sequences. The program reads the NEXUS file format, and carries out heuristic rearrangement of trees using the parsimony criterion. It is available as C++ source code, Windows executables, Linux executables and Powermac Mac OS X executables. It can be downloaded from its web site at http://dna.cs.byu.edu/psoda/
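Several programs in this part of the list score trees by the parsimony criterion. As a rough illustration of what that score is, the following Python sketch counts the minimum number of state changes on a fixed rooted binary tree using Fitch's algorithm. It is not taken from PSODA or any other program listed here; the nested-tuple tree format, the function names, and the toy alignment are assumptions made for the example.

```python
def fitch_site_score(tree, states):
    """Minimum number of changes for one site on a rooted binary tree.

    tree   -- a leaf name, or a 2-tuple of subtrees
    states -- dict mapping each leaf name to its character state
    """
    changes = 0

    def state_set(node):
        nonlocal changes
        if isinstance(node, str):
            return {states[node]}
        left, right = (state_set(child) for child in node)
        shared = left & right
        if shared:
            return shared
        # No state in common: one change is needed on this pair of branches.
        changes += 1
        return left | right

    state_set(tree)
    return changes


def parsimony_score(tree, alignment):
    """Sum the Fitch score over every column of an aligned set of sequences."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_site_score(tree, {name: seq[i] for name, seq in alignment.items()})
        for i in range(n_sites)
    )


# Hypothetical four-taxon data set, scored on two candidate topologies.
toy_alignment = {"A": "ACG", "B": "ACG", "C": "ATG", "D": "ATA"}
score_ab = parsimony_score((("A", "B"), ("C", "D")), toy_alignment)
score_ac = parsimony_score((("A", "C"), ("B", "D")), toy_alignment)
```

A heuristic search such as the one described above would repeatedly rearrange the tree and keep rearrangements that lower this score; the sketch only covers the scoring step, with equal weights and no ambiguity codes.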

Rod Page (r.page (at) bio.gla.ac.uk), of the Division of Environmental and Evolutionary Biology of the University of Glasgow has released GeneTree, version 1.3.0, a program that produces 'reconciled trees' that fit a tree of gene copies to a species tree. It uses a parsimony criterion where the penalty is the number of deletions and duplications required to reconcile the gene tree with the species tree. The program is described as 'preliminary'. The program is described in the paper: Page, R. D. M. 1998. GeneTree: comparing gene and species phylogenies using reconciled trees. Bioinformatics 14: 819-820, and its algorithm is described in the paper: Page, R. D. M. and M. A. Charleston. 1997. From gene to organismal phylogeny: Reconciled trees and the gene tree/species tree problem. Molecular Phylogenetics and Evolution 7: 231-240. It is available as a Macintosh executable and as an executable for Windows. They are available from the GeneTree web site at http://taxonomy.zoology.gla.ac.uk/rod/genetree/genetree.html. A manual is also available online there.

John Huelsenbeck (johnh (at) berkeley.edu) of the Department of Integrative Biology, University of California, Berkeley released CodonBootstrap version 3, now distributed by Jonathan Bollback. This is a utility that will generate non-parametric bootstrap data sets from a DNA sequence file. The program re-samples codons to (1) avoid problems when analysing data under models that assume coding structure (e.g., rates partitioned by sites), or (2) when the user wishes to re-sample sites and maintain the original autocorrelation among positions within the codon. CodonBootstrap is available as C source code that can be compiled for Unix from Jonathan Bollback's software web page at http://www.simmap.com/bollback/software.html. A Macintosh version that was formerly distributed seems not to be available any more.
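The codon-level resampling described above can be sketched in a few lines of Python. This is an illustration of the general idea only, not CodonBootstrap's implementation; the function name codon_bootstrap, the dictionary input format, and the example sequences are assumptions. Whole codon columns are drawn with replacement, and the same draw is applied to every sequence so that the three positions of each codon stay together.

```python
import random


def codon_bootstrap(alignment, rng=None):
    """Return one bootstrap replicate that resamples whole codon columns.

    alignment -- dict mapping sequence names to aligned, in-frame DNA strings
    rng       -- optional random.Random instance, for reproducible replicates
    """
    rng = rng or random.Random()
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must all have the same aligned length")
    length = lengths.pop()
    if length % 3 != 0:
        raise ValueError("alignment length must be a multiple of 3")
    n_codons = length // 3
    # Draw codon indices with replacement; reuse the draw for every sequence.
    picks = [rng.randrange(n_codons) for _ in range(n_codons)]
    return {
        name: "".join(seq[3 * i:3 * i + 3] for i in picks)
        for name, seq in alignment.items()
    }


# Hypothetical two-sequence alignment of four codons.
example = {"human": "ATGGCCAAATTT", "mouse": "ATGGCTAAGTTC"}
replicate = codon_bootstrap(example, random.Random(1))
```

Passing a seeded random.Random makes a replicate reproducible, which is convenient when the replicates feed a later tree-building step.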

Mark Clement, David Posada, and Keith Crandall of the Universidad Vigo, Spain (Posada) and the Department of Zoology, Brigham Young University, Provo, Utah (dposada (at) uvigo.es) have released TCS version 1.21, a program for estimating gene genealogies within a population. It does so by using the method introduced in the paper: Templeton, A. R., K. A. Crandall and C. F. Sing. 1992. A cladistic analysis of phenotypic associations with haplotypes inferred from restriction endonuclease mapping and DNA sequence data. III. Cladogram estimation. Genetics 132: 619-633. This is a method that connects existing haplotypes in a minimum spanning tree, which is essentially a parsimony method. It can also infer networks with loops in them. TCS is written in Java and has a graphic user interface for the display of the resulting networks. It may be run on any system that has the Java runtime environment. The program is described in the paper: Clement M., D. Posada, and K. Crandall. 2000. TCS: a computer program to estimate gene genealogies. Molecular Ecology 9: 1657-1660. It implements the estimation of the 95% parsimony connection limit, and the estimation of outgroup weights (which are used to designate the root of the tree). It takes as input sequence files in NEXUS or PHYLIP format, and accepts absolute distances between sequences as input. The output is a Postscript picture of the tree, which can be saved as a Postscript file. TCS is available as Java executables, with documentation, at its web site at http://darwin.uvigo.es/software/tcs.html.

David Posada (dposada (at) uvigo.es), of the Universidad Vigo, Spain, Keith Crandall, of the Department of Zoology, Brigham Young University, Provo, Utah (Keith_Crandall (at) byu.edu) and Alan Templeton, of the Department of Biology of Washington University, Saint Louis, Missouri (temple_a (at) biology.wustl.edu) have made available GEODIS (version 2.6). It implements Templeton's method of Nested Clade Analysis, which is intended to distinguish between historical divergence of populations and geographical separation, using the geographical distribution of haplotypes in a genealogy. GEODIS is a Java program which can run on any platform. It is described in a paper: Posada D., K. A. Crandall and A. R. Templeton. 2000. GeoDis: A program for the cladistic nested analysis of the geographical distribution of genetic haplotypes. Molecular Ecology 9: 487-488. It is available at its web site at http://darwin.uvigo.es/software/geodis.html

Jon Jeffery (jon (at) donnasaxby.com), then of the Institute of Biology, Leiden University, The Netherlands has written Parsimov, a series of Perl scripts to implement 'event cracking', a parsimony-based method of finding the minimum number of changes in developmental sequences of events that are necessary to explain the evolution of pairs of characters on a tree. Among the uses of this method is to reconstruct ancestral developmental sequences. The programs include:

  • Parsimv7g.pl which implements event-pair 'parsimony cracking'.
  • ReplacerParsimv.pl which takes a Parsimv7g.pl output file and replaces the PAUP* character numbers with more readable character names according to a user-specified text list.
  • Describe.pl, which creates a PAUP* command file to describe each tree in memory under ACCTRAN and DELTRAN optimizations (saving each as separate log files) plus a Parsimv7g.pl batch file (e.g., ParsBatch.txt) to crack each of the PAUP* log files produced.
The programs can be executed on any system that has Perl installed. They are described in a paper: Jeffery, J.E., O.R.P. Bininda-Emonds, M.I. Coates, and M.K. Richardson. 2005. A new technique for identifying sequence heterochrony. Systematic Biology 54: 230-240. The Parsimov programs are available as (separate) downloads at Olaf Bininda-Emonds's software web page at http://www.uni-oldenburg.de/molekularesystematik/en/34011.html#EvoDevo

David Swofford, of the Center for Evolutionary Genomics, Duke University, Durham, North Carolina, together with Stewart Berlocher of the Department of Entomology of the University of Illinois, Urbana, Illinois wrote Freqpars. It implements parsimony analysis based on gene frequencies. The method was described by D. L. Swofford and S. H. Berlocher in a paper in Systematic Zoology 36: 293-325, 1987. The program is available in FORTRAN 77 source code. The search for most parsimonious trees under Swofford and Berlocher's criterion is not very extensive, Swofford notes, because the individual tree evaluations are computationally difficult. The source code in FORTRAN, with documentation, has been made available (after a period of unavailability) at Swofford's PAUP web site as one of a number of 'companion applications'.


PHYLIP Exercises

PHYLIP (Phylogenetic Inference Package) provides a set of 'classic' phylogeny programs that have been available since 1980 (see the PHYLIP home page).

Unfortunately, in part because they were written in the 1980s, the user interface is quite primitive, and in some ways somewhat hostile. Fortunately, the PHYLIP programs have been repackaged as part of the EMBOSS software package, which provides a much more modern command line interface around the PHYLIP programs. In addition, EMBOSS provides some other very helpful programs for producing files in the correct format.

This workshop will use the EMBOSS programs on interactive.hpc to construct evolutionary trees using protein and DNA sequences. It is possible to run the workshop on hpc, but you will NOT be able to use the EMBOSS versions of the programs.

This series of exercises will be your homework for Wednesday, March 14. Please do the exercises in a new biol4230/hwk6 directory. Though we will do this exercise interactively today, please create a phylip.sh shell script file that shows exactly the steps you used to do the analyses.

  1. Before you can use the EMBOSS programs, you will need to ensure that seqprg/emboss/bin is in your path. Check to see that the EMBOSS programs are in your path by looking at help on one of them. All of the EMBOSS programs have a -help option, which you will need to use to learn how to specify the program input and output file names, and other options.
  2. On interactive.hpc.virginia.edu, copy the files gstm.alib and gstm.nlib from ${SLIB2}/biol4230/data/phylip to a new hwk6 directory.
  3. Align the gstm.alib sequences using muscle (the -stable option ensures that the output alignment is in the same sequence order as the input).

    By default, muscle writes out the result in FASTA format, which you can use to produce the DNA alignment. You may also want to write out the alignment in Clustalw format (option -clw) to look at alignment conservation.

    Looking at either the FASTA or ClustalW format multiple sequence alignment, how many gaps do you see? Do you think a different alignment program would produce a different multiple sequence alignment?

  4. Use the tranalign program to align the protein sequences in gstm.alib to the DNA sequences in gstm.nlib.

    tranalign -asequence gstm.nlib -bsequence gstm.a_aln -outseq gstm.n_aln
    Look at the gstm.n_aln file. Is it in PHYLIP format?

  5. Use the seqret program to reformat the gstm.a_aln and gstm.n_aln alignments from FASTA format into PHYLIP format (gstm.a_phy, gstm.n_phy).
  6. Use the fprotdist program to build a matrix of protein distances from gstm.a_phy.

    Use the fdnadist program to build a matrix of DNA distances from gstm.n_phy.

  7. Use the ffitch and fkitsch programs to build trees from the protein and DNA distance files. For ffitch (but not fkitsch), you should specify an outgroup: -outgrno 19. When you run the program, it will ask for an (optional) -intreefile, which you do not need (or have). Just hit return, or create a file with a blank line in it (not empty, it must have one newline). If you call it 'blank-line.txt', you can supply that file name and the program will run properly.
    1. Looking at the program output (-outfile option), do both the protein distance and DNA distance trees look the same? Do the ffitch and fkitsch trees look the same?
    2. The data set includes several paralogous human, mouse, and rat glutathione S-transferases. Can you identify the mouse/rat orthologs?
    3. Can you identify any mouse/human orthologs? What evolutionary events might cause the human/mouse orthologs to be more difficult to identify?
  8. Use the fprotpars and fdnapars programs to build trees from the protein and DNA alignment files.
  9. Use the fdnaml and fdnamlk programs to build trees from the DNA alignment file.
  10. Use the fconsense program to compare the trees. To do this, you must combine all the tree files you produced with the different programs into one file.
    1. Which parts of the tree are found by all 3 methods on both DNA and protein datasets?
    2. Does one method (distance/parsimony/maximum likelihood) do a better job of assigning mouse/human orthologs than the others?
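To help answer the question in step 4 about whether a file is in PHYLIP format, the following Python sketch shows what the strict sequential PHYLIP layout looks like by building it from FASTA text. In the exercise itself this conversion is done with seqret; the sketch is only an illustration, and the function names (read_fasta, to_phylip) and the example sequences are assumptions. The header line gives the number of sequences and the alignment length, and each following line starts with a name padded or truncated to exactly 10 characters.

```python
def read_fasta(text):
    """Parse FASTA text into a dict of name -> sequence (insertion ordered)."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first word of the header as the sequence name.
            words = line[1:].split()
            name = words[0] if words else "unnamed"
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def to_phylip(records):
    """Format aligned sequences as strict sequential PHYLIP text."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    lines = [f" {len(records)} {lengths.pop()}"]
    for name, seq in records.items():
        # Names occupy exactly 10 columns; longer names are truncated,
        # which can make two names collide, so check them before use.
        lines.append(f"{name[:10]:<10}{seq}")
    return "\n".join(lines) + "\n"


# Hypothetical two-sequence alignment, one record wrapped over two lines.
fasta_text = ">hGSTM1 example human sequence\nMPMI\nLGYW\n>GSTM1_MOUSE\nMPMTLGYW\n"
phylip_text = to_phylip(read_fasta(fasta_text))
```

If the first line of gstm.n_aln starts with '>' rather than two integers, it is still FASTA and needs the seqret step.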
Homework 6 (hwk6), due Wednesday, March 14, at noon, should provide a script, with comments, that does each of the analyses listed above. A second file, answers.txt, should answer the additional questions in parts:

3. Multiple alignment and gaps

7. Are the trees the same, and which are the orthologs

10. Which parts of the tree are consistent, and which method identifies more orthologs
