Programs, modules, toolkits, and packages required to run this pipeline in full. To run the pipeline on Windows, you will need either a Unix-like environment such as Cygwin or a Linux installation alongside Windows. If you do not intend to go through all steps, some of this software may not be needed.

Software Name Description Where to find it Step(s) that require(s) this software
Ubuntu Linux Ubuntu is one of many Linux distributions. An advantage of Ubuntu, and of many other Linux distributions, is that it can be easily installed on and removed from a Windows PC or a Mac without reformatting your hard drive. (Mac OS X or PC)
All (not needed on Mac)
Cygwin Cygwin is a Unix-like environment for Windows that allows you to run most of the Unix-based software described here on a PC. (Windows only) All (not needed on Mac)
Xcode Xcode is a suite of development tools from Apple that includes a modified GNU Compiler Collection (supports basic languages such as C, C++, Python, Perl, etc.). For Mac OS X "Lion", Xcode can be downloaded for free from the App Store. For "Snow Leopard", Xcode ships as part of the "Developer's Tools" CD. If you are using Windows, Python and Perl need to be installed separately (see below). (Mac OS X only)
Xcode 3 or 4
Fortran compilers gcc and g77 Fortran compilers allow you to build your own executable files from source code, which is needed to install most of the software in this list. (Intel Macs)
(Linux or Windows)
See specific packages for your chosen Windows emulator or Linux version
Git Git allows you to easily download and install software through the Terminal interface. (Mac OS X or Windows)
Text editor GUI text editors are recommended for editing scripts. Some editors support certain programming languages and highlight/color text so it can be easily interpreted for debugging purposes. TextWrangler (Mac OS X only)
Notepad++ (Windows only)
Python 2.7 Python is a programming language. It may be included in your Xcode download. To check whether Python is installed, type python in Terminal and press Enter. The first line of output should tell you which version is installed. If this does not start the interactive interpreter, you must install Python. (Mac OS X, Linux, or Windows)
Python 2.7.x
Raw data post-processing; BLAST, Gene annotation
Biopython Biopython is a set of tools for biological computation, all written in the Python programming language. It provides libraries and applications for a variety of bioinformatics tasks. See the Biopython website for more information. (Mac OS X, Linux, or Windows)
Raw data post-processing; BLAST, Gene annotation
NumPy NumPy (Numerical Python) is a Python module relevant to Biopython. NumPy comes pre-installed with Python 2.7 on Mac OS X "Lion". (Mac OS X, Linux, or Windows)
Raw data post-processing; BLAST, Gene annotation
SciPy SciPy (Scientific tools for Python) is another module relevant to Biopython. (Mac OS X, Linux, or Windows)
Raw data post-processing; BLAST, Gene annotation
Perl Perl is a programming language. It may be included in your Xcode download. To check whether Perl is installed, type perl -v in Terminal and press Enter; this should print the installed version. If the command is not found, you must install Perl. (Mac OS X, Linux, or Windows)
Mapping reads to reference
BioPerl BioPerl is a module for biological computations in the Perl programming language. (Mac OS X, Linux, or Windows)
Parsing BLAST output.
Java Java is a programming language used by many of the programs referenced below. You may already have it installed on your computer. To check whether Java is installed, type java -version in Terminal. If Java is not installed, the computer should prompt you to install it. (Mac OS X)
Applications/Utilities/Java Preferences.
(Linux or Windows)
Alignment processing and variant detection
Apache Ant Apache Ant is a Java library called on by some software in this protocol. Check whether it is pre-installed by typing ant -version in a Terminal window. If it is not installed, you need to install it. (Mac OS X, Linux, or Windows)
SNP detection
R R is a software environment for statistical computing and graphics. It has its own language and syntax as well as its own environment, all of which are included in the software package. R is a very powerful program with applications that extend far beyond genomics. For more information about R, see the R Project website. (Mac OS X, Linux, or Windows)
Expression count
FASTX-Toolkit FASTX-Toolkit is a FASTQ short-reads pre-processing toolkit. It contains programs for quality-trimming and clipping, computing quality statistics, converting files, reverse-complementing, splitting barcodes, and more. (Mac OS X or Linux)
Raw data post-processing
CLC Genomics Workbench CLC Genomics Workbench is a platform for genomic analysis. It can perform a variety of tasks, but we use CLC only for creating de novo assemblies. Other programs can also do de novo assemblies, but they may require more memory than your computer has. Read more about what CLC Genomics Workbench can do on the vendor's website. (Mac OS X or Windows)
CLC Genomics Workbench ($4,995)

CLC Genomics 2 week free trial (Mac OS X or Windows)
De novo assembly
BWA Burrows-Wheeler Aligner (BWA) is a program that aligns the short nucleotide sequences in one file (usually the reads from one individual) to a reference sequence (usually a whole genome or, in our case, a de novo assembly). Other programs can also do alignments (such as CLC), but BWA is freely available and produces reliable results. (Mac OS X or Linux)
Mapping reads to reference
BLAST+ 2.2.25 Local BLAST allows you to run searches on databases downloaded onto your own computer. The latest releases of both the software and the NCBI databases (e.g. nr) can be downloaded from NCBI. (Mac OS X, Linux, or Windows)
BLAST, Gene annotation
UniProt Knowledgebase UniProt/SwissProt contains information about protein sequences and UniProt ID tags in a curated database, which can be used for functional analyses. UniProtKB/Swiss-Prot (download FASTA format) BLAST, Gene annotation
nr database nr is a large database containing all non-redundant GenBank protein translations. (download nr.gz)
BLAST, Gene annotation
DESeq DESeq is a package for the R software environment that analyzes count data from alignment files to test for differential expression. DESeq must be installed through R. (Mac OS X, Linux, or Windows: must be downloaded via R)
Test for differential expression
ErmineJ ErmineJ allows for analysis of GO (Gene Ontology) categories for RNA-Seq data to test for overrepresented biological pathways. ErmineJ requires Java >=1.5 in order to run properly. (Mac OS X, Linux, or Windows)
Functional enrichment analysis
SAMtools SAMtools is a software package that allows you to manipulate and view .sam and .bam files. SAMtools (UNIX-based OS)
Alignment processing
Picard Tools Picard Tools is a Java-based program for handling .sam and .bam files. Picard Tools (UNIX-based OS)
Alignment processing
Genome Analysis Toolkit GATK is a software package developed by the Broad Institute for the analysis of genomic data. We use it specifically for variant detection. GATK (UNIX-based OS)
Variant detection
EigenSoft EigenSoft is a software package for Principal Components Analysis. The software available online only works in Linux and must be re-compiled in order to run on other systems (a Mac executable is available in the SFG scripts repository). (Linux only) Genotype analysis
BayeScan BayeScan is a GUI program for detecting candidate SNPs under selection in a genomic dataset by analyzing differences in allele frequencies between populations. There are other programs (such as Lositan Selection Workbench) which can also perform FST outlier tests. (Mac OS X, Linux, or Windows)
FST outlier tests