Bioinformatics Applications and Databases

This page provides information about the existing bioinformatics applications available on the NGS and the bioinformatics databases available at RAL, Oxford, Leeds and Glasgow-Scotgrid.

The NGS at RAL would like to thank the researchers at the Institute of Grasslands Research (IGER) at Aberystwyth and at the NERC Environmental Bioinformatics Centre at Oxford University, notably Dr Bela Tiwari, for their guidance in setting up these databases on the NGS.

These discussions were partly funded by a BBSRC grant "Supporting Bioinformatics Research on the NGS" which ended in Sept 2007.

Applications

For more information and examples on how to run the applications below, click on the name of the relevant application.

  • MrBayes - Bayesian estimation of phylogeny
  • BEAST - Bayesian MCMC analysis of molecular sequences
  • NCBI BLAST Toolbox
  • NCBI BLAST - Rapid searching of nucleotide and protein databases
  • mpiBlast - Parallelization of NCBI BLAST (deprecated)
  • EMBOSS - Suite of bioinformatics applications, e.g. for sequence analysis and enzyme kinetics
  • EXONERATE - Pairwise sequence comparison
  • FASTA - Sequence homology search
  • GROMACS - Molecular dynamics
  • NAMD - Molecular dynamics
  • SIESTA - Electronic structure calculations

Databases

The following databases are hosted at STFC/RAL, Oxford OERC, Leeds NGS and Glasgow Scotgrid:

EBI EMBL NUCLEOTIDE (release 102 (Dec 2009) and daily updates)
See the release notes file, relnotes.txt, for a complete description.

  • FASTA format
    Location: ${DB}/EBI_NUCLEOTIDE_DB/fasta_DB/
    Description: The files retain the names they have in the mirror site (i.e. according to data class and taxonomic division), with the suffix "em_rel" for the quarterly release and the suffix "em_cum" for the updates. More details can be found in: EMBL Nucleotide Sequence data files in FASTA format
  • BLAST format
    Location: ${DB}/EBI_NUCLEOTIDE_DB/blast_DB/
    Description: The files are named after data class and taxonomic division (e.g. est_env). The updates have the suffix "_upd", e.g. "est_inv_upd". For more details, please read: EMBL Nucleotide Sequence data files in BLAST format
  • MPI-BLAST format
    Location: ${DB}/EBI_NUCLEOTIDE_DB/mpi-blast_DB/
    Description: The files are split by data class. For more details, please read: EMBL Nucleotide Sequence data files in MPI-BLAST format

EBI Uniprot Knowledgebase PROTEIN (latest update)

  • FASTA format
    Location: ${DB}/EBI_PROTEIN_DB/fasta_DB/
    Description: For a list of the protein sequence files, please refer to: EBI Uniprot Protein files
  • BLAST format
    Location: ${DB}/EBI_PROTEIN_DB/blast_DB/
  • MPI-BLAST format
    Location: ${DB}/EBI_PROTEIN_DB/mpi-blast_DB/

PROSITE (latest update)

  • PROSITE uncompressed files
    Location: ${DB}/PROSITE_DB/
    Description: Retrieved from this site.
  • PROSITE integrated in EMBOSS
    Location: ${EMBOSS}/PROSITE/

PRINTS (latest update)

  • PRINTS uncompressed files
    Location: ${DB}/PRINTS_DB/
    Description: Retrieved from this site.
  • PRINTS integrated in EMBOSS
    Location: ${EMBOSS}/PRINTS/

REBASE (latest update)

  • REBASE uncompressed files
    Location: ${DB}/NEB_REBASE_DB/
  • REBASE integrated in EMBOSS
    Location: ${EMBOSS}/REBASE/
    Description: Retrieved from this site.

Here ${DB} stands for the local location of the database files, e.g. /var/data/bioinformatics/db/ at RAL. The following instructions can be applied to a data file containing one or more sequences.

  • To convert an EMBL (or non-EMBL) format data file into a 'FASTA' formatted file:
    /usr/ngs/EMBOSS seqret "Pathname of a data file" "Pathname of output file" -osformat fasta

    [N.B. Make sure that your sequence data file can be read by 'seqret'.]

  • To convert a 'FASTA' formatted file containing one or many sequences into a 'BLAST 2' database:
    • Create a file called '.formatdbrc' in the current directory, containing:
      [NCBI]
      Data=/usr/local/applications/bioinformatics/ncbi/data
      
    • Run the command:
      /usr/ngs/BLAST-TOOLBOX-NCBI formatdb -i "Pathname of a FASTA file" \
      -o T -p F -n "Base name for database files" -t "Title for database file" \
      -v "Size of database (in millions of letters)" \
      -l "Pathname of a log file"
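
The two conversion steps above can be chained in a small shell script. The following is a minimal sketch, not an NGS-supported tool: the function names, the DRY_RUN switch and the example file names are assumptions made for illustration. Only the wrapper paths and the command-line options come from the instructions above, and the -n and -v options of formatdb are left out for brevity. By default the script prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: convert a sequence file to FASTA, then build a BLAST database from it.
# Assumptions (illustrative only): helper names, DRY_RUN switch, example file names.

# Wrapper scripts quoted in the instructions above.
EMBOSS_WRAPPER="/usr/ngs/EMBOSS"
BLAST_WRAPPER="/usr/ngs/BLAST-TOOLBOX-NCBI"

# With DRY_RUN=1 (the default here) commands are printed instead of executed,
# so the sketch can be inspected on a machine without the NGS software.
DRY_RUN="${DRY_RUN:-1}"

# run CMD ARGS...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# to_fasta INPUT OUTPUT: step 1, convert a seqret-readable file to FASTA.
to_fasta() {
    run "$EMBOSS_WRAPPER" seqret "$1" "$2" -osformat fasta
}

# make_blast_db FASTA TITLE LOG: step 2, format a nucleotide FASTA file (-p F)
# as a BLAST database, parsing the sequence identifiers (-o T).
make_blast_db() {
    run "$BLAST_WRAPPER" formatdb -i "$1" -o T -p F -t "$2" -l "$3"
}

# Example with hypothetical file names.
to_fasta my_sequences.embl my_sequences.fasta
make_blast_db my_sequences.fasta my_sequences formatdb.log
```

Setting DRY_RUN=0 executes the commands, which requires the NGS wrapper scripts to be present and the '.formatdbrc' file described above to be in place. Note that the dry-run output does not preserve shell quoting, so arguments containing spaces will look unquoted when printed.
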
Contact

If you have any difficulties using the above software and databases, would like more information, or want to tell us about other software applications you would like to use, please contact the NGS Helpdesk.

 

Applications Support

The NGS cannot offer scientific support for applications. However, if you require further information or believe there is something wrong with the installation, please contact the NGS support centre.

Acknowledgements

Please note: When publishing work based on use of the NGS, users should acknowledge both the authors of any programs used (see the individual program web sites, or contact the authors directly) and the NGS directly using the following line:
"The authors would like to acknowledge the use of the UK National Grid Service in carrying out this work"
This line must also accompany any use of the NGS logos.