
  • Informatics


    In a project of this scope, integrating efficient data capture and analysis systems is essential, both for the smooth running of the project and for getting the most out of the analysis. This work breaks down into the following areas:

    • Coordination of mapping and sequencing

    • Support for sequence determination

    • Sequence annotation

    • Comparative sequence analysis

    • Downstream Integration

      Within the lifetime of this project we will be designing specialist tools and software, especially in the area of comparative analysis.

      Coordination of mapping and sequencing

      Clones for the sequence-ready maps will be fingerprinted at the laboratory responsible for mapping each region, using the Image and FPC software; a minimal tiling path of clones can then be selected within FPC for sequencing.
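      FPC's own tiling-path procedure is not reproduced here. As a minimal sketch of the underlying idea, assuming each fingerprinted clone has already been placed on its contig as an interval, a greedy interval cover picks the fewest clones that span the contig end to end. The `Clone` tuple, the map coordinates and the `min_overlap` parameter are hypothetical.

```python
from typing import List, Tuple

# Hypothetical clone placement on a fingerprint contig:
# (clone_name, start, end) in arbitrary map units.
Clone = Tuple[str, int, int]


def minimal_tiling_path(clones: List[Clone], min_overlap: int = 0) -> List[Clone]:
    """Greedily choose the fewest clones covering the whole contig.

    Each chosen clone must overlap the previous one by at least
    `min_overlap` units (a hypothetical parameter). Raises ValueError
    if the clones leave a gap in coverage.
    """
    if not clones:
        return []
    ordered = sorted(clones, key=lambda c: c[1])
    contig_start = ordered[0][1]
    contig_end = max(c[2] for c in ordered)

    path: List[Clone] = []
    covered_to = contig_start  # right edge of the path built so far
    i = 0
    while covered_to < contig_end:
        # Candidates must start early enough to overlap the current path;
        # among them, take the one reaching furthest to the right.
        limit = covered_to - min_overlap if path else contig_start
        best = None
        while i < len(ordered) and ordered[i][1] <= limit:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"gap in clone coverage after position {covered_to}")
        path.append(best)
        covered_to = best[2]
    return path


clones = [("A", 0, 100), ("B", 20, 60), ("C", 80, 200),
          ("D", 90, 150), ("E", 190, 300)]
print([c[0] for c in minimal_tiling_path(clones, min_overlap=5)])
```

      Clones not chosen at one step never need revisiting: they end no further right than the clone that was chosen, so they cannot extend the path later.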

      Clones from the assembled contigs will be registered with the NCBI clone-registry in order to avoid duplication of effort with other sequencing centres.

      Support for sequence determination

      Members of the Hinxton Sequencing Consortium will use the same software systems for assembly and finishing, namely the PHRAP, GAP and CAF packages.

      During the sequencing process, assembled contigs longer than 2 kb will be made available by FTP and for sequence searching, and will be submitted automatically to the HTG division of EMBL/GenBank, following the Bermuda agreements for large-scale genomic sequence determination.
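      The FTP release and HTG submission steps are not shown. The sketch below covers only the length rule, assuming assembled contigs arrive as FASTA text and taking 2 kb as 2,000 bases; the function names and the `MIN_RELEASE_LENGTH` constant are hypothetical.

```python
from typing import Dict, Iterator, Tuple

MIN_RELEASE_LENGTH = 2000  # "longer than 2 kb", taking 1 kb = 1,000 bases


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # The contig name is the first word of the header line.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def releasable_contigs(text: str, min_length: int = MIN_RELEASE_LENGTH) -> Dict[str, str]:
    """Return only contigs strictly longer than min_length, keyed by name."""
    return {name: seq for name, seq in read_fasta(text) if len(seq) > min_length}
```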

      Sequence annotation

      Automated sequence annotation will be carried out not only on finished sequence but also on partially finished sequence:

    • Unfinished Sequence

      This will be annotated using the Ensembl system developed at the Sanger Centre by the Annotation Group. Ensembl is a joint EMBL-EBI and Sanger Centre project; see http://ensembl.ebi.ac.uk for more information.

    • Finished Sequence

      Once the finished sequence of an individual clone has been determined, the primary annotation process runs computational analyses and database searches to identify the biological content of the sequence, after which the sequence is submitted to the main EMBL nucleotide database.

      Our explanation of this process is available at http://www.sanger.ac.uk/HGP/Humana/

      Comparative sequence analysis

      Current strategies for sequence annotation are not optimised for using homologous genomic sequence. However, we expect comparison of the corresponding mouse and human sequences to be the most valuable source of information in the current programme, in particular for confirming gene structures and identifying potential regulatory sequences.

      Various computer programs and software systems have been, or are being, developed for the analysis of this type of data, e.g. PIP and ALFRESCO. We propose to undertake further work in this area, in particular assessing how suitable the available methods are for large-scale automated application, and integrating them into the standard analysis workflow and database systems.
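      Neither PIP nor ALFRESCO is reproduced here. As a rough illustration of the kind of signal such tools display, the sketch below computes percent identity in sliding windows over an existing gapped mouse/human alignment and flags windows above a threshold. The window size, step and 70% threshold are illustrative assumptions, and PIP itself works from local alignments rather than a fixed sliding window.

```python
from itertools import accumulate
from typing import List, Tuple


def windowed_identity(aln_a: str, aln_b: str, window: int = 100,
                      step: int = 1) -> List[Tuple[int, float]]:
    """Percent identity in sliding windows over a gapped pairwise alignment.

    Returns (start_column, percent_identity) for each full window.
    Columns where either sequence has a gap ('-') count as mismatches.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    matches = [1 if a == b and a != "-" else 0
               for a, b in zip(aln_a.upper(), aln_b.upper())]
    # Prefix sums give each window's match count in constant time.
    prefix = [0] + list(accumulate(matches))
    return [(start, 100.0 * (prefix[start + window] - prefix[start]) / window)
            for start in range(0, len(matches) - window + 1, step)]


def conserved_windows(profile: List[Tuple[int, float]],
                      threshold: float = 70.0) -> List[int]:
    """Start columns of windows at or above the identity threshold."""
    return [start for start, pid in profile if pid >= threshold]
```

      Windows flagged this way are only candidates; distinguishing coding exons from potential regulatory elements would still require the annotation described above.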

      Downstream Integration

      To gain the maximum utility from finished and annotated sequence, it will be essential to view it in the context of other experimental and genetic information from the corresponding mouse and human chromosomal regions. We propose to develop targeted data integration tools that help to integrate a number of different data sets and to discover links between them, for example:

    • Mouse mutagenesis - detailed phenotype data derived from genome-wide and targeted mutagenesis programmes worldwide.

    • Expression profiles - microarray, in situ and other forms of gene expression data deposited in the Mouse Gene Expression Database.
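      The integration tools themselves are still to be designed. The sketch below shows only the simplest form of linking, assuming each data set can be reduced to hypothetical flat records of (gene symbol, description). It attaches phenotype and expression records to the genes annotated in a finished region, matching symbols case-insensitively as a crude stand-in for proper mouse/human orthology mapping.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical flat record: (gene_symbol, free-text description).
Record = Tuple[str, str]


def link_by_gene(annotated_genes: Iterable[str],
                 phenotypes: Iterable[Record],
                 expression: Iterable[Record]) -> Dict[str, Dict[str, List[str]]]:
    """Attach phenotype and expression records to annotated genes.

    Symbols are compared case-insensitively; records for genes outside
    the annotated region are ignored.
    """
    linked: Dict[str, Dict[str, List[str]]] = {
        g.upper(): {"phenotype": [], "expression": []} for g in annotated_genes
    }
    for source, records in (("phenotype", phenotypes), ("expression", expression)):
        for symbol, description in records:
            key = symbol.upper()
            if key in linked:
                linked[key][source].append(description)
    return linked


# Placeholder records, not real data.
region_genes = ["GeneA", "GeneB"]
pheno = [("GeneA", "phenotype record 1"), ("GeneZ", "phenotype record 2")]
expr = [("GENEA", "in situ record 1"), ("GeneB", "microarray record 1")]
print(link_by_gene(region_genes, pheno, expr))
```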