Data extraction

Intronless genes were extracted from the human subdivision of GenBank release 183 according to the CDS feature. A Perl script, which we call GPS (GenBank Parser Script), was used to identify the genomic positions of intronless genes and to extract their sequences from the assembled human chromosome sequences. An entry was selected if it met all of the following criteria:
(i) The entry contains a CDS field in its features.
(ii) The CDS line contains a continuous span of bases, indicated by two numbers separated by two periods (for example, 23..179).
(iii) The LOCUS line contains the word DNA.
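
The three criteria above can be illustrated with a short sketch. This is not the GPS script itself (which was written in Perl and is not reproduced here); it is a minimal Python approximation that assumes each entry is available as GenBank flat-file text. The function name `continuous_cds_spans` and the sample record are hypothetical, and locations written with `complement(...)`, `join(...)`, or partial markers (`<`, `>`) are deliberately treated as not matching criterion (ii), following its literal wording.

```python
import re

# Criterion (ii): a CDS feature line whose location is exactly two numbers
# separated by two periods, e.g. "     CDS             23..179".
SIMPLE_CDS = re.compile(r"^\s+CDS\s+(\d+)\.\.(\d+)\s*$")


def continuous_cds_spans(record_text):
    """Return (start, end) pairs for every continuous CDS span in one entry.

    An empty list means the entry fails at least one of the three criteria.
    Hypothetical helper for illustration only, not the original GPS logic.
    """
    lines = record_text.splitlines()

    # Criterion (iii): the LOCUS line must contain the word DNA.
    locus = next((ln for ln in lines if ln.startswith("LOCUS")), "")
    if "DNA" not in locus.split():
        return []

    # Criteria (i) and (ii): keep only CDS lines with a single continuous span.
    spans = []
    for ln in lines:
        match = SIMPLE_CDS.match(ln)
        if match:
            spans.append((int(match.group(1)), int(match.group(2))))
    return spans


# Made-up sample entry used only to exercise the function.
SAMPLE_RECORD = """LOCUS       TEST0001                 300 bp    DNA     linear   PRI 01-JAN-2000
FEATURES             Location/Qualifiers
     source          1..300
     CDS             23..179
                     /gene="demo"
ORIGIN
//"""

print(continuous_cds_spans(SAMPLE_RECORD))  # [(23, 179)]
```

Entries that pass such a filter are only candidates: a continuous CDS does not rule out introns elsewhere in the gene, which is why a curation step is still needed.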

Data curation

Because the extraction procedure above can capture intronless CDSs that belong to multi-exonic genes, the extracted sequences were subjected to manual curation. Using the entry names, we searched the Entrez Gene database for the individual GenBank records to obtain the most up-to-date annotation of each entry.

Data classification

The entries were classified according to their RefSeq annotation status (Pruitt et al., 2009).
They were also classified according to their functional class in Gene Ontology (GO).

Data update

IGD is updated manually once a year by checking the structure of each gene against the Entrez Gene database using its accession number. Entrez Gene itself is updated daily, and these updates are available through the Gene link on each record. They reflect sequence or annotation changes, including complete re-annotation of a genome.


* Pruitt,K.D. et al. (2009) NCBI Reference Sequence (RefSeq): current status, policy and new initiatives. Nucleic Acids Res., 37, D32-D36.
* Sakharkar,M.K. and Kangueane,P. (2004) Genome SEGE: a database for intronless genes in eukaryotic genomes. BMC Bioinformatics, 5, 67.
* Sakharkar,M.K. et al. (2002) SEGE: a database on intronless/single exonic genes from eukaryotes. Bioinformatics, 18, 1266-1267.

Copyright 2019 Intronless Gene Database. All rights reserved
Please address comments, questions or suggestions regarding this website to: