Databases and How to Use Them
- What databases can be searched at CBI?
- What is the difference between dbEST and the EST division of GenBank/EMBL?
- What is the best way of searching for a sequence in the databases by keyword?
- How can I submit a sequence to EMBL/GenBank?
- Is it true that a sequence database entry never has multiple LOCUS (GenBank, etc.) or ID (EMBL, etc.) names?
- What is the mechanism by which multiple accession numbers get assigned to one sequence?
- What are the current and new accession number formats?
What databases can be searched at CBI?
Please go to SRS (the Sequence Retrieval System).
What is the difference between dbEST and the EST division of GenBank/EMBL?
The sequences and accession numbers are identical in both. The dbEST database, however, contains additional annotation not found in GenBank/EMBL, such as information on the clone library, the sequencing method, the map location (if known), sources for obtaining the physical clone, and the results of BLAST sequence similarity searches.
What is the best way of searching for a sequence in the databases by keyword?
How can I submit a sequence to EMBL/GenBank?
Go to WWW submission of nucleotide sequences to EMBL or GenBank.
Is it true that a sequence database entry never has multiple LOCUS (GenBank, etc.) or ID (EMBL, etc.) names?
In any particular release there should be one unique LOCUS name for an entry. However, there is no guarantee that an entry's LOCUS or ID name will remain constant from release to release; that is, the LOCUS name is not the primary reference for a sequence. The accession number is the unique identifier for a sequence data entry and the only identifier guaranteed to remain constant. Essentially the same entry may have different LOCUS (ID) names in EMBL and GenBank, but its accession number will be the same in both databank releases.
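A minimal Python sketch of the practical consequence: a local copy of the data is best keyed on accession number rather than LOCUS name, so that renamed entries are still found after a new release. The Entry class, the index_by_accession helper, and the LOCUS names HSOLDNAME/HSNEWNAME are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    accession: str   # stable identifier, the same across releases and databanks
    locus: str       # LOCUS (GenBank) or ID (EMBL) name; may change between releases
    definition: str


def index_by_accession(entries):
    """Build a lookup table keyed on accession number, not on LOCUS name."""
    index = {}
    for entry in entries:
        if entry.accession in index:
            raise ValueError(f"duplicate accession number: {entry.accession}")
        index[entry.accession] = entry
    return index


# Hypothetical data: the same entry in two releases, with a renamed LOCUS.
release_1 = index_by_accession([Entry("A00001", "HSOLDNAME", "example sequence")])
release_2 = index_by_accession([Entry("A00001", "HSNEWNAME", "example sequence")])

# The accession number still finds the entry after the LOCUS name changed.
print(release_1["A00001"].locus, "->", release_2["A00001"].locus)
# prints: HSOLDNAME -> HSNEWNAME
```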
What is the mechanism by which multiple accession numbers get assigned to one sequence?
If an entry is revised, e.g., when smaller entries are merged into a larger stretch of contiguous sequence, the new entry is given a new accession number (the "primary" accession number, the first one after ACCESSION in that field), and the primary accession number(s) of the earlier entry or entries are entered as secondary accession numbers in the new entry. Secondary accession numbers are all those that follow the primary accession number on the ACCESSION line.
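The following Python sketch models the rule just described for GenBank-style ACCESSION lines, where accession numbers are separated by whitespace: the first number is primary and the rest are secondary. The function names and the accession numbers A00001 to A00003 are hypothetical. It assumes one ACCESSION line per entry, treats every whitespace-separated token as a single accession number, and does not handle EMBL's semicolon-separated AC lines.

```python
def merged_accession_line(new_primary, old_primaries):
    """Build the ACCESSION line of a revised entry: the new primary accession
    first, then the primary accessions of the earlier entries as secondaries."""
    # "ACCESSION" padded to 12 columns, as in GenBank flat files.
    return "ACCESSION".ljust(12) + " ".join([new_primary, *old_primaries])


def parse_accession_line(line):
    """Split an ACCESSION line into (primary, [secondary, ...])."""
    fields = line.split()
    if len(fields) < 2 or fields[0] != "ACCESSION":
        raise ValueError(f"not an ACCESSION line: {line!r}")
    primary, *secondary = fields[1:]
    return primary, secondary


def secondary_to_primary(accession_lines):
    """Map each secondary accession number to the primary that superseded it."""
    mapping = {}
    for line in accession_lines:
        primary, secondary = parse_accession_line(line)
        for acc in secondary:
            mapping[acc] = primary
    return mapping


# Hypothetical revision: entries A00001 and A00002 merged into a new entry A00003.
line = merged_accession_line("A00003", ["A00001", "A00002"])
print(line)                          # ACCESSION   A00003 A00001 A00002
print(parse_accession_line(line))    # ('A00003', ['A00001', 'A00002'])
print(secondary_to_primary([line]))  # {'A00001': 'A00003', 'A00002': 'A00003'}
```

A lookup built this way lets an old accession number that is now only secondary be redirected to the entry that currently carries it.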
What are the current and new accession number formats?
Currently, accession numbers used by the nucleotide sequence databases consist of one prefix letter followed by 5 digits, 6 characters in total (e.g., A00001). EST projects and projects to add patent data have accelerated the need to extend the accession number space. It is projected that the databases will run out of accession numbers within 8 to 10 months.
A new form of accession number will be created, defined as an 8-character alphanumeric string consisting of two upper-case letters followed by six digits (e.g., SR004562). Leading and trailing zeros are significant. The letter 'O' will not be used.
Existing 6-character accession numbers will remain as they are, and will never be transformed to an 8-character form.
New accession numbers will not be used before February 1, 1996. DDBJ/EMBL/GenBank agree to avoid using new accession numbers for as long as possible after that date.
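As an illustration of the two formats, here is a small Python check that classifies a string as an old-style (one letter, five digits) or new-style (two letters, six digits) accession number. It is a sketch under stated assumptions: the function name accession_format is hypothetical, the "no letter O" rule is applied to both letters of the new prefix only, and accession numbers are kept as strings throughout, because their leading zeros are significant and would be lost if the digits were converted to an integer.

```python
import re

# Current format: one prefix letter followed by 5 digits (6 characters, e.g. A00001).
OLD_FORMAT = re.compile(r"[A-Z][0-9]{5}")
# New format: two upper-case letters followed by 6 digits (8 characters, e.g. SR004562).
# Assumption: the letter 'O' is excluded from both prefix letters.
NEW_FORMAT = re.compile(r"[A-NP-Z]{2}[0-9]{6}")


def accession_format(acc):
    """Return "old", "new", or None for a candidate accession number.

    The value is matched as a string, never converted to a number, so the
    significant leading zeros (as in SR004562) are preserved. Old-style
    numbers are never padded or rewritten into the 8-character form.
    """
    if OLD_FORMAT.fullmatch(acc):
        return "old"
    if NEW_FORMAT.fullmatch(acc):
        return "new"
    return None


for candidate in ["A00001", "SR004562", "SO004562", "A0001"]:
    print(candidate, accession_format(candidate))
# A00001 old
# SR004562 new
# SO004562 None
# A0001 None
```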
Copyright © 1996-2008,