Role: The SEQUENCE class represents all types of sequence information that may be associated with map markers, except primer sequences, which have their own (simplified) data type. Rather than storing the sequence text as part of each marker record, all sequences are stored in a separate table and associated with the appropriate map marker through a join operation. Only one SEQUENCE record of a given seq_type can be associated with a given map marker. Other restrictions may apply as well; e.g., you would not expect to associate a sequence with a RAPD.
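The separate-table arrangement and the one-record-per-seq_type rule can be sketched with an in-memory SQLite database. This is an illustration only, not the project's actual schema: the table names `cloned_gene` and `sequence_record`, the `name` column, the sample values (including the placeholder accession string), and the choice of SQLite are all assumptions. The remaining column names follow the attribute list in this section.

```python
import sqlite3

# In-memory database used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cloned_gene (zdb_id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE sequence_record (
        marker_id   TEXT NOT NULL REFERENCES cloned_gene(zdb_id),
        seq_type    TEXT NOT NULL
                    CHECK (seq_type IN ('protein', 'genomic', 'cDNA', 'EST')),
        sequence    TEXT NOT NULL,
        genbank_num TEXT NOT NULL,
        -- at most one record of a given seq_type per marker
        UNIQUE (marker_id, seq_type)
    );
""")

# Made-up sample rows; 'PLACEHOLDER-ACC' is not a real accession number.
conn.execute("INSERT INTO cloned_gene VALUES ('GENE-1', 'example gene')")
conn.execute(
    "INSERT INTO sequence_record VALUES "
    "('GENE-1', 'cDNA', 'ATGGCCTAA', 'PLACEHOLDER-ACC')"
)


def sequences_for(marker_id):
    """Fetch a marker's sequences through a join on marker_id."""
    rows = conn.execute(
        "SELECT g.name, s.seq_type, s.sequence "
        "FROM cloned_gene AS g "
        "JOIN sequence_record AS s ON s.marker_id = g.zdb_id "
        "WHERE g.zdb_id = ?",
        (marker_id,),
    )
    return rows.fetchall()


def duplicate_is_rejected():
    """Try to add a second cDNA record for the same marker."""
    try:
        conn.execute(
            "INSERT INTO sequence_record VALUES "
            "('GENE-1', 'cDNA', 'ATGTAA', 'PLACEHOLDER-ACC-2')"
        )
    except sqlite3.IntegrityError:
        return True
    return False
```

Here the uniqueness rule is enforced by the database itself through the `UNIQUE (marker_id, seq_type)` constraint; it could equally be checked in application code.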
How/Who enters: Currently known SEQUENCES are initially entered by the DB_Admin as part of entering the information for known map markers. In the future, authorized submitters are expected to add/update SEQUENCES in the course of describing various map markers.
Security: Only authorized submitters are allowed to add new SEQUENCE records to the database; as always, submissions are tagged with the time and the identity of the submitter. Additions to related_pubs are allowed, as usual, but all other fields can be changed only by DB_Admin once the submission has been made.
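The update rule above can be sketched as a small permission check. This is a sketch under assumptions: the role strings `"DB_Admin"` and `"submitter"`, the function name `may_apply_update`, and the reading that any authorized submitter may append to related_pubs are hypothetical and are not specified in this section.

```python
# Hypothetical role names; the section does not define how roles are stored.
KNOWN_ROLES = {"DB_Admin", "submitter"}


def may_apply_update(user_role, field, old_value, new_value):
    """Return True if the proposed change to an existing record is allowed."""
    if user_role not in KNOWN_ROLES:
        return False
    if user_role == "DB_Admin":
        # DB_Admin may change any field after submission.
        return True
    if field == "related_pubs":
        # Additions only: every existing publication must still be present.
        return set(old_value) <= set(new_value)
    # All other fields are frozen for submitters once submitted.
    return False
```

For example, a submitter changing related_pubs from `["pub1"]` to `["pub1", "pub2"]` is accepted, while replacing `["pub1"]` with `["pub2"]` is refused because it removes an existing entry.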
| Superclasses: | GENERIC_OBJECT | |
| | DATA_ITEM | comments, primary_pub, related_pubs, record_status |
marker_id (CLONED_GENE, REQUIRED). Contains the zdb_id of the CLONED_GENE which this SEQUENCE record describes.
seq_type (closed list {protein, genomic, cDNA, EST}, REQUIRED). The type of sequence stored in this SEQUENCE record.
sequence (TEXT, REQUIRED). The actual sequence, i.e., a string of bases (or of amino acid residues when seq_type is protein).
genbank_num (TEXT, REQUIRED). Contains the GenBank accession number for the sequence. The fact that this field is required implies that we accept only sequences that have been submitted to GenBank.
start_codon (integer, OPTIONAL). Integer that locates the start codon within the sequence.
stop_codon (integer, OPTIONAL). Integer that locates the stop codon within the sequence.
introns (TEXT, OPTIONAL). Contains a comma-separated list of integers that describe the intron locations in the sequence.
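As a rough illustration of the attribute list above, the record could be modeled as a Python dataclass that checks the REQUIRED fields and the closed list for seq_type, and parses the comma-separated introns string. The class name `SequenceRecord`, the method `intron_positions`, and the sample values are assumptions made for this sketch. The section does not say whether positions are 0-based or 1-based, so the code does not interpret them.

```python
from dataclasses import dataclass
from typing import List, Optional

# Closed list of allowed seq_type values, as given in the attribute list.
SEQ_TYPES = {"protein", "genomic", "cDNA", "EST"}


@dataclass
class SequenceRecord:
    marker_id: str                      # zdb_id of the CLONED_GENE described
    seq_type: str                       # one of SEQ_TYPES
    sequence: str                       # the sequence text itself
    genbank_num: str                    # GenBank accession number
    start_codon: Optional[int] = None
    stop_codon: Optional[int] = None
    introns: Optional[str] = None       # comma-separated list of integers

    def __post_init__(self):
        # REQUIRED attributes must be present and non-empty.
        for name in ("marker_id", "seq_type", "sequence", "genbank_num"):
            if not getattr(self, name):
                raise ValueError(f"{name} is REQUIRED")
        if self.seq_type not in SEQ_TYPES:
            raise ValueError(f"seq_type must be one of {sorted(SEQ_TYPES)}")

    def intron_positions(self) -> List[int]:
        """Parse the introns attribute into a list of integers."""
        if not self.introns:
            return []
        return [int(part) for part in self.introns.split(",")]
```

A record built with `introns="120, 340"` yields the two integers 120 and 340 from `intron_positions()`, and an unknown seq_type such as `"mRNA"` raises `ValueError` at construction time.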
• How long are the strings of base pairs that we want to put in the sequence attribute? We need to make sure they do not exceed the length limit of the TEXT data type.
• Are we going to check the validity of the GenBank number they give us dynamically?
• How do we handle updates to the sequence stored in GenBank? If we are caching the actual sequence, we have an updating/consistency problem. TED: Don’t cache it; get it dynamically each time. ECK: Let’s cache and find a way to make updates automatically.