Nucleic Acids Research
Volume 16 Number 5 1988

SEQIN-ST and CODFREG: a full screen sequence editor and a codon usage analysis program for the Atari ST

Kurt Stüber and Kurt Spanier¹

Max Planck-Institut für Züchtungsforschung, Egelspfad, D-5000 Köln 30 and ¹Institut für Pflanzenbau und Züchtung, von Siebold Strasse 8, D-3400 Göttingen, FRG

Received August 21, 1987; Revised and Accepted November 7, 1987

Introduction

The program SEQIN-ST supports the entry and analysis of nucleic acid and protein sequences on personal computers of the ATARI ST series. Sequences can be entered manually and/or read directly from a sequencing gel using the mouse. The user chooses the entry mode and the layout of the sequence. In check mode the sequence is entered a second time and is automatically compared with the original entry. The sequence can be analysed directly for reading frames, restriction sites, and direct and inverted repeats. The program CODFREG calculates codon frequencies of one or more genes either interactively or in batch mode.

Hardware and software

The programs were developed on an ATARI 1040STF computer with one megabyte of memory, an SM124 monochrome monitor (resolution 640x400 dots) and, optionally, a 20 megabyte SH205 hard disk. The computer has a 3.5 inch floppy disk drive with a storage capacity of 720 kByte per disk. A NEC P6 dot matrix printer (24 pins, 300 dots per inch resolution) is attached. The programs are written entirely in GfA-BASIC (Ostrowski, 1987).

© IRL Press Limited, Oxford, England.

SEQIN-ST: Program description

General. SEQIN-ST is a full screen sequence editor for nucleic acid and protein sequences. Sequences are entered manually or, with the aid of the mouse, directly from the sequencing autoradiogram. Restriction maps, reading frames, and direct and inverted repeats can be predicted. The main program menu is continuously available in the header line of the screen.

Manual entry. The sequence is entered manually either from paper or from gel data. The full set of ambiguity codes may be used (NC-IUB, 1985). Lower case characters are automatically translated to upper case. The sequence may be blocked to any block size between 1 and 80 characters, with a default size of 10. Corrections (insertions and deletions) can be made at any sequence position. After a correction, the blocking pattern can be reorganized by pressing the F3 key once. The entry of nucleic acid sequences is supported by acoustic feedback: a tone of a different pitch is produced for each of the four bases. On a single screen, 1920 bases (24 lines of 80 bases) can be seen at any one time. The upper limit of the editor is 500 lines or 40000 bases. When the screen is switched to small character size, 5088 bases (48 lines of 106 bases each) are shown simultaneously and the upper limit of the sequence length is 53000.

Mouse entry. As an alternative to manual entry of sequence data via the keyboard, the program has been adapted to permit direct reading of sequence gels via the mouse. For this, a pointed object such as a pencil is firmly attached to the mouse as shown in figure 1. This diagram is an animated 3D figure produced by the program as an online help message. For entering data, the autoradiogram is placed on a viewing screen and the mouse is placed on the autoradiogram with the pencil parallel to the lanes. The point of the pencil is moved to coincide with the center of the slot with the currently greatest density. Pressing the left mouse key enters the base corresponding to this slot into the sequence. The entry is confirmed by the corresponding acoustic tone. The mouse is then moved to the next spot in another lane, still using the pencil as a pointer. At the same time, the sequence can still be entered and corrected manually using the keyboard. A graphic representation of the sequencing gel autoradiogram is shown on the screen and can be used to check the base assignments made with the mouse (see figure 2). Since data entry via the mouse is independent of vertical displacement in the gel, "smiling" autoradiograms can be processed with the same ease as "non-smiling" ones.

Figure 1: The relation between a pencil and the mouse during gel entry.

Figure 2: Screen display of a schematic sequencing gel during gel entry.

Check entry. To ensure the correctness of the sequence, it is entered a second time. The check entry is started by pressing the F4 key. It is done exactly as the original entry, and the user can again choose between normal and small character presentation and between manual and mouse entry mode. Each time the correct base is entered, the computer responds with the respective acoustic feedback. If a base is entered which differs from the base in the original entry, the computer issues a beep and displays the incorrect base in lower case. The user can now correct the base in the check sequence or switch back to the original sequence (using the F4 key). With the F4 key the user can switch back and forth between the original entry and the check entry at any time. The complete screen contents are internally buffered so that no time is lost in reconstructing the display. To check a sequence only locally, the local entry mode must be chosen. Here only a fraction of the sequence is seen at any one time. The start position of the local check must be entered. This can be repeated as often as needed.

Comment entry. Comments are entered freely, in normal text editor style, into a separate buffer (500 lines maximum). This buffer is already filled with some information (a template) which has to be supplemented by the user. This ensures that most of the vital data about the sequence are entered.

Figure 3: Screen display of a restriction enzyme map.

Data storage. The data are stored in ASCII files which correspond directly to the sequence and comment data as seen on the screen. Alternatively, the data can be stored in a binary format in which a single base is encoded in two bits.
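The two-bit encoding can be illustrated with a minimal Python sketch. The paper does not describe the actual file layout, so the bit assignment (A=0, C=1, G=2, T=3), the left-aligned final byte, and the function names `pack` and `unpack` are assumptions made for illustration only. Two bits can distinguish just four symbols, so this sketch rejects ambiguity codes.

```python
# Hypothetical 2-bit packing of a DNA sequence (not the SEQIN-ST file format).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed bit assignment
BASE = "ACGT"


def pack(sequence):
    """Return (length, bytes) with four bases stored per byte."""
    seq = sequence.upper()
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            # Raises KeyError for anything other than A, C, G or T.
            byte = (byte << 2) | CODE[base]
        # Left-align a short final chunk so every byte has the same layout.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return len(seq), bytes(out)


def unpack(length, data):
    """Inverse of pack(); the length is needed to drop padding bases."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASE[(byte >> shift) & 3])
    return "".join(bases[:length])
```

At four bases per byte, one byte of packed data replaces four bytes of ASCII text. Assuming 1 kByte = 1024 bytes and ignoring file system overhead, a 720 kByte disk would hold roughly 2.9 million bases in such a layout.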

The binary format saves 75% of the disk space and is recommended when disk space is limiting. No comments are stored in the binary format. In this way more than two million bases can be stored on a single 3.5 inch floppy disk.

Data transfer. The program has an integrated terminal emulation mode which allows data transfer to other computers. Special adaptations exist for data transfer to VAX computers (microVAX and VAX 11/750). The sequence is transferred in a format compatible with the programs of the Genetics Computer Group (Devereux et al., 1984).

Figure 4: Screen display of a dot matrix showing direct repeats. The shaded region on the right side is an area of the dot matrix not currently seen on the screen. The missing region can be displayed by moving the nonshaded area with the mouse.

Data analysis. Some data analysis features have been incorporated into the editor. Selecting the menu options restrict, translate, direct repeats or indirect repeats displays a restriction map (figure 3), a translation to amino acid sequence in one or three letter code, or a dotplot of direct or inverted repeats (figure 4) on the screen. With the hardcopy option of the menu, the display can also be plotted on a printer. So far the NEC P6 and NEC P7 printers and the Star NL10 printer can be used.
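The paper does not state how the dot matrix of repeats is computed. One common approach, assumed here, is to mark a dot wherever a short word of fixed length occurs again (direct repeat) or occurs as its reverse complement (inverted repeat). The following Python sketch uses hypothetical names (`repeat_dots`, `word`) and 0-based positions; it handles only the unambiguous bases A, C, G and T.

```python
# Hypothetical exact-word dot matrix for direct and inverted repeats.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def repeat_dots(sequence, word=4, inverted=False):
    """Return sorted (i, j) pairs, i <= j, where the word starting at i
    matches the word at j (direct) or its reverse complement (inverted)."""
    seq = sequence.upper()
    last = len(seq) - word + 1

    # Index every word by its start positions.
    positions = {}
    for i in range(last):
        positions.setdefault(seq[i:i + word], []).append(i)

    dots = []
    for i in range(last):
        current = seq[i:i + word]
        target = reverse_complement(current) if inverted else current
        for j in positions.get(target, []):
            # Skip the trivial main diagonal for direct repeats and
            # report each symmetric pair only once.
            if i < j or (inverted and i == j):
                dots.append((i, j))
    return sorted(dots)
```

Each returned pair would correspond to one dot in a plot such as figure 4. A larger `word` suppresses chance matches at the cost of missing short repeats.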

CODFREG: Program description

The program calculates codon usage tables from single sequences or sets of sequences, either interactively or in batch mode. During the analysis, several regions of the same gene may be specified, which allows the correct analysis of frame shift mutants. The codon usage table can be written to the screen and to a file. The program also gives a list of the highest-scoring triplets for each amino acid. The design of the codon usage table corresponds to the format given by Lathe (1985). The codon usage of 15 nuclear encoded plant genes from Antirrhinum majus and Zea mays was determined, and the acyl carrier protein from Spinacia was backtranslated using the most frequent codons found. The backtranslated DNA sequence proved to be 78.8% homologous to the known gene sequence. Such data can then be used to design oligonucleotide probes to screen for genes with known protein sequences.
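The steps described above (counting codons over one or more regions, picking the most frequent codon per amino acid, and backtranslating a protein) can be sketched in Python. This is not the CODFREG implementation, which is written in GfA-BASIC. The function names, the 1-based inclusive region convention, the choice to read each region in frame from its own start, and the alphabetical tie-break are all assumptions. The standard genetic code is used.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def codon_usage(sequence, regions):
    """Count codons in each (start, end) region, 1-based and inclusive.

    Every region is read in frame from its own start (an assumption),
    so a frame shift can be handled by splitting the gene at the shift.
    """
    seq = sequence.upper()
    counts = Counter()
    for start, end in regions:
        coding = seq[start - 1:end]
        for i in range(0, len(coding) - 2, 3):
            codon = coding[i:i + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguity codes
                counts[codon] += 1
    return counts


def preferred_codons(counts):
    """Map each amino acid to its most frequent codon (ties: alphabetical)."""
    best = {}
    for codon in sorted(counts, key=lambda c: (-counts[c], c)):
        best.setdefault(CODON_TABLE[codon], codon)
    return best


def back_translate(protein, best):
    """Backtranslate a one-letter protein string with the preferred codons."""
    return "".join(best[aa] for aa in protein)
```

For example, a gene with one inserted base at position 7 could be analysed as `codon_usage(seq, [(1, 6), (8, len(seq))])`, so that the codons downstream of the insertion are counted in the corrected frame.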

Summary

This program is intended as an addition to other programs that allow the analysis of biological sequences with the ATARI computer. Several other programs are planned. Most of the programs in the GENEXPERT program library (Stüber, 1986) will be made available for the ATARI ST. Anyone who develops sequence analysis software for the ATARI is invited to submit these programs to the authors, who will distribute them together with SEQIN-ST on a public domain basis. Submitted programs should be free of copyright restrictions. If possible, both source code and executable files should be provided. The use of GfA-BASIC is recommended but not obligatory. A description of the program should be included as a separate ASCII file. Submitted programs will be distributed but not supported. Colleagues who wish to obtain the programs should send a 3.5 inch floppy disk, together with a self-addressed envelope, to:

ATARI PROGRAM EXCHANGE FOR GENETICS
c/o K. Stüber
Max-Planck-Institut für Züchtungsforschung
Egelspfad
D-5000 Köln 30
West Germany

Single sided or double sided floppies can be provided (please specify).

References

Devereux J., Haeberli P., Smithies O. (1984) Nucl. Acids Res. 12, 387-395.
Lathe R. (1985) Synthetic oligonucleotide probes deduced from amino acid sequence data - Theoretical and practical considerations. J. Mol. Biol. 183, 1-12.
Nomenclature Committee of the International Union of Biochemistry (NC-IUB) (1985) Nomenclature for incompletely specified bases in nucleic acid sequences. Eur. J. Biochem. 150, 1-5.
Ostrowski F. (1987) GfA Basic, GfA-Systemtechnik GmbH, Düsseldorf.
Stüber K. (1986) Nucleic acid secondary structure prediction and display. Nucl. Acids Res. 14, 317-326.
