
Letters in Peptide Science, 8: 119-128, 2002. KLUWER/ESCOM © 2001 Kluwer Academic Publishers. Printed in the Netherlands.


Discovery and structures of the cyclotides: novel macrocyclic peptides from plants

David J. Craik 1,*, Marilyn A. Anderson 2, Daniel G. Barry 1, Richard J. Clark 1, Norelle L. Daly 1, Cameron V. Jennings 2 & Jason Mulvenna 1

1 Institute for Molecular Bioscience, Australian Research Council Special Research Centre for Functional and Applied Genomics, The University of Queensland, QLD 4072, Australia
2 Department of Biochemistry, La Trobe University, Bundoora, VIC 3086, Australia
(* Author for correspondence, e-mail: [email protected], Phone: +61-7-3365-4945, Fax: +61-7-3365-2487)

Received 22 December 2001; Accepted 12 March 2002

Key words: circular proteins, cystine knot, NMR spectroscopy, Oldenlandia affinis, plant proteins

Summary

Circular disulfide-rich polypeptides were unknown a decade ago, but over recent years a large family of such molecules has been discovered, which we now refer to as the cyclotides. They are typically about 30 amino acids in size, contain an N- to C-cyclised backbone and incorporate three disulfide bonds arranged in a cystine knot motif. In this motif, an embedded ring in the structure formed by two disulfide bonds and their connecting backbone segments is penetrated by the third disulfide bond. The combination of this knotted and strongly braced structure with a circular backbone renders the cyclotides impervious to enzymatic breakdown and makes them exceptionally stable. This article describes the discovery of the cyclotides in plants from the Rubiaceae and Violaceae families, their chemical synthesis, folding, structural characterisation, and biosynthetic origin. The cyclotides have a diverse range of biological activities, ranging from uterotonic action to anti-HIV activity and neurotensin antagonism. Certain plants from which they are derived have a history of use in native medicine, with activity being observed after oral ingestion of a tea made from the plants. This suggests that the cyclotides may be orally bioavailable. They therefore have a range of potential applications as a stable peptide framework.

Abbreviations: CCK, cyclic cystine knot; MCoTI, Momordica cochinchinensis trypsin inhibitor; RP-HPLC, reversed-phase high performance liquid chromatography; BOC, tert-butoxycarbonyl; HF, hydrogen fluoride; HBTU, 2-(1H-benzotriazol-1-yl)-1,1,3,3-tetramethyluronium; DIEA, diisopropylethylamine; DMF, dimethylformamide; TCEP, tris(2-carboxyethyl)phosphine; NMR, nuclear magnetic resonance; Con A, concanavalin-A.

Introduction

The cyclotides are a novel family of proteins from plants that are approximately 30 amino acids in length and possess a cyclic backbone. They contain a knotted disulfide topology in which an embedded ring, formed by two disulfide bridges and their connecting backbone segments, is threaded by a third disulfide bond (Figure 1). This cyclic cystine knot (CCK) framework [1] makes the cyclotides exceptionally stable, a characteristic that is exemplified in an early native medicinal use. During the 1970s, Red Cross workers in Zaire noted that women of the Lulua tribe made a thick tea from the leaves of the plant Oldenlandia affinis and sipped the concoction during childbirth in order to increase uterine contractions and accelerate delivery [2, 3]. Although their unique structural attributes were not recognized at the time, kalata B1 and B2 were isolated as active peptide-based constituents of the tea. The peptides were named after 'Kalata-Kalata', the name for the native medicine in the Tsjiluba language. The three-dimensional structure of kalata B1 was determined some twenty years later, when the macrocyclic peptide backbone and highly unusual CCK motif were revealed [4]. At around that time, bioassay-guided screening led to the discovery of a range of other macrocyclic peptides from certain plants in the Violaceae and Rubiaceae families. The discoveries were made independently by several groups, with a diverse range of biological activities reported. In addition to the uterotonic activity of kalata B1, the circulins were discovered in a screen for anti-HIV activity [5], cyclopsychotride A was found based on its inhibition of neurotensin binding [6], and screening for haemolytic activity led to the discovery of violapeptide I [7]. More recently many additional macrocyclic peptides have been identified, including varv peptides A-H [8, 9] from Viola arvensis; 12 cycloviolacins from Viola odorata [10] and Viola hederacea [10], some new kalata variants from Oldenlandia affinis [10, 11], several new circulins and cycloviolins [12, 13], palicourein from Palicourea condensata [14] and a peptide from Hybanthus parviflorus [15]. All share the feature of a macrocyclic backbone and six conserved Cys residues. It is clear that they form a family of peptides and for classification purposes we named them the plant cyclotides [10, 16].

Figure 1. A schematic diagram of a cyclotide framework showing the cyclic backbone, knotted disulfide arrangement (I-IV, II-V, III-VI), and the number of residues per loop in the peptide backbone. The structure is that of kalata B1 (PDB code 1KAL).

To date, more than 40 cyclotides, exhibiting a range of biological activities, have been reported from the Violaceae and Rubiaceae plant families (Table 1). Recently another group of Cys-rich macrocyclic peptides, MCoTI-I and MCoTI-II, from Momordica cochinchinensis [17] were reported. The three-dimensional structure of MCoTI-II has recently been determined [18, 19] and contains a CCK motif, suggesting that these Momordica peptides may be related to the cyclotide family. While there are some differences in spacings of Cys residues relative to the previously reported cyclotides, the fact that the new molecules come from a different plant family (Cucurbitaceae) suggests that cyclotide-like molecules may be quite widely distributed. While the cyclotides display a diverse range of activities, it appears that their main role may be in plant defence, based on a recent report of the potent insecticidal properties of kalata B1 [11]. In this article we review our recent studies on the cyclotides. These studies have included the isolation of novel cyclotides, examination of the similarities and differences between cyclotide sub-classes, the chemical synthesis of cyclotides to facilitate structure/activity studies, examination of the stability and folding pathway of the CCK, and the biosynthesis of the cyclotides.

Table 1. Sequence alignment of peptides from the cyclotide family (a)

Columns: peptide name; aligned sequence, with the six conserved Cys residues (I-VI) separating loops 1-6; reference.

Bracelet cyclotides: cycloviolacins O1-O11, cycloviolacin H1, cycloviolins A-D, circulins A-F, cyclopsychotride A, kalata B5, hypa A, palicourein.

Möbius cyclotides: kalata B1, B2, B3, B6 and B7, kalata S, varv peptides A-H, violapeptide I.

(a) The cyclotides can be subdivided into two sub-families, the Bracelet and Möbius sub-families. The conserved cysteine residues are boxed and numbered I-VI at the top of the sequence list. The X in violapeptide I was not determined in the original report but is presumably R based on sequence homology.

Screening studies and peptide sequences

In order to determine the distribution of cyclotides amongst plant species, and to identify unusual examples of the CCK motif, we implemented a screening program based on the unique physico-chemical properties of the cyclotides. Peptides are extracted from plant tissue using either solvent/solvent partitioning or dilute acid. These extracts are then analysed using reversed-phase HPLC, mass spectrometry and Edman sequencing, and ultimately the structures of novel cyclotides are determined using NMR spectroscopy. The initial identification of potential cyclotides is based on their late elution on RP-HPLC and characteristic masses (~3 kDa). Their late elution appears to be a result of a surface-exposed patch of hydrophobic amino acids. Our screening program has produced a large number of novel cyclotides from plant species such as V. hederacea, V. arvensis, V. tricolor and O. affinis [10]. The pattern of discovery so far indicates that cyclotides are most common in the Violaceae family and are especially concentrated in the Viola genus. Interestingly, despite extensive screening, few new cyclotide-containing plants have been discovered in the Rubiaceae family, other than those in the early reports. Table 1 lists the cyclotide sequences presently known. The sequences are aligned based on the six conserved Cys residues that make up the cystine knot, labelled I to VI using the same convention as in Figure 1. The backbone segments between successive Cys residues are referred to as loops and these are numbered in Figure 1, with their sequences indicated in Table 1. Two of the backbone loops (loops 1 and 4) and their connecting disulfide bonds make up the embedded ring that is penetrated by the third disulfide bond. Consideration of the sequences in Table 1 shows that this embedded ring comprises only eight amino acids and thus represents a very constricted hole through which the third disulfide bond must pass.
It is likely that this tight packing of the cysteine residues contributes significantly to the stability of the cystine knot. Given the importance of the knot to the structure it is not surprising that the size and nature of both loops 1 and 4 are very well conserved throughout the cyclotide family. Loops 1 and 4 always comprise three and one amino acid(s), respectively. It is clear from Table 1 that the cyclotides also have significant sequence homology in many of the other loops. The region corresponding to loop 2 always comprises just four residues, except for the anti-HIV peptide palicourein. There is some size variability in loops 3 and 5, with each ranging from four to seven residues. The greatest size variation is seen within loop 6. With the very recent discovery of some new cyclotides, loop 6 has been shown to accommodate from as few as five to as many as ten residues (unpublished data). The sequence alignment of Table 1 identifies a number of residues that are conserved across all the cyclotides. These residues may be important for biological activity, structural integrity or in the processing reactions involved in backbone cyclisation. As noted already, loops 1 and 4 are absolutely conserved in size and amino acid type and are involved in formation of the cystine knot. Similarly, the C-terminal end of loop 3 contains a highly conserved Gly residue that may play a structural role as it is directly adjacent to the cystine knot. Examination of loop 6 suggests residues that may be pivotal in the formation of the amide head-to-tail cyclic backbone, which may in turn be important in the exceptional physiological stability of the cyclotides. In particular, the RNG(L/I)P stretch within loop 6 is highly conserved and data derived from the gene sequences of precursor proteins confirm the proximity of these residues to processing sites (see below). Table 1 has been split into two groups, based on the classification of the backbone as either Möbius or Bracelet type [10]. This nomenclature arises because it has been proposed that a cis-Pro peptide bond in loop 5 can be thought of as providing a twist in the conceptual ribbon of the peptide backbone, leading to the circular backbone being regarded as a Möbius strip.
When this cis-Pro is not present, all backbone peptide bonds are in the trans arrangement, making the backbone bracelet-like. Hence, the cyclotide family is divided based on the presence (or absence) of the putative cis-Pro peptide bond in loop 5. It is stressed that this is a convenient conceptual description only and it is not suggested that the molecules exhibit the topological properties of either bracelets or Möbius strips. There are clear distinctions, aside from the cis-Pro, between the cyclotides in each sub-family. The most notable difference lies in the residues of loop 3. For the Möbius sub-family, which includes kalata B1, loop 3 comprises the short NTPG sequence that forms an extended turn in solution structures. In contrast, loop 3 of the Bracelet cyclotides comprises a series of hydrophobic residues that form a short, though definable, helical structure. It is interesting that, with the exception of palicourein, which to some extent has chimeric properties, most of the Bracelet cyclotides contain the sequence GES in loop 1, whereas it is invariably GET in the Möbius family.
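The loop nomenclature and the mass-based screening criterion described in this section can be illustrated with a short Python sketch. It splits a circular sequence into its six inter-cysteine loops, counts the residues in the embedded ring (loops 1 and 4 plus the four Cys residues that bound them), and estimates the monoisotopic mass of the folded peptide. The kalata B1 sequence is written out here as an assumption, chosen to agree with the loop sequences discussed in the text (GET in loop 1, NTPG in loop 3). The function names are hypothetical and the residue masses are standard monoisotopic values rather than data from the study.

```python
# Illustrative sketch only: the sequence constant and all names are assumptions.
KALATA_B1 = "GLPVCGETCVGGTCNTPGCTCSWPVCTRN"  # circular: last residue joins the first

# Standard monoisotopic residue masses (Da) for the amino acids that occur here.
RESIDUE_MASS = {
    "G": 57.02146, "L": 113.08406, "P": 97.05276, "V": 99.06841,
    "C": 103.00919, "E": 129.04259, "T": 101.04768, "N": 114.04293,
    "S": 87.03203, "W": 186.07931, "R": 156.10111,
}
HYDROGEN = 1.007825


def loops(cyclic_seq):
    """Return the six backbone segments between successive Cys residues.

    Loop 6 wraps around the head-to-tail junction of the circular backbone.
    """
    cys = [i for i, aa in enumerate(cyclic_seq) if aa == "C"]
    if len(cys) != 6:
        raise ValueError("expected exactly six Cys residues")
    n = len(cyclic_seq)
    segments = []
    for k, start in enumerate(cys):
        end = cys[(k + 1) % 6]
        length = (end - start - 1) % n  # residues strictly between the two Cys
        segments.append(
            "".join(cyclic_seq[(start + 1 + j) % n] for j in range(length))
        )
    return segments


def embedded_ring_size(segments):
    """Residues in the ring formed by loops 1 and 4 plus their four flanking Cys."""
    return len(segments[0]) + len(segments[3]) + 4


def folded_cyclic_mass(cyclic_seq):
    """Monoisotopic mass of a backbone-cyclised peptide with three disulfides.

    A cyclic backbone has no terminal water; each disulfide removes two H.
    """
    return sum(RESIDUE_MASS[aa] for aa in cyclic_seq) - 6 * HYDROGEN


kb1_loops = loops(KALATA_B1)
print(kb1_loops)
print(embedded_ring_size(kb1_loops))
print(round(folded_cyclic_mass(KALATA_B1), 2))
```

Under these assumptions the embedded ring comes out at eight residues, matching the count given above, and the calculated mass falls close to the ~3 kDa range used as a screening criterion.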

Synthesis of the cyclotides

This section describes the methodology that we use for the chemical synthesis of the cyclotides. The methodology was developed by investigating two strategies for synthesising kalata B1 [20]. Strategy A (Figure 2) was to take a linear reduced kalata B1 molecule and oxidise it to bring the N and C termini into proximity. It was presumed that this would facilitate the intramolecular amide bond formation required for backbone cyclisation relative to competing oligomerisation reactions. Strategy B (Figure 2) involved using native chemical ligation [21, 22, 23, 24] to cyclise the backbone followed by oxidation to form the cystine knot. In summary, strategy A involves oxidation followed by cyclisation and strategy B cyclisation followed by oxidation. In strategy A (Figure 2) the ligation point in the protein backbone was made at the Gly-Gly sequence in kalata B1. This provides a cyclisation site where the prospect of racemisation can be ruled out and steric hindrance is low. Additionally, the Gly-Gly sequence is involved in a β-turn and therefore a break in the protein backbone between these two residues in the linear precursor peptide is less likely to disrupt the folding of the molecule. Assembly of the linear peptide was achieved on-resin using standard BOC/HBTU chemistry, followed by cleavage with HF. A number of different buffer conditions were then tested to determine the most efficient oxidation conditions. In aqueous buffers the folding of linear kalata B1 proceeded with very low yields. However, addition of an organic solvent resulted in a substantial increase in the yield of correctly folded product [20]. A trial of a number of different organic solvents demonstrated that the most efficient conditions for oxidising kalata B1 involved 50% isopropanol in 0.1 M ammonium bicarbonate (pH 8.5) and 1 mM reduced glutathione. The improvement in the yield of correctly folded protein on the addition of an organic solvent is presumed to be due to the stabilisation of the exposed hydrophobic patch on the surface of the protein. Cyclisation of the oxidised linear kalata B1 molecule was achieved using HBTU with an excess of DIEA in DMF. The overall yield of correctly folded circular kalata B1 from linear reduced peptide was 2%. While lower than is typically seen (~10%) for the synthesis of small disulfide-rich peptides, the magnitude of the yield is not unexpected given the complexity of the cystine knot and the need for additional purification steps associated with both oxidation and cyclisation. Strategy B (Figure 2) utilised native chemical ligation to cyclise the backbone followed by oxidation to form kalata B1. This methodology requires an N-terminal Cys residue, which forms a thioester with a functionalised C-terminus that subsequently undergoes an S,N acyl migration to form a native peptide bond [21, 22, 23, 24]. Of the six possible Cys residues, the Gly-Cys in the kalata B1 sequence was chosen as the break point of the backbone as a Gly residue at the C-terminus would decrease steric hindrance. The linear precursor was synthesised using solid phase peptide synthesis with BOC/HBTU chemistry and cleaved from the resin with HF [20]. The cyclisation reaction was performed in 0.1 M sodium phosphate (pH 7.4) with an excess of TCEP over a period of approximately 30 minutes. The cyclic reduced precursor was then oxidised and, as was the case for strategy A, it was found that the presence of organic solvent in the folding buffer significantly improved the yield of correctly folded kalata B1. The oxidation was found to occur with greater efficiency for the cyclised precursor (7% overall yield) relative to the linear reduced peptide. This suggests that cyclisation significantly favours the folding process. In summary, there appear to be two main factors that affect the formation of the CCK topology in kalata B1.
Firstly, the presence of a hydrophobic environment significantly improves the efficiency of the oxidation, presumably because it stabilises the hydrophobic residues that are exposed on the surface of correctly folded kalata B1. Secondly, the prior backbone cyclisation of kalata B1 appears to be an important driving force in the correct folding of the molecule. The ability to synthesise cyclotide molecules has opened the possibility of grafting other bioactive sequences onto this framework to take advantage of its exceptional stability as a template in drug design [16].
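The choice of break point in the two strategies can be made concrete with a short sketch. Assuming the kalata B1 sequence used below (written out here as an assumption rather than quoted from the text), the code rotates the circular sequence so that the backbone opens at the Gly-Gly bond (strategy A) or at the Gly-Cys bond (strategy B), giving the linear precursor that would be assembled on resin. The helper name `open_at` is hypothetical.

```python
# Illustrative sketch only: the sequence constant and helper name are assumptions.
KALATA_B1 = "GLPVCGETCVGGTCNTPGCTCSWPVCTRN"  # circular: last residue joins the first


def open_at(cyclic_seq, dipeptide):
    """Open a circular sequence at the peptide bond inside `dipeptide`.

    The returned linear sequence starts with the second residue of the
    dipeptide (new N-terminus) and ends with its first residue (new C-terminus).
    """
    wrapped = cyclic_seq + cyclic_seq[0]  # include the head-to-tail junction
    first = wrapped.find(dipeptide)
    if first == -1 or wrapped.find(dipeptide, first + 1) != -1:
        raise ValueError("dipeptide must occur exactly once in the cycle")
    cut = (first + 1) % len(cyclic_seq)
    return cyclic_seq[cut:] + cyclic_seq[:cut]


# Strategy A: break between the two Gly residues of the Gly-Gly sequence.
precursor_a = open_at(KALATA_B1, "GG")
# Strategy B: break at Gly-Cys, leaving an N-terminal Cys and a C-terminal Gly
# as required for native chemical ligation.
precursor_b = open_at(KALATA_B1, "GC")

print(precursor_a)
print(precursor_b)
```

In this sequence both Gly-Gly and Gly-Cys occur only once, so each strategy has a single candidate break point. The strategy B precursor begins with Cys and ends with Gly, in line with the requirement stated above.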


Figure 2. Two synthesis strategies for the formation of kalata B1. Strategy A (left-hand side) involves the oxidation of a linear kalata B1 precursor followed by cyclisation, while in Strategy B (right-hand side) the linear precursor is first cyclised then oxidised.

Structures of cyclotides

The three-dimensional structures of three members of the cyclotide family (kalata B1 [4], circulin A [25], and cycloviolacin O1 [10]) have been determined by NMR spectroscopy and reveal similar overall folds [10]. As noted earlier, the core structural motif has been termed the cyclic cystine knot (CCK) and is characterized by a cystine knot embedded in a macrocyclic backbone [10]. The cystine knot involves two inter-cysteine backbone segments and their connecting disulfide bonds, Cys I-Cys IV and Cys II-Cys V, which form a ring that is penetrated by the third disulfide bond, Cys III-Cys VI. The conserved structural characteristics of the cyclotides also include a β-hairpin, which is generally part of a triple-stranded β-sheet [10]. The third strand is distorted from ideal β geometry and contains a β-bulge. The three-dimensional structure of kalata B1 is shown in Figure 3. The loops involved in the backbone strands of the cystine knot (loops 1 and 4) are not only conserved in size but also have very similar conformations in the three structures determined. This is illustrated in an overlay of the knot regions of kalata B1, circulin A and cycloviolacin O1 in Figure 3. While the core of the cyclotides is structurally highly conserved, greater variation occurs outside the cystine knot region. Loops 2 and 5 comprise β-turns in the structures of kalata B1, circulin A and cycloviolacin O1. However, neither of these loops is conserved in residue type or size across all cyclotides and therefore their structures cannot be conserved. Loops 3 and 6 also have structural variability. As noted earlier, members of the Bracelet family have a helical region in loop 3 that is lacking in the Möbius family. Loop 6 varies somewhat in length and generally has irregular secondary structure.

Our recent determination of the three-dimensional structure of the 34 residue macrocyclic trypsin inhibitor MCoTI-II from M. cochinchinensis [18] revealed a CCK topology similar to the previously determined cyclotides despite the absence of sequence homology. Because of this topological similarity and the conserved macrocyclic backbone we have classified MCoTI-II as a cyclotide. The most significant structural differences between MCoTI-II and the previously known cyclotides include an increased size of the embedded ring of the cystine knot (11 residues instead of 8) and a more disordered loop 6. The increased knot size is apparent in Figure 3.

To help understand the significance and role of the circular backbone in the cyclotides we recently introduced the concept of acyclic permutation of circular proteins [26]. This effectively involves breaking the backbone to produce acyclic homologues. We synthesized the six acyclic permutants corresponding to opening the backbone in each of the six loops between successive Cys residues in the prototypic cyclotide kalata B1. We found that four of the six permutants folded into native-like conformations, but two did not. These were precisely the two that involved breaking the embedded ring of the cystine knot [26]. This emphasises the importance of the cystine knot in folding and confirms that it is the crucial structural core of the cyclotides.

Figure 3. (A) Ribbon diagram of the NMR structure of kalata B1 [4]. (B) An overlay of the cystine knot region of (a) MCoTI-II [18], (b) kalata B1, (c) cycloviolacin O1 [10] and (d) circulin A [25]. The circular segments show the backbone loops and disulfide bonds that form an embedded ring in the structure that is penetrated by the third disulfide bond.
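The idea of acyclic permutation can be sketched in a few lines of Python. The code below opens an assumed kalata B1 sequence once in each of the six loops and marks the two openings that fall in the embedded ring (loops 1 and 4), which are the ones reported above not to fold into native-like conformations. The exact cut position within each loop is an arbitrary choice made here for illustration (immediately after the first residue of the loop); the permutant sequences actually synthesised in [26] are not given in this article, and all names are hypothetical.

```python
# Illustrative sketch only: the sequence, the cut positions and all names are assumptions.
KALATA_B1 = "GLPVCGETCVGGTCNTPGCTCSWPVCTRN"  # circular: last residue joins the first
EMBEDDED_RING_LOOPS = {1, 4}  # loops that, with their disulfides, form the ring


def acyclic_permutants(cyclic_seq):
    """Return {loop number: linear sequence} with one backbone opening per loop.

    Each loop starts right after a Cys; the backbone is opened after the first
    residue of that loop, so every permutant keeps all of the original residues.
    """
    n = len(cyclic_seq)
    cys = [i for i, aa in enumerate(cyclic_seq) if aa == "C"]
    permutants = {}
    for loop_no, c in enumerate(cys, start=1):
        new_n_terminus = (c + 2) % n
        permutants[loop_no] = cyclic_seq[new_n_terminus:] + cyclic_seq[:new_n_terminus]
    return permutants


def breaks_embedded_ring(loop_no):
    """True when the opening lies in a loop that forms part of the embedded ring."""
    return loop_no in EMBEDDED_RING_LOOPS


permutants = acyclic_permutants(KALATA_B1)
for loop_no, seq in permutants.items():
    note = "opens embedded ring" if breaks_embedded_ring(loop_no) else "ring intact"
    print(loop_no, seq, note)
```

Four of the six openings leave the embedded ring intact, which is consistent with the four native-like permutants described in the text.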


Figure 4. (A) A diagrammatic representation of the predicted proteins encoded by two cyclotide genes from O. affinis (Oak1 and Oak2) with the endoplasmic reticulum signal sequence (ER), N-terminal repeat fragment (ntr) and cyclotide domains (B1, B3 and B6) indicated. Bars represent the three disulfide bonds in the cyclotide domains. (B, C) Sequence of the B1, B2, B3, and B7 cyclotides from O. affinis showing the flanking regions predicted from the cDNA clones. A -Gly-Leu-Pro- motif flanks both sides of the B1, B3 and B7 sequences within the respective precursor proteins. Processing adjacent to Gly, Leu or Pro could yield the mature cyclotides that contain only one of the -Gly-Leu-Pro- motifs. Retention of the -Gly-Leu-Pro- motif, and not the -Ser-Leu-Pro- motif, in the B2 cyclotide suggests that processing occurs on the N-terminal side of the Gly and Ser residues respectively.

B2 sequence with flanking regions, as shown in the figure: QLKGLPVCGETCFGGTCNTPGCSCT-WPICTRDSLPLVAA

Genes and precursor proteins

In contrast to the non-ribosomally produced, small cyclic peptides (5-12 aa) that are found in fungi and bacteria, the cyclotides are true gene products. Clones encoding cyclotides from O. affinis have been obtained using a PCR approach with a primer corresponding to a region of the cyclic protein in combination with oligo-dT [11]. The amplified 400 bp fragment encoded the entire kalata B1 cyclotide together with a C-terminal extension of 4 amino acids. This PCR fragment was subsequently used to isolate cDNAs corresponding to four cyclotide genes from O. affinis. These cDNAs encode multi-domain precursor proteins with one, two or three cyclotide domains. The cyclotides corresponding to these mature domains, kalata B1, B2, B3, B6 and B7, have all been isolated from plant tissue, verifying the production and processing of the precursors. The predicted precursors from Oak1-4 have a typical endoplasmic reticulum signal sequence and thus are likely to enter the secretory pathway where folding and disulfide bond formation occur. Each precursor contains a relatively long N-terminal pro-domain that is not tightly conserved in sequence or length. This domain is followed by a relatively well-conserved 25 amino acid peptide that we have called the N-terminal repeated fragment (ntr). The ntr precedes each cyclotide domain in the single, double and triple cyclotide encoding precursors. We speculate that the ntr domain may assist in folding or processing of the cyclotide domain. The predicted precursors also have a C-terminal propeptide of 7 residues, but the function of this domain is unknown.

The isolation of cDNA clones Oak1-4, together with kalata B1, B2, B3, B6 and B7 cyclotides from plant tissues, provides some insight into the potential processing sites. Within the precursors, each of the kalata B1, B3, B6 and B7 sequences is flanked on both sides by a -Gly-Leu-Pro- motif. Initially the processing site could not be defined because the mature cyclic proteins retain only one copy of the -Gly-Leu-Pro- motif and thus processing adjacent to the Gly, Leu or Pro would yield an identical cyclic product. Further insight was provided by the Oak4 clone that encoded a precursor protein with three copies of the cyclotide kalata B2. In Oak4 the three kalata B2 domains are flanked by -Gly-Leu-Pro- at the N-terminus and -Ser-Leu-Pro- at the C-terminus. Retention of the -Gly-Leu-Pro- sequence in the B2 peptide indicated that processing occurs next to the lysine and asparagine/aspartate residues as shown in Figure 4. Enzymes that accommodate Lys as well as Asn/Asp residues have not been identified in plants but proteases specific for Asn residues are commonly found within the secretory pathway [27, 28, 29]. Indeed an asparaginyl endopeptidase has been implicated in the post-translational cleavage and ligation of the concanavalin-A (Con A) precursor within the secretory pathway of the plant Canavalia ensiformis (Jack bean). It is possible that a similar asparaginyl endopeptidase has a role in the production of cyclotides, because the C-terminal cleavage site has a conserved Asn or Asp residue. In contrast to Con A, the cyclotides require a second cleavage event next to a lysine residue, suggesting looser specificity of a single enzyme or the requirement of a second protease.

We have also considered the possibility that processing of cyclotide precursors involves an intein-related event. Indeed, intein-related technologies have been used to produce cyclic peptides in vitro and in vivo using split intein systems [30, 31, 32]. The cyclotide precursor proteins do not share any homology with known intein sequences. Nevertheless, we cannot exclude the possibility of a new type of autocatalytic processing event at this stage.
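The reasoning about the processing site can be followed step by step with a small sketch. It takes the kalata B2 sequence with flanking regions as printed in Figure 4 (alignment gap removed), shifts the pair of cleavage sites by zero to three residues, and reports which tripeptide motif would span the head-to-tail junction of the resulting cyclic product. The variable and function names are hypothetical, and the assumption that both cleavage sites shift together by the same offset is made here purely for illustration.

```python
# Illustrative sketch only: names and the "same offset at both ends" model are assumptions.
# Kalata B2 with flanking regions, as printed in Figure 4; "-" is an alignment gap.
B2_WITH_FLANKS = "QLKGLPVCGETCFGGTCNTPGCSCT-WPICTRDSLPLVAA".replace("-", "")

n_site = B2_WITH_FLANKS.index("GLP")  # start of the N-terminal Gly-Leu-Pro motif
c_site = B2_WITH_FLANKS.index("SLP")  # start of the C-terminal Ser-Leu-Pro motif


def excised_domain(offset):
    """Domain released when both cleavage sites lie `offset` residues into the motifs."""
    return B2_WITH_FLANKS[n_site + offset:c_site + offset]


def junction_motif(offset):
    """Tripeptide spanning the new head-to-tail bond after cyclisation."""
    domain = excised_domain(offset)
    return domain[len(domain) - offset:] + domain[:3 - offset]


junction_motifs = {offset: junction_motif(offset) for offset in range(4)}
for offset, motif in junction_motifs.items():
    print(offset, excised_domain(offset), motif)
```

Only an offset of zero, that is, cleavage on the N-terminal side of Gly and of Ser, leaves Gly-Leu-Pro in the cyclic product. Every other offset would carry Ser-Leu-Pro into the mature peptide, which is the argument made in the text.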

Concluding remarks

Small circular proteins have been known for less than a decade but have emerged as an exciting new class of molecules that have a range of potential applications in the pharmaceutical and agricultural fields. The circular backbone and knotted topology of the cyclotides render them extremely stable proteins. New sequences are regularly being discovered and it is likely that interest in circular proteins will expand greatly over the next few years.

Acknowledgments

Studies on the plant cyclotides have been supported by grants (to DJC and MAA) from the Australian Research Council. DJC is an ARC Professorial Fellow. We thank Shaiyena Williams and Robyn Craik for assistance in the preparation of the manuscript.

References

1. Craik, D.J., Daly, N.L. and Waine, C., Toxicon, 39 (2001) 43.
2. Gran, L., Medd. Nor. Farm. Selsk., 12 (1970) 173.
3. Gran, L., Sandberg, F. and Sletten, K., J. Ethnopharmacol., 70 (2000) 197.
4. Saether, O., Craik, D.J., Campbell, I.D., Sletten, K., Juul, J. and Norman, D.G., Biochemistry, 34 (1995) 4147.
5. Gustafson, K.R., Sowder II, R.C., Henderson, L.E., Parsons, I.C., Kashman, Y., Cardellina II, J.H., McMahon, J.B., Buckheit Jr., R.W., Pannell, L.K. and Boyd, M.R., J. Am. Chem. Soc., 116 (1994) 9337.
6. Witherup, K.M., Bogusky, M.J., Anderson, P.S., Ramjit, H., Ransom, R.W., Wood, T. and Sardana, M., J. Nat. Prod., 57 (1994) 1619.
7. Schöpke, T., Hasan Agha, M.I., Kraft, R., Otto, A. and Hiller, K., Sci. Pharm., 61 (1993) 145.
8. Claeson, P., Göransson, U., Johansson, S., Luijendijk, T. and Bohlin, L., J. Nat. Prod., 61 (1998) 77.
9. Göransson, U., Luijendijk, T., Johansson, S., Bohlin, L. and Claeson, P., J. Nat. Prod., 62 (1999) 283.
10. Craik, D.J., Daly, N.L., Bond, T. and Waine, C., J. Mol. Biol., 294 (1999) 1327.
11. Jennings, C., West, J., Waine, C., Craik, D. and Anderson, M., Proc. Natl. Acad. Sci. USA, 98 (2001) 10614.
12. Gustafson, K.R., Walton, L.K., Sowder, R.C.I., Johnson, D.G., Pannell, L.K., Cardellina, J.H.I. and Boyd, M.R., J. Nat. Prod., 63 (2000) 176.
13. Hallock, Y.F., Sowder, R.C.I., Pannell, L.K., Hughes, C.B., Johnson, D.G., Gulakowski, R., Cardellina, J.H.I. and Boyd, M.R., J. Org. Chem., 65 (2000) 124.
14. Bokesch, H.R., Pannell, L.K., Cochran, P.K., Sowder, R.C., 2nd, McKee, T.C. and Boyd, M.R., J. Nat. Prod., 64 (2001) 249.
15. Broussalis, A.M., Göransson, U., Coussio, J.D., Ferraro, G., Martino, V. and Claeson, P., Phytochemistry, 58 (2001) 47.
16. Craik, D.J., Toxicon, 39 (2001) 1809.
17. Hernandez, J.F., Gagnon, J., Chiche, L., Nguyen, T.M., Andrieu, J.P., Heitz, A., Trinh Hong, T., Pham, T.T. and Le Nguyen, D., Biochemistry, 39 (2000) 5722.
18. Felizmenio-Quimio, M.E., Daly, N.L. and Craik, D.J., J. Biol. Chem., 276 (2001) 22875.
19. Heitz, A., Hernandez, J.F., Gagnon, J., Hong, T.T., Pham, T.T., Nguyen, T.M., Le-Nguyen, D. and Chiche, L., Biochemistry, 40 (2001) 7973.
20. Daly, N.L., Love, S., Alewood, P.F. and Craik, D.J., Biochemistry, 38 (1999) 10606.
21. Tam, J.P. and Lu, Y.-A., Tetrahedron Lett., 38 (1997) 5599.
22. Tam, J.P. and Lu, Y.-A., Protein Sci., 7 (1998) 1583.
23. Tam, J.P., Lu, Y.-A. and Yu, Q., J. Am. Chem. Soc., 121 (1999) 4316.
24. Camarero, J.A., Cotton, G.J., Adeva, A. and Muir, T.W., J. Pept. Res., 51 (1998) 303.
25. Daly, N.L., Koltay, A., Gustafson, K.R., Boyd, M.R., Casas-Finet, J.R. and Craik, D.J., J. Mol. Biol., 285 (1999) 333.
26. Daly, N.L. and Craik, D.J., J. Biol. Chem., 275 (2000) 19068.
27. Scott, M.P., Jung, R., Muntz, K. and Nielsen, N.C., Proc. Natl. Acad. Sci. USA, 89 (1992) 658.
28. Hara-Nishimura, I., Takeuchi, Y., Inoue, K. and Nishimura, M., Plant J., 4 (1993) 793.
29. Takeda, O., Miura, Y., Mitta, M., Matsushita, H., Kato, I., Abe, Y., Yokosawa, H. and Ishii, S.-I., J. Biochem. (Tokyo), 116 (1994) 541.
30. Camarero, J.A. and Muir, T.W., J. Am. Chem. Soc., 121 (1999) 5597.
31. Evans, T.C., Jr., Benner, J. and Xu, M.Q., J. Biol. Chem., 274 (1999) 18359.
32. Scott, C.P., Abel-Santos, E., Wall, M., Wahnon, D.C. and Benkovic, S.J., Proc. Natl. Acad. Sci. USA, 96 (1999) 13638.