A Two-Fold Structural Classification Method for

2 downloads 0 Views 8MB Size Report
inaccurate assessment of the structural ensemble of the protein molecules due to the ..... (d) Relative errors, γ defined in Eq. (3.2) resulting from one- and two-fold methods ..... Alternative routes for entry of hgx2 into the active site of mercuric.
Commun. Comput. Phys. doi: 10.4208/cicp.OA-2018-0140

Vol. 25, No. 4, pp. 1010-1023 April 2019

A Two-Fold Structural Classification Method for Determining the Accurate Ensemble of Protein Structures Pan Tan1,2 , Zuyue Fu2,3 , Loukas Petridis4,5 , Shuo Qian4 , Delin You6 , Dongqing Wei6 , Jinglai Li2,7 and Liang Hong1,2, ∗ 1

School of Physics and Astronomy, Shanghai Jiao Tong University, Shanghai 200240, China. 2 Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai 200240, China. 3 Zhiyuan College, Shanghai Jiao Tong University, Shanghai 200240, China. 4 Oak Ridge National Laboratory, Oak Ridge, Tennessee 37830, United States. 5 Department of Biochemistry, Cellular & Molecular Biology, The University of Tennessee, Knoxville, Tennessee 37996, United States. 6 School of Life Science and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China. 7 Department of Mathematical Sciences, University of Liverpool, Liverpool L69 7ZL, UK. Received 22 May 2018; Accepted (in revised version) 27 August 2018

Abstract. Atomic-level structural characterization of flexible proteins, such as intrinsically disordered proteins and multi-domain proteins connected by flexible linkers, is challenging as they possess distinct conformations in physiological conditions. Significant efforts have been made to develop integrated approaches by combining small angle neutron/X-ray scattering experiments with molecular simulations to reveal the distinct atomic structures and the corresponding populations for these flexible proteins. One widely used method, the basis-set supported ensemble method, classifies the simulation-generated protein conformations into a set of structural basis and then derives the corresponding populations by fitting to the experimental data. This method makes an implicit assumption that protein conformations of similar structures have similar small angle scattering profiles.The present work demonstrates that, for various protein systems ranging from compact globular proteins and flexible multidomain proteins through to intrinsically disordered proteins, this method provides inaccurate assessment of the structural ensemble of the protein molecules due to the breakdown of the assumption made. To alleviate this problem, a two-fold-clustering ∗ Corresponding

author. Email addresses: [email protected] (L. Hong), [email protected] (P. Tan), [email protected] (Z. Fu), [email protected] (L. Petridis), [email protected] (S. Qian), [email protected] (J. Li) http://www.global-sci.com/

1010

c

2019 Global-Science Press

P. Tan et al. / Commun. Comput. Phys., 25 (2019), pp. 1010-1023

1011

method is developed to cluster the simulation-generated protein structures using information on both 3D structure and scattering profiles. As benchmarked by both simulation and experimental results, this new method yields much more accurate populations of structural basis of protein molecules. PACS: 62-07 Key words: Protein structures, statistical data analysis, Monte Carlo, cluster analysis.

1 Introduction A central task in molecular biochemistry and molecular biophysics is to determine the atomic structure of proteins at physiological conditions. Although X-ray crystallography and NMR can provide high-resolution atomic-level structure of bio-macromolecules, they are limited by either the availability of crystalline samples or the size of the macromolecules. These high-resolution techniques can be complemented by low-resolution ones, such as cryo-electron macroscopy, mass spectroscopy and small angle scattering (SAS). SAS, either with X-ray or neutron, has the advantage of measuring protein structures in physiological conditions [3, 5, 12, 14, 25, 26, 31, 33]. However, SAS is inherently limited because the three-dimensional real-space structural information of a biomacromolecule is reduced to a one-dimensional scattering profile in reciprocal space, resulting in loss of information and the difficulty of converting the SAS intensity to a 3D structure [11, 15, 23, 24, 29, 31–33]. Ab initio method and the Bayesian refinement method [28] have been developed to re-construct low-resolution representations of the biomacromolecules by modeling the experimental SAS data using spatially packed spheres of sub nanometer size [30], which, however, lack the atomic-level, or even secondary-structurelevel information. Recently, there have been significant efforts to develop integrated approaches by combining small-angle scattering experiments with molecular simulations to derive the atomic-level structures of bio-macromolecules [3, 4, 6, 11, 16, 20, 21, 24, 28, 32, 33]. These approaches roughly fall into two categories: one is to search for the protein conformations from existing structural candidates, pre-generated from molecular simulation using standard force fields, which best fits to the experimental SAS data [11, 21, 22, 24, 32, 33]; while the other one is to apply biased potentials to drive the simulation towards the protein conformations in better agreement with experiment [4,6,16]. The present work focuses on the discussion of the first type of approaches. It becomes increasingly clear that flexible biomolecules, such as multi-domain proteins linked through flexible linkers and intrinsically disordered proteins, possess multiple conformations in the physiological conditions [14, 33]. Revealing the populations of distinct conformations of such protein system in solution is of great importance towards the understanding of the enzymatic mechanism. This inspires the development of the ensemble-based SAS approaches, such as the ensemble optimization method [3], the basis-set supported ensemble method [26, 33],

1012

P. Tan et al. / Commun. Comput. Phys., 25 (2019), pp. 1010-1023

the SAS module in an integrative platform by taking into account the structural constrains in the bio macromolecules [11], and the minimal ensemble search method [24]. The ensemble-based SAS approaches present a significant advancement in interpretation of flexible protein conformations as compared to methods which find a single protein structure fitting best to the experimental SAS data [14]. The present work mainly tests the basis-set supported ensemble method [26, 33], which can be schematically illustrated by Fig. 1(a). Briefly, a large pool of candidates of protein conformations, pre-generated using a simulation method, e.g., molecular dynamics simulations (MD) or Monte Carlo simulation, is clustered into a limited number of sub ensembles, i.e., representative structures, based on structural similarity as quantified by a structural metric. Here, three structural metrics are used: root-mean-squared deviation (RMSD), distance root-mean-square (DRMS) [26] (see the supporting information) and dihedral distance [7]. Then, SAS profiles are calculated for each sub ensemble and used as a linear basis to fit against the experimental data to obtain the corresponding populations. This method makes an implicit assumption that structurally similar conformations have similar SAS profiles, which can be invalid as reported by Refs. [15, 32]. Here, we demonstrate that, for various protein systems, this ensemble-search method can lead to inaccurate assignment of protein structural ensemble because of the failure of the assumption made. To overcome this deficiency, a two-fold-clustering method is developed, which clusters the structural candidates using information on both structures and SAS profiles. The ability of this new method in predicting a more accurate ensemble of protein structures in solution is confirmed by both simulation and experimental tests.

2 Method For simplicity, the basis-set supported ensemble method as shown in Fig. 1(a) is referred as one-fold-clustering method herein. The two-fold-clustering method introduced here, schematically illustrated in Fig. 1(b), contains four steps: (1). The original pool of protein conformations, which are pre-generated by molecular simulations, is clustered into N sub ensembles, using the algorithm developed in Ref. [8], based on their structural similarity, as quantified by one of the three structural metrics (RMSD, DMSD or dihedral distance). This step is the same as for the one-fold-clustering method; (2). Protein conformations within each sub ensemble are further divided into several smaller clusters based on their similarity in I (q), quantified by, δij =

[ Ii (qs )− Ij (qs )]2 1 , L−1 ∑ σ2 ( q s ) s

(2.1)

where, Ii (qs ) and Ij (qs ) are the SAS intensities of the protein conformation i and j, respectively, qs is the scattering wave vector, L is the total number of the scattering wave vectors, and σ2 (qs ) is the variance of I (qs ) among all the conformations. Thus, the protein conformations within each cluster are similar in both 3D structure and I (q). (3). The population

9

P. Tan et al. / Commun. Comput. Phys., 25 (2019), pp. 1010-1023

1013

Figure 1: (Color online) Schematic illustration for (a) one- and (b) two-fold-clustering methods. Ref. [33] also proposed a two-criteria method using both RMSD and I (q ) to cluster simulation-derived protein conformations to reveal the atomic structures of proteins. It is, however, different from the present work. It first clusters the simulation-derived protein conformations to small sub ensembles based on their structural similarity using RMSD, and then merges them to bigger clusters based on similarity in I (q ). Finally, these big clusters serve as linear basis to fit against the experimental I (q ) to obtain the corresponding populations. Within each of these big clusters, protein conformations can differ significantly despite I (q ) is similar. In other words, Ref. [33] obtains an ensemble of I (q ), from which one can hardly get the information on the populations of different 6B;m`2 R, U+QHQ` QMHBM2V a+?2KiB+ BHHmbi`iBQM 7Q`delivered UV QM2@ U#V method irQ@7QH/@+Hmbi2`BM; representative structures. In contrast, this is the direct message by theM/ two-fold proposed in present work. K2i?Q/bX _27X(jj) HbQ T`QTQb2/  irQ@+`Bi2`B K2i?Q/ mbBM; #Qi? _Ja. M/ iQ +Hmb@

i2` bBKmHiBQM@/2`Bp2/ T`Qi2BM +QM7Q`KiBQMb iQ `2p2H i?2 iQKB+ bi`m+im`2b Q7 T`Qi2BMbX AiofBb?Qr2p2`/Bz2`2Mi 7`QK i?2 T`2b2Mi rQ`FX Ai }`bi i?2linear bBKmHiBQM@/2`Bp2/ each cluster is obtained by fitting the experimental I (q) +Hmbi2`b to a set of basis comT`Qi2BM +QM7Q`KiBQMb iQ bKHH bm# 2Mb2K#H2b #b2/ QM i?2B` bi`m+im`H bBKBH`Biv mbBM; posed by the scattering profiles calculated from each cluster. (4). The so-obtained pop_Ja.i?2M K2`;2b i?2K iQfrom #B;;2` QM bBKBH`Biv BM together X 6BMHHvulationsM/ of the clusters originating the +Hmbi2`b same sub#b2/ ensemble are summed to i?2b2 #B; +Hmbi2`b b2`p2 b HBM2` #bBb iQ }i ;BMbi i?2 2tT2`BK2MiH iQ Q#iBM represent the population of that sub ensemble. Therefore, the two-fold-clustering method i?2 +Q``2bTQM/BM; qBi?BM (representative 2+? Q7 i?2b2 #B; +Hmbi2`b-asT`Qi2BM +QM7Q`KiBQMb +M yields the sameTQTmHiBQMbX set of sub ensembles structures) the one-fold-clustering method does, while/2bTBi2 the corresponding populations could be different. /Bz2` bB;MB}+MiHv Bb bBKBH`X AM Qi?2` rQ`/b_27X(jj) Q#iBMb M 2Mb2K#H2 Q7 - 7`QK r?B+? QM2 +M ?`/Hv ;2i i?2 BM7Q`KiBQM QM i?2 TQTmHiBQMb Q7 /Bz2`2Mi `2T`2b2MiiBp2 bi`m+im`2bX AM +QMi`bi- i?Bb Bb i?2 /B`2+i K2bb;2 /2HBp2`2/ #v i?2 irQ@7QH/ K2i?Q/ T`QTQb2/ BM T`2b2Mi rQ`FX

1014

P. Tan et al. / Commun. Comput. Phys., 25 (2019), pp. 1010-1023

8

Figure 2:k,(Color online) Snapshots of the simulation (a) IDP (b) lysozyme (c)U#V MerA. (d) SAXS M/ 6B;m`2 U+QHQ` QMHBM2V aMTb?Qib Q7 i?2 systems, bBKmHiBQM bvbi2KbUV and A.S HvbQxvK2 profiles calculated based on the protein structures displayed in (a), (b) and (c) using software Sassena [19] U+V J2`X U/V asa T`Q}H2b +H+mHi2/ #b2/ QM i?2 T`Qi2BM bi`m+im`2b /BbTHv2/ BM UVU#V M/ U+V mbBM; bQ7ir`2 abb2M(RN)

To test the performance of these two different methods, three proteins with distinct structural features are examined here: an intrinsically disordered protein (IDP, Tau2672tKBM2/ ?2`2, M BMi`BMbB+HHv /BbQ`/2`2/ T`Qi2BM UA.S- hmked@jRk- S." A., kJwdV-  312, PDB ID: 2MZ7), a multi-domain protein (mercuric ion reductase, MerA) and a globKmHiB@/QKBM T`Qi2BM UK2`+m`B+ BQM `2/m+ib2- J2`V M/  ;HQ#mH` bBM;H2@/QKBM T`Qi2BM ular single-domain protein (lysozyme). Typical structures of these three systems are disUHvbQxvK2VX hvTB+H i?`22 bvbi2Kb /BbTHv2/ BM 6B;bX k(SAXS) iQ +- M/ played in Figs. 2(a) bi`m+im`2b to (c), and Q7 thei?2b2 corresponding small`2 angle X-ray scattering i?2 +Q``2bTQM/BM; bKHH M;H2 b+ii2`BM; UasaV T`Q}H2bAU[V- `2dynamics T`2b2Mi2/simBM 6B;X profiles, I(q), are presented in s@`v Fig. 2(d). We employ all-atom molecular k/X q2 2KTHQv HH@iQK KQH2+mH` /vMKB+b bBKmHiBQM iQ ;2M2`i2 Ryyy +QM7Q`KiBQMb ulation to generate 1000 conformations for each system (see the supporting information 7Q` bvbi2K Ub22 i?2 bmTTQ`iBM; BM7Q`KiBQM UaAV /2iBH2/ bBKmHiBQM T`QiQ+QHbV(SI)2+? at www.global-sci.com/cicp.OA-2018-0140.SI for7Q` detailed simulation protocols), r?B+? `2 mb2/ b i?2 BMBiBH TQQH Q7 T`Qi2BM +QM7Q`KiBQMb 7Q` +Hmbi2` MHvbBbX which are used as the initial pool of protein conformations for cluster analysis.

j3 _2bmHib Results jXR 3.1 aBKmHiBQM Simulationi2bi test AMIni?2 i?2averaged p2`;2/ I (q) calculated +H+mHi2/from 7`QK bMTb?Qib Q7 i?2 T`Qi2BM the bBKmHiBQM simulation i2bibtests, the thei?2 snapshots of the protein KQH2+mH2 7Q` each 2+? bvbi2K moleculeBMini?2 the}`bi firstQM2 one}7i? fifthQ7ofi?2 theJ. MDi`D2+iQ`v trajectory Ukyy (200 +QM7Q`KiBQMbV conformations) for system `2 iF2M b i?2asi`;2i asaSAXS /i-data, M/ Bi rBHH #v i?2byQM2@ are taken the target and it #2 willKQ/2H2/ be modeled the M/ one-irQ@7QH/@+Hmbi2`BM; and two-foldK2i?Q/bX b b22M BM 6B;bX j iQ +#Qi? K2i?Q/b vB2H/ ;QQ/ }ib iQ i?2fitsbBKmHiBQM@ clustering methods. As seen in Figs. 3(a) to (c), both methods yield good to the ;2M2`i2/ i`;2i asa /iX hQdata. [mMiB7v i?2 ;`22K2Mi #2ir22M i?2 }ii2/ M/ i?2 simulation-generated target SAXS To quantify the agreement between the fitted bBKmHiBQM@;2M2`i2/ -  b+Q`2 Bb /2}M2/ b(R9- jk), and the simulation-generated I (q), T`K2i2` a score parameter is defined as [14, 32]: χ2 =

r?2`2 M/ 1Mb2K#H2 -

[∑ iN=1 Pi ∗ Ii (qs )− Itar (qs )]2 1 , ∑ L−1 s σ2 ( q s )

(3.1)

`2 i?2 }ii2/ TQTmHiBQM M/ bKHH M;H2 b+ii2`BM; T`Q}H2b 7Q` am# Bb i?2 i`;2i asa T`Q}H2- M/ Bb i?2 biM/`/ /2pBiBQM KQM;

P. Tan et al. / Commun. Comput. Phys., 25 (2019), pp. 1010-1023

1015

Figure 3: (Color online) Comparison of target SAXS profiles with those derived from one- and two-fold-clustering methods IDP,QMHBM2V (b) lysozyme and (c) MerA. The error bar represents one standard deviation. 6B;m`2 for j, (a) U+QHQ` *QKT`BbQM Q7 i`;2i asa T`Q}H2b rBi? i?Qb2 /2`Bp2/ 7`QK QM2@

M/ irQ@7QH/@+Hmbi2`BM; K2i?Q/b 7Q` UV A.S U#V HvbQxvK2 M/ U+V J2`X h?2 2``Q` #` `2T`2b2Mib QM2I biM/`/ /2pBiBQMX where Pi and i ( q) are the fitted population and small angle scattering profiles for Sub Ensemble i, Itar (q) is the target SAXS profile, and σ(q) is the standard deviation among all Ii (q) in the target trajectory. The values of χ2 are displayed in Table 1, showing that one-fold method agrees better with the target SAXS profiles. Table 1: Scoring parameters, χ2 , the cutoffs of RMSD, DRMS and dihedral distance, and the ratio between γone − f old and γtwo − f old. The analysis is based on SAXS profiles. Detailed procedures can be found in SI.

χ2one χ2two lysozyme

IDP

MerA

γone/γtwo cutoff χ2one χ2two γone/γtwo cutoff χ2one χ2two γone/γtwo cutoff

RMSD 1.27291 1.61756 5.5264 ˚ 1.65 A

DRMS 0.71003 1.2889 9.129 ˚ 1A

0.34998 1.9347 4.1128 ˚ 11 A 0.3793 2.0482 13.7346 ˚ 6.5 A

0.30538 1.4863 4.6989 ˚ 9A 0.2303 2.5459 7.2686 ˚ 4A

Dihedral distance 0.8444 1.73755 8.2081 0.15 0.2985 2.2675 16.7 0.52 0.3382 2.5594 3.4194 0.16

1016

P. Tan et al. / Commun. Comput. Phys., 25 (2019), pp. 1010-1023

d

Figure 4: (Color online) Simulation tests on the performance of one-fold and two-fold-clustering methods.

Comparison of the true populations of sub ensembles the target MD trajectories with theM/ populations 6B;m`2 9, U+QHQ` QMHBM2V aBKmHiBQM i2bib constituting QM i?2 T2`7Q`KM+2 Q7 QM2@7QH/ irQ@7QH/@ derived from one-fold and two-fold methods by fitting to the target I (q ) for (a) IDP, (b) lysozyme and (c) MerA. +Hmbi2`BM; K2i?Q/bX *QKT`BbQM i?2 i`m2 bm# 2Mb2K#H2b +QMbiBimiBM; (d) Relative errors, γ defined in Eq. (3.2)Q7 resulting from TQTmHiBQMb one- and two-foldQ7 methods and (e) the ratio of γone /γtwo i?2 for different systems. A Monte Carlo procedure is used to derive the populations of sub ensembles using the i`;2i J. i`D2+iQ`B2b rBi? i?2 TQTmHiBQMb /2`Bp2/ 7`QK QM2@7QH/ M/ irQ@7QH/ K2i?Q/b one- and two-fold methods, details of which can be found in supporting information. All the results presented in this figure are i`;2i obtained by using as theU#V structural metric. M/ UsingU+V other structural will provide #v }iiBM; iQ i?2 7Q`RMSD UV A.SHvbQxvK2 J2`X U/Vmetrics _2HiBp2 2``Q`bqualitative similar results (see Table 1). /2}M2/ BM 1[X UjV `2bmHiBM; 7`QK QM2@ M/ irQ@7QH/ K2i?Q/b M/ U2V i?2 `iBQ Q7 f 7Q` /Bz2`2Mi bvbi2KbX  JQMi2 *`HQ T`Q+2/m`2 Bb mb2/ iQ /2`Bp2 i?2 TQTmHiBQMb Q7 bm# Although both methods provide seemingly good fits to the target data (Fig. 3), the 2Mb2K#H2b mbBM; i?2 QM2@ M/ irQ@7QH/ K2i?Q/b- /2iBHb Q7 r?B+? +M #2 7QmM/ BM bmTTQ`iBM; resulting populations of sub ensembles differ dramatically (see Figs. 4(a) to (c)). For BM7Q`KiBQMX HH i?2 `2bmHib T`2b2Mi2/ BM i?Bb };m`2 `2 Q#iBM2/ #v mbBM; _Ja. b i?2 comparison, the true populations of sub ensembles constituting the target MD trajectories bi`m+im`H K2i`B+b T`QpB/2with [mHBiiBp2 bBKBH` `2bmHib are alsoK2i`B+X plotted, lbBM; and areQi?2` foundbi`m+im`H to be in much betterrBHH agreement those derived from Ub22 the h#H2 RVX two-fold-clustering method. (The true populations of sub ensembles are obtained by

directly examining each snapshot in the target MD trajectory, and the detailed procedure for how to determine them is presented in Fig. S4 in SI.) To quantify the accuracy of these HH two methods BM i?2 i`;2i i`D2+iQ`vX h?2 pHm2b Q7 `2 /BbTHv2/ BM h#H2 R- b?QrBM; i?i in predicting the populations of protein conformations, an error parameter, QM2@7QH/ K2i?Q/ ;`22b #2ii2` rBi? i?2 i`;2i asa T`Q}H2bX γ, is defined: v u N Hi?Qm;? #Qi? K2i?Q/b T`QpB/2 b22KBM;Hv ;QQ/ }ib iQ i?2 i`;2i /i U6B;X jV- i?2 u ∑i=1 ( Pi − P∗ )2 i t `2bmHiBM; TQTmHiBQMb Q7 bm# 2Mb2K#H2b Ub22 6B;bX 9 iQ +VX 6Q` , (3.2)+QK@ γ = /Bz2`N /`KiB+HHv ∗ )2 ( P ∑ i = 1 i T`BbQM- i?2 i`m2 TQTmHiBQMb Q7 bm# 2Mb2K#H2b +QMbiBimiBM; i?2 i`;2i J. i`D2+iQ`B2b

∗ `2 HbQ THQii2/`2 7QmM/ iQ #2 BM Km+? #2ii2` ;`22K2Mi rBi? i?Qb2 /2`Bp2/ 7`QK where Pi and PM/ i are the fitted and true populations for a given Sub Ensemble i, respeci?2 tively, irQ@7QH/@+Hmbi2`BM; K2i?Q/X Uh?2 i`m2ensembles.Thus, TQTmHiBQMb Q7γbm# 2Mb2K#H2b `2 Q#iBM2/ and N is the total number of sub quantifies the deviation be- #v /B`2+iHv 2tKBMBM; 2+?the bMTb?Qi BM i?2 i`;2i J. i`D2+iQ`vi?2agreement /2iBH2/ T`Q+2/m`2 tween the fitted and true populations, i.e., smaller γ indicatesM/ better to the true populations. As seen in Figs. 4(d), 4(e) and Table 1, γ is many folds larger 7Q` ?Qr iQ /2i2`KBM2 i?2K Bb T`2b2Mi2/ BM 6B;X a9 BM aAXV hQone [mMiB7v i?2 ++m`+v Q7 i?2b2 − f old γtwo−BM , i.e., the populations derivedQ7from the two-fold method is to f oldT`2/B+iBM; irQ than K2i?Q/b i?2 TQTmHiBQMb T`Qi2BM +QM7Q`KiBQMbMmuch 2``Q`closer T`K2i2`the true ones, although I ( q ) derived from the one-fold method agrees better with the - Bb /2}M2/, 2

target SAS results, i.e., smaller χ (see Table 1). Hence, determining the 3D protein struc-

r?2`2

M/

`2 i?2 }ii2/ M/ i`m2 TQTmHiBQMb 7Q`  ;Bp2M am# 1Mb2K#H2 -

P. Tan et al. / Commun. Comput. Phys., 25 (2019), pp. 1010-1023

1017

tures simply based on examining the value of χ2 can sometimes be misleading. These findings are independent of which portion of the MD trajectory is used to represent the target I (q) (see Fig. S1 in SI), independent of the size, compactness, secondary and tertiary structures of the system studied, independent of the choice of the structural metric, and independent of whether SAXS or SANS profiles (see Table S1) are considered. The improved performance of the two-fold-clustering method arises from not assuming that the structurally proximate protein conformations possess similar SAS profiles, an assumption which can be invalid. This is evident by Figs. 5(a) to (c), which show no strong correlation between the structural difference (e.g., RMSD) δij (difference between I (q) as defined in Eq. (2.1)) for every two conformations of a given protein system. Similar findings have been also reported by others [15, 32] and should be generally applicable to bio-macromolecules as the test is conducted herein over a wide range of systems, independent on the choice of the structural metric (Figs. S2 and S3 in SI) and independent on whether small angle X-ray or neutron scattering is considered (Fig. S3 in SI).

3.2 Experimental test In addition to the simulation test, we also examined the performance of the two methods using the experimental SAXS data of MerA [14]. In order to get sufficient sampling of protein conformations to have better agreement with SAXS data, here we conducted gas phase simulation of MerA (see more details in SI), the protein structures generated from which were used for cluster analysis to fit SAXS data. As seen in Fig. 6(a), the two methods both provide good fits to the experimental I (q), with the one-fold method presenting a slightly smaller χ2 , but yield vastly different populations of representative structures, i.e., sub ensembles (see Fig. 6(b)). The two-fold method predicts two highly populated protein structures, while the one-fold method gives one dominant structure with the other two much less populated conformations. More importantly, the dominant protein structure identified by the one-fold method does not agree with any of the two structures revealed by the two-fold method. MerA possesses metallochaperone-like N-terminal domains (NMerA), which are tethered to the homodimeric catalytic core by 30 residue linkers [1, 2] (Fig. 8). NMerA contains a pair of cysteine residues in a GMTCXXC sequence motif that is conserved in those soft metal ion trafficking proteins sharing the common βαββαβ structural fold [18]. MerA binding and processing of Hg(II) is schematically illustrated in Fig. 8. Hg(II) binds first to a pair of cysteines (C11 and C14) in NMerA, these being the most solventaccessible cysteines in the protein [17, 18]. The ion is then transferred to another pair of cysteines (C561’ and C562’) located on the mobile C-terminal segment of the core. The mobile segment then moves the C-terminal Hg(II)-bound cysteines from the surface to the interior of the protein, from which the ions is further delivered to the buried, activesite cysteine pair (C135 and C140) where the reduction occurs [9,10]. In vivo experiments showed that the presence of NMerA domains significantly enhances cell survival in the presence of Hg(II) [17], in which the high-affinity Hg(II)-chelating NMerA domains are

1018

P. Tan et al. / Commun. Comput. Phys., 25 (2019), pp. 1010-1023 N

6B;m`2 8, *QKT`BbQM Q7 _Ja. M/ i?2 /Bz2`2M+2 BM asa T`Q}H2b-

b /2}M2/ #v

Figure 5: Comparison of RMSD and the difference in SAXS profiles, δij as defined by Eq. (2.1) for every two 1[X URV 7Q` 2p2`v irQ T`Qi2BM +QM7Q`KiBQMb 7Q` UV A.S- U#V HvbQxvK2 M/ U+V K2`X protein conformations for (a) IDP, (b) lysozyme and (c) merA. We found that the between SAS q2 7QmM/ i?i i?2 /Bz2`2M+2 #2ir22M aa T`Q}H2b 7Q` irQ T`Qi2BM +QM7Q`KiBQMb ?bdifference MQ profiles for two protein conformations hasbi`m+im`H no strong correlation with their`2structural difference. These findings bi`QM; +Q``2HiBQM rBi? i?2B` /Bz2`2M+2X h?2b2 }M/BM;b BM/2T2M/2Mi QM are independent r?B+? on which structural metric (Figs. S3 in SI) QM and independent on whether small bi`m+im`H K2i`B+ mb2/ U6B;bXused ak M/ aj BMS2 aAV and M/ BM/2T2M/2Mi r?2i?2` bKHH M;H2 s@`v Q` M2mi`QM Bb +QMbB/2`2/ aj BM aAVX angle X-ray or neutron scattering is b+ii2`BM; considered (Fig. S3U6B;X in SI).

essential for rapid acquisition, localization and transfer of Hg(II) to the core for reduction and detoxification [17]. Ref. [13], by combining small angle scattering, coarse-grained simulation and all-atom molecular dynamics simulation, revealed that MerA adopts a highly compact structure in solution, where the NMerA domains are leashed by the linkers to bound strongly with the core domain, being close to the C-terminal Hg(II)-binding cysteines for rapid delivery of mercury ions. It is further confirmed by neutron spin echo

P. Tan et al. / Commun. Comput. Phys., 25 (2019), pp. 1010-1023

RR 1019

Figure 6: (Color online) QMHBM2V The experimental test on the i2bi performance one- and two-fold-clustering methods in 6B;m`2 e, U+QHQ` h?2 2tT2`BK2MiH QM i?2 of T2`7Q`KM+2 Q7 QM2@ M/ irQ@7QH/@ the case of MerA. (a) Comparisons of I(q) of MerA obtained from the SAXS experiment with that derived from i?2RMSD +b2 Q7asJ2`X UV *QKT`BbQMb Q7 AU[V clustering. Q7 J2` Q#iBM2/ 7`QK of one- +Hmbi2`BM; and two-foldK2i?Q/b methods BM using the structural metric for structural (b) Populations i?2 asa 2tT2`BK2Mi rBi? i?i /2`Bp2/ 7`QK QM2@ M/ irQ@7QH/ K2i?Q/b mbBM; _Ja. b sub ensembles derived from one-fold (black), two-fold methods (red).

i?2 bi`m+im`H K2i`B+ 7Q` bi`m+im`H +Hmbi2`BM;X U#V SQTmHiBQMb Q7 bm# 2Mb2K#H2b /2`Bp2/ 7`QK QM2@7QH/ U#H+FV- irQ@7QH/ K2i?Q/b U`2/VX

measurements, which showed no appreciable inter-domain dynamics being present between the NMerA and core domains [13]. *@i2`KBMH b2;K2Mi i?2 +Q`2X h?2 KQ#BH2 b2;K2Mi i?2M in KQp2b >;UAAV@ As shown by TableQ72 and Fig. 7, Sub Ensemble 1 (Red Fig. i?2 7) is*@i2`KBMH the most extended #QmM/ +vbi2BM2b 7`QK i?2 bm`7+2 iQ i?2 BMi2`BQ` Q7 i?2 T`Qi2BM7`QK r?B+? i?2 Bb conformation, in which its NMerA domains are detached significantly from the BQMb catalytic 7m`i?2` /2HBp2`2/ iQ i?2 #m`B2/- +iBp2@bBi2 +vbi2BM2 TB` U*Rj8 M/ *R9yV r?2`2 i?2 `2@ core (marked in red in Table 2), thus contradicting the findings of Ref. [13]. This unlikely /m+iBQM Q++m`b(N- Ry)X AM pBpQ 2tT2`BK2Mib b?Qr2/ i?i i?2 T`2b2M+2 Q7 LJ2` /QKBMb conformation is highly populated (66%) as revealed by the one-fold method, but has bB;MB}+MiHv 2M?M+2b +2HH bm`pBpH BM i?2 T`2b2M+2 Q7 >;UAAV(Rd)- BM r?B+? i?2 ?B;?@{MBiv zero>;UAAV@+?2HiBM; weight as predicted the two-fold method. Moreover, theHQ+HBxiBQM most dominant strucLJ2`by /QKBMb `2 2bb2MiBH 7Q` `TB/ +[mBbBiBQMM/ i`Mb@ ture72` identified the +Q`2 two-fold method,M/ i.e., Sub Ensemble 2_27X(Rj)(Blue in#vFig. 7), is the most Q7 >;UAAV by iQ i?2 7Q` `2/m+iBQM /2iQtB}+iBQM(Rd)X +QK#BMBM; bKHH compact conformation among the three sub ensembles and has the shortest distances M;H2 b+ii2`BM;- +Q`b2@;`BM2/ bBKmHiBQM M/ HH@iQK KQH2+mH` /vMKB+b bBKmHiBQMbetween C11i?i andJ2` C561’/QTib and between andbi`m+im`2 C561, i.e., measur`2p2H2/  ?B;?Hv C11’ +QKT+i BM characteristic bQHmiBQM- r?2`2distances i?2 LJ2` /Q@ ing KBMb the capability of #v thei?2 linker to leash the bi`QM;Hv NMerA rBi? domain around the core `2 H2b?2/ HBMF2`b iQ #QmM/ i?2 +Q`2 /QKBM#2BM;C-terminal +HQb2 iQ Hg(II)-binding cysteines, which is in better7Q`agreement withQ7the findingBQMbX of Ref. [13]. Howi?2 *@i2`KBMH >;UAAV@#BM/BM; +vbi2BM2b `TB/ /2HBp2`v K2`+m`v Ai Bb 7m`i?2` +QM}`K2/ #v M2mi`QM bTBM 2+?Q K2bm`2K2Mib- r?B+? b?Qr2/ MQ TT`2+B#H2 BMi2`@/QKBM /vMKB+b #2BM; T`2b2Mi #2ir22M i?2 LJ2` M/ +Q`2 /QKBMb(Rj)X b b?QrM #v h#H2 k 6B;X d- am# 1Mb2K#H2ofR MerA U_2/ conformations BM 6B;X dV Bb revealed i?2 KQbi +QM7Q`KiBQM- BMmethods r?B+? by TableM/ 2: Structural characteristics by 2ti2M/2/ one- and two-fold-clustering modeling the experimental data [14]. The values inside brackets represent one standard deviation. Bib LJ2` /QKBMbSAXS `2 /2i+?2/ bB;MB}+MiHv 7`QK i?2 +iHviB+ +Q`2 UK`F2/ BM `2/Values BM marked in red that the conformation is highly h#H2 kV-indicate i?mb +QMi`/B+iBM; i?2 }M/BM;b Q7 extended. _27X(Rj)X h?Bb mMHBF2Hv +QM7Q`KiBQM Bb ?B;?Hv TQTmHi2/ UeeWV #v i?2 QM2@7QH/ K2i?Q/#mi ?b r2B;?i b/ T`2/B+i2/ #v sub- b `2p2H2/ population Minimum distance be- x2`Q C11-C516’ C11’-C516 i?2 irQ@7QH/ensemble K2i?Q/X JQ`2Qp2`- i?2 tween KQbi /QKBMMi bi`m+im`2 B/2MiB}2/ #v i?2 irQ@7QH/ ˚ each of the two (A) K2i?Q/- BX2X6B;X dV- and Bb i?2the KQbi +QKT+i +QM7Q`KiBQM KQM; IDam# 1Mb2K#H2 k U"Hm2 BM NmerAs core ˚/BbiM+2b #2ir22M *RR M/ *8eRǶ M/ #2@ i?2 i?`22 bm# 2Mb2K#H2b M/ ?b i?2domain b?Q`i2bi(A) ir22M *RRǶ M/1*8eR- BX2X- 66% +?`+i2`BbiB+ /BbiM+2b K2bm`BM; i?2 +T#BHBiv Q7 i?2 HBMF2` 14(6)/12(5) 87(14)/74(13) iQ H2b? i?2 LJ2` /QKBM `QmM/ i?2 +Q`2 *@i2`KBMH >;UAAV@#BM/BM; +vbi2BM2b- r?B+? 2 21% 7 (3)/5 (3) 64(10)/61(11) one-fold Bb BM #2ii2` ;`22K2Mi rBi? i?2 }M/BM; Q7 _27X(Rj)X >Qr2p2`- i?Bb +QKT+i +QM7Q`KiBQM 3 13% 10 (4)/9 (4) 72(11)/75(12) r2B;?ib Km+? H2bb b T`2/B+i2/ #v i?2 QM2@7QH/ K2i?Q/X >2M+2- i?2 2tT2`BK2MiH i2bi HbQ 1 0% 14(6)/12(5) 87(14)/74(13) 2 53% 7(3)/5 (3) 64(10)/61(11) two-fold 3 47% 10 (4)/9 (4) 72(11)/75(12)

Rk

1020

P. Tan et al. / Commun. Comput. Phys., 25 (2019), pp. 1010-1023

Figure 7: Structural representatives for Sub Ensemble 1 (red), Sub Ensemble 2 (blue) and Sub Ensemble 3 (orange) obtained through cluster analysis on the simulation-generated protein conformations. To generate a sufficiently broad pool of protein conformations for further clustering to have a reasonably quantitative agreement with SAXS experimental results, we performed a gas-phase MD for MerA at 500 K [14] (see more details in the supporting information). As the experimental SAXS data were collected from a protein solution, water 6B;m`2 d, significantly ai`m+im`H contribute `2T`2b2MiiBp2b am# 1Mb2K#H2 R U`2/Vam# U#Hm2V M/ the molecules will to the 7Q` scattering signals. To account for1Mb2K#H2 the solventk contribution, am# 1Mb2K#H2 j UQ`M;2V i?`Qm;? +Hmbi2` MHvbBbFoXS QM i?2 SAXS profile calculated from MD Q#iBM2/ conformations is using the software [27],bBKmHiBQM@;2M2`i2/ which assumes an implicit solvent layer. +QM7Q`KiBQMbX hQ ;2M2`i2  bm{+B2MiHv #`Q/ TQQH Q7 T`Qi2BM +QM7Q`KiBQMb Rj T`Qi2BM 7Q`

7m`i?2` +Hmbi2`BM; iQ ?p2  `2bQM#Hv [mMiBiiBp2 ;`22K2Mi rBi? asa 2tT2`BK2MiH `2bmHib- r2 T2`7Q`K2/  ;b@T?b2 J. 7Q` J2` i 8yy E(R9) Ub22 KQ`2 /2iBHb BM i?2 bmTTQ`iBM; BM7Q`KiBQMVX b i?2 2tT2`BK2MiH asa /i r2`2 +QHH2+i2/ 7`QK  T`Qi2BM bQHmiBQM- ri2` KQH2+mH2b rBHH bB;MB}+MiHv +QMi`B#mi2 iQ i?2 b+ii2`BM; bB;MHbX hQ ++QmMi 7Q` i?2 bQHp2Mi +QMi`B#miBQM- i?2 asa T`Q}H2 +H+mHi2/ 7`QK J. +QM7Q`KiBQMb Bb mbBM; i?2 bQ7ir`2 6Qsa(kd)- r?B+? bbmK2b M BKTHB+Bi bQHp2Mi Hv2`X

3, picture b+?2KiB+ TB+im`2where Q7 J2` r?2`2 i?2 +`m+BHfor`2bB/m2b 7Q` i?2 2MxvKiB+ Figure 8: A 6B;m`2 schematic of MerA the crucial residues the enzymatic function are labeled. 7mM+iBQM `2 H#2H2/X

7pQ`b i?2 T`Qi2BM +QM7Q`KiBQMH 2Mb2K#H2 `2p2H2/ #v i?2 irQ@7QH/ +Hmbi2`BM; K2i?Q/X

9

*QM+HmbBQM

"v bim/vBM; i?`22 T`Qi2BMb rBi? /Bz2`2Mi ~2tB#BHBiv @ M BMi`BMbB+HHv /BbQ`/2`2/ T`Qi2BM +QKT+i bBM;H2@/QKBM ;HQ#mH` T`Qi2BM M/  KmHiB@/QKBM T`Qi2BM- i?2 T`2b2Mi rQ`F /2KQMbi`i2b i?i i?2 #bBb@bmTTQ`i2/ 2Mb2K#H2 K2i?Q/ +M ;Bp2 `Bb2 iQ BM++m`i2 b@ b2bbK2Mi Q7 i?2 TQTmHiBQMb Q7 T`Qi2BM +QM7Q`KiBQMb BM bQHmiBQM r?2M KQ/2HBM; i?2 bKHH M;H2 b+ii2`BM; 2tT2`BK2MiH /iX h?Bb `Bb2b 7`QK i?2 7BHm`2 Q7 i?2 bbmKTiBQM K/2 i?i bi`m+im`HHv bBKBH` T`Qi2BM +QM7Q`KiBQMb ?p2 bBKBH` bKHH M;H2 b+ii2`BM; T`Q}H2bX >2`2-  irQ@7QH/ +Hmbi2`BM; K2i?Q/ Bb /2p2HQT2/- r?B+? +HbbB}2b i?2 bBKmHiBQM@/2`Bp2/ T`Qi2BM +QM7Q`KiBQMb #b2/ QM bBKBH`Biv QM #Qi? j. bi`m+im`2 M/ b+ii2`BM; T`Q}H2bX b +QM}`K2/ #v #Qi? bBKmHiBQM M/ 2tT2`BK2MiH `2bmHib- i?Bb M2r K2i?Q/ T`QpB/2b

P. Tan et al. / Commun. Comput. Phys., 25 (2019), pp. 1010-1023

1021

ever, this compact conformation weights much less as predicted by the one-fold method. Hence, the experimental test also favors the protein conformational ensemble revealed by the two-fold clustering method.

4 Conclusion By studying three proteins with different flexibility - an intrinsically disordered protein, a compact single-domain globular protein and a multi-domain protein, the present work demonstrates that the basis-supported ensemble method can give rise to inaccurate assessment of the populations of protein conformations in solution when modeling the small angle scattering experimental data. This arises from the failure of the assumption made that structurally similar protein conformations have similar small angle scattering profiles. Here, a two-fold clustering method is developed, which classifies the simulation-derived protein conformations based on similarity on both 3D structure and scattering profiles. As confirmed by both simulation and experimental results, this new method provides exact the same set of representative protein structures as the basissupported ensemble method does, but much more accurate populations.

Author contribution P. Tan did the simulation, analyzed the data and wrote the manuscript. Z. Fu did the simulation and analyzed the data. ‡They contribute equally to the work. S. Qian wrote the manuscript. L. Petridis and J. Li designed the work and wrote the manuscript. L. Hong conceived the research, designed the work and wrote the manuscript.

Acknowledgments This work was supported by the National Natural Science Foundation of China (NO. 11504231 and NO. 31630002). The SJTU group acknowledges support from Center for High Performance Computing (HPC) facility at Shanghai Jiao Tong University. LP was supported by the Laboratory Directed Research and Development Program of Oak Ridge National Laboratory, managed by UT-Battelle, LLC, for the U. S. Department of Energy. The MD trajectory of IDP was provided by Prof. Haifeng Chen in School of Life Science and Biotechnology of SJTU. References [1] T. Barkay, K. Kritee, E. Boyd, and G. Geesey. A thermophilic bacterial origin and subsequent constraints by redox, light and salinity on the evolution of the microbial mercuric reductase. Environ Microbiol, 12(11):2904–17, 2010.

1022

P. Tan et al. / Commun. Comput. Phys., 25 (2019), pp. 1010-1023

[2] T. Barkay, S. M. Miller, and A. O. Summers. Bacterial mercury resistance from atoms to ecosystems. FEMS Microbiol Rev, 27(2-3):355–84, 2003. [3] P. Bernado, E. Mylonas, M. V. Petoukhov, M. Blackledge, and D. I. Svergun. Structural characterization of flexible proteins using small-angle x-ray scattering. J Am Chem Soc, 129(17):5656–64, 2007. [4] A. Bjorling, S. Niebling, M. Marcellini, D. van der Spoel, and S. Westenhoff. Deciphering solution scattering data with experimentally guided molecular dynamics simulations. J Chem Theory Comput, 11(2):780–7, 2015. [5] E. Boura, B. Rozycki, D. Z. Herrick, H. S. Chung, J. Vecer, W. A. Eaton, D. S. Cafiso, G. Hummer, and J. H. Hurley. Solution structure of the escrt-i complex by small-angle x-ray scattering, epr, and fret spectroscopy. Proceedings of the National Academy of Sciences of the United States of America, 108(23):9437–9442, 2011. [6] Po-chia Chen and Jochen S. Hub. Interpretation of solution x-ray scattering by explicitsolvent molecular dynamics. Biophysical journal, 108(10):2573–2584, 2015. [7] Pilar Cossio, Alessandro Laio, and Fabio Pietrucci. Which similarity measure is better for analyzing protein structures in a molecular dynamics trajectory? Physical Chemistry Chemical Physics, 13(22):10421–10425, 2011. [8] X. Daura, K. Gademann, B. Jaun, D. Seebach, W. F. van Gunsteren, and A. E. Mark. Peptide folding: When simulation meets experiment. Angewandte Chemie-International Edition, 38(12):236–240, 1999. [9] S. Engst and S. M. Miller. Alternative routes for entry of hgx2 into the active site of mercuric ion reductase depend on the nature of the x ligands. Biochemistry, 38(12):3519–29, 1999. [10] Stefan Engst and Susan M Miller. Rapid reduction of hg (ii) by mercuric ion reductase does not require the conserved c-terminal cysteine pair using hgbr2 as the substrate. Biochemistry, 37(33):11496–11507, 1998. [11] F. Forster, B. Webb, K. A. Krukenberg, H. Tsuruta, D. A. Agard, and A. Sali. Integration of small-angle x-ray scattering data into structural modeling of proteins and their assemblies. J Mol Biol, 382(4):1089–106, 2008. [12] M. A. Graewert and D. I. Svergun. Impact and progress in small and wide angle x-ray scattering (saxs and waxs). Curr Opin Struct Biol, 23(5):748–54, 2013. [13] L. Hong, M. A. Sharp, S. Poblete, R. Biehl, M. Zamponi, N. Szekely, M. S. Appavou, R. G. Winkler, R. E. Nauss, A. Johs, J. M. Parks, Z. Yi, X. Cheng, L. Liang, M. Ohl, S. M. Miller, D. Richter, G. Gompper, and J. C. Smith. Structure and dynamics of a compact state of a multidomain protein, the mercuric ion reductase. Biophys J, 107(2):393–400, 2014. [14] A. Johs, I. M. Harwood, J. M. Parks, R. E. Nauss, J. C. Smith, L. Y. Liang, and S. M. Miller. Structural characterization of intramolecular hg (2+) transfer between flexibly linked domains of mercuric ion reductase. Journal of Molecular Biology, 413(3):639–656, 2011. [15] H. S. Kim and F. Gabel. Uniqueness of models from small-angle scattering data: the impact of a hydration shell and complementary nmr restraints. Acta Crystallogr D Biol Crystallogr, 71(Pt 1):57–66, 2015. [16] D. Kimanius, I. Pettersson, G. Schluckebier, E. Lindahl, and M. Andersson. Saxs-guided metadynamics. J Chem Theory Comput, 11(7):3491–8, 2015. [17] R. Ledwidge, B. Patel, A. Dong, D. Fiedler, M. Falkowski, J. Zelikova, A. O. Summers, E. F. Pai, and S. M. Miller. Nmera, the metal binding domain of mercuric ion reductase, removes hg2+ from proteins, delivers it to the catalytic core, and protects cells under glutathionedepleted conditions. Biochemistry, 44(34):11402–16, 2005. [18] Richard Ledwidge, Baoyu Hong, Volker Dotsch, and Susan M Miller. Nmera of tn 501 mer-

P. Tan et al. / Commun. Comput. Phys., 25 (2019), pp. 1010-1023

[19]

[20]

[21] [22]

[23]

[24] [25] [26] [27]

[28]

[29] [30]

[31] [32]

[33]

1023

curic ion reductase: Structural modulation of the p k a values of the metal binding cysteine thiols. Biochemistry, 49(41):8988–8998, 2010. B. Lindner and J. C. Smith. Sassena - x-ray and neutron scattering calculated from molecular dynamics trajectories using massively parallel computers. Computer Physics Communications, 183(7):1491–1501, 2012. Tatiana Maximova, Ryan Moffatt, Buyong Ma, Ruth Nussinov, and Amarda Shehu. Principles and overview of sampling methods for modeling macromolecular structure and dynamics. PLOS Computational Biology, 12(4):e1004619, 2016. A. N. Naganathan and M. Orozco. The native ensemble and folding of a protein moltenglobule: functional consequence of downhill folding. J Am Chem Soc, 133(31):12154–61, 2011. Alexandr Nasedkin, Moreno Marcellini, Tomasz L. Religa, Stefan M. Freund, Andreas Menzel, Alan R. Fersht, Per Jemth, David van der Spoel, and Jan Davidsson. Deconvoluting protein (un)folding structural ensembles using x-ray scattering, nuclear magnetic resonance spectroscopy and molecular dynamics simulation. PLOS ONE, 10(5):e0125662, 2015. A. Panjkovich and D. I. Svergun. Deciphering conformational transitions of proteins by small angle x-ray scattering and normal mode analysis. Phys Chem Chem Phys, 18(8):5707– 19, 2016. M. Pelikan, G. L. Hura, and M. Hammel. Structure and flexibility within proteins as identified through small angle x-ray scattering. Gen Physiol Biophys, 28(2):174–89, 2009. R. P. Rambo and J. A. Tainer. Accurate assessment of mass, models and resolution by smallangle scattering. Nature, 496(7446):477–81, 2013. B. Rozycki, Y. C. Kim, and G. Hummer. Saxs ensemble refinement of escrt-iii chmp3 conformational transitions. Structure, 19(1):109–116, 2011. D. Schneidman-Duhovny, M. Hammel, J. A. Tainer, and A. Sali. Foxs, foxsdock and multifoxs: Single-state and multi-state structural modeling of proteins and their complexes based on saxs profiles. Nucleic Acids Research, 44(W1):W424–W429, 2016. Roman Shevchuk and Jochen S. Hub. Bayesian refinement of protein structures and ensembles against saxs data using molecular dynamics. PLOS Computational Biology, 13(10):e1005800, 2017. D. I. Svergun. Small-angle scattering studies of macromolecular solutions. Journal of Applied Crystallography, 40(s1):S10–S17, 2007. DIBCKMHJ Svergun, Claudio Barberato, and Michel HJ Koch. Crysola program to evaluate x-ray solution scattering of biological macromolecules from atomic coordinates. Journal of applied crystallography, 28(6):768–773, 1995. J. Trewhella. Small-angle scattering and 3d structure interpretation. Curr Opin Struct Biol, 40:1–7, 2016. V. G. Vandavasi, D. K. Putnam, Q. Zhang, L. Petridis, W. T. Heller, B. T. Nixon, C. H. Haigler, U. Kalluri, L. Coates, P. Langan, J. C. Smith, J. Meiler, and H. O’Neill. A structural study of cesa1 catalytic domain of arabidopsis cellulose synthesis complex: Evidence for cesa trimers. Plant Physiol, 170(1):123–35, 2016. S. C. Yang, L. Blachowicz, L. Makowski, and B. Roux. Multidomain assembled states of hck tyrosine kinase in solution. Proceedings of the National Academy of Sciences of the United States of America, 107(36):15757–15762, 2010.