Solubis tools in YASARA

From Solubis plugin for YASARA

Preface

This section explains the Solubis functionality available inside YASARA. Some prior practice with YASARA is recommended, but all commands are straightforward to use; that is the point of the plugin. We do refer to YASARA-specific nomenclature such as Objects and Molecules, which are defined in the YASARA documentation on your computer at yasara/doc/index.html.

The output of the SOLUBIS calculations is always printed to the YASARA console when the plugin ends. The console should open by default; if it does not, you can open it by pressing the spacebar once or twice.

Solubis tools menu in YASARA

All Solubis tools in YASARA can be accessed by clicking:

Analyze > Solubis > ...

Aggregation predictor

The aggregation predictor used in this Solubis plugin is TANGO [1].

TANGO predicts the aggregation propensity of an input polypeptide sequence and returns the beta-aggregating segments that have a high tendency to nucleate protein aggregation through the formation of intermolecular beta-sheets. Aggregation tendency is influenced by pH, temperature, ionic strength and TFE (trifluoroethanol) concentration, so these parameters can be adjusted to fit your experiment. All of them are set to default values but can be changed in the Options menu.

FoldX energies

When you perform a Solubis run, you can choose whether to calculate the effect of a mutation on protein stability. For this we use FoldX [2]. The main focus of FoldX is the prediction of free energy changes, e.g. what happens to the free energy of the protein when we mutate an Asp to a Tyr? FoldX calculates the free energy of the wild type (WT) and of the mutant (MT) and takes the difference:

ΔΔG(change) = ΔG(MT) - ΔG(WT)


As a rule of thumb we use:

ΔΔG(change) > 0 : the mutation is destabilizing

ΔΔG(change) < 0 : the mutation is stabilizing


The error margin of FoldX is approximately 0.5 kcal/mol, so changes in that range are insignificant.
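
The rule of thumb and the error margin can be combined into a small classifier. The sketch below is illustrative Python and not part of the plugin; the function names and the label "insignificant" for values within the error margin are assumptions made for this example.

```python
FOLDX_ERROR_MARGIN = 0.5  # kcal/mol, approximate FoldX error margin quoted above


def ddg_change(dg_mutant: float, dg_wild_type: float) -> float:
    """Return ddG(change) = dG(MT) - dG(WT) in kcal/mol."""
    return dg_mutant - dg_wild_type


def classify_ddg(ddg: float, margin: float = FOLDX_ERROR_MARGIN) -> str:
    """Classify a stability change (hypothetical helper, not a FoldX call)."""
    if abs(ddg) <= margin:
        return "insignificant"  # within the FoldX error margin
    return "destabilizing" if ddg > 0 else "stabilizing"
```

For example, a mutant with ΔG(MT) = -10.0 kcal/mol against a wild type with ΔG(WT) = -12.0 kcal/mol gives ΔΔG(change) = 2.0 kcal/mol, which this sketch labels as destabilizing.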

Number of runs

This option affects the FoldX BuildModel command, which is behind the plugin commands 'Solubis run on complete molecule' and 'Solubis run in marked region'. It tells FoldX how many times to perform the specified mutations. We recommend the default of 3 runs, which helps to check whether the algorithm has converged, in other words whether the solution offered is the optimal one or a trapped one. Note that a higher number of runs takes proportionally more time. When the number is larger than one, the algorithm repeats the same mutations but changes the rotamer set used and the order of the rotamer moves, so that alternative solutions can be explored. For most mutations this will not result in significant differences from one run to the next. However, in some cases, such as mutations involving Arg (as the mutation site or as a residue close to it), which has many degrees of freedom in its side chain, significant differences can be found.
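
One way to judge whether repeated runs agree is to compare their spread with the FoldX error margin. The following Python sketch only illustrates that idea; it is an assumption about how one might inspect the per-run values and does not describe how the plugin or FoldX combines runs. The function name `summarize_runs` is hypothetical.

```python
from statistics import mean


def summarize_runs(ddg_per_run, margin=0.5):
    """Return (mean, spread, runs_agree) for the ddG values of repeated runs.

    'margin' is the approximate FoldX error margin in kcal/mol; treating a
    spread within that margin as agreement is an assumption of this sketch.
    """
    spread = max(ddg_per_run) - min(ddg_per_run)
    return mean(ddg_per_run), spread, spread <= margin
```

A large spread, as may occur for mutations involving Arg, suggests inspecting the individual runs rather than relying on a single value.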

Complete analysis of object

With this command you calculate the aggregation propensity of an object of your choice. The result is a PDB (the original or a new one) in which the aggregation-prone regions of the protein are colored, optionally according to strength.

Aggregating stretches present in the protein

In the first option menu, you define the conditions in which you will work. As temperature (Kelvin), pH, ionic strength and TFE concentration influence aggregation, you can adapt the default settings.

In the next menu, you can define the specifications of an aggregating stretch:

  1. Window threshold: The minimum score each residue must have to belong to an aggregating stretch. If you raise this threshold, you select only very strongly aggregating stretches. By default we assume that a contiguous stretch of 5 residues with a score of at least 5 is prone to aggregation.
  2. Minimum window size: The minimum length of the aggregating stretches you want to output.
  3. Flank size: Aggregating stretches are flanked by so-called gatekeepers that oppose aggregation. Here you define how many gatekeepers on each side are returned.
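
How the three settings interact can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the plugin's code: the function names, the per-residue score list and the flank size of 3 are hypothetical, and the scores in the example are made up.

```python
def find_stretches(scores, threshold=5.0, min_window=5):
    """Return (start, end) pairs (0-based, end exclusive) of contiguous
    residues whose score is at least 'threshold' and whose length is at
    least 'min_window'."""
    stretches = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i  # a new candidate stretch begins here
        elif start is not None:
            if i - start >= min_window:
                stretches.append((start, i))
            start = None
    # Close a stretch that runs up to the end of the sequence.
    if start is not None and len(scores) - start >= min_window:
        stretches.append((start, len(scores)))
    return stretches


def with_flanks(sequence, stretch, flank_size=3):
    """Return (N-terminal flank, stretch, C-terminal flank) as strings."""
    start, end = stretch
    n_flank = sequence[max(0, start - flank_size):start]
    c_flank = sequence[end:end + flank_size]
    return n_flank, sequence[start:end], c_flank


# Made-up example: one residue letter per score.
sequence = "DKRILTLLIEKD"
scores = [0, 0, 1, 20, 40, 60, 60, 40, 20, 2, 0, 0]
for stretch in find_stretches(scores):
    print(with_flanks(sequence, stretch, flank_size=3))
# prints ('DKR', 'ILTLLI', 'EKD')
```

Raising `threshold` or `min_window` makes the selection stricter, so fewer and stronger stretches are reported.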


Next you specify the way of visualization.

In the first panel you select whether the aggregating stretches are indicated in the original PDB or in a new PDB (which is loaded automatically). In the second panel, you set how you want to color the zones:

  • "All same color": aggregating stretches predicted by TANGO are colored in red.
  • "Color gradient by aggregation tendency": the stretch is colored according to its strength, ranging from yellow to red for TANGO. Weak aggregating stretches are defined as having a score below 50, strong aggregating stretches have a score above 70.

Finally, the plugin asks you whether your structure contains a gap.

Define whether your protein structure contains a gap

As TANGO is a sequence-based predictor, it is important to provide the complete sequence if your structure contains a gap. If it does not, just press OK and continue. Otherwise, provide an alignment file that contains 2 lines for each molecule, for example:

Obj1MolA ILT---IITL

Obj1MolA ILTLLLIITL

The first line contains the name of the incomplete PDB structure (ObjXMolY), followed by a tab and the sequence as present in the PDB, with '-' indicating a gap at that position. The second line also contains the name of the incomplete PDB structure (ObjXMolY), followed by a tab and the complete sequence. Both sequences should have the same length, and it is important that the first line contains exactly the same sequence as in the structure, with the additional '-' characters indicating the gap.
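
Before starting a run, it can help to check an alignment file against these rules. The Python sketch below is a hypothetical validator, not part of the plugin: the function name is invented, it accepts a tab or spaces as separator, and it additionally assumes that every non-gap residue on the first line must equal the residue at the same position on the second line.

```python
def parse_gap_alignment(text):
    """Parse alignment text into {name: (gapped_sequence, complete_sequence)}.

    Raises ValueError if the two-lines-per-molecule format is violated.
    """
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines between entries
        fields = line.split()  # name, then sequence, separated by whitespace
        if len(fields) != 2:
            raise ValueError(f"expected 'name<TAB>sequence', got: {line!r}")
        rows.append(fields)
    if len(rows) % 2 != 0:
        raise ValueError("each molecule needs exactly two lines")

    alignment = {}
    for (name_gapped, gapped), (name_full, full) in zip(rows[0::2], rows[1::2]):
        if name_gapped != name_full:
            raise ValueError(f"names differ: {name_gapped} vs {name_full}")
        if len(gapped) != len(full):
            raise ValueError(f"{name_gapped}: sequences differ in length")
        for pos, (a, b) in enumerate(zip(gapped, full)):
            if a != "-" and a != b:
                raise ValueError(f"{name_gapped}: mismatch at position {pos + 1}")
        alignment[name_gapped] = (gapped, full)
    return alignment
```

For the example above, `parse_gap_alignment("Obj1MolA\tILT---IITL\nObj1MolA\tILTLLLIITL\n")` returns a dictionary with the single key `Obj1MolA`.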

The output generated in the console returns all the discovered stretches in a table format. In the different columns you find the following information:

  1. ProteinName: the name of the analysed structure (ObjXMolY)
  2. Predictor: which aggregation predictor is used, either TANGO or WALTZ
  3. Position Sequence: the start of the aggregating stretch in the PDB sequence. If your structure contains a gap, this position is based on the complete protein sequence.
  4. Position PDB: Sometimes the numbering of your structure doesn't start from 1. Therefore, we provide in this column the residue number as present in the PDB.
  5. Nterm_GK: the N-terminal gatekeepers. The number depends on the settings (as described above).
  6. Stretch: the aggregating stretch.
  7. Cterm_GK: the C-terminal gatekeepers. The number depends on the settings (as described above).
  8. Score: the score of the aggregating stretch as returned by TANGO or WALTZ. A strong aggregating stretch is defined as having a score above 70.
  9. Stretch present in the structure: this is only relevant if your structure contains a gap. If there is an aggregating stretch present that has no or only partial structural information, we indicate this with 'NO' or 'YES but not completely'

The output can also be saved to file as described in the section Save last calculation.

Complete analysis of a molecule

This menu performs the same action as "Complete analysis of Object", but on molecules: you can select one or more molecules in different objects. All other options are exactly as described before.

Mutate residue

With this command, you can analyze the effect of a point mutation on the aggregation tendency. First select the residue that you want to mutate, then choose the amino acids to mutate it to. You can select several target amino acids for one residue by holding the Ctrl key.

After that, follow the same steps as in the previous commands.

Running this command results in the creation of new PDBs:

  1. We generate a new PDB for the wild type structure where the TANGO aggregating stretches are indicated.
  2. A new PDB for each selected mutant is also created where the TANGO aggregating stretches and the mutated residue are indicated.

In the console, you'll again receive a table with the aggregating stretches present in both WT and mutants. As a conclusion, we also output whether a mutation

  1. has no effect on the aggregation tendency (difference between -50 and 50)
  2. increases the aggregation tendency (difference above 50)
  3. decreases the aggregation tendency (difference below -50)
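
The three conclusions can be written as a small decision function. This Python sketch is illustrative only; it assumes that the difference is computed as the mutant score minus the wild-type score, and the function name is hypothetical.

```python
def aggregation_effect(wt_score, mutant_score, cutoff=50.0):
    """Classify the effect of a mutation on the aggregation tendency.

    Assumption of this sketch: difference = mutant score - wild-type score.
    """
    difference = mutant_score - wt_score
    if difference > cutoff:
        return "increases the aggregation tendency"
    if difference < -cutoff:
        return "decreases the aggregation tendency"
    return "no effect on the aggregation tendency"
```

For example, a wild-type score of 400 and a mutant score of 90 give a difference of -310, so the mutation is reported as decreasing the aggregation tendency.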

If you also want to analyze the effect of the point mutation on protein stability, we advise you to use the FoldX plugin [3].

Solubis run on a complete molecule

This command performs the Solubis method, which optimizes your protein by introducing stabilizing mutations that reduce the aggregation tendency. By combining FoldX and TANGO, it is possible to create a protein variant that is more stable during protein purification and other applications. The method first searches for a selected number of mutants that minimize the intrinsic aggregation tendency and, if selected, do not destabilize the protein. As the aggregation tendency is minimized by introducing so-called gatekeepers (P, R, K, D and E) (Rousseau, et al., 2006) into aggregation-prone regions (APRs), it is advised to filter out the destabilizing mutations with FoldX.

The required steps are:

Select the molecule to mutate

Here you can select the molecule that you want to analyze.

Select the molecule you want to analyze

Set the options for TANGO

  • Determine physico-chemical conditions:

Here you can specify the conditions in which you will work. As Temperature (Kelvin), pH, ionic strength and TFE concentration influence aggregation, you can adapt the default settings.

Set the options for TANGO

  • Definition of aggregating stretch:

As described in Complete analysis of Object, you define the specification of an aggregating stretch:

  1. Window threshold: The minimum score each residue must have to belong to an aggregating stretch. If you raise this threshold, you select only very strongly aggregating stretches. By default we assume that a contiguous stretch of 5 residues with a score of at least 5 is prone to aggregation.
  2. Minimum window size: The minimum length of the aggregating stretches you want to output.
  3. Flank size: Aggregating stretches are flanked by so-called gatekeepers that oppose aggregation. Here you define how many gatekeepers on each side are returned.

Define the characteristics of an aggregating stretch

Enable the FoldX analysis

Introducing gatekeepers to minimize aggregation has the drawback that most of these mutations destabilize the protein. Therefore it is important to run FoldX to select only those mutations that stabilize the protein structure (ΔΔG(change) < -0.5 kcal/mol). In the FoldX routines menu there are two options:

  • FoldX RepairPDB:

To calculate the stability change upon mutation with FoldX, it is necessary to start from a repaired structure. The RepairPDB command performs this action and minimizes the energy of a protein structure by rearranging the amino acid side-chains in order to get a better free energy of the protein. RepairPDB only rearranges side-chains, not the backbone. If you start from a repaired PDB structure, it is not necessary to select this box. We refer to the FoldX manual for an even more detailed explanation on repairing PDB structures.

  • Calculate stability change:

This command calculates and displays the stability change upon mutation, meaning the difference in stability between the mutant and the wild-type structure. All mutations that decrease the aggregation tendency are analyzed for their effect on protein stability; mutations with a negative ΔΔG stabilize the protein structure (see FoldX energies).

Select 'Calculate stability change' if you want to investigate the effect on protein structure

Set FoldX options (if selected)

If you specified in the previous menu that you want to analyze the effect of the point mutation on protein stability, you have to set the options in this menu. Otherwise you can ignore it and press OK.

  • number of runs: Tells the algorithm how many times it should perform the specified mutations. See Number of runs for more information.
  • temperature: Temperature (K).
  • pH: At the moment only used for pH effects on metal binding. It does not affect the protonation of charged groups in this version.
  • ionic strength: Ionic strength of the solution (M) x 100. The multiplication by 100 is needed for the YASARA implementation since menus do not allow decimal numbers. So the default of 5 is actually 0.05 M.
  • van der Waals design: When set to 2, FoldX applies rotamer penalization for internal clashes, maximum penalization for inter-residue van der Waals clashes, a ceiling of 5 kcal/mol for the van der Waals clashes between two atoms, and strict H-bond geometry. When set to 0, there is a weak rotamer penalization, a ceiling of 1 kcal/mol for the van der Waals clashes, and a relaxed H-bond geometry. This option should be set to 2 (the default in this plugin).
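
The ionic strength convention is easy to get wrong, so here is the conversion spelled out in Python. The function names are hypothetical and only illustrate the factor of 100 described above.

```python
def menu_to_molar(menu_value: int) -> float:
    """Convert the integer entered in the menu to ionic strength in mol/L."""
    return menu_value / 100


def molar_to_menu(ionic_strength_molar: float) -> int:
    """Convert ionic strength in mol/L to the integer expected by the menu."""
    return round(ionic_strength_molar * 100)
```

With these helpers, the default menu value of 5 corresponds to 0.05 M, and a solution of 0.15 M would be entered as 15.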


Set the options for FoldX

Select the thresholds

In this menu, you define the thresholds for the preferred effect on aggregation tendency and protein stability:

  • Decrease in aggregation tendency: Here you define the minimal difference in aggregation tendency between the wild type protein and the mutants. We search for mutations that decrease the aggregation tendency, and it is recommended to set this filter quite stringently. By default, we assume that a mutation lowers the aggregation tendency if the score drops by at least 300. If you don't retrieve any mutants, you can lower this threshold.
  • Maximum ΔΔG (if calculated): This value indicates the maximum ΔΔG a mutation can have. By default this parameter is set to -0.5, meaning that we only search for mutations that stabilize the protein structure. If you do not retrieve any Solubis mutations, you can raise this threshold, but remember that positive values indicate that the mutation will destabilize your protein structure (see FoldX energies).

Set the thresholds that the mutations have to fulfill

Select how many mutations you want to retrieve

In this menu you select how many mutations that fulfill the criteria (set in the previous menu) should be returned. By default, we pick one, but you can increase the number. Keep in mind that the more mutations you request, the longer the plugin takes to finish. If you do not receive the requested number of mutations, not enough mutations fulfill the criteria; you can try to relax the thresholds.

How many mutations you want to retrieve

Indicate whether your structure contains a gap

As described in the previous methods (see Complete analysis of Object), you have to indicate whether your structure contains a gap. If an aggregating stretch is not completely present in the PDB, we don't analyze it.

As output in the console, we return all the mutations that were analyzed, grouped per category:

  1. Mutations that did not lower the aggregation tendency enough (according to threshold)
  2. Mutations that minimise the aggregation tendency but destabilize too much (according to threshold)
  3. Mutations that fulfill the criteria
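
The grouping follows directly from the two thresholds. The Python sketch below illustrates the logic with the default values mentioned in this section (a decrease of at least 300 and a maximum ΔΔG of -0.5); the function name and the convention that `ddg=None` means FoldX was not run are assumptions of this example, not the plugin's implementation.

```python
def categorize_mutation(aggregation_decrease, ddg=None,
                        min_decrease=300.0, max_ddg=-0.5):
    """Assign an analyzed mutation to one of the three output categories.

    aggregation_decrease: wild-type score minus mutant score.
    ddg: stability change in kcal/mol, or None if FoldX was not run
         (hypothetical convention for this sketch).
    """
    if aggregation_decrease < min_decrease:
        return "aggregation tendency not lowered enough"
    if ddg is not None and ddg > max_ddg:
        return "destabilizes too much"
    return "fulfills the criteria"
```

For example, a mutation that lowers the score by 350 but has a ΔΔG of 0.8 kcal/mol falls into the second category, while the same decrease with a ΔΔG of -0.9 kcal/mol fulfills the criteria.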

Using the Save last calculation method, a file containing a table with all the information is provided.

Solubis run in marked region

As a Solubis run on a complete molecule can take a very long time, we added an extra command with which you analyze only a selected part of the protein rather than the complete molecule.

The first menu that pops up asks you to select the residues that you want to analyze.

Select the residues you want to analyze

To minimize calculation time, we advise you to first run a Complete analysis of Object, which identifies the aggregating stretches present in your protein. As Solubis only performs mutations in these aggregating stretches, you can select the aggregating stretch of your choice. In this way you can, for example, avoid running SOLUBIS on an aggregating stretch located in the catalytic site.

Save last calculation

This option lets you specify a target folder and, optionally, a filename prefix under which all files of the last calculation are saved.

The first selection window lets you choose whether you want to save all files or just the SUMMARY file. This file contains the identified aggregating stretches after a 'complete analysis of object/molecule' run or the effect of the calculated mutations after a 'mutate residue' or 'Solubis' run. If you save everything, you obtain the raw output files of TANGO or FoldX.

In the next selection window you can either select a single folder or a folder and filename prefix:

  • Folder: E.g. selecting c:\testrun\ will save all last calculation files to that folder.
  • Folder with filename prefix: E.g. selecting c:\testrun\MyRun will save all last calculation files to the testrun folder and put MyRun_ before every filename. In this way, it is possible to save more than one calculation in the same folder by using different prefixes.
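
The difference between the two choices can be shown with a short Python sketch. It assumes that a selection ending in a path separator denotes a folder and that anything else is a folder plus prefix; this rule, the function name and the file name `SUMMARY.txt` are hypothetical and only serve the illustration.

```python
from pathlib import PureWindowsPath


def output_path(selection: str, filename: str) -> str:
    """Return where 'filename' would be saved for a given selection."""
    if selection.endswith(("\\", "/")):
        # Folder only: keep the original file name.
        return str(PureWindowsPath(selection) / filename)
    # Folder with prefix: the last path component is the prefix.
    target = PureWindowsPath(selection)
    return str(target.parent / f"{target.name}_{filename}")


print(output_path("c:\\testrun\\", "SUMMARY.txt"))
# prints c:\testrun\SUMMARY.txt
print(output_path("c:\\testrun\\MyRun", "SUMMARY.txt"))
# prints c:\testrun\MyRun_SUMMARY.txt
```

Using a second prefix such as `OtherRun` in the same folder would then produce `OtherRun_SUMMARY.txt` next to the first file instead of overwriting it.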

Configure plugin

See the section Installation and first use for explanation of this option.

References to the SOLUBIS methodology

  1. Fernandez-Escamilla AM, Rousseau F, Schymkowitz J, Serrano L. Prediction of sequence-dependent and mutational effects on the aggregation of peptides and proteins. Nat Biotechnol. 2004;22:1302-1306.
  2. Schymkowitz J, Borg J, Stricher F, Nys R, Rousseau F, Serrano L. The FoldX web server: an online force field. Nucleic Acids Res. 2005;33:W382-W388.
  3. Van Durme J, Delgado J, Stricher F, Serrano L, Schymkowitz J, Rousseau F. A graphical interface for the FoldX force field. Bioinformatics. 2011;27(12):1711-1712.