CGenFF program FAQs

Frequently Asked Questions

If you have any queries that are not answered here, please contact us.

Q : What does the more "ne" statements than neighbors warning mean?
Q : What does the sp-hybridized aromatic atom warning mean?
Q : What does the carbon radical, carbocation or carbanion not supported warning mean?
Q : What does the amide base not supported warning mean?
Q : What does the =[N+]= not supported warning mean?
Q : What do readmol2 warnings mean?
Q : What does the unknown bond type warning mean?
Q : Then how do I create a valid mol2 file?
Q : The output has an incorrect total charge. What should I do?
Q : What does the no valid resonance structure found warning mean?
Q : How do I get around the aromatic subgraph too large warning?
Q : Why doesn't my .str file contain all parameters that apply to my molecule?
Q : Why aren't there improper dihedrals on all sp² atoms/planar centers in my molecule?
Q : I got relatively high penalties, suggesting I should perform validation and/or optimization. How should I go about this?
Q : The new release of the CGenFF program gives me parameters with higher penalties. Is it better to keep using parameters from the previous release?
Q : My molecule has really high penalties. Does this mean I should rather use force field X or empirical parameter generation interface Y, which doesn't give me any penalties/scores?
Q : How can I use CGenFF on a complex containing multiple small organic molecules?
Q : How do I use my CGenFF-generated .str file with my simulation software of choice (CHARMM, NAMD, ACEMD, GROMACS,...)?
Q : How should I cite CGenFF?
Q : What happened with versions 0.9.2 - 0.9.5 and 0.9.8 and 0.9.9 of the CGenFF program?
Q : Can I just go to the CGenFF website, submit a number of molecules, and get high-quality parameters out?
Q : If I submit a molecule to the CGenFF web interface, how long does it take before I get my parameters?

Q : What does the more "ne" statements than neighbors warning mean?
Q : What does the sp-hybridized aromatic atom warning mean?
Q : What does the carbon radical, carbocation or carbanion not supported warning mean?
Q : What does the amide base not supported warning mean?
Q : What does the =[N+]= not supported warning mean?
It means that the connectivity of the molecule is not supported by the atom typing rules. These errors are most often encountered when submitting a molecule with missing hydrogens. Other possible causes include unfulfilled valences and inconsistent bond orders. A very common origin of these defects is the conversion to mol2 from file formats that don't contain connectivity information (such as pdb). In that case, OpenBabel assigns bond orders based on the proximity between atoms, which is an inherently imprecise process, especially when starting from a distorted structure. We therefore recommend uploading molecules in a format that contains bond orders, such as mol2; tips on how to generate valid mol2 files can be found in this FAQ entry. If the problem occurs even though the molecule was uploaded in mol2 format and you are unsure how to correct it, it is advisable to ask a chemist for help. If the problem persists, please contact us. Remember, atom typing is almost never correct when these warnings are encountered!

Q : What do readmol2 warnings mean?
Q : What does the unknown bond type warning mean?
Readmol2 warnings point to technical issues with the input mol2 file. If the warning is followed by the phrase "skipped molecule" (as in the case of "unknown bond type"), the technical issue prevents the generation of an output str file. If not, the issue may be as trivial as the presence of fields in the mol2 file that are not part of the mol2 specification, which are safely ignored. As an example of the latter, mol2 files generated by Accelrys software will trigger several such warnings.
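As a rough illustration of how a mol2 file could be pre-screened for the "unknown bond type" problem before submission, the sketch below lists every bond record whose type field is not one of the bond types defined by the Tripos mol2 format (1, 2, 3, am, ar, du, un, nc). The function name and the example molecule are made up for this illustration, and the check says nothing about which of these types the CGenFF program itself accepts.

```python
# Bond types defined by the Tripos mol2 format (not necessarily all accepted by CGenFF).
MOL2_BOND_TYPES = {"1", "2", "3", "am", "ar", "du", "un", "nc"}


def find_unknown_bond_types(mol2_text):
    """Return (bond_id, bond_type) pairs whose type is not a mol2 bond type."""
    unknown = []
    in_bonds = False
    for line in mol2_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("@<TRIPOS>"):
            # Track whether we are inside the BOND section.
            in_bonds = stripped == "@<TRIPOS>BOND"
            continue
        if not in_bonds or not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()
        # A bond record reads: bond_id origin_atom_id target_atom_id bond_type
        if len(fields) >= 4 and fields[3] not in MOL2_BOND_TYPES:
            unknown.append((fields[0], fields[3]))
    return unknown


# Hypothetical fragment; the "Ar" in bond 2 is deliberately miscapitalized.
example_bonds = """@<TRIPOS>BOND
     1     1     2    2
     2     1     3   Ar
     3     1     4    1
"""

print(find_unknown_bond_types(example_bonds))
```

Any bond reported by such a check would need to be corrected in the mol2 file (for instance with one of the editors listed in the next entry) before resubmitting.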

Q : Then how do I create a valid mol2 file?
- Sadly, a lot of popular molecular viewers (e.g. VMD, pymol,...) do not generate valid mol2 files.
- The Avogadro molecular editor is an open-source program that allows the user to visualize and edit bond orders and save the result as a mol2 file that conforms to the mol2 standard. It does not write out a molecule name or atom numbering, but these can readily be added by the user by modifying the mol2 file in a text editor (the ***** on the second line is a placeholder for the molecule name).
- mol2 files from the Zinc database are generally high-quality.
- Here are 2 free closed-source programs that do the job:
  - Accelrys' Discovery Studio Visualizer
  - Schrödinger's Maestro (free for academia only through the Maestro Academic Campaign)
- Here are some non-free programs (other than the paid versions of the above):
  - Chemical Computing Group's MOE
  - Tripos' Sybyl
  - Wavefunction's Spartan
  - Semichem's AGUI (part of the AMPAC suite) after sed -i 's/Ar/ar/' *.mol2
  - Gaussian's GaussView after sed -i 's/Ar/ar/' *.mol2
  - Hypercube's HyperChem after sed -i 's/_\+ //' *.mol2
Please contact us if you think additions should be made to these lists! In particular, it would be nice to know of more open-source programs that have the same abilities.
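For users who prefer a script over a text editor or sed, here is a minimal sketch of two of the clean-up steps mentioned above: filling in the ***** molecule-name placeholder and changing the bond type "Ar" to "ar". Unlike the sed one-liners, it only touches the fourth field of records in the @<TRIPOS>BOND section. The function name, the molecule name LIG1 and the example file content are assumptions made for this illustration; it does not add atom numbering.

```python
def tidy_mol2(mol2_text, molecule_name):
    """Fill in the '*****' name placeholder and rewrite 'Ar' bond types as 'ar'."""
    tidied = []
    section = None
    expect_name = False
    for line in mol2_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("@<TRIPOS>"):
            section = stripped
            # The line right after the MOLECULE tag holds the molecule name.
            expect_name = section == "@<TRIPOS>MOLECULE"
        elif expect_name:
            if stripped == "*****":
                line = molecule_name
            expect_name = False
        elif section == "@<TRIPOS>BOND":
            fields = stripped.split()
            # Only the bond-type field (fourth column) is changed.
            if len(fields) >= 4 and fields[3] == "Ar":
                fields[3] = "ar"
                line = " ".join(fields)
        tidied.append(line)
    return "\n".join(tidied) + "\n"


# Hypothetical, heavily shortened mol2 content for illustration only.
example_mol2 = """@<TRIPOS>MOLECULE
*****
 2 1 0 0 0
SMALL
GASTEIGER

@<TRIPOS>ATOM
      1 C1   0.0000  0.0000  0.0000 C.ar  1  LIG1  0.0000
      2 C2   1.3900  0.0000  0.0000 C.ar  1  LIG1  0.0000
@<TRIPOS>BOND
     1     1     2   Ar
"""

print(tidy_mol2(example_mol2, "LIG1"))
```

The rewritten bond records are joined with single spaces, which should be acceptable because mol2 records are whitespace-delimited; inspecting the result in a molecular editor before submission is still advisable.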

Q : The output has an incorrect total charge. What should I do?
This is a strong indication that the program used incorrect bond orders. If this is the case, the atom types, charges or parameters are meaningless for your molecule of interest. Possible remedies include:
- If you submitted your molecule in a format other than mol2, OpenBabel may have made a mistake in the conversion to mol2. Please try to generate and submit a valid mol2 file.
- If you submitted your molecule in mol2 format, please check whether the bond orders in the mol2 file are correct.
- If you checked the "Guess bond orders from connectivity" check box, try again without guessing bond orders. Although it doesn't happen often, there are cases where the guess will inevitably be wrong.
- If you thoroughly verified all of the above and the problem persists, you might have hit a rare bug in the atom typing rules. In this case, please contact us.
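One quick way to spot an incorrect total charge is to compare the charge declared on the RESI line of the output with the sum of the per-atom charges and with the charge you expect for your molecule. The sketch below assumes the usual CHARMM topology layout ("RESI name charge" followed by "ATOM name type charge" lines, with comments after "!"); the function name and the ethane-like example are illustrative and are not actual program output.

```python
from decimal import Decimal


def charge_summary(str_text):
    """Return {residue_name: (declared_charge, sum_of_atom_charges)}."""
    summary = {}
    current = None
    for line in str_text.splitlines():
        fields = line.split("!", 1)[0].split()  # drop trailing comments
        if not fields:
            continue
        keyword = fields[0].upper()
        if keyword.startswith("RESI") and len(fields) >= 3:
            current = fields[1]
            summary[current] = [Decimal(fields[2]), Decimal("0")]
        elif keyword == "ATOM" and current is not None and len(fields) >= 4:
            # Decimal avoids floating-point round-off when summing charges.
            summary[current][1] += Decimal(fields[3])
    return {name: tuple(values) for name, values in summary.items()}


# Made-up topology fragment in CHARMM format, for illustration only.
example_str = """RESI ETHA       0.000 ! illustrative residue
GROUP
ATOM C1     CG331  -0.270 !    0.000
ATOM C2     CG331  -0.270 !    0.000
ATOM H11    HGA3    0.090 !    0.000
ATOM H12    HGA3    0.090 !    0.000
ATOM H13    HGA3    0.090 !    0.000
ATOM H21    HGA3    0.090 !    0.000
ATOM H22    HGA3    0.090 !    0.000
ATOM H23    HGA3    0.090 !    0.000
"""

for name, (declared, summed) in charge_summary(example_str).items():
    print(name, declared, summed)
```

If either number differs from the charge you expect for your molecule, the remedies listed above are the place to start.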

Q : What does the no valid resonance structure found warning mean?
In the vast majority of cases, this means there are errors in the connectivity of the molecule, as discussed in more detail in the first FAQ entry.

Q : How do I get around the aromatic subgraph too large warning?
- If your mol2 file contains bonds that are marked aromatic ("ar" in the @<TRIPOS>BOND section), assign (sensible) single and double bonds instead. The programs listed above may assist you in doing so. As explained in the references, the CGenFF program internally operates on a single resonance structure; not feeding it aromatic bonds saves it the effort of converting them to single and double bonds.
- If you checked the "Guess bond orders from connectivity" check box, try again without guessing bond orders.

Q : Why doesn't my .str file contain all parameters that apply to my molecule?
The CHARMM General Force Field (CGenFF) program - to which paramchem.org provides an interface - only outputs new parameters that are required for your molecule, as opposed to parameters that are already present in the main CGenFF parameter file. This is why our utilization page stipulates that paramchem.org's output should be read into your molecular simulation program after reading CGenFF.

Q : Why aren't there improper dihedrals on all sp² atoms/planar centers in my molecule?
In CGenFF, out-of-plane motions are often reproduced correctly by the valence angle and (proper) dihedral terms alone. Some moieties (most notably carbonyl groups) do consistently need improper dihedrals, but most centers don't.

Q : I got relatively high penalties, suggesting I should perform validation and/or optimization. How should I go about this?
Validation strategies are generally decided on a case-by-case basis. To determine the most suitable validation approach, some understanding of the force field parametrization procedure is required. CGenFF is mostly optimized targeting QM data, as explained in the 2010 CGenFF paper and the CGenFF tutorial (in addition, the optimization of the bonded parameters is now made easier by the lsfitpar program). After QM-based parameter optimization, it is common to validate the resulting force field against bulk phase experimental data, and fine-tune it if necessary. This is important because the force field is ultimately expected to reproduce bulk phase experimental properties and not gas-phase QM data. However, one cannot optimize a force field entirely based on experimental data because this would require a very large investment of time and computer power and a number of target data points that is generally unavailable for all but the simplest molecules. Indeed, in most practical cases, the few available experimental data points do not correlate optimally with the (high-penalty) parameters to be optimized. For instance, the energetics of binding and solvation correlate with the charge distribution but often not with individual charges. Similarly, solution-phase conformational preferences (as typically measured by NMR experiments) can be used to detect problematic dihedral parameters, but are not always straightforward to use for optimizing these dihedrals.
The standard recommendation for end-users to validate a force field is to run simulations to reproduce known experimental properties of interest to the user's project. However, it follows from the above explanation that doing so will not guarantee all the parameters are sensible. The QM calculations detailed in the aforementioned paper and tutorial are often complementary in this respect, in the sense that they can precisely pinpoint parameters in need of optimization, and can directly be used to guide this optimization.
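To decide where validation effort should go first, it can help to extract the penalties from the .str file and sort them. The sketch below assumes that parameter lines carry their penalty in a trailing comment of the form "penalty= <number>", and it uses cut-offs of 10 and 50, following the guidance printed in the header of the program's output; treat both the line format and the cut-offs as assumptions and adjust them to your own output. The function names and the example lines are made up.

```python
import re

# Assumed comment format on parameter lines: "... penalty= 12.5"
PENALTY_RE = re.compile(r"penalty=\s*([0-9]*\.?[0-9]+)")


def classify(penalty):
    """Map a penalty to a coarse recommendation (cut-offs are assumptions)."""
    if penalty < 10:
        return "fair analogy"
    if penalty <= 50:
        return "basic validation recommended"
    return "extensive validation/optimization recommended"


def flag_penalties(str_text, threshold=10.0):
    """Return (penalty, recommendation, line) tuples at or above threshold, worst first."""
    hits = []
    for line in str_text.splitlines():
        if line.lstrip().upper().startswith("RESI"):
            continue  # the RESI line only summarizes the highest penalties
        match = PENALTY_RE.search(line)
        if match:
            penalty = float(match.group(1))
            if penalty >= threshold:
                hits.append((penalty, classify(penalty), line.strip()))
    return sorted(hits, key=lambda hit: hit[0], reverse=True)


# Made-up parameter lines, for illustration only.
example_params = """RESI LIG1 0.000 ! param penalty= 62.500 ; charge penalty= 12.000
CG2R61 CG321  CG311    51.80    107.50 ! LIG1 , from CG2R61 CG321 CG321, penalty= 4
CG321  CG311  NG2S1    70.00    113.50 ! LIG1 , from CG331 CG311 NG2S1, penalty= 12
CG2R61 CG321  CG311  NG2S1  0.2000  3  0.00 ! LIG1 , from CG2R61 CG321 CG321 NG2S1, penalty= 62.5
"""

for penalty, advice, line in flag_penalties(example_params):
    print(f"{penalty:6.1f}  {advice}: {line}")
```

Such a listing only ranks the parameters; the QM-based procedure described in the paper and tutorial is still needed to actually validate or optimize them.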

Q : The new release of the CGenFF program gives me parameters with higher penalties. Is it better to keep using parameters from the previous release?
No. Because of improvements in the rules for assignment of charges and parameters by analogy, it is normal to get higher penalties with newer versions on some molecules. This does not imply that the old parameters were more accurate; on the contrary, it means that the penalties for these parameters were too low in the old version, and that after adjusting them upwards, better parameters turn out to be available, with a penalty that is lower than the adjusted penalty for the old parameter but higher than the original (wrong) penalty.

Q : My molecule has really high penalties. Does this mean I should rather use force field X or empirical parameter generation interface Y, which doesn't give me any penalties/scores?
Class I additive force field parameters, in particular dihedrals and charges, are inherently poorly transferable between molecules. Short of explicit parameter optimization using QM or experimental target data, parameters for small organic molecules are almost always obtained with analogy- and wildcard-based schemes, which inevitably suffer from this transferability problem. For some molecules, close analogies or closely matching wildcards may be found, resulting in an acceptable description of the compound's physical behavior, but for many other molecules, the description will be inaccurate regardless of the force field or parameter generation interface. The CGenFF program is honest and gives the user an estimate of how bad the description is (which is often slightly pessimistic to err on the side of caution) and even occasionally rejects a badly-supported functional group. There's a strong possibility that the results of product X or Y are no better, except that the user is left in blissful ignorance. As mentioned above, the only real solution to the transferability problem is explicit parameter optimization; see this FAQ entry.

Q : How can I use CGenFF on a complex containing multiple small organic molecules?
There is a one-to-one correspondence between the @<TRIPOS>MOLECULE entry in the input mol2 file and the RESI entry in the output topology section. This implies that a @<TRIPOS>MOLECULE record containing multiple molecules will give rise to a RESI entry containing multiple molecules. However, the latter violates CHARMM conventions; a supramolecular RESI may cause problems in simulations. Therefore, the recommended course of action is to submit each molecular species in your complex separately to the CGenFF program at paramchem.org, then use your MD software of choice to assemble your system. In the example of CHARMM, the user has the choice between:
- generating all the small molecules as one segment by issuing one READ SEQUence command containing multiple residues, followed by one GENErate command
- generating each small molecule as a separate segment with its own READ SEQUence command and GENErate command
Note that for PBC calculations, one has to be careful to set up imaging correctly depending on the above choice.
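To check whether a @<TRIPOS>MOLECULE record accidentally contains more than one molecule before submitting it, one can count the connected components of its bond graph. The sketch below does this with a small union-find structure; it assumes the text holds a single @<TRIPOS>MOLECULE record, and the function name and the two-water example are made up for this illustration.

```python
def count_fragments(mol2_text):
    """Count disconnected molecules in a mol2 text holding one MOLECULE record."""
    parent = {}

    def find(atom):
        # Union-find lookup with path halving.
        while parent[atom] != atom:
            parent[atom] = parent[parent[atom]]
            atom = parent[atom]
        return atom

    section = None
    for line in mol2_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("@<TRIPOS>"):
            section = stripped
            continue
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()
        if section == "@<TRIPOS>ATOM":
            parent.setdefault(fields[0], fields[0])
        elif section == "@<TRIPOS>BOND" and len(fields) >= 4:
            first, second = fields[1], fields[2]
            parent.setdefault(first, first)
            parent.setdefault(second, second)
            parent[find(first)] = find(second)  # merge the two fragments
    return len({find(atom) for atom in parent})


# Hypothetical record containing two separate water molecules.
two_waters = """@<TRIPOS>MOLECULE
WATERS
 6 4 0 0 0
SMALL
NO_CHARGES

@<TRIPOS>ATOM
      1 O1   0.000  0.000  0.000 O.3  1  HOH  0.000
      2 H1   0.957  0.000  0.000 H    1  HOH  0.000
      3 H2  -0.240  0.927  0.000 H    1  HOH  0.000
      4 O2   5.000  0.000  0.000 O.3  2  HOH  0.000
      5 H3   5.957  0.000  0.000 H    2  HOH  0.000
      6 H4   4.760  0.927  0.000 H    2  HOH  0.000
@<TRIPOS>BOND
     1     1     2    1
     2     1     3    1
     3     4     5    1
     4     4     6    1
"""

print(count_fragments(two_waters))
```

A result larger than 1 suggests the record should be split and each molecular species submitted separately, as recommended above.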

Q : How do I use my CGenFF-generated .str file with my simulation software of choice (CHARMM, NAMD, ACEMD, GROMACS,...)?
Please refer to the CGenFF usage information page. Note that members of the communities around other programs are welcome to submit usage information for those programs to the contact address on that page!

Q : How should I cite CGenFF?
The current CGenFF references are available at the bottom of the summary of output data and its utilization page. Please note that it is important for reproducibility to include the version number of the interface (currently 1.0.0) and the force field (currently 3.0.1) in your computational details!

Q : What happened with versions 0.9.2 - 0.9.5 and 0.9.8 and 0.9.9 of the CGenFF program?
Versions 0.9.2 - 0.9.4 were unstable versions for internal testing. Version 0.9.5 never existed; we decided to jump straight to 0.9.6 to reflect the significant improvement in functionality and the (more) advanced stage of beta testing. Similarly, 0.9.8 and 0.9.9 never existed because we jumped straight from 0.9.7.1, the last beta version, to 1.0.0, the first non-beta release.

Q : Can I just go to the CGenFF website, submit a number of molecules, and get high-quality parameters out?
Q : If I submit a molecule to the CGenFF web interface, how long does it take before I get my parameters?
When you submit your molecule to paramchem.org, the CGenFF program generates parameters and charges by analogy. This takes only seconds and does not require user interaction, but the quality of the resulting parameters varies widely depending on the availability of analogous molecules in the CGenFF database. Therefore, each of the parameters is accompanied by an approximate "quality score" (the penalty) that should give you some idea of its reliability. If the penalties are high, the associated parameters and charges should be validated and/or optimized using the procedure outlined in the CGenFF tutorials. This is made easier by the lsfitpar program for robust fitting of bonded parameters. We are open to supporting third parties in using these technologies to develop a robust and reliable automated parameter interface. If this effort is successful, officially endorsed utilities will be listed on our links hub.
This FAQ entry gives a few more details about the validation part.

Supported by a grant from the National Science Foundation: Grant No. 0823198
XSEDE is acknowledged for maintaining and supporting application software and hardware resources.

Page last updated on August 17, 2018