ChemSpotlight v1.2.1
ChemSpotlight is a Spotlight metadata importer plugin for Mac OS X 10.4 Tiger, which reads common chemical file formats (ChemDraw .cdx, .cdxml, MDL .mol, .mdl, .sd, .sdf, Tripos .mol2, Protein Data Bank .pdb, Chemical Markup Language .cml, and XYZ) using the Open Babel chemistry library. It is provided as a Universal Binary for PowerPC and Intel, for optimized performance on both.
It’s probably easier to show the results from ChemSpotlight than to describe them.
ChemSpotlight indexes chemistry files, adding molecular formulas (complete with subscripts in the Finder), molecular weights, and a variety of other information for Spotlight searches and “Get Info” windows. Notice the computed chemical formula and molecular weight information for this file:
ChemSpotlight adds all of this information to the Spotlight index, allowing chemical searches across a range of properties. Because Spotlight indexes automatically in the background as files are created or modified, you don’t need to update the database yourself after installing.
Version History
- 1.2.1: Many bug fixes, including support for ChemDraw CDX on both PowerPC and Intel. Uses Open Babel 2.1.1 for unique, canonical SMILES representations, as well as improved PDB metadata indexing.
- 1.2: Further improved error and memory handling. Now uses Open Babel 2.1, including automatic handling of Mac OS 9 line endings. Now includes ChemDraw CDX and CDXML file formats.
- 1.1: Further improved indexing of files that would be ignored by previous versions. Support for indexing residue sequence information. Support for full-text indexing of MDL SD files (including defined key/value property metadata).
- 1.0.1: Fixed a variety of bugs, significantly improving stability on hard-to-index files. Corrected bugs with XYZ and PDB input on aromatic molecules.
- 1.0: First public release.
- 0.9: First (limited) beta release.
Still to Come
- Graphical search interface to Spotlight, including a “draw a molecule” search.
- Support for compressed files (e.g., 1ABC.pdb.gz)
- Fragments / fingerprints for molecular similarity and substructure searching
- Molecular descriptors (e.g., LogP, etc.)
Known Bugs and Limitations
- Due to a rounding bug in the Finder, the molecular weight sometimes appears with too many decimal places.
- The Finder shows all metadata added to a file. If you perform “Get Info” on a file with hundreds or thousands of molecules, you will see hundreds or thousands of formulas, masses, etc. (if you even have a big enough screen for that)!
- Spotlight currently does not offer any possibility of returning a particular record/molecule inside a file with many molecules.
Suggestions? Bugs? Coding contributions? E-mail me at <geoff.hutchison at gmail.com> and please try to include “ChemSpotlight” in the subject of your message. If you have a file which does not appear to index, please try to send me a copy.
For discussing ChemSpotlight and announcements of new versions, please subscribe to the mailing list: chemspotlight@lists.openmolecules.net.
Technical Details
ChemSpotlight reads in files using the Open Babel library and then generates the following fields for any molecules it finds:
| Metadata Field | Notes |
|---|---|
| net_sourceforge_openbabel_Chirality | True/False (1/0) |
| net_sourceforge_openbabel_Dimension | 0D/2D/3D depending on the coordinates found |
| net_sourceforge_openbabel_DisplayFormula | Formula with subscripts for Finder “Get Info” windows |
| net_sourceforge_openbabel_Formula | Chemical formula in standard “Hill Order” |
| net_sourceforge_openbabel_Mass | Standard molecular weight in a.m.u. (g/mol) |
| net_sourceforge_openbabel_ExactMass | Molecular mass of most common isotopes for mass spectra |
| net_sourceforge_openbabel_NumAtoms | Number of atoms in the molecule |
| net_sourceforge_openbabel_NumBonds | Number of bonds in the molecule |
| net_sourceforge_openbabel_NumMols | Number of molecules in the file |
| net_sourceforge_openbabel_NumResidues | Number of biomolecule residues |
| net_sourceforge_openbabel_SMILES | Daylight SMILES string for this molecule |
| net_sourceforge_openbabel_InChI | IUPAC/NIST canonical identifier |
| net_sourceforge_openbabel_Sequence | Biomolecule residue sequence |
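To make a few of these fields concrete, here is a minimal Python sketch of what “Hill Order”, the subscripted display formula, and the difference between Mass and ExactMass mean. It is only an illustration, not ChemSpotlight's or Open Babel's code: the function names (`hill_formula`, `display_formula`, `total_mass`) are hypothetical, and the two mass tables hold rounded values for four elements only.

```python
# Illustrative only: hypothetical helpers, not the ChemSpotlight/Open Babel code.

# Translation table from ASCII digits to Unicode subscript digits.
SUBSCRIPTS = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")

# Rounded masses for a handful of elements (assumed values for this sketch).
STANDARD_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}
MONOISOTOPIC_MASS = {"H": 1.007825, "C": 12.000000, "N": 14.003074, "O": 15.994915}


def hill_formula(counts):
    """Build a Hill-order formula from an element -> atom count mapping."""
    elements = sorted(counts)
    if "C" in counts:
        # With carbon present: C first, then H, then the rest alphabetically.
        rest = [e for e in elements if e not in ("C", "H")]
        elements = ["C"] + (["H"] if "H" in counts else []) + rest
    # Without carbon, all elements (including H) stay in alphabetical order.
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in elements)


def display_formula(formula):
    """Replace digits with subscripts, as a Finder-style display string."""
    return formula.translate(SUBSCRIPTS)


def total_mass(counts, table):
    """Sum per-element masses from the given table."""
    return sum(table[e] * n for e, n in counts.items())


benzene = {"C": 6, "H": 6}
print(hill_formula({"H": 6, "O": 1, "C": 2}))    # C2H6O
print(display_formula(hill_formula(benzene)))    # C₆H₆
print(total_mass(benzene, STANDARD_MASS))        # about 78.11 (average weight)
print(total_mass(benzene, MONOISOTOPIC_MASS))    # about 78.05 (most common isotopes)
```

The second mass is the one a mass spectrum would show for the most abundant isotopes, which is why the two fields are indexed separately.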
All of these are available for searching from Spotlight, including command-line searches. For example, you can search for C6H6 in a regular search and return molecules with C6H6 as part of the formula. Or try the following command-line searches:
    # Return all files with at least one molecule with mass < 200
    mdfind "net_sourceforge_openbabel_Mass < 200"
    # The next line matches all files containing molecules with mass < 200 AND molecules with > 10 atoms
    mdfind "net_sourceforge_openbabel_Mass < 200 && net_sourceforge_openbabel_NumAtoms > 10"
    # Match the c1(c(cccc1)Br SMILES string (i.e., a literal string)
    mdfind "net_sourceforge_openbabel_SMILES = '*c1(c(cccc1)Br*'"
Note that the SMILES matching is for literal strings of SMILES. I don’t (yet) know how to use the Daylight SMARTS matching system inside Spotlight, although perhaps other tools can filter the results from Spotlight using the SMARTS system.
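The command-line searches above can also be scripted. The following is a rough Python sketch, assuming `mdfind` is on the PATH (as it is on Mac OS X); the helper names (`compare`, `contains`, `all_of`, `mdfind`) are made up for this example and are not part of ChemSpotlight. It only builds the same query strings shown above and returns the matching file paths, which another tool could then filter further (for example, with SMARTS patterns).

```python
import shutil
import subprocess

# Common prefix of the ChemSpotlight metadata fields listed in the table.
PREFIX = "net_sourceforge_openbabel_"


def compare(name, op, value):
    """Build one numeric comparison, e.g. compare('Mass', '<', 200)."""
    if op not in ("<", "<=", ">", ">="):
        raise ValueError(f"unsupported operator: {op}")
    return f"{PREFIX}{name} {op} {value}"


def contains(name, text):
    """Build a literal substring match using Spotlight's * wildcard.

    No escaping is attempted, so text containing quotes, backslashes,
    or '*' will need extra care.
    """
    return f"{PREFIX}{name} = '*{text}*'"


def all_of(*clauses):
    """Join clauses so that every one of them must match."""
    return " && ".join(clauses)


def mdfind(query, only_in=None):
    """Run mdfind and return the matching file paths as a list."""
    if shutil.which("mdfind") is None:
        raise RuntimeError("mdfind not found (it ships with Mac OS X)")
    cmd = ["mdfind"]
    if only_in:
        cmd += ["-onlyin", only_in]  # restrict the search to one directory
    cmd.append(query)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


if __name__ == "__main__":
    query = all_of(compare("Mass", "<", 200), compare("NumAtoms", ">", 10))
    print(query)
    # Only attempt the search where Spotlight is actually available.
    if shutil.which("mdfind"):
        for path in mdfind(query):
            print(path)
```

Keeping the query building separate from the `mdfind` call makes it easy to print and check a query before running it.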
Source Code
ChemSpotlight is free software, provided under the GNU General Public License (GPL). The source code (as an Xcode project) is available.
Thanks
Thanks to Henry Rzepa and Chris Swain for the original idea and pushing to get this started. Thanks to Simon Saubern, Fredrik Wallner, and Bill Day for debugging help and suggestions.
