2005-01-05

Open Web Services for Chemistry

Filed under: Blue Obelisk, Chemistry, Web | Geoff @ 5:45 pm

Dr. Peter Murray-Rust’s group at Cambridge recently launched its so-called “World Wide Molecular Matrix” (WWMM), or, more descriptively, a set of open source web services for chemistry.

These are perhaps the most ambitious open chemistry web services currently available: I can actually submit calculations to the Cambridge computers and pick up the results. But they follow a long line of steadily improving work. For example, Prof. Henry Rzepa tied together ht://Dig and various chemistry file formats to index molecular information on a website circa 2002. (I think he started the project earlier, but that’s when the publication came out.) Older Java applications like Jmol and web interfaces like WebMO also exist. And CORINA will generate fairly decent 3D coordinates from 2D structures or from “0D” SMILES strings, which carry no coordinates at all.

I think these new offerings are great. I’ve had a number of conversations at conferences in the last year that suggest a wide variety of easy-to-use chemistry and molecular visualization web tools would be very well received.

But in the e-mail announcing the WWMM, there’s this:

- there may be a performance hit - is this is actually a problem?

In a word? Yes. I see two big problems with open web services for chemistry.

  • Performance and scalability
  • Proprietary data

Performance is a problem for many computational chemists.

Take me, for example. I recently finished one paper that involved 210 compounds, each with 4 separate calculations using 3 different programs. That came to 840 calculations, most of which took the better part of a day or longer to run. Another paper requires a large number of single-point calculations to map the relative energies of some structures: 3 different lengths of molecules, 6 different geometric orientations, 3 different classes of molecules, and 6-10 points to determine each curve. That’s ~300-500 calculations, not including mistakes, side-projects, and other tasks. Thank goodness for high-performance cluster computing: I was able to run calculations on dozens of computers and finish things up in a reasonable amount of time.
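
To make the bookkeeping concrete, here is a minimal Python sketch that counts the jobs described above. The function and variable names are hypothetical, and the second study is assumed to be a full grid (every combination of length, orientation, and class is run), which lands slightly above the rough ~300-500 figure quoted in the text.

```python
from math import prod


def count_jobs(*factors: int) -> int:
    """Total number of calculations for a full grid of independent factors."""
    return prod(factors)


# First paper: 210 compounds, 4 separate calculations each.
first_paper = count_jobs(210, 4)

# Second paper: 3 lengths x 6 orientations x 3 classes gives the number of
# curves; each curve needs 6-10 single-point calculations.
# Assumption: every combination is actually run (a full grid).
curves = count_jobs(3, 6, 3)
second_low = curves * 6
second_high = curves * 10

print(first_paper)              # 840
print(second_low, second_high)  # 324 540
```

Even these modest-looking studies add up to hundreds of independent jobs, which is why the per-job overhead of a remote service matters.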

And I’m small potatoes compared to some. I do calculations on small molecules, and on relatively small numbers of them at a time. Pharma companies might try to screen 200,000 compounds against large proteins that are easily 3-5 times the size of the molecules I consider. They usually also want the results fast, not after months of calculations.
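
As a rough illustration of why throughput dominates at this scale, the sketch below estimates idealized wall-clock time for a batch of independent jobs. Everything except the 200,000-compound figure is an assumption for illustration: the one hour per compound, the machine counts, and the function name `wall_clock_days` are hypothetical, and the model ignores queueing, failures, and uneven job lengths.

```python
import math


def wall_clock_days(n_jobs: int, hours_per_job: float, n_machines: int) -> float:
    """Idealized wall-clock time in days.

    Jobs are assumed independent and equally long, and are run in "waves"
    across identical machines (one job per machine per wave).
    """
    if n_machines < 1:
        raise ValueError("need at least one machine")
    waves = math.ceil(n_jobs / n_machines)
    return waves * hours_per_job / 24.0


# Hypothetical screen: 200,000 compounds at an assumed 1 hour each.
single_machine = wall_clock_days(200_000, 1.0, 1)
small_cluster = wall_clock_days(200_000, 1.0, 500)

print(f"1 machine:    {single_machine:,.0f} days")
print(f"500 machines: {small_cluster:,.1f} days")
```

Under these made-up numbers, a single machine would need decades, while a 500-node cluster finishes in a few weeks. A shared public web service would have to offer comparable capacity to be useful for this kind of workload.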

Ah yes… Pharmaceutical companies. That’s the other problem: proprietary data. Web services can be run over encrypted connections, but that only protects data in transit; whoever runs the server still sees it, and the chemical sciences deal with a lot of semi-secret data. Publication of data can ruin possible patents, not to mention journal articles, and it’s a highly competitive area where people really worry about others discovering what they’re researching. These may not be ethical actions I’m describing, but I don’t believe the world is inherently ethical.

Which is to say that I may not want to trust my molecular data to a website, even one with a good reputation. I’d much rather run these calculations on my personal laptop, which will probably be faster anyway. In other words, while I intend to contribute open web services of my own at some point, I also think that in computational chemistry, high-performance desktop applications will remain key.

Except where noted, all contents Copyright © 2004-2005 Geoffrey R. Hutchison, licensed under a Creative Commons license.