Solutions to IT problems

Solutions I found when learning new IT stuff

Creating a Framework for Chemical Structure Search – Part 3



Series Overview

This is Part 3, “Current Cheminformatics Landscape”, of the “Creating a Framework for Chemical Structure Search” series.


Introduction

In this part I will explain why I believe that a free, open-source framework for creating database applications with chemical structure search is needed. This is my personal opinion, nothing more and nothing less.

Cheminformatics Landscape

There are several companies that offer a range of standard products. Not all companies offer all products. Common products are:

  • Chemical Drawing Program
  • Client for local or remote chemical databases
  • Database Cartridge (for chemical structure search in a relational database)
  • Chemistry Toolkit (API for one or multiple programming languages)

A database cartridge is an “add-on” to a relational database management system (RDBMS) that enables chemical structure searching directly within the RDBMS. A cartridge usually also supports the conversion between different chemical formats. In the database client the chemical structure search usually takes place client-side and no cartridge is needed.
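The difference between searching inside the database and searching client-side can be sketched with a toy example. This is only an illustration under loud assumptions: SQLite's user-defined functions stand in for a cartridge, and the hypothetical function toy_contains does a plain substring test on SMILES text, which is not real substructure matching (real cartridges match molecular graphs). The table and all names are made up for this sketch.

```python
import sqlite3


def toy_contains(smiles, fragment):
    """Hypothetical stand-in for a substructure match.

    ASSUMPTION: a plain substring test on SMILES text. Real cartridges
    perform graph matching; this is NOT chemically correct.
    """
    if smiles is None or fragment is None:
        return 0
    return 1 if fragment in smiles else 0


def build_demo_db():
    """Create an in-memory database with a few made-up example rows."""
    conn = sqlite3.connect(":memory:")
    # Registering the function makes it callable from SQL, which is
    # roughly the role a cartridge plays inside an RDBMS.
    conn.create_function("toy_contains", 2, toy_contains)
    conn.execute(
        "CREATE TABLE compound (id INTEGER PRIMARY KEY, name TEXT, smiles TEXT)"
    )
    conn.executemany(
        "INSERT INTO compound (name, smiles) VALUES (?, ?)",
        [
            ("ethanol", "CCO"),
            ("benzene", "c1ccccc1"),
            ("phenol", "Oc1ccccc1"),
        ],
    )
    return conn


def search_in_database(conn, fragment):
    """Cartridge-style: the filter runs inside the SQL query."""
    rows = conn.execute(
        "SELECT name FROM compound WHERE toy_contains(smiles, ?) = 1 ORDER BY id",
        (fragment,),
    )
    return [row[0] for row in rows]


def search_client_side(conn, fragment):
    """Client-style: fetch every row, then filter in the application."""
    rows = conn.execute("SELECT name, smiles FROM compound ORDER BY id").fetchall()
    return [name for name, smiles in rows if toy_contains(smiles, fragment)]


if __name__ == "__main__":
    db = build_demo_db()
    print(search_in_database(db, "c1ccccc1"))
    print(search_client_side(db, "c1ccccc1"))
```

Both functions return the same hits. What differs is where the work happens: the cartridge-style query can be combined with ordinary SQL conditions and joins, while the client-side variant has to pull all structures out of the database first.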

From these components, the companies build and offer typical applications (often called “solutions”), such as:

  • Chemical Registration
  • Chemical Inventory
  • Electronic Lab Notebook (ELN)
  • Search / Analysis Tool
  • others

Those applications are either web applications or client-server applications. They usually require a commercial RDBMS like Oracle or SQL Server.

Most suppliers create their own proprietary format for handling chemical structures. While they offer the option to convert structures to standard formats like SMILES or molfiles, the functionality available with those formats is usually limited. Certain features cannot be converted at all.

Besides the commercial suppliers, there are multiple free, open-source chemistry toolkits for different programming languages. There is also a professionally developed open-source database cartridge for Oracle, SQL Server and PostgreSQL.

What is the problem?

Commercial Solutions

The commercial solutions carry a high risk of vendor lock-in. Once you decide on one product, it is much easier to integrate it with other products from the same vendor, and licensing a whole bundle of products from one vendor is a lot cheaper. After you have accumulated data over several years, migrating the whole application suite becomes very cumbersome and expensive.

Another issue is the need for a specific commercial RDBMS, and these are known to be fairly expensive. There is no guarantee that the database cartridge supports the free or the cheapest edition. Often this isn’t even clearly documented, and the free editions in particular have some serious limitations.

The support offered for commercial products is, in my opinion, not worth the money. Usually support is outsourced to “cheap countries” and the positions are filled with entry-level staffers. Their proficiency in English and their accent can be an issue, which makes dealing with support much harder and more time-consuming than it needs to be. The ones who stay longer and become more proficient with the applications usually move out of support rather quickly. There are good people there, but you normally just don’t get into contact with them directly. In any case, the issues support can solve are almost always ones a competent application admin can solve on their own. If you find real bugs, there is no guarantee they will be fixed soon, if at all. And if you want new features, you either have to be very patient, with no guarantee you will ever get them, or finance them yourself, in which case the supplier will happily implement them for you.

Another issue is scientific reproducibility. The code is proprietary, so nobody can reproduce experiments that rely on such a solution unless they own a license too. Nor can you verify that the code is free of bugs that would invalidate the results.

Open-Source Products

The current issue is that the open-source products are individual components rather than whole applications. Creating, for example, a fully functional inventory system with chemical structure search would require a significant development (programming) effort and hence the corresponding expertise, be it in-house or outsourced. So while the commercial solutions are expensive, creating your own from open-source components can be even more expensive. This only makes sense if you are a large company with very special needs.

Conclusion

If you are a small to mid-sized organization, you are basically forced to buy a pricey product and RDBMS which you can barely afford. These systems are generally used by research departments, which usually get less budget than, say, a production department. So while it is considered fine to spend millions on CRM and SRM systems, the same is not true for research systems.

Proposed Solution

The proposed solution is a free, open-source ecosystem of cheminformatics applications. That is, however, still a very long way off. In this series I’m introducing you to the first, basic step: a free, open-source framework for creating database applications with chemical structure search.


Written by kienerj

April 15, 2013 at 12:41
