Solutions to IT problems

Solutions I found when learning new IT stuff

Posts Tagged ‘Spring-Security’

Creating a Framework for Chemical Structure Search – Part 8


Series Overview

This is Part 8 – Spring Security Integration of the “Creating a Framework for Chemical Structure Search“-Series.


Introduction

MoleculeDatabaseFramework is integrated with Spring-Security. It offers optional method level security in the service layer. This article will explain how it works and how to configure your application to use Spring-Security.

Annotation-Based

MoleculeDatabaseFramework has been integrated with Spring-Security using annotations. This means that as long as you do not enable security in your application context, everything works without any security. Security is applied to methods in the service interfaces.

The security integration allows you to limit a user to certain types of entities, e.g. one user can only read SimpleCompound while his supervisor also has access to SecretCompound. You can also assign roles that allow a user to update and/or delete compounds he created himself, and of course roles that allow a user to update or delete any compound of a given implementation.

The security integration uses the @PreAuthorize annotation. The value of this annotation is written in SpEL (Spring Expression Language). MoleculeDatabaseFramework uses the expressions hasRole and hasPermission. hasRole checks whether the current user has the corresponding role (called an authority in Spring Security); if so, access to the method is granted, otherwise an AccessDeniedException is thrown. hasPermission requires an implementation of org.springframework.security.access.PermissionEvaluator, which declares two overloaded hasPermission methods, both returning boolean: one takes the target domain object itself, the other takes the target's id and type. hasPermission is used on the save(entity) methods to determine whether the given entity can be created or updated by the current user.

See the Spring Security Method Expression Documentation for more information on this topic.

Conventions

To make use of security you need to follow certain conventions. There are 6 basic “role-types”: create, read, update, delete, update_created and delete_created. A complete role is a “role-type” followed by an underscore and the simple class name of the entity implementation. So if you have an entity RegistrationCompound and you want a user to be able to read RegistrationCompounds, he needs the role read_RegistrationCompound, and so forth.

  • A role starts with a “role-type” followed by “_” and the simple class name: read_RegistrationCompound
  • A user can either read none or all entries for a given entity implementation.
  • A service interface method that requires read-role must be annotated with @PreAuthorize("hasRole(#root.this.getReadRole())")
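The naming convention can be sketched in a few lines of plain Java. This is only an illustration: RoleNames, RoleType and roleFor are hypothetical names and not part of MoleculeDatabaseFramework, and the nested RegistrationCompound class is a stand-in so the example compiles on its own.

```java
import java.util.Locale;

final class RoleNames {

    // The six basic "role-types" from the conventions above.
    enum RoleType { CREATE, READ, UPDATE, DELETE, UPDATE_CREATED, DELETE_CREATED }

    // Stand-in entity class, present only to keep the example self-contained.
    static class RegistrationCompound { }

    // Builds "<role-type>_<SimpleClassName>", e.g. read_RegistrationCompound.
    static String roleFor(RoleType type, Class<?> entityClass) {
        return type.name().toLowerCase(Locale.ROOT) + "_" + entityClass.getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(roleFor(RoleType.READ, RegistrationCompound.class));
        System.out.println(roleFor(RoleType.UPDATE_CREATED, RegistrationCompound.class));
    }
}
```

Deriving the role from the class name keeps the role strings and the entity implementations in sync, which is presumably why the service interfaces compute them through #root.this instead of hard-coding them.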

These conventions apply to ChemicalCompound, Containable and ChemicalCompoundContainer. For ChemicalStructure there is only ever the supplied entity ChemicalStructure, which a user of the framework should not extend. ChemicalStructure only has a save role (instead of a save permission), and any user that can create or update any type of ChemicalCompound must have the role save_ChemicalStructure. Or in code:


public interface ChemicalCompoundService<T extends ChemicalCompound>
		extends Service<T> {

	//...snipped...
		
	@Transactional(readOnly = false)
	@PreAuthorize("hasPermission(#compound, 'save_' + #root.this.getCompoundClassSimpleName())")
	@Override
	T save(T compound);
	
	//...snipped...
}

public interface ChemicalStructureService<T extends ChemicalStructure>
		extends Service<T> {	
		
	//...snipped...
	
	@Transactional(readOnly = false)
	@PreAuthorize("hasRole('save_ChemicalStructure')")
	@Override
	T save(T structure);
	
	//...snipped...
}
  • For managing Users and Roles you need to use the supplied entities User and Role and their services.

User implements UserDetails from Spring Security and UserService extends UserDetailsService. To create, update or delete a User or a Role you need the role manage_User or manage_Role respectively.

Security Behaviour

MoleculeDatabaseFramework ships with a PermissionEvaluator implementation. This PermissionEvaluator checks if a given user can create, update or delete a given domain object. (Note: for read methods just having the read role is enough; they use hasRole instead of hasPermission in @PreAuthorize.) The supplied DefaultPermissionEvaluator internally uses Permission objects to determine a user's permissions.

The supplied PermissionEvaluator allows users with the create, update or delete role to perform that action on any domain object (of the given implementation). Users with the update_created or delete_created role can perform that action only on domain objects they created (getCreatedBy().equals(loggedInUserName)).

Services only offer a save(entity) method. Whether a domain object is being created or updated is determined by whether its id is set (id == null -> create). This is the same approach Hibernate uses.
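The decision described in the last two paragraphs can be sketched as follows. This is not the code of DefaultPermissionEvaluator; SavePermissionSketch, Entity and maySave are hypothetical names, and the sketch assumes the role set has already been expanded by the role hierarchy.

```java
import java.util.Set;

final class SavePermissionSketch {

    // Minimal stand-in for a domain object: id stays null until first persisted.
    static final class Entity {
        final Long id;
        final String createdBy;

        Entity(Long id, String createdBy) {
            this.id = id;
            this.createdBy = createdBy;
        }
    }

    // Decides whether `user` holding `roles` may save `entity` of type `simpleName`.
    static boolean maySave(String user, Set<String> roles, Entity entity, String simpleName) {
        if (entity.id == null) {
            // No id yet: the save is a create.
            return roles.contains("create_" + simpleName);
        }
        // Id is set: the save is an update.
        if (roles.contains("update_" + simpleName)) {
            return true; // may update any object of this implementation
        }
        // update_created only covers objects the user created himself.
        return roles.contains("update_created_" + simpleName)
                && user.equals(entity.createdBy);
    }
}
```

For example, a user holding only update_created_TestCompound may update a TestCompound whose createdBy equals his user name, but not one created by somebody else.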

In your application you should only create exactly 1 implementation of ChemicalCompoundContainer. However this Container can hold any type of Containable and hence any type of ChemicalCompound. To ensure that a user only sees Containers that contain a ChemicalCompound he has the read-role for, all other Containers are filtered out. This includes the count() service method. So users with different privileges on ChemicalCompounds see different numbers of Containers.

The supplied PermissionEvaluator requires a RoleHierarchy bean to be configured in the security context. RoleHierarchy is very useful as it automatically assigns a “lower” privilege to someone with a “higher” one. So if you declare

create_RegistrationCompound > read_RegistrationCompound

then anyone with create_RegistrationCompound role automatically also has role read_RegistrationCompound.
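To show what such a hierarchy does conceptually, here is a minimal sketch that expands a set of granted roles by following the "higher > lower" rules transitively. It is not Spring Security's RoleHierarchyImpl, only an illustration of the idea; RoleHierarchySketch and reachable are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class RoleHierarchySketch {

    // Maps a role to the roles it directly implies.
    private final Map<String, Set<String>> implied = new HashMap<>();

    // Each rule has the form "higher > lower", one rule per line.
    RoleHierarchySketch(String rules) {
        for (String line : rules.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] parts = trimmed.split(">");
            implied.computeIfAbsent(parts[0].trim(), k -> new HashSet<>()).add(parts[1].trim());
        }
    }

    // Returns the granted roles plus every role reachable through the rules.
    Set<String> reachable(Set<String> granted) {
        Set<String> result = new LinkedHashSet<>(granted);
        Deque<String> todo = new ArrayDeque<>(granted);
        while (!todo.isEmpty()) {
            for (String lower : implied.getOrDefault(todo.pop(), Set.of())) {
                if (result.add(lower)) {
                    todo.push(lower);
                }
            }
        }
        return result;
    }
}
```

With the two rules update_RegistrationCompound > create_RegistrationCompound and create_RegistrationCompound > read_RegistrationCompound, a user granted only update_RegistrationCompound ends up with all three roles, while a user granted read_RegistrationCompound gains nothing.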

Cascading of Persist and Merge

ChemicalCompound, Containable and ChemicalCompoundContainer all use CascadeType.PERSIST and CascadeType.REFRESH in their JPA relationships with each other. This means changes (updates) to existing entities must always be made through that entity's service.save(entity) method, because CascadeType.MERGE (update) is not set and hence updates are not cascaded.

In case of creating a new entity, the Permission implementations check if the current user has the privilege to also create the associated, new entities. If you create a new ChemicalCompoundContainer that contains a new Containable which is made of a new ChemicalCompound, the ContainerPermission will verify if the current user has the privilege to not only create the new ChemicalCompoundContainer but also to create the new ChemicalCompound and new Containable. If this is not the case, an AccessDeniedException is thrown.

The example below creates a new RegistrationCompound, a new Batch and a new CompoundContainer if the current user has the privileges create_RegistrationCompound, create_Batch and create_CompoundContainer.

RegistrationCompound regCompound = new RegistrationCompound();
regCompound.setCompoundName("Registration Compound");
regCompound.setCas(cas);
regCompound.setRegNumber(regNumber);

ChemicalStructure structure =
        chemicalStructureFactory.createChemicalStructure(structureData);

ChemicalCompoundComposition composition = new ChemicalCompoundComposition();
composition.setCompound(regCompound);
composition.setChemicalStructure(structure);
composition.setPercentage(100.0);

regCompound.getCompositions().add(composition);

Batch batch = new Batch(regCompound, batchNumber);

regCompound.getBatches().add(batch);

CompoundContainer container = new CompoundContainer("C00001", batch);
container = compoundContainerService.save(container);

If an associated entity already exists, the create privilege for it is not required, provided the entity being persisted is the “parent” in the hierarchy. In other words, you can associate a new ChemicalCompoundContainer with an existing Containable, but you cannot associate a new Containable with an existing ChemicalCompoundContainer, because that container could not have existed before the Containable was created. The same logic applies to the relationship between Containable and ChemicalCompound.
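The rule for cascaded creates can be sketched as a walk down the chain Container -> Containable -> Compound that stops at the first entity that already exists. This is an illustration under simplifying assumptions, not the framework's ContainerPermission; CascadeCreateSketch, Node and requiredCreateRoles are hypothetical names, and the root entity is assumed to be new.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class CascadeCreateSketch {

    // Stand-in for one entity in the Container -> Containable -> Compound chain.
    static final class Node {
        final String simpleName;
        final Long id;     // null means "not yet persisted"
        final Node child;  // associated entity further down the chain, or null

        Node(String simpleName, Long id, Node child) {
            this.simpleName = simpleName;
            this.id = id;
            this.child = child;
        }
    }

    // Collects the create roles needed to persist `root` and every new entity below it.
    // The walk stops at the first existing entity: everything below it exists as well.
    static List<String> requiredCreateRoles(Node root) {
        List<String> required = new ArrayList<>();
        for (Node n = root; n != null && n.id == null; n = n.child) {
            required.add("create_" + n.simpleName);
        }
        return required;
    }

    // True if `roles` covers every create role the cascade needs.
    static boolean mayCreate(Set<String> roles, Node root) {
        return roles.containsAll(requiredCreateRoles(root));
    }
}
```

A new CompoundContainer holding a new Batch of a new RegistrationCompound needs all three create roles, whereas a new CompoundContainer holding an existing Batch only needs create_CompoundContainer.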

Customize Security

To customize the security behaviour, you need to implement your own PermissionEvaluator and/or Permission implementations.

Configuration Example

Below is the SecurityContext.xml used for testing security in MoleculeDatabaseFramework.

<?xml version="1.0" encoding="UTF-8"?>

<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:security="http://www.springframework.org/schema/security"
       xmlns:util="http://www.springframework.org/schema/util"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.1.xsd
             http://www.springframework.org/schema/util http://www.springframework.org/schema/util/spring-util-3.1.xsd
             http://www.springframework.org/schema/security http://www.springframework.org/schema/security/spring-security-3.1.xsd">

    <!-- User Service - In a real application this should use database
   and the according UserService and RoleService.  -->
    <security:user-service id="userService">
        <security:user name="user" password="password" authorities="read_TestCompound,
            read_TestContainable, create_TestCompoundContainer, create_RegistrationCompound, create_Batch"/>
        <security:user name="creator" password="password" authorities="create_TestCompound, create_TestContainable"/>
        <security:user name="owner" password="password" authorities="update_created_TestCompound, update_created_TestContainable, delete_created_TestCompound, delete_created_TestContainable, delete_created_TestCompoundContainer"/>
        <security:user name="editor" password="password" authorities="update_TestCompound, update_TestCompoundContainer, read_TestContainable"/>
        <security:user name="admin" password="admin" authorities="admin_TestCompound, admin_TestContainable, admin_TestCompoundContainer"/>
    </security:user-service>

    <!--    <bean id="userService"
        class="org.bitbucket.kienerj.moleculedatabaseframework.service.UserServiceImpl">
    </bean>-->

    <!-- Use a RoleHierarchy and a PermissionEvaluator in SpEL expression in
    @PreAuthorize -->
    <bean id = "methodSecurityExpressionHandler"
          class = "org.springframework.security.access.expression.method.DefaultMethodSecurityExpressionHandler">
        <property name="roleHierarchy" ref="roleHierarchy"/>
        <property name="permissionEvaluator" ref="permissionEvaluator"/>
    </bean>

    <!-- Role Hierarchy - Probably should use database for this too in real app -->
    <bean id="roleHierarchy"
          class="org.springframework.security.access.hierarchicalroles.RoleHierarchyImpl">
        <property name="hierarchy">
            <value>
                admin_TestCompound > update_TestCompound
                update_TestCompound > update_created_TestCompound
                update_created_TestCompound > create_TestCompound
                admin_TestCompound > create_TestCompound
                create_TestCompound > read_TestCompound
                create_TestCompound > save_ChemicalStructure
                admin_TestCompound > delete_TestCompound
                delete_TestCompound > delete_created_TestCompound
                update_RegistrationCompound > create_RegistrationCompound
                create_RegistrationCompound > read_RegistrationCompound
                create_RegistrationCompound > save_ChemicalStructure
                admin_TestContainable > update_TestContainable
                admin_TestContainable > delete_TestContainable
                update_TestContainable > create_TestContainable
                create_TestContainable > read_TestContainable
                update_created_TestContainable > create_TestContainable
                admin_TestCompoundContainer > update_TestCompoundContainer
                admin_TestCompoundContainer > delete_TestCompoundContainer
                delete_created_TestCompoundContainer > create_TestCompoundContainer
                update_TestCompoundContainer > create_TestCompoundContainer
                create_TestCompoundContainer > read_TestCompoundContainer
            </value>
        </property>
    </bean>

    <!-- Permission Evaluator supplied by framework. The constructor takes a Map
    of SimpleClassName -> PermissionImplementation associations-->
    <bean id="permissionEvaluator"
         class="org.bitbucket.kienerj.moleculedatabaseframework.security.DefaultPermissionEvaluator">
        <constructor-arg index="0">
            <map key-type="java.lang.String"
                 value-type="org.bitbucket.kienerj.moleculedatabaseframework.security.Permission">
                <entry key="TestCompound" value-ref="chemicalCompoundPermission"/>
                <entry key="RegistrationCompound" value-ref="chemicalCompoundPermission"/>
                <entry key="TestContainable" value-ref="containablePermission"/>
                <entry key="TestCompoundContainer" value-ref="chemicalCompoundContainerPermission"/>
                <entry key="Batch" value-ref="containablePermission"/>
            </map>
        </constructor-arg>
    </bean>

    <!-- Permission implementations uses -->
    <bean id="chemicalCompoundPermission"
          class="org.bitbucket.kienerj.moleculedatabaseframework.security.ChemicalCompoundPermission">
    </bean>
    <bean id="containablePermission"
          class="org.bitbucket.kienerj.moleculedatabaseframework.security.ContainablePermission">
    </bean>
    <bean id="chemicalCompoundContainerPermission"
          class="org.bitbucket.kienerj.moleculedatabaseframework.security.ChemicalCompoundContainerPermission">
    </bean>


    <security:authentication-manager alias="testAuthenticationManager">
        <security:authentication-provider user-service-ref="userService"/>
    </security:authentication-manager>

    <!-- enable annotations and set expression handler to use-->
    <security:global-method-security pre-post-annotations="enabled">
        <security:expression-handler ref = "methodSecurityExpressionHandler"/>
    </security:global-method-security>
</beans>

Written by kienerj

May 22, 2013 at 08:37

Posted in Chemistry, Java, Programming


Creating a Framework for Chemical Structure Search – Part 7


Series Overview

This is Part 7 – Service Layer of the “Creating a Framework for Chemical Structure Search“-Series.


Introduction

In this article I will introduce the service layer of MoleculeDatabaseFramework. The service layer is responsible for transaction support and security.

Service for ChemicalStructure Entity

This service manages entities of type ChemicalStructure. This service is provided by the framework and should be used as-is. Any access to ChemicalStructures should go through this service. ChemicalStructureService contains the logic that makes ChemicalStructures “immutable”. Quote from Part 5 of the series:

A ChemicalStructure is unique and immutable and managed by the framework. Users operate on ChemicalCompounds and not ChemicalStructures directly. Unique means if a new ChemicalCompound is saved, the framework checks if the ChemicalStructures in it already exist and if yes re-uses them. Immutable means that if a ChemicalCompound is updated and one of the ChemicalStructures has changed the framework will automatically check if the updated ChemicalStructure already exists and use it or create a new ChemicalStructure. The old one will remain unchanged!

If a ChemicalStructure actually needs to be updated and the change should affect all ChemicalCompounds containing it, the service has a separate method updateExistingStructure(structure), which requires the current user to have the role update_ChemicalStructure when Spring Security is enabled. In general this method should be restricted to admins.

Below is the source code for the two mentioned methods. Note that argument checks and logging statements were removed for brevity:

public T save(T structure) {
       
	T result = structureRepository.findByStructureKey(structure.getStructureKey());

	if (result == null) {
		// clear id (and createdBy and created) so that a new row is inserted
		structure.reset();

		return structureRepository.save(structure);
	} else {
		return structureRepository.save(result);
	}
}

@Override
public T updateExistingStructure(T structure){

	T result = structureRepository.findOne(structure.getId());

	if (result == null){
		throw new IllegalArgumentException("Given ChemicalStructure does not exist. It can't be updated.");
	}        
	return structureRepository.save(structure);
}

Services for ChemicalCompound and Containable

The services for ChemicalCompound and Containable are very similar. They consist of an interface which contains the security annotations and an implementation containing the @Transactional annotations for declarative transactions. They also offer methods with optional loading of lazy collections.

When an entity is saved, the service automatically sets the correct ChemicalStructures by either selecting an existing one from the database or creating a new one.
Before the entity is passed to Hibernate, the method preSave(entity) is executed. This method is empty in the provided abstract services like ChemicalCompoundServiceImpl but can be overridden in subclasses. For example, one could set all non-nullable properties that are null to a default or a sequence value.

Below is the source code of the save(entity) method of ChemicalCompoundServiceImpl for clarification:

public final T save(T compound) {
	logger.entry(compound);
	Preconditions.checkNotNull(compound);
	preSave(compound);
	if (compound.getCompositions() != null) {
		for (ChemicalCompoundComposition composition : compound.getCompositions()) {
			ChemicalStructure structure = composition.getChemicalStructure();
			ChemicalStructure result = chemicalStructureService.save(structure);
			composition.setChemicalStructure(result);
		}
	}
	//...snipped...
}
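To illustrate the preSave hook, here is a self-contained sketch of the same template-method shape: a final save() that calls an overridable preSave() first. The classes below are hypothetical stand-ins and not the framework's real services; the structure handling and the repository call are left out.

```java
final class PreSaveSketch {

    // Stand-in entity with one non-nullable property.
    static class Compound {
        String regNumber;
    }

    // Mirrors the shape of the abstract service: save() is final and calls the hook first.
    static abstract class AbstractCompoundService<T> {

        protected void preSave(T entity) {
            // empty by default, subclasses may override
        }

        public final T save(T entity) {
            preSave(entity);
            // ...structure handling and the repository call would follow here...
            return entity;
        }
    }

    // Subclass that fills the non-nullable property when the caller left it empty.
    static class RegNumberService extends AbstractCompoundService<Compound> {

        private int sequence = 0;

        @Override
        protected void preSave(Compound compound) {
            if (compound.regNumber == null) {
                compound.regNumber = "REG-" + (++sequence);
            }
        }
    }
}
```

Because save() is final, a subclass cannot accidentally skip the structure handling; it can only add behaviour through the hook.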

You need to create a service interface and an implementation for every ChemicalCompound and Containable implementation the application uses. The services must implement the getRepository(), checkUniqueness() and getExistingCompound() methods.

getRepository() must return the Spring-Data repository responsible for saving the entity of the same type as the current service is for.

checkUniqueness() must report whether an entity is unique (does not violate any unique constraints). If it is not unique, it must return a mapping of the violated constraint (all field names of the constraint) to the offending value.

getExistingCompound() or getExistingContainable() respectively is called during import of SD-Files. The method must use the data from the SD-File (SdfRecord) to check if the current compound that is being imported already exists in the database. If it already exists it must return it, else it must return null.

Below is an example of such a service implementation for the RegistrationCompound entity:

[Figure: RegistrationCompoundService UML]

Java generics are not shown in the diagram above, so here is the corresponding source code:

public interface RegistrationCompoundService extends ChemicalCompoundService<RegistrationCompound> {
	//...snipped...
}

public class RegistrationCompoundServiceImpl extends ChemicalCompoundServiceImpl<RegistrationCompound>
        implements RegistrationCompoundService {
	//...snipped...
}

Please see the MoleculeDatabaseFramework Tutorial for further information on how to implement such services.

ChemicalCompoundContainerService

This service manages entities of type ChemicalCompoundContainer. The main difference to the services for ChemicalCompound and Containable is that an application should only have 1 implementation of ChemicalCompoundContainer and hence only 1 such service. The service can be used out of the box, but in some cases it must be extended: for example, when the ChemicalCompoundContainer implementation adds additional unique constraints, checkUniqueness() and getExistingContainer() must be overridden.

Also ChemicalCompoundContainerService has some special considerations in terms of the Spring-Security integration.

Please see the MoleculeDatabaseFramework Tutorial for further information on how to implement such a service.

Written by kienerj

May 8, 2013 at 08:06

Creating a Framework for Chemical Structure Search – Part 5


Series Overview

This is Part 5 – Entity Model of the “Creating a Framework for Chemical Structure Search“-Series.


Introduction

In this part I will introduce you to the chosen design for the model (entity classes) and I will explain the reasoning behind it. The model is fairly simple but it still took me rather long to finalize it. The issue is that I needed to consider what different applications using my framework might require and at the same time keep it as simple as possible.

Entity Model

I’m just going to show you a simple UML class diagram created with yuml.me – An Online UML Diagram Generator and then introduce each entity.

Class Diagram of Model

BaseEntity

This is a base class that holds metadata like creation date. This is a @MappedSuperclass which the other model classes extend.

Source Code for BaseEntity

UPDATE: Due to a new feature BaseEntity now extends MetaDataEntity. BaseEntity contains an extra abstract method public Long getId();. All entities except ChemicalCompoundComposition extend BaseEntity and ChemicalCompoundComposition extends MetaDataEntity as it has no id property and sadly it is non-trivial or not possible at all to add a generated id to an @Embeddable using JPA and Hibernate.

ChemicalStructure

Entity for holding the chemical structure data (SMILES or molfile) and the structure key (InChiKey). A ChemicalStructure is unique and immutable and managed by the framework. Users operate on ChemicalCompounds and not ChemicalStructures directly. Unique means if a new ChemicalCompound is saved, the framework checks if the ChemicalStructures in it already exist and if yes re-uses them. Immutable means that if a ChemicalCompound is updated and one of the ChemicalStructures has changed the framework will automatically check if the updated ChemicalStructure already exists and use it or create a new ChemicalStructure. The old one will remain unchanged!

Source Code for ChemicalStructure

ChemicalCompoundComposition

Links together ChemicalStructure and ChemicalCompound and defines the relative occurrence of the ChemicalStructure within the ChemicalCompound.

Source Code for ChemicalCompoundComposition

ChemicalCompound

Abstract model of a ChemicalCompound. A ChemicalCompound consists of ChemicalCompoundCompositions. The class contains some basic fields like compoundName and cas. A ChemicalCompound can also be associated with a Set of Containables. Developers using MoleculeDatabaseFramework must create concrete implementations of this class. An application can have multiple implementations of ChemicalCompound and each implementation is stored and searched separately (Table per Concrete class Inheritance). Note that due to better usability it was decided to make CAS-Number column nullable and it is not unique.

A ChemicalCompound is a “virtual entity” or “descriptive entity”. It is like a specific car model that describes all properties of that car but is not a concrete object that physically exists.

Source Code for ChemicalCompound

Containable

A Containable is like a set of ChemicalCompounds that were produced in the same way. In a Chemical Registration System this would be a Batch and in an Inventory System a Lot. The important part is that ChemicalCompound and Containable are generic classes and must form a pair:


@Entity
@Table(name="registration_compound")
@Data
@EqualsAndHashCode(callSuper=false, of = {"regNumber"})
public class RegistrationCompound extends ChemicalCompound<Batch> {
    // snipped
}

@Entity
@Table(name="batch", uniqueConstraints=
        @UniqueConstraint(columnNames = {"chemical_compound_id", "batch_Number"}))
@Data
@EqualsAndHashCode(callSuper=true, of = {"batchNumber"})
public class Batch extends Containable<RegistrationCompound> {
    // snipped
}

Source Code for Containable

ChemicalCompoundContainer

A ChemicalCompoundContainer holds exactly 1 Containable of any type. An application should only have 1 implementation of this entity. This represents a concrete physically available object containing a ChemicalCompound linked by a Containable. ChemicalCompoundContainer has a barcode field which is unique and not nullable. The barcode hence uniquely identifies a physically available sample of a ChemicalCompound.

Role and User

Role and User are only relevant if you plan on using MoleculeDatabaseFramework with Spring-Security. ChemicalCompound and Containable hold a reference to their read role. This is used to filter ChemicalCompoundContainers in the database based on the current user's privileges. Example:

Your application has 2 ChemicalCompound implementations, DefaultCompound and SecretCompound. The current user has the role to read DefaultCompounds but not the one for reading (viewing) SecretCompounds. So if this user searches for ChemicalCompoundContainers, only ChemicalCompoundContainers that contain a DefaultCompound must be returned by the search. To achieve that, the query's WHERE clause is extended and the filter based on the role is added automatically. The main advantage of doing this filtering in the database compared to filtering the results within the application is that you get pageable results, which would not be easily possible (if at all) with application-side filtering (and performance is probably a lot better too).

Source Code for Role
Source Code for User

I will go further into the Spring-Security integration in a later article. If you are interested in learning more about it, I can refer you to MoleculeDatabaseFramework's Spring-Security wiki page.

Written by kienerj

April 30, 2013 at 07:22

Posted in Chemistry, Java, Programming


Creating a Framework for Chemical Structure Search – Part 4


Series Overview

This is Part 4 – Component Selection of the “Creating a Framework for Chemical Structure Search“-Series.


Introduction

Finally I will start with the actual creation of the framework. In this part I will introduce the main components (existing 3rd party frameworks and libraries) I use and briefly explain my choices. At this point I think it is fair to mention that my work was basically integrating different existing software components into my desired end-product while taking into account real-world problems and offering a solution for them. There are no new magic algorithms in chemical structure searching, modeling or drug discovery to be found here!

My first try

In my previous effort at creating a framework for chemical structure search, I thought being platform independent, especially regarding the relational database management system (RDBMS) used, was an important aspect. Therefore I relied on doing the chemical structure search in the application and not the database. However, it is exactly that part that led to huge performance and efficiency problems. I had to do some stuff that just felt wrong and “hacky” to get usable performance.

Encountered issues with Application-based Substructure Search

Object Creation Performance

The first issue was that for every structure search, all the structures (molfiles) passing the fingerprint screen had to be loaded from the database and converted to an IAtomContainer object from the Chemistry Development Kit. It was the creation of these objects that was very CPU intensive, because aromaticity and similar properties had to be detected for every AtomContainer object. I found the solution for this in OrChem, a free cartridge for Oracle based on the CDK. The creators seemed to have had the exact same issue and came up with their own custom format. That format stored everything required, like aromaticity and so forth, in a CDK-specific way, so the creation of IAtomContainers was not an issue anymore.

Substructure Search Performance

The second issue was the mediocre performance of the substructure search itself. The solution was a complex approach using multi-threading and queues. The first thread screened all structures using the pre-generated fingerprints. Fingerprints were stored in the database but loaded into memory on application start. If a structure passed the screen, its database id was put into a queue. A second thread read from that queue, loaded the molfile from the database, generated the IAtomContainer and put it into a second queue. Then there were multiple threads (a configurable number) that took the AtomContainers from the queue and did the actual test for subgraph isomorphism. Again, if a structure passed this phase too, its database id was put into the output queue and the AtomContainer discarded. This last step was required because AtomContainers are memory hogs and you had to somehow control how many of them were in memory at any time.

CPU load now easily reached 100% for seconds during substructure searches. I then realized that the database alone could easily use 20% or more of that, probably due to loading all the structures from it. So I added the option to hold the custom format from OrChem in memory (not much of an issue in terms of memory consumption) to reduce load on the database and hence use those CPU cycles for substructure search. I guess you have long figured out how convoluted this all was. But it actually worked amazingly well! Because the hits were put into a queue, it was easily possible to display the first, say, 5 hits on a web page while the search continued in the background. So you could give the impression of a very fast search!
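The pipeline described in this section can be sketched with the java.util.concurrent classes. This is a simplified illustration and not the original code: it collapses the loading and matching steps into one stage, replaces the fingerprint with a bit mask and the subgraph isomorphism test with String.contains, and all names (SearchPipelineSketch, Structure, search) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class SearchPipelineSketch {

    private static final long POISON = -1L; // tells a matcher thread to stop

    // Stand-in "structure": a fingerprint bit mask plus a string to match against.
    static final class Structure {
        final long fingerprint;
        final String data;

        Structure(long fingerprint, String data) {
            this.fingerprint = fingerprint;
            this.data = data;
        }
    }

    static List<Long> search(Map<Long, Structure> db, long queryFp, String query, int matchers) {
        BlockingQueue<Long> screened = new ArrayBlockingQueue<>(16); // bounded to limit memory use
        ConcurrentLinkedQueue<Long> hits = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(matchers + 1);

        // Stage 1: cheap fingerprint screen; every query bit must be set in the candidate.
        pool.submit(() -> {
            for (Map.Entry<Long, Structure> e : db.entrySet()) {
                if ((e.getValue().fingerprint & queryFp) == queryFp) {
                    screened.put(e.getKey());
                }
            }
            for (int i = 0; i < matchers; i++) {
                screened.put(POISON); // one stop signal per matcher thread
            }
            return null;
        });

        // Stage 2: several threads run the expensive test on the screened ids.
        for (int i = 0; i < matchers; i++) {
            pool.submit(() -> {
                for (long id = screened.take(); id != POISON; id = screened.take()) {
                    if (db.get(id).data.contains(query)) { // stand-in for subgraph isomorphism
                        hits.add(id);
                    }
                }
                return null;
            });
        }

        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("search interrupted", ex);
        }
        List<Long> result = new ArrayList<>(hits);
        result.sort(null); // hits arrive in arbitrary order, so sort for a stable result
        return result;
    }
}
```

The sketch also shows the drawback mentioned later: hits arrive in whatever order the matcher threads finish, so sorting and real paging only become possible once the whole search has completed.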

Why start from scratch again?

So why change it? Tons of reasons. All of this was done with plain JDBC and various kinds of data transfer objects. Tight coupling and maintainability were serious issues. On the application side of things it was impossible to sort the results because hits are returned somewhat randomly, and hence real paging was not possible either. The second thing was: how could you search for a substructure and a numeric property at the same time? The solution for that was that one of the substructure search methods had a Set argument. The Set should contain the database ids of the structures the search should be performed over. Hence you do an SQL query for the numeric property first and feed the ids into the substructure search. That worked but, again, was not very straightforward. Adding such custom properties to the database and using them was rather messy too, it lacked proper transaction support and so forth. All in all it was nothing to be proud of and certainly not usable in a real production environment. I did however learn a lot about the Java 5 concurrency package.

Component for Substructure Search

I decided that being dependent on a specific RDBMS is a minor issue compared to above outlined problems. I already knew about the open-source Bingo Cartridge and to my luck the company behind it was developing a version for PostgreSQL. So my choice of this component was easy. Use PostgreSQL with Bingo, both are free and open-source.

Application-side Chemistry toolkit

Especially for Input-output the framework required a Chemistry Toolkit and I again chose the Chemistry Development Kit CDK.

ORM

While it would be preferable to be independent of the ORM, I wasn’t able to achieve that, though I admit I did not put much effort into it. MoleculeDatabaseFramework uses JPA 2.0 and Hibernate as its JPA provider. The part that is Hibernate-specific is the custom SQL dialect I created for accessing the structure search functions of Bingo in JPQL and hence also QueryDSL. There is no specific reason I chose Hibernate except that I already knew it and it was able to do what I required. So I did not investigate any other JPA providers.

Application Framework – Dependency-Injection

Well, I guess this is obvious. I chose Spring. I’ve heard and read a lot about Spring. I’ve always wanted to learn it and this was my chance. I also did not want the framework to depend on a full-blown Java EE application server.

Data Access Layer – CRUD and Querying

I initially started the project with plain Spring and JPA (Hibernate). But shortly afterwards, during my “research”, I read about Spring Data JPA and its integration with QueryDSL. I quote from the Spring Data website:

Spring Data JPA aims to significantly improve the implementation of data access layers by reducing the effort to the amount that’s actually needed. As a developer you write your repository interfaces, including custom finder methods, and Spring will provide the implementation automatically.

To illustrate this, here is a snippet showing an example implementation using my framework:

@Repository
public interface RegistrationCompoundRepository
        extends ChemicalCompoundRepository<RegistrationCompound> {

    List<RegistrationCompound> findByRegNumberStartingWith(String regNumber);

}

RegistrationCompound has a property called regNumber. Above interface method is automatically implemented by Spring Data and will return a result List of the RegistrationCompounds that match the passed in argument. That’s all you need to write. No SQL and not even a method implementation. Just create the interface and then follow the findBy method conventions of Spring Data.

A Spring Data repository can also make use of QueryDSL.

Querydsl is a framework which enables the construction of type-safe SQL-like queries for multiple backends including JPA, JDO and SQL in Java.

Example:

List<Customer> result = query.from(customer)
    .where(customer.lastName.like("A%"), customer.active.eq(true))
    .orderBy(customer.lastName.asc(), customer.firstName.desc())
    .list(customer);

If you use QueryDSL in your Spring Data Repository using QueryDslPredicateExecutor

@Repository
@Transactional(propagation = Propagation.MANDATORY)
public interface ChemicalCompoundRepository<T extends ChemicalCompound>
        extends ChemicalStructureSearchRepository<T>, JpaRepository<T, Long>,
        QueryDslPredicateExecutor<T> {
    //...
}

the repository will have additional methods that take a QueryDSL Predicate as input. A Predicate is basically the WHERE clause of the query, like customer.lastName.like("A%") from the example above. Some methods take additional parameters like a Pageable. This can be used for paging; the Pageable includes the paging (limit, offset) and sorting information.
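What a predicate plus paging and sorting information amounts to can be shown with an in-memory analogy using only the JDK. Note the assumptions: this uses java.util.function.Predicate, not QueryDSL's Predicate, and it filters a list in memory, whereas Spring Data and QueryDSL translate the predicate into the query so that filtering, sorting and paging happen in the database. PredicatePagingSketch, Compound and findAll are hypothetical names.

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class PredicatePagingSketch {

    // Stand-in entity with two searchable properties.
    static final class Compound {
        final String regNumber;
        final double weight;

        Compound(String regNumber, double weight) {
            this.regNumber = regNumber;
            this.weight = weight;
        }
    }

    // Filters (WHERE), sorts (ORDER BY), then cuts out one page
    // (offset = page * size, limit = size).
    static List<Compound> findAll(List<Compound> all, Predicate<Compound> where,
                                  Comparator<Compound> order, int page, int size) {
        return all.stream()
                .filter(where)
                .sorted(order)
                .skip((long) page * size)
                .limit(size)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Compound> all = List.of(new Compound("A3", 100), new Compound("A1", 200),
                new Compound("B1", 50), new Compound("A2", 400));
        Predicate<Compound> where = c -> c.regNumber.startsWith("A") && c.weight < 300;
        for (Compound c : findAll(all, where, Comparator.comparing(c -> c.regNumber), 0, 2)) {
            System.out.println(c.regNumber);
        }
    }
}
```

Keeping the filter as a value that is passed in means the caller can combine several conditions without the repository needing a dedicated method for each combination.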

This all means it is trivial to extend the repository my framework provides and add your own custom search methods to it. Using predicates, you can create complex queries that at the same time search by chemical substructure and return the result sorted and paged, all with a one-line method declaration.

public Page<T> findByChemicalStructure(String structureData,
            StructureSearchType searchType,
            Pageable pageable, Predicate predicate);

So I hope this got you interested!