Solutions to IT problems

Solutions I found when learning new IT stuff


Working with Swing's JTable – An Example


Introduction

In my previous article Fast random file access and line by line reading in Java I described how I created a random access file class with a fast readLine() method. I created that class because I was implementing a random access reader for the chemistry file format sdf. Such files are ASCII text files and can contain tens of thousands of records, each consisting of a variable number of lines. For fast random access I wanted to index the offset at which each record begins; with seek(offset) a requested record can then be accessed very quickly. To build that index, the file must be read line by line while searching for the record separator $$$$. Therefore a fast readLine() method was crucial for performance.
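To make the indexing idea concrete, here is a minimal sketch that records the byte offset of every record and then jumps to one with seek(). It is not my sdf-reader and does not use my fast random access file class: the names SdfIndex, indexRecords and readRecord are hypothetical, the indexing pass uses a BufferedInputStream to count bytes, and readRecord uses the standard (slow) RandomAccessFile.readLine(), which is acceptable for reading a single record.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical SD-file record index: the byte offset at which each record starts. */
final class SdfIndex {

    private SdfIndex() {
    }

    /** Scans the file once and returns the start offset of every record. */
    static List<Long> indexRecords(Path sdFile) throws IOException {
        List<Long> offsets = new ArrayList<>();
        long fileLength = Files.size(sdFile);
        if (fileLength == 0) {
            return offsets;
        }
        offsets.add(0L); // the first record starts at the beginning of the file
        try (InputStream in = new BufferedInputStream(Files.newInputStream(sdFile))) {
            long position = 0; // number of bytes consumed so far
            StringBuilder line = new StringBuilder();
            int b;
            while ((b = in.read()) != -1) {
                position++;
                if (b == '\n') {
                    // "$$$$" ends a record; the next one starts right after the newline
                    if (isSeparator(line) && position < fileLength) {
                        offsets.add(position);
                    }
                    line.setLength(0);
                } else {
                    line.append((char) b); // SD-files are ASCII, so one byte is one char
                }
            }
        }
        return offsets;
    }

    // accepts both "\n" and "\r\n" line endings
    private static boolean isSeparator(CharSequence line) {
        String s = line.toString();
        return s.equals("$$$$") || s.equals("$$$$\r");
    }

    /** Jumps to a record with seek() and reads it up to the separator line. */
    static String readRecord(Path sdFile, long offset) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(sdFile.toFile(), "r")) {
            raf.seek(offset);
            StringBuilder record = new StringBuilder();
            String line;
            while ((line = raf.readLine()) != null && !line.equals("$$$$")) {
                record.append(line).append('\n');
            }
            return record.toString();
        }
    }
}
```

The index is a plain list of offsets, so record i is simply readRecord(file, offsets.get(i)), which is exactly the index-based access the table later relies on.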

After successfully implementing my reader for the sdf format, called sdf-reader, I wanted to create a GUI on top of it, which I called Free SDF Viewer. Records in sd-files contain a chemical structure and optionally associated data. Therefore the sdf format is often used to exchange chemical databases, and that is also why a table is a natural way to visualize sd-files.

Requirements

Before diving in I created a short list of requirements. sd-files can be huge, so they should not have to be fully loaded into memory. When the user scrolls down in the table, the rows above should not be kept in memory, yet scrolling should still be smooth, which means that some records have to be cached.

The first column should display the chemical structure, and that structure must be resizable (height and width). This means that the column width and the row height must be user adjustable. Because sd-files can contain thousands of records, I also wanted a row header containing only the row number (1-based).

    • Usable with large files
    • Smooth scrolling
    • User adjustable column width
    • User adjustable row height
    • Row header containing the row number (1-based)

I had some additional requirements, like a nice and easy-to-use file chooser, but those are not related to working with a JTable. All in all I had the feeling that my list was reasonable and that the implementation should be easy. But boy, was I wrong. It turned out to be a very bumpy ride.

Accessing the data

My first problem was how to get the data from the file into the JTable. My sdf-reader returns an SdfRecord object that contains all the data of a record. Access is index-based, meaning the first record has index 0 and so forth. This part turned out to be rather straightforward. A JTable gets its data from a TableModel, so the solution is to create a custom TableModel implementation. The most important method of TableModel is getValueAt(int rowIndex, int columnIndex), in which you define how the data is retrieved. The source can be anything. A naive implementation of this method for my case looks like this:

@Override
public Object getValueAt(int rowIndex, int columnIndex) {

	SdfRecord record = sdfReader.getRecord(rowIndex);

	if (columnIndex == 0) {
		//this column is always the chemical structure
		// display the chemical structure image
		String molfile = record.getMolfile();
		ImageIcon chemicalStructure = new ChemicalStructureIcon(molfile,indigo,renderer,imageWidth, imageHeight);
		return chemicalStructure;
	} else {
		// display data. Currently everything is treated as a String,
		// including numbers and dates
		String columnName = getColumnName(columnIndex);
		String value = record.getProperty(columnName);
		return value;
	}
}

This is naive because getValueAt() is called once per cell, so the same record is read from the sd-file several times for a single row. In my actual implementation I cache 100 records around the current position. If the user scrolls up or down, the data is read from the cache; if the user scrolls far enough, some records are evicted from the cache and new ones are loaded. This cache logic is omitted here for simplicity. For the full source code see the Free SDF Viewer project page.

You might have noticed that in the code above I create an instance of ChemicalStructureIcon. I will discuss this in the next section.
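The record cache mentioned above is not shown in this article. As a rough idea of how such a cache can work, here is a minimal sketch built on java.util.LinkedHashMap in access order, which evicts the least recently used record once a capacity is exceeded. This is an assumption for illustration: it is a plain LRU cache rather than my window of 100 records around the current position, and the class RecordCache and its loader parameter are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;

/** Hypothetical LRU cache for records addressed by row index. */
class RecordCache<R> {

    private final IntFunction<R> loader; // e.g. index -> sdfReader.getRecord(index)
    private final Map<Integer, R> cache;
    private int loads = 0;               // how often the loader was actually called

    RecordCache(int capacity, IntFunction<R> loader) {
        this.loader = loader;
        // accessOrder = true: iteration runs from least to most recently accessed
        this.cache = new LinkedHashMap<Integer, R>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, R> eldest) {
                // evict the least recently used record once the capacity is exceeded
                return size() > capacity;
            }
        };
    }

    /** Returns the cached record, loading it on a cache miss. */
    synchronized R get(int rowIndex) {
        R record = cache.get(rowIndex);
        if (record == null) {
            record = loader.apply(rowIndex);
            loads++;
            cache.put(rowIndex, record);
        }
        return record;
    }

    synchronized int loadCount() {
        return loads;
    }
}
```

In getValueAt() the call sdfReader.getRecord(rowIndex) would then become cache.get(rowIndex), so all cells of a row share a single file access.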

Displaying the chemical structure

I thought that displaying an image in a JTable cell would be straightforward. However, that was wrong. During my search I found that one should use the ImageIcon class because JTable can render it by default. This is not entirely true: you actually have to tell JTable specifically to render ImageIcon columns as images. This is done with a custom TableCellRenderer.

private class SdfTableCellRenderer extends DefaultTableCellRenderer {

	@Override
	public void setValue(Object value) {
		if (value instanceof ImageIcon) {
			setIcon((ImageIcon) value);
			setText("");
		} else {
			setIcon(null);
			super.setValue(value);
		}
	}
}

As I will show later, I extended JTable so that all the added features are initialized in one place. That is why this renderer is a private inner class.
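For comparison: JTable does register a built-in renderer for Icon and ImageIcon values, but it only picks it when the model's getColumnClass() reports that type. AbstractTableModel returns Object.class for every column unless you override it, so such cells fall back to the Object renderer, which shows the value's toString(). A minimal sketch of a model that declares its first column as ImageIcon follows; the class IconColumnModel and its two-column layout are illustrative assumptions, not part of Free SDF Viewer.

```java
import javax.swing.ImageIcon;
import javax.swing.table.AbstractTableModel;

/** Hypothetical two-column model: column 0 holds an ImageIcon, column 1 a String. */
class IconColumnModel extends AbstractTableModel {

    private final ImageIcon[] icons;
    private final String[] names;

    IconColumnModel(ImageIcon[] icons, String[] names) {
        this.icons = icons;
        this.names = names;
    }

    @Override
    public int getRowCount() {
        return icons.length;
    }

    @Override
    public int getColumnCount() {
        return 2;
    }

    @Override
    public Object getValueAt(int rowIndex, int columnIndex) {
        return columnIndex == 0 ? icons[rowIndex] : names[rowIndex];
    }

    // JTable chooses its default renderer by column class;
    // ImageIcon.class selects the built-in icon renderer
    @Override
    public Class<?> getColumnClass(int columnIndex) {
        return columnIndex == 0 ? ImageIcon.class : String.class;
    }
}
```

The custom SdfTableCellRenderer above reaches the same result without touching getColumnClass(), which is convenient when a model does not know its column types up front.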

Once images showed up in the table cells, I realized that they are not scaled when the column width changes, even though changing the column width is supported by default. After another search session, the solution I settled on was to extend ImageIcon, so I created ChemicalStructureIcon. ChemicalStructureIcon uses the Indigo Chemistry Toolkit to render chemical structures. The most relevant code is the paintIcon(Component c, Graphics g, int x, int y) method shown below, which is called whenever the icon is drawn. If the cell's width or height has changed since the last paint, the image is re-rendered at the new size.

@Override
public synchronized void paintIcon(Component c, Graphics g, int x, int y) {
	Image image = getImage();
	if (image == null) {
		return;
	}
	Insets insets = ((Container) c).getInsets();
	x = insets.left;
	y = insets.top;

	int w = c.getWidth() - x - insets.right;
	int h = c.getHeight() - y - insets.bottom;

	if (w != width || h != height) {
		if (w < 16 || h < 16) {
			// 16 pixels is minimum size supported by indigo
			return;
		}
		width = w;
		height = h;
		indigo.setOption("render-image-size", w, h);
		image = renderImage();
		setImage(image);
	}

	ImageObserver io = getImageObserver();
	g.drawImage(image, x, y, w, h, io == null ? c : io);
}

You can see the complete source code on the project's web page.
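The essential trick in paintIcon() is to size the image to the component that hosts the icon and to re-render only when that size changes. The following is a minimal sketch of the same idea without Indigo: the class ResizingIcon and its renderer function, which stands in for the Indigo call, are hypothetical, and it implements Icon directly instead of extending ImageIcon. The 16 pixel minimum is carried over from the code above.

```java
import java.awt.Component;
import java.awt.Container;
import java.awt.Graphics;
import java.awt.Image;
import java.awt.Insets;
import java.util.function.BiFunction;
import javax.swing.Icon;

/** Hypothetical icon that fills its host component and re-renders only on size changes. */
class ResizingIcon implements Icon {

    private static final int MIN_SIZE = 16; // same lower bound as ChemicalStructureIcon

    private final BiFunction<Integer, Integer, Image> renderer; // (width, height) -> image
    private Image image;
    private int width;
    private int height;
    private int renderCount;

    ResizingIcon(BiFunction<Integer, Integer, Image> renderer) {
        this.renderer = renderer;
    }

    @Override
    public synchronized void paintIcon(Component c, Graphics g, int x, int y) {
        Insets insets = (c instanceof Container) ? ((Container) c).getInsets() : new Insets(0, 0, 0, 0);
        int w = c.getWidth() - insets.left - insets.right;
        int h = c.getHeight() - insets.top - insets.bottom;
        if (w < MIN_SIZE || h < MIN_SIZE) {
            return; // too small to render anything useful
        }
        if (image == null || w != width || h != height) {
            // first paint or the cell size changed: render a new image at exactly that size
            width = w;
            height = h;
            image = renderer.apply(w, h);
            renderCount++;
        }
        g.drawImage(image, insets.left, insets.top, w, h, c);
    }

    @Override
    public synchronized int getIconWidth() {
        return width;
    }

    @Override
    public synchronized int getIconHeight() {
        return height;
    }

    synchronized int getRenderCount() {
        return renderCount;
    }
}
```

Caching the last size matters because JTable repaints cells often (scrolling, selection), while rendering a structure is comparatively expensive.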

Adding a row header

JTable has no concept of a row header. But not all hope is lost: JScrollPane does, and a JTable usually sits inside a JScrollPane. In my solution I use an extra JTable as the row header. That table has a very simple, custom TableModel:

@Override
public Object getValueAt(int rowIndex, int columnIndex) {
	return rowIndex + 1;
}

Complete source code of RowHeaderModel
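A complete row header model needs little more than that getValueAt() method. The following sketch is an assumption about what such a model can look like, not the actual RowHeaderModel: the class name SimpleRowHeaderModel and the setRowCount() method are hypothetical, and the constructor takes the row count just like new RowHeaderModel(getRowCount()) below.

```java
import javax.swing.table.AbstractTableModel;

/** Hypothetical row header model: a single column with 1-based row numbers. */
class SimpleRowHeaderModel extends AbstractTableModel {

    private int rowCount;

    SimpleRowHeaderModel(int rowCount) {
        this.rowCount = rowCount;
    }

    /** Call when the main table's row count changes so both tables stay aligned. */
    void setRowCount(int rowCount) {
        this.rowCount = rowCount;
        fireTableDataChanged();
    }

    @Override
    public int getRowCount() {
        return rowCount;
    }

    @Override
    public int getColumnCount() {
        return 1;
    }

    @Override
    public String getColumnName(int column) {
        return ""; // the row header column needs no title
    }

    @Override
    public Object getValueAt(int rowIndex, int columnIndex) {
        return rowIndex + 1; // displayed numbers are 1-based
    }
}
```

Because the model computes its values from the row index, it holds no data at all, which fits the requirement of not loading large files into memory.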

To actually use it, I extended JTable and added the field headerTable. The row header is initialized in the constructor of this custom JTable implementation:

public SdfTable(JScrollPane scrollPane, SdfReader sdfReader, int rowHeight) {
	super();
	//snipped other initialization code
	headerModel = new RowHeaderModel(getRowCount());
	headerTable = new JTable(headerModel);
	headerTable.setRowHeight(getRowHeight());
	headerTable.setShowGrid(false);
	headerTable.setAutoResizeMode(JTable.AUTO_RESIZE_OFF);
	// 60 -> the width of the row header in px
	headerTable.setPreferredScrollableViewportSize(new Dimension(60, 0));
	headerTable.getColumnModel().getColumn(0).setPreferredWidth(60);
	headerTable.getColumnModel().getColumn(0).setCellRenderer(new RowHeaderCellRenderer());
	// synchronize selection by using the same selection model in both tables
	headerTable.setSelectionModel(this.getSelectionModel());
	scrollPane.setRowHeaderView(headerTable);
	setPreferredScrollableViewportSize(getPreferredSize());
}

There is a lot going on here. Note in particular that both tables share the same ListSelectionModel. This means that when the user clicks in the row header the whole row is selected, and when the user clicks on a row the row header is selected too.

When the row height changes, the row header must get the same new height. Therefore my JTable implementation overrides both setRowHeight() methods:

@Override
public void setRowHeight(int rowHeight) {
	super.setRowHeight(rowHeight);
	// headerTable is still null while the JTable super constructor runs
	if (headerTable != null) {
		headerTable.setRowHeight(rowHeight);
	}
}

@Override
public void setRowHeight(int row, int rowHeight) {
	super.setRowHeight(row, rowHeight);
	if (headerTable != null) {
		headerTable.setRowHeight(row, rowHeight);
	}
}

Complete source code of SdfTable

Change row height with mouse

One of the requirements was that the user can adjust the row height, both for individual rows and for all rows at once. First we will look into adjusting the height of a single row.

Change height of single row

To listen for mouse input we extend MouseInputAdapter. The idea is to show a resize cursor when the mouse is at the boundary between two rows. When the user then presses the left button and drags the mouse, the upper row is resized by the distance the mouse travels. This requires some basic geometry.

private int getResizingRow(Point p) {
	return getResizingRow(p, table.rowAtPoint(p));
}

private int getResizingRow(Point p, int row) {
	if (row == -1) {
		return -1;
	}
	int col = table.columnAtPoint(p);
	if (col == -1) {
		return -1;
	}
	Rectangle r = table.getCellRect(row, col, true);
	r.grow(0, -3);
	if (r.contains(p)) {
		return -1;
	}

	int midPoint = r.y + r.height / 2;
	int rowIndex = (p.y < midPoint) ? row - 1 : row;

	return rowIndex;
}

@Override
public void mousePressed(MouseEvent e) {
	Point p = e.getPoint();
	resizingRow = getResizingRow(p);
	if (resizingRow >= 0) {
		// remember where the drag started relative to the row's current height
		mouseYOffset = p.y - table.getRowHeight(resizingRow);
		table.setRowSelectionAllowed(false);
		table.setAutoscrolls(false);
	}
}

The math is one thing. More problematic was that dragging the mouse could lead to weird behavior on screen, with the two affected rows flickering as they were constantly being selected and deselected. Also, if you drag the mouse upwards into the table header, the table begins to scroll. For these reasons row selection and autoscrolling are disabled while resizing.

The row height is then changed according to the vertical distance (the y coordinate in Swing) the mouse has covered while dragging:

@Override
public void mouseDragged(MouseEvent e) {
	table.clearSelection();
	int mouseY = e.getY();

	if (resizingRow >= 0) {
		int newHeight = mouseY - mouseYOffset;
		if (newHeight > 0) {
			table.setRowHeight(resizingRow, newHeight);
		}
	}
}

Complete source code of TableRowResizer
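The geometry in getResizingRow() boils down to one question: is the pointer within a few pixels of the line between two rows, and if so, which row lies above that line? Here is a Swing-free sketch of approximately the same test, assuming the row heights are available as an array. The class RowBoundaries and its method are hypothetical, and the 3 pixel tolerance is taken from the r.grow(0, -3) call above.

```java
/** Hypothetical Swing-free version of the row-boundary hit test. */
final class RowBoundaries {

    private RowBoundaries() {
    }

    /**
     * Returns the index of the row whose bottom edge lies within tolerance pixels
     * of y, or -1 if the pointer is not near any row boundary. As in
     * getResizingRow(), the top edge of row 0 never counts.
     */
    static int resizingRow(int[] rowHeights, int y, int tolerance) {
        int bottom = 0;
        for (int row = 0; row < rowHeights.length; row++) {
            bottom += rowHeights[row]; // y coordinate of this row's bottom edge
            if (Math.abs(y - bottom) <= tolerance) {
                return row;
            }
            if (bottom > y + tolerance) {
                break; // all remaining edges are further down
            }
        }
        return -1;
    }
}
```

Once a row is found, the new height is simply its old height plus the vertical distance dragged, which is what mouseYOffset encodes in the code above.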

Change row height for all rows

This could easily be solved with a prompt where the user types in a number, but that is not user friendly at all. The idea instead is that when the mouse is at the edge between the top row and the table header, a resize cursor is shown, and if the mouse is then pressed and dragged, the height of all rows is changed. This sounds identical to the solution above, but it is not. The table header is an instance of JTableHeader, which has its own cursor, separate from the JTable's. So depending on the mouse location, either the table's or the table header's cursor must be changed into a resize cursor.

private boolean isResizingHeader(MouseEvent e) {
	Point p = e.getPoint();

	Object source = e.getSource();
	JTableHeader header = table.getTableHeader();

	if (source instanceof JTableHeader) {

		int col = table.columnAtPoint(p);
		if (col == -1) {
			return false;
		}

		return ((header.getY() + header.getHeight()) - 5) < p.y;

	} else if (source instanceof JTable) {

		int topRow = getTopRow();
		int row = table.rowAtPoint(p);

		if (row == topRow) {
			int col = table.columnAtPoint(p);
			if (col == -1) {
				return false;
			}
			Rectangle r = table.getCellRect(row, col, true);
			r.grow(0, -5);
			return r.y > p.y;
		}
	}	
	return false;
}

And for changing the cursor:

@Override
public void mouseMoved(MouseEvent e) {
	if (e.getSource() instanceof JTable) {
		if (isResizingHeader(e)
				!= (table.getCursor() == resizeCursor)) {
			swapTableCursor();
		}
	} else if (e.getSource() instanceof JTableHeader) {
		if (isResizingHeader(e)
				!= (table.getTableHeader().getCursor() == resizeCursor)) {
			swapHeaderCursor();
		}
	}
}
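The methods swapTableCursor() and swapHeaderCursor() are not shown here. One common way to implement such a swap is to store the cursor that was replaced, so that swapping twice restores the original. The sketch below assumes that approach; the class CursorSwapper is hypothetical, and so is the choice of N_RESIZE_CURSOR as the resize cursor. Cursor.getPredefinedCursor() returns shared instances, which is why the == comparison with resizeCursor in mouseMoved() works.

```java
import java.awt.Component;
import java.awt.Cursor;

/**
 * Hypothetical helper in the spirit of swapTableCursor()/swapHeaderCursor():
 * toggles a component between its own cursor and a resize cursor.
 */
class CursorSwapper {

    private final Component component;
    // assumed resize cursor; the actual resizeCursor field may use another type
    private Cursor otherCursor = Cursor.getPredefinedCursor(Cursor.N_RESIZE_CURSOR);

    CursorSwapper(Component component) {
        this.component = component;
    }

    /** Exchanges the component's current cursor with the stored one. */
    void swap() {
        Cursor current = component.getCursor();
        component.setCursor(otherCursor);
        otherCursor = current;
    }
}
```

One swapper per component (one for the JTable, one for the JTableHeader) keeps the two cursors independent, which is exactly what the two branches in mouseMoved() need.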

Another issue is that this resizing should also work when the top row is only partially visible. Furthermore, changing the height of all rows changes the total height of the table dramatically. Without precautions this leads to erratic automatic scrolling to different records. To prevent that, the current top row is programmatically kept at the top of the viewport:

@Override
public void mouseDragged(MouseEvent e) {
	int mouseY = e.getYOnScreen();
	if (isResizing) {
		int newHeight = table.getRowHeight() + (mouseY - yOffset);
		if (newHeight > 0) {
			yOffset = e.getYOnScreen();
			table.setRowHeight(newHeight);
			JViewport viewport = (JViewport) table.getParent();
			JScrollPane scrollPane = (JScrollPane) viewport.getParent();
			// This rectangle is relative to the table where the
			// northwest corner of cell (0,0) is always (0,0).
			Rectangle rect = table.getCellRect(topRow, 0, true);
			scrollPane.getVerticalScrollBar().setValue(rect.y);
		}
	}
}

This code has one downside: as soon as the user starts dragging, a partially visible top row jumps fully into view. Otherwise it works very nicely.

See complete source code of AllRowsResizer.
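One possible way to avoid that jump, sketched here as an untested idea rather than what AllRowsResizer does, is to remember how much of the top row is scrolled out of view and to scale that part with the new row height. With one uniform row height the arithmetic is simple; the class UniformRowScroll and its methods are hypothetical.

```java
/** Hypothetical scroll arithmetic for a table in which all rows share one height. */
final class UniformRowScroll {

    private UniformRowScroll() {
    }

    /** Index of the row at the top of the viewport for a vertical scroll position. */
    static int topRow(int scrollY, int rowHeight) {
        return scrollY / rowHeight;
    }

    /**
     * New scroll position after changing the row height from oldHeight to newHeight,
     * keeping the same top row and the same fraction of it scrolled out of view.
     */
    static int rescale(int scrollY, int oldHeight, int newHeight) {
        int top = topRow(scrollY, oldHeight);
        int hiddenPart = scrollY - top * oldHeight; // pixels of the top row above the viewport
        // integer division rounds down, so the result always stays inside the same row
        int scaledHidden = (int) ((long) hiddenPart * newHeight / oldHeight);
        return top * newHeight + scaledHidden;
    }
}
```

For example, at scroll position 450 with 100 pixel rows, row 4 is half hidden; after switching to 200 pixel rows the equivalent position is 4 * 200 + 100 = 900. In mouseDragged() this value would replace rect.y, assuming the scroll bar's range has already been updated for the new table height.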

Putting it all together

To put all the features together I extended JTable in my custom class SdfTable. The full constructor for SdfTable can be seen below.

public SdfTable(JScrollPane scrollPane, SdfReader sdfReader, int rowHeight) {
	super();
	DefaultTableCellRenderer r = new SdfTableCellRenderer();
	setDefaultRenderer(Object.class, r);
	TableRowResizer rowResizer = new TableRowResizer(this);
	AllRowsResizer allRowsResizer = new AllRowsResizer(this);

	TableModel tableModel = new SdfTableModel(sdfReader);
	setModel(tableModel);
	super.setRowHeight(rowHeight);
	getColumnModel().getColumn(0).setPreferredWidth(STRUCTURE_COLUMN_WIDTH);

	headerModel = new RowHeaderModel(getRowCount());
	headerTable = new JTable(headerModel);
	headerTable.setRowHeight(getRowHeight());
	headerTable.setShowGrid(false);
	headerTable.setAutoResizeMode(JTable.AUTO_RESIZE_OFF);
	headerTable.setPreferredScrollableViewportSize(new Dimension(60, 0));
	headerTable.getColumnModel().getColumn(0).setPreferredWidth(60);
	headerTable.getColumnModel().getColumn(0).setCellRenderer(new RowHeaderCellRenderer());
	// synchronize selection by using the same selection model in both tables
	headerTable.setSelectionModel(this.getSelectionModel());
	scrollPane.setRowHeaderView(headerTable);
	setPreferredScrollableViewportSize(getPreferredSize());
}

The JScrollPane argument is required for correctly setting the row header.

Complete source code of SdfTable.

Additional Features and Comments

Free SDF Viewer has some additional features not directly related to JTable, and I encountered other issues that apply to Swing in general.

Showing a wait cursor when loading sd-file

I created a menu with a single entry “Load SD-File…”. Clicking on it displays a file chooser and then loads the file and initializes an SdfTable object. This code runs on the so-called Event Dispatch Thread (EDT), the single thread that handles all UI events and painting. While a long task runs on the EDT, Swing cannot repaint anything, so changes to UI elements only become visible after the action completes. This means that if you change the cursor to a wait cursor and then load the sd-file on the EDT, the user never sees the wait cursor. The cursor change itself belongs on the EDT, but the slow loading must run on a different thread, and the standard tool for that is SwingWorker. The take-away message is that simple things become unexpectedly complex.

First the code executed after clicking on the “Load SD-File…” menu option:

private void loadFileMenuItemActionPerformed(java.awt.event.ActionEvent evt) {

	int returnVal = fileChooser.showOpenDialog(this);
	if (returnVal == JFileChooser.APPROVE_OPTION) {
		File file = fileChooser.getSelectedFile();
		logger.debug("Opening SD-File '{}'.", file.getAbsoluteFile());
		SdfLoader loader = new SdfLoader(this, file);
		loader.execute();
	} else {
		logger.debug("Opening of SD-file cancelled by the user.");
	}
}

This creates an SdfLoader instance. SdfLoader extends SwingWorker. Its constructor changes the cursor to a wait cursor, the doInBackground() method then loads the sd-file, and finally the done() method creates the SdfTable and restores the default cursor.

private class SdfLoader extends SwingWorker<SdfReader, Void> {

	private final JFrame frame;
	private final File sdFile;

	// The constructor runs on the EDT, so the wait cursor is set right away.
	public SdfLoader(JFrame frame, File sdFile) {
		this.frame = frame;
		this.sdFile = sdFile;
		frame.setCursor(Cursor.getPredefinedCursor(Cursor.WAIT_CURSOR));
	}

	// Runs on a background thread: only the slow file access happens here.
	@Override
	protected SdfReader doInBackground() throws IOException {
		//close old file
		if (sdfReader != null) {
			sdfReader.close();
		}
		return new SdfReader(sdFile);
	}

	// Runs on the EDT once doInBackground() has finished.
	@Override
	protected void done() {
		try {
			sdfReader = get();
			lastOpenDir = sdFile.getParentFile();
			// Swing components are created on the EDT, not in doInBackground()
			jTable1 = new SdfTable(jScrollPane1, sdfReader, 200);
			jScrollPane1.setViewportView(jTable1);
		} catch (InterruptedException | ExecutionException ex) {
			logger.catching(ex);
			// an IOException from doInBackground() arrives wrapped in an ExecutionException
			Throwable cause = ex.getCause() != null ? ex.getCause() : ex;
			JOptionPane.showMessageDialog(SdfViewer.this,
					cause.getMessage(),
					"Error opening file",
					JOptionPane.ERROR_MESSAGE);
		} finally {
			// restore the cursor whether loading succeeded or failed
			frame.setCursor(Cursor.getDefaultCursor());
		}
	}
}

Remembering Settings

Another feature is remembering the directory from which the last sd-file was opened. This information is written into a properties file and loaded at start-up. Future versions might make additional use of this to store other settings, like rendering options for the chemical structures. I am mentioning this so that you are not confused by unexplained code in the main class SdfViewer.
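A settings file like this maps naturally onto java.util.Properties. The following minimal sketch shows loading at start-up and saving the last directory; the class ViewerSettings, the key name lastOpenDir and the file location passed to the constructor are assumptions for illustration, not the code in SdfViewer.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/** Hypothetical settings store backed by a properties file. */
class ViewerSettings {

    private static final String LAST_OPEN_DIR = "lastOpenDir"; // assumed key name

    private final Path file;
    private final Properties props = new Properties();

    /** Loads existing settings, if the file is already there. */
    ViewerSettings(Path file) throws IOException {
        this.file = file;
        if (Files.exists(file)) {
            try (InputStream in = Files.newInputStream(file)) {
                props.load(in);
            }
        }
    }

    /** Returns the remembered directory, or null if none was stored yet. */
    File getLastOpenDir() {
        String dir = props.getProperty(LAST_OPEN_DIR);
        return dir == null ? null : new File(dir);
    }

    void setLastOpenDir(File dir) {
        props.setProperty(LAST_OPEN_DIR, dir.getAbsolutePath());
    }

    /** Writes all settings back to disk, e.g. when the application exits. */
    void save() throws IOException {
        try (OutputStream out = Files.newOutputStream(file)) {
            props.store(out, "viewer settings");
        }
    }
}
```

The stored directory can then be handed to fileChooser.setCurrentDirectory(...) before the open dialog is shown.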

Screenshots

First, a screenshot showing a row with a different row height.

The second screenshot shows the SdfTableModel in action: the user is currently at row 139045 of a large sd-file, and there are no performance issues on a standard laptop.

The full Free SDF Viewer project is available on Bitbucket. There is also an executable jar file in the project's downloads section.


Written by kienerj

January 17, 2014 at 10:00

Posted in Chemistry, Java, Programming


Creating a Framework for Chemical Structure Search – Part 9


Series Overview

This is Part 9 – Putting it all together of the “Creating a Framework for Chemical Structure Search“-Series.


Introduction

In this final post I’m going to show you a basic Spring MVC 3 Web Application I made based on MoleculeDatabaseFramework.

Functionality

The web application MDFSimpleWebApp lets you:

  • import ChemicalCompounds from an SD-File
  • do a chemical substructure search for compounds
  • view the search hits in a paged, tabular fashion
  • view individual search hits
  • download all search hits as SD-File

This is of course only a subset of all the features offered by MoleculeDatabaseFramework but it gives you a general idea how the framework works in terms of writing code and performance.

Entity

MDFSimpleWebApp contains one ChemicalCompound implementation called SimpleCompound. It is the most basic possible implementation of ChemicalCompound, with no additional properties.

There is also the entity SimpleLot, which extends Containable. However, it is not currently used within the application.

Repository and Service

MoleculeDatabaseFramework requires that you create a repository interface, a repository implementation (for chemical structure searching), a service interface and a service implementation for each of your entities. Hence I created SimpleCompoundRepository, SimpleCompoundRepositoryImpl, SimpleCompoundService and SimpleCompoundServiceImpl. These classes offer no custom search methods; they just implement all the methods required by the framework. See the repository and service packages.

SimpleCompoundController

This is the controller for SimpleCompound. The controller takes web requests and passes them on to the service layer, in this case SimpleCompoundService, an implementation of ChemicalCompoundService. The controller exposes certain service methods, such as importing SD-Files, chemical substructure searching and rendering images of chemical structures.

Rendering Images of chemical structures

For displaying chemical compounds I chose to dynamically generate images of all chemical structures in the compound. This functionality is also provided by MoleculeDatabaseFramework, so the corresponding controller method is very simple:

@RequestMapping(value = "/{compoundId}/render", method = RequestMethod.GET)
public void renderCompound(@PathVariable Long compoundId,
		final HttpServletResponse response,
		@RequestParam(defaultValue = "500") int width,
		@RequestParam(defaultValue = "150") int height) throws IOException {
	try (ServletOutputStream out = response.getOutputStream()) {
		IAtomContainer mol = compoundService.getCdkMolecule(compoundId);
		MoleculeRenderer renderer = new MoleculeRenderer(width, height);
		renderer.renderMolecule(mol, out);
	}
}

In a web page you then just need to add the corresponding image tag.

JSP with JSTL:

<img src="<c:url value="/compound/${compound.getId()}/render?width=500&height=300"/>" />

Or generated in JavaScript:

var html = '<img alt="' + smiles + '" src="/MDFSimpleWebApp/compound/' + compoundId + '/render" />';
// insert image into existing html element

As an example, here is an image of the web page for viewing a compound:

rendering example

Importing SD-File

For uploading a file using Spring MVC 3 I followed this tutorial. I had to create the very simple class FileUploadForm and the controller method is rather simple too:

@RequestMapping(value = "/import", method = RequestMethod.POST)
public String importCompounds(Model model, FileUploadForm fileUploadForm,
		BindingResult result)
		throws IOException {

	if (result.hasErrors()) {
		model.addAttribute("hasError", true);
		model.addAttribute("bindingResult", result);
		model.addAttribute(fileUploadForm);
		return "importCompounds";
	}
	// try-with-resources closes the upload stream once the import is done
	EntityImportResult importResult;
	try (Reader reader = new InputStreamReader(fileUploadForm.getFileData().getInputStream(), "US-ASCII")) {
		importResult = compoundService.importSDF(reader, true);
	}

	model.addAttribute("hasError", false);
	model.addAttribute("imported", importResult.getImportedEntities().size());
	model.addAttribute("present", importResult.getEntitiesAlreadyInDatabase().size());
	model.addAttribute(new FileUploadForm());

	return "importCompounds";
}

Chemical Structure Search

Search Form

The Chemical Structure Search is made up of a page that contains a tool for drawing chemical structures and submitting the search and the actual page for displaying search results. For drawing chemical structures MDFSimpleWebApp initially used the JChemPaint Applet but I recently changed it to JSME, a JavaScript based drawing tool. See below the search form with JSME:

Chemical Structure Search Form

Search Result Page

The search results page relies heavily on AJAX, using jQuery and the jQuery plugin DataTables. The search hits are displayed page by page using DataTables server-side processing, so only one page of results is fetched from the database at a time. The results table contains an image of the chemical structure, the compound's name and its CAS number. Clicking on the image shows a JavaScript alert containing the SMILES string of the given chemical structure.

Search Results Page

For each new page an AJAX request is sent to the server, and the corresponding page is returned. Note that the initial load can take a bit longer, because the total number of hits has to be determined (i.e. a query without a LIMIT clause). This count is cached so that it does not have to be recomputed for every page request. However, due to how OFFSET and LIMIT work, the higher the page number, the longer the search takes. So if you have a high number of hits (e.g. several thousand), the last page will load more slowly than the first one: to display search hits 10’000 to 10’004, the database searches up to hit number 10’004 and then returns the last 5 hits. In general, though, you should refine your search if you get that many hits.

After the page of results is returned by the service layer, the data must be converted to JSON in the format expected by DataTables. To achieve that I created the helper class JQueryDatatablesPage, which contains all the properties that DataTables requires plus the corresponding getters and setters. JQueryDatatablesPage is then converted to JSON using the Jackson 2 ObjectMapper.

@RequestMapping(value = "/search", method = RequestMethod.GET, produces = "application/json")
public @ResponseBody
String search(
		@RequestParam int iDisplayStart,
		@RequestParam int iDisplayLength,
		@RequestParam int sEcho, // for datatables draw count
		@RequestParam String structure) throws IOException {

	// PageRequest expects a zero-based page index
	int pageNumber = iDisplayStart / iDisplayLength;
	PageRequest pageable = new PageRequest(pageNumber, iDisplayLength);
	Page<SimpleCompound> page = compoundService.findByChemicalStructure(structure, StructureSearchType.SUBSTRUCTURE, pageable);
	int iTotalRecords = (int) compoundService.count(null);
	int iTotalDisplayRecords = (int) page.getTotalElements();
	JQueryDatatablesPage<SimpleCompound> dtPage = new JQueryDatatablesPage<>(
			page.getContent(), iTotalRecords, iTotalDisplayRecords,
			Integer.toString(sEcho));

	String result = toJson(dtPage);
	return result;

}

private String toJson(JQueryDatatablesPage<?> dt) throws IOException {
	ObjectMapper mapper = new ObjectMapper();
	mapper.registerModule(new Hibernate4Module());
	return mapper.writeValueAsString(dt);
}

Jackson 2 can deal with circular references if your entities are annotated with

@JsonIdentityInfo(generator=ObjectIdGenerators.IntSequenceGenerator.class, property="@id")

You also need to register the Hibernate4Module to deal with Lazy Collections!
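To show what a helper like JQueryDatatablesPage has to carry, here is a minimal sketch of a response object for the legacy DataTables server-side protocol (the sEcho / iTotalRecords / iTotalDisplayRecords / aaData parameters). The class name DatatablesPageSketch is hypothetical and the real JQueryDatatablesPage may be structured differently. The unusual getter names such as getiTotalRecords() are deliberate: with Jackson's default bean naming they should map to the iTotalRecords key DataTables expects, while @JsonProperty would be the more explicit alternative.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical response object for the legacy DataTables server-side protocol. */
class DatatablesPageSketch<T> {

    private final List<T> aaData;            // rows of the requested page
    private final int iTotalRecords;         // number of records before filtering
    private final int iTotalDisplayRecords;  // number of records after filtering (the search hits)
    private final String sEcho;              // echoed back so DataTables can match response and request

    DatatablesPageSketch(List<T> aaData, int iTotalRecords, int iTotalDisplayRecords, String sEcho) {
        this.aaData = Collections.unmodifiableList(aaData);
        this.iTotalRecords = iTotalRecords;
        this.iTotalDisplayRecords = iTotalDisplayRecords;
        this.sEcho = sEcho;
    }

    public List<T> getAaData() {
        return aaData;
    }

    public int getiTotalRecords() {
        return iTotalRecords;
    }

    public int getiTotalDisplayRecords() {
        return iTotalDisplayRecords;
    }

    public String getsEcho() {
        return sEcho;
    }
}
```

In the search method above, iTotalRecords corresponds to compoundService.count(null) and iTotalDisplayRecords to page.getTotalElements().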

Download of Search Hits

You can download the search hits as an SD-File by clicking on the Download Hits link on the search results page. The browser will then display a dialog asking where to save the file. This uses the exportSDF() method of SimpleCompoundService.

@RequestMapping(value = "/downloadHits", method = RequestMethod.GET)
public void downloadHits(@RequestParam String structure, HttpServletResponse response) throws IOException {

	List<Long> ids = compoundService.findByChemicalStructure(structure, StructureSearchType.SUBSTRUCTURE);
	HashSet<String> properties = new HashSet<>();
	properties.add("compoundName");
	response.setContentType("chemical/x-mdl-sdfile");
	String disposition = "attachment; fileName=searchHits-" + structure + ".sdf";
	response.setHeader("Content-Disposition", disposition);
	// closing the writer flushes its buffer; otherwise the end of the SD-File may be lost
	try (OutputStreamWriter writer = new OutputStreamWriter(response.getOutputStream())) {
		compoundService.exportSDF(ids, writer, properties);
	}
}
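The structure parameter ends up in the Content-Disposition file name. SMILES strings can contain characters such as /, \, #, =, [ and ] that are awkward or illegal in file names, so it may be worth sanitizing the value first. A minimal sketch follows; the helper DownloadNames, the replacement rule and the 100 character limit are assumptions chosen for illustration.

```java
/** Hypothetical helper that builds a safe download file name from a search query. */
final class DownloadNames {

    private DownloadNames() {
    }

    static String sdfFileName(String structure) {
        // keep letters, digits, '.', '_' and '-'; replace everything else with '_'
        String safe = structure.replaceAll("[^A-Za-z0-9._-]", "_");
        if (safe.length() > 100) {
            safe = safe.substring(0, 100); // keep file names reasonably short
        }
        return "searchHits-" + safe + ".sdf";
    }

    /** Content-Disposition header value with a quoted filename parameter. */
    static String contentDisposition(String structure) {
        return "attachment; filename=\"" + sdfFileName(structure) + "\"";
    }
}
```

For example, the stereo SMILES C/C=C/C would be offered as searchHits-C_C_C_C.sdf.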

Final Words

Below is a demo video of a chemical substructure search in MDFSimpleWebApp with a database of 65’000 compounds. The demo runs on a dual-core mobile i5 running Windows 7 32-bit with 4 GB of RAM installed, or in other words: the hardware is pretty mediocre.

MDFSimpleWebApp is hosted on Bitbucket. If you want to try out the application, you can download a fully working standalone version for Windows 64-bit from the download section on Bitbucket. It includes PostgreSQL, the Bingo Cartridge for chemical structure searching, Tomcat as servlet container and this web application. Note: this file is 105 MB because PostgreSQL and Tomcat are included.

Written by kienerj

June 6, 2013 at 12:41

Creating a Framework for Chemical Structure Search – Part 8


Series Overview

This is Part 8 – Spring Security Integration of the “Creating a Framework for Chemical Structure Search“-Series.


Introduction

MoleculeDatabaseFramework is integrated with Spring-Security. It offers optional method level security in the service layer. This article will explain how it works and how to configure your application to use Spring-Security.

Annotation-Based

MoleculeDatabaseFramework has been integrated with Spring Security using annotations. This means that as long as you do not enable security in your application context, everything works just fine without any security. Security is applied to methods in the service interfaces.

The security integration allows you to restrict a user to certain types of entities, e.g. one user can only read SimpleCompound while his supervisor also has access to SecretCompound. You can also assign roles that allow a user to update and/or delete the compounds he created himself, and of course roles that allow updating or deleting any compound of a given implementation.

The security integration uses the @PreAuthorize annotation, whose value is written in SpEL, the Spring Expression Language. MoleculeDatabaseFramework uses the expressions hasRole and hasPermission. hasRole directly checks whether the current user has the corresponding role (called an authority in Spring Security); if so, access to the method is granted, otherwise an AccessDeniedException is thrown. hasPermission requires an implementation of org.springframework.security.access.PermissionEvaluator, which has two overloaded methods: boolean hasPermission(Authentication authentication, Object targetDomainObject, Object permission) and boolean hasPermission(Authentication authentication, Serializable targetId, String targetType, Object permission). It is used on the save(entity) methods to determine whether the given entity can be created or updated by the current user.

See the Spring Security Method Expression Documentation for more information on this topic.

Conventions

To make use of security you need to follow certain conventions. There are 6 basic “role-types”: create, read, update, delete, update_created and delete_created. A complete role name is a “role-type” followed by an underscore and the simple class name of the entity implementation. So if you have an entity RegistrationCompound and you want a user to be able to read RegistrationCompounds, he needs the role read_RegistrationCompound, and so forth.

  • A role starts with a “role-type” followed by “_” and the simple class name: read_RegistrationCompound
  • A user can either read none or all entries for a given entity implementation.
  • A service interface method that requires read-role must be annotated with @PreAuthorize("hasRole(#root.this.getReadRole())")

The above applies to ChemicalCompound, Containable and ChemicalCompoundContainer. For ChemicalStructure there is only ever the supplied entity ChemicalStructure, which users of the framework should not extend. ChemicalStructure only has a save role (instead of a save permission), and any user who can create or update any type of ChemicalCompound must have the role save_ChemicalStructure. Or in code:


public interface ChemicalCompoundService<T extends ChemicalCompound>
		extends Service<T> {

	//...snipped...
		
	@Transactional(readOnly = false)
	@PreAuthorize("hasPermission(#compound, 'save_' + #root.this.getCompoundClassSimpleName())")
	@Override
	T save(T compound);
	
	//...snipped...
}

public interface ChemicalStructureService<T extends ChemicalStructure>
		extends Service<T> {	
		
	//...snipped...
	
	@Transactional(readOnly = false)
	@PreAuthorize("hasRole('save_ChemicalStructure')")
	@Override
	T save(T structure);
	
	//...snipped...
}
  • For managing Users and Roles you need to use the supplied entities User and Role and their services.

User implements UserDetails from Spring Security and UserService extends UserDetailsService. To create, update or delete a User or a Role you need the role manage_User or manage_Role respectively.

Security Behaviour

MoleculeDatabaseFramework ships with a PermissionEvaluator implementation. This PermissionEvaluator checks whether a given user can create, update or delete a given domain object. (Note: for read-methods just having the read-role is enough; they use hasRole instead of hasPermission in @PreAuthorize.) The supplied DefaultPermissionEvaluator internally uses Permission objects to determine a user's permissions.

The supplied PermissionEvaluator allows users with the create, update or delete role to perform that action on any domain object (of the given implementation). Users with the update_created or delete_created role can perform that action only on domain objects they created (getCreatedBy().equals(loggedInUserName)).

Services only offer a save(entity) method. Whether a domain object is being created or updated is determined by whether its id is set (id == null -> create). This is exactly what Hibernate does.

In your application you should create exactly 1 implementation of ChemicalCompoundContainer. However, this Container can hold any type of Containable and hence any type of ChemicalCompound. To ensure that a user only sees Containers that contain a ChemicalCompound he has the read-role for, all other Containers are filtered out. This includes the count() service method, so users with different privileges on ChemicalCompounds see different numbers of Containers.
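The create/update decision and the role rules from the paragraphs above can be sketched roughly as follows. This is a simplified illustration under assumptions: DomainObject and SimplePermissionCheck are hypothetical names, roles are plain strings, and the real DefaultPermissionEvaluator delegates to Permission objects and a RoleHierarchy, which are left out here.

```java
import java.util.Set;

// Hypothetical, minimal view of a domain object (id, creator, implementation name).
final class DomainObject {
    final Long id;               // null = not persisted yet
    final String createdBy;
    final String simpleClassName;

    DomainObject(Long id, String createdBy, String simpleClassName) {
        this.id = id;
        this.createdBy = createdBy;
        this.simpleClassName = simpleClassName;
    }
}

// Hypothetical simplification of the evaluator's rules.
final class SimplePermissionCheck {
    private SimplePermissionCheck() {}

    // Same rule as Hibernate: no id yet means the object will be inserted.
    static String saveAction(DomainObject entity) {
        return entity.id == null ? "create" : "update";
    }

    // action: "create", "update" or "delete"; roles: the user's effective role names.
    static boolean isAllowed(Set<String> roles, String currentUser,
                             DomainObject entity, String action) {
        String type = entity.simpleClassName;
        if (roles.contains(action + "_" + type)) {
            return true; // plain role: any object of this implementation
        }
        if (action.equals("update") || action.equals("delete")) {
            // *_created roles only cover objects the current user created
            return roles.contains(action + "_created_" + type)
                    && currentUser.equals(entity.createdBy);
        }
        return false;
    }
}
```

A service's save(entity) would first call saveAction(entity) and then check isAllowed(...) for the resulting action.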

The supplied PermissionEvaluator requires a RoleHierarchy bean to be configured in the security context. RoleHierarchy is very useful as it automatically grants a “lower” privilege to anyone with a “higher” one. So if you define

create_RegistrationCompound > read_RegistrationCompound

then anyone with create_RegistrationCompound role automatically also has role read_RegistrationCompound.
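Conceptually, a role hierarchy is a directed graph, and a user's effective roles are all roles reachable from the ones he was granted. The sketch below illustrates only that idea; SimpleRoleHierarchy is a hypothetical class, not Spring Security's RoleHierarchyImpl, although it accepts the same “higher > lower” line format.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration: each "higher > lower" line adds an edge to a role graph.
final class SimpleRoleHierarchy {
    private final Map<String, Set<String>> lowerRoles = new HashMap<>();

    SimpleRoleHierarchy(String definition) {
        for (String line : definition.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] parts = trimmed.split(">");
            if (parts.length < 2) {
                throw new IllegalArgumentException("Expected 'higher > lower': " + trimmed);
            }
            // Also handles chains like "a > b > c".
            for (int i = 0; i < parts.length - 1; i++) {
                lowerRoles.computeIfAbsent(parts[i].trim(), k -> new HashSet<>())
                        .add(parts[i + 1].trim());
            }
        }
    }

    // Granted roles plus every role reachable from them (transitive closure).
    Set<String> reachableRoles(List<String> granted) {
        Set<String> result = new HashSet<>(granted);
        Deque<String> toVisit = new ArrayDeque<>(granted);
        while (!toVisit.isEmpty()) {
            String role = toVisit.pop();
            for (String lower : lowerRoles.getOrDefault(role, Set.of())) {
                if (result.add(lower)) {
                    toVisit.push(lower);
                }
            }
        }
        return result;
    }
}
```

With the line create_RegistrationCompound > read_RegistrationCompound, reachableRoles(List.of("create_RegistrationCompound")) contains both roles.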

Cascading of Persist and Merge

ChemicalCompound, Containable and ChemicalCompoundContainer all use CascadeType.PERSIST and CascadeType.REFRESH in their JPA relationships with each other. This means that changes (updates) to existing entities must always be done using that entity's service.save(entity) method, because CascadeType.MERGE (update) is not set and hence updates are not cascaded.

In case of creating a new entity, the Permission implementations check if the current user has the privilege to also create the associated, new entities. If you create a new ChemicalCompoundContainer that contains a new Containable which is made of a new ChemicalCompound, the ContainerPermission will verify if the current user has the privilege to not only create the new ChemicalCompoundContainer but also to create the new ChemicalCompound and new Containable. If this is not the case, an AccessDeniedException is thrown.

The example below will create a new RegistrationCompound, a new Batch and a new CompoundContainer if the current user has the privileges create_RegistrationCompound, create_Batch and create_CompoundContainer.

RegistrationCompound regCompound = new RegistrationCompound();
regCompound.setCompoundName("Registration Compound");
regCompound.setCas(cas);
regCompound.setRegNumber(regNumber);

ChemicalStructure structure =
        chemicalStructureFactory.createChemicalStructure(structureData);

ChemicalCompoundComposition composition = new ChemicalCompoundComposition();
composition.setCompound(regCompound);
composition.setChemicalStructure(structure);
composition.setPercentage(100.0);

regCompound.getCompositions().add(composition);

Batch batch = new Batch(regCompound, batchNumber);

regCompound.getBatches().add(batch);

CompoundContainer container = new CompoundContainer("C00001", batch);
container = compoundContainerService.save(container);

If an associated entity already exists, the create-privilege for it is not required, provided the entity being persisted is its “parent” in the hierarchy. In other words, you can associate a new ChemicalCompoundContainer with an existing Containable, but you cannot associate a new Containable with an existing ChemicalCompoundContainer, because that Container could not have existed before the Containable was created. The same logic applies to the relationship between Containable and ChemicalCompound.
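The create-privilege rule can be pictured as a walk down the chain container -> containable -> compound that demands a create role for every new entity (id == null) and stops at the first existing one, since everything an existing entity references must already exist too. ChainNode and CreateCheck are hypothetical stand-ins for illustration; the framework's real Permission implementations may be structured quite differently and throw an AccessDeniedException instead of returning a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical node in the chain ChemicalCompoundContainer -> Containable -> ChemicalCompound.
final class ChainNode {
    final Long id;                // null = new, not yet persisted
    final String simpleClassName;
    final ChainNode child;        // associated entity one level down, or null

    ChainNode(Long id, String simpleClassName, ChainNode child) {
        this.id = id;
        this.simpleClassName = simpleClassName;
        this.child = child;
    }
}

final class CreateCheck {
    private CreateCheck() {}

    // Returns the create roles the user is missing for the new entities in the chain.
    // The walk ends at the first existing entity: whatever it references exists already.
    static List<String> missingCreateRoles(ChainNode root, Set<String> roles) {
        List<String> missing = new ArrayList<>();
        for (ChainNode node = root; node != null && node.id == null; node = node.child) {
            String required = "create_" + node.simpleClassName;
            if (!roles.contains(required)) {
                missing.add(required);
            }
        }
        return missing;
    }
}
```

An empty result means the save may proceed; otherwise access would be denied.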

Customize Security

To customize security, you need to implement your own PermissionEvaluator and / or Permission implementations.

Configuration Example

Below is the SecurityContext.xml used for testing security in MoleculeDatabaseFramework.

<?xml version="1.0" encoding="UTF-8"?>

<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:security="http://www.springframework.org/schema/security"
       xmlns:util="http://www.springframework.org/schema/util"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.1.xsd
             http://www.springframework.org/schema/util http://www.springframework.org/schema/util/spring-util-3.1.xsd
             http://www.springframework.org/schema/security http://www.springframework.org/schema/security/spring-security-3.1.xsd">

    <!-- User Service - In a real application this should use the database
   and the corresponding UserService and RoleService.  -->
    <security:user-service id="userService">
        <security:user name="user" password="password" authorities="read_TestCompound,
            read_TestContainable, create_TestCompoundContainer, create_RegistrationCompound, create_Batch"/>
        <security:user name="creator" password="password" authorities="create_TestCompound, create_TestContainable"/>
        <security:user name="owner" password="password" authorities="update_created_TestCompound, update_created_TestContainable, delete_created_TestCompound, delete_created_TestContainable, delete_created_TestCompoundContainer"/>
        <security:user name="editor" password="password" authorities="update_TestCompound, update_TestCompoundContainer, read_TestContainable"/>
        <security:user name="admin" password="admin" authorities="admin_TestCompound, admin_TestContainable, admin_TestCompoundContainer"/>
    </security:user-service>

    <!--    <bean id="userService"
        class="org.bitbucket.kienerj.moleculedatabaseframework.service.UserServiceImpl">
    </bean>-->

    <!-- Use a RoleHierarchy and a PermissionEvaluator in SpEL expression in
    @PreAuthorize -->
    <bean id = "methodSecurityExpressionHandler"
          class = "org.springframework.security.access.expression.method.DefaultMethodSecurityExpressionHandler">
        <property name="roleHierarchy" ref="roleHierarchy"/>
        <property name="permissionEvaluator" ref="permissionEvaluator"/>
    </bean>

    <!-- Role Hierarchy - Probably should use database for this too in real app -->
    <bean id="roleHierarchy"
          class="org.springframework.security.access.hierarchicalroles.RoleHierarchyImpl">
        <property name="hierarchy">
            <value>
                admin_TestCompound > update_TestCompound
                update_TestCompound > update_created_TestCompound
                update_created_TestCompound > create_TestCompound
                admin_TestCompound > create_TestCompound
                create_TestCompound > read_TestCompound
                create_TestCompound > save_ChemicalStructure
                admin_TestCompound > delete_TestCompound
                delete_TestCompound > delete_created_TestCompound
                update_RegistrationCompound > create_RegistrationCompound
                create_RegistrationCompound > read_RegistrationCompound
                create_RegistrationCompound > save_ChemicalStructure
                admin_TestContainable > update_TestContainable
                admin_TestContainable > delete_TestContainable
                update_TestContainable > create_TestContainable
                create_TestContainable > read_TestContainable
                update_created_TestContainable > create_TestContainable
                admin_TestCompoundContainer > update_TestCompoundContainer
                admin_TestCompoundContainer > delete_TestCompoundContainer
                delete_created_TestCompoundContainer > create_TestCompoundContainer
                update_TestCompoundContainer > create_TestCompoundContainer
                create_TestCompoundContainer > read_TestCompoundContainer
            </value>
        </property>
    </bean>

    <!-- Permission Evaluator supplied by framework. The constructor takes a Map
    of SimpleClassName -> PermissionImplementation associations-->
    <bean id="permissionEvaluator"
         class="org.bitbucket.kienerj.moleculedatabaseframework.security.DefaultPermissionEvaluator">
        <constructor-arg index="0">
            <map key-type="java.lang.String"
                 value-type="org.bitbucket.kienerj.moleculedatabaseframework.security.Permission">
                <entry key="TestCompound" value-ref="chemicalCompoundPermission"/>
                <entry key="RegistrationCompound" value-ref="chemicalCompoundPermission"/>
                <entry key="TestContainable" value-ref="containablePermission"/>
                <entry key="TestCompoundContainer" value-ref="chemicalCompoundContainerPermission"/>
                <entry key="Batch" value-ref="containablePermission"/>
            </map>
        </constructor-arg>
    </bean>

    <!-- Permission implementations used -->
    <bean id="chemicalCompoundPermission"
          class="org.bitbucket.kienerj.moleculedatabaseframework.security.ChemicalCompoundPermission">
    </bean>
    <bean id="containablePermission"
          class="org.bitbucket.kienerj.moleculedatabaseframework.security.ContainablePermission">
    </bean>
    <bean id="chemicalCompoundContainerPermission"
          class="org.bitbucket.kienerj.moleculedatabaseframework.security.ChemicalCompoundContainerPermission">
    </bean>


    <security:authentication-manager alias="testAuthenticationManager">
        <security:authentication-provider user-service-ref="userService"/>
    </security:authentication-manager>

    <!-- enable annotations and set expression handler to use-->
    <security:global-method-security pre-post-annotations="enabled">
        <security:expression-handler ref = "methodSecurityExpressionHandler"/>
    </security:global-method-security>
</beans>

Written by kienerj

May 22, 2013 at 08:37

Posted in Chemistry, Java, Programming

Creating a Framework for Chemical Structure Search – Part 7

Series Overview

This is Part 7 – Service Layer of the “Creating a Framework for Chemical Structure Search”-Series.

Introduction

In this article I will introduce the service layer of MoleculeDatabaseFramework. The service layer is responsible for transaction support and security.

Service for ChemicalStructure Entity

This service manages entities of type ChemicalStructure. It is provided by the framework and should be used as-is. Any access to ChemicalStructures should go through this service. ChemicalStructureService contains the logic that makes ChemicalStructures “immutable”. Quote from Part 5 of the series:

A ChemicalStructure is unique and immutable and managed by the framework. Users operate on ChemicalCompounds and not ChemicalStructures directly. Unique means if a new ChemicalCompound is saved, the framework checks if the ChemicalStructures in it already exist and if yes re-uses them. Immutable means that if a ChemicalCompound is updated and one of the ChemicalStructures has changed, the framework will automatically check if the updated ChemicalStructure already exists and use it or create a new ChemicalStructure. The old one will remain unchanged!

In case a ChemicalStructure actually needs to be updated and the change should affect all ChemicalCompounds containing it, the service has a separate method updateExistingStructure(structure), which requires the current user to have the role update_ChemicalStructure when Spring Security is enabled. In general this method should be restricted to admins.

Below is the source code for the two mentioned methods. Note that for brevity, argument checks and logging statements were removed:

public T save(T structure) {
       
	T result = structureRepository.findByStructureKey(structure.getStructureKey());

	if (result == null) {
		// clear id (and createdBy and created) so that a new row is inserted
		structure.reset();

		return structureRepository.save(structure);
	} else {
		return structureRepository.save(result);
	}
}

@Override
public T updateExistingStructure(T structure){

	T result = structureRepository.findOne(structure.getId());

	if (result == null){
		throw new IllegalArgumentException("Given ChemicalStructure does not exist. It can't be updated.");
	}        
	return structureRepository.save(structure);
}
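To see the “unique and immutable” behaviour without a database, here is a minimal in-memory sketch. Structure and StructureStore are hypothetical stand-ins, and a plain string plays the role of the structure key; in the framework the lookup goes through structureRepository.findByStructureKey(...) and the key is an InChIKey.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the ChemicalStructure entity.
final class Structure {
    Long id;
    final String structureKey;   // e.g. an InChIKey
    final String structureData;  // e.g. a molfile or SMILES

    Structure(String structureKey, String structureData) {
        this.structureKey = structureKey;
        this.structureData = structureData;
    }
}

// Hypothetical in-memory replacement for the repository, indexed by structure key.
final class StructureStore {
    private final Map<String, Structure> byKey = new HashMap<>();
    private long nextId = 1;

    // Re-use an existing structure with the same key, otherwise insert a new one.
    Structure save(Structure structure) {
        Structure existing = byKey.get(structure.structureKey);
        if (existing != null) {
            return existing;
        }
        structure.id = nextId++; // like reset() + insert: always a fresh row
        byKey.put(structure.structureKey, structure);
        return structure;
    }
}
```

A changed structure has a different key, so saving it produces a new row while the previously stored structure stays untouched.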

Services for ChemicalCompound and Containable

The services for ChemicalCompound and Containable are very similar. They consist of an interface which contains the security annotations and an implementation containing the @Transactional annotations for declarative transactions. They also offer methods with optional loading of lazy collections.

When an entity is saved, the service automatically sets the correct ChemicalStructures by either selecting an existing one from the database or creating a new one.
Before the entity is passed to Hibernate, the method preSave(entity) is executed. This method is empty in the provided abstract services like ChemicalCompoundServiceImpl but can be overridden in subclasses. For example, one could set all non-nullable properties to a default or a sequence value if they are null.

Below is the source code of the save(entity) method of ChemicalCompoundServiceImpl for clarification:
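The preSave hook follows the template method pattern: a final save() calls an overridable preSave(), which is empty by default. The stripped-down sketch below shows only that pattern; AbstractCompoundService, Compound, DefaultingCompoundService and the default value are hypothetical, and the real ChemicalCompoundServiceImpl also handles structures, transactions and persistence.

```java
import java.util.Objects;

// Hypothetical entity with a property that must not be null in the database.
class Compound {
    String compoundName;
}

// Hypothetical base service: save() is final, preSave() is the extension point.
abstract class AbstractCompoundService<T extends Compound> {

    public final T save(T compound) {
        Objects.requireNonNull(compound, "compound");
        preSave(compound);          // subclass hook, runs before persisting
        return persist(compound);
    }

    // Empty by default, like preSave(entity) in the framework's abstract services.
    protected void preSave(T compound) {
    }

    // Placeholder for the actual repository call.
    protected T persist(T compound) {
        return compound;
    }
}

// Example subclass that fills in a default for a null, non-nullable property.
class DefaultingCompoundService extends AbstractCompoundService<Compound> {
    @Override
    protected void preSave(Compound compound) {
        if (compound.compoundName == null) {
            compound.compoundName = "unnamed";
        }
    }
}
```

Because save() is final, subclasses cannot skip the structure handling; they can only add logic through preSave().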

public final T save(T compound) {
	logger.entry(compound);
	Preconditions.checkNotNull(compound);
	preSave(compound);
	if (compound.getCompositions() != null) {
		for (ChemicalCompoundComposition composition : compound.getCompositions()) {
			ChemicalStructure structure = composition.getChemicalStructure();
			ChemicalStructure result = chemicalStructureService.save(structure);
			composition.setChemicalStructure(result);
		}
	}
	//...snipped...
}

You need to create a service interface and an implementation for every ChemicalCompound and Containable implementation the application uses. The services must implement the getRepository(), checkUniqueness() and getExistingCompound() methods.

getRepository() must return the Spring-Data repository responsible for saving the entity of the same type as the current service is for.

checkUniqueness() must return whether an entity is unique (does not violate any unique constraints) and, if it is not unique, a mapping of the violated constraint (all field names of the constraint) to the offending value.

getExistingCompound() or getExistingContainable() respectively is called during import of SD-Files. The method must use the data from the SD-File (SdfRecord) to check if the current compound that is being imported already exists in the database. If it already exists it must return it, else it must return null.
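A rough sketch of what checkUniqueness() and getExistingCompound() could look like for a compound with a single unique regNumber. Everything here is hypothetical: the RegCompound class, the in-memory list standing in for the repository, the REG_NUMBER field name, and the use of a Map<String, String> for the record's data fields instead of the real SdfRecord API, which this article does not show.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical compound with one unique column.
final class RegCompound {
    Long id;
    String regNumber;

    RegCompound(Long id, String regNumber) {
        this.id = id;
        this.regNumber = regNumber;
    }
}

final class RegCompoundChecks {
    private final List<RegCompound> table = new ArrayList<>(); // stand-in for the database

    void insert(RegCompound compound) {
        table.add(compound);
    }

    // Empty map = unique; otherwise constraint field name(s) -> offending value.
    Map<String, Object> checkUniqueness(RegCompound candidate) {
        Map<String, Object> violations = new HashMap<>();
        for (RegCompound existing : table) {
            boolean sameRow = candidate.id != null && candidate.id.equals(existing.id);
            if (!sameRow && Objects.equals(existing.regNumber, candidate.regNumber)) {
                violations.put("regNumber", candidate.regNumber);
            }
        }
        return violations;
    }

    // During SD-file import: return the already stored compound, or null if it is new.
    RegCompound getExistingCompound(Map<String, String> sdfData) {
        String regNumber = sdfData.get("REG_NUMBER"); // hypothetical SD-file field name
        if (regNumber == null) {
            return null;
        }
        for (RegCompound existing : table) {
            if (regNumber.equals(existing.regNumber)) {
                return existing;
            }
        }
        return null;
    }
}
```

Note that an entity being updated must not be reported as a duplicate of its own row, hence the sameRow check.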

Below is an example of such a service implementation for the RegistrationCompound entity:

RegistrationCompoundService UML

Note that Java generics are not shown in the above diagram; in source code they look like this:

public interface RegistrationCompoundService extends ChemicalCompoundService<RegistrationCompound> {
	//...snipped...
}

public class RegistrationCompoundServiceImpl extends ChemicalCompoundServiceImpl<RegistrationCompound>
        implements RegistrationCompoundService {
	//...snipped...
}

Please see the MoleculeDatabaseFramework Tutorial for further information on how to implement such services.

ChemicalCompoundContainerService

This service manages entities of type ChemicalCompoundContainer. The main difference to the services for ChemicalCompound and Containable is that an application should only have 1 implementation of ChemicalCompoundContainer and hence only 1 such service. The service can be used out-of-the-box, but in some cases it must be extended. For example, when the ChemicalCompoundContainer implementation adds additional unique constraints, checkUniqueness() and getExistingContainer() must be overridden.

Also ChemicalCompoundContainerService has some special considerations in terms of the Spring-Security integration.

Please see the MoleculeDatabaseFramework Tutorial for further information on how to implement such a service.

Written by kienerj

May 8, 2013 at 08:06

Creating a Framework for Chemical Structure Search – Part 6

Series Overview

This is Part 6 – Data Access Layer of the “Creating a Framework for Chemical Structure Search”-Series.

Introduction

In the previous article I introduced the entity model of MoleculeDatabaseFramework. This article will explain the Data Access Layer which uses Spring-Data-JPA with Hibernate and how the Chemical Structure Search methods of the Bingo PostgreSQL Cartridge are exposed to Hibernate and QueryDSL.

How Spring-Data JPA works

Basic functionality

I quote from the Spring-Data website:

Spring Data JPA aims to significantly improve the implementation of data access layers by reducing the effort to the amount that’s actually needed. As a developer you write your repository interfaces, including custom finder methods, and Spring will provide the implementation automatically.

You create a new interface that extends from generic interfaces provided by Spring-Data and represents the repository for an entity. There are different kinds of repository interfaces but the repositories in MoleculeDatabaseFramework all extend JpaRepository. JpaRepository provides CRUD-methods and some retrieval methods for your entity.

Repositories in MoleculeDatabaseFramework also extend QueryDslPredicateExecutor. This adds findOne(predicate) and findAll(predicate) methods. Predicates are basically type-safe WHERE-Clauses.

Custom query methods

Besides the provided methods, you can add your own custom search methods by following the findBy-method conventions of Spring Data JPA or by annotating a method with @Query, where the value of the annotation is either a JPQL query or native SQL.

Custom Queries providing your own method implementation

In case you have a very complex query that can’t be automatically created by Spring-Data, you can implement it yourself.

1. Create Custom Query Interface

To achieve this you need to first create an interface containing the desired query method(s) and annotate it with @NoRepositoryBean:

@NoRepositoryBean
public interface ChemicalStructureSearchRepository<T> {

    Page<T> findByChemicalStructure(String structureData,
            StructureSearchType searchType,
            Pageable pageable, Predicate predicate,
            String searchOptions,
            PathBuilder<T> pathBuilder);


    Page<T> findBySimilarStructure(String structureData,
            SimilarityType similarityType,
            Double lowerBound, Double upperBound,
            Pageable pageable, Predicate predicate,
            PathBuilder<T> pathBuilder);
}

This is the Source Code of ChemicalStructureSearchRepository minus JavaDoc comments.

2. Create a repository extending Custom Query interface

As an example, below is the source code for ChemicalCompoundRepository, which extends ChemicalStructureSearchRepository:

@Repository
@Transactional(propagation = Propagation.MANDATORY)
public interface ChemicalCompoundRepository<T extends ChemicalCompound>
        extends ChemicalStructureSearchRepository<T>, JpaRepository<T, Long>,
        QueryDslPredicateExecutor<T> {
    
    List<T> findByCompositionsPkChemicalStructureId(Long structureId);
    
    T findByCas(String cas);

    @Query("select c from Containable c where c.chemicalCompound = ?1")
    List<Containable> getContainablesByCompound(ChemicalCompound compound);
}

3. Create an implementation of your repository

The convention is that the implementation is named after the repository with “Impl” appended, in this case ChemicalCompoundRepositoryImpl. This implementation only needs to implement your custom methods, in this case those defined in ChemicalStructureSearchRepository.

public class ChemicalCompoundRepositoryImpl<T extends ChemicalCompound>
        implements ChemicalStructureSearchRepository<T> {

	//...fields and constructors snipped...

    @Cacheable(STRUCTURE_QUERY_CACHE)
    @Override
    public Page<T> findByChemicalStructure(String structureData,
            StructureSearchType searchType, Pageable pageable,
            Predicate predicate, String searchOptions,
            PathBuilder<T> compoundPathBuilder) {
			
			//...implementation snipped...
    }


    @Cacheable(STRUCTURE_QUERY_CACHE)
    @Override
    public Page<T> findBySimilarStructure(String structureData,
            SimilarityType similarityType, Double lowerBound, Double upperBound,
            Pageable pageable, Predicate predicate,
            PathBuilder<T> compoundPathBuilder) {
			
			//...implementation snipped...
    }
}

Below is a UML class diagram that shows the relationships of ChemicalCompoundRepository:

ChemicalCompoundRepository UML

Spring-Data automatically detects the repository implementation and combines all provided methods and all your custom search methods into one object, so you call all of them on ChemicalCompoundRepository:


Page<T> page = getRepository().findByChemicalStructure(structureData, searchType,
                pageable, predicate, searchOptions, pathBuilder);
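How can one object expose both the generated CRUD methods and your hand-written ones? A rough analogy using a plain JDK dynamic proxy is shown below: calls to methods declared in the custom interface are routed to the ...Impl object, everything else goes to a generic base implementation. This only illustrates the idea and is not Spring Data's actual code; all interface and class names are hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

// Hypothetical "provided" methods (think JpaRepository).
interface BaseRepository {
    long count();
}

// Hypothetical custom methods (think ChemicalStructureSearchRepository).
interface CustomSearch {
    String findByChemicalStructure(String structureData);
}

// The repository interface combines both, like ChemicalCompoundRepository.
interface CompoundRepository extends BaseRepository, CustomSearch {}

// Generic implementation of the provided methods.
class BaseRepositoryImpl implements BaseRepository {
    public long count() {
        return 42;
    }
}

// Named after the repository + "Impl"; implements only the custom methods.
class CompoundRepositoryImpl implements CustomSearch {
    public String findByChemicalStructure(String structureData) {
        return "hits for " + structureData;
    }
}

final class RepositoryFactory {
    private RepositoryFactory() {}

    static CompoundRepository create() {
        Object base = new BaseRepositoryImpl();
        Object custom = new CompoundRepositoryImpl();
        InvocationHandler handler = (proxy, method, args) -> {
            // Route each call by the interface that declares the invoked method.
            Object target = method.getDeclaringClass() == CustomSearch.class ? custom : base;
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the target's own exception
            }
        };
        return (CompoundRepository) Proxy.newProxyInstance(
                CompoundRepository.class.getClassLoader(),
                new Class<?>[] {CompoundRepository.class},
                handler);
    }
}
```

From the caller's point of view, RepositoryFactory.create() returns a single CompoundRepository on which both count() and findByChemicalStructure(...) can be called.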

Using the Repositories

MoleculeDatabaseFramework provides generic repositories for all entities in the entity model.

Source Code for all Repositories

To make use of a chemical structure search enabled repository you need to extend it using your specific entity implementation and optionally add your custom find methods:

@Repository
public interface RegistrationCompoundRepository extends ChemicalCompoundRepository<RegistrationCompound> {

    List<RegistrationCompound> findByRegNumberStartingWith(String regNumber);

}

That’s it!

You can find further information on how to implement entities and repositories in the MoleculeDatabaseFramework Tutorial as this article is meant to show the inner workings of the framework and not how to use it.

Exposing Bingo PostgreSQL Cartridge Methods

This is done by using a custom dialect extending Hibernate's PostgreSQL82Dialect:

public class BingoPostgreSQLDialect extends PostgreSQL82Dialect {

    public BingoPostgreSQLDialect() {
         registerFunction("issubstructure", new SQLFunctionTemplate(
                 StandardBasicTypes.BOOLEAN, "?1  @ (?2, ?3)::bingo.sub"));
         registerFunction("isexactstructure", new SQLFunctionTemplate(
                 StandardBasicTypes.BOOLEAN, "?1  @ (?2, ?3)::bingo.exact"));
         registerFunction("matchessmarts", new SQLFunctionTemplate(
                 StandardBasicTypes.BOOLEAN, "?1  @ (?2, ?3)::bingo.smarts"));
         registerFunction("matchesformula", new SQLFunctionTemplate(
                 StandardBasicTypes.BOOLEAN, "?1  @ (?2, ?3)::bingo.gross"));
         registerFunction("issimilarstructure", new SQLFunctionTemplate(
                 StandardBasicTypes.BOOLEAN, "?1  @ (?2, ?3, ?4, ?5)::bingo.sim"));
         registerFunction("hasmassbetween", new SQLFunctionTemplate(
                 StandardBasicTypes.BOOLEAN, "?1 > ?2::bingo.mass AND ?1 < ?3::bingo.mass"));         
    }
}
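SQLFunctionTemplate substitutes the HQL function arguments into the numbered placeholders ?1, ?2, … of the template. The tiny renderer below is a hypothetical re-implementation of just that substitution step, useful for seeing what SQL a call like issubstructure(...) turns into; it is not Hibernate's code, and the argument expressions in the usage are made up.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TemplateRenderer {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\?(\\d+)");

    private TemplateRenderer() {}

    // Single pass over the template: each ?n is replaced by argument n (1-based).
    // A single pass means "?" characters inside the arguments are never re-substituted.
    static String render(String template, List<String> args) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder sql = new StringBuilder();
        while (m.find()) {
            int index = Integer.parseInt(m.group(1));
            m.appendReplacement(sql, Matcher.quoteReplacement(args.get(index - 1)));
        }
        m.appendTail(sql);
        return sql.toString();
    }
}
```

For the hasmassbetween template, ?1 appears twice, so the structure column is rendered on both sides of the AND.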

And as a usage example a source code snippet from ChemicalCompoundRepositoryImpl:

public Page<T> findByChemicalStructure(String structureData,
            StructureSearchType searchType, Pageable pageable,
            Predicate predicate, String searchOptions,
            PathBuilder<T> compoundPathBuilder) {
			
	//...snipped...
			
	BooleanExpression matchesStructureQuery; // this is a Predicate!

	switch (searchType) {
		case EXACT:
			matchesStructureQuery = BooleanTemplate.create(
					"isExactStructure({0},{1},{2}) = true",
					structure.structureData,
					ConstantImpl.create(structureData),
					ConstantImpl.create(searchOptions));
			break;
		case SUBSTRUCTURE:
			matchesStructureQuery = BooleanTemplate.create(
					"isSubstructure({0},{1},{2}) = true",
					structure.structureData,
					ConstantImpl.create(structureData),
					ConstantImpl.create(searchOptions));
			break;
		//...snipped other cases
	}

	baseQuery = baseQuery.from(compoundPathBuilder)
			.innerJoin(compound.compositions, composition)
			.innerJoin(composition.pk.chemicalStructure, structure)
			.where(matchesStructureQuery.and(predicate));
	//...snipped...
}

Full Source Code for ChemicalCompoundRepositoryImpl

The next Part will focus on the Service Layer. The Service Layer controls transactions and security.

Written by kienerj

May 2, 2013 at 07:51

Creating a Framework for Chemical Structure Search – Part 5

Series Overview

This is Part 5 – Entity Model of the “Creating a Framework for Chemical Structure Search”-Series.

Introduction

In this part I will introduce you to the chosen design for the model (entity classes) and I will explain the reasoning behind it. The model is fairly simple but it still took me rather long to finalize it. The issue is that I needed to consider what different applications using my framework might require and at the same time keep it as simple as possible.

Entity Model

I’m just going to show you a simple UML class diagram created with yuml.me – An Online UML Diagram Generator and then introduce each entity.

Class Diagram of Model

BaseEntity

This is a base class that holds metadata like creation date. This is a @MappedSuperclass which the other model classes extend.

Source Code for BaseEntity

UPDATE: Due to a new feature, BaseEntity now extends MetaDataEntity. BaseEntity contains an extra abstract method public Long getId();. All entities except ChemicalCompoundComposition extend BaseEntity. ChemicalCompoundComposition extends MetaDataEntity instead, as it has no id property and, sadly, it is non-trivial or not possible at all to add a generated id to an @Embeddable using JPA and Hibernate.

ChemicalStructure

Entity for holding the chemical structure data (SMILES or molfile) and the structure key (InChIKey). A ChemicalStructure is unique and immutable and managed by the framework. Users operate on ChemicalCompounds and not ChemicalStructures directly. Unique means if a new ChemicalCompound is saved, the framework checks if the ChemicalStructures in it already exist and if yes re-uses them. Immutable means that if a ChemicalCompound is updated and one of the ChemicalStructures has changed, the framework will automatically check if the updated ChemicalStructure already exists and use it or create a new ChemicalStructure. The old one will remain unchanged!

Source Code for ChemicalStructure

ChemicalCompoundComposition

Links together ChemicalStructure and ChemicalCompound and defines the relative occurrence of the ChemicalStructure within the ChemicalCompound.

Source Code for ChemicalCompoundComposition

ChemicalCompound

Abstract model of a ChemicalCompound. A ChemicalCompound consists of ChemicalCompoundCompositions. The class contains some basic fields like compoundName and cas. A ChemicalCompound can also be associated with a Set of Containables. Developers using MoleculeDatabaseFramework must create concrete implementations of this class. An application can have multiple implementations of ChemicalCompound, and each implementation is stored and searched separately (Table per Concrete Class Inheritance). Note that for better usability it was decided to make the CAS-Number column nullable, and it is not unique.

A ChemicalCompound is a “virtual entity” or “descriptive entity”. It is like a specific car model that describes all properties of that car but is not a concrete object that physically exists.

Source Code for ChemicalCompound

Containable

A Containable is like a set of ChemicalCompounds that were produced in the same way. In a Chemical Registration System this would be a Batch and in an Inventory System a Lot. The important part is that ChemicalCompound and Containable are generic classes and must form a pair:


@Entity
@Table(name="registration_compound")
@Data
@EqualsAndHashCode(callSuper=false, of = {"regNumber"})
public class RegistrationCompound extends ChemicalCompound<Batch> {
    // snipped
}

@Entity
@Table(name="batch", uniqueConstraints=
        @UniqueConstraint(columnNames = {"chemical_compound_id", "batch_Number"}))
@Data
@EqualsAndHashCode(callSuper=true, of = {"batchNumber"})
public class Batch extends Containable<RegistrationCompound> {
    // snipped
}
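Stripped of the JPA and Lombok annotations, the pairing could look roughly like this. The bounds on the type parameters are an assumption made for illustration; the article does not show the exact generic signatures of ChemicalCompound and Containable.

```java
import java.util.HashSet;
import java.util.Set;

// Assumed shape: a compound knows the concrete type of its containables...
abstract class ChemicalCompound<C extends Containable<?>> {
    private final Set<C> containables = new HashSet<>();

    Set<C> getContainables() {
        return containables;
    }
}

// ...and a containable knows the concrete type of its compound.
abstract class Containable<T extends ChemicalCompound<?>> {
    private final T chemicalCompound;

    protected Containable(T chemicalCompound) {
        this.chemicalCompound = chemicalCompound;
    }

    T getChemicalCompound() {
        return chemicalCompound;
    }
}

class RegistrationCompound extends ChemicalCompound<Batch> {
    String regNumber;
}

class Batch extends Containable<RegistrationCompound> {
    final String batchNumber;

    Batch(RegistrationCompound compound, String batchNumber) {
        super(compound);
        this.batchNumber = batchNumber;
    }
}
```

Because of the pairing, batch.getChemicalCompound() returns a RegistrationCompound and compound.getContainables() a Set of Batch objects, so no casts are needed.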

Source Code for Containable

ChemicalCompoundContainer

A ChemicalCompoundContainer holds exactly 1 Containable of any type. An application should only have 1 implementation of this entity. This represents a concrete physically available object containing a ChemicalCompound linked by a Containable. ChemicalCompoundContainer has a barcode field which is unique and not nullable. The barcode hence uniquely identifies a physically available sample of a ChemicalCompound.

Role and User

Role and User are only relevant if you plan on using MoleculeDatabaseFramework with Spring-Security. ChemicalCompound and Containable hold a reference to their Read-Role. This is used to filter ChemicalCompoundContainers in the database based on the current User's privileges. Example:

Your application has 2 ChemicalCompound implementations, DefaultCompound and SecretCompound. The current User has the role to read DefaultCompounds but not the role for reading (viewing) SecretCompounds. So if this User searches for ChemicalCompoundContainers, only ChemicalCompoundContainers that contain a DefaultCompound must be returned by the search. To achieve that, the query's WHERE-clause is automatically extended with a filter based on the Role. The main advantage of doing this filtering in the database compared to filtering the results within the application is that you get pageable results, which would not be easily possible (if at all) with application-side filtering (and performance is probably a lot better too).

Source Code for Role
Source Code for User

I will go further into the Spring-Security integration in a later article. If you are interested in learning more about it, I can refer you to MoleculeDatabaseFramework's Spring-Security Wiki Page.

Written by kienerj

April 30, 2013 at 07:22

Posted in Chemistry, Java, Programming

Creating a Framework for Chemical Structure Search – Part 4

Series Overview

This is Part 4 – Component Selection of the “Creating a Framework for Chemical Structure Search”-Series.

Introduction

Finally I will start with the actual creation of the framework. In this part I will introduce the main components (existing 3rd party frameworks and libraries) I use and briefly explain my choices. At this point I think it is fair to mention that my work was basically integrating different existing software components into my desired end-product while taking into account real-world problems and offering a solution for them. There are no new magic algorithms in chemical structure searching, modeling or drug discovery to be found here!

My first try

In my previous effort at creating a framework for chemical structure search, I thought being platform independent, especially regarding the relational database management system (RDBMS) used, was an important aspect. Therefore I relied on doing the chemical structure search in the application and not in the database. However, it was exactly that part that led to huge performance and efficiency problems. I had to do some stuff that just felt wrong and “hacky” to get usable performance.

Encountered issues with Application-based Substructure Search

Object Creation Performance

The first issue was that for every structure search, all the structures (molfiles) passing the fingerprint screen had to be loaded from the database and converted to an IAtomContainer object from the Chemistry Development Kit (CDK). It was the creation of these objects that was very CPU intensive, because aromaticity and similar properties had to be detected for every AtomContainer object. I found the solution for this in OrChem, a free cartridge for Oracle based on the CDK. Its creators seemed to have had the exact same issue and came up with their own custom format. That format stored everything required, like aromaticity, in a CDK-specific way, so the creation of IAtomContainers was not an issue anymore.

Substructure Search Performance

The second issue was the mediocre performance of the substructure search itself. The solution was a complex approach using multi-threading and queues. The first thread screened all structures using the pre-generated fingerprints, which were stored in the database but loaded into memory on application start. If a structure passed the screen, its database id was put into a queue. A second thread read from that queue, loaded the molfile from the database, generated the IAtomContainer and put it into a second queue. Then multiple threads (a configurable number) took the AtomContainers from that queue and performed the actual subgraph isomorphism test. If a structure passed this phase too, its database id was put into the output queue and the AtomContainer was discarded. This last step was required because AtomContainers are memory hogs, so the number of them held in memory at any time had to be controlled.
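The queue-based pipeline described above can be sketched with the java.util.concurrent classes. This is a simplified, self-contained illustration, not the original code: the “database” is an in-memory map of hypothetical SMILES strings, a length check stands in for the fingerprint screen, and String.contains stands in for the CDK subgraph isomorphism test. A bounded queue between the loader and the matchers limits how many loaded structures exist at once, and sentinel values (“poison pills”) tell each stage to stop.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class SearchPipeline {

    private static final long POISON_ID = -1L; // end-of-stream marker for stage 1

    /** A loaded structure; hypothetical stand-in for a CDK IAtomContainer. */
    private static final class Candidate {
        final long id;
        final String structure;

        Candidate(long id, String structure) {
            this.id = id;
            this.structure = structure;
        }
    }

    private static final Candidate POISON = new Candidate(POISON_ID, ""); // end marker for stage 2

    /** Screen -> load -> match; returns the ids of all hits in no particular order. */
    public static Set<Long> search(Map<Long, String> database, String query, int matcherCount)
            throws InterruptedException {
        BlockingQueue<Long> screened = new LinkedBlockingQueue<>();
        // Bounded queue: caps how many loaded structures exist at any time.
        BlockingQueue<Candidate> loaded = new ArrayBlockingQueue<>(10);
        BlockingQueue<Long> hits = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(2 + matcherCount);

        // Stage 1: cheap screen (placeholder for the fingerprint screen).
        pool.submit(() -> {
            for (Map.Entry<Long, String> e : database.entrySet()) {
                if (e.getValue().length() >= query.length()) {
                    screened.put(e.getKey());
                }
            }
            screened.put(POISON_ID);
            return null;
        });

        // Stage 2: "load from the database" and build the candidate object.
        pool.submit(() -> {
            for (long id = screened.take(); id != POISON_ID; id = screened.take()) {
                loaded.put(new Candidate(id, database.get(id)));
            }
            for (int i = 0; i < matcherCount; i++) {
                loaded.put(POISON); // one end marker per matcher thread
            }
            return null;
        });

        // Stage 3: several matcher threads run the expensive test.
        for (int i = 0; i < matcherCount; i++) {
            pool.submit(() -> {
                for (Candidate c = loaded.take(); c != POISON; c = loaded.take()) {
                    if (c.structure.contains(query)) { // placeholder for subgraph isomorphism
                        hits.put(c.id);
                    }
                }
                return null;
            });
        }

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return new HashSet<>(hits);
    }

    public static void main(String[] args) throws InterruptedException {
        Map<Long, String> db = new HashMap<>(); // hypothetical SMILES "database"
        db.put(1L, "c1ccccc1O");
        db.put(2L, "CCO");
        db.put(3L, "c1ccccc1N");
        System.out.println(new TreeSet<>(search(db, "c1ccccc1", 3))); // [1, 3]
    }
}
```

For brevity, the sketch ignores exceptions thrown inside the tasks; a real implementation would check the returned Future objects or use a CompletionService.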

CPU load now easily reached 100% for seconds during substructure searches. I then realized that the database alone could easily account for 20% or more of that, probably due to loading all the structures from it. So I added an option to hold the custom OrChem format in memory (not much of an issue in terms of memory consumption) to reduce the load on the database and use those CPU cycles for substructure search instead. You have probably long figured out how convoluted all of this was. But it actually worked amazingly well! Because the hits were put into a queue, it was easy to display, say, the first 5 hits on a web page while the search continued in the background. This gave the impression of a very fast search!
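Because hits arrive through a queue, a caller can show the first few results as soon as they are available instead of waiting for the whole search to finish. A minimal sketch, assuming a hypothetical background search that feeds a BlockingQueue<Long> of hit ids: firstHits polls with a timeout so that a search with fewer hits than requested does not block the page forever.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class FirstHits {

    /**
     * Collects up to {@code limit} hit ids from the queue, waiting at most
     * {@code timeoutMillis} for each one. The search may keep running and
     * adding hits to the queue after this method returns.
     */
    public static List<Long> firstHits(BlockingQueue<Long> hits, int limit,
                                       long timeoutMillis) throws InterruptedException {
        List<Long> result = new ArrayList<>();
        while (result.size() < limit) {
            Long id = hits.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            if (id == null) {
                break; // no hit within the timeout; show what we have so far
            }
            result.add(id);
        }
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Long> hits = new LinkedBlockingQueue<>();
        // Hypothetical background search producing 20 hits.
        Thread search = new Thread(() -> {
            for (long id = 1; id <= 20; id++) {
                hits.add(id);
            }
        });
        search.start();
        System.out.println(firstHits(hits, 5, 1000)); // [1, 2, 3, 4, 5]
        search.join();
    }
}
```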

Why start from scratch again?

So why change it? Tons of reasons. All of this was done with plain JDBC and various kinds of data transfer objects, so tight coupling and poor maintainability were serious issues. On the application side it was impossible to sort the results because hits were returned in no particular order, and hence real paging was not possible either. Another problem: how could you search for a substructure and a numeric property at the same time? The solution was that one of the substructure search methods took a Set argument containing the database ids of the structures the search should be performed over. So you first ran an SQL query for the numeric property and fed the resulting ids into the substructure search. That worked, but again, it was not very straightforward. Adding and using such custom properties in the database was rather messy too; it lacked proper transaction support and so forth. All in all, it was nothing to be proud of and certainly not usable in a real production environment. I did, however, learn a lot about the Java 5 concurrency package.
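The Set-argument workaround amounts to intersecting two result sets: first select the ids that match the property condition, then pass them to the structure search so that only those structures are tested. A simplified in-memory sketch; the method names, the molecular-weight property and the String.contains match (a placeholder for a real substructure test) are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.DoublePredicate;

public class RestrictedSearch {

    /** Stand-in for an SQL query such as "SELECT id FROM compound WHERE mol_weight < ?". */
    public static Set<Long> idsByProperty(Map<Long, Double> molWeights, DoublePredicate condition) {
        Set<Long> ids = new TreeSet<>();
        molWeights.forEach((id, mw) -> {
            if (condition.test(mw)) {
                ids.add(id);
            }
        });
        return ids;
    }

    /** Structure search restricted to the given ids; other structures are never tested. */
    public static Set<Long> substructureSearch(Map<Long, String> structures, String query,
                                               Set<Long> restrictTo) {
        Set<Long> hits = new TreeSet<>();
        for (Long id : restrictTo) {
            String structure = structures.get(id);
            if (structure != null && structure.contains(query)) { // placeholder match
                hits.add(id);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<Long, String> structures = new HashMap<>();
        structures.put(1L, "c1ccccc1O");        // phenol
        structures.put(2L, "c1ccccc1CCCCCCCC"); // octylbenzene
        structures.put(3L, "CCO");              // ethanol
        Map<Long, Double> molWeights = new HashMap<>();
        molWeights.put(1L, 94.1);
        molWeights.put(2L, 190.3);
        molWeights.put(3L, 46.1);

        Set<Long> lightIds = idsByProperty(molWeights, mw -> mw < 150); // [1, 3]
        System.out.println(substructureSearch(structures, "c1ccccc1", lightIds)); // [1]
    }
}
```

It works, but the caller has to orchestrate two separate queries by hand, and neither sorting nor paging of the combined result comes for free.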

Component for Substructure Search

I decided that depending on a specific RDBMS is a minor issue compared to the problems outlined above. I already knew about the open-source Bingo cartridge, and luckily the company behind it was developing a version for PostgreSQL. So the choice of this component was easy: PostgreSQL with Bingo, both free and open-source.

Application-side Chemistry toolkit

The framework needs a chemistry toolkit on the application side, especially for input/output, and I again chose the Chemistry Development Kit (CDK).

ORM

While it would be preferable to be independent of the ORM, I wasn’t able to achieve that, though admittedly I did not put much effort into it. MoleculeDatabaseFramework uses JPA 2.0 with Hibernate as its JPA provider. The Hibernate-specific part is the custom SQL dialect I created for accessing Bingo’s structure search functions in JPQL and hence also in QueryDSL. There is no specific reason I chose Hibernate except that I already knew it and it was able to do what I required, so I did not investigate other JPA providers.

Application Framework – Dependency-Injection

Well, I guess this is obvious: I chose Spring. I had heard and read a lot about Spring and always wanted to learn it, and this was my chance. I also did not want the framework to depend on a full-blown Java EE application server.

Data Access Layer – CRUD and Querying

I initially started the project with plain Spring and JPA (Hibernate). Shortly after, during my “research”, I read about Spring Data JPA and its integration with QueryDSL. To quote the Spring Data website:

Spring Data JPA aims to significantly improve the implementation of data access layers by reducing the effort to the amount that’s actually needed. As a developer you write your repository interfaces, including custom finder methods, and Spring will provide the implementation automatically.

To illustrate this, here is a snippet from an example implementation based on my framework:

@Repository
public interface RegistrationCompoundRepository
        extends ChemicalCompoundRepository<RegistrationCompound> {

    List<RegistrationCompound> findByRegNumberStartingWith(String regNumber);

}

RegistrationCompound has a property called regNumber. The interface method above is implemented automatically by Spring Data and returns a List of the RegistrationCompounds whose regNumber starts with the passed-in argument. That’s all you need to write: no SQL and not even a method implementation. Just create the interface and follow Spring Data’s findBy method naming conventions.
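To make the convention concrete: the Spring Data JPA reference maps the StartingWith keyword to a LIKE condition whose parameter is bound with an appended %. The plain-Java sketch below only mimics the resulting filter over an in-memory list so that it runs without a database; the RegistrationCompound class here is a hypothetical stand-in with just a regNumber field, not the framework’s entity.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class DerivedQuerySketch {

    /** Hypothetical stand-in for the RegistrationCompound entity. */
    public static final class RegistrationCompound {
        private final String regNumber;

        public RegistrationCompound(String regNumber) {
            this.regNumber = regNumber;
        }

        public String getRegNumber() {
            return regNumber;
        }
    }

    /**
     * In-memory equivalent of findByRegNumberStartingWith(prefix); Spring Data
     * would instead run a LIKE query with the parameter bound as prefix + "%".
     */
    public static List<RegistrationCompound> findByRegNumberStartingWith(
            List<RegistrationCompound> all, String prefix) {
        return all.stream()
                .filter(c -> c.getRegNumber().startsWith(prefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<RegistrationCompound> all = Arrays.asList(
                new RegistrationCompound("ABC-0001"),
                new RegistrationCompound("ABC-0002"),
                new RegistrationCompound("XYZ-0001"));
        // Prints ABC-0001 and ABC-0002, one per line.
        findByRegNumberStartingWith(all, "ABC")
                .forEach(c -> System.out.println(c.getRegNumber()));
    }
}
```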

A Spring Data repository can also make use of QueryDSL.

Querydsl is a framework which enables the construction of type-safe SQL-like queries for multiple backends including JPA, JDO and SQL in Java.

Example:

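QCustomer customer = QCustomer.customer;       // generated query type
JPAQuery query = new JPAQuery(entityManager);  // QueryDSL 3.x JPA query
List<Customer> result = query.from(customer)
    .where(customer.lastName.like("A%"), customer.active.eq(true))
    .orderBy(customer.lastName.asc(), customer.firstName.desc())
    .list(customer);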

If your Spring Data repository also extends QueryDslPredicateExecutor, like this:

@Repository
@Transactional(propagation = Propagation.MANDATORY)
public interface ChemicalCompoundRepository<T extends ChemicalCompound>
        extends ChemicalStructureSearchRepository<T>, JpaRepository<T, Long>,
        QueryDslPredicateExecutor<T> {
    //...
}

the repository will have additional methods that take a QueryDSL Predicate as input. A Predicate is essentially the WHERE clause of the query, like customer.lastName.like("A%") in the example above. Some methods take additional parameters such as a Pageable, which carries the paging (limit, offset) and sorting information.
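What a Predicate and a Pageable contribute to a query can be mimicked in plain Java: the predicate filters, the sort order orders, and the zero-based page number and page size become an offset and a limit (offset = page × size). The sketch below is an in-memory illustration with hypothetical names; it does not use the Spring Data or QueryDSL types, which translate the same information into SQL WHERE, ORDER BY, LIMIT and OFFSET clauses.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class PagingSketch {

    /** Returns the zero-based page {@code page} of the filtered, sorted items. */
    public static <T> List<T> findAll(List<T> items, Predicate<? super T> where,
                                      Comparator<? super T> orderBy, int page, int size) {
        long offset = (long) page * size; // zero-based page number, like a Spring Data PageRequest
        return items.stream()
                .filter(where)   // WHERE
                .sorted(orderBy) // ORDER BY
                .skip(offset)    // OFFSET
                .limit(size)     // LIMIT
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lastNames = Arrays.asList("Adams", "Baker", "Allen", "Atkins", "Abbott", "Carter");
        // lastName LIKE 'A%', ordered by name, second page (index 1) of size 2
        System.out.println(findAll(lastNames, n -> n.startsWith("A"),
                Comparator.naturalOrder(), 1, 2)); // [Allen, Atkins]
    }
}
```

Doing this in the database rather than in memory is exactly what the old application-side search could not offer, since its hits arrived unsorted from a queue.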

All this means it is trivial to extend the repository my framework provides with your own custom search methods. Using predicates, you can create complex queries that search by chemical substructure and return sorted, paged results at the same time, all with a one-line method declaration:

public Page<T> findByChemicalStructure(String structureData,
            StructureSearchType searchType,
            Pageable pageable, Predicate predicate);

So I hope this got you interested!