Solutions to IT problems

Solutions I found when learning new IT stuff

Machine Learning for Beginners


Introduction

This article is a very brief introduction to machine learning. It does not contain any mathematics or explanations of how machine learning works. In fact, I am a complete novice myself regarding the mathematical background of machine learning. I will show a basic machine learning example using KNIME and the Iris data set. This data set contains measurements of 3 types (classes) of Iris flowers. With this data you can create a model and then determine the type of a flower just by measuring it. Be aware that this data set is very clean and simple. In real-life examples, the data would need to be cleaned before it can be used for machine learning. This data preparation usually accounts for most of the work, something like 80-95%.

I will use an example of supervised machine learning. In supervised machine learning you pass a part of your data, the training set, to a machine learning algorithm which then uses this data to build a model. With this model I can then predict to which class an unknown Iris flower belongs. This is called classification. To test how good the model is, the other part of the data set, the validation set, is passed into a predictor. The predictor then uses the model to determine the class of each flower. Then the predicted and the actual class of each flower are compared. Simply put, the more matches there are, the better the model is.
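Comparing predicted and actual classes boils down to counting matches. KNIME does this for you, but for the curious here is a minimal Java sketch of the idea. The class and method names are made up for illustration and have nothing to do with KNIME itself.

```java
import java.util.Arrays;
import java.util.List;

/** Illustration only: accuracy = matching predictions / all predictions. */
final class AccuracySketch {

    static double accuracy(List<String> actual, List<String> predicted) {
        if (actual.isEmpty() || actual.size() != predicted.size()) {
            throw new IllegalArgumentException("lists must be non-empty and of equal size");
        }
        int matches = 0;
        for (int i = 0; i < actual.size(); i++) {
            if (actual.get(i).equals(predicted.get(i))) {
                matches++;
            }
        }
        return (double) matches / actual.size();
    }

    public static void main(String[] args) {
        // Made-up labels: 3 of the 4 predictions match the actual class.
        List<String> actual = Arrays.asList("setosa", "versicolor", "virginica", "virginica");
        List<String> predicted = Arrays.asList("setosa", "versicolor", "versicolor", "virginica");
        System.out.println(accuracy(actual, predicted));
    }
}
```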

The KNIME workflow created in this article is very simple and can be built in less than 5 minutes if you already know KNIME. If you have never used KNIME before, please take some time to familiarize yourself with the application. Because you are interested in machine learning I assume you are pretty good at working with computers and will quickly understand how KNIME works, either by yourself or by reading a tutorial.

Prerequisites

Please download and install KNIME.

KNIME [naim] is a user-friendly graphical workbench for the entire analysis process: data access, data transformation, initial investigation, powerful predictive analytics, visualisation and reporting. The open integration platform provides over 1000 modules (nodes), including those of the KNIME community and its extensive partner network.

With KNIME you can do data preparation and machine learning in a graphical workbench. No programming skills are required at all. In KNIME you have so-called nodes. A node can read, manipulate, visualize or write data. Nodes can be connected together to build a workflow. A KNIME workflow usually has a reader node that reads in the data, then several data manipulation nodes and finally a node that exports or visualizes the results.

Building the KNIME workflow

  1. Create new workflow

    create KNIME workflow

    A pop-up will appear on which you can give the workflow a name.

  2. Read the Iris data set text file

    Add Reader

    After adding the reader node you will need to configure it. Double-click on it and the configuration dialog will open. In the valid URL field enter the path to the Iris data set file or browse to it. Note that the file is shipped with the KNIME installation and is located in the KNIME directory in the folder IrisDataset.

    configure reader

    After the reader is configured it will turn yellow and is ready for execution. Right-click on the node and select “Execute”.

    execute reader

    If the reader node executed successfully it should turn green and you can see the data in the output port by right-clicking on the node and selecting “File Table”.

    executed reader

  3. Prepare the data

    To tell the machine learning algorithm about all discrete values in the class column, they need to be determined using the Domain Calculator node. Please add it to the workflow and then connect the reader node to it.

    connecting nodes

    The node will be auto-configured. You can directly execute it.

    domain calculator

    In the next step we will partition the data into the training set and the validation set. The training set is used to create the model and the validation set is used to test how good the model is. Add the Partitioning node to the workflow, connect it with the Domain Calculator and then configure it.

    Partitioning configuration

    Stratified Sampling will evenly distribute the different classes across the training and validation sets. The relative amount can be adjusted but this will affect the model. Also, if you leave “Use random seed” unchecked, the training and validation sets will differ with each run and hence so will the accuracy of the model. To keep a fixed training set, please check it. After configuring the node, execute it.
    Partitioning executed

  4. Build and use the model

    We will use a decision tree for building the model. We will use the standard configuration. Add the Decision Tree Learner node to the workflow and connect the top port of the Partitioning node to it. Then execute the learner.

    executed learner

    Now add the Decision Tree Predictor node, connect the lower port of Partitioning node to it. Then connect the model output port of the learner to the predictor.

    executed predictor

  5. Validate the model

    Now we need to check how well our model can predict the correct flower class. To do so, add the Scorer node and connect the predictor node to it. Then configure the Scorer node.

    configure scorer

    Execute the scorer. Then right-click on it and open the accuracy statistics. This will display information on the performance of your model.

    accuracy statistics

    The question is: what is a good model? Simply put, you will want to get a high accuracy. For this model, that is enough. In other, more complex scenarios you might especially need to avoid either false positives or false negatives, meaning that a lower accuracy with no false negatives can be better than a higher accuracy with false negatives. Context matters a lot. An example of this would be an HIV test. False positives in small numbers are not bad because you can just redo the test and almost certainly it will be negative the second time. However, a false negative is unacceptable. You won’t redo the test and, even worse, you will then probably infect someone else.
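To make the point about false negatives a bit more tangible, here is a small Java sketch that compares two imaginary yes/no tests. All numbers are invented for illustration, and the class is my own and not something KNIME provides. Test A has the higher accuracy (0.98) but misses 2 of 10 positive cases, while test B has a lower accuracy (0.90) but misses none.

```java
/** Illustration only: accuracy versus missed positives for a yes/no test. */
final class ConfusionSketch {
    final int truePositives;
    final int falsePositives;
    final int trueNegatives;
    final int falseNegatives;

    ConfusionSketch(int truePositives, int falsePositives,
                    int trueNegatives, int falseNegatives) {
        this.truePositives = truePositives;
        this.falsePositives = falsePositives;
        this.trueNegatives = trueNegatives;
        this.falseNegatives = falseNegatives;
    }

    /** Share of all cases that were classified correctly. */
    double accuracy() {
        int total = truePositives + falsePositives + trueNegatives + falseNegatives;
        return (double) (truePositives + trueNegatives) / total;
    }

    /** Share of the actually positive cases that the test missed. */
    double falseNegativeRate() {
        return (double) falseNegatives / (truePositives + falseNegatives);
    }

    public static void main(String[] args) {
        ConfusionSketch testA = new ConfusionSketch(8, 0, 90, 2);   // invented counts
        ConfusionSketch testB = new ConfusionSketch(10, 10, 80, 0); // invented counts
        System.out.println("A: " + testA.accuracy() + " / " + testA.falseNegativeRate());
        System.out.println("B: " + testB.accuracy() + " / " + testB.falseNegativeRate());
    }
}
```

For something like a screening test you would probably prefer B despite its lower accuracy.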

Download KNIME workflow

After creating the workflow you can now play with the settings in the learner node or change the relative amount in the Partitioning node and see how it affects the accuracy of the model. KNIME also has different algorithms for creating models. You could alternatively try a neural network or a support vector machine. For this data all of the algorithms work pretty well. You can also install the Weka extension for KNIME and get access to tons of machine learning functionality from Weka.
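If you wonder what the Partitioning node does conceptually when you select stratified sampling with a fixed random seed, the following Java sketch shows one way to do it. This is my own simplified take and certainly not KNIME's actual implementation. The class name, the method signature and the rounding rule are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

/** Illustration only: split row indices so that every class keeps the same ratio. */
final class StratifiedSplitSketch {

    /** Returns two lists of row indices: index 0 = training, index 1 = validation. */
    static List<List<Integer>> split(List<String> labels, double trainFraction, long seed) {
        // Group the row indices by class label; TreeMap keeps a stable class order.
        Map<String, List<Integer>> rowsByClass = new TreeMap<>();
        for (int row = 0; row < labels.size(); row++) {
            rowsByClass.computeIfAbsent(labels.get(row), k -> new ArrayList<>()).add(row);
        }
        // A fixed seed makes the shuffle, and therefore the split, reproducible.
        Random random = new Random(seed);
        List<Integer> training = new ArrayList<>();
        List<Integer> validation = new ArrayList<>();
        for (List<Integer> rows : rowsByClass.values()) {
            Collections.shuffle(rows, random);
            int cut = (int) Math.round(rows.size() * trainFraction);
            training.addAll(rows.subList(0, cut));
            validation.addAll(rows.subList(cut, rows.size()));
        }
        return Arrays.asList(training, validation);
    }
}
```

With 50 rows per class and a training fraction of 0.8, every class contributes 40 rows to the training set and 10 rows to the validation set, and the same seed always produces the same split.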

If you have a chemistry background you might also be interested in the KNIME Labs Decision Tree Ensemble extension. The contained Tree Ensemble Learner can use fingerprints (bitvectors) for creating models. This is especially cool because KNIME is chemistry-aware. It can read sd-files and you can generate the fingerprints directly in KNIME by using, for example, the RDKit extension.

I hope this article helped you get started with machine learning. Cheers.

Written by kienerj

May 8, 2014 at 14:44

Posted in Tools


Working with Swings JTable – An Example


Introduction

In my previous article Fast random file access and line by line reading in Java I described how I created a random access file class that has a fast readLine() method. The reason for creating that class was that I was implementing a random access reader for the chemistry file format sdf. Such files are ASCII text files and can contain tens of thousands of records. Each record consists of a variable number of lines. For fast random access I wanted to index the offset where each record begins. With seek(offset) a requested record can then be accessed very quickly. To index the file it must be read line by line to search for the record separator $$$$. Therefore a fast readLine() method was crucial for performance.

After successfully implementing my reader for the sdf format, called sdf-reader, I wanted to create a GUI on top of it. I called it Free SDF Viewer. Records in sd-files contain a chemical structure and optionally associated data. Therefore sdf format is often used to exchange chemical databases and that’s also why it makes a lot of sense to use a table to visualize sd-files.

Requirements

Before diving in I created a short list of the requirements. sd-files can be huge and hence they should not have to be fully loaded into memory. If the user scrolls down in the table the rows above should not be kept in memory but the scrolling should be smooth meaning that some records should be cached.

The first column should display the chemical structure and that chemical structure must be resizable (height and width). This means that the column width and the row height must be user adjustable. Because sd-files can contain thousands of records I also wanted to have a row header containing only the row number (1-based).

    • Usable with large files
    • smooth scrolling
    • user adjustable column width
    • user adjustable row height
    • row header containing the row number (1-based)

I had some additional requirements like a nice and easy to use file chooser but that is not related to working with a JTable. All in all I had the feeling that my list was reasonable and implementation should be easy. But boy, was I wrong. It turned out to be a very bumpy ride.

Accessing the data

My first problem was how to get the data from the file into the JTable. My sdf-reader returns an SdfRecord object that contains all the data. The access is index-based, meaning the first record has index 0 and so forth. It turned out to be rather straightforward. A JTable gets its data from a TableModel. The solution is to create a custom TableModel implementation. The most important method of TableModel is getValueAt(int rowIndex, int columnIndex). In this method you define how the data is retrieved. The source can be anything. A naive implementation of this method for my case looks like this:

@Override
public Object getValueAt(int rowIndex, int columnIndex) {

	SdfRecord record = sdfReader.getRecord(rowIndex);

	if (columnIndex == 0) {
		//this column is always the chemical structure
		// display the chemical structure image
		String molfile = record.getMolfile();
		ImageIcon chemicalStructure = new ChemicalStructureIcon(molfile,indigo,renderer,imageWidth, imageHeight);
		return chemicalStructure;
	} else {
		// display data. currently everything is treated as String
		// including numbers and dates
		String columnName = getColumnName(columnIndex);
		String value = record.getProperty(columnName);
		return value;
	}
}

This is naive because the underlying sd-file will be accessed several times for a single row, once per column, each time reading exactly the same data. In my actual implementation I’m caching 100 records around the current position. So if the user scrolls up or down, the data will be read from the cache. If the user scrolls far enough, some data will be evicted from the cache and new data loaded. This cache logic is omitted for simplicity. For the full source code see the Free SDF Viewer project page.
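To give a rough idea of what such a cache can look like, here is a minimal sketch based on LinkedHashMap. Note that this is not the cache from Free SDF Viewer: it simply evicts the least recently used record instead of keeping a window around the current position, and the class name RecordCacheSketch and the loader callback are invented for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;

/** Illustration only: keeps the most recently used records in memory. */
final class RecordCacheSketch<R> {

    private final IntFunction<R> loader; // reads a record from the file by index
    private final Map<Integer, R> cache;

    RecordCacheSketch(final int capacity, IntFunction<R> loader) {
        this.loader = loader;
        // accessOrder = true makes the map iterate from least to most recently used
        this.cache = new LinkedHashMap<Integer, R>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, R> eldest) {
                // called after every put: drop the least recently used entry
                return size() > capacity;
            }
        };
    }

    R get(int index) {
        R record = cache.get(index);
        if (record == null) {
            // cache miss: go to the file once and remember the result
            record = loader.apply(index);
            cache.put(index, record);
        }
        return record;
    }
}
```

In getValueAt() the call sdfReader.getRecord(rowIndex) would then be replaced by something like cache.get(rowIndex), so that all columns of a row share a single file access.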

You might have noticed that in above code I’m creating an instance of ChemicalStructureIcon. I will discuss this in the next chapter.

Displaying the chemical structure

I thought that displaying an image in a JTable cell would be very straightforward and easy. However that was wrong. During my search I found that one should use the ImageIcon class because JTable can render that by default. This is not entirely true. You actually have to specifically tell JTable to render ImageIcon columns as image. This is done by a custom TableCellRenderer.

private class SdfTableCellRenderer extends DefaultTableCellRenderer {

	@Override
	public void setValue(Object value) {
		if (value instanceof ImageIcon) {
			setIcon((ImageIcon) value);
			setText("");
		} else {
			setIcon(null);
			super.setValue(value);
		}
	}
}

Note that I will later show you that I extended JTable to easily initialize all my added features. That is why this is a private inner class.

After being able to see images in table cells I realized that when changing the column width, the image is not automatically adjusted to be smaller or larger. Note that changing column width is supported by default. After another search session I realized that the only solution was to extend ImageIcon and I therefore created ChemicalStructureIcon. ChemicalStructureIcon uses the Indigo Chemistry Toolkit for rendering chemical structures. The most relevant code shown below is the paintIcon(Component c, Graphics g, int x, int y) method which is called when the image is drawn. If the column width changes, the image is re-rendered automatically.

@Override
public synchronized void paintIcon(Component c, Graphics g, int x, int y) {
	Image image = getImage();
	if (image == null) {
		return;
	}
	Insets insets = ((Container) c).getInsets();
	x = insets.left;
	y = insets.top;

	int w = c.getWidth() - x - insets.right;
	int h = c.getHeight() - y - insets.bottom;

	if (w != width || h != height) {
		if (w < 16 || h < 16) {
			// 16 pixels is minimum size supported by indigo
			return;
		}
		width = w;
		height = h;
		indigo.setOption("render-image-size", w, h);
		image = renderImage();
		setImage(image);
	}

	ImageObserver io = getImageObserver();
	g.drawImage(image, x, y, w, h, io == null ? c : io);
}

You can see the complete source code on the project’s web page.

Adding a row header

JTable has no concept of a row header. But not all hope is lost because JScrollPane does and a JTable usually is inside a JScrollPane. In my solution I use an extra JTable as a row header. That table has a very simple, custom TableModel.

@Override
public Object getValueAt(int rowIndex, int columnIndex) {
	return rowIndex + 1;
}

Complete source code of RowHeaderModel

To actually use this I extended JTable and added the field headerTable. This row header is initialized in the constructor of this custom JTable implementation.

public SdfTable(JScrollPane scrollPane, SdfReader sdfReader, int rowHeight) {
	super();
	//snipped other initialization code
	headerModel = new RowHeaderModel(getRowCount());
	headerTable = new JTable(headerModel);
	headerTable.setRowHeight(getRowHeight());
	headerTable.setShowGrid(false);
	headerTable.setAutoResizeMode(JTable.AUTO_RESIZE_OFF);
        // 60 -> the width of the row header in px
	headerTable.setPreferredScrollableViewportSize(new Dimension(60, 0));
	headerTable.getColumnModel().getColumn(0).setPreferredWidth(60);
	headerTable.getColumnModel().getColumn(0).setCellRenderer(new RowHeaderCellRenderer());
	// synchronize selection by using the same selection model in both tables
	headerTable.setSelectionModel(this.getSelectionModel());
	scrollPane.setRowHeaderView(headerTable);
	setPreferredScrollableViewportSize(getPreferredSize());
}

There is a lot going on here. Also note that both tables use the same SelectionModel. This means that when the user clicks in the row header, that whole row will be selected and when the user clicks on a row, the row header will be selected too.

When the row height changes, the row header must have the same new height. Therefore my JTable implementation overrides the setRowHeight() methods.

@Override
public void setRowHeight(int rowHeight) {
	super.setRowHeight(rowHeight);
	if (headerTable != null) {
		headerTable.setRowHeight(rowHeight);
	}
}

@Override
public void setRowHeight(int row, int rowHeight) {
	super.setRowHeight(row, rowHeight);
	if (headerTable != null) {
		headerTable.setRowHeight(row, rowHeight);
	}
}

Complete source code of SdfTable

Change row height with mouse

One of the requirements was that the user can adjust the row height. This must be possible for individual rows and for all rows at once. First we will look into adjusting a single row’s height.

Change height of single row

To listen for mouse input we need to extend MouseInputAdapter. The idea is to show a resize cursor when the mouse is at the boundary of 2 rows. When the user then presses the left button and drags the mouse, the upper row will be resized relative to the distance traveled by the mouse. This requires some basic math and geometry.

private int getResizingRow(Point p) {
	return getResizingRow(p, table.rowAtPoint(p));
}

private int getResizingRow(Point p, int row) {
	if (row == -1) {
		return -1;
	}
	int col = table.columnAtPoint(p);
	if (col == -1) {
		return -1;
	}
	Rectangle r = table.getCellRect(row, col, true);
	r.grow(0, -3);
	if (r.contains(p)) {
		return -1;
	}

	int midPoint = r.y + r.height / 2;
	int rowIndex = (p.y < midPoint) ? row - 1 : row;

	return rowIndex;
}

@Override
public void mousePressed(MouseEvent e) {
	Point p = e.getPoint();
	resizingRow = getResizingRow(p);
	if (resizingRow >= 0) {
		// only compute the offset when a row boundary was actually hit
		mouseYOffset = p.y - table.getRowHeight(resizingRow);
		table.setRowSelectionAllowed(false);
		table.setAutoscrolls(false);
	}
}

The math is one thing, more problematic was that dragging the mouse could lead to weird behavior on screen with the 2 affected rows flickering as they are constantly being (de)-selected. Also if you drag the mouse upwards to the table header, the table begins to scroll. For these reasons these features are disabled while the resizing occurs.

The row height is then changed according to the distance covered (up or down, Y-Coordinate in Swing) by the mouse when dragging.

@Override
public void mouseDragged(MouseEvent e) {
	table.clearSelection();
	int mouseY = e.getY();

	if (resizingRow >= 0) {
		int newHeight = mouseY - mouseYOffset;
		if (newHeight > 0) {
			table.setRowHeight(resizingRow, newHeight);
		}
	}
}

Complete source code of TableRowResizer

Change row height for all rows

This could be solved easily with a prompt / input where the user types in a number. However, that is not user-friendly at all. The idea is that if the mouse is at the edge between the top row and the table header, a resize cursor should be shown and if the mouse is pressed and dragged, the height of all rows will be changed. This sounds identical to the above solution, however it is not. The table header is an instance of JTableHeader. It has a different cursor than JTable. So depending on the mouse location, either the table’s or the table header’s cursor must be changed into a resize cursor.

private boolean isResizingHeader(MouseEvent e) {
	Point p = e.getPoint();

	Object source = e.getSource();
	JTableHeader header = table.getTableHeader();

	if (source instanceof JTableHeader) {

		int col = table.columnAtPoint(p);
		if (col == -1) {
			return false;
		}

		return ((header.getY() + header.getHeight()) - 5) < p.y;

	} else if (source instanceof JTable) {

		int topRow = getTopRow();
		int row = table.rowAtPoint(p);

		if (row == topRow) {
			int col = table.columnAtPoint(p);
			if (col == -1) {
				return false;
			}
			Rectangle r = table.getCellRect(row, col, true);
			r.grow(0, -5);
			return r.y > p.y;
		}
	}	
	return false;
}

and for changing the cursor

@Override
public void mouseMoved(MouseEvent e) {
	if (e.getSource() instanceof JTable) {
		if (isResizingHeader(e)
				!= (table.getCursor() == resizeCursor)) {
			swapTableCursor();
		}
	} else if (e.getSource() instanceof JTableHeader) {
		if (isResizingHeader(e)
				!= (table.getTableHeader().getCursor() == resizeCursor)) {
			swapHeaderCursor();
		}
	}
}

Another issue is that this resizing should also work when the top row is only partially visible. Also, when changing the row height of all rows, the total height of the table changes dramatically. Without taking precautions this leads to erratic auto scrolling to different records. To prevent that, the current top row is programmatically kept at the top of the viewport.

@Override
public void mouseDragged(MouseEvent e) {
	int mouseY = e.getYOnScreen();
	if (isResizing) {
		int newHeight = table.getRowHeight() + (mouseY - yOffset);
		if (newHeight > 0) {
			yOffset = e.getYOnScreen();
			table.setRowHeight(newHeight);
			JViewport viewport = (JViewport) table.getParent();
			JScrollPane scrollPane = (JScrollPane) viewport.getParent();
			// This rectangle is relative to the table where the
			// northwest corner of cell (0,0) is always (0,0).
			Rectangle rect = table.getCellRect(topRow, 0, true);
			scrollPane.getVerticalScrollBar().setValue(rect.y);
		}
	}
}

This code has one downside: as soon as the user drags the mouse, the top row jumps down fully into view. Apart from that it works very nicely.

See complete source code of AllRowsResizer.

Putting it all together

To put all the features together I extended JTable in my custom class SdfTable. The full constructor for SdfTable can be seen below.

public SdfTable(JScrollPane scrollPane, SdfReader sdfReader, int rowHeight) {
	super();
	DefaultTableCellRenderer r = new SdfTableCellRenderer();
	setDefaultRenderer(Object.class, r);
	TableRowResizer rowResizer = new TableRowResizer(this);
	AllRowsResizer allRowsResizer = new AllRowsResizer(this);

	TableModel tableModel = new SdfTableModel(sdfReader);
	setModel(tableModel);
	super.setRowHeight(rowHeight);
	getColumnModel().getColumn(0).setPreferredWidth(STRUCTURE_COLUMN_WIDTH);

	headerModel = new RowHeaderModel(getRowCount());
	headerTable = new JTable(headerModel);
	headerTable.setRowHeight(getRowHeight());
	headerTable.setShowGrid(false);
	headerTable.setAutoResizeMode(JTable.AUTO_RESIZE_OFF);
	headerTable.setPreferredScrollableViewportSize(new Dimension(60, 0));
	headerTable.getColumnModel().getColumn(0).setPreferredWidth(60);
	headerTable.getColumnModel().getColumn(0).setCellRenderer(new RowHeaderCellRenderer());
	// synchronize selection by using the same selection model in both tables
	headerTable.setSelectionModel(this.getSelectionModel());
	scrollPane.setRowHeaderView(headerTable);
	setPreferredScrollableViewportSize(getPreferredSize());
}

The JScrollPane argument is required for correctly setting the row header.

Complete source code of SdfTable.

Additional Features and Comments

Free SDF Viewer has some additional features not directly related to JTable and there are other issues I encountered that apply to everything in Swing.

Showing a wait cursor when loading sd-file

I created a menu with a single entry “Load SD-files…”. Clicking on it will display a file chooser and then load and initialize an SdfTable object. This code runs on the so-called Event Dispatch Thread (EDT). Changes to UI elements made on this thread do not become visible until the action completes. This means that if you change the cursor to a wait cursor and then load the sd-file on the EDT, the user will never see the wait cursor. The loading must therefore be moved off the EDT into a background thread, while all changes to the UI still happen on the EDT. Swing provides the SwingWorker class for exactly this purpose. The take-away message is that simple things become unexpectedly complex.

First the code executed after clicking on the “Load SD-File…” menu option:

private void loadFileMenuItemActionPerformed(java.awt.event.ActionEvent evt) {

        int returnVal = fileChooser.showOpenDialog(this);
        if (returnVal == JFileChooser.APPROVE_OPTION) {
            File file = fileChooser.getSelectedFile();
            logger.debug("Opening SD-File '{}'.", file.getAbsoluteFile());
            SdfLoader loader = new SdfLoader(this, file);
            loader.execute();
        } else {
            logger.debug("Opening of SD-file cancelled by the user.");
        }
    }

This creates an SdfLoader instance. SdfLoader extends SwingWorker. In the constructor it changes the cursor to a wait cursor, then in the doInBackground() method the sd-file is loaded and the SdfTable is created, and finally in the done() method the table is placed into the scroll pane and the cursor is reverted to the default cursor.

private class SdfLoader extends SwingWorker<JTable, Void> {

	private final JFrame frame;
	private final File sdFile;
	private JTable table;
	private IOException ioException;

	public SdfLoader(JFrame frame, File sdFile) {
		this.frame = frame;
		frame.setCursor(Cursor.getPredefinedCursor(Cursor.WAIT_CURSOR));
		this.sdFile = sdFile;
	}

	@Override
	public JTable doInBackground() {
		try {
			//close old file
			if (sdfReader != null) {
				sdfReader.close();
			}
			sdfReader = new SdfReader(sdFile);
			lastOpenDir = sdFile.getParentFile();
			table = new SdfTable(jScrollPane1, sdfReader, 200);
		} catch (IOException ex) {
			logger.catching(ex);
			ioException = ex;
			table = jTable1;
		}
		return table;
	}

	@Override
	public void done() {
		// always restore the cursor, also when loading failed
		frame.setCursor(Cursor.getDefaultCursor());
		if (ioException == null) {
			jTable1 = table;
			jScrollPane1.setViewportView(table);
		} else {
			JOptionPane.showMessageDialog(SdfViewer.this,
					ioException.getMessage(),
					"Error opening file",
					JOptionPane.ERROR_MESSAGE);
		}
	}
}

Remembering Settings

One of the additional features is remembering the last directory an sd-file was opened from. This information is written into a properties file and loaded at start-up. Future versions might make additional use of this to store other settings like rendering options for the chemical structure. I’m mentioning this so that you are not confused by unexplained code in the main class SdfViewer.
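Storing such a setting with java.util.Properties takes only a few lines. The sketch below shows the general idea and is not copied from SdfViewer. The class name, the key name lastOpenDir and the fallback handling are assumptions made for this example.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.util.Properties;

/** Illustration only: remember the last used directory in a properties file. */
final class LastDirSettingsSketch {

    private static final String KEY = "lastOpenDir"; // hypothetical key name

    /** Writes the directory to the settings file, e.g. after a file was opened. */
    static void save(File settingsFile, File lastOpenDir) throws IOException {
        Properties props = new Properties();
        props.setProperty(KEY, lastOpenDir.getAbsolutePath());
        try (OutputStream out = Files.newOutputStream(settingsFile.toPath())) {
            props.store(out, "viewer settings");
        }
    }

    /** Reads the directory at start-up; returns the fallback if nothing is stored. */
    static File load(File settingsFile, File fallback) throws IOException {
        if (!settingsFile.isFile()) {
            return fallback;
        }
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(settingsFile.toPath())) {
            props.load(in);
        }
        String path = props.getProperty(KEY);
        return path == null ? fallback : new File(path);
    }
}
```

The loaded directory can then be passed to JFileChooser.setCurrentDirectory(File) before the dialog is shown.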

Screenshots

First a screenshot showing a row with a different row height.

The second screen shot shows the SdfTableModel in action. The user currently is at row 139045 in a large sd-file and there are no performance issues on a standard laptop.

The full project Free SDF Viewer is available on bitbucket. There is also a download for an executable jar file in the project’s downloads section.

Written by kienerj

January 17, 2014 at 10:00

Posted in Chemistry, Java, Programming


Fast random file access and line by line reading in Java


Introduction

I recently read a message on a mailing list of an open-source project where the user was comparing it to a commercial product and claiming that the commercial product was a lot faster. The specific topic was randomly accessing a large file containing multi-line records. The user said that when initially opening the file it took minutes to create the index needed to later access the desired records quickly. The issue was that the file only contained 50k records, whereas the real deal would be around 1 million records. The commercial product opens the 1-million-record file instantly. Questions like that always trigger my curiosity, also because I have to admit that my knowledge of Java IO is very limited: it amounts to reading text files line by line with BufferedReader. So I set out on a quest into the java.io API.

The Open-Source code

The commercial product probably uses multi-threading to create the index in the background. Another possibility is that it does not in fact allow true random access, just scrolling up and down, and does some read-ahead caching. Hard to tell without access to it. So I looked at the code of the open-source project and it turns out that it uses java.io.RandomAccessFile. This makes sense. I did not fully get the indexing method; it seemed more complex than it needed to be, but it read the file line by line using java.io.RandomAccessFile. Basically each record is separated by a delimiter that appears on a separate line. So mapping each line (or its position) seems a fast and reasonable way to index the file. Or so I thought.

Dark-Side of the JDK

The java.io.RandomAccessFile.readLine() method was supposedly, and I quote,

written by a first semester CS student that dropped out. It can hardly perform any worse and it performs two orders of magnitude slower than it could.

And anyone can check and confirm this…even in the newest JDK 7. It reads a file byte by byte with no buffer. So the conclusion was to look elsewhere. However BufferedReader or other alternatives do not offer random access or a way to get the offset in bytes from the start of the file. And java.nio remained a mystery even after consulting my best friend Google. So I was kind of lost.

Roll-your-own

Or shall I say learning by doing? I set out to create my own indexing method using BufferedReader and mapping line numbers. Going to a specific record then requires a certain number of readLine() calls without caring about the returned data. This was already a lot faster than the infamous java.io.RandomAccessFile.readLine() way of doing it. However, I was not satisfied because it was still too slow and, let’s be honest, kind of an ugly way to do it. As a next step I tried to read the file in a buffered way using the java.io.RandomAccessFile.read(byte[]) method. I converted the buffer to a String and then searched for the delimiter and mapped its offset in bytes from the start of the file. With the java.io.RandomAccessFile.seek(long) method that position can then be quickly accessed, randomly. This took some tinkering till I got it right but to my surprise this was still not very fast, in fact it was hardly faster than the previous ugly BufferedReader method. This left me puzzled. Actually I’m still puzzled, even after finding the actual solution, why this was over 10 times slower. I guess at some critical places using “convenience” classes like String and ArrayList over shuffling around array indexes has a very high price.
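For illustration, here is a sketch of how such chunk-wise indexing can be done directly on the bytes, without converting each buffer to a String. This is not the code I wrote back then and I have not benchmarked it. The class name and the buffer size parameter are made up, and it assumes that every record ends with a line containing only the delimiter $$$$, as in sd-files.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

/** Illustration only: index record start offsets by scanning raw bytes. */
final class RecordIndexSketch {

    /** Returns the byte offset of the first line of every "$$$$"-terminated record. */
    static List<Long> indexRecords(RandomAccessFile file, int bufferSize) throws IOException {
        List<Long> offsets = new ArrayList<>();
        byte[] buffer = new byte[bufferSize];
        long position = 0;          // absolute offset of the byte being inspected
        long recordStart = 0;       // offset where the current record began
        int dollars = 0;            // number of '$' seen on the current line
        boolean onlyDollars = true; // false once the line contains anything else

        file.seek(0);
        int read;
        while ((read = file.read(buffer)) != -1) {
            // The line state lives outside this loop, so a delimiter that is
            // split across two chunks is still recognized.
            for (int i = 0; i < read; i++, position++) {
                byte b = buffer[i];
                if (b == '\n') {
                    if (onlyDollars && dollars == 4) {
                        offsets.add(recordStart);
                        recordStart = position + 1; // next record starts after the newline
                    }
                    dollars = 0;
                    onlyDollars = true;
                } else if (b == '$') {
                    dollars++;
                } else if (b != '\r') { // '\r' is ignored so Windows line endings work
                    onlyDollars = false;
                }
            }
        }
        // The last delimiter line may not be followed by a newline.
        if (onlyDollars && dollars == 4) {
            offsets.add(recordStart);
        }
        return offsets;
    }
}
```

A record is then reached with file.seek(offsets.get(n)), which is the whole point of building the index.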

The Solution

Instead of rolling my own indexing method I decided to create a RandomAccessFile wrapper that has a usable readLine() method. Looking back, the solution is obvious. Basically I just copied and pasted the BufferedReader.readLine() method and made some minor adjustments. These adjustments are for tracking the position (or offset or file pointer), and for setting it to the correct position and invalidating the buffer used for readLine() if, say, a write method is called. And it works! So I now have a way of fast random access and fast line by line reading in one single Java class called OptimizedRandomAccessFile. This indexing now is pretty much 100 times faster. Wow. One would have thought that this is a simple task. Way to go ex-Sun and Oracle!
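To show the principle, here is a heavily simplified, read-only sketch of such a wrapper. It is not the actual OptimizedRandomAccessFile code: it always discards the buffer on seek(), it drops every '\r' and it decodes lines as ASCII. The class name is made up. The important part is that the logical file pointer is derived from the buffer position and not from the underlying file.

```java
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

/** Illustration only: RandomAccessFile with a buffered readLine(). */
final class BufferedLineFileSketch implements Closeable {

    private final RandomAccessFile file;
    private final byte[] buffer = new byte[8192];
    private long bufferStart = 0; // file offset of buffer[0]
    private int bufferLength = 0; // number of valid bytes in the buffer
    private int bufferPos = 0;    // index of the next unread byte

    BufferedLineFileSketch(File path) throws IOException {
        this.file = new RandomAccessFile(path, "r");
    }

    /** Logical position: offset of the next byte readLine() will consume. */
    long getFilePointer() {
        return bufferStart + bufferPos;
    }

    /** Moves to the given offset and invalidates the buffer. */
    void seek(long offset) throws IOException {
        file.seek(offset);
        bufferStart = offset;
        bufferLength = 0;
        bufferPos = 0;
    }

    /** Reads the next line without its line terminator, or null at end of file. */
    String readLine() throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        boolean readSomething = false;
        while (true) {
            if (bufferPos >= bufferLength && !fill()) {
                // end of file: return what was collected, or null if nothing was read
                return readSomething ? decode(line) : null;
            }
            readSomething = true;
            byte b = buffer[bufferPos++];
            if (b == '\n') {
                return decode(line);
            }
            if (b != '\r') { // simplification: carriage returns are dropped
                line.write(b);
            }
        }
    }

    /** Refills the buffer from the underlying file; false at end of file. */
    private boolean fill() throws IOException {
        bufferStart = file.getFilePointer();
        bufferPos = 0;
        int n = file.read(buffer);
        bufferLength = Math.max(n, 0);
        return n > 0;
    }

    private static String decode(ByteArrayOutputStream line) {
        return new String(line.toByteArray(), StandardCharsets.US_ASCII);
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

While indexing you would call getFilePointer() before each readLine() to remember where a line starts, and later seek() to that offset to jump straight to the record.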

Written by kienerj

September 23, 2013 at 21:01

Posted in Java, Programming


Creating a Framework for Chemical Structure Search – Part 9


Series Overview

This is Part 9 – Putting it all together of the “Creating a Framework for Chemical Structure Search“-Series.


Introduction

In this final post I’m going to show you a basic Spring MVC 3 Web Application I made based on MoleculeDatabaseFramework.

Functionality

This Web Application MDFSimpleWebApp lets you

  • import ChemicalCompounds from an SD-File
  • do a chemical substructure search for compounds
  • view the search hits in a paged, tabular fashion
  • view individual search hits
  • download all search hits as SD-File

This is of course only a subset of all the features offered by MoleculeDatabaseFramework but it gives you a general idea how the framework works in terms of writing code and performance.

Entity

MDFSimpleWebApp contains 1 ChemicalCompound implementation called SimpleCompound. It is the most basic possible implementation of ChemicalCompound with no additional properties.

There is also the entity SimpleLot which extends Containable. However, it is not yet used within the application.

Repository and Service

MoleculeDatabaseFramework requires that you create a repository interface, a repository implementation (for chemical structure searching), a service interface and a service implementation for each of your entities. Hence I created a SimpleCompoundRepository, SimpleCompoundRepositoryImpl, SimpleCompoundService and SimpleCompoundServiceImpl. These classes offer no custom search methods. They just implement all the methods required by the framework. See the Repository- and Service Packages.

SimpleCompoundController

This is the controller for SimpleCompound. The controller takes web requests and passes them on to the service layer, in this case SimpleCompoundService, an implementation of ChemicalCompoundService. The controller exposes certain methods from the service, like importing SD-Files, chemical substructure searching or image rendering of chemical structures.

Rendering Images of chemical structures

For displaying chemical compounds I chose the option to dynamically generate images of all chemical structures in the compound. This functionality is also provided by MoleculeDatabaseFramework. Hence the corresponding controller method is very simple:

@RequestMapping(value = "/{compoundId}/render", method = RequestMethod.GET)
public void renderCompound(@PathVariable Long compoundId,
		final HttpServletResponse response,
		@RequestParam(defaultValue = "500") int width,
		@RequestParam(defaultValue = "150") int height) throws IOException {
	try (ServletOutputStream out = response.getOutputStream()) {
		IAtomContainer mol = compoundService.getCdkMolecule(compoundId);
		MoleculeRenderer renderer = new MoleculeRenderer(width, height);
		renderer.renderMolecule(mol, out);
	}
}

In a web page you then just need to add the corresponding image tag.

JSP with JSTL:

<img src="<c:url value="/compound/${compound.getId()}/render?width=500&height=300"/>" />

Or generated in JavaScript:

var html = '<img alt="' + smiles + '" src="/MDFSimpleWebApp/compound/' + compoundId + '/render" />';
// insert image into existing html element

As an example, here is an image of the web page for viewing a compound:

rendering example

Importing SD-File

For uploading a file using Spring MVC 3 I followed this tutorial. I had to create the very simple class FileUploadForm and the controller method is rather simple too:

@RequestMapping(value = "/import", method = RequestMethod.POST)
public String importCompounds(Model model, FileUploadForm fileUploadForm,
		BindingResult result)
		throws IOException {

	if (result.hasErrors()) {
		model.addAttribute("hasError", true);
		model.addAttribute("bindingResult", result);
		model.addAttribute(fileUploadForm);
		return "importCompounds";
	}
	Reader reader = new InputStreamReader(fileUploadForm.getFileData().getInputStream(), "US-ASCII");
	EntityImportResult importResult = compoundService.importSDF(reader, true);

	model.addAttribute("hasError", false);
	model.addAttribute("imported", importResult.getImportedEntities().size());
	model.addAttribute("present", importResult.getEntitiesAlreadyInDatabase().size());
	model.addAttribute(new FileUploadForm());

	return "importCompounds";
}

Chemical Structure Search

Search Form

The Chemical Structure Search is made up of a page that contains a tool for drawing chemical structures and submitting the search and the actual page for displaying search results. For drawing chemical structures MDFSimpleWebApp initially used the JChemPaint Applet but I recently changed it to JSME, a JavaScript based drawing tool. See below the search form with JSME:

Chemical Structure Search Form

Search Result Page

The search results page relies heavily on AJAX using jQuery and the jQuery plugin datatables. The search hits are displayed in a paged fashion using datatables server-side processing, and hence only 1 page of results is fetched from the database at a time. The results table contains an image of the chemical structure, the compound’s name and its CAS number. Clicking on the image shows a JavaScript alert containing the SMILES string of the given chemical structure.

Search Results Page

For each new page an AJAX request is sent to the server and the corresponding page is returned. Note that the initial load of the page can take a bit longer, because the total number of hits has to be determined (i.e. the query runs without an SQL LIMIT clause). This count is cached so that subsequent page requests are fast. However, due to how OFFSET and LIMIT work, the higher the page number, the longer the search takes. So if you have a high number of hits (e.g. several thousand), the last page will load slower than the first one. If you want to display search hits 10’000 to 10’004, the database will search up to hit number 10’004 and then return the last 5 hits. In general, however, you should refine your search if you get that many hits.
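
The paging arithmetic can be sketched in a few lines of plain Java. This is only an illustration: the class DataTablesPaging and its method names are made up for this example and are not part of MoleculeDatabaseFramework or datatables. It assumes that the row offset sent by datatables (iDisplayStart) is zero-based and that page indexes are zero-based as well. The rowsVisited() method is a deliberately rough model of why late pages are slower with OFFSET and LIMIT.

```java
/** Illustrative helper; the class and method names are hypothetical. */
final class DataTablesPaging {

    /** Converts a zero-based row offset into a zero-based page index. */
    static int pageIndex(int displayStart, int displayLength) {
        if (displayLength <= 0) {
            throw new IllegalArgumentException("displayLength must be positive");
        }
        return displayStart / displayLength;
    }

    /**
     * Rough model of the work behind OFFSET/LIMIT: the database has to walk
     * past all skipped rows before it can return the requested page.
     */
    static long rowsVisited(int pageIndex, int pageSize) {
        long offset = (long) pageIndex * pageSize;
        return offset + pageSize;
    }

    public static void main(String[] args) {
        int page = pageIndex(10000, 5);
        System.out.println("page index: " + page);
        System.out.println("rows visited: " + rowsVisited(page, 5));
    }
}
```

The further back the requested page lies, the larger the number returned by rowsVisited(), even though the page itself always contains the same number of rows.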

After the page is returned from the server, the data must be converted to JSON in the format expected by datatables. To achieve that I created the helper class JQueryDatatablesPage, which contains all the properties that datatables requires and the corresponding getters and setters. JQueryDatatablesPage is then converted to JSON using the Jackson 2 ObjectMapper.

@RequestMapping(value = "/search", method = RequestMethod.GET, produces = "application/json")
public @ResponseBody
String search(
		@RequestParam int iDisplayStart,
		@RequestParam int iDisplayLength,
		@RequestParam int sEcho, // for datatables draw count
		@RequestParam String structure) throws IOException {

	int pageNumber = iDisplayStart / iDisplayLength; // zero-based page index from the zero-based row offset
	PageRequest pageable = new PageRequest(pageNumber, iDisplayLength);
	Page<SimpleCompound> page = compoundService.findByChemicalStructure(structure, StructureSearchType.SUBSTRUCTURE, pageable);
	int iTotalRecords = (int) compoundService.count(null);
	int iTotalDisplayRecords = (int) page.getTotalElements();
	JQueryDatatablesPage<SimpleCompound> dtPage = new JQueryDatatablesPage<>(
			page.getContent(), iTotalRecords, iTotalDisplayRecords,
			Integer.toString(sEcho));

	String result = toJson(dtPage);
	return result;

}

private String toJson(JQueryDatatablesPage<?> dt) throws IOException {
	ObjectMapper mapper = new ObjectMapper();
	mapper.registerModule(new Hibernate4Module());
	return mapper.writeValueAsString(dt);
}

Jackson 2 can deal with circular references if your entities are annotated with

@JsonIdentityInfo(generator=ObjectIdGenerators.IntSequenceGenerator.class, property="@id")

You also need to register the Hibernate4Module to deal with Lazy Collections!

Download of Search Hits

You can download the search hits as an SD-File by clicking on the Download Hits link on the search results page. The browser will display a dialog asking where you want to save the file. This uses the exportSDF() method of SimpleCompoundService.

@RequestMapping(value = "/downloadHits", method = RequestMethod.GET)
public void downloadHits(@RequestParam String structure, HttpServletResponse response) throws IOException {

	List<Long> ids = compoundService.findByChemicalStructure(structure, StructureSearchType.SUBSTRUCTURE);
	HashSet<String> properties = new HashSet<>();
	properties.add("compoundName");
	response.setContentType("chemical/x-mdl-sdfile");
	String disposition = "attachment; fileName=searchHits-" + structure + ".sdf";
	response.setHeader("Content-Disposition", disposition);
	try (OutputStreamWriter writer = new OutputStreamWriter(response.getOutputStream())) {
		// closing the writer flushes any buffered output to the response
		compoundService.exportSDF(ids, writer, properties);
	}
}

Final Words

See below a demo video of a chemical substructure search in MDFSimpleWebApp with a database of 65’000 compounds. The demo runs on a dual-core mobile i5 running Windows 7 32-bit with 4 GB of RAM installed. In other words, the hardware is pretty mediocre.

MDFSimpleWebApp is hosted on Bitbucket. If you want to try out this application you can go to the download section on Bitbucket and download a fully working standalone version for Windows 64-bit including PostgreSQL, the Bingo Cartridge for chemical structure searching, Tomcat as servlet container and this web application. Note: This file is 105 MB because PostgreSQL and Tomcat are included.

Written by kienerj

June 6, 2013 at 12:41

Spring MVC 3 Tutorial for beginners


Introduction

I was trying to create a very simple web application using Spring MVC as a usage example for MoleculeDatabaseFramework. Now Spring MVC is obviously overkill for this, but it was a good chance to learn it. However, this endeavor turned out to be much more complicated than I had anticipated. It is also very hard to find actual tutorials and information on how to use Spring MVC 3. A lot of material is from older versions, and often it is not mentioned which version a code or configuration snippet is for.

Starting Point

I created a Maven web application using the NetBeans IDE. I added MoleculeDatabaseFramework-1.0.0-SNAPSHOT as a dependency. To use the same configuration as the integration tests of MoleculeDatabaseFramework, I had to add further dependencies, as they are in the test scope and not included in MoleculeDatabaseFramework-1.0.0. I also added the dependencies spring-web and spring-webmvc; both are required!

Dependency Issues

The MoleculeDatabaseFramework project has a dependency on spring-context-3.1.4. In the web application I chose spring-web-3.2.2 and spring-webmvc-3.2.2, which replaced spring-context-3.1.4 with spring-context-3.2.2. The problem is that spring-context-3.2.2 contains fewer packages than spring-context-3.1.4, and hence I kept getting an exception that the EhCacheCacheManager class was missing. After wasting over an hour on this I figured out that since spring-3.2.x this class is in the separate dependency spring-context-support-3.2.2. That solved the issue. Here are the corresponding Maven dependencies:

<dependency>
	<groupId>org.springframework</groupId>
	<artifactId>spring-web</artifactId>
	<version>3.2.2.RELEASE</version>
</dependency>
<dependency>
	<groupId>org.springframework</groupId>
	<artifactId>spring-webmvc</artifactId>
	<version>3.2.2.RELEASE</version>
</dependency>
<dependency>
	<groupId>org.springframework</groupId>
	<artifactId>spring-context-support</artifactId>
	<version>3.2.2.RELEASE</version>
</dependency>

Note: this is only required if you are using caching with ehcache in your application.

A second problem was that the class javax.servlet.jsp.jstl.core.Config was missing. This was weird because I could clearly see in NetBeans that it was provided by javaee-web-api-6.0, which NetBeans added to the project automatically at creation. However, it is in the provided scope and hence not included in the war file. To resolve this you need to manually add the dependency

<dependency>
	<groupId>javax.servlet</groupId>
	<artifactId>jstl</artifactId>
	<version>1.2</version>
</dependency>

Putting it all together, here are all dependencies for the project:

    <dependencies>
        <dependency>
            <groupId>${project.groupId}</groupId>
            <artifactId>MoleculeDatabaseFramework</artifactId>
            <version>${project.version}</version>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <version>0.11.8</version>
        </dependency>
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-web</artifactId>
            <version>3.2.2.RELEASE</version>
        </dependency>
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-webmvc</artifactId>
            <version>3.2.2.RELEASE</version>
        </dependency>
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-context-support</artifactId>
            <version>3.2.2.RELEASE</version>
        </dependency>
        <dependency>
            <groupId>net.sf.ehcache</groupId>
            <artifactId>ehcache</artifactId>
            <version>2.6.5</version>
            <type>pom</type>
        </dependency>
        <dependency>
            <groupId>org.hibernate</groupId>
            <artifactId>hibernate-ehcache</artifactId>
            <version>4.2.1.Final</version>
        </dependency>
        <dependency>
            <groupId>javax</groupId>
            <artifactId>javaee-web-api</artifactId>
            <version>6.0</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>javax.servlet</groupId>
            <artifactId>jstl</artifactId>
            <version>1.2</version>
        </dependency>
        <dependency>
            <groupId>com.jolbox</groupId>
            <artifactId>bonecp-spring</artifactId>
            <version>0.8.0-rc2-SNAPSHOT</version>
        </dependency>
    </dependencies>

Configuration issues

Configuration File Locations

web.xml and the Spring configuration file(s) go into the myApp/src/main/webapp/WEB-INF directory. In NetBeans this directory is displayed as myApp/Web Pages/WEB-INF (but the actual path is the former one). If you have a properties file that is used by the Spring context, for example for the database configuration, you need to put it in myApp/src/main/resources; it can then be found with the path classpath:Application.properties. The same is true for the ehcache.xml configuration file. If you use Hibernate’s import.sql file to load data on application start, that belongs there too.

Configuration Files

I created 2 Spring configuration files, ApplicationContext.xml and mvc-config.xml. The latter is used to configure Spring MVC, and ApplicationContext.xml is used to configure MoleculeDatabaseFramework and requires this entry:

<import resource="mvc-config.xml" />

web.xml

You also need to create a web.xml file in WEB-INF. web.xml contains a reference to the DispatcherServlet that accepts all requests:

<servlet>
	<servlet-name>myApp</servlet-name>
	<servlet-class>
		org.springframework.web.servlet.DispatcherServlet
	</servlet-class>
	<init-param>
		<param-name>contextConfigLocation</param-name>
		<param-value>
			/WEB-INF/ApplicationContext.xml
		</param-value>
	</init-param>
	<load-on-startup>1</load-on-startup>
</servlet>

Note that if you do not specify the contextConfigLocation, the application context file must be named <servlet-name>-servlet.xml, in this case myApp-servlet.xml. load-on-startup must be 1 so that the Spring context loads on application start.

You also need to specify which URLs you want to map:

<servlet-mapping>
	<servlet-name>myApp</servlet-name>
	<url-pattern>/*</url-pattern>
</servlet-mapping>

This means we want to map all requests to myApp.

URL Mapping Issue

After the application actually loaded and my controller was found and accessed correctly, I kept getting an error that my views could not be found. Even weirder, the path to the view displayed in the log was correct!

The solution can be found in this stackoverflow question. However it is not the accepted answer but the one from sourcedelica. You need to add an additional mapping to your web.xml.

<servlet-mapping>
	<servlet-name>jsp</servlet-name>
	<url-pattern>/WEB-INF/jsp/*</url-pattern>
</servlet-mapping>

where the url-pattern is the path to the directory of all your web pages. This leads to the following complete web.xml:

<?xml version="1.0" encoding="UTF-8"?>
<web-app xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xmlns="http://java.sun.com/xml/ns/javaee"
         xmlns:web="http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd"
         xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd"
         id="WebApp_ID" version="2.5">
    <display-name>MDF Simple Web Application</display-name>
    <welcome-file-list>
        <welcome-file>index.jsp</welcome-file>
    </welcome-file-list>

    <servlet>
        <servlet-name>MDFSimpleWebApp</servlet-name>
        <servlet-class>
            org.springframework.web.servlet.DispatcherServlet
        </servlet-class>
        <init-param>
            <param-name>contextConfigLocation</param-name>
            <param-value>
                /WEB-INF/ApplicationContext.xml
            </param-value>
        </init-param>
        <load-on-startup>1</load-on-startup>
    </servlet>
    <servlet-mapping>
        <servlet-name>MDFSimpleWebApp</servlet-name>
        <url-pattern>/*</url-pattern>
    </servlet-mapping>
    <servlet-mapping>
        <servlet-name>jsp</servlet-name>
        <url-pattern>/WEB-INF/jsp/*</url-pattern>
    </servlet-mapping>
</web-app>

mvc-config.xml

I chose an annotation based approach for controllers. But all in all the mvc configuration file did not pose any troubles:

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:p="http://www.springframework.org/schema/p"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:mvc="http://www.springframework.org/schema/mvc"

       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.2.xsd
          http://www.springframework.org/schema/mvc http://www.springframework.org/schema/mvc/spring-mvc-3.2.xsd">

    <mvc:annotation-driven />
    <mvc:view-controller path="/" view-name="index"/>
    <bean class="org.springframework.web.servlet.view.InternalResourceViewResolver">
        <property name="viewClass" value="org.springframework.web.servlet.view.JstlView"/>
        <property name="prefix" value="/WEB-INF/jsp/"/>
        <property name="suffix" value=".jsp"/>
    </bean>

</beans>

ApplicationContext.xml

This file is huge and configures MoleculeDatabaseFramework including spring-data, Hibernate, connection pooling and so forth. It is not really relevant to Spring MVC and hence I’m intentionally omitting it to avoid any confusion. However, it must contain at least

<import resource="mvc-config.xml" />

so that the mvc configuration is loaded.

Controller

I created 1 very simple controller and I’m just going to post the source code and explain it:

@Controller
@RequestMapping(value = "/compound")
public class SimpleCompoundController {

    @Autowired
    private SimpleCompoundService compoundService;

    @RequestMapping(value = "/{compoundId}", method = RequestMethod.GET)
    public String getSimpleCompound(@PathVariable Long compoundId, Model model) {
        SimpleCompound compound = compoundService.getById(compoundId);
        model.addAttribute("compound", compound);
        return "compound";
    }
}

@Controller marks the class as controller and hence spring can detect it using component scanning.

@RequestMapping(value = "/compound") on the class level tells Spring that this controller handles all requests to that URL. Note that it is relative to the URL mapping of the DispatcherServlet in web.xml. In this case this would mean myApp/compound.

@Autowired injects a service loaded in ApplicationContext.xml. This service loads data from the database.

@RequestMapping(value = "/{compoundId}", method = RequestMethod.GET) tells Spring that the annotated method is responsible for handling all requests to myApp/compound/{compoundId}. So the value is relative to the mapping of the controller class (if specified). @PathVariable will then take {compoundId} from the URL and pass it into the method, where it is used to fetch the compound with the given id from the database.

Then the compound is added to the model and we return the name of the view to render. This return value is handled by the InternalResourceViewResolver configured in mvc-config.xml. The resolver prepends the path to the views directory and appends the specified file ending (.jsp). The actual request will then be for myApp/WEB-INF/jsp/compound.jsp and that JSP page will be rendered.
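
The name resolution itself boils down to a string concatenation. The following plain Java sketch mirrors only that one step; the class ViewNameSketch is hypothetical and stands in for what the resolver does with its prefix and suffix properties. The real InternalResourceViewResolver does considerably more, such as creating the view object and forwarding the request.

```java
/** Hypothetical stand-in that mimics only the prefix/suffix handling. */
final class ViewNameSketch {

    private final String prefix;
    private final String suffix;

    ViewNameSketch(String prefix, String suffix) {
        this.prefix = prefix;
        this.suffix = suffix;
    }

    /** Builds the resource path for a logical view name. */
    String resolve(String viewName) {
        return prefix + viewName + suffix;
    }

    public static void main(String[] args) {
        // same prefix and suffix as in mvc-config.xml
        ViewNameSketch resolver = new ViewNameSketch("/WEB-INF/jsp/", ".jsp");
        System.out.println(resolver.resolve("compound"));
    }
}
```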

View

The view is a very simple jsp page. It shows how you can then access the Model in your web page using expressions.

<%@page language="java" contentType="text/html" pageEncoding="UTF-8"%>
<%@ taglib uri="http://java.sun.com/jsp/jstl/core" prefix="c" %>
<%@ taglib uri="http://java.sun.com/jsp/jstl/fmt" prefix="fmt" %>
<%@ taglib prefix="s" uri="http://www.springframework.org/tags" %>
<!DOCTYPE html>

<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
        <title>Compound ${compound.getId()}</title>
    </head>
    <body>
        <div><label>Compound Name:</label><label>${compound.getCompoundName()}</label></div>
    </body>
</html>

Written by kienerj

May 24, 2013 at 12:10

Posted in Java, Programming


Creating a Framework for Chemical Structure Search – Part 8


Series Overview

This is Part 8 – Spring Security Integration of the “Creating a Framework for Chemical Structure Search“-Series.

Introduction

MoleculeDatabaseFramework is integrated with Spring-Security. It offers optional method level security in the service layer. This article will explain how it works and how to configure your application to use Spring-Security.

Annotation-Based

MoleculeDatabaseFramework has been integrated with Spring-Security using annotations. This means that as long as you do not enable security in your application context, everything will work just fine without any security. Security is applied to methods in the service interfaces.

The security integration allows you to limit a user to certain types of entities, e.g. one user can only read SimpleCompound while his supervisor also has access to SecretCompound. You can also assign roles that allow a user to update and/or delete compounds he created himself, and of course roles that allow a user to update or delete any compound of a given implementation.

The security integration uses the @PreAuthorize annotation. The value of this annotation is written in SpEL, the Spring Expression Language. MoleculeDatabaseFramework uses the expressions hasRole and hasPermission. hasRole directly checks whether the current user has the corresponding role (called authority in Spring Security) and if so grants access to the method; otherwise an AccessDeniedException is thrown. hasPermission requires an implementation of org.springframework.security.access.PermissionEvaluator, which declares 2 overloaded hasPermission methods that return a boolean. This is used on the save(entity) methods to determine if the given entity can be created or updated by the current user.

See the Spring Security Method Expression Documentation for more information on this topic.

Conventions

To make use of security you need to follow certain conventions. There are 6 basic “role-types”: create, read, update, delete, update_created and delete_created. A complete role is a “role-type” followed by an underscore and the simple class name of the entity implementation. So if you have an entity RegistrationCompound and you want a user to be able to read RegistrationCompounds, he needs the role read_RegistrationCompound. And so forth.

  • A role starts with a “role-type” followed by “_” and the simple class name: read_RegistrationCompound
  • A user can either read none or all entries for a given entity implementation.
  • A service interface method that requires read-role must be annotated with @PreAuthorize("hasRole(#root.this.getReadRole())")

The above applies to ChemicalCompound, Containable and ChemicalCompoundContainer. For ChemicalStructure there is only ever the supplied entity ChemicalStructure, which a user of the framework should not extend. ChemicalStructure only has a save role (instead of a save permission), and any user that can create or update any type of ChemicalCompound must have the role save_ChemicalStructure. Or in code:


public interface ChemicalCompoundService<T extends ChemicalCompound>
		extends Service<T> {

	//...snipped...
		
	@Transactional(readOnly = false)
	@PreAuthorize("hasPermission(#compound, 'save_' + #root.this.getCompoundClassSimpleName())")
	@Override
	T save(T compound);
	
	//...snipped...
}

public interface ChemicalStructureService<T extends ChemicalStructure>
		extends Service<T> {	
		
	//...snipped...
	
	@Transactional(readOnly = false)
	@PreAuthorize("hasRole('save_ChemicalStructure')")
	@Override
	T save(T structure);
	
	//...snipped...
}

  • For managing Users and Roles you need to use the supplied entities User and Role and their services.

User implements UserDetails from Spring Security and UserService extends UserDetailsService. To create, update or delete a User or a Role you need the role manage_User or manage_Role respectively.
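
The naming convention from this section can be sketched in plain Java. The class RoleNames and the empty RegistrationCompound placeholder below are hypothetical and only serve the illustration; the framework itself obtains the role names through methods such as getReadRole(), whose implementation is not shown in this article.

```java
/** Hypothetical helper illustrating the role naming convention. */
final class RoleNames {

    /** Placeholder entity, used only to have a class name to work with. */
    static final class RegistrationCompound {
    }

    /** Joins a role-type and the simple class name with an underscore. */
    static String roleFor(String roleType, Class<?> entityClass) {
        return roleType + "_" + entityClass.getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(roleFor("read", RegistrationCompound.class));
        System.out.println(roleFor("update_created", RegistrationCompound.class));
    }
}
```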

Security Behaviour

MoleculeDatabaseFramework ships with a PermissionEvaluator implementation. This PermissionEvaluator checks if a given user can create, update or delete a given domain object. (Note: For read-methods just having the read-role is enough; they use hasRole instead of hasPermission in @PreAuthorize.) The supplied DefaultPermissionEvaluator internally uses Permission objects to determine a user’s permissions.

The supplied PermissionEvaluator allows users with the create, update or delete role to perform that action on any domain object (of the given implementation). Users with the update_created or delete_created role can perform that action only on domain objects they created (getCreatedBy().equals(loggedInUserName)).

Services only offer a save(entity) method. Whether a domain object is being created or updated is determined by whether its id is set (id == null -> create). This is exactly what Hibernate does.
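
As a rough illustration of these rules, here is a plain Java sketch of the decision for save(entity). It is not the code of DefaultPermissionEvaluator: the class SavePermissionSketch, its Entity type and the maySave() method are made up for this example, and the set of roles is assumed to already contain every role the user holds.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of the save decision; not the framework's code. */
final class SavePermissionSketch {

    /** Minimal stand-in for a domain object. */
    static final class Entity {
        final String type;      // simple class name, e.g. "TestCompound"
        final Long id;          // null means the entity is new
        final String createdBy; // user name of the creator

        Entity(String type, Long id, String createdBy) {
            this.type = type;
            this.id = id;
            this.createdBy = createdBy;
        }
    }

    static boolean maySave(String userName, Set<String> roles, Entity entity) {
        if (entity.id == null) {
            // no id yet: this save is a create
            return roles.contains("create_" + entity.type);
        }
        if (roles.contains("update_" + entity.type)) {
            return true; // may update any object of this type
        }
        // may only update objects the user created
        return roles.contains("update_created_" + entity.type)
                && userName.equals(entity.createdBy);
    }

    static Set<String> roles(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    public static void main(String[] args) {
        Entity existing = new Entity("TestCompound", 1L, "creator");
        System.out.println(maySave("owner", roles("update_created_TestCompound"), existing));
        System.out.println(maySave("editor", roles("update_TestCompound"), existing));
    }
}
```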

In your application you should only create exactly 1 implementation of ChemicalCompoundContainer. However this Container can hold any type of Containable and hence any type of ChemicalCompound. To ensure that a user only sees Containers that contain a ChemicalCompound he has the read-role for, all other Containers are filtered out. This includes the count() service method. So users with different privileges on ChemicalCompounds see different numbers of Containers.

The supplied PermissionEvaluator requires a RoleHierarchy bean to be configured in the security context. RoleHierarchy is very useful as it automatically assigns a “lower” privilege to someone with a “higher” one. So if you declare

create_RegistrationCompound > read_RegistrationCompound

then anyone with create_RegistrationCompound role automatically also has role read_RegistrationCompound.
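
How such rules expand into a full set of roles can be shown with a small toy implementation in plain Java. RoleHierarchySketch is hypothetical and is not Spring Security's RoleHierarchyImpl; it merely assumes one "higher > lower" rule per line and follows the rules transitively.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy role hierarchy; not Spring Security's RoleHierarchyImpl. */
final class RoleHierarchySketch {

    // higher role -> roles it directly includes
    private final Map<String, Set<String>> includes = new HashMap<>();

    /** Parses rules of the form "higher > lower", one rule per line. */
    RoleHierarchySketch(String rules) {
        for (String line : rules.split("\n")) {
            String[] parts = line.trim().split("\\s*>\\s*");
            if (parts.length != 2) {
                continue; // skip blank or malformed lines
            }
            Set<String> lower = includes.get(parts[0]);
            if (lower == null) {
                lower = new HashSet<>();
                includes.put(parts[0], lower);
            }
            lower.add(parts[1]);
        }
    }

    /** Returns the granted roles plus every role reachable through the rules. */
    Set<String> reachableRoles(Collection<String> granted) {
        Set<String> result = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>(granted);
        while (!todo.isEmpty()) {
            String role = todo.pop();
            if (result.add(role) && includes.containsKey(role)) {
                todo.addAll(includes.get(role));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        RoleHierarchySketch hierarchy = new RoleHierarchySketch(
                "update_RegistrationCompound > create_RegistrationCompound\n"
                + "create_RegistrationCompound > read_RegistrationCompound\n");
        System.out.println(hierarchy.reachableRoles(
                Arrays.asList("update_RegistrationCompound")));
    }
}
```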

Cascading of Persist and Merge

In their JPA relationships with each other, ChemicalCompound, Containable and ChemicalCompoundContainer all use CascadeType.PERSIST and CascadeType.REFRESH. This means changes (updates) to existing entities must always be done using that entity’s service.save(entity) method, because CascadeType.MERGE (update) is not set and hence updates are not cascaded.

In case of creating a new entity, the Permission implementations check if the current user has the privilege to also create the associated, new entities. If you create a new ChemicalCompoundContainer that contains a new Containable which is made of a new ChemicalCompound, the ContainerPermission will verify if the current user has the privilege to not only create the new ChemicalCompoundContainer but also to create the new ChemicalCompound and new Containable. If this is not the case, an AccessDeniedException is thrown.

The example below creates a new RegistrationCompound, a new Batch and a new CompoundContainer if the current user has the privileges create_RegistrationCompound, create_Batch and create_CompoundContainer.

RegistrationCompound regCompound = new RegistrationCompound();
regCompound.setCompoundName("Registration Compound");
regCompound.setCas(cas);
regCompound.setRegNumber(regNumber);

ChemicalStructure structure =
        chemicalStructureFactory.createChemicalStructure(structureData);

ChemicalCompoundComposition composition = new ChemicalCompoundComposition();
composition.setCompound(regCompound);
composition.setChemicalStructure(structure);
composition.setPercentage(100.0);

regCompound.getCompositions().add(composition);

Batch batch = new Batch(regCompound, batchNumber);

regCompound.getBatches().add(batch);

CompoundContainer container = new CompoundContainer("C00001", batch);
container = compoundContainerService.save(container);

If an associated entity already exists, the create privilege for it is not required, as long as the entity being persisted is the “parent” in the hierarchy. In other words, you can associate a new ChemicalCompoundContainer with an existing Containable, but you cannot associate a new Containable with an existing ChemicalCompoundContainer, because that container could not have existed before the Containable was created. The same logic applies to the relationship between Containable and ChemicalCompound.
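
The cascading check described in this section can be sketched as a walk down the chain from container to containable to compound. The code below is a simplified illustration in plain Java, not the framework's Permission implementations: CascadeCheckSketch, Node and mayCreate() are hypothetical names, and each entity is assumed to reference at most one associated entity.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of the cascading create check. */
final class CascadeCheckSketch {

    /** Minimal stand-in for an entity and the entity it is associated with. */
    static final class Node {
        final String type; // simple class name, e.g. "Batch"
        final Long id;     // null means the entity is new
        final Node child;  // associated entity further down, or null

        Node(String type, Long id, Node child) {
            this.type = type;
            this.id = id;
            this.child = child;
        }
    }

    /**
     * Requires a create role for every new entity in the chain and stops at
     * the first entity that already exists.
     */
    static boolean mayCreate(Set<String> roles, Node root) {
        for (Node node = root; node != null && node.id == null; node = node.child) {
            if (!roles.contains("create_" + node.type)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Node compound = new Node("RegistrationCompound", null, null);
        Node batch = new Node("Batch", null, compound);
        Node container = new Node("CompoundContainer", null, batch);
        Set<String> roles = new HashSet<>(Arrays.asList(
                "create_RegistrationCompound", "create_Batch", "create_CompoundContainer"));
        System.out.println(mayCreate(roles, container));
    }
}
```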

Customize Security

To customize security, you need to implement your own PermissionEvaluator and/or Permission implementations.

Configuration Example

Below is the SecurityContext.xml used for testing security in MoleculeDatabaseFramework.

<?xml version="1.0" encoding="UTF-8"?>

<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:security="http://www.springframework.org/schema/security"
       xmlns:util="http://www.springframework.org/schema/util"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.1.xsd
             http://www.springframework.org/schema/util http://www.springframework.org/schema/util/spring-util-3.1.xsd
             http://www.springframework.org/schema/security http://www.springframework.org/schema/security/spring-security-3.1.xsd">

    <!-- User Service - In a real application this should use database
   and the according UserService and RoleService.  -->
    <security:user-service id="userService">
        <security:user name="user" password="password" authorities="read_TestCompound,
            read_TestContainable, create_TestCompoundContainer, create_RegistrationCompound, create_Batch"/>
        <security:user name="creator" password="password" authorities="create_TestCompound, create_TestContainable"/>
        <security:user name="owner" password="password" authorities="update_created_TestCompound, update_created_TestContainable, delete_created_TestCompound, delete_created_TestContainable, delete_created_TestCompoundContainer"/>
        <security:user name="editor" password="password" authorities="update_TestCompound, update_TestCompoundContainer, read_TestContainable"/>
        <security:user name="admin" password="admin" authorities="admin_TestCompound, admin_TestContainable, admin_TestCompoundContainer"/>
    </security:user-service>

    <!--    <bean id="userService"
        class="org.bitbucket.kienerj.moleculedatabaseframework.service.UserServiceImpl">
    </bean>-->

    <!-- Use a RoleHierarchy and a PermissionEvaluator in SpEL expression in
    @PreAuthorize -->
    <bean id = "methodSecurityExpressionHandler"
          class = "org.springframework.security.access.expression.method.DefaultMethodSecurityExpressionHandler">
        <property name="roleHierarchy" ref="roleHierarchy"/>
        <property name="permissionEvaluator" ref="permissionEvaluator"/>
    </bean>

    <!-- Role Hierarchy - Probably should use database for this too in real app -->
    <bean id="roleHierarchy"
          class="org.springframework.security.access.hierarchicalroles.RoleHierarchyImpl">
        <property name="hierarchy">
            <value>
                admin_TestCompound > update_TestCompound
                update_TestCompound > update_created_TestCompound
                update_created_TestCompound > create_TestCompound
                admin_TestCompound > create_TestCompound
                create_TestCompound > read_TestCompound
                create_TestCompound > save_ChemicalStructure
                admin_TestCompound > delete_TestCompound
                delete_TestCompound > delete_created_TestCompound
                update_RegistrationCompound > create_RegistrationCompound
                create_RegistrationCompound > read_RegistrationCompound
                create_RegistrationCompound > save_ChemicalStructure
                admin_TestContainable > update_TestContainable
                admin_TestContainable > delete_TestContainable
                update_TestContainable > create_TestContainable
                create_TestContainable > read_TestContainable
                update_created_TestContainable > create_TestContainable
                admin_TestCompoundContainer > update_TestCompoundContainer
                admin_TestCompoundContainer > delete_TestCompoundContainer
                delete_created_TestCompoundContainer > create_TestCompoundContainer
                update_TestCompoundContainer > create_TestCompoundContainer
                create_TestCompoundContainer > read_TestCompoundContainer
            </value>
        </property>
    </bean>

    <!-- Permission Evaluator supplied by framework. The constructor takes a Map
    of SimpleClassName -> PermissionImplementation associations-->
    <bean id="permissionEvaluator"
         class="org.bitbucket.kienerj.moleculedatabaseframework.security.DefaultPermissionEvaluator">
        <constructor-arg index="0">
            <map key-type="java.lang.String"
                 value-type="org.bitbucket.kienerj.moleculedatabaseframework.security.Permission">
                <entry key="TestCompound" value-ref="chemicalCompoundPermission"/>
                <entry key="RegistrationCompound" value-ref="chemicalCompoundPermission"/>
                <entry key="TestContainable" value-ref="containablePermission"/>
                <entry key="TestCompoundContainer" value-ref="chemicalCompoundContainerPermission"/>
                <entry key="Batch" value-ref="containablePermission"/>
            </map>
        </constructor-arg>
    </bean>

    <!-- Permission implementations used by the permission evaluator above -->
    <bean id="chemicalCompoundPermission"
          class="org.bitbucket.kienerj.moleculedatabaseframework.security.ChemicalCompoundPermission">
    </bean>
    <bean id="containablePermission"
          class="org.bitbucket.kienerj.moleculedatabaseframework.security.ContainablePermission">
    </bean>
    <bean id="chemicalCompoundContainerPermission"
          class="org.bitbucket.kienerj.moleculedatabaseframework.security.ChemicalCompoundContainerPermission">
    </bean>


    <security:authentication-manager alias="testAuthenticationManager">
        <security:authentication-provider user-service-ref="userService"/>
    </security:authentication-manager>

    <!-- enable annotations and set expression handler to use-->
    <security:global-method-security pre-post-annotations="enabled">
        <security:expression-handler ref = "methodSecurityExpressionHandler"/>
    </security:global-method-security>
</beans>
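
To see what the role hierarchy in the configuration above implies, the following plain-Java sketch expands a role into every role reachable from it. It is not Spring Security's RoleHierarchyImpl, only an illustration of the transitive "higher > lower" relation. The class RoleHierarchySketch and its methods are made up for this example, and only the TestCompound rules are copied from the configuration.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Illustration only: a hand-rolled expansion of "higher > lower" rules.
// This is NOT Spring Security's RoleHierarchyImpl.
class RoleHierarchySketch {

    // The TestCompound rules copied from the configuration above.
    static final String TEST_COMPOUND_RULES = String.join("\n",
            "admin_TestCompound > update_TestCompound",
            "update_TestCompound > update_created_TestCompound",
            "update_created_TestCompound > create_TestCompound",
            "admin_TestCompound > create_TestCompound",
            "create_TestCompound > read_TestCompound",
            "create_TestCompound > save_ChemicalStructure",
            "admin_TestCompound > delete_TestCompound",
            "delete_TestCompound > delete_created_TestCompound");

    private final Map<String, Set<String>> directlyIncluded = new HashMap<>();

    // Parses lines of the form "higher > lower", one rule per line.
    RoleHierarchySketch(String rules) {
        for (String line : rules.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] parts = trimmed.split(">");
            directlyIncluded
                    .computeIfAbsent(parts[0].trim(), k -> new LinkedHashSet<>())
                    .add(parts[1].trim());
        }
    }

    // Returns the given role plus every role that can be reached from it.
    Set<String> reachableFrom(String role) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(role);
        while (!todo.isEmpty()) {
            String current = todo.pop();
            if (seen.add(current)) {
                Set<String> lower = directlyIncluded.getOrDefault(current, Collections.emptySet());
                for (String included : lower) {
                    todo.push(included);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        RoleHierarchySketch hierarchy = new RoleHierarchySketch(TEST_COMPOUND_RULES);
        System.out.println(hierarchy.reachableFrom("admin_TestCompound"));
        System.out.println(hierarchy.reachableFrom("create_TestCompound"));
    }
}
```

With these rules admin_TestCompound expands to all eight TestCompound-related roles, including save_ChemicalStructure, while create_TestCompound only expands to itself, read_TestCompound and save_ChemicalStructure.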

Written by kienerj

May 22, 2013 at 08:37

Posted in Chemistry, Java, Programming

Creating a Framework for Chemical Structure Search – Part 7

Series Overview

This is Part 7 – Service Layer of the “Creating a Framework for Chemical Structure Search” series.

Introduction

In this article I will introduce the service layer of MoleculeDatabaseFramework. The service layer is responsible for transaction support and security.

Service for ChemicalStructure Entity

This service manages entities of type ChemicalStructure. It is provided by the framework and should be used as-is. All access to ChemicalStructures should go through this service. ChemicalStructureService contains the logic that makes ChemicalStructures “immutable”. Quote from Part 5 of the series:

A ChemicalStructure is unique and immutable and managed by the framework. Users operate on ChemicalCompounds and not ChemicalStructures directly. Unique means if a new ChemicalCompound is saved, the framework checks if the ChemicalStructures in it already exist and if yes re-uses them. Immutable means that if a ChemicalCompound is updated and one of the ChemicalStructures has changed the framework will automatically check if the updated ChemicalStructure already exist and use it or create a new ChemicalStructure. The old one will remain unchanged!

If a ChemicalStructure actually needs to be updated and the change should affect all ChemicalCompounds containing it, the service offers a separate method, updateExistingStructure(structure). When Spring Security is enabled, this method requires the current user to have the role update_ChemicalStructure. In general, this method should be restricted to admins.

Below is the source code of the save(structure) and updateExistingStructure(structure) methods. Note that argument checks and logging statements were removed for brevity:

public T save(T structure) {

	// look up an already stored structure with the same structure key
	T result = structureRepository.findByStructureKey(structure.getStructureKey());

	if (result == null) {
		// clear id (and createdBy and created) so that a new row is inserted
		structure.reset();

		return structureRepository.save(structure);
	} else {
		// re-use the existing structure; the passed-in instance is not persisted
		return structureRepository.save(result);
	}
}

@Override
public T updateExistingStructure(T structure){

	// only a structure that is already in the database can be updated in place
	T result = structureRepository.findOne(structure.getId());

	if (result == null){
		throw new IllegalArgumentException("Given ChemicalStructure does not exist. It can't be updated.");
	}
	// overwrite the stored row; this affects every ChemicalCompound containing it
	return structureRepository.save(structure);
}
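
The difference between the two methods can be tried out without a database. The following is a minimal sketch that replaces the repository with in-memory maps. StructureServiceSketch and its nested Structure class are hypothetical stand-ins and not part of MoleculeDatabaseFramework, and the sketch simply returns the stored instance where the real save() passes it to the repository again.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: an in-memory stand-in for the repository-backed service.
class StructureServiceSketch {

    // Hypothetical, stripped-down replacement for a ChemicalStructure entity.
    static class Structure {
        Long id;                     // null until the structure has been stored
        final String structureKey;   // stands in for getStructureKey()

        Structure(String structureKey) {
            this.structureKey = structureKey;
        }
    }

    private final Map<String, Structure> byKey = new HashMap<>();
    private final Map<Long, Structure> byId = new HashMap<>();
    private long nextId = 1;

    // Mirrors save(): re-use the stored structure with the same key, else insert a new one.
    Structure save(Structure structure) {
        Structure existing = byKey.get(structure.structureKey);
        if (existing != null) {
            return existing;
        }
        structure.id = nextId++;     // roughly what reset() plus an insert achieves
        byKey.put(structure.structureKey, structure);
        byId.put(structure.id, structure);
        return structure;
    }

    // Mirrors updateExistingStructure(): only already stored structures can be updated.
    Structure updateExistingStructure(Structure structure) {
        if (structure.id == null || !byId.containsKey(structure.id)) {
            throw new IllegalArgumentException(
                    "Given ChemicalStructure does not exist. It can't be updated.");
        }
        Structure old = byId.put(structure.id, structure);
        byKey.remove(old.structureKey);
        byKey.put(structure.structureKey, structure);
        return structure;
    }

    public static void main(String[] args) {
        StructureServiceSketch service = new StructureServiceSketch();
        Structure first = service.save(new Structure("KEY-A"));
        Structure second = service.save(new Structure("KEY-A"));
        System.out.println(first == second);   // the stored structure is re-used
    }
}
```

Saving two structures with the same key hands back the same stored instance, whereas updateExistingStructure replaces the stored entry in place and rejects structures that were never saved.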

Services for ChemicalCompound and Containable

The services for ChemicalCompound and Containable are very similar. They consist of an interface which contains the security annotations and an implementation containing the @Transactional annotations for declarative transactions. They also offer methods with optional loading of lazy collections.

When an entity is saved, the service automatically sets the correct ChemicalStructures by either selecting an existing one from the database or creating a new one.
In addition, the method preSave(entity) is executed before the entity is passed to Hibernate. This method is empty in the provided abstract services like ChemicalCompoundServiceImpl but can be overridden in subclasses. For example, one could set every non-nullable property that is still null to a default or a sequence value.

Below is the source code of the save(entity) method of ChemicalCompoundServiceImpl for clarification:

public final T save(T compound) {
	logger.entry(compound);
	Preconditions.checkNotNull(compound);
	preSave(compound);
	if (compound.getCompositions() != null) {
		for (ChemicalCompoundComposition composition : compound.getCompositions()) {
			ChemicalStructure structure = composition.getChemicalStructure();
			ChemicalStructure result = chemicalStructureService.save(structure);
			composition.setChemicalStructure(result);
		}
	}
	//...snipped...
}
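
As a rough illustration of the preSave hook, the sketch below overrides it in a subclass to fill in a default value. All classes in it (Compound, CompoundServiceBase, DefaultingCompoundService) are simplified, hypothetical stand-ins. In particular the visibility and exact signature of the real preSave method are assumptions.

```java
// Illustration only: hypothetical stand-ins, not the framework's classes.
class PreSaveSketch {

    static class Compound {
        String compoundName;   // pretend this property is non-nullable
    }

    // Plays the role of the abstract service: save() is final and calls preSave() first.
    abstract static class CompoundServiceBase {

        protected void preSave(Compound compound) {
            // empty by default, like in the provided abstract services
        }

        public final Compound save(Compound compound) {
            preSave(compound);
            // ...structure handling and the repository call would follow here...
            return compound;
        }
    }

    // A subclass sets a default value before the entity would be persisted.
    static class DefaultingCompoundService extends CompoundServiceBase {

        @Override
        protected void preSave(Compound compound) {
            if (compound.compoundName == null) {
                compound.compoundName = "UNNAMED";
            }
        }
    }

    public static void main(String[] args) {
        Compound compound = new DefaultingCompoundService().save(new Compound());
        System.out.println(compound.compoundName);   // prints UNNAMED
    }
}
```

Because save() is final, subclasses cannot skip the structure handling. The hook is the only place where they can adjust the entity before it is stored.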

You need to create a service interface and an implementation for every ChemicalCompound and Containable implementation the application uses. The services must implement the getRepository(), checkUniqueness() and getExistingCompound() methods.

getRepository() must return the Spring-Data repository that saves entities of the type the service is responsible for.

checkUniqueness() must report whether an entity is unique, meaning it does not violate any unique constraints. If the entity is not unique, the method must return a mapping of each violated constraint (all field names of the constraint) to the offending value.

getExistingCompound() or getExistingContainable() respectively is called during import of SD-Files. The method must use the data from the SD-File (SdfRecord) to check if the current compound that is being imported already exists in the database. If it already exists it must return it, else it must return null.
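
To make the contracts of checkUniqueness() and getExistingCompound() more concrete, here is a minimal in-memory sketch. Everything in it is an assumption made for illustration: the Compound class with its regNumber field, the REG_NUMBER property name, the plain property map used in place of SdfRecord, and the method signatures, which may differ from the real framework.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Illustration only: every type and method signature here is a hypothetical stand-in.
class UniquenessSketch {

    static class Compound {
        final String regNumber;   // pretend there is a unique constraint on this field

        Compound(String regNumber) {
            this.regNumber = regNumber;
        }
    }

    // Stands in for the repository; keyed by the unique field.
    private final Map<String, Compound> stored = new HashMap<>();

    void store(Compound compound) {
        stored.put(compound.regNumber, compound);
    }

    // Returns an empty map if the compound is unique,
    // otherwise a map of "field name -> offending value".
    Map<String, Object> checkUniqueness(Compound compound) {
        Compound other = stored.get(compound.regNumber);
        if (other == null || other == compound) {
            return Collections.emptyMap();
        }
        Map<String, Object> violations = new HashMap<>();
        violations.put("regNumber", compound.regNumber);
        return violations;
    }

    // The SD-File record is modelled as a plain property map here.
    // Returns the stored compound with the same registration number, or null.
    Compound getExistingCompound(Map<String, String> sdfProperties) {
        String regNumber = sdfProperties.get("REG_NUMBER");
        return regNumber == null ? null : stored.get(regNumber);
    }

    public static void main(String[] args) {
        UniquenessSketch service = new UniquenessSketch();
        service.store(new Compound("REG-1"));
        System.out.println(service.checkUniqueness(new Compound("REG-1")));
        System.out.println(service.checkUniqueness(new Compound("REG-2")));
    }
}
```

In this sketch a second compound with the registration number REG-1 is reported as a violation of the regNumber field, while REG-2 yields an empty map.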

Below is an example of such a service implementation for the RegistrationCompound entity:

RegistrationCompoundService UML

Note that Java generics are not shown in the diagram above. In the source code the declarations look like this:

public interface RegistrationCompoundService extends ChemicalCompoundService<RegistrationCompound> {
	//...snipped...
}

public class RegistrationCompoundServiceImpl extends ChemicalCompoundServiceImpl<RegistrationCompound>
        implements RegistrationCompoundService {
	//...snipped...
}

Please see the MoleculeDatabaseFramework Tutorial for further information on how to implement such services.

ChemicalCompoundContainerService

This service manages entities of type ChemicalCompoundContainer. The main difference from the services for ChemicalCompound and Containable is that an application should have only one implementation of ChemicalCompoundContainer and hence only one such service. The service can be used out of the box, but in some cases it must be extended. For example, when the ChemicalCompoundContainer implementation adds additional unique constraints, checkUniqueness() and getExistingContainer() must be overridden.

Also ChemicalCompoundContainerService has some special considerations in terms of the Spring-Security integration.

Please see the MoleculeDatabaseFramework Tutorial for further information on how to implement such a service.

Written by kienerj

May 8, 2013 at 08:06
