Thursday, October 08, 2009

Large Batch Processing using Hibernate

When using Hibernate to retrieve large amounts of data, remember that memory is limited and that incorrect usage of the Hibernate session will cause the first- and second-level caches to explode in size.

There are two ways to prevent this: either (a) evict the objects once you are finished with them, or (b) use a stateless session. Also remember not to use the list method on the Query, because it loads the entire result set into memory at once.




Stateful Session:

Remember to evict the associated objects from the first level cache once you are finished with them:



Query query = session.getNamedQuery("large");
ScrollableResults result = query
    .setCacheMode(CacheMode.IGNORE)
    .scroll(ScrollMode.FORWARD_ONLY);

while (result.next()) {
    Object[] v = result.get();
    Object a = v[0];
    Object b = v[1];

    // … batch process …

    // evict is an instance method, so call it on the session itself
    session.evict(a);
    session.evict(b);
}
result.close();
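
To see why the evict calls matter, here is a toy model of a first-level cache. It is only a sketch and not Hibernate code: ToySession, load and peakCacheSize are names I made up for illustration, and the "cache" is nothing more than an identity map that tracks every object it hands out.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Toy stand-in for a session's first-level cache. NOT Hibernate code.
class ToySession {
    private final Map<Object, Boolean> firstLevelCache = new IdentityHashMap<>();

    // Every object handed out by the toy session is tracked until evicted.
    Object load(int id) {
        Object entity = new int[] { id };
        firstLevelCache.put(entity, Boolean.TRUE);
        return entity;
    }

    void evict(Object entity) {
        firstLevelCache.remove(entity);
    }

    int cacheSize() {
        return firstLevelCache.size();
    }
}

class EvictDemo {
    // Streams `rows` entities through the toy session and returns the
    // largest number of objects the cache held at any one time.
    static int peakCacheSize(int rows, boolean evict) {
        ToySession session = new ToySession();
        int peak = 0;
        for (int id = 0; id < rows; id++) {
            Object entity = session.load(id);
            peak = Math.max(peak, session.cacheSize());
            // ... batch process ...
            if (evict) {
                session.evict(entity);
            }
        }
        return peak;
    }

    public static void main(String[] args) {
        System.out.println("without evict: " + peakCacheSize(1000, false));
        System.out.println("with evict:    " + peakCacheSize(1000, true));
    }
}
```

In this model the peak is 1000 tracked objects without eviction and 1 with it. A real session holds more state per entity, but the growth pattern is the same.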



Stateless Session:

A stateless session has no first-level cache, so there is no need to evict objects; however, all lazy relationships will be unavailable.



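StatelessSession stateless = session.getSessionFactory()
    .openStatelessSession();
Query query = stateless.getNamedQuery("large");

ScrollableResults result = query.scroll(ScrollMode.FORWARD_ONLY);
while (result.next()) {
    Object v = result.get(0);
    // … batch process …
}
result.close();
// the stateless session is opened separately, so close it explicitly
stateless.close();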



Another useful feature is the setProperties method on the query object, which sets the query parameters from a bean instead of setting each parameter individually.

Bind using parameters:



Query query = session.getNamedQuery("mydosageQuery");
query.setString("dosageForm", "SLS");
display(query.list());



Bind using setProperties:



query = session.getNamedQuery("mydosageQuery");
query.setProperties(new MyDosageQuery("SLS"));
display(query.list());
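
The MyDosageQuery class is not shown above. setProperties matches bean property names against the query's named parameters, so a minimal sketch could look like the following, assuming the named query declares a :dosageForm parameter. The field, constructor and main method here are my assumptions for illustration.

```java
// Hypothetical parameter bean for the "mydosageQuery" named query.
// Assumes the query contains a named parameter called :dosageForm.
class MyDosageQuery {
    private final String dosageForm;

    public MyDosageQuery(String dosageForm) {
        this.dosageForm = dosageForm;
    }

    // The property name ("dosageForm") must match the named parameter.
    public String getDosageForm() {
        return dosageForm;
    }

    public static void main(String[] args) {
        MyDosageQuery params = new MyDosageQuery("SLS");
        System.out.println("dosageForm = " + params.getDosageForm());
    }
}
```

In a real project the class would normally be public in its own file. Adding a second query parameter then only means adding another property to the bean.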




(This is more of a note to self really...)

Wednesday, July 08, 2009

Statements of Truth

Every statement and/or argument must be supported by evidence, and this evidence should be provided.

Without evidence, a statement is merely an opinion, and your confidence that it is true depends on your opinion of the person making it.

Furthermore, the fundamentals of the statement may change over time (due to a change in technology, performance, environment, implementation, etc.), making the statement's supporting arguments invalid and therefore invalidating the claimed statement of truth.

Without the original evidence, the reader cannot determine whether the statement is still applicable. The evidence should be detailed enough for a reader to reproduce it and come to the same conclusion.

Friday, April 10, 2009

Google AppEngine - Maven POM

Google have standardised Java hosting and deployment with the release of a scalable, standards-based Java hosting platform. PHP had this 14 years ago, and it's about time.

The first question is how do I take advantage?

1. After studying the terms of service, it seems that you do not cede intellectual property rights and that the platform is quite suitable for commercial use where you would normally use a public hosting solution. (You would obviously not store financial transactions in the cloud, nor medical data or any other data with legal restrictions on where it may physically reside and who may look at it.)

2. I still have to confirm BigTable's reliability: will it ever lose data?

3. The Maven POM: what is the difference between the SDK and the real environment?

Investigation



I provide a Maven POM that describes the Java Google App Engine capabilities, based upon the classes available to a deployed application. To determine the classes available, I generated a list of all classes included in the SDK and executed Class.forName(String) for each of them within a test application that I deployed to Google App Engine.

The results were interesting and are the foundation of the POM that is provided below.
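
The probe itself can be sketched roughly as follows. This is not the test application I deployed, just a minimal illustration: ClassProbe and available are made-up names, and the sketch uses the three-argument Class.forName so that classes are only loaded and their static initialisers are not run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ClassProbe {
    // Returns the subset of classNames that the current class loader can resolve.
    static List<String> available(List<String> classNames) {
        List<String> found = new ArrayList<>();
        ClassLoader loader = ClassProbe.class.getClassLoader();
        for (String name : classNames) {
            try {
                // initialize=false: load the class without running static initialisers
                Class.forName(name, false, loader);
                found.add(name);
            } catch (ClassNotFoundException | LinkageError e) {
                // the class is not usable in this environment, so skip it
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Illustrative names only; the real input was the SDK's full class list.
        List<String> candidates = Arrays.asList("java.lang.String", "no.such.Clazz");
        System.out.println(available(candidates));
    }
}
```

Running the same list against the SDK and against the deployed application, then comparing the two results, shows which classes exist only locally.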

Repositories


From a Maven perspective there are only two significant libraries within the SDK that must be installed into your local repository, because they cannot be obtained publicly.

From the root of the SDK, execute the following Maven commands:



mvn install:install-file -Dfile=lib/shared/appengine-local-runtime-shared.jar -DgroupId=com.google -DartifactId=appengine-local-runtime-shared -Dversion=1.2.0 -Dpackaging=jar -DgeneratePom=true

mvn install:install-file -Dfile=lib/shared/jsp/commons-logging-1.1.1.jar -DgroupId=com.google -DartifactId=commons-logging-repackage -Dversion=1.1.1 -Dpackaging=jar -DgeneratePom=true



I have the following two repositories specified in my Maven settings.



<repositories>
<repository>
<id>datanucleus</id>
<name>Datanucleus Repository</name>
<url>http://www.datanucleus.org/downloads/maven2</url>
</repository>
<repository>
<id>atlassian</id>
<name>Atlassian Repository</name>
<url>https://maven.atlassian.com/repository/centralmirror</url>
</repository>
</repositories>



Maven: Provided POM



The following POM describes the libraries for the "provided" Maven scope. Google seem to have an API proxy for their data layer, and I have not included those libraries in my analysis. I will provide a more comprehensive POM once I understand the technologies and the dependencies.

For now, this is the POM that best describes the "provided" scope:


<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
http://maven.apache.org/maven-v4_0_0.xsd">

<modelVersion>4.0.0</modelVersion>
<groupId>tim.app.test</groupId>
<artifactId>TestLab</artifactId>
<packaging>war</packaging>
<version>1.0.0</version>
<name>Test Lab</name>

<dependencies>
<dependency>
<groupId>com.google</groupId>
<artifactId>appengine-local-runtime-shared</artifactId>
<version>1.2.0</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>com.google</groupId>
<artifactId>commons-logging-repackage</artifactId>
<version>1.1.1</version>
<scope>provided</scope>
</dependency>

<!-- the following are available on repositories -->
<dependency>
<groupId>commons-codec</groupId>
<artifactId>commons-codec</artifactId>
<version>1.3</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.geronimo.specs</groupId>
<artifactId>geronimo-jsp_2.1_spec</artifactId>
<version>1.0.1</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.geronimo.specs</groupId>
<artifactId>geronimo-servlet_2.5_spec</artifactId>
<version>1.2</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>ant</groupId>
<artifactId>ant</artifactId>
<version>1.6.5</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>ant</groupId>
<artifactId>ant-launcher</artifactId>
<version>1.6.5</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>commons-el</groupId>
<artifactId>commons-el</artifactId>
<version>1.0</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>tomcat</groupId>
<artifactId>jasper-compiler</artifactId>
<version>5.0.28</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>tomcat</groupId>
<artifactId>jasper-runtime</artifactId>
<version>5.0.28</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>jstl</groupId>
<artifactId>jstl</artifactId>
<version>1.1.2</version>
<type>jar</type>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>taglibs</groupId>
<artifactId>standard</artifactId>
<version>1.1.2</version>
<type>jar</type>
<scope>provided</scope>
</dependency>
</dependencies>

<repositories>
<repository>
<id>datanucleus</id>
<name>Datanucleus Repository</name>
<url>http://www.datanucleus.org/downloads/maven2</url>
</repository>
<repository>
<id>atlassian</id>
<name>Atlassian Repository</name>
<url>https://maven.atlassian.com/repository/centralmirror</url>
</repository>
</repositories>

<build>
<plugins>
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>1.6</source>
<target>1.6</target>
</configuration>
</plugin>
</plugins>
</build>

</project>



Ant: Launch and Deploy



The following build.xml file can be used to launch the local GAE development server, and it may also be used to publish the latest version to the production GAE server.

Run the deploy from the command line at least once, as you have to enter your username and password.



<project name="Google App Engine">

<target name="deploy">

<java fork="true"

dir="c:/development/lib/appengine-java-sdk-1.2.0"
classpath="C:/Development/lib/appengine-java-sdk-1.2.0/lib/appengine-tools-api.jar"
classname="com.google.appengine.tools.admin.AppCfg">

<arg value="update"/>
<arg value="C:\Development\General\TimAppTestlab\target\TestLab-1.0.0"/>

</java>
</target>

<target name="launch">

<java
fork="true"
dir="c:/development/lib/appengine-java-sdk-1.2.0"
classpath="C:/Development/lib/appengine-java-sdk-1.2.0/lib/appengine-tools-api.jar"
classname="com.google.appengine.tools.KickStart">

<arg value="com.google.appengine.tools.development.DevAppServerMain"/>
<arg value="C:\Development\General\TimAppTestlab\target\TestLab-1.0.0"/>

</java>
</target>

</project>




Enjoy,

-Tim