Q. What do you understand by batch processing and why do you need it?
A. The idea behind batch processing is to allow a program to run without the need for human intervention, usually scheduled to run periodically at a certain time or every x minutes. The batch process solves ...
Q. What libraries do you need to get started with Spring Batch?
A. The pom.xml file is the starting point. Fill in the appropriate versions and any additional dependencies you require.
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.myapp</groupId>
    <artifactId>mybatchapp</artifactId>
    <packaging>jar</packaging>
    <version>1.0-SNAPSHOT</version>
    <name>mybatchapp</name>
    <url>http://maven.apache.org</url>
    <dependencies>
        <dependency>
            <groupId>org.springframework.batch</groupId>
            <artifactId>spring-batch-core</artifactId>
            <version>....</version>
        </dependency>
        <dependency>
            <groupId>commons-lang</groupId>
            <artifactId>commons-lang</artifactId>
            <version>....</version>
        </dependency>
        <dependency>
            <!-- the all-in-one "spring" jar was last published for Spring 2.5.x;
                 with Spring 3 or later, depend on individual modules such as spring-context instead -->
            <groupId>org.springframework</groupId>
            <artifactId>spring</artifactId>
            <version>...</version>
        </dependency>
        <!-- ...other dependencies like hibernate -->
    </dependencies>
    <!-- ... -->
</project>
Q. What are the key terms of Spring Batch?
A.
- JobLauncher: Launches a job with a given set of JobParameters. It uses a JobRepository to obtain a valid JobExecution.
- JobRepository: A persistent store for all the job metadata. It stores JobInstance, JobExecution, and StepExecution information, usually in a relational database. The repository is required because a job could be a rerun of a previously failed job, and the framework needs the state of the earlier run to restart it.
- Job: Represents a job. For example, a portfolio processing job that calculates the portfolio value.
- JobInstance: A logical run of a job, identified by the job together with its identifying JobParameters. If a job runs every weeknight, there will be 5 instances of that job in a week.
- JobParameters: Parameters used by the JobInstance. For example, the portfolio processing job requires the list of account numbers as input parameters. The job parameters can be passed as command line arguments.
- JobExecution: Every attempt to run a JobInstance results in a JobExecution. If a job fails and is then rerun, we end up with a single JobInstance with 2 JobExecutions (1 failed execution and 1 successful execution).
- Step: Each job is made up of one or more steps.
- StepExecution: The step-level counterpart of a JobExecution. It represents a single attempt to run a Step in a Job.
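To make the JobInstance/JobExecution relationship concrete, here is a minimal plain-Java sketch. It is not the Spring Batch API: `InMemoryJobRepository`, `run(...)`, the `runDate` parameter and the status strings are hypothetical names chosen for illustration, and the real framework keeps this metadata in its own tables. It shows that a rerun with the same identifying parameters adds an execution to the same instance, while a new night's parameters create a new instance.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class JobInstanceDemo {
    public static void main(String[] args) {
        InMemoryJobRepository repo = new InMemoryJobRepository();
        Map<String, String> monday = Map.of("runDate", "2012-04-23");    // hypothetical identifying parameter
        String instance = repo.run("portfolioJob", monday, false);       // first attempt fails
        repo.run("portfolioJob", monday, true);                           // rerun with the same parameters succeeds
        repo.run("portfolioJob", Map.of("runDate", "2012-04-24"), true); // next night: a new JobInstance
        // prints "2 instances; portfolioJob{runDate=2012-04-23} -> [FAILED, COMPLETED]"
        System.out.println(repo.instanceCount() + " instances; " + instance + " -> " + repo.executions(instance));
    }
}

/** Hypothetical, simplified model of job metadata; NOT the Spring Batch JobRepository API. */
class InMemoryJobRepository {
    // One entry per JobInstance (job name + identifying parameters),
    // holding the status of every JobExecution (attempt) of that instance.
    private final Map<String, List<String>> executionsByInstance = new LinkedHashMap<>();

    /** Records one execution and returns the key of the JobInstance it belongs to. */
    String run(String jobName, Map<String, String> identifyingParams, boolean succeeds) {
        // TreeMap gives a stable key regardless of the order the parameters were supplied in
        String instanceKey = jobName + new TreeMap<String, String>(identifyingParams);
        List<String> executions = executionsByInstance.computeIfAbsent(instanceKey, k -> new ArrayList<>());
        if (executions.contains("COMPLETED")) {
            // Spring Batch likewise refuses to rerun an instance that has already completed
            throw new IllegalStateException("JobInstance already complete: " + instanceKey);
        }
        executions.add(succeeds ? "COMPLETED" : "FAILED");
        return instanceKey;
    }

    int instanceCount() {
        return executionsByInstance.size();
    }

    List<String> executions(String instanceKey) {
        return executionsByInstance.get(instanceKey);
    }
}
```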
Q. Can you describe the key steps involved in a typical batch job scenario you have worked on?
A. A simplified batch process scenario is explained below.
The scenario is a batch job that runs overnight, goes through all the accounts in the accounts table, and calculates available_cash by subtracting the debit from the credit.
Step 1: Define a batch control table that keeps metadata about the batch job. The example below shows a batch_control table that splits the accounts into ranges: account numbers 000 - 999 are processed by one job and account numbers 1000 - 1999 by another. This table also holds information about when the job started, when it finished, its status (i.e. COMPLETED or FAILED), the last processed account_no, etc.
job_id | job_name | start_timestamp | end_timestamp | status | account_no_from | account_no_to | last_account_no |
1 | accountValueUpdateJob1 | 21/04/2012 3:49:11.053 AM | 21/04/2012 3:57:55.480 AM | COMPLETED | 000 | 999 | 845 |
2 | accountValueUpdateJob2 | 21/04/2012 3:49:11.053 AM | 21/04/2012 3:57:55.480 AM | FAILED | 1000 | 1999 | 1200 |
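The last_account_no column is what makes a restart cheap. The plain-Java sketch below models only the columns a restart decision needs. `BatchControl` and `firstAccountToProcess()` are hypothetical names, and it assumes two things the text does not state: that every account up to last_account_no was fully written before a failure, and that account numbers compare numerically (so 000 becomes 0).

```java
class BatchControlDemo {
    public static void main(String[] args) {
        BatchControl job1 = new BatchControl("accountValueUpdateJob1", "COMPLETED", 0, 999, 845);
        BatchControl job2 = new BatchControl("accountValueUpdateJob2", "FAILED", 1000, 1999, 1200);
        for (BatchControl row : new BatchControl[] {job1, job2}) {
            System.out.println(row.jobName + " next run reads accounts "
                    + row.firstAccountToProcess() + " to " + row.accountNoTo);
        }
    }
}

/** Hypothetical in-memory view of one batch_control row (not a Spring Batch class). */
class BatchControl {
    final String jobName;
    final String status;       // COMPLETED or FAILED
    final int accountNoFrom;
    final int accountNoTo;
    final int lastAccountNo;   // last account_no that was processed

    BatchControl(String jobName, String status, int accountNoFrom, int accountNoTo, int lastAccountNo) {
        this.jobName = jobName;
        this.status = status;
        this.accountNoFrom = accountNoFrom;
        this.accountNoTo = accountNoTo;
        this.lastAccountNo = lastAccountNo;
    }

    /**
     * First account the next run should read. Assumes everything up to lastAccountNo was
     * committed, so a FAILED job resumes just after it; a COMPLETED job starts its range
     * from the beginning on its next scheduled run.
     */
    int firstAccountToProcess() {
        return "FAILED".equals(status) ? lastAccountNo + 1 : accountNoFrom;
    }
}
```

With the two sample rows, accountValueUpdateJob2 would resume at account 1201 instead of reprocessing 1000 - 1200.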
Step 2: To keep it simple, a single data table is used. You need to define a job and its steps. A job can have more than one step, and each step can have a reader, a processor, and a writer. An ItemReader reads rows from the accounts table shown below for account numbers between account_no_from and account_no_to, as read from the batch_control table shown above. An ItemProcessor calculates the available_cash, and an ItemWriter updates the available_cash in the accounts table.
account_no | account_name | debit | credit | available_cash |
001 | John Smith | 200.00 | 4000.0 | 0.0 |
1199 | Peter Smith | 55000.50 | 787.25 | 0.0 |
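The read-process-write cycle of Step 2 can be sketched in plain Java. The three interfaces only mirror the shape of Spring Batch's ItemReader, ItemProcessor and ItemWriter; they are simplified stand-ins (the real interfaces declare `throws Exception`, and in Spring Batch 5 the writer receives a `Chunk` rather than a `List`). A reader returning null signals end of input, as in the framework. The in-memory list replaces the accounts table, a map replaces the UPDATE, and the chunk size of 2 is an arbitrary assumption playing the role of the `commit-interval` setting.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ChunkStepDemo {
    public static void main(String[] args) {
        List<Account> accountsTable = List.of(
                Account.of("001", "John Smith", "200.00", "4000.0"),
                Account.of("1199", "Peter Smith", "55000.50", "787.25"));
        System.out.println(runStep(accountsTable, 2)); // prints {001=3800.00, 1199=-54213.25}
    }

    /** Reads items until the reader returns null, processes each one, and writes them in chunks. */
    static Map<String, BigDecimal> runStep(List<Account> source, int chunkSize) {
        Iterator<Account> rows = source.iterator();
        ReaderSketch<Account> reader = () -> rows.hasNext() ? rows.next() : null; // null = end of input
        ProcessorSketch<Account, Account> processor =
                a -> a.withAvailableCash(a.credit.subtract(a.debit));          // available_cash = credit - debit
        Map<String, BigDecimal> availableCashColumn = new LinkedHashMap<>();  // stands in for the UPDATE
        WriterSketch<Account> writer =
                items -> items.forEach(a -> availableCashColumn.put(a.accountNo, a.availableCash));

        List<Account> chunk = new ArrayList<>();
        for (Account item = reader.read(); item != null; item = reader.read()) {
            chunk.add(processor.process(item));
            if (chunk.size() == chunkSize) { // the framework would commit one transaction per chunk
                writer.write(chunk);
                chunk = new ArrayList<>();
            }
        }
        if (!chunk.isEmpty()) {
            writer.write(chunk);             // flush the last, partial chunk
        }
        return availableCashColumn;
    }
}

// Simplified stand-ins for Spring Batch's ItemReader, ItemProcessor and ItemWriter.
interface ReaderSketch<T> { T read(); }
interface ProcessorSketch<I, O> { O process(I item); }
interface WriterSketch<T> { void write(List<T> items); }

/** One row of the accounts table; immutable, so the processor returns a new instance. */
final class Account {
    final String accountNo;
    final String accountName;
    final BigDecimal debit;
    final BigDecimal credit;
    final BigDecimal availableCash;

    private Account(String accountNo, String accountName, BigDecimal debit, BigDecimal credit, BigDecimal availableCash) {
        this.accountNo = accountNo;
        this.accountName = accountName;
        this.debit = debit;
        this.credit = credit;
        this.availableCash = availableCash;
    }

    static Account of(String accountNo, String accountName, String debit, String credit) {
        return new Account(accountNo, accountName, new BigDecimal(debit), new BigDecimal(credit), BigDecimal.ZERO);
    }

    Account withAvailableCash(BigDecimal cash) {
        return new Account(accountNo, accountName, debit, credit, cash);
    }
}
```

BigDecimal is used rather than double so that amounts like 55000.50 are subtracted exactly.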
Step 3: Once the batch job has completed, the batch_control table is updated with the end timestamp, the status, and the last processed account_no.
Step 4: Listeners can be used to handle errors (e.g. onProcessError(....), onReadError(....), etc.) and other pre- and post-item events like beforeRead(...), afterRead(...), etc. The Spring Batch framework uses a configuration XML file, for example batch-context.xml, together with Java classes annotated with @Component to wire up the components and implement the logic.
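Where the listener callbacks fire within the loop can be sketched in plain Java as well. `ReadProcessListener` is a hypothetical interface that borrows its method names from Spring Batch's ItemReadListener and ItemProcessListener; in a real job you would implement the framework's listener interfaces (or use its listener annotations) and register the listener on the step in batch-context.xml. The processing step (parsing the account number) and the report-and-skip policy on error are assumptions for illustration; without a skip policy Spring Batch would fail the step instead.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

class ListenerDemo {
    public static void main(String[] args) {
        runWithListener(List.of("001", "BAD", "1199")).forEach(System.out::println);
    }

    /** Reads account numbers one by one, processes them, and notifies the listener around each event. */
    static List<String> runWithListener(List<String> accountNos) {
        List<String> events = new ArrayList<>();
        ReadProcessListener<String> listener = new ReadProcessListener<>() {
            @Override public void beforeRead() { events.add("beforeRead"); }
            @Override public void afterRead(String item) { events.add("afterRead " + item); }
            @Override public void onProcessError(String item, Exception e) {
                events.add("onProcessError " + item + " (" + e.getClass().getSimpleName() + ")");
            }
        };
        // Hypothetical processing step: fails for account numbers that are not numeric
        Function<String, Integer> processor = Integer::parseInt;

        Iterator<String> reader = accountNos.iterator();
        while (true) {
            listener.beforeRead();
            if (!reader.hasNext()) {
                break;                               // end of input
            }
            String item = reader.next();
            listener.afterRead(item);
            try {
                processor.apply(item);
            } catch (RuntimeException e) {
                listener.onProcessError(item, e);   // assumed policy: report the item and skip it
            }
        }
        return events;
    }
}

/** Hypothetical listener; method names borrowed from ItemReadListener and ItemProcessListener. */
interface ReadProcessListener<T> {
    void beforeRead();
    void afterRead(T item);
    void onProcessError(T item, Exception e);
}
```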
Step 5: The batch job can be executed via a shell or batch script that invokes the Spring Batch framework as shown below. CommandLineJobRunner is the Spring Batch class that initiates the job: it loads the configuration file batch-context.xml, which wires up the relevant components, listeners, DAOs, etc., and then launches the named job. The job parameter that is passed is "accountValueUpdateJob1", which is used to retrieve the relevant job metadata from the batch_control table.
my_job_run.sh accountValueUpdateJob1 accountValueUpdateJob1.log
Here $0 is my_job_run.sh, $1 is the job name accountValueUpdateJob1, and $2 is the log file accountValueUpdateJob1.log.
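Conceptually, the job parameter is a key=value pair, and the job uses its value to look up its row in batch_control. The plain-Java sketch below shows only that idea. `parseJobParameters` and the map standing in for the batch_control query are hypothetical; CommandLineJobRunner does its own conversion of such arguments into JobParameters, so this is not how the framework parses them internally.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class JobParameterDemo {
    public static void main(String[] args) {
        // The trailing key=value argument that my_job_run.sh passes to CommandLineJobRunner
        Map<String, String> params = parseJobParameters(new String[] {"jobName=accountValueUpdateJob1"});
        // Hypothetical stand-in for: SELECT account_no_from, account_no_to FROM batch_control WHERE job_name = ?
        Map<String, String> accountRangeByJob = Map.of(
                "accountValueUpdateJob1", "000-999",
                "accountValueUpdateJob2", "1000-1999");
        String jobName = params.get("jobName");
        System.out.println(jobName + " processes accounts " + accountRangeByJob.get(jobName));
    }

    /** Splits each key=value argument at its first '=' (hypothetical helper, not the framework's converter). */
    static Map<String, String> parseJobParameters(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected key=value but got: " + arg);
            }
            params.put(arg.substring(0, eq), arg.substring(eq + 1));
        }
        return params;
    }
}
```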
The my_job_run.sh looks something like:
...
JOB_CLASS=org.springframework.batch.core.launch.support.CommandLineJobRunner
APPCONTEXT=batch-context.xml
SPRING_JOB=availableBalanceJob
CLASSPATH=$JAVA_HOME/bin:.......................
JOB_TO_RUN=$1
...
# CommandLineJobRunner arguments: <configuration file> <job name> [key=value job parameters...]
# The job parameter is jobName; its value comes from the script's first argument ($1).
$JAVA_HOME/bin/java -classpath "${CLASSPATH}" ${JOB_CLASS} ${APPCONTEXT} ${SPRING_JOB} jobName=${JOB_TO_RUN}
Step 6: The shell script (e.g. my_job_run.sh) or batch file is invoked by a job scheduler like Quartz or a Unix cron job at a particular time, without any human intervention.
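Quartz or cron keeps the scheduling outside the batch program itself. Purely to illustrate unattended, periodic triggering in Java, the sketch below swaps in the JDK's ScheduledExecutorService rather than Quartz or cron. `launchJob` is a hypothetical placeholder for starting the job, and the short period stands in for "every x minutes".

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class PeriodicJobDemo {
    public static void main(String[] args) {
        int runs = runPeriodically(3, 100);
        System.out.println("job ran " + runs + " times without human intervention");
    }

    /** Triggers the (hypothetical) job every periodMillis until it has run maxRuns times. */
    static int runPeriodically(int maxRuns, long periodMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch finished = new CountDownLatch(maxRuns);
        AtomicInteger runCount = new AtomicInteger();
        scheduler.scheduleAtFixedRate(() -> {
            if (runCount.get() < maxRuns) {      // ignore ticks after the last wanted run
                launchJob(runCount.incrementAndGet());
                finished.countDown();
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
        try {
            finished.await();                     // block until the requested runs are done
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // keep the interrupt visible to the caller
        } finally {
            scheduler.shutdown();                 // by default this cancels the periodic task
        }
        return runCount.get();
    }

    /** Placeholder for starting the batch job, e.g. by running my_job_run.sh. */
    static void launchJob(int run) {
        System.out.println("run " + run + ": launching accountValueUpdateJob1");
    }
}
```

An external scheduler remains the more robust choice in production, since it survives restarts of the JVM that runs the batch job.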
This is the big picture. Wiring up Spring Batch is explained in separate posts:
- Spring batch part 2 - wiring up the components
- Spring batch part 3 - wiring reader, processor, and writer
- Spring batch advanced tutorial - writing your own reader