Scenario-based and open-ended questions like this can reveal a lot about your ability to design systems.
Q. If you have a requirement to generate a report or a feed file with millions of records pulled from the database, how will you go about designing it, and what questions will you ask?
A. The questions to ask are:
- How should the report be displayed or delivered? For example, online (synchronously), where the user expects to see the report on the GUI, or offline (asynchronously), where the feed/report is generated in a separate thread and then sent via email or another delivery mechanism such as SFTP.
- Should we restrict online reports to the last 12 months of data to minimize the report size and get better performance, and provide reports/feeds for data older than 12 months via offline processing?
- Should we generate both online and offline reports asynchronously, and for online reports have the browser or GUI client poll for report completion before displaying the results on the GUI? Alternatively, the report can be emailed or downloaded via the web at a later time.
- Which report generation framework should we use, such as JasperReports, OpenCSV, or XSL-FO with Apache FOP, depending on the required output formats?
- What is the source of truth for the report data -- database, RESTful web service call, XML, etc?
- How to handle exceptional scenarios -- send an error email, use a monitoring system like Tivoli or Nagios to raise production support tickets, etc?
- Security requirements. Are we sending feeds/reports with sensitive data via email? Do we need proper access control to restrict who can generate what for online reports?
- Should we schedule the offline reports to run during off-peak hours?
- Archival and purging of older reports. What is the report retention period required for auditing and compliance purposes? How big are the feed files, and should they be gzipped?
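On the last question, gzipping a large feed does not require holding it in memory. The sketch below streams rows straight into a compressed file, one row at a time. The class name `GzipFeedWriter` and the `Iterator<String[]>` row source are hypothetical stand-ins for whatever cursor or paging mechanism reads the database, and the naive comma join assumes that field values contain no commas, quotes, or newlines (a CSV library would handle escaping properly).

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of any framework mentioned in this post.
class GzipFeedWriter {

    /**
     * Streams rows into a gzipped, comma-separated file and returns
     * the number of rows written. Only one row is held in memory at a time.
     */
    static long write(Path target, Iterator<String[]> rows) throws IOException {
        long count = 0;
        // try-with-resources closes the writer, which finishes the gzip stream.
        try (BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                new GZIPOutputStream(Files.newOutputStream(target)),
                StandardCharsets.UTF_8))) {
            while (rows.hasNext()) {
                // Assumption: fields need no CSV escaping.
                out.write(String.join(",", rows.next()));
                out.write('\n');
                count++;
            }
        }
        return count;
    }
}
```

Because the rows come from an iterator, the same method works whether the source is a JDBC result set wrapper, a paged web service call, or a test list.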
Firstly, using a simple custom solution.
In this solution, a blocking queue and Java multi-threading (i.e. the Executor framework) can be used to produce a report asynchronously. Alternatively, you can use asynchronous processing with Spring.
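A minimal sketch of the custom approach might look like the following. It is only one possible shape, not a definitive implementation: the class `AsyncReportService`, its `Request` type, the `Status` values, and the queue capacity of 100 are all assumptions made for illustration. Requests go onto a bounded `BlockingQueue`, worker threads from an `ExecutorService` take them off and generate the reports, and a status map lets a GUI client poll for completion.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical service: all names here are illustrative.
class AsyncReportService {

    enum Status { QUEUED, RUNNING, DONE, FAILED }

    /** A queued request: an id the client can poll with, plus the report name. */
    static final class Request {
        final String id;
        final String reportName;
        Request(String id, String reportName) {
            this.id = id;
            this.reportName = reportName;
        }
    }

    // Bounded queue so a burst of requests cannot exhaust memory (capacity is an assumption).
    private final BlockingQueue<Request> queue = new ArrayBlockingQueue<>(100);
    private final Map<String, Status> statuses = new ConcurrentHashMap<>();
    private final ExecutorService workers;

    AsyncReportService(int workerCount) {
        workers = Executors.newFixedThreadPool(workerCount);
        for (int i = 0; i < workerCount; i++) {
            workers.execute(this::consume);
        }
    }

    /** Queues the request and returns immediately with an id for polling. */
    String submit(String reportName) {
        Request request = new Request(UUID.randomUUID().toString(), reportName);
        statuses.put(request.id, Status.QUEUED);
        if (!queue.offer(request)) {
            statuses.remove(request.id);
            throw new IllegalStateException("Report queue is full, try again later");
        }
        return request.id;
    }

    /** Polled by the client; returns null for an unknown id. */
    Status status(String id) {
        return statuses.get(id);
    }

    /** Worker loop: blocks on the queue until a request arrives or the thread is interrupted. */
    private void consume() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                Request request = queue.take();
                statuses.put(request.id, Status.RUNNING);
                try {
                    generate(request);
                    statuses.put(request.id, Status.DONE);
                } catch (RuntimeException e) {
                    // A real system would also notify support here (email, monitoring alert).
                    statuses.put(request.id, Status.FAILED);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the flag and let the worker exit
        }
    }

    /** Placeholder: query the data source and write the report or feed file. */
    private void generate(Request request) {
        // Intentionally empty in this sketch.
    }

    /** Interrupts the workers and waits briefly for them to stop. */
    void shutdown() throws InterruptedException {
        workers.shutdownNow();
        workers.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

A caller would invoke `submit("monthly-sales")`, hand the returned id to the browser, and have it call `status(id)` periodically until it sees `DONE` or `FAILED`. Note that the status map in this sketch grows without bound; a production version would expire old entries in line with the report retention policy.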
Secondly, an enterprise integration framework like Apache Camel can be used to create an asynchronous route. This framework was written to implement the Enterprise Integration Patterns (EIP).
[Diagram: high-level view of a possible solution using Apache Camel]
Apache Camel is awesome if you want to integrate several applications that use different protocols and technologies. The Spring Integration framework is another alternative. There are a number of tutorials on Apache Camel in this blog to get you started, as it is a very handy skill for solving business problems and convincing potential employers.
Finally, using an Enterprise Service Bus (ESB) like webMethods, TIBCO, Oracle Service Bus, Mule, etc. Mule is an open-source ESB. There are pros and cons to each approach. More on these topics can be found at:
- JMS versus AMQP, Enterprise Integration Patterns (EIP), and Spring Integration versus Apache Camel
- Java interview questions and answers on asynchronous processing
- Asynchronous processing with Apache Camel
Scenario-based questions are very popular with good interviewers, and it really pays to brush up on them. There are a number of different scenario-based questions and answers:
- Scenarios and solutions for better concurrency - Read and Write locks
- Scenarios and solutions for better concurrency - CountDownLatch and CyclicBarrier
- Scenarios and solutions for better concurrency - Semaphores and mutexes
- Scenario: handling concurrent modification in Java
- Scenario: Asynchronous processing examples