Ensuring IBM i Reliability, Availability, and Recoverability

Post author: Tom Squire
Post published: December 10, 2019
Post category: Blog

In my previous two blog posts I focused on some of the security concerns surrounding the Monetary Authority of Singapore’s (MAS) Technology Risk Management (TRM) Guidelines and its Notice on Cyber Hygiene. As the TRM Guidelines make clear, security is certainly not the only concern of MAS. Let me list a number of the headings from the June 2013 document to highlight my point:

- 6.3 Source Code Review
- 7.1 Change Management
- 7.2 Program Migration
- 7.5 Capacity Management
- 8. Systems Reliability, Availability, and Recoverability
- 8.2 Disaster Recovery Plan
- 8.4 Data Backup Management

Just to stress the all-encompassing focus of the concept of risk, and the reality that it is so much more than security alone, section 1.0.4 of the document reads as follows:

1.0.4 The Technology Risk Management Guidelines (the “Guidelines”) set out risk management principles and best practice standards to guide the FIs in the following:

1. Establishing a sound and robust technology risk management framework;
2. Strengthening system security, reliability, resiliency, and recoverability; and
3. Deploying strong authentication to protect customer data, transactions and systems.

I would therefore like to focus this admittedly rather short post on one of these non-security sections, namely Section 8, Systems Reliability, Availability, and Recoverability.

Ensuring the reliability of the IT systems in Singapore’s financial services sector is critical to customers’ confidence in the industry, and thus to confidence in Singapore’s efforts to maintain and grow its standing as a worldwide financial center. To quote the guidelines directly, “When critical systems fail, the disruptive impact on the FI’s operations or customers will usually be severe and widespread and the FI may suffer serious consequences to its reputation.” Hence it is imperative that banks, insurance companies, and other financial institutions have measures in place to ensure the continued reliability of their systems, along with stringent, thoroughly tested mechanisms to keep those systems available should something untoward happen to them.

Now, I suspect that most IBM i aficionados would agree that this platform has a very credible reputation when it comes to reliability and resiliency. That said, I also suspect that most such folks would not blindly bank 100% on that reputation and assume that nothing could ever go wrong with their IBM i systems. Hence backup machines, backup tapes, and sophisticated high availability and disaster recovery systems, not to mention the procedures that go with them, are critically important even for IBM i operators.

In order to maximise the availability of IBM i applications and data, it is necessary to put in place real-time, fault-tolerant, object-level replication, so that in the event of an outage you can bring a “warm” mirror of a clustered IBM i system into service within minutes. Such systems should provide efficient synchronization checking of file attributes, file contents, IFS attributes, object existence, data area contents, and output queues. They must also mirror changes, additions, and deletions to the replicated system automatically and in near real time, with no need for manual intervention, thus ensuring the highest fidelity between the production and secondary instances. While such systems can be quite an investment, if they are configured correctly some of the cost can be absorbed by offloading tasks such as running reports and queries, as well as ETL, EDI, and web tasks, to your secondary system without affecting primary system performance.
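To make the idea of synchronization checking a little more concrete, here is a minimal sketch in Python. It is not how any particular replication product works, and it does not call any IBM i API. The `ObjectSnapshot` class, the `compare_inventories` function, and the library and object names are all hypothetical, and the sketch simply assumes that an inventory of objects has already been collected from each system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectSnapshot:
    """Hypothetical record describing one replicated object."""
    object_type: str    # e.g. "*FILE", "*DTAARA", "*OUTQ"
    attributes: tuple   # whichever attributes the audit compares
    content_hash: str   # digest of the object's contents


def compare_inventories(primary, secondary):
    """Report objects that are missing, unexpected, or different on the secondary."""
    primary_names = set(primary)
    secondary_names = set(secondary)
    return {
        # Exists on production but not on the mirror (object existence check).
        "missing": sorted(primary_names - secondary_names),
        # Exists on the mirror only, e.g. a deletion that was never replicated.
        "unexpected": sorted(secondary_names - primary_names),
        # Exists on both, but type, attributes, or contents differ.
        "mismatched": sorted(
            name for name in primary_names & secondary_names
            if primary[name] != secondary[name]
        ),
    }


# Illustrative inventories; the names and hashes are made up.
primary = {
    "APPLIB/ORDERS": ObjectSnapshot("*FILE", ("PF", "CCSID 37"), "a1f3"),
    "APPLIB/RATES": ObjectSnapshot("*DTAARA", ("CHAR 50",), "77c0"),
    "APPLIB/INVOICES": ObjectSnapshot("*OUTQ", (), "0e9b"),
}
secondary = {
    "APPLIB/ORDERS": ObjectSnapshot("*FILE", ("PF", "CCSID 37"), "a1f3"),
    "APPLIB/RATES": ObjectSnapshot("*DTAARA", ("CHAR 50",), "5d21"),  # contents drifted
}

report = compare_inventories(primary, secondary)
print(report)
```

In this made-up example the output queue never reached the secondary system and the data area contents have drifted, which is exactly the kind of discrepancy you would want flagged and corrected long before a role swap is needed.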

Simply ensuring that your data is reliably backed up to a separate server is only a part of the solution to achieving business continuity on the IBM i. Equally critical is the ability to perform an effective role swap to the secondary instance in a timely manner so as not to bust your Recovery Time Objectives.
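As a simple illustration of what measuring a role swap against a Recovery Time Objective can look like, consider the sketch below. The function names, the timestamps, and the 15-minute RTO are assumptions made up for the example; your own objectives will come from your business continuity plan.

```python
from datetime import datetime, timedelta


def role_swap_duration(outage_detected, service_restored):
    """Elapsed time between detecting the outage and serving from the secondary."""
    if service_restored < outage_detected:
        raise ValueError("service cannot be restored before the outage is detected")
    return service_restored - outage_detected


def meets_rto(duration, rto):
    """True when the measured recovery time is within the objective."""
    return duration <= rto


# Hypothetical role swap drill: outage at 02:00, secondary in service at 02:12.
rto = timedelta(minutes=15)
drill = role_swap_duration(
    datetime(2019, 12, 10, 2, 0),
    datetime(2019, 12, 10, 2, 12),
)
print(drill, meets_rto(drill, rto))
```

Recording a figure like this for every role swap test gives you evidence that the objective is actually being met, rather than an assumption that it will be.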

Working with one of our business partners, Joule Tech is able to provide our customers with a solution that is specifically tailored to ensure reliable, easy-to-use, high availability and disaster recovery solutions for the IBM i platform. Our solution reduces downtime related to unexpected IBM i system interruptions, with real-time, fault-tolerant, object-level replication. In the event of an outage, companies can bring a “warm” mirror of a clustered IBM i system into service within minutes.

This solution ensures a high-availability environment by giving business applications concurrent access to both master and replicated data. This set-up allows you to perform planned maintenance tasks such as running reports and queries as well as ETL, EDI, and web tasks from your secondary system without affecting primary system performance. Additionally, proactive issue notifications and self-correction for replication problems help your team identify and address potential issues before they affect system performance. The administrative capabilities of the solution help staff easily configure and perform operations such as starting, ending and switching replication groups, saving time and removing some of the risk that can be involved with manual processes.

For more information on how Joule Tech can help meet your HA and DR requirements, probably in a more cost-effective manner, get in touch with us using the form below.
