System redundancy comes of age for credit unions

When's a good time to bring your Internet banking down for a routine software upgrade? How about halting audio response to move the system to a new server? And what about those unpredictable but inevitable outages, like a disk crash or CPU failure? It doesn't matter whether you answer as a consumer of financial services or as a credit union employee; the reply to each of those questions is going to be "never."
There's simply zero tolerance for downtime today – in large part because your members increasingly rely on remote service channels, like e-commerce, with a direct promise of 24/7/365 service. And most of those channels don't offer the kind of "stand-in" processing we're used to with ATM and credit/debit card networks.
Downtime can have a significant impact on member and staff morale, as well as your bottom line. The effects can include staff overtime to restore files, manually enter transactions, field phone calls, and respond to members' concerns and complaints. Each manual correction increases the chance of error, while the lack of a traditional paper trail for electronic services makes the re-creation process almost impossible. All the while, time and costs mount, and member and staff confidence erodes.
Whether scheduled or unexpected, brief or extended, downtime isn't an acceptable occurrence anymore. That demands new strategies for increasing system uptime and reliability beyond the traditional "disaster recovery" solutions, which is the goal of system redundancy.
Historically, credit unions have viewed system redundancy as the domain of the largest institutions and networks (think $$$). But technology advances are making system redundancy an affordable goal for many credit unions. There's a wide spectrum of possibilities, each with a corresponding level of protection, effort and cost. The following are a few options that any credit union should be investigating.
It's noontime on a Friday and every delivery channel is operating at full tilt. Suddenly, your main system experiences a disk failure, corrupting your main database. In a case like this, the practice of journaling can help you restore your data and resume operations quickly.
Journaling is a technique for capturing every database update to a backup file. Much like running a tape recorder, journaling tracks every change made to your database as it happens, enabling you to recreate those changes – and avoid manually re-keying transactions. The journal is normally written to a different disk drive than the database it is "recording," then backed up daily and stored off site. If the database becomes corrupted due to a hardware or software failure, journaling allows the IT staff to restore from an earlier backup, then reproduce the intervening activity up to the exact point of the failure – all automatically. It's one of the easiest and most basic steps toward a redundant system, so if your credit union isn't using it or looking into it, it's time to get started.
Even with journaling, the process of restoring and recreating the database from backups and journal files can take some time. While you're frantically trying to correct the problem on your main system, it would be ideal if you could simply move your transaction processing over to another system and continue serving members. That's the purpose of database replication.
Database replication enables you to replicate your entire database to a second hardware "box" – one you can turn to if your primary hardware is unusable. The data replication process takes place in real time, giving you a second, up-to-the-minute database ready for use. While it may take a brief period to move processing to the secondary system, the potential interruption in service is far less than might occur if members had to wait for the primary system to be restored.
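To make the journaling idea described earlier concrete, here is a minimal Python sketch of restoring from a backup and replaying the journal. Everything in it is an illustrative assumption rather than any vendor's actual mechanism: the class name JournaledAccounts, the recover function and the in-memory journal list are hypothetical, and a real system would write the journal to a separate disk instead of holding it in memory.

```python
import copy


class JournaledAccounts:
    """Toy in-memory 'database' of balances (in cents) with a journal."""

    def __init__(self):
        self.balances = {}
        # Assumption: a real journal lives on a different disk drive
        # than the database; a list stands in for it here.
        self.journal = []

    def apply(self, account, amount):
        # Record the change in the journal first, then update the database.
        self.journal.append((account, amount))
        self.balances[account] = self.balances.get(account, 0) + amount

    def backup(self):
        # A backup is a copy of the data plus the journal position
        # at the moment the copy was taken.
        return copy.deepcopy(self.balances), len(self.journal)


def recover(backup, journal):
    """Restore from a backup, then replay every journal entry made after it."""
    balances, position = backup
    restored = dict(balances)
    for account, amount in journal[position:]:
        restored[account] = restored.get(account, 0) + amount
    return restored


if __name__ == "__main__":
    db = JournaledAccounts()
    db.apply("member-1", 10000)       # deposit before the nightly backup
    nightly = db.backup()
    db.apply("member-1", -3000)       # activity after the backup
    db.apply("member-2", 5000)
    db.balances = {}                  # simulate a corrupted database
    print(recover(nightly, db.journal))
```

The point of the sketch is that nothing entered after the backup has to be re-keyed by hand: the journal alone is enough to bring the restored copy forward to the moment of failure.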
Database replication can also provide a mechanism to continue critical processing, such as e-commerce services, by using the secondary system to stand in temporarily while the primary system is being maintained or serviced.
Further along the spectrum is the process of clustering. Like database replication, it involves the use of a secondary system. The difference is that the two systems operate in a clustered environment in which each has the ability to "fail-over" to the other – making the transition instantaneous and virtually transparent to members and staff. In simple terms, systems A and B share the transaction processing load within a cluster. If system A goes down, system B automatically takes over. Once system A is back up, the processing load is again shared between the two.
Beyond providing seamless recovery from unexpected outages, clustering opens up tremendous possibilities for scheduled downtime – those inevitable times when you need to perform routine maintenance or implement a hardware or software upgrade. The more complex technology becomes, the more time these tasks take and the less likely they can be handled solely off-hours (which is becoming an obsolete term anyway). Clustering instead allows you to perform a "rolling upgrade" – upgrading each machine in the cluster in turn while keeping the system up and running and available for members.
No matter which system redundancy initiatives you explore, remember that no option will be effective without the infrastructure and processes to support it. A low-speed communications link between your branches won't allow you to place your primary and replicated databases in two different locations, while the lack of comprehensive backups and off-site storage policies can thwart even the best-laid plans. As with any technology effort, be sure the organization is prepared to invest the capital and resources required for the system redundancy efforts you choose to implement.
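The fail-over behavior of a cluster described above (systems A and B sharing the load, with one taking over when the other goes down) can be illustrated with a small Python sketch. The Node and Cluster classes are hypothetical names invented for this example, and the simple round-robin routing is an assumption; real clustering products detect failures and balance load in their own ways.

```python
class Node:
    """One system in the cluster; 'up' is toggled to simulate an outage."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.handled = []

    def process(self, txn):
        if not self.up:
            raise RuntimeError(f"{self.name} is down")
        self.handled.append(txn)
        return self.name


class Cluster:
    """Shares transactions across nodes and skips any node that is down."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._next = 0  # round-robin position

    def submit(self, txn):
        count = len(self.nodes)
        # Try each node at most once, starting at the round-robin position.
        for offset in range(count):
            index = (self._next + offset) % count
            node = self.nodes[index]
            if node.up:
                self._next = (index + 1) % count
                return node.process(txn)
        raise RuntimeError("no nodes available")


if __name__ == "__main__":
    a, b = Node("A"), Node("B")
    cluster = Cluster([a, b])
    print(cluster.submit("t1"), cluster.submit("t2"))  # load is shared
    a.up = False                                       # A fails or is being upgraded
    print(cluster.submit("t3"), cluster.submit("t4"))  # B takes over
    a.up = True                                        # A returns to service
    print(cluster.submit("t5"), cluster.submit("t6"))  # load is shared again
```

A rolling upgrade follows the same pattern: take one node out of service, upgrade it, bring it back, then repeat for the next node, while the cluster as a whole keeps accepting transactions.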
Keep in mind, too, that system redundancy typically addresses "predictable" system outages. Recent events have reminded us of the equally important need for traditional disaster preparedness. A major disaster will likely trigger a need for additional staff and equipment, highly trained technical assistance, even replacement of major components of your facilities and systems. Only a full-scale disaster recovery strategy will allow you to successfully survive such a situation, whether man-made or natural.
Combating downtime remains a two-fold challenge: to avoid the outages you can and to minimize the effects of those you can't. Today's range of system redundancy options enables every credit union to begin tackling both.
