Challenges and Best Practices for Financial Infrastructure

Introduction

IT infrastructure for financial institutions is becoming increasingly complex.  Modern banking infrastructure combines mainframe applications, monolithic applications based on Service Oriented Architecture, relational databases, and big data platforms.  This complexity has led to “brittleness”.  Under certain conditions, a failure in a subsystem of the infrastructure can lead to catastrophic failure of an entire business function such as clearing and settlement (see below).

BNY Lost Payments Capability for 19 Hours

In addition to lacking resiliency, these systems lack robustness.  Robustness means that services degrade gracefully as subsystems become unavailable, in a manner consistent with business needs.  Finally, disaster recovery mechanisms must be in place that allow full capabilities to be restored in a timely manner after subsystems fail.

The key question is: What best practices can be employed to ensure greater resiliency and robustness for IT infrastructure?

Problem Description

System is Heterogeneous

Financial institutions today use a combination of mainframes, monolithic applications, relational databases, microservices, and big data platforms.  Communication between these subsystems occurs over a variety of channels.  These include mainframe communication fabrics, enterprise service buses, lightweight message queues, and integration middleware.  Each of these has different ways of storing and transmitting state changes, transforming data, maintaining consistency, and queuing transactions and messages.

This makes it difficult to debug and restore systems when there are problems.  There may be no central repository for debug information.  Subsystems may have unknown dependencies.  Messages and transactions lost in-flight may not be recoverable in the event of system failure.

Legacy Systems

Banks and other institutions employ a wide range of infrastructure, some of it legacy in nature.  Mainframes, in particular, have maintained backwards compatibility from generation to generation.  This implies that some code may still be used in production after having been written fifty years ago!  This code may be poorly documented, as well as difficult and expensive to re-engineer.

Centralized Databases

Many different applications and services may all store and retrieve state information from a single centralized relational database.  This creates a single point of failure and tight coupling between subsystems.  The use of centralized databases is often mandated to reduce licensing and administration costs.  However, in the era of FOSS (free and open source software), this restriction is no longer necessary.

Architectures Not Aligned to Business

The SOA, or Service-Oriented Architecture, decomposes enterprise software into business processes, services, service components, and operational systems.  SOA architecture was designed to maximize re-use of software and hardware components.  This design was driven by a desire to minimize software licensing costs (such as those for commercial relational databases and operating systems), and maximize hardware utilization.  However, this created some undesirable consequences.  If a business requirement changed, it would impact a large number of layers and components.  The architecture makes it difficult to optimize components for each business, since they are shared.  There are also problems caused by a misalignment of ownership between lines of business and projects for creating and maintaining various services and components.

Best Practices

Centralized Logging

All services and subsystems should subscribe to a central logging facility for debugging and monitoring purposes.  This makes information available in a central location for analysis.  Modern logging platforms allow for streaming and batch processing of data, and extensive analytics to be performed on log data across data sources.
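As a minimal sketch of the idea, every subsystem below attaches the same handler, which forwards structured records to one shared store. The `CentralHandler` class, the subsystem names, and the in-memory `central_sink` are hypothetical stand-ins for a real log aggregation platform:

```python
import json
import logging

# Hypothetical central sink: in production this would be a log
# aggregation platform; here an in-memory list stands in.
central_sink = []

class CentralHandler(logging.Handler):
    """Forward every record from any subsystem to one shared store."""
    def emit(self, record):
        central_sink.append(json.dumps({
            "subsystem": record.name,
            "level": record.levelname,
            "message": record.getMessage(),
        }))

# Each subsystem keeps its own logger but shares the central handler.
handler = CentralHandler()
for subsystem in ("payments", "clearing", "settlement"):
    log = logging.getLogger(subsystem)
    log.addHandler(handler)
    log.setLevel(logging.INFO)

logging.getLogger("payments").info("wire transfer accepted")
logging.getLogger("clearing").warning("queue depth high")

# All subsystems' events are now searchable in one place.
entries = [json.loads(e) for e in central_sink]
```

Because the records are structured (JSON rather than free text), the analytics described above can filter and aggregate across data sources without brittle string parsing.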

Correlation IDs

Correlation IDs are identifiers that are passed between processes, programs, and subsystems in order to trace dependencies in the system.  This design pattern is particularly important in microservices architectures where a business activity may be carried out by hundreds of microservices, and other applications, acting in concert.  Centralized logs can be searched for specific correlation IDs to debug specific errors, and diagnose overall system behavior.
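The pattern can be sketched as follows; the service names and the in-memory `log_store` are invented for the example, standing in for real services and the centralized log described above:

```python
import uuid

log_store = []  # stand-in for a centralized log

def log(correlation_id, service, message):
    log_store.append({"cid": correlation_id, "service": service, "msg": message})

def risk_check(cid, amount):
    # The correlation ID arrives as a parameter and is attached to every log line.
    log(cid, "risk", f"checking amount {amount}")
    return amount < 10_000

def settle(cid, amount):
    log(cid, "settlement", "settling payment")

def process_payment(amount):
    # One ID is minted at the system boundary and threaded through
    # every downstream call.
    cid = str(uuid.uuid4())
    log(cid, "gateway", "payment received")
    if risk_check(cid, amount):
        settle(cid, amount)
    return cid

cid = process_payment(500)
# Searching the central log for this ID reconstructs the full call chain.
trace = [e for e in log_store if e["cid"] == cid]
```

In a distributed deployment the ID would travel in a message header or HTTP header rather than a function argument, but the principle is the same: one identifier per business activity, present in every log entry it generates.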

Bounded Context

As mentioned above, using centralized databases to store state information can create a single point of failure in the system.  The microservices architecture dictates that context be bounded to each microservice.  This means that each microservice is responsible for maintaining its own state.  This shifts responsibility from a centralized, shared DBA team to the team delivering the microservices themselves.  Bounded contexts reduce coupling between services, making systemic failures less likely.
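A minimal illustration of the idea: each service below owns its own private store instead of reading a shared database, and state crosses the boundary only through explicit calls or events. The service names and methods are invented for the example:

```python
class AccountsService:
    """Owns its own state; no other service touches this store."""
    def __init__(self):
        self._balances = {}  # private datastore for this bounded context

    def open_account(self, account_id):
        self._balances[account_id] = 0

    def deposit(self, account_id, amount):
        self._balances[account_id] += amount

    def balance(self, account_id):
        return self._balances[account_id]

class NotificationsService:
    """Keeps only the state it needs, received via events rather than
    read out of a shared database."""
    def __init__(self):
        self.sent = []

    def on_deposit(self, account_id, amount):
        self.sent.append(f"deposit of {amount} to {account_id}")

accounts = AccountsService()
notifications = NotificationsService()

accounts.open_account("acct-1")
accounts.deposit("acct-1", 100)
notifications.on_deposit("acct-1", 100)
```

If `AccountsService` fails, `NotificationsService` still holds everything it needs to operate; there is no shared schema for a single outage to take down.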

Employ Domain Driven Design

As mentioned above, one of the main weaknesses of the SOA was the difficulty of adapting services to the needs of specific businesses.  Adopters of microservices architectures are attempting to change that by making domain-driven design (DDD) a core practice.  DDD should be used to determine how best to partition services along business lines.  In addition, DDD can drive the definition of what behavior should be exhibited in the event of subsystem failure or performance degradation.  For instance, if an AML (anti-money laundering) service fails to respond, perhaps a manual approval user interface should be presented to administrators.  It is important to keep in mind that failures can be partial, can cascade to other applications and services, and may only show up when a service is interacting with other parts of the system.  Resiliency and disaster-recovery requirements cannot come purely from a technical understanding of the system; they must be driven by business requirements from the domain.
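The AML fallback described above can be sketched as follows. The function names and the `manual_review_queue` are hypothetical, and a production system would add timeouts, retries, and circuit breakers around the remote call:

```python
manual_review_queue = []  # hypothetical queue backing an admin approval UI

def aml_screen(payment):
    # Stand-in for a call to a real anti-money-laundering service;
    # here it simulates the service being down.
    raise TimeoutError("AML service unavailable")

def submit_payment(payment):
    """Degrade gracefully: an AML outage routes payments to manual
    review instead of failing the whole payments function."""
    try:
        approved = aml_screen(payment)
    except TimeoutError:
        # The business domain, not the technology, dictates this fallback.
        manual_review_queue.append(payment)
        return "pending-manual-review"
    return "approved" if approved else "rejected"

status = submit_payment({"id": "pmt-1", "amount": 2500})
```

The key point is that the fallback behavior ("queue for a human") is a business decision captured in the domain model, not an accident of whichever exception happens to propagate.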

Reengineer Legacy Subsystems as Appropriate

Legacy code and systems are often portrayed as the immovable object of IT.  Rather than assuming that legacy code cannot be changed or replaced, changes should be prioritized based on business requirements.  Legacy programs may have static routing, inadequate logging, or bugs that can put business continuity at risk.  If a legacy program shows any of these weaknesses, and fixing it is a high priority given business considerations, it may be warranted to migrate it to a new architecture or fix the bugs in the current program.


OPScore Whitepaper

This is a technical whitepaper I authored while at Ubicom. In addition to writing the paper, I designed the benchmark, ran the tests, and did the graphic design and layout. Tools used included Ixia IxChariot, MS Excel, and MS Word.


Scrum for Software Globalization

Diagram of Software Globalization using Scrum
Are you learning about Agile software methods, but aren’t sure how to apply them to global software? This article explains how to do globalization in an organization using Scrum as a methodology.

The primary activities of software globalization are internationalization, localization, and testing.  We will describe how each activity maps to the Scrum process.

Software Internationalization (I18n)

Software internationalization is the process of architecting and writing software so it will function properly in multiple countries.  It involves designing software in a modular fashion so new countries can be easily supported by swapping out language packs and software libraries, rather than rewriting lots of code.  This is a primary engineering task, and is well-suited to being done with Scrum.  Requirements from international customers tend to change rapidly, so using Scrum to address these requirements in an agile fashion is a great idea.
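As a rough illustration of the language-pack approach, the dictionary lookup below stands in for a real resource-bundle or gettext mechanism. The pack contents and the `translate` helper are invented for the example; the point is that supporting a new country means adding a pack, not rewriting code:

```python
# Hypothetical language packs, keyed by locale.
LANGUAGE_PACKS = {
    "en": {"greeting": "Hello", "farewell": "Goodbye"},
    "de": {"greeting": "Hallo", "farewell": "Auf Wiedersehen"},
}

def translate(locale, key):
    # Fall back to English when a locale or string is missing,
    # so an incomplete pack degrades gracefully instead of crashing.
    pack = LANGUAGE_PACKS.get(locale, LANGUAGE_PACKS["en"])
    return pack.get(key, LANGUAGE_PACKS["en"][key])

greeting_de = translate("de", "greeting")
greeting_fr = translate("fr", "greeting")  # no French pack yet: English fallback
```

Because application code only ever calls `translate`, localization teams can deliver new packs on their own sprint cadence without touching engineering's codebase.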

Functional Testing

This is functional testing that is specific to the internationalization process.  This verifies that generic features can be used by international customers, and also verifies features that are specific to certain international market segments.


Globalization Process

This is a diagram describing the software globalization process. This was developed for Aeontera, Inc. Developed in Adobe InDesign CS4. Click to download a PDF version.


Wireless QoS Video

Video promoting Ubicom’s QoS technology for streaming media over a wireless network. I produced this using Adobe Premiere, Photoshop, and Cubase SX3. It even includes an original music track!


3G Subscriber Data

Data on 3G subscribers is from Mary Meeker’s ‘Internet Trends 2010’ presentation from Morgan Stanley. Created using Tableau Public visualization software.

What can we learn from these charts?  First, let’s look at ARPU growth.  There is broad downward pressure on ARPU across the industry.  The big players, however, are holding their own at around 0% change in ARPU YoY.  The highest ARPU is concentrated among US and European service providers.  Note that AT&T’s wireline business is listed separately from the wireless business.  The greatest drop in ARPU last year was felt by the smaller regional players in Asia and India.

Now let’s look at market cap for service providers versus blended ARPU and number of subscribers.  It is interesting to see that firms with the highest market cap are placed along a line that maximizes either ARPU or number of subscribers.  The small players in the previous chart likewise show up with low market cap, ARPU, and subscribers.  They clearly have a long way to go to reach the profitable horizon of the big players.


Global Product Management Begins at Home

Why is this important? Adapting a product for international markets requires checking your assumptions about the current product definition. If you know why you are doing something today for your current market, it will be easier to check whether that will still be true in the new market. This way, the internationalization team will be able to adapt the existing product to the new market in a systematic way. Having a process for internationalizing a product saves both time and cost.

Segmentation

How do you define your current market segments? How do you group customers? By industry, sector, geography, job title, age? What are the unique challenges faced by each segment?

Use Cases

A use case is a specific way that customers get value from your product. Why do your current customers use your product? What problems are they trying to solve? Key use cases should be fully documented, including steps the customer takes to complete the use case. Many use cases are specific to a particular segment.


The Science of Presentations


17 Design Principles for Presenters



The Power of Rigorous Thinking

There are only two fields where it is legitimate to prove that something is true: law and mathematics. True scientific fields can legitimately prove that a categorical statement is not true, but should never attempt to prove a universal positive statement.

Nassim Nicholas Taleb discusses this at great length in his new book The Black Swan.

What is the point of science if it cannot be used to prove things? In The Structure of Scientific Revolutions, Thomas Kuhn argues that the entire concepts of proof and progress are problematic.

What is the point of thinking of things if we cannot prove that our ideas are true? Because ideas are useful. Science seeks not to prove things, but rather to build useful models. Models, such as the idea of the atom, are useful because they correctly predict observations. As we adopt new models and cast aside our old ones, the scope of observations we can predict increases. What matters is not the individual conclusions, but rather the method.

This is also true in the world of business. The received knowledge of market segments, product strategies, business models, etc. can be limiting. If we apply some rigor to the problem, we may be able to tease out insights that were not obvious before.

If we ignore our current assumptions and ask questions like:

  • Why do we group customers together the way we do currently?
  • Are there profitable segments hidden inside of submarkets or segments we have been serving more generically?
  • Could a particular product offering be split or combined with other offerings to better address needs?
  • Is there a different distribution method that may be better suited to a submarket, promoting it to a full segment?