A Credit Risk Data Warehouse


A Top 10 (by assets) internationally active US bank


With advances in portfolio risk and profitability modeling, and the need for consolidated regulatory reporting, large financial institutions have increasingly turned to enterprise data warehouses to satisfy their requirements for data analysis and reporting.  Data warehouses provide numerous benefits in these scenarios: conformance of multiple sources of data, consistency in data modeling, and data transformation and calculation algorithms that provide data for analytic consumption.  In short, a “single source of the truth” no matter the reporting view or end user.

Our client faced the challenge of a diversified business model, with risk-producing lines of business in retail, commercial credit, global markets, securities processing, custody, asset management, and treasury management, yet no consolidated data environment in which to analyze their risks and report their positions.  They collected credit data from a subset of systems and supplemented with manual data feeds, introducing data quality problems.  They had never undertaken an initiative to comprehensively identify all of their risk and financial reporting data. With Basel II compliance looming on the horizon and divergent internal architectures for economic capital and profitability reporting, they acknowledged the value of data consolidation, and engaged eBIS to help engineer a solution.

It was clear the bank needed to upgrade its data collection and management architecture.  Our engineering challenges were many:

  1. Design a conformed dimensional data model to integrate over 45 source product systems, representing the 7 lines of business producing risk for the bank
  2. Determine the requisite data points to satisfy regulatory, corporate finance, and portfolio risk management reporting
  3. Reconcile source product system data to the general ledger, such that reportable data at a granular level reflects the bank’s official books and records
  4. Design an approach to identify and exclude transactions between closely held entities
  5. Accommodate the effects of merger and acquisition activity seamlessly
  6. Create a solution for retail modeling, pooling similar product exposures and assigning calculated credit attributes
  7. Devise a modeling approach for the Basel II concept of repo-style portfolios: pools of exposures that are collateralized and re-margined on a daily basis
  8. Create a consistent definition of credit exposure across the enterprise to feed both regulatory and economic capital calculations
  9. Integrate with external systems to calculate: a) value at risk for portfolios margined daily; b) a probability of default (PD) for every commercial customer; c) a loss given default (LGD) for each credit facility; d) economic credit risk capital and e) loan reserve
  10. Devise a process scheduling approach to manage technical execution across technologies and platforms
  11. Establish expectations with providing data systems and enable a process to identify and remediate source system data errors

Notwithstanding the immediate modeling challenges, our underlying objective was to create a scalable environment comprehensive of reportable risk and profitability data points from all lines of business, with the flexibility to absorb changing analytic requirements through time.  If banking regulations evolve or new portfolio modeling techniques emerge, our client would have the tool to not just adapt, but react quickly: an intelligently crafted enterprise data warehouse.


The bank had an existing profitability environment, with data input limited to two product systems, that provided a flexible development platform and established back end infrastructure.  With an eye toward the long-term benefits of integrating risk and profitability data, and potentially creating an enterprise warehouse environment that could eventually serve all data consumers, we supported the idea of leveraging the profitability environment for their initiative.

In contemplating design for such a large construction effort, we concentrated on developing ideas around each of the initiative’s architectural challenges.

  1. Data Model: Identify the complete list of exposure types, such as letters of credit and derivatives, and, where possible, construct a single data entity per exposure type, modeling discrete data attributes.  Conform data from multiple sources into each data entity, creating a consistent functional representation.  Capture data at the lowest possible grain, facilitating detailed analysis where necessary with the potential for aggregations and summary reporting. Create a robust dimensional model that includes the dimensions of time, customer, department, organization unit, asset category and product. Design a multivariate time dimension for measures that would allow for analysis of three factors: the date of valuation, the date of data extraction in source, and the date of data processing in our warehouse.   Of particular importance for our primary audience of credit risk and portfolio managers were the credit attributes of any risk position.  We generated the idea of using a bespoke product dimension to model these attributes, discretely identifying the maturity band, priority of claim, business context, form of credit extension, and mitigation offsets associated to each risk position.
  2. Secondary Users: Interview and request documentation from regulatory reporting, financial accounting, and profitability managers outlining their data needs.  Expand the data model and collect data points from feeding systems to meet their requirements.
  3. General Ledger Reconciliation: Create a custom program to aggregate granular positions at a level consistent with the dimensional reporting of the corporate general ledger: department, organization unit, account, and currency.  Use a robust time dimension to ensure that, in the case of revaluation, back valuation, or lagged data, the most current data is used for analysis. Where net positions are relevant, such as with deposits and derivatives, create a process that replicates the netting logic, and reconcile the netted balance to gl.  Check for balance differences between the granular and ledger levels.  Where differences exist, represent them at the granular level such that, when summed, the granular amount equals the ledger amount.  As such, any granular analysis of risk would tie to the bank’s official books and records.  Build associations between the balance differences and the system(s) that provided the granular data.  Identify problem systems and use the reconciliation output as a tool to improve source inputs.
  4. Closely Held Entities: At a granular data level, build a relationship between the dimensions of customer and department.  A customer attribute could identify subsidiaries and affiliates, which could then be associated to the department that originated the position with the closely held entity.  If both parties rolled up to the same legal entity, exclude the position from any analysis.  For those positions with customers that represent subsidiaries or certain affiliates in a different legal entity, flag them for analysis depending on the reporting need.
  5. Mergers and Acquisitions:  Use a generic, surrogate key for all data entities, including dimensions.  If the bank were to merge with another company, the additive data could flow in seamlessly.  For existing data that then transfers to an acquired system, the surrogate assignment process could identify the old data in the new system, preserving its original identifier.
  6. Retail Modeling: Using empirical studies of the portfolio of retail exposures, members of the bank’s portfolio management division identified multiple strata of data to which probability of default (PD) and loss given default (LGD) ratings and usage given default (UGD) percentages could be assigned.  Using a combination of product identifiers (for PD, LGD) and funded status (for UGD), we were able to translate the functional study into a mechanism for rule maintenance.  Each retail position would receive product tags, and periodic analysis of retail data within the data warehouse would provide the means to update the credit ratings and percentages.  A workflow process would allow expert judgment override capability, providing a flexible tool to reflect both evidence and discretion, as well as a decision trail for regulatory review.
  7. Repo-Style Modeling: Design an application that would pool like sets of exposure by business line and associate collateral to each pool, ensuring that securities on either side of the transaction are clearly identified.  Once the pools of exposure and mitigation were defined, they could be sent to an external vendor for simulation of stressed downside price movements in the form of value at risk (VaR).  The VaR calculation could return to the data warehouse for inclusion in an exposure at default (EAD) balance for calculation of regulatory and economic capital downstream.
  8. Credit Exposure Definition: With analysis of exposure occurring at varying levels across systems and business lines, we recommended the creation of a central data mart to consistently define a unit of exposure.  The data mart would build a relationship between the lowest grain of exposure, instrument, and the modeled grain of exposure, exposure set, establishing either a one-to-one or many-to-one relationship.  Perhaps most importantly, the data mart would include logic to ensure every defined exposure, including on and off balance sheet positions, had a PD and LGD rating assigned to it, and would calculate a remaining maturity, derive a rule-based modeled maturity, and calculate EAD.  It would define the universe of exposure types and tag each exposure, creating the first ever enterprise classification for reporting and analysis. The effects of guarantees received would be estimated through a double-default algorithm, and guarantor exposure would be included for economic capital and customer concentration analysis.  For instances where regulatory and economic capital balance requirements diverge, model EAD in both ways as separate balances.  We recommended the inclusion of secondary balances, such as accruals, receivables and assessments, in the definition of a loan equivalent exposure, which would avail them to preferable capital treatment.  In one central place, the bank could define exposure and all its associated attributes and balances, and deliver it consistently to regulatory and economic calculation engines.
  9. Mitigation Definition: EAD and VaR calculations and LGD ratings all rely on mitigation information to derive their values.  We recommended the creation of a consolidated mitigation model, defining the exposure, type and magnitude of mitigation, and the context in which it could be used, such as jurisdiction, governing law and permissions.  The mitigation model would cover collateral, 3rd party guarantees and indemnities, credit derivatives, participations, syndications, and netting.
  10. External Integration: Create a flexible inbound and outbound data transfer platform with firewall security protection.  Define inbound and outbound layers of data within the warehouse: the inbound layer representing the data format of the sending system and the outbound layer representing the data format required by the receiving system, pre-processed into a data mart with analytic cacluations as required.  Both would include validation logic to cleanse data before it entered or left the warehouse.  In this way, the data acquisition paradigm could be blind to the system providing data.  Whether it is an internal legacy system or an external analytic calculation vendor, the treatment is the same: acquire the data in its native format and then validate and transform it into data warehouse dimensional standards.  Finally, build with tools flexible enough to adjust to messaging and a service-oriented architecture when the bank becomes ready for that strategic infrastructure change.
  11. Process Scheduling: A data warehouse normally employs a multitude of technologies to manage data extraction, transfer, load, analytic processing, replication and reporting.  Sequencing of process execution is of high importance, and often crosses technologies. The project team needed to formulate an approach for managing the amalgam of technologies, with the ability to trigger process execution and scale to the level of thousands of daily jobs with varying frequencies, controlled through a central software program.  We recommended investigating, in order of priority: 1) a corporate standard scheduler used in a similar capacity elsewhere at the bank; 2) a best of breed scheduling tool; 3) a custom built interface, using metadata and APIs.  Flexibility to integrate new technologies in the future should be a top priority.
  12. Source System Management: A data warehouse is only as good as the data it receives.  In that vein, it’s vitally important to establish agreements with providing systems as to format, content, timing and frequency of data delivery.  We recommended formalizing this relationship through Service Level Agreements (SLAs), which would document these data expectations and act as a reference point for adherence.  In addition, lines of communication, supported by data reports from the warehouse, would need to be established with source system owners for data quality remediation.  As a best practice, we stressed the importance of identifying data problems in a central warehouse, but managing them in providing systems.  The quality of today’s data is far less important than confidence in the quality of future data; only remedies at the source could ensure that confidence.

Our ideas provided guidance on best practices in data warehousing and functional modeling.  Taken together, they established a foundation on which to build a robust solution.


Acting in strategic advisory, architecture design and implementation roles, we staffed an eBIS team of 15 financial services professionals, part of a project team of over 75 client and consulting resources. Collaborating with a Big 4 firm to construct the solution, we worked together to translate a package of good ideas into a robust data environment.

The solution embraced all of the ideas outlined: a dimensional data environment addressing the needs of risk, finance, profitability, and regulatory users that ties to the bank’s reported ledger and calculates credit risk measures for Basel and economic reporting. The data warehouse, sized to almost 3 terabytes, orchestrates 1,700+ jobs from close to 50 providing and numerous receiving systems on a daily basis through a dependency-based scheduler.

The solution embodies numerous data warehouse best practices, ones that we documented as project standards, providing an architecture flexible enough to accept new systems and modeling requirements as they emerge. It handles identification and reprocessing of dirty data in a way that preserves the iterative improvements in data through time, while maintaining the locus of responsibility for data quality in originating systems.

Our contribution also included training sessions on the solution architecture for all interested parties within the bank. Our presentations focused on identifying business interpretations of the information captured in the warehouse, so that secondary users, outside of credit portfolio management, in the profitability, regulatory, and finance sectors, could recognize the potential benefits for their areas.


Our initiative was mandated out of a need for a comprehensive credit risk environment to meet regulatory and economic capital reporting requirements. The client committed significant time, monetary and human capital to that end through a project that spanned seven years. The result met the need: a comprehensive financial warehouse from which other reporting tools could extract data for Basel, Federal Reserve, economic credit risk capital and corporate profitability calculations and FFIEC reports.

As Basel III, trading book liquidity, and stress testing requirements unfold, and U.S. regulators set the parameters for adherence to the Dodd-Frank Act, our client has a flexible, scalable data environment in which to adapt. The solution was the first at the bank to consolidate all risk systems and dimensional data in a central warehouse, providing a substantive analytic and reporting tool at a critical time in financial services regulation.

In addition, the solution acts as the important first step on the path to creating a true enterprise data warehouse, from which all bank employees consume analytic data for custom analysis and standardized internal and regulatory reporting. We helped create a data environment that enables the “one source of the truth” that enterprises yearn for. A customer, department, and product are viewed consistently, regardless of the report. A unit of risk has a definition every user can understand. Regulatory, risk, finance, and profitability users can refer to the same intersection of data and get the same result, outcomes that are not easily quantifiable, but unquestionably valuable.

The mandate that set our initiative in motion was clear, but the unintended benefits resulted from a cadre of good ideas and the steadfast belief in designing solutions that can adjust to changing needs.