24 Sep 2020
Christoph Nadig

What is synthesized data?

When they hear the word synthesizer, most people think of electronic music rather than digital data. But synthesizing a database is exactly what banks should have in mind when it comes to accelerating their software development and testing.

Done the right way, a synthesized database provides an authentic, production-like environment that all internal and external developers can access without any compliance constraints. This helps speed up development processes and provides the freedom to involve partners at will, a capability that is increasingly needed in an open banking environment or ecosystem. In this blog post, we explain what synthesized, or synthetic, data is and why the financial industry is well placed to profit from this concept.

What is synthesized data?

A synthesized database provides an environment in which to develop and test new software solutions. It is a copy of a bank’s real data that is free of any client-identifying data (CID) and can therefore be accessed internally and externally without any compliance constraints. While all CID is removed in the process, the structures within the database (constellations) are preserved in the synthesized image. The result is a production-like, authentic environment that looks and behaves the same as the original database. Its quality is superior to that of the currently available alternatives, such as simple anonymization or artificial databases.
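To make the idea of removing CID while preserving constellations concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the way the Avaloq Data Synthesizer works: the toy tables, the field names and the helpers `synthesize` and `constellation` are all hypothetical.

```python
import random

# Hypothetical toy "production" data. The names and ids stand in for CID;
# the links between clients and accounts are the structure to preserve.
clients = [
    {"id": 1, "name": "Anna Muster"},
    {"id": 2, "name": "Peter Beispiel"},
]
accounts = [
    {"client_id": 1, "type": "custody", "currency": "CHF"},
    {"client_id": 1, "type": "current", "currency": "EUR"},
    {"client_id": 2, "type": "current", "currency": "CHF"},
]


def synthesize(clients, accounts, seed=0):
    """Replace identifying values with random ones, keep the relationships."""
    rng = random.Random(seed)
    # Draw fresh, unique ids so the original ids do not survive either.
    new_ids = rng.sample(range(100_000, 1_000_000), len(clients))
    id_map = {c["id"]: new_id for c, new_id in zip(clients, new_ids)}
    synth_clients = []
    for c in clients:
        new_id = id_map[c["id"]]
        synth_clients.append({"id": new_id, "name": f"Person {new_id}"})
    # Accounts keep their non-identifying fields and point at the new ids.
    synth_accounts = [{**a, "client_id": id_map[a["client_id"]]} for a in accounts]
    return synth_clients, synth_accounts


def constellation(accounts):
    """Describe which account combinations exist, ignoring who owns them."""
    per_client = {}
    for a in accounts:
        per_client.setdefault(a["client_id"], []).append((a["type"], a["currency"]))
    return sorted(tuple(sorted(v)) for v in per_client.values())


synth_clients, synth_accounts = synthesize(clients, accounts)
# Same account combinations per client, but none of the original names remain.
print(constellation(accounts) == constellation(synth_accounts))
```

The sketch only shows the principle: software tested against the synthesized tables meets the same combinations of accounts per client as in the original, while the identifying values are gone.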

What are the benefits of synthesized data for a financial institution?

For a bank, it used to be best practice to develop and test new software on a copy of its production data. While the authentic data environment allows the creation of reliable solutions and leads to credible results in testing, the usage of production data comes at a cost. Compliance significantly restricts access to CID. This limits the involvement of internal developers and, to an even greater extent, external suppliers. The existing solutions for this challenge, namely artificial or anonymized databases, help to resolve the compliance issue. But artificial databases are typically small and of limited variety and complexity, so they reflect the production data incompletely while still requiring high maintenance efforts. Anonymized instances may simulate the entire original database but, as an exact copy without CID, still expose statistics about a bank.

The usage of a data synthesizer delivers the following benefits to a financial institution:

  • System development and testing based on production-like data
  • Reduction of compliance checks and reviews
  • Reduced risk of compliance and reputational damage from data leaks
  • Cost savings by offshoring testing
  • Third-party access without constraints
  • No need to touch CID when working from home

How to create a synthesized copy of your database

The generation of synthetic data relies on four elements. The Analyzer gathers the actual production data structures, based on the signatures defined for each business type. The Deriver initiates the derivation of the synthetic data from the Analyzer’s results and generates random data. The third element, called the Deleter, is a separate framework that removes all client-sensitive data and leaves a template database, which is later combined with the results from the Deriver to form a new, fully functional database. Our solution runs the first three elements in parallel for performance reasons. The last element, the Loader, combines the template data, now free of any CID, with the synthesized data from the Deriver.
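The division of labour between the four elements can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the Avaloq implementation: the record layout, the `CID_FIELDS` set and the functions `analyze`, `derive`, `delete_cid` and `load` are hypothetical, the "signature" is reduced to a simple count per business type, and the steps run one after the other rather than in parallel.

```python
import random
from collections import Counter

# Assumption for this sketch: these fields are the ones that count as CID.
CID_FIELDS = {"name", "address"}


def analyze(rows):
    """Analyzer: record how many records exist per business type."""
    return Counter(row["business_type"] for row in rows)


def derive(profile, rng):
    """Deriver: generate random replacement values for each counted record."""
    synthetic = {}
    for business_type, count in profile.items():
        synthetic[business_type] = [
            {
                "name": f"Synthetic {business_type} {rng.randrange(10**6):06d}",
                "address": f"{rng.randrange(1, 200)} Example Street",
            }
            for _ in range(count)
        ]
    return synthetic


def delete_cid(rows):
    """Deleter: strip all CID fields, leaving a template without client data."""
    return [{k: v for k, v in row.items() if k not in CID_FIELDS} for row in rows]


def load(template, synthetic):
    """Loader: fill each template row with synthetic values of its business type."""
    pools = {btype: list(values) for btype, values in synthetic.items()}
    return [{**row, **pools[row["business_type"]].pop()} for row in template]


production = [
    {"business_type": "private", "currency": "CHF", "name": "Anna Muster", "address": "Hauptstrasse 1"},
    {"business_type": "private", "currency": "EUR", "name": "Peter Beispiel", "address": "Bahnhofplatz 2"},
    {"business_type": "corporate", "currency": "USD", "name": "Muster AG", "address": "Seeweg 3"},
]

rng = random.Random(0)
profile = analyze(production)        # step 1: analyze the structures
synthetic = derive(profile, rng)     # step 2: derive random data
template = delete_cid(production)    # step 3: remove CID
result = load(template, synthetic)   # step 4: combine template and synthetic data
```

In this sketch, `result` has the same number of records per business type and the same non-CID fields as `production`, while every name and address has been replaced by a generated value.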

Figure 1

The potential use cases of a synthesized database

Once in place, a synthesized database can contribute to several different use cases.

Figure 2

The potential efficiency gain in production days and IT costs depends on the size of an organization, but is significant from the start. Because the data can be shared, partners and offshore service centres can be involved with ease. We are currently developing synthesized databases for selected clients, allowing their external partners and independent developers to access their API to develop and test their own banking applications. For these clients, the availability of a shareable, production-like database is an important driver of their open banking efforts.

Conclusion

Financial institutions face a dilemma when it comes to their clients’ data. While they must adhere to increasingly strict data protection regulations, they also have to open their platform in the development process to ecosystems outside the bank. The provision of a realistic development and test environment becomes key for financial institutions to collaborate with fintechs. Avaloq recently began to offer the Data Synthesizer to its clients, allowing them to generate a synthesized image of their data with a tried-and-tested tool.

If you want to learn more about the Avaloq Data Synthesizer, contact your Key Account Manager.

Written by Christoph Nadig
As the Product Owner Technical Core at Avaloq, Christoph Nadig and his team in Zurich are responsible for critical core elements of the Avaloq core banking system. This includes vital parts such as information lifecycle management, the task and reporting engine and various base libraries. Christoph joined Avaloq in 2016 after holding various senior roles in the IT department of the Swiss technology and defence firm RUAG. Prior to his experience as Software Engineer and Chief Architect in the telecommunications industry, he studied Computer Science at ETH Zurich.