What is synthesized data?

On hearing the word synthesizer, most people think of electronic music rather than digital data. But synthesizing a database is exactly what banks should have in mind when it comes to accelerating their software development and testing.

Done the right way, a synthesized database provides an authentic, production-like environment that can be accessed by all internal and external developers without any compliance constraints. This helps speed up development processes and provides the freedom to involve partners at will, an increasingly important capability in an open banking environment or ecosystem. In this blog post, we explain what synthesized (or synthetic) data is and why the financial industry is well placed to benefit from the concept.

A synthesized database provides an environment in which to develop and test new software solutions. It is a copy of a bank’s real data, but free of any client-identifying data (CID), and can therefore be accessed internally and externally without any compliance constraints. While all CID is removed in the process, the structures within the database (constellations) are preserved in the synthesized image. The result is a production-like, authentic environment that looks and behaves the same way as the original database. It offers higher quality than the currently available alternatives, such as simple anonymization or artificial databases.

What are the benefits of synthesized data for financial institutions?

For a bank, it used to be best practice to develop and test new software on a copy of its production data. While an authentic data environment allows the creation of reliable solutions and leads to credible test results, using production data comes at a cost. Compliance significantly restricts access to CID. This limits the involvement of internal developers and, to an even greater extent, external suppliers. The existing solutions to this challenge, namely artificial or anonymized databases, help to resolve the compliance issue. But artificial databases are typically small and of limited variety and complexity, so they reflect the production data only incompletely while still requiring high maintenance effort. Anonymized instances may reproduce the entire original database but, as an exact copy without CID, still expose statistics about a bank.
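The last point can be illustrated with a minimal Python sketch. The rows, the field names (`name`, `segment`, `balance`) and the helper functions are made up for this illustration and do not describe any real schema or product. It shows that masking the CID field alone leaves every aggregate figure identical to production.

```python
# Hypothetical production rows; "name" stands in for CID.
production = [
    {"name": "Alice Example", "segment": "private", "balance": 120_000},
    {"name": "Bob Example", "segment": "private", "balance": 80_000},
    {"name": "Carol Ltd", "segment": "corporate", "balance": 2_500_000},
]


def anonymize(rows, cid_fields=("name",)):
    """Mask the CID fields but keep every other value exactly as in production."""
    return [
        {key: ("***" if key in cid_fields else value) for key, value in row.items()}
        for row in rows
    ]


def total_by_segment(rows):
    """Sum the balances per segment, a statistic a bank may not want to share."""
    totals = {}
    for row in rows:
        totals[row["segment"]] = totals.get(row["segment"], 0) + row["balance"]
    return totals


anonymized = anonymize(production)

# The names are gone, but the per-segment totals match production exactly.
print(total_by_segment(anonymized) == total_by_segment(production))
```

Anyone with access to the anonymized copy can therefore still read off real volumes and distributions, even though no individual client is named.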

Using a data synthesizer delivers the following benefits to a financial institution:

System development and testing based on production-like data
Fewer compliance checks and reviews
Lower risk of compliance breaches and reputational damage from data leaks
Cost savings by offshoring testing
Third-party access without constraints
No exposure of CID when working from home

How to create a synthesized copy of your database

Synthetic data is generated by four elements. The Analyser captures the structures found in the actual production data, based on the signatures defined for each business type. The Deriver takes the Analyser’s results and derives the synthetic data from them, generating random values. The third element, the Deleter, is a separate framework that removes all client-sensitive data and leaves a template database. Avaloq’s solution runs these first three elements in parallel for performance reasons. The last element, the Loader, combines the template database, now free of any CID, with the synthesized data from the Deriver to form a new, fully functional database.
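The interplay of the four elements can be pictured with a minimal Python sketch. It is an illustration under simplifying assumptions, not Avaloq’s implementation: the database is a plain dictionary of tables, a constellation is reduced to a (business type, number of accounts) pair, all table names, field names and functions are hypothetical, and the steps run one after another rather than in parallel.

```python
import copy
import random
from collections import Counter

# Hypothetical production database: one reference table and one client table.
production_db = {
    "currencies": [{"code": "CHF"}, {"code": "EUR"}],
    "clients": [
        {"name": "Alice Example", "business_type": "private", "n_accounts": 2},
        {"name": "Bob Example", "business_type": "private", "n_accounts": 2},
        {"name": "Carol Ltd", "business_type": "corporate", "n_accounts": 5},
    ],
}


def analyse(db):
    """Analyser: count how often each structural constellation occurs."""
    return Counter((c["business_type"], c["n_accounts"]) for c in db["clients"])


def derive(constellations, seed=0):
    """Deriver: create random stand-in clients for every counted constellation."""
    rng = random.Random(seed)
    rows = []
    for (business_type, n_accounts), count in sorted(constellations.items()):
        for _ in range(count):
            rows.append({
                "name": f"Client-{rng.randrange(10**6):06d}",  # random, not real
                "business_type": business_type,
                "n_accounts": n_accounts,
            })
    return rows


def delete_cid(db):
    """Deleter: drop all client rows and keep the rest as a template database."""
    template = copy.deepcopy(db)
    template["clients"] = []
    return template


def load(template, synthetic_clients):
    """Loader: combine the CID-free template with the synthesized clients."""
    synthesized = copy.deepcopy(template)
    synthesized["clients"] = list(synthetic_clients)
    return synthesized


def synthesize(db, seed=0):
    constellations = analyse(db)
    synthetic_clients = derive(constellations, seed)
    template = delete_cid(db)
    return load(template, synthetic_clients)


synthesized_db = synthesize(production_db)

# Same constellations as production, but none of the original client names.
print(analyse(synthesized_db) == analyse(production_db))
```

In this sketch the synthesized database keeps the reference data and the distribution of constellations, while every client row is newly generated. A real synthesizer would have to handle far richer structures, such as relationships between clients, accounts and transactions.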

The potential use cases of a synthesized database

Once in place, a synthesized database can contribute to several different use cases.

The potential efficiency gain in production days and IT costs depends on the size of an organization, but is significant from the start. Because the data can be shared, partners and offshore service centres can be involved with ease. We are currently developing synthesized databases for selected clients, allowing their external partners and independent developers to access their API to develop and test their own banking applications. For these clients, the availability of a shareable, production-like database is an important driver of their open banking efforts.

Conclusion

Financial institutions face a dilemma when it comes to their client data. While they must adhere to increasingly strict data protection regulations, they also have to open their platform to ecosystems outside the bank during the development process. Providing a realistic development and test environment is becoming key for financial institutions that want to collaborate with fintechs. Avaloq recently began to offer the Data Synthesizer to its clients, allowing them to generate a synthesized image of their data with a tried-and-tested tool.
