Synthetic Data


Test datasets are widely used in software verification, Machine Learning algorithm training, and data analytics. Data is now a key resource for businesses in areas as diverse as marketing, employment, migration patterns, consumer behavior, and professional services. However, using production data (i.e., real data records) imposes security and accuracy restrictions on testing and scenario forecasting, especially when a customer's datasets include Personally Identifiable Information (PII). Synthetic data removes these barriers by producing pre-production data environments with the required scale and realism.

Learn more about our DATAGENESIS product.

Creating an alternative reality safe for testing and analysis

Synthetic data replicates the original data while preserving its statistical properties. The source records themselves are not present in the synthetic dataset, so no synthetic record can be traced back to an original entry. In addition, synthetic datasets can be rescaled, projected into the future, or adjusted in a controlled way to represent variations of the original scenario.
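Two properties from the paragraph above can be checked mechanically. The sketch below is a minimal illustration, assuming records are plain tuples of (age, household_size): it confirms that no source record appears verbatim in the synthetic set and that the mean of one column stays within a tolerance. The function names, toy records, and tolerance are hypothetical and are not part of DATAGENESIS.

```python
from statistics import mean

def has_verbatim_copies(source, synthetic):
    """Return True if any synthetic record is an exact copy of a source record."""
    source_set = set(source)
    return any(record in source_set for record in synthetic)

def means_close(source, synthetic, column, tolerance):
    """Check that the mean of one numeric column differs by at most `tolerance`."""
    src_mean = mean(r[column] for r in source)
    syn_mean = mean(r[column] for r in synthetic)
    return abs(src_mean - syn_mean) <= tolerance

# Hypothetical toy records: (age, household_size)
source = [(34, 2), (51, 4), (27, 1), (45, 3)]
synthetic = [(36, 2), (49, 4), (29, 1), (44, 3)]

print(has_verbatim_copies(source, synthetic))  # False
print(means_close(source, synthetic, column=0, tolerance=1.0))  # True
```

An exact-match check is only a first screen: near-duplicates of real records would pass it, so a fuller validation would also measure how close each synthetic record is to its nearest source record.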

Compared to real data, using synthetic data for software testing, Machine Learning training, and data analytics brings the following benefits:

Privacy and Security
Agile data generation
Flexibility

Use cases for synthetic data

Skymantics has advised its customers on security and privacy for several years. During this time we have observed data privacy regulations around the world grow in number and scope.

Skymantics is a pioneer in the use of synthetic data. We are currently applying this approach, with early benefits, in risk areas such as tax administration and disaster response.

The main use cases for synthetic data are:

True data anonymization

Masking and data obfuscation techniques leave data traceable, and they have been exploited to infer sensitive information from customer or patient records for identity theft and fraud. Synthetic data is a stronger alternative: because its records do not correspond to real individuals, it eliminates the risk of tracing data back to a specific person. Synthetic datasets are therefore safe to share both within the organization (with data science or business analytics teams) and outside it (with customers and collaborators).
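Why masking remains traceable can be illustrated with a classic linkage attack. Removing names still leaves quasi-identifiers such as ZIP code, birth year, and sex, and these can be joined against public data that still carries names. The records, field names, and the `link` helper below are made up purely for illustration.

```python
# Masked release: names removed, quasi-identifiers kept (all values hypothetical)
masked_records = [
    {"zip": "22101", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"zip": "22102", "birth_year": 1975, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "22101", "birth_year": 1990, "sex": "M", "diagnosis": "none"},
]

# Public auxiliary data that still carries names (e.g. a directory)
public_directory = [
    {"name": "Alice", "zip": "22101", "birth_year": 1980, "sex": "F"},
    {"name": "Bob", "zip": "22102", "birth_year": 1975, "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")

def link(masked, public, keys=QUASI_IDENTIFIERS):
    """Join masked records to named public records on shared quasi-identifiers."""
    index = {}
    for person in public:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])
    matches = {}
    for record in masked:
        names = index.get(tuple(record[k] for k in keys), [])
        if len(names) == 1:  # a unique match re-identifies the record
            matches[names[0]] = record["diagnosis"]
    return matches

print(link(masked_records, public_directory))
# {'Alice': 'asthma', 'Bob': 'diabetes'}
```

Against a synthetic dataset the same join can still produce matches, but those matches point to fabricated people, so they reveal nothing about any real individual.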

Accelerate the path to automated software testing

Traditional test data generators rely on randomizing entries and building patterns from rules. Artificial Intelligence greatly improves the accuracy and richness of test environments by generating synthetic datasets that mirror the statistical relationships between the variables in the real data. This makes it possible to define specific test cases without losing accuracy.
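The difference between per-column randomization and relationship-preserving generation can be shown with the simplest model that captures a correlation: a bivariate normal distribution. It is used here as a stand-in for the AI models mentioned above, not as the technique Skymantics uses, and the "learned" statistics for age and income are hypothetical.

```python
import math
import random

def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def generate(n, mu, sigma, rho, rng):
    """Draw n (x, y) pairs from a bivariate normal with correlation rho."""
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mu[0] + sigma[0] * z1
        # Mixing z1 into y applies the 2x2 Cholesky factor, reproducing rho
        y = mu[1] + sigma[1] * (rho * z1 + math.sqrt(1 - rho**2) * z2)
        pairs.append((x, y))
    return pairs

rng = random.Random(42)
# Hypothetical statistics estimated from real data: age vs. annual income
mu, sigma, observed_rho = (45.0, 52000.0), (12.0, 15000.0), 0.6

correlated = generate(5000, mu, sigma, observed_rho, rng)
independent = generate(5000, mu, sigma, 0.0, rng)  # naive per-column randomization

print(round(pearson(*zip(*correlated)), 2))   # close to 0.6
print(round(pearson(*zip(*independent)), 2))  # close to 0.0
```

Both datasets have the same per-column means and spreads; only the correlated one would let a tester build realistic cases such as "older, higher-income customer".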

Predictive analytics

A synthetic dataset can be adjusted and branched to represent hypothetical scenarios for analysis and comparison. For example, a population can be aged under different demographic patterns, or put through an economic recession. This makes it possible to simulate potential outcomes and perform sensitivity analysis on topics such as market segment demographics, population vulnerability to disasters, or events in the air traffic network.
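A minimal sketch of branching and sensitivity analysis, assuming a population is a list of household records: each scenario is a deep copy of the baseline with a shock applied, and an indicator is recomputed across a range of shock sizes. The uniform income drop, the household values, and the threshold are hypothetical simplifications, not a description of how DATAGENESIS models a recession.

```python
import copy
from dataclasses import dataclass

@dataclass
class Household:
    household_id: int
    income: float

def apply_recession(population, income_drop):
    """Return a new branch where every income falls by `income_drop` (0..1)."""
    branch = copy.deepcopy(population)  # the baseline stays untouched
    for household in branch:
        household.income *= (1 - income_drop)
    return branch

def share_below(population, threshold):
    """Fraction of households whose income is below `threshold`."""
    return sum(h.income < threshold for h in population) / len(population)

# Hypothetical baseline synthetic population
baseline = [Household(i, income) for i, income in
            enumerate([18000, 24000, 31000, 40000, 52000, 67000, 85000, 120000])]

POVERTY_LINE = 25000  # illustrative threshold
for drop in (0.0, 0.1, 0.2, 0.3):
    branch = apply_recession(baseline, drop)
    print(f"income drop {drop:.0%}: {share_below(branch, POVERTY_LINE):.1%} below line")
```

Copying before applying the shock keeps every branch comparable against the same baseline, which is what makes side-by-side scenario comparison meaningful.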

Fraud and Revenue Protection

Leverage built-in consumer, taxpayer, and socioeconomic patterns to train rules that detect data anomalies and protect your revenue. Extend the core models with custom aspects to cover utilities theft, abnormal spending patterns, and financial transactions.
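As a simple illustration of training a detection rule on synthetic baseline behavior, the sketch below assumes a z-score threshold as the rule: values more than k standard deviations from the synthetic mean are flagged. The consumption values, the 3-sigma cutoff, and the function name are hypothetical.

```python
from statistics import mean, stdev

def train_zscore_rule(normal_amounts, k=3.0):
    """Learn a rule from synthetic 'normal' behaviour: flag values more than
    k sample standard deviations away from the mean."""
    mu, sigma = mean(normal_amounts), stdev(normal_amounts)

    def is_anomalous(amount):
        return abs(amount - mu) > k * sigma

    return is_anomalous

# Hypothetical synthetic monthly utility consumption (kWh) for one customer segment
synthetic_normal = [310, 295, 330, 305, 320, 298, 315, 325, 300, 312]
rule = train_zscore_rule(synthetic_normal, k=3.0)

observed = [318, 290, 120, 640]
print([amount for amount in observed if rule(amount)])  # [120, 640]
```

A production rule would condition on segment, season, and other attributes, but the principle is the same: the synthetic population defines what "normal" looks like.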

Machine Learning algorithm training

As more business operations rely on Machine Learning models for prediction and decision support, the datasets needed to train those models become a valuable and scarce resource. Synthetic training datasets are cheaper to scale and make it possible to include corner cases and reduce bias, which improves model accuracy.
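One well-known way to add synthetic examples of rare cases is SMOTE-style interpolation. It is used here as an illustration only, not as the method DATAGENESIS uses. The sketch generates new minority-class points on segments between existing minority points and their nearest neighbors until the classes are balanced. The fraud cases and the `smote_like` name are hypothetical.

```python
import math
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a random minority
    sample and one of its k nearest minority neighbours (the idea behind SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + t * (o - b) for b, o in zip(base, other)))
    return synthetic

# Hypothetical fraud cases (minority class): (transaction amount, hour of day)
minority = [(950.0, 2.0), (1200.0, 3.0), (1100.0, 1.0), (990.0, 4.0)]
majority_count = 20

new_points = smote_like(minority, n_new=majority_count - len(minority))
balanced_minority = minority + new_points
print(len(balanced_minority), majority_count)  # 20 20
```

Because every new point lies between two real minority points, the rebalanced class stays within the region the minority data already occupies instead of introducing implausible values.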

Our approach: alive synthetic populations

Skymantics has developed DATAGENESIS, the most advanced synthetic population generator on the market. By replicating demographic, geographic, and socioeconomic features of the population from authoritative data sources, DATAGENESIS augments customer data with population insights while creating privacy-compliant synthetic data environments.
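As a rough illustration of replicating population features from reference tables, the sketch below samples households from marginal distributions. The tables, field names, and probabilities are hypothetical placeholders for authoritative sources such as census tables, and the code does not represent the DATAGENESIS SDK.

```python
import random

# Hypothetical marginal distributions standing in for census-style tables
HOUSEHOLD_SIZE_DIST = {1: 0.28, 2: 0.35, 3: 0.15, 4: 0.13, 5: 0.09}
TENURE_DIST = {"owner": 0.65, "renter": 0.35}

def sample(dist, rng):
    """Draw one category from a {value: probability} table."""
    values, weights = zip(*dist.items())
    return rng.choices(values, weights=weights, k=1)[0]

def generate_households(n, seed=0):
    """Generate n households, sampling each attribute from its marginal table."""
    rng = random.Random(seed)
    return [
        {"household_id": i,
         "size": sample(HOUSEHOLD_SIZE_DIST, rng),
         "tenure": sample(TENURE_DIST, rng)}
        for i in range(n)
    ]

households = generate_households(10000)
share_single = sum(h["size"] == 1 for h in households) / len(households)
print(f"single-person households: {share_single:.1%}")  # close to 28%
```

Sampling each attribute independently reproduces the marginals but not the relationships between them (for example, between household size and tenure); a realistic generator has to model those joint distributions as well.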

Generation and aging
Households, individuals and businesses
Fabrication of names and addresses
Geospatial attributes
What-if scenarios to simulate events such as natural disasters and diseases
65 entity attributes
30 socioeconomic life events
Automated testing for statistical validity
Software Development Kit (SDK)
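Automated testing for statistical validity can be pictured as a goodness-of-fit check. The following is a hedged example rather than the product's actual test suite: it applies Pearson's chi-square test to compare synthetic household-size counts with a reference distribution. Both distributions are invented, and the critical value is the standard table value for 4 degrees of freedom at the 5% level.

```python
def chi_square_statistic(observed_counts, expected_probs):
    """Pearson chi-square statistic comparing observed category counts with
    the counts expected under a reference distribution."""
    total = sum(observed_counts.values())
    stat = 0.0
    for category, p in expected_probs.items():
        expected = total * p
        observed = observed_counts.get(category, 0)
        stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical reference distribution of household size and synthetic counts
reference = {1: 0.28, 2: 0.35, 3: 0.15, 4: 0.13, 5: 0.09}
synthetic_counts = {1: 290, 2: 342, 3: 149, 4: 131, 5: 88}

# 5 categories -> 4 degrees of freedom; chi-square table value at alpha = 0.05
CRITICAL_VALUE_DF4_5PCT = 9.488

stat = chi_square_statistic(synthetic_counts, reference)
print(f"chi-square = {stat:.2f}, passes: {stat < CRITICAL_VALUE_DF4_5PCT}")
```

Running a check like this for every generated attribute turns "statistically valid" into a pass/fail criterion that can sit in an automated test suite.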

Our synthetic populations are “alive”: they can be aged over a number of years, producing changes in population structure that replicate real statistical trends. This novel branching capability enables forecasting of multi-year test scenarios (e.g., recession, immigration, pandemics) and their impact on populations of configurable scale and geographic area.
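A toy version of aging a population, assuming each person has only an age and a sex and that yearly death and birth events are drawn from fixed, hypothetical rates. DATAGENESIS models 30 socioeconomic life events; this sketch does not attempt to reproduce them.

```python
import random

# Hypothetical annual rates; a real model would vary them by age, sex, and region
MORTALITY_BY_AGE_BAND = {0: 0.001, 40: 0.003, 65: 0.02, 80: 0.08}
BIRTH_RATE_FEMALE_15_49 = 0.05

def mortality(age):
    """Rate for the highest age band that does not exceed `age`."""
    band = max(b for b in MORTALITY_BY_AGE_BAND if b <= age)
    return MORTALITY_BY_AGE_BAND[band]

def age_one_year(people, rng):
    """Advance every person by one year, applying death and birth events."""
    survivors, births = [], []
    for person in people:
        if rng.random() < mortality(person["age"]):
            continue  # death event: the person leaves the population
        aged = {**person, "age": person["age"] + 1}  # new dict, input untouched
        survivors.append(aged)
        fertile = aged["sex"] == "F" and 15 <= aged["age"] <= 49
        if fertile and rng.random() < BIRTH_RATE_FEMALE_15_49:
            births.append({"age": 0, "sex": rng.choice("FM")})
    return survivors + births

def age_population(people, years, seed=0):
    """Apply `years` rounds of aging with a reproducible random stream."""
    rng = random.Random(seed)
    for _ in range(years):
        people = age_one_year(people, rng)
    return people

rng0 = random.Random(1)
initial = [{"age": rng0.randint(0, 90), "sex": rng0.choice("FM")} for _ in range(2000)]
after_10 = age_population(initial, years=10)
print(len(initial), len(after_10))
```

Running the same initial population under a different rate table (for example, higher mortality for a pandemic scenario) produces a separate branch that can be compared year by year with the baseline.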

Thanks to its modular design, DATAGENESIS can be integrated into customer data pipelines (cloud or on-premises) and with third-party BI tools and data platforms. Its generation performance allows hundreds to millions of households to be synthesized in a matter of minutes. These capabilities let organizations across industries integrate synthetic demographics into their data analytics environments and make the most of their existing customer, patient, and citizen data records without compromising privacy.
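A minimal sketch of how a generator might plug into a data pipeline at scale, assuming a streaming design: households are yielded one at a time to a CSV sink instead of being held in memory all at once. The schema, the distributions, and the function names are hypothetical and unrelated to the actual DATAGENESIS SDK or its output format.

```python
import csv
import io
import random
from typing import Iterator, TextIO

def household_stream(n: int, seed: int = 0) -> Iterator[dict]:
    """Yield households one at a time so memory use stays flat as n grows."""
    rng = random.Random(seed)
    for i in range(n):
        yield {"household_id": i,
               "size": rng.randint(1, 5),
               "income": round(rng.lognormvariate(10.8, 0.5), 2)}

def write_csv(rows: Iterator[dict], out: TextIO) -> int:
    """Write rows to a CSV sink that a BI tool or data platform can ingest."""
    writer = None
    count = 0
    for row in rows:
        if writer is None:  # derive the header from the first row
            writer = csv.DictWriter(out, fieldnames=list(row))
            writer.writeheader()
        writer.writerow(row)
        count += 1
    return count

buffer = io.StringIO()  # stand-in for a file, object store, or pipeline stage
written = write_csv(household_stream(1000), buffer)
print(written, buffer.getvalue().splitlines()[0])
# 1000 household_id,size,income
```

Keeping generation and output as separate, composable stages is one way to realize a modular design: the same stream could be redirected to a database loader or message queue without changing the generator.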

Do you want to learn more about the possibilities of synthetic data? Contact us to ask about our solutions and request a demo today.

© 2023 Skymantics | Powered by Skymantics