Synthetic population for tax fraud detection testing
About the client
The Internal Revenue Service (IRS) mission is to provide America’s taxpayers top quality service by helping them understand and meet their tax responsibilities, and to enforce the law with integrity and fairness to all. The Agency is currently undergoing a multi-year modernization plan to strengthen its information technology systems and processes. This includes data infrastructure (cloud, Agile, DevOps, API), process automation, and advanced data analytics and tools.
Challenge
The IRS Enterprise Services (ES) Enterprise Systems Testing (EST) unit provides enterprise-wide testing solutions and critical support across the Agency's systems and applications. To acquire advanced data analytics capabilities, EST requires high-quality data for automated testing of tax review and enforcement software, including tax fraud detection, in pre-production environments. Currently, simulated tax returns are generated for virtual individual households and updated based on random selections and rule-based configurations. This largely manual process does not cover all the use cases that occur in a real tax population. In addition, production (real tax) data cannot be used because of privacy compliance constraints.
The customer required a simulation engine that generates synthetic tax data representing more scenarios with less effort. The engine had to simulate individuals and business entities that evolve over time and generate updated tax records. The business goal is to reduce IT delivery cycles by making synthetic tax data generation rapid, distributable, and repeatable.
The Simulation of the Nation (SimoN) tool
In response to the IRS requirements, Skymantics is providing a generator of synthetic populations and eFile tax forms, with aging capabilities. The SimoN toolset is based on our synthetic population technology, DataGenesis. A modular Machine Learning model architecture implements algorithm training and generation, and a custom application creates and submits eFile tax forms for automated validation. Learn more about our experience with synthetic data.
The generation interface allows the tailoring of population size, geography, and any demographic attribute as desired for custom scenarios. Synthetic populations can then be aged over a number of years, as life events make the status of households, individuals and businesses evolve. A summary of SimoN features includes:
- Generation and aging of populations of households, individuals and businesses
- Fabrication of family names and addresses (U.S. and foreign)
- Synthesis of hundreds to millions of households in minutes
- Zipcode matching that allows household attributes to account for regional trends
- Ability to model what-if scenarios (recessions, natural disasters, changes in demography)
- 65 entity attributes including date of birth, gender, employment type, educational level, disability, citizenship and marital status
- 30 socioeconomic life events including birth, death, marriage, divorce, changes in job status and address changes
- 70 automated tests for statistical validity against data sources
- Modular design, building blocks – can be integrated with enterprise data and commercial visualization frontends
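The generation and aging features listed above can be illustrated with a minimal sketch. This is not the SimoN or DataGenesis implementation, which relies on trained Machine Learning models. The `Person` class, the `EVENT_RATES` table, the probability values, and the function names are all hypothetical and chosen only to show the idea of a population that evolves through yearly life events.

```python
import random
from dataclasses import dataclass, field

# Hypothetical yearly event probabilities, for illustration only.
# They are NOT values used by SimoN or taken from any data source.
EVENT_RATES = {"marriage": 0.05, "divorce": 0.02, "job_change": 0.10, "death": 0.01}


@dataclass
class Person:
    person_id: int
    age: int
    marital_status: str = "single"
    employed: bool = True
    alive: bool = True
    history: list = field(default_factory=list)  # (year, event) tuples


def generate_population(size, rng):
    """Create `size` adults with randomly drawn starting attributes."""
    return [
        Person(
            person_id=i,
            age=rng.randint(18, 80),
            marital_status=rng.choice(["single", "married"]),
            employed=rng.random() < 0.9,
        )
        for i in range(size)
    ]


def age_one_year(person, year, rng):
    """Advance one person by a year and apply at most one event of each kind."""
    if not person.alive:
        return
    person.age += 1
    if rng.random() < EVENT_RATES["death"]:
        person.alive = False
        person.history.append((year, "death"))
        return
    if person.marital_status != "married":
        if rng.random() < EVENT_RATES["marriage"]:
            person.marital_status = "married"
            person.history.append((year, "marriage"))
    elif rng.random() < EVENT_RATES["divorce"]:
        person.marital_status = "divorced"
        person.history.append((year, "divorce"))
    if rng.random() < EVENT_RATES["job_change"]:
        person.employed = not person.employed
        person.history.append((year, "job_change"))


def simulate(size, years, seed=0):
    """Generate a population and age it; a fixed seed makes runs repeatable."""
    rng = random.Random(seed)
    population = generate_population(size, rng)
    for year in range(1, years + 1):
        for person in population:
            age_one_year(person, year, rng)
    return population


if __name__ == "__main__":
    people = simulate(size=1000, years=5, seed=42)
    survivors = sum(p.alive for p in people)
    print(f"{survivors} of {len(people)} simulated people alive after 5 years")
```

Seeding the random generator is one simple way to make generated data repeatable across test runs. A what-if scenario could be approximated in this sketch by changing the entries of `EVENT_RATES` before calling `simulate`.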
Skymantics is applying an Agile methodology to ensure customer requirements are covered by the SimoN solution. Open Source data science software libraries have been used to the maximum extent possible to ensure model reliability and robustness. A microservice architecture has been used, which increases the configurability, reusability and explainability of the models. Data for model training comes from authoritative sources including the U.S. Census, the U.S. Bureau of Labor Statistics, academic research and other third-party sources.
The Skymantics difference
By pioneering the field of synthetic data in the generation of population attributes, Skymantics is leading the way in the application of Artificial Intelligence to a wider understanding and prediction of trends in demographics and psychographics. Financial, healthcare, and Government industries are some of the most mature domain areas of application. Learn more about our work in financial services / Government services.
We are currently offering an initial population synthesis capability to public and commercial organizations, which provides the following key value propositions:
Privacy and Security
Production data should never be used for testing scenarios and policies. Obfuscation and encryption are not valid security techniques for this purpose; synthetic data is.
Performance
Able to produce test data in seconds or minutes with full traceability.
Flexibility
Ability to track necessary use cases to support tests, and to define forecast scenarios for testing decision-making outcomes.
Automation
Minimize overhead time for testing, allowing teams to shift left their quality controls and automate metrics.
Integration
Ability to synthesize data in your own environment through a toolkit.
Data quality
Maintains referential integrity across people, households, jobs and businesses, just as a real population would.
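As an illustration of what referential integrity means for a synthetic population, the following minimal sketch checks that every person points to an existing household and, when employed, to an existing business. The record layout, the field names (`household_id`, `employer_id`, `business_id`) and the function name are assumptions made for this example and do not describe the SimoN data model.

```python
def find_broken_references(people, households, businesses):
    """Return (person_id, field, value) for every reference with no target."""
    household_ids = {h["household_id"] for h in households}
    business_ids = {b["business_id"] for b in businesses}
    problems = []
    for person in people:
        # Every person must belong to a household that exists.
        if person["household_id"] not in household_ids:
            problems.append(
                (person["person_id"], "household_id", person["household_id"])
            )
        # An employer is optional, but if present it must exist.
        employer = person.get("employer_id")
        if employer is not None and employer not in business_ids:
            problems.append((person["person_id"], "employer_id", employer))
    return problems


if __name__ == "__main__":
    # Hypothetical sample records, for illustration only.
    households = [{"household_id": "H1"}, {"household_id": "H2"}]
    businesses = [{"business_id": "B1"}]
    people = [
        {"person_id": 1, "household_id": "H1", "employer_id": "B1"},
        {"person_id": 2, "household_id": "H2", "employer_id": None},
        {"person_id": 3, "household_id": "H9", "employer_id": "B7"},
    ]
    for problem in find_broken_references(people, households, businesses):
        print(problem)
```

A check of this kind could run as one of the automated tests on each generated dataset, so that broken links are caught before the data reaches the systems under test.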
Interested in learning more about our SYNTHETIC DATA capabilities or requesting a demo?
Contact us