Webinar: Making synthetic data look real using conditional GANs

This post is by Eric.

On Thursday, Maria Skoularidou from the MRC Biostatistics Unit at the University of Cambridge is stopping by to talk to us about generating high-quality synthetic data. You can register here.

You can read the whole paper here.

Abstract

Modeling the probability distribution of rows in tabular data and generating realistic synthetic data is a non-trivial task. Tabular data usually contain a mix of discrete and continuous columns (the most representative examples of such data are Electronic Health Records). Continuous columns may have multiple modes, whereas discrete columns are sometimes imbalanced, making their modeling difficult. Existing statistical and deep neural network models fail to properly model this type of data. We design CTGAN, which uses a conditional generator to address these challenges. To aid in comparison, we designed a benchmark with 7 simulated and 8 real datasets, including several datasets often used to evaluate Bayesian networks. CTGAN outperformed Bayesian networks on most of the real datasets, whereas other deep learning methods did not.
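To make the imbalance problem concrete, here is a minimal Python sketch, not taken from the paper's code. It assumes that the category a generator is conditioned on is drawn with probability proportional to the log of its frequency, so that rare categories come up far more often than they would under plain row sampling. The function names, the toy column, and the log(1 + count) weighting are all illustrative assumptions.

```python
import math
import random
from collections import Counter


def log_frequency_weights(values):
    """Map each category to a probability proportional to log(1 + count).

    Assumption for illustration: log-scaling is one way to flatten an
    imbalanced column; other reweightings would also work.
    """
    counts = Counter(values)
    logs = {cat: math.log(1 + n) for cat, n in counts.items()}
    total = sum(logs.values())
    return {cat: w / total for cat, w in logs.items()}


def sample_condition(values, rng):
    """Draw one category to condition on, using the log-frequency weights."""
    weights = log_frequency_weights(values)
    cats = list(weights)
    return rng.choices(cats, weights=[weights[c] for c in cats], k=1)[0]


# Toy imbalanced discrete column: 99% of rows share one value.
column = ["healthy"] * 990 + ["rare_disease"] * 10

weights = log_frequency_weights(column)
raw_share = column.count("rare_disease") / len(column)

print("share of rare category in the data:", raw_share)
print("probability of conditioning on it: ", weights["rare_disease"])
print("one sampled condition:", sample_condition(column, random.Random(0)))
```

Under plain row sampling, the rare category would be picked about 1% of the time. With the log-scaled weights it is picked much more often, which gives a model more chances to learn what those rows look like.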

About the speaker

Maria Skoularidou holds a 4-year diploma in Computer Science and a 2-year MSc in Statistical Science, both from Athens University of Economics and Business, and is currently a final-year PhD student at the University of Cambridge MRC Biostatistics Unit, supervised by Professor Sylvia Richardson. Her thesis is focused on probabilistic machine learning and, more precisely, on using generative modeling in healthcare.

Her fields of interest lie in the underpinnings, theory, and methodology of machine intelligence, Bayesian inference, theoretical computer science, neuroscience, and information theory.

Maria is also the founder and chair of {Dis}Ability in AI and the founder and secretary of Women in Data Science and Statistics (RSS).
