Bioproduction of valuable vaccines and biotherapeutics plays an essential role in disease treatment and prevention; however, achieving robust, predictable, and sustainable expression is challenging, especially in mammalian cell lines. A recent work (Jan Zrimec, 2022) addresses this problem with a generative adversarial network (GAN) model that perturbs the regulatory elements of a DNA sequence to achieve the desired transcription result.
The success of diffusion models in image generation opens up new opportunities, inspiring researchers to apply them to the generation of biological entities, such as RFdiffusion for protein generation. In this project, you will explore the possibility of applying diffusion models to stable and diverse DNA sequence generation, with a focus on one type of regulatory element in DNA, the promoter.
Broadly applicable and accurate protein design by integrating structure prediction networks and diffusion generative models
Controlling gene expression with deep generative design of regulatory DNA - Nature Communications
Effective gene expression prediction from sequence by integrating long-range interactions - Nature Methods
Automated model-predictive design of synthetic promoters to control transcriptional profiles in bacteria - Nature Communications
The first stage of the project focuses on using a diffusion model to generate diverse DNA sequences for a fixed transcription profile.
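As a starting point for this stage, the sketch below shows one possible way to represent a promoter sequence and apply the forward (noising) step of a diffusion model to it. It assumes a Gaussian, DDPM-style formulation on one-hot encoded sequences with a linear noise schedule; the project does not prescribe this choice, and discrete diffusion variants are equally plausible. The function names, the schedule values, and the example sequence are all illustrative. The denoising network and the conditioning on a transcription profile are not shown.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (length, 4) float array."""
    idx = np.array([BASES.index(b) for b in seq.upper()])
    return np.eye(4, dtype=np.float64)[idx]

def decode(x):
    """Map a (length, 4) array back to a DNA string via per-position argmax."""
    return "".join(BASES[i] for i in x.argmax(axis=1))

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Assumed linear noise schedule; the values are illustrative defaults."""
    return np.linspace(beta_start, beta_end, num_steps)

def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t from q(x_t | x_0) = N(sqrt(abar_t) * x0, (1 - abar_t) * I)."""
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return xt, noise

rng = np.random.default_rng(0)
betas = linear_beta_schedule(1000)
alpha_bar = np.cumprod(1.0 - betas)  # cumulative signal retention per step

# Hypothetical 10-bp fragment, rescaled from {0, 1} to {-1, 1}.
x0 = 2.0 * one_hot("TATAAAAGGC") - 1.0

x_early, _ = q_sample(x0, 0, alpha_bar, rng)    # almost no noise
x_late, eps = q_sample(x0, 999, alpha_bar, rng)  # close to pure noise
```

In a full model, a network would be trained to predict `eps` from the noised sequence, the step index, and the target transcription profile; sampling would then run the learned reverse process and decode the result with `decode`.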
GitHub - JanZrimec/ExpressionGAN
GitHub - calico/basenji: Sequential regulatory activity predictions with deep convolutional neural networks.
EPD: The Eukaryotic Promoter Database