The Neptune AI blog has a nice summary of popular NLP augmentation strategies:
Data Augmentation in NLP: Best Practices From a Kaggle Master - neptune.ai
This could be a good starting point for the project: explore some of these existing approaches and see how they perform on standard tasks.
The augmentation game in CV is at another level:
CutMix: This method fills in the region that CutOut leaves blank. Instead of simply removing pixels, the removed region is replaced with a patch from another image (see Table). The ground-truth labels are mixed in proportion to the number of pixels each image contributes to the combined image.
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features
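A minimal NumPy sketch of the CutMix idea described above, to make the pixel and label mixing concrete. It is not the paper's reference code: the function names (`rand_bbox`, `cutmix`), the `alpha` default, the Beta-distributed mixing ratio, and the assumption that images are `(H, W, C)` arrays with one-hot labels are all choices made for illustration.

```python
import numpy as np


def rand_bbox(height, width, lam, rng):
    """Sample a box covering roughly (1 - lam) of the image area."""
    cut_ratio = np.sqrt(1.0 - lam)
    cut_h = int(height * cut_ratio)
    cut_w = int(width * cut_ratio)
    # Random box centre; the box is clipped at the image border.
    cy = int(rng.integers(height))
    cx = int(rng.integers(width))
    top = int(np.clip(cy - cut_h // 2, 0, height))
    bottom = int(np.clip(cy + cut_h // 2, 0, height))
    left = int(np.clip(cx - cut_w // 2, 0, width))
    right = int(np.clip(cx + cut_w // 2, 0, width))
    return top, bottom, left, right


def cutmix(image_a, label_a, image_b, label_b, alpha=1.0, rng=None):
    """Paste a patch of image_b into image_a and mix the one-hot labels."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)  # assumed sampling scheme for the mix ratio
    height, width = image_a.shape[:2]
    top, bottom, left, right = rand_bbox(height, width, lam, rng)

    mixed = image_a.copy()
    mixed[top:bottom, left:right] = image_b[top:bottom, left:right]

    # Recompute lam from the clipped box so labels match the actual pixel share.
    lam = 1.0 - (bottom - top) * (right - left) / (height * width)
    mixed_label = lam * label_a + (1.0 - lam) * label_b
    return mixed, mixed_label, lam
```

For example, mixing an all-zeros image labelled `[1, 0]` with an all-ones image labelled `[0, 1]` gives a label whose second entry equals the fraction of pixels taken from the second image.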
AugMix: Augmentation operations such as translate x and weights such as m are randomly sampled. Randomly sampled operations and their compositions allow us to explore the semantically equivalent input space around an image. Mixing these images together produces a new image without veering too far from the original.
AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty
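The AugMix mixing step can be sketched in a similar way. This is a simplified illustration, not the paper's implementation: the three operations below are crude NumPy stand-ins (for instance, `translate_x` wraps pixels around via `np.roll`), the names and defaults (`width`, `max_depth`, `alpha`) are assumptions, images are assumed to be float arrays in `[0, 1]`, and any training-loss component is left out entirely.

```python
import numpy as np


def translate_x(img, rng):
    # Stand-in for a horizontal translation: shifts columns with wrap-around.
    shift = int(rng.integers(-2, 3))
    return np.roll(img, shift, axis=1)


def posterize(img, rng):
    # Quantise intensities to a small random number of levels.
    levels = int(rng.integers(4, 9))
    return np.round(img * (levels - 1)) / (levels - 1)


def solarize(img, rng):
    # Invert pixels at or above a random threshold.
    threshold = rng.uniform(0.5, 1.0)
    return np.where(img < threshold, img, 1.0 - img)


OPS = [translate_x, posterize, solarize]


def augmix(image, width=3, max_depth=3, alpha=1.0, rng=None):
    """Mix several randomly composed augmentation chains with the original."""
    rng = np.random.default_rng() if rng is None else rng
    image = image.astype(np.float64)
    chain_weights = rng.dirichlet([alpha] * width)  # one weight per chain
    m = rng.beta(alpha, alpha)                      # original vs. augmented mix

    mix = np.zeros_like(image)
    for weight in chain_weights:
        augmented = image.copy()
        depth = int(rng.integers(1, max_depth + 1))
        for _ in range(depth):
            op = OPS[int(rng.integers(len(OPS)))]
            augmented = op(augmented, rng)
        mix += weight * augmented

    # Convex combination keeps the result close to the original image.
    return m * image + (1.0 - m) * mix
```

Because every step is a convex combination of images in `[0, 1]`, the output stays in the same range and keeps the input's shape.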
It would be interesting to see how some of the strategies above can be applied to NLP tasks.