I had to give a talk about a project I did a year ago, when I was in my third year, preparing for my internship and following this pre-interview calendar. I was a newbie in Natural Language Processing (NLP) before that and, well, knew at least a fair bit by the end of the calendar. One task seemed extremely daunting back then: "write a blog on any topic that you liked learning".
It was daunting because it was entirely up to me, and autonomy isn't as much fun when you truly have it. I ended up mushing together two things I liked, Arctic Monkeys and Text Generation. The project takes in an artist name, collects all the lyrics, feeds them to a language model (which is good at predicting the next word in a sequence) and creates lyrics in the style of your artist. I used lyricsgenius to scrape the lyrics, nlpaug to enlarge the dataset by augmenting it, and GPT-2 as the language model, trained using the gpt-2-simple package.
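If you want a feel for the scraping step, here is a rough sketch rather than the exact notebook code. It assumes the `lyricsgenius` client's `Genius(token)` and `search_artist(...)` calls; the helper names `fetch_lyrics` and `build_corpus`, the `max_songs` default, and the output filename are my own placeholders. The augmentation step with nlpaug is left out.

```python
def build_corpus(lyrics):
    """Join individual song lyrics into one training text, skipping empty entries."""
    cleaned = [text.strip() for text in lyrics if text and text.strip()]
    return "\n\n".join(cleaned) + "\n"


def fetch_lyrics(artist_name, token, max_songs=50):
    """Download lyrics for one artist from Genius (needs network access and an API token)."""
    # Imported lazily so build_corpus can be used without the package installed.
    import lyricsgenius

    genius = lyricsgenius.Genius(token)
    artist = genius.search_artist(artist_name, max_songs=max_songs)
    if artist is None:  # nothing found for this artist name
        return []
    return [song.lyrics for song in artist.songs]


# Hypothetical usage (the token and filename are placeholders):
#   songs = fetch_lyrics("Arctic Monkeys", token="YOUR_GENIUS_TOKEN", max_songs=5)
#   with open("lyrics.txt", "w", encoding="utf-8") as handle:
#       handle.write(build_corpus(songs))
```

Keeping the text-joining part separate from the network call makes it easy to check the corpus format without hitting the API.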
You can see the talk here, and if you want to look at the slides/notebooks, you can see them here. I had to make a lot of changes to the notebook, given that everything in NLP has an expiry date measured in months and all my code was obsolete.
So here are some things that have changed,
or things that were the same but that I only learnt/discovered a year later:
---
GPT-2 stands for Generative Pre-trained Transformer 2. Yes, I didn't know that before.
A lot of Python packages are not good at backwards compatibility.
Do not, no matter how sleepy you are, do not forget to checkpoint your models.
GPT-2 Simple now offers four models: small (124M), medium (355M), large (774M), and extra-large (1558M).
The Transformer works on the concept of self-attention (paying attention to different parts of a sentence to create a representation), which enables parallel computation because you're not processing the sentence one word at a time.
CHECKPOINT YOUR MODELS.
nlpaug moved from a BERT-only augmenter to ContextualWordEmbsAug to incorporate all the newer BERT variants.
Do not forget to checkpoint your models.
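To make the self-attention point in the list above a bit more concrete, here is a toy sketch in plain Python. It is a simplification, not how GPT-2 is implemented: the queries, keys, and values are all just the input vectors (a real Transformer first multiplies them by learned matrices, and GPT-2 also hides future words from each position). The function names are my own.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values = inputs."""
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        # How strongly this word "attends" to every word in the sentence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # New representation: a weighted mix of all the word vectors.
        mixed = [
            sum(w * value[i] for w, value in zip(weights, vectors))
            for i in range(dim)
        ]
        outputs.append(mixed)
    return outputs
```

Each output row depends only on the original inputs, never on another output row, which is why all positions can be computed at the same time instead of one word after another.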
Want to try it for yourself?
Get your data by changing the API Key and Artist Name in this notebook.
Train your model by changing the name of the text file to reflect your filename here.
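For reference, the training step looks roughly like the sketch below. It assumes the `gpt_2_simple` calls `download_gpt2`, `start_tf_sess`, and `finetune` with the keyword arguments shown; the step counts, the `run1` run name, and the helper names `finetune_settings` and `train` are placeholders of mine, not the notebook's values. The point of `save_every` is, of course, checkpointing.

```python
import os


def finetune_settings(dataset_path, steps=1000, save_every=200, run_name="run1"):
    """Collect the keyword arguments for gpt2.finetune, insisting on regular checkpoints."""
    if save_every <= 0 or save_every > steps:
        raise ValueError("save_every must be between 1 and steps")
    return {
        "dataset": dataset_path,
        "model_name": "124M",      # the small model
        "steps": steps,
        "run_name": run_name,
        "restore_from": "latest",  # resume from the newest checkpoint if one exists
        "save_every": save_every,  # write a checkpoint every N steps
        "sample_every": save_every,
    }


def train(dataset_path, **overrides):
    """Fine-tune GPT-2 on a text file (needs gpt-2-simple and TensorFlow installed)."""
    # Imported lazily so finetune_settings can be used without the package installed.
    import gpt_2_simple as gpt2

    settings = finetune_settings(dataset_path, **overrides)
    if not os.path.isdir(os.path.join("models", settings["model_name"])):
        gpt2.download_gpt2(model_name=settings["model_name"])
    sess = gpt2.start_tf_sess()
    gpt2.finetune(sess, **settings)
    return sess


# Hypothetical usage (the filename is a placeholder):
#   sess = train("lyrics.txt", steps=500, save_every=100)
```

Once training finishes, `gpt2.generate(sess, run_name="run1")` should print lyrics from the fine-tuned model.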
Some of my favourite generated lyrics are: