Kanye West RNN
About
This document contains the code to create an RNN chatbot that emulates Kanye West's speech style.
Setting up the environment.
I am starting from scratch on this machine:
$ neofetch
'c. aayushbajaj@Aayushs-MacBook-Pro.local
,xNMM. -------------------------------------
.OMMMMo OS: macOS 15.2 24C101 arm64
OMMM0, Host: MacBookPro17,1
.;loddo:' loolloddol;. Kernel: 24.2.0
cKMMMMMMMMMMNWMMMMMMMMMM0: Uptime: 2 days, 42 mins
.KMMMMMMMMMMMMMMMMMMMMMMMWd. Packages: 182 (brew)
XMMMMMMMMMMMMMMMMMMMMMMMX. Shell: zsh 5.9
;MMMMMMMMMMMMMMMMMMMMMMMM: Resolution: 1440x900
:MMMMMMMMMMMMMMMMMMMMMMMM: DE: Aqua
.MMMMMMMMMMMMMMMMMMMMMMMMX. WM: Quartz Compositor
kMMMMMMMMMMMMMMMMMMMMMMMMWd. WM Theme: Blue (Dark)
.XMMMMMMMMMMMMMMMMMMMMMMMMMMk Terminal: tmux
.XMMMMMMMMMMMMMMMMMMMMMMMMK. CPU: Apple M1
kMMMMMMMMMMMMMMMMMMMMMMd GPU: Apple M1
;KMMMMMMMWXXWMMMMMMMk. Memory: 1485MiB / 8192MiB
.cooc,. .,coo:.
It is why I first need to run install conda first. I went with the whole suite from https://www.anaconda.com/download.
Then I initialised my environment and installed the correct packages:
conda create -n metal -f metal.yaml python=3.11
conda activate nlp
conda install numpy
conda install pandas
conda install matplotlib
conda install scikit-learn
pip install tensorflow-macos
pip install lyricsgenius
Sourcing data and cleaning:
I go get an API key from genius to pull Kanye's music into a text file:
# Fetch Kanye West's songs
artist = genius.search_artist("Kanye West", max_songs=100, sort="title")
# Save lyrics to a text file
with open("kanye_lyrics.txt", "w") as file:
for song in artist.songs:
file.write(song.lyrics + "\n\n")
#now we clean the data:
# Load raw lyrics
with open("kanye_lyrics.txt", "r") as file:
raw_data = file.read()
# Clean lyrics
cleaned_data = re.sub(r"\[.*?\]", "", raw_data) # Remove metadata like [Chorus]
cleaned_data = re.sub(r"\s+", " ", cleaned_data) # Replace multiple spaces with one
# Save cleaned lyrics
with open("cleaned_kanye_lyrics.txt", "w") as file:
file.write(cleaned_data)
TODO Architecture
Code
The below code works, but chatgpt wrote it for me. It was mainly a proof of concept for the moment. I shall refactor it all soon.
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Embedding
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
# Load the data
with open("cleaned_kanye_lyrics.txt", "r") as file:
data = file.read()
# Tokenize text
tokenizer = Tokenizer()
tokenizer.fit_on_texts([data])
sequence_data = tokenizer.texts_to_sequences([data])[0]
# Define vocabulary size and max sequence length
vocab_size = len(tokenizer.word_index) + 1
sequence_length = 50
# Create sequences
sequences = []
for i in range(sequence_length, len(sequence_data)):
seq = sequence_data[i - sequence_length:i]
sequences.append(seq)
# Convert sequences into numpy array
sequences = np.array(sequences)
# Split sequences into input (X) and output (y)
X, y = sequences[:, :-1], sequences[:, -1]
y = tf.keras.utils.to_categorical(y, num_classes=vocab_size)
# Build the RNN Model
model = Sequential([
Embedding(input_dim=vocab_size, output_dim=100, input_length=sequence_length - 1),
LSTM(units=128, return_sequences=True),
LSTM(units=128),
Dense(units=vocab_size, activation='softmax')
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# Train the Model
model.fit(X, y, epochs=20, batch_size=32, validation_split=0.2)
model.save('kanye_rnn_model.h5')
# Generate Text
def generate_text(seed_text, next_words, model, tokenizer, max_sequence_len):
for _ in range(next_words):
token_list = tokenizer.texts_to_sequences([seed_text])[0]
token_list = pad_sequences([token_list], maxlen=max_sequence_len - 1, padding='pre')
predicted = model.predict(token_list, verbose=0)
output_word = tokenizer.index_word.get(np.argmax(predicted), "")
seed_text += " " + output_word
return seed_text.strip()
# Chatbot Interface
if __name__ == "__main__":
print("Kanye Bot: Hi, I’m Kanye Bot. What’s on your mind?")
while True:
user_input = input("You: ")
if user_input.lower() == "exit":
print("Kanye Bot: Peace out!")
break
response = generate_text(user_input, next_words=20, model=model, tokenizer=tokenizer, max_sequence_len=sequence_length)
print(f"Kanye Bot: {response}")