Project Hub - Tweet Emotion Classification

Assignment

Classify Emotions in Tweets Using an LSTM-based Architecture (PyTorch). Build an emotion classifier for tweets using:

  • Classical baselines (simple averaging embeddings + MLP)

  • An attention-enhanced LSTM

  • Emotions can include: joy, sadness, anger, fear, love, surprise

  • Datasets:
    • EmoTweet Dataset
    • Kaggle: Emotion Dataset for NLP (6 labels)

Brainstorming

Resources and dev log

  • 08/01/2026 - 01:00 : Found and downloaded the emotion-dataset. Project scaffolding done. Downloaded GloVe embeddings. Trained a model. Created a simple dictionary called “word2idx” tied to the GloVe embeddings, plus an “emotion to id” mapping and vice versa (“id to emotion”). Modified the dataset, adding label_id and sequences columns; sequences is the word-by-word conversion to GloVe embedding ids. Created a neural network classifier with a simple architecture: GloVe Embedding -> LSTM (NO ATTENTION) -> Output (size )
  • 08/01/2026 - 14:00: Given my experience with my previous failed project, I think it’s better to keep the old one. Save results somewhere and make a plot. Made a renewed task list and added a section to my notebook.
  • 09/01/2026 - 01:10 - Downloaded the second dataset, removed duplicates, cleaned it, and merged the two; the merged set has no duplicates. Performed exploratory data analysis and found that the classes are imbalanced (this will have consequences for classification and the choice of metric). The data also contains outliers. Computed 1-, 2-, and 3-grams, but they lack an explanation, or there is something I’m not seeing in them. Also looked at the most frequent 4-grams and 5-grams: 5-grams are too rare relative to the dataset size, and the most frequent 4-grams look almost like the 1/2/3-grams, suggesting the data is sparse. Gemini mentioned “stop words” but I don’t know what they are yet.
  • 09/01/2026 - 12:26 Defined a preprocessing pipeline in a PreprocessPipeline class with all the rules and applied it to the text.
  • 09/01/2026 - 18:00 Plotted bi-grams per category (for the bi-gram analysis only, stop words were removed). Removed outliers on the right tail of the distribution. The n-gram analysis was fundamental because I found a lot of noise. Thankfully, of the entire dataset of 423k labeled samples, fewer than ~500 are noise, so removing them is not urgent. From the n-gram visualization I removed “feel like” and “i m feeling”, since these two bigrams were present and predominant across all 6 labels. The original dataset is “unchanged” in the sense that stop words and strings are still present; only outlier removal and the preprocessing pipeline are applied.
  • 10/01/2026 17:10 - Fixed a bug that incorrectly computed which outliers to remove.
  • 10/01/2026 19:18 - Implemented the baseline (averaged embeddings + MLP) and the LSTM with attention. Implemented F1-score tracking instead of loss only; I consider both macro and weighted F1 since the classes are imbalanced and my goal is to maximize F1. Added a confusion-matrix visualization for the 6 labels after each model’s training. Implemented the following test: take a small subset and overfit the LSTM with attention on it; this should be enough to say there are no bugs or errors in the data. For each model’s training, recorded a history of F1 scores (both weighted and macro) and of train/validation loss, plotted the F1 scores, and computed confusion matrices for both models.
  • 11/01/2026 15:18 Switched from GloVe 100d to GloVe 300d, since I have a dictionary of 80k unique words. The model was slightly better: the baseline reached a maximum weighted F1 of 0.8960 (record for the baseline).
  • 11/01/2026 17:48 Added a manual stop via a try-except loop. Added early stopping and model saving to Google Drive. Added prior initialization and Xavier initialization to both models, hopefully improving training. Added class weights to the cross-entropy loss.
  • 11/01/2026 22:34 Added more samples and retrained. The baseline reached a validation macro F1 of 0.8613, while the LSTM (1 layer, bidirectional) with lr=5e-4, weight_decay=1e-5, and hidden_dim=128 reached a maximum macro F1 of 0.9087.
  • 11/01/2026 23:37 Implemented “set_seed” for reproducibility. Documented the dataset composition.
  • 12/01/2026 23:10 Documented the dataset compositions, explained what GloVe is, and documented the weighted cross-entropy loss.
  • 13/01/2026 00:01 In the Colab I wrote: “Validation Loss (Weighted Cross-Entropy) is the loss computed over the validation set during each epoch, after training. Top-2 Validation Accuracy is the accuracy computed over the top 2 highest-probability guesses. In my case both Validation Accuracy and Top-2 Validation Accuracy are high, so the model is confident, but Top-2 is higher.” And: “F1 score can be weighted or unweighted. The unweighted (macro) score is lower but a more honest evaluation, because it does not take the class imbalance into account.” Provided an explanation of the model’s results. Documented in the Colab that precision and recall weren’t balanced across all classes.
  • 13/01/2026 01:05 - Documented the dictionary implementation and the out-of-vocabulary problem. Documented the attention mechanism.
  • 13/01/2026 13:21 - Documented attention; the models now return attention weights, so I modified and verified all the affected functions. Collected a sample of tweets and printed and plotted the per-token weights for each label. Then computed the top words across the validation set by weight value and plotted them too. Commented on the results.
  • 13/01/2026 15:28 Added an Error Analysis section. Printed 10 tweets from the dataset that the model mislabeled, with the same weight heatmaps as before. Added a sarcasm example: selected 5 random samples from another dataset of sarcastic tweets and showed each tweet with the model’s prediction. The model got some right and some wrong, but I think it’s a good addition. Added a text-only Limitations and Future Improvements section at the end.
  • 13/01/2026 16:03 Added the emoji Python library and an emoji-removal (demoji) step to the preprocessing pipeline; a final rerun is needed. Since I have to retrain anyway, I added ReduceLROnPlateau to fine-tune the model and possibly reach a lower validation loss.
  • 28/01/2026 18:21 - Learned what pyproject.toml is and added one to the repository. Structured the repository following a standard data-science project structure.
  • 29/01/2026 18:00 - Created a script for downloading the data.
  • 30/01/2026 13:00 - Created a notebook that merges the two datasets and saved the result to a Parquet file.
  • 30/01/2026 18:00 - I’m halfway through the refactoring. I learned a lot, so it’s going well.
  • 31/01/2026 16:00 - Added a VocabBuilder class in src and a script called make_dictionary that instantiates an object of this class, loads the dataframe, creates the sequences, builds the dictionary, and saves it to vocab.json under data/
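
The word2idx dictionary, the sequences column, and the later VocabBuilder/make_dictionary refactor all revolve around the same step: mapping GloVe rows to integer ids. A minimal sketch of that step, assuming a GloVe .txt file in the standard `word v1 v2 …` format; the function names here are hypothetical, not the ones in src:

```python
import numpy as np

def build_vocab(glove_path, dim=100):
    """Build word2idx and an embedding matrix from a GloVe .txt file.
    Index 0 is reserved for <pad>, index 1 for <unk> (out-of-vocabulary)."""
    word2idx = {"<pad>": 0, "<unk>": 1}
    vectors = [np.zeros(dim, dtype=np.float32), np.zeros(dim, dtype=np.float32)]
    with open(glove_path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word2idx[parts[0]] = len(vectors)
            vectors.append(np.asarray(parts[1:], dtype=np.float32))
    return word2idx, np.stack(vectors)

def to_sequence(tokens, word2idx):
    """Word-by-word conversion to ids (the 'sequences' column)."""
    return [word2idx.get(t, word2idx["<unk>"]) for t in tokens]
```

Reserving index 0 for padding matters later, since `nn.Embedding` can be told to keep that row at zero via `padding_idx=0`.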
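
The log’s second model (a bidirectional LSTM with attention, hidden_dim 128, returning per-token weights for the heatmaps) could look roughly like this in PyTorch. This is a sketch under my own naming assumptions (`AttentionLSTM`, additive attention via a single linear layer), not the notebook’s actual code:

```python
import torch
import torch.nn as nn

class AttentionLSTM(nn.Module):
    """GloVe embeddings -> BiLSTM -> attention -> 6-way classifier.
    Returns both logits and per-token attention weights (for the heatmaps)."""
    def __init__(self, embeddings, hidden_dim=128, num_classes=6):
        super().__init__()
        self.embed = nn.Embedding.from_pretrained(embeddings, freeze=True, padding_idx=0)
        self.lstm = nn.LSTM(embeddings.size(1), hidden_dim, num_layers=1,
                            batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)
        self.out = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, x, mask=None):
        h, _ = self.lstm(self.embed(x))               # (B, T, 2H)
        scores = self.attn(h).squeeze(-1)             # (B, T)
        if mask is not None:                          # ignore <pad> positions
            scores = scores.masked_fill(~mask, float("-inf"))
        weights = torch.softmax(scores, dim=1)        # (B, T), sums to 1 per tweet
        context = (weights.unsqueeze(-1) * h).sum(1)  # (B, 2H) weighted average
        return self.out(context), weights
```

Returning `weights` alongside the logits is what makes the later per-token plots and heatmaps possible without re-running the attention layer.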
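
The log mentions class weights in the cross-entropy loss for the imbalanced classes. One common recipe (an assumption; the log does not say how its weights were computed) is inverse-frequency weighting:

```python
from collections import Counter

def class_weights(labels, num_classes=6):
    """Inverse-frequency weights for weighted cross-entropy: rare emotions
    get a larger weight so the loss doesn't ignore them. Weights are
    normalized so they average to 1."""
    counts = Counter(labels)
    raw = [len(labels) / (num_classes * counts[c]) for c in range(num_classes)]
    mean = sum(raw) / num_classes
    return [w / mean for w in raw]
```

The resulting list can be passed to PyTorch as `nn.CrossEntropyLoss(weight=torch.tensor(w))`.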

Preprocessing pipeline:

  • Lowercase everything
  • Remove noise: html, urls
  • Hashtags are easy to handle: remove the ‘#’ and then separate the words
  • Keep ‘!’ and ‘?’ because they are primary indicators of surprise/anger
  • TweetTokenizer from nltk automatically manages contractions
  • Remove emojis (I didn’t find any in the dataset)
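
The rules above, minus tokenization, can be sketched as a single function. The real pipeline lives in the PreprocessPipeline class and uses nltk’s TweetTokenizer plus the emoji library; this regex-only version is a simplified stand-in, and it only strips the ‘#’ rather than splitting concatenated hashtag words (which needs a wordlist):

```python
import re

def preprocess(tweet):
    """Simplified sketch of the rules: lowercase, strip HTML/URLs,
    drop '#' from hashtags, keep '!' and '?' as emotion cues."""
    t = tweet.lower()
    t = re.sub(r"https?://\S+|www\.\S+", " ", t)   # remove URLs
    t = re.sub(r"<[^>]+>", " ", t)                 # remove HTML tags
    t = re.sub(r"#(\w+)", r"\1", t)                # '#word' -> 'word'
    t = re.sub(r"[^a-z0-9!?'\s]", " ", t)          # keep ! and ? (surprise/anger)
    return " ".join(t.split())                     # collapse whitespace
```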

Experiments

| Model   | lr   | weight_decay | hidden_dim | dropout | F1-macro | Notes                                   |
|---------|------|--------------|------------|---------|----------|-----------------------------------------|
| Base    | 1e-3 | default      | 256        | 0.3     | 0.84     | Overfit                                 |
| Base v2 | 3e-4 | 1e-5         | 256        | 0.0     | 0.8618   | Baseline optimized                      |
| LSTM v1 | 3e-4 | 1e-5         | 128        | 0       | 0.9086   | 1 layer; slower but best, more accurate |

Tasks

Optional/Remaining Tasks

  • Visualize Attention weights
  • Add a significance test (e.g. a paired t-test) between the F1s of the baseline and the F1s of the LSTM model
  • Retrain with smaller hidden dimensionality
  • Ablation Study isolating attention’s contribution
  • Draw architecture graph for the two models
  • Improve Github repository
  • Add project to hugging face

Github Repository and project architecture

  • Add pyproject.toml
  • Rewrite the README
  • Scripts
    • Add script for windows and linux that download dependencies automatically
    • Script to download glove
    • Script to download semeval dataset
    • Script to download eltea
    • Moved under datasets/raw
  • Create a preprocess notebook that merges the two datasets and saves the result in Parquet format to dataset/process
  • Sources
    • Create a class that computes the dictionary, to be used by the training function. It will use the downloaded GloVe vectors to build the vocabulary
  • Scripts
    • Add a script that removes outliers
    • Modify the script so that its output is saved to the datasets/final folder
  • Tests
    • Make a test that overfits the baseline on a smaller sample of 1000 items taken randomly from the final dataset
  • Edit Notebook:
    • Split preprocessing into another, smaller notebook
    • Split outlier removal (which creates the final dataset) and dictionary/vocabulary creation into a smaller script
    • Split training into another notebook
    • Split visualization into another, smaller notebook
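
The overfit-on-a-subset test planned under Tests can be written as a small pytest function. A toy MLP stands in for the real baseline here, and the step count and loss threshold are assumptions:

```python
import torch
import torch.nn as nn

def test_can_overfit_small_subset():
    """Sanity check: if a model cannot drive the training loss near zero
    on a tiny subset, that usually signals a bug in the data or the
    training loop. A toy MLP stands in for the real baseline."""
    torch.manual_seed(0)
    x = torch.randn(64, 20)          # stand-in for averaged embeddings
    y = torch.randint(0, 6, (64,))   # 6 emotion labels
    model = nn.Sequential(nn.Linear(20, 128), nn.ReLU(), nn.Linear(128, 6))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(500):             # full-batch training on the subset
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    assert loss.item() < 0.2         # memorized -> pipeline wired correctly
```

On the real project this would draw 1000 random rows from the final dataset and train the actual baseline instead of the toy model.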

Comments

The EmoTweet-28 dataset isn’t available on official platforms.