
Omar-Yasser/arabic-sentiment-analysis


Arabic Sentiment Analysis

LLMs know everything, but don't understand anything.

- Omar Yasser

This project performs sentiment analysis on company reviews written in various Arabic dialects.

Preprocessing

  1. Data Cleansing: Removing nulls and duplicates to ensure a clean dataset.
  2. Text Normalization: Stripping punctuation, digits, and special characters so that only the words themselves remain.
  3. Diacritic Handling: Removing diacritics and normalizing Arabic characters to reduce spelling variability in the input text.
  4. Language Homogenization: Translating the few non-Arabic words into Arabic to keep the text in a single language.
  5. Emoji Mapping: Mapping emojis, which often convey strong sentiment, to their textual meanings.
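Steps 1, 2, 3 and 5 above can be sketched with the Python standard library alone. This is a minimal illustration, not the project's actual code: the names `clean_dataset`, `preprocess`, `CHAR_MAP` and `EMOJI_MAP` are hypothetical, and the chosen letter normalizations (for example ة -> ه) and the tiny emoji lexicon are assumptions. Step 4 is omitted because it depends on a translation resource the README does not name.

```python
import re

# Arabic diacritics (tanween, fatha, damma, kasra, shadda, sukun: U+064B..U+0652)
# plus tatweel (U+0640), which only stretches letters visually.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")

# Assumed letter normalizations; other pipelines choose differently.
CHAR_MAP = str.maketrans({
    "\u0623": "\u0627",  # أ -> ا
    "\u0625": "\u0627",  # إ -> ا
    "\u0622": "\u0627",  # آ -> ا
    "\u0649": "\u064A",  # ى -> ي
    "\u0629": "\u0647",  # ة -> ه
})

# Hypothetical emoji lexicon; a real one would be much larger.
EMOJI_MAP = {
    "😍": " رائع ",  # "wonderful"
    "😡": " غاضب ",  # "angry"
    "👍": " جيد ",   # "good"
}


def clean_dataset(rows):
    """Step 1: drop (text, label) rows with a missing field, then exact duplicates."""
    seen = set()
    cleaned = []
    for text, label in rows:
        if text is None or label is None:
            continue
        if (text, label) in seen:
            continue
        seen.add((text, label))
        cleaned.append((text, label))
    return cleaned


def preprocess(text):
    """Steps 5, 3 and 2 on a single review, in that order."""
    # Map emojis first: the punctuation step below would otherwise delete them.
    for emoji, meaning in EMOJI_MAP.items():
        text = text.replace(emoji, meaning)
    # Remove diacritics before the punctuation step; diacritics are not \w,
    # so replacing them with spaces would split words apart.
    text = DIACRITICS.sub("", text)
    # Replace punctuation, symbols, digits (including Arabic-Indic) and "_" with spaces.
    text = re.sub(r"[^\w\s]|[\d_]", " ", text)
    # Normalize letter variants, then collapse repeated whitespace.
    text = text.translate(CHAR_MAP)
    return " ".join(text.split())


print(preprocess("الخدمةُ ممتازة!! 👍 100%"))  # الخدمه ممتازه جيد
```

Running the emoji mapping before the character stripping is the main ordering constraint; the remaining steps could be reordered as long as diacritics are removed before punctuation is replaced with spaces.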

Models

Four models were implemented:

  1. Fine-tuned AraBERT: The pretrained AraBERT model, fine-tuned on our dataset.
  2. Transformer from Scratch: A Transformer model built from the ground up to better understand its architecture.
  3. LSTM: A standard (unidirectional) LSTM.
  4. Bidirectional LSTM: An LSTM that reads each review in both the forward and backward directions, so every position sees context from both sides.
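The bidirectional idea in model 4 can be sketched in plain Python: one LSTM reads the sequence left to right, a second reads it right to left, and their hidden states are concatenated at each position. This is an illustrative sketch with random, untrained weights, not the project's model; `LSTMCell`, `run_lstm`, `bidirectional_lstm` and the toy embeddings are hypothetical, and a real implementation would use a deep learning framework's LSTM layer.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


class LSTMCell:
    """A single LSTM cell with randomly initialized (untrained) weights."""

    GATES = ("input", "forget", "candidate", "output")

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        width = input_size + hidden_size  # each gate sees [x_t, h_{t-1}]
        self.weights = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(width)]
                   for _ in range(hidden_size)]
            for gate in self.GATES
        }
        self.biases = {gate: [0.0] * hidden_size for gate in self.GATES}

    def step(self, x, h, c):
        z = x + h  # concatenate the input and the previous hidden state
        pre = {
            gate: [s + b for s, b in zip(matvec(self.weights[gate], z), self.biases[gate])]
            for gate in self.GATES
        }
        i = [sigmoid(v) for v in pre["input"]]
        f = [sigmoid(v) for v in pre["forget"]]
        g = [math.tanh(v) for v in pre["candidate"]]
        o = [sigmoid(v) for v in pre["output"]]
        # New cell state: keep part of the old memory, add part of the candidate.
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new


def run_lstm(cell, sequence):
    """Run the cell over a sequence of input vectors; return every hidden state."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    states = []
    for x in sequence:
        h, c = cell.step(x, h, c)
        states.append(h)
    return states


def bidirectional_lstm(forward_cell, backward_cell, sequence):
    """Concatenate forward and backward hidden states at each time step."""
    forward_states = run_lstm(forward_cell, sequence)
    # Read right to left, then flip the outputs back into left-to-right order
    # so that position t is paired with position t.
    backward_states = run_lstm(backward_cell, sequence[::-1])[::-1]
    return [f + b for f, b in zip(forward_states, backward_states)]


embeddings = [[0.2, -0.1], [0.5, 0.3], [-0.4, 0.1]]  # 3 toy tokens, 2 dims each
outputs = bidirectional_lstm(LSTMCell(2, 4, seed=1), LSTMCell(2, 4, seed=2), embeddings)
print(len(outputs), len(outputs[0]))  # 3 8
```

Each output vector is twice the hidden size: the first half summarizes the tokens up to that position, the second half the tokens after it. A sentiment classifier would typically pool these vectors (or take the final states of each direction) before a dense output layer.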

Results

Our team won first place in a university-wide Kaggle competition on Arabic sentiment analysis, out of more than 100 teams. Our model achieved 87.5% accuracy, outperforming the second-place team by a margin of 2%.

Kaggle Competition Leaderboard

Team Members

About

Dialectal Arabic Sentiment Analysis

Contributors 4