Classification Models & Visual Insights

Model Training & Visualizations

We trained and evaluated two models on TF-IDF-vectorized news article data:

Logistic Regression
Multinomial Naive Bayes

Each model was evaluated with standard classification metrics (accuracy, precision, recall, F1 score) and accompanied by visualizations of its predictions and the word patterns it relies on.
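To make "TF-IDF vectorized" concrete, here is a minimal, standard-library sketch of the weighting. It is not the project's code: the `tfidf` helper, the whitespace tokenizer, and the two toy headlines are assumptions made for illustration. Library implementations such as scikit-learn's `TfidfVectorizer` additionally smooth the IDF term and L2-normalize each vector by default, so their numbers will differ.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document (plain, unnormalized TF-IDF)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # term frequency * inverse document frequency
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

# Toy headlines, invented for this example.
docs = [
    "senate passes budget bill",
    "aliens endorse budget bill",
]
vectors = tfidf(docs)
```

Terms that appear in every document ("budget", "bill") get weight zero, while terms unique to one document ("senate", "aliens") get positive weight. Each vector only stores the terms its document contains, which is why the representation is sparse.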

🔹 1. Logistic Regression

A linear classifier well suited to sparse, high-dimensional data such as TF-IDF vectors.

Metrics Calculated:

Accuracy
Precision
Recall
F1 Score
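All four metrics follow from the counts of true/false positives and negatives. The sketch below shows the arithmetic; the function name and the example counts are illustrative assumptions, not results from this project.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Made-up counts, for illustration only.
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
```

With these counts, precision (0.80) is lower than recall (about 0.89), which is the kind of gap that accuracy alone would hide.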

📊 Visualizations:

Confusion Matrix — Displays correct and incorrect predictions.

Logistic Confusion Matrix
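For reference, the four cells of a binary confusion matrix can be tallied as in the sketch below. The string labels `"fake"` and `"real"` and the choice of `"fake"` as the positive class are assumptions for this example; the project may encode labels differently.

```python
def confusion_counts(y_true, y_pred, positive="fake"):
    """Tally true/false positives and negatives for a binary task."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts

# Toy labels, invented for this example.
y_true = ["fake", "fake", "real", "real", "real"]
y_pred = ["fake", "real", "real", "fake", "real"]
counts = confusion_counts(y_true, y_pred)
```

The diagonal cells (`tp`, `tn`) are the correct predictions; the off-diagonal cells (`fp`, `fn`) are the two kinds of error.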

Correlation Matrix — Shows correlation between article text length and label.

Logistic Correlation Matrix
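The single off-diagonal entry of such a matrix is a Pearson correlation; with a 0/1 label it is also known as the point-biserial correlation. A minimal sketch follows, assuming text length is measured in characters and labels are encoded as 1 = fake, 0 = real (both are assumptions, as are the toy articles).

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / math.sqrt(var_x * var_y)

# Toy (text, label) pairs, invented for this example.
articles = [
    ("short rumor", 1),
    ("a considerably longer and more detailed report", 0),
    ("you will not believe this", 1),
    ("the committee published its findings on tuesday", 0),
]
lengths = [len(text) for text, _ in articles]
labels = [label for _, label in articles]
r = pearson(lengths, labels)
```

A value near 0 would suggest length carries little signal about the label; a strongly negative value here would mean longer articles tend to be real under this encoding.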

Top Keywords — Bar plots of the most predictive words for fake and real news.

Logistic Top Keywords used in Real News

Logistic Top Keywords used in Fake News
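One common way to obtain such keyword rankings is to sort the vocabulary by the model's learned coefficients: large positive weights push toward the positive class, large negative weights toward the other. The sketch below assumes that approach; the helper name, the words, and the coefficient values are all made up. In scikit-learn the inputs would typically come from `TfidfVectorizer.get_feature_names_out()` and `LogisticRegression.coef_[0]`.

```python
def top_keywords(feature_names, coefficients, k=3):
    """Return the k most positive and k most negative (term, weight) pairs."""
    ranked = sorted(zip(feature_names, coefficients), key=lambda pair: pair[1])
    most_negative = ranked[:k]        # strongest evidence for the negative class
    most_positive = ranked[::-1][:k]  # strongest evidence for the positive class
    return most_positive, most_negative

# Hypothetical vocabulary and weights, for illustration only.
names = ["shocking", "official", "miracle", "reported", "budget"]
coefs = [-1.8, 1.2, -2.1, 0.9, 0.1]
positive, negative = top_keywords(names, coefs, k=2)
```

Which class counts as "positive" depends on the label encoding, so the two lists should be mapped to "real" and "fake" only after checking it.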

🔹 2. Multinomial Naive Bayes

A fast probabilistic classifier designed for discrete features such as word counts; in practice it also works with fractional TF-IDF weights.
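To show what the classifier computes, here is a small from-scratch sketch on raw word counts with Laplace (add-one) smoothing. It is an illustration under stated assumptions, not the project's implementation: the function names, the whitespace tokenizer, and the toy training set are all invented here.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha=1.0):
    """Fit per-class log priors and smoothed per-term log likelihoods."""
    class_docs = Counter(labels)
    term_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        term_counts[label].update(doc.lower().split())
    vocab = {term for counts in term_counts.values() for term in counts}
    model = {}
    for label, n_docs in class_docs.items():
        # Denominator: all term occurrences in the class plus smoothing mass.
        total = sum(term_counts[label].values()) + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(n_docs / len(labels)),
            "log_likelihood": {
                term: math.log((term_counts[label][term] + alpha) / total)
                for term in vocab
            },
        }
    return model

def predict_nb(model, doc):
    """Return the class with the highest log posterior score."""
    scores = {}
    for label, params in model.items():
        score = params["log_prior"]
        for term in doc.lower().split():
            # Terms never seen in training are ignored.
            score += params["log_likelihood"].get(term, 0.0)
        scores[label] = score
    return max(scores, key=scores.get)

# Toy training data, invented for this example.
train_docs = ["shocking miracle cure", "miracle secret exposed",
              "senate passes budget", "court issues ruling"]
train_labels = ["fake", "fake", "real", "real"]
model = train_nb(train_docs, train_labels)
prediction = predict_nb(model, "miracle cure exposed")
```

Training is a single pass of counting, which is why the model is fast; the smoothing term `alpha` keeps a word that never appeared in one class from driving that class's probability to zero.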

Metrics Calculated:

Accuracy
Precision
Recall
F1 Score

📊 Visualizations:

Confusion Matrix: Shows the counts of correct and incorrect predictions for each class.

Naive Bayes Confusion Matrix

ROC Curve — Visualizes the trade-off between true positive and false positive rates.

ROC Curve
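The curve is traced by sweeping a decision threshold over the predicted scores and recording the false positive and true positive rates at each step. A minimal sketch, assuming labels are 1 (positive) and 0 (negative), higher scores mean "more likely positive", and both classes are present; the function names and example scores are made up.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points from the strictest to the loosest threshold."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    ranked = sorted(zip(scores, y_true), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Handle tied scores together so they yield a single point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Illustrative labels and scores, not project output.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(y_true, scores)
area = auc(points)
```

An area of 0.5 corresponds to random ranking and 1.0 to a perfect one; here 8 of the 9 positive/negative pairs are ranked correctly, giving 8/9.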

Word Clouds:
- 🔴 Fake News: frequent terms in fake news articles
- 🟢 Real News: frequent terms in real news articles
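A word cloud is a visual rendering of per-class term frequencies. The sketch below computes those frequencies only; the stopword list, the helper name, and the sample texts are assumptions for illustration. A plotting library would then scale each word by its count (the `wordcloud` package, for instance, accepts such a mapping through `WordCloud.generate_from_frequencies`, if that is the tool in use).

```python
from collections import Counter

# Tiny illustrative stopword list; real lists are much longer.
STOPWORDS = {"the", "a", "of", "to", "in", "and", "is"}

def term_frequencies(docs, top_n=3):
    """Count non-stopword tokens across docs and return the top_n as a dict."""
    counts = Counter(
        token
        for doc in docs
        for token in doc.lower().split()
        if token not in STOPWORDS
    )
    return dict(counts.most_common(top_n))

# Toy texts, invented for this example.
fake_docs = ["the miracle cure is a secret", "secret miracle in the news"]
fake_freqs = term_frequencies(fake_docs, top_n=2)
```

Running the same function separately on the fake and the real subsets gives the two frequency tables that the two clouds depict.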

These visualizations help interpret not just how well the models performed but also why, based on word patterns and prediction behavior.