Dataset

🗂️ Dataset Description

The dataset used in this project is the Fake and Real News Dataset from Kaggle. It consists of two separate CSV files:

- Fake.csv: Contains news articles that are labeled as fake
- True.csv: Contains news articles that are labeled as real

Each file includes the following columns:

- title: The headline of the news article
- text: The body/content of the article
- subject: Topic category (e.g., politics, world news)
- date: Publication date
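As a quick sanity check on the layout described above, the following is a minimal sketch of loading one of the files and confirming that the four listed columns are present. The helper names `missing_columns` and `load_news_csv` are hypothetical and not part of the project, and the sketch assumes the CSV headers match the column names exactly as written here.

```python
import pandas as pd

# Column names as listed in the dataset description.
EXPECTED_COLUMNS = ["title", "text", "subject", "date"]


def missing_columns(df: pd.DataFrame) -> list:
    """Return the expected columns that are absent from df (hypothetical helper)."""
    return [col for col in EXPECTED_COLUMNS if col not in df.columns]


def load_news_csv(path: str) -> pd.DataFrame:
    """Read one CSV file and fail early if an expected column is missing."""
    df = pd.read_csv(path)
    missing = missing_columns(df)
    if missing:
        raise ValueError(f"{path} is missing columns: {missing}")
    return df
```

Assuming both files sit in the working directory, `load_news_csv("Fake.csv")` and `load_news_csv("True.csv")` would each return a DataFrame or raise a ValueError naming the missing columns.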

🔗 Combining the Dataset

To train a supervised machine learning model, we need a single labeled dataset, so we:

- Added a new column called label to each file:
  - 0 for fake articles
  - 1 for real articles
- Used pandas.concat() to combine both datasets into one DataFrame
- Shuffled the rows to randomize the ordering
- Saved the result as combined_news.csv
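The steps above can be sketched roughly as follows. This is an illustration rather than the project's actual script: the function name `combine_news` is hypothetical, and the fixed `seed` is an assumption added so the shuffle is reproducible (the original steps do not say whether a seed was used).

```python
import pandas as pd


def combine_news(fake_df: pd.DataFrame, true_df: pd.DataFrame, seed: int = 42) -> pd.DataFrame:
    """Label, concatenate, and shuffle the two article sets (hypothetical helper)."""
    # assign() returns new DataFrames, so the inputs are left unchanged.
    fake = fake_df.assign(label=0)  # 0 = fake
    real = true_df.assign(label=1)  # 1 = real

    # Stack the two labeled frames and renumber the rows from 0.
    combined = pd.concat([fake, real], ignore_index=True)

    # frac=1 draws every row exactly once, which amounts to a full shuffle.
    # The seed is an assumption made here for reproducibility.
    shuffled = combined.sample(frac=1, random_state=seed)
    return shuffled.reset_index(drop=True)
```

With both files in the working directory, `combine_news(pd.read_csv("Fake.csv"), pd.read_csv("True.csv")).to_csv("combined_news.csv", index=False)` would write the combined file. Calling `value_counts()` on the label column of the result is a quick way to see how many fake and real articles it contains.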

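This produces a single, properly labeled dataset in which fake and real articles are interleaved, so a later train/test split is not biased by the order of the source files. Combining the files does not change the class balance, which depends on how many articles each file contains.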