Dataset
🗂️ Dataset Description
The dataset used in this project comes from Kaggle: Fake and Real News Dataset, and consists of two separate CSV files:
- Fake.csv: Contains news articles that are labeled as fake
- True.csv: Contains news articles that are labeled as real
Each file includes the following columns:
- title: The headline of the news article
- text: The body/content of the article
- subject: Topic category (e.g., politics, world news)
- date: Publication date
🔗 Combining the Dataset
To train a supervised machine learning model, we need a single labeled dataset. So we:
- Assigned a new column called label: 0 for fake articles, 1 for real articles
- Used pandas.concat() to combine both datasets into one DataFrame
- Shuffled the rows to randomize the ordering
- Saved the result as combined_news.csv
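This produces a single, properly labeled dataset in which fake and real articles are interleaved, ready to be split for training and testing. Shuffling does not change the class proportions: how balanced the dataset is depends on how many articles each source file contains.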
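The combining steps above can be sketched roughly as follows. This is a minimal illustration, not necessarily the exact script used in this project: the function names `combine_news` and `build_combined_csv`, the `seed` parameter, and the use of `DataFrame.sample` for shuffling are assumptions made for the example.

```python
import pandas as pd


def combine_news(fake: pd.DataFrame, real: pd.DataFrame, seed: int = 42) -> pd.DataFrame:
    """Label, concatenate, and shuffle the two article tables."""
    # assign() returns new DataFrames, so the inputs are left unmodified.
    fake = fake.assign(label=0)  # 0 = fake
    real = real.assign(label=1)  # 1 = real

    # Stack the two tables into one DataFrame with a fresh 0..n-1 index.
    combined = pd.concat([fake, real], ignore_index=True)

    # sample(frac=1) returns every row in random order. The fixed seed is an
    # assumption added here so the shuffle is reproducible.
    return combined.sample(frac=1, random_state=seed).reset_index(drop=True)


def build_combined_csv(fake_path: str, real_path: str, out_path: str) -> pd.DataFrame:
    """Read both CSV files, combine them, and write the result to out_path."""
    combined = combine_news(pd.read_csv(fake_path), pd.read_csv(real_path))
    combined.to_csv(out_path, index=False)  # index=False avoids an extra index column
    return combined
```

Assuming both files sit in the working directory, `build_combined_csv("Fake.csv", "True.csv", "combined_news.csv")` would write the combined file. Calling `value_counts()` on the `label` column of the returned DataFrame is a quick way to see how many fake and real articles it contains.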