AI Con 2026

Prepare the Data

Since ML models are trained on data, the quality of that data directly affects the model's overall quality and usefulness. Many open-source datasets let you jump straight into exploratory data analysis and feature selection. In practice, however, training data often has to be collected, cleaned, structured, transformed, enriched, and validated first.

Practice Goals:

For this practice session, training data for the model is provided, but random noise in the form of bad samples has been mixed in. You will need to inspect the dataset and clean it up. You are also tasked with gathering new samples, or splitting off some of the existing ones, for testing the trained model, and with keeping the files and folders well organized.
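One quick way to start the inspection is to flag files that are empty or are not images at all. The sketch below assumes the good samples are JPEG or PNG files and checks only each file's leading signature bytes; `detect_format` and `find_suspect_samples` are hypothetical helper names, not part of any provided tooling. Samples that are valid images but show the wrong content will not be caught and still need a visual check.

```python
from pathlib import Path
from typing import List, Optional

# Leading "magic" bytes of common image formats.
# Assumption: good samples are JPEG or PNG; extend this table for other formats.
IMAGE_SIGNATURES = {
    "jpeg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
}


def detect_format(header: bytes) -> Optional[str]:
    """Return the image format whose signature starts `header`, or None."""
    for name, signature in IMAGE_SIGNATURES.items():
        if header.startswith(signature):
            return name
    return None


def find_suspect_samples(root: Path) -> List[Path]:
    """List files under `root` that are empty or do not start like a JPEG/PNG."""
    suspects = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        with path.open("rb") as f:
            header = f.read(8)  # 8 bytes covers the longest signature (PNG)
        if not header or detect_format(header) is None:
            suspects.append(path)
    return suspects
```

For example, `for p in find_suspect_samples(Path("weather")): print(p)` lists candidates for review, where `weather` is a placeholder for wherever the ZIP was extracted. Reviewing the list before deleting anything avoids discarding good samples by mistake.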

Hands-On Activities:

Download the Weather dataset.
Create the top-level folder structure: [train] [val]
Within the [train] folder, create the structure: [cloudy] [rain] [shine] [sunrise]
Place all training samples in their respective sub-folders.
Clean the data by removing bad training samples.
Discover the shape of the training data, such as the number of samples per class and the image dimensions.
Gather new samples or split off existing ones for testing, and place them in [val].
Conduct a final review of the prepared data.
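The folder steps above can be scripted. The following is a minimal sketch, not a prescribed method: it assumes the extracted images sit in one folder with file names that begin with their class (for example `sunrise12.jpg`), and that bad samples have already been removed. The helpers `label_from_filename`, `split_train_val`, and `organize` are hypothetical names. The script copies each class into [train]/<class>, holds out a fraction of every class for [val], and returns per-class counts as a first look at the shape of the data.

```python
import random
import re
import shutil
from collections import Counter
from pathlib import Path
from typing import Dict, List, Optional, Tuple

# Class folders named in the practice instructions.
CLASSES = ("cloudy", "rain", "shine", "sunrise")


def label_from_filename(name: str) -> Optional[str]:
    """Assumption: file names start with the class, e.g. 'sunrise12.jpg'."""
    match = re.match(r"[a-z]+", name.lower())
    if match and match.group() in CLASSES:
        return match.group()
    return None


def split_train_val(files: List[str], val_fraction: float = 0.2,
                    seed: int = 42) -> Tuple[List[str], List[str]]:
    """Shuffle reproducibly and hold out about `val_fraction` of files for [val]."""
    shuffled = sorted(files)  # sort first so the result ignores listing order
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def organize(src: Path, dst: Path, val_fraction: float = 0.2) -> Dict[str, Counter]:
    """Copy images from `src` into dst/train/<class>/ and dst/val/; return counts."""
    by_class: Dict[str, List[str]] = {c: [] for c in CLASSES}
    for path in src.iterdir():
        label = label_from_filename(path.name) if path.is_file() else None
        if label is not None:
            by_class[label].append(path.name)

    (dst / "val").mkdir(parents=True, exist_ok=True)
    counts = {"train": Counter(), "val": Counter()}
    for label, names in by_class.items():
        # Split per class so [val] keeps roughly the same class balance as [train].
        train, val = split_train_val(names, val_fraction)
        (dst / "train" / label).mkdir(parents=True, exist_ok=True)
        for name in train:
            shutil.copy2(src / name, dst / "train" / label / name)
        for name in val:
            shutil.copy2(src / name, dst / "val" / name)
        counts["train"][label] = len(train)
        counts["val"][label] = len(val)
    return counts
```

For example, `organize(Path("weather"), Path("prepared"), val_fraction=0.2)` would copy roughly 20% of each class into `prepared/val`. Copying instead of moving keeps the original download intact, and because the file names carry the label under this assumption, the flat [val] folder still records each sample's class. The counts do not cover image dimensions, which would need an image library such as Pillow.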
