OpenAI o1-preview Tutorial: Mastering Machine Learning Projects with Ease

Artificial intelligence has reached new heights, and OpenAI’s o1-preview model is a game-changer for machine learning (ML) enthusiasts. This tutorial will guide you through building a water quality classifier from scratch, using OpenAI’s latest reasoning model. The target is 90% accuracy with minimal manual intervention, driven by well-structured prompts.

Let’s explore the OpenAI o1 model, its capabilities, and how we can leverage it for end-to-end ML development.

What is OpenAI’s o1 Reasoning Model?

The o1 family is a series of OpenAI models trained to reason through a problem step by step before answering. Compared with earlier GPT models, it:

✅ Solves complex multi-step problems efficiently
✅ Handles intricate coding tasks with minimal errors
✅ Is trained with reinforcement learning to recognize and correct its own mistakes while reasoning
✅ Excels in math, logic, and scientific reasoning

It comes in two versions:

o1-preview: Best for deep reasoning and structured problem-solving.
o1-mini: A faster variant optimized for math and coding tasks.

At the time of writing, these models lack web browsing, file uploads, and Python REPL functionality; OpenAI has said it plans to add such features.

Accessing OpenAI o1-preview and o1-mini

There are two main ways to use the o1 models:

1. OpenAI API

If you’ve used OpenAI’s API before, accessing o1-preview is simple:

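from openai import OpenAI

# The client reads the API key from the OPENAI_API_KEY environment variable
client = OpenAI()

# Send a single user message to o1-preview
response = client.chat.completions.create(
    model="o1-preview",
    messages=[
        {"role": "user", "content": "Create an SQL database from scratch using Python."}
    ]
)

# Print the text of the first (and only) completion
print(response.choices[0].message.content)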
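
During the o1-preview beta, the API accepted only user and assistant messages, so guidance that would normally go in a system message had to travel inside the user turn. The sketch below shows one way to fold it in. The helper name build_o1_messages and the sample instruction text are illustrative assumptions, not part of the OpenAI SDK, and current API limits should be checked against the official documentation.

```python
def build_o1_messages(task: str, instructions: str = "") -> list[dict]:
    """Return a messages list with any instructions folded into one user turn.

    Hypothetical helper: it only builds the payload, it does not call the API.
    """
    task = task.strip()
    instructions = instructions.strip()
    # Put the instructions first, separated from the task by a blank line
    content = f"{instructions}\n\n{task}" if instructions else task
    return [{"role": "user", "content": content}]


messages = build_o1_messages(
    task="Create an SQL database from scratch using Python.",
    instructions="Answer with one runnable script and brief comments.",
)
print(messages[0]["content"])
```

The resulting list can be passed as the messages argument of client.chat.completions.create in the call above.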

2. ChatGPT Plus & Team Users

To access o1-preview in ChatGPT:

Click on the model selection dropdown.
Choose o1-preview.
Provide detailed multi-step instructions.

This approach lets you refine the response iteratively with follow-up prompts.

Building a Water Quality Classifier with OpenAI o1

Let’s get our hands dirty and build a machine learning project from scratch. Our mission? Predict if water is potable using a dataset from Kaggle.

Step 1: Engineering the Perfect Prompt

Good prompts are key to extracting the best results from OpenAI’s models. Here’s an optimized request:

My project manager assigned me to develop a Water Quality classification app using the dataset from Kaggle: https://www.kaggle.com/datasets/adityakadiwal/water-potability.

Goal: Achieve 90% accuracy.

Please provide:
- Python code for data preprocessing, model training, and evaluation.
- A FastAPI-based web app for predictions.
- Deployment instructions using Docker on Hugging Face.

Stating the context, the goal, and each deliverable steers o1-preview toward a step-by-step, actionable answer.
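
The same context, goal, and deliverables layout can be assembled programmatically, which helps when you want to reuse the structure for other projects. This is a minimal sketch; the function name build_project_prompt is a hypothetical helper, and the text passed in is copied from the prompt above.

```python
def build_project_prompt(context: str, goal: str, deliverables: list[str]) -> str:
    """Assemble a prompt with a context line, a goal line, and a bulleted list."""
    lines = [context.strip(), "", f"Goal: {goal.strip()}", "", "Please provide:"]
    lines += [f"- {item.strip()}" for item in deliverables]
    return "\n".join(lines)


prompt = build_project_prompt(
    context=(
        "My project manager assigned me to develop a Water Quality "
        "classification app using the dataset from Kaggle: "
        "https://www.kaggle.com/datasets/adityakadiwal/water-potability."
    ),
    goal="Achieve 90% accuracy.",
    deliverables=[
        "Python code for data preprocessing, model training, and evaluation.",
        "A FastAPI-based web app for predictions.",
        "Deployment instructions using Docker on Hugging Face.",
    ],
)
print(prompt)
```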

Step 2: Setting Up the Project

After receiving the generated response, organize your project:

mkdir water_quality_classifier && cd water_quality_classifier
mkdir -p app/templates data models metrics src

Step 3: Loading & Preprocessing Data

Download the Kaggle dataset, save it as data/water_potability.csv, and preprocess it.

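import os
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
import joblib

# Make sure the output folders exist before anything is written to them
os.makedirs("metrics", exist_ok=True)
os.makedirs("models", exist_ok=True)

# Load dataset
data = pd.read_csv("data/water_potability.csv")

# Handle missing values: replace each NaN with its column mean
imputer = SimpleImputer(strategy="mean")
data_imputed = pd.DataFrame(imputer.fit_transform(data), columns=data.columns)

# Scale features to zero mean and unit variance (the target column is excluded).
# Note: fitting on the full dataset lets test-set statistics leak into training;
# for a stricter evaluation, fit the imputer and scaler on the training split only.
scaler = StandardScaler()
features_scaled = scaler.fit_transform(data_imputed.drop("Potability", axis=1))

# Save scaler so the web app can apply the same transformation
joblib.dump(scaler, "models/scaler.joblib")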
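
To make the two preprocessing steps concrete, here is a small standard-library sketch of what mean imputation and standard scaling compute for a single column. The values are made up for illustration, and the helper names are hypothetical; in the project itself SimpleImputer and StandardScaler do this work for every column at once.

```python
import math
from statistics import fmean, pstdev


def impute_mean(column):
    """Replace NaN entries with the mean of the observed values."""
    observed = [v for v in column if not math.isnan(v)]
    mean = fmean(observed)
    return [mean if math.isnan(v) else v for v in column]


def standardize(column):
    """Rescale to zero mean and unit variance using the population std.

    Assumes the column is not constant (std > 0).
    """
    mean, std = fmean(column), pstdev(column)
    return [(v - mean) / std for v in column]


ph = [6.0, float("nan"), 8.0, 10.0]  # made-up readings with one missing value
filled = impute_mean(ph)             # the NaN becomes the mean of 6, 8, and 10
scaled = standardize(filled)

print(filled)
print([round(v, 3) for v in scaled])
```

StandardScaler also divides by the population standard deviation, which is why pstdev is used here rather than stdev.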

Step 4: Training the Model

Train a Random Forest Classifier to predict water potability.

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
import skops.io as sio
import json

# Stratified 80/20 split; a fixed random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    features_scaled,
    data_imputed["Potability"],
    test_size=0.2,
    stratify=data_imputed["Potability"],
    random_state=42,
)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# Persist the trained model in the skops format
sio.dump(model, "models/water_quality_model.skops")

# Save metadata, including accuracy on the held-out test set
metadata = {"model": "RandomForestClassifier", "accuracy": model.score(X_test, y_test)}
with open("models/metadata.json", "w") as f:
    json.dump(metadata, f, indent=4)
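
The accuracy stored in metadata.json is the fraction of test samples predicted correctly, which is what model.score returns for a classifier. A number like that is easier to judge next to the accuracy of always predicting the most common class. The sketch below computes both and compares the result with the 90% goal from the prompt. The labels are made up for illustration, and the helper names are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_true):
    """Accuracy obtained by always predicting the most common class."""
    return Counter(y_true).most_common(1)[0][1] / len(y_true)


TARGET = 0.90  # the goal stated in the prompt

y_true = [0, 0, 0, 1, 1, 0, 1, 0, 0, 1]  # made-up labels
y_pred = [0, 0, 1, 1, 0, 0, 1, 0, 0, 1]  # made-up predictions

acc = accuracy(y_true, y_pred)
base = majority_baseline(y_true)
print(f"accuracy={acc:.2f} baseline={base:.2f} meets_target={acc >= TARGET}")
```

If the model's accuracy lands well short of the target, that is a cue to go back to o1-preview with the metrics and ask for improvements.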

Step 5: Creating a FastAPI Web App

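Save the following as app/main.py. It expects form.html and result.html templates in app/templates, and FastAPI needs the python-multipart package to parse form data.

from fastapi import FastAPI, Request, Form
from fastapi.templating import Jinja2Templates
import numpy as np
import skops.io as sio
import joblib

app = FastAPI()
templates = Jinja2Templates(directory="app/templates")

# Load the artifacts saved in Steps 3 and 4.
# trusted=True is deprecated in newer skops releases; passing the file's untrusted
# types works there, but review them before loading a file you did not create.
MODEL_PATH = "models/water_quality_model.skops"
model = sio.load(MODEL_PATH, trusted=sio.get_untrusted_types(file=MODEL_PATH))
scaler = joblib.load("models/scaler.joblib")

@app.get("/")
async def home(request: Request):
    return templates.TemplateResponse("form.html", {"request": request})

@app.post("/predict")
async def predict(
    request: Request,
    ph: float = Form(...),
    Hardness: float = Form(...),
    Solids: float = Form(...),
    Chloramines: float = Form(...),
    Sulfate: float = Form(...),
    Conductivity: float = Form(...),
    Organic_carbon: float = Form(...),
    Trihalomethanes: float = Form(...),
    Turbidity: float = Form(...),
):
    # The scaler and model were fitted on all nine features, in dataset column order
    input_data = np.array([[ph, Hardness, Solids, Chloramines, Sulfate,
                            Conductivity, Organic_carbon, Trihalomethanes, Turbidity]])
    prediction = model.predict(scaler.transform(input_data))
    result = "Potable" if prediction[0] == 1 else "Not Potable"
    return templates.TemplateResponse("result.html", {"request": request, "result": result})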
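
Because the scaler in Step 3 was fitted on every feature column, each prediction request has to supply all of those values in the same order as during training. One way to guard against missing or misordered inputs is a small helper like the sketch below. The FEATURE_ORDER list assumes the column names in the Kaggle CSV header, and to_feature_row is a hypothetical name; the sample values are made up.

```python
# Assumed to match the feature columns of water_potability.csv, in order
FEATURE_ORDER = [
    "ph", "Hardness", "Solids", "Chloramines", "Sulfate",
    "Conductivity", "Organic_carbon", "Trihalomethanes", "Turbidity",
]


def to_feature_row(values: dict) -> list[float]:
    """Return the feature values as floats in training column order."""
    missing = [name for name in FEATURE_ORDER if name not in values]
    if missing:
        raise ValueError(f"Missing features: {', '.join(missing)}")
    return [float(values[name]) for name in FEATURE_ORDER]


sample = {  # made-up measurements, deliberately not in training order
    "Turbidity": 4.0, "ph": 7.0, "Hardness": 200.0, "Solids": 20000.0,
    "Chloramines": 7.0, "Sulfate": 330.0, "Conductivity": 420.0,
    "Organic_carbon": 14.0, "Trihalomethanes": 66.0,
}
row = to_feature_row(sample)
print(row)
```

Wrapping the row in a list, as in [row], gives the two-dimensional shape that scaler.transform expects.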

Step 6: Deploying with Docker on Hugging Face

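Create a Dockerfile for deployment, alongside a requirements.txt that lists the packages the app imports (fastapi, uvicorn, jinja2, python-multipart, scikit-learn, skops, joblib, numpy):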

FROM python:3.9-slim
WORKDIR /app
COPY . /app
RUN pip install --no-cache-dir -r requirements.txt
EXPOSE 7860
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "7860"]

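Deploy to Hugging Face by creating a Space that uses the Docker SDK, then pushing the project to it:

git clone https://huggingface.co/spaces/your-space
cd your-space
# Copy the project files (Dockerfile, requirements.txt, app/, models/) into this folder first
git add . && git commit -m "Deploying ML App"
git push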

Final Thoughts

OpenAI’s o1-preview model makes it easier than ever to build end-to-end machine learning applications. With minimal manual coding, you can:

✅ Automate model training with high accuracy
✅ Generate functional web apps with a simple prompt
✅ Deploy AI-powered solutions effortlessly

The future of AI-driven development is here, and it’s smarter than ever. Ready to build your next AI project? Try OpenAI’s o1-preview today!