ML / AI Engineering

Build an AI feature with a measurable quality bar, not just a demo.

Overview

What this track proves.

This track is for people who want to build with models, prompts, retrieval, agents, and evaluations. You will focus on practical AI systems: how to define a task, test outputs, handle failures, and communicate limitations.

Pace: 6-10 hrs/week
Capstone: AI workflow, RAG prototype, eval harness, or tool-calling demo.
Review: Async review on task design, eval evidence, grounding, and failure notes.

Good fit

This is for you if...

Use this section as a practical gut check before you apply.

You are curious about LLMs, retrieval, agents, or applied machine learning.
You like experiments, comparisons, and figuring out why an output failed.
You want to build AI features that can be evaluated, not just shown.

Outcomes

What you can leave with.

The goal is proof of work, not passive course completion.

Design an AI workflow with inputs, outputs, constraints, and failure cases.
Create a small evaluation set and compare outputs against a rubric.
Build a prototype using prompts, retrieval, structured output, or a simple agent loop.
Write a report on quality, limitations, and recommended improvements.

Curriculum

The work, in order.

01. Task design

Turn an ambiguous AI idea into a specific job with boundaries and success criteria.
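
As a sketch of what a "specific job with boundaries" can look like in practice, the spec below pins a task to named inputs, outputs, constraints, and failure cases. The field names and the example task are illustrative, not a required format.

```python
from dataclasses import dataclass

@dataclass
class TaskSpec:
    job: str                      # the one specific thing the model must do
    inputs: list[str]             # what the system receives
    outputs: list[str]            # what it must produce, and in what form
    constraints: list[str]        # hard boundaries: scope, tone, length, latency
    failure_cases: list[str]      # known ways it can go wrong, written down up front
    success_criteria: list[str]   # how a reviewer decides an output passed

spec = TaskSpec(
    job="Summarize a support ticket into one triage label",
    inputs=["raw ticket text"],
    outputs=["one label from a fixed set", "a one-sentence rationale"],
    constraints=["no invented details", "labels only from the approved set"],
    failure_cases=["ticket in another language", "two issues in one ticket"],
    success_criteria=["label matches reviewer judgment on 9 of 10 sampled tickets"],
)
```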

02. Prompting and structure

Use clear instructions, examples, schemas, and context to reduce avoidable model failure.
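
A minimal sketch of the schema-first idea: instructions, one worked example, a fixed output shape, and a validation step so malformed replies fail loudly instead of silently. The prompt wording is illustrative, and `call_model` is an assumed placeholder for whatever client you use.

```python
import json

SCHEMA_KEYS = {"label", "confidence", "rationale"}

PROMPT = """Classify the support ticket. Reply with JSON only, using exactly these keys:
{"label": "...", "confidence": 0.0, "rationale": "..."}

Example:
Ticket: "I was charged twice this month."
{"label": "billing", "confidence": 0.9, "rationale": "Duplicate charge reported."}

Ticket: "%s"
"""

def parse_reply(raw: str) -> dict:
    # json.loads raises on non-JSON replies, which we want surfaced, not hidden.
    data = json.loads(raw)
    missing = SCHEMA_KEYS - data.keys()
    if missing:
        raise ValueError(f"reply missing keys: {missing}")
    return data

# Usage, with call_model standing in for a real client (assumed, not a real API):
# reply = call_model(PROMPT % "My password reset link never arrives.")
# result = parse_reply(reply)
```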

03. Retrieval and grounding

Add source context when needed and evaluate whether the answer is actually supported.
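
To make grounding concrete, here is a toy retriever using bag-of-words cosine similarity so it runs without any embedding service; in a real build you would swap `vectorize` for actual embeddings. The documents and IDs are invented for illustration.

```python
import math
from collections import Counter

def vectorize(text: str) -> Counter:
    # Bag-of-words stand-in; replace with real embeddings in an actual build.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: dict[str, str], k: int = 2) -> list[tuple[str, str]]:
    # Return (doc_id, text) pairs so the answer can cite its sources.
    qv = vectorize(query)
    ranked = sorted(docs.items(), key=lambda kv: cosine(qv, vectorize(kv[1])), reverse=True)
    return ranked[:k]

docs = {
    "doc-1": "Refunds are processed within 5 business days.",
    "doc-2": "Password reset links expire after 24 hours.",
}
print(retrieve("how long do refunds take", docs, k=1))
# Grounding check: does the cited passage actually support the model's claim?
```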

04. Evaluation

Build a lightweight rubric and sample set so quality can improve through evidence.
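
One way the rubric-plus-sample-set loop can look, sketched as a tiny harness: labeled cases, rubric checks as functions, and a pass-rate printout. The cases, checks, and stub predictor are all invented for illustration; the point is that quality becomes a number you can track.

```python
CASES = [
    {"input": "I was charged twice this month", "expect_label": "billing"},
    {"input": "I cannot log in to my account", "expect_label": "auth"},
]

RUBRIC = {
    "correct_label": lambda out, case: out.get("label") == case["expect_label"],
    "has_rationale": lambda out, case: bool(out.get("rationale", "").strip()),
}

def run_evals(predict):
    # predict(text) -> dict; plug your real pipeline in here.
    totals = {name: 0 for name in RUBRIC}
    for case in CASES:
        out = predict(case["input"])
        for name, check in RUBRIC.items():
            totals[name] += check(out, case)
    for name, passed in totals.items():
        print(f"{name}: {passed}/{len(CASES)} passed")

# Stub predictor so the harness runs end to end without a model call.
run_evals(lambda text: {"label": "billing", "rationale": "stub"})
```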

Capstone examples

The artifact can take a few shapes.

A RAG prototype with citations and an evaluation table.
A structured-output workflow that extracts, classifies, or drafts useful work with tests.
A simple agent or tool-calling workflow with failure-mode notes.
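
For the tool-calling shape, a bare-bones loop might look like the sketch below: the model proposes either a tool call or a final answer, the loop dispatches known tools, and unknown tools or an exhausted step budget are recorded as failure modes rather than crashes. `model_step` and the action format are assumptions, not a specific vendor's API.

```python
TOOLS = {
    "lookup_order": lambda order_id: f"order {order_id}: shipped",
}

def run_agent(model_step, user_msg: str, max_steps: int = 3):
    # model_step(history) -> {"tool": name, "args": {...}} or {"answer": text}.
    history = [("user", user_msg)]
    for _ in range(max_steps):
        action = model_step(history)
        if "answer" in action:
            return action["answer"], history
        tool = TOOLS.get(action.get("tool"))
        if tool is None:
            # An unknown tool becomes a recorded failure mode, not a crash.
            history.append(("error", f"unknown tool: {action.get('tool')}"))
            continue
        history.append(("tool", tool(**action.get("args", {}))))
    # An exhausted step budget is itself a failure mode worth noting in the writeup.
    return None, history
```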

Builder signal

What Builder-level work looks like.

Builder means you have shipped an AI feature backed by deliberate prompt, retrieval, or workflow design, with evaluation evidence and a clear explanation of its failure modes.

OpenAI or Anthropic models
Embeddings or retrieval tools
TypeScript or Python
Spreadsheets for evals
Supabase