EST. — Data Scientist in London
Chaeyoon
Kim.
Data Scientist at NHS England · Certified AI Ethicist · London
01 — About
Working at the edge of data and health.
I'm a Data Scientist at NHS England working on healthcare workforce modelling and LLM applications. Certified AI Ethicist with a background in data engineering at Samsung Semiconductor and an MSc in Data Science from City St George's, University of London.
LangChain Ambassador (2025) · open to new connections and collaborations.
View full CV ↗
02 — Projects
Selected work.
NLP, Text & QA
In Progress
NoteGuard — Trust Layer for Clinical AI
LangGraph agent (Gemini + Tavily) that de-identifies NHS clinical free-text before any model or tool sees it — assert_clean() raises if a single identifier survives. Real names are restored only in the clinician-facing answer, alongside a measured residual-PII trust score.
↗
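A minimal sketch of the de-identify, assert, restore flow described above, assuming identifiers can be caught with regular expressions. Only the name `assert_clean()` comes from the project description. The patterns, `deidentify()`, `restore()`, and the sample note are hypothetical illustrations, and the real NoteGuard detection is likely richer than this.

```python
import re

# Hypothetical patterns: a 10-digit NHS number (optionally spaced 3-3-4)
# and a titled surname. Real clinical text needs far broader coverage.
PATTERNS = {
    "NHS_NUMBER": re.compile(r"\b\d{3} ?\d{3} ?\d{4}\b"),
    "NAME": re.compile(r"\b(?:Mr|Mrs|Ms|Dr)\.? [A-Z][a-z]+\b"),
}

def deidentify(text):
    """Replace each match with a placeholder; return (clean_text, mapping)."""
    mapping = {}
    for label, pattern in PATTERNS.items():
        def swap(match, label=label):
            token = f"[{label}_{len(mapping) + 1}]"
            mapping[token] = match.group(0)
            return token
        text = pattern.sub(swap, text)
    return text, mapping

def assert_clean(text):
    """Raise if any known identifier pattern is still present."""
    for label, pattern in PATTERNS.items():
        if pattern.search(text):
            raise ValueError(f"residual identifier of type {label}")

def restore(text, mapping):
    """Put the original identifiers back, for the clinician-facing answer only."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text

# Made-up example note, not real patient data.
note = "Dr Patel reviewed Mrs Jones, NHS number 943 476 5919."
clean, mapping = deidentify(note)
assert_clean(clean)  # passes silently; would raise on the raw note
print(clean)
print(restore(clean, mapping))
```

The point of the gate is ordering: `assert_clean()` runs on the text before it is handed to any model or tool, and `restore()` runs only on the final answer.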
Reducing Missed NHS Appointments
ML solution to cut the cost and waitlist impact of non-attended hospital appointments, developed for a hackathon challenge set by the No. 10 data science team.
↗
1st Place
Semantic Answer Type Prediction
MSc dissertation submitted to the SMART 2021 shared task at the International Semantic Web Conference. Ranked 1st for classifying the expected answer type — entity, literal, or boolean — from natural-language questions over a knowledge graph.
↗
AI Engineering & ML
In Progress
PWR Workforce Elasticity Modelling
Panel econometrics estimating how NHS provider non-substantive staff spend responds to agency-restriction policy. Covers 68 NHS trusts over 4 financial years, backed by 68 passing tests. Headline elasticity β = −0.287 (95% CI [−0.434, −0.140]), with standard errors clustered by Integrated Care System (ICS).
↗
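As a quick consistency check on the headline figures, the standard error implied by a symmetric 95% interval can be backed out from its width. This is a sketch assuming a normal-approximation interval (critical value 1.96); the project itself may use a different reference distribution.

```python
beta = -0.287
ci_low, ci_high = -0.434, -0.140

Z_95 = 1.96  # assumption: normal critical value for a two-sided 95% interval

# Half the interval width divided by the critical value gives the implied SE.
implied_se = (ci_high - ci_low) / (2 * Z_95)

# For a symmetric interval the midpoint should equal the point estimate.
midpoint = (ci_low + ci_high) / 2

print(f"implied SE = {implied_se:.3f}, midpoint = {midpoint:.3f}")
```

Under that assumption the interval is centred on the reported estimate and excludes zero.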
In Progress
NHS Policy Navigator
Adaptive retrieval agent over the NHS 10-Year Health Plan, built in London. Uses agentic RAG to answer nuanced policy questions with source attribution.
↗
Community Pharmacy Workforce Projection
Workforce projection model for NHS community pharmacy, built on open data. Projects the supply of pharmacists and pharmacy technicians across baseline, optimistic, and pessimistic scenarios, using compound annual growth rates from GPhC registers and the Community Pharmacy Workforce Survey.
↗
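A minimal sketch of a scenario projection driven by compound annual growth rates. The headcounts and the one-percentage-point scenario offsets are placeholder values for illustration only, not GPhC or survey figures, and the function names are hypothetical.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two observations."""
    return (end_value / start_value) ** (1 / years) - 1

def project(base, rate, years):
    """Yearly values from year 0 to `years`, growing at a constant rate."""
    return [base * (1 + rate) ** t for t in range(years + 1)]

# Placeholder register counts, for illustration only.
baseline_rate = cagr(start_value=50_000, end_value=55_000, years=5)

# Assumed scenario design: shift the baseline rate by one percentage point.
scenarios = {
    "pessimistic": baseline_rate - 0.01,
    "baseline": baseline_rate,
    "optimistic": baseline_rate + 0.01,
}

projections = {
    name: project(55_000, rate, years=10) for name, rate in scenarios.items()
}
for name, path in projections.items():
    print(f"{name}: {path[-1]:,.0f} after 10 years")
```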
Pharmacy Analysis with Open Data
Python package (published to PyPI) for analysing NHS pharmacy open data at national and regional scale. Covers supervision and staffing across 10,000+ pharmacies, with Cohere-powered accessibility prediction and a Google Maps pharmacy finder. Vibe-coded with Cursor.
↗
Heartlink — Heart Disease Prediction
Spec-driven heart disease prediction pipeline with an NHS-styled dashboard. Classification, regression, and clustering on UCI Heart Disease data with clinical guardrails and property-based testing.
↗
More projects
In Progress
Bilingual To-Do — Korean & English
My seed idea for a bilingual Korean/English Kanban board. A child's typed, spoken, or tapped message is routed to a tool that updates the board, which makes it a sandbox for tool-calling agents built on the route → dispatch pattern. It includes a PIN-gated parent mode and cards that conjugate by column to teach tense. This repo is the Korean-readable origin; with KUITA collaborators it grew into the live togethertodo app for pre-literate toddlers learning alongside their mums.
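A minimal sketch of the route → dispatch pattern, assuming a keyword router standing in for the model that would normally choose the tool. The tool names, board columns, and messages are hypothetical and not taken from the repo.

```python
board = {"todo": [], "doing": [], "done": []}

def add_card(title):
    board["todo"].append(title)
    return f"added {title}"

def move_card(title, column):
    # Remove the card from wherever it currently sits, then place it.
    for cards in board.values():
        if title in cards:
            cards.remove(title)
    board[column].append(title)
    return f"moved {title} to {column}"

# Registry of callable tools, keyed by the name the router returns.
TOOLS = {"add_card": add_card, "move_card": move_card}

def route(message):
    """Stand-in for the model: map a message to (tool_name, arguments)."""
    words = message.lower().split()
    if not words:
        raise ValueError("empty message")
    if words[0] == "add":
        return "add_card", {"title": " ".join(words[1:])}
    if words[0] == "finish":
        return "move_card", {"title": " ".join(words[1:]), "column": "done"}
    raise ValueError(f"no tool for message: {message!r}")

def dispatch(tool_name, arguments):
    """Look up the chosen tool and call it with the routed arguments."""
    return TOOLS[tool_name](**arguments)

print(dispatch(*route("add brush teeth")))
print(dispatch(*route("finish brush teeth")))
```

Keeping routing separate from dispatch means the router can be swapped for a model call without touching the tools that mutate the board.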
Getting Started with Makaton — AAC Choice Board
Digital symbol-based choice board for non-verbal and emerging-verbal pupils in UK SEN schools. React 18 + Supabase, optimised for iPad. Predictive card suggestions come from a Markov chain and a Thompson-sampling bandit, with no runtime LLM dependency. GDPR-compliant, with a multi-source fallback chain for AAC symbols.
↗
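A minimal sketch of how a Markov chain and a Thompson-sampling bandit could combine for card suggestions, written in Python for brevity although the app itself is React. It assumes a first-order chain over card selections and a Beta-Bernoulli bandit over whether a suggestion was accepted; the class names and card labels are hypothetical.

```python
import random
from collections import Counter, defaultdict

class MarkovSuggester:
    """First-order Markov chain: counts which card tends to follow which."""
    def __init__(self):
        self.transitions = defaultdict(Counter)

    def observe(self, previous_card, next_card):
        self.transitions[previous_card][next_card] += 1

    def candidates(self, previous_card, k=3):
        return [c for c, _ in self.transitions[previous_card].most_common(k)]

class ThompsonBandit:
    """Beta-Bernoulli Thompson sampling with one Beta posterior per card."""
    def __init__(self, rng=None):
        self.stats = defaultdict(lambda: [1, 1])  # Beta(1, 1) uniform prior
        self.rng = rng or random.Random()

    def choose(self, cards):
        # Draw an acceptance rate from each posterior; suggest the highest draw.
        return max(cards, key=lambda c: self.rng.betavariate(*self.stats[c]))

    def update(self, card, accepted):
        # Index 0 counts accepted suggestions, index 1 counts ignored ones.
        self.stats[card][0 if accepted else 1] += 1

markov = MarkovSuggester()
for prev, nxt in [("drink", "water"), ("drink", "juice"), ("drink", "water")]:
    markov.observe(prev, nxt)

bandit = ThompsonBandit(random.Random(0))
shortlist = markov.candidates("drink")
suggestion = bandit.choose(shortlist)
bandit.update(suggestion, accepted=True)
```

Both pieces are plain counting and sampling, so they can run on the device without calling a model at runtime.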
03 — Writing
Thoughts on
data science.
Reflections & project notes
Four Knobs in My Text Preprocessing Pipeline (and What Each One Actually Does)
In What I Got Wrong Building a Sentiment Analysis Pipeline for Survey Data and the follow-up checklist, I w...
↗
What I Got Wrong Building a Sentiment Analysis Pipeline for Survey Data
I ran a sentiment analysis pipeline over free-text responses from an internal survey asking people for thei...
↗
Fixing a Sentiment Analysis Pipeline: A Checklist for Next Time
In What I Got Wrong Building a Sentiment Analysis Pipeline for Survey Data, I traced why a survey sentiment...
↗
Prompting Gemini to generate Makaton-style symbol cards
What I learnt about prompt engineering for consistent, licensing-safe AAC symbol generation while building the Getting Started with Makaton choice board.
↗
On LinkedIn