EST. — Data Scientist in London
Chaeyoon Kim.
Data Scientist at NHS England · Certified AI Ethicist · London
01 — About
Working at the edge of data and health.
I'm a Data Scientist at NHS England working on healthcare workforce modelling and LLM applications. Certified AI Ethicist with a background in data engineering at Samsung Semiconductor and an MSc in Data Science from City St George's, University of London.
LangChain Ambassador (2025) · open to new connections and collaborations.
View full CV ↗
02 — Projects
Selected work.
NLP, Text & QA
Reducing Missed NHS Appointments
ML solution to cut the cost and waitlist impact of non-attended hospital appointments, developed for a hackathon challenge set by the No. 10 data science team.
↗
Survey Sentiment Analysis
Natural-language pipeline for open-ended survey responses: cleans and tokenises free-text answers, builds per-question word clouds, and scores sentiment with TextBlob to surface respondent opinions. Built in Python with pandas, NLTK, and matplotlib.
↗
1st Place
Semantic Answer Type Prediction
MSc dissertation submitted to the SMART 2021 shared task at the International Semantic Web Conference. Ranked 1st for classifying the expected answer type — entity, literal, or boolean — from natural-language questions over a knowledge graph.
↗
AI Engineering & ML
In Progress
PWR Workforce Elasticity Modelling
Panel econometrics estimating how NHS provider spend on non-substantive staff responds to agency-restriction policy. Covers 68 NHS trusts over 4 financial years, with 68 passing tests. Headline elasticity β = −0.287 [95% CI −0.434, −0.140], with cluster-robust standard errors clustered by ICS.
↗
In Progress
NHS Policy Navigator
Adaptive retrieval agent over the NHS 10-Year Health Plan, built in London. Uses agentic RAG to answer nuanced policy questions with source attribution.
↗
Community Pharmacy Workforce Projection
Workforce projection model for NHS community pharmacy, built on open data. Projects the supply of pharmacists and pharmacy technicians across baseline, optimistic, and pessimistic scenarios, using compound annual growth rates from GPhC registers and the Community Pharmacy Workforce Survey.
↗
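The scenario logic described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual code: the headcounts are made-up placeholders (not GPhC figures), the function names `cagr` and `project` are hypothetical, and the one-percentage-point scenario adjustment is an assumption chosen only to show the mechanics.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two headcounts `years` apart."""
    if start <= 0 or years <= 0:
        raise ValueError("start and years must be positive")
    return (end / start) ** (1 / years) - 1


def project(base: float, rate: float, horizon: int) -> list[float]:
    """Headcount for years 0..horizon, compounding `rate` once per year."""
    return [base * (1 + rate) ** t for t in range(horizon + 1)]


# Hypothetical register counts, NOT real GPhC data.
HEADCOUNT_START = 50_000  # register size five years ago (assumed)
HEADCOUNT_NOW = 55_000    # register size today (assumed)

baseline_rate = cagr(HEADCOUNT_START, HEADCOUNT_NOW, years=5)

# Assumed scenario spread: +/- 1 percentage point around the baseline rate.
scenarios = {
    "baseline": baseline_rate,
    "optimistic": baseline_rate + 0.01,
    "pessimistic": baseline_rate - 0.01,
}

projections = {
    name: project(HEADCOUNT_NOW, rate, horizon=5)
    for name, rate in scenarios.items()
}

for name, series in projections.items():
    print(f"{name:<12} year 5 headcount: {series[-1]:,.0f}")
```

Keeping the growth rate and the projection as separate functions lets each scenario reuse the same compounding step with a different rate.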
Pharmacy Analysis with Open Data
Python package (published to PyPI) for analysing NHS pharmacy open data at national and regional scale. Covers supervision and staffing across 10,000+ pharmacies, with Cohere-powered accessibility prediction and a Google Maps pharmacy finder. Vibe-coded with Cursor.
↗
Heartlink — Heart Disease Prediction
Spec-driven heart disease prediction pipeline with an NHS-styled dashboard. Classification, regression, and clustering on UCI Heart Disease data with clinical guardrails and property-based testing.
↗
More projects
In Progress
Bilingual To-Do — Korean & English
My seed idea for a bilingual Korean/English Kanban board. A child's typed, spoken, or tapped message is routed to a tool that updates the board: a tool-calling agent sandbox built on the route → dispatch pattern, with a PIN-gated parent mode and cards that conjugate by column to teach tense. This repo is the Korean-readable origin; with KUITA collaborators it grew into the live togethertodo app for pre-literate toddlers learning alongside their mums.
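A rough sketch of the route → dispatch pattern mentioned above, in Python. Everything here is hypothetical: the `Board` class, the `add_card` and `move_card` tools, and the keyword-based `route` function (a stand-in for whatever model or classifier picks the tool in the real project) are assumptions made for illustration only.

```python
from dataclasses import dataclass, field

COLUMNS = ("todo", "doing", "done")


@dataclass
class Board:
    """Hypothetical Kanban board: maps a card title to its column."""
    cards: dict[str, str] = field(default_factory=dict)


def add_card(board: Board, title: str) -> str:
    board.cards[title] = "todo"
    return f"added {title!r}"


def move_card(board: Board, title: str, column: str) -> str:
    if title not in board.cards:
        return f"no card named {title!r}"
    if column not in COLUMNS:
        return f"unknown column {column!r}"
    board.cards[title] = column
    return f"moved {title!r} to {column}"


# Tool registry: the dispatcher may only call what is listed here.
TOOLS = {"add_card": add_card, "move_card": move_card}


def route(message: str):
    """Stand-in router: turn a message into (tool_name, kwargs), or None."""
    words = message.strip().split()
    if len(words) >= 2 and words[0].lower() == "add":
        return "add_card", {"title": " ".join(words[1:])}
    if len(words) >= 3 and words[0].lower() == "move":
        return "move_card", {"title": " ".join(words[1:-1]),
                             "column": words[-1].lower()}
    return None


def dispatch(board: Board, message: str) -> str:
    """Route the message, then call the chosen tool from the registry."""
    decision = route(message)
    if decision is None:
        return "sorry, I did not understand"
    name, kwargs = decision
    return TOOLS[name](board, **kwargs)


board = Board()
dispatch(board, "add brush teeth")
dispatch(board, "move brush teeth doing")
print(board.cards)
```

Separating `route` from `dispatch` means the routing step can be swapped (keywords, speech, a model call) without touching the tools that change the board.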
Getting Started with Makaton — AAC Choice Board
Digital symbol-based choice board for non-verbal and emerging-verbal pupils in UK SEN schools. Built with React 18 + Supabase and optimised for iPad. Predictive card suggestions come from a Markov chain and a Thompson-sampling bandit, with no runtime LLM dependency. GDPR-compliant, with a multi-source fallback chain for AAC symbols.
↗
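One way a Markov chain and a Thompson-sampling bandit could be combined for card suggestions is sketched below in Python (the app itself is React, so this is an illustration of the idea, not its code). The `CardSuggester` class, the add-one smoothing, the Beta(1, 1) priors, and the choice to multiply the two scores are all assumptions.

```python
import random
from collections import defaultdict


class CardSuggester:
    """Hypothetical suggester: first-order Markov counts combined with a
    Beta-Bernoulli Thompson-sampling bandit per card."""

    def __init__(self, cards, seed=None):
        self.cards = list(cards)
        self.rng = random.Random(seed)
        # transitions[prev][nxt] = times nxt was chosen right after prev
        self.transitions = defaultdict(lambda: defaultdict(int))
        # Beta(1, 1) prior per card: [accepted + 1, ignored + 1]
        self.beta = {card: [1, 1] for card in self.cards}

    def record(self, prev, chosen, suggested=()):
        """Update both models after the pupil picks `chosen`."""
        if prev is not None:
            self.transitions[prev][chosen] += 1
        for card in suggested:
            if card == chosen:
                self.beta[card][0] += 1  # suggestion was accepted
            else:
                self.beta[card][1] += 1  # suggestion was ignored

    def suggest(self, prev, k=3):
        """Rank cards by smoothed Markov probability times a Beta sample."""
        counts = self.transitions.get(prev, {})
        total = sum(counts.values())
        scores = {}
        for card in self.cards:
            # Add-one smoothing so unseen cards keep a small probability.
            markov = (counts.get(card, 0) + 1) / (total + len(self.cards))
            accepted, ignored = self.beta[card]
            scores[card] = markov * self.rng.betavariate(accepted, ignored)
        return sorted(self.cards, key=scores.get, reverse=True)[:k]


suggester = CardSuggester(["drink", "snack", "toilet", "play"], seed=0)
for _ in range(200):
    suggester.record(prev="drink", chosen="snack", suggested=["snack", "play"])
top = suggester.suggest(prev="drink", k=3)
print(top)
```

Because the Beta draw is random, rarely shown cards still surface occasionally, which is the exploration behaviour a bandit is meant to provide.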
03 — Writing
Thoughts on data science.
Reflections & project notes
On LinkedIn