Leveraging Taxonomic Structure in Classification on Large Label Sets

Friday, April 17, 2026 · 1:10 PM – 1:40 PM · Violet Crown Charlottesville

Ever had to solve a classification problem on a budget with “too many” target labels, say 100 or 1,000? Maybe you thought of breaking the problem up into smaller sub-problems, or resorted to heuristics for part of the solution. This talk describes a simple approach to decomposing this type of problem whenever your label set has a natural taxonomic structure. The resulting solution is efficient (both statistically and computationally) and produces a principled uncertainty estimate over the whole label set, should you require it. We’ve employed it successfully at Trilliant Health for at least one problem of imputing values from a hierarchical medical coding system. The underlying framework code is now available on PyPI via our team’s open-source monorepo.
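
To make the idea of taxonomic decomposition concrete, here is a minimal sketch of one common way to do it. It is not necessarily the method presented in the talk or the API of the published framework. It assumes a two-level taxonomy with one small classifier over parent categories and one per parent over its children, and it combines them with the chain rule, P(leaf) = P(parent) × P(leaf | parent). The label names, the scores, and the `leaf_distribution` helper are all hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Turn a dict of raw scores into a dict of probabilities."""
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def leaf_distribution(parent_scores, child_scores):
    """Combine per-level scores into one distribution over all leaf labels.

    parent_scores: {parent: score} from a (hypothetical) top-level classifier.
    child_scores:  {parent: {leaf: score}} from one small (hypothetical)
                   classifier per parent, covering only that parent's leaves.
    """
    parent_probs = softmax(parent_scores)
    leaf_probs = {}
    for parent, p_parent in parent_probs.items():
        conditional = softmax(child_scores[parent])  # P(leaf | parent)
        for leaf, p_cond in conditional.items():
            leaf_probs[leaf] = p_parent * p_cond  # chain rule
    return leaf_probs


# Made-up scores for a single input; a real system would get these from models.
parent_scores = {"cardio": 1.0, "neuro": 0.0}
child_scores = {
    "cardio": {"cardio.a": 0.0, "cardio.b": 0.0},
    "neuro": {"neuro.a": 2.0, "neuro.b": 0.0, "neuro.c": 0.0},
}

dist = leaf_distribution(parent_scores, child_scores)
for leaf, p in sorted(dist.items(), key=lambda kv: -kv[1]):
    print(f"{leaf}: {p:.3f}")
```

Because each level is normalized on its own, the leaf probabilities sum to one, which gives an uncertainty estimate over the full label set. Each classifier also only has to discriminate among a handful of siblings rather than among every label at once.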

About the Speaker

Matt Hawthorn

ML Engineer by day, math and statistics enthusiast by night

Matt started his journey in data science and machine learning in 2015 after a pivot from pure mathematics. His first love in the field was NLP, and he's dabbled in many other areas since. He has a passion for efficient, elegant code, and his latest side quest is learning Rust. When not hacking, you can find him playing a game of Go or enjoying the beauty of the Shenandoah Valley with his family and friends.