Senior AI Researcher at Qualcomm AI Research
I work on efficient inference systems for large language models, especially KV cache selection, speculative decoding, and long-context serving under resource constraints, with continuing interests in reinforcement learning for combinatorial optimization.
KV cache and long-context inference
Work on KV cache selection, eviction, and long-context serving under memory constraints.
Examples: query-oriented KV selection (QuoKA), KV cache eviction for long contexts (KeyDiff).
Speculative decoding
Systems work on draft-model alignment and recursive decoding for faster language model serving.
Examples: recursive speculative decoding, draft-model alignment for speculative decoding.
Reinforcement learning for combinatorial optimization
Earlier work on reinforcement-learning methods for routing and scheduling, and on open-source libraries for combinatorial optimization.
Examples: parallel autoregressive policies for multi-agent optimization (PARCO), an RL library for combinatorial optimization (RL4CO).
Recent paper acceptances, workshop activity, and research updates.
Training in industrial engineering, optimization, and machine learning at KAIST.