Geelon So

I’m a PhD Candidate in Computer Science at UC San Diego, where I am advised by Sanjoy Dasgupta and Yian Ma. I obtained my MS in Computer Science at Columbia University, where I was advised by Daniel Hsu. I graduated from the University of Chicago with a BS in Mathematics. [CV]

Research areas: algorithmic statistics, machine learning theory, unsupervised learning, continual learning, optimization

My research interests arise especially from problems in machine learning where (i) there is too much data, (ii) there is not enough data, or (iii) the data keeps changing, and possibly several of these at once. […]

Papers

Here’s my Google Scholar profile. Here’s a list of other writings.

Online consistency of the nearest neighbor rule
Sanjoy Dasgupta and Geelon So.
Conference on Neural Information Processing Systems, 2024.
arXiv | talk | poster

Metric learning from limited pairwise preference comparisons
Zhi Wang, Geelon So, Ramya Korlakai Vinayak.
Conference on Uncertainty in Artificial Intelligence, 2024.
arXiv | talk

Optimization on Pareto sets: On a theory of multi-objective optimization
Abhishek Roy*, Geelon So*, Yi-An Ma.
Preprint, 2023.
arXiv | talk

Convergence of online k-means
Sanjoy Dasgupta, Gaurav Mahajan, Geelon So.
International Conference on Artificial Intelligence and Statistics, 2022.
arXiv | pdf | intro video | poster

Active learning with noise
Geelon So.
MS Thesis, Columbia University, 2019.
pdf

Talks

Presentations given in various reading groups/seminars/graduate courses.

Online consistency of the nearest neighbor rule
Modern Paradigms of Generalization Talk, Simons Institute, Oct 2024.
Chicago Junior Theorists Workshop, Northwestern and TTIC, Dec 2024.

Metric learning from lazy crowds
Signals, Information, and Algorithms Lab, MIT, Mar 2024.

Geometry of multi-objective optimization
Yale Theory Student Seminar, Yale, Mar 2024.

Metric learning from lazy crowds
EnCORE Student Social, UCSD, Feb 2024.

Learning mixed Nash equilibria
Game theory reading group, UCSD, Feb 2024.

Statistical and online learning, a tutorial
Redwood Center, Berkeley, Dec 2023.

Calibrated learning in games
Forecasting/calibration, UCSD, Dec 2023.

Introduction to neural nets
Machine learning course lecture, UCSD, Jul 2023.

The double descent phenomenon [handout version]
Machine learning course lecture, UCSD, Nov 2022.

Learning with multi-modal data: canonical correlation analysis
Seekr Research reading group, Seekr, Sep 2022.

Linear system identification with reverse experience replay
Time series reading group, UCSD, Apr 2022.

Gradient play in smooth games
Algorithmic game theory reading group, UCSD, Apr 2022.

Computationally efficient approximation mechanisms
Algorithmic game theory reading group, UCSD, Mar 2022.

Independent component analysis
Unsupervised learning reading group, UCSD, Feb 2022.

Equilibrium computation: motivation and problems
Algorithmic game theory reading group, UCSD, Feb 2022.

Scalable sampling for discrete distributions
Time series reading group, UCSD, Nov 2021.

Graphical games
Algorithmic game theory reading group, UCSD, Nov 2021.

Active learning for maximum likelihood estimation
Time series reading group, UCSD, Sep 2021.

Stochastic calculus on manifolds: Part 1, Part 2
Sampling/optimization reading group, UCSD, Aug 2021.

Linear system identification without mixing
Sequential decision making lecture, UCSD, Jun 2021.

Sequential kernel herding
Sampling/optimization reading group, UCSD, Jun 2021.

Log-Sobolev inequalities and concentration
Concentration inequalities reading group, UCSD, Apr 2021.

Learning language games through interaction
Continual learning lecture, UCSD, Apr 2021.

Global non-convex optimization with discretized diffusions
Sampling/optimization reading group, UCSD, Apr 2021.

Stochastic differential equations basics
Sampling/optimization reading group, UCSD, Feb 2021.

Model of conserved macroscopic dynamics predicts future motor commands
Computational neurobiology lecture, UCSD, Feb 2021.

A theory of universal learning
Learning theory reading group, UCSD, Nov 2020.

Oja’s rule for streaming PCA
Learning theory reading group, UCSD, Sep 2020.

Proving the lottery ticket hypothesis
Learning theory reading group, UCSD, Aug 2020.

Approximate guarantees for dictionary learning
Learning theory reading group, UCSD, Jun 2020.

k-SVD for dictionary learning
Learning theory reading group, UCSD, May 2020.

Proximal methods for hierarchical sparse coding
Learning theory reading group, UCSD, May 2020.

Transformers are universal approximators
Learning theory reading group, UCSD, Apr 2020.

Using SVD to learn HMMs
Learning theory reading group, UCSD, Feb 2020.

Conditional mutual information and generalization
Generalization theory reading group, UCSD, Feb 2020.

Generalization in adaptive data analysis
Generalization theory reading group, UCSD, Jan 2020.

Generalization and differential privacy
Generalization theory reading group, UCSD, Nov 2019.

Invariant risk minimization
Learning theory reading group, UCSD, Nov 2019.

Complexity: beyond space and time
Home Partners of America tech talk, Aug 2019.

Zero-knowledge proofs from secure multiparty computation
Privacy-preserving technologies lecture, Columbia University, Apr 2019.

Geometry of gradient descent and lower bounds
Optimization reading group, Columbia University, Feb 2019.

Homomorphic encryption
Privacy-preserving technologies lecture, Columbia University, Feb 2019.

Approximate nearest-neighbor search
Machine learning lecture, Columbia University, Dec 2018.

Introduction to tensor decompositions
Machine learning lecture, Columbia University, Dec 2018.

Sum-of-squares for robust estimation
Optimization reading group, Columbia University, Nov 2018.

Spectral clustering [earlier version]
Machine learning lecture, Columbia University, Oct 2018.

Sum-of-squares for MAXCUT
Optimization reading group, Columbia University, Sep 2018.

Topological data analysis
Unsupervised machine learning lecture, Columbia University, Jul 2018.

Tensor decompositions for parametric estimation
Unsupervised machine learning lecture, Columbia University, Jul 2018.

PAC-Bayes for ReLU neural networks
Generalization theory reading group, Columbia University, Apr 2018.

Graph robustness and percolation theory
Emergent systems reading group, Columbia University, Mar 2018.

Teaching

I was a lecturer for the 2023 EnCORE Foundations in Data Science program.

I was a teaching assistant for the following courses:

Winter 2025, Machine learning, UCSD
Winter 2024, Machine learning, UCSD
Fall 2022, Machine learning, UCSD
Fall 2020, Probability and statistics, UCSD
Fall 2018, Machine learning, Columbia University
Fall 2018, Unsupervised learning, Columbia University
Summer 2018, Unsupervised learning, Columbia University
Spring 2018, Graph theory, Columbia University
Fall 2017, Geographic information systems, Columbia University

Service

Reviewer: ACM CSUR 2024; AISTATS 2022–2025; ALT 2025; ICML 2025; JOTA 2025; NeurIPS 2023
CSE PhD DEI fellowship reviewer (2020, 2021)
CSE/HDSI Visit Day AI coordinator (2020, 2021)
ExploreCSR 2020 mentor
Friends of Washington Park Youth Program mentor (2014–2017)