Homepage of Adam Kapelner, Ph.D.

I'm an Associate Professor of Mathematics at Queens College, CUNY in New York City. My Ph.D. and master's degree are in Statistics from Wharton / UPenn. My doctoral thesis and current research program concentrate on improving the design of randomized experiments. We recently proved that Fisher's blocking designs are optimal, nearly 100 years after Fisher proposed blocking (see here).

Before graduate school, I was a software engineer building web applications. I still use these skills to run social science experiments, and I'm interested in investigating racism and other types of discrimination (e.g. see here). My undergraduate degree was in mathematics & computational science at Stanford.

I have a special love for teaching and mentoring. I helped to create and now administer the undergraduate Data Science and Statistics concentration. My undergraduate alumni have gone on to data science, data engineering, data analyst and actuarial careers at Wells Fargo, Webster Bank, Amazon, NASA, Grant Thornton and various tech startups, among others, as well as to PhD programs at NYU, Drexel and Columbia.

I can't maintain all the pages of this website, so most of their information is contained in my CV. My GitHub is here. I'm also available for private tutoring (in advanced mathematics, computer science and data science) and for data science consulting, which I've done for Tesorio and Coatue Management, among others.

This server also serves my gradesly.com website.

What I love most is solving problems and engineering solutions.
Below is a sample of problems and their solutions (implemented as software). For a list of others, see my CV by clicking here.

DictionarySquared
Problem:	How do we learn vocabulary? Is there an optimal way to learn?
Solution:	I built a fun and interactive vocabulary training program based on 50 years of vocabulary research. We learn from seeing and hearing a word in context over many repeated exposures. Contexts can be written (like the second image on the right), spoken or visual (like the third image on the right). The site is individually tailored: each student progresses through words matched to their current vocabulary level, and each word is reviewed on an optimally spaced schedule. The training program has been used by thousands of high school students over the past five years. Recently, I've teamed up with professors from the University of Pittsburgh, Carnegie Mellon and the University of South Carolina to create a research team. We just received $1.5M in funding for further development from the Institute of Education Sciences, a division of the United States Department of Education (see grants section).
Available at:	dictionarysquaredresearch.sc.edu for school use on the IES development grant.
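The site's actual scheduling rule is not described on this page. As a purely illustrative sketch of what a spaced review schedule can look like, the Python below assumes the gap between reviews doubles each time. The function name review_dates, the one-day starting gap and the doubling factor are all hypothetical and are not taken from DictionarySquared.

```python
from datetime import date, timedelta


def review_dates(first_seen, n_reviews, first_gap_days=1, growth=2):
    """Return review dates whose gaps grow geometrically (an assumed rule)."""
    dates = []
    gap = first_gap_days
    current = first_seen
    for _ in range(n_reviews):
        current = current + timedelta(days=gap)
        dates.append(current)
        gap *= growth  # each later review waits longer than the previous one
    return dates


# A word first seen on 1 January 2024 and reviewed four times.
schedule = review_dates(date(2024, 1, 1), n_reviews=4)
for d in schedule:
    print(d.isoformat())
```

With these assumed settings the gaps are 1, 2, 4 and 8 days, so reviews become rarer as the word becomes more familiar.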

bartMachine
Problem:	Non-parametric function estimation. This is one of the most fundamental problems in statistics and computer science: given a set of inputs X and a response y, we need to estimate the function f which links the response to the inputs, y = f(X).
Solution:	We implemented the Bayesian Additive Regression Trees (BART) algorithm in Java and wrapped it in an R package which is freely available. Our machine learning suite is fast, lightweight and parallelized, and it contains many added features such as variable importance testing, variable selection, interaction detection, convergence diagnostics and partial dependence plots.
Available at:	CRAN for download (available within R) and github.com for the open-source code. Papers are available on my publications page under the keyword "bayesian non-parametric learning."
Collaborators:	Justin Bleich, Ed George
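To make the estimation problem concrete, here is a minimal Python sketch that estimates f as a sum of small trees without assuming a parametric form. It is not BART and not the bartMachine API: greedy fitting of one-split trees ("stumps") to residuals stands in for BART's Bayesian posterior sampling, and the two share only the sum-of-trees form. The function names, the sine test data and the settings n_trees and shrink are assumptions made for illustration.

```python
import math


def fit_stump(x, r):
    """Return (threshold, left_mean, right_mean) minimizing squared error on r."""
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate split halfway between neighbouring inputs
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        sse = sum((ri - lm) ** 2 for ri in left) + sum((ri - rm) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]


def fit_sum_of_stumps(x, y, n_trees=50, shrink=0.3):
    """Greedily fit stumps to the current residuals; each stump is shrunk."""
    stumps = []
    resid = list(y)
    for _ in range(n_trees):
        t, lm, rm = fit_stump(x, resid)
        stump = (t, shrink * lm, shrink * rm)
        stumps.append(stump)
        # Subtract this stump's contribution so the next stump fits what is left.
        resid = [ri - (stump[1] if xi <= t else stump[2]) for xi, ri in zip(x, resid)]
    return stumps


def predict(stumps, x0):
    """The estimate of f at x0 is the sum of all stump outputs."""
    return sum(left if x0 <= t else right for t, left, right in stumps)


# Noise-free toy data: y = sin(2*pi*x) on an even grid over [0, 1].
x = [i / 20 for i in range(21)]
y = [math.sin(2 * math.pi * xi) for xi in x]

stumps = fit_sum_of_stumps(x, y)
mse = sum((predict(stumps, xi) - yi) ** 2 for xi, yi in zip(x, y)) / len(x)
baseline = sum(yi ** 2 for yi in y) / len(y)  # error of always predicting 0
print(f"training MSE {mse:.4f} vs. baseline {baseline:.4f}")
```

No functional form for f is specified anywhere; the shape of the estimate comes entirely from the data, which is the sense in which the problem is non-parametric.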

GemIdent
Problem:	We were presented with microscopic images of lymph nodes from cancer patients captured by automatic light microscopy at 10x. An example image is on the right (top). We believed there were clinically significant spatial patterns in the immune and cancer cells, so we were charged with developing software to resolve the centroid of each cell nucleus.
Solution:	We opted for a supervised statistical learning solution. This means that a user provides pixel examples of the different cells (as well as pixel examples of non-cells, see second image). We noticed that few colors were relevant: the stain colors (such as red, blue and brown) and the white background. For each pixel, we extract features by looking at these colors in the surrounding pixels at different radii to create "color scores" (see third image). The scores for many radii and all colors are arranged in a design matrix which is used to build a random forests model. This model then performs the multinomial classification of all pixels. To resolve centroids, simple heuristics are used to split blobs of classified pixels and find their centers. The above procedure is wrapped in a visual user interface programmed in Java (see fourth image). The procedure can then be run automatically on scanned images that comprise entire tissues, and spatial relationships can be found (see last image). The procedure is also useful for any images with few colors where the objects of interest are homogeneously colored and shaped.
Available at:	github.com to download the executable and the open-source code. Papers are available as links in my CV here.
Collaborators:	Susan Holmes, Peter P. Lee
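The feature-extraction step can be sketched in Python as follows. The exact definition of a color score is not given on this page, so this sketch assumes one: the number of pixels within a given radius whose RGB value is close to a reference stain color. The function names, the tolerance of 60 and the toy 5x5 image are hypothetical and are not taken from GemIdent.

```python
def is_close(pixel, reference, tol=60):
    """True if an RGB pixel lies within `tol` (Euclidean) of a reference color."""
    return sum((p - q) ** 2 for p, q in zip(pixel, reference)) <= tol ** 2


def color_score(image, row, col, reference, radius):
    """Count pixels within `radius` of (row, col) that match `reference`."""
    height, width = len(image), len(image[0])
    score = 0
    for r in range(max(0, row - radius), min(height, row + radius + 1)):
        for c in range(max(0, col - radius), min(width, col + radius + 1)):
            inside = (r - row) ** 2 + (c - col) ** 2 <= radius ** 2
            if inside and is_close(image[r][c], reference):
                score += 1
    return score


def feature_row(image, row, col, references, radii):
    """One design-matrix row: a score for every (color, radius) pair."""
    return [
        color_score(image, row, col, ref, rad)
        for ref in references.values()
        for rad in radii
    ]


WHITE, BROWN = (255, 255, 255), (150, 75, 0)
REFERENCES = {"brown": BROWN, "white": WHITE}

# Toy image: white background with a 3x3 brown "nucleus" in the middle.
image = [
    [BROWN if 1 <= r <= 3 and 1 <= c <= 3 else WHITE for c in range(5)]
    for r in range(5)
]

center = feature_row(image, 2, 2, REFERENCES, radii=[1, 2])
corner = feature_row(image, 0, 0, REFERENCES, radii=[1, 2])
print(center)  # [5, 9, 0, 4]
print(corner)  # [0, 1, 3, 5]
```

A pixel inside the nucleus gets high brown scores and low white scores, while a background pixel gets the reverse. Rows like these, stacked for the user-labeled pixels, would form the design matrix passed to the classifier; the random forest itself is not shown here.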

Contact me by email at kapelner@gmail.com
