I'm an Assistant Professor of Mathematics at Queens College in New York City. I recently graduated with a Ph.D. in Statistics from Wharton Business School. I was previously a software engineer building web applications in San Francisco and a mathematics & computational science undergraduate at Stanford University.
My talent lies in engineering creative solutions to problems using a toolbox built from my studies in statistics, mathematics, machine learning, computer science, crowdsourcing and natural language processing. I also have a special love for teaching and mentoring.
For a printable version of my CV, click here or use the buttons below to navigate.
I've published in a variety of fields. My interests are, loosely, statistical learning, randomized experimentation, crowdsourcing, and biomedical applications. Please choose among the keywords below to sort by topic.

For a list of my citations, visit my Google Scholar page or my ResearchGate page.
Kapelner, A. & Krieger, A. (2014) Matching on-the-fly in Sequential Experiments for Higher Power and Efficiency. Biometrics, 70 (2), 378 - 388 (journal page) (free PDF)
Imagine you are running a sequential experiment measuring the difference between a treatment and control condition (e.g. a pill for blood pressure via a clinical trial or testing user behavior in an Internet-based experiment). You can match similar subjects together on-the-fly (as they arrive) to achieve higher power and efficiency in the experimental results.
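The idea of matching subjects as they arrive can be illustrated with a minimal sketch. This is not the procedure from the paper: the Euclidean distance, the fixed `threshold`, and the function name `assign_on_the_fly` are all assumptions made for illustration. Each arriving subject is paired with the closest unmatched earlier subject if one is close enough, and receives the opposite arm; otherwise the subject is randomized and waits in a reservoir.

```python
import math
import random


def assign_on_the_fly(subjects, threshold, seed=0):
    """Assign arms sequentially, pairing close subjects (illustrative only).

    subjects  : covariate tuples in order of arrival
    threshold : hypothetical maximum Euclidean distance for a match
    Returns (assignments, pairs, unmatched_indices).
    """
    rng = random.Random(seed)
    reservoir = []    # unmatched subjects: (index, covariates, arm)
    assignments = []  # arm given to each subject, in arrival order
    pairs = []        # (earlier_index, later_index) for matched subjects

    for i, x in enumerate(subjects):
        best_pos, best_dist = None, None
        for pos, (_, covariates, _) in enumerate(reservoir):
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, covariates)))
            if dist <= threshold and (best_dist is None or dist < best_dist):
                best_pos, best_dist = pos, dist

        if best_pos is None:
            # No close partner yet: randomize and wait in the reservoir.
            arm = rng.choice(["treatment", "control"])
            reservoir.append((i, x, arm))
        else:
            # Close partner found: take the opposite arm to form a matched pair.
            partner_index, _, partner_arm = reservoir.pop(best_pos)
            arm = "control" if partner_arm == "treatment" else "treatment"
            pairs.append((partner_index, i))
        assignments.append(arm)

    unmatched = [index for index, _, _ in reservoir]
    return assignments, pairs, unmatched


# Made-up covariates for five subjects arriving one at a time.
subjects = [(0.0, 0.0), (5.0, 5.0), (0.1, 0.0), (5.0, 5.2), (9.0, 9.0)]
assignments, pairs, unmatched = assign_on_the_fly(subjects, threshold=0.5)
```

In this toy run the third and fourth subjects are matched to the first and second, and the last subject stays in the reservoir.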
Kapelner, A., Bleich, J., Cohen, Z. D., DeRubeis, R. J. & Berk, R. A. (2014) Inference for Treatment Regime Models in Personalized Medicine. submitted to Biometrics (free PDF)
Imagine you are a medical practitioner treating a disease by prescribing one of two possible drugs. Which drug do you assign to each patient? Is your special assignment procedure beneficial versus a naive random assignment? How much better is it, and is the improvement statistically significant?
Kapelner, A. & Vorsanger, M. (2014) Starvation of Cancer via Induced Ketogenesis and Severe Hypoglycemia. in press, Medical Hypotheses (journal page) (free PDF)
It is well known that cancer cells are solely dependent on glucose as their substrate for metabolism and they are not able to utilize other fuel sources such as ketones and fatty acids. It is also known that humans under heavy ketosis do not experience symptoms of hypoglycemia. In our proposal for cancer therapy, we marry these two ideas over the long term.
Kapelner, A. & Bleich, J. (2014) Prediction with Missing Data via Bayesian Additive Regression Trees. accepted, Canadian Journal of Statistics (free PDF)
We develop an extension to Bayesian Additive Regression Trees (a new procedure for statistically learning a non-parametric functional relationship between a set of input variables X and a response variable y). In the extension, we incorporate missing data without the need for imputation. Simulations using real data and generated models demonstrate high performance and stability over competitors.
Bleich, J., Kapelner, A., George, E. I. & Jensen, S. T. (2014) Variable Selection for BART: An Application to Gene Regulation. Annals of Applied Statistics, 8(3): 1750-1781 (journal page) (free PDF)
We adapt Bayesian Additive Regression Trees (a new procedure for statistically learning a non-parametric functional relationship between a set of input variables X and a response variable y). This adaptation can select "important" variables from the set of input variables that affect y by employing a principled permutation-based inference procedure. We can also incorporate prior information about which variables are thought to be important before looking at the data.
Goldstein, A., Kapelner, A., Bleich, J. & Pitkin, E. (2014) Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation. in press, Journal of Computational & Graphical Statistics (journal page) (free PDF)
We develop a tool for visualizing the model estimated by any supervised "machine" learning algorithm. We plot the variation of the fitted values across the range of a covariate of interest for all cases. These lines suggest where and to what extent heterogeneities exist between cases. We also include a visual test for model additivity in any covariate.
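The curves described above can be computed with a few lines of code. The following is a minimal sketch, not the authors' implementation: `toy_model` is a hypothetical fitted model, and the function names are made up for illustration. For each case, the covariate of interest is swept over a grid while the other covariates stay at their observed values; averaging the resulting curves gives the familiar partial dependence curve.

```python
def ice_curves(predict, rows, feature_index, grid):
    """Return one curve of predictions per case.

    predict       : any function mapping a covariate list to a fitted value
    rows          : observed cases (lists of covariate values)
    feature_index : position of the covariate of interest
    grid          : values to sweep that covariate over
    """
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = list(row)             # copy so the data stay untouched
            modified[feature_index] = value  # vary only the covariate of interest
            curve.append(predict(modified))
        curves.append(curve)
    return curves


def partial_dependence(curves):
    """Average the individual curves pointwise across cases."""
    n = len(curves)
    return [sum(column) / n for column in zip(*curves)]


def toy_model(x):
    # Hypothetical fitted model with an interaction between x[0] and x[1].
    return x[0] * x[1] + 2.0 * x[0]


rows = [[0.0, -1.0], [0.0, 0.0], [0.0, 1.0]]
grid = [0.0, 1.0, 2.0]
curves = ice_curves(toy_model, rows, feature_index=0, grid=grid)
average_curve = partial_dependence(curves)
```

Because of the interaction in `toy_model`, the three curves have different slopes, which is the kind of heterogeneity that the average curve alone would hide.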
Kapelner, A. & Bleich, J. (2014) bartMachine: A Powerful Tool for Machine Learning. accepted, Journal of Statistical Software (free PDF)
We present a new R package implementation of Bayesian Additive Regression Trees (a new procedure for statistically learning a non-parametric functional relationship between a set of input variables X and a response variable y). The package introduces many new features for data analysis using BART such as variable selection, interaction detection, model diagnostic plots, parallelization, incorporation of missing data and the ability to save trees for future prediction.
Bleich, J. & Kapelner, A. (2014) Bayesian Additive Regression Trees With Parametric Models of Heteroskedasticity. in revision, Bayesian Analysis (free PDF)
We adapt Bayesian Additive Regression Trees (a new procedure for statistically learning a non-parametric functional relationship between a set of input variables X and a response variable y). This adaptation incorporates heteroskedasticity into the model by modeling the form of heteroskedasticity as a linear model of another set of covariates (which may or may not be X). In simulations, we demonstrate a reduction in overfitting and more appropriate predictive intervals than homoskedastic BART.
Berk, R. A., Bleich, J., Kapelner, A., Henderson, J. & Kurtz, E. (2014) Using Regression Kernels to Forecast A Failure to Appear in Court. submitted to Journal of Quantitative Criminology
We develop an implementation of principal components logistic regression using a novel three-split procedure. The first split trains the kernel models from a set of many kernels, the second picks the model that best combines low error with the correct error costs, and the third gives an honest assessment of future performance. Our implementation contains an R package and our paper applies these methods to forecasting failures to appear in court.
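The three-split idea can be sketched generically. This is an illustration under stated assumptions, not the paper's method or its R package: simple one-dimensional nearest-neighbor classifiers stand in for the kernel models, the data are synthetic, and the cost values and helper names are hypothetical.

```python
import random


def three_way_split(data, seed=0, fractions=(0.5, 0.25)):
    """Shuffle and cut the data into train, select and test splits."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * fractions[0])
    n_select = int(len(shuffled) * fractions[1])
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_select],
            shuffled[n_train + n_select:])


def make_knn(train, k):
    """Stand-in candidate model: k-nearest-neighbor vote on one covariate."""
    def predict(x):
        neighbors = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
        votes = sum(label for _, label in neighbors)
        return 1 if 2 * votes > k else 0
    return predict


def average_cost(predict, data, fn_cost, fp_cost):
    """Mean misclassification cost with unequal error costs."""
    total = 0.0
    for x, label in data:
        guess = predict(x)
        if label == 1 and guess == 0:
            total += fn_cost   # missed a true failure
        elif label == 0 and guess == 1:
            total += fp_cost   # false alarm
    return total / len(data)


# Synthetic data: the label is 1 when a noisy score exceeds 0.5.
rng = random.Random(42)
data = []
for _ in range(200):
    x = rng.random()
    data.append((x, 1 if x + rng.gauss(0.0, 0.15) > 0.5 else 0))

train, select, test = three_way_split(data)
fn_cost, fp_cost = 5.0, 1.0      # hypothetical asymmetric costs
candidate_ks = [1, 5, 15]

# Split 1 trains every candidate; split 2 picks the cheapest one.
models = {k: make_knn(train, k) for k in candidate_ks}
best_k = min(candidate_ks,
             key=lambda k: average_cost(models[k], select, fn_cost, fp_cost))

# Split 3 was never used for fitting or selection, so this estimate is honest.
honest_cost = average_cost(models[best_k], test, fn_cost, fp_cost)
```

Keeping the third split untouched until the very end is what makes the final cost estimate free of selection bias.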
Chandler, D. & Kapelner, A. (2013) Breaking Monotony with Meaning: Motivation in Crowdsourcing Markets. Journal of Economic Behavior & Organization, 90: 123-133 (journal page) (free PDF)
We conducted the first natural field experiment to explore the relationship between the "meaningfulness" of a task and worker effort, measured on three scales. We employed ~2500 workers from Amazon's Mechanical Turk, an online labor market, to label medical images. We manipulated the task to exhibit three conditions: "meaningful," "zero-context" and "shredded." We found that the meaningful treatment increased the labor supply and the shredded treatment decreased the quality of the labor.
Schwartz, H. A., Eichstaedt, J., Blanco, E., Agrawal, M., Dziurzynnski, L., Kern, M. L., Kapelner, A., Park, G., Jha, S., Stillwell, D., Kosinski, M. & Ungar, L. H. (2014) Predicting People's Well-Being in Social Media: Multi-level message and user models of language use. working paper
We present the task of predicting well-being, as measured by the "satisfaction with life scale." Using Amazon's Mechanical Turk, we create a training set of rated textual examples. We then use machine learning to build a high-performance model and, in addition, identify textual features that characterize well-being.
Kapelner, A., Kaliannan, K., Schwartz, H. A., Ungar, L. H. & Foster, D. P. (2012) New Insights from Coarse Word Sense Disambiguation in the Crowd. COLING (journal page)
We use crowdsourcing to disambiguate 1000 words from among coarse-grained senses. Using regression, we find surprising features which drive differential WSD accuracy: (a) the number of rephrasings within a sense definition is associated with higher accuracy; (b) as word frequency increases, accuracy decreases even if the number of senses is kept constant; and (c) spending more time is associated with a decrease in accuracy.
Kapelner, A. & Chandler, D. (2010) Preventing Satisficing in Online Surveys. Proceedings of CrowdConf (journal page)
We examine the prevalence of satisficing (mental shortcuts / cheating) in survey tasks on Amazon's Mechanical Turk. We present a question-presentation method of fading in survey questions and answers one-by-one, called "Kapcha," which we experimentally demonstrate to reduce satisficing, thereby improving the quality of survey results.
Chang, A. Y., Bhattacharya, N., Mu, J., Setiadi, A. F., Carcamo-Cavazos, V., Lee, G. H., Simons, D. L., Yadegarynia, S., Hemati, K., Kapelner, A., Zheng, M., Krag, D. N., Schwartz, E. J., Chen, D. Z. & Lee, P. P. (2013) Spatial organization of dendritic cells within tumor draining lymph nodes impacts clinical outcome in breast cancer patients. Journal of Translational Medicine, 11(1): 242 (journal page)
We describe the spatial organization of dendritic cells within tumor-draining lymph nodes using the software gemident. We then describe the spatial organization's association with survival outcome in cancer patients. We also characterize specific changes in number, size, maturity, and T-cell co-localization of such clusters.
Setiadi, A. F., Ray, N. C., Kohrt, H. E., Kapelner, A., Carcamo-Cavazos, V., Levic, E. B., Yadegarynia, S., van der Loos, C. M., Schwartz, E. J., Holmes, S. & Lee, P. P. (2010) Quantitative, architectural analysis of immune cell subsets in tumor-draining lymph nodes from breast cancer patients and healthy lymph nodes. PLoS ONE, 5(8): e12420 (journal page)
We present a novel, quantitative image analysis approach incorporating 1) multi-color tissue staining, 2) high-resolution, automated whole-section imaging, 3) the use of the "gemident" image analysis software to identify cell types and locations, and 4) spatial statistical analysis. We apply our integrative approach to compare the architectural patterns of T and B cells within tumor-draining lymph nodes from breast cancer patients versus healthy lymph nodes. We found that the spatial grouping patterns of T and B cells differed between healthy and breast cancer lymph nodes, and this could be attributed to the lack of B cell localization in the extrafollicular region of the TDLNs.
Holmes, S., Kapelner, A. & Lee, P. P. (2009) An interactive Java statistical image segmentation system: Gemident. Journal of Statistical Software, 30(10): 1-20 (journal page)
We present a novel object identification algorithm developed in Java which locates objects of interest in images. Here, we apply the system to finding cells in images of immunohistochemically stained lymph node tissue. The success of the method depends heavily on the use of color, the relative homogeneity of object appearance, the user's input, and the coupled statistical learning algorithm, random forests. Our system enables iterative improvements to the classification over many correction cycles, resulting in a highly accurate and user-friendly system.
Kapelner, A., Lee, P. P. & Holmes, S. (2007) An interactive statistical image segmentation and visualization system. in proceedings of IEEE, Medical Information Visualisation (journal page) (free PDF)
Supervised learning can be used to segment regions of interest in images making use of color and morphological information. We developed a novel object identification algorithm in Java which locates phenotypes of interest in images. Our main innovation is interactive feature extraction from color images by using sums over color similarities (as measured by the Mahalanobis distance) at various radii. These features are then fed into a statistical learning algorithm to classify pixels belonging to phenotypes of interest.
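The color similarity measure mentioned above, the Mahalanobis distance, can be sketched as follows. This covers only the distance between one pixel color and a reference color distribution; the summation over radii and the rest of the feature extraction are not shown. The training colors and helper names are made up for illustration, and the original system is written in Java rather than Python.

```python
import math


def mean_vector(samples):
    """Componentwise mean of a list of color tuples."""
    n = len(samples)
    return [sum(s[i] for s in samples) / n for i in range(len(samples[0]))]


def covariance_matrix(samples, mean):
    """Sample covariance matrix (divides by n - 1)."""
    n, d = len(samples), len(mean)
    cov = [[0.0] * d for _ in range(d)]
    for s in samples:
        diff = [s[i] - mean[i] for i in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j]
    return [[value / (n - 1) for value in row] for row in cov]


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular covariance matrix")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (a[i][n] - tail) / a[i][i]
    return x


def mahalanobis(color, mean, cov):
    """sqrt(diff^T cov^-1 diff), computed without forming the inverse."""
    diff = [c - m for c, m in zip(color, mean)]
    z = solve(cov, diff)
    return math.sqrt(sum(d * zi for d, zi in zip(diff, z)))


# Made-up RGB training colors for one phenotype (e.g. a reddish stain).
stain_colors = [(200, 30, 40), (210, 35, 38), (190, 28, 45),
                (205, 40, 50), (195, 32, 36)]
stain_mean = mean_vector(stain_colors)
stain_cov = covariance_matrix(stain_colors, stain_mean)

near = mahalanobis((201, 33, 42), stain_mean, stain_cov)  # similar color
far = mahalanobis((30, 30, 200), stain_mean, stain_cov)   # bluish pixel
```

A small distance means the pixel's color is typical of the training colors once their spread and correlations are taken into account.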
Contact me by email: © 2016 Adam Kapelner