-
Towards Understanding Human Mistakes of Programming by Example: An Online User
Study
IUI '17 Proceedings of the 22nd International Conference on Intelligent User
Interfaces
Tak Yeon Lee, Casey Dugan, and Benjamin B. Bederson
Programming-by-Example (PBE) enables users to create programs without writing a line of code.
However, there is little research on people's ability to accomplish complex tasks by providing
examples, which is the key to successful PBE solutions. This paper presents an online user
study reporting how well people decompose complex tasks and disambiguate sub-tasks. Our
findings suggest that disambiguation and decomposition are difficult for inexperienced users. We
identify seven types of mistakes users made, and suggest new opportunities for
actionable feedback based on unsuccessful examples, with design implications for future PBE
systems.
-
The Human Touch: How Non-expert Users Perceive, Interpret, and Fix Topic Models
International Journal of Human-Computer Studies, Volume 105, September 2017
Lee, T.Y., Smith, A., Seppi, K., Elmqvist, N., Boyd-Graber, J., and Findlater,
L.
Topic modeling is a common tool for understanding large bodies of text, but is typically
provided as a "take it or leave it" proposition. Incorporating human knowledge in unsupervised
learning is a promising approach to create high-quality topic models. Existing interactive
systems and modeling algorithms support a wide range of refinement operations to express
feedback. However, these systems' interactions are primarily driven by algorithmic convenience,
ignoring users who may lack expertise in topic modeling. To better understand how non-expert
users understand, assess, and refine topics, we conducted two user studies—an in-person
interview study and an online crowdsourced study. These studies demonstrate a disconnect between
what non-expert users want and the complex, low-level operations that current interactive
systems support. In particular, our findings include: (1) analysis of how non-expert users
perceive topic models; (2) characterization of primary refinement operations expected by
non-expert users and ordered by relative preference; (3) further evidence of the benefits of
supporting users in directly refining a topic model; (4) design implications for future
human-in-the-loop topic modeling interfaces.
-
Evaluating Visual Representations for Topic Understanding and Their Effects on
Manually Generated Labels
Transactions of the Association for Computational Linguistics, 2016.
Alison Smith, Tak Yeon Lee, Forough Poursabzi-Sangdeh, Leah Findlater, Jordan
Boyd-Graber, and Niklas Elmqvist
Probabilistic topic models are important tools for indexing, summarizing, and analyzing large
document collections by their themes. However, promoting end-user understanding of topics
remains an open research problem. We compare labels that users generate with four topic
visualization techniques (word lists, word lists with bars, word clouds, and network graphs), as
well as automatically generated labels, on how well downstream users believe each label
describes the corresponding documents. Our study has two phases: a labeling phase where
users label visualized topics and a validation phase where new users select which labels best
describe the topics' documents. Although all visualizations produce similar quality labels,
simple visualizations like word lists allow users to quickly understand topics, while complex
visualizations take longer but expose multi-word expressions that simpler visualizations
obscure. Automatic labels lag behind user-created labels, but our dataset of manually labeled
topics suggests preferred linguistic patterns (e.g., hypernyms, phrases) that can improve
automatic topic labeling algorithms.
-
Human-Centered and Interactive: Expanding the Impact of Topic Models
Human-Centered Machine Learning workshop, ACM Conference on Human Factors in
Computing Systems (CHI 2016).
Alison Smith, Tak Yeon Lee, Forough Poursabzi-Sangdeh, Jordan Boyd-Graber,
Kevin Seppi, Niklas Elmqvist, and Leah Findlater
Statistical topic modeling is a common tool for summarizing the themes in a document corpus. Due
to the complexity of topic modeling algorithms, however, their results are not accessible to
non-expert users. Recent work in interactive topic modeling looks to incorporate the user into
the inference loop, for example, by allowing them to view a model and then update it by
specifying important words and words that should be ignored. However, the majority of
interactive topic modeling work has been performed without fully understanding the needs of the
end user and does not adequately consider challenges that arise in interactive machine learning.
In this paper, we outline a subset of interactive machine learning design challenges with
specific considerations for interactive topic modeling. For each challenge, we propose solutions
based on prior work and our own preliminary findings, and identify open questions to guide
future work.
-
CTArcade: Computational Thinking with Games in School Age Children
International Journal of Child-Computer Interaction
Tak Yeon Lee, Matthew Louis Mauriello, June Ahn, and Benjamin B. Bederson
We believe that children as young as ten can directly benefit from
opportunities to engage in computational thinking. One approach to provide these opportunities
is to focus on social game play. Understanding game play is common across a range of media and
ages. Children can begin by solving puzzles on paper, continue on game boards, and ultimately
complete their solutions on computers. Through this process, learners can be guided through
increasingly complex algorithmic thinking activities that are built from their tacit knowledge
and excitement about game play. This paper describes our approach to teaching computational
thinking skills without traditional programming—but instead by building on children's existing
game playing interest and skills. We built a system called CTArcade, with an initial game
(Tic-Tac-Toe), which we evaluated with 18 children aged 10–15. The study shows that our
particular approach helped young children to better articulate algorithmic thinking patterns,
which were tacitly present when they played naturally on paper, but not explicitly apparent to
them until they used the CTArcade interface.
-
Experiments on Motivational Feedback for Crowdsourced Workers
International AAAI Conference on Weblogs and Social Media (ICWSM 2013) [20%
acceptance rate]
Tak Yeon Lee, Casey Dugan, Werner Geyer, Tristan Ratchford, Jamie Rasmussen, N.
Sadat Shami, Stela Lupushor
This paper examines the relationship between motivational design and its
longitudinal effects on crowdsourcing systems. In the context of a company internal web site
that crowdsources the identification of Twitter accounts owned by company employees, we designed
and investigated the effects of various motivational features, including individual and social
achievements and gamification. Our 6-month experiment with 437 users allowed us to compare the
features in terms of both quantity and quality of the work produced by participants over time.
While we found that gamification can increase workers' motivation overall, the combination of
motivational features also matters. Specifically, gamified social achievement is the
best-performing design over a longer period of time. Mixing individual and social achievements turns
out to be less effective and can even encourage users to game the system.
-
CTArcade: Learning Computational Thinking While Training Virtual Characters
Through Game Play
CHI '12 Extended Abstracts on Human Factors in Computing Systems, May 05-10,
2012, Austin, Texas, USA
Tak Yeon Lee, Matthew Louis Mauriello, John Ingraham, Awalin Sopan, June Ahn,
Benjamin B. Bederson
In this paper we describe CTArcade, a web application framework that seeks to
engage users through game play resulting in the improvement of computational thinking (CT)
skills. Our formative study indicates that CT skills are employed when children are asked to
define strategies of common games such as Connect Four. In CTArcade, users can train their own
virtual characters while playing games with them. Trained characters then play matches against
other virtual characters. By reviewing the matches played, users can improve their game
characters. A basic usability evaluation of the system helped define plans for improving
CTArcade and assessing its design goals.
-
TreeCovery: Coordinated dual treemap visualization for exploring the Recovery Act
Government Information Quarterly (December 2011) doi:10.1016/j.giq.2011.07.004
Rios, M., Sharma, P., Lee, T.Y., Schwartz, R., and Shneiderman, B.
The American Recovery and Reinvestment Act dedicated $787 billion to stimulate
the U.S. economy and mandated the release of the data describing the exact distribution of that
money. The dataset is a large and complex one; one of its distinguishing features is its
bi-hierarchical structure, arising from the distribution of money through agencies to specific
projects and the natural aggregation of awards based on location. To offer a comprehensive
overview of the data, a visualization must incorporate both these hierarchies. We present
TreeCovery, a tool that accomplishes this through the use of two coordinated treemaps. The tool
includes a number of innovative features, including coordinated zooming and filtering and a
proportional highlighting technique across the two trees. TreeCovery was designed to facilitate
data exploration, and initial user studies suggest that it will be helpful in insight
generation. The Recovery Accountability and Transparency Board (RATB) has tested TreeCovery and
is considering incorporating the concept into its visual analytics.
-
Optimizing Display Advertisements Based on Historic User Trails
SIGIR 2011 Workshop: Internet Advertising (IA2011)
Gupta, N., Khurana, U., Lee, T.Y., and Nawathe, S.
Effective online display advertising requires a dynamic selection of the
advertisement to be displayed when a web page is fetched. As the goal of displaying
advertisements is to engage users and obtain clicks, the advertisement with the highest
probability of being clicked should be displayed. In this paper we address the problem of finding the
most suitable display advertisement option for a user given his/her current browsing session.
Using historical browsing session information, we mine the association of different
advertisement views, engagements and clicks, and apply Bayesian models to find the likelihood of
an advertisement to be clicked given a specific set of events that describe a user session. A
major challenge in training the model for optimum precision is the sparsity of click events;
hence, we propose treating advertisement engagements, like clicks, as success events to train
the model more effectively. Our technique significantly outperforms the baseline of using
prior probabilities for selecting advertisements.