Targeted Crowdsourcing using Participant Interest
Yilmaz, Yavuz Selim
Rapid growth in smartphone usage and social networks provides the potential for large-scale crowdsourcing. Despite the availability of tools and smart devices for crowdsourcing, the state of the art in aggregating crowd responses lags far behind: today, response aggregation requires significant effort in the form of active human moderation and/or heavy use of machine learning algorithms, which limits the applicability of crowdsourcing for advanced tasks. In this thesis, we present an alternative, proactive solution that weights participants and their responses by leveraging each participant's interests, providing more accurate aggregation results via targeted crowdsourcing.

To design effective aggregation algorithms, we conducted several crowdsourcing projects. To study the importance of participant classification for crowdsourcing, we designed an interest-based participant weighting and response aggregation scheme that intelligently identifies valuable responses even when such responses are in the minority. As an evaluation, we deployed a location-based question answering system on Twitter. We selected questions posted on Twitter to query the crowd, and we chose the group of users to forward each question to based on their Foursquare location data. Our system answered 75% of the questions accurately, and its latency in answering was low (20 minutes). The results revealed that location-interest-based targeting improves the accuracy of the system for focused location categories such as nightlife and food. Classifying participants by their interests and then asking them multiple-choice questions combines the most significant lessons we learned from our crowdsourcing projects. To leverage these lessons, we deployed CrowdReply, a crowdsourced multiple-choice question answering (MCQA) application for the TV show "Who Wants to be a Millionaire?" (WWTBAM).
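The location-interest-based targeting described above can be sketched as a simple filter over check-in histories. This is an illustrative sketch, not the thesis implementation: the `checkins` data, the `target_users` function, and the `min_checkins` threshold are all hypothetical stand-ins for the Foursquare-based selection.

```python
from collections import Counter

# Hypothetical check-in records: (user, venue_category) pairs,
# standing in for Foursquare location data.
checkins = [
    ("alice", "food"), ("alice", "food"), ("alice", "nightlife"),
    ("bob", "travel"), ("bob", "food"),
    ("carol", "nightlife"), ("carol", "nightlife"),
]

def target_users(checkins, category, min_checkins=2):
    """Select users whose check-in history shows interest in `category`."""
    counts = Counter(checkins)  # (user, category) -> number of check-ins
    return sorted({user for (user, cat), n in counts.items()
                   if cat == category and n >= min_checkins})

# Forward a nightlife question only to users with repeated nightlife check-ins.
target_users(checkins, "nightlife")  # -> ["carol"]
```

Only users whose history shows repeated activity in the question's category receive it, which is one plausible way to realize the category-focused targeting the evaluation measured.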
CrowdReply aggregates the answers from participants to play the game while it airs live on TV. In our earlier experiments, CrowdReply answered questions using majority voting (MV): it counts the votes for each choice of a given multiple-choice question and selects the most-voted choice as the final answer. Using MV, CrowdReply answered more than 90% of the easier questions on the WWTBAM TV show; however, its accuracy plummeted to 50% on the harder questions. To boost CrowdReply's success, we developed participant-interest-based MCQA algorithms. In our experiments, we defined target participant groups for the questions by utilizing the applications installed on participants' smartphones. Specifically, CrowdReply defines interest groups based on the application categories of the Google Play Store, and it assigns participants to these interest groups by counting the number of applications installed on each participant's devices from each category. CrowdReply then calculates the success of each participant group using training data (i.e., a portion of the answering history in the system) and assigns a weight to each participant group for a given question difficulty level. CrowdReply leverages these participant group weights when aggregating participant responses, instead of counting each vote equally as in MV. Our final algorithm improved the answering accuracy by 10% overall and, more importantly, raised the accuracy on the harder questions from 52% to 87% compared to MV. The results indicate that our interest-based participant and response weighting method provides more accurate answer aggregation for MCQA queries and helps build a crowdsourced superplayer for WWTBAM.
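The contrast between plain majority voting and group-weighted aggregation can be sketched as follows. This is a minimal illustration, not CrowdReply's code: the group names, the example weights, and the `weighted_vote` function are hypothetical, with weights standing in for per-group accuracies learned from the answering history at a given difficulty level.

```python
from collections import Counter, defaultdict

def majority_vote(votes):
    """votes: list of (participant_group, choice). Plain MV ignores groups."""
    return Counter(choice for _, choice in votes).most_common(1)[0][0]

def weighted_vote(votes, group_weights):
    """Score each choice by the summed weights of the groups voting for it."""
    scores = defaultdict(float)
    for group, choice in votes:
        scores[choice] += group_weights.get(group, 1.0)
    return max(scores, key=scores.get)

# Illustrative weights, e.g. derived from each group's historical accuracy
# on questions of this difficulty level.
weights = {"trivia_apps": 3.0, "games": 1.0, "other": 0.5}
votes = [("trivia_apps", "B"), ("games", "A"), ("games", "A"), ("other", "A")]

majority_vote(votes)           # -> "A" (3 raw votes beat 1)
weighted_vote(votes, weights)  # -> "B" (3.0 outweighs 1.0 + 1.0 + 0.5)
```

The example shows how a historically accurate minority group can overturn the raw majority, which is the behavior that lifts accuracy on harder questions where most participants guess wrong.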