research-article

Assignment Techniques for Crowdsourcing Sensitive Tasks

Authors:
L. Elisa Celis (EPFL)
Sai Praneeth Reddy (IIT-Delhi)
Ishaan Preet Singh (IIT-Delhi)
Shailesh Vaya (Xerox Research Centre)

CSCW '16: Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing, February 2016, Pages 836–847. https://doi.org/10.1145/2818048.2835202

Published: 27 February 2016

ABSTRACT

Protecting the privacy of crowd workers has been an important topic in crowdsourcing; task privacy, however, has largely been ignored, even though many tasks (e.g., form digitization, live audio transcription, or image tagging) often contain sensitive information. Although assigning an entire job to a single worker may leak private information, jobs can often be split into small components that individually do not. We study the problem of distributing such components to workers with the goal of maximizing task privacy.

We introduce information loss functions to formally measure the amount of private information leaked as a function of the task assignment. We then design assignment mechanisms for three different assignment settings: PUSH, PULL, and a new setting, Tug Of War (TOW), an intermediate approach that balances flexibility for both workers and requesters. Our assignment algorithms have zero privacy loss for PUSH and tight theoretical guarantees for PULL. For TOW, our assignment algorithm provably outperforms PULL; importantly, its privacy loss is independent of the number of tasks, even when workers collude. We further analyze the performance and privacy tradeoffs empirically on simulated and real-world collusion networks and find that our algorithms perform better than the theoretical guarantees.
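
To make the idea of an information loss function concrete, the following is a minimal illustrative sketch and not the paper's definition or algorithm. It assumes a hypothetical toy loss (the largest number of extra components of any single job visible to one worker, or to a pooled group of colluders) and a hypothetical round-robin, requester-driven assignment in the spirit of PUSH. The names push_assign, information_loss, and the (job_id, component_index) task representation are all assumptions introduced here for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical representation: a task is one component of a job.
Task = Tuple[str, int]  # (job_id, component_index)


def push_assign(jobs: Dict[str, int], workers: List[str]) -> Dict[str, List[Task]]:
    """Requester-driven round-robin assignment (an assumed PUSH-style sketch).

    Components of the same job go to distinct workers, which requires at
    least as many workers as the largest job has components.
    """
    if not workers or max(jobs.values(), default=0) > len(workers):
        raise ValueError("need at least as many workers as components per job")
    assignment: Dict[str, List[Task]] = defaultdict(list)
    offset = 0
    for job_id, n_components in jobs.items():
        for c in range(n_components):
            worker = workers[(offset + c) % len(workers)]
            assignment[worker].append((job_id, c))
        offset += n_components  # rotate the starting worker to spread load
    return dict(assignment)


def information_loss(assignment: Dict[str, List[Task]],
                     colluders: Iterable[str] = ()) -> int:
    """Toy loss (an assumption): the worst-case number of components of a
    single job, beyond the first, seen by any one viewer. Colluding workers
    are pooled and treated as a single viewer."""
    colluding = set(colluders)
    views: List[List[Task]] = []
    if colluding:
        views.append([t for w in colluding for t in assignment.get(w, [])])
    views.extend(tasks for w, tasks in assignment.items() if w not in colluding)
    worst = 0
    for tasks in views:
        per_job: Dict[str, int] = defaultdict(int)
        for job_id, _ in tasks:
            per_job[job_id] += 1
        worst = max(worst, max(per_job.values(), default=1) - 1)
    return worst


jobs = {"form_a": 3, "form_b": 2}
workers = ["w1", "w2", "w3"]
assignment = push_assign(jobs, workers)
loss_alone = information_loss(assignment)
loss_colluding = information_loss(assignment, colluders=["w1", "w2"])
```

Under this toy loss, the assignment leaks nothing to any individual worker, while two colluding workers who pool their components can see more than one piece of the same job. This is the kind of tradeoff the abstract describes, although the paper's own loss functions and mechanisms may differ.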

Published in
CSCW '16: Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing
February 2016
1866 pages
ISBN:9781450335928
DOI:10.1145/2818048
General Chairs: Darren Gergle (Northwestern University), Meredith Ringel Morris (Microsoft Research)
Program Chairs: Pernille Bjørn (University of Copenhagen), Joseph Konstan (University of Minnesota)
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Crowdsourcing
Microtasks
Privacy
Social Networks
Acceptance Rates
CSCW '16 Paper Acceptance Rate: 142 of 571 submissions, 25%
Overall Acceptance Rate: 2,235 of 8,521 submissions, 26%