ABSTRACT
Protecting the privacy of crowd workers has been an important topic in crowdsourcing; however, task privacy has largely been ignored, even though many tasks (e.g., form digitization, live audio transcription, or image tagging) often contain sensitive information. Although assigning an entire job to a single worker may leak private information, jobs can often be split into small components that individually do not. We study the problem of distributing such tasks to workers with the goal of maximizing task privacy when jobs are split in this way.
We introduce information loss functions to formally measure the amount of private information leaked as a function of the task assignment. We then design assignment mechanisms for three different assignment settings: PUSH, PULL, and a new setting, Tug Of War (TOW), an intermediate approach that balances flexibility for both workers and requesters. Our assignment algorithms have zero privacy loss for PUSH and tight theoretical guarantees for PULL. For TOW, our assignment algorithm provably outperforms PULL; importantly, the privacy loss is independent of the number of tasks, even when workers collude. We further analyze the performance and privacy tradeoffs empirically on simulated and real-world collusion networks and find that our algorithms outperform the theoretical guarantees.
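To make the idea of splitting jobs and measuring leakage concrete, the following is a minimal sketch, not the paper's mechanism. It assumes a requester-driven (PUSH-style) round-robin assignment and uses a stand-in metric, `max_exposure`, that counts the largest number of components of any one job visible to a single worker or to a colluding group. The function names, the data layout, and the metric itself are illustrative assumptions and are not the information loss functions defined in the paper.

```python
from collections import defaultdict
from itertools import cycle


def push_assign(jobs, workers):
    """Hypothetical PUSH-style assignment (an assumption, not the paper's algorithm).

    jobs maps a job id to its number of components. Components are handed out
    round-robin, so the components of one job go to distinct workers as long
    as the job has no more components than there are workers.
    """
    assignment = defaultdict(list)
    pool = cycle(workers)
    for job, n_components in jobs.items():
        if n_components > len(workers):
            raise ValueError(f"job {job!r} has more components than workers")
        for component in range(n_components):
            assignment[next(pool)].append((job, component))
    return dict(assignment)


def max_exposure(assignment, groups=None):
    """Stand-in leakage metric (an assumption, not the paper's loss function).

    Returns the largest number of components of a single job seen by one
    group. With groups=None every worker is treated as acting alone.
    """
    if groups is None:
        groups = [{worker} for worker in assignment]
    worst = 0
    for group in groups:
        seen = defaultdict(set)
        for worker in group:
            for job, component in assignment.get(worker, []):
                seen[job].add(component)
        worst = max(worst, max((len(c) for c in seen.values()), default=0))
    return worst


jobs = {"form_a": 3, "form_b": 2}
workers = ["w1", "w2", "w3", "w4"]
assignment = push_assign(jobs, workers)

alone = max_exposure(assignment)
colluding = max_exposure(assignment, groups=[{"w1", "w2"}, {"w3", "w4"}])
print(alone, colluding)
```

In this toy setup, an exposure of 1 means no worker sees more than one component of any job, while a colluding pair can pool components and raise the exposure. That is the kind of effect the abstract refers to when it discusses collusion.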