DOI: 10.1145/2818048.2835202
research-article

Assignment Techniques for Crowdsourcing Sensitive Tasks

Published: 27 February 2016

ABSTRACT

Protecting the privacy of crowd workers has been an important topic in crowdsourcing. Task privacy, however, has largely been ignored, even though many tasks (e.g., form digitization, live audio transcription, or image tagging) often contain sensitive information. Although assigning an entire job to a single worker may leak private information, jobs can often be split into small components that individually do not. We study the problem of distributing such components to workers with the goal of maximizing task privacy.

We introduce information loss functions to formally measure the amount of private information leaked as a function of the task assignment. We then design assignment mechanisms for three different assignment settings: PUSH, PULL, and a new setting, Tug Of War (TOW), an intermediate approach that balances flexibility for both workers and requesters. Our assignment algorithms have zero privacy loss for PUSH and tight theoretical guarantees for PULL. For TOW, our assignment algorithm provably outperforms PULL; importantly, the privacy loss is independent of the number of tasks, even when workers collude. We further analyze the performance and privacy tradeoffs empirically on simulated and real-world collusion networks and find that our algorithms outperform the theoretical guarantees.
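To make the idea of a loss that depends on the assignment concrete, the following is a minimal sketch and not the paper's actual definitions or algorithms. It assumes a toy loss (the largest number of components of one job seen by a single worker, minus one), a round-robin rule standing in for a requester-driven PUSH-like assignment, and uniformly random pickup standing in for a worker-driven PULL-like assignment. All function names are hypothetical.

```python
import random
from collections import Counter


def push_assign(num_jobs, parts_per_job, num_workers):
    """Toy requester-driven assignment (assumption, not the paper's algorithm).

    Part p of job j goes to worker (j + p) % num_workers, so the parts of
    one job land on distinct workers whenever parts_per_job <= num_workers.
    """
    if num_workers < parts_per_job:
        raise ValueError("need at least as many workers as parts per job")
    return {
        (job, part): (job + part) % num_workers
        for job in range(num_jobs)
        for part in range(parts_per_job)
    }


def pull_assign(num_jobs, parts_per_job, num_workers, rng):
    """Toy worker-driven assignment (assumption): each part is picked up
    by a uniformly random worker, with no coordination by the requester."""
    return {
        (job, part): rng.randrange(num_workers)
        for job in range(num_jobs)
        for part in range(parts_per_job)
    }


def excess_parts_seen(assignment):
    """Toy information loss (assumption): the largest number of parts of any
    single job seen by one worker, minus one. Zero means no worker saw more
    than one part of the same job."""
    counts = Counter((worker, job) for (job, _part), worker in assignment.items())
    return max(counts.values()) - 1


if __name__ == "__main__":
    rng = random.Random(0)
    print("push loss:", excess_parts_seen(push_assign(100, 4, 10)))
    print("pull loss:", excess_parts_seen(pull_assign(100, 4, 10, rng)))
```

Under these assumptions the round-robin rule always yields a loss of zero, while the random pickup can hand several parts of one job to the same worker. The sketch does not model colluding workers or the TOW setting.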


  • Published in

    CSCW '16: Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing
    February 2016
    1866 pages
    ISBN:9781450335928
    DOI:10.1145/2818048

    Copyright © 2016 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Acceptance Rates

    CSCW '16 Paper Acceptance Rate: 142 of 571 submissions, 25%
    Overall Acceptance Rate: 2,235 of 8,521 submissions, 26%
