DOI: 10.1145/3219819.3219979
research-article

Can Who-Edits-What Predict Edit Survival?

Published: 19 July 2018

ABSTRACT

As the number of contributors to online peer-production systems grows, it becomes increasingly important to predict whether the edits that users make will eventually be beneficial to the project. Existing solutions either rely on a user reputation system or consist of a highly specialized predictor that is tailored to a specific peer-production system. In this work, we explore a different point in the solution space that goes beyond user reputation but does not involve any content-based feature of the edits. We view each edit as a game between the editor and the component of the project. We posit that the probability that an edit is accepted is a function of the editor's skill, the difficulty of editing the component, and a user-component interaction term. Our model is broadly applicable, as it only requires observing data about who makes an edit, what the edit affects, and whether the edit survives. We apply our model to Wikipedia and the Linux kernel, two examples of large-scale peer-production systems, and we seek to understand whether it can effectively predict edit survival: in both cases, we provide a positive answer. Our approach significantly outperforms those based solely on user reputation and bridges the gap with specialized predictors that use content-based features. It is simple to implement and computationally inexpensive, and it also enables us to discover interesting structure in the data.
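
The kind of model the abstract describes can be illustrated with a short sketch. The abstract does not give the functional form, so the logistic link, the inner-product interaction term, the bias, and every name below (`survival_probability`, `average_log_loss`, the sample users and components) are assumptions made for illustration only, not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple


def survival_probability(
    skill: float,
    difficulty: float,
    user_vec: Sequence[float] = (),
    comp_vec: Sequence[float] = (),
    bias: float = 0.0,
) -> float:
    """Probability that an edit survives.

    Assumed form: sigmoid(skill - difficulty + <user_vec, comp_vec> + bias).
    The abstract only says the probability depends on these three ingredients.
    """
    interaction = sum(x * y for x, y in zip(user_vec, comp_vec))
    score = skill - difficulty + interaction + bias
    return 1.0 / (1.0 + math.exp(-score))


def average_log_loss(
    edits: List[Tuple[str, str, int]],
    skill: Dict[str, float],
    difficulty: Dict[str, float],
) -> float:
    """Mean negative log-likelihood over (user, component, survived) triples.

    Unseen users and components default to 0.0, and the interaction term
    is omitted here to keep the sketch short.
    """
    total = 0.0
    for user, component, survived in edits:
        p = survival_probability(skill.get(user, 0.0), difficulty.get(component, 0.0))
        total -= math.log(p if survived else 1.0 - p)
    return total / len(edits)


# Hypothetical observations: who edited what, and whether the edit survived.
edits = [("alice", "page_a", 1), ("bob", "page_a", 0), ("alice", "page_b", 1)]

# With no learned parameters, every prediction is 0.5.
baseline = average_log_loss(edits, {}, {})

# Giving "alice" a higher skill and "bob" a lower one fits this toy data better.
fitted = average_log_loss(edits, {"alice": 1.0, "bob": -1.0}, {})
```

The only inputs are the identities of the user and the component and the binary outcome, which matches the abstract's claim that no content-based feature of the edit is needed. In practice the parameters would be learned by minimizing a loss such as this one over the observed edits.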

Supplemental Material

kristof_edit_survival.mp4 (MP4, 353.5 MB)

Published in

KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
July 2018
2925 pages
ISBN: 9781450355520
DOI: 10.1145/3219819

Copyright © 2018 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

KDD '18 Paper Acceptance Rate: 107 of 983 submissions, 11%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
