research-article

Can Who-Edits-What Predict Edit Survival?

Authors:
Ali Batuhan Yardim, Bilkent University, Ankara, Turkey
Victor Kristof, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
Lucas Maystre, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
Matthias Grossglauser, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland

KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, July 2018, pages 2604–2613. https://doi.org/10.1145/3219819.3219979

Published: 19 July 2018


ABSTRACT

As the number of contributors to online peer-production systems grows, it becomes increasingly important to predict whether the edits that users make will eventually be beneficial to the project. Existing solutions either rely on a user reputation system or consist of a highly specialized predictor tailored to a specific peer-production system. In this work, we explore a different point in the solution space, one that goes beyond user reputation but does not involve any content-based feature of the edits. We view each edit as a game between the editor and the component of the project. We posit that the probability that an edit is accepted is a function of the editor's skill, of the difficulty of editing the component, and of a user-component interaction term. Our model is broadly applicable, as it only requires observing data about who makes an edit, what the edit affects, and whether the edit survives. We apply our model to Wikipedia and the Linux kernel, two examples of large-scale peer-production systems, and we seek to understand whether it can effectively predict edit survival: in both cases, we provide a positive answer. Our approach significantly outperforms approaches based solely on user reputation and bridges the gap with specialized predictors that use content-based features. It is simple to implement, computationally inexpensive, and, in addition, it enables us to discover interesting structure in the data.
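The abstract states only that the survival probability depends on the editor's skill, the component's difficulty, and a user-component interaction term, and that the only inputs are (who, what, survived) observations. The sketch below is a minimal illustration under our own assumptions, not the authors' implementation: we assume a logistic link, a dot product between low-dimensional user and component vectors for the interaction term (in the spirit of Rasch-style item-response models and matrix factorization), and plain stochastic gradient ascent on the L2-regularized Bernoulli log-likelihood. The class `EditSurvivalModel` and its parameters `dim`, `lr`, `reg`, and `epochs` are hypothetical names chosen for illustration.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class EditSurvivalModel:
    """Hypothetical sketch: P(edit by u on i survives) = sigmoid(s_u - d_i + x_u . y_i)."""

    def __init__(self, n_users: int, n_components: int, dim: int = 2, seed: int = 0):
        rng = random.Random(seed)
        self.skill = [0.0] * n_users            # s_u: editor skill (assumed scalar)
        self.difficulty = [0.0] * n_components  # d_i: component difficulty (assumed scalar)
        # x_u, y_i: latent vectors for the assumed dot-product interaction term;
        # small random init so their gradients are not identically zero.
        self.x = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_users)]
        self.y = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_components)]

    def logit(self, u: int, i: int) -> float:
        interaction = sum(a * b for a, b in zip(self.x[u], self.y[i]))
        return self.skill[u] - self.difficulty[i] + interaction

    def prob_survive(self, u: int, i: int) -> float:
        return sigmoid(self.logit(u, i))

    def fit(self, edits, lr: float = 0.1, reg: float = 0.01, epochs: int = 200):
        """Stochastic gradient ascent on the L2-regularized Bernoulli log-likelihood.

        edits: iterable of (user, component, survived) with survived in {0, 1}.
        """
        edits = list(edits)
        for _ in range(epochs):
            for u, i, q in edits:
                g = q - self.prob_survive(u, i)  # derivative of log-likelihood w.r.t. the logit
                self.skill[u] += lr * (g - reg * self.skill[u])
                self.difficulty[i] += lr * (-g - reg * self.difficulty[i])
                xu, yi = list(self.x[u]), list(self.y[i])  # use pre-update values
                for k in range(len(xu)):
                    self.x[u][k] += lr * (g * yi[k] - reg * xu[k])
                    self.y[i][k] += lr * (g * xu[k] - reg * yi[k])
        return self


# Toy usage: editor 0's edits always survive, editor 1's are always reverted.
edits = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)] * 5
model = EditSurvivalModel(n_users=2, n_components=2).fit(edits)
print(round(model.prob_survive(0, 0), 3), round(model.prob_survive(1, 0), 3))
```

Note that the training data is nothing more than (editor, component, survived) triplets, which matches the data requirement described in the abstract; no content-based feature of the edit is used. With `dim=0` the interaction term vanishes and the sketch reduces to a skill-minus-difficulty logistic model.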

Supplemental Material

kristof_edit_survival.mp4 (MP4 video, 353.5 MB)



Recommendations

Generalizing motion edits with Gaussian processes

One way that artists create compelling character animations is by manipulating details of a character's motion. This process is expensive and repetitive. We show that we can make such motion editing more efficient by generalizing the edits an animator ...
Aesthetic edits for character animation
SCA '03: Proceedings of the 2003 ACM SIGGRAPH/Eurographics symposium on Computer animation

The utility of an interactive tool can be measured by how pervasively it is embedded into a user's workflow. Tools for artists additionally must provide an appropriate level of control over expressive aspects of their work while suppressing unwanted ...
User generated content and credibility evaluation of online health information

This meta-analysis addresses credibility concerns for online health information. A collection of empirical studies addressing user-generated content was analyzed. We synthesized 22 effect sizes drawn from empirical studies of 1346 participants. Source ...

Published in
KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
July 2018
2925 pages
ISBN:9781450355520
DOI:10.1145/3219819
General Chairs: Yike Guo (Imperial College London), Faisal Farooq (IBM)
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States

Author Tags
collaborative filtering
peer-production systems
ranking
user-generated content
Acceptance Rates
KDD '18 Paper Acceptance Rate: 107 of 983 submissions, 11%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
Article Metrics
- Total Citations: 2
- Total Downloads: 529
- Downloads (Last 12 months): 14
- Downloads (Last 6 weeks): 1