ABSTRACT
Graph matching is a generalization of the classic graph isomorphism problem. By using only their structures a graph-matching algorithm finds a map between the vertex sets of two similar graphs. This has applications in the de-anonymization of social and information networks and, more generally, in the merging of structural data from different domains.
One class of graph-matching algorithms starts with a known seed set of matched node pairs. Despite the success of these algorithms in practical applications, their performance has been observed to be very sensitive to the size of the seed set. The lack of a rigorous understanding of parameters and performance makes it difficult to design systems and predict their behavior.
In this paper, we propose and analyze a very simple percolation - based graph matching algorithm that incrementally maps every pair of nodes (i,j) with at least r neighboring mapped pairs. The simplicity of this algorithm makes possible a rigorous analysis that relies on recent advances in bootstrap percolation theory for the G(n,p) random graph. We prove conditions on the model parameters in which percolation graph matching succeeds, and we establish a phase transition in the size of the seed set. We also confirm through experiments that the performance of percolation graph matching is surprisingly good, both for synthetic graphs and real social-network data.
Supplemental Material
Available for Download
The original version of the paper had a mistake in the proof of Lemma 2 in the appendix, which we correct in the revised version. The corrected proof has two parts. In the first part, we show that early in the matching process, no wrong pair $[i,j]$ receives $r$ marks or more, i.e., no wrong pairs are matched (w.h.p.) In the second part, we show that late in the matching process, if a wrong pair $[i,j]$ reaches $r$ marks, then either $[i,i]$ or $[j,j]$ lready received $r$ marks before (w.h.p.), and as a consequence $[i,j]$ does not get matched. Together, this establishes a sufficient condition for no matching error to occur. However, as a consequence of this correction, the condition on the parameter $r$ in Lemma 2 (threshold for number of marks until a node pair gets matched) is now $r >= 4$, as opposed to $r>=2$. This is reflected in the discussion of the result below Lemma 2. The main findings and conclusions of the paper are otherwise unchanged.
- M. Aizenman and J. L. Lebowitz. Metastability effects in bootstrap percolation. Journal of Physics A: Mathematical and General, 21(19), 1988.Google ScholarCross Ref
- L. Backstrom, C. Dwork, and J. Kleinberg. Wherefore art thou r3579x : anonymized social networks, hidden patterns, and structural steganography. In Proceedings of the 16th international conference on World Wide Web, 2007. Google ScholarDigital Library
- B. Bollobás. Random graphs, volume 73. Cambridge University Press, 2001.Google Scholar
- P. Doshi, R. Kolli, and C. Thomas. Inexact matching of ontology graphs using expectation-maximization. Web Semantics: Science, Services and Agents on the World Wide Web, 7(2), 2009. Google ScholarDigital Library
- S. Janson, T. Łuczak, T. Turova, and T. Vallier. Bootstrap percolation on the random graph G_n,p. The Annals of Applied Probability, 22(5), 2012.Google Scholar
- G. W. Klau. A new graph-based method for pairwise global network alignment. BMC Bioinformatics, 10(S-1), 2009.Google Scholar
- N. Korula and S. Lattanzi. An efficient reconciliation algorithm for social networks. ArXiv e-prints, July 2013.Google Scholar
- J. Leskovec. Stanford network analysis project. http://snap.stanford.edu/index.html.Google Scholar
- A. Mislove, B. Viswanath, K. P. Gummadi, and P. Druschel. You are who you know: inferring user profiles in online social networks. In Proceedings of the third ACM international conference on Web search and data mining, WSDM, pages 251--260, 2010. Google ScholarDigital Library
- A. Narayanan and V. Shmatikov. De-anonymizing social networks. In Proceedings of the 2009 30th IEEE Symposium on Security and Privacy, 2009. Google ScholarDigital Library
- P. Pedarsani and M. Grossglauser. On the privacy of anonymized networks. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD, 2011. Google ScholarDigital Library
- M. Penrose. Random Geometric Graphs. Oxford University Press, USA, 2003.Google Scholar
- M.-E. Rosoiu, C. T. dos Santos, and J. Euzenat. Ontology matching benchmarks: generation and evaluation. In OM, volume 814 of CEUR Workshop Proceedings, 2011.Google Scholar
- Y.-K. Shih and S. Parthasarathy. Scalable global alignment for multiple biological networks. BMC Bioinformatics, 13(S-3), 2012.Google Scholar
- P. Shvaiko and J. Euzenat. Ontology matching: State of the art and future challenges. IEEE Transactions on Knowledge and Data Engineering, 25(1), 2013. Google ScholarDigital Library
- G. Wondracek, T. Holz, E. Kirda, and C. Kruegel. A practical attack to de-anonymize social network users. In IEEE Symposium on Security and Privacy, 2010. Google ScholarDigital Library
Index Terms
- On the performance of percolation graph matching
Recommendations
De-anonymizing Clustered Social Networks by Percolation Graph Matching
Survey Papers and Regular PapersOnline social networks offer the opportunity to collect a huge amount of valuable information about billions of users. The analysis of this data by service providers and unintended third parties are posing serious treats to user privacy. In particular, ...
Error-tolerant geometric graph similarity and matching
Highlights- An intuitive approach to graph similarity and matching using geometric graphs.
- ...
AbstractGeometric graph matching is the process of evaluating the similarity between geometric graphs, where each vertex has an associated coordinate point in a plane. Exact matching computes a strict correspondence between two geometric ...
A graph matching method and a graph matching distance based on subgraph assignments
During the last decade, the use of graph-based object representation has drastically increased. As a matter of fact, object representation by means of graphs has a number of advantages over feature vectors. As a consequence, methods to compare graphs ...
Comments