Bläsius, Thomas; Fischbeck, Philipp. On the External Validity of Average-Case Analyses of Graph Algorithms. ACM Transactions on Algorithms 2024
The number one criticism of average-case analysis is that we do not actually know the probability distribution of real-world inputs. Thus, analyzing an algorithm on some random model has no implications for practical performance. At its core, this criticism doubts the existence of external validity, i.e., it assumes that algorithmic behavior on the somewhat simple and clean models does not translate beyond the models to the performance on real-world inputs. With this paper, we provide a first step towards studying the question of external validity systematically. To this end, we evaluate the performance of six graph algorithms on a collection of 2740 sparse real-world networks depending on two properties: heterogeneity (variance in the degree distribution) and locality (tendency of edges to connect vertices that are already close). We compare this with the performance on generated networks with varying locality and heterogeneity. We find that the performance in the idealized setting of network models translates surprisingly well to real-world networks. Moreover, heterogeneity and locality appear to be the core properties impacting the performance of many graph algorithms.
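For concreteness, the two network properties can be estimated with simple proxy statistics. The sketch below is our own illustration, not the exact measures used in the paper: a variance-based heterogeneity score (coefficient of variation of the degrees) and a triangle-based locality score, computed from an adjacency structure given as a dict mapping each vertex to a set of neighbors.

```python
import statistics

def heterogeneity(degrees):
    # Variance-based proxy: coefficient of variation of the degree
    # distribution (0 for regular graphs, large for heavy-tailed ones).
    mean = statistics.mean(degrees)
    return statistics.pstdev(degrees) / mean

def locality(adj):
    # Locality proxy: fraction of edges whose endpoints share at least
    # one common neighbor, i.e., edges that close a triangle.
    closed, total = 0, 0
    for u in adj:
        for v in adj[u]:
            if u < v:
                total += 1
                if adj[u] & adj[v]:
                    closed += 1
    return closed / total if total else 0.0
```

On a triangle the locality score is 1, on a star it is 0; a regular graph has heterogeneity 0.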
Casel, Katrin; Friedrich, Tobias; Neubert, Stefan; Schmid, Markus L. Shortest distances as enumeration problem. Discrete Applied Mathematics 2024: 89–103
We investigate the single source shortest distance (SSSD) and all pairs shortest distance (APSD) problems as enumeration problems (on unweighted and integer weighted graphs), meaning that the elements (u,v,d(u,v)) – where u and v are vertices with shortest distance d(u,v) – are produced and listed one by one without repetition. The performance is measured in the RAM model of computation with respect to preprocessing time and delay, i.e., the maximum time that elapses between two consecutive outputs. This point of view reveals that specific types of output (e.g., excluding the non-reachable pairs (u,v,∞), or excluding the self-distances (u,u,0)) and the order of enumeration (e.g., sorted by distance, sorted row-wise with respect to the distance matrix) have a huge impact on the complexity of APSD while they appear to have no effect on SSSD. In particular, we show for APSD that enumeration without output restrictions is possible with delay in the order of the average degree. Excluding non-reachable pairs, or requesting the output to be sorted by distance, increases this delay to the order of the maximum degree. Further, for weighted graphs, a delay in the order of the average degree is also not possible without preprocessing or considering self-distances as output. In contrast, for SSSD we find that a delay in the order of the maximum degree without preprocessing is attainable and unavoidable for any of these requirements.
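As an unweighted illustration of the SSSD setting, a plain BFS already enumerates the triples (u, v, d(u, v)) one by one: between two outputs it scans at most deg(v) edges, so the delay is in the order of the maximum degree, with no preprocessing. This sketch is our own; it excludes non-reachable pairs.

```python
from collections import deque

def enumerate_sssd(adj, source):
    # Enumerate triples (source, v, d(source, v)) via BFS, one by one
    # and without repetition. The work between two consecutive outputs
    # is bounded by the degree of the emitted vertex.
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        yield (source, u, dist[u])
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
```

Note that the outputs happen to come out sorted by distance, one of the output orders whose cost the paper analyzes.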
Berenbrink, Petra; Hoefer, Martin; Kaaser, Dominik; Lenzner, Pascal; Rau, Malin; Schmand, Daniel. Asynchronous Opinion Dynamics in Social Networks. Distributed Computing 2024
Opinion spreading in a society decides the fate of elections, the success of products, and the impact of political or social movements. A prominent model to study opinion formation processes is due to Hegselmann and Krause. It has the distinguishing feature that stable states do not necessarily show consensus, i.e., the population of agents might not agree on the same opinion. We focus on the social variant of the Hegselmann-Krause model. There are \(n\) agents, which are connected by a social network. Their opinions evolve in an iterative, asynchronous process, in which agents are activated one after another at random. When activated, an agent adopts the average of the opinions of its neighbors having a similar opinion (where similarity of opinions is defined using a parameter \(\varepsilon\)). Thus, the set of influencing neighbors of an agent may change over time. We show that such opinion dynamics are guaranteed to converge for any social network. We provide an upper bound of \(O(n|E|^2 (\varepsilon/\delta)^2)\) on the expected number of opinion updates until convergence to a stable state, where \(|E|\) is the number of edges of the social network, and \(\delta\) is a parameter of the stability concept. For the complete social network we show a bound of \(O(n^3(n^2 + (\varepsilon/\delta)^2))\) that represents a major improvement over the previously best upper bound of \(O(n^9 (\varepsilon/\delta)^2)\).
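One asynchronous update step of this dynamics can be sketched in a few lines. This is our own illustration; in particular, whether an agent's own opinion enters the average is a modeling choice we make here (we include it).

```python
import random

def hk_step(opinions, adj, eps, rng):
    # One asynchronous Hegselmann-Krause update: a uniformly random
    # agent adopts the average opinion of its eps-similar neighbors
    # (including its own current opinion). Mutates and returns opinions.
    i = rng.randrange(len(opinions))
    similar = [opinions[j] for j in adj[i] if abs(opinions[j] - opinions[i]) <= eps]
    similar.append(opinions[i])
    opinions[i] = sum(similar) / len(similar)
    return opinions
```

Repeating `hk_step` until no opinion moves by more than the stability parameter simulates the process whose convergence time the paper bounds.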
Friedrich, Tobias; Göbel, Andreas; Klodt, Nicolas; Krejca, Martin S.; Pappik, Marcus. Analysis of the survival time of the SIRS process via expansion. Electronic Journal of Probability 2024: 1–29
We study the SIRS process---a continuous-time Markov chain modeling the spread of infections on graphs. In this model, vertices are either susceptible, infected, or recovered. Each infected vertex becomes recovered at rate 1 and infects each of its susceptible neighbors independently at rate~\(\lambda\), and each recovered vertex becomes susceptible at a rate~\(\varrho\), which we assume to be independent of the graph size. A central quantity of the SIRS process is the time until no vertex is infected, known as the survival time. Surprisingly though, to the best of our knowledge, all rigorous theoretical results known so far immediately carry over from the related SIS model and do not completely explain the behavior of the SIRS process. We address this imbalance by conducting theoretical analyses of the SIRS process via the expansion properties of the underlying graph. Our first result shows that the expected survival time of the SIRS process on stars is at most polynomial in the graph size for any value of~\(\lambda\). This behavior is fundamentally different from the SIS process, where the expected survival time is exponential already for small infection rates. This raises the question of which graph properties result in an exponential survival time. Our main result is an exponential lower bound on the expected survival time of the SIRS process on expander graphs. Specifically, we show that on expander graphs \(G\) with \(n\) vertices, degree close to \(d\), and sufficiently small spectral expansion, the SIRS process has expected survival time at least exponential in \(n\) when \(\lambda \geq c/d\) for a constant \(c > 1\). Previous results on the SIS process show that this bound is almost tight. Additionally, our result holds even if \(G\) is a subgraph of the graph the process runs on. Notably, our result implies an almost-tight threshold for Erdős-Rényi graphs and a regime of exponential survival time for complex network models.
The proof of our main result draws inspiration from Lyapunov functions used in mean-field theory to devise a two-dimensional potential function and from applying a negative-drift theorem to show that the expected survival time is exponential.
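For intuition, the SIRS dynamics described above can be simulated with a standard Gillespie-style event loop. The following sketch is our own illustration, not part of the paper's analysis; it returns the (simulated) survival time.

```python
import random

S, I, R = "S", "I", "R"

def sirs_survival_time(adj, lam, rho, initial_infected, rng, max_events=100000):
    # Gillespie simulation of the SIRS process: an infected vertex
    # recovers (I -> R) at rate 1, a recovered vertex becomes
    # susceptible (R -> S) at rate rho, and infection crosses each
    # infected-susceptible edge at rate lam.
    state = {v: S for v in adj}
    for v in initial_infected:
        state[v] = I
    t = 0.0
    for _ in range(max_events):
        if not any(s == I for s in state.values()):
            return t  # survival time: first moment no vertex is infected
        events, rates = [], []
        for v in adj:
            if state[v] == I:
                events.append((R, v)); rates.append(1.0)
                for w in adj[v]:
                    if state[w] == S:
                        events.append((I, w)); rates.append(lam)
            elif state[v] == R:
                events.append((S, v)); rates.append(rho)
        t += rng.expovariate(sum(rates))  # exponential waiting time
        new_state, v = rng.choices(events, weights=rates)[0]
        state[v] = new_state
    return t
```

Averaging `sirs_survival_time` over many runs on, say, a star versus an expander illustrates the polynomial-versus-exponential dichotomy the paper proves.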
Sauer, Pascal; Cseh, Ágnes; Lenzner, Pascal. Improving ranking quality and fairness in Swiss-system chess tournaments. Journal of Quantitative Analysis in Sports 2024
Bilò, Davide; Friedrich, Tobias; Lenzner, Pascal; Melnichenko, Anna. Geometric Network Creation Games. SIAM Journal on Discrete Mathematics 2024: 277–315
Network creation games are a well-known approach for explaining and analyzing the structure, quality, and dynamics of real-world networks that evolved via the interaction of selfish agents without a central authority. In these games selfish agents corresponding to nodes in a network strategically buy incident edges to improve their centrality. However, past research on these games only considered the creation of networks with unit-weight edges. In practice, e.g., when constructing a fiber-optic network, the choice of which nodes to connect and also the induced price for a link crucially depend on the distance between the involved nodes, and such settings can be modeled via edge-weighted graphs. We incorporate arbitrary edge weights by generalizing the well-known model by Fabrikant et al. [Proceedings of PODC ’03, ACM, 2003, pp. 347–351] to edge-weighted host graphs and focus on the geometric setting where the weights are induced by the distances in some metric space. In stark contrast to the state of the art for the unit-weight version, where the price of anarchy is conjectured to be constant and where resolving this is a major open problem, we prove a tight nonconstant bound on the price of anarchy for the metric version and a slightly weaker upper bound for the nonmetric case. Moreover, we analyze the existence of equilibria, the computational hardness, and the game dynamics for several natural metrics. The model we propose can be seen as the game-theoretic analogue of the classical network design problem. Thus, low-cost equilibria of our game correspond to decentralized and stable approximations of the optimum network design.
Friedrich, Tobias; Göbel, Andreas; Katzmann, Maximilian; Schiller, Leon. Cliques in High-Dimensional Geometric Inhomogeneous Random Graphs. SIAM Journal on Discrete Mathematics 2024: 1943–2000
A recent trend in the context of graph theory is to bring theoretical analyses closer to empirical observations, by focusing the studies on random graph models that are used to represent practical instances. There, it was observed that geometric inhomogeneous random graphs (GIRGs) yield good representations of complex real-world networks, by expressing edge probabilities as a function that depends on (heterogeneous) vertex weights and distances in some underlying geometric space that the vertices are distributed in. While most of the parameters of the model are understood well, it was unclear how the dimensionality of the ground space affects the structure of the graphs. In this paper, we complement existing research into the dimension of geometric random graph models and the ongoing study of determining the dimensionality of real-world networks, by studying how the structure of GIRGs changes as the number of dimensions increases. We prove that, in the limit, GIRGs approach non-geometric inhomogeneous random graphs and present insights on how quickly the decay of the geometry impacts important graph structures. In particular, we study the expected number of cliques of a given size as well as the clique number and characterize phase transitions at which their behavior changes fundamentally. Finally, our insights help in better understanding previous results about the impact of the dimensionality on geometric random graphs.
Jana, Satyabrata; Saha, Souvik; Sahu, Abhishek; Saurabh, Saket; Verma, Shaily. Partitioning subclasses of chordal graphs with few deletions. Theoretical Computer Science 2024: 114288
In the (Vertex) \(k\)-Way Cut problem, the input is an undirected graph \(G\) and an integer \(s\), and the goal is to find a subset \(S\) of edges (vertices) of size at most \(s\), such that \(G-S\) has at least \(k\) connected components. Downey et al. [Electr. Notes Theor. Comput. Sci. 2003] showed that \(k\)-Way Cut is W[1]-hard parameterized by \(k\). However, Kawarabayashi and Thorup [FOCS 2011] showed that the problem is fixed-parameter tractable (FPT) in general graphs with respect to the parameter \(s\) and provided an \(\mathcal{O}(s^{s^{\mathcal{O}(s)}} n^2)\) time algorithm, where \(n\) denotes the number of vertices in \(G\). The best-known algorithm for this problem runs in time \(s^{\mathcal{O}(s)} n^{\mathcal{O}(1)}\), given by Lokshtanov et al. [ACM Trans. Algorithms 2021]. On the other hand, Vertex \(k\)-Way Cut is W[1]-hard with respect to either of the parameters \(k\), \(s\), or \(k+s\). These algorithmic results motivate us to look at the problems on special classes of graphs. In this paper, we consider the (Vertex) \(k\)-Way Cut problem on subclasses of chordal graphs and obtain the following results. We first give a sub-exponential FPT algorithm for \(k\)-Way Cut running in time \(2^{\mathcal{O}(\sqrt{s} \log s)} n^{\mathcal{O}(1)}\) on chordal graphs. It is known that Vertex \(k\)-Way Cut is W[1]-hard on chordal graphs, in fact on split graphs, parameterized by \(k+s\). We complement this hardness result by designing polynomial-time algorithms for Vertex \(k\)-Way Cut on interval graphs, circular-arc graphs, and permutation graphs.
Bilò, Davide; Chechik, Shiri; Choudhary, Keerti; Cohen, Sarel; Friedrich, Tobias; Krogmann, Simon; Schirneck, Martin. Approximate Distance Sensitivity Oracles in Subquadratic Space. TheoretiCS 2024
An \(f\)-edge fault-tolerant distance sensitivity oracle (\(f\)-DSO) with stretch \(\sigma\ge{}1\) is a data structure that preprocesses a given undirected, unweighted graph \(G\) with \(n\) vertices and \(m\) edges, and a positive integer \(f\). When queried with a pair of vertices \(s,t\) and a set \(F\) of at most \(f\) edges, it returns a \(\sigma\)-approximation of the \(s\)-\(t\)-distance in \(G-F\). We study \(f\)-DSOs that take subquadratic space. Thorup and Zwick [JACM~2005] showed that this is only possible for \(\sigma \ge 3\). We present, for any constant \(f\ge{}1\) and \(\alpha\in(0,\frac{1}{2})\), and any \(\varepsilon>0\), a randomized \(f\)-DSO with stretch \(3+\varepsilon\) that w.h.p. takes \(\tilde{O}(n^{2-\frac{\alpha}{f+1}})\cdot{}O(\log n/\varepsilon)^{f+2}\) space and has an \(O(n^\alpha/\varepsilon^2)\) query time. The time to build the oracle is \(\tilde{O}(mn^{2-\frac{\alpha}{f+1}})\cdot{}O(\log n/\varepsilon)^{f+1}\). We also give an improved construction for graphs with diameter at most \(D\). For any positive integer \(k\), we devise an \(f\)-DSO with stretch \(2k-1\) that w.h.p. takes \(O(D^{f+o(1)}n^{1+1/k})\) space and has \(\tilde{O}(D^{o(1)})\) query time, with a preprocessing time of \(O(D^{f+o(1)}mn^{1/k})\). Chechik, Cohen, Fiat, and Kaplan [SODA 2017] devised an \(f\)-DSO with stretch \(1{+}\varepsilon\) and preprocessing time \(O(n^{5+o(1)}/\varepsilon^f)\), albeit with a super-quadratic space requirement. We show how to reduce their preprocessing time to \(O(mn^{2+o(1)}/\varepsilon^f)\).
Friedrich, Tobias; Göbel, Andreas; Klodt, Nicolas; Krejca, Martin S.; Pappik, Marcus. The Irrelevance of Influencers: Information Diffusion with Re-Activation and Immunity Lasts Exponentially Long on Social Network Models. Annual AAAI Conference on Artificial Intelligence 2024
Information diffusion models on networks are at the forefront of AI research. The dynamics of such models typically follow stochastic models from epidemiology, used to model not only infections but various phenomena, including the behavior of computer viruses and viral marketing campaigns. A core question in this setting is how to efficiently detect the most influential vertices in the host graph such that the infection survives the longest. In processes that incorporate re-infection of the vertices, such as the SIS process, theoretical studies identify parameter thresholds where the survival time of the process rapidly transitions from logarithmic to super-polynomial. These results contradict the intuition that the starting configuration is relevant, since the process will always either die out fast or survive almost indefinitely. A shortcoming of these results is that models incorporating short-term immunity (or creative advertisement fatigue) have not been subjected to such a theoretical analysis so far. We reduce this gap in the literature by studying the SIRS process, a more realistic model, which besides re-infection additionally incorporates short-term immunity. On complex network models, we identify parameter regimes for which the process survives exponentially long, and we get a tight threshold for random graphs. Underlying these results is our main technical contribution, showing a threshold behavior for the survival time of the SIRS process on graphs with large expander subgraphs, such as social network models.
Constantinescu, Andrei; Lenzner, Pascal; Reiffenhäuser, Rebecca; Schmand, Daniel; Varricchio, Giovanna. Solving Woeginger’s Hiking Problem: Wonderful Partitions in Anonymous Hedonic Games. International Colloquium on Automata, Languages and Programming (ICALP) 2024
A decade ago, Gerhard Woeginger posed an open problem that became well-known as "Woeginger’s Hiking Problem": Consider a group of n people that want to go hiking; everyone expresses preferences over the size of their hiking group in the form of an interval between 1 and n. Is it possible to efficiently assign the n people to a set of hiking subgroups so that every person approves the size of their assigned subgroup? The problem is also known as efficiently deciding if an instance of an anonymous Hedonic Game with interval approval preferences admits a wonderful partition. We resolve the open problem in the affirmative by presenting an O(n⁵) time algorithm for Woeginger’s Hiking Problem. Our solution is based on employing a dynamic programming approach for a specific rectangle stabbing problem from computational geometry. Moreover, we propose natural, more demanding extensions of the problem, e.g., maximizing the number of satisfied participants and variants with single-peaked preferences, and show that they are also efficiently solvable. Last but not least, we employ our solution to efficiently compute a partition that maximizes the egalitarian welfare for anonymous single-peaked Hedonic Games.
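The decision problem itself is easy to state in code. The following brute force is our own sketch, not the paper's O(n⁵) dynamic program: it enumerates all partitions of n into group sizes (exponentially many) and, for fixed sizes, greedily assigns approving agents to sizes in increasing order, always choosing agents with the smallest upper bound (an earliest-deadline-first rule, which is optimal for such interval demands).

```python
def wonderful_partition_exists(intervals):
    # intervals[i] = (a_i, b_i): person i approves group sizes a_i..b_i.
    n = len(intervals)

    def feasible(sizes):
        # Fill groups from smallest size; among agents approving size s,
        # take those with the smallest upper bound first.
        free = sorted(intervals, key=lambda ab: ab[1])
        for s in sorted(sizes):
            chosen = [ab for ab in free if ab[0] <= s <= ab[1]][:s]
            if len(chosen) < s:
                return False
            for ab in chosen:
                free.remove(ab)
        return True

    def partitions(m, smallest=1):
        # All multisets of positive integers summing to m.
        if m == 0:
            yield []
            return
        for s in range(smallest, m + 1):
            for rest in partitions(m - s, s):
                yield [s] + rest

    return any(feasible(sizes) for sizes in partitions(n))
```

For example, three people with intervals (1,1), (2,3), (2,3) admit the partition {1, 2}, while three people who each approve only size 2 admit no wonderful partition.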
Friedrich, Tobias; Göbel, Andreas; Katzmann, Maximilian; Schiller, Leon. Real-World Networks Are Low-Dimensional: Theoretical and Practical Assessment. International Joint Conference on Artificial Intelligence (IJCAI) 2024
Recent empirical evidence suggests that real-world networks have very low underlying dimensionality. We provide a theoretical explanation for this phenomenon as well as develop a linear-time algorithm for detecting the underlying dimensionality of such networks. Our theoretical analysis considers geometric inhomogeneous random graphs (GIRGs), a geometric random graph model, which captures a variety of properties observed in real-world networks. These properties include a heterogeneous degree distribution and non-vanishing clustering coefficient, which is the probability that two random neighbors of a vertex are adjacent. Our first result shows that the clustering coefficient of GIRGs scales inverse exponentially with respect to the number of dimensions d, when the latter is at most logarithmic in n, the number of vertices. Hence, for a GIRG to behave like many real-world networks and have a non-vanishing clustering coefficient, it must come from a geometric space of o(log n) dimensions. Our analysis on GIRGs allows us to obtain a linear-time algorithm for determining the dimensionality of a network. Our algorithm bridges the gap between theory and practice, as it comes with a rigorous proof of correctness and yields results comparable to prior empirical approaches, as indicated by our experiments on real-world instances. The efficiency of our algorithm makes it applicable to very large data-sets. We conclude that very low dimensionalities (from 1 to 10) are needed to explain properties of real-world networks.
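The clustering coefficient mentioned above is straightforward to compute exactly on a given network; this sketch (our own illustration) averages, over all vertices of degree at least two, the probability that two random neighbors are adjacent.

```python
def avg_local_clustering(adj):
    # Average local clustering coefficient: for each vertex with at
    # least two neighbors, the fraction of neighbor pairs that are
    # themselves adjacent. adj maps each vertex to a set of neighbors.
    total, counted = 0.0, 0
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for u in nbrs for w in nbrs if u < w and w in adj[u])
        total += 2 * links / (k * (k - 1))
        counted += 1
    return total / counted if counted else 0.0
```

A triangle has clustering coefficient 1, a path has 0; the paper's result says that in a GIRG this quantity decays inverse exponentially in the dimension d.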
Andreev, Nikita; Bliznets, Ivan; Kundu, Madhumita; Saurabh, Saket; Tripathi, Vikash; Verma, Shaily. Parameterized Complexity of Paired Domination. International Workshop on Combinatorial Algorithms (IWOCA) 2024: 523–536
The Paired Domination problem is one of the well-studied variants of the classical Dominating Set problem. In a graph \(G\) on \(n\) vertices, a dominating set \(D\) (a set of vertices such that \(N[D]=V(G)\)) is called a paired dominating set of \(G\) if \(G[D]\) has a perfect matching. In the Paired Domination problem, given a graph \(G\) and a positive integer \(k\), the task is to check whether \(G\) has a paired dominating set of size at most \(k\). The problem is a variant of the Dominating Set problem, and hence inherits most of the hardness of the Dominating Set problem; however, the same cannot be said about the algorithmic results. In this paper, we study the problem from the perspective of parameterized complexity, both from solution and structural parameterization, and obtain the following results. 1. We design a (non-trivial) exact exponential algorithm running in time \(1.7159^n\). 2. The problem admits a Strong Exponential Time Hypothesis (SETH) optimal algorithm parameterized by the treewidth \(tw\) of the graph \(G\): the algorithm runs in time \(4^{tw} n^{O(1)}\), and unless SETH fails, there is no algorithm running in time \((4-\epsilon)^{tw} n^{O(1)}\) for any \(\epsilon > 0\). 3. We design an algorithm parameterized by the distance to cluster graphs. We complement this result by proving that the problem does not admit a polynomial kernel under this parameterization and under parameterization by vertex cover number. 4. Paired Domination admits a polynomial kernel on graphs that exclude a biclique \(K_{i,j}\). 5. We also prove that one of the counting versions of Paired Domination admits an \(n^{2cw+O(1)}\) time algorithm parameterized by the cliquewidth \(cw\). However, it does not admit an FPT algorithm unless #SETH is false.
Göbel, Andreas; Goldberg, Leslie Ann; Roth, Marc. The Weisfeiler-Leman Dimension of Conjunctive Queries. Symposium on Principles of Database Systems (PODS) 2024
The Weisfeiler-Leman (WL) dimension of a graph parameter f is the minimum k such that, if G1 and G2 are indistinguishable by the k-dimensional WL-algorithm then f(G1)=f(G2). The WL-dimension of f is ∞ if no such k exists. We study the WL-dimension of graph parameters characterised by the number of answers from a fixed conjunctive query to the graph. Given a conjunctive query φ, we quantify the WL-dimension of the function that maps every graph G to the number of answers of φ in G. The works of Dvorák (J. Graph Theory 2010), Dell, Grohe, and Rattan (ICALP 2018), and Neuen (ArXiv 2023) have answered this question for full conjunctive queries, which are conjunctive queries without existentially quantified variables. For such queries φ, the WL-dimension is equal to the treewidth of the Gaifman graph of φ. In this work, we give a characterisation that applies to all conjunctive queries. Given any conjunctive query φ, we prove that its WL-dimension is equal to the semantic extension width sew(φ), a novel width measure that can be thought of as a combination of the treewidth of φ and its quantified star size, an invariant introduced by Durand and Mengel (ICDT 2013) describing how the existentially quantified variables of φ are connected with the free variables. Using the recently established equivalence between the WL-algorithm and higher-order Graph Neural Networks (GNNs) due to Morris et al. (AAAI 2019), we obtain as a consequence that the function counting answers to a conjunctive query φ cannot be computed by GNNs of order smaller than sew(φ).
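For intuition, the 1-dimensional WL-algorithm (color refinement) can be sketched in a few lines; the k-dimensional variant colors k-tuples of vertices instead. This illustration is ours, not the paper's.

```python
def wl_colors(adj, rounds=None):
    # 1-dimensional Weisfeiler-Leman (color refinement): repeatedly
    # replace each vertex's color by its old color together with the
    # multiset of its neighbors' colors, until the partition stabilizes.
    colors = {v: 0 for v in adj}
    for _ in range(rounds if rounds is not None else len(adj)):
        sig = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
               for v in adj}
        relabel = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        new = {v: relabel[sig[v]] for v in adj}
        if len(set(new.values())) == len(set(colors.values())):
            return new  # partition is stable
        colors = new
    return colors
```

A classic limitation: color refinement gives every vertex of a 6-cycle and of two disjoint triangles the same single color (both graphs are 2-regular), so 1-WL cannot distinguish them even though they differ, e.g., in the number of triangles.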
Neubert, Stefan; Casel, Katrin. Incremental Ordering for Scheduling Problems. Proceedings of the International Conference on Automated Planning and Scheduling 2024: 405–413
Given an instance of a scheduling problem where we want to start executing jobs as soon as possible, it is advantageous if a scheduling algorithm emits the first parts of its solution early, in particular before the algorithm completes its work. Therefore, in this position paper, we analyze core scheduling problems with regard to their enumeration complexity, i.e., the computation time until the first emitted schedule entry (preprocessing time) and the worst-case time between two consecutive parts of the solution (delay). Specifically, we look at scheduling instances that reduce to ordering problems. We apply a known incremental sorting algorithm for scheduling strategies that are at their core comparison-based sorting algorithms and translate corresponding upper and lower complexity bounds to the scheduling setting. For instances with n jobs and a precedence DAG with maximum degree Δ, we incrementally build a topological ordering with O(n) preprocessing and O(Δ) delay. We prove a matching lower bound and show with an adversary argument that the delay lower bound holds even in case the DAG has constant average degree and the ordering is emitted out-of-order in the form of insert operations. We complement our theoretical results with experiments that highlight the improved time-to-first-output and discuss research opportunities for similar incremental approaches for other scheduling problems.
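The topological-ordering case can be sketched with Kahn's algorithm written as a generator: after emitting a job, at most deg(v) edges are relaxed before the next output, i.e., O(Δ) delay. Note that this simple sketch (our own, not the paper's algorithm) spends O(n + m) on preprocessing the in-degrees, whereas the paper achieves O(n) preprocessing.

```python
from collections import deque

def topological_order(succ):
    # Kahn's algorithm as a generator over a DAG given as a dict
    # mapping each job to its successors. Jobs are emitted one by one;
    # between two outputs at most deg(v) edges are relaxed.
    indeg = {v: 0 for v in succ}
    for v in succ:
        for w in succ[v]:
            indeg[w] += 1
    ready = deque(v for v in succ if indeg[v] == 0)
    while ready:
        v = ready.popleft()
        yield v
        for w in succ[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                ready.append(w)
```

Consuming the generator lazily gives exactly the incremental behavior discussed above: the first job is available before the full schedule is computed.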
Fomin, Fedor V.; Golovach, Petr A.; Sagunov, Danil; Simonov, Kirill. Tree Containment Above Minimum Degree is FPT. Symposium on Discrete Algorithms (SODA) 2024: 366–376
According to the classic Chvátal's Lemma from 1977, a graph of minimum degree \(\delta(G)\) contains every tree on \(\delta(G)+1\) vertices. Our main result is the following algorithmic “extension” of Chvátal's Lemma: For any \(n\)-vertex graph \(G\), integer \(k\), and a tree \(T\) on at most \(\delta(G)+k\) vertices, deciding whether \(G\) contains a subgraph isomorphic to \(T\) can be done in time \(f(k) \cdot n^{\mathcal{O}(1)}\) for some function \(f\) of \(k\) only. The proof of our main result is based on an interplay between extremal graph theory and parameterized algorithms. The full version of the paper can be accessed at https://arxiv.org/abs/2310.09678.
Bläsius, Thomas; Cohen, Sarel; Fischbeck, Philipp; Friedrich, Tobias; Krejca, Martin S. Robust Parameter Fitting to Realistic Network Models via Iterative Stochastic Approximation. CoRR 2024
ArXiv preprint
Random graph models are widely used to understand network properties and graph algorithms. Key to such analyses are the different parameters of each model, which affect various network features, such as its size, clustering, or degree distribution. The exact effect of the parameters on these features is not well understood, mainly because we lack tools to thoroughly investigate this relation. Moreover, the parameters cannot be considered in isolation, as changing one affects multiple features. Existing approaches for finding the best model parameters of desired features, such as a grid search or estimating the parameter-feature relations, are not well suited, as they are inaccurate or computationally expensive. We introduce an efficient iterative fitting method, named ParFit, that finds parameters using only a few network samples, based on the Robbins-Monro algorithm. We test ParFit on three well-known graph models, namely Erdős-Rényi, Chung-Lu, and geometric inhomogeneous random graphs, as well as on real-world networks, including web networks. We find that ParFit performs well in terms of quality and running time across most parameter configurations.
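At its core, the Robbins-Monro scheme adjusts a parameter against the noisy deviation of a sampled feature from its target value, with step sizes a/t that sum to infinity but are square-summable. The sketch below is a one-dimensional illustration of this idea in the spirit of the paper (ParFit itself fits several model parameters jointly); the function and parameter names are our own.

```python
def fit_parameter(sample_feature, target, theta0, steps=200, a=1.0):
    # Robbins-Monro stochastic approximation: nudge theta against the
    # (possibly noisy) deviation of the sampled feature from its
    # target. Assumes the feature is increasing in theta.
    theta = theta0
    for t in range(1, steps + 1):
        theta -= (a / t) * (sample_feature(theta) - target)
    return theta
```

In practice `sample_feature` would, e.g., generate an Erdős-Rényi graph with edge probability theta and return its average degree, so that only a few network samples per iteration are needed.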