Issac, Davis; Chandran, L. Sunil; Zhou, Sanming Hadwiger’s conjecture for squares of 2-trees. European Journal of Combinatorics 2019: 159–174
Hadwiger's conjecture asserts that any graph contains a clique minor with order no less than the chromatic number of the graph. We prove that this well-known conjecture is true for all graphs if and only if it is true for squares of split graphs. This observation implies that Hadwiger's conjecture for squares of chordal graphs is as difficult as the general case, since chordal graphs are a superclass of split graphs. Then we consider 2-trees, which are a subclass of each of planar graphs, 2-degenerate graphs and chordal graphs. We prove that Hadwiger's conjecture is true for squares of 2-trees. We achieve this by proving the following stronger result: If \(G\) is the square of a 2-tree, then \(G\) has a clique minor of size \(\chi(G)\), where each branch set is a path.
Friedrich, Tobias; Krejca, Martin S.; Rothenberger, Ralf; Arndt, Tobias; Hafner, Danijar; Kellermeier, Thomas; Krogmann, Simon; Razmjou, Armin Routing for On-Street Parking Search using Probabilistic Data. AI Communications 2019: 113–124
A significant percentage of urban traffic is caused by the search for parking spots. One possible approach to improve this situation is to guide drivers along routes which are likely to have free parking spots. The task of finding such a route can be modeled as a probabilistic graph problem which is NP-complete. Thus, we propose heuristic approaches for solving this problem and evaluate them experimentally. For this, we use probabilities of finding a parking spot, which are based on publicly available empirical data from TomTom International B.V. Additionally, we propose a heuristic that relies exclusively on conventional road attributes. Our experiments show that this algorithm comes close to the baseline by a factor of 1.3 in our cost measure. Last, we complement our experiments with results from a field study, comparing the success rates of our algorithms against real human drivers.
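For intuition, the sketch below (Python, an illustration rather than the paper's algorithm) shows how a candidate route could be scored when each road segment carries a probability of offering a free spot; the independence assumption between segments and the penalty-based cost are assumptions made here for the example.

```python
# Illustrative sketch: score a route by the probability of passing at least one
# free parking spot, assuming independent segment probabilities p_e (hypothetical inputs).
from math import prod

def route_success_probability(segment_probs):
    """Probability of finding at least one free spot along the route."""
    return 1.0 - prod(1.0 - p for p in segment_probs)

def route_cost(segment_lengths, segment_probs, penalty=1000.0):
    """Toy cost: driven distance plus a penalty weighted by the failure probability."""
    return sum(segment_lengths) + penalty * prod(1.0 - p for p in segment_probs)

# Example: three segments with 10%, 30%, and 5% chance of a free spot each.
print(route_success_probability([0.1, 0.3, 0.05]))  # ~0.4015
```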
Doerr, Benjamin; Doerr, Carola; Kötzing, Timo Solving Problems with Unknown Solution Length at Almost No Extra Cost. Algorithmica 2019: 703–748
Following up on previous work of Cathabard et al. (in: Proceedings of foundations of genetic algorithms (FOGA’11), ACM, 2011), we analyze variants of the (1 + 1) evolutionary algorithm (EA) for problems with unknown solution length. For their setting, in which the solution length is sampled from a geometric distribution, we provide mutation rates that yield for both benchmark functions ONEMAX and LEADINGONES an expected optimization time that is of the same order as that of the (1 + 1) EA knowing the solution length. More than this, we show that almost the same run times can be achieved even if no a priori information on the solution length is available. We also consider the situation in which neither the number nor the positions of the bits with an influence on the fitness function are known. Solving an open problem from Cathabard et al., we show that, for arbitrary natural numbers \(s\), such ONEMAX and LEADINGONES instances can be solved, simultaneously for all natural numbers \(n\), in expected time \(O(n(\log(n))^2 \log\log(n) \cdots \log^{(s-1)}(n)(\log^{(s)}(n))^{(1+\varepsilon)})\) and \(O(n^2 \log(n) \log\log(n) \cdots \log^{(s-1)}(n)(\log^{(s)}(n))^{(1+\varepsilon)})\), respectively; that is, in almost the same time as if \(n\) and the relevant bit positions were known. For the LEADINGONES case, we prove lower bounds of the same asymptotic order of magnitude apart from the \((\log^{(s)}(n))^\varepsilon\) factor. Aiming at closing this arbitrarily small remaining gap, we realize that there is no asymptotically best performance for this problem. For any algorithm solving, for all \(n\), all instances of size \(n\) in expected time at most \(T(n)\), there is an algorithm doing the same in time \(T'(n)\) with \(T' = o(T)\). For ONEMAX we show results of similar flavor.
Shi, Feng; Schirneck, Martin; Friedrich, Tobias; Kötzing, Timo; Neumann, Frank Reoptimization Time Analysis of Evolutionary Algorithms on Linear Functions Under Dynamic Uniform Constraints. Algorithmica 2019: 828–857
Rigorous runtime analysis is a major approach towards understanding evolutionary computing techniques, and in this area linear pseudo-Boolean objective functions play a central role. Having an additional linear constraint is then equivalent to the NP-hard Knapsack problem; certain classes thereof have been studied in recent works. In this article, we present a dynamic model of optimizing linear functions under uniform constraints. Starting from an optimal solution with respect to a given constraint bound, we investigate the runtimes that different evolutionary algorithms need to recompute an optimal solution when the constraint bound changes by a certain amount. The classical \((1+1)\) EA and several population-based algorithms are designed for that purpose and are shown to recompute efficiently. Furthermore, a variant of the \((1+(\lambda,\lambda))\) GA for the dynamic optimization problem is studied, whose performance is better when the change of the constraint bound is small.
Doerr, Benjamin; Fischbeck, Philipp; Frahnow, Clemens; Friedrich, Tobias; Kötzing, Timo; Schirneck, Martin Island Models Meet Rumor Spreading. Algorithmica 2019: 886–915
Island models in evolutionary computation solve problems by a careful interplay of independently running evolutionary algorithms on the islands and an exchange of good solutions between the islands. In this work, we conduct rigorous run time analyses for such island models trying to simultaneously obtain good run times and low communication effort. We improve the existing upper bounds for both measures (i) by improving the run time bounds via a careful analysis, (ii) by balancing individual computation and communication in a more appropriate manner, and (iii) by replacing the usual communicate-with-all approach with randomized rumor spreading. In the latter, each island contacts a randomly chosen neighbor. This epidemic communication paradigm is known to lead to very fast and robust information dissemination in many applications. Our results concern island models running simple (1+1) evolutionary algorithms to optimize the classic test functions OneMax and LeadingOnes. We investigate binary trees, d-dimensional tori, and complete graphs as communication topologies.
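The following toy simulation is a hedged sketch of the setting only: it assumes OneMax, a fixed migration interval, and push-style contact with a single random island, and is meant to make the interplay of local (1+1) EA steps and rumor-spreading-style migration concrete, not to reproduce the protocols analyzed in the paper.

```python
# Minimal island-model sketch under simplifying assumptions (not the paper's exact protocol).
import random

def one_max(x):
    return sum(x)

def island_model(n=50, islands=8, interval=10, generations=2000):
    pop = [[random.randint(0, 1) for _ in range(n)] for _ in range(islands)]
    for gen in range(generations):
        for i in range(islands):
            # (1+1) EA step: flip each bit independently with probability 1/n.
            off = [b ^ (random.random() < 1.0 / n) for b in pop[i]]
            if one_max(off) >= one_max(pop[i]):
                pop[i] = off
        if gen % interval == 0:
            # Rumor-spreading style migration: each island contacts one random other island.
            for i in range(islands):
                j = random.choice([k for k in range(islands) if k != i])
                if one_max(pop[i]) > one_max(pop[j]):
                    pop[j] = pop[i][:]
        if any(one_max(x) == n for x in pop):
            return gen   # first generation in which some island holds the optimum
    return generations
```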
Cseh, Ágnes; Matuschke, Jannik New and Simple Algorithms for Stable Flow Problems. Algorithmica 2019: 2557–2591
Stable flows generalize the well-known concept of stable matchings to markets in which transactions may involve several agents, forwarding flow from one to another. An instance of the problem consists of a capacitated directed network in which vertices express their preferences over their incident edges. A network flow is stable if there is no group of vertices that all could benefit from rerouting the flow along a walk. Fleiner (Algorithms 7:1-14, 2014) established that a stable flow always exists by reducing it to the stable allocation problem. We present an augmenting path algorithm for computing a stable flow, the first algorithm that achieves polynomial running time for this problem without using stable allocations as a black-box subroutine. We further consider the problem of finding a stable flow such that the flow value on every edge is within a given interval. For this problem, we present an elegant graph transformation and based on this, we devise a simple and fast algorithm, which also can be used to find a solution to the stable marriage problem with forced and forbidden edges. Finally, we study the stable multicommodity flow model introduced by Király and Pap (Algorithms 6:161-168, 2013). The original model is highly involved and allows for commodity-dependent preference lists at the vertices and commodity-specific edge capacities. We present several graph-based reductions that show equivalence to a significantly simpler model. We further show that it is NP-complete to decide whether an integral solution exists.
Neumann, Aneta; Neumann, Frank; Friedrich, Tobias Quasi-random Image Transition and Animation. Australian Journal of Intelligent Information Processing Systems 2019: 10–18
Evolutionary algorithms have been widely used in the area of creativity. Recently, evolutionary processes have been used to create artistic image transition processes using random walks. In this paper, we explore the use of quasi-random walks for evolutionary image transition and animation. Quasi-random walks show similar features to standard random walks, but with much less randomness. We utilize this established model from discrete mathematics and show how agents carrying out quasi-random walks can be used for evolutionary image transition and animation. The key idea is to generalize the notion of quasi-random walks and let a set of autonomous agents perform quasi-random walks painting an image. Each agent has one particular target image that they paint when following a sequence of directions for their quasi-random walk. The sequence can easily be chosen by a user and allows them to produce a wide range of different transition patterns and animations.
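As an illustration of the underlying mechanism, here is a minimal rotor-router-style sketch of one agent performing a deterministic "quasi-random" walk while painting pixels from its target image; the direction order, torus wrap-around, and painting scheme are assumptions chosen to keep the example self-contained, not the paper's setup.

```python
# Hedged sketch: a deterministic walk in the rotor-router spirit that paints pixels
# of a target image. Direction order and wrap-around are illustrative assumptions.
DIRS = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # E, S, W, N

def quasi_random_paint(target, steps, start=(0, 0)):
    h, w = len(target), len(target[0])
    rotor = [[0] * w for _ in range(h)]       # next direction index per cell
    canvas = [[None] * w for _ in range(h)]
    y, x = start
    for _ in range(steps):
        canvas[y][x] = target[y][x]           # paint the current pixel
        d = rotor[y][x]
        rotor[y][x] = (d + 1) % 4             # advance the rotor of this cell
        dy, dx = DIRS[d]
        y, x = (y + dy) % h, (x + dx) % w     # move on a torus for simplicity
    return canvas
```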
Cechlárová, Katarína; Cseh, Ágnes; Manlove, David Selected open problems in Matching Under Preferences. Bulletin of the European Association for Theoretical Computer Science 2019: 14–38
The House Allocation problem, the Stable Marriage problem and the Stable Roommates problem are three fundamental problems in the area of matching under preferences. These problems have been studied for decades under a range of optimality criteria, but despite much progress, some challenging questions remain open. The purpose of this article is to present a range of key open questions for each of these problems, which will hopefully stimulate further research activity in this area.
Battaglia, Francesco; Cucina, Domenico; Rizzo, Manuel Detection and estimation of additive outliers in seasonal time series. Computational Statistics 2019: 1–17
The detection of outliers in a time series is an important issue because their presence may have serious negative effects on the analysis in many different ways. Moreover, the presence of a complex seasonal pattern in the series could affect the properties of the usual outlier detection procedures. Therefore, modelling the appropriate form of seasonality is a very important step when outliers are present in a seasonal time series. In this paper we present some procedures for detection and estimation of additive outliers when parametric seasonal models, in particular periodic autoregressive models, are specified to fit the data. A simulation study is presented to evaluate the benefits and the drawbacks of the proposed procedure on a selection of seasonal time series. An application to three real time series is also examined.
Michail, Othon; Skretas, George; Spirakis, Paul G. On the transformation capability of feasible mechanisms for programmable matter. Computer and System Sciences 2019: 18–39
In this work, we study theoretical models of programmable matter systems. The systems under consideration consist of spherical modules, kept together by magnetic forces and able to perform two minimal mechanical operations (or movements): rotate around a neighbor and slide over a line. In terms of modeling, there are n nodes arranged in a 2-dimensional grid and forming some initial shape. The goal is for the initial shape A to transform to some target shape B by a sequence of movements. Most of the paper focuses on transformability questions, meaning whether it is in principle feasible to transform a given shape to another. We first consider the case in which only rotation is available to the nodes. Our main result is that deciding whether two given shapes A and B can be transformed to each other is in P. We then insist on rotation only and impose the restriction that the nodes must maintain global connectivity throughout the transformation. We prove that the corresponding transformability question is in PSPACE and study the problem of determining the minimum seeds that can make otherwise infeasible transformations feasible. Next, we allow both rotations and slidings and prove universality: any two connected shapes A, B with the same number of nodes can be transformed into each other without breaking connectivity. The worst-case number of movements of the generic strategy is \(\Theta(n^2)\). We improve this to \(\mathcal{O}(n)\) parallel time by a pipelining strategy and prove optimality of both by matching lower bounds. We next turn our attention to distributed transformations. The nodes are now distributed processes able to perform communicate-compute-move rounds. We provide distributed algorithms for a general type of transformation.
Maimani, Hamid Reza; Parsaei Majd, Leila Nowhere-zero flow on some products of signed graphs. Discrete Applied Mathematics 2019: 84–92
In this paper, we show that the Cartesian product of two signed nontrivial connected graphs has a nowhere-zero 4-flow. Also, we prove that the signed biwheel \(B_n\) admits a nowhere-zero \(k\)-flow, where \(k \in \{3,4\}\).
Trubenova, Barbora; Kötzing, Timo; Krejca, Martin S.; Lehre, Per Kristian Surfing on the seascape: Adaptation in a changing environment. Evolution: International Journal of Organic Evolution 2019: 1356–1374
The environment changes constantly at various time scales and, in order to survive, species need to keep adapting. Whether these species succeed in avoiding extinction is a major evolutionary question. Using a multilocus evolutionary model of a mutation‐limited population adapting under strong selection, we investigate the effects of the frequency of environmental fluctuations on adaptation. Our results rely on an "adaptive‐walk" approximation and use mathematical methods from evolutionary computation theory to investigate the interplay between fluctuation frequency, the similarity of environments, and the number of loci contributing to adaptation. First, we assume a linear additive fitness function, but later generalize our results to include several types of epistasis. We show that frequent environmental changes prevent populations from reaching a fitness peak, but they may also prevent the large fitness loss that occurs after a single environmental change. Thus, the population can survive, although not thrive, in a wide range of conditions. Furthermore, we show that in a frequently changing environment, the similarity of threats that a population faces affects the level of adaptation that it is able to achieve. We check and supplement our analytical results with simulations.
Bläsius, Thomas; Rademacher, Marcel; Rutter, Ignaz How to Draw a Planarization. Graph Algorithms and Applications 2019: 653–682
We study the problem of computing straight-line drawings of non-planar graphs with few crossings. We assume that a crossing-minimization algorithm is applied first, yielding a planarization, i.e., a planar graph with a dummy vertex for each crossing, that fixes the topology of the resulting drawing. We present and evaluate two different approaches for drawing a planarization in such a way that the edges of the input graph are as straight as possible. The first approach is based on the planarity-preserving force-directed algorithm IMPRED [Simonetto et al. Computer Graphics Forum 2011], the second approach, which we call Geometric Planarization Drawing, iteratively moves vertices to their locally optimal positions in the given initial drawing. Our evaluation shows that both approaches significantly improve the initial drawing and that our geometric approach outperforms the force-directed approach. To the best of our knowledge, this is the first paper concerned with the generation of a straight-line drawing that respects an arbitrary planarization.
Cseh, Ágnes; Skutella, Martin Paths to stable allocations. International Journal of Game Theory 2019: 835–862
The stable allocation problem is one of the broadest extensions of the well-known stable marriage problem. In an allocation problem, edges of a bipartite graph have capacities and vertices have quotas to fill. Here we investigate the case of uncoordinated processes in stable allocation instances. In this setting, a feasible allocation is given and the aim is to reach a stable allocation by raising the value of the allocation along blocking edges and reducing it on worse edges if needed. Do such myopic changes lead to a stable solution? In our present work, we analyze both better and best response dynamics from an algorithmic point of view. With the help of two deterministic algorithms we show that random procedures reach a stable solution with probability one for all rational input data in both cases. Surprisingly, while there is a polynomial path to stability when better response strategies are played (even for irrational input data), the more intuitive best response steps may require exponential time. We also study the special case of correlated markets. There, random best response strategies lead to a stable allocation in expected polynomial time.
Cucina, Domenico; Rizzo, Manuel; Ursu, Eugen Multiple changepoint detection for periodic autoregressive models with an application to river flow analysis. Stochastic Environmental Research and Risk Assessment 2019: 1137–1157
River flow data are usually subject to several sources of discontinuity and inhomogeneity. An example is seasonality, because climatic oscillations occurring on inter-annual timescale have an obvious impact on the river flow. Further sources of alteration can be caused by changes in reservoir management, instrumentation or even unexpected shifts in climatic conditions. When such changes are ignored the results of a statistical analysis can be strongly misleading, so flexible procedures are needed for building the appropriate models, which may be very complex. This paper develops an automatic procedure to estimate the number and locations of changepoints in Periodic AutoRegressive (PAR) models, which have been extensively used to account for seasonality in hydrology. We aim at filling the literature gap on multiple changepoint detection by allowing several time segments to be detected, inside of which a different PAR structure is specified, with the resulting model being employed to successfully capture the discontinuities of river flow data. The model estimation is performed by optimization of an objective function based on an information criterion using genetic algorithms. The proposed methodology is evaluated by means of simulation studies and it is then employed in the analysis of two river flows: the South Saskatchewan, measured at Saskatoon, Canada, and the Colorado, measured at Lees Ferry, Arizona. For these river flows we build changepoint models, discussing the possible events that caused discontinuity, and evaluate their forecasting accuracy. Comparisons with the literature on river flow analysis and on existing methods for changepoint detection confirm the efficiency of our proposal.
Batra, Sanjit Singh; Kumar, Nikhil; Tripathi, Amitabha Some Problems Concerning the Frobenius Number for Extensions of an Arithmetic Progression. The Ramanujan Journal 2019: 545–565
For a set of positive and relatively prime integers \(A=\{a_1,\ldots,a_k\}\), let \(\Gamma(A)\) denote the set of integers of the form \(a_1 x_1+ \ldots +a_k x_k\) with each \(x_i \geq 0\). It is well known that \(\Gamma^c(A)=\mathbb{N} \setminus \Gamma(A)\) is a finite set, so that \(\texttt{g}(A)\), which denotes the largest integer in \(\Gamma^{c}(A)\), is well defined. Let \(A=AP(a,d,k)\) denote the set \(\{ a,a+d, \dots, a+(k-1)d \}\) of integers in arithmetic progression, and let \(\texttt{gcd}(a,d)=1\). We (i) determine the set \(A^+=\{ b \in \Gamma^c(A):\texttt{g}(A \cup \{ b \})= \texttt{g}(A) \}\); (ii) determine a subset \(\overline{A^{+}}\) of \(\Gamma^{c}(A)\) of largest cardinality such that \(A \cup \overline{A^{+}}\) is an independent set and \(\texttt{g}(A \cup \overline{A^{+}})=\texttt{g}(A)\); and (iii) determine \(\texttt{g}(A \cup \{b\})\) for some class of values of \(b\) that includes results of some recent work.
Friedrich, Tobias; Kötzing, Timo; Krejca, Martin S. Unbiasedness of Estimation-of-Distribution Algorithms. Theoretical Computer Science 2019: 46–59
In the context of black-box optimization, black-box complexity is used for understanding the inherent difficulty of a given optimization problem. Central to our understanding of nature-inspired search heuristics in this context is the notion of unbiasedness. Specialized black-box complexities have been developed in order to better understand the limitations of these heuristics – especially of (population-based) evolutionary algorithms (EAs). In contrast to this, we focus on a model for algorithms explicitly maintaining a probability distribution over the search space: so-called estimation-of-distribution algorithms (EDAs). We consider the recently introduced \(n\)-Bernoulli-\(\lambda\)-EDA framework, which subsumes, for example, the commonly known EDAs PBIL, UMDA, \(\lambda\)-MMAS\(_\textrm{IB}\), and cGA. We show that an \(n\)-Bernoulli-\(\lambda\)-EDA is unbiased if and only if its probability distribution satisfies a certain invariance property under isometric automorphisms of \([0, 1]^n\). By restricting how an \(n\)-Bernoulli-\(\lambda\)-EDA can perform an update, in a way common to many examples, we derive more concise characterizations, which are easy to verify. We demonstrate this by showing that our examples above are all unbiased.
Kötzing, Timo; Krejca, Martin S. First-hitting times under drift. Theoretical Computer Science 2019: 51–69
For the last ten years, almost every theoretical result concerning the expected run time of a randomized search heuristic has used drift theory, making it arguably the most important tool in this domain. Its success is due to its ease of use and its powerful result: drift theory allows the user to derive bounds on the expected first-hitting time of a random process by bounding expected local changes of the process – the drift. This is usually far easier than bounding the expected first-hitting time directly. Due to the widespread use of drift theory, it is of utmost importance to have the best drift theorems possible. We improve the fundamental additive, multiplicative, and variable drift theorems by stating them in a form as general as possible and providing examples of why the restrictions we keep are still necessary. Our additive drift theorem for upper bounds only requires the process to be lower-bounded, that is, we remove unnecessary restrictions like a finite, discrete, or bounded state space. As corollaries, the same is true for our upper bounds in the case of variable and multiplicative drift. By bounding the step size of the process, we derive new lower-bounding multiplicative and variable drift theorems. Last, we also state theorems that are applicable when the process has a drift of 0, by using a drift on the variance of the process.
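For reference, one standard formulation of the additive drift upper bound that this work generalizes (the paper's version relaxes the state-space assumptions further): let \((X_t)_{t \ge 0}\) be a random process with values in \(\mathbb{R}_{\ge 0}\) and let \(T = \min\{t \mid X_t = 0\}\) be its first-hitting time of \(0\). If there is a \(\delta > 0\) such that \(\mathrm{E}[X_t - X_{t+1} \mid X_0, \ldots, X_t] \ge \delta\) whenever \(X_t > 0\), then \(\mathrm{E}[T \mid X_0] \le X_0 / \delta\). For example, a process that starts at \(X_0 = n\) and decreases by at least \(1/2\) in expectation per step reaches \(0\) after at most \(2n\) steps in expectation.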
Cseh, Ágnes; Irving, Robert W.; Manlove, David F. The Stable Roommates Problem with Short Lists. Theory of Computing Systems 2019: 128–149
We consider two variants of the classical Stable Roommates problem with Incomplete (but strictly ordered) preference lists (sri) that are degree constrained, i.e., preference lists are of bounded length. The first variant, egald-sri, involves finding an egalitarian stable matching in solvable instances of sri with preference lists of length at most \(d\). We show that this problem is NP-hard even if \(d = 3\). On the positive side we give a \(\frac{2d+3}{7}\)-approximation algorithm for \(d \in \{3,4,5\}\) which improves on the known bound of 2 for the unbounded preference list case. In the second variant of sri, called d-srti, preference lists can include ties and are of length at most d. We show that the problem of deciding whether an instance of d-srti admits a stable matching is NP-complete even if \(d = 3\). We also consider the "most stable" version of this problem and prove a strong inapproximability bound for the \(d = 3\) case. However for \(d = 2\) we show that the latter problem can be solved in polynomial time.
Friedrich, Tobias; Göbel, Andreas; Neumann, Frank; Quinzan, Francesco; Rothenberger, Ralf Greedy Maximization of Functions with Bounded Curvature Under Partition Matroid Constraints. Conference on Artificial Intelligence (AAAI) 2019: 2272–2279
We investigate the performance of a deterministic GREEDY algorithm for the problem of maximizing functions under a partition matroid constraint. We consider non-monotone submodular functions and monotone subadditive functions. Even though constrained maximization problems of monotone submodular functions have been extensively studied, little is known about greedy maximization of non-monotone submodular functions or monotone subadditive functions. We give approximation guarantees for GREEDY on these problems, in terms of the curvature. We find that this simple heuristic yields a strong approximation guarantee on a broad class of functions. We discuss the applicability of our results to three real-world problems: Maximizing the determinant function of a positive semidefinite matrix, and related problems such as the maximum entropy sampling problem, the constrained maximum cut problem on directed graphs, and combinatorial auction games. We conclude that GREEDY is well-suited to approach these problems. Overall, we present evidence to support the idea that, when dealing with constrained maximization problems with bounded curvature, one need not search for (approximate) monotonicity to get good approximate solutions.
Roostapour, Vahid; Neumann, Aneta; Neumann, Frank; Friedrich, Tobias Pareto Optimization for Subset Selection with Dynamic Cost Constraints. Conference on Artificial Intelligence (AAAI) 2019: 2354–2361
In this paper, we consider subset selection problems for functions \(f\) with constraints where the constraint bound \(B\) changes over time. We point out that adaptive variants of greedy approaches commonly used in the area of submodular optimization are not able to maintain their approximation quality. Investigating the recently introduced POMC Pareto optimization approach, we show that this algorithm efficiently computes a \( \phi= (\alpha_f/2)(1-\frac{1}{e^{\alpha_f}})\)-approximation, where \(\alpha_f\) is the submodularity ratio, for each possible constraint bound \(b \leq B\). Furthermore, we show that POMC is able to adapt its set of solutions quickly in the case that \(B\) increases. Our experimental investigations for the influence maximization in social networks show the advantage of POMC over generalized greedy algorithms.
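The following is a schematic sketch of a POMC/GSEMO-style Pareto optimization loop for subset selection, given only to make the algorithmic idea concrete; the acceptance rule, the handling of infeasible offspring, and the objective/cost callbacks `f` and `c` are simplifying assumptions rather than the authors' exact implementation.

```python
# Hedged sketch of Pareto optimization (POMC/GSEMO style) for subset selection.
import random

def dominates(a, b):
    """a dominates b on (objective, cost): at least as good in both, strictly better in one."""
    (fa, ca), (fb, cb) = a, b
    return fa >= fb and ca <= cb and (fa > fb or ca < cb)

def pomc(f, c, n, B, iterations=10000):
    population = {tuple([0] * n): (f([0] * n), c([0] * n))}   # start from the empty set
    for _ in range(iterations):
        parent = list(random.choice(list(population)))
        child = [b ^ (random.random() < 1.0 / n) for b in parent]  # standard bit mutation
        key, val = tuple(child), (f(child), c(child))
        if val[1] <= B and not any(dominates(v, val) for v in population.values()):
            population = {k: v for k, v in population.items() if not dominates(val, v)}
            population[key] = val
    # Best feasible solution for the current bound B; the trade-off front also covers b <= B.
    return max(population, key=lambda k: population[k][0])
```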
Feldotto, Matthias; Lenzner, Pascal; Molitor, Louise; Skopalik, Alexander From Hotelling to Load Balancing: Approximation and the Principle of Minimum Differentiation. Autonomous Agents and Multiagent Systems (AAMAS) 2019: 1949–1951
Competing firms tend to select similar locations for their stores. This phenomenon, called the principle of minimum differentiation, was captured by Hotelling with a landmark model of spatial competition but is still the object of an ongoing scientific debate. Although consistently observed in practice, many more realistic variants of Hotelling's model fail to support minimum differentiation or do not have pure equilibria at all. In particular, it was recently proven for a generalized model which incorporates negative network externalities and which contains Hotelling's model and classical selfish load balancing as special cases, that the unique equilibria do not adhere to minimum differentiation. Furthermore, it was shown that for a significant parameter range pure equilibria do not exist. We derive a sharp contrast to these previous results by investigating Hotelling's model with negative network externalities from an entirely new angle: approximate pure subgame perfect equilibria. This approach allows us to prove, analytically and via agent-based simulations, that approximate equilibria with good approximation guarantees that adhere to minimum differentiation exist for the full parameter range of the model. Moreover, we show that the obtained approximate equilibria have high social welfare.
Bläsius, Thomas; Friedrich, Tobias; Lischeid, Julius; Meeks, Kitty; Schirneck, Martin Efficiently Enumerating Hitting Sets of Hypergraphs Arising in Data Profiling. Algorithm Engineering and Experiments (ALENEX) 2019: 130–143
We devise an enumeration method for inclusion-wise minimal hitting sets in hypergraphs. It has delay \(O(m^{k^\ast+1} \cdot n^2)\) and uses linear space. Hereby, \(n\) is the number of vertices, \(m\) the number of hyperedges, and \(k^\ast\) the rank of the transversal hypergraph. In particular, on classes of hypergraphs for which the cardinality \(k^\ast\) of the largest minimal hitting set is bounded, the delay is polynomial. The algorithm solves the extension problem for minimal hitting sets as a subroutine. We show that the extension problem is W[3]-complete when parameterised by the cardinality of the set which is to be extended. For the subroutine, we give an algorithm that is optimal under the exponential time hypothesis. Despite these lower bounds, we provide empirical evidence showing that the enumeration outperforms the theoretical worst-case guarantee on hypergraphs arising in the profiling of relational databases, namely, in the detection of unique column combinations.
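To make the enumerated objects concrete, here is a small helper that checks inclusion-wise minimality of a hitting set; this is just the definition spelled out in code, not the delay-bounded enumeration algorithm of the paper.

```python
# A hitting set S is inclusion-wise minimal iff every vertex in S is the unique hitter of some edge.
def is_minimal_hitting_set(S, hyperedges):
    S = set(S)
    if any(not (e & S) for e in map(set, hyperedges)):
        return False                      # S misses some hyperedge
    for v in S:
        if not any(e & S == {v} for e in map(set, hyperedges)):
            return False                  # v is redundant: S - {v} still hits everything
    return True

# Example with edges {1,2}, {2,3}, {3,4}:
print(is_minimal_hitting_set({2, 3}, [{1, 2}, {2, 3}, {3, 4}]))     # True
print(is_minimal_hitting_set({1, 2, 3}, [{1, 2}, {2, 3}, {3, 4}]))  # False (vertex 1 is redundant)
```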
Fokina, Ekaterina; Kötzing, Timo; San Mauro, Luca Limit Learning Equivalence Structures. Algorithmic Learning Theory (ALT) 2019: 383–403
While most research in Gold-style learning focuses on learning formal languages, we consider the identification of computable structures, specifically equivalence structures. In our core model the learner gets more and more information about which pairs of elements of a structure are related and which are not. The aim of the learner is to find (an effective description of) the isomorphism type of the structure presented in the limit. In accordance with language learning we call this learning criterion InfEx-learning (explanatory learning from informant). We start with a discussion and separations of different variants of this learning criterion, including learning from text (where the only information provided is which elements are related, and not which elements are not related) and finite learning (where the first actual conjecture of the learner has to be correct). This gives first intuitions and examples for what (classes of) structures are learnable and which are not. Our main contribution is a complete characterization of the learnable classes of structures in terms of a combinatorial property of the classes. This property allows us to derive a bound of \(\mathbf{0''}\) on the computational complexity required to learn uniformly enumerable classes of structures. Finally, we show how learning classes of structures relates to learning classes of languages by mapping learning tasks for structures to equivalent learning tasks for languages.
Casel, Katrin; Fernau, Henning; Khosravian Ghadikolaei, Mehdi; Monnot, Jerome; Sikora, Florian Extension of vertex cover and independent set in some classes of graphs and generalizations. International Conference on Algorithms and Complexity (CIAC) 2019: 124–136
We study extension variants of the classical problems Vertex Cover and Independent Set. Given a graph \(G = (V, E)\) and a vertex set \(U \subseteq V\), it is asked if there exists a minimal vertex cover (resp. maximal independent set) \(S\) with \(U \subseteq S\) (resp. \(U \supseteq S\)). Possibly contradicting intuition, these problems tend to be NP-complete, even in graph classes where the classical problem can be solved efficiently. Yet, we exhibit some graph classes where the extension variant remains polynomial-time solvable. We also study the parameterized complexity of these problems, with parameter \(|U|\), as well as the optimality of simple exact algorithms under ETH. All these complexity considerations are also carried out in very restricted scenarios, be it degree or topological restrictions (bipartite, planar or chordal graphs). This also motivates presenting some explicit branching algorithms for degree-bounded instances. We further discuss the price of extension, measuring the distance of \(U\) to the closest set that can be extended, which results in natural optimization problems related to extension problems for which we discuss polynomial-time approximability.
Bläsius, Thomas; Friedrich, Tobias; Katzmann, Maximilian; Meyer, Ulrich; Penschuck, Manuel; Weyand, Christopher Efficiently Generating Geometric Inhomogeneous and Hyperbolic Random Graphs. European Symposium on Algorithms (ESA) 2019: 21:2–21:14
EATCS Best Paper Award
Hyperbolic random graphs (HRG) and geometric inhomogeneous random graphs (GIRG) are two similar generative network models that were designed to resemble complex real world networks. In particular, they have a power-law degree distribution with controllable exponent \(\beta\), and high clustering that can be controlled via the temperature \(T\). We present the first implementation of an efficient GIRG generator running in expected linear time. Besides varying temperatures, it also supports underlying geometries of higher dimensions. It is capable of generating graphs with ten million edges in under a second on commodity hardware. The algorithm can be adapted to HRGs. Our resulting implementation is the fastest sequential HRG generator, despite the fact that we support non-zero temperatures. Though non-zero temperatures are crucial for many applications, most existing generators are restricted to \(T = 0\). We also support parallelization, although this is not the focus of this paper. Moreover, we note that our generators draw from the correct probability distribution, i.e., they involve no approximation. Besides the generators themselves, we also provide an efficient algorithm to determine the non-trivial dependency between the average degree of the resulting graph and the input parameters of the GIRG model. This makes it possible to specify \(\bar{d}\) as input and obtain a graph with expected average degree \(\bar{d}\). Moreover, we investigate the differences between HRGs and GIRGs, shedding new light on the nature of the relation between the two models. Although HRGs represent, in a certain sense, a special case of the GIRG model, we find that a straight-forward inclusion does not hold in practice. However, the difference is negligible for most use cases.
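For intuition about the model (not the linear-time generator itself), the sketch below naively generates a threshold hyperbolic random graph at temperature \(T = 0\) in quadratic time; the parameter conventions \(\alpha = (\beta - 1)/2\) and \(R = 2 \ln n\) are common choices and are assumptions made here.

```python
# Naive quadratic-time HRG reference sketch (T = 0), for intuition only.
import math
import random

def sample_hrg(n, beta=2.5, R=None):
    alpha = (beta - 1) / 2
    R = R if R is not None else 2 * math.log(n)
    pts = []
    for _ in range(n):
        theta = random.uniform(0, 2 * math.pi)
        # Inverse-CDF sampling of the radius with density proportional to sinh(alpha * r).
        r = math.acosh(1 + (math.cosh(alpha * R) - 1) * random.random()) / alpha
        pts.append((r, theta))
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            (r1, t1), (r2, t2) = pts[i], pts[j]
            # Hyperbolic distance via the hyperbolic law of cosines.
            d = math.acosh(max(1.0, math.cosh(r1) * math.cosh(r2)
                               - math.sinh(r1) * math.sinh(r2) * math.cos(t1 - t2)))
            if d <= R:                     # connect geometrically close pairs
                edges.append((i, j))
    return pts, edges
```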
Casel, Katrin; Fernau, Henning; Khosravian Ghadikolaei, Mehdi; Monnot, Jerome; Sikora, Florian Extension of some edge graph problems: standard and parameterized complexity. Fundamentals of Computation Theory (FCT) 2019: 185–200
We consider extension variants of some edge optimization problems in graphs containing the classical Edge Cover, Matching, and Edge Dominating Set problems. Given a graph \(G=(V,E)\) and an edge set \(U \subseteq E\), it is asked whether there exists an inclusion-wise minimal (resp., maximal) feasible solution \(E'\) which satisfies a given property, for instance, being an edge dominating set (resp., a matching) and containing the forced edge set \(U\) (resp., avoiding any edges from the forbidden edge set \(E \setminus U\)). We present hardness results for these problems, for restricted instances such as bipartite or planar graphs. We counter-balance these negative results with parameterized complexity results. We also consider the price of extension, a natural optimization problem variant of extension problems, leading to some approximation results.
Doerr, Benjamin; Kötzing, Timo Multiplicative Up-Drift. Genetic and Evolutionary Computation Conference (GECCO) 2019
Drift analysis aims at translating the expected progress of an evolutionary algorithm (or more generally, a random process) into a probabilistic guarantee on its run time (hitting time). So far, drift arguments have been successfully employed in the rigorous analysis of evolutionary algorithms, however, only for the situation that the progress is constant or becomes weaker when approaching the target. Motivated by questions like how fast fit individuals take over a population, we analyze random processes exhibiting a multiplicative growth in expectation. We prove a drift theorem translating this expected progress into a hitting time. This drift theorem gives a simple and insightful proof of the level-based theorem first proposed by Lehre (2011). Our version of this theorem has, for the first time, the best-possible linear dependence on the growth parameter \(\delta\) (the previous-best was quadratic). This gives immediately stronger run time guarantees for a number of applications.
Alon, Noga; Chechik, Shiri; Cohen, Sarel Deterministic Combinatorial Replacement Paths and Distance Sensitivity Oracles. International Colloquium on Automata, Languages and Programming (ICALP) 2019: 12:1–12:14
In this work we derandomize two central results in graph algorithms, replacement paths and distance sensitivity oracles (DSOs), matching in both cases the running time of the randomized algorithms. For the replacement paths problem, let \(G = (V,E)\) be a directed unweighted graph with \(n\) vertices and \(m\) edges and let \(P\) be a shortest path from \(s\) to \(t\) in \(G\). The replacement paths problem is to find for every edge \(e\) in \(P\) the shortest path from \(s\) to \(t\) avoiding \(e\). Roditty and Zwick [ICALP 2005] obtained a randomized algorithm with a running time of \(\mathcal{O}(m \sqrt{n})\). Here we provide the first deterministic algorithm for this problem, with the same \(\mathcal{O}(m \sqrt{n})\) time. Due to matching conditional lower bounds of Williams et al. [FOCS 2010], our deterministic combinatorial algorithm for the replacement paths problem is optimal up to polylogarithmic factors (unless the long-standing bound of \(\mathcal{O}(mn)\) for combinatorial Boolean matrix multiplication can be improved). This also implies a deterministic algorithm for the second simple shortest path problem in \(\mathcal{O}(m \sqrt{n})\) time, and a deterministic algorithm for the \(k\)-simple shortest paths problem in \(\mathcal{O}(k m \sqrt{n})\) time (for any integer constant \(k > 0\)). For the problem of distance sensitivity oracles, let \(G = (V,E)\) be a directed graph with real edge weights. An \(f\)-Sensitivity Distance Oracle (\(f\)-DSO) gets as input the graph \(G=(V,E)\) and a parameter \(f\), and preprocesses it into a data structure such that, given a query \((s,t,F)\) with \(s,t \in V\) and a set \(F \subseteq E \cup V\), \(|F| \le f\), of at most \(f\) edges or vertices (failures), the query algorithm efficiently computes the distance from \(s\) to \(t\) in the graph \(G \setminus F\) (i.e., the distance from \(s\) to \(t\) in the graph \(G\) after removing from it the failing edges and vertices \(F\)). For weighted graphs with real edge weights, Weimann and Yuster [FOCS 2010] presented several randomized \(f\)-DSOs. In particular, they presented a combinatorial \(f\)-DSO with \(\mathcal{O}(mn^{4-\alpha})\) preprocessing time and subquadratic \(\mathcal{O}(n^{2-2(1-\alpha)/f})\) query time, giving a tradeoff between preprocessing and query time for every value of \(0 < \alpha < 1\). We derandomize this result and present a combinatorial deterministic \(f\)-DSO with the same asymptotic preprocessing and query time.
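As a point of reference, the naive baseline that the \(\mathcal{O}(m \sqrt{n})\) bound improves upon simply reruns a shortest-path computation once per path edge; a sketch for unweighted graphs (assuming the graph is given as an adjacency-list dictionary) is shown below.

```python
# Naive replacement-paths baseline: one BFS per removed path edge, O(|P| * (n + m)) overall.
from collections import deque

def bfs_dist(adj, s, t, banned=None):
    dist = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        if u == t:
            return dist[u]
        for v in adj[u]:
            if (u, v) != banned and v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return float("inf")

def replacement_paths(adj, path, s, t):
    """For each edge (u, v) on the s-t shortest path, the s-t distance avoiding it."""
    return {(u, v): bfs_dist(adj, s, t, banned=(u, v))
            for u, v in zip(path, path[1:])}
```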
Friedrich, Tobias; Rothenberger, Ralf The Satisfiability Threshold for Non-Uniform Random 2-SAT. International Colloquium on Automata, Languages and Programming (ICALP) 2019: 61:1–61:14
Propositional satisfiability (SAT) is one of the most fundamental problems in computer science. Its worst-case hardness lies at the core of computational complexity theory, for example in the form of NP-hardness and the (Strong) Exponential Time Hypothesis. In practice, however, SAT instances can often be solved efficiently. This contradicting behavior has spawned interest in the average-case analysis of SAT and has triggered the development of sophisticated rigorous and non-rigorous techniques for analyzing random structures. Despite a long line of research and substantial progress, most theoretical work on random SAT assumes a uniform distribution on the variables. In contrast, real-world instances often exhibit large fluctuations in variable occurrence. This can be modeled by a non-uniform distribution of the variables, which can result in distributions closer to industrial SAT instances. We study satisfiability thresholds of non-uniform random 2-SAT with n variables and m clauses and with an arbitrary probability distribution \((p_i)_{i \in [n]}\) with \(p_1 \ge p_2 \ge \dots \ge p_n > 0\) over the n variables. We show for \(p_1^2 = \Theta(\sum_{i=1}^n p_i^2)\) that the asymptotic satisfiability threshold is at \(m = \Theta((1 - \sum_{i=1}^n p_i^2) / (p_1 (\sum_{i=2}^n p_i^2)^{1/2}))\) and that it is coarse. For \(p_1^2 = o( \sum_{i=1}^n p_i^2)\) we show that there is a sharp satisfiability threshold at \(m = (\sum_{i=1}^n p_i^2)^{-1}\). This result generalizes the seminal works by Chvatal and Reed [FOCS 1992] and by Goerdt [JCSS 1996].
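A small sketch of the non-uniform random 2-SAT model described above: each clause draws two distinct variables according to \((p_i)\) and negates each independently with probability \(1/2\). The rejection step for repeated variables is an assumption about how duplicate picks are handled, and the power-law weights in the example are illustrative.

```python
# Hedged generator sketch for non-uniform random 2-SAT (DIMACS-style literals).
import random

def random_2sat(p, m):
    """p: relative variable weights; m: number of clauses."""
    n = len(p)
    clauses = []
    while len(clauses) < m:
        i, j = random.choices(range(n), weights=p, k=2)
        if i == j:
            continue                       # resample until the two variables differ
        lit = lambda v: (v + 1) * random.choice([-1, 1])
        clauses.append((lit(i), lit(j)))
    return clauses

# Example: a power-law-like distribution over 100 variables.
weights = [1.0 / (k + 1) for k in range(100)]
formula = random_2sat(weights, m=300)
```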
Casel, Katrin; Day, Joel D.; Fleischmann, Pamela; Kociumaka, Tomasz; Manea, Florin; Schmid, Markus L. Graph and String Parameters: Connections Between Pathwidth, Cutwidth and the Locality Number. International Colloquium on Automata, Languages and Programming (ICALP) 2019: 109:1–109:16
We investigate the locality number, a recently introduced structural parameter for strings (with applications in pattern matching with variables), and its connection to two important graph parameters, cutwidth and pathwidth. These connections allow us to show that computing the locality number is NP-hard but fixed-parameter tractable (when the locality number or the alphabet size is treated as a parameter), and can be approximated with ratio \(O(\sqrt{\log \mathrm{opt}} \cdot \log n)\). As a by-product, we also relate cutwidth via the locality number to pathwidth, which is of independent interest, since it improves the currently best known approximation algorithm for cutwidth. In addition to these main results, we also consider the possibility of greedy-based approximation algorithms for the locality number.
Peters, Jannik; Stephan, Daniel; Amon, Isabel; Gawendowicz, Hans; Lischeid, Julius; Salabarria, Julius; Umland, Jonas; Werner, Felix; Krejca, Martin S.; Rothenberger, Ralf; Kötzing, Timo; Friedrich, Tobias Mixed Integer Programming versus Evolutionary Computation for Optimizing a Hard Real-World Staff Assignment Problem. International Conference on Automated Planning and Scheduling (ICAPS) 2019: 541–554
Assigning staff to engagements according to hard constraints while optimizing several objectives is a task encountered by many companies on a regular basis. Simplified versions of such assignment problems are NP-hard. Despite this, a typical approach to solving them consists of formulating them as mixed integer programming (MIP) problems and using a state-of-the-art solver to get solutions that closely approximate the optimum. In this paper, we consider a complex real-world staff assignment problem encountered by the professional service company KPMG, with the goal of finding an algorithm that solves it faster and with a better solution than a commercial MIP solver. We follow the evolutionary algorithm (EA) metaheuristic and design a search heuristic which iteratively improves a solution using domain-specific mutation operators. Furthermore, we use a flow algorithm to optimally solve a subproblem, which tremendously reduces the search space for the EA. For our real-world instance of the assignment problem, given the same total time budget of \(100\) hours, a parallel EA approach finds a solution that is only \(1.7\)\% away from an upper bound for the (unknown) optimum within under five hours, while the MIP solver Gurobi still has a gap of \(10.5\) \%.
Friedrich, Tobias; Rothenberger, Ralf Sharpness of the Satisfiability Threshold for Non-Uniform Random \(k\)-SAT. International Joint Conference on Artificial Intelligence (IJCAI) 2019: 6151–6155
We study non-uniform random k-SAT on n variables with an arbitrary probability distribution p on the variable occurrences. The number \(t = t(n)\) of randomly drawn clauses at which random formulas go from asymptotically almost surely (a.a.s.) satisfiable to a.a.s. unsatisfiable is called the satisfiability threshold. Such a threshold is called sharp if it approaches a step function as n increases. We show that a threshold t(n) for random k-SAT with an ensemble \((p_n)_{n\in\mathbb{N}}\) of arbitrary probability distributions on the variable occurrences is sharp if \(\|p\|_2^2 = O_n(t^{-2/k})\) and \(\|p\|_\infty\) \(= o_n(t^{-k/(2k-1)} \log^{-(k-1)/(2k-1)}(t))\). This result generalizes Friedgut’s sharpness result from uniform to non-uniform random k-SAT and implies sharpness for thresholds of a wide range of random k-SAT models with heterogeneous probability distributions, for example such models where the variable probabilities follow a power-law distribution.
Gao, Ziyuan; Jain, Sanjay; Khoussainov, Bakhadyr; Li, Wei; Melnikov, Alexander; Seidel, Karen; Stephan, Frank Random Subgroups of Rationals. Mathematical Foundations of Computer Science (MFCS) 2019: 25:1–25:14
This paper introduces and studies a notion of algorithmic randomness for subgroups of rationals. Given a randomly generated additive subgroup \((G,+)\) of rationals, two main questions are addressed: first, what are the model-theoretic and recursion-theoretic properties of \((G,+)\); second, what learnability properties can one extract from \(G\) and its subclass of finitely generated subgroups? For the first question, it is shown that the theory of \((G,+)\) coincides with that of the additive group of integers and is therefore decidable; furthermore, while the word problem for \(G\) with respect to any generating sequence for \(G\) is not even semi-decidable, one can build a generating sequence \(\beta\) such that the word problem for \(G\) with respect to \(\beta\) is co-recursively enumerable (assuming that the set of generators of \(G\) is limit-recursive). In regard to the second question, it is proven that there is a generating sequence \(\beta\) for \(G\) such that every non-trivial finitely generated subgroup of \(G\) is recursively enumerable and the class of all such subgroups of \(G\) is behaviourally correctly learnable, that is, every non-trivial finitely generated subgroup can be semantically identified in the limit (again assuming that the set of generators of \(G\) is limit-recursive). On the other hand, the class of non-trivial finitely generated subgroups of \(G\) cannot be syntactically identified in the limit with respect to any generating sequence for \(G\). The present work thus contributes to a recent line of research studying algorithmically random infinite structures and uncovers an interesting connection between the arithmetical complexity of the set of generators of a randomly generated subgroup of rationals and the learnability of its finitely generated subgroups.
Chechik, Shiri; Cohen, Sarel Near Optimal Algorithms For The Single Source Replacement Paths Problem. Symposium on Discrete Algorithms (SODA) 2019: 2090–2109
The Single Source Replacement Paths (SSRP) problem is as follows: given a graph \(G = (V, E)\), a source vertex \(s\) and a shortest paths tree \(T_s\) rooted in \(s\), output for every vertex \(t \in V\) and for every edge \(e\) in \(T_s\) the length of the shortest path from \(s\) to \(t\) avoiding \(e\). We present near-optimal upper bounds by providing an \(\tilde{\mathcal{O}}(m \sqrt{n} + n^2)\)-time randomized combinatorial algorithm for unweighted undirected graphs, and matching conditional lower bounds for the SSRP problem.
Bilò, Davide; Friedrich, Tobias; Lenzner, Pascal; Melnichenko, Anna Geometric Network Creation Games. Symposium on Parallelism in Algorithms and Architectures (SPAA) 2019: 323–332
Network Creation Games are a well-known approach for explaining and analyzing the structure, quality and dynamics of real-world networks like the Internet and other infrastructure networks which evolved via the interaction of selfish agents without a central authority. In these games, selfish agents, which correspond to nodes in a network, strategically buy incident edges to improve their centrality. However, past research on these games has only considered the creation of networks with unit-weight edges. In practice, e.g. when constructing a fiber-optic network, the choice of which nodes to connect and also the induced price for a link crucially depends on the distance between the involved nodes and such settings can be modeled via edge-weighted graphs. We incorporate arbitrary edge weights by generalizing the well-known model by Fabrikant et al. [PODC'03] to edge-weighted host graphs and focus on the geometric setting where the weights are induced by the distances in some metric space. In stark contrast to the state-of-the-art for the unit-weight version, where the Price of Anarchy is conjectured to be constant and where resolving this is a major open problem, we prove a tight non-constant bound on the Price of Anarchy for the metric version and a slightly weaker upper bound for the non-metric case. Moreover, we analyze the existence of equilibria, the computational hardness and the game dynamics for several natural metrics. The model we propose can be seen as the game-theoretic analogue of a variant of the classical Network Design Problem. Thus, low-cost equilibria of our game correspond to decentralized and stable approximations of the optimum network design.
Friedrich, Tobias From Graph Theory to Network Science: The Natural Emergence of Hyperbolicity. Symposium on Theoretical Aspects of Computer Science (STACS) 2019: 5:1–5:9
Network science is driven by the question which properties large real-world networks have and how we can exploit them algorithmically. In the past few years, hyperbolic graphs have emerged as a very promising model for scale-free networks. The connection between hyperbolic geometry and complex networks gives insights in both directions: (1) Hyperbolic geometry forms the basis of a natural and explanatory model for real-world networks. Hyperbolic random graphs are obtained by choosing random points in the hyperbolic plane and connecting pairs of points that are geometrically close. The resulting networks share many structural properties for example with online social networks like Facebook or Twitter. They are thus well suited for algorithmic analyses in a more realistic setting. (2) Starting with a real-world network, hyperbolic geometry is well-suited for metric embeddings. The vertices of a network can be mapped to points in this geometry, such that geometric distances are similar to graph distances. Such embeddings have a variety of algorithmic applications ranging from approximations based on efficient geometric algorithms to greedy routing solely using hyperbolic coordinates for navigation decisions.
Cseh, Ágnes; Juhos, Attila Pairwise Preferences in the Stable Marriage Problem. Symposium on Theoretical Aspects of Computer Science (STACS) 2019: 21:1–21:16
We study the classical, two-sided stable marriage problem under pairwise preferences. In the most general setting, agents are allowed to express their preferences as comparisons of any two of their edges and they also have the right to declare a draw or even withdraw from such a comparison. This freedom is then gradually restricted as we specify six stages of orderedness in the preferences, ending with the classical case of strictly ordered lists. We study all cases occurring when combining the three known notions of stability - weak, strong and super-stability - under the assumption that each side of the bipartite market obtains one of the six degrees of orderedness. By designing three polynomial algorithms and two NP-completeness proofs we determine the complexity of all cases not yet known, and thus give an exact boundary in terms of preference structure between tractable and intractable cases.
Bläsius, Thomas; Friedrich, Tobias; Sutton, Andrew M. On the Empirical Time Complexity of Scale-Free 3-SAT at the Phase Transition. Tools and Algorithms for the Construction and Analysis of Systems (TACAS) 2019: 117–134
The hardness of formulas at the solubility phase transition of random propositional satisfiability (SAT) has been intensely studied for decades both empirically and theoretically. Solvers based on stochastic local search (SLS) appear to scale very well at the critical threshold, while complete backtracking solvers exhibit exponential scaling. On industrial SAT instances, this phenomenon is inverted: backtracking solvers can tackle large industrial problems, where SLS-based solvers appear to stall. Industrial instances exhibit sharply different structure than uniform random instances. Among many other properties, they are often heterogeneous in the sense that some variables appear in many while others appear in only few clauses. We conjecture that the heterogeneity of SAT formulas alone already contributes to the trade-off in performance between SLS solvers and complete backtracking solvers. We empirically determine how the run time of SLS vs. backtracking solvers depends on the heterogeneity of the input, which is controlled by drawing variables according to a scale-free distribution. Our experiments reveal that the efficiency of complete solvers at the phase transition is strongly related to the heterogeneity of the degree distribution. We report results that suggest the depth of satisfying assignments in complete search trees is influenced by the level of heterogeneity as measured by a power-law exponent. We also find that incomplete SLS solvers, which scale well on uniform instances, are not affected by heterogeneity. The main contribution of this paper utilizes the scale-free random 3-SAT model to isolate heterogeneity as an important factor in the scaling discrepancy between complete and SLS solvers at the uniform phase transition found in previous works.
Bläsius, Thomas; Fischbeck, Philipp; Friedrich, Tobias; Schirneck, Martin Understanding the Effectiveness of Data Reduction in Public Transportation Networks. Workshop on Algorithms and Models for the Web Graph (WAW) 2019: 87–101
Given a public transportation network of stations and connections, we want to find a minimum subset of stations such that each connection runs through a selected station. Although this problem is NP-hard in general, real-world instances are regularly solved almost completely by a set of simple reduction rules. To explain this behavior, we view transportation networks as hitting set instances and identify two characteristic properties, locality and heterogeneity. We then devise a randomized model to generate hitting set instances with adjustable properties. While the heterogeneity does influence the effectiveness of the reduction rules, the generated instances show that locality is the significant factor. Beyond that, we prove that the effectiveness of the reduction rules is independent of the underlying graph structure. Finally, we show that high locality is also prevalent in instances from other domains, facilitating a fast computation of minimum hitting sets.
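Two textbook-style hitting set reduction rules of the kind referred to above are sketched below as a hedged illustration (the paper's exact rule set may differ): unit edges force their vertex into the solution, and edges containing another edge are redundant.

```python
# Hedged sketch of simple hitting set reduction rules (not necessarily the paper's rules).
def reduce_hitting_set(edges):
    edges = [set(e) for e in edges]
    forced = set()
    changed = True
    while changed:
        changed = False
        # Unit rule: a one-vertex edge forces that vertex; all edges it hits are satisfied.
        units = [e for e in edges if len(e) == 1]
        if units:
            v = next(iter(units[0]))
            forced.add(v)
            edges = [e for e in edges if v not in e]
            changed = True
            continue
        # Edge domination: an edge that is a strict superset of another edge is redundant.
        dominated = [e for e in edges if any(f < e for f in edges)]
        if dominated:
            edges.remove(dominated[0])
            changed = True
    return forced, edges

# Example: {b} forces b (also hitting {a, b}); {a, c, d} is dominated by {c, d}.
print(reduce_hitting_set([{"b"}, {"a", "b"}, {"c", "d"}, {"a", "c", "d"}]))
# ({'b'}, [{'c', 'd'}])
```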
Echzell, Hagen; Friedrich, Tobias; Lenzner, Pascal; Molitor, Louise; Pappik, Marcus; Schöne, Friedrich; Sommer, Fabian; Stangl, David Convergence and Hardness of Strategic Schelling Segregation. Web and Internet Economics (WINE) 2019: 156–170
The phenomenon of residential segregation was captured by Schelling's famous segregation model where two types of agents are placed on a grid and an agent is content with her location if the fraction of her neighbors which have the same type as her is at least \(\tau\), for some \(0<\tau<1\). Discontent agents simply swap their location with a randomly chosen other discontent agent or jump to a random empty cell. We analyze a generalized game-theoretic model of Schelling segregation which allows more than two agent types and more general underlying graphs modeling the residential area. For this we show that both aspects heavily influence the dynamic properties and the tractability of finding an optimal placement. We map the boundary of when improving response dynamics (IRD), i.e., the natural approach for finding equilibrium states, are guaranteed to converge. For this we prove several sharp threshold results where guaranteed IRD convergence suddenly turns into the strongest possible non-convergence result: a violation of weak acyclicity. In particular, we show such threshold results also for Schelling's original model, which is in contrast to the standard assumption in many empirical papers. Furthermore, we show that, in case of convergence, IRD find an equilibrium in \(O(m)\) steps, where \(m\) is the number of edges in the underlying graph, and that this bound is met in empirical simulations starting from random initial agent placements.
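A minimal sketch of the contentedness test in the model above is given here for concreteness; the Moore neighborhood on a grid and the treatment of agents without occupied neighbors are illustrative assumptions rather than the generalized graph model studied in the paper.

```python
# Hedged sketch: is the agent at (y, x) content with threshold tau? (None marks empty cells.)
def is_content(grid, y, x, tau):
    own = grid[y][x]
    same = occupied = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if (dy, dx) == (0, 0):
                continue
            ny, nx = y + dy, x + dx
            if 0 <= ny < len(grid) and 0 <= nx < len(grid[0]) and grid[ny][nx] is not None:
                occupied += 1
                same += grid[ny][nx] == own
    # Convention assumed here: an agent with no occupied neighbors counts as content.
    return occupied == 0 or same / occupied >= tau
```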
FOGA ’19: Proceedings of the 15th ACM/SIGEVO Conference on Foundations of Genetic Algorithms. ACM 2019
Editorship
The FOGA workshop series aims at advancing our understanding of the working principles behind evolutionary algorithms and related randomized search heuristics. FOGA is the premier event to discuss advances in the theoretical foundations of these algorithms, tools needed to analyze them, and different aspects of comparing algorithms' performance. FOGA XV has welcomed submissions covering the entire spectrum of work, ranging from rigorously derived mathematical results to carefully crafted empirical studies.
Theory of Randomized Optimization Heuristics. Dagstuhl Reports 2019: 61–94
This report documents the activities of Dagstuhl Seminar 19431 on Theory of Randomized Optimization Heuristics. 46 researchers from Europe, Australia, Asia, and North America have come together to discuss ongoing research. This tenth edition of the seminar series had three focus topics: (1) relation between optimal control and heuristic optimization, (2) benchmarking optimization heuristics, and (3) the interfaces between continuous and discrete optimization. Several breakout sessions have provided ample opportunity to brainstorm on recent developments in the research landscape, to discuss and solve open problems, and to kick-start new research initiatives.
Bläsius, Thomas; Friedrich, Tobias; Schirneck, Martin The Minimization of Random Hypergraphs. CoRR 2019
ArXiv preprint
We investigate the maximum-entropy model \(\mathcal{B}_{n,m,p}\) for random \(n\)-vertex, \(m\)-edge multi-hypergraphs with expected edge size \(pn\). We show that the expected size of the minimization of \(\mathcal{B}_{n,m,p}\), i.e., the number of its inclusion-wise minimal edges, undergoes a phase transition with respect to \(m\). If \(m\) is at most \(1/(1-p)^{(1-p)n}\), then the minimization is of size \(\Theta(m)\). Beyond that point, for \(\alpha\) such that \(m = 1/(1-p)^{\alpha n}\) and \(\mathrm{H}\) being the entropy function, it is \(\Theta(1) \cdot \min\left(1, \, \frac{1}{(\alpha-(1-p)) \sqrt{(1-\alpha) n}} \right) \cdot 2^{(\mathrm{H}(\alpha) + (1-\alpha) \log_2 p) n}\). This implies that the maximum expected size over all \(m\) is \(\Theta((1+p)^n/\sqrt{n})\). Our structural findings have algorithmic implications for minimizing an input hypergraph, which in turn has applications in the profiling of relational databases as well as for the Orthogonal Vectors problem studied in fine-grained complexity. The main technical tool is an improvement of the Chernoff–Hoeffding inequality, which we make tight up to constant factors. We show that for a binomial variable \(X \sim \mathrm{Bin}(n,p)\) and real number \(0 < x \le p\), it holds that \(\mathrm{P}[X \leq xn] = \Theta(1) \cdot \min\left(1, \, \frac{1}{(p-x) \sqrt{xn}} \right) \cdot 2^{-\mathrm{D}(x \,{\|}\, p) n}\), where \(\mathrm{D}\) denotes the Kullback–Leibler divergence between Bernoulli distributions. The result remains true if \(x\) depends on \(n\) as long as it is bounded away from \(0\).
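For completeness, the Kullback–Leibler divergence between Bernoulli distributions used above is \(\mathrm{D}(x \,\|\, p) = x \log_2 \frac{x}{p} + (1-x) \log_2 \frac{1-x}{1-p}\), with base-2 logarithms matching the \(2^{-\mathrm{D}(x \,\|\, p) n}\) notation; in particular, \(\mathrm{D}(x \,\|\, p) = 0\) exactly when \(x = p\), so the tail probability decays exponentially in \(n\) whenever \(x\) is bounded away from \(p\).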