Dr. Martin Schirneck

I recently moved to a new research group:

Theory and Application of Algorithms
University of Vienna

You can reach me there via martin.schirneck(at)univie.ac.at.

Research Interests

My research interests include various topics in mathematics and theoretical computer science.
I am currently working on the following subjects.

fault-tolerant data structures
data profiling
enumeration algorithms and complexity
parameterized complexity
evolutionary computation and black-box complexity

I have been involved in past and ongoing research projects in the Algorithm Engineering group. These projects include scientific work with our students as well as collaborations with partners in industry.

A short CV can be found here.

Talks

In October 2018, I was invited to the Dagstuhl seminar on Algorithmic Enumeration to give a talk about Efficiently Enumerating Hitting Sets of Hypergraphs Arising in Data Profiling (slides).

I also had the opportunity to present my work at the Friedrich Schiller University Jena, the University of Glasgow, as well as the Humboldt University and Technical University Berlin. My contributed talks include presentations at ALENEX, ALT, ESA, FOGA, GECCO, IPEC, ITCS, MFCS, STACS, STOC, VLDB, and WEPA.

My ESA, ITCS, and STOC talks are available online.

STOC 2023 Approximate Distance Sensitivity Oracles in Subquadratic Space (slides)
ITCS 2022 Fixed-Parameter Sensitivity Oracles (slides)
ESA 2021 Near-Optimal Deterministic Single-Source Distance Sensitivity Oracles (slides)
ESA 2020 The Minimization of Random Hypergraphs (slides)

Teaching

Other Activities

In the Algorithm Engineering group, I am one of the mentors for new members. I also maintain the group's news feed (RSS) and handle some of the TYPO3 content management for our websites. Besides my studies, I try to improve the quality of articles in the German Wikipedia, especially in mathematics and computer science.

In 2015, I was a tutor at the HPI Schülerkolleg teaching school children basic computer science.

I played Go as a member of the team Jena III in the German Bundesliga in the 2014/15 season.

Publications

A list of my publications can be found here and on DBLP.

Bilò, Davide; Chechik, Shiri; Choudhary, Keerti; Cohen, Sarel; Friedrich, Tobias; Krogmann, Simon; Schirneck, Martin. Approximate Distance Sensitivity Oracles in Subquadratic Space. TheoretiCS 2024

Bilò, Davide; Choudhary, Keerti; Cohen, Sarel; Friedrich, Tobias; Krogmann, Simon; Schirneck, Martin. Fault-Tolerant ST-Diameter Oracles. International Colloquium on Automata, Languages and Programming (ICALP) 2023: 24:1–24:20

@inproceedings{bilo2023faulttolerant,
  abstract = {We study the problem of estimating the \(ST\)-diameter of a graph that is subject to a bounded number of edge failures. An \(f\)-edge fault-tolerant \(ST\)-diameter oracle (\(f\)-FDO-\(ST\)) is a data structure that preprocesses a given graph \(G\), two sets of vertices \(S,T\), and a positive integer \(f\). When queried with a set \(F\) of at most \(f\) edges, the oracle returns an estimate \(\widehat{D}\) of the \(ST\)-diameter \(\mathrm{diam}(G-F,S,T)\), the maximum distance between vertices in \(S\) and \(T\) in \(G-F\). The oracle has stretch \(\sigma \geq 1\) if \(\mathrm{diam}(G-F,S,T) \leq \widehat{D} \leq \sigma \mathrm{diam}(G-F,S,T)\). If \(S\) and \(T\) both contain all vertices, the data structure is called an \(f\)-edge fault-tolerant diameter oracle (\(f\)-FDO). An \(f\)-edge fault-tolerant distance sensitivity oracle (\(f\)-DSO) estimates the pairwise graph distances under up to \(f\) failures. We design new \(f\)-FDOs and \(f\)-FDO-\(ST\)s by reducing their construction to that of all-pairs and single-source \(f\)-DSOs. We obtain several new tradeoffs between the size of the data structure, stretch guarantee, query and preprocessing times for diameter oracles by combining our black-box reductions with known results from the literature. We also provide an information-theoretic lower bound on the space requirement of approximate \(f\)-FDOs. We show that there exists a family of graphs for which any \(f\)-FDO with sensitivity \(f \ge 2\) and stretch less than \(5/3\) requires \(\Omega(n^{3/2})\) bits of space, regardless of the query time.},
  author = {Bilò, Davide and Choudhary, Keerti and Cohen, Sarel and Friedrich, Tobias and Krogmann, Simon and Schirneck, Martin},
  booktitle = {International Colloquium on Automata, Languages and Programming (ICALP)},
  keywords = {sarelcohen davidebilo tobiasfriedrich year2023 icalp keertichoudhary simonkrogmann martinschirneck},
  pages = {24:1-24:20},
  title = {Fault-Tolerant ST-Diameter Oracles},
  year = 2023
}
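To make the definitions in this abstract concrete, here is a minimal Python sketch. It is not the oracle from the paper. It computes diam(G-F, S, T) from scratch by breadth-first search in an unweighted, undirected graph and checks the stretch condition. This recomputation is the trivial baseline that an f-FDO-ST is meant to beat. The adjacency-dict representation and the function names are illustrative assumptions.

```python
from collections import deque

def bfs_distances(adj, source, failed):
    """Hop distances from source in an undirected graph, skipping failed edges."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            # Failed edges are stored as frozensets, so {u, v} and {v, u} coincide.
            if frozenset((u, v)) in failed or v in dist:
                continue
            dist[v] = dist[u] + 1
            queue.append(v)
    return dist

def st_diameter(adj, S, T, failed=frozenset()):
    """diam(G - F, S, T): maximum distance between s in S and t in T in G - F."""
    best = 0
    for s in S:
        dist = bfs_distances(adj, s, failed)
        for t in T:
            if t not in dist:
                return float("inf")  # some S-T pair is disconnected in G - F
            best = max(best, dist[t])
    return best

def within_stretch(estimate, exact, sigma):
    """Stretch condition: diam <= estimate <= sigma * diam."""
    return exact <= estimate <= sigma * exact
```

For example, take a 6-cycle with S = {0} and T = {2}. The ST-diameter is 2, and removing the edge {1, 2} raises it to 4. Passing S = T = all vertices gives the ordinary diameter, matching the f-FDO special case.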

Bilò, Davide; Choudhary, Keerti; Cohen, Sarel; Friedrich, Tobias; Krogmann, Simon; Schirneck, Martin. Compact Distance Oracles with Large Sensitivity and Low Stretch. Algorithms and Data Structures Symposium (WADS) 2023: 149–163

@inproceedings{bilo2023compact,
  abstract = {An \(f\)-edge fault-tolerant distance sensitivity oracle (\(f\)-DSO) with stretch \(\sigma \geq 1\) is a data structure that preprocesses an input graph \(G = (V,E)\). When queried with the triple \((s,t,F)\), where \(s, t \in V\) and \(F \subseteq E\) contains at most \(f\) edges of \(G\), the oracle returns an estimate \(\widehat{d}_{G-F}(s,t)\) of the distance \(d_{G-F}(s,t)\) between \(s\) and \(t\) in the graph \(G-F\) such that \(d_{G-F}(s,t) \leq \widehat{d}_{G-F}(s,t) \leq \sigma \cdot d_{G-F}(s,t)\). For any positive integer \(k \ge 2\) and any \(0 < \alpha < 1\), we present an \(f\)-DSO with sensitivity \(f = o(\log n/\log\log n)\), stretch \(2k-1\), space \(O(n^{1+\frac{1}{k}+\alpha+o(1)})\), and an \(\widetilde{O}(n^{1+\frac{1}{k} - \frac{\alpha}{k(f+1)}})\) query time. Prior to our work, there were only three known \(f\)-DSOs with subquadratic space. The first one by Chechik et al. [Algorithmica 2012] has a stretch of \((8k-2)(f+1)\), depending on \(f\). Another approach is storing an \(f\)-edge fault-tolerant \((2k-1)\)-spanner of \(G\). The bottleneck is the large query time due to the size of any such spanner, which is \(\Omega(n^{1+1/k})\) under the Erdős girth conjecture. Bilò et al. [STOC 2023] gave a solution with stretch \(3+\varepsilon\), query time \(O(n^{\alpha})\) but space \(O(n^{2-\frac{\alpha}{f+1}})\), approaching the quadratic barrier for large sensitivity. In the realm of subquadratic space, our \(f\)-DSOs are the first ones that guarantee, at the same time, large sensitivity, low stretch, and non-trivial query time. To obtain our results, we use the approximate distance oracles of Thorup and Zwick [JACM 2005], and the derandomization of the \(f\)-DSO of Weimann and Yuster [TALG 2013] that was recently given by Karthik and Parter [SODA 2021].},
  author = {Bilò, Davide and Choudhary, Keerti and Cohen, Sarel and Friedrich, Tobias and Krogmann, Simon and Schirneck, Martin},
  booktitle = {Algorithms and Data Structures Symposium (WADS)},
  keywords = {sarelcohen davidebilo tobiasfriedrich year2023 keertichoudhary simonkrogmann martinschirneck wads},
  pages = {149-163},
  title = {Compact Distance Oracles with Large Sensitivity and Low Stretch},
  year = 2023
}
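The abstract mentions storing a fault-tolerant (2k-1)-spanner as one baseline. The sketch below illustrates the underlying notion only. It builds an ordinary, non-fault-tolerant (2k-1)-spanner of an unweighted graph with the classic greedy rule of Althöfer et al.: keep an edge only if the spanner built so far does not already connect its endpoints within 2k-1 hops. This is not the construction used in the paper, and the function names are hypothetical.

```python
from collections import deque

def within_hops(h_adj, u, v, limit):
    """True if the current spanner H connects u and v with at most `limit` edges."""
    if u == v:
        return True
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if dist[x] == limit:
            continue  # do not expand beyond the hop limit
        for y in h_adj[x]:
            if y not in dist:
                if y == v:
                    return True
                dist[y] = dist[x] + 1
                queue.append(y)
    return False

def greedy_spanner(n, edges, k):
    """Greedy (2k-1)-spanner of an unweighted graph on vertices 0..n-1."""
    h_adj = {x: set() for x in range(n)}
    spanner = []
    for u, v in edges:
        # An edge is redundant if its endpoints are already within 2k-1 hops in H.
        if not within_hops(h_adj, u, v, 2 * k - 1):
            h_adj[u].add(v)
            h_adj[v].add(u)
            spanner.append((u, v))
    return spanner
```

Every kept edge closes no cycle of length at most 2k, so the output has girth greater than 2k. This is the standard argument bounding its size by O(n^{1+1/k}) edges. On the complete graph K4 with k = 2, the sketch keeps only a star of three edges.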

Casel, Katrin; Friedrich, Tobias; Schirneck, Martin; Wietheger, Simon. Fair Correlation Clustering in Forests. Foundations of Responsible Computing (FORC) 2023: 9:1–9:12

@inproceedings{case2023correlation,
  abstract = {The study of algorithmic fairness has received growing attention recently. This stems from the awareness that bias in the input data for machine learning systems may result in discriminatory outputs. For clustering tasks, one of the most central notions of fairness is the formalization by Chierichetti, Kumar, Lattanzi, and Vassilvitskii [NeurIPS 2017]. A clustering is said to be fair if each cluster has the same distribution of manifestations of a sensitive attribute as the whole input set. This is motivated by various applications where the objects to be clustered have sensitive attributes that should not be over- or underrepresented. Most research on this version of fair clustering has focused on centroid-based objectives. In contrast, we discuss the applicability of this fairness notion to Correlation Clustering. The existing literature on the resulting Fair Correlation Clustering problem either presents approximation algorithms with poor approximation guarantees or severely limits the possible distributions of the sensitive attribute (often only two manifestations with a 1:1 ratio are considered). Our goal is to understand if there is hope for better results in between these two extremes. To this end, we consider restricted graph classes which allow us to characterize the distributions of sensitive attributes for which this form of fairness is tractable from a complexity point of view. While existing work on Fair Correlation Clustering gives approximation algorithms, we focus on exact solutions and investigate whether there are efficiently solvable instances. The unfair version of Correlation Clustering is trivial on forests, but adding fairness creates a surprisingly rich picture of complexities. We give an overview of the distributions and types of forests where Fair Correlation Clustering turns from tractable to intractable. 
As the most surprising insight, we consider the fact that the cause of the hardness of Fair Correlation Clustering is not the strictness of the fairness condition. We lift most of our results to also hold for the relaxed version of the fairness condition. Instead, the source of hardness seems to be the distribution of the sensitive attribute. On the positive side, we identify some reasonable distributions that are indeed tractable. While this tractability is only shown for forests, it may open an avenue to design reasonable approximations for larger graph classes.},
  author = {Casel, Katrin and Friedrich, Tobias and Schirneck, Martin and Wietheger, Simon},
  booktitle = {Foundations of Responsible Computing (FORC)},
  keywords = {katrincasel simonwietheger forc tobiasfriedrich year2023 martinschirneck},
  pages = {9:1-9:12},
  title = {Fair Correlation Clustering in Forests},
  year = 2023
}
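The fairness notion described in this abstract can be checked directly. The following minimal Python sketch makes two assumptions: every vertex carries exactly one value ("color") of the sensitive attribute, and every cluster is non-empty. It tests whether each cluster reproduces the global color distribution exactly, and it computes the Correlation Clustering cost: edges cut between clusters plus missing edges inside clusters. It illustrates the definitions only and is not an algorithm from the paper. The function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction

def is_fair(clusters, color):
    """Each cluster must have exactly the global distribution of colors."""
    total = Counter(color.values())
    n = sum(total.values())
    for cluster in clusters:
        counts = Counter(color[v] for v in cluster)
        for c, cnt in total.items():
            # Compare ratios exactly with Fractions to avoid float rounding.
            if Fraction(counts[c], len(cluster)) != Fraction(cnt, n):
                return False
    return True

def correlation_cost(edges, clusters):
    """Disagreements: edges between clusters plus non-edges inside clusters."""
    label = {v: i for i, cluster in enumerate(clusters) for v in cluster}
    edge_set = {frozenset(e) for e in edges}
    cut = sum(1 for u, v in edges if label[u] != label[v])
    missing = 0
    for cluster in clusters:
        members = list(cluster)
        for i in range(len(members)):
            for j in range(i + 1, len(members)):
                if frozenset((members[i], members[j])) not in edge_set:
                    missing += 1
    return cut + missing
```

Consider a path 0-1-2-3 with alternating colors red, blue, red, blue. Splitting it into {0, 1} and {2, 3} is fair with cost 1. Splitting it into singletons is unfair, because each singleton contains only one color.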

Bilò, Davide; Chechik, Shiri; Choudhary, Keerti; Cohen, Sarel; Friedrich, Tobias; Krogmann, Simon; Schirneck, Martin. Approximate Distance Sensitivity Oracles in Subquadratic Space. Symposium on Theory of Computing (STOC) 2023: 1396–1409

Friedrich, Tobias; Kötzing, Timo; Radhakrishnan, Aishwarya; Schiller, Leon; Schirneck, Martin; Tennigkeit, Georg; Wietheger, Simon. Crossover for Cardinality Constrained Optimization. ACM Transactions on Evolutionary Learning and Optimization 2023: 1–32

Bilò, Davide; Choudhary, Keerti; Cohen, Sarel; Friedrich, Tobias; Schirneck, Martin. Deterministic Sensitivity Oracles for Diameter, Eccentricities and All Pairs Distances. International Colloquium on Automata, Languages and Programming (ICALP) 2022: 68:1–68:19

Friedrich, Tobias; Kötzing, Timo; Radhakrishnan, Aishwarya; Schiller, Leon; Schirneck, Martin; Tennigkeit, Georg; Wietheger, Simon. Crossover for Cardinality Constrained Optimization. Genetic and Evolutionary Computation Conference (GECCO) 2022: 1399–1407

Best Paper Award (Theory Track)

Bläsius, Thomas; Friedrich, Tobias; Schirneck, Martin. The Complexity of Dependency Detection and Discovery in Relational Databases. Theoretical Computer Science 2022: 79–96

Bilò, Davide; Casel, Katrin; Choudhary, Keerti; Cohen, Sarel; Friedrich, Tobias; Lagodzinski, J.A. Gregor; Schirneck, Martin; Wietheger, Simon. Fixed-Parameter Sensitivity Oracles. Innovations in Theoretical Computer Science (ITCS) 2022: 23:1–23:18

@inproceedings{bilo2022fixedparameter,
  abstract = {We combine ideas from distance sensitivity oracles (DSOs) and fixed-parameter tractability (FPT) to design sensitivity oracles for FPT graph problems. An oracle with sensitivity \(f\) for an FPT problem \(\Pi\) on a graph \(G\) with parameter \(k\) preprocesses \(G\) in time \(O(g(f,k) poly(n))\). When queried with a set \(F\) of at most \(f\) edges of \(G\), the oracle reports the answer to \(\Pi\) (with the same parameter \(k\)) on the graph \(G-F\), i.e., \(G\) deprived of \(F\). The oracle should answer queries in a time that is significantly faster than merely running the best-known FPT algorithm on \(G-F\) from scratch. We design sensitivity oracles for the \(k\)-Path and the \(k\)-Vertex Cover problems. Our first oracle for \(k\)-Path has size \(O(k^{f+1})\) and query time \(O(f \min\{f, \log (f) + k\})\). We use a technique inspired by the work of Weimann and Yuster [FOCS 2010, TALG 2013] on distance sensitivity problems to reduce the space to \(O\big(\big(\frac{f+k}{f}\big)^f \big(\frac{f+k}{k}\big)^k fk \cdot \log n\big)\) at the expense of increasing the query time to \(O\big(\big(\frac{f+k}{f}\big)^f \big(\frac{f+k}{k}\big)^k f \min\{f,k\} \cdot \log n \big)\). Both oracles can be modified to handle vertex-failures, but we need to replace \(k\) with \(2k\) in all the claimed bounds. Regarding \(k\)-Vertex Cover, we design three oracles offering different trade-offs between the size and the query time. The first oracle takes \(O(3^{f+k})\) space and has \(O(2^f)\) query time, the second one has a size of \(O(2^{f+k^2+k})\) and a query time of \(O(f{+}k^2)\); finally, the third one takes \(O(fk+k^2)\) space and can be queried in time \(O(1.2738^k + fk^2)\). All our oracles are computable in time (at most) proportional to their size and the time needed to detect a \(k\)-path or \(k\)-vertex cover, respectively. 
We also provide an interesting connection between \(k\)-Vertex Cover and the fault-tolerant shortest path problem, by giving a DSO of size \(O(poly(f,k) \cdot n)\) with query time in \(O(poly(f,k))\), where \(k\) is the size of a vertex cover. Following our line of research connecting fault-tolerant FPT and shortest paths problems, we introduce parameterization to the computation of distance preservers. Given a graph with a fixed source \(s\) and parameters \(f\),\(k\), we study the problem of constructing polynomial-sized oracles that report efficiently, for any target vertex \(v\) and set \(F\) of at most \(f\) edge failures, whether the distance from \(s\) to \(v\) increases by at most an additive term of \(k\) in \(G-F\). The oracle size is \(O(2^k k^2 \cdot n)\), while the time needed to answer a query is \(O(2^k f^\omega k^\omega)\), where \(\omega<2.373\) is the matrix multiplication exponent. The second problem we study is about the construction of bounded-stretch fault-tolerant preservers. We construct a subgraph with \(O(2^{fk+f+k} k \cdot n)\) edges that preserves those \(s\)-\(v\)-distances that do not increase by more than \(k\) upon failure of \(F\). This improves significantly over the \( \tilde{O} (f n^{2-\frac{1}{2^f}}) \) bound in the unparameterized case by Bodwin et al. [ICALP 2017].},
  author = {Bilò, Davide and Casel, Katrin and Choudhary, Keerti and Cohen, Sarel and Friedrich, Tobias and Lagodzinski, J.A. Gregor and Schirneck, Martin and Wietheger, Simon},
  booktitle = {Innovations in Theoretical Computer Science (ITCS)},
  keywords = {sarelcohen gregorlagodzinski davidebilo katrincasel simonwietheger tobiasfriedrich itcs keertichoudhary year2022 martinschirneck},
  pages = {23:1-23:18},
  title = {Fixed-Parameter Sensitivity Oracles},
  year = 2022
}
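The abstract refers to the baseline of "merely running the best-known FPT algorithm on G-F from scratch". As a rough picture of that baseline, the sketch below answers a query by deleting the failed edges and running the textbook 2^k bounded search tree for k-Vertex Cover. This is the simpler O(2^k m) branching, not the faster algorithm behind the O(1.2738^k) bound quoted above. The function names are hypothetical.

```python
def has_vertex_cover(edges, k):
    """Textbook O(2^k * m) bounded search tree for k-Vertex Cover."""
    if not edges:
        return True
    if k == 0:
        return False
    u, v = edges[0]
    # Any vertex cover must contain u or v, so branch on both choices.
    for w in (u, v):
        rest = [e for e in edges if w not in e]
        if has_vertex_cover(rest, k - 1):
            return True
    return False

def naive_query(edges, k, failed):
    """Baseline an oracle must beat: delete F and rerun the FPT algorithm."""
    failed = {frozenset(e) for e in failed}
    remaining = [e for e in edges if frozenset(e) not in failed]
    return has_vertex_cover(remaining, k)
```

A triangle has no vertex cover of size 1. After one edge fails, the remaining two edges share a vertex, so the query answers yes.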

Bläsius, Thomas; Friedrich, Tobias; Lischeid, Julius; Meeks, Kitty; Schirneck, Martin. Efficiently Enumerating Hitting Sets of Hypergraphs Arising in Data Profiling. Journal of Computer and System Sciences 2022: 192–213

Bilò, Davide; Cohen, Sarel; Friedrich, Tobias; Schirneck, Martin. Space-Efficient Fault-Tolerant Diameter Oracles. Mathematical Foundations of Computer Science (MFCS) 2021: 18:1–18:16

@inproceedings{bilo2021spaceefficient,
  abstract = {We design \(f\)-edge fault-tolerant diameter oracles (\(f\)-FDO, or simply FDO if \(f=1\)). For a given directed or undirected and possibly edge-weighted graph \(G\) with \(n\) vertices and \(m\) edges and a positive integer \(f\), we preprocess the graph and construct a data structure that, when queried with a set \(F\) of edges, where \(|F| \leq f\), returns the diameter of \(G - F\). An \(f\)-FDO has stretch \(\sigma \geq 1\) if the returned value \(\widehat D\) satisfies \(\operatorname{diam}(G - F) \leq \widehat D \leq \sigma \operatorname{diam}(G - F)\). For the case of a single edge failure (\(f=1\)) in an unweighted directed graph, there exists an approximate FDO by Henzinger et al. [ITCS 2017] with stretch \((1+\varepsilon)\), constant query time, space \(O(m)\), and a combinatorial preprocessing time of \(\widetilde{O}(mn + n^{1.5} \sqrt{Dm/\varepsilon})\), where \(D\) is the diameter. We present a near-optimal FDO with the same stretch, query time, and space. It has a preprocessing time of \(\widetilde{O}(mn + n^2/\varepsilon)\), which is better for any constant \(\varepsilon > 0\). The preprocessing time nearly matches a conditional lower bound for combinatorial algorithms, also by Henzinger et al. When using fast matrix multiplication instead, we achieve a preprocessing time of \(\widetilde{O}(n^{2.5794} + n^2/\varepsilon)\). We further prove an information-theoretic lower bound showing that any FDO with stretch better than \(3/2\) requires \(\Omega(m)\) bits of space. Thus, for constant \(0 < \varepsilon < 3/2\), our combinatorial \((1+ \varepsilon)\)-approximate FDO is near-optimal in all the parameters. In the case of multiple edge failures (\(f>1\)) in undirected graphs with non-negative edge weights, we give an \(f\)-FDO with stretch \((f+2)\), query time \(O(f^2\log^2{n})\), \(\widetilde{O}(fn)\) space, and preprocessing time \(\widetilde{O}(fm)\). 
We complement this with a lower bound excluding any finite stretch in \(o(fn)\) space. Many real-world networks have polylogarithmic diameter. We show that for those graphs and up to \(f = o(\log n/ \log\log n)\) failures one can swap approximation for query time and space. We present an exact combinatorial \(f\)-FDO with preprocessing time \(mn^{1+o(1)}\), query time \(n^{o(1)}\), and space \(n^{2+o(1)}\). With fast matrix multiplication, the preprocessing time can be improved to \(n^{\omega+o(1)}\), where \(\omega < 2.373\) is the matrix multiplication exponent.},
  author = {Bilò, Davide and Cohen, Sarel and Friedrich, Tobias and Schirneck, Martin},
  booktitle = {Mathematical Foundations of Computer Science (MFCS)},
  keywords = {sarelcohen davidebilo tobiasfriedrich mfcs year2021 martinschirneck},
  pages = {18:1-18:16},
  title = {Space-Efficient Fault-Tolerant Diameter Oracles},
  year = 2021
}
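The following Python sketch makes the FDO interface concrete. It implements only the trivial exact baseline, not the construction from the paper, and it assumes a simple, undirected, unweighted graph given as an adjacency dict in which every vertex appears as a key, with a single edge failure (f = 1). It stores one precomputed value diam(G - e) per edge, so space is O(m) words and a query is one hash lookup. Preprocessing, however, costs O(m^2 n) time for repeated BFS. The names `eccentricity_without`, `build_naive_fdo`, and `query` are hypothetical.

```python
from collections import deque
from math import inf


def eccentricity_without(adj, source, removed):
    """Largest BFS distance from `source` in G minus the edge `removed`.

    Returns inf if some vertex becomes unreachable (the diameter is then infinite).
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            # Skip visited vertices and the failed edge (in both directions).
            if v in dist or frozenset((u, v)) == removed:
                continue
            dist[v] = dist[u] + 1
            queue.append(v)
    return max(dist.values()) if len(dist) == len(adj) else inf


def build_naive_fdo(adj):
    """Precompute diam(G - e) for every edge e of a simple undirected graph."""
    edges = {frozenset((u, v)) for u in adj for v in adj[u]}
    return {e: max(eccentricity_without(adj, s, e) for s in adj) for e in edges}


def query(fdo, u, v):
    """Exact diameter of G - {u, v} (stretch 1) via a single hash lookup."""
    return fdo[frozenset((u, v))]


if __name__ == "__main__":
    cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    fdo = build_naive_fdo(cycle)
    print(query(fdo, 0, 1))  # 3: C4 minus an edge is a path on 4 vertices
```

The gap between this baseline and the oracles above lies entirely in preprocessing time and, for f > 1, in space: the table would need one entry per failure set.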

Bilò, Davide; Cohen, Sarel; Friedrich, Tobias; Schirneck, Martin. Near-Optimal Deterministic Single-Source Distance Sensitivity Oracles. European Symposium on Algorithms (ESA) 2021: 18:1–18:17

Given a graph with a distinguished source vertex \(s\), the Single Source Replacement Paths (SSRP) problem is to compute and output, for any target vertex \(t\) and edge \(e\), the length \(d(s,t,e)\) of a shortest path from \(s\) to \(t\) that avoids a failing edge \(e\). A Single-Source Distance Sensitivity Oracle (Single-Source DSO) is a compact data structure that answers queries of the form \((t,e)\) by returning the distance \(d(s,t,e)\). We show how to compress the output of the SSRP problem on \(n\)-vertex, \(m\)-edge graphs with integer edge weights in the range \([1,M]\) into a deterministic Single-Source DSO that has size \(O(M^{1/2} n^{3/2})\) and query time \(\widetilde{O}(1)\). We prove that the space requirement is optimal (up to the word size). Our techniques can also handle vertex failures within the same bounds. Chechik and Cohen [SODA 2019] presented a combinatorial randomized \(\widetilde{O}(m\sqrt{n}+n^2)\) time SSRP algorithm for undirected and unweighted graphs. We derandomize their algorithm with the same asymptotic running time and apply our compression to obtain a deterministic Single-Source DSO with \(\widetilde{O}(m\sqrt{n}+n^2)\) preprocessing time, \(O(n^{3/2})\) space, and \(\widetilde{O}(1)\) query time. Our combinatorial Single-Source DSO has near-optimal space, preprocessing and query time for dense unweighted graphs, improving the preprocessing time by a \(\sqrt{n}\)-factor compared to previous results. Grandoni and Vassilevska Williams [FOCS 2012, TALG 2020] gave an algebraic randomized \(\widetilde{O}(Mn^\omega)\) time SSRP algorithm for (undirected and directed) graphs with integer edge weights in the range \([1,M]\), where \(\omega < 2.373\) is the matrix multiplication exponent. We derandomize their algorithm for undirected graphs and apply our compression to obtain an algebraic Single-Source DSO with \(\widetilde{O}(Mn^\omega)\) preprocessing time, \(O(M^{1/2} n^{3/2})\) space, and \(\widetilde{O}(1)\) query time. 
This improves the preprocessing time of algebraic Single-Source DSOs by polynomial factors compared to previous results. We also present further improvements of our Single-Source DSOs. We show that the query time can be reduced to a constant at the cost of increasing the size of the oracle to \(O(M^{1/3} n^{5/3})\) and that all our oracles can be made path-reporting. On sparse graphs with \(m=O(\frac{n^{5/4-\varepsilon}}{M^{7/4}})\) edges, for any constant \(\varepsilon > 0\), we reduce the preprocessing to randomized \(\widetilde{O}(M^{7/8} m^{1/2} n^{11/8}) = O(n^{2-\varepsilon/2})\) time. To the best of our knowledge, this is the first truly subquadratic time algorithm for building Single-Source DSOs on sparse graphs.

@inproceedings{bilo2021nearoptimal,
  abstract = {Given a graph with a distinguished source vertex \(s\), the Single Source Replacement Paths (SSRP) problem is to compute and output, for any target vertex \(t\) and edge \(e\), the length \(d(s,t,e)\) of a shortest path from \(s\) to \(t\) that avoids a failing edge \(e\). A Single-Source Distance Sensitivity Oracle (Single-Source DSO) is a compact data structure that answers queries of the form \((t,e)\) by returning the distance \(d(s,t,e)\). We show how to compress the output of the SSRP problem on \(n\)-vertex, \(m\)-edge graphs with integer edge weights in the range \([1,M]\) into a deterministic Single-Source DSO that has size \(O(M^{1/2} n^{3/2})\) and query time \(\widetilde{O}(1)\). We prove that the space requirement is optimal (up to the word size). Our techniques can also handle vertex failures within the same bounds. Chechik and Cohen [SODA 2019] presented a combinatorial randomized \(\widetilde{O}(m\sqrt{n}+n^2)\) time SSRP algorithm for undirected and unweighted graphs. We derandomize their algorithm with the same asymptotic running time and apply our compression to obtain a deterministic Single-Source DSO with \(\widetilde{O}(m\sqrt{n}+n^2)\) preprocessing time, \(O(n^{3/2})\) space, and \(\widetilde{O}(1)\) query time. Our combinatorial Single-Source DSO has near-optimal space, preprocessing and query time for dense unweighted graphs, improving the preprocessing time by a \(\sqrt{n}\)-factor compared to previous results. Grandoni and Vassilevska Williams [FOCS 2012, TALG 2020] gave an algebraic randomized \(\widetilde{O}(Mn^\omega)\) time SSRP algorithm for (undirected and directed) graphs with integer edge weights in the range \([1,M]\), where \(\omega < 2.373\) is the matrix multiplication exponent. 
We derandomize their algorithm for undirected graphs and apply our compression to obtain an algebraic Single-Source DSO with \(\widetilde{O}(Mn^\omega)\) preprocessing time, \(O(M^{1/2} n^{3/2})\) space, and \(\widetilde{O}(1)\) query time. This improves the preprocessing time of algebraic Single-Source DSOs by polynomial factors compared to previous results. We also present further improvements of our Single-Source DSOs. We show that the query time can be reduced to a constant at the cost of increasing the size of the oracle to \(O(M^{1/3} n^{5/3})\) and that all our oracles can be made path-reporting. On sparse graphs with \(m=O(\frac{n^{5/4-\varepsilon}}{M^{7/4}})\) edges, for any constant \(\varepsilon > 0\), we reduce the preprocessing to randomized \(\widetilde{O}(M^{7/8} m^{1/2} n^{11/8}) = O(n^{2-\varepsilon/2})\) time. To the best of our knowledge, this is the first truly subquadratic time algorithm for building Single-Source DSOs on sparse graphs.},
  author = {Bilò, Davide and Cohen, Sarel and Friedrich, Tobias and Schirneck, Martin},
  booktitle = {European Symposium on Algorithms (ESA)},
  keywords = {sarelcohen davidebilo esa tobiasfriedrich year2021 martinschirneck},
  pages = {18:1--18:17},
  title = {Near-Optimal Deterministic Single-Source Distance Sensitivity Oracles},
  year = 2021
}
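As a point of reference for what a Single-Source DSO stores, here is a naive Python sketch for undirected, unweighted graphs given as adjacency dicts. It is not the compressed oracle from the paper. It relies on a standard observation: if a failing edge e is not part of a fixed BFS tree rooted at s, every tree path survives, so d(s,t,e) = d(s,t). Only the at most n - 1 tree edges therefore need their own distance arrays, giving O(n^2) stored values and O(mn) preprocessing time, compared with the O(n^{3/2}) space achieved above. The class `NaiveSingleSourceDSO` and the helper `bfs_tree` are hypothetical names.

```python
from collections import deque
from math import inf


def bfs_tree(adj, s, removed=None):
    """BFS from s that ignores the undirected edge `removed`.

    Returns (dist, parent) for all vertices reachable from s.
    """
    dist, parent = {s: 0}, {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in dist or frozenset((u, v)) == removed:
                continue
            dist[v], parent[v] = dist[u] + 1, u
            queue.append(v)
    return dist, parent


class NaiveSingleSourceDSO:
    """Answers d(s, t, e) queries by table lookup.

    Failures of non-tree edges leave all BFS-tree paths intact, so only the
    edges of one shortest-path tree get a separate replacement-distance table.
    """

    def __init__(self, adj, s):
        self.dist, parent = bfs_tree(adj, s)
        self.tree_edges = {
            frozenset((v, p)) for v, p in parent.items() if p is not None
        }
        self.replacement = {e: bfs_tree(adj, s, e)[0] for e in self.tree_edges}

    def query(self, t, u, v):
        """Return d(s, t, {u, v}), or inf if t is unreachable after the failure."""
        table = self.replacement.get(frozenset((u, v)), self.dist)
        return table.get(t, inf)


if __name__ == "__main__":
    cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    dso = NaiveSingleSourceDSO(cycle, 0)
    print(dso.query(1, 0, 1))  # 3: the detour 0-3-2-1
```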

Shi, Feng; Schirneck, Martin; Friedrich, Tobias; Kötzing, Timo; Neumann, Frank. Correction to: Reoptimization Time Analysis of Evolutionary Algorithms on Linear Functions Under Dynamic Uniform Constraints. Algorithmica 2020: 3117–3123

Friedrich, Tobias; Kötzing, Timo; Lagodzinski, J. A. Gregor; Neumann, Frank; Schirneck, Martin. Analysis of the (1+1) EA on Subclasses of Linear Functions under Uniform and Linear Constraints. Theoretical Computer Science 2020: 3–19

Bläsius, Thomas; Friedrich, Tobias; Schirneck, Martin. The Minimization of Random Hypergraphs. European Symposium on Algorithms (ESA) 2020: 21:1–21:15

@inproceedings{Blaesius20MinimizationESA,
  abstract = {We investigate the maximum-entropy model \(\mathcal{B}_{n,m,p}\) for random \(n\)-vertex, \(m\)-edge multi-hypergraphs with expected edge size \(pn\). We show that the expected size of the minimization \(\min(\mathcal{B}_{n,m,p})\), i.e., the number of inclusion-wise minimal edges of \(\mathcal{B}_{n,m,p}\), undergoes a phase transition with respect to \(m\). If \(m\) is at most \(1/(1-p)^{(1-p)n}\), then \(\mathrm{E}[|\min(\mathcal{B}_{n,m,p})|]\) is of order \(\Theta(m)\), while for \(m \ge 1/(1-p)^{(1-p+\varepsilon)n}\) for any \(\varepsilon > 0\), it is \(\Theta( 2^{(\mathrm{H}(\alpha) + (1-\alpha) \log_2 p) n}/ \sqrt{n})\). Here, \(\mathrm{H}\) denotes the binary entropy function and \(\alpha = - (\log_{1-p} m)/n\). The result implies that the maximum expected number of minimal edges over all \(m\) is \(\Theta((1+p)^n/\sqrt{n})\). Our structural findings have algorithmic implications for minimizing an input hypergraph. This has applications in the profiling of relational databases as well as for the Orthogonal Vectors problem studied in fine-grained complexity. We make several technical contributions that are of independent interest in probability. First, we improve the Chernoff--Hoeffding theorem on the tail of the binomial distribution. In detail, we show that for a binomial variable \(Y \sim \mathrm{Bin}(n,p)\) and any \(0 < x < p\), it holds that \(\mathrm{P}[Y \le xn] = \Theta( 2^{-\!\mathrm{D}(x \,{\|}\, p) n}/\sqrt{n})\), where \(\mathrm{D}\) is the binary Kullback--Leibler divergence between Bernoulli distributions. We give explicit upper and lower bounds on the constants hidden in the big-O notation that hold for all \(n\). Secondly, we establish the fact that the probability of a set of cardinality \(i\) being minimal after \(m\) i.i.d. maximum-entropy trials exhibits a sharp threshold behavior at \(i^* = n + \log_{1-p} m\).},
  author = {Bläsius, Thomas and Friedrich, Tobias and Schirneck, Martin},
  booktitle = {European Symposium on Algorithms (ESA)},
  keywords = {esa thomasblaesius tobiasfriedrich martinschirneck year2020},
  pages = {21:1--21:15},
  title = {The Minimization of Random Hypergraphs},
  year = 2020
}
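The minimization of a hypergraph keeps only its inclusion-wise minimal edges. The small Python sketch below samples from \(\mathcal{B}_{n,m,p}\) under the assumption that each of the \(m\) edges contains every vertex independently with probability \(p\) (the product distribution, which maximizes entropy among distributions on subsets with expected size \(pn\)), computes the minimization by pairwise subset tests, and evaluates the threshold \(i^* = n + \log_{1-p} m\) stated in the abstract. The function names are hypothetical, and the quadratic-time minimization is meant only for small experiments.

```python
import math
import random


def sample_hypergraph(n, m, p, rng):
    """Draw m edges on vertices 0..n-1; each vertex enters each edge
    independently with probability p (assumed model for B_{n,m,p})."""
    return [frozenset(v for v in range(n) if rng.random() < p) for _ in range(m)]


def minimization(edges):
    """Inclusion-wise minimal edges; duplicate edges collapse to one set."""
    distinct = set(edges)
    # Keep e unless some other edge is a proper subset of it.
    return {e for e in distinct if not any(f < e for f in distinct)}


def minimality_threshold(n, m, p):
    """i* = n + log_{1-p}(m), the sharp threshold for the minimality
    probability of a set of cardinality i, as stated in the abstract."""
    return n + math.log(m) / math.log(1 - p)


if __name__ == "__main__":
    rng = random.Random(1)
    H = sample_hypergraph(n=12, m=200, p=0.5, rng=rng)
    print(len(minimization(H)), minimality_threshold(12, 200, 0.5))
```

Varying `m` for fixed `n` and `p` gives a quick, informal way to watch the number of minimal edges first grow with \(m\) and then shrink again.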

Birnick, Johann; Bläsius, Thomas; Friedrich, Tobias; Naumann, Felix; Papenbrock, Thorsten; Schirneck, Martin. Hitting Set Enumeration with Partial Information for Unique Column Combination Discovery. Proceedings of the VLDB Endowment 2020: 2270–2283

Bläsius, Thomas; Friedrich, Tobias; Lischeid, Julius; Meeks, Kitty; Schirneck, Martin. Efficiently Enumerating Hitting Sets of Hypergraphs Arising in Data Profiling. Algorithm Engineering and Experiments (ALENEX) 2019: 130–143

Bläsius, Thomas; Fischbeck, Philipp; Friedrich, Tobias; Schirneck, Martin. Understanding the Effectiveness of Data Reduction in Public Transportation Networks. Workshop on Algorithms and Models for the Web Graph (WAW) 2019: 87–101

Shi, Feng; Schirneck, Martin; Friedrich, Tobias; Kötzing, Timo; Neumann, Frank. Reoptimization Time Analysis of Evolutionary Algorithms on Linear Functions Under Dynamic Uniform Constraints. Algorithmica 2019: 828–857

Doerr, Benjamin; Fischbeck, Philipp; Frahnow, Clemens; Friedrich, Tobias; Kötzing, Timo; Schirneck, Martin. Island Models Meet Rumor Spreading. Algorithmica 2019: 886–915

Kötzing, Timo; Schirneck, Martin; Seidel, Karen. Normal Forms in Semantic Language Identification. Algorithmic Learning Theory (ALT) 2017: 493–516

Shi, Feng; Schirneck, Martin; Friedrich, Tobias; Kötzing, Timo; Neumann, Frank. Reoptimization Times of Evolutionary Algorithms on Linear Functions Under Dynamic Uniform Constraints. Genetic and Evolutionary Computation Conference (GECCO) 2017: 1407–1414

Doerr, Benjamin; Fischbeck, Philipp; Frahnow, Clemens; Friedrich, Tobias; Kötzing, Timo; Schirneck, Martin. Island Models Meet Rumor Spreading. Genetic and Evolutionary Computation Conference (GECCO) 2017: 1359–1366

Bläsius, Thomas; Friedrich, Tobias; Schirneck, Martin. The Parameterized Complexity of Dependency Detection in Relational Databases. International Symposium on Parameterized and Exact Computation (IPEC) 2016: 6:1–6:13

Friedrich, Tobias; Kötzing, Timo; Krejca, Martin S.; Nallaperuma, Samadhi; Neumann, Frank; Schirneck, Martin. Fast Building Block Assembly by Majority Vote Crossover. Genetic and Evolutionary Computation Conference (GECCO) 2016: 661–668

Kötzing, Timo; Schirneck, Martin. Towards an Atlas of Computational Learning Theory. Symposium on Theoretical Aspects of Computer Science (STACS) 2016: 47:1–47:13

Theses

Schirneck, Martin. Enumeration Algorithms in Data Profiling. PhD Thesis, Hasso Plattner Institute, University of Potsdam 2022

Data profiling is the extraction of metadata from relational databases. An important class of metadata are multi-column dependencies. They come associated with two computational tasks. The detection problem is to decide whether a dependency of a given type and size holds in a database. The discovery problem instead asks to enumerate all valid dependencies of that type. We investigate the two problems for three types of dependencies: unique column combinations (UCCs), functional dependencies (FDs), and inclusion dependencies (INDs). We first treat the parameterized complexity of the detection variants. We prove that the detection of UCCs and FDs, respectively, is W[2]-complete when parameterized by the size of the dependency. The detection of INDs is shown to be one of the first natural W[3]-complete problems. We further settle the enumeration complexity of the three discovery problems by presenting parsimonious equivalences with well-known enumeration problems. Namely, the discovery of UCCs is equivalent to the famous transversal hypergraph problem of enumerating the hitting sets of a hypergraph. The discovery of FDs is equivalent to the simultaneous enumeration of the hitting sets of multiple input hypergraphs. Finally, the discovery of INDs is shown to be equivalent to enumerating the satisfying assignments of antimonotone, 3-normalized Boolean formulas. In the remainder of the thesis, we design and analyze discovery algorithms for unique column combinations. Since this is as hard as the general transversal hypergraph problem, it is an open question whether the UCCs of a database can be computed in output-polynomial time in the worst case. For the analysis, we therefore focus on instances that are structurally close to databases in practice, most notably, inputs that have small solutions. The equivalence between UCCs and hitting sets transfers the computational hardness, but also allows us to apply ideas from hypergraph theory to data profiling. 
We devise a discovery algorithm that runs in polynomial space on arbitrary inputs and achieves polynomial delay whenever the maximum size of any minimal UCC is bounded. Central to our approach is the extension problem for minimal hitting sets, that is, to decide for a set of vertices whether they are contained in any minimal solution. We prove that this is yet another problem that is complete for the complexity class W[3], when parameterized by the size of the set that is to be extended. We also give several conditional lower bounds under popular hardness conjectures such as the Strong Exponential Time Hypothesis (SETH). The lower bounds suggest that the running time of our algorithm for the extension problem is close to optimal. We further conduct an empirical analysis of our discovery algorithm on real-world databases to confirm that the hitting set perspective on data profiling has merits also in practice. We show that the resulting enumeration times undercut their theoretical worst-case bounds on practical data, and that the memory consumption of our method is much smaller than that of previous solutions. During the analysis we make two observations about the connection between databases and their corresponding hypergraphs. On the one hand, the hypergraph representations containing all relevant information are usually significantly smaller than the original inputs. On the other hand, obtaining those hypergraphs is the actual bottleneck of any practical application. The latter often takes much longer than enumerating the solutions, which is in stark contrast to the fact that the preprocessing is guaranteed to be polynomial while the enumeration may take exponential time. To make the first observation rigorous, we introduce a maximum-entropy model for non-uniform random hypergraphs and prove that their expected number of minimal hyperedges undergoes a phase transition with respect to the total number of edges. 
The result also explains why larger databases may have smaller hypergraphs. Motivated by the second observation, we present a new kind of UCC discovery algorithm called Hitting Set Enumeration with Partial Information and Validation (HPIValid). It utilizes the fast enumeration times in practice in order to speed up the computation of the corresponding hypergraph. This way, we sidestep the bottleneck while maintaining the advantages of the hitting set perspective. An exhaustive empirical evaluation shows that HPIValid outperforms the current state of the art in UCC discovery. It is capable of processing databases that were previously out of reach for data profiling.

@phdthesis{schirneck2022enumeration,
  abstract = {Data profiling is the extraction of metadata from relational databases. An important class of metadata are multi-column dependencies. They come associated with two computational tasks. The detection problem is to decide whether a dependency of a given type and size holds in a database. The discovery problem instead asks to enumerate all valid dependencies of that type. We investigate the two problems for three types of dependencies: unique column combinations (UCCs), functional dependencies (FDs), and inclusion dependencies (INDs). We first treat the parameterized complexity of the detection variants. We prove that the detection of UCCs and FDs, respectively, is W[2]-complete when parameterized by the size of the dependency. The detection of INDs is shown to be one of the first natural W[3]-complete problems. We further settle the enumeration complexity of the three discovery problems by presenting parsimonious equivalences with well-known enumeration problems. Namely, the discovery of UCCs is equivalent to the famous transversal hypergraph problem of enumerating the hitting sets of a hypergraph. The discovery of FDs is equivalent to the simultaneous enumeration of the hitting sets of multiple input hypergraphs. Finally, the discovery of INDs is shown to be equivalent to enumerating the satisfying assignments of antimonotone, 3-normalized Boolean formulas. In the remainder of the thesis, we design and analyze discovery algorithms for unique column combinations. Since this is as hard as the general transversal hypergraph problem, it is an open question whether the UCCs of a database can be computed in output-polynomial time in the worst case. For the analysis, we therefore focus on instances that are structurally close to databases in practice, most notably, inputs that have small solutions. The equivalence between UCCs and hitting sets transfers the computational hardness, but also allows us to apply ideas from hypergraph theory to data profiling. 
We devise a discovery algorithm that runs in polynomial space on arbitrary inputs and achieves polynomial delay whenever the maximum size of any minimal UCC is bounded. Central to our approach is the extension problem for minimal hitting sets, that is, to decide for a set of vertices whether they are contained in any minimal solution. We prove that this is yet another problem that is complete for the complexity class W[3], when parameterized by the size of the set that is to be extended. We also give several conditional lower bounds under popular hardness conjectures such as the Strong Exponential Time Hypothesis (SETH). The lower bounds suggest that the running time of our algorithm for the extension problem is close to optimal. We further conduct an empirical analysis of our discovery algorithm on real-world databases to confirm that the hitting set perspective on data profiling has merits also in practice. We show that the resulting enumeration times undercut their theoretical worst-case bounds on practical data, and that the memory consumption of our method is much smaller than that of previous solutions. During the analysis we make two observations about the connection between databases and their corresponding hypergraphs. On the one hand, the hypergraph representations containing all relevant information are usually significantly smaller than the original inputs. On the other hand, obtaining those hypergraphs is the actual bottleneck of any practical application. The latter often takes much longer than enumerating the solutions, which is in stark contrast to the fact that the preprocessing is guaranteed to be polynomial while the enumeration may take exponential time. To make the first observation rigorous, we introduce a maximum-entropy model for non-uniform random hypergraphs and prove that their expected number of minimal hyperedges undergoes a phase transition with respect to the total number of edges. 
The result also explains why larger databases may have smaller hypergraphs. Motivated by the second observation, we present a new kind of UCC discovery algorithm called Hitting Set Enumeration with Partial Information and Validation (HPIValid). It utilizes the fast enumeration times in practice in order to speed up the computation of the corresponding hypergraph. This way, we sidestep the bottleneck while maintaining the advantages of the hitting set perspective. An exhaustive empirical evaluation shows that HPIValid outperforms the current state of the art in UCC discovery. It is capable of processing databases that were previously out of reach for data profiling.},
  annote = {PhD Thesis, Hasso Plattner Institute, University of Potsdam},
  author = {Schirneck, Martin},
  keywords = {martinschirneck theses year2022},
  school = {Hasso Plattner Institute, University of Potsdam},
  title = {Enumeration Algorithms in Data Profiling},
  year = 2022
}
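The correspondence between UCCs and hitting sets can be illustrated on a toy relation. In the Python sketch below, which is a brute-force illustration and neither HPIValid nor the polynomial-delay algorithm from the thesis, every pair of rows contributes the set of columns on which the two rows differ. A column combination is unique exactly when it intersects every such difference set, so the minimal UCCs are the minimal hitting sets of this hypergraph. Candidates are tried in order of increasing size, which guarantees that every reported set is minimal. The function names are hypothetical, the input is assumed to be a non-empty list of equal-length tuples, and the running time is exponential in the number of columns.

```python
from itertools import combinations


def difference_sets(rows):
    """Hypergraph on column indices: one edge per row pair, namely the
    columns in which the two rows disagree."""
    return {
        frozenset(i for i, (a, b) in enumerate(zip(r1, r2)) if a != b)
        for r1, r2 in combinations(rows, 2)
    }


def minimal_uccs(rows):
    """Brute-force enumeration of all minimal unique column combinations."""
    n_cols = len(rows[0])
    hypergraph = difference_sets(rows)
    if frozenset() in hypergraph:
        return []  # two identical rows: no column combination is unique
    found = []
    for k in range(n_cols + 1):  # increasing size, so each hit is minimal
        for cols in combinations(range(n_cols), k):
            candidate = frozenset(cols)
            if any(ucc <= candidate for ucc in found):
                continue  # superset of an already reported minimal UCC
            if all(candidate & edge for edge in hypergraph):
                found.append(candidate)
    return found


if __name__ == "__main__":
    rows = [("a", 1, "x"), ("a", 2, "x"), ("b", 1, "y")]
    print(minimal_uccs(rows))  # [frozenset({0, 1}), frozenset({1, 2})]
```

Building the difference sets already takes time quadratic in the number of rows, which mirrors the observation above that obtaining the hypergraph, rather than enumerating its hitting sets, is often the practical bottleneck.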

Schirneck, Martin. On Restrictions in Computational Language Learning. Master Thesis, Friedrich Schiller University Jena 2015

2016 Dean's Prize for Best Thesis (Examenspreis des Dekans).

Schirneck, Martin. Betrachtungen über ein distanzbasiertes Klassifikationsverfahren [Considerations on a Distance-Based Classification Method]. Bachelor Thesis, Friedrich Schiller University Jena 2012

As Advisor

As Guest Lecturer

As Teaching Assistant