Advancing Big Data Benchmarks - Proceedings of the 2013 Workshop Series on Big Data Benchmarking, WBDB.cn, Xi’an, China, July 16-17, 2013, and WBDB.us, San José, CA, USA, October 9-10, 2013, Revised Selected Papers. Rabl, Tilmann; Jacobsen, Hans-Arno; Nambiar, Raghunath; Poess, Meikel; Bhandarkar, Milind A.; Baru, Chaitanya K. in Lecture Notes in Computer Science (2014). (Vol. 8585) Springer.
Specifying Big Data Benchmarks - First Workshop, WBDB 2012, San Jose, CA, USA, May 8-9, 2012, and Second Workshop, WBDB 2012, Pune, India, December 17-18, 2012, Revised Selected Papers. Rabl, Tilmann; Poess, Meikel; Baru, Chaitanya K.; Jacobsen, Hans-Arno in Lecture Notes in Computer Science (2014). (Vol. 8163) Springer.
Towards a Complete BigBench Implementation. Rabl, Tilmann; Frank, Michael; Danisch, Manuel; Gowda, Bhaskar; Jacobsen, Hans-Arno (2014). 3–11.
BigBench was the first proposal for an end-to-end big data analytics benchmark. It features a set of 30 realistic queries based on real big data use cases. It was fully specified and completely implemented on the Hadoop stack. In this paper, we present updates on our development of a complete implementation on the Hadoop ecosystem. We focus on the changes we have made to the data set, the scaling, the refresh process, and the metric.
Discussion of BigBench: A Proposed Industry Standard Performance Benchmark for Big Data. Baru, Chaitanya K.; Bhandarkar, Milind A.; Curino, Carlo; Danisch, Manuel; Frank, Michael; Gowda, Bhaskar; Jacobsen, Hans-Arno; Jie, Huang; Kumar, Dileep; Nambiar, Raghunath Othayoth; Poess, Meikel; Raab, Francois; Rabl, Tilmann; Ravi, Nishkam; Sachs, Kai; Sen, Saptak; Yi, Lan; Youn, Choonhan (2014). 44–63.
PSBench: A Benchmark for Content- and Topic-Based Publish/Subscribe Systems. Zhang, Kaiwen; Rabl, Tilmann; Sun, Yi Ping; Kumar, Rushab; Zen, Nayeem; Jacobsen, Hans-Arno (2014). 17–18.
CaSSanDra: An SSD Boosted Key-Value Store. Menon, Prashanth; Rabl, Tilmann; Sadoghi, Mohammad; Jacobsen, Hans-Arno (2014). 1162–1167.
With the ever-growing size and complexity of enterprise systems, there is a pressing need for more detailed application performance management. Due to the high data rates, traditional database technology cannot sustain the required performance. Alternatives are the more lightweight and, thus, more performant key-value stores. However, these systems tend to sacrifice read performance in order to obtain the desired write throughput by avoiding random disk access in favor of fast sequential accesses. With the advent of SSDs, built upon the philosophy of no moving parts, the boundary between sequential and random access is becoming blurred. This provides a unique opportunity to extend the storage memory hierarchy in key-value stores using SSDs. In this paper, we extensively evaluate the benefits of using SSDs in commercialized key-value stores. In particular, we investigate the performance of hybrid SSD-HDD systems and demonstrate the benefits of our SSD caching and our novel dynamic schema model.
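To illustrate the general idea of a hybrid SSD-HDD key-value store, the following minimal sketch places a bounded, SSD-like cache tier in front of an HDD-resident store; reads try the cache first and fall back to the slower tier. The class name, the LRU eviction policy, and the write-through strategy are assumptions made for this example and do not reproduce the CaSSanDra design described in the paper.

# Illustrative sketch only: a hypothetical two-tier store that serves reads
# from a bounded SSD-resident cache before falling back to the HDD store.
# Names, the LRU policy, and write-through are assumptions for illustration.
from collections import OrderedDict

class HybridKeyValueStore:
    def __init__(self, hdd_store, ssd_capacity=1024):
        self.hdd = hdd_store              # large HDD tier (any dict-like store)
        self.ssd = OrderedDict()          # stands in for an SSD-resident cache
        self.capacity = ssd_capacity

    def put(self, key, value):
        self.hdd[key] = value             # write-through to the HDD tier
        self._cache(key, value)           # keep hot data on the "SSD"

    def get(self, key):
        if key in self.ssd:               # fast path: random read on the SSD tier
            self.ssd.move_to_end(key)
            return self.ssd[key]
        value = self.hdd[key]             # slow path: read from the HDD tier
        self._cache(key, value)
        return value

    def _cache(self, key, value):
        self.ssd[key] = value
        self.ssd.move_to_end(key)
        if len(self.ssd) > self.capacity: # evict the least recently used entry
            self.ssd.popitem(last=False)

# Usage: wrap any dict-like HDD-backed store.
store = HybridKeyValueStore(hdd_store={}, ssd_capacity=2)
store.put("a", 1); store.put("b", 2); store.put("c", 3)
print(store.get("c"))                     # served from the cache tier

A write-through policy keeps the example simple; an actual hybrid store would also have to decide which objects are hot enough to merit SSD space and how to persist the cache across restarts.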
Optimizing Key-Value Stores for Hybrid Storage Architectures. Menon, Prashanth; Rabl, Tilmann; Sadoghi, Mohammad; Jacobsen, Hans-Arno (2014). 355–358.
Materialized Views in Cassandra. Rabl, Tilmann; Jacobsen, Hans-Arno (2014). 351–354.
Many web companies deal with enormous data sizes and request rates beyond the capabilities of traditional database systems. This has led to the development of modern Big Data Platforms (BDPs). BDPs handle large amounts of data and activity through massively distributed infrastructures. To achieve performance and availability at Internet scale, BDPs restrict querying capability and provide weaker consistency guarantees than traditional ACID transactions. The reduced functionality found in key-value stores is sufficient for many web applications. An important requirement of many big data systems is an online view of the current status of the data and activity. Typical big data systems such as key-value stores only allow key-based access. In order to enable more complex querying mechanisms while satisfying the necessary latencies, materialized views are employed. The efficiency of maintaining these views is a key factor in the usability of the system. Expensive operations such as full table scans are impractical for small, frequent modifications on Internet-scale data sets. In this paper, we present an efficient implementation of materialized views in key-value stores that enables complex query processing and is tailored for efficient maintenance.
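The following minimal sketch illustrates the kind of incremental view maintenance the abstract refers to: a secondary, view-like structure is updated on every write, so a query over a non-key attribute can be answered without a full table scan. The table, attribute, and class names are hypothetical and do not reflect the Cassandra implementation presented in the paper.

# Illustrative sketch only: incremental maintenance of a simple materialized
# view over a key-value store. The view indexes users by country so that a
# non-key attribute can be queried without scanning the base table.
class KeyValueStoreWithView:
    def __init__(self):
        self.base = {}                      # primary table: user_id -> record
        self.view_by_country = {}           # materialized view: country -> {user_id}

    def put(self, user_id, record):
        old = self.base.get(user_id)
        if old is not None:                 # remove the stale view entry on update
            self.view_by_country[old["country"]].discard(user_id)
        self.base[user_id] = record
        self.view_by_country.setdefault(record["country"], set()).add(user_id)

    def users_in(self, country):
        # Answered from the view: no scan over the base table is needed.
        return {uid: self.base[uid] for uid in self.view_by_country.get(country, set())}

kv = KeyValueStoreWithView()
kv.put("u1", {"name": "Ada", "country": "CA"})
kv.put("u2", {"name": "Bob", "country": "DE"})
print(kv.users_in("CA"))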
DGFIndex for Smart Grid: Enhancing Hive with a Cost-Effective Multidimensional Range Index. Liu, Yue; Hu, Songlin; Rabl, Tilmann; Liu, Wantao; Jacobsen, Hans-Arno; Wu, Kaifeng; Chen, Jian; Li, Jintao in PVLDB (2014). 7(13) 1496–1507.
In Smart Grid applications, as the number of deployed electric smart meters increases, massive amounts of valuable meter data are generated and collected every day. To enable reliable data collection and fast business decisions, high-throughput storage and high-performance analysis of massive meter data become crucial for grid companies. Given the efficiency, fault tolerance, and price-performance of Hadoop and Hive, these systems are frequently deployed as the underlying platform for big data processing. However, in real business use cases, data analysis applications typically involve multidimensional range queries (MDRQ) as well as batch reading and statistics on the meter data. While Hive performs well for complex batch reading and analysis, it lacks efficient indexing techniques for MDRQ. In this paper, we propose DGFIndex, an index structure for Hive that efficiently supports MDRQ on massive meter data. DGFIndex divides the data space into cubes using the grid file technique. Unlike the existing indexes in Hive, which store all combinations of multiple dimensions, DGFIndex only stores information about the cubes. This leads to a smaller index size and faster query processing. Furthermore, by pre-computing user-defined aggregations for each cube, DGFIndex only needs to access the boundary region of a query for aggregation queries. Our comprehensive experiments show that DGFIndex saves significant disk space compared with the existing indexes in Hive, and that query performance with DGFIndex is 2-63 times faster than with existing Hive indexes, 2-94 times faster than HadoopDB, and 2-75 times faster than scanning the whole table, across different query selectivities.
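The following sketch illustrates the grid-file idea behind a DGFIndex-style structure on a tiny in-memory data set rather than on Hive/HDFS: records are binned into fixed-size cubes, each cube keeps a pre-computed aggregate, and a range-aggregation query uses the pre-aggregates for cubes fully covered by the query while scanning raw records only in boundary cubes. The cell size, the two-dimensional setting, and all names are assumptions made for illustration.

# Illustrative sketch only: grid-file partitioning with per-cube pre-aggregates.
from collections import defaultdict

CELL = 10  # grid cell width per dimension (chosen arbitrarily for the example)

# (x, y) coordinates with a measured value, standing in for meter readings
points = [((3, 4), 5.0), ((12, 7), 2.0), ((25, 31), 7.5), ((14, 18), 1.0)]

cube_sum = defaultdict(float)    # pre-computed aggregate per cube
cube_rows = defaultdict(list)    # raw records per cube, scanned only on the boundary
for (x, y), v in points:
    cube = (x // CELL, y // CELL)
    cube_sum[cube] += v
    cube_rows[cube].append(((x, y), v))

def range_sum(x_lo, x_hi, y_lo, y_hi):
    total = 0.0
    for (cx, cy), s in cube_sum.items():
        lo = (cx * CELL, cy * CELL)
        hi = ((cx + 1) * CELL - 1, (cy + 1) * CELL - 1)
        if hi[0] < x_lo or lo[0] > x_hi or hi[1] < y_lo or lo[1] > y_hi:
            continue                                   # cube outside the query range
        if x_lo <= lo[0] and hi[0] <= x_hi and y_lo <= lo[1] and hi[1] <= y_hi:
            total += s                                 # fully covered: use the pre-aggregate
        else:
            for (x, y), v in cube_rows[(cx, cy)]:      # boundary cube: scan raw rows
                if x_lo <= x <= x_hi and y_lo <= y <= y_hi:
                    total += v
    return total

print(range_sum(0, 19, 0, 19))  # 5.0 + 2.0 + 1.0 = 8.0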
TPC-DI: The First Industry Benchmark for Data Integration. Poess, Meikel; Rabl, Tilmann; Jacobsen, Hans-Arno; Caufield, Brian in PVLDB (2014). 7(13) 1367–1378.
Historically, the process of synchronizing a decision support system with data from operational systems has been referred to as Extract, Transform, Load (ETL), and the tools supporting this process have been referred to as ETL tools. Recently, ETL has been replaced by the more comprehensive term data integration (DI). DI describes the process of extracting and combining data from a variety of data source formats, transforming that data into a unified data model representation, and loading it into a data store. This is done in the context of a variety of scenarios, such as data acquisition for business intelligence, analytics and data warehousing, but also synchronization of data between operational applications, data migrations and conversions, master data management, enterprise data sharing, and delivery of data services in a service-oriented architecture context, amongst others. With these scenarios relying on up-to-date information, it is critical to implement a highly performing, scalable, and easy-to-maintain data integration system. This is especially important as the complexity, variety, and volume of data are constantly increasing and the performance of data integration systems is becoming ever more critical. Despite the significance of a highly performing DI system, there has been no industry standard for measuring and comparing DI performance. The TPC, acknowledging this void, has released TPC-DI, an innovative benchmark for data integration. This paper motivates the reasons behind its development, describes its main characteristics, including the workload, run rules, and metric, and explains key decisions.
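As a toy illustration of the extract-transform-load flow described above, the following sketch reads records from a single hypothetical CSV source, maps them into a unified representation, and loads them into an in-memory SQLite store; the source format, field names, and target table are assumptions and do not represent the TPC-DI schema, workload, or metric.

# Illustrative sketch only: the extract-transform-load flow reduced to a toy pipeline.
import csv, io, sqlite3

raw_customers = "id,name,joined\n1,Ada Lovelace,2013-05-08\n2,Grace Hopper,2013-12-17\n"

def extract(text):
    # Extract: parse records from one of possibly many source formats.
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    # Transform: map source records into a unified target representation.
    return [(int(r["id"]), r["name"].upper(), r["joined"]) for r in rows]

def load(records, conn):
    # Load: write the unified records into the decision-support data store.
    conn.execute("CREATE TABLE IF NOT EXISTS customer (id INTEGER, name TEXT, joined TEXT)")
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(raw_customers)), conn)
print(conn.execute("SELECT * FROM customer").fetchall())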