Hasso-Plattner-InstitutSDG am HPI
Hasso-Plattner-InstitutDSG am HPI
  
Login
  • de
 

DESIOT: Processing Multiple Queries with IoT Networks

Wang Yue

Chair for Data Engineering Systems

Hasso Plattner Institute

Office: F-1.06
Tel .: + 49- (0) 331 5509-280
Email: Wang.Yue (at) hpi.de
Links: Homepage

Supervisor: Prof. Dr. Tilmann Rabl

 

 

We are focusing on the efficient window aggregation in distributed networks regarding to stream processing. In particular, we are interested in finding the approach that shares multiple queries among different nodes and slicing windows with multiple queries in an effective and efficient way.

Motivation

In recent year, there is a very important problem come to the us that the adaptation of sensors has rapidly increased over the past decades, which construct a huge sensor network for many aspects in our daily life. According to the survey, there are at least 20 billion IoT devices will be connected to the sensor network by 2020.[1] In addition, the use-cases faced by sensor networks are becoming diversify and sophisticated, which includes from smart cities to health monitoring. To handle those use-cases wherein the sensor networks, there are two main ideas. For the first idea, we can utilize the modern hardware to speed up the performance of each IoT device, which combines the classical IoT solution with current hardware techniques. However, the high-performance hardware probably is unaffordable for some scenarios, also, because of its compatibility, it is hard to transfer and extend solutions to another architecture and scenario.
Therefore, we proposed our own solution that is completely in fog computing way and made use of the idea of stream processing to cope with the use-cases that in the sensor networks.

Research Problem

To depict our scenario, I will give an example of a smart home. In general, an apartment building contains several rooms that are fully separate. And the administrator or devices would monitor the physical situation of each room. For instance, administrators have to know the temperature of each room every 5 min in the case on fire, and central air condition adjust the humidity and temperature depends on the situation of each room. Thus, if we generalize the problem mentioned above, we can deem each room as a single data stream and every inquiry to rooms as the query. The problem is how to implement multiple queries into distributed homogeneous data streams. The problem we are focusing on is able to break down in tow slighter issues.
How can we process a single query in a distributed data stream?
How can multiple queries be processed efficiently in a distributed data stream?

Solution

In our work, we proposed a stream processing system so-called DESIoT to against the problem we mentioned, which is split into three layers, local node, intermedia node, and the root node. To illustrate the whole structure let us follow the example of a smart home. In the local node, it collects data from the sensors of a single room, and then creates a window as well as calculates data before sending it to the intermedia node. Afterward, the intermedia node aggregates all the windows from local nodes that belong to it into a new window and transmit the result to the root node. Eventually, in the top layer, the root node aggregates all the results from intermedia nodes and produces the final result. Each node has two components, window manager and node manager. The window manager is responsible for creating windows and aggregating windows, essentially it processes the query. While the query manager maintains the connection between different nodes and provides failure recovery and state management.

Related Work

We now present some related work with stream processing in distributed network and window aggregation among multiple queries. Madden et al. [2] propose the Tiny Aggregation that is a tree-structured distributed data aggregation. And it works a lot on how to build the network route between different nodes. In Cougar [3,4] and LEACH [5], the authors make use of a class-structured network, which elected cluster node among other nodes that aggregate windows from other nodes. The PEGASIS [6] introduces a chain-structured approach that all the nodes are a priori and conscious of all other nodes. SlideSide[7] proposed an approach for incremental processing in multi-queries scenarios that maintains a running prefix-/suffix-scan over the input stream. FlatFAT[8] uses a tree structure to store the partial result.
In summary, there are so many stream processing techniques that focus on sensor networks and multiple queries. However, none of them try to combine those together, which is a general problem met by many realistic scenarios. Also, they lack the solution of decomposable function in stream processing and only concentrate on the sliding or tumbling windows.

Methodology

Compared with traditional stream processing, in a sensor network, the feature of data sources are distributed that we cannot centralize all the data into the root node. Therefore, the state-of-are process the query locally to reduce the network overhead and share the pressure of the root node. However, the current classification of window types does not fit sensor networks completely. We proposed 9 new window types with different window bound including local bound, holistic bound, and global bound that specific for the sensor network. And then the 9 window types are combined with decomposable function and non-decomposable function, which eventually produces two window types, the local window, and center window. For the local window, its window bound can be determined by the node itself, thus the node is able to aggregate data before transmitting data to the upper node. On other hand, in the global window, it is impossible to pre-calculate the data before every data centralized in the root node. Our approach for the local window is to do partial aggregation in both the local node and intermedia node. On the contrary, without doing aggregation in the global window, our approach only compresses and sort data where in the global window. When the scenario has to process multi-query into a distributed sensor network, we proposed DESMA algorithm, which tries to merge all the queries into a single query group. Afterward, the algorithm slices the query group into several sub-windows by the window bound of original queries. Thus, we are able to share the result of sub-windows in the case without re-calculating. Then, the node assembly the results of sub-windows that belong to the same query and send the final result to the upper node.

Evaluation

We totally have 7 virtual machines for benchmarking, 4 for the local node, 2 for intermedia node, and last one for the root node, each upper node connects to 2 child nodes. There are three baselines is going to make a comparison with our DESIoT. The first baseline centralizes all data to the root node. The second baseline is the disco. In the second baseline, we make use of our DESIoT but remove the DESMA algorithm. Afterward, we compare the DESIoT with three baselines on latency and throughput. Also, we would present the network overhead and node performance of each baseline and DESMA.​​​​​​​

References

  1. Mark Hung. Leading the IoT - Gartner Insights on How to Lead in a Connected World. Gartner Research, 2017
  2. Samuel Madden, Michael Franklin, Joseph Hellerstein, and Wei Hong. TAG: a Tiny AGgregation Service for Ad-Hoc Sensor Networks. In OSDI, pages 131–146. ACM, 2002.
  3. Johannes Gehrke and Samuel Madden. Sensor and actuator networks - Query processing in sensor networks. IEEE Pervasive Computing, 3(1):46–55, 2004.
  4. Yong Yao and Johannes Gehrke. The cougar approach to in-network query process- ing in sensor networks. ACM SIGMOD Record, 31(3):9–18, 2002.
  5. Wendi Heinzelman, Anantha Chandrakasan, and Hari Balakrishnan. An application-specific protocol architecture for wireless microsensor networks. IEEE Transactions on Wireless Communications, 1(4):660–670, 2002.
  6. Stephanie Lindsey, Cauligi Raghavendra, and Krishna Sivalingam. Data gathering algorithms in sensor networks using energy metrics. TPDS, 13(9):924–935, 2002.
  7. Theodorakis, Georgios, Peter Pietzuch, and Holger Pirk. "SlideSide: a fast incremental stream processing algorithm for multiple queries." (2020). 
  8. Tangwongsan, Kanat, et al. "General incremental sliding-window aggregation." Proceedings of the VLDB Endowment 8.7 (2015): 702-713.