This result is more informative than the compression-based algorithm predictions. However, to make a comparison possible, we supply the algorithm with the time at which an event occurs and require it to predict the event type. For this purpose, all detected T-patterns in the pattern dictionary are used to create their critical intervals based on a fixed history, and these intervals are checked for inclusion of the event time.
For each applicable pattern, a uniform distribution within the critical interval is assumed, and the probabilities of the different patterns are combined. We summarize the experimental results in Table 1; the benefit of the proposed improvements is readily apparent.
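The combination step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the data layout (a list of `(event_type, start, end)` critical intervals derived from the fixed history) and the function name are assumptions.

```python
# Sketch: predict the event type at a query time by combining the
# critical intervals of all applicable T-patterns. Each applicable
# pattern contributes a uniform density over its critical interval.

def predict_event_type(patterns, t_query):
    """patterns: list of (event_type, ci_start, ci_end) tuples.
    Returns (most_likely_type, probabilities) or None if no pattern applies."""
    scores = {}
    for event_type, lo, hi in patterns:
        if lo <= t_query <= hi and hi > lo:
            # uniform density 1/(hi - lo) within the critical interval
            scores[event_type] = scores.get(event_type, 0.0) + 1.0 / (hi - lo)
    if not scores:
        return None
    total = sum(scores.values())
    # normalise the combined densities into a probability over event types
    probs = {e: s / total for e, s in scores.items()}
    return max(probs, key=probs.get), probs
```

Narrower critical intervals contribute higher densities, so precise patterns dominate the combined prediction.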
T-Patterns Revisited: Mining for Temporal Patterns in Sensor Data
Table 1 shows that, for a given layout and a single user in the environment, compression-based methods have good prediction accuracy within the limits imposed by the layout and the behaviour dictionary. ALZ's result for Layout 2 shows that the parameters of the compression methods are correctly set. In fact, these methods have few tunable parameters and are robust to their settings. What is important here is that the compression methods deteriorate rapidly as soon as multiple users are introduced into the environment, as is evident from the second and fourth columns of Table 1.
From the results it is evident that the T-pattern-based approaches perform better than the compression-based approaches. This is most apparent in the 2-person scenario, where the intermingling of 1-person patterns generates a large number of new combinations, many of which are erroneously identified as T-patterns. The GMM approach fares much better, even in the more difficult 2-person scenario.
We have used the MERL motion detector dataset for larger-scale experiments [3]. The T-pattern approach is not applicable to this dataset in its original form, because the number of unique events is prohibitively large. The PIR sensors fire when someone or something passes near a sensor. Via simple binary activations of these sensors, this dataset expresses the residual trace of the activity of all people working in the two-floor facility.
It has been previously used in the IEEE Information Visualization Challenge, and presents a significant challenge for behavior analysis, search, manipulation and visualization. The accompanying ground truth contains partial tracks and behavior detections, as well as map data and anonymous calendar data. We have two separate experimental setups on this dataset. For the first, we use a small portion of the MERL data, as the time requirements of the T-pattern variants are prohibitive.
The 15 sensors are selected as five clusters of sensor triplets, where each triplet is in close proximity and highly correlated, while the clusters are remotely located in the building and thus uncorrelated in principle. Any correctly sequenced within-cluster pattern is correct, and any cross-cluster pattern is spurious. Out-of-sequence patterns within clusters can also be detected; we label these as gray-area patterns. We report the number of correctly found, missed, spurious and gray-area patterns separately for each method.
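Under this labelling scheme, scoring a detected pattern is mechanical. A minimal sketch, assuming hypothetical lookup tables for each sensor's cluster id and its position within the cluster:

```python
# Sketch: label a detected elementary pattern (i, j) against the
# ground-truth clustering used in the experiment. The lookup tables
# `cluster_of` and `order_in_cluster` are illustrative stand-ins.

def label_pattern(i, j, cluster_of, order_in_cluster):
    if cluster_of[i] != cluster_of[j]:
        return "spurious"      # cross-cluster patterns are spurious
    if order_in_cluster[i] < order_in_cluster[j]:
        return "correct"       # correctly sequenced within a cluster
    return "gray"              # out-of-sequence within a cluster
```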
We also consider Bonferroni correction in this section, which requires an estimate of the number of tests. The number of tests per event pair was elaborated earlier; we now complement this with an estimate of the number of event pairs. If there are n sensors, the number of elementary patterns we need to consider is n².
Assume m of these are accepted as patterns. The number of tertiary patterns to be examined will then be nm. In this section we do not go beyond tertiary patterns; they are sufficient for comparison purposes. Assume a topologically uniform placement of sensors in a simple mesh. Each sensor will have four neighbours with which it can form elementary patterns, creating 2n patterns in the process (ignoring boundary conditions).
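This bookkeeping can be sketched directly from the counts above; the function names and the `tests_per_pair` parameter are illustrative assumptions:

```python
# Sketch of the test-count estimation used for Bonferroni correction,
# under the stated assumptions: n sensors, m accepted elementary
# patterns, tertiary patterns only.

def expected_pattern_tests(n, m, tests_per_pair):
    elementary = n ** 2        # all ordered sensor pairs
    tertiary = n * m           # m accepted patterns x n candidate extensions
    return (elementary + tertiary) * tests_per_pair

def mesh_elementary_patterns(n):
    # In a uniform mesh, each sensor has four neighbours; each
    # undirected link is shared by two sensors: 4n / 2 = 2n patterns,
    # ignoring boundary conditions.
    return 2 * n
```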
The independence testing for GMMTPat further reduces this number, as we no longer test all pairs of events for the existence of T-patterns.
For an event horizon of steps and sensors, this means four billion tests. Table 2 shows the average pattern counts and numbers of tests obtained with all three methods, as well as their Bonferroni-adjusted variants. The last row displays the expected number of tests, computed under the mentioned assumptions. The Bonferroni adjustment is largely robust to changes in this value, as long as the fluctuation is under an order of magnitude. It is clear that while the Bonferroni adjustment eliminates many spurious patterns, it does not affect the time complexity much. There is, of course, a gain due to eliminated patterns: fewer patterns are tested in the end.
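The adjustment itself is a one-line division of the significance level by the estimated number of tests; a minimal sketch with placeholder numbers:

```python
# Sketch: Bonferroni adjustment divides the significance level by the
# estimated number of tests. The alpha value and test count below are
# placeholders, not figures from the experiments.

def bonferroni_alpha(alpha, n_tests):
    return alpha / n_tests

# With alpha = 0.05 and four billion tests, the per-test threshold
# becomes vanishingly small, which is what eliminates spurious patterns.
adjusted = bonferroni_alpha(0.05, 4_000_000_000)
```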
We have used the recorded sensor events between 21 March and 11 June for training; this subset contains four and a half million events generated by the sensors. As the test set, we use a different set of recordings, collected a year later (May 24 to July 2) and comprising about two million events. Due to the large number of available instances, cross-validation was not used in this study.
The complete motion ground truth for the people using the environment is not available, as the sensor outputs are sometimes ambiguous. Furthermore, a single sensor cannot register rapid activations in close succession, so some activity is lost. Finally, the network transmission of events from the sensors to the central recording server is reported to cause minor data loss from time to time.
Along with sensor activations, some information about movements, called tracklets, is provided. Each tracklet is a directed graph of sensor activations that possibly belongs to a single person. The MERL dataset was investigated in [37], which proposed a number of interaction features for sampling-based analysis of massive datasets; entropy is proposed there as a robust measure for assessing the relative organization of event distributions when data are abundant. Since the amount of data is massive, we do not construct the whole cascade of T-patterns, but look at the elementary patterns, each composed of two basic sensor events spaced at most five minutes apart.
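The extraction of such elementary pattern candidates can be sketched as follows, assuming time-sorted `(timestamp, sensor_id)` events; the five-minute window matches the text, while the event layout and function name are illustrative:

```python
# Sketch: collect candidate elementary patterns, i.e. pairs of sensor
# events spaced at most five minutes (300 s) apart, pooling the
# inter-event times per ordered sensor pair.

def candidate_pairs(events, max_gap=300.0):
    """events: time-sorted list of (timestamp_seconds, sensor_id)."""
    pairs = {}
    for k, (t1, s1) in enumerate(events):
        for t2, s2 in events[k + 1:]:
            if t2 - t1 > max_gap:
                break                  # events are time-sorted, so stop early
            if s1 != s2:
                # pool the inter-event time for the (s1, s2) candidate
                pairs.setdefault((s1, s2), []).append(t2 - t1)
    return pairs
```

The pooled inter-event times per pair are exactly the samples to which the two-component Gaussian described below is fitted.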
For each such pattern, the potential critical interval is found by fitting a two-component Gaussian with the EM algorithm to the pooled interval times between the sensor firings, as described. Since the data are 1-dimensional, the convergence is fast (fewer than 10 iterations) and robust. Our experiments show that using more than 5,000 events for a single candidate pattern is not beneficial, as the distribution is already very well approximated with 5,000 events.
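A sketch of this fitting step, using scikit-learn's `GaussianMixture` as a stand-in for the authors' EM implementation; the data layout, sample cap and function name are assumptions:

```python
# Sketch: fit a two-component Gaussian mixture to the pooled
# inter-event times; the narrow component (small standard deviation)
# indicates the candidate critical interval.

import numpy as np
from sklearn.mixture import GaussianMixture

def fit_interval_model(inter_event_times, max_samples=5000):
    x = np.asarray(inter_event_times, dtype=float)[:max_samples]
    gmm = GaussianMixture(n_components=2, random_state=0).fit(x.reshape(-1, 1))
    stds = np.sqrt(gmm.covariances_.ravel())
    narrow = int(np.argmin(stds))      # pick the narrow component
    return gmm.means_.ravel()[narrow], stds[narrow]
```

The returned mean and standard deviation of the narrow component define the critical interval used later (e.g. mean plus or minus two standard deviations).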
For real patterns, the first Gaussian has a very narrow shape that is characterized by a small standard deviation in comparison to the second Gaussian. Figure 4 shows the first Gaussian peak for a number of subsequent sensors along a corridor coupled with the first sensor in the corridor. The origin denotes the firings of the first sensor. The second sensor activation is very salient, producing a clear peak near zero.
The third sensor, again coupled with the first sensor in the corridor, produces a smaller peak that lies further away from the first event in time. As we move on to sensors along the corridor, the peaks get flatter. The data range is in seconds, and the second peak of each distribution usually falls much later in this range. (Figure 4: Gaussian peaks for successive sensor firings after a given sensor event along a corridor.)
We use the elementary patterns detected by the algorithm to construct a Voronoi graph, which reflects the topology of the environment. Technically, the Voronoi graph or the Voronoi diagram of an environment is made up of points equidistant to existing obstacles, and thus serves as a roadmap [ 38 , 39 ]. This is a useful representation if the exact positions of the wireless sensors and the map of the environment are missing, typically in scenarios where the deployment is fast and requires minimum manual intervention.
In our implementation, every sensor is shown as a node in this graph, and once the elementary T-patterns [ i , j ] are found, they are simply joined by an edge. For a better visualization, we used the following pruning process. Assume node i and node j are connected with a T-pattern. The edge that represents this pattern is pruned, if there exists a node k that has stronger T-patterns to both node i and node j.
The strength of the pattern is reflected by the likelihood ratio of the means of the two Gaussians that model the inter-event times. A higher ratio is indicative of a stronger peak, and consequently, a stronger relation. The pruning process considers patterns sequentially, sorted by ascending pattern strength. Figure 5 a shows a map of the environment and the superposed Voronoi graph. The circles indicate approximate sensor locations, and the connection links indicate the elementary patterns.
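A sketch of the pruning rule, assuming a dictionary that maps sensor pairs to the likelihood-ratio strength described above (names and data layout are illustrative):

```python
# Sketch: prune the edge (i, j) if some third node k has stronger
# T-patterns to both i and j. Edges are considered in ascending order
# of pattern strength.

def prune_edges(strength):
    """strength: dict mapping (i, j) pairs to likelihood-ratio strength."""
    def s(a, b):
        # treat the relation as undirected for the strength lookup
        return strength.get((a, b), strength.get((b, a), 0.0))
    nodes = {n for edge in strength for n in edge}
    kept = []
    for (i, j), w in sorted(strength.items(), key=lambda kv: kv[1]):
        if any(k not in (i, j) and s(k, i) > w and s(k, j) > w for k in nodes):
            continue                  # pruned: k relays i and j more strongly
        kept.append((i, j))
    return kept
```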
Activation of sensors shown with light colours is a good predictor of the next related event. By using the T-patterns, we can try to predict events based on the activation of a given sensor.
This is actually more powerful than predicting the next event in the system, as we can give a temporal window for the prediction. From the learned set of patterns, we select the two strongest T-patterns for each sensor i. For each sensor activation in the test set, we look at the two best T-patterns and check the corresponding critical intervals, given by two standard deviations, for the expected events.
If at least one expected event was detected, the prediction was counted as a success. As the number of sensors increased over time, we did not take into account activations from sensors that were missing in the training data. It is also possible to analyse prediction success sensor by sensor: Figure 5b links prediction success to the sensor locations.
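The success criterion can be sketched as follows; the tuple layouts and function name are hypothetical:

```python
# Sketch: for each activation of a sensor, take its two strongest
# T-patterns and check whether an expected event falls inside the
# critical interval mean +/- 2 standard deviations after the trigger.

def prediction_success(trigger_time, best_patterns, test_events):
    """best_patterns: up to two (target_sensor, mean_delay, std) tuples.
    test_events: list of (time, sensor) observed after the trigger."""
    for target, mu, sigma in best_patterns[:2]:
        lo = trigger_time + mu - 2 * sigma
        hi = trigger_time + mu + 2 * sigma
        if any(s == target and lo <= t <= hi for t, s in test_events):
            return True               # at least one expected event was seen
    return False
```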
For some locations, the prediction success differs markedly.

Recent progress in sensor technology makes it necessary to create algorithms that are capable of discovering structure in large-scale and possibly heterogeneous sensor systems. In this paper we have reviewed existing methodologies for the discovery of temporal patterns in sensor data. We have proposed two improvements to the basic T-pattern methodology, referred to in this text as GMM T-patterns, that significantly improve its performance.
Experiments show that T-patterns outperform the compression-based techniques, and that the proposed improvements (independence testing and GMM modelling of correlation times) yield more reliable results. We have applied the modified T-pattern algorithm to a recently published, challenging dataset consisting of binary motion sensor activations. We have shown that the proposed GMMTPat method significantly reduces the temporal complexity, even when contrasted with variants of the T-pattern approach that are several orders of magnitude faster than the original.
We have shown the effect of the Bonferroni adjustment in eliminating spurious patterns. We have also assessed prediction accuracy, where the detected patterns are used to predict the firing of the next sensor in a pattern, and demonstrated the automatic construction of the Voronoi graph, a proximity-based physical map of the environment.