U-TFF: A U-Net-Based Anomaly Detection Framework for Robotic Manipulator Energy Consumption Auditing Using Fast Fourier Transform (2024)

1. Introduction

Robotic manipulators have emerged as essential Artificial intelligence (AI)-based tools in modern manufacturing and production processes, executing programmed and intricate tasks with exceptional precision and speed, particularly in complex or hazardous environments [1]. These functionalities have led to their widespread adoption in various applications, including automatic sorting [2], waste handling [3], and others. However, owing to their inherent complexity, robotic manipulators are susceptible to unexpected anomalies such as attacks, operational fluctuations, and environmental changes, potentially leading to critical economic and security issues [4]. Consequently, there is an urgent need to develop a robust monitoring system to ensure the security and operational health of such manipulators.

The first step of such a monitoring system is determining what types of data need to be monitored. Although most robotic manipulators are equipped with built-in sensors capable of tracking and recording various time-varying operational parameters, including the position, speed, voltage, and current of each joint motor, these sensors are also vulnerable to attacks, and may be maliciously manipulated to provide fake data. This vulnerability issue poses a critical challenge and reduces the trustworthiness of the monitoring process. To address this issue, the adoption of a side-channel energy consumption auditing approach has been proposed [5]. This method employs a physical mechanism to measure energy consumption through a separate module positioned alongside the robotic manipulator system. Such an approach allows for isolated monitoring, making it more robust to data manipulation and attacks [6].

The next step is to process the data for use in learning to identify anomalies when attacks occur, which is usually referred to as the anomaly detection task. In contrast to traditional approaches such as decision trees [7], deep learning-based methods can be employed in anomaly detection tasks by training neural networks to distinguish anomalies from normal samples automatically. This approach has found broad applications in AI-aided manufacturing [8]. Neural networks using these methods can vary from small models (e.g., [9,10,11]) to large models (e.g., [12,13,14]). Moreover, based on the model training methods, anomaly detection approaches can be generally classified into two main categories, namely, supervised and semi-supervised learning methods. In supervised learning, each instance in the dataset is labeled as either normal or anomalous for model training [15]. Because supervised methods require labeled data, they provide clear and precise information about both normal and anomalous instances. As a result, supervised methods have a strong ability to tackle known anomalies that have been identified and labeled in the training dataset. They can learn the specific characteristics of these anomalies and accurately detect them during the testing phase. However, supervised learning relies heavily on labeled data for training, which may be expensive or difficult to obtain in many real-world scenarios. The resulting models may also struggle to detect anomalies that differ significantly from those observed during training. In other words, the generalization of supervised approaches is not guaranteed. On the other hand, semi-supervised learning typically uses normal instances only for model training to learn the underlying patterns of normal behavior. A well-trained model can identify instances that deviate significantly from the learned patterns of normal behavior, flagging these instances as anomalies. This process is also known as outlier detection. Considering that semi-supervised models are not solely reliant on labeled data, they can generalize better to unseen anomalies compared to supervised approaches.

Building on the efforts outlined above, this paper introduces a semi-supervised anomaly detection approach based on side-channel energy consumption auditing for robotic manipulators. The present study utilizes a small neural network, enabling its deployment on edge computing devices to perform anomaly detection tasks in real time. Moreover, only energy consumption profiles during operation are audited, making the approach less vulnerable to intentional intrusion. Our approach relies on a reconstruction network trained using only energy consumption profiles associated with normal operations. It then identifies anomalies by evaluating the magnitude of reconstruction errors, considering that abnormal patterns can cause large errors when reconstructed through a well-trained network. To this end, a static threshold is determined, and samples with errors above the threshold are predicted as anomalies. The reconstruction network combines the well-known U-Net [16] architecture with a new Time–Frequency Fusion block. This block applies the Fast Fourier Transform to the time series data, providing complementary information and features in the frequency domain in addition to traditional time domain data processing. This design enriches feature representation and improves detection performance. The main contributions of this work are enumerated as follows:

  • Energy consumption serves as the sole data source that is acquired through a side monitoring channel for anomaly detection. This design mitigates the risk of targeted malicious attacks on measurement sensors and enhances the reliability of the proposed method. Notably, it also offers great potential for extension to other systems and applications.

  • A novel neural network block for Time–Frequency Fusion is introduced to develop a reconstruction network for anomaly detection. This new block leverages the Fast Fourier Transform to capture essential features in the frequency domain. This complementary knowledge facilitates the feature learning process by enriching feature representations, leading to improved detection performance, especially for anomalies with strong presentation in the frequency domain.

  • For this study, we compiled a tailored dataset documenting the energy consumption during the operational phases of a robotic manipulator. This dataset comprises over 6000 instances of energy consumption during normal operations, which were utilized to train the proposed model. Furthermore, more than 6000 instances of energy consumption incorporating simulated anomalies were collected for model verification purposes. The dataset has been made publicly available as an open-source resource to facilitate its use and evaluation by other researchers.

The rest of this paper is structured as follows: Section 2 introduces prior works in related fields; Section 3 elaborates on the methodology and key research components; details regarding the experiments, including generation of the custom dataset, are outlined in Section 4; Section 5 presents the experimental results and analysis; finally, Section 6 concludes the paper with a brief technical summary.

2. Related Work

2.1. Side-Channel Mechanisms in Anomaly Detection

Lightbody et al. proposed an intrusion detection dataset named Dragon_Pi for training AI-based intrusion detection neural networks [17]. This dataset includes a collection of normal and under-attack power consumption traces captured from a side-channel power analyzer. To evaluate its effectiveness, the authors adopted a convolutional autoencoder (CAE) trained using normal data from the proposed dataset. When testing with anomalies, the CAE model could achieve 0.78 and 0.876 AUC without and with postprocessing, respectively. Moreover, a novel data transmission approach using a disconnected side-channel monitoring system to monitor additive manufacturing processes was proposed by Raeker-Jordan et al. [18]. Their approach parses side-channel measurements to decode the embedded information for comparison against already observed patterns, effectively transmitting process quality information to the monitoring system. This design allows the monitoring system to remain disconnected from the network while still allowing the detection of unseen patterns. A novel multi-modal sabotage attack detection system for additive manufacturing machines was developed by Yu et al. [19]. In this system, multiple side channels, including acoustic, magnetic, vibration, and power, are utilized to estimate the system’s states. The proposed detection system was evaluated under real-world test cases and achieved an attack detection accuracy of 98.15%.

2.2. Semi-Supervised Learning Anomaly Detection Approaches

To address the challenges associated with inadequate and inaccurately labeled data in communication networks, Meng et al. proposed SemiADC, a GAN-based semi-supervised anomaly detection framework for dynamic communication networks [20]. The framework initially approximates the feature distribution of normal nodes with regularization from abnormal ones. Subsequently, it identifies and extracts nodes with inaccurate labels based on the learned feature distribution and structure-based temporal dependencies. These self-learning iterations proceed with mutual enhancement, ultimately enhancing the accuracy of anomaly detection. In another salient research work [21], a semi-supervised anomaly detection method called Dual Prototype Auto-Encoder (DPAE) was introduced for industrial surface inspection based on the encoder–decoder–encoder paradigm. During the training phase, DPAE incorporates both the dual prototype loss and reconstruction loss to guide the encoders’ learning process in producing latent vectors close to their prototypes. As a result, latent vectors corresponding to normal images tend to exhibit closer proximity, while a significant separation between latent vectors indicates the presence of an anomaly. Yang et al. [22] proposed a practical approach for anomaly detection called PLELog, which operates on log data and adopts a semi-supervised framework to avoid the labor-intensive process of manual labeling. It learns about historical anomalies via probabilistic label estimation. Furthermore, PLELog remains resilient to unstable log data through semantic embedding techniques and achieves efficient anomaly detection by implementing an attention-based GRU neural network.

2.3. Fast Fourier Transform in Deep Learning

A new method to address signal-based fault detection and diagnosis for rotating machinery was developed by Jalayer et al. [23]. For feature extraction, the proposed model integrates two distinct signal transform techniques, Continuous Wavelet Transform (CWT) and Fast Fourier Transform (FFT), alongside statistical features derived from the raw signal to capture all fault signatures. By employing a Convolutional Long Short-Term Memory (CLSTM) architecture, the framework efficiently handles multi-channel input data and learns its spatiotemporal characteristics. Through a sensitivity analysis conducted on the input channels, the research demonstrated that implementing these multi-domain features enhances the model’s accuracy. The study of Li et al. [24] focused on implementing an innovative ensemble deep learning model for monitoring cutting tool wear by utilizing audio sensors. An audio denoising technique combined with FFT and bandpass filters along with dependent component analysis was employed to extract tool wear data during machining. Subsequently, a detection model based on ensemble convolutional neural networks was trained and the audio signals were transformed into audio images using various algorithms. The findings affirmed the high accuracy of the proposed method in predicting tool wear values across various cutting conditions. Moreover, Li and Chen presented a novel approach to classify electroencephalogram (EEG) signals using a deep learning model [25]. It automates the learning of relevant features within a supervised learning framework. FFT is employed to generate the EEG matrix. Subsequently, a PCA neural network extracts hidden information from the frequency matrix of EEG signals. These deep features are then utilized as inputs to train a support vector machine model to recognize epileptic seizures.

3. Methodology

The details about the proposed anomaly detection system are presented in this section. First, the overview of the framework is shown in Figure 1, which includes two main stages:

  • Offline model training: In this phase, the robotic manipulator functions normally when executing designated tasks. The corresponding energy consumption data is utilized by the reconstruction network to learn the distribution of normal data patterns. As part of the model training process, a loss function is defined to measure the disparity between the reconstructed output and the original energy consumption data. This loss function quantifying the reconstruction error is then utilized in backpropagation to refine and enhance the performance of the network iteratively.

  • Online anomaly detection: A sufficiently trained model should demonstrate the ability to accurately reconstruct input samples following normal patterns with minimal reconstruction errors. However, notable changes may occur in the energy consumption profiles of the robotic manipulator when subjected to attacks or other anomalous conditions. Consequently, the network may fail to accurately reconstruct these anomalous inputs using the normal patterns learned in the offline phase, leading to higher errors. Thus, a static threshold is determined to distinguish anomalies from normal data. Specifically, an input sample is flagged as an anomaly if its reconstruction error value is higher than the predefined threshold.

In the following sections, we describe the essential processing steps involved in the offline training and online detection phases.

3.1. Data Processing

Considering that the robotic manipulator has a total of $J$ joints and that each joint has a sensor that monitors and records its consumed energy, the energy consumption data at a time instant $t$ can be described by the set $e_t = \{e_t^1, \ldots, e_t^J\}$. Assuming that the manipulator performs one operation starting from the first time instant ($t = 1$) to the $n$-th instant ($t = n$), the energy consumption sequence associated with this operation is defined as the set $\{e_1, \ldots, e_n\}$. To enrich the feature representation and fix the size of the input to the neural network, a sliding window technique is adopted, which splits the energy sequence into several overlapped segments. Each segment has a predetermined length $T$, the value of which also represents the sliding window size. The $i$-th segment $S_i$ is composed of $\{e_{i-T+1}, \ldots, e_{i-1}, e_i\}$. In this manner, the overlap length between two consecutive segments is $T - 1$, ensuring an offset of one single time instant between them. Thus, a data sequence of length $n$ is transformed into a total of $n - T + 1$ segments, making the input to the neural network one segment instead of a single data point. This design improves feature representation and detection performance by incorporating more statistical features [26].
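The segmentation described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `sliding_segments`, the toy readings, and the choice to flatten each window into a single vector of length T × J (oldest reading first) are assumptions made for the example, not details taken from the paper.

```python
from typing import List, Sequence


def sliding_segments(sequence: Sequence[Sequence[float]], T: int) -> List[List[float]]:
    """Split a length-n sequence of J-dimensional readings into n - T + 1
    overlapping segments, each flattened to a vector of length T * J."""
    n = len(sequence)
    if T < 1 or T > n:
        raise ValueError("window size T must satisfy 1 <= T <= n")
    segments = []
    # i is the 0-based index of the newest reading in the segment.
    for i in range(T - 1, n):
        window = sequence[i - T + 1 : i + 1]
        # Assumption: flatten the window reading by reading, oldest first.
        segments.append([value for reading in window for value in reading])
    return segments


# Toy sequence with n = 5 time instants and J = 2 joints (made-up values).
energy = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0]]
segments = sliding_segments(energy, T=3)
print(len(segments))  # n - T + 1 = 3
print(segments[0])    # [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
```

Consecutive segments share T − 1 readings, so each new time instant yields exactly one new segment.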

3.2. Time–Frequency Fusion Block

Figure 2 depicts the structure of the proposed Time–Frequency Fusion (TFF) block, including two branches designed to process input signals from different domains. The left branch primarily operates in the time domain, utilizing a single fully connected layer (colored in orange) to learn input patterns and project them into vectors within the latent space. Specifically, the size of the input to the current block is denoted as $c_{in}$, and the desired output size is represented as $c_t$, where the subscript $t$ represents the time domain processing. It is worth noting that, similar to the single fully connected layer, the input size $c_{in}$ and output size $c_t$ can share the same value or have different values. In the right branch, which is dedicated to frequency domain processing, the input is initially converted from the time domain to the frequency domain using the Fast Fourier Transform (FFT). Subsequently, two separate fully connected layers are employed to learn the real (highlighted in purple) and imaginary (highlighted in green) parts of the complex output obtained from the FFT operation. As the FFT operation maintains the data size, both the real and imaginary parts of the FFT results remain the size of $c_{in}$. Consequently, two layers in the frequency branch change the size from $c_{in}$ to $c_f^r$ and $c_f^i$, respectively. The subscript $f$ denotes the frequency domain branch, and superscripts $r$ and $i$ represent the real and imaginary parts of frequency features, respectively. In this work, the values of $c_t$, $c_f^r$, and $c_f^i$ are all the same; this value is denoted as $F$ in this paper, i.e., $c_t = c_f^r = c_f^i = F$. The inverse FFT operation is then applied to the complex values within the latent space, where features for reconstructing the real and imaginary parts have already been learned by the fully connected layers. Although the inverse FFT yields results in the time domain, the process inherently distills the knowledge and feature representation acquired from the frequency domain. Features from both branches are combined by summation to integrate information from both domains. Finally, another fully connected layer processes the combined features with size $F$ to generate the final output of the TFF block, the desired dimension of which is denoted by $c_{out}$. Additionally, in this work the value of $c_{out}$ is determined to be the same as $c_{in}$ of each block. In short, this new block enables learning features in the frequency domain to improve the reconstruction of normal time series.
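To make the data flow through the block concrete, the following NumPy sketch traces one forward pass. It is a minimal illustration under stated assumptions rather than the authors' implementation: the class name `TFFBlock`, the random weights standing in for trained parameters, the absence of activation functions, the value F = 12, and the decision to keep only the real part of the inverse FFT are all choices made for this example.

```python
import numpy as np


class TFFBlock:
    """Illustrative forward pass of a Time-Frequency Fusion block (no training)."""

    def __init__(self, c_in: int, F: int, c_out: int, seed: int = 0):
        rng = np.random.default_rng(seed)

        def dense(n_in: int, n_out: int):
            # Random weights stand in for learned parameters (assumption).
            return rng.normal(scale=0.1, size=(n_out, n_in)), np.zeros(n_out)

        self.time_fc = dense(c_in, F)   # time branch: c_in -> c_t = F
        self.real_fc = dense(c_in, F)   # frequency branch, real part: c_in -> F
        self.imag_fc = dense(c_in, F)   # frequency branch, imaginary part: c_in -> F
        self.out_fc = dense(F, c_out)   # fusion layer: F -> c_out

    @staticmethod
    def _apply(layer, x: np.ndarray) -> np.ndarray:
        weights, bias = layer
        return weights @ x + bias

    def forward(self, x: np.ndarray) -> np.ndarray:
        # Time-domain branch: one fully connected layer.
        time_feat = self._apply(self.time_fc, x)
        # Frequency-domain branch: the FFT keeps the input length c_in.
        spectrum = np.fft.fft(x)
        latent = (self._apply(self.real_fc, spectrum.real)
                  + 1j * self._apply(self.imag_fc, spectrum.imag))
        # Assumption: keep the real part of the inverse FFT as the feature vector.
        freq_feat = np.fft.ifft(latent).real
        # Fuse both branches by summation, then project to c_out.
        fused = time_feat + freq_feat
        return self._apply(self.out_fc, fused)


block = TFFBlock(c_in=15, F=12, c_out=15)  # F = 12 is a hypothetical value
y = block.forward(np.linspace(0.0, 1.0, 15))
print(y.shape)  # (15,)
```

Summation is possible because both branches produce vectors of the same length F, which is why the three latent sizes are set equal.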

3.3. U-TFF

Based on the block described above, the proposed U-TFF network architecture is shown in Figure 3. It is inspired by the well-known U-Net, which arranges an encoder and a decoder in a U-shaped architecture. The encoder forms the contracting path and consists of several TFF blocks, each with an output size smaller than its input. The encoder is responsible for capturing context and extracting high-level features from the input by representing them as vectors in the reduced-dimensional latent space. The decoder, which forms the expansive path, is the symmetric counterpart to the encoder. It consists of the same number of Time–Frequency Fusion blocks, with each block having an increased output dimension compared to its input. The purpose of the expansive path is to restore the data sequence dimension while retaining the context learned in the contracting path in order to complete the reconstruction process.

One of the key features of the proposed network is the use of skip connections (the gray dashed arrows) between corresponding blocks in the encoder and decoder. More specifically, the input to each block of the decoder within the network is the combination of the output from the previous block and the output from the corresponding same-level block of the encoder. The process of such a combination is represented as the red “Concat Block” in Figure 3, where the element-wise sum between two outputs is implemented. These skip connections facilitate the direct flow of information from the encoder to the decoder in order to prevent the loss of essential information during the downsampling process in the encoder. Skip connections also mitigate the vanishing gradient problem commonly encountered during training of deep neural networks. By providing shortcut connections, gradients can flow more directly from the output to the input layers, thereby addressing the issue of diminishing gradients. This results in more stable and efficient network training.

Because the neural network is intended for reconstruction, the loss function used for model training is defined as the 2-norm of the error (i.e., the mean square error) between the original input segment $S_i$ and the reconstructed segment $\bar{S}_i$:

$$L_{rec} = \left\| S_i - \bar{S}_i \right\|_2 \qquad (1)$$

Anomaly detection during the online testing phase is also based on the reconstruction error defined in Equation (1). The detector applies a static threshold to generated reconstruction errors. If the error is higher than the predefined threshold, the corresponding data point is classified as an anomaly. The value of the threshold is determined so as to be slightly larger than the maximum reconstruction error of the training dataset. Such a strategy can mitigate misclassification problems when other unobserved normal patterns produce larger reconstruction errors.
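The thresholding rule can be illustrated with a short Python sketch. The function names, the toy error values, and in particular the 5% margin used to make the threshold "slightly larger" than the maximum training error are assumptions for this example; the text does not state how the margin is chosen. The error is computed as the 2-norm written in Equation (1).

```python
import math
from typing import List, Sequence


def reconstruction_error(segment: Sequence[float], reconstruction: Sequence[float]) -> float:
    """2-norm of the difference between a segment and its reconstruction."""
    return math.sqrt(sum((s - r) ** 2 for s, r in zip(segment, reconstruction)))


def fit_threshold(train_errors: Sequence[float], margin: float = 1.05) -> float:
    """Static threshold set slightly above the largest training error.
    The multiplicative margin of 1.05 is a hypothetical choice."""
    return margin * max(train_errors)


def detect(errors: Sequence[float], threshold: float) -> List[bool]:
    """Flag an instant as anomalous when its error exceeds the threshold."""
    return [error > threshold for error in errors]


# Made-up reconstruction errors for normal training data and for test data.
train_errors = [0.20, 0.35, 0.50]
threshold = fit_threshold(train_errors)
flags = detect([0.30, 0.52, 0.90], threshold)
print(flags)  # [False, False, True]
```

The second test value exceeds every training error yet stays below the threshold, which is the case the margin is meant to cover: an unseen normal pattern with a slightly larger error is not flagged.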

4. Experiment

This section describes the experiments undertaken to assess and validate the proposed anomaly detection system. It includes details concerning the robot manipulator, the programmed tasks during operations, and a simulated attack injected into the system to mimic potential anomalies encountered in industrial settings.

4.1. Robotic Manipulator

As the schematic in Figure 4 shows, a Lynxmotion robot manipulator with four degrees of freedom [27] was chosen in this study for experimentation. This compact robot arm comprises five Lynxmotion Smart Servo (LSS) motors, each of which is considered as a joint of the robot arm, with an LSS adapter board affixed to facilitate communication. The operational speed range of the motors falls between 1 and 4 revolutions per minute (RPM). Each motor is equipped with side-channel sensors to track and record various states such as position, speed, current, and voltage. However, the calculation of energy consumption in this work solely relies on the current and voltage values. A Raspberry Pi 3 Model B+ is used to send control signals to command the robotic manipulator to perform designed tasks as well as to monitor and record the manipulator’s status for subsequent anomaly detection.

4.2. Executed Tasks

The robotic manipulator was commanded to perform identical tasks across six distinct locations to emulate industrial manufacturing operations, each offering opportunities for injecting anomalies. At each location, the manipulator was commanded to extend its end effector to the center of the designated area, execute a gripping motion without picking any objects during normal operations, and then return to its original posture before proceeding to the next location. As depicted in Figure 4, these six locations are denoted by numbers 1 to 6, arranged in a circle. The manipulator initiates its operations at location 1 and progresses counterclockwise. Upon completing the task at location 6, the manipulator transitions back to location 1 to commence a new cycle. Random motions were introduced between each predefined position to diversify the dataset and enhance the model’s generalization capabilities, representing multiple feasible trajectories for accomplishing the same task. Consequently, despite executing identical tasks, the robotic manipulator still exhibits slight variations in energy consumption profiles.

4.3. Simulated Attack

Specifically, a physical attack is considered in this work to represent a concerning anomaly. This experiment replicates instances in which the system encounters undesired forces, possibly caused by external factors such as physical tampering that can disrupt the normal operation of Cyber–Physical Systems (CPS). Such attacks may lead to unexpected variations in the manipulator’s operations and present potential anomalies to the entire production line. It is important to highlight that visual information is unavailable during operations in the simulated attack scenario, and anomalies cannot be observed from the final product. This suggests that the attacks are attempted in order to undermine system operations while remaining challenging to identify using conventional methods. Moreover, the assigned tasks are still completed during the attacks, further complicating the detection process.

In a standard operation, the robotic manipulator executes its tasks without gripping any objects. However, during a physical attack a single object positioned at the designated location is grasped by the manipulator’s end effector, and the weight of the mass gives rise to a different profile than normal operation. In this experiment, we used two segments of PVC pipes weighing 33 g and 250 g to simulate physical attacks of varying intensity levels. These weights correspond to gravitational forces of approximately 0.32 N and 2.6 N, respectively. An operation where physical attacks are randomly introduced into tasks was conducted to evaluate the effectiveness of detecting attacks. Each physical attack event involving gripping and releasing objects by the robotic manipulator and transitioning the end effector to the next location lasted 10 s. The experiment with physical attacks included as part of the testing data was conducted over a total duration of approximately 30 min. The design of random anomaly injection allowed for replicating real-world scenarios where attacks may occur contingently and impact different segments of program execution.

5. Results and Discussion

This section provides a detailed analysis of our experimental findings and the results obtained from anomaly detection, both qualitatively and quantitatively. Following the semi-supervised anomaly detection method, the U-TFF model was trained exclusively on normal data. The training dataset comprised time series data collected over a duration of 35 min at a frequency of 3 Hz. This dataset included various sensor readings, including the position, speed, current, and voltage of all five motors. These values were recorded at each time instant using the side-channel mechanism. Energy consumption was calculated as the product of the current and voltage.
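As a small worked example of the two quantities mentioned above, the sketch below computes per-motor energy readings as the product of current and voltage and the nominal number of training samples implied by the stated duration and sampling rate. The function name and the current and voltage values are hypothetical, and the sample count assumes perfectly regular sampling.

```python
from typing import List, Sequence


def energy_readings(currents: Sequence[float], voltages: Sequence[float]) -> List[float]:
    """Per-motor energy consumption reading: current multiplied by voltage."""
    return [current * voltage for current, voltage in zip(currents, voltages)]


# Hypothetical current and voltage readings for the five motors at one instant.
currents = [0.25, 0.5, 0.5, 0.25, 0.125]
voltages = [12.0, 12.0, 12.0, 12.0, 12.0]
print(energy_readings(currents, voltages))  # [3.0, 6.0, 6.0, 3.0, 1.5]

# Nominal number of training time instants: 35 min sampled at 3 Hz.
SAMPLE_RATE_HZ = 3
TRAIN_MINUTES = 35
n_samples = TRAIN_MINUTES * 60 * SAMPLE_RATE_HZ
print(n_samples)  # 6300
```

The nominal count of 6300 is consistent with the "over 6000 instances" of normal operation reported for the dataset.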

Figure 5 presents a segment of the collected energy consumption data. The index of the motor corresponds to its relative position in the manipulator, where the end effector is labeled as Motor 5 and the base motor of the robotic manipulator is denoted as Motor 1. Notably, significant variations in energy consumption can be observed among the different motors, with motors 2 and 3 exhibiting higher energy consumption than others. The training dataset was randomly divided into 80% for training and 20% for validation to ensure the model’s robustness and generalization.

The sliding window size was determined to be 3, meaning that each data segment includes one current value and two previous values for each motor. Therefore, at each time the dimension of input to the proposed network is 15. Considering that a larger sliding window size leads to better detection performance but lower response speed, this value was selected in order to achieve a balance between detection accuracy and inference speed. The encoder of the network was developed using three TFF blocks with output dimensions of 12, 9, and 6, respectively; correspondingly, the decoder also consisted of three blocks, with output dimensions of 6, 9, and 12, respectively.

The model training utilized the Adam optimizer to minimize the loss function defined in Equation (1), with 1000 epochs completed on a GPU workstation equipped with an Intel(R) Xeon(R) Gold 5220R CPU @ 2.20 GHz and two NVIDIA RTX A6000 GPUs (Santa Clara, CA, USA).

5.1. Benchmark Model

To evaluate the effectiveness of the proposed anomaly detection network, comparison experiments were conducted on the custom dataset with four benchmark models. These models included the Deep Auto-Encoder (DAE) from [28], Generative Adversarial Network (GAN) with one and multiple discriminators from [26], Anomaly Transformer from [29], and TadGAN from [30].

DAE builds the reconstruction network using solely fully connected layers. It consists of one encoder and one decoder, but without frequency domain processing or a skip connection. Moreover, to ensure a fair comparison, the sliding window size of DAE was also set to three and the input size was set to 15. Both the encoder and decoder have one fully connected hidden layer. To enable model training and online testing, the GAN model employs a second submodule called the discriminator in addition to the main reconstruction network, which is called the generator. The discriminator network, representing a more complex mathematical representation, can further improve the reconstruction ability of the generator. The Anomaly Transformer introduces a unique anomaly attention mechanism to calculate the association discrepancy. Employing a minimax strategy, it enhances the ability to distinguish between normal and abnormal associations; more specifically, it utilizes the distance between the association discrepancies measured by the KL divergence as the detection criterion. TadGAN utilizes LSTM recurrent neural networks in both the generator and discriminator to capture the temporal correlations of time series distributions. The model is trained using a cycle consistency loss, facilitating efficient reconstruction.

5.2. Evaluation Metrics

The quantitative evaluation of the proposed model and comparison experiments were conducted in two distinct ways: instant-level detection and event-level detection. Instant-level detection refers to evaluating each individual data point in a time series independently to determine whether it is normal or anomalous. This method treats data at each time instant as separate individual items without considering the relationship with preceding or succeeding data points. The primary goal is to identify anomalies by examining each point in isolation, ensuring that any deviation from normal behavior is detected at the moment it occurs. This method allows for real-time anomaly detection, which is particularly useful in applications where immediate identification of anomalies is critical. Conversely, event-level detection focuses the analysis on periods between two non-successive time instants, where each period contains either solely normal or anomalous data. For example, if an attack only occurs between the first and the twentieth time instants during one operation, this period is labeled as an attack event. In this work, if any instant within an attack event is identified as anomalous, then the entire event is considered a true positive. Conversely, if a time instant during a normal event is mistakenly identified as an anomaly, then the entire event is counted as a false positive. The consideration behind event-level detection is that detecting anomalies at every instant during an attack event is exceptionally difficult and normally unnecessary. In general, detecting one or a few anomalous data instances during an attack event is already sufficient for the purposes of raising an alert.
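The event-level counting rule can be sketched as follows. The helper names and the toy label sequences are assumptions for illustration. The text defines only true and false positives explicitly; counting an undetected attack event as a false negative and an unflagged normal event as a true negative is the natural complement and is assumed here.

```python
from itertools import groupby
from typing import Dict, List, Tuple


def split_events(labels: List[int]) -> List[Tuple[int, int, int]]:
    """Group consecutive instants with the same ground-truth label into events.
    Returns (label, start, end) tuples with an exclusive end index."""
    events = []
    start = 0
    for label, run in groupby(labels):
        length = len(list(run))
        events.append((label, start, start + length))
        start += length
    return events


def event_level_counts(labels: List[int], predictions: List[int]) -> Dict[str, int]:
    """Count events: an event is 'flagged' if any instant in it is predicted anomalous."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for label, start, end in split_events(labels):
        flagged = any(predictions[start:end])
        if label == 1:
            counts["TP" if flagged else "FN"] += 1
        else:
            counts["FP" if flagged else "TN"] += 1
    return counts


# Toy ground truth (1 = attack) and instant-level predictions (1 = anomaly).
labels      = [0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0]
predictions = [0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
print(event_level_counts(labels, predictions))
# {'TP': 1, 'FP': 1, 'TN': 2, 'FN': 1}
```

In the toy sequence, the first attack event counts as detected even though only one of its three instants is flagged, which is exactly the relaxation that event-level evaluation provides.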

More specifically, we use three evaluation metrics to describe detection performance quantitatively: recall, precision, and accuracy. These metrics are defined below using the values of true positive (TP), true negative (TN), false positive (FP), and false negative (FN) [31].

$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (2)$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (3)$$

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (4)$$

Recall represents the ratio of detected anomalies to all the abnormal samples in the ground truth dataset. Conversely, precision signifies the percentage of correctly predicted abnormal samples relative to the total samples predicted as positive. Accuracy in anomaly detection refers to the proportion of correct predictions among all the samples, including both normal samples and anomalies.
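The three metrics translate directly into code. The sketch below uses hypothetical function names and, as a check, the instant-level counts reported with Table 1 in Section 5.3 (647 true positives, 204 false positives, 131 false negatives). The number of true negatives is not quoted in the text, so accuracy is defined but not evaluated on the reported data.

```python
def recall(tp: int, fn: int) -> float:
    """Share of true anomalies that were detected."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Share of predicted anomalies that are true anomalies."""
    return tp / (tp + fp)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all samples, normal and anomalous, that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Instant-level counts reported for physical attack detection.
TP, FP, FN = 647, 204, 131
print(round(recall(TP, FN), 4))     # 0.8316
print(round(precision(TP, FP), 4))  # 0.7603
```

Both values match the instant-level recall and precision reported for U-TFF.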

5.3. Results and Discussion

Figure 6 presents the anomaly detection result using the proposed U-TFF network. In this figure, background colors (white and yellow) indicate the event status of the manipulator in the ground truth, corresponding to normal operations and physical attacks, respectively. The x-axis of the figure represents time instants during operation, while the y-axis represents the reconstruction error generated by the proposed anomaly detection network. A higher value of reconstruction error indicates a higher likelihood of anomaly presence. The maximum reconstruction error of all training samples determines the threshold used in this work to distinguish anomalies from normal patterns. It has a value of 0.54 and is represented by the horizontal black dashed line in Figure 6. Therefore, all data points with reconstruction errors higher than the threshold are predicted as anomalies in the instant-level anomaly detection, and are denoted by red asterisks in the figure.

Table 1 shows the confusion matrix of instant-level physical attack detection. As shown by Figure 6, some normal patterns are mistakenly detected as anomalies around time instant 1300. The proposed network produces 204 false positives for physical attack detection. On the other hand, most anomalous data instances are successfully detected by the proposed network, leading to 647 true positive samples. Moreover, the anomaly detection network yields a small number of false negative samples (131), confirming the detection model’s utility.

Table 2 summarizes the quantitative results for the three metrics of recall, precision, and accuracy in order to evaluate the performance of the proposed framework. For instant-level detection, the anomaly detection model exhibits an accuracy of 0.933. However, as shown in Figure 6, U-TFF reaches a precision of only 0.7603 due to the false positive samples. Most notably, the model achieves a recall of 0.8316, indicating strong performance in detecting true positive samples. In event-level detection, the network exhibits a successful detection rate (recall) of 0.9375 for physical attack events. This matches the observation from Figure 6 that only one attack event failed to be detected, which occurred immediately after time instant 1500. On the other two metrics, precision and accuracy, it achieves values of 0.9375 and 0.9394, respectively.

Table 3 presents the comparison results against the four benchmark models, with the best performer under each performance metric highlighted in bold. Regarding accuracy, the GAN model with one discriminator performs best (0.9368), but only marginally: it is 0.0018 and 0.0038 higher than the GAN model with five discriminators (0.935) and the proposed network (0.933), respectively. Anomaly Transformer ranks fourth with an accuracy of 0.8881, followed by DAE (0.8554) and TadGAN (0.8522). On the other hand, Anomaly Transformer achieves the best precision, with a value of 0.9249. This value is approximately 0.1 higher than that of the GAN model with only one discriminator (0.8302), which ranks second. The proposed U-TFF model detects anomalies with a precision of 0.7603, which is slightly lower than the GAN with five discriminators (0.8044) but higher than DAE (0.7346) and TadGAN (0.6142). It is worth noting that the high precision of Anomaly Transformer comes with a small number of true positive samples, as confirmed by its low recall (0.342); that is, Anomaly Transformer detects only a small proportion of the anomalous patterns in the ground truth. Conversely, our network achieves the best recall, with a value of 0.8316, followed by the two GAN models, which achieve 0.787 (one discriminator) and 0.813 (five discriminators). The deep autoencoder (DAE) model performs poorly on this metric, with a recall of 0.226. The above comparison supports the feasibility and effectiveness of the proposed U-TFF-based anomaly detection network for energy consumption auditing of robotic manipulators.

6. Conclusions

This paper introduces a novel approach to anomaly detection in robotic manipulators. It adopts side-channel energy auditing and utilizes the time series of energy consumption profiles as the sole input data for analysis. This simple design and robust monitoring scheme mitigate the risks caused by operational anomalies, malicious attacks, and other threats targeting networked sensors, which can pose significant challenges to accurate online data processing and analysis. A reconstruction network named U-TFF is proposed to learn energy consumption patterns and detect anomalies during the operation of robotic manipulators. U-TFF consists of one encoder that extracts features as low-dimensional vectors in the latent space and one decoder that reconstructs the input from the associated latent vectors. The encoder includes several carefully designed blocks that distill frequency domain features from temporal inputs while progressively reducing the feature dimension. Conversely, the decoder comprises the same number of blocks as the encoder but with an increasing output size to reconstruct the input. More specifically, the Fast Fourier Transform (FFT) is first applied to the temporal input to convert it from the time domain to the frequency domain. Then, two fully connected layers are used to extract features from the real and imaginary parts, respectively. The inverse FFT is then used to transform the extracted frequency features back to the time domain. This process reveals information that is predominant in the frequency domain; when combined with conventional data processing in the time domain, it is capable of producing a more comprehensive feature set for anomaly detection spanning both domains. A fully connected output layer is employed to reconstruct the input using feature vectors that carry information from both the time and frequency domains.
To further improve the reconstruction ability of the network, a U-shape architecture is implemented along with skip connections to link two encoder and decoder blocks at the same level of the feature hierarchy. The sliding window technique is applied to sequences of energy consumption in order to enrich the feature representations of the input. The network is trained in a semi-supervised manner using only samples from normal operations. During online testing, inputs with anomalous data are expected to yield a significantly large reconstruction error, which can then be used as an effective indicator of the presence of anomalies.
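The sliding-window step and the frequency-domain branch summarized above can be sketched with NumPy. This is a simplified illustration under stated assumptions, not the U-TFF implementation: the fully connected layers are reduced to plain weight matrices without biases or activations, only the real part of the inverse FFT is kept, the fusion with the time-domain branch and the U-shaped skip connections are omitted, and the names `sliding_windows` and `frequency_branch`, the window length, and the random weights are all hypothetical.

```python
import numpy as np


def sliding_windows(series, window):
    """Cut a 1-D series into overlapping windows of length `window` (stride 1)."""
    series = np.asarray(series, dtype=float)
    return np.lib.stride_tricks.sliding_window_view(series, window)


def frequency_branch(x, w_real, w_imag):
    """FFT -> separate linear maps on real and imaginary parts -> inverse FFT."""
    spectrum = np.fft.fft(x)              # time domain -> frequency domain (last axis)
    real_feat = spectrum.real @ w_real    # linear map standing in for one dense layer
    imag_feat = spectrum.imag @ w_imag    # linear map standing in for the other
    mixed = real_feat + 1j * imag_feat
    return np.fft.ifft(mixed).real        # back to the time domain; real part only


rng = np.random.default_rng(0)
energy = rng.normal(size=10)              # made-up energy consumption readings
windows = sliding_windows(energy, window=4)

w_real = rng.normal(size=(4, 4))
w_imag = rng.normal(size=(4, 4))
features = frequency_branch(windows, w_real, w_imag)
print(windows.shape, features.shape)
```

With identity matrices in place of the weights, the branch reduces to an FFT followed by its inverse and returns the input windows unchanged, which is a convenient sanity check for the transform round trip.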

For experimental evaluations of the proposed energy auditing system, we utilized a Lynxmotion robotic manipulator programmed to perform prescribed tasks as a test bed. Physical attacks were simulated by gripping objects of different weights that were injected randomly during the operation of a series of tasks. A custom dataset consisting of normal and physical attack data was compiled for online testing of the proposed anomaly detection model, and experiments were conducted to compare the present U-TFF model with four other benchmarks. At the instant level, the proposed network successfully detected 83% of abnormal data instants (recall of 0.8316) while maintaining a precision of 0.7603 and an accuracy of 0.933. For event-level detection, the proposed model achieved over 0.93 on all three evaluation metrics, demonstrating well-balanced performance.

Future investigations will focus on real-time implementation on edge computing devices [32], deployment of the proposed system in relevant industrial settings, and performance assessment under varying operational conditions. In addition, to further improve the detection performance, emerging network components such as attention mechanisms will be explored within the proposed network [33].

Author Contributions

Conceptualization, G.S., S.H.H., and Y.W.; methodology, G.S. and S.H.H.; software, G.S.; formal analysis, G.S.; investigation, G.S., S.H.H., T.K. and Y.W.; data curation, T.K.; writing—original draft preparation, G.S.; writing—review and editing, S.H.H. and Y.W.; supervision, Y.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to privacy restrictions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Liu, T.; Xu, C.; Qiao, Y.; Jiang, C.; Yu, J. Particle Filter SLAM for Vehicle Localization. arXiv 2024, arXiv:2402.07429.
  2. Bui, H.D.; Nguyen, H.; La, H.M.; Li, S. A deep learning-based autonomous robot manipulator for sorting application. In Proceedings of the 2020 Fourth IEEE International Conference on Robotic Computing (IRC), Virtual, 9–11 November 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 298–305.
  3. Zhang, K.; Hutson, C.; Knighton, J.; Herrmann, G.; Scott, T. Radiation tolerance testing methodology of robotic manipulator prior to nuclear waste handling. Front. Robot. AI 2020, 7, 6.
  4. Al-Garadi, M.A.; Mohamed, A.; Al-Ali, A.K.; Du, X.; Ali, I.; Guizani, M. A survey of machine and deep learning methods for internet of things (IoT) security. IEEE Commun. Surv. Tutor. 2020, 22, 1646–1685.
  5. Song, G.; Hong, S.H.; Kyzer, T.; Wang, Y. An Energy Consumption Auditing Anomaly Detection System of Robotic Manipulators based on a Generative Adversarial Network. In Proceedings of the Annual Conference of the PHM Society, Salt Lake City, UT, USA, 28 October–2 November 2023; Volume 15.
  6. Jung, W.; Feng, Y.; Khan, S.A.; Xin, C.; Zhao, D.; Zhou, G. DeepAuditor: Distributed Online Intrusion Detection System for IoT devices via Power Side-channel Auditing. In Proceedings of the 2022 21st ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN), Virtual, 4–6 May 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 415–427.
  7. Zhang, Y.; Zhu, M.; Gui, K.; Yu, J.; Hao, Y.; Sun, H. Development and application of a monte carlo tree search algorithm for simulating da vinci code game strategies. arXiv 2024, arXiv:2403.10720.
  8. Zang, H.; Li, S.; Dong, X.; Ma, D.; Dang, B. Evaluating the social impact of ai in manufacturing: A methodological framework for ethical production. Acad. J. Sociol. Manag. 2024, 2, 21–25.
  9. Zhu, M.; Zhang, Y.; Gong, Y.; Xu, C.; Xiang, Y. Enhancing Credit Card Fraud Detection: A Neural Network and SMOTE Integrated Approach. J. Theory Pract. Eng. Sci. 2024, 4, 23–30.
  10. Dong, X.; Dang, B.; Zang, H.; Li, S.; Ma, D. The prediction trend of enterprise financial risk based on machine learning arima model. J. Theory Pract. Eng. Sci. 2024, 4, 65–71.
  11. Audibert, J.; Michiardi, P.; Guyard, F.; Marti, S.; Zuluaga, M.A. Usad: Unsupervised anomaly detection on multivariate time series. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual, 23–27 August 2020; pp. 3395–3404.
  12. Zhang, Y.; Zhu, M.; Gong, Y.; Ding, R. Optimizing science question ranking through model and retrieval-augmented generation. Int. J. Comput. Sci. Inf. Technol. 2023, 1, 124–130.
  13. Su, J.; Jiang, C.; Jin, X.; Qiao, Y.; Xiao, T.; Ma, H.; Wei, R.; Jing, Z.; Xu, J.; Lin, J. Large Language Models for Forecasting and Anomaly Detection: A Systematic Literature Review. arXiv 2024, arXiv:2402.10350.
  14. Liu, C.; He, S.; Zhou, Q.; Li, S.; Meng, W. Large Language Model Guided Knowledge Distillation for Time Series Anomaly Detection. arXiv 2024, arXiv:2401.15123.
  15. Zhu, M.; Zhang, Y.; Gong, Y.; Xing, K.; Yan, X.; Song, J. Ensemble Methodology: Innovations in Credit Default Prediction Using LightGBM, XGBoost, and LocalEnsemble. arXiv 2024, arXiv:2402.17979.
  16. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015: 18th International Conference, Part III 18, Munich, Germany, 5–9 October 2015; Springer: Munich, Germany, 2015; pp. 234–241.
  17. Lightbody, D.; Ngo, D.M.; Temko, A.; Murphy, C.C.; Popovici, E. Dragon_Pi: IoT Side-Channel Power Data Intrusion Detection Dataset and Unsupervised Convolutional Autoencoder for Intrusion Detection. Future Internet 2024, 16, 88.
  18. Raeker-Jordan, N.; Chung, J.; Kong, Z.J.; Williams, C. Ensuring additive manufacturing quality and cyber–physical security via side-channel measurements and transmissions. J. Manuf. Syst. 2024, 73, 275–286.
  19. Yu, S.Y.; Malawade, A.V.; Chhetri, S.R.; Al Faruque, M.A. Sabotage attack detection for additive manufacturing systems. IEEE Access 2020, 8, 27218–27231.
  20. Meng, X.; Wang, S.; Liang, Z.; Yao, D.; Zhou, J.; Zhang, Y. Semi-supervised anomaly detection in dynamic communication networks. Inf. Sci. 2021, 571, 527–542.
  21. Liu, J.; Song, K.; Feng, M.; Yan, Y.; Tu, Z.; Zhu, L. Semi-supervised anomaly detection with dual prototypes autoencoder for industrial surface inspection. Opt. Lasers Eng. 2021, 136, 106324.
  22. Yang, L.; Chen, J.; Wang, Z.; Wang, W.; Jiang, J.; Dong, X.; Zhang, W. Semi-supervised log-based anomaly detection via probabilistic label estimation. In Proceedings of the 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), Madrid, Spain, 22 May–30 May 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1448–1460.
  23. Jalayer, M.; Orsenigo, C.; Vercellis, C. Fault detection and diagnosis for rotating machinery: A model based on convolutional LSTM, Fast Fourier and continuous wavelet transforms. Comput. Ind. 2021, 125, 103378.
  24. Li, Z.; Liu, X.; Incecik, A.; Gupta, M.K.; Królczyk, G.M.; Gardoni, P. A novel ensemble deep learning model for cutting tool wear monitoring using audio sensors. J. Manuf. Process. 2022, 79, 233–249.
  25. Li, M.; Chen, W. FFT-based deep feature learning method for EEG classification. Biomed. Signal Process. Control 2021, 66, 102492.
  26. Song, G.; Hong, S.H.; Kyzer, T.; Wang, Y. Energy consumption auditing based on a generative adversarial network for anomaly detection of robotic manipulators. Future Gener. Comput. Syst. 2023, 149, 376–389.
  27. Nantel, E. 4 DoF Robotic Arm. Available online: https://wiki.lynxmotion.com/info/wiki/lynxmotion/view/ses-v2-arms/lss-4dof-arm/#HSpecifications (accessed on 26 May 2024).
  28. Hong, S.H.; Kyzer, T.; Cornelius, J.; Zahiri, F.; Wang, Y. Intelligent anomaly detection of robot manipulator based on energy consumption auditing. In Proceedings of the 2022 IEEE Aerospace Conference (AERO), Big Sky, MT, USA, 5–12 March 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–11.
  29. Xu, J.; Wu, H.; Wang, J.; Long, M. Anomaly transformer: Time series anomaly detection with association discrepancy. arXiv 2021, arXiv:2110.02642.
  30. Geiger, A.; Liu, D.; Alnegheimish, S.; Cuesta-Infante, A.; Veeramachaneni, K. Tadgan: Time series anomaly detection using generative adversarial networks. In Proceedings of the 2020 IEEE International Conference on Big Data (Big Data), Virtual, 10–13 December 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 33–43.
  31. Xiang, Y.; Huo, S.; Wu, Y.; Gong, Y.; Zhu, M. Integrating AI for Enhanced Exploration of Video Recommendation Algorithm via Improved Collaborative Filtering. J. Theory Pract. Eng. Sci. 2024, 4, 83–90.
  32. Guo, J.; Zhang, S.; Qian, Y.; Wang, Y. An adaptively weighted loss-enabled lightweight teacher–student model for real-time railroad inspection on edge devices. Neural Comput. Appl. 2023, 35, 24455–24472.
  33. Liu, T.; Xu, C.; Qiao, Y.; Jiang, C.; Chen, W. News Recommendation with Attention Mechanism. arXiv 2024, arXiv:2402.07422.

Figure 1. The pipeline of the proposed anomaly detection framework and system.

Figure 2. The architecture of the Time–Frequency Fusion block.

Figure 3. Overview of the proposed U-TFF network.

Figure 4. The schematic with Lynxmotion robotic manipulator.

Figure 5. Example of the time series data for energy consumption in the custom dataset.

Figure 6. Physical attack detection results using U-TFF.

Table 1. Confusion matrix results for physical attack detection.

| Ground truth \ Predicted label | Positive | Negative |
|---|---|---|
| Positive | 647 | 131 |
| Negative | 204 | 4018 |

Table 2. Evaluation metrics for physical attack detection results.

| Detection manner | Recall | Precision | Accuracy |
|---|---|---|---|
| Instant-level | 0.8316 | 0.7603 | 0.933 |
| Event-level | 0.9375 | 0.9375 | 0.9394 |

Table 3. Comparison results on the custom dataset.

| Model | Recall | Precision | Accuracy |
|---|---|---|---|
| DAE | 0.226 | 0.7346 | 0.8554 |
| GAN with one D | 0.787 | 0.8302 | **0.9368** |
| GAN with five D | 0.813 | 0.8044 | 0.935 |
| Anomaly Transformer | 0.342 | **0.9249** | 0.8881 |
| TadGAN | 0.3373 | 0.6142 | 0.8522 |
| Proposed | **0.8316** | 0.7603 | 0.933 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.


© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).