Introduction

TCPWave's flexible and extensive DNS TITAN platform offers superior protection & performance. It is powered by machine learning and artificial intelligence. The advanced threat intelligence features embedded into the Network Security Management (NSM) models of the TCPWave IPAM are acclaimed by many existing customers. The threat intelligence of TCPWave, along with the comprehensive features, does not require any separate license to realize the benefits. The DNS TITAN, powered with a performance-tuned Atlantis Deep Learning model, is a part of the base license of the TCPWave DDI offering.

Domain Generation Algorithms Detection Challenges

Malicious attackers use Domain Generation Algorithms (DGA) techniques to bypass traditional security mechanisms such as blacklisting of domains, reverse engineering malware to detect hardcoded Command and Control server’s IP addresses and domain names.

 

With DGA, attackers generate many random domain names, which the malware attempts to connect using DNS queries. With a shared seed and DGA between the malware and the attacker, the attacker registers one of the generated domains to resolve the Command and Control (C&C) server. The malware tries to resolve the generated domains until it discovers the Command-and-Control server and starts communicating with it. This technique of random generation and disposal of registered domains, also known as Domain-Fluxing or Fast-Flux, makes it difficult for the security teams to blacklist domains and takedown the C&C server.

 

Diagram

Description automatically generated

There are about 56 known DGA families, according to Netlab. Some of them include gameover, cryptolocker, abcbot, Mirai, etc. The classification is based on the parameters like generated pattern length, characters used, notation, etc., of the domains. Some examples are mentioned below:

 

DGA Family

Example

Cryptolocker

nvjwoofansjbh.ru

Gameover

14dtuor1aubbmjhgup7915tlinc.net

Abcbot

knjpeuzyr.tk

Mirai

xvrvdsuhphjg.tech

 

Having DGA infected malware devices inside the network poses the risk of data exfiltration using DNS tunnels. Detection methodologies like log analysis of the suspicious DNS traffic are laborious, time-consuming, and prone to evaluation errors.

TCPWave Titan

DNS TITAN is one of the solutions that TCPWave provides to combat the DGAs, which detects the traffic, alerts security engineers, and blocks malicious IP. DNS TITAN uses In-House built tunnel detection Machine Learning (ML) algorithms trained using massive and varied DNS data, thereby helping it learn and detect the malicious DNS traffic flowing through the DNS pathways in your organization.

 

Machine Learning techniques are broadly categorized into Supervised, Unsupervised, and Reinforcement learning based on the signal or feedback available to train the models. Supervised learning is machine learning, in which machines are trained on labeled Data.  Unsupervised learning uses ML algorithms to analyze the patterns and clusters in the Data. And Reinforcement learning trains the sequence of actions to maximize the reward.

 

The TCPWave Threat Intelligence leverages Supervised Machine Learning techniques like Decision Tree (DT), Random Forest (RF) Support Vector Machine (SVM), and K-Nearest Neighbors (KNN), also Deep Learning techniques like Convolution Neural Networks (CNN), Long-and Short-Term Memory (LSTM) Networks and Artificial Neural Networks (ANN) to detect malicious traffic flowing on DNS.

 

The TCPWave has trained and validated the models by combining more than one technique along with one technique at a time to add an extra edge of utilizing collective judgment of multiple techniques in one model. Post various iterations, TCPWave built ATLANTIS Model (The hybrid model with CNN, LSTM, and ANN techniques).

 

Meanwhile, Deep Learning techniques have demonstrated the ability to find and extract these structural, linguistic, and statistical features. As the DGA generated domains, random alphanumeric character domains, are different from the legitimate traffic in structural, linguistic, and statistical properties, ML models are trained on derived feature queries. DL Models are trained on encoded query vectors. 3.5M records of malicious and non-malicious domains are used to train the models, and these models perform well on their various training and testing sets. They can generalize to new DGA families or new versions of previously seen families. The following diagram illustrates how TCPWave IPAM detects and prevents DGA in real-time:

 

Diagram

Description automatically generated


The following machine learning techniques are used in TCPWave IPAM:

DGA Detection Configuration in TCPWave IPAM

The TCPWave IPAM allows the administrator to define the Network Security Monitoring templates assigned to individual TCPWave DNS Remote appliances. Network Security Monitoring templates will enable the admin to select the desired Machine Learning model under “Enable Anomaly Detection”, to detect and prevent the DGA attack. The following Machine learning models are available:

Graphical user interface, application, Word

Description automatically generated

 

Now associate the NSM template to the TCPWave DNS Remote appliances from the TCPWave DNS Appliances page.

 

Graphical user interface, text, application

Description automatically generated

Viewing Alerts and Logs

The system generates alerts in the TCPWave IPAM Dashboard when an attack is detected. On setting the global parameters from the TCPWave IPAM Global Policy Management, the system prevents the DGA attacks in real-time.

 

 

TCPWave DDI Admin has the privilege to view the DGA activity by fetching the Suspicious Query Log (ML) report of a remote DNS appliance.

 

A screenshot of a computer

Description automatically generated

Model Evaluation

The model evaluation section provides the metrics, model performance on various DGA families. TCPWave IPAM set of models predicts the accuracy of 88% - 98%. The following sub-sections describe one of the best model performances.

Model Performance

The model has performed great across the metrics on training and test datasets. Clocked Training accuracy of 0.9778, And on test accuracy is 0.9790, Precision, Recall and F1 Score 0.9867, 0.9799, 0.9833.

Performance on DGA families

We have validated the model performance on different DGA families and the following are the accuracy and False Negative ratio metrics.

 

DGA_class

Accuracy

FN_ratio

bamital

100

0

banjori

99.77

0.23

chinad

100

0

cryptolocker

99

1

dircrypt

98.96

1.04

dyre

100

0

emotet

99.68

0.32

enviserv

99.8

0.2

feodo

100

0

flubot

99.62

0.38

fobber_v1

100

0

fobber_v2

98.66

1.34

gameover

100

0

locky

96.72

3.28

murofet

99.79

0.21

mydoom

94.2

5.8

necro

98.71

1.29

necurs

97.84

2.16

padcrypt

97.02

2.98

pykspa_v1

97.48

2.52

pykspa_v2_fake

94

6

pykspa_v2_real

93.43

6.57

qadars

97.5

2.5

ramnit

98.33

1.67

ranbyus

99.58

0.42

rovnix

99.97

0.03

shifu

91.08

8.92

shiotob

98.93

1.07

simda

97.84

2.16

symmi

99.51

0.49

tempedreve

91.19

8.81

tinba

99.63

0.37

tordwm

91.18

8.82

 

Note: Ignored the metrics of the DGA families, for the domains less than 100.

Conclusion

Armed with this information, network administrators can plan and govern their DNS environment, accelerating IP space provisioning and management, simplifying configuration, supporting zero-touch automation initiatives, and strengthening corporate cybersecurity posture.

Appendix

The machine learning modules are evaluated using the following metrics:

Accuracy = (TP+TN)/(TP+TN+FP+FN)

Precision = TP/(TP+FP)

Recall = TP/(TP+FN)

F1 Score = (2*Precision*Recall)/ (Precision+ Recall)

False Negative Ratio (also called as Miss Rate) = FN/(FN+TP)