TCPwave TITAN is a one-stop solution for your DNS security needs, built on advanced technologies in which AI/ML plays a major role. One of the capabilities that TITAN provides is DNS Tunnel Detection. Its tunnel detection ML algorithms are trained on massive and varied DNS data, which helps them detect malicious DNS traffic flowing through the DNS pathways in your organization.
Supervised learning is the machine learning task of learning a function that maps an input to an output based on input-output pairs given in the training phase. It infers a function from labeled training data consisting of a set of training examples. In supervised learning, each example is a pair consisting of an input object (typically a multidimensional vector) and the desired output value (also called the supervisory signal). A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples. An optimal scenario will allow for the algorithm to correctly determine the class labels for unseen instances. This requires the learning algorithm to generalize from the training data to unseen situations in a "reasonable" way.
A random forest classifier is used as the classification algorithm. A random forest is an ensemble method based on bootstrap aggregation (bagging), in which multiple decision trees act together in one model. The fundamental concept behind the random forest is the wisdom of crowds: a large number of relatively uncorrelated models (trees) operating as a committee will outperform any of the individual constituent models. Suppose we have 1000 samples of data with 10 variables. Random forest builds multiple decision tree models, each from a different sample of the rows and a different subset of the variables. For instance, it takes a random sample of 100 rows and 5 randomly chosen variables to build one decision tree model. It repeats the process (say) 10 times and then makes a final prediction for each observation. The final prediction is a function of the individual trees' predictions: for classification it is typically the majority vote, and for regression it can simply be the mean.
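The procedure described above can be sketched in a few lines of plain Python. This is a minimal illustration, not TITAN's implementation: it uses one-split decision stumps as a simplified stand-in for full decision trees, the toy data set is synthetic, and every name (make_toy_data, train_stump, train_forest, predict) is hypothetical. The numbers follow the example in the text: 1000 samples, 10 variables, 10 trees, each built from 100 sampled rows and 5 randomly chosen variables, combined here by majority vote.

```python
import random
from collections import Counter


def majority(labels):
    """Return the most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def make_toy_data(n=1000, n_features=10, seed=1):
    """Synthetic data (assumption): class 0 values lie in [0, 1), class 1 in [2, 3)."""
    rng = random.Random(seed)
    rows, labels = [], []
    for i in range(n):
        label = i % 2
        rows.append([2.0 * label + rng.random() for _ in range(n_features)])
        labels.append(label)
    return rows, labels


def train_stump(rows, labels, features):
    """Find the single (feature, threshold) split with the fewest training errors."""
    best = None
    for f in features:
        values = sorted({row[f] for row in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue
            left_label, right_label = majority(left), majority(right)
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:  # no usable split: always predict the majority label
        label = majority(labels)
        return (features[0], 0.0, label, label)
    return best[1:]


def train_forest(rows, labels, n_trees=10, sample_size=100, n_features=5, seed=0):
    """Train each stump on a bootstrap sample of rows and a random feature subset."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in range(sample_size)]  # with replacement
        feats = rng.sample(range(len(rows[0])), n_features)
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Combine the individual predictions by majority vote."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return majority(votes)


rows, labels = make_toy_data()
forest = train_forest(rows, labels)
print(predict(forest, [0.5] * 10), predict(forest, [2.5] * 10))
```

In practice a library implementation would grow full trees and usually re-draw the candidate variables at every split; the sketch keeps only the sampling and voting steps that the description relies on.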
A trained machine learning model classifies anomalous DNS queries. It uses a powerful Random Forest Classifier, and the model can be retrained from the UI using organization-specific whitelisted data.
The machine learning model is applied to contiguous live packets captured at regular intervals. The queries it detects are passed on to the rule-based traffic analysis model.
The queries flagged by the ML model pass through a set of rules defined by the network administrator, such as a query count threshold per host, a query count threshold per domain, and other critical parameters.
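A rule stage of this kind might look roughly like the sketch below. It is an illustration under stated assumptions, not TITAN's rule engine: the function names, the (host, FQDN) input shape, and the threshold values are all hypothetical, and the base domain is approximated naively as the last two labels of the name (a real system would more likely consult a public suffix list).

```python
from collections import Counter


def base_domain(fqdn):
    """Naive base domain (assumption): the last two labels of the name."""
    return ".".join(fqdn.rstrip(".").lower().split(".")[-2:])


def apply_threshold_rules(flagged_queries, max_per_host, max_per_domain):
    """Return the hosts and domains whose flagged-query counts exceed the thresholds.

    flagged_queries: list of (client_ip, fqdn) tuples flagged by the ML stage.
    """
    host_counts = Counter(host for host, _ in flagged_queries)
    domain_counts = Counter(base_domain(fqdn) for _, fqdn in flagged_queries)
    hosts = {h for h, count in host_counts.items() if count > max_per_host}
    domains = {d for d, count in domain_counts.items() if count > max_per_domain}
    return hosts, domains


# Hypothetical flagged queries: one host sends many unique names under one domain.
flagged = [
    ("10.0.0.5", "aaa.badtunnel.example"),
    ("10.0.0.5", "bbb.badtunnel.example"),
    ("10.0.0.5", "ccc.badtunnel.example"),
    ("10.0.0.9", "www.intranet.example"),
]
hosts, domains = apply_threshold_rules(flagged, max_per_host=2, max_per_domain=2)
print(hosts, domains)
```

Counting per base domain rather than per full name matters here because tunnelled traffic typically spreads its payload across many distinct subdomains of one domain.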
Queries for the top 1000 public domains are whitelisted and are filtered out before the remaining traffic is sent to the ML model for detection.
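The pre-filtering step can be pictured with a short sketch. The names is_whitelisted and filter_for_ml are hypothetical, the three-entry whitelist stands in for the real top-1000 list, and treating every subdomain of a whitelisted domain as whitelisted is an assumption rather than documented behavior.

```python
def is_whitelisted(fqdn, whitelist):
    """True if the name or any of its parent domains is in the whitelist."""
    labels = fqdn.rstrip(".").lower().split(".")
    return any(".".join(labels[i:]) in whitelist for i in range(len(labels)))


def filter_for_ml(queries, whitelist):
    """Keep only the queries that still need to be scored by the ML model."""
    return [q for q in queries if not is_whitelisted(q, whitelist)]


# Placeholder whitelist (assumption): stands in for the top 1000 public domains.
whitelist = {"example.com", "example.org", "example.edu"}
queries = [
    "www.example.com",
    "mail.example.org.",
    "x9f3k2.tunnel.example.net",
]
remaining = filter_for_ml(queries, whitelist)
print(remaining)
```

Dropping well-known domains first keeps the volume scored by the model down and avoids false positives on popular services.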
Charts such as Top 10 domains queried, Top 10 hosts queried, Top 10 successfully queried domains, Top 10 failed domains, Top 10 FQDN lengths, and many others give the admin insight into the traffic and help in defining realistic rules.
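The figures behind such charts are simple aggregations over the query log. The sketch below assumes a hypothetical log of (host, FQDN, response code) tuples and a hypothetical top_n helper; it also reads "Top 10 FQDN lengths" as the ten longest distinct names, which is an interpretation rather than something stated above.

```python
from collections import Counter


def top_n(values, n=10):
    """Return the n most frequent values with their counts."""
    return Counter(values).most_common(n)


# Hypothetical query log: (client host, queried FQDN, DNS response code).
log = [
    ("10.0.0.5", "www.example.com", "NOERROR"),
    ("10.0.0.5", "www.example.com", "NOERROR"),
    ("10.0.0.7", "nosuchhost.example.org", "NXDOMAIN"),
    ("10.0.0.5", "averyveryverylonglabel0123456789.example.net", "NXDOMAIN"),
]

top_domains = top_n(fqdn for _, fqdn, _ in log)
top_hosts = top_n(host for host, _, _ in log)
top_success = top_n(fqdn for _, fqdn, rcode in log if rcode == "NOERROR")
top_failed = top_n(fqdn for _, fqdn, rcode in log if rcode != "NOERROR")
# Ten longest distinct names (assumed meaning of "Top 10 FQDN lengths").
longest = sorted({fqdn for _, fqdn, _ in log}, key=len, reverse=True)[:10]

print(top_domains[0], top_hosts[0], longest[0])
```

Numbers like these give a baseline for choosing thresholds: a per-host limit set just above the busiest legitimate host is less likely to raise false alarms.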
The network administrator sees a notification on the Dashboard, generated by the monitoring engine, when malicious domains are detected. The admin can then block the domain in the Response Policy Zone (RPZ) from the UI. The admin can also whitelist domains and import domain reputation data for the system to take into account when making decisions.
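For readers unfamiliar with RPZ, the sketch below shows what a blocking entry conventionally looks like inside a response policy zone: a CNAME pointing at the root (".") tells the resolver to answer NXDOMAIN, and a wildcard record extends the block to all subdomains. How TITAN writes these records is not described here; the helper name rpz_block_records is hypothetical and the output is only the record text, relative to the policy zone.

```python
def rpz_block_records(domain):
    """Render RPZ records that make a domain and its subdomains resolve to NXDOMAIN."""
    name = domain.rstrip(".").lower()
    return [
        f"{name} CNAME .",    # block the domain itself
        f"*.{name} CNAME .",  # block every subdomain
    ]


records = rpz_block_records("BadTunnel.example.")
for record in records:
    print(record)
```

The wildcard entry is the important one for tunnelling, since the malicious queries are almost always for generated subdomains rather than for the domain name itself.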