Cyber Threat Extraction

  • Date: Feb 2018
  • Category: Data Science
  • Key Tags: SVM, KNN, Spark

A web-based application that predicts the likely tactics and techniques of a cyber attack by analysing an attack report.

Introduction

This is a team project of the Big Data Management and Analytics Lab, UT Dallas.
Director: Prof. Latifur Khan
Students: Ayoade, Gbadebo Gbadero. Zhang, Runze. Lella, Ashoka Vardhan.
The code is not public for now.

Cyber Threat Extraction is a web-based application that predicts the likely tactics and techniques of a cyber attack by analysing attack reports.

By leveraging natural language processing techniques, we propose and implement a system that extracts meaningful text from reports, classifies it into standard tactics and then into techniques, and provides relevant mitigation responses to the attacks. Because standardized threat reports with proper tactic and technique categorization are scarce, training data is limited and we face the challenge of a biased training dataset. We propose using a bias correction technique called kernel mean matching to correct the bias of the model built from the limited training data, and we apply it to classifying threat reports from various cyber-security organizations across the Internet.

To identify the techniques an attacker used to complete a tactic, we leverage label propagation to propagate the classifier's confidence scores for the tactic classes to the technique classes. To evaluate our approach, we ran experiments on 18,500 real threat report files from various computer security organizations; applying bias correction techniques increased classification accuracy by 12%.
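
The idea of carrying tactic confidence over to techniques can be illustrated with a small sketch. This is a deliberately simplified stand-in, not the project's label propagation algorithm: it assumes a hypothetical table of tactic-to-technique weights (`TECHNIQUE_WEIGHTS`, invented for illustration) and scores each technique as the weighted sum of the tactic probabilities that reach it.

```python
from collections import defaultdict

# Hypothetical tactic -> technique weights (each row sums to 1).
# These numbers are illustrative only and are not taken from the project.
TECHNIQUE_WEIGHTS = {
    "Persistence": {"Plist Modification": 0.6, "Scheduled Task": 0.4},
    "Exfiltration": {"Scheduled Transfer": 1.0},
}


def propagate(tactic_probs, weights=TECHNIQUE_WEIGHTS):
    """Spread each tactic's confidence over its techniques and rank them."""
    scores = defaultdict(float)
    for tactic, probability in tactic_probs.items():
        for technique, weight in weights.get(tactic, {}).items():
            scores[technique] += probability * weight
    # Highest score first, matching how the results are presented to the user.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Tactic confidences as a classifier might report them (assumed values).
ranked = propagate({"Persistence": 0.8, "Exfiltration": 0.2})
for technique, score in ranked:
    print(f"{technique}: {score:.2f}")
```

In this simplified form a technique can only score highly if at least one tactic it belongs to does, which is the intuition behind classifying tactics first and techniques second.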

Front-End

This web-based application lets users upload the text of an attack report through the Report Form. After selecting the classifier method and the number of tactics and techniques to return, the user clicks Submit. The information is sent to the back-end as JSON, and the Result Form shows the predictions contained in the returned JSON.

Report Form Interface

The Report Form has four inputs and one submit button.

  • Text Area
  • Classifier Selection (SVM/KNN)
  • Tactics Count
  • Techniques Count
  • Submit Button

Result Form Interface

The Result Form shows tactics and techniques sorted by probability.

Hovering the mouse over the name of a tactic or technique opens a temporary window showing its description.
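
The ordering described above can be sketched in a few lines. The example assumes each prediction is a dictionary with a `probability` key and a name key, as in the back-end's JSON output; the helper name `rank_entries` and its `limit` parameter are hypothetical, and the sketch does not claim which layer of the application actually performs the sort.

```python
def rank_entries(entries, name_key, limit=None):
    """Sort prediction entries by descending probability and keep the top ones."""
    ranked = sorted(entries, key=lambda entry: entry["probability"], reverse=True)
    if limit is not None:
        ranked = ranked[:limit]
    return [(entry[name_key], entry["probability"]) for entry in ranked]


# Unsorted sample entries shaped like the back-end's "tactics" list.
sample = [
    {"probability": 0.2, "tactic": "Exfiltration"},
    {"probability": 0.8, "tactic": "Persistence"},
    {"probability": 0.0, "tactic": "Privilege Escalation"},
]
top = rank_entries(sample, "tactic", limit=2)
print(top)
```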

Back-End Setup

Install all requirements:

pip install -r requirements.txt

Run the server by using the following command:

python main.py

By default the service runs on port 3614. To change the port, modify the last line of main.py:

app.run(host='0.0.0.0', debug=True, port=3614)

The service accepts POST requests at localhost:3614/classify with the following JSON input:

{
    "text":"text content....",
    "tactics_count":2,
    "techniques_count":10
}
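
A request like the one above could be sent from Python along the following lines. This is a minimal client sketch using only the standard library; the helper names `build_request` and `classify` are hypothetical, and it assumes the server reads a JSON body sent with a `Content-Type: application/json` header.

```python
import json
import urllib.request

# Assumed endpoint, based on the default port and path described above.
CLASSIFY_URL = "http://localhost:3614/classify"


def build_request(text, tactics_count, techniques_count, url=CLASSIFY_URL):
    """Build a POST request carrying the three fields as a JSON body."""
    payload = {
        "text": text,
        "tactics_count": tactics_count,
        "techniques_count": techniques_count,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(text, tactics_count=2, techniques_count=10):
    """Send the report to the running service and return the decoded reply."""
    request = build_request(text, tactics_count, techniques_count)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `classify("report text of sufficient length ...")` would return the parsed response. Note that `urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx status codes, so if the service reports validation errors with such a status the caller needs to catch that exception; the source does not say which status codes are used.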

All three fields (text, tactics_count, techniques_count) are mandatory. The text field must be at least 20 characters long. Some sample test requests are available in test.sh. Below is the expected output after running test.sh:

{
  "status": "success"
}
{
  "error": "Missing required field: text"
}
{
  "error": "Missing required field: tactics_count"
}
{
  "error": "Missing required field: techniques_count"
}
{
  "error": "Content of text is too less to tag the document"
}
{
  "knn_tactics": {
    "tactics": [
      {
        "probability": 0.8, 
        "tactic": "1. Persistence"
      }, 
      {
        "probability": 0.2, 
        "tactic": "2. Exfiltration"
      }, 
      {
        "probability": 0.0, 
        "tactic": "3. Privilege Escalation"
      }
    ], 
    "techniques": [
      {
        "probability": 0.046498998691105346, 
        "technique": "1. Plist Modification"
      }, 
      {
        "probability": 0.04227772929712973, 
        "technique": "2. Application Shimming"
      }, 
      {
        "probability": 0.03585973090375598, 
        "technique": "3. Accessibility Features"
      }, 
      {
        "probability": 0.03214838131153677, 
        "technique": "4. Scheduled Task"
      }, 
      {
        "probability": 0.028898670567490622, 
        "technique": "5. DLL Injection"
      }, 
      {
        "probability": 0.02666643647889151, 
        "technique": "6. Hidden Files and Directories"
      }
    ]
  }, 
  "svm_no_w2v": {
    "tactics": [
      {
        "probability": 0.7235331515043044, 
        "tactic": "1. Exfiltration"
      }, 
      {
        "probability": 0.15607838930914225, 
        "tactic": "2. Persistence"
      }, 
      {
        "probability": 0.06363746482034609, 
        "tactic": "3. Credential Access"
      }
    ], 
    "techniques": [
      {
        "probability": 0.1619158119158139, 
        "technique": "1. Bash History"
      }, 
      {
        "probability": 0.03765845403767891, 
        "technique": "2. Plist Modification"
      }, 
      {
        "probability": 0.034239746454108946, 
        "technique": "3. Application Shimming"
      }, 
      {
        "probability": 0.029041959312146367, 
        "technique": "4. Accessibility Features"
      }, 
      {
        "probability": 0.026036223877609357, 
        "technique": "5. Scheduled Task"
      }, 
      {
        "probability": 0.02159652467271545, 
        "technique": "6. Hidden Files and Directories"
      }
    ]
  }, 
  "svm_w2v": {
    "tactics": [
      {
        "probability": 0.6357702043143317, 
        "tactic": "1. Persistence"
      }, 
      {
        "probability": 0.2569947563234411, 
        "tactic": "2. Exfiltration"
      }, 
      {
        "probability": 0.040483758777964424, 
        "tactic": "3. Privilege Escalation"
      }
    ], 
    "techniques": [
      {
        "probability": 0.046498998691105346, 
        "technique": "1. Plist Modification"
      }, 
      {
        "probability": 0.04227772929712973, 
        "technique": "2. Application Shimming"
      }, 
      {
        "probability": 0.03585973090375598, 
        "technique": "3. Accessibility Features"
      }, 
      {
        "probability": 0.03214838131153677, 
        "technique": "4. Scheduled Task"
      }, 
      {
        "probability": 0.028898670567490622, 
        "technique": "5. DLL Injection"
      }, 
      {
        "probability": 0.02666643647889151, 
        "technique": "6. Hidden Files and Directories"
      }
    ]
  }
}
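
The four error responses in the output above suggest a validation step roughly like the following. This is a sketch, not the code in main.py: the function name `validate_request`, the constants, and the order of the checks are assumptions, while the error messages are copied from the sample output.

```python
REQUIRED_FIELDS = ("text", "tactics_count", "techniques_count")
MIN_TEXT_LENGTH = 20  # minimum length of the text field, in characters


def validate_request(payload):
    """Return an error dict for an invalid payload, or None if it passes."""
    # Report the first missing mandatory field (check order is assumed).
    for field in REQUIRED_FIELDS:
        if field not in payload:
            return {"error": "Missing required field: " + field}
    # Reject reports too short to classify; message copied from the output.
    if len(payload["text"]) < MIN_TEXT_LENGTH:
        return {"error": "Content of text is too less to tag the document"}
    return None


print(validate_request({"tactics_count": 2, "techniques_count": 10}))
print(validate_request({"text": "too short", "tactics_count": 2, "techniques_count": 10}))
```

Returning the first problem found keeps each response to a single error message, which matches the one-error-per-request pattern in the sample output.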