REU@NMT CSE Research Projects

Project 1: Advanced machine learning techniques for botnet detection

A botnet is a collection of compromised internet-connected devices that are infected by malware and remotely controlled as a group by a botmaster. Owners are usually unaware that their devices are infected. Botnets are currently the main platform for cybercriminals to launch cyber attacks, including distributed denial-of-service (DDoS) attacks, spam email campaigns, and the installation of spyware to steal sensitive private information. Botnet malware usage was reported to have increased by 69.2% in the first quarter of 2017 over the previous quarter, which poses a great challenge to cybersecurity. This project will tackle botnet detection with advanced machine learning techniques. It will adopt network-based detection, which analyzes the characteristics of network flows to identify anomalous traffic between infected devices and the Command & Control (C&C) server of the botnet. Bio-inspired algorithms will be employed to select a suitable subset of network flow-based features for botnet detection. Advanced machine learning techniques such as ensemble learning and cost-sensitive learning will be explored to improve detection performance.
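As a rough illustration of the cost-sensitive idea mentioned above, the sketch below applies a cost-sensitive decision rule to a classifier's estimated botnet probability for each flow. It is only one simple form of cost-sensitive learning, not the method this project will necessarily use. The function names, the example costs, and the probabilities are all hypothetical.

```python
def cost_sensitive_threshold(cost_fp, cost_fn):
    """Probability threshold that minimizes expected misclassification cost.

    Flagging a flow costs (1 - p) * cost_fp in expectation, and ignoring it
    costs p * cost_fn, so flagging is cheaper when p > cost_fp / (cost_fp + cost_fn).
    """
    if cost_fp <= 0 or cost_fn <= 0:
        raise ValueError("costs must be positive")
    return cost_fp / (cost_fp + cost_fn)


def flag_flows(botnet_probs, cost_fp=1.0, cost_fn=9.0):
    """Return True for each flow whose estimated botnet probability exceeds the threshold."""
    threshold = cost_sensitive_threshold(cost_fp, cost_fn)
    return [p > threshold for p in botnet_probs]


# Hypothetical probabilities from some upstream classifier (assumed, not real data).
# Missing a bot (false negative) is assumed to be 9x as costly as a false alarm.
print(flag_flows([0.05, 0.2, 0.6]))
```

With a false negative assumed to be much more costly than a false alarm, the threshold drops well below 0.5, so the detector flags flows it would otherwise let through.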

Project 2: Intrusion detection for cyber-physical systems

Cyber-physical systems (CPSs) are a new class of engineered systems that integrate physical resources with computational and communication components. Because the cyber and physical domains are tightly integrated, securing these systems against attacks is a great challenge. Intrusion detection systems (IDSs) have been shown to be an effective way to monitor a network or system for malicious attacks. Designing an IDS for a CPS is challenging because the design has to account for the unique properties of the CPS. The aim of this project is to develop a distributed, data-driven IDS for a large-scale CPS such as a smart grid. Novel data-driven algorithms that combine knowledge and behavior will be developed to detect both existing and zero-day attacks. Bio-inspired approaches will be investigated as possible solutions for developing the algorithms and systems.

Project 3: PDF malware detection with visualization techniques and deep learning

PDF (Portable Document Format) is a file format invented by Adobe for presenting, exchanging, and archiving documents independently of hardware, software, and operating systems. As one of the most widely used file formats, PDF has become a major vector for malware attacks. This is mainly due to the flexibility of the PDF file structure and its ability to embed different kinds of content, such as JavaScript code, encoded streams, and image objects. Attackers can exploit these features to embed malware in PDF files using tools like Metasploit. For example, it was reported that currently popular ransomware can be hidden inside PDF documents to launch attacks.

Various PDF malware analysis techniques have been proposed to address PDF malware attacks, including keyword-based, tree-based, code-based, and machine learning-based techniques. This project will take a different approach: image visualization techniques and deep learning will be investigated to build advanced models for PDF malware detection.
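One common way to visualize a binary file, sketched below, is to treat each byte as a grayscale pixel (0 to 255) and lay the bytes out in fixed-width rows. This is an assumed example of an image visualization technique, not necessarily the one this project will use. The function name and the row width are hypothetical.

```python
def bytes_to_grayscale_rows(data, width=16):
    """Map raw file bytes to rows of grayscale pixel values (0-255).

    The last row is zero-padded so that every row has exactly `width` pixels.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    padded_len = -(-len(data) // width) * width  # round up to a multiple of width
    padded = data + bytes(padded_len - len(data))
    return [list(padded[i:i + width]) for i in range(0, padded_len, width)]


# The first bytes of a typical PDF header, used here as a tiny stand-in for a file.
for row in bytes_to_grayscale_rows(b"%PDF-1.7", width=4):
    print(row)
```

The resulting 2-D array could then be resized and passed to an image classifier such as a convolutional network. That step is omitted here.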

Project 4: Efficient access control model for smart grids

Smart grids integrate cyber infrastructure with the legacy power grid to enable efficient power generation, delivery, and usage. However, the added cyber infrastructure increases the complexity of the system and introduces new vulnerabilities that potential adversaries could exploit. Traditional access control methods for computer networks, such as mandatory access control (MAC), discretionary access control (DAC), and role-based access control (RBAC), are not suitable for smart grids because of their unique characteristics. Smart grids have multiple types of system roles (operators, engineers, technicians, managers, etc.) and multiple security domains (e.g., interconnected region networks), even domains of domains. The primary security objective of smart grids is availability, whereas traditional access control models focus only on confidentiality and integrity. The aim of this project is to develop a flexible, sustainable, and scalable access control model for smart grids. A security analysis will be performed to demonstrate that the developed model is secure against threats and attacks.

Project 5: Privacy preserving for location-based services using spatial transformations

While mobile users would like to use location-based services (LBSs) to obtain answers to queries such as ‘Find me the nearest Italian restaurant with a rating > 3 on’, they would also like to preserve their privacy by not disclosing their exact location. This project will investigate hiding the exact location through a spatial transformation. Hilbert curves are space-filling curves that provide such a spatial transformation through hash functions with limited preservation of proximity in the domain. This provides a mapping from the (x, y)-coordinates of points of interest (e.g., restaurants) into non-negative integers. A trusted entity is therefore employed, which transforms (encrypts) the location of each point of interest into a corresponding integer and sends these encoded locations to the location-based server, while sharing the parameters used for the transformation (the encryption key) only with the user (i.e., not with the LBS).

To query the LBS, the user would encrypt his/her own location and send that to the LBS, which would find the nearest point of interest in the encrypted space and return that location. The user then decrypts the returned location to find the actual location. There are two interesting research questions to be investigated. First, to what extent is the ‘nearby’ point of interest in the encrypted space actually close in the (x, y) space? How effective are heuristics such as the use of two orthogonal curves to mitigate non-proximate locations? Second, to what extent can an adversary perform the decryption given limited knowledge of the parameters used in the creation of the Hilbert curve? In particular, are there some parameters that are more crucial than others?
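The first research question can be explored with a small experiment. The sketch below uses the standard iterative Hilbert-curve index conversion on an n-by-n grid (n a power of two) and compares the nearest point of interest in curve space with the true nearest one. It does not model any secret parameters (curve origin, orientation, scale, and so on), which would have to be added for the actual scheme. All function names and sample coordinates are hypothetical.

```python
def _rotate(n, x, y, rx, ry):
    """Rotate/flip a quadrant so the sub-curve has the right orientation."""
    if ry == 0:
        if rx == 1:
            x, y = n - 1 - x, n - 1 - y
        x, y = y, x
    return x, y


def xy_to_hilbert(n, x, y):
    """Map grid cell (x, y) on an n-by-n grid to its index along the Hilbert curve."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rotate(n, x, y, rx, ry)
        s //= 2
    return d


def hilbert_to_xy(n, d):
    """Inverse mapping: Hilbert index d back to grid cell (x, y)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = _rotate(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def nearest_in_curve_space(user, pois, n):
    """What the LBS can compute: the POI whose Hilbert index is closest to the user's."""
    user_index = xy_to_hilbert(n, *user)
    return min(pois, key=lambda p: abs(xy_to_hilbert(n, *p) - user_index))


def true_nearest(user, pois):
    """Ground truth: the POI closest to the user in (x, y) space."""
    return min(pois, key=lambda p: (p[0] - user[0]) ** 2 + (p[1] - user[1]) ** 2)


# Hypothetical points of interest and user location on an 8-by-8 grid.
pois = [(1, 6), (5, 2), (6, 6)]
user = (4, 5)
print(nearest_in_curve_space(user, pois, 8), true_nearest(user, pois))
```

Running the comparison over many random users and points of interest gives an empirical estimate of how often the two answers disagree, and by how much.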

Project 6: Secure data logging for mobile devices

Users of modern smartphones store a lot of sensitive data on their devices. Attackers are interested in obtaining a foothold on these phones in order to take monetary and strategic advantage of the sensitive data. Another problem is that phones are often lost or stolen; when such events occur, data is revealed to potentially dangerous adversaries. When a lost or stolen phone is recovered, the owner would want to learn whether the sensitive data was accessed or changed. While the technology of database logging is well developed, the techniques typically do not apply to phones owing to the limitations of storage space. We have shown how tamper-resistant logging can be developed on devices like cell phones with only a part of the actual log stored on the phone and used to mitigate the above problems. This project will implement the scheme and experiment with the degree of tamper resistance and performance. Furthermore, the use of the ideas behind steganography to detect unauthorized access will be investigated.
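For readers unfamiliar with tamper-evident logs, the sketch below shows a generic keyed hash chain, in which each entry's tag depends on the previous tag, so changing or removing an earlier entry invalidates everything after it. This is a textbook construction given for orientation only, not the scheme referred to above, and it does not address storing only part of the log on the phone. The function names and the sample key are hypothetical.

```python
import hashlib
import hmac

GENESIS_TAG = b"\x00" * 32  # fixed starting value for the chain


def _tag(key, prev_tag, message):
    """HMAC-SHA256 over the previous tag followed by the message bytes."""
    return hmac.new(key, prev_tag + message.encode("utf-8"), hashlib.sha256).digest()


def append_entry(log, message, key):
    """Append (message, tag), where the tag covers the previous tag and this message."""
    prev_tag = log[-1][1] if log else GENESIS_TAG
    log.append((message, _tag(key, prev_tag, message)))


def verify_log(log, key):
    """Recompute the chain; return False at the first entry that does not match."""
    prev_tag = GENESIS_TAG
    for message, tag in log:
        if not hmac.compare_digest(tag, _tag(key, prev_tag, message)):
            return False
        prev_tag = tag
    return True


# Hypothetical key. In practice it must not be readable by whoever holds the phone.
key = b"example-key-kept-off-device"
log = []
append_entry(log, "contacts.db opened", key)
append_entry(log, "photo_0042.jpg read", key)
print(verify_log(log, key))

log[0] = ("nothing happened", log[0][1])  # simulate tampering with the first entry
print(verify_log(log, key))
```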

Project 7: Developing user mental model through reasoned action approach against semantic attacks

Semantic attacks, which target the way we, as humans, assign meaning to content, have been challenging and difficult to handle in computer security. The underlying cause of semantic attacks is a semantic barrier between the user’s mental model and the system’s actual processing model. The SSLstripping attack is one such semantic attack. It takes advantage of the simple observation that most users do not explicitly type the safe address of a web page (https), but rather rely either on the browser or the target page to redirect them to a secure location. This opens the opportunity to strip users’ sessions of their security while giving the user the illusion of privacy. In this project, we will investigate how to better understand and develop the user’s mental processing model in the context of cybersecurity through the use of the reasoned action approach (RAA). The RAA explains that a user’s behavior is determined by his/her intention to perform the behavior, and that the intention is, in turn, a function of attitudes towards the behavior, perceived norms (or social pressure), and perceived behavioral control (capacity and relevant skills/abilities). We will then conduct research on how to utilize the developed model to enhance the effectiveness, efficiency, and satisfaction of our previous solution against the SSLstripping attack (SSLight).

Project 8: Privacy controls in smart grids

Consumers’ data such as power usage may be released to third-party companies for various purposes, and it is desirable that each consumer decides whether the granularity of information in the released data is tolerable or violates security and privacy needs. This project aims at developing an advanced approach to managing consumer data, which considers privacy risks in disclosing information in a smart grid system. Specifically, by using the notion of differential privacy, this project focuses on developing an efficient privacy-preserving scheme that considers not just data aggregation but also data generation and communication in the smart grid.
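As a minimal sketch of how differential privacy can apply to aggregated meter data, the example below releases a noisy total using the Laplace mechanism. It covers only the aggregation step, not data generation or communication, and it is a textbook mechanism rather than the scheme this project aims to develop. The function names, the clipping bound, and the sample readings are assumptions.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponential variables."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_total(readings_kwh, max_reading_kwh, epsilon, rng):
    """Release the sum of household readings with epsilon-differential privacy.

    Each reading is clipped to [0, max_reading_kwh], so one household can change
    the sum by at most max_reading_kwh (the sensitivity of the query).
    """
    if epsilon <= 0 or max_reading_kwh <= 0:
        raise ValueError("epsilon and max_reading_kwh must be positive")
    clipped = [min(max(r, 0.0), max_reading_kwh) for r in readings_kwh]
    scale = max_reading_kwh / epsilon
    return sum(clipped) + laplace_noise(scale, rng)


# Hypothetical hourly readings (kWh) from four households.
rng = random.Random()
print(private_total([1.2, 0.8, 3.5, 2.1], max_reading_kwh=5.0, epsilon=1.0, rng=rng))
```

A smaller epsilon gives stronger privacy but a noisier total. This plain floating-point version is for illustration and is not hardened for production use.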

Project 9: Credit card fraud, internet access, and impacts in rural America

Rural America differs from urban America in several ways. One major difference is in the information infrastructure and access to information. Admittedly, the information revolution has brought substantial improvement to the quality of life in rural America. At the same time, it makes rural America vulnerable to information fraud and cyber-attacks because of the way that information is accessed. In most parts of rural America, especially in the Midwest and the West, internet infrastructure remains poorly developed, and many areas still do not have access to affordable broadband service. In recent years, credit card fraud and identity theft have spread widely across the country. These cybersecurity incidents have affected residents of rural and remote areas more severely because of the potential delay in alerting, identifying, and responding to the incidents. Most of these processes rely critically on internet access and information infrastructure. This project aims to study how credit card fraud and identity theft affect rural areas in the US, and how different means of internet access and the development of information infrastructure could help to alleviate the impacts. The project expects to: (1) collect data on credit card fraud and identity theft through open sources and web scraping; and (2) use advanced data analytics (e.g., regression methods, textual analysis, data envelopment analysis) to evaluate the impacts and derive implications for policy intervention.