ABSTRACT:
Cybersecurity is a present and growing concern that needs to be addressed with both behavioral and design-oriented research. Public cloud providers such as Amazon Web Services and federal funding agencies such as the National Science Foundation have invested billions of dollars into developing high-performance computing resources accessible to users through configurable virtual machine (VM) images. This approach offers users the flexibility to change and update their environment to suit their computational needs. Despite the substantial benefits, users often introduce thousands of vulnerabilities by installing open-source software packages and misconfiguring file systems. Given the scale of vulnerabilities, security personnel struggle to identify and prioritize vulnerable assets for remediation. In this research, we designed a novel unsupervised deep learning-based Multi-View Combinatorial-Attentive Autoencoder (MV-CAAE) to capture multi-dimensional vulnerability data and automatically identify groups of similar vulnerable compute instances, facilitating the development of targeted remediation strategies. We rigorously evaluated the proposed MV-CAAE against state-of-the-art methods in three technical clustering experiments. Experimental results indicate that the MV-CAAE achieves V-measure scores (a metric of cluster quality) 8 to 48 percent higher than those of benchmark methods. We demonstrated the practical value of the MV-CAAE through a comprehensive case study in which we clustered vulnerable VMs and gathered qualitative feedback from experienced security professionals through semi-structured interviews. The results indicated that clustering vulnerable assets can help prioritize vulnerable instances for remediation and enhance decision-making. The present design-research work also contributes to our theoretical knowledge of cyber-defense.
Key words and phrases: Online vulnerability, multi-view representation learning, attention mechanisms, cybersecurity, deep learning, cyberinfrastructure, design science, cloud computing, asset clustering
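
For readers unfamiliar with the V-measure reported above, the following is a minimal sketch of its standard definition: the (weighted) harmonic mean of homogeneity and completeness, both derived from conditional entropies between reference labels and cluster assignments. The function names, the `beta` parameter default, and the toy label lists are illustrative assumptions; this is not the evaluation code used in the study.

```python
import math
from collections import Counter


def _entropy(labels):
    """Shannon entropy H(labels), in nats."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def _conditional_entropy(target, given):
    """Conditional entropy H(target | given), in nats."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum(
        (c / n) * math.log(c / given_counts[g]) for (g, _), c in joint.items()
    )


def v_measure(true_labels, cluster_labels, beta=1.0):
    """V-measure of a clustering against reference labels (1.0 is best).

    beta > 1 weights completeness more heavily; beta < 1 favors homogeneity.
    """
    if not true_labels or len(true_labels) != len(cluster_labels):
        raise ValueError("label sequences must be non-empty and equal in length")

    h_classes = _entropy(true_labels)
    h_clusters = _entropy(cluster_labels)

    # Homogeneity: each cluster contains members of a single class only.
    homogeneity = (
        1.0
        if h_classes == 0
        else 1.0 - _conditional_entropy(true_labels, cluster_labels) / h_classes
    )
    # Completeness: all members of a class land in the same cluster.
    completeness = (
        1.0
        if h_clusters == 0
        else 1.0 - _conditional_entropy(cluster_labels, true_labels) / h_clusters
    )

    if homogeneity + completeness == 0:
        return 0.0
    return (1 + beta) * homogeneity * completeness / (beta * homogeneity + completeness)


# Hypothetical toy data: four instances with two reference groups.
reference = [0, 0, 1, 1]
perfect = v_measure(reference, [1, 1, 0, 0])        # same grouping, renamed clusters
uninformative = v_measure(reference, [0, 1, 0, 1])  # clusters split every group
single_cluster = v_measure(reference, [0, 0, 0, 0]) # everything in one cluster
```

Because the score depends only on how instances are grouped and not on the cluster identifiers themselves, a clustering that matches the reference grouping scores 1.0 even when its cluster IDs are permuted.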