Priyanka Ranade

MS Information Systems, May 2019

Priyanka's research focused on analyzing multilingual cybersecurity artifacts. She completed her BS in Information Systems in 2018 and successfully defended her Master’s thesis in May 2019.

In addition to the KNACC lab, she was associated with the Accelerated Cognitive Cybersecurity Lab (ACCL) and the Ebiquity Research group.

Thesis Title: Multilingual Text Alignment for Cyber Security

Committee: Dr. Karuna P. Joshi (Chair), Dr. Anupam Joshi (Co-Chair), Dr. Zhiyuan Chen, Dr. Shimei Pan

Thesis Abstract: Cybersecurity threats, exploits, and intelligence sources have evolved to be largely cross-regional over the course of time. Although the security community perpetually addresses this topic, its scope is continually stretching and introducing new areas of study. In particular, one relevant but heavily under-explored area of research is the use of multilingual open source intelligence in cyber operations. Open Source Intelligence (OSINT) in the form of text is scattered across major criminal networks and is highly multilingual in nature. By aligning multilingual sources, the security community can tap into new pools of intelligence. Language alignment can be achieved through the use of neural machine translation (NMT) systems. This thesis explores supervised and unsupervised methods in aligning multilingual open source intelligence sources without the use of third-party engines. Although third-party engines are growing stronger, they are unsuited for private security environments. First, sensitive intelligence is not a permitted input to third-party engines due to privacy and confidentiality policies. In addition, third-party engines produce generalized translations that tend to lack exclusive cybersecurity terminology, which could be integral in attack discovery.

We address these issues and describe our system that enables threat intelligence understanding across unfamiliar languages. We create monolingual and multilingual word embeddings from open source intelligence data in two distinct languages and derive a bilingual dictionary through both supervised and unsupervised methods. We then create a neural network-based system that takes in cybersecurity data in a different language and outputs the respective English translation. We evaluate the system with traditional approaches and through experimental applications.
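As an illustration of what a supervised embedding alignment step can look like, here is a minimal sketch using orthogonal Procrustes, a common technique for mapping one embedding space onto another from a seed dictionary. The abstract does not say which alignment method the thesis uses, so the technique, the toy vocabulary, the random vectors, and the function names `procrustes_align` and `nearest_target` are all assumptions made for illustration only.

```python
import numpy as np


def procrustes_align(src_vecs, tgt_vecs):
    """Return the orthogonal matrix W minimizing ||src_vecs @ W - tgt_vecs||_F.

    Rows of src_vecs and tgt_vecs are embeddings of seed dictionary pairs.
    """
    u, _, vt = np.linalg.svd(src_vecs.T @ tgt_vecs)
    return u @ vt


def nearest_target(word, src_vocab, src_emb, tgt_vocab, tgt_emb, w):
    """Map a source word into the target space and return the closest target word."""
    mapped = src_emb[src_vocab.index(word)] @ w
    # Cosine similarity between the mapped vector and every target vector.
    sims = (tgt_emb @ mapped) / (
        np.linalg.norm(tgt_emb, axis=1) * np.linalg.norm(mapped)
    )
    return tgt_vocab[int(np.argmax(sims))]


# Hypothetical toy data: random vectors stand in for trained embeddings.
rng = np.random.default_rng(0)
tgt_vocab = ["malware", "phishing", "botnet", "ransomware", "exploit", "payload"]
src_vocab = ["src_" + t for t in tgt_vocab]  # placeholder foreign-language tokens
tgt_emb = rng.normal(size=(len(tgt_vocab), 4))

# Simulate a source space that is a rotated copy of the target space.
q, _ = np.linalg.qr(rng.normal(size=(4, 4)))
src_emb = tgt_emb @ q.T

# Supervised step: learn the mapping from the first four word pairs only.
w = procrustes_align(src_emb[:4], tgt_emb[:4])

# Dictionary induction: translate the two words that were held out.
for word in src_vocab[4:]:
    print(word, "->", nearest_target(word, src_vocab, src_emb, tgt_vocab, tgt_emb, w))
```

In this sketch the mapping is learned from four seed pairs and then applied to held-out words, which is the sense in which a bilingual dictionary can be derived from aligned embeddings. Real embedding spaces are only approximately rotations of each other, so lookups on trained embeddings would be noisier than on this synthetic data.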

 

Priyanka is currently working for Northrop Grumman.