Rapid strides have been made in the syntactic analysis of unstructured text (part-of-speech tagging, dependency parsing) as well as in tasks such as concept and entity extraction and named entity recognition. However, relation extraction from unstructured text remains a challenge. Users are often expected to handcraft relation extraction rules for their domain, especially for data in the non-consumer space (e.g., industrial domains, cybersecurity).
The goal of this project is to learn relation extraction rules with the help of user feedback and interaction, in the form of positive examples and interactive chat-based dialog. A possible approach is to apply NLP and deep learning techniques to a combination of syntactic and semantic patterns in a set of user-annotated sentences and convert those patterns into a generic extraction rule. We hope this project will help accelerate the digitization of domain knowledge by developing algorithms that improve relation extraction from unstructured data and, in particular, speed up the knowledge capture process.
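To make the idea of generalizing annotated sentences into a rule concrete, here is a minimal sketch. It is not the project's method: it replaces the NLP and deep learning techniques with a much simpler stand-in, the longest common subsequence of the words that appear between the two annotated entities. It also assumes entity spans and types are already given. All names (`Example`, `Rule`, `learn_rule`, `apply_rule`, the `COMPONENT` type) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (start, end, entity_type) over token indices, end exclusive. Hypothetical format.
Span = Tuple[int, int, str]


@dataclass
class Example:
    """One user-annotated positive example of a relation."""
    tokens: List[str]
    subj: Span
    obj: Span


@dataclass
class Rule:
    """Generic rule: typed subject, ordered trigger words, typed object."""
    subj_type: str
    trigger: List[str]
    obj_type: str


def between(tokens: List[str], subj: Span, obj: Span) -> List[str]:
    """Lowercased tokens between the subject and the object (subject first)."""
    return [t.lower() for t in tokens[subj[1]:obj[0]]]


def lcs(a: List[str], b: List[str]) -> List[str]:
    """Longest common subsequence of two token lists (dynamic programming)."""
    dp = [[[] for _ in range(len(b) + 1)] for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if x == y:
                dp[i + 1][j + 1] = dp[i][j] + [x]
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j], key=len)
    return dp[len(a)][len(b)]


def learn_rule(examples: List[Example]) -> Rule:
    """Keep only the in-between words shared, in order, by every example."""
    first = examples[0]
    trigger = between(first.tokens, first.subj, first.obj)
    for ex in examples[1:]:
        trigger = lcs(trigger, between(ex.tokens, ex.subj, ex.obj))
    return Rule(first.subj[2], trigger, first.obj[2])


def is_subsequence(needle: List[str], haystack: List[str]) -> bool:
    """True if needle's items occur in haystack in the same order."""
    it = iter(haystack)
    return all(tok in it for tok in needle)


def apply_rule(rule: Rule, tokens: List[str], entities: List[Span]) -> List[Tuple[str, str]]:
    """Return (subject, object) text pairs in a new sentence that match the rule."""
    matches = []
    for s in entities:
        for o in entities:
            typed = s[2] == rule.subj_type and o[2] == rule.obj_type
            if typed and s[1] <= o[0] and is_subsequence(rule.trigger, between(tokens, s, o)):
                matches.append((" ".join(tokens[s[0]:s[1]]), " ".join(tokens[o[0]:o[1]])))
    return matches


# Two annotated sentences (made-up data) for a "connected to" relation.
examples = [
    Example("The pump is connected to the valve".split(),
            (1, 2, "COMPONENT"), (6, 7, "COMPONENT")),
    Example("A compressor was connected directly to a turbine".split(),
            (1, 2, "COMPONENT"), (7, 8, "COMPONENT")),
]
rule = learn_rule(examples)

# Apply the learned rule to an unseen sentence with pre-tagged entities.
new_tokens = "The sensor is connected via cable to the controller".split()
new_entities = [(1, 2, "COMPONENT"), (8, 9, "COMPONENT")]
matches = apply_rule(rule, new_tokens, new_entities)
print(rule.trigger, matches)
```

A surface-token rule like this overgeneralizes quickly; for instance, an empty trigger would match any correctly typed entity pair. That weakness is one reason to bring in dependency paths, semantic patterns, and user feedback, as the approach above proposes.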
Collaborators from GE: Dr. Varish Mulwad, Dr. Kareem Aggour
This project was supported in part by GE Research.
- Adithya Bandi, Karuna P. Joshi, and Varish Mulwad, “Affinity Propagation Initialisation Based Proximity Clustering For Labeling in Natural Language Based Big Data Systems”, In Proceedings of 6th IEEE International Conference on Big Data Security on Cloud (BigDataSecurity 2020), May 2020.
- Agniva Banerjee, Raka Dalal, Sudip Mittal, and Karuna Pande Joshi, “Generating Digital Twin models using Knowledge Graphs for Industrial Production Lines”, Workshop on Industrial Knowledge Graphs, co-located with the 9th International ACM Web Science Conference 2017, June 2017.