Cybercrime Detection: A Robust Algorithm for Detecting Rare Text
This algorithm reduces cost and time and increases efficacy compared with current detection systems. It is applicable to detecting cybercrime, supporting governance, risk, and compliance (GRC) initiatives, and identifying cyberbullying and self-harm, including suicidal thoughts.
Existing algorithms, and the detection systems built on them, typically depend on manually developed dictionaries and/or require a large number of training instances, which makes them unwieldy and demands ongoing, labor-intensive effort.
To address these limitations, the inventors have developed a new machine learning algorithm for short text (such as SMS messages or tweets). The algorithm is tailored to identifying specific occurrences or types of instances within large volumes of text, and the inventors believe it is applicable to detecting cyberbullying and suicidal-intent content in online communication, terrorist activity, and financial fraud.
The E-2 algorithm begins with a set of labeled training data and is refined by continuous learning on unlabeled data, which reduces cost and time and increases efficacy. Testing and initial use show that the algorithm is accurate, with few false positives.
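
The general idea of starting from a small labeled set and refining on unlabeled text can be illustrated with a generic self-training loop. The internals of E-2 are not described here, so the sketch below is not the inventors' method: it swaps in a tiny multinomial Naive Bayes classifier, and every name in it (NaiveBayes, self_train, the threshold value, the sample messages) is a hypothetical chosen only for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real systems do more)."""
    return text.lower().split()


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        return self

    def predict_proba(self, text):
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / total_docs)
            for w in tokenize(text):
                if w in self.vocab:  # words never seen in training are skipped
                    count = self.word_counts[c][w] + 1
                    score += math.log(count / (total_words + len(self.vocab)))
            log_scores[c] = score
        # Convert log scores to normalized probabilities.
        top = max(log_scores.values())
        exp = {c: math.exp(s - top) for c, s in log_scores.items()}
        norm = sum(exp.values())
        return {c: v / norm for c, v in exp.items()}

    def predict(self, text):
        probs = self.predict_proba(text)
        return max(probs, key=probs.get)


def self_train(labeled, unlabeled, threshold=0.85, max_rounds=5):
    """Generic self-training: adopt confident predictions as new labels, refit."""
    texts = [t for t, _ in labeled]
    labels = [y for _, y in labeled]
    pool = list(unlabeled)
    model = NaiveBayes().fit(texts, labels)
    for _ in range(max_rounds):
        remaining, added = [], 0
        for text in pool:
            probs = model.predict_proba(text)
            label = max(probs, key=probs.get)
            if probs[label] >= threshold:
                texts.append(text)
                labels.append(label)
                added += 1
            else:
                remaining.append(text)  # too uncertain; leave unlabeled
        if added == 0:
            break
        pool = remaining
        model = NaiveBayes().fit(texts, labels)
    return model


# Hypothetical seed data and unlabeled messages, invented for this sketch.
seed = [
    ("you are worthless and everyone hates you", "flag"),
    ("nobody likes you loser", "flag"),
    ("see you at lunch tomorrow", "ok"),
    ("great game last night", "ok"),
]
unlabeled = [
    "everyone hates you loser",
    "lunch tomorrow sounds great",
    "you there",
]
model = self_train(seed, unlabeled)
```

Only predictions above the confidence threshold are adopted as labels, so ambiguous messages stay in the unlabeled pool rather than polluting the training set. A higher threshold trades slower learning for fewer mislabeled additions, which is one plausible way a system of this kind might keep false positives low.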