Brand monitoring is one of the premium services offered by XVigil, CloudSEK’s flagship digital risk monitoring platform. It covers a wide range of use cases, such as:
- Fake domain monitoring
- Rogue/fake application detection
- VIP monitoring
Threat actors deploy fake or rogue apps that masquerade as the official application by infringing on our client’s trademarks and copyrighted material. On seeing the familiar trademark, customers are tricked into installing these apps on their devices and end up running malicious code that allows threat actors to exfiltrate data. This year alone, XVigil detected over 2.4 lakh (240,000) fake apps on various third-party app stores and alerted our clients to them.
Classification through similarity scoring
Classification of such threats forms a major part of XVigil’s threat monitoring framework. The platform classifies an app as fake or rogue based on whether it impersonates our client’s apps, or whether the uploaded APK files differ from our client’s official APKs.
A typical machine learning approach needs training data, and it assumes that the test data will resemble that training data. In this case, however, the data is different for each client, so a conventional classifier would require a separate model for every new client.
We therefore treat this as a similarity scoring problem. We compare the suspected app with all of the client’s official apps and measure how similar it is in terms of app title, description, screenshots, logos, etc. The bigger challenge arises when we don’t have all the client information needed for this comparison.
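As a rough illustration of comparing a suspected app against every official app, the sketch below scores title and description overlap and keeps the best match. It is a minimal sketch under stated assumptions: the `AppListing` fields, the equal weighting, and the use of Jaccard token overlap are all illustrative stand-ins, not XVigil’s actual models, which the article describes as contextual text and image similarity.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class AppListing:
    # Hypothetical minimal listing; real listings also carry screenshots, logos, etc.
    title: str
    description: str


def tokens(text: str) -> set:
    """Lowercase alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Token overlap in [0, 1]; a simple stand-in for a contextual similarity model."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def listing_similarity(suspect: AppListing, official: AppListing) -> float:
    # Equal weights for title and description are an assumption for illustration.
    return 0.5 * jaccard(suspect.title, official.title) + 0.5 * jaccard(
        suspect.description, official.description
    )


def best_match(suspect: AppListing, officials: list) -> tuple:
    """Return (score, official_app) for the closest official app, or (0.0, None)."""
    best_score: float = 0.0
    best_app: Optional[AppListing] = None
    for official in officials:
        score = listing_similarity(suspect, official)
        if best_app is None or score > best_score:
            best_score, best_app = score, official
    return best_score, best_app
```

Taking the maximum over all official apps means a suspect only needs to resemble one of the client’s apps to be flagged for further checks.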
CloudSEK’s Knowledge Extraction Module
To gather more information about the client, we built a knowledge extraction module. It is a generic module, triggered when a new client signs up, that tries to collect every piece of information it can about the client. The client’s name and primary domain are the only inputs required.
With these details, the knowledge extraction module can identify the industry that our client operates in, their various products and services, their competitors, the main tech stack/technologies they work with, their official apps, and so on. These details are sourced from the Google Play store and by crawling and parsing the client’s website, job listings posted by the client, etc. The gathered information is then passed through custom Named Entity Recognition (NER) models or static rules to produce the client details in a structured form.
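To make the idea of a static rule concrete, here is a minimal sketch of one such rule: pulling Google Play package IDs out of links found in a crawled page and storing them in a structured record. The `ClientProfile` class, its fields, and the function name are hypothetical and shown only for illustration; the only external fact relied on is the public Play store URL shape `play.google.com/store/apps/details?id=<package>`.

```python
import re
from dataclasses import dataclass, field

# Matches Play store links and captures a dotted package name such as com.example.bank.
PLAY_LINK = re.compile(
    r"play\.google\.com/store/apps/details\?id="
    r"([A-Za-z][A-Za-z0-9_]*(?:\.[A-Za-z][A-Za-z0-9_]*)+)"
)


@dataclass
class ClientProfile:
    # Hypothetical structured output of the knowledge extraction step.
    name: str
    primary_domain: str
    official_app_ids: list = field(default_factory=list)


def extract_official_app_ids(html: str) -> list:
    """Static rule: collect unique Play store package IDs, in order of appearance."""
    seen = []
    for package_id in PLAY_LINK.findall(html):
        if package_id not in seen:
            seen.append(package_id)
    return seen


def build_profile(name: str, primary_domain: str, crawled_html: str) -> ClientProfile:
    """Combine the two required inputs with whatever the rule finds in the crawl."""
    return ClientProfile(
        name=name,
        primary_domain=primary_domain,
        official_app_ids=extract_official_app_ids(crawled_html),
    )
```

A rule like this only covers apps the client links to from its own pages, so in practice it would sit alongside other sources such as NER over page text.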
When we monitor for malicious applications, we run text and image similarity models on the gathered information (client’s logo, official app, competitors, etc.) and the information present in the suspicious app. For example, text similarity checks how contextually similar the client’s app description is to that of the suspicious app. Another module checks whether the client’s logos appear in the screenshots provided with the suspicious app. The individual scores are then combined into an ensemble score.
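One simple way to combine several component scores into a single ensemble score is a weighted average. The sketch below assumes each component score is already in the range 0 to 1; the component names and weights are made up for illustration, and the article does not state how the real ensemble is computed.

```python
def ensemble_score(scores: dict, weights: dict) -> float:
    """Weighted average over the components that actually produced a score.

    Components missing from `scores` (for example, no screenshots were
    available) are skipped and the remaining weights are renormalised.
    """
    used = {name: w for name, w in weights.items() if name in scores}
    total_weight = sum(used.values())
    if total_weight == 0:
        return 0.0
    return sum(scores[name] * w for name, w in used.items()) / total_weight


# Hypothetical component names and weights.
example_weights = {"title": 0.2, "description": 0.3, "logo_in_screenshots": 0.5}
example_scores = {"title": 1.0, "description": 0.5, "logo_in_screenshots": 1.0}
example_result = ensemble_score(example_scores, example_weights)
```

Renormalising over the available components keeps the result comparable across apps whose listings are missing some fields.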
Finally…
If the similarity score is greater than a certain threshold, we can say with confidence that the app resembles our client’s brand or app. Next, we need to check whether the uploaded APK is a modified APK or the client’s original APK. The challenge here is maintaining a copy of every APK released by the client, for all versions of their apps and all devices and regions. To work around this, we compare the suspicious app’s certificate with the certificates of the client’s official apps on the Google Play store. This filters out every APK published by the client’s official app developers. What remains are the malicious apps, which are then reported to our clients.
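The two-stage decision described above can be sketched as follows. This is a minimal sketch under stated assumptions: the signing certificate is assumed to have been extracted from the APK already as raw DER bytes, SHA-256 fingerprints are used as the comparison key, and the threshold value and label names are hypothetical.

```python
import hashlib


def cert_fingerprint(cert_der: bytes) -> str:
    """SHA-256 fingerprint of a signing certificate given as raw DER bytes."""
    return hashlib.sha256(cert_der).hexdigest()


def triage(
    similarity: float,
    suspect_cert_der: bytes,
    official_fingerprints: set,
    threshold: float = 0.7,  # illustrative value, not a documented setting
) -> str:
    """Classify a suspected app using the similarity score and its certificate."""
    if similarity <= threshold:
        # Does not resemble the client's brand closely enough to pursue.
        return "unrelated"
    if cert_fingerprint(suspect_cert_der) in official_fingerprints:
        # Signed with the same certificate as an official app, so it comes
        # from the client's own developers.
        return "official"
    # Looks like the client's app but is signed by someone else.
    return "suspicious"
```

Comparing certificates avoids having to archive every official APK: any build signed by the client’s developers carries a known fingerprint regardless of version, device, or region.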