How do GROUP BY detectors work?
GROUP BY detectors take their name from the SQL “GROUP BY” clause.
This type of detector aggregates events into windows. The window logic, defined by the user, is based on a specific time frame and specific attributes. For example, a window can contain all events related to a specific device_name + device_ip in a 15-minute time frame.
The detector then generates leads per window and not per individual event.
However, leads generated from these windows naturally contain less data than leads from FILTER detectors. They do not include all the data from all the events, but only the “grouping keys”, the time window, and statistics defined by the user (e.g. the number of distinct usernames seen across all the events in the window).
It is recommended to use this feature when a single event does not qualify for a lead, but a group of events does. For example, generating a lead only when more than 10 failed login attempts were seen for a specific user in a specific time frame.
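The windowing idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the platform's implementation: the function names, the event field names (device_name, device_ip, username, event_time), and the use of fixed, non-overlapping 15-minute windows are all hypothetical choices made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=15)   # assumed fixed (tumbling) window size
THRESHOLD = 10                   # lead only when MORE than 10 events are seen
EPOCH = datetime(1970, 1, 1)

def window_start(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its 15-minute window."""
    return ts - ((ts - EPOCH) % WINDOW)

def group_by_leads(events):
    """Aggregate events per (device_name, device_ip, window) and emit one lead per window."""
    windows = defaultdict(lambda: {"count": 0, "usernames": set()})
    for e in events:
        key = (e["device_name"], e["device_ip"], window_start(e["event_time"]))
        windows[key]["count"] += 1
        windows[key]["usernames"].add(e["username"])

    leads = []
    for (name, ip, start), stats in windows.items():
        if stats["count"] > THRESHOLD:
            # The lead carries only grouping keys, the window, and statistics,
            # not the full content of every underlying event.
            leads.append({
                "device_name": name,
                "device_ip": ip,
                "window_start": start,
                "event_count": stats["count"],
                "distinct_usernames": len(stats["usernames"]),
            })
    return leads

# Hypothetical input: 11 failed logins from one device within a few minutes.
base = datetime(2024, 1, 1, 12, 0)
events = [
    {"device_name": "laptop-1", "device_ip": "10.0.0.5",
     "username": f"user{i % 3}", "event_time": base + timedelta(seconds=30 * i)}
    for i in range(11)
]
leads = group_by_leads(events)
```

Here the 11 events collapse into a single lead for the window, which is why a GROUP BY lead is less detailed than a per-event lead.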
Why are my test results different from the leads I’m seeing on the platform?
The test results are fetched using a live query, which is executed to give a general understanding of the expected leads, including their format and volume.
There are several reasons why the results seen in the test may differ from the leads in the platform:
Custom scoring rules - If any custom scoring rules with the IGNORE action were defined for the detector, some leads might have been ignored and are therefore not available in the platform.
For GROUP BY detectors:
Missing unique attributes - Hunters has a deduplication mechanism that filters out duplicate leads generated on the same day. If the detector doesn’t have many attributes, leads might be deduplicated and therefore not available in the platform. This can happen for GROUP BY detectors where no grouping keys or statistics were defined, which causes the leads to be non-informative and easily duplicated.
Malformed timestamps in input data - GROUP BY detectors are sensitive to event times, because they aggregate events into time windows. If the raw data has missing event times or malformed timestamps (events far in the past or future, or generally out of order), leads might not be generated. For example, a detector that looks for 10 failed logins in a 15-minute window won’t generate leads if some of these events have no timestamp attached to them, or if they were inserted into the data lake hours apart.
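The effect of bad timestamps on windowed counting can be illustrated with a small Python sketch. It assumes fixed 15-minute windows and assumes that an event without a timestamp cannot be assigned to any window. Both are simplifications for the example, and the function name is hypothetical rather than anything the platform exposes.

```python
from collections import Counter
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=15)   # assumed fixed window size
EPOCH = datetime(1970, 1, 1)

def count_per_window(event_times):
    """Count events per 15-minute window, skipping events with no timestamp."""
    counts = Counter()
    for ts in event_times:
        if ts is None:
            continue  # assumption: no event time means no window assignment
        counts[ts - ((ts - EPOCH) % WINDOW)] += 1
    return counts

base = datetime(2024, 1, 1, 12, 0)

# 11 failed logins with clean timestamps, all inside one window.
clean = [base + timedelta(seconds=30 * i) for i in range(11)]

# The same 11 logins, but 4 have no timestamp and 3 carry a time six hours earlier.
messy = (
    [None] * 4
    + [base - timedelta(hours=6) + timedelta(seconds=i) for i in range(3)]
    + [base + timedelta(seconds=30 * i) for i in range(4)]
)

max_clean = max(count_per_window(clean).values())
max_messy = max(count_per_window(messy).values())
```

With clean data the busiest window holds all 11 events. With the damaged timestamps no window holds more than 4, so a threshold such as "more than 10" is never reached even though the same logins occurred.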
If none of the reasons above are applicable and leads still seem to be missing, contact support for further assistance.
How can I create more complex detectors (joining different tables, manipulating the data, etc.)?
With SQL Custom Detectors, you can create detectors using SQL statements executed directly in the data lake.
These detectors can be created via the API. For more information, check out the API Docs.