This article lists the most common service limits you might encounter as you use Hunters.
What is the purpose of these limits?
These limits protect the Hunters infrastructure from spam leads and help maintain high performance. They also shield Hunters customers from an influx of irrelevant leads and stories. In parallel, our research teams are constantly learning from the data and improving detectors to achieve higher fidelity in lead creation.
Detection limits
| Item | Limit | More about it |
|---|---|---|
| Lead creation from Hunters and custom detectors | 150 per detector per day | The detector |
| Lead creation from third-party detectors | 100K per detector per day | The detector |
📘Note
If a lead’s Start Time and End Time values are not found in the raw data (i.e., no event time is present in the event), Hunters resorts to the following fallback logic:
If `start_time` or `end_time` is null, Hunters populates these fields with the timestamp of the event’s insertion into the data lake (`METADATA$INSERTION_TIME`). If the insertion time is also missing, Hunters uses the lead creation time.
This fallback applies to lead creation only, not to event processing. For detectors where the event time is an inherent part of the logic (e.g., time-windowed detectors or statistical time-series analysis detectors), events with missing timestamps are not adjusted using the logic above; they are omitted and not processed as part of the detection logic.
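For illustration only, here is a minimal Python sketch of that fallback order. The function and parameter names are placeholders invented for this example and are not part of the Hunters product or API.

```python
from datetime import datetime, timezone
from typing import Optional


def resolve_lead_time(
    event_time: Optional[datetime],
    insertion_time: Optional[datetime],
) -> datetime:
    """Resolve a lead's start/end time using the documented fallback order."""
    if event_time is not None:
        return event_time  # 1. Use the event's own timestamp when it exists
    if insertion_time is not None:
        return insertion_time  # 2. Fall back to the data lake insertion time (METADATA$INSERTION_TIME)
    return datetime.now(timezone.utc)  # 3. Otherwise, fall back to the lead creation time
```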
Ingestion limits
Ingestion through an intermediary S3 bucket
| Item | Limit | More about it |
|---|---|---|
| File size | 50MB (compressed) | Files in the connected bucket must be below the specified limit. |
Our permanent ingestion pipeline is designed to handle various data sources and volumes effectively. However, processing extremely large files can lead to performance issues, resource constraints, and unexpected behaviors not only for this pipeline but also for your other interconnected pipelines. By capping the file size at 50MB (after compression), we can maintain optimal pipeline performance and minimize the risk of disruptions.
Consequences of exceeding the file size limit
Pipeline Disruptions: Uploading files larger than the specified limit can cause disruptions in the data processing flow, impacting your data's availability downstream in this pipeline and other interconnected pipelines.
Resource Strain: Processing large files consumes significant system resources, potentially leading to slowdowns not only for data processing in this pipeline but also affecting the performance of your other connected pipelines.
Increased Latency: Larger files take longer to process, which may increase data ingestion latency for your datasets in this pipeline and other connected pipelines.
Best practices
To ensure a smooth and efficient data ingestion process across all your pipelines, please follow these best practices:
Pre-processing Large Files: If you have files that exceed the 50MB limit, consider breaking them down into smaller, manageable chunks before submitting them to the ingestion pipeline (see the sketch after this list).
Compression: Compressing large files (e.g., using gzip) can significantly reduce their size without sacrificing data integrity. Ensure that your compressed files remain within the 50MB limit.
Batching: If you have multiple smaller files to upload, consider batching them together to minimize the number of individual uploads and reduce potential overhead.
Optimized File Formats: Select file formats that balance file size and data integrity. Some formats are more efficient than others for certain types of data.
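As a rough illustration of the pre-processing and compression practices above, the sketch below splits a large newline-delimited file into gzip-compressed chunks and checks that each chunk stays under the 50MB compressed limit. The chunk size and file naming are arbitrary choices for this example, not Hunters requirements.

```python
import gzip
import os

MAX_COMPRESSED_BYTES = 50 * 1024 * 1024  # 50MB limit on compressed files
LINES_PER_CHUNK = 500_000                # arbitrary chunk size; tune for your data


def split_and_compress(source_path: str, output_prefix: str) -> list[str]:
    """Split a large newline-delimited file into gzip-compressed chunks."""
    chunk_paths = []
    chunk_index = 0
    out = None
    try:
        with open(source_path, "rb") as source:
            for line_number, line in enumerate(source):
                # Start a new compressed chunk every LINES_PER_CHUNK lines.
                if line_number % LINES_PER_CHUNK == 0:
                    if out is not None:
                        out.close()
                    chunk_path = f"{output_prefix}.{chunk_index:04d}.gz"
                    chunk_paths.append(chunk_path)
                    out = gzip.open(chunk_path, "wb")
                    chunk_index += 1
                out.write(line)
    finally:
        if out is not None:
            out.close()

    # Verify every compressed chunk is below the ingestion limit before uploading.
    for path in chunk_paths:
        if os.path.getsize(path) > MAX_COMPRESSED_BYTES:
            raise ValueError(f"{path} exceeds the 50MB compressed limit; use a smaller chunk size")
    return chunk_paths
```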
Contacting support
If you encounter any issues or have questions related to data ingestion or file size limitations, our support team is here to assist you. Feel free to reach out to our support channels, and we'll be glad to help you.
By adhering to these file size limit guidelines, you contribute to a stable and reliable data ingestion process across all your pipelines. We appreciate your cooperation and look forward to a successful and uninterrupted data processing experience!