Not All Threat Feeds Are Equal: How We Score Feed Quality


The threat intelligence market has a volume problem masquerading as a quality problem. Vendors compete on indicator count — "40 million IOCs per day" is a more impressive headline than "12 million IOCs with 96% precision" — which creates systematic incentives to include low-confidence indicators in feeds rather than filtering them. The downstream consequence is that SOC teams ingest feeds that inflate their indicator volume while providing little additional detection value and significant false positive overhead.

The Four Dimensions of Feed Quality

Threat feed quality is multidimensional, and optimizing on any single dimension produces distorted assessments. ThreatPulsar evaluates each feed across four dimensions, weighted by their operational impact on enrichment decision quality:

Precision (false positive rate): The percentage of indicators in the feed that, when observed in a customer environment, reflect actual malicious activity rather than legitimate behavior. A feed reporting 10,000 IP addresses per day with a 15% false positive rate means 1,500 of those IPs will generate false positive enrichment hits in environments where those IPs host legitimate services. For feeds that include CDN IP ranges, shared hosting blocks, or Tor exit nodes used by both legitimate privacy-conscious users and threat actors, precision can drop below 60%.

Freshness (time-to-inclusion and half-life): How quickly new malicious infrastructure is added to the feed after first observation, and how long stale indicators remain in the feed after the infrastructure is decommissioned. A feed that adds indicators within 2 hours of first observation but retains them for 180 days after the associated infrastructure goes offline provides accurate freshness for new indicators but generates stale hit noise on older ones.

Coverage (indicator type and sector distribution): Whether the feed covers the IOC types and threat actor sectors relevant to the customer's environment. A feed that primarily covers commodity malware C2 infrastructure has limited value for a financial services SOC dealing primarily with targeted attacks by nation-state actors using living-off-the-land techniques. Coverage assessment requires mapping feed composition against the customer's threat model, not just counting total indicators.

Attribution depth: Whether the feed provides context beyond the indicator itself — malware family associations, threat actor group links, MITRE ATT&CK technique tags, and campaign identifiers. An IP address with only a "malicious" verdict contributes less enrichment value than the same IP address with a threat actor group association and three linked MITRE techniques, even if both arrive at the same time.
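The four dimensions can be combined into a single composite score. The sketch below is a minimal illustration of that idea, not ThreatPulsar's published scoring model: the field names, the 0-1 normalization, and the weights are all assumptions chosen for clarity.

```python
from dataclasses import dataclass

@dataclass
class FeedMetrics:
    precision: float          # 0-1, back-tested true positive rate
    freshness: float          # 0-1, share of indicators still within their half-life
    coverage: float           # 0-1, overlap with the customer's threat model
    attribution_depth: float  # 0-1, share of indicators carrying actor/TTP context

# Illustrative weights only; the article says weights reflect operational
# impact on enrichment decisions, but does not publish the actual values.
WEIGHTS = {
    "precision": 0.40,
    "freshness": 0.25,
    "coverage": 0.20,
    "attribution_depth": 0.15,
}

def quality_score(m: FeedMetrics) -> float:
    """Weighted sum of the four dimensions, returned on a 0-1 scale."""
    return (WEIGHTS["precision"] * m.precision
            + WEIGHTS["freshness"] * m.freshness
            + WEIGHTS["coverage"] * m.coverage
            + WEIGHTS["attribution_depth"] * m.attribution_depth)

# A feed with strong precision but thin attribution still scores well,
# because precision carries the largest weight in this sketch.
print(round(quality_score(FeedMetrics(0.9, 0.8, 0.6, 0.4)), 3))  # prints 0.74
```

The point of weighting precision heaviest is that a false positive costs analyst time directly, while weak coverage or attribution merely forgoes value.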

Measuring Precision: The Back-Testing Problem

Precision measurement is technically challenging because the ground truth is uncertain. The "correct" verdict for a given IP address depends on how you define maliciousness and over what time window you measure. An IP that hosted a botnet C2 server last Tuesday but now hosts a legitimate SaaS application is technically malicious in historical context and legitimate in current context. Whether a hit on that indicator in your environment represents a real threat depends on when the connection occurred and whether the botnet is still using that IP.

ThreatPulsar measures feed precision using a 30-day back-testing window: for each indicator in a feed, we compare the feed's verdict against confirmations from other independent sources within 30 days of the feed's first inclusion of the indicator. If an IP is flagged in Feed A and corroborated by two or more independent sources within 30 days, it is counted as a true positive for Feed A's precision score. If it is flagged by Feed A only and not corroborated by any other source within 30 days, it is counted as a presumed false positive. This method undercounts true positives for indicators that are genuinely malicious but only visible to one intelligence source — specialized infrastructure used exclusively in targeted attacks, for example — but it provides a consistent, computable precision estimate that does not require human analyst ground-truth labeling of every indicator.
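The corroboration rule above reduces to a simple computation. This sketch assumes each source exposes a first-seen date per indicator; the function and data shapes are hypothetical, but the logic follows the stated rule: an indicator counts as a presumed true positive only if at least two independent sources report it within 30 days of the feed's first inclusion.

```python
from datetime import date, timedelta

CORROBORATION_WINDOW = timedelta(days=30)
MIN_CORROBORATING_SOURCES = 2

def estimate_precision(feed_first_seen: dict, other_sources: dict) -> float:
    """Presumed precision for a feed under the 30-day back-testing rule.

    feed_first_seen: {indicator: date first included in the feed under test}
    other_sources:   {source_name: {indicator: date first reported}}
    An indicator is a presumed true positive if >= 2 independent sources
    report it within 30 days of the feed's inclusion (before or after;
    the direction of the window is an assumption here).
    """
    if not feed_first_seen:
        return 0.0
    true_positives = 0
    for indicator, included in feed_first_seen.items():
        corroborations = sum(
            1 for reports in other_sources.values()
            if indicator in reports
            and abs(reports[indicator] - included) <= CORROBORATION_WINDOW
        )
        if corroborations >= MIN_CORROBORATING_SOURCES:
            true_positives += 1
    return true_positives / len(feed_first_seen)

# Example: one indicator corroborated by two sources, one by none.
feed = {"203.0.113.10": date(2024, 1, 1), "198.51.100.7": date(2024, 1, 1)}
others = {
    "src_b": {"203.0.113.10": date(2024, 1, 5)},
    "src_c": {"203.0.113.10": date(2024, 1, 20)},
}
print(estimate_precision(feed, others))  # prints 0.5
```

Note how the uncorroborated indicator drags the score down even though it may be a genuine single-source detection; this is the undercounting bias the article acknowledges.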

Using this methodology, precision scores across ThreatPulsar's integrated feeds range from 67% to 97%, with a median of 84%. The three lowest-precision feeds in the library are all open source IP reputation feeds with large communities of volunteer submitters; the highest-precision feeds are commercial feeds from vendors with dedicated collection infrastructure focused on specific threat actor groups.

Freshness Measurement: Half-Life and Detection Windows

For C2 infrastructure specifically, the half-life of an indicator — the time after which the infrastructure is statistically likely to have been decommissioned or repurposed — is a critical feed quality parameter. Analysis of C2 IP indicators across multiple feeds shows a median infrastructure half-life of 14 days for botnet C2 and 47 days for targeted attack C2. This means that half of botnet C2 indicators in a feed are operationally stale within two weeks of being reported.

A feed that retains indicators for 90 days without staleness filtering is therefore providing, at any given time, approximately 71% stale indicators for botnet C2 (those past their 14-day half-life) and only 29% fresh ones. Enrichment hits on stale indicators produce false positives when the infrastructure has been reassigned to a different customer by the hosting provider.

ThreatPulsar applies a per-feed, per-indicator-type freshness decay function when weighting enrichment results. A C2 IP indicator that has been in a feed for 60 days has its confidence contribution reduced significantly relative to the same indicator newly added to the feed. This prevents stale indicators from producing high-confidence enrichment hits simply because they are present in a feed that does not expire old data.
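One natural form for such a decay function is exponential decay keyed to the measured infrastructure half-life. The sketch below assumes that form; ThreatPulsar's actual decay curve is not published, so treat the shape and parameters as illustrative.

```python
def freshness_weight(age_days: float, half_life_days: float) -> float:
    """Exponential freshness decay: 1.0 for a brand-new indicator, 0.5 at
    exactly one half-life, approaching 0 as the indicator ages out.
    The exponential form is an assumption, not ThreatPulsar's documented curve."""
    return 0.5 ** (age_days / half_life_days)

# Botnet C2 (14-day half-life): a 60-day-old indicator retains ~5% weight.
print(round(freshness_weight(60, 14), 3))
# Targeted-attack C2 (47-day half-life): the same age retains ~41% weight.
print(round(freshness_weight(60, 47), 3))
```

The per-indicator-type half-life matters: the same 60-day-old indicator is nearly worthless as botnet C2 evidence but still meaningfully predictive as targeted-attack C2 evidence.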

When More Feeds Do Not Mean Better Enrichment

A counterintuitive finding from operating an enrichment platform with 40+ feeds is that beyond a threshold of approximately 15-20 high-quality feeds with complementary coverage, adding more feeds increases false positive noise faster than it adds true positive coverage. The marginal feed added to a mature enrichment library tends to have high overlap with existing feeds on well-documented malicious infrastructure and high divergence on marginal or low-confidence indicators — exactly the indicators most likely to produce false positives.

This is the feed aggregation trap: a platform that queries 40 feeds does not necessarily produce better enrichment than one that queries 15 well-selected feeds, and may produce worse enrichment if the additional 25 feeds contribute primarily noise. The operational indicator is false positive rate trend: if adding a new feed increases your overall false positive rate without a corresponding increase in true positive detections, the feed is adding noise, not signal.

ThreatPulsar monitors per-feed contribution to true positive and false positive enrichment outcomes on an ongoing basis. Feeds that drop below a quality threshold — precision below 70%, or a false positive to true positive contribution ratio above 0.3 — are either weighted near zero in enrichment scoring or removed from the active feed library entirely. Feed quality is treated as a dynamic operational parameter, not a static vendor selection decision.
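The two quality gates described above are straightforward to express in code. This is a hypothetical simplification: the function name and the binary keep/downgrade outcome are assumptions, while the 70% precision floor and the 0.3 FP:TP ratio ceiling come from the thresholds stated in the text.

```python
PRECISION_FLOOR = 0.70   # minimum acceptable back-tested precision
FP_TP_RATIO_CEILING = 0.3  # maximum false-positive-to-true-positive contribution

def feed_action(precision: float, fp_count: int, tp_count: int) -> str:
    """Return 'keep' if the feed passes both quality gates, otherwise
    'downweight_or_remove'. A feed with zero true positives is treated
    as failing the ratio gate (an assumption for the edge case)."""
    fp_tp_ratio = fp_count / tp_count if tp_count else float("inf")
    if precision < PRECISION_FLOOR or fp_tp_ratio > FP_TP_RATIO_CEILING:
        return "downweight_or_remove"
    return "keep"

print(feed_action(0.84, 20, 100))  # ratio 0.2, precision ok: keep
print(feed_action(0.67, 10, 100))  # precision below floor: downweight_or_remove
print(feed_action(0.90, 40, 100))  # ratio 0.4 above ceiling: downweight_or_remove
```

Evaluating this continuously against enrichment outcomes, rather than once at procurement time, is what makes feed quality a dynamic operational parameter.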

Open Source vs. Commercial Feeds: A Realistic Comparison

The open source vs. commercial debate in threat intelligence often conflates two different questions: coverage and precision. Open source feeds often have comparable or superior coverage on commodity threats — Feodo Tracker's coverage of banking trojan C2 infrastructure, for example, is among the best available for that specific threat category regardless of cost. Commercial feeds tend to have superior precision and attribution depth on targeted attack infrastructure because the collection methodology involves human analysts and proprietary collection infrastructure that can verify indicators before publication.

The practical implication is that feed selection should be driven by threat model fit, not by cost or brand recognition. A SOC primarily dealing with commodity ransomware and phishing campaigns can build a high-precision enrichment library from open source and low-cost commercial feeds if those feeds are selected for their specific coverage of those threat categories. A SOC dealing with nation-state actors targeting their industry needs commercial feeds from vendors with specific expertise in that threat actor group's infrastructure and techniques.

Conclusion

Feed quality measurement is a continuous operational function, not a one-time vendor evaluation. The quality of any given feed changes as the vendor's collection methodology evolves, as the threat actor landscape shifts, and as false positive-generating infrastructure gets repurposed. An enrichment platform that treats feed weights as static configuration is operating with an increasingly inaccurate model of its own data quality.

The questions worth asking about any threat intelligence feed: What is the measured false positive rate against my specific environment types? What is the average age of indicators at the time they identify malicious activity in customer environments? How many indicators per day have no corroboration from independent sources? These are computable metrics — not marketing claims — and they are the ones that determine whether a feed contributes to detection quality or detracts from it.
