Early Case Intelligence and Culling
Analytics, deduplication, email threading, and communications mapping applied before full review begins. Modern ECA typically reduces active review volume by 60 to 80 percent. NYCF documents every reduction decision so that it is defensible in New York state and federal court proceedings.
What This Solves
After collection, a New York litigation team is looking at several hundred thousand documents. Most are not relevant. A substantial portion are exact duplicates sitting in multiple custodians' inboxes. Hundreds of email threads are fragmented across a dozen custodians at different stages of the reply chain. Review everything document by document and you have weeks of expensive attorney time spent on material the court will never see.
Early case intelligence is the phase where NYCF applies analytics and reduction techniques before a single document goes to an attorney for review. The goal is to move only the right data forward: unique, potentially responsive documents in a form reviewers can navigate efficiently. This is not about cutting corners or reducing production completeness. Every reduction decision is documented, the methodology is logged, and the parameters used to exclude documents can be reproduced and explained to an SDNY magistrate judge, an EDNY judge, or opposing counsel in a Rule 26(f) conference. Modern ECA, done properly, reduces review volume by 60 to 80 percent without sacrificing responsive material.
Deduplication: Global and Custodial
Hash-based deduplication is the foundation of volume reduction. NYCF computes cryptographic hash values (MD5 and SHA-256) for every document in the collection. Two files with matching hash values are, for all practical purposes, byte-for-byte duplicates. One copy moves to review. The other copies are suppressed but not destroyed: NYCF maintains a cross-reference log mapping every suppressed duplicate to its canonical copy, so that custodian-specific context is preserved if needed.
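To make the hash step concrete, here is a minimal Python sketch that computes both digests in a single pass over a byte stream, so a large file is read only once. The function name `compute_hashes` and the 1 MiB chunk size are illustrative assumptions, not NYCF's actual tooling.

```python
import hashlib
import io
from typing import BinaryIO

CHUNK_SIZE = 1024 * 1024  # hypothetical 1 MiB read size; any size yields the same digests


def compute_hashes(stream: BinaryIO) -> dict[str, str]:
    """Return MD5 and SHA-256 hex digests of a byte stream, read in one pass."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    # Feed each chunk to both hash objects so the data is read only once.
    for chunk in iter(lambda: stream.read(CHUNK_SIZE), b""):
        md5.update(chunk)
        sha256.update(chunk)
    return {"md5": md5.hexdigest(), "sha256": sha256.hexdigest()}


if __name__ == "__main__":
    a = compute_hashes(io.BytesIO(b"Q3 forecast attached."))
    b = compute_hashes(io.BytesIO(b"Q3 forecast attached."))
    c = compute_hashes(io.BytesIO(b"Q3 forecast attached!"))
    print(a == b)  # True: identical bytes, identical digests
    print(a == c)  # False: one changed byte changes both digests
```

Computing SHA-256 alongside MD5 costs little extra and gives a collision-resistant value to rely on if an MD5 match is ever questioned.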
The choice between global and custodial deduplication is a legal decision, not a technical one, and it needs to be agreed upon with counsel before processing begins. Global deduplication keeps a single copy of each unique document across the entire collection and suppresses every other copy, regardless of which custodian held it. Custodial deduplication suppresses duplicates only within each custodian's own data. If the same document appears in five custodians' email, one copy proceeds to review under a global approach, while all five proceed under a custodial approach, because the fact that each person held the document may itself be relevant to the theory of the case.
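The two methodologies differ only in the deduplication key: the hash alone (global) or the custodian plus the hash (custodial). In the hypothetical sketch below, `Document`, `deduplicate`, and the `scope` values are illustrative names. The first copy encountered becomes the canonical copy, which is an assumption; a real protocol might rank copies by custodian priority or date instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    custodian: str
    sha256: str  # content hash computed at ingestion


def deduplicate(docs: list[Document], scope: str) -> tuple[list[Document], dict[str, str]]:
    """Return (review_set, xref) for scope "global" or "custodial".

    xref maps every suppressed doc_id to the doc_id of the canonical copy
    that stays in review, so no suppressed copy is lost from the record.
    """
    if scope not in ("global", "custodial"):
        raise ValueError(f"unknown scope: {scope!r}")
    canonical: dict[tuple[str, ...], str] = {}
    review_set: list[Document] = []
    xref: dict[str, str] = {}
    for doc in docs:
        # The key is the whole difference between the two methodologies:
        # global compares hashes collection-wide, custodial within one custodian.
        key = (doc.sha256,) if scope == "global" else (doc.custodian, doc.sha256)
        if key in canonical:
            xref[doc.doc_id] = canonical[key]  # suppressed, not deleted
        else:
            canonical[key] = doc.doc_id
            review_set.append(doc)
    return review_set, xref


if __name__ == "__main__":
    # The same memo held by five custodians (hypothetical IDs and hash).
    memo = [Document(f"DOC-{i}", f"custodian-{i}", "9f86d081") for i in range(5)]
    for scope in ("global", "custodial"):
        review, xref = deduplicate(memo, scope)
        print(scope, len(review), "in review,", len(xref), "suppressed")
```

For the five-custodian memo, the global run leaves one copy in review and logs four suppressed copies against it; the custodial run leaves all five in review because no custodian holds the memo twice.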
NYCF documents which methodology was applied, the parameters used, the hash algorithm, and the resulting document counts at each stage. This documentation goes into the processing log and is available for review by opposing counsel, or for submission to the court if the methodology is challenged, including in proportionality arguments under Fed. R. Civ. P. 26(b)(1) or the CPLR's general disclosure provisions (CPLR 3101, 3120) as applied to ESI.
Email Threading and Inclusive Email Analysis
An email thread starts as a single message. By the time discovery is collected, that conversation may exist as dozens of individual messages held by different custodians at different points in the reply chain. Each message is technically a separate document, but reviewing them separately is largely redundant: later messages in a thread usually quote the earlier content.
Email threading identifies the most inclusive email in each conversation thread. The inclusive email is the latest message in the chain that contains all prior content within it. If a reviewer reads the inclusive email, they have seen everything said in that thread. Earlier messages are flagged as thread members but can be set aside unless a specific review decision requires looking at earlier branch points separately.
Threading also handles replies, forwards, and divergent branches. NYCF's threading analysis identifies where a thread splits (when someone forwards a message and starts a parallel conversation) and ensures that each branch has its own inclusive email identified. The result is a structured view of every conversation, not a flat list of individual messages, which matters in large commercial cases before the NY Commercial Division or in multi-party SDNY litigation.
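A simplified model of inclusive-email identification treats each thread as a tree built from reply and forward links (for example, the In-Reply-To relationship). If every reply or forward quotes its parent in full, the inclusive emails are exactly the leaves of the tree, and a forward that starts a parallel conversation simply creates another leaf. This is an assumption for illustration: production threading engines also compare message text and attachments, so a reply that drops an attachment can leave an earlier message inclusive. The function name is hypothetical, and the sketch assumes the reply links contain no cycles.

```python
from collections import defaultdict
from typing import Optional


def inclusive_emails(parent_of: dict[str, Optional[str]]) -> dict[str, list[str]]:
    """Group messages into threads and return each thread's inclusive emails.

    parent_of maps a message ID to the ID it replies to or forwards
    (None for the message that starts a thread). Assumes every reply
    quotes its parent in full, so a message is inclusive exactly when
    nothing replies to it.
    """
    children: dict[str, list[str]] = defaultdict(list)
    for msg, parent in parent_of.items():
        if parent is not None:
            children[parent].append(msg)

    def root(msg: str) -> str:
        # Walk up the reply chain to the thread's first message.
        while parent_of.get(msg) is not None:
            msg = parent_of[msg]
        return msg

    threads: dict[str, list[str]] = defaultdict(list)
    for msg in parent_of:
        if not children[msg]:  # no later message quotes this one
            threads[root(msg)].append(msg)
    return {r: sorted(leaves) for r, leaves in threads.items()}


if __name__ == "__main__":
    # m4 forwards m2 and starts a branch; n1 is a standalone message.
    links = {"m1": None, "m2": "m1", "m3": "m2", "m4": "m2", "n1": None}
    print(inclusive_emails(links))
```

Here the thread rooted at `m1` has two inclusive emails, `m3` and `m4`, one per branch, which mirrors the branch handling described above.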
Near-Duplicate Detection
Exact hash deduplication removes byte-identical files. Near-duplicate detection handles the next layer: documents that are substantially similar but not identical. A contract with a single clause changed, a draft email with one paragraph removed, a report with an updated financial figure: each is a different document by hash, but each may require only a fraction of the review time if it is identified as a near-duplicate of something already reviewed.
NYCF's near-duplicate analysis computes similarity scores between documents and groups near-duplicates into clusters. Reviewers see the pivot document for each cluster, review the differences within the cluster, and apply consistent coding decisions across clustered documents without reading each one independently. The similarity threshold used and the resulting clusters are documented in the processing log, which is disclosable under NY Commercial Division Rule 11-c and comparable federal court ESI practice.
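The page does not specify how NYCF computes similarity scores. As one common illustration, the sketch below scores document pairs by Jaccard similarity over three-word shingles and clusters documents greedily around pivots. The function names and the 0.8 default threshold are hypothetical, and comparing each document against every pivot is only practical for small sets; large collections generally need an approximation such as MinHash.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Set of k-word shingles (overlapping word sequences) in lowercased text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Similarity score in [0, 1]: shared shingles divided by all distinct shingles."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_near_duplicates(docs: dict[str, str], threshold: float = 0.8) -> list[dict]:
    """Greedy clustering: each document joins the first cluster whose pivot it
    matches at or above `threshold`; otherwise it becomes a new cluster's pivot."""
    sigs = {doc_id: shingles(text) for doc_id, text in docs.items()}
    clusters: list[dict] = []
    for doc_id in docs:
        for cluster in clusters:
            score = jaccard(sigs[doc_id], sigs[cluster["pivot"]])
            if score >= threshold:
                cluster["members"][doc_id] = round(score, 3)
                break
        else:  # no existing pivot was similar enough
            clusters.append({"pivot": doc_id, "members": {doc_id: 1.0}})
    return clusters
```

Short documents are sensitive to the threshold. In a 28-word payment clause where "sixty" becomes "ninety", three of the 26 shingles change, so the score is 23/29, about 0.79: that pair clusters at a 0.7 threshold but not at 0.8, which is why the threshold is worth agreeing with counsel.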
Communications Mapping and Data Visualization
Communications mapping analyzes email and message metadata across the entire collection to build a structured picture of who communicated with whom, how frequently, and during which time periods. For NY commercial litigation, this analysis serves two concrete purposes: identifying the most active communicators in the relevant time period (which often focuses review on the custodians who matter most to the theory of the case), and identifying unexpected communications, such as messages between parties who should not have been in contact or communication spikes around key business events.
NYCF delivers communications maps as visual outputs: network graphs showing communication volume and frequency between custodians, timeline charts showing spikes around key dates, and frequency tables exportable to Excel or PDF. These outputs support attorney strategy sessions in the weeks before depositions begin and can be adapted for presentation to clients or co-counsel.
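The counts behind a network graph and a timeline chart can be sketched in a few lines. This hypothetical example assumes messages have already been parsed into `(sender, recipients, sent_date)` tuples; real mapping also normalizes aliases and display names, which this sketch reduces to lowercasing addresses. The CSV output stands in for the Excel-ready frequency table.

```python
import csv
import io
from collections import Counter
from datetime import date


def communication_counts(messages):
    """messages: iterable of (sender, recipients, sent_date) tuples.

    Returns (pairs, monthly): message counts per sender->recipient pair
    (the edges of a network graph) and per (year, month) for a timeline.
    """
    pairs: Counter = Counter()
    monthly: Counter = Counter()
    for sender, recipients, sent in messages:
        monthly[(sent.year, sent.month)] += 1
        for recipient in recipients:
            pairs[(sender.lower(), recipient.lower())] += 1
    return pairs, monthly


def frequency_table_csv(pairs: Counter) -> str:
    """Most active pairs first, as CSV text that opens directly in Excel."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["sender", "recipient", "messages"])
    for (sender, recipient), count in pairs.most_common():
        writer.writerow([sender, recipient, count])
    return buf.getvalue()


if __name__ == "__main__":
    msgs = [
        ("A.Ruiz@acme.com", ["b.chen@acme.com"], date(2024, 3, 4)),
        ("a.ruiz@acme.com", ["b.chen@acme.com", "legal@acme.com"], date(2024, 3, 9)),
        ("b.chen@acme.com", ["a.ruiz@acme.com"], date(2024, 4, 1)),
    ]
    pairs, monthly = communication_counts(msgs)
    print(frequency_table_csv(pairs))
    print(sorted(monthly.items()))
```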
Search Strategy Development
Keyword searches in eDiscovery are more consequential than most attorneys initially appreciate. An overly broad search returns tens of thousands of irrelevant hits that inflate review costs and create proportionality problems. An overly narrow search misses responsive documents and creates production gaps that become motion practice. NYCF works with counsel to develop a search strategy grounded in the facts of the specific matter.
The process starts with a review of the case narrative: who are the key players, what were the key events, what documents does counsel need to find? NYCF translates that narrative into a structured set of search terms with field-level targeting (searching sender fields differently than body text), date range parameters, custodian scoping, and proximity operators for multi-word combinations. NYCF then runs the terms against the post-processing collection and reports hit counts so that counsel can see the impact of each term before committing to it. Terms that return an unusually high or low hit count relative to the matter's facts trigger review and refinement.
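A per-term hit report is the core of the testing step. The sketch below supports three illustrative term forms: a plain word, a field-targeted word such as `from:jsmith`, and a proximity term `a w/N b` (word a within N words of word b). This syntax is a hypothetical simplification, not any particular review platform's query language, and all names are assumptions.

```python
import re

# Hypothetical "a w/N b" syntax: word a within N words of word b.
PROXIMITY = re.compile(r"(\w+)\s+w/(\d+)\s+(\w+)")


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def matches(doc: dict[str, str], term: str) -> bool:
    """Does doc hit term? Illustrative forms: 'word', 'field:word', 'a w/N b'."""
    field, query = "body", term
    if ":" in term:  # field-level targeting, e.g. 'from:jsmith'
        field, query = term.split(":", 1)
    tokens = tokenize(doc.get(field, ""))
    prox = PROXIMITY.fullmatch(query.strip().lower())
    if prox:
        a, n, b = prox.group(1), int(prox.group(2)), prox.group(3)
        pos_a = [i for i, t in enumerate(tokens) if t == a]
        pos_b = [i for i, t in enumerate(tokens) if t == b]
        return any(abs(i - j) <= n for i in pos_a for j in pos_b)
    return query.strip().lower() in tokens


def hit_report(docs: list[dict[str, str]], terms: list[str]) -> dict[str, int]:
    """Documents hit per term, so counsel can see each term's impact before adopting it."""
    return {term: sum(matches(doc, term) for doc in docs) for term in terms}
```

Reporting counts per term, rather than one total for the whole protocol, is what makes it possible to spot a single overly broad or overly narrow term and refine it before the protocol is finalized.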
The final search protocol is documented in writing and becomes part of the matter record, supporting disclosure of search methodology in Rule 26(f) conferences and demonstrating reasonable, proportionate discovery efforts under SDNY and EDNY individual judges' ESI standing orders.
NYCF's Early Case Intelligence Process
Ingestion and Indexing
Collected data is loaded into NYCF's processing environment. Content and metadata are extracted from every file, and the full text is indexed for search. Containers such as ZIP archives, PST files, and MSG files with attachments are exploded into their component items while family relationships are preserved. The ingestion log records file counts, document counts, processing exceptions, and any items requiring special handling, and it is available to counsel for disclosure.
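Container explosion with family tracking can be illustrated for ZIP archives using only the Python standard library; PST and MSG files need specialized parsers and are not shown. In this hypothetical sketch, each extracted item records its parent and the control number of its top-level family, and the container itself is treated as the family parent, which is a simplifying assumption. The `ITEM-` control-number scheme is likewise illustrative.

```python
import io
import zipfile
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    item_id: str
    name: str
    parent_id: Optional[str]  # None for a top-level item
    family_id: str            # control number of the top-level parent
    data: bytes


def explode(name: str, data: bytes, parent: Optional[Item] = None,
            items: Optional[list] = None) -> list:
    """Recursively expand ZIP containers into items, preserving parent/family links."""
    if items is None:
        items = []
    item_id = f"ITEM-{len(items) + 1:06d}"  # hypothetical control-number scheme
    item = Item(
        item_id=item_id,
        name=name,
        parent_id=parent.item_id if parent else None,
        family_id=parent.family_id if parent else item_id,
        data=data,
    )
    items.append(item)
    if zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            for info in zf.infolist():
                if not info.is_dir():
                    # Each member becomes a child of this container, recursively.
                    explode(info.filename, zf.read(info), item, items)
    return items
```

A ZIP nested inside a ZIP keeps its own parent link, so an exhibit two levels down still traces back to the same family.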
Analytics and Email Threading
NYCF runs email threading across the full data set, identifying inclusive emails and building a structured conversation hierarchy for every thread. Near-duplicate clustering identifies groups of substantially similar documents and designates a pivot document for each cluster. Communications mapping generates custodian interaction data and timeline analytics. All analytical outputs are reviewed with counsel before any documents are excluded from review.
Deduplication: Global or Custodial
NYCF applies hash-based deduplication with the MD5 and SHA-256 algorithms, following the methodology (global or custodial) specified by counsel. Family integrity is preserved throughout: if a parent document is suppressed as a duplicate, its attachments are suppressed with it; if a parent is active, all attachments remain active in the review set. The deduplication log maps every suppressed document to its canonical copy and records the hash values for both, meeting the record-keeping expectations of NY court ESI practice.
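The family rule in the preceding paragraph amounts to letting every attachment inherit the status of its top-level parent. A minimal sketch, assuming deduplication decisions have already been made on top-level documents and that parent links contain no cycles; `family_status` and the ID format are hypothetical.

```python
from typing import Optional


def family_status(parent_of: dict[str, Optional[str]],
                  suppressed: set[str]) -> dict[str, str]:
    """Give every item the status of its top-level parent.

    parent_of maps item ID -> parent item ID (None for top-level documents).
    suppressed holds the top-level IDs that deduplication marked as duplicates.
    """
    def top(item: str) -> str:
        # Walk up to the family's top-level parent.
        while parent_of.get(item) is not None:
            item = parent_of[item]
        return item

    return {item: "suppressed" if top(item) in suppressed else "active"
            for item in parent_of}


if __name__ == "__main__":
    links = {"E1": None, "E1.1": "E1", "E2": None, "E2.1": "E2", "E2.1.1": "E2.1"}
    print(family_status(links, {"E2"}))
```

Because status is derived from the family root rather than assigned item by item, an attachment can never end up active while its parent is suppressed, or the reverse.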
Communications Mapping and Anomaly Identification
Communications analysis outputs are presented to counsel: network graphs of custodian interactions, volume timelines with annotations for key case events, and tables of high-frequency communicators. NYCF identifies communication patterns warranting closer attention, including unexpected communications between parties, spikes around key dates in the matter's timeline, and custodians with low collection volume relative to their expected role in the business transactions at issue.
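Two of the anomaly checks described above can be approximated with simple statistics. This sketch flags a period as a spike when its message count exceeds the mean by more than `k` population standard deviations, and flags custodians whose collection volume falls below a fraction of the median custodian's volume. Both thresholds (`k=2`, 25 percent) and the function names are hypothetical; the page does not say how NYCF sets them, and flagged items are prompts for attorney attention, not conclusions.

```python
from statistics import mean, median, pstdev


def volume_spikes(monthly: dict[str, int], k: float = 2.0) -> list[str]:
    """Periods whose message count exceeds the mean by more than k standard deviations."""
    counts = list(monthly.values())
    mu, sigma = mean(counts), pstdev(counts)
    # With zero spread (all periods equal) nothing counts as a spike.
    return [period for period, n in monthly.items() if sigma > 0 and n > mu + k * sigma]


def low_volume_custodians(volumes: dict[str, int], fraction: float = 0.25) -> list[str]:
    """Custodians holding less than `fraction` of the median custodian's volume."""
    mid = median(volumes.values())
    return [custodian for custodian, n in volumes.items() if n < fraction * mid]
```

A low-volume flag compares custodians only with one another; whether a small collection is actually unexpected still depends on each person's role in the transactions at issue.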
Search Strategy Development and Testing
NYCF develops a structured search protocol in collaboration with counsel, including keyword terms, field targeting, date filters, custodian scoping, and proximity operators. Each term is tested against the collection before the protocol is finalized, with hit counts reported so counsel can see the impact of individual terms. The agreed protocol is documented before it is applied to generate the final review set, creating the disclosure record required in SDNY and EDNY ESI management conferences.
ECA Summary Report and Review Set Delivery
Before any data moves to the Advantage Plus review platform, NYCF produces an ECA summary report documenting each reduction step: documents received, documents after deduplication, documents after threading suppression, documents after search culling, and the final review set count. The report includes reduction percentages at each stage, the parameters used, and a summary of analytical findings. This report supports Rule 26(f) planning discussions and documents reasonable, proportionate discovery efforts.
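The report's reduction percentages can be computed two ways: relative to the previous stage and relative to the original intake. The sketch below produces both columns; the function name is hypothetical and the stage counts in the usage note are illustrative figures, not real matter data.

```python
def reduction_table(stages: list[tuple[str, int]]) -> list[tuple[str, int, float, float]]:
    """For each stage: (name, count, % reduced vs previous stage, % reduced vs intake).

    stages[0] must be the intake count (documents received).
    """
    intake = stages[0][1]
    rows = []
    prev = intake
    for name, count in stages:
        step = 100 * (prev - count) / prev if prev else 0.0
        cumulative = 100 * (intake - count) / intake if intake else 0.0
        rows.append((name, count, round(step, 1), round(cumulative, 1)))
        prev = count  # the next stage is measured against this one
    return rows


if __name__ == "__main__":
    for row in reduction_table([
        ("Documents received", 400_000),
        ("After deduplication", 260_000),
        ("After threading suppression", 190_000),
        ("After search culling", 110_000),
    ]):
        print(row)
```

With these illustrative counts, the stage reductions are 35.0, 26.9, and 42.1 percent, while the cumulative reduction is 72.5 percent, because each stage percentage applies to a smaller base. Reporting both columns keeps the stage figures from being read as additive.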
Family Integrity and Defensibility
Family integrity is one of the most important constraints in ECA. A "family" in eDiscovery is a parent document and its attachments, or an email and its embedded images and forwarded attachments. Courts expect that families travel together through review and production: you cannot produce a contract without its exhibits, or an email without the spreadsheet attached to it. NYCF's processing preserves family relationships at every reduction step.
When a non-inclusive email is suppressed by threading, its attachments are suppressed with it. When the thread's inclusive email proceeds to review, the attachments from all thread members are associated with it, so no attachment is orphaned from its parent. When deduplication suppresses a document, the family of the active copy receives any unique attachments from suppressed copies. These rules are applied consistently, logged, and documented in a form suitable for disclosure to opposing counsel or presentation to a NY Commercial Division justice or SDNY magistrate judge if the completeness of the review set is challenged.
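The threading rule above, where attachments carried only by suppressed thread members are associated with the inclusive email, can be sketched as a set difference that also records where each attachment came from. The function name and the use of attachment hashes as identifiers are assumptions for illustration.

```python
def associate_attachments(thread: dict[str, set[str]], inclusive_id: str) -> dict[str, str]:
    """Map each attachment hash held only by suppressed thread members to the
    member it came from, so it can be associated with the inclusive email.

    thread maps each member email ID to the set of its attachment hashes.
    """
    own = thread[inclusive_id]
    associated: dict[str, str] = {}
    for member_id in sorted(thread):  # sorted for a reproducible log
        if member_id == inclusive_id:
            continue
        for att in sorted(thread[member_id]):
            if att not in own and att not in associated:
                associated[att] = member_id  # first member holding it is logged as source
    return associated


if __name__ == "__main__":
    # m1 sent the contract; m2 replied without it; m3 (inclusive) carries only a redline.
    thread = {"m1": {"h-contract"}, "m2": set(), "m3": {"h-redline"}}
    print(associate_attachments(thread, "m3"))
```

Recording the source member alongside each associated attachment gives the log a direct answer if opposing counsel asks where an attachment in the review set originated.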
What NYCF Delivers
An ECA summary report with before/after document counts at each reduction stage.
A deduplication log with hash values and a custodian cross-reference map.
An email thread index identifying inclusive emails and thread members.
A near-duplicate cluster report with similarity scores and pivot document identification.
Communications mapping outputs, including network graphs, timeline charts, and frequency tables.
A documented search protocol with per-term hit counts.
A defensibility log suitable for disclosure to opposing counsel, for use in Rule 26(f) conferences, or for submission to the court.
The final review-ready data set, loaded into the Advantage Plus platform or any platform specified by counsel.
Last reviewed and updated: April 2026
Hash-Based Deduplication
MD5 and SHA-256 hash computation across the full collection. Global or custodial methodology selected with counsel. Family integrity preserved throughout: parent-attachment relationships maintained at every suppression step. Full cross-reference log maintained and disclosable.
Email Threading
Inclusive email identification across all custodians and thread branches. Reply, forward, and divergent path analysis. Non-inclusive suppression with complete logging. Thread branch handling for split conversations. Review of thread structure presented to counsel before any suppression is applied.
Near-Duplicate Detection
Similarity scoring across the full post-collection data set. Cluster grouping with designated pivot document for each cluster. Documented similarity threshold agreed with counsel. Reduces redundant review within clusters without excluding any document from the matter record.
Communications Mapping
Custodian interaction network graphs. Volume timelines with annotations for key case events. Anomaly identification for unexpected communications or contact patterns. All outputs exportable to Excel or PDF for use in attorney strategy sessions or client presentations.
Early Case Intelligence Consultation
NYCF can run ECA on collected data within days of engagement. Contact us to discuss your matter, data volume, and review timeline.
Related Services
Advantage Plus Review Platform
NYCF's in-house review platform with flat-rate feature access, AI-assisted review queues, privilege workflows, and production handoff, all at one price with no per-feature charges.
Managed Review and Expert Services
NYCF-staffed review operations with project management, quality control, privilege review, and expert declarations for NY matters that require hands-on support beyond technology alone.
Cloud and SaaS Collections
Direct-source collection from Microsoft 365, Google Workspace, Slack, and other platforms to ensure ECA begins with complete and defensible input data.
Reduce Review Volume Before Review Begins
NYCF's early case intelligence services apply analytics, deduplication, and communications mapping to move only the right documents into review for your New York litigation matter. Contact us to discuss your data volume and timeline.