Data Discovery and Source Mapping
Unknown Slack workspaces, unregistered cloud tenants, and undisclosed SaaS platforms create discovery exposure long before a Rule 26(f) conference is scheduled. For matters in the SDNY, EDNY, NY Supreme Court, or NY Commercial Division, NYCF maps the full data landscape before collection begins.
What This Solves
Most discovery disputes do not originate in the courtroom. They surface weeks earlier, when a key Salesforce org goes unidentified during initial scoping, a Slack workspace created during a pre-deal due diligence process never makes it onto the custodian list, or a Snowflake data warehouse turns out to hold years of transactional records that opposing counsel expected to see in production. For New York corporate litigation and regulatory matters, the stakes around these gaps are especially high. Judges in the SDNY and EDNY have imposed significant sanctions for inadequate preservation and incomplete productions, and NY Commercial Division Rule 11-c specifically addresses ESI disclosure obligations at the outset of commercial cases.
NYCF's data discovery and source mapping work is designed to head off those surprises. Before any collection begins, NYCF analysts conduct structured source interviews, deploy connector-based tools against known cloud environments, and produce a documented data inventory that gives counsel a defensible picture of where relevant ESI exists, who controls it, and what needs to be preserved. The result is better scoping, fewer late-breaking disclosures, and a stronger foundation for every downstream step in the matter.
What We Identify
Enterprise data rarely stays in one place. A single custodian at a Manhattan financial services firm may have relevant ESI spread across a primary email account, a personal OneDrive folder, a shared Teams channel, a Slack workspace created under a partner tenant for a specific transaction, a Salesforce org with embedded email logging, and a Box folder shared with outside counsel on Park Avenue. NYCF maps all of it.
Shadow IT is a particular issue in complex commercial matters involving New York technology companies, media businesses, and financial institutions. Employees frequently adopt SaaS tools that IT never formally sanctioned: a department using a consumer Zoom account for file storage, a deal team running due diligence notes in an unregistered Notion workspace, or a trading desk keeping track of communications through personal Gmail. NYCF's source discovery process specifically looks for these repositories, not just the ones that appear on the initial custodian interview list.
Platforms NYCF routinely identifies and maps include:
Microsoft 365: Exchange Online, SharePoint, OneDrive, Teams, and Viva Engage (Yammer).
Google Workspace: Gmail, Drive, Chat, Meet recordings, and Shared Drives.
Slack: standard channels, private channels, direct messages, and Slack Connect workspaces used with outside parties.
Salesforce: email logs, activity records, case notes, documents, and Chatter feeds.
Snowflake and Databricks: analytical data warehouses and data lakehouse environments common in New York financial technology and data-driven businesses.
AWS and Azure: S3 buckets, RDS databases, Blob storage, SQL databases, and Active Directory audit logs.
Oracle and SAP: ERP records, financial data, HR modules, and transaction histories at the enterprise level.
SharePoint on-premises and hybrid deployments, including those common among New York government agencies and large institutions.
Box, Zoom, and other collaboration platforms, including version histories, comments, and meeting transcripts.
Backup systems, archive tapes, and decommissioned server images retained for regulatory compliance.
Custodian-Source Alignment
Identifying sources is only part of the work. NYCF pairs each source with the custodians who have access, the categories of data each source holds, and the time periods for which data is available and recoverable. This custodian-source alignment document becomes the authoritative reference for scoping preservation and collection decisions throughout the matter.
For matters with large custodian populations, which are common in New York commercial litigation involving financial institutions and publicly traded companies, NYCF uses structured intake questionnaires alongside automated account enumeration to cross-check self-reported information against actual system activity. An employee may not recall that they had access to a particular SharePoint site during a specific transaction window, but the access logs will show it. NYCF reconciles both sources of information and documents any discrepancies for counsel's review.
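The cross-check described above comes down to a set comparison per custodian. The following is a minimal sketch in Python, assuming questionnaire answers and access-log entries have already been normalized to shared source identifiers. The function name `reconcile_sources` and the sample identifiers are hypothetical and are not a description of NYCF's actual tooling.

```python
def reconcile_sources(self_reported, observed):
    """Compare questionnaire answers with sources seen in access logs.

    Both arguments map a custodian name to a set of source identifiers.
    Returns, per custodian, the sources that only one side knows about.
    """
    discrepancies = {}
    for custodian in sorted(set(self_reported) | set(observed)):
        reported = self_reported.get(custodian, set())
        logged = observed.get(custodian, set())
        undisclosed = logged - reported  # in the logs, not mentioned by the custodian
        unverified = reported - logged   # mentioned, but no log activity found
        if undisclosed or unverified:
            discrepancies[custodian] = {
                "undisclosed": sorted(undisclosed),
                "unverified": sorted(unverified),
            }
    return discrepancies


# Hypothetical sample data, for illustration only.
self_reported = {"custodian_a": {"exchange", "teams"}}
observed = {"custodian_a": {"exchange", "teams", "sharepoint:deal-site"}}
result = reconcile_sources(self_reported, observed)
```

Keeping both directions of the difference matters: a source that a custodian reports but that shows no log activity is as worth documenting for counsel as one that appears only in the logs.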
NY Commercial Division matters present a particular challenge when corporate parties have gone through mergers, spin-offs, or significant restructuring in the relevant period. Legacy tenants, successor accounts, and archived mailboxes from predecessor entities often contain relevant ESI that custodians have forgotten about entirely. NYCF specifically reviews organizational history during the source identification phase to flag these situations before they become production gaps.
NYCF's Process
Source Interviews
NYCF conducts structured interviews with IT administrators, department heads, and key custodians to build an initial inventory of known data sources. Interview guides are tailored to the matter type: large-scale commercial litigation in NY Supreme Court, regulatory inquiries from the NYDFS or NY Attorney General, internal corporate investigations, and employment matters each require different lines of questioning. All interview notes are documented and retained as part of the matter record.
Connector-Based Discovery
Using API-level connectors and administrative access, NYCF enumerates actual data locations across cloud tenants, SaaS platforms, and on-premises systems. This automated layer catches repositories that custodians did not mention and independently verifies the completeness of the interview-based inventory. Results are compared against the interview record and discrepancies are documented for counsel.
Live Data Mapping
NYCF builds a living data map that documents each identified source: platform, data type, custodian access, date range of available data, retention policy, and collection feasibility. The map is updated throughout the engagement as new sources surface or additional custodians are identified. Counsel receives updated versions as material changes occur.
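One way to picture a data map entry is as a structured record that can be merged as new information arrives. The sketch below is illustrative only: the class `DataMapEntry`, the helper `add_or_update`, and all field names are assumptions chosen to mirror the attributes listed above, not NYCF's actual schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class DataMapEntry:
    """One row of the data map; field names are illustrative."""
    platform: str
    data_type: str
    custodians: list = field(default_factory=list)
    earliest: Optional[date] = None   # start of available data, if known
    latest: Optional[date] = None     # end of available data, if known
    retention_policy: str = "unknown"
    collection_feasibility: str = "unassessed"


def add_or_update(data_map, entry):
    """Insert a new source, or merge custodians and widen the known date range."""
    key = (entry.platform, entry.data_type)
    current = data_map.get(key)
    if current is None:
        data_map[key] = entry
        return entry
    for name in entry.custodians:
        if name not in current.custodians:
            current.custodians.append(name)
    # None means "unknown", so only known dates take part in the comparison.
    starts = [d for d in (current.earliest, entry.earliest) if d is not None]
    ends = [d for d in (current.latest, entry.latest) if d is not None]
    current.earliest = min(starts) if starts else None
    current.latest = max(ends) if ends else None
    return current


# Hypothetical sample data, for illustration only.
data_map = {}
add_or_update(data_map, DataMapEntry(
    "Slack", "channel messages", ["custodian_a"], date(2021, 1, 1), date(2022, 6, 30)))
merged = add_or_update(data_map, DataMapEntry(
    "Slack", "channel messages", ["custodian_b"], date(2020, 9, 1), date(2022, 3, 31)))
```

Keying entries on platform and data type means a newly surfaced custodian updates the existing row rather than creating a duplicate source.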
Custodian-Source Alignment
Each custodian is formally linked to the systems they use or have used, including systems they may not have disclosed during interviews. NYCF cross-references interview responses against system access logs, directory services, and license assignment records to identify gaps or inconsistencies. Departures, transfers, and role changes during the relevant period are factored into the analysis.
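Factoring in departures, transfers, and role changes is essentially a date-range overlap problem: a custodian's access to a system matters only where it intersects the relevant period. A minimal sketch follows, assuming access periods have already been derived from logs or license records. The function names and sample values are hypothetical.

```python
from datetime import date


def overlap(start_a, end_a, start_b, end_b):
    """Return the overlapping (start, end) of two inclusive date ranges, or None."""
    start = max(start_a, start_b)
    end = min(end_a, end_b)
    return (start, end) if start <= end else None


def in_scope_access(access_periods, relevant_start, relevant_end):
    """Keep access periods that overlap the relevant period, clipped to it."""
    rows = []
    for custodian, source, start, end in access_periods:
        window = overlap(start, end, relevant_start, relevant_end)
        if window is not None:
            rows.append((custodian, source, window[0], window[1]))
    return rows


# Hypothetical sample data: access ended with a transfer in September 2021.
periods = [
    ("custodian_a", "sharepoint:deal-site", date(2021, 3, 1), date(2021, 9, 30)),
    ("custodian_a", "salesforce", date(2023, 1, 1), date(2023, 12, 31)),
]
rows = in_scope_access(periods, date(2021, 6, 1), date(2022, 5, 31))
```

Here the SharePoint access is kept but clipped to the relevant period, while the Salesforce access falls entirely outside it and is dropped.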
Scope Validation with Counsel
NYCF presents the completed data map to counsel for review, confirms which sources fall within scope, and documents any sources that counsel has determined are outside scope along with the basis for that determination. For NY Commercial Division cases requiring a preliminary conference ESI disclosure, this documentation serves as the technical foundation for the positions counsel takes at the conference.
Defensibility and Documentation
Data discovery work is only as valuable as its documentation. Courts in the SDNY and EDNY, along with NY Supreme Court Commercial Division judges, increasingly scrutinize not just what was collected, but how the producing party determined what existed in the first place. NYCF delivers a written Source Identification Report documenting the methodology used, the sources identified, the custodians interviewed, the connector queries executed, and the scope decisions made by counsel.
This report gives counsel a factual record to cite in Rule 26(f) conferences, NY Commercial Division preliminary conferences, and any subsequent motion practice about the adequacy of the producing party's search. Every source enumeration query is logged with timestamps. Every interview is documented with the date, participants, and topics covered. The report can be supplemented as new information comes to light and is structured to support an NYCF declaration if the adequacy of the source identification process is challenged in motion practice.
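As an illustration of what a timestamped query log can look like, the sketch below appends one record per enumeration query and exports the log as JSON lines. The record fields, the function names `log_query` and `export_log`, and the sample query description are assumptions for illustration, not NYCF's actual log format.

```python
import json
from datetime import datetime, timezone


def log_query(log, platform, query, result_count, now=None):
    """Append one enumeration query to the log with a UTC timestamp."""
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    record = {
        "timestamp": stamp,
        "platform": platform,
        "query": query,              # free-text description of what was run
        "result_count": result_count,
    }
    log.append(record)
    return record


def export_log(log):
    """Serialize the log as JSON lines, one record per line."""
    return "\n".join(json.dumps(record, sort_keys=True) for record in log)


# Hypothetical sample entry with a fixed timestamp so the output is repeatable.
query_log = []
log_query(query_log, "Slack", "list workspaces", 3,
          now=datetime(2026, 4, 1, 14, 30, tzinfo=timezone.utc))
exported = export_log(query_log)
```

Recording timestamps in UTC avoids ambiguity when queries are run across tenants in different time zones.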
NYCF's forensic analysis of source data is distinct from the attorney's conclusions about relevance. The scope decisions, privilege determinations, and discovery strategy are entirely the domain of counsel. NYCF provides the technical documentation of what exists and how it was identified; counsel decides what to preserve, collect, and produce.
Deliverables
At the conclusion of a source mapping engagement, NYCF delivers the following:
A Source Identification Report providing a written summary of all sources identified, the method of identification, and their relationship to the custodians and time periods at issue.
A Data Map structured as a formal inventory linking each source to its data type, associated custodians, available date range, applicable retention policy, and current collection status.
A Custodian-Source Matrix aligning each custodian to their data footprint across all identified systems, including systems identified through automated enumeration that the custodian did not disclose.
A Shadow IT Log documenting any unauthorized or unapproved platforms identified during the discovery process, with a description of the data types involved and the custodians who used them.
A Scope Confirmation Memo recording counsel's in-scope and out-of-scope determinations, suitable for production or disclosure in connection with a Rule 26(f) conference or NY Commercial Division ESI submission.
Last reviewed and updated: April 2026
Cloud Tenant Enumeration
NYCF scans Microsoft 365, Google Workspace, and Slack tenants to identify all workspaces, shared drives, and external collaboration environments relevant to the matter. Regional and subsidiary tenants, guest account access, and Slack Connect external workspaces are all included in the enumeration.
SaaS Platform Discovery
Inventories of Salesforce, Box, Zoom, and other collaboration platforms are built through a combination of administrative access and network-level license analysis. Shadow IT identification includes platforms that IT never formally approved but that employees used for business communications or document storage.
Database and Structured Data Sources
Snowflake, Databricks, Oracle, and SAP environments are mapped alongside cloud databases including AWS RDS and Azure SQL. For New York financial institutions and trading firms, NYCF also identifies proprietary trading system databases and compliance archive systems that often hold responsive ESI outside the standard corporate email and document systems.
Custodian Management
Structured custodian interviews use guided questionnaires tailored to the matter and the custodian's role. Interview responses are cross-referenced against directory services and access logs. Supplemental custodian identification occurs throughout the engagement as new sources and collaborators surface.
Map Your Data Before Opposing Counsel Does
All matters are strictly confidential. Contact NYCF to begin a source mapping engagement for your New York litigation or regulatory matter.
Related Services
Legal Holds and Preservation
Once sources are identified, NYCF issues and tracks defensible hold notices, manages custodian acknowledgments, and documents preservation steps to protect against spoliation claims in NY courts.
Cloud and SaaS Collections
NYCF collects directly from M365, Google Workspace, Slack, Box, Zoom, and more using the eCloudDiscovery platform, preserving metadata and chain of custody at each step.
Computer Forensics
When ESI requires forensic-level device examination, NYCF's certified examiners work alongside eDiscovery collections for court-admissible evidence recovery in New York proceedings.
Start With the Full Picture
Better source mapping means fewer surprises, lower collection costs, and stronger defensibility in SDNY, EDNY, and NY Supreme Court matters. NYCF can begin a source mapping engagement within days of retention.