The National Cancer Database (NCDB) is a nationally recognized clinical oncology database that is jointly sponsored by the American College of Surgeons and the American Cancer Society. The NCDB includes 34 million patient records with 90% follow-up during a 5-year period. While participating cancer centers represent approximately 30% of hospitals within the United States, the NCDB captures approximately 70% of all patients newly diagnosed with cancer.
Some of the variables reported are as follows:
· Demographic information (patient age, sex, etc.)
· Type of center at which the patient was treated (community, academic, integrated cancer center)
· Year the patient was diagnosed
· Time to biopsy or surgical removal after diagnosis
· Type of biopsy performed, if any
· Type of surgical resection performed, if any (local excision, wide excision, radical)
· Presence of positive (involved) surgical margins, if any (microscopic, macroscopic, etc.)
· Tumor characteristics (size, grade, subtype, etc.)
· Whether the patient received chemotherapy or radiation therapy
· Vital status (whether the patient was alive at last contact, and when that contact occurred)
As with any database, the NCDB has limitations, which are discussed in a review article in JAMA Oncology at the following link:
https://jamanetwork.com/journals/jamaoncology/article-abstract/2604822
The National Trauma Data Bank (NTDB) represents a concerted and sustained effort by the American College of Surgeons Committee on Trauma (ACSCOT) to maintain an extensive collection of trauma registry data, submitted primarily by accredited/designated trauma centers across the U.S.
Included variables:
· Demographic information
· Injury variables (time/location, industrial/work related, primary ICD-10 code)
· Use of protective equipment
· Pre-hospital information (EMS factors, field vital signs)
· Triage criteria
· ER course information (vitals, drug screening, GCS, disposition)
· Hospital procedures (ICD-10-CM codes)
· Comorbid conditions
· Injury severity
· Outcome data (length of stay, final disposition)
· Financial information/workers' compensation
The Truven MarketScan database from IBM is a family of administrative claims databases, encompassing 3.2 million patients, that contain data on inpatient and outpatient claims, outpatient prescription claims, clinical utilization records, and healthcare expenditures. The three main databases available for use are each composed of a convenience sample from one of the following patient populations: (1) patients with employer-based health insurance from contributing employers, (2) Medicare beneficiaries who possess supplemental insurance paid by their employers, and (3) patients with Medicaid in one of eleven participating states. Eleven supplemental databases are also available and can be used to offset the limited clinical data in the core MarketScan databases. The database has several limitations, chiefly that individuals (or their family members) in two of the core databases necessarily have some form of employer-based health insurance, which prevents the dataset from being nationally representative. Nonetheless, it provides detailed and rigorously maintained claims data for identifying healthcare utilization patterns in this cohort of patients.
National Surgical Quality Improvement Program
Hospital-submitted perioperative data, including medical comorbidities and patient outcomes up to 30 days postoperatively. These data are well suited to studying perioperative complications and the factors that contribute to them. No long-term data are included.
The Texas Inpatient Public Use Data File (PUDF) contains administrative data submitted by most (but not all) hospitals in Texas. It is available to the public and is accessed via a request to the Texas Department of State Health Services (DSHS). Because the data are administrative rather than clinical, there are limitations, but any procedure (via CPT codes and ICD-9/ICD-10 procedure codes) or diagnosis (via ICD-9-CM/ICD-10-CM codes) can be examined. In my opinion, it is best suited to cost analyses and demographic studies, and possibly to trends in treatment (provided the treatments are distinguished by different procedure or CPT codes). For more granular information on diagnosis codes, it is best to work with data from 2016 onward, after the implementation of ICD-10, because the codes are more specific. Refining the data can take a while, but the end results and the large numbers available hopefully justify the effort. A new request is likely required for each project.
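As a rough illustration of the refinement step, the sketch below selects discharges from 2016 onward that carry an ICD-10-CM diagnosis code beginning with a given prefix. It is only a sketch under stated assumptions: the field names (discharge_year, princ_diag, other_diag_1) and the sample records are hypothetical and do not reflect the actual PUDF layout, which should be taken from the data dictionary supplied with the files.

```python
from typing import Iterable


def normalize_code(code: str) -> str:
    """Strip whitespace and the decimal point, and upper-case the code."""
    return code.strip().replace(".", "").upper()


def has_diagnosis(record: dict, prefixes: Iterable[str],
                  diagnosis_fields: Iterable[str]) -> bool:
    """Return True if any diagnosis field starts with one of the prefixes."""
    wanted = tuple(normalize_code(p) for p in prefixes)
    for field in diagnosis_fields:
        code = normalize_code(record.get(field) or "")
        if code and code.startswith(wanted):
            return True
    return False


def select_cohort(records: Iterable[dict], prefixes: Iterable[str],
                  diagnosis_fields: Iterable[str],
                  first_year: int = 2016) -> list:
    """Keep records from first_year onward that match a diagnosis prefix."""
    prefixes = list(prefixes)
    diagnosis_fields = list(diagnosis_fields)
    return [
        r for r in records
        if int(r["discharge_year"]) >= first_year
        and has_diagnosis(r, prefixes, diagnosis_fields)
    ]


# Hypothetical sample records; field names and values are illustrative only.
records = [
    {"discharge_year": "2015", "princ_diag": "8208", "other_diag_1": ""},
    {"discharge_year": "2017", "princ_diag": "S72.001A", "other_diag_1": "E11.9"},
    {"discharge_year": "2018", "princ_diag": "I10", "other_diag_1": "S72002A"},
    {"discharge_year": "2019", "princ_diag": "J18.9", "other_diag_1": ""},
]

# "S72" is the ICD-10-CM category for fracture of the femur.
cohort = select_cohort(records, ["S72"], ["princ_diag", "other_diag_1"])
```

With these sample records, the 2015 discharge is excluded by year and the 2019 discharge by diagnosis, leaving two records in the cohort. Restricting to 2016 onward avoids mixing ICD-9-CM and ICD-10-CM codes in a single prefix match.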