PathLAKE Data Lake

The PathLAKE project aims to curate raw image data from multiple samples across the entire landscape of human disease and make it available to researchers. This will be achieved by taking digitised images created in the routine scanning of histopathology slides, either at the time they are reported or by scanning slides from existing research cohorts. De-identified image data and metadata from these cases are then transferred from the contributing NHS centres to the Data Lake and stored. Each record is tagged with fields taken from the pathology system (such as specimen type, site and SNOMED code) so that cases can be identified in search strategies. In addition, each case contains metadata files, which are also derived from the original hospital record and stored in de-identified form. This is designed to give researchers as much detail as possible about the individual cases they are studying. For more information on what data is currently stored within the Data Lake, please click the link below.

PathLake Portal

Data Access

Access to PathLAKE data is subject to the PathLAKE governance process and approval of the Access Committee. This process ensures that researchers using PathLAKE data have viable plans that maintain public and professional trust, ensure the research is of public benefit, and are methodologically robust.
The governance process consists of the following:

Screening of applications: Each application is checked for completeness.
Review: Applications are sent to the Access Committee for full review. Applications may also undergo additional review by subject matter experts or information governance experts.
Decision and feedback: PathLAKE will communicate the committee’s decision to the researcher.
Approved applications: All applications approved by the Access Committee are referred on to the Data Management Committee, who will manage the process of data access in collaboration with the researcher.

If you are interested in applying for access to data, please download the access form here and return the completed form by email to
Projects that guide a clinical intervention will not be approved. Such projects will be directed to obtain separate National Research ethics approval.

Approved Applications

Application No. | Company | Title of Research
DL2022/007 | University of Warwick | Automatic cell recognition in colon tissue samples
DL2022/015 | RAIR Health Limited | The extraction of proprietary clinical insights from real world pathology data using graph database technology


The PathLAKE research database has been reviewed and approved by South Central – Oxford C Research Ethics Committee,