Researchers from Google and Stanford Medicine recently partnered to develop the Skin Condition Image Network dataset, a tool aimed at providing image resources for conditions across skin tones.
In collaboration with physicians at Stanford Medicine, Google recently announced the formation and launch of its Skin Condition Image Network (SCIN) dataset.1 The collaborative effort is aimed at addressing current limitations of dermatology image datasets.
This concern is not new, however. It has been well-documented that the representation of patients with darker skin types in clinical trial settings and educational resources is limited.2
"Dermatology conditions are diverse in their appearance and severity and manifest differently across skin tones. Yet, existing dermatology image datasets often lack representation of everyday conditions (like rashes, allergies and infections) and skew towards lighter skin tones," according to Google's announcement.1 "Furthermore, race and ethnicity information is frequently missing, hindering our ability to assess disparities or create solutions."
SCIN has been developed as a free, open-access resource with safeguards for contributor privacy. The database currently holds more than 10,000 images, all provided by patients experiencing various skin, hair, and nail conditions.
US-based patients contributed images to the database voluntarily and with informed consent as part of an institutional review board-approved study.3 The study, posted to Cornell University's arXiv, received an average of 22 contributor submissions per day beginning in March 2023.3
When submitting images, patients were instructed to photograph the affected area from varying distances in order to provide context for dermatologic labeling. All patients were given the option to provide self-reported demographic data, and images are labeled with dermatologist estimates of Fitzpatrick Skin Type and layperson labeler estimates of Monk Skin Tone.
Between 1 and 3 dermatologists then labeled each contributed image with up to 5 dermatologic conditions, along with a confidence score for each label.
At present, the SCIN dataset comprises eruptions (56.43%), cutaneous infections (21.85%), contact dermatitis (10.5%), vascular conditions (2.24%), pigmentary disorders (0.75%), ulcers and blisters (0.32%), benign neoplasms (3.85%), malignant or pre-malignant conditions (1.35%), nail conditions (0.25%), hair disorders (0.04%), and other disorders (2.42%).
Unlike existing datasets drawn from clinical sources, which focus primarily on benign and malignant neoplasms, SCIN primarily contains images of allergic, inflammatory, and infectious conditions. For comparison, researchers referenced the existing datasets Fitzpatrick17k, DDI, ISIC 2020, ISIC 2017, PH2, HAM10000, SKINL2, and PAD-UFES.
Regarding Fitzpatrick skin types, the self-reported distribution was 8.64% Type I, 24.92% Type II, 30.39% Type III, 19.63% Type IV, 9.84% Type V, and 6.57% Type VI. This differed from the dermatologist-estimated distribution, which was 7.55% Type I, 40.21% Type II, 32.76% Type III, 13.64% Type IV, 5.27% Type V, and 0.57% Type VI.
"We hope the SCIN dataset will be a helpful resource for those working to advance inclusive dermatology research, education, and AI tool development," wrote Pooja Rao, PhD, a research scientist at Google Research.1 "By demonstrating an alternative to traditional dataset creation methods, SCIN paves the way for more representative datasets in areas where self-reported data or retrospective labeling is feasible."
References