Collecting accurate patient demographic information is the foundation of a data-driven approach because it develops a clearer picture of a given patient population’s health status. Organizations need accurate demographic data to identify patient populations facing disparities at the point of care and in the broader community. Importantly, this data powerfully reinforces the case for addressing disparities.
To better understand their patient population, organizations should collect, at a minimum, both REGAL (Race, Ethnicity, Gender identity & sexual orientation, Age, and Language) data and data on the social determinants of health (SDOH) impacting people’s health status in their community. Any organization that is serious about advancing health equity and is not yet at this point needs to plan for broadening its demographic data collection. While these data are a good starting point, organizations should aim over time to expand data collection to include additional sociodemographic domains, such as:
- Disability status
- Geography (i.e., urban, suburban, rural)
- Highest level of educational attainment
- Insurance status
- Religion
- Socioeconomic status (using payer as a proxy or percentage of federal poverty level)
- Veteran status
- ZIP code
Start with REGAL data to capture the intersections of identity
REGAL demographics have emerged as an industry standard because they provide an overview of a patient’s identity that is both comprehensive and streamlined. When organizations fail to collect REGAL data, they fail to capture key aspects of their patients’ identities. Collecting REGAL data provides insight into patient identity, which allows organizations to provide culturally sensitive and patient-centered care, better understand the needs of their patient population, and identify disparities.
While many organizations already collect basic demographic data on patients, there are often gaps in what data is collected and discrepancies in the data collection process. It's vital to break down REGAL data to identify intersections, which capture a fuller picture of each patient's identity. For example, it's not enough to look at patients by race alone. The experience of being a Black transgender woman is different from the experience of being a Black man, and REGAL data aims to capture both.
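To illustrate why intersections matter, here is a minimal sketch that compares a rate grouped by race alone with the same rate grouped by race and gender identity together. The records, field names, and the `screened` outcome are hypothetical and exist only for illustration; a real analysis would draw on the organization's own data and larger samples.

```python
from collections import Counter

# Hypothetical patient records; field names and values are illustrative only.
patients = [
    {"race": "Black", "gender_identity": "Transgender woman", "screened": True},
    {"race": "Black", "gender_identity": "Man", "screened": False},
    {"race": "Black", "gender_identity": "Man", "screened": True},
    {"race": "White", "gender_identity": "Woman", "screened": True},
]

def rate_by_group(records, keys, outcome):
    """Return {group: share of records where `outcome` is true}, grouped by `keys`."""
    totals = Counter()
    hits = Counter()
    for record in records:
        group = tuple(record[k] for k in keys)
        totals[group] += 1
        if record[outcome]:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}

# Grouping by race alone gives a single rate for all Black patients.
by_race = rate_by_group(patients, ["race"], "screened")

# Grouping by race and gender identity shows that the subgroups differ.
by_intersection = rate_by_group(patients, ["race", "gender_identity"], "screened")
```

In this toy data, the single rate for Black patients hides a difference between the two subgroups that only appears once gender identity is added to the grouping.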
There are two common pitfalls that organizations fall into when it comes to REGAL data collection:
- Organizations collect limited data on race and ethnicity. For example, when collecting patient demographic information, race and ethnicity are often grouped together. It’s important to distinguish between race and ethnicity: race refers to an individual’s physical traits, while ethnicity refers to an individual’s cultural identity or place of origin. While national standards include only two ethnicity categories, Hispanic and not Hispanic, many other ethnicities exist. For example, consider expanding data collection from the broader category of Asian to include more specific subsets such as Chinese, Indian, and Vietnamese. Collecting data on the ethnicity categories relevant to their local community better positions provider institutions to identify which groups are most at risk of experiencing a disparity and to target interventions specific to those patient populations.
- Organizations collect gender identity data within the male-female binary. Traditionally, organizations have not included options for patients to share their sexual orientation or have had options for gender identity that include only the male-female binary. By expanding options to include patients’ sexual orientation and gender identity, commonly abbreviated to SOGI, organizations create a more inclusive environment, get a more accurate picture of care needs, and can better understand the experiences of their LGBTQ+ patients. Collecting SOGI demographics is one important step in a larger journey that health care organizations need to embark on to be more inclusive of their LGBTQ+ patients.
Layer on social needs data that has the biggest impact on health
According to the World Health Organization, SDOH are “the conditions in which people are born, grow, live, work and age.” These social factors have a profound impact on patients’ health status, and estimates show they can account for up to 60% of an individual’s health outcomes. However, health systems frequently do not account for patients’ social needs that may drive differences in care experiences and health outcomes. Collecting SDOH data gives health care providers a more comprehensive understanding of barriers to care. When combined with traditional clinical data, SDOH data promotes more effective care planning and interventions for individual patients, as well as a more accurate population risk assessment. Although collecting SDOH data may require more time and/or resources, it should ultimately save costs and effort because the patient’s care plan is aligned with their needs from the start.
Organizations should prioritize SDOH data collection for social factors that have an outsized impact on a patient’s health status (e.g., housing, food security) and/or are already known to be present in the community and contributing to patients' health status.
Sample SDOH screening questions

Collecting SDOH and REGAL data
Identifying disparities in any given patient population begins with effective screening. When collecting patient demographic data, don’t make assumptions based on appearances, as these assumptions may not align with how the patient identifies. For example, while the majority of patients may identify within the gender binary (either as a man or woman), many patients will identify on a gender spectrum (transgender, nonbinary, genderfluid, etc.) and might not use traditional masculine (he/him/his) or feminine (she/her/hers) pronouns. A clinician can’t accurately determine the complex nuances of gender identity based on a patient’s physical appearance. It’s better to ask patients their gender identity and preferred pronouns than to guess and risk getting them wrong.
A self-reporting methodology offers an effective alternative that not only reduces the administrative burden on staff but also ensures more accurate data collection. Leverage the online patient portal to allow patients the chance to self-report their demographic data either ahead of planned care or at the beginning of their interaction with the system. Giving patients the chance to self-report their demographic data removes any guesswork by staff. It also helps patients feel more comfortable and honest about potentially sensitive information, ensuring reported data is as accurate a reflection of their identity and social needs as possible.
However, when self-reporting demographic data, some patients may fear that their personal information will be used to discriminate against them, causing them to leave questions or sections of the questionnaire blank. To overcome this, organizations should supplement self-reporting with at least one validation touchpoint between a staff member and the patient to fill any gaps in the patient’s self-reporting. During this touchpoint, a staff member should run through a few questions so the patient can quickly validate their responses and have another opportunity to answer any questions left blank. Staff should share that they ask all patients these questions, explain the rationale behind the questions, and clarify how the information is used. Equip staff with a standardized script to help ease any patient hesitancy.
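As one way to prepare for the validation touchpoint, the sketch below lists the questions a patient left blank so that staff know which ones to revisit. The field names and the `fields_needing_follow_up` helper are assumptions made for illustration and do not reflect any particular patient portal or EHR.

```python
# Hypothetical REGAL field names; a real questionnaire will define its own.
REQUIRED_FIELDS = [
    "race",
    "ethnicity",
    "gender_identity",
    "sexual_orientation",
    "age",
    "language",
]

def fields_needing_follow_up(response, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or blank in a self-report."""
    missing = []
    for field in required:
        value = response.get(field)
        # Treat a missing key, None, or an empty/whitespace-only string as blank.
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(field)
    return missing

# Example self-report with some questions left unanswered.
self_report = {
    "race": "Asian",
    "ethnicity": "",
    "age": 42,
    "language": "Vietnamese",
}

follow_up = fields_needing_follow_up(self_report)
```

In this sketch, an explicit choice such as "Prefer not to answer" counts as an answer, so staff revisit only the questions that were skipped entirely. Whether to handle declined answers this way is a design choice for each organization.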
Finally, incorporate screening results into the EHR to ensure visibility to all staff members coming into contact with the patient and to facilitate data analysis at the population level.
Checklist for collecting SDOH and REGAL data
