HIMSS: A self-contained approach to data warehouse design
ATLANTA--Richard E. Biehl, PhD, a data warehouse architect and consultant for Data-Oriented Quality Solutions, used his expertise in designing data warehouses--specifically, the star-schema dimensional warehouse plan--in a presentation Wednesday at HIMSS10
While star-schema warehouse plans are not new--Biehl explained that they were used for data processing outside of the field of healthcare for the past 20-30 years-- they are effective for data management because they put facts in the center of the model and organize other data into several dimensions around them, so no one dimension is optimized in favor of another, he said.
In the medical world, the center of the star can consist of lab test results, vital signs, text from nursing or radiology reports, or any other core facts, said Biehl, who noted that the dimensions are ways to “slice the data,” consisting of physical and logical, or sub-dimensions. “The number of dimensions physically is a question of your database optimization. The number of dimensions, logically, is a characteristic of your clinical domain and will vary by institution and the scope of your data,” he explained.
The challenge of building a data warehouse is not designing the database, asserted Biehl, who said building a good star-schema warehouse can be done in under an hour, but rather coming up with the right dimensions. Standardizing dimensions so that they will work across an entire institution is the solution to this issue, Biehl said. “Data warehouses only get big, never smaller. Over time, you’re going to load more data into your database and need to consolidate greater amounts of data.”
In order for an institution to figure what their dimensions would be, they must undergo a process called dimensionalization, said Biehl. This process looks at any patterns and similarities in the data. The rules for dimensionalization include not mixing any data into a dimension that represents different kinds of systems, including societal (hospital, nursing home), organismic (patients, caregivers) and mechanical systems (bed, room, unit). “You don’t mix persons, places or things in the same dimension, or things that you do as opposed to things you administer,” Biehl explained.
During his presentation, Biehl gave a demonstration of how he would go about creating dimensions. Pulling out countries, states, cities and addresses, he created a geopolitical entity. Next, Biehl selected all nurses, physicians and transport people and created a caregiver dimension. He continued on and created several more dimensions, but noted that the process becomes more challenging as more complicated data is added to the system.
“Most of the data in your clinical world is in your dimensions, not your fact table. Facts are just connections that pull those threads together,” said Biehl, noting that all information about the patient, including when they were admitted, what tests or imaging they had undergone or medications they were administered can be found in these dimensions and the dimensions serve as a way to query data. The star-schema warehouse model can build certain data queries in 15 minutes, as opposed to other models which could take up to weeks or months, Biehl said.
Across healthcare systems, Biehl noted that there is little variation in dimensions. In referencing his presentation, Biehl said, “With a little tweaking, I hope I have handed you a design for a warehouse that you can have up and running fairly quickly.”
While star-schema warehouse plans are not new--Biehl explained that they were used for data processing outside of the field of healthcare for the past 20-30 years-- they are effective for data management because they put facts in the center of the model and organize other data into several dimensions around them, so no one dimension is optimized in favor of another, he said.
In the medical world, the center of the star can consist of lab test results, vital signs, text from nursing or radiology reports, or any other core facts, said Biehl, who noted that the dimensions are ways to “slice the data,” consisting of physical and logical, or sub-dimensions. “The number of dimensions physically is a question of your database optimization. The number of dimensions, logically, is a characteristic of your clinical domain and will vary by institution and the scope of your data,” he explained.
The challenge of building a data warehouse is not designing the database, asserted Biehl, who said building a good star-schema warehouse can be done in under an hour, but rather coming up with the right dimensions. Standardizing dimensions so that they will work across an entire institution is the solution to this issue, Biehl said. “Data warehouses only get big, never smaller. Over time, you’re going to load more data into your database and need to consolidate greater amounts of data.”
In order for an institution to figure what their dimensions would be, they must undergo a process called dimensionalization, said Biehl. This process looks at any patterns and similarities in the data. The rules for dimensionalization include not mixing any data into a dimension that represents different kinds of systems, including societal (hospital, nursing home), organismic (patients, caregivers) and mechanical systems (bed, room, unit). “You don’t mix persons, places or things in the same dimension, or things that you do as opposed to things you administer,” Biehl explained.
During his presentation, Biehl gave a demonstration of how he would go about creating dimensions. Pulling out countries, states, cities and addresses, he created a geopolitical entity. Next, Biehl selected all nurses, physicians and transport people and created a caregiver dimension. He continued on and created several more dimensions, but noted that the process becomes more challenging as more complicated data is added to the system.
“Most of the data in your clinical world is in your dimensions, not your fact table. Facts are just connections that pull those threads together,” said Biehl, noting that all information about the patient, including when they were admitted, what tests or imaging they had undergone or medications they were administered can be found in these dimensions and the dimensions serve as a way to query data. The star-schema warehouse model can build certain data queries in 15 minutes, as opposed to other models which could take up to weeks or months, Biehl said.
Across healthcare systems, Biehl noted that there is little variation in dimensions. In referencing his presentation, Biehl said, “With a little tweaking, I hope I have handed you a design for a warehouse that you can have up and running fairly quickly.”