In this article, I am going to review how the Generate Rows tool in Alteryx can help prepare a data set with missing rows of data to trend the total number of patients in a hospital on a specific day while also providing information on the number patients admitted and discharged. As defined by Alteryx, the Generate Rows is used to create new rows of data at a record level which can create a sequence of numbers, transactions and dates.
The Business Case
Hospital leadership requests a trend graph that will tell them how many patients were admitted, discharged and occupied a bed on a specific day? They would like this information on one dashboard with the flexibility to change the date based on a single calendar day.
Lets take look at the fake dataset I created for this scenario. All data used in this demonstration is made up and does not represent any real hospital cases. It is a simple data set with a patient identifier such as a billing claim number, an admission date and a discharge date. The admission date is the day the patient arrived at the hospital for services. The discharge date is the day the patient left the hospital.
|Patient||Admit Date||Discharge Date|
How do we tackle this request? Your first inclination might be to trend the number of admissions by the admission date and the number of discharges by the discharge date. This technically would answer two of the three questions; however, it is limited because trending would be based on admit and discharge date and not a single date as requested by leadership. Each graph would be independent of each other as each date represents a different occurrence. It also does not answer the third question: how many patients occupied a bed each day.
How do we count the number of patients in a hospital on a given day with only an admit and discharge date for each patient? Our data set is a single row per patient id and does not contain a row for every day the patient occupied a bed. Since the data set has missing rows of information we cannot easily determine how many patients occupied a bed or were present on a given day. In order tackle this problem, we would need to insert a row for each day the patient was hospitalized. Adding a row for each day the patient occupied a bed allows the patient to be counted on that day when we trend the information using a single date field. We want the data format to look like this for each patient. In this example, the patient will have three rows one on each day the patient occupied a bed.
|Patient||Admit Date||Discharge Date|
Here lies our data scaffolding challenge. Data Scaffolding is a technique that can fill in the missing data points and create data structure for developing the desired visualization. There are various ways to pad in missing data points depending on the software tools available to you and the analysis being conducted. For this scenario, the Generate Rows tool in Alteryx will do the job perfectly! Let’s take this step by step.
Step 1: Connect the dataset to the Input Data tool
Step 2: Sort by Admit Date and Patient Number. It is important that the admit date be sorted in ascending order and the patient number be sorted in ascending order. The dates must be in sequence with each patient or the future steps will calculate incorrectly.
Step 3: When working with real data, there will be patients who are not discharged from the hospital. These patients are still occupying a bed. Therefore, create a filter that separates the discharged patients from the non-discharged patients. This will be needed when applying the Generate Rows tool.
Step 4: Drag a Formula tool onto the canvas and attach it to the Filter True anchor. Create a formula for DateTimeNow() and label it Now. This will be needed for our generate rows formula to populate the rows for patients who are not discharged yet. We are making the assumption the patient is still hospitalized since there is no discharge date. As a side note, my fake data set was created in November 2020 but I’m writing this blog in January of 2021. Hence why the Now field is January 2021. Under normal circumstances, this formula would be applied to real data in more current times.
Step 5: Generate the missing rows for the discharged patients and non-discharged patients. Drag two Generate Row tools to the canvas. Connect one to the Formula Tool and the other to the False Filter anchor. Your workflow should look like this.
The set up for the discharged patients and non-discharge patients is slightly different. Let’s set up the Generate Rows tool for the discharged patients first. Start by creating a new field and naming it New Date. This will be the single date that can be used in the visualization. The New Date field is also used to populate a date for each day the patient was hospitalized. In the Initialization Expression add the field admit date. This is the starting point for creating the rows. Next write a conditional statement that states the New Date field must be less than or equal to the Discharge Date. If the condition is true, additional rows will be generated until the condition is false. Lastly, create the formula to add a day for each time the condition is true. In this case, we are adding a day to the New Date field each time the condition is true for the patient.
The Generate Rows tool did exactly what we commanded. It produced a row for each day the patient was hospitalized. Patient 1 was in the hospital for three days and we have three rows for each day. In Medicare terms of counting patient days, CMS normally does not consider the last day as a day in the hospital since the patient was discharged at some point during the day. However, for our scenario, leadership would like the patient counted as being present on the day of discharge.
The only difference in the set up of the discharged patients generate rows tool and the non-discharged patients generate rows tool is the Condition Expression. Rather than stating the [New Date]<=[Discharge Date] change it to [New Date]<=[Now]. We are making the assumption that the patient is still hospitalized at the time the data set is generated, therefore using Now as a discharge date to populate the missing rows of data.
Step 6: Union the data together with a Union tool. Join both Generate Rows output anchors to the Union tool.
Step 7: Drag a formula tool onto the canvas and create the ADT fields of Admitted, Discharged, and Total. These fields will be used in the data visualization that will be created in Tableau.
To calculate the number of patients discharged, write an if statement to populate a value of 1 if discharged and 0 if not discharged. This formula will place a 1 on the day of discharge and 0 on all other days per patient. Make this field an Int16 data type. The formula is: if [Discharge Date]=[New Date] then 1 else 0 endif
To calculate the number of patients admitted, write an if statement to populate a value of 1 if admitted and 0 if not admitted. This formula will place a 1 on the day of admission and 0 on all other days for the patient. Make this field an Int16 data type. The formula is: if [Admit Date]=[New Date] then 1 else 0 endif
Lastly, create a formula for Total. Enter the value of 1 and make it an int16 data type. Now we have three new fields that can be used to sum the total number of patients, the number of admissions and the number of discharges per day.
In the image above, patient 1 has a 1 on the admit date of 11/01//2020, a 1 on the discharge date of 11/3/2020 and a 1 in each row for the three days. We are ready to visualize the trends by the New Date field. Your final workflow should look like the image below.
Step 8: Visualize the trends in Tableau. The New Date field acts as our single calendar date for trending the data. There is never a day when the hospital doesn’t have a patient, so no need to worry about any missing dates in the New Date field for this scenario. Drag the New Date field onto columns as a continuous pill set to day. Drag the measures Total, Admitted and Discharged onto Rows. All in one view, the leader can see how many new patients were admitted, how many were discharged, and the total number of patients throughout the day all possible with the Generate Rows tool and a single date called New Date for the trend.
Step 9: Continue to build out the dashboard for leadership. I’ll leave this part up to your imaginations.
I hope you learned a little more about the Alteryx Generate Rows tool and how it can be used with dates in general and in the healthcare industry. Applying other healthcare fields like diagnosis to the dataset, could allow for further analysis such as determining the number of new Covid-19 admissions as an example. Click on the link to access the Alteryx workflow used in this blog.
To learn more about data scaffolding with Tableau, I recommend Kevin and Ken Flerlage blog Creating a Data Scaffold in Tableau or Jonathan Drummey blog Introduce the Scaffold (scroll down the site).
A good business use case for the tool. Could you please attach datasets along with blogs as it gives us learners the chance to try our hands on the same case.
Thanks for the input. I added a link to the Alteryx Workflow at the end of the blog. You should be able to download the sample workflow.