A mathematical model of hospital length of stay
Lee, Kiat Haw.
Date of Issue: 2011
School of Computer Engineering
Hospital length of stay (LOS) is often used as a reliable proxy for the consumption of hospital resources. Various significant factors influence the probability of a patient's LOS. This project seeks to build a mathematical prediction model that generates this probability from these factors, which are observed variables, by investigating their relationship to LOS and its distribution. Such a model would form the basis for a robust estimate of resource consumption and would also assist the strategic planning of hospital facilities. The project was carried out primarily in the R programming language and environment for statistical computing. The dataset studied is the entire patient population admitted to Singapore General Hospital (SGH) from 2004 to 2007, which is further divided into major clusters of Diagnosis-Related Groups (DRGs) and individual DRGs. A competing risk model analysis was performed on all datasets to uncover the correlation between various patient attributes and LOS, followed by a conditional independence test to map a causality network of those attributes, represented as a Directed Acyclic Graph (DAG). The result is a Bayesian classifier that further divides the datasets into classes or cohorts of patients with similar attributes. The empirical distribution of LOS is found to be highly skewed with a heavy right tail. This makes simple statistics, such as the average LOS, unreliable for measuring and planning hospital resources, because they are easily distorted by outliers. The Coxian phase-type (PH) model, a special type of Markov chain, was chosen to model the LOS distribution of the different classes and cohorts of patients. Several sets of parameters are generated for the major DRG groups.
The prediction model combines these parameters according to the weighted outcome probabilities generated by the Bayesian classifier to produce a PH distribution for LOS given the inputs. Finally, predicted and empirical values for selected datasets are compared in terms of the sum of squared errors (the squared differences between predicted and empirical values, summed). The accuracy of the model is gauged by observing how this sum of squared errors changes as the number of known patient variables increases.
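As a rough illustration of the last two steps, the sketch below evaluates a Coxian PH survival function, mixes several such distributions using class probabilities, and scores the result with a sum of squared errors. It is written in Python rather than R, and it is not the project's implementation: the function names, the rates, the class weights and the "empirical" values are all hypothetical, and uniformization is simply one standard way to evaluate a PH distribution without a matrix library. Comparing survival curves, rather than some other quantity, is also an assumption.

```python
import math

def coxian_mean(lam, mu):
    """Mean of a Coxian PH distribution.

    lam[i]: rate of moving from phase i to phase i + 1 (last entry must be 0).
    mu[i]:  rate of absorption (e.g. discharge) from phase i.
    """
    mean, reach = 0.0, 1.0
    for l, m in zip(lam, mu):
        total = l + m
        mean += reach / total   # expected time spent in this phase
        reach *= l / total      # probability of moving on to the next phase
    return mean

def coxian_survival(t, lam, mu, tol=1e-12, max_terms=10_000):
    """P(LOS > t) by uniformization (suitable for moderate q * t only)."""
    n = len(lam)
    q = max(l + m for l, m in zip(lam, mu))   # uniformization rate
    v = [1.0] + [0.0] * (n - 1)               # the process starts in phase 0
    weight = math.exp(-q * t)                 # Poisson(q * t) probability of 0 jumps
    surv, cum = 0.0, 0.0
    for k in range(1, max_terms + 1):
        surv += weight * sum(v)               # still unabsorbed after k - 1 jumps
        cum += weight
        if cum >= 1.0 - tol:
            break
        nxt = [0.0] * n                       # one step of the chain I + T / q
        for i in range(n):
            nxt[i] += v[i] * (1.0 - (lam[i] + mu[i]) / q)
            if i + 1 < n:
                nxt[i + 1] += v[i] * lam[i] / q
        v = nxt
        weight *= q * t / k                   # Poisson probability of k jumps
    return surv

def mixture_survival(t, weights, params):
    """Survival of a mixture of Coxian distributions, weighted by class probability."""
    return sum(w * coxian_survival(t, lam, mu)
               for w, (lam, mu) in zip(weights, params))

def sum_squared_error(predicted, empirical):
    """Sum of squared differences between predicted and empirical values."""
    return sum((p - e) ** 2 for p, e in zip(predicted, empirical))

# Hypothetical example: two patient classes with made-up rates and weights.
params = [([2.0, 0.0], [1.0, 0.5]),   # class A: two-phase Coxian
          ([0.0], [0.25])]            # class B: single phase (exponential)
weights = [0.7, 0.3]                  # assumed classifier output
days = [1, 2, 5, 10]
predicted = [mixture_survival(d, weights, params) for d in days]
empirical = [0.60, 0.42, 0.15, 0.03]  # made-up empirical survival values
sse = sum_squared_error(predicted, empirical)
```

In this sketch, refitting the weights as more patient variables become known and recomputing `sse` would mirror the accuracy check described above.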
DRNTU::Engineering::Computer science and engineering
Final Year Project (FYP)
Nanyang Technological University