A Pilot Study on the Evaluation of Clinical Chemistry Laboratory Test Performance using Six Sigma Metrics

Pier Angeli Medina, Jenny Matibag, Sarah Jane Datay-Lim, Elizabeth Arcellana-Nuqui

The Medical City, Pasig City, Philippines

ISSN 2507-8364
Printed in the Philippines.
Copyright© 2019 by the PJP.
Received: 5 October 2019.
Accepted: 4 December 2019.
Published online first: 5 December 2019.
Corresponding author: Sarah Jane L. Datay-Lim, MD


Introduction Six sigma has been used over the years, initially in manufacturing industries, to improve quality by reducing waste and defects. In the laboratory, it can also provide a measurement of quality using the sigma scale.

Objectives The main objective of the study is to evaluate the performance of tests in two chemistry analyzers using the six sigma scale.

Methodology A total of twenty-eight (28) tests were evaluated on two Abbott Architect c8000 chemistry analyzers from September 2014 to July 2019 using the quality control mean, coefficient of variation, bias and total allowable error to compute the sigma value. Both level one and level two third-party quality controls were included in the evaluation.

Results The study identified the tests that were >6 sigma for both levels 1 and 2 throughout the 5 years. Di-Bil, CK, HDL, TG and UA were consistently >6 sigma for one machine while CK, Di-Bil, HDL, Mg, TG and UA were consistently >6 sigma for the other. Level 1 and Level 2 sigma scores were incongruent in some analytes: ALB, ALT, K and TP for one instrument and ALB, ALP and AST for the other. Sigma scores for the electrolytes Ca, Cl, and Na were generally low (<3.0) for both machines, while K showed better sigma scores.

Conclusion Using six sigma metrics allowed the laboratory to evaluate the performance of the chemistry tests objectively. Tests that are >6.0 sigma signify world class performance and entail application of fewer Westgard rules with fewer runs, while those that are <3.0 need method improvement or more stringent quality control measures. The findings show that six sigma metrics can be used for monitoring and performance evaluation in quality improvement.

Key words: bias, laboratory, quality control, quality improvement, six sigma, Westgard rules


Laboratory results are a keystone in the diagnostics and therapeutics of medicine. It is therefore important that measures are taken to assure the quality of the processes that generate these results. Running control materials is a vital element in ensuring that all the machines are working at optimal levels before any patient results are released. Control results are normally plotted on a Levey-Jennings chart in order to easily visualize whether they are within the acceptable range. James O. Westgard established the “Westgard rules”, which are generally accepted guidelines applied to the Levey-Jennings charts to make decisions on the reliability of results.[1] However, laboratories are still faced with the challenges of false rejection and inappropriate use of QC rules.

Six sigma was first developed at Motorola in the 1980s to improve quality and reduce cost by eliminating defects. It was developed through statistical measurements and benchmarking using the DMAIC (Define, Measure, Analyze, Improve and Control) principle.[2] Since then, it has been applied not only in the manufacturing industries, but also in the medical field. It is particularly suitable in the laboratory, where variation can be measured to predict performance instead of counting the defects.[3] Most studies involving the use of six sigma in the laboratory have shown the benefit of using this method as part of the approach to quality management.[4],[5],[6],[7] Six sigma is a powerful tool for assessing test performance in order to apply appropriate Quality Control (QC) rules and other recommendations such as the number of runs and levels.

Hence, we analyzed internal quality control data of two (2) Abbott Architect c8000 series analyzers in the chemistry section of our laboratory from August 2014 to June 2019 to evaluate the performance of clinical chemistry analytes on the six sigma scale.


Methods and sample

This is a descriptive study of all internal quality control samples of clinical chemistry tests done at The Medical City Department of Laboratory Medicine and Pathology, Ortigas Pasig City, Philippines, from August 2014 to June 2019.

All quality control data were extracted per year from two (2) Abbott Architect c8000 series clinical chemistry analyzers (Abbott Diagnostics, Chicago, IL, USA). The machines are labeled “Instrument A” (c803024) and “Instrument B” (c803029).

Both Level 1 and Level 2 control data of the following analytes were included: Albumin (ALB), Alkaline Phosphatase (ALP), Alanine Aminotransferase (ALT), Amylase, Aspartate Aminotransferase (AST), Total Bilirubin (Bil-T), Direct Bilirubin (Bil-D), Calcium (Ca), Chloride (Cl), Total Cholesterol (Chole), Creatine Kinase Total (CK), Complement 3 (C3), Carbon Dioxide (CO2), Glucose, Gamma-glutamyl Transpeptidase (GGT), High Density Lipoprotein (HDL), Iron, Lactate (Lac), Lactate Dehydrogenase (LDH), Low Density Lipoprotein (LDL), Lipase (Lip), Magnesium (Mg), Phosphorus (Phos), Potassium (K), Sodium (Na), Total Protein (TP), Triglyceride (TG), Uric Acid (UA), Blood Urea Nitrogen (BUN), Creatinine (Crea) and Unsaturated Iron Binding Capacity (UIBC). The quality control materials used were lyophilized Level 1 and Level 2 Bio-Rad Lyphochek Assayed Chemistry Control (Bio-Rad, Marnes-la-Coquette, France) of the same lot number for a defined period of time.


The sigma values were then determined for each test using the formula:

Sigma metric (σ) = [Total allowable error (TEa%) – Bias%] / Coefficient of variation (CV%)
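As an illustration only, the sigma formula can be sketched in Python as follows. The function name `sigma_metric` and the example values are hypothetical and are not taken from the study's tables. Using the absolute value of the bias is an assumption based on common practice; the formula as printed simply subtracts Bias%.

```python
def sigma_metric(tea_pct: float, bias_pct: float, cv_pct: float) -> float:
    """Return the sigma metric from TEa%, Bias% and CV% (all in percent).

    Assumption: the magnitude of the bias is used, so a negative bias
    reduces sigma in the same way as a positive one.
    """
    if cv_pct <= 0:
        raise ValueError("CV% must be positive")
    return (tea_pct - abs(bias_pct)) / cv_pct


# Hypothetical values for one analyte at one control level.
print(sigma_metric(tea_pct=10.0, bias_pct=1.0, cv_pct=1.5))  # 6.0
```

Each control level of each analyte would be computed separately, since Level 1 and Level 2 have their own CV%.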

Precision and Bias

The degree of precision can be determined through the computation of the coefficient of variation (CV%). It can be computed from our internal quality control (IQC) using the formula:

CV % = [ Standard deviation (SD) / Mean ] * 100

Bias, on the other hand, was computed from our External Quality Assurance Scheme (EQAS) data using the formula:

Bias % = [ (Laboratory mean – Peer group mean)/ Peer group mean ] * 100
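The two formulas above can be sketched in Python as shown below. This is a minimal illustration with made-up control values; the function names are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not state which form was used.

```python
from statistics import mean, stdev


def cv_percent(qc_values):
    """CV% = SD / mean * 100, using the sample standard deviation (assumed)."""
    return stdev(qc_values) / mean(qc_values) * 100


def bias_percent(lab_mean, peer_mean):
    """Bias% = (laboratory mean - peer group mean) / peer group mean * 100."""
    return (lab_mean - peer_mean) / peer_mean * 100


# Hypothetical IQC results for one control level, and hypothetical EQAS means.
qc_results = [98.0, 100.0, 102.0, 100.0, 100.0]
print(round(cv_percent(qc_results), 2))                    # CV% of the IQC data
print(round(bias_percent(lab_mean=102.0, peer_mean=100.0), 2))  # Bias% vs peer group
```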

Total allowable error (TEa)

TEa combines both the imprecision and the bias of a method to calculate the impact on a test result and gives the tolerance limits of each analyte in the laboratory. Different TEa goals are available, such as CLIA (Clinical Laboratory Improvement Amendments)[8] from the US, RiliBAK (German Medical Council for the Quality Assessment of Quantitative Analyses in Medical Laboratories, 2008 version; the inter-lab or “Ring Trials” values, in contrast to the intra-lab values) and the Ricos biological variability database (desirable target values, in contrast to the minimal or optimal target values).[9] For this study, we used TEa from different sources (Table 1).

Table 1. Total allowable error (TEa) values used to compute sigma, derived from CLIA, Ricos BV and CAP


Monthly sigma was monitored since the start of the study and the cumulative yearly sigma was also calculated and summarized for the chemistry analytes for each of the chemistry instruments (Tables 2 and 3).

Table 2. Sigma metrics for Instrument A (c8000) from 2015-2019

Table 3. Sigma metrics for Instrument B (c8000) from 2015-2019

For both instruments, there were generally more analytes with sigma greater than 6. Instrument A, the main chemistry analyzer of the laboratory, had the following percentages of tests that were >6 sigma: 44.6% (2015), 52.2% (2016), 51.8% (2017), 56.7% (2018), and 18.3% (2019). Instrument B had the following percentages of tests that were >6 sigma: 47.2% (2015), 25% (2016), 64.6% (2017), 55.6% (2018) and 18.42% (2019). The two machines also differed in the percentage of tests in each sigma category (Figure 1).

Figure 1. Sigma Performance of Chemistry tests in both machines showing the number of tests under the different sigma categories from 2015 to 2019.

From 2015 to 2019, the percentages of tests that were >6 sigma, <3 sigma, and in between varied, with 2016 and 2019 having the highest percentages of tests <3 sigma at 32.5% (2016) and 35% (2019) for Instrument A and 33.3% (2016) and 42% (2019) for Instrument B. All other years showed a predominance of tests that were >6 sigma. There were also years with a predominance of tests between 3 and 6 sigma, at 55.4% (2015) and 46.7% (2019) for Instrument A and 47.2% (2015) and 41.7% (2016) for Instrument B.
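The percentage breakdown by sigma category can be sketched as below. This is an illustration only: the function names and the input values are hypothetical, and the handling of values exactly equal to 3 or 6 (placed here in the middle band) is an assumption, since the text does not specify it.

```python
from collections import Counter

CATEGORIES = (">6", "3-6", "<3")


def sigma_category(sigma):
    """Assign a sigma value to one of the three bands used in this study."""
    if sigma > 6:
        return ">6"
    if sigma >= 3:  # assumption: exactly 3 or 6 falls in the middle band
        return "3-6"
    return "<3"


def category_percentages(sigmas):
    """Return the percentage of sigma values falling in each band."""
    counts = Counter(sigma_category(s) for s in sigmas)
    total = len(sigmas)
    return {cat: 100 * counts.get(cat, 0) / total for cat in CATEGORIES}


# Hypothetical sigma values for one instrument in one year.
print(category_percentages([7.2, 6.5, 4.0, 2.1]))
```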

Tests that were >6 sigma for both levels 1 and 2 throughout the 5 years were noted. Di-Bil, CK, HDL, TG and UA were consistently >6 sigma for Instrument A. CK, Di-Bil, HDL, Mg, TG and UA were consistently >6 sigma for Instrument B. Level 1 and Level 2 sigma scores were incongruent in some analytes: ALB, ALT, K and TP for Instrument A and ALB, ALP and AST for Instrument B. Sigma scores for the electrolytes Ca, Cl, and Na were generally low (<3.0) for both machines, while K showed better sigma scores.


Six sigma means that six sigmas, or standard deviations, of process variation should fit within the tolerance limits. The measure of process performance is the number of defects per million (DPM) products or defects per million opportunities (DPMO).[2] Hence, an analyte that is computed to be six sigma is “world class” with only 3.4 DPM, reflecting very few defects or errors. As sigma increases, the consistency and steadiness of a test improve, which can reduce operating costs and waste, and at the same time increase levels of customer satisfaction.[4]
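The DPM figures quoted in this paper (3.4 at six sigma and 66,807 at three sigma) are consistent with the conventional six sigma table, which takes the one-sided normal tail after allowing a 1.5 sigma long-term shift. A minimal sketch of that conversion, assuming this convention and using a hypothetical function name, is:

```python
from statistics import NormalDist


def dpmo_from_sigma(sigma, shift=1.5):
    """Defects per million opportunities for a given sigma level.

    Assumption: the conventional 1.5 sigma shift and a one-sided tail,
    which is how the commonly quoted six sigma tables are built.
    """
    tail_probability = NormalDist().cdf(shift - sigma)
    return tail_probability * 1_000_000


for level in (3, 4, 5, 6):
    print(level, round(dpmo_from_sigma(level), 1))
```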

The tests that were computed to be >6 sigma were identified (Tables 2 and 3). They require less stringent quality control monitoring using fewer QC runs with lower false rejection rates through application of selected Westgard rules. The highest percentages of tests >6.0 sigma were noted at 56.7% (2018) for Instrument A (Figure 2) and 64.6% (2017) for Instrument B.

Figure 2. OPSpecs chart of Chemistry tests in 2018 for Instrument A showing the analytes under the different sigma categories: (A) Level 1 controls. (B) Level 2 controls. Routine operating specifications are presented in the form of an "OPSpecs chart," which describes the operational limits for imprecision and inaccuracy when a desired level of quality assurance is provided by a specific QC procedure. The allowable inaccuracy based on computed bias on the Y axis is plotted against the allowable imprecision on the X axis for each analyte. Note that the points in the lower left corner are those with six sigma or “world class” performance, while those that fall on the far right and even outside the chart are those with poor performance, with sigma less than 3.0. This helps visualize the performance of tests and supports decisions regarding the QC plan.
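To illustrate how an analyte could be placed on a chart of this kind, the sketch below expresses bias and CV as percentages of TEa, so that analytes with different TEa values share one set of axes. This normalization is an assumption (it is the usual form of a normalized method decision chart) and is not necessarily how the chart in Figure 2 was generated; the function names and values are hypothetical.

```python
def normalized_point(tea_pct, bias_pct, cv_pct):
    """Return (x, y): imprecision and inaccuracy as percentages of TEa."""
    x = 100 * cv_pct / tea_pct          # allowable imprecision axis
    y = 100 * abs(bias_pct) / tea_pct   # allowable inaccuracy axis
    return x, y


def sigma_from_normalized(x, y):
    """On normalized axes, each sigma line satisfies y = 100 - sigma * x."""
    return (100 - y) / x


# Hypothetical analyte: TEa 10%, bias 2%, CV 1%.
x, y = normalized_point(tea_pct=10.0, bias_pct=2.0, cv_pct=1.0)
print(x, y, sigma_from_normalized(x, y))  # 10.0 20.0 8.0
```

Points with small x and y sit near the origin and correspond to high sigma, which matches the reading of the chart described in the caption.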

Three (3) sigma, on the other hand, is the minimum acceptable quality at 66,807 DPM. Anything that is 3 sigma or below requires maximum QC or method improvement. More stringent quality control should be undertaken for these processes, such as application of more Westgard rules, more frequent monitoring, and additional QC runs for the day.

The observed differences in the sigma score percentages reflect the inherent nature of testing in a chemistry laboratory. Tests are affected by numerous factors such as the materials used, preventive maintenance schedules, equipment, and staff competency. Because quality is a continuous process, the sigma metrics represent only the performance at a given period of time. Sigma may change depending on the quality improvement strategies employed and the current conditions in the laboratory or equipment, among other factors. The observed improvements in the number of tests >6 sigma from 2015 to 2017 and 2018 can be attributed to the increased frequency of water filter changes and more intensive staff training. The fluctuations in the number of tests <3 sigma, on the other hand, reflect the recorded periods of poor water supply, problems with room air conditioning, and instrument maintenance issues. The significantly lower number of tests >6 sigma in 2019 reflected the water crisis that occurred in the area, together with issues in room temperature (air conditioning).

A quality control policy of decreasing QC runs for the tests that are >6 sigma, as recommended by Westgard,[2] was started in May 2017. This policy of decreasing the number of runs to once a day for tests >6 sigma enabled our staff to focus on the problematic ones and improve efficiency in the laboratory.

Certain tests had consistently good performance (>6 sigma at both levels), such as Di-Bil, CK, HDL, TG and UA for Instrument A and CK, Di-Bil, HDL, Mg, TG and UA for Instrument B. This consistency reflects the stability of the methods and their robustness despite other factors that more easily affected the other tests.

Electrolytes such as Ca, Cl, and Na were noted to have a sigma of <3.0 throughout the years. This is most likely because the biological variation and total allowable error are very narrow for these tests. The coefficient of variation and bias for these tests were consistently very low, but the overall sigma is <3.0 because of the narrow TEa. Since sigma is computed with an equation, each variable plays a role in its computed value. This brings to light the need to also look into the computational variables when investigating poor sigma performance, as we do not want to cause unnecessary waste of time, resources, or manpower due to false rejection.[7]
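The effect of a narrow TEa can be shown with a small numerical sketch. The values below are hypothetical and are not the study's data; they only show that the same low bias and CV give very different sigma values depending on the TEa goal. As before, using the absolute bias is an assumption.

```python
def sigma_metric(tea_pct, bias_pct, cv_pct):
    """Sigma = (TEa% - |Bias%|) / CV%, with all inputs in percent."""
    return (tea_pct - abs(bias_pct)) / cv_pct


# Same hypothetical method performance, evaluated against two TEa goals.
bias_pct, cv_pct = 0.5, 1.0
narrow_tea_sigma = sigma_metric(2.5, bias_pct, cv_pct)   # narrow goal
wide_tea_sigma = sigma_metric(10.0, bias_pct, cv_pct)    # wider goal

print(narrow_tea_sigma)  # 2.0
print(wide_tea_sigma)    # 9.5
```

With a CV of 1% and a bias of 0.5%, the method falls below 3 sigma against a 2.5% goal yet exceeds 6 sigma against a 10% goal, so a low sigma does not by itself indicate poor precision or accuracy.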

Some tests showed significantly different sigma scores between Level 1 and Level 2, with one below or near 3.0 and the other >6.0 in certain years. Notable examples are Albumin and Total Bilirubin for Instrument B (Table 3 and Figure 3). On Instrument B in 2018, the Albumin Level 1 sigma was 9.12 and the Level 2 sigma was 2.4. This occurred in different tests throughout the years. It may be attributed to the methodologies having different detection performance at high and low levels. According to some studies, wide variations in sigma values between the QC levels must be evaluated further, especially the method, and more strategies must be implemented to decrease or remove the discrepancy.[4] The performance of the different levels cannot be averaged and must be addressed individually. This may also explain the difference in the performance of the two machines even though they are of the same brand and model.

Figure 3. Sigma values for levels 1 and 2 of select analytes (Instrument B). (Total Bilirubin level 2 not available on Instrument B in 2015.)

Six sigma metrics provide a standard framework for measuring analytical quality, but there are also issues with regard to the computation. One of its weaknesses is said to be the bias, which is usually based on interlaboratory peer group comparison using either third party controls or manufacturer controls. The controls may not be commutable, and so the bias may only be relative. When we participate in EQA, we are compared with our peers, and some argue that peer group comparison may not be sufficient to determine analytical quality.[10] According to studies,[11],[12] realistic estimates of assay bias/trueness require metrological standardization of all field assays and analysis of trueness controls. However, this may be difficult to apply because gold standard reference materials are not always readily available to clinical laboratories and are likely too costly for routine use.

The source of the TEa used to compute sigma is a major factor to consider. One study demonstrated its impact by comparing the common sources of TEa: biological variability, CLIA and RiliBAK.[11] The authors concluded that biological variability was the most stringent but may not be appropriate for all tests. They recommended that laboratories choose TEa values from different sources, whichever may be most appropriate for individual assays, as was done in this study.

Despite these limitations, six sigma metrics may give laboratories a better understanding of the performance of their tests. This tool, in combination with a rational QC design for each analyte, can improve quality and reduce waste.[2]


In conclusion, computation of six sigma metrics allowed us to evaluate the performance of our chemistry tests on the six sigma scale. We were able to identify which tests are good performers and which need monitoring and improvement. Tests that are >6.0 sigma require fewer Westgard rules and QC runs, while those that are <3.0 sigma require more stringent quality control measures, such as application of more Westgard rules and additional QC runs. We recommend that six sigma metrics be added to the current quality improvement programs of the laboratory.


The authors thank the following: Mr. Sten Westgard, Ms. Ma. Lourdes Gatbonton, RMT, Ms. Leilani Cureg Soriano, RMT, Ms. Aileen Damasing, RMT and Abbott Diagnostics.


All authors certified fulfillment of ICMJE authorship criteria.


The authors declared no conflict of interest.




[1] Westgard JO. Basic QC practices, 3rd ed. Madison, WI. Westgard QC, Inc., 2010.

[2] Westgard JO. Six sigma quality design and control. Madison, WI. Westgard QC, Inc., 2006.

[3] Lippi G, Plebani M. A six-sigma approach for comparing diagnostic errors in healthcare - where does laboratory medicine stand? Ann Transl Med. 2018;6(10):180.

[4] Singh B, Goswami B, Gupta VK, Chawla R, Mallika V. Application of sigma metrics for the assessment of quality assurance in clinical biochemistry laboratory in India: a pilot study. Ind J Clin Biochem. 2011;26(2):131-5.

[5] Chaudhary NG, Patani SS, Sharma H, Maheshwari A, Jadhav PM, Maniar MA. Application of six sigma for the quality assurance in clinical biochemistry laboratory – a retrospective study. Int J Res Med. 2013;2(3):17-20.

[6] Lakshman M, Reddy BR, Bhulaxmi P, et al. Evaluation of sigma metrics in a Medical Biochemistry lab. Int J Biomed Res. 2015;6(3):164-71.

[7] Modi N, Shah T. Application of six sigma test in clinical biochemistry laboratory. Int J Res Med. 2017;6(2):75-8.

[8] CLIA Requirements for Analytical Quality. Available at:

[9] Ricós C, Alvarez V, Cava R, et al. Current databases on biological variation: pros, cons and progress. Scand J Clin Lab Invest. 1999;59(7):491-500.

[10] Friedecky B, Kratochvila J, Budina M. Why do different EQA schemes have apparently different limits of acceptability? Clin Chem Lab Med. 2011;49:743-5.

[11] Hens K, Berth M, Armbruster D, Westgard S. Sigma metrics used to assess analytical quality of clinical chemistry assays: importance of the allowable total error (TEa) target. Clin Chem Lab Med. 2014;52(7):973-80.

[12] Guo X, Zhang T, Gao X, et al. Sigma metrics for assessing the analytical quality of clinical chemistry assays: a comparison of two approaches: electronic supplementary material available online for this article. Biochem Med (Zagreb). 2018;28(2):020708.
