Evaluating System Performance: Between a “ROC” and a Hard Place

How can engineers use a receiver operating characteristics curve to assess the functioning of a complex sensor system?



When Emerson wrote, “Build a better mousetrap, and the world will beat a path to your door,” he likely had no idea how complicated that simple idea could become. What constitutes “better” to today’s engineer is often a matrix of conflicting parameters and concerns ranging from functionality to marketing. The engineer must sift through this complexity and decide how well any new device can actually solve a problem.

This article discusses a complex sensor system that captures medical prescription errors using Raman spectroscopy. The “how well” question is answered with an applied analysis—the receiver operating characteristics (ROC) curve.

Raman spectroscopy

When monochromatic light passes through a material, it is scattered mostly at the same wavelength as the source (elastic scattering). However, depending on the medium, a small portion (about 1 photon per 10 million) of the light is also scattered into wavelengths shorter and longer than the source (inelastic scattering).


Figure: The Centice PASS Rx smart sensor. The PASS Rx system uses Raman spectroscopy to verify the accuracy of pharmaceutical prescriptions.

Figure: Raman spectrum of lithium carbonate. The spectrum of lithium carbonate before and after calibration and processing.

Raman scattering, a form of inelastic scattering, occurs when a photon interacts with the vibrational or rotational energy of a molecule. The photon exchanges energy with the molecule in one of two ways. When the photon transfers some energy to the molecule, it is scattered at a longer wavelength (Stokes); when it gains energy from the molecule, it is scattered at a shorter wavelength (anti-Stokes). Because Raman scattering relies on the interaction between the photon and the molecule’s polarizability, it is rapid, and the energy shift (usually expressed in wavenumbers) does not depend on the excitation wavelength.
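
The Stokes and anti-Stokes relationship can be made concrete with a short calculation. The sketch below is illustrative only: the two laser wavelengths and the 1000 cm^-1 Raman shift are assumed values, not figures from the PASS Rx system. It converts the excitation wavelength to wavenumbers, subtracts or adds the shift, and converts back to nanometers.

```python
def scattered_wavelengths_nm(excitation_nm, shift_cm1):
    """Return (stokes_nm, anti_stokes_nm) for a Raman shift given in cm^-1."""
    excitation_cm1 = 1e7 / excitation_nm                  # nm -> wavenumber (cm^-1)
    stokes_nm = 1e7 / (excitation_cm1 - shift_cm1)        # photon gives up energy
    anti_stokes_nm = 1e7 / (excitation_cm1 + shift_cm1)   # photon gains energy
    return stokes_nm, anti_stokes_nm

# Assumed, illustrative inputs: two common laser lines and one vibrational shift.
for laser_nm in (532.0, 785.0):
    stokes, anti_stokes = scattered_wavelengths_nm(laser_nm, 1000.0)
    print(f"{laser_nm:.0f} nm laser: Stokes {stokes:.1f} nm, "
          f"anti-Stokes {anti_stokes:.1f} nm")
```

The same shift in wavenumbers applies to both lasers, while the scattered wavelengths in nanometers differ, which is why spectra are normally plotted against wavenumber shift.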

Raman spectra can be used as a fingerprint for a molecule, since each molecule scatters differently based on its particular structure and bonds. Analytical chemists use Raman to probe molecular structure, concentration and reactions, but the potential applications are far broader and are usually limited only by cost, convenience and the knowledge required to analyze spectra.

Breakthroughs in technology have stimulated efforts to move Raman from the lab into the hands of new users. Recent improvements in CCD technology, gratings, filters, optical packaging, lasers and other components have all contributed to smaller, more rugged Raman devices that are well suited for new environments. However, one of the more interesting developments for Raman spectrometers in recent years comes from embedded algorithms that automate the analysis and interpretation of the spectroscopic information.

Users may want to know something about a sample but have no interest in its Raman spectrum. For example, the operator may simply want to confirm that a substance is what he or she thinks it is. Delivering such a simple binary answer requires considerable engineering and, since no device is perfect, there will be rare instances in which the device is wrong. One challenge in creating such a device is understanding its accuracy, as illustrated in the following practical example.

Prescription verification

Pharmacists face ever-increasing demands to process more prescriptions without sacrificing accuracy. As the population ages, the number of prescriptions dispensed in the United States is growing rapidly; it was estimated at 3.9 billion in 2009. Manual prescription verifications (visual inspections) are subjective and therefore prone to error. According to a 2003 study published in the Journal of the American Pharmacists Association, the overall error rate for dispensing medications was an alarming 1.7 percent.

Prescription errors can have devastating consequences. According to the Institute of Medicine, at least 1.5 million Americans are injured by medication mistakes each year. Raman spectroscopy offers a solution: It can provide pharmacists with a quantitative verification process while prescriptions are being dispensed. However, pharmacists will only adopt a sensor that fits seamlessly into their workflow, covers their workload sufficiently, adapts to the ever-changing landscape of available drugs, and, most important, performs accurately. It is vital, therefore, for the engineer to understand the performance tradeoffs of the sensor system.


Figure: Overlapping population curves of two experiments on the classifier. The dark line represents a threshold value that has been chosen for illustrative purposes. Different colors represent the areas of each distribution that fall above and below this threshold value. Each area represents the probability of one of the four scenarios. For example, the red area represents the probability of catching a prescription error, while the orange shows the likelihood of a false alarm.

Centice Corp. has developed the “PASS Rx” system, which uses Raman spectroscopy to verify that the pills in a bottle match the prescription label. The pharmacist scans the prescription’s bar code and places the bottle into the PASS Rx device. The device positions the bottle, acquires a spectrum, analyzes the spectrum, and assesses whether the contents of the bottle match the prescription label. To make this assessment, the device first processes the Raman spectrum and then applies a classifier to reach the final yes/no determination.


Figure: ROC curve. A high-quality ROC curve (blue) shows good discrimination over the case of completely overlapping distributions, shown in dashed red.

The processing steps help to remove variation caused by the pill bottle’s spectrum and fluorescence as well as the differences caused by the size and type of the container, the kind of pill, and the quantity of medication. After gathering and processing the spectra, the classifier scores the measurement relative to a threshold. If the score significantly exceeds the threshold, the sensor may confidently say that the drug inside the bottle and the label match.
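
The article does not describe how the PASS Rx processing or classifier works internally, so the following is only a generic sketch of the idea of "process, score, compare to a threshold." Everything in it is an assumption made for illustration: the polynomial baseline subtraction, the correlation-based score, the 0.9 threshold, the function names and the synthetic spectra.

```python
import numpy as np

def remove_baseline(spectrum, degree=3):
    """Subtract a low-order polynomial fit, a crude stand-in for fluorescence removal."""
    x = np.linspace(-1.0, 1.0, spectrum.size)
    coefficients = np.polyfit(x, spectrum, degree)
    return spectrum - np.polyval(coefficients, x)

def match_score(measured, reference):
    """Normalized correlation (between -1 and 1) of two baseline-corrected spectra."""
    a = remove_baseline(measured)
    b = remove_baseline(reference)
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def label_matches(measured, reference, threshold=0.9):
    """Report a match only when the score exceeds the (hypothetical) threshold."""
    return match_score(measured, reference) > threshold

def synthetic_spectrum(peak_channels, n_channels=1000, width=10.0):
    """Sum of unit-height Gaussian peaks; purely synthetic test data."""
    channels = np.arange(n_channels)
    return sum(np.exp(-0.5 * ((channels - p) / width) ** 2) for p in peak_channels)

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 1000)
background = 5.0 + 3.0 * x + 2.0 * x**2        # smooth, fluorescence-like background
reference = synthetic_spectrum([300, 700])      # library spectrum for the labeled drug
correct_fill = 2.0 * reference + background + rng.normal(0.0, 0.02, 1000)
wrong_fill = (2.0 * synthetic_spectrum([450, 850]) + background
              + rng.normal(0.0, 0.02, 1000))

print("correct fill matches label:", label_matches(correct_fill, reference))
print("wrong fill matches label:  ", label_matches(wrong_fill, reference))
```

The score is insensitive to overall intensity and to the smooth background, which loosely mirrors the goal of removing variation that has nothing to do with the drug itself.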

ROC curve analysis

Consider two experiments to measure the performance of the PASS Rx system. In the first, PASS Rx is repeatedly tested with correctly dispensed prescriptions. The classifier score for each test is calculated and compared to a threshold to assess whether there is a dispensing (fill) error. The second experiment is similar, but here the system is tested exclusively with prescription errors. (The actual error-case experiments are complicated by different manufacturers and strengths for the same drug, but these details are not considered here.) Treating a flagged error as a positive result, each test has one of four possible outcomes: true positive (TP), false negative (FN), false positive (FP) or true negative (TN), as summarized in the table below. The rates (probabilities) of each outcome for our two experiments may be defined as:

  • Sensitivity = TP/(TP+FN) = Probability that a prescription error is caught.

  • Specificity = TN/(FP+TN) = Probability that a properly filled prescription is verified as such.

  • 1-Specificity = FP/(FP+TN) = Probability of a false alarm.

  • 1-Sensitivity = FN/(TP+FN) = Probability that a prescription error is missed.
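
As a quick check on the four definitions above, the sketch below computes them from raw outcome counts. The counts are invented for illustration and are not PASS Rx results.

```python
def outcome_rates(tp, fn, fp, tn):
    """Compute the four rates from counts of each test outcome."""
    sensitivity = tp / (tp + fn)   # prescription errors that were caught
    specificity = tn / (fp + tn)   # correct fills that were verified
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_alarm_rate": 1.0 - specificity,
        "miss_rate": 1.0 - sensitivity,
    }

# Hypothetical counts: 100 error prescriptions and 1,000 correct ones.
example = outcome_rates(tp=95, fn=5, fp=20, tn=980)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Each pair of rates sums to one because it is computed within a single experiment: sensitivity and miss rate from the error prescriptions, specificity and false alarm rate from the correct ones.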

These rates depend on the threshold level. In an ideal case, the two distributions shown in the figure above have very little overlap, and with a well-placed threshold the probability of receiving a correct answer from the device approaches 100 percent (i.e., sensitivity and specificity both approach 100 percent). However, if both score distributions were identical, the chance that the pharmacist receives a false alarm would equal the chance of catching an error, no matter where the threshold was set!
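
The two extremes can be checked numerically under a simplifying assumption that is not stated in the article: that both score distributions are Gaussian with equal spread. The means, spread and thresholds below are arbitrary illustrative values. Following the convention used earlier, a high score indicates a match, so an error is flagged when the score falls below the threshold.

```python
import math

def normal_cdf(x, mean, std):
    """Probability that a Gaussian variable falls below x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def rates_at_threshold(threshold, error_mean, correct_mean, std=1.0):
    """Return (sensitivity, false_alarm_rate) when scores below threshold are flagged."""
    sensitivity = normal_cdf(threshold, error_mean, std)     # errors scoring low
    false_alarm = normal_cdf(threshold, correct_mean, std)   # correct fills scoring low
    return sensitivity, false_alarm

# Well-separated distributions: a threshold midway between the means works well.
print("separated:", rates_at_threshold(3.0, error_mean=0.0, correct_mean=6.0))

# Identical distributions: the two rates are equal at every threshold.
for threshold in (-1.0, 0.0, 2.0):
    print("identical:", threshold, rates_at_threshold(threshold, 0.0, 0.0))
```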

The receiver operating characteristic (ROC) curve was developed to help engineers view the overlap of these two distributions for all threshold levels in one convenient graph. It is computed by plotting the sensitivity versus the false alarm rate (1-Specificity) for each threshold level from minus infinity to plus infinity. As shown in the ROC curve figure, in the case of zero discrimination (completely overlapping distributions), the ROC curve is a straight, unit-slope line (dashed red line). For near-perfect discrimination (very little overlap between the two distributions), the curve approaches the upper left corner of the graph, which represents the ideal operating point (sensitivity = 100 percent, false alarm rate = 0 percent).
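
The threshold sweep described above can be sketched in a few lines. The scores here are simulated from Gaussian distributions with arbitrary parameters, which is an assumption for illustration rather than measured data. The area under the curve is a common one-number summary that the article itself does not discuss; it is included only to show that the overlapping case lands near the unit-slope line (area of about 0.5).

```python
import numpy as np

def roc_curve(error_scores, correct_scores):
    """Return (false_alarm_rate, sensitivity) arrays; scores below threshold are flagged."""
    error_scores = np.asarray(error_scores, dtype=float)
    correct_scores = np.asarray(correct_scores, dtype=float)
    # Every observed score is a candidate threshold; +/- infinity closes the curve
    # at (0, 0) and (1, 1).
    observed = np.sort(np.concatenate((error_scores, correct_scores)))
    thresholds = np.concatenate(([-np.inf], observed, [np.inf]))
    sensitivity = np.array([(error_scores < t).mean() for t in thresholds])
    false_alarm = np.array([(correct_scores < t).mean() for t in thresholds])
    return false_alarm, sensitivity

def area_under_curve(false_alarm, sensitivity):
    """Trapezoid-rule area under the ROC curve."""
    widths = np.diff(false_alarm)
    heights = (sensitivity[1:] + sensitivity[:-1]) / 2.0
    return float(np.sum(widths * heights))

rng = np.random.default_rng(0)
correct = rng.normal(6.0, 1.0, 2000)         # simulated scores for correct fills
errors_separated = rng.normal(0.0, 1.0, 2000)
errors_overlapping = rng.normal(6.0, 1.0, 2000)

auc_separated = area_under_curve(*roc_curve(errors_separated, correct))
auc_overlapping = area_under_curve(*roc_curve(errors_overlapping, correct))
print(f"separated: {auc_separated:.3f}, overlapping: {auc_overlapping:.3f}")
```

Plotting `sensitivity` against `false_alarm` with any charting tool reproduces the two curve shapes described in the text.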

The beauty of ROC curves is that they make no assumptions about the shape of the test distributions or the absolute level of the threshold. They are versatile and useful for any binary classifier, providing a means whereby different systems may be compared. Families of ROC curves may be presented on a single chart to graphically illustrate the change in performance over a range of other variables (e.g., pill quantity, laser power or exposure time).

Engineers can use ROC curves to set the threshold based on many factors, including nontechnical aspects such as user perception. For example, it is typically more important to the pharmacist that the false negative rate be very low, even if that means dealing with more false positives (false alarms), but not to the point where the pharmacist is continually plagued with repeat-measurement or manual-inspection requests. A thorough understanding of ROC curves helps tremendously in striking this delicate balance. In general, ROC curves can give engineers, customers and management a good visual understanding of system tradeoffs.
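
One way to express this tradeoff is to fix a minimum acceptable sensitivity and then pick the threshold that meets it with the fewest false alarms. The sketch below does this on simulated scores; the Gaussian score model, the sample sizes and the 99 percent and 99.9 percent targets are all assumptions chosen for illustration, not Centice's procedure.

```python
import numpy as np

def pick_threshold(error_scores, correct_scores, min_sensitivity=0.99):
    """Return (threshold, sensitivity, false_alarm_rate) for the lowest threshold
    that meets the sensitivity target; scores below threshold are flagged."""
    error_scores = np.asarray(error_scores, dtype=float)
    correct_scores = np.asarray(correct_scores, dtype=float)
    # Candidates in ascending order; infinity guarantees the target can be met.
    candidates = np.sort(np.concatenate((error_scores, correct_scores, [np.inf])))
    for t in candidates:
        sensitivity = (error_scores < t).mean()
        if sensitivity >= min_sensitivity:
            # Both rates only grow with the threshold, so the first hit has
            # the lowest false alarm rate among thresholds meeting the target.
            false_alarm = (correct_scores < t).mean()
            return float(t), float(sensitivity), float(false_alarm)

rng = np.random.default_rng(1)
error_scores = rng.normal(0.0, 1.0, 5000)     # simulated scores for fill errors
correct_scores = rng.normal(4.0, 1.0, 5000)   # simulated scores for correct fills

for target in (0.99, 0.999):
    t, sens, fa = pick_threshold(error_scores, correct_scores, target)
    print(f"target {target}: threshold {t:.2f}, "
          f"sensitivity {sens:.4f}, false alarm rate {fa:.4f}")
```

Tightening the sensitivity target pushes the threshold up and raises the false alarm rate, which is the balance between missed errors and repeat-measurement requests described above.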

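Classifier scenarios

  Actual prescription    Device flags an error    Device verifies the fill
  Dispensing error       True positive (TP)       False negative (FN)
  Correctly filled       False positive (FP)      True negative (TN)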

Brett D. Guenter is a principal engineer at Centice Corporation in Morrisville, N.C., U.S.A.
