The Slippery Slope between Hope and Fraud

When scientists socialize, we often find ourselves talking shop, as people in any profession do, I imagine.  This weekend I heard of a young scientist who was fired by a biotech company. When control sample data differed from “the right answer,” the young employee added a mathematical “fudge factor” so that the control data – and all the associated sample data – would come out “right.” A coworker discovered the instrument was out of tune (I’ll spare you the technical details) and stumbled upon the young scientist’s flawed data analysis. The young scientist was warned and re-schooled on how to maintain the instrument and conduct the assay properly. Unfortunately, a couple of months later, coworkers discovered the scientist was still fudging the data, and the scientist was fired.  Those of us hearing the sad tale were flabbergasted – did the employee really not understand the scientific method, or was this a dishonest character, or was the supervisor putting pressure on the young scientist to get data?

These days “Big Datasets” in biology and biomedical science are the next new thing. Thought leaders, pundits and excitable sorts are heralding the beginning of the personalized medicine age.  Armed with an individual patient’s human genome data, the hope is that physicians can treat that person’s disease with the right drug at the right time. To be sure, progress is being made.  Most work to date correlates subsets of the genome (biomarkers or gene variants or expression profiles) with a particular disease or treatment regimen.  I’ve worked with these datasets in my previous two jobs. Luckily I had top-notch biostatisticians, instrument technicians and database administrators at my side. These datasets are big, complex, and as confusing and confounding as human biology. How easy might it be for a dishonest scientist to fudge or over-interpret this data?

Last night 60 Minutes delved into that question.  They featured the Anil Potti scandal.  Potti and his colleagues at Duke Medical School used Big Genomic Data Sets to guide a clinical trial for lung cancer.  It turns out that Potti manipulated the data underlying the clinical trial design (not to mention a series of high-profile publications).  Yes – that’s right – a dishonest scientist fudged data and then used that data to treat cancer patients (there is no WordPress formatting that can adequately convey my dismay!). Some patients’ families are suing the university, but Duke decided to cooperate with 60 Minutes and offer the case as a cautionary tale.  Yes, Potti was fired, but he’s still practicing medicine and treating patients somewhere in South Carolina.  Yes, Potti’s supervisor is contrite, but he assures us the patients involved were provided proper medical care.

This is a horrible story that damages the trust patients and the public put in research hospitals and science.  But we all bear responsibility for these stories when we hype a new technology before its time (especially to patients, but also to governors, voters and investors). We bear responsibility when we give scant attention to best practices or the “boring” infrastructure of scientific research – that is – the controls, the statistical analysis, the data management.  (The Duke Review Board IGNORED warnings from two respected biostatisticians!)  We bear responsibility when we tie career advancement or company goals to the “right” data or publication. I’m fairly certain the Potti case will provide plenty of fodder for scientific ethics classes and research review committees across the CSU this year.  It would be a mistake to analyze it as a case involving only one dishonest scientist.