We started writing about performance measurement and evaluation three decades ago, roughly when Paul Decker, the president and CEO of Mathematica, began his illustrious career in program evaluation, data analysis, and public-sector policy and management. For the last 13 years, he has led one of the world's largest public research organizations, whose 1,400 employees are focused on improving public well-being.
Our interview with Decker was one of several we conducted with Mathematica researchers for our new book, The Promises and Pitfalls of Performance-Informed Management (Rowman & Littlefield, 2020). At the time of the interview we were working on our chapter about evaluation. Here is an edited version of the conversation.
Q. What were your first experiences like in the field of performance measurement and evaluation?
A. My initial exposure was the JTPA system, the Job Training Partnership Act. That was highly representative of the old world. The data weren’t readily available and the adjustments that were intended to turn outcomes into impact were simplistic, crude and not well understood by people in the field.
Cream skimming was a fact of life that you had to deal with there. If you wanted a high employment rate, you could select people who had a high probability of being reemployed. In that kind of program, self-selection is a factor, too. People who have more employment barriers are less likely to show up for training. It’s the nature of the beast.
The JTPA existed in a world where we were trying to make people's lives better by intervening. The challenge was figuring out what would happen in the absence of the intervention.
It was an unsatisfying system. It was expensive and it didn’t generate clarity. Methodologists had no confidence that it was really giving an accurate impression of performance.
[Photo: Mathematica President Paul Decker]
Q. What do you see as the biggest changes that have occurred over time since those JTPA days?
A. What’s changed over time is the greater availability of data. Programs back then didn’t have extensive administrative records. They now do and the data are likely to be better.
Measurement and evaluation are a lot cheaper than they used to be. You're dealing with databases that get updated frequently, so it's a dynamic picture -- a flow of data rather than a very static snapshot. The data are richer and timelier. That feeds into more of a continuous improvement approach. It better fits the needs of program administrators.
We’re not where we want to be, but things are moving a lot faster than they did. The numbers part of the picture has really been transformed and needs to be more heavily integrated into the management processes.
Q. What are the big differences in how performance measurement and evaluation are used in different levels of government?
A. In recent years data have been leveraged in a more intensive way at the local and state levels versus the federal government. That’s because cities have the data on people’s everyday experience. How quickly is snow being cleared from streets in the winter? How effectively is crime reduced and are we becoming safer over time? By paying attention to the data, you can quickly have an impact on people’s lives and that’s of interest to mayors because they want to improve the lives of their constituents and also be reelected.
I’m talking primarily about the most data-forward cities. The average city probably hasn’t made as much progress. But the most innovative cities -- New York, Chicago, and Kansas City -- have been thinking about this, not only from the perspective of what data do we bring to bear but how do we make administrative changes in light of the data we see.
It’s harder at the federal level. The idea of integrating data into public management hasn’t advanced as quickly there as at the city level. They’re working that out, but it’s often the research office in a federal agency that does this, while the people in the program offices will need to rely more on data in the future.
States are kind of in between. They still don’t have the resources to fund the evaluations. But I imagine that will change, too, as the cost of research comes down.
Q. Almost five years ago we wrote a column in Governing about the tension between evaluation and measurement. It was called, “Government’s Data-Driven Frenemies.” Individuals in both fields were trying to figure out how to improve programs, but they didn’t quite trust each other. Is that what you observed?
A. In the old world, we tended to have this attitude that on the one hand there was performance measurement and on the other there was evaluation.
But they weren’t completely separate. People understood that if you want to transform performance measurement, you have to think about the concepts that we think about in evaluation -- about causality, or what would have happened in the absence of the program.
I think frenemies is a fair characterization. On the performance measurement side, if you didn’t understand some of these concepts you could mislead people into thinking the program was doing something it wasn’t doing. You’d have positive outcomes, but you might have had those outcomes even without the program. There was skepticism about the ability to really conduct effective performance measurement without a recognition of evaluation principles. The critique in the 1980s and 1990s was that there was no way to get at impact except by using random assignment.
Q. Do you think the tensions between evaluators and performance measurement folks have dwindled?
A. I think these two concepts are becoming more and more intertwined. They’re blending into each other, and the idea of frenemies is becoming passé.
All of this gets better with new technology and new data. But it’s a work in progress. Research offices work really hard on identifying causality and the impact of programs and do it in a way that’s relatively detached. Program administrators don’t have that luxury. They have to make the program work more effectively every day.
This is the nascent phase of the transformation -- putting this more in the hands of public managers and bringing the effort to analyze impact into a central part of their day-to-day management. This will become a critical part of management, but it’s not embedded in systems yet. This is in transition.
Q. Do you see more impact evaluation being done by practitioners as opposed to research organizations or departments?
A. There are more and more parts that will be embedded as management responsibilities. In the past, if there were questions around ways to reduce participant attrition in a program, a manager might have relied on a participant attrition expert. Now, I think the first step for a program manager would be to see whether the data can be used to formulate a strategy to reduce attrition rather than simply relying on someone’s expert judgment.
This is definitely changing relationships. The history of Mathematica’s business is doing work on contract to research offices. We had relatively little contact with program managers 30 years ago. If there was an evaluation, public managers would be engaged in the discussion, but we weren’t working closely with them or providing guidance on how to use data.
Because of this transformation, we now can provide guidance directly to program managers. It’s transforming who our clients are. We’re not just focused on evaluation for policy makers or Congress on how to set policy based on the big evaluations we do. Program administrators become potential clients too as we answer a broad array of administrator questions with data and insights.
Given the changes that have happened, evaluation should be faster and cheaper. Rapid cycle testing is nested within that. You want to try things, evaluate, and adjust. You don’t always want to spend five years generating evidence. You do it more quickly and spend a fraction of what you would have spent in the past.
Q. Do you have an example of the rapid cycle testing that you’re talking about?
A. In urban school districts, there’s a widespread issue of teachers having high absentee rates as the school year winds down. With rapid cycle evaluation, you can test out different interventions in different schools and even use random assignment to do that. Not many program managers think in those terms yet.
That’s not changing any time soon, but I think that’s where we’re going.
Q. We’ve heard so much about evidence-based policy in the last few years. How much effect is this movement having?
A. I think the emphasis on evidence has had an effect to some extent. But it has to be embedded into the public management process so that program managers are using data as effectively as they can to manage their programs. It’s one thing to get people talking about evidence-driven program implementation or management, but it will take time to make it a reality. You must have the right skills and culture.
Having the right data is also an issue. That’s why I say we’re in the first half of the first inning of this game. A lot of things must happen for the game to progress. If you don’t have a culture of embedding this in day-to-day program administration, you aren’t thinking hard enough about the data that would drive change most effectively. A lot of the data that’s needed is some of the hardest data to get.