The need for more rigor in AusAID’s project evaluations

Aid effectiveness has become the mantra of the day.  Aid donors demand it, program implementers profess to embody it and consultants and companies make money assessing it.  AusAID is all about it, even announcing an Independent Evaluation Committee and a Results Framework as part of the aid budget.

Within an organisation, tackling aid effectiveness takes many forms.  There’s monitoring, which at a basic level involves carefully setting SMART (Specific, Measurable, Achievable, Relevant and Time-bound) objectives and then gathering data periodically to track progress against targets.  Then there are evaluations, which are a different beast altogether.

Evaluations can be designed by an independent body; the Independent Evaluation Groups of the World Bank and Asian Development Bank are such examples.  For AusAID, this function is performed by the Office of Development Effectiveness (ODE).

Like the multilateral banks, ODE undertakes evaluations of country programs and sectoral evaluations in areas such as public sector reform, health, education and water and sanitation.  However, unlike the multilaterals, there appears to be little in the way of independent evaluations at a project level.  The recently released AusAID ‘Blue Book’ relating to the budget shows this may change, with ODE tasked with undertaking program level evaluation.  Any omission of such evaluations is worrying if Jeffrey Sachs is to be believed: “if someone is really serious about the aid issue they should take a micro-perspective” to evaluation.

Evaluations can also be undertaken by project teams themselves (as distinct from ODE) with support from internal or external evaluation experts.  Unfortunately, AusAID project evaluations are (to the best of my knowledge) all mid-term reviews or ex-post (i.e. undertaken after the program has been completed), which could be one symptom of evaluations being an after-thought rather than an institutionalised and integral part of the learning process.  In any case, what’s missing is evidence that an evaluation has been designed ex-ante (i.e. before the project was implemented).  While country and sectoral evaluations are difficult to design in advance, this is not the case for project evaluations.

Why is this important?  Consider a project intervention that involves providing text-books to students in a poor region of Pakistan.  Assume we measure the test scores of students before the project starts and notice after it has finished that student scores have improved.  Is that a big tick for our aid program?  Only if we can successfully attribute the increase in scores to the text-books.  What if the increase was due to these students receiving extra tutoring or having access to a sudden influx of higher qualified teachers?  Also, what if these students were unusually intelligent?  If they weren’t, would extra text-books have made a difference?

By considering an evaluation ex-ante, a technical expert can incorporate what we evaluators like to call counterfactuals: estimates of what would have happened without the project.  For example, we can randomly assign a project intervention (e.g. text-books) to some schools in the district and not to others.  Assuming a large enough sample size, the average school that receives the project should have the same characteristics as the average school that doesn’t receive the project.  Any difference after project completion can, on average, be attributed to the intervention (i.e. the text-books).
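To make the contrast concrete, here is a minimal simulated sketch of the text-book example in Python. Every number in it is an invented assumption (the 5-point text-book effect, the 3-point gain that all schools enjoy from other causes such as better teachers, the number of schools), and the function name `simulate_textbook_trial` is hypothetical. It is an illustration of the logic, not an evaluation design.

```python
import random
import statistics


def simulate_textbook_trial(n_schools=4000, true_effect=5.0,
                            common_gain=3.0, seed=0):
    """Simulate a randomized text-book trial with made-up numbers.

    Returns (naive_before_after, randomized_difference).
    """
    rng = random.Random(seed)

    # Randomly assign half of the schools to receive text-books.
    schools = list(range(n_schools))
    rng.shuffle(schools)
    treated = set(schools[: n_schools // 2])

    before, after = {}, {}
    for school in range(n_schools):
        baseline = rng.gauss(50, 10)             # average test score before
        gain = common_gain + rng.gauss(0, 2)     # gain every school gets anyway
        if school in treated:
            gain += true_effect                  # assumed text-book effect
        before[school] = baseline
        after[school] = baseline + gain

    control = [s for s in range(n_schools) if s not in treated]

    # Naive ex-post view: before/after change in project schools only.
    naive = statistics.mean(after[s] - before[s] for s in treated)

    # Randomized view: project schools versus comparison schools afterwards.
    randomized = (statistics.mean(after[s] for s in treated)
                  - statistics.mean(after[s] for s in control))
    return naive, randomized


if __name__ == "__main__":
    naive, randomized = simulate_textbook_trial()
    print(f"before/after change in project schools: {naive:.1f}")
    print(f"project vs comparison schools:          {randomized:.1f}")
```

Under these assumptions the before/after comparison credits the text-books with roughly the full 8-point rise, while the comparison against randomly chosen non-project schools lands near the assumed 5-point effect, give or take sampling noise. With far fewer schools the second estimate becomes much noisier, which is the sample size caveat in practice.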

The above example is a very simplistic example of a randomized evaluation.  In many cases such evaluations are inappropriate or impractical.  They require large sample sizes, regular monitoring and specific interventions (for example, they may not be suitable for governance programs).  They have been criticised for not being universally applicable and for focusing too much on quantitative analysis.  Assessing long-term impacts could also be problematic.  Most randomized evaluations are undertaken just after project completion.  Thus they are likely to estimate average quantitative outcomes rather than impacts.

I agree with these criticisms of randomized evaluations and propose that, where practical, the most rigorous evaluations of outcomes have to follow a mixed methods prescription that incorporates a randomized component and a battery of qualitative tools.  Let’s call such evaluations the ‘platinum standard’.  A randomized evaluation (the gold standard) by itself is unlikely to be enough, just as a mixed methods approach without a randomized trial is unlikely to be enough.

Following an ex-post evaluation approach with largely qualitative analysis (similar to the approach that AusAID takes for most of its projects) can overstate aid effectiveness.  To be fair, very few organisations undertake randomized evaluations on an institutional and large-scale basis.  However, they can be done.  The MIT Poverty Action Lab has undertaken more than 300 randomized evaluations, the World Bank has a separate unit dedicated to just doing randomized evaluations, and DFID and USAID also commission some randomized evaluations.  These organisations can go one step further by following a platinum standard prescription combining randomized evaluations with qualitative analysis.

I’ve been informed AusAID is working with 3ie to develop a randomized controlled trial of a program/project in Indonesia.  This is commendable and important, but a token evaluation is not enough.  If AusAID wants to be the best donor possible, it needs to administer platinum standard evaluations in at least some (not one!) of its projects.  The following are possible (albeit overly simplistic) steps forward for evaluators involved in country programs (as distinct from ODE):

  1.  Have an evaluation expert investigate AusAID’s pipeline projects and select all projects that are appropriate for platinum standard evaluation.
  2. Randomly select projects for such an evaluation.
  3. Select an external partner to implement the platinum standard evaluation.

The third point is important as the successful completion of a platinum standard evaluation requires specialist and technical knowledge that may be unavailable within AusAID on a regular basis over the life of the project.  As such, for selected projects, AusAID should consider outsourcing evaluations to universities or institutions focused on such designs to limit the risk of non-completion.  Clearly this approach should be cost-effective and as such dependent on the overall project cost, the possible learning outcomes and relevance to future projects.  With an aid program valued at over 4 billion dollars, surely there is more than just one such project as a candidate for platinum evaluation.

Why should AusAID take this approach?

  1.  AusAID has a goal of being the best donor possible.  This cannot be accomplished without appropriate assessment of aid effectiveness.  To accomplish this, undertaking ex-post only evaluations will not suffice.
  2. Taxpayers deserve to have the most rigorous understanding of ‘bang for their buck’.
  3. Without the existence of appropriate counterfactuals, project impact may be overstated.
  4. Platinum standard evaluations can be presented as ‘flagship’ studies by AusAID.
  5. Influential parliamentarians are increasingly asking for the use of randomized trials (see this post by Dr. Andrew Leigh, MP).

Assuming technical expertise exists, a key difficulty is convincing partners to undertake such platinum standard evaluations.  Partner concerns may be warranted; for example, it may be impractical to administer an experimental design.  More often than not, partners do not believe the value of such rigorous evaluations outweighs the costs of potential negative results and the additional administrative burdens they may impose.  Moreover, evaluators are up against project managers who have strong incentives not to undertake rigorous evaluations.

Without at least some mixed method evaluations incorporating experimental designs, it is difficult to argue causal impact at a high level of credibility.  AusAID, by holding the purse strings, can request that such evaluations are administered (where practical) across specific projects.  Similarly, for organisations receiving core AusAID funds, AusAID can request that a certain number of such evaluations are undertaken.  The numbers promised and delivered can be tracked on a regular basis.

To be clear, platinum style evaluations which incorporate various qualitative techniques and a randomized quantitative component are difficult to undertake and are impractical in the majority of cases.  But they are possible.  At a minimum, AusAID should consider institutionalising platinum evaluations into a small percentage of its programs.

Dinuk Jayasuriya is a Post-Doctoral Fellow at the Development Policy Centre, Australian National University.  Previously, he was a Monitoring and Evaluation Officer with the World Bank Group.


Dinuk Jayasuriya

Dr Dinuk Jayasuriya is a Director of two Sri Lankan-based companies. He was previously employed by the World Bank Group and was an academic at the ANU Development Policy Centre.


  • Interesting post, Dinuk. I agree on the need to plan evaluations more carefully to find the right approach for each intervention. Also, that more care needs to go into planning to ensure that existing knowledge is incorporated into the interventions.

    But also important is the development of a system that ensures real independence. Aid is no longer an ‘a-political’ issue that voters are unaware of and uninterested in. Efforts to assess the quality of Australian aid will be undermined if this is not done properly.

    Australia should therefore be careful to avoid the situation we find in the UK where DFID is the main client of the very same consultancies, NGOs and think tanks that are called to evaluate UK aid. KPMG, for example, manages the aid watchdog but also implements hundreds of millions of Pounds-worth of projects.

    This clientelistic approach means that think tanks like ODI are also now involved in projects as implementing agents for iNGOs and consultancies (such as PWC and KPMG) which makes their oversight roles impossible. And the same is true with smaller consultancies from communications to social development which often find themselves working with organisations that they are also evaluating. 3ie itself, supposedly the enforcers of absolute certainty, is not free from this.

    True independent voices are few and unpopular.

    This situation is not helped by the roles played by new foundations like Gates or iNGOs in using researchers to advocate for their own interests (see, for instance, Gates’ development progress work) which goes as far as funding influential media outlets like the Guardian for the same purpose.

    The consequence is a system with few (if any) lines of accountability; one in which all participants are clearly benefiting from the status quo and the conclusion that ‘more aid is good’. The public is beginning to react to this and, unless important changes are made (and many will be big half-baked PR jobs, unfortunately), the baby will be thrown away along with the bathwater.

    I think the fault here lies mainly with some large bilateral funders such as DFID that have failed to recognise that different organisations play different roles in the aid sector and that their contributions demand a certain degree of specialisation and even protection. A system in which consultancies, research centres, think tanks and NGOs are all expected to compete and collaborate with each other can only lead to uncomfortable and dangerous conflicts of interest.

    Conflicts that are incompatible with the demand for rigour and transparency in project evaluations.

    Australia would do well to avoid this muddling of roles. It should attempt to strengthen independent research communities with evaluation expertise separate from those tasked with implementing aid policy. Only this will allow Australia to hold its aid industry to account.

    • Thanks Enrique for your post.

      It’s a hard balancing act, especially when there is a small pool of organisations with the skills and resources to both implement and evaluate projects. In the private sector, there are four major audit firms that audit most if not all the large multinational companies. However, there are regulations in place to ensure that the same audit firm cannot provide advice on internal controls while also undertaking the external audit. As I recall there are also regulations in place to prevent one audit firm from auditing the same company for more than 5 consecutive years. Perhaps aid donors could consider similar regulations (if they haven’t done so already).

  • Thanks Chris for your detailed comments. I agree that ODE forms only part of the evaluation team and indeed the article is directed to AusAID and not specifically ODE. I also agree that evaluation has to be tailored to methodologies that are suitable to those needs (and RCTs are not suitable in many cases). That said, where practical and cost-effective, an RCT is generally considered the most rigorous quantitative technique. The key way any other quantitative approach would be superior to an RCT is if available observable data is very highly correlated with unobservable characteristics (of units in the sample) or if unobservable characteristics are unlikely to influence outcomes or participation – neither of which seems to happen widely in practice. Apart from the above point, the debate focuses largely on the disadvantages of RCTs (which certainly exist), and less on the disadvantages of RCTs relative to other quantitative techniques. In the large majority of cases where practical and where the benefits outweigh the costs, RCTs represent the most appropriate quantitative approach. Combine that with the most appropriate qualitative component, and it would provide a strong basis for causal inference – hence the term platinum standard. Unfortunately only a small percentage of projects will lend themselves to this type of scrutiny and hence other evaluation approaches to provide rigour, quality and validity are more practical.

    I am encouraged that AusAID are looking at an array of ex-ante evaluation approaches which can only improve on the good steps AusAID has taken to make performance standards stronger. As to your last point, I can only hope that development effectiveness is measuring value for money and feeding into new programs and projects.

  • Dinuk,

    I am coming to this conversation late having just returned from the field, but you raise some important points that deserve exploration. First, I am not sure how helpful it is to put ‘platinum’ or even ‘gold’ labels on approaches in evaluation. As a professional evaluator, I have always felt the best approach is to think carefully through what you are trying to achieve with your evaluation and then choose and explore the methodologies that are going to suit those needs. The whole issue of labels tends to open debates around standards rather than focusing on rigour, quality and validity. Some nice responses on this recently in the Journal of Economic Literature from Ravallion and Rosenzweig in their reviews of Banerjee and Duflo’s book “Poor Economics”.

    Second, the current performance and evaluation policy in AusAID allows programs significant freedom in choosing their own approaches to performance tracking. Ex-ante evaluation is encouraged and is increasingly being adopted by the more performance oriented parts of the agency (admittedly it is often the better funded country programs – Indonesia, PNG and Philippines). There are also evaluators working with a number of AusAID teams and they are regularly having conversations about pipeline evaluative activities. It is often overlooked that ODE only forms one part of the quality system in the agency and there is a whole team in the Program Effectiveness and Performance Division tasked with driving the evaluation policy. The real challenge for this section is driving policy change alongside adequate and supported cultural reform of an agency filled with Development Generalists.

    Which brings me to my third point. The main constraint in taking on this approach has always been expertise rather than cost or willingness. AusAID has few professional evaluators in its ranks and even fewer individuals capable of getting their heads around an RCT or quasi-experimental evaluation design. The field of evaluation in Australia is heavily qualitatively focused and drawing on a pool of quantitative evaluation experts that understand the development context is not straightforward. As I understand it, AusAID is having conversations with J-PAL and others about running this type of evaluation, but there are all sorts of constraints and challenges that need thinking through before launching into the misguided policy (see recent USAID proclamations) of believing independent evaluations will be the answer. You suggest universities as a possible partner and there is value in better utilising these institutions. However, universities have often had their own issues with being involved in these operational undertakings. Contrary to popular belief, my experience has been that AusAID is very open to starting a conversation with groups that could undertake this type of work and the political barriers are often overcooked.

    Thanks for raising the issue as I think it is an important component of the development effectiveness debate. What I think is even more important though is raising the whole issue of where the issue of effectiveness and value for money is going. There are some important aspects of the recent AusAID agency results framework that have huge ramifications for how the sector works and this deserves further discussion.

    Christopher Nelson is a Monitoring and Evaluation specialist with the World Bank Group and a former M&E advisor at AusAID.

  • Thanks David and Paul for your comments. You both point to the fervor surrounding RCTs – it’s unfortunate if institutions think them to be the be all and end all. I certainly hope (and indeed believe) that AusAID will not go down the path of funding projects that are only suitable for RCTs (and by extension platinum style evaluations which also incorporate qualitative tools). For many reasons, including the ones mentioned above, we need an array of evaluations of different rigor depending on the cost, benefit, practicalities and technical expertise available.

  • It is important to note that in some aid agencies, the fervor for RCTs has led to an expansion of projects that can be evaluated in this way at the expense of those for which RCTs are not appropriate but which might have a longer lasting impact. For example, Andrew Natsios, a former Administrator of USAID, describes in “The Clash of the Counter Bureaucracy and Development” how the demand for the so-called “gold standard” of RCTs (although I see that it is now “platinum” – is this evaluation inflation?) has biased the choice of projects that USAID supports towards those that are suitable for RCT evaluation. He points out that initiatives that have the most transformative impact are those that are difficult to measure and also carry the greatest risk of failure. No good bureaucrat wants a failed project, so risk taking in this way is eschewed. I have long maintained that we can learn as much from failed projects as we can from successful ones as long as the reasons for failure are carefully documented and become part of institutional memory and future project design. Alas, one of the consequences of the rapid turnover of staff in many aid agencies is that institutional memory is rare or non-existent.

  • Dinuk, I agree with your observations regarding the need for mixed methods approaches, and an ex ante evaluation perspective. I am the research director of a large evaluation of the AusAID-funded, Australian Sports Commission-implemented Australian Sports Outreach Program. This evaluation is employing a mixed methods approach; however, the quantitative component is restricted due to the ex post nature of the design (and therefore no baseline, no possibility of a control etc). This is a pity, but there is not much we can do about it (so is this a ‘bronze’ or maybe ‘silver’ standard design?). Due to the fervor surrounding RCTs there is increasing pressure on evaluation consultants to employ the most rigorous methods possible; however, for the reasons you mention this is almost always not possible. More needs to be done to ensure that donors understand when such methods are possible and when they are not. I agree with your call for AusAID to review upcoming projects to assess which of them may be suited to platinum standard evaluations. David

    Dr David Carpenter is the Principal Consultant, International Development at Sustineo in Canberra
