Brian Tayan is a researcher with the Corporate Governance Research Initiative at Stanford Graduate School of Business. This post is based on a recent paper by Mr. Tayan; David Larcker, Professor of Accounting at Stanford Graduate School of Business; Edward Watts, Assistant Professor of Accounting at Yale School of Management; and Lukasz Pomorski, Lecturer at Yale School of Management.
Related research from the Program on Corporate Governance includes The Illusory Promise of Stakeholder Governance (discussed on the Forum here) and Will Corporations Deliver Value to All Stakeholders? (discussed on the Forum here), both by Lucian A. Bebchuk and Roberto Tallarita; Restoration: The Role Stakeholder Governance Must Play in Recreating a Fair and Sustainable American Economy—A Reply to Professor Rock (discussed on the Forum here) by Leo E. Strine, Jr.; and Stakeholder Capitalism in the Time of COVID (discussed on the Forum here) by Lucian Bebchuk, Kobi Kastiel, and Roberto Tallarita.
ESG ratings are intended to provide information to market participants (investors, analysts, and corporate managers) about the relation between corporations and non-investor stakeholder interests. They do so by sifting masses of data to extract insights into various elements of environmental, social, and governance performance and risk. Investors rely on this information to make investment decisions, while corporations use ratings to gain third-party feedback on the quality of their sustainability initiatives.
Recently, ESG ratings providers have come under scrutiny over concerns about the reliability of their assessments. In this post, we examine these concerns. We review the demand for ESG information, the stated objectives of ESG ratings providers, how ratings are determined, the evidence of what they achieve, and structural aspects of the industry that potentially influence ratings. Our purpose is to help companies, investors, and regulators better understand the use of ESG ratings and to highlight areas where they can improve. We find that while ESG ratings providers may convey important insights into the nonfinancial impact of companies, significant shortcomings exist in their objectives, methodologies, and incentives that detract from the informativeness of their assessments.
Demand for ESG information has exploded in recent years. Ten years ago, the term ESG—although in existence—was rarely used by the investment community or in corporate boardrooms. Instead, public and professional interest was focused on the general concepts of corporate responsibility, sustainability, and impact investing. Only recently has the focus on ESG (environmental, social, and governance) as a unique concept come to the forefront and with it an explosion in the demand for information (see Exhibit 1).
Sources of this demand include:
Demand for ESG information has in many ways outstripped the ability of suppliers to supply the depth, detail, and accuracy of data required. This is perhaps due to the immense number of factors that plausibly fall under the heading of ESG, the difficulty in measuring ESG factors, and the daunting challenge of determining their impact. To this end, Amel-Zadeh and Serafeim (2018) find several informational impediments that hinder ESG integration in the investment process including lack of comparability across firms, lack of standards, the cost of gathering information, and a lack of quantifiable information.
Commercially developed, third-party ESG ratings are one type of service that has evolved to meet the demand for ESG information. A 2020 survey by SustainAbility finds that ESG ratings are the most frequently referenced source of information that institutional investors rely on to gauge ESG performance (55 percent, tied with direct company engagement). Another survey finds that 88 percent of investment professionals use third-party ESG ratings as a part of their investment process, with 92 percent expecting to do so in the future.
The importance of ESG ratings to the asset management business is demonstrated by the flow of funds into ESG-labeled investment products. Bank of America calculates that over $200 billion was invested in ESG bond funds between 2019 and 2022. Hartzmark and Sussman (2019) show that mutual funds with high ESG ratings (as measured by Morningstar) realized net inflows over the measurement period, compared with net outflows among funds with low ESG ratings.
ESG ratings are intended to measure “ESG quality.” ESG quality itself, however, does not have a single agreed-upon definition. Two main views of ESG exist, and to some extent they work in directionally opposite ways.
One view of ESG is that it reflects the impact a company has on the welfare of its stakeholders, such as employees, suppliers, customers, local community, and the environment. Under this definition, a company can improve its ESG profile by withdrawing from activities that are harmful to stakeholders or improving business practices in affected areas to benefit these constituents. The cost of such investment, at least in the short run, is incurred by shareholders, while the long-term financial impact to the company is undetermined or unstated. This view of ESG (“doing good”) is what most individual investors likely think of when they think about ESG quality.
A competing view is that ESG measures the impact societal and environmental factors have on the company, and that these factors are financially material. Under this definition, an ESG framework provides a set of risk factors that the company can plan for or mitigate through strategic planning, targeted investment, or a change in operating activity. Addressing ESG risk factors, even if costly in the short run, is expected to result in a long-term financial benefit to the corporation and its shareholders. This view of ESG (the impact of environmental and social risks on financial performance) is the one predominantly adopted by ESG ratings providers.
The tension between these viewpoints is demonstrated in a Bloomberg Businessweek article that takes a critical view of ESG ratings, with a focus on the ratings of MSCI. According to the article,
There’s virtually no connection between MSCI’s ‘better world’ marketing and its methodology. That’s because the ratings don’t measure a company’s impact on the Earth and society. In fact, they gauge the opposite: the potential impact of the world on the company and its shareholders. MSCI doesn’t dispute this characterization. It defends its methodology as the most financially relevant for the companies it rates.
According to the article, MSCI’s CEO
concedes ordinary investors piling into such funds have no idea that his ratings, and ESG overall, gauge the risk the world poses to a company not the other way around. ‘No, they for sure don’t understand that,’ he said in an interview.
The authors of this piece make the assumption that ESG ratings are supposed to measure a company’s impact on the environment and society and convey surprise that MSCI’s ratings attempt to measure the opposite.
The ESG ratings industry is highly fragmented with dozens of ratings agencies and data providers in existence. The backgrounds of these firms are not uniform, with many having entered the ESG ratings business from different areas of historical expertise. Some ESG ratings firms whose ratings are used to create ESG funds or are referenced in the press include:
These are just a few ESG ratings providers. Other well-known firms include S&P Global, Vigeo Eiris (owned by Moody’s Investor Services), HIP, and TruValue Labs (owned by FactSet Research—See Exhibit 2).
ESG ratings firms aim to provide insight into ESG quality. However, the approaches they take are not the same. This can be seen in the variation in their stated objectives.
A common theme among ESG providers is investment risk reduction. The assumption is that ESG quality improves financial performance by reducing social and environmental factors that pose risk to the company’s business model or operations. To this end, MSCI claims its ratings “support ESG risk mitigation and long-term value creation.” Sustainalytics measures “the degree to which a company’s economic value is at risk” because of ESG factors. If these providers are correct in their thesis and accurate in their measurement, we should be able to observe a correlation between ESG ratings and subsequent risk events (measured by such factors as financial performance or reduced likelihood of regulatory violations, litigation, or bankruptcy).
Risk reduction is not the only claim of ESG ratings providers. Some are explicit in designing their scores to predict returns. For example, HIP claims that its ratings “correlate with better returns for the same amount of risk.” Arabesque says its approach “is all about identifying companies that are better positioned to outperform over the long term. … When calculating the ESG score of a company, the algorithm will only use information that significantly helps explain future risk-adjusted performance.” These claims are also testable and can be verified by relating ESG ratings to subsequent stock or bond price changes.
In addition to these, some ESG ratings providers make additional claims, such as measuring a company’s environmental or social impact (ISS), measuring transparency and commitment to ESG (Refinitiv), or providing a screen for ESG selection in support of stewardship goals (FTSE Russell). The accuracy of these types of claims is somewhat harder to measure.
ESG ratings are generally reported on a letter or numeric basis to reflect the company’s absolute or relative ESG risk or performance. Some companies (such as MSCI) use a 7-point scale from AAA to CCC, analogous to that used by major credit-rating agencies. Others use a 12-point scale from A+ to D-, similar to an education system (ISS is an example). Another widely used approach is to publish scores on a percentile basis using a scale of 1 to 100, where 100 can either represent high ESG quality (positive) or high ESG risk (negative).
Many ratings providers claim to measure industry-relative ESG quality, while some claim to measure absolute quality. Industry-adjusted ratings allow investors to compare ESG risk or performance across firms within the same industry. In this way, an energy company that is more financially exposed to environmental risks can be identified against its peer group. However, industry-adjusted ratings do not allow for comparison of firms across industries, and a company’s rating is highly dependent on the industry it is designated to. By contrast, ratings providers that claim to measure absolute ESG quality can be used for comparison across industries, although firms tend to receive systematically higher or lower ratings depending on their line of business.
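The difference between absolute and industry-relative ratings can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the company names, raw scores, and the simple percentile rule are hypothetical and are not taken from any ratings provider's methodology.

```python
from collections import defaultdict

def percentile_ranks(scores):
    """Map each company to the percentage of scores at or below its own."""
    values = list(scores.values())
    n = len(values)
    return {name: 100.0 * sum(v <= s for v in values) / n
            for name, s in scores.items()}

def industry_relative_ranks(scores, industries):
    """Rank each company only against peers in its own industry."""
    by_industry = defaultdict(dict)
    for name, s in scores.items():
        by_industry[industries[name]][name] = s
    ranks = {}
    for peer_group in by_industry.values():
        ranks.update(percentile_ranks(peer_group))
    return ranks

# Hypothetical raw ESG scores (higher = better); not real data.
raw = {"OilCo A": 30, "OilCo B": 40, "OilCo C": 20,
       "BankCo A": 70, "BankCo B": 80, "BankCo C": 60}
industry = {"OilCo A": "energy", "OilCo B": "energy", "OilCo C": "energy",
            "BankCo A": "banks", "BankCo B": "banks", "BankCo C": "banks"}

absolute = percentile_ranks(raw)
relative = industry_relative_ranks(raw, industry)

for name in raw:
    print(f"{name}: absolute={absolute[name]:.0f}, industry-relative={relative[name]:.0f}")
```

In this toy data, the best energy company ranks below the weakest bank on an absolute basis but above it on an industry-relative basis, which is why the industry a company is assigned to matters so much for an industry-adjusted rating.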
To arrive at an overall ESG rating, ratings firms typically make separate assessments of the three components of ESG—E (environment), S (social), and G (governance)—which they then aggregate to compute an overall score. In measuring these, the firm must have a view of the major factors that contribute to each component. These might be derived using statistical analysis of historical data to identify drivers of E, S, and G, or they might be hypothesized based on a theoretical relation that is not tested.
For example, MSCI identifies the following subcomponents of E, S, and G:
Climate change: The company’s contribution to climate change through emissions, or the company’s exposure to harm due to climate change or climate-related regulatory action.
Natural capital: The degree to which the company relies on natural resources that might be at risk.
Pollution and waste: The generation of waste (packaging, materials, or toxins) as part of the production or disposal of company goods.
Environmental opportunities: The potential to use environmental technology to improve operations or sales.
Human capital: All aspects of human capital management including employment practices, talent development, safety, and the labor standards of suppliers.
Product liability: The potential for products to cause harm because of quality failures, safety failures, financial harm, privacy violations or data leaks, chemical harm, other health or demographic risk, and the potential benefits of responsible investment to improve product quality, safety, or impact.
Stakeholder opposition: Societal opposition to the company because of controversial sourcing techniques or locations, or other conflicts with local communities.
Social opportunities: The potential to benefit society by improving access to products.
Corporate governance: Factors relating to the quality of corporate oversight, including the structure and composition of the board of directors, shareholder ownership structure and control, CEO pay practices, and accounting quality.
Corporate behavior: Evidence of the ethical behavior of the company, including anticompetitive practices, corruption, and tax shielding and transparency.
(See Exhibit 3 for examples of ESG frameworks).
Sources: MSCI Key Issue Framework (as of July 2022), available at: https://www.msci.com/our-solutions/esg-investing/esg-ratings/esg-ratings-key-issue-framework; FTSE ESG Ratings Model (as of June 2021), available at: https://research.ftserussell.com/products/downloads/Guide_to_FTSE_Sustainable_Investment_Data_used_in_FTSE_Russell_Indices.pdf; Refinitiv ESG Scores (as of May 2022), available at: https://www.refinitiv.com/content/dam/marketing/en_us/documents/methodology/refinitiv-esg-scores-methodology.pdf; S&P Global ESG Ratings (as of July 2022), available at: https://www.spglobal.com/esg/solutions/data-intelligence-esg-scores; Sustainalytics ESG Risk Ratings (as of January 2021), available for download at: https://www.sustainalytics.com/esg-data.
Ratings providers may leverage reporting frameworks developed by third-party organizations. Examples include the reporting standards developed by the Sustainability Accounting Standards Board (SASB), Task Force on Climate-Related Financial Disclosures, and the Global Reporting Initiative. These frameworks offer the benefit of leveraging the work of independent organizations and are often similar to the proprietary frameworks developed by ESG ratings providers.
One observation is that the number of input variables is very large. FTSE Russell claims its model uses 300 indicators. Refinitiv uses 630 ESG metrics. S&P Global uses 1,000 underlying data points.
Managing this number of variables requires the ratings provider to make important decisions or simplifying assumptions. One is assessing materiality. Not all variables are equally material across companies or industries. As a result, some variables might require larger or lesser weighting to reflect their relevance; some might be excluded entirely. Another decision is how to deal with missing data. Even though a variable might be deemed material, this does not mean that the relevant data is available to measure that variable. (We discuss options for handling this decision below). A related decision is how to standardize variables when they are reported differently and therefore are not directly comparable across companies. Finally, the ratings provider must decide how to weight both the variables in their importance to E, S, and G, and also the overall pillars of E, S, and G in relation to one another.
All of these choices will influence the reported ESG rating.
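To make the effect of weighting concrete, here is a minimal sketch of a two-level weighted average: indicators are first combined into pillar scores, and the pillars are then combined into an overall score. The indicator names, scores, and weights are hypothetical and do not reproduce any provider's model.

```python
def weighted_average(values, weights):
    """Weighted mean of `values`, using the matching entries in `weights`."""
    total_weight = sum(weights[k] for k in values)
    return sum(values[k] * weights[k] for k in values) / total_weight

# Hypothetical indicator scores on a 0-100 scale (higher = better).
indicator_scores = {
    "E": {"emissions": 30.0, "water_use": 50.0},
    "S": {"safety": 70.0, "turnover": 50.0},
    "G": {"board": 90.0, "pay": 70.0},
}
# Hypothetical materiality weights for indicators within each pillar.
indicator_weights = {
    "E": {"emissions": 3, "water_use": 1},
    "S": {"safety": 1, "turnover": 1},
    "G": {"board": 1, "pay": 1},
}

# Step 1: aggregate indicators into pillar scores.
pillars = {p: weighted_average(indicator_scores[p], indicator_weights[p])
           for p in indicator_scores}

# Step 2: aggregate pillars under two different (assumed) weighting schemes.
equal_weighted = weighted_average(pillars, {"E": 1, "S": 1, "G": 1})
governance_heavy = weighted_average(pillars, {"E": 1, "S": 1, "G": 3})

print(pillars)
print(round(equal_weighted, 2), round(governance_heavy, 2))
```

The same underlying data yields an overall score of roughly 58 under equal pillar weights and 67 when governance is weighted more heavily, so two providers with identical inputs can still report different ratings.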
The data sources used to populate ratings models include public, quasi-public, and private data. Public data includes company-reported filings with the SEC, company-produced sustainability reports, press releases, newswires, and media reports. Quasi-public information includes data captured in government, regulatory, and NGO datasets. Nonpublic information might be provided by the company in response to solicited questionnaires.
Working with data sets such as these brings inherent problems. Three major challenges are completeness of data, standardization, and consistency.
Completeness. A model that includes hundreds of material input variables requires data to support each variable. Much of this information is not publicly reported. As a result, the ratings firm will have to make decisions about how to handle missing data. One approach is to simply omit the data point, but this makes it difficult to compare scores across companies that report and do not report a value. Another is to make an assumption about what the data might be. For example, when information is not available to populate a data point, MSCI appears to assume that the company’s performance is the industry average. (In this case, the choice of industry peer group will influence how the data point is populated.) By contrast, FTSE assumes that the company’s performance is the worst. (This choice is intended to encourage transparency but is also likely punitive.) A third approach is to estimate the data using advanced statistical techniques to impute the missing value.
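The following sketch shows how much the treatment of missing data alone can move a score. It implements three simple rules that loosely correspond to the approaches described above (omit, assume the industry average, assume the worst); the function, indicator names, and values are hypothetical, and the rules are simplifications rather than the actual procedures of MSCI or FTSE.

```python
def score_with_missing(indicators, strategy, industry_avg=None, worst=0.0):
    """Average indicator scores, filling missing values (None) per `strategy`."""
    filled = []
    for name, value in indicators.items():
        if value is not None:
            filled.append(value)
        elif strategy == "omit":
            continue  # drop the data point entirely
        elif strategy == "industry_average":
            filled.append(industry_avg[name])  # assume the company is average
        elif strategy == "worst_case":
            filled.append(worst)  # assume the worst possible performance
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    if not filled:
        raise ValueError("no data available to score")
    return sum(filled) / len(filled)

# Hypothetical company that discloses only two of four indicators.
company = {"emissions": 80.0, "water_use": None, "safety": 60.0, "board": None}
# Hypothetical industry averages for the same indicators.
industry_avg = {"emissions": 50.0, "water_use": 50.0, "safety": 50.0, "board": 50.0}

omit = score_with_missing(company, "omit")
average = score_with_missing(company, "industry_average", industry_avg)
worst = score_with_missing(company, "worst_case")

print(omit, average, worst)
```

With identical disclosures, the company scores 70 when missing items are omitted, 60 when they are set to the industry average, and 35 when they are set to the worst case.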
Standardization. The problem of standardization occurs when companies report information on the same variable using scales that are not directly comparable. For example, one company might report workplace safety information using raw numbers (number of incidents), a time scale (injuries per unit of time worked), or a percentage scale (lost-time frequency). The ratings provider must standardize these differences across companies in order to compute overall ESG performance.
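As a small illustration of standardization, the sketch below converts two differently reported safety figures to a common basis of incidents per 200,000 hours worked, a base commonly used for workplace incidence rates. The company figures and function names are hypothetical.

```python
BASE_HOURS = 200_000  # common base for incidence rates (about 100 full-time workers per year)

def rate_from_count(incidents, hours_worked):
    """Convert a raw incident count into incidents per BASE_HOURS worked."""
    return incidents * BASE_HOURS / hours_worked

def rate_from_per_million(rate_per_million_hours):
    """Convert a rate quoted per 1,000,000 hours into a rate per BASE_HOURS."""
    return rate_per_million_hours * BASE_HOURS / 1_000_000

# Hypothetical disclosures: Company A reports a raw count and total hours,
# Company B reports a frequency rate per million hours worked.
company_a = rate_from_count(incidents=12, hours_worked=1_200_000)
company_b = rate_from_per_million(rate_per_million_hours=10.0)

print(company_a, company_b)
```

Both hypothetical companies turn out to have the same standardized rate of 2.0, even though the figures they report (12 versus 10) are not comparable as disclosed. Note that a raw count can only be standardized if hours worked, or a similar exposure measure, is also available.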
Consistency. To improve the performance of models, a ratings provider might make retroactive adjustments to historical data. For example, the data included in a model five years ago might not be the same as the data in the model today for that same year. Data changes are made to improve the accuracy of models, as new or better data is made available. However, they have the effect of making a model look more predictive than it was. Revising past data based on observed subsequent outcomes can invalidate the results from back testing. This is an important concern when evaluating the predictability and validity of commercial ESG ratings.
The impact of routine methodological choices such as these can be seen in the example of Refinitiv. Berg, Fabisik, and Sautner (2021) show that methodological changes adopted by Refinitiv in 2020 resulted in major changes to both current and historical ratings. Median scores were 18 percent lower with rewritten changes, with 44 percent and 16 percent swings in E and S scores, respectively. These revisions also changed the predictive results of the ratings. Stocks with high ESG scores outperformed in the rewritten data but not in the original data. They observe that “data rewriting is an ongoing rather than a one-off phenomenon,” no doubt reflective of firms working to improve the usefulness of their data.
Having reviewed the objectives and methodological choices of ESG firms, we can better understand the research evidence regarding ESG ratings quality, consistency, and effectiveness. Unfortunately, it is rare for ratings providers to offer concrete, systematic evidence to back up claims about their ratings.
Practitioners profess a lack of understanding about the methodologies and reliability of ESG ratings. The Alternative Investment Management Association (AIMA), which represents alternative investment managers globally, reports that its members “have experienced challenges in terms of understanding and validating the approaches used by different ratings providers.” The European Securities and Markets Authority describes the market for ESG ratings as “immature,” based on its structure and dispersion of methodologies. A 2020 study of institutional investors uncovers widespread concerns, including inaccuracy and inconsistency of data, inexperienced research analysts, and a perception that ESG quality cannot be distilled to a score.
Systemic patterns are observed in ESG ratings. One pattern is related to company size: Large companies receive higher average ratings than smaller companies. This might be due to the more significant resources large firms are able to invest in ESG initiatives, or it might be due to the fact that large companies have greater disclosure of ESG data. A second pattern is industry-related: While some ESG ratings are industry-adjusted, those that are not may have higher average scores for certain industries (such as banks and wireless communications) than for others (such as tobacco and gaming). It is not clear if these patterns are due to fundamental differences in ESG quality across industries, or a result of the methodological choices and input variables that underpin ESG ratings models. A third pattern is country-related: European companies have higher average ESG scores than U.S. companies, which might be due to political and regulatory differences across countries. Firms in emerging markets also have lower ratings than firms in more developed economies.
Research also demonstrates an upward drift in ESG ratings over time. D.E. Shaw (2022) analyzes the aggregate ESG scores for all Russell 1000 companies as calculated by MSCI between January 2015 and December 2021. They find an 18 percent aggregate improvement over the measurement period. Structural changes account for 6 percentage points of this improvement. These include:
Adjusting for these structural changes, D.E. Shaw still finds that MSCI ratings are subject to an aggregate adjusted improvement of 12 percent (which the report describes as “grade inflation”). They do not explain the reason for this improvement.
Studies find low correlations across ESG ratings providers. This is perhaps surprising if ESG ratings are supposed to measure the same construct.
CFA Institute (2021) finds correlations across the major providers ranging from 0.65 (between S&P Global and Sustainalytics) to 0.14 (between ISS and S&P Global). Dimson, Marsh, and Staunton (2020) find not only that ESG ratings vary across providers but also that the individual components (E, S, and G) vary widely. For example, assessments of the E, S, and G components as determined by MSCI and Sustainalytics exhibit correlations of only 0.11, 0.18, and -0.02, respectively. This suggests either they are measuring unrelated constructs or they have significant measurement error in measuring the same construct (see Exhibit 4).
Sources: Kevin Prall, “ESG Ratings: Navigating Through the Haze,” blog posting at CFA Institute (August 10, 2021); Florian Berg, Julian F. Kölbel, and Roberto Rigobon, “Aggregate Confusion: The Divergence of ESG Ratings,” Review of Finance (2022).
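For readers who want to run this kind of comparison themselves, the pairwise correlations discussed above can be computed as in the following sketch. The provider names and ratings are made up for illustration, and a plain Pearson correlation is assumed; the cited studies may use other measures.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical ratings of the same five companies by three providers.
ratings = {
    "Provider A": [1, 2, 3, 4, 5],
    "Provider B": [2, 4, 6, 8, 10],  # same ordering as A, different scale
    "Provider C": [5, 3, 4, 1, 2],   # largely reversed ordering
}

correlations = {
    (p, q): pearson(ratings[p], ratings[q])
    for p, q in combinations(ratings, 2)
}

for (p, q), r in correlations.items():
    print(f"{p} vs {q}: {r:.2f}")
```

A difference in scale alone does not lower the correlation (Providers A and B correlate perfectly), whereas disagreement about the ordering of companies does (Providers A and C).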
Berg, Kölbel, and Rigobon (2022) try to identify reasons why ESG ratings diverge across providers. They deconstruct ratings along three dimensions: scope (the attributes the ratings providers attempt to measure), measurement (the measures used to evaluate the same attributes), and weighting (the weights assigned to attributes in reflection of their relative importance). They find that differences in measurement (56 percent) and scope (38 percent) account for most of the divergence, with weighting differences accounting for just 6 percent of the variance. This illustrates how fundamental the methodological differences are across firms.
Perhaps unexpectedly, Christensen, Serafeim, and Sikochi (2022) find that corporate disclosure does not reduce the divergence of ESG ratings but instead increases it. They explain that “due to the subjective nature of ESG information … higher disclosure would be associated with higher disagreement, as disclosure expands opportunities for different interpretations of information.” This suggests that greater corporate disclosure requirements of environmental and social data might not lead to more consistent ESG ratings. In this way, ESG ratings might be similar to equity analyst ratings, where the rating is ultimately dependent on the interpretation of information rather than its availability.
The divergence of ESG ratings has several implications. One is the potential to confuse investment decisions by giving unreliable information about the ESG quality of firms. Another is that it hampers the disclosure that fund managers make to investors regarding the overall ESG quality of their portfolio. A third is that it reduces the incentive of companies to improve their ESG performance by sending unreliable signals about how their ESG initiatives are assessed by third-party observers.
Studies find that ESG ratings have low associations with environmental and social outcomes.
A review of MSCI ratings conducted by Bloomberg finds that most upgrades occur for what Bloomberg calls “rudimentary business practices” rather than substantive improvements. In justifying 155 upgrades, MSCI cited governance improvements almost half (42 percent) of the time—significantly more than social (32 percent) or environmental (26 percent) improvements. Upgrades were often driven by check-the-box practices, such as conducting an employee survey that might reduce turnover, and rarely for substantial practices, such as an actual reduction in carbon emissions. Half of companies were upgraded for doing nothing—the result of methodological changes.
Raghunandan and Rajgopal (2022) find that companies in ESG portfolios (those with high Sustainalytics ratings) have worse records for compliance with labor and environmental laws relative to companies in non-ESG portfolios during the same period. Companies added to ESG portfolios also do not subsequently improve compliance with labor or environmental regulations.
Gibson, Glossner, Krueger, Matos, and Steffen (2022) find that U.S. firms that join the Principles for Responsible Investment (PRI), which commit a company to incorporate ESG factors into their decision-making processes, earn worse ESG ratings (as assigned by MSCI, Refinitiv, and Sustainalytics) than U.S. firms that do not make this commitment.
The relation between financial performance and ESG ratings is uncertain.
Dunn, Fitzgibbons, and Pomorski (2018) study the risk characteristics of companies based on their ESG ratings (as provided by MSCI). They find that companies with the lowest ratings have volatility that is up to 15 percent higher and betas up to 3 percent higher than stocks with the highest ratings. They also find that ESG scores might be predictive of future risk, although the effects are modest. They conclude that “ESG information may play a role in investment portfolios that goes beyond the ethical considerations and may inform investors about the riskiness of the securities in a way that is complementary to what is captured by traditional statistical risk models.”
Hartzmark and Sussman (2019) examine the relation between fund sustainability and performance (using Sustainability fund ratings). They find that funds with low sustainability ratings perform better than those with high ratings. Bansal, Wu, and Yaron (2022) find that companies with high ESG ratings (by MSCI) perform better during good economic times but worse during bad economic times. Demers, Hendrikse, Joos, and Lev (2021) study the performance of companies at the onset of Covid-19 and find no evidence that ESG ratings predict performance during this unexpected risk event. Lopez-de-Silanes, McCahery, and Pudschedl (2019) examine ESG ratings outside of the U.S.—primarily in European countries, Australia, and Japan. They find that ESG scores of companies domiciled in these countries are not associated with risk-adjusted performance.
Schröder (2007) and Dimson, Marsh, and Staunton (2020) both find that ESG indexes created by ESG ratings firms (such as MSCI and FTSE Russell) exhibit outperformance during their prelaunch periods only to underperform after their launch dates. This suggests that ESG indexes are created through back-testing methods that do not result in a sustainable investment strategy.
Atz, Liu, Bruno, and Van Holt (2021) provide a substantial literature review of over 1,100 primary peer-reviewed papers and 27 meta-analyses on ESG and sustainable investing published between 2015 and 2020. They conclude that “the financial performance of ESG investing has on average been indistinguishable from conventional investing.”
It might be the case that, while the ratings published by any single ratings provider are not predictive of performance, the assessments of multiple providers might be informative when considered in aggregate. To this end, Berg, Kölbel, Pavlova, and Rigobon (2021) attempt to combine the ratings of multiple providers to reduce the “noise” from conflicting assessments. They find some evidence that combining the scores from multiple firms leads to a stronger relationship between ESG and performance.
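The intuition for why combining providers might help can be illustrated with a small simulation. This is a sketch under strong assumptions, not the procedure used by Berg, Kölbel, Pavlova, and Rigobon: each hypothetical provider is assumed to observe the same unobserved "true" ESG quality plus independent noise of equal size, and the ratings are combined by simple averaging.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

rng = random.Random(0)  # fixed seed so the run is repeatable
n_firms, n_providers = 2000, 4

# Assumed data-generating process: rating = true quality + independent noise.
true_quality = [rng.gauss(0, 1) for _ in range(n_firms)]
ratings = [[q + rng.gauss(0, 1) for q in true_quality]
           for _ in range(n_providers)]

# Simple average of the providers' ratings for each firm.
average_rating = [sum(firm) / n_providers for firm in zip(*ratings)]

single_corrs = [pearson(r, true_quality) for r in ratings]
average_corr = pearson(average_rating, true_quality)

print([round(c, 2) for c in single_corrs], round(average_corr, 2))
```

Under these assumptions, the averaged rating tracks the underlying quality more closely than any single provider's rating, because independent errors partly cancel. If providers' errors are correlated, or if they measure different constructs, the benefit of averaging is smaller.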
Several structural features might influence the quality of ESG ratings. These include:
The complete paper is available for download here.
An interesting analysis and comparison of studies on the effectiveness of ESG ratings and their use and impact on sustainability investment. London Stock Exchange Group (LSEG)’s Refinitiv and FTSE Russell businesses are one of the providers your article cites, under the “Who are the Players” subhead. However, the descriptions of our offerings in the ESG space are not correct: Amplifying on and clarifying what you listed, Refinitiv is the rebranded data provider Thomson Reuters Financial & Risk. The rest of that sentence is not pertinent to Refinitiv, its ESG indices and ratings, which indeed were purchased by LSEG in early 2021. Also, the ensuing “Rate the Raters” timeline chart in Exhibit 2 on “Evolution of ESG Ratings Industry – ESG Merger & Acquisition Activity” inaccurately lists Refinitiv’s relationship to Thomson Reuters, as being the latter’s “parent…” Actually, LSEG is Refinitiv’s parent. Thanks for registering this amplification and correction. Mark D. Harrop, Sr Associate External Communications, London Stock Exchange Group Data & Analytics