Readability of MD&A extracted from iXBRL: Computational linguistic approach∗

Yoshitaka Hirose†

Hirohisa Hirai‡

Kohei Arai§

May 27, 2017, Ver.1.0

Abstract
This paper clarifies the determinants of the readability of the Management Discussion and Analysis (MD&A) sections of Japanese companies' annual reports, extracted from Inline Extensible Business Reporting Language (iXBRL) filings. Previous studies have focused on English-language information, and no study has discussed the characteristics of Japanese MD&A using large-sample data. We therefore extracted the MD&A text of Japanese companies from iXBRL and analysed its readability using text mining. We found that 1) companies with a large market value at the end of the term and older companies have low readability, 2) companies with a large market value at the end of the term and companies with many foreign segments have more characters, and 3) older companies have fewer characters. Further, MD&A in Japan had greater readability than comparable United States documents. Our results suggest that firms with asymmetric information use simpler words for shareholders and, further, are conscious of shareholders who are not proficient in Japanese. The academic contribution of this paper is to show the usefulness of iXBRL and to assess the readability of Japanese MD&A using large-sample data through a computational linguistic approach. In addition, this research compares the results of Li (2008), which targeted the United States, with the results for Japan.
Keywords: iXBRL, MD&A, readability, textual analysis
JEL Codes: M41, M48, D82, G14, M15



Corresponding author: Yoshitaka Hirose ([email protected])
Takasaki University of Commerce Junior College

‡ Kanagawa University
§ Gunma University


Electronic copy available at: https://ssrn.com/abstract=2975765

1 Introduction

This paper aims to clarify the characteristics of the text of the MD&A (Management Discussion and Analysis) section in Japanese annual reports. The MD&A is disclosed in the “Management Discussion and Analysis on Financial Condition and Results of Operations” section of annual reports. The choice and coverage of content is at the discretion of management, and some reports include future projections or plans. Although disclosure was institutionalised in 2003, there is limited research on its contents. Li (2008) conducted a typical empirical study of MD&A, which revealed that the readability of the MD&A section of the annual report can be used to predict future performance. According to Li (2008), the components of readability are difficulty and length. Specifically, the article used the Fog Index to measure the difficulty level and also measured the length of the text. The paper found that 1) companies with low profit margins prepare less readable annual reports, and 2) companies that disclose easy-to-read annual reports have more persistent earnings. Since Li (2008), readability has continued to be used to assess the qualitative disclosure quality of financial statements (Lang and Stice-Lawrence, 2015; Lee, 2012; Lawrence, 2013 etc.). Previous studies have targeted English-language information, and no study has yet discussed the characteristics of Japanese MD&A using large-sample data. Recently, Shibasaki and Tamaoka (2010) developed a judgment model of Japanese difficulty (grade), which supports a similar investigation of large samples of MD&A from Japanese companies. Thus, with reference to Li (2008), this paper conducts replication tests on Japanese MD&A. The academic contribution of this paper is to show the usefulness of Inline Extensible Business Reporting Language (iXBRL) and to assess the readability of Japanese MD&A using large-sample data via a computational linguistic approach. In addition, we compare the results of Li (2008), which targeted the United States (US), with the results for Japan.
The remainder of this paper is organised as follows. Section 2 reviews the literature on MD&A. Section 3 sets up the hypotheses on the readability of Japanese MD&A and presents the empirical model used to verify them. Section 4 explains the data and sample selection used in this paper. Section 5 presents the descriptive statistics and verifies the readability features of Japanese MD&A according to Li (2008). Finally, Section 6 summarises and discusses future directions.

2 Literature Review

Disclosure of MD&A was introduced in the US in 1968, with disclosure items specified in Regulation S-K and further specified in Item 303 in 1980 and in Form 10-K in 1982. Following the Sarbanes-Oxley (SOX) Act of 2002, further MD&A disclosure items were added, giving the form in use today. In contrast, disclosure of MD&A in Japan has a shorter history, with its institutionalisation occurring when regulations were revised in 2003. Bryan (1997), a pioneering study on MD&A, used a sample of 250 US companies in 1990 and reported three findings: 1) MD&A is significantly correlated with short-term future performance, 2) MD&A is significantly related to analysts’ revisions of performance forecasts, and 3) capital expenditure predictions are significantly related to current and future stock returns. Studies on MD&A since Bryan (1997) include Cole and Jones (2004) and Sun (2010). Cole and Jones (2004) analysed the MD&A of retailers during the three years from 1996 to 1999. They showed the usefulness of the explanations given for changes in revenue (growth of sales at existing stores etc.) and of information on future capital plans (store opening / closing plans etc.). These variables were shown to be related to future revenues, future profits, and stock returns. Sun (2010) analysed MD&A disclosures on inventories for a manufacturing-industry sample of 568, and found that the explanation given for excessive increases in inventory is positively related to profitability. These studies generally analysed specific industries and small samples. In contrast, Li (2008) conducted a typical empirical study of MD&A on a large sample, and discovered that the readability of the MD&A section of the annual report could be used to predict future performance. Li (2008) considered two components of readability: difficulty and length. Specifically, the article used the Fog Index to measure the difficulty level and also measured the length of the text. The study found that 1) companies with low profit margins prepare annual reports with low readability, and 2) companies that disclose easy-to-read annual reports have more persistent earnings. The result supports the hypothesis that managers create complicated annual reports to hide current low performance. In other words, it suggests that managers obfuscate information that is disadvantageous to investors and create opportunistic annual reports. Li (2008) made three contributions. First, the study expanded strategic disclosure research by analysing large-sample data on readability and showed that such data can be valuable in verifying whether readability is related to profit and profit sustainability. Second, it showed that more complicated annual reports represent low-quality disclosure that increases investors' information processing costs. Third, it showed that the quality of disclosure as text information is related to the sustainability of profits. Since Li (2008), readability has continued to be used to assess qualitative disclosure quality in financial statements (Lang and Stice-Lawrence, 2015; Lee, 2012; Lawrence, 2013 etc.). However, these previous studies obtained text data from databases such as Compustat and Osiris. In contrast, in our research, text data were obtained from iXBRL.

3 Research design

In this section, we present hypotheses following Li (2008) regarding the readability of MD&A. For that purpose, we first review previous studies on the two components of readability, difficulty and length, and then describe the resulting hypotheses.


3.1 Readability: Difficulty and Length

In this paper, the difficulty level of the MD&A included in annual reports of Japanese companies is measured by a grade judgment formula used in computational linguistics. For English, a grade judgment formula called the Fog Index, as used in Li (2008), is a common tool for measuring the difficulty of a text. Since our focus is Japanese companies, we look to Shibasaki and Tamaoka (2010), who developed a grade determination formula based on Japanese language textbooks. Specifically, the authors conducted a multiple regression analysis with various candidate predictors of Japanese difficulty as the independent variables and the grade of the textbook as the dependent variable. Then, after variable selection using a stepwise method, the authors argued that two variables, the proportion of hiragana in the whole text and the average number of predicates per sentence, which we discuss in more detail below, are effective as independent variables. Equation (1) below is the grade determination formula obtained as a regression equation. In the formula, Grade represents the grade or difficulty, which corresponds to a school reading level. For example, a grade of 1 corresponds to the first grade of elementary school, and a grade of 9 corresponds to the third grade of junior high school. X1 represents the proportion of hiragana characters in the whole text (in %) and X2 represents the average number of predicates per sentence:

$$\mathrm{Grade} = -0.145\,X_1 + 0.587\,X_2 + 14.016 \tag{1}$$

Regarding X1, Japanese is written with four character types: kanji, hiragana, katakana, and romaji. According to Shibasaki and Tamaoka (2010), the presence of four kinds of characters in one language is a feature not found in other languages. Therefore, they assumed that the proportion of character types is one of the variables that determines the difficulty level. Their analysis of 205 major texts compared with a standard textbook of Japanese language indicated that the share of hiragana decreases and the share of kanji increases as the grade increases. Of the two, the hiragana share is included in the grade determination formula. That is, the proportion of character types is effective for measuring the difficulty level.
Regarding X2, Shibasaki and Tamaoka (2010) also focused on the complexity of the grammatical structure. Specifically, sentences are more difficult when they contain more than one predicate. In other words, the number of predicates per sentence is an indication of the complexity of the grammatical structure.
Next, we consider the readability component Length, measured by the total number of characters in the MD&A. A document with a large total number of characters is considered difficult to read because the reader’s information processing cost is high. In this paper, similar to Li (2008), the length of the MD&A in the annual report is defined as:

$$\mathrm{Length} = \log(\mathit{NCharacters}) \tag{2}$$

As in Li (2008), because the distribution of the number of characters is skewed, we use logarithmically transformed values for analysis. In contrast to Li (2008), we count characters rather than words, as this better accounts for the characteristics of the Japanese language.
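As an illustration of equations (1) and (2), the following is a minimal sketch rather than the procedure actually used in this paper. The function names are hypothetical. It assumes that the hiragana share is taken over all non-whitespace characters (the paper does not state how punctuation or whitespace is treated), and it takes the average number of predicates per sentence as an input, because counting predicates requires a Japanese morphological analyser that is not shown here.

```python
import math


def hiragana_ratio_percent(text: str) -> float:
    """Share of hiragana among non-whitespace characters, in percent.

    Assumption: the denominator is every non-whitespace character.
    Hiragana letters occupy U+3041..U+3096 in Unicode.
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        raise ValueError("text contains no characters")
    hiragana = sum(1 for c in chars if "\u3041" <= c <= "\u3096")
    return 100.0 * hiragana / len(chars)


def grade(hiragana_pct: float, predicates_per_sentence: float) -> float:
    """Equation (1): X1 is the hiragana share (%), X2 the predicates per sentence."""
    return -0.145 * hiragana_pct + 0.587 * predicates_per_sentence + 14.016


def length(n_characters: int) -> float:
    """Equation (2), using the natural logarithm as stated in the table notes."""
    return math.log(n_characters)


if __name__ == "__main__":
    # Evaluate equation (1) at the sample means reported in Table 2.
    print(round(grade(31.82, 2.40), 2))
```

Evaluated at the sample means in Table 2 (HiraganaRate 31.82, PredicateRate 2.40), equation (1) gives approximately 10.81, which agrees with the reported mean Grade.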

3.2 Set hypothesis: Determining factors of readability

To support comparison, we adopt the same starting assumption as Li (2008): the determinants of readability reflect non-strategic disclosure, and managers do not manipulate readability strategically. Next, in accordance with Li (2008), we set up hypotheses about the following variables that are considered determinants of readability. The variables are the year-end market value after logarithmic transformation (SIZE), market-to-book value ratio (MTB), years since listing (AGE), special item profit / total assets (SI_P), special item loss / total assets (SI_N), number of business segments (NBSEG), number of geographical segments (NGSEG), number of foreign sales segments (NFSEG), earnings volatility (EARN_VOL), and financial complexity (NITEMS).
Company size can be thought of as a proxy variable for many aspects of a company's business activities and business environment. For example, Watts and Zimmerman (1986) used company size as a proxy variable for the political cost of a company. Management will disclose more information in the annual report as the scale of the company grows. Therefore, hypothesis 1 is set using SIZE as a determinant of the readability of the annual report; SIZE is the logarithm of the year-end market value:
Hypothesis 1 The size of a company is inversely correlated with readability. That is, larger
companies have lower readability of MD&A than smaller companies.
Companies with high MTB differ from those with low MTB in many ways (e.g. investment opportunities and growth potential). Growing companies with high MTB have a more complex and uncertain business model, leading to hypothesis 2, where MTB is (market value of equity + book value of liabilities) / (book value of total assets):
Hypothesis 2 MTB is inversely correlated with readability. That is, companies with a high
market to book value ratio have lower readability of MD&A than companies with low market
to book value ratio.
Established companies with high AGE have been making disclosures to the stock market over a long period. Therefore, such companies have low information asymmetry and low uncertainty of information, and the readability of their annual reports is expected to be good. If investors already have correct information on the business model of a company that has been listed for a long time, the company may disclose a simple and readable annual report. Therefore, hypothesis 3 is set:
Hypothesis 3 AGE is positively correlated with readability. That is, older companies have more readable MD&A than younger companies.
Companies with large extraordinary profit / loss items (SI_P and SI_N) experience more abnormal events. Therefore, such companies have complicated information to disclose in their annual reports, leading to hypothesis 4:
Hypothesis 4 Extraordinary profits and losses are inversely correlated with readability. That is, companies with large extraordinary profits and losses have lower readability of MD&A than companies with small extraordinary profits / losses.

Companies with high NBSEG, NGSEG, and NFSEG are engaged in business diversification, regional development, overseas expansion and so forth. Thus, it is assumed that such companies conduct more complicated business, leading to more complex disclosure in annual reports. Therefore, hypothesis 5 is set:
Hypothesis 5 NBSEG, NGSEG, and NFSEG are inversely correlated with readability. That
is, companies performing complex business have lower readability of MD&A than companies
that do not conduct complicated business.
EARN_VOL is considered a proxy variable for an unstable business environment. Companies are expected to make more complicated disclosures to investors as the uncertainty of the business environment increases, leading to hypothesis 6. EARN_VOL is calculated as the standard deviation of operating profit over the past five years:
Hypothesis 6 Earnings volatility is inversely correlated with readability. That is, companies with unstable businesses have lower readability of MD&A than companies with stable businesses.
NITEMS is a variable representing financial complexity. Firms with complex finances are expected to have more complex disclosures, leading to less readable annual reports. Therefore, hypothesis 7 is set. Financial complexity is measured as follows. Among the additional disclosure items included in Nikkei NEEDS, the number of items (after logarithmic transformation) not voluntarily disclosed is counted. This is because firms reporting many items in financial statements are considered to be financially complicated. In other words, companies with a large value of NITEMS are financially complicated:
Hypothesis 7 Financial complexity is inversely correlated with readability. That is, companies with complex finances have lower readability of MD&A than companies with simpler finances.

In this paper, we analyse the usefulness of MD&A following Li (2008). First, we analyse the difficulty and length of the text to gather information about readability. However, Li (2008) targeted English, whereas we focus on Japanese documents, and language differences are a major issue in text mining. Thus, the formula of Shibasaki and Tamaoka (2010) is used in this paper to make a grade judgment that measures the difficulty of the text. For hypotheses 1 to 7 on the readability of MD&A, the dependent variable is set to difficulty (Grade) or length (Length), where a high value indicates low readability. Verification is carried out by the following equation:
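$$
\begin{aligned}
\text{Grade or Length} = {} & \beta_0 + \beta_1\,\mathit{SIZE} + \beta_2\,\mathit{MTB} + \beta_3\,\mathit{AGE} + \beta_4\,\mathit{SI\_P} + \beta_5\,\mathit{SI\_N} \\
& + \beta_6\,\mathit{NBSEG} + \beta_7\,\mathit{NGSEG} + \beta_8\,\mathit{NFSEG} + \beta_9\,\mathit{EARN\_VOL} + \beta_{10}\,\mathit{NITEMS}
\end{aligned}
\tag{3}
$$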
For verification of the hypothesis, the expected sign of each coefficient is as shown in Table
1.
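To make the estimation step concrete, the following is a minimal sketch of ordinary least squares for an equation of the form of equation (3). It is an assumption that a plain OLS fit is an adequate illustration; the paper does not describe its estimation software, and the function name `ols` is hypothetical. The toy data are made up for the example and are not taken from the paper's sample.

```python
from typing import List, Sequence


def ols(y: Sequence[float], X: Sequence[Sequence[float]]) -> List[float]:
    """Return OLS coefficients [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk.

    Solves the normal equations (Z'Z) b = Z'y by Gauss-Jordan elimination
    with partial pivoting, where Z is X with a leading column of ones.
    """
    Z = [[1.0, *row] for row in X]
    k = len(Z[0])
    # Augmented matrix [Z'Z | Z'y].
    A = [
        [sum(z[i] * z[j] for z in Z) for j in range(k)]
        + [sum(z[i] * yi for z, yi in zip(Z, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("regressors are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


if __name__ == "__main__":
    # Made-up data generated exactly from y = 1 + 2*x1 - 3*x2.
    X_toy = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
    y_toy = [1, 3, -2, 0, 2]
    print([round(b, 6) for b in ols(y_toy, X_toy)])  # approximately [1, 2, -3]
```

In the setting of equation (3), each row of `X` would hold the ten determinants for one company and `y` would hold either Grade or Length. The signs of the estimated coefficients are then compared with the expected signs in Table 1.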

4 Sampling

Text data were obtained from EDINET (Electronic Disclosure for Investors’ NETwork), the website of the Financial Services Agency. Using the “Detailed document search” function, we selected “Search for only the documents containing XBRL” and specified the submitter’s industry so as to exclude Banks, Securities & Commodity Futures, Insurance, and Other Financing Business. Under “Specify Type of document”, we selected “Annual Securities Report”. We chose the 2015 fiscal year as the reporting period to obtain samples corresponding to the taxonomy version of March 31, 2015. This search procedure resulted in 2,547 cases. To obtain the MD&A, we used Perl and R. We first removed all HTML tags from the “.xbrl” files using Perl. Next, R was used to extract the MD&A of companies whose filings have line breaks at the MD&A tags, and Perl was used to extract the MD&A of companies whose filings do not. This gave us 2,265 enterprises with MD&A. This sample included companies listed on the First Section of the Tokyo Stock Exchange (TSE), unlisted companies, and companies listed on the Second Section of the TSE and other stock exchanges.
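The tag-removal step described above can be sketched as follows. This is only an illustration: the paper used Perl and R, whereas this sketch uses the `html.parser` module of the Python standard library, and the names `_TextCollector` and `strip_tags` are hypothetical. It does not locate the MD&A section itself, because the relevant iXBRL element names are not given in the text.

```python
from html.parser import HTMLParser
from typing import List


class _TextCollector(HTMLParser):
    """Collects the text between tags and discards the tags themselves."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: List[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_tags(markup: str) -> str:
    """Return the text content of an HTML/iXBRL fragment with all tags removed."""
    collector = _TextCollector()
    collector.feed(markup)
    collector.close()  # flush any text still buffered by the parser
    return "".join(collector.parts)


if __name__ == "__main__":
    # Hypothetical fragment, not taken from an actual filing.
    print(strip_tags("<p>売上高は<b>増加</b>しました。<br/></p>"))
```

Character references such as `&amp;` are decoded to plain characters. The content of any script or style elements would be kept as text, so a production version would need to filter those out.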
Financial data for the past five years were obtained from Nikkei NEEDS-Financial Quest 2.0, compiled by Nikkei, Inc., and used to calculate EARN_VOL. Furthermore, in order to ensure comparability, we restricted the sample to companies with fiscal years ending March 31 that adopt the Japanese Generally Accepted Accounting Principles (GAAP). The number of companies listed on the First Section of the TSE that meet these conditions was 1,106. The other 1,159 companies were excluded from the analysis.

5 Results

Table 2 shows the descriptive statistics. The mean of Grade, the estimated school grade corresponding to the difficulty of the Japanese text, is 10.81. Li (2008) reported that the mean Fog Index in his US sample was 18.23, a level classified as unreadable. Although the two measures are not directly comparable because of linguistic differences, the results suggest that Japanese annual reports are easier to read on average than US annual reports. In our paper, the length of the MD&A is measured in characters, not in words as in Li (2008). Li (2008) reported that the mean number of words in the MD&A section was 4,665. Compared with our finding that the mean number of characters in the MD&A is 2,023, the Japanese MD&As are comparatively brief. In summary, Japanese MD&As have greater readability than those from the US.
Table 3 shows the results of the regression analysis of equation (3). Column [1] reports the determinants of MD&A Grade; columns [2] and [3] report the determinants of the two factors that make up MD&A Grade, PredicateRate and HiraganaRate, respectively. SIZE and AGE are significantly related to MD&A Grade through their effect on HiraganaRate. As hypothesised, SIZE raises the difficulty of the text. However, contrary to our expectation, AGE also has a positive relationship with MD&A Grade. This suggests that firms with asymmetric information to present to shareholders choose plain words for explanation. It is plausible that the MD&As of older firms are easy for shareholders to understand even when they use difficult words written in kanji. Column [4] reports the determinants of MD&A Length: SIZE, AGE and NFSEG are significant. As hypothesised, SIZE increases the length of the MD&A, whereas AGE decreases it. The result for AGE is consistent with the finding above because, if the information content is the same, using kanji shortens the text. As for NFSEG, the result differed from the hypothesis; it seems that companies are conscious of shareholders who are not proficient in Japanese.
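The significance statements above can be cross-checked against Table 3 by dividing each coefficient by its standard error. The sketch below is illustrative only: the function name `stars` is hypothetical, and it assumes two-sided tests with standard normal critical values (1.645, 1.960, 2.576), which is a close approximation with 1,065 degrees of freedom. The paper does not state which test statistic it used.

```python
def stars(coef: float, se: float) -> str:
    """Return the significance marker implied by the ratio coef / se.

    Assumption: two-sided test with normal critical values for the
    0.10, 0.05 and 0.01 levels.
    """
    t = abs(coef / se)
    if t >= 2.576:
        return "***"
    if t >= 1.960:
        return "**"
    if t >= 1.645:
        return "*"
    return ""


if __name__ == "__main__":
    # (coefficient, standard error) pairs from column [1] of Table 3.
    grade_column = {
        "SIZE": (0.035, 0.015),
        "AGE": (0.004, 0.001),
        "NGSEG": (-0.052, 0.030),
        "SI_P": (1.927, 1.185),
    }
    for name, (coef, se) in grade_column.items():
        print(name, round(coef / se, 2), stars(coef, se))
```

Because the table reports rounded coefficients and standard errors, a ratio that falls close to a critical value may receive a different marker from the one printed in the table.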

6 Conclusion

In this paper, we analysed the determinants of the readability of MD&A. The paper makes two main contributions. First, we presented descriptive statistics of the MD&A in Japanese annual reports. The MD&A of Japanese companies tended to be more readable than that of US companies. Second, by analysing the difficulty level and length of the text, we specified the determinants of readability. Specifically, we found that 1) companies with a large market value at the end of the term and older companies have low readability, 2) companies with a large market value at the end of the term and companies with many foreign segments have more characters, and 3) older companies have fewer characters. These results suggest that the MD&A of Japanese companies, obtained from sources such as iXBRL, is a valuable source of data for empirical research. The focus on and clarification of readability in Japanese documents is an important contribution of this research. Future research should explore the differences in interpretation between Li (2008) and the present study in greater detail. Since it is necessary to discuss whether the Japan-specific phenomenon of easier-to-read MD&A with fewer characters is desirable for disclosure, future research should also discuss the disclosure system. The readability and short length of MD&A suggest that it is becoming increasingly important to provide easy-to-understand information to external stakeholders such as investors. On the other hand, since the same results can also be interpreted as showing that MD&A disclosure is becoming less important, further analysis is necessary.

References
[1] Bryan, S. H. (1997). Incremental information content of required disclosures contained in management discussion and analysis, The Accounting Review, Vol. 72, No. 2, pp. 285-301.
[2] Cole, C. J. and Jones, C. L. (2004). The Usefulness of MD&A Disclosures in the Retail Industry, Journal of Accounting, Auditing & Finance, Vol. 19, No. 4, pp. 361-388.
[3] Lang, M. and Stice-Lawrence, L. (2015). Textual analysis and international financial reporting: Large sample evidence, Journal of Accounting and Economics, Vol. 60, No. 2-3, pp. 110-135.
[4] Lawrence, A. (2013). Individual investors and financial disclosure, Journal of Accounting and Economics, Vol. 56, No. 1, pp. 130-147.
[5] Lee, Y. J. (2012). The Effect of Quarterly Report Readability on Information Efficiency of Stock Prices, Contemporary Accounting Research, Vol. 29, No. 4, pp. 1137-1170.
[6] Li, F. (2008). Annual report readability, current earnings, and earnings persistence, Journal of Accounting and Economics, Vol. 45, No. 2, pp. 221-247.
[7] Shibasaki, H. and Tamaoka, K. (2010). Constructing a Formula to Predict School Grades 1-9 based on Japanese Language School Textbooks, Japan Journal of Educational Technology, Vol. 33, No. 4, pp. 449-458.
[8] Sun, Y. (2010). Do MD&A disclosures help users interpret disproportionate inventory increases?, The Accounting Review, Vol. 85, No. 4, pp. 1411-1440.
[9] Watts, R. L. and Zimmerman, J. L. (1986). Positive Accounting Theory, Prentice-Hall, Englewood Cliffs, NJ.

Tables
Table 1: Expected coefficient signs for each hypothesis

| Hypothesis | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| Coefficient | β1 | β2 | β3 | β4/β5 | β6/β7/β8 | β9 | β10 |
| Expected sign | + | + | - | +/+ | +/+/+ | + | + |

Table 2: Descriptive Statistics

| Variable | Mean | Std. Dev. | Min | 25th | Median | 75th | Max |
|---|---|---|---|---|---|---|---|
| Grade | 10.81 | 0.56 | 9.54 | 10.44 | 10.75 | 11.16 | 12.32 |
| PredicateRate (%) | 2.40 | 0.60 | 0.90 | 2.00 | 2.39 | 2.77 | 4.00 |
| HiraganaRate (%) | 31.82 | 4.11 | 20.79 | 29.11 | 32.02 | 34.75 | 41.05 |
| Length | 2023 | 1131.9 | 411.8 | 1221.5 | 1761.5 | 2525.8 | 6183.8 |
| SIZE | 24.88 | 1.48 | 21.48 | 23.76 | 24.68 | 25.82 | 29.47 |
| MTB | 696198 | 876928 | 51447 | 329385 | 497780 | 784591 | 13963570 |
| AGE | 39.71 | 20.46 | 1.00 | 19.00 | 44.00 | 58.80 | 65.00 |
| SI_P | 0.01 | 0.02 | 0.00 | 0.00 | 0.00 | 0.01 | 0.30 |
| SI_L | 0.01 | 0.02 | 0.00 | 0.00 | 0.00 | 0.01 | 0.21 |
| NBSEG | 1.55 | 0.81 | 0.00 | 1.66 | 1.95 | 2.08 | 2.71 |
| NGSEG | 0.28 | 0.71 | 0.00 | 0.00 | 0.00 | 0.00 | 2.40 |
| NFSEG | 1.02 | 0.98 | 0.00 | 0.00 | 1.61 | 1.95 | 2.64 |
| EARN_VOL | 2216.9 | 7113.9 | 11.4 | 227 | 565.1 | 1476.9 | 124248.3 |
| NITEMS | 3.10 | 0.13 | 2.57 | 3.00 | 3.09 | 3.18 | 3.53 |
| Earnings | 0.06 | 0.04 | -0.12 | 0.03 | 0.05 | 0.07 | 0.50 |
| PL | 0.97 | 0.18 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 |

The text of the MD&As was obtained from the iXBRL provided by EDINET, operated by the Financial Services Agency of Japan. Grade is an estimate of the equivalent grade in a Japanese language textbook, calculated as 0.587 * PredicateRate - 0.145 * HiraganaRate + 14.016 (Shibasaki and Tamaoka, 2010). PredicateRate is the average number of predicates in a sentence. HiraganaRate is the proportion of hiragana characters in the whole text (%). Length is the natural logarithm of the number of characters in an MD&A. SIZE is the logarithm of the market value of equity. MTB (Market-to-Book) is the market value of the firm divided by its book value. AGE is the number of years since a firm was listed on the Tokyo Stock Exchange. SI_P is extraordinary profit in special items scaled by the book value of assets. SI_L is extraordinary loss in special items scaled by the book value of assets. NBSEG is the logarithm of 1 + the number of business segments. NGSEG is the logarithm of 1 + the number of geographic segments. NFSEG is the logarithm of 1 + the number of foreign geographic segments. EARN_VOL is the standard deviation of the operating earnings in the last five fiscal years. NITEMS is the number of missing voluntary disclosure items. Earnings is ordinary profit scaled by the book value of assets. PL is a dummy variable that equals 1 if a company reports a profit and 0 otherwise. All financial data were obtained from Nikkei NEEDS FinancialQuest.

Table 3: Summary Statistics of the Determinants of Grade and Length

| | [1] Grade | [2] PredicateRate | [3] HiraganaRate | [4] Length |
|---|---|---|---|---|
| SIZE | 0.035** (0.015) | -0.006 (0.016) | -0.263** (0.108) | 0.056*** (0.014) |
| MTB | -0.000 (0.000) | -0.000 (0.000) | -0.000 (0.000) | -0.000 (0.000) |
| AGE | 0.004*** (0.001) | -0.000 (0.001) | -0.029*** (0.007) | -0.005*** (0.001) |
| SI_P | 1.927 (1.185) | -0.137 (1.286) | -12.96 (8.746) | -1.117 (1.158) |
| SI_L | -0.571 (1.065) | 2.077* (1.156) | 12.302 (7.864) | 0.606 (1.041) |
| NBSEG | 0.008 (0.027) | -0.021 (0.029) | -0.159 (0.198) | 0.012 (0.026) |
| NGSEG | -0.052* (0.030) | -0.082** (0.033) | 0.006 (0.223) | -0.013 (0.029) |
| NFSEG | -0.023 (0.023) | 0.042* (0.025) | 0.313* (0.171) | 0.061*** (0.023) |
| EARN_VOL | -0.000 (0.000) | 0.000 (0.000) | 0.000 (0.000) | -0.000 (0.000) |
| NITEMS | 0.103 (0.145) | 0.022 (0.157) | -0.569 (1.067) | -0.003 (0.141) |
| Earnings | 0.074 (0.473) | -0.014 (0.514) | -0.393 (3.496) | -0.255 (0.463) |
| PL | 0.007 (0.108) | 0.067 (0.117) | 0.183 (0.798) | -0.167 (0.106) |
| (Constant) | 9.126*** (0.694) | 2.548*** (0.753) | 43.421*** (5.123) | 7.025*** (0.678) |
| Industrial Dummy | Yes | Yes | Yes | Yes |
| Observations | 1,106 | 1,106 | 1,106 | 1,106 |
| R2 | 0.086 | 0.058 | 0.072 | 0.092 |
| Adjusted R2 | 0.052 | 0.023 | 0.037 | 0.058 |
| Residual Std. Error (d.f. = 1,065) | 0.546 | 0.593 | 4.034 | 0.534 |
| F Statistic (d.f. = 40; 1,065) | 2.510*** | 1.652*** | 2.054*** | 2.706*** |

This table shows the regression results of Grade and Length on the determinants and industry fixed effects. Standard errors are shown in parentheses. ***/**/* indicate significance at the 0.01, 0.05, and 0.10 levels, respectively. The text of the MD&As was obtained from the iXBRL provided by EDINET, operated by the Financial Services Agency of Japan. Grade is an estimate of the equivalent grade in a Japanese language textbook, calculated as 0.587 * PredicateRate - 0.145 * HiraganaRate + 14.016 (Shibasaki and Tamaoka, 2010). PredicateRate is the average number of predicates in a sentence. HiraganaRate is the proportion of hiragana characters in the whole text (%). Length is the natural logarithm of the number of characters in an MD&A. SIZE is the logarithm of the market value of equity. MTB (Market-to-Book) is the market value of the firm divided by its book value. AGE is the number of years since a firm was listed on the Tokyo Stock Exchange. SI_P is extraordinary profit in special items scaled by the book value of assets. SI_L is extraordinary loss in special items scaled by the book value of assets. NBSEG is the logarithm of 1 + the number of business segments. NGSEG is the logarithm of 1 + the number of geographic segments. NFSEG is the logarithm of 1 + the number of foreign geographic segments. EARN_VOL is the standard deviation of the operating earnings in the last five fiscal years. NITEMS is the number of missing voluntary disclosure items. Earnings is ordinary profit scaled by the book value of assets. PL is a dummy variable that equals 1 if a company reports a profit and 0 otherwise. All financial data were obtained from Nikkei NEEDS FinancialQuest.
