自然言語処理の餅屋

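History of Natural Language Processing

This page is a chronological timeline of the history of natural language processing in Japan and worldwide, compiled from various sources. I apologize for any errors; if you point them out, I will correct them.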

1940s

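March 1947: Warren Weaver of the Rockefeller Foundation mentions the possibility of machine translation in a letter to an acquaintance
- He thought that codebreaking techniques might make it possible to recognize the basic parts of every language in the world.
- This is regarded as the beginning of machine translation (and of natural language processing) worldwide. Some sources, including Nagao's book listed under the main references, give 1946 as the starting point, but John Hutchins concludes that there is no evidence for the 1946 claim and that 1947 is the appropriate date.
1948: Claude Elwood Shannon uses n-grams to compute approximations of English word sequences
- Claude Elwood Shannon. A mathematical theory of communication. Bell System Technical Journal, Vol.27, No.3, pp.379–423. 1948.
- The idea behind n-grams was itself proposed by Andrey Andreyevich Markov in 1913 (the so-called Markov chain).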
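As a rough illustration of the word n-gram idea in the 1948 entry, the sketch below counts word bigrams in a toy text and estimates the probability of the next word by relative frequency. The function names and the sample sentence are hypothetical and chosen only for illustration; this is not Shannon's procedure as published.

```python
from collections import Counter, defaultdict


def bigram_counts(tokens):
    """Count adjacent word pairs (bigrams) in a token list."""
    return Counter(zip(tokens, tokens[1:]))


def next_word_probs(tokens):
    """Estimate P(next | current) from bigram relative frequencies."""
    following = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        following[current][nxt] += 1
    return {
        current: {w: c / sum(counter.values()) for w, c in counter.items()}
        for current, counter in following.items()
    }


# Toy text, made up for this example.
tokens = "the cat sat on the mat and the cat slept".split()
probs = next_word_probs(tokens)
print(probs["the"])
```

Sampling repeatedly from such a table, one word at a time, yields a word chain in which each word depends only on the word before it.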
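July 1949: Weaver writes a memorandum titled Translation (below), which is distributed to leading researchers in the United States. This prompts the world's first machine translation research, at the University of Washington, UCLA, and MIT.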
- Warren Weaver. Translation. 1949.

1950s

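1952: The first academic conference on machine translation is held
1954: Results of the joint research by Georgetown University and IBM are announced
- Machine translation from Russian into English, using 250 words and 6 syntactic rules.
1955: Machine translation research begins in the United Kingdom, France, Italy, and the Soviet Union
1957: Machine translation research (mutual translation among Japanese, English, and German) begins at Kyushu University
- The system was named Kyusyu Translator-1 (KT-1) and was completed in 1960.
Around 1957: Machine translation research begins at the Electrotechnical Laboratory of the Ministry of International Trade and Industry (MITI)
1958: Hans Peter Luhn publishes a paper on summarizing text (by extracting important sentences) using term frequency (TF)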
- “the frequency of word occurrence in an article furnishes a useful measurement of word significance”
- Hans Peter Luhn. The Automatic Creation of Literature Abstracts. IBM Journal of Research and Development, Vol.2, No.2, pp.159–165. 1958.
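February 1959: "Yamato," Japan's first English-to-Japanese translation machine, built by the Electrotechnical Laboratory, is completed
- Translation example 1: This is the book which is mine. → コレガホン（ソレガワレノモノダ）ダ.
- Translation example 2: The computer does not forget whatever he learned. → computerガ（カレガマナビタモノハナンデモ）ヲワスレナイ.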

1960s

1962: The Association for Machine Translation and Computational Linguistics (AMTCL), the world's first academic society for natural language processing, is founded
- Renamed the Association for Computational Linguistics (ACL) in 1968.
1963: The Annual Meeting of the Association for Computational Linguistics (ACL) is held for the first time
1964: Mosteller and Wallace perform text classification using Bayesian inference
- Mosteller, F. and D. L. Wallace. Inference and Disputed Authorship: The Federalist. 1964. 2nd edition: Applied Bayesian and Classical Inference. Springer-Verlag. 1984.
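1964: Toshihiko Kurihara (栗原俊彦) and colleagues at Kyushu University file a patent application for a kana-kanji conversion method
- This patent is said to be the origin of today's kana-kanji conversion.
1965: The International Conference on Computational Linguistics (COLING) is held for the first time
August 1965: The Automatic Language Processing Advisory Committee (ALPAC) submits its report on machine translation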
- ALPAC. Language and Machines: Computers in Translation and Linguistics. National Academy of Sciences. 1966.
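1966: The dialogue system ELIZA is announced
1967: Kurosaki (黒崎悦明) and colleagues at Oki Electric build a prototype kana-kanji conversion system
1968: SYSTRAN, one of the world's oldest machine translation companies, is founded; it develops the commercial machine translation system SYSTRAN, which is adopted by the US government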

1970s

1972: Karen Spärck Jones proposes the idea of inverse document frequency (IDF)
- “The exhaustivity of a document description is the number of terms it contains, and the specificity of a term is the number of documents to which it pertains”
- Karen Spärck Jones. A Statistical Interpretation of Term Specificity and Its Application in Retrieval. Journal of Documentation. Vol.28, No.1, pp.11–21. 1972.
1973: Salton and Yang, in the paper below, propose TF-IDF, which combines Luhn's TF with Spärck Jones's IDF
- Gerard Salton and C.S. Yang. On the Specification of Term Values in Automatic Indexing. Journal of Documentation. Vol.29, No.4. 1973.
- However, the first work to actually use the name TF･IDF (as written in the original) was not the paper above but Salton et al. Contribution to the Theory of Indexing. Technical Report TR73-188, Cornell University. 1973.
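To make the combination of TF and IDF concrete, here is a minimal sketch that weights each term by its raw count in a document multiplied by log(N / document frequency). This is one common textbook variant, assumed here for illustration; it is not necessarily the exact weighting used by Salton and Yang, and the function name and toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Weight = raw term count (TF) * log(N / document frequency) (IDF).
    The exact formula is an assumption; many variants exist.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / doc_freq[t]) for t in tf})
    return weights


# Toy documents, made up for this example.
docs = [
    "machine translation of russian".split(),
    "machine translation of english".split(),
    "kana kanji conversion".split(),
]
weights = tf_idf(docs)
print(weights[0])
```

In this toy collection, "russian" appears in only one document and therefore receives a higher weight than "machine", which appears in two.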
1975: Cornelis Joost van Rijsbergen proposes the F-measure, a single measure that combines recall and precision
- van Rijsbergen, C. J. 1975. Information Retrieval. Butterworths.
- Strictly speaking, what he proposed was not F but E (effectiveness), a value equal to 1-F. For the discussion around this point, see this paper.
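The relation between F and E can be sketched as follows, assuming the now-standard weighted form F = (1 + β²)PR / (β²P + R) and E = 1 - F. The function names and the `beta` parameter are hypothetical and chosen for illustration; van Rijsbergen's own notation differs.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (assumed standard form)."""
    b2 = beta * beta
    denominator = b2 * precision + recall
    if denominator == 0.0:
        return 0.0  # no meaningful score when both inputs are zero
    return (1 + b2) * precision * recall / denominator


def e_measure(precision, recall, beta=1.0):
    """Effectiveness measure E, taken here as 1 - F."""
    return 1.0 - f_measure(precision, recall, beta)


print(f_measure(0.5, 1.0), e_measure(0.5, 1.0))
```

Lower E is better, whereas higher F is better, which is why the two are easy to confuse when reading older literature.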
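1975: The Special Interest Group on Computational Linguistics of the Information Processing Society of Japan (IPSJ) is founded
- Renamed the Special Interest Group on Natural Language Processing (SIG-NL) in 1981.
1977: Sharp shows a prototype Japanese word processor based on kana-kanji conversion as a reference exhibit at the Business Show
1979: Toshiba releases the JW-10, a Japanese word processor based on kana-kanji conversion, priced at 6.3 million yen.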

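1980s

1990s

1993: The Japanese morphological analysis system JUMAN Version 1.0 is released
- The first version was Version 0.6, dated February 17, 1992.
April 1, 1994: The Association for Natural Language Processing (言語処理学会) is founded
1996: Google launches its search service
- At launch it was called BackRub; the google.com domain was registered in 1997.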

2000s

2002: BLEU, an automatic evaluation metric for machine translation, is proposed
- Papineni, K.; Roukos, S.; Ward, T.; Zhu, W. J. BLEU: a method for automatic evaluation of machine translation. ACL-2002: 40th Annual meeting of the Association for Computational Linguistics. pp. 311–318. 2002.
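2006: Google launches the Google Translate service
March 26, 2006: MeCab 0.90, the first version of MeCab, is released
2009: Japan's Copyright Act is amended to state explicitly that collecting, organizing, and analyzing information for search engines and displaying search results (Article 47-5), as well as reproduction for information analysis research (Article 30-4), are permitted without the copyright holder's permission.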

2010s

February 2011: Watson, a question answering system developed by IBM, competes against human contestants on the quiz show Jeopardy! and wins
January 2013: Google announces Word2Vec
- Tomas Mikolov et al. Efficient Estimation of Word Representations in Vector Space. 1st International Conference on Learning Representations(ICLR). 2013.
2017: Google announces the Transformer
- Ashish Vaswani et al. Attention is all you need. Advances in neural information processing systems (NeurIPS 2017), Vol.30. 2017.
October 2018: Google announces BERT
- Jacob Devlin et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proc. of NAACL-HLT 2019. Vol.1, pp.4171–4186. 2019.

2020s

2020: OpenAI announces GPT-3
- Tom B. Brown et al. Language Models are Few-Shot Learners. Advances in neural information processing systems (NeurIPS 2020), Vol.33. 2020.
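April 7, 2023: Taro Kono, Minister for Digital Transformation and Minister in charge of Civil Service Reform, says in an answer at the House of Representatives Cabinet Committee that he would like to "actively consider" the use of AI such as ChatGPT in government.
- Minister Kono says he would like to "actively consider" the use of AI such as ChatGPT, in an answer concerning work-style reform in Kasumigaseki (河野大臣、ChatGPTなどのAI活用は「積極的に考えていきたい」　霞が関の働き方改革巡り答弁)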

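Main references

Makoto Nagao. 機械翻訳はどこまで可能か (How Far Can Machine Translation Go?). Iwanami Shoten. 1986.
Hiroshi Nakagawa and Tatsunori Mori. 自然言語処理研究会 (Special Interest Group on Natural Language Processing). 情報処理 (IPSJ Magazine), Vol.48, No.8, pp.924-925. Information Processing Society of Japan. 2007.
IPSJ Computer Museum. 日本語ワードプロセッサ誕生と発展の歴史 (History of the Birth and Development of the Japanese Word Processor).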
