===== Evaluation =====
=== Tools ===
  * 2021-04-29 | [[https://github.com/princeton-nlp/metric-wsd|MetricWSD]] - Non-Parametric Few-Shot Learning for Word Sense Disambiguation
  * 2021-02-18 | [[https://github.com/cl-tohoku/PheMT/tree/main/eval_tools|PheMT evaluation toolkit]] - evaluation dataset for Japanese-English [[機械翻訳|machine translation]], organized by linguistic phenomenon
  * 2019-10-17 | [[https://github.com/gcunhase/NLPMetrics|Natural Language Processing Performance Metrics]]
  * 2017-11-02 | [[https://github.com/borgr/gec-ranking|Ground Truth for Grammatical Error Correction Metrics]] - Python implementation of the GLEU metric


=== Articles ===
  * 2022-06-01 | [[https://blog.shinonome.io/huggingface-evaluate/|Machine Learning: Trying out "Evaluate", Hugging Face's library for computing evaluation metrics]]
  * 2022-02-25 | [[https://ai-scholar.tech/articles/natural-language-processing/mauve|Modeling the human-likeness and interestingness of generated text with high accuracy: MAUVE]]
  * 2021-12-18 | [[https://gotutiyan.hatenablog.com/entry/2021/12/18/123008|Evaluation methods used for purposes other than evaluation]]
  * 2020-12-15 | [[https://stop-the-world.hatenablog.com/entry/cs276-information-retrieval-15|Information Retrieval and Web Search summary (15): Evaluation (2)]]
  * 2020-12-14 | [[https://stop-the-world.hatenablog.com/entry/cs276-information-retrieval-14|Information Retrieval and Web Search summary (14): Evaluation (1)]]
  * 2020-06-03 | [[https://webbigdata.jp/ai/post-5978|BLEURT: Evaluating the quality of AI-generated text (1/3)]]
  * 2020-05-15 | (slides) [[https://speakerdeck.com/cfiken/15-nlpaper-dot-challenge-bertying-yong-mian-qiang-hui-tekisutosheng-cheng-falseping-jia-x-bert|[2020/05/15] nlpaper.challenge BERT applications study group: Evaluating text generation × BERT]]
  * 2020-01-12 | [[https://qiita.com/amtsyh/items/a926b79b90dfabe895e9|On automatic evaluation metrics for text generation]]