This shows you the differences between two versions of the page.
nagaoka_tigrinya_corpus [2021/08/19 22:40] – created admin | snow:nagaoka_tigrinya_corpus [2022/05/12 21:17] – admin | ||
---|---|---|---|
Line 1: | Line 1: | ||
+ | [[: | ||
+ | ~~NOTOC~~ | ||
+ | |||
===== Nagaoka Tigrinya Corpus ===== | ===== Nagaoka Tigrinya Corpus ===== | ||
- | ==== 1. What is a corpus? ==== | + | |
+ | (Note that this page is written by [[: | ||
+ | |||
+ | === 1. What is a corpus? ===
"In linguistics, | "In linguistics, | ||
- | ==== 2. The Nagaoka Tigrinya Corpus 1.0 (NTC 1.0) ==== | + | === 2. The Nagaoka Tigrinya Corpus 1.0 (NTC 1.0) === |
The Nagaoka Tigrinya Corpus is the first publicly available part-of-speech (PoS) tagged corpus of the Tigrinya language. It was compiled at Nagaoka University of Technology. The corpus is a collection of news articles from an Eritrean newspaper called " | The Nagaoka Tigrinya Corpus is the first publicly available part-of-speech (PoS) tagged corpus of the Tigrinya language. It was compiled at Nagaoka University of Technology. The corpus is a collection of news articles from an Eritrean newspaper called " | ||
- | ==== 3. Tagset design ==== | + | === 3. Tagset design === |
The corpus was manually annotated with part-of-speech tags, with a few enhancements applied automatically. The released NTC 1.0 is labelled with 20 Tigrinya PoS tags that encode level-1 (major PoS category) and level-2 (type within the category) information. The tags are as follows: | The corpus was manually annotated with part-of-speech tags, with a few enhancements applied automatically. The released NTC 1.0 is labelled with 20 Tigrinya PoS tags that encode level-1 (major PoS category) and level-2 (type within the category) information. The tags are as follows: | ||
The guidelines for tagging NTC 1.0 were developed based on three Tigrinya grammar books. These are: | The guidelines for tagging NTC 1.0 were developed based on three Tigrinya grammar books. These are: | ||
- | | + | |
- | | + | |
- | | + | |
==== 4. Format of NTC ==== | ==== 4. Format of NTC ==== | ||
Line 25: | Line 31: | ||
NTC 1.0 can be used freely for research purposes. | NTC 1.0 can be used freely for research purposes. | ||
- | - [[https://filedn.com/lit4DCIlHwxfS1gj9zcYuDJ/ | + | - [[https://www.jnlp.org/cgi-priv/download.cgi? |
- | - [[https://filedn.com/lit4DCIlHwxfS1gj9zcYuDJ/ | + | - [[https://www.jnlp.org/cgi-priv/download.cgi? |
- | ==== 6. Contact us ==== | + | === 6. Contact us === |
- | For any suggestions, | + | For any suggestions, |
- | yemane@jnlp.org or yemanekeleta@gmail.com. | + | |
We appreciate your input to help us improve the quality of NTC. We hope this corpus will encourage further Natural Language Processing (NLP) research on Tigrinya and other Eritrean languages. | We appreciate your input to help us improve the quality of NTC. We hope this corpus will encourage further Natural Language Processing (NLP) research on Tigrinya and other Eritrean languages. | ||
- | (NOTE: If the e-mail addresses above are unreachable or the download links are dead, please contact | + | (NOTE: If the e-mail addresses above are unreachable or the download links are dead, please contact