This page shows the differences between two versions of the page.
snow:nagaoka_tigrinya_corpus [2022/05/12 21:12] – [6. Contact us] admin | snow:nagaoka_tigrinya_corpus [2022/05/12 21:17] – admin | ||
---|---|---|---|
Line 3: | Line 3: | ||
===== Nagaoka Tigrinya Corpus ===== | ===== Nagaoka Tigrinya Corpus ===== | ||
- | ==== 1. What is a corpus ? ==== | + | |
+ | (Note that this page is written by [[: | ||
+ | |||
+ | === 1. What is a corpus ? === | ||
"In linguistics, | "In linguistics, | ||
- | ==== 2. The Nagaoka Tigrinya corpus 1.0 (NTC 1.0) ==== | + | === 2. The Nagaoka Tigrinya corpus 1.0 (NTC 1.0) === |
The Nagaoka Tigrinya corpus is the first publicly available part-of-speech (PoS) tagged corpus of the Tigrinya language. This text corpus was compiled at Nagaoka University of Technology. The corpus is a collection of news articles from an Eritrean newspaper called " | The Nagaoka Tigrinya corpus is the first publicly available part-of-speech (PoS) tagged corpus of the Tigrinya language. This text corpus was compiled at Nagaoka University of Technology. The corpus is a collection of news articles from an Eritrean newspaper called " | ||
- | ==== 3. Tagset design | + | === 3. Tagset design === |
The corpus is manually tagged for part of speech, with a few enhancements made automatically. The released NTC 1.0 is labelled with 20 Tigrinya part-of-speech tags that carry Level-1 (Major PoS Category) and Level-2 (Type of Category) information. The tags are given as follows: | The corpus is manually tagged for part of speech, with a few enhancements made automatically. The released NTC 1.0 is labelled with 20 Tigrinya part-of-speech tags that carry Level-1 (Major PoS Category) and Level-2 (Type of Category) information. The tags are given as follows: | ||
The guidelines for tagging NTC 1.0 were developed based on three Tigrinya grammar books. These are: | The guidelines for tagging NTC 1.0 were developed based on three Tigrinya grammar books. These are: | ||
- | | + | |
- | | + | |
- | | + | |
==== 4. Format of NTC ==== | ==== 4. Format of NTC ==== | ||
Line 31: | Line 34: | ||
- [[https:// | - [[https:// | ||
- | ==== 6. Contact us ==== | + | === 6. Contact us === |
- | For any suggestions, | + | For any suggestions, |
We appreciate your input to help us improve the quality of NTC. We hope this corpus will encourage further Natural Language Processing (NLP) research on Tigrinya and other Eritrean languages. | We appreciate your input to help us improve the quality of NTC. We hope this corpus will encourage further Natural Language Processing (NLP) research on Tigrinya and other Eritrean languages. | ||
(NOTE: If the e-mail addresses above are unreachable or the download links are dead, please contact [[:eng:|the administrator]]) | (NOTE: If the e-mail addresses above are unreachable or the download links are dead, please contact [[:eng:|the administrator]]) | ||