PACLING 2013: http://pacling.nak.ics.keio.ac.jp/
Comments and Questions:
* What is the definition of "common sense knowledge"?
-> We treat predicates (verbs, adjectives, verbal nouns) as common sense knowledge.
* What corpus did you use?
-> We used the Japanese Google 7-gram data.
* How did you evaluate the results?
-> We manually evaluated whether the assigned predicates were correct.
* What applications do you want to use the common sense knowledge base for?
-> We want to use common sense knowledge in various NLP applications, such as conversation systems.
* You can try using the case frame corpus constructed by Kyoto University: http://nlp.ist.i.kyoto-u.ac.jp/index.php?%E4%BA%AC%E9%83%BD%E5%A4%A7%E5%AD%A6%E6%A0%BC%E3%83%95%E3%83%AC%E3%83%BC%E3%83%A0
Two Issues in Syntactic Parsing
- Coordinate Structure: identifying it helps improve parsing accuracy
-> DP matching method for alignment (path-based method)
-> Parse trees produced by the grammar rules (tree-based method)
* The coordinate structure is represented directly as a tree
* Score: the sum of the scores of all COORD/COORD nodes in the tree
- Grammatical Units
* Multiword Expressions (MWE)
- Lexicalized phrases & Institutional phrases: collocations, named entities
? How to construct an MWE lexicon -> from Wiktionary
? How to construct an MWE-annotated corpus -> annotate the Penn Treebank
? What to do
-> Dictionary of semi-fixed and syntactically flexible MWEs
-> MWE-annotated corpus construction -> parsing with the MWE dictionary
* Complex sentence patterns -> joint processing
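A common baseline for "parsing with an MWE dictionary" is greedy longest-match lookup over the token sequence, merging each match into a single unit before parsing. The sketch below is not the speaker's method; it assumes a toy lexicon of hypothetical entries (`merge_mwes` is an illustrative name) and only covers fixed, contiguous MWEs, not the semi-fixed and syntactically flexible ones the talk targets.

```python
from typing import List, Set, Tuple


def merge_mwes(tokens: List[str], lexicon: Set[Tuple[str, ...]], max_len: int = 5) -> List[str]:
    """Greedy longest-match: join tokens that form a lexicon entry with '_'."""
    out: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, down to length 2.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            cand = tuple(t.lower() for t in tokens[i:i + n])
            if cand in lexicon:
                out.append("_".join(tokens[i:i + n]))
                i += n
                break
        else:
            # No MWE starts at position i; keep the single token.
            out.append(tokens[i])
            i += 1
    return out


# Hypothetical lexicon entries, lowercased tuples of words.
lexicon = {("by", "and", "large"), ("new", "york"), ("new", "york", "times")}
print(merge_mwes("By and large the New York Times agreed".split(), lexicon))
# ['By_and_large', 'the', 'New_York_Times', 'agreed']
```

Longest match is preferred so that "New York Times" wins over the shorter entry "New York".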
Investigation of clause pattern variations around the "SBAR" pattern
Extraction of SBAR patterns from an auto-parsed English corpus and grouping of the extracted patterns
Corpus data: Hiragana Times (http://www.hiraganatimes.com/en/)
Coordinate Structure: Did you try a statistical approach? -> Yes, but we couldn't get good results.
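The path-based (DP matching) idea for coordinate structures is to score how well two candidate conjuncts align: a well-aligned pair is more likely to be coordinated. The sketch below models this as plain global sequence alignment; it assumes tokens are (word, POS) pairs and uses an invented scoring scheme (same word 2, same POS 1, otherwise -1, gap -1). The speaker's actual similarity measure was not given in the talk.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, POS)

GAP = -1  # hypothetical gap penalty


def sim(x: Token, y: Token) -> int:
    """Hypothetical token similarity: same word 2, same POS 1, otherwise -1."""
    if x[0].lower() == y[0].lower():
        return 2
    if x[1] == y[1]:
        return 1
    return -1


def align_score(a: List[Token], b: List[Token]) -> int:
    """Global alignment score between two candidate conjuncts (DP matching)."""
    n, m = len(a), len(b)
    # S[i][j] = best score for aligning a[:i] with b[:j].
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * GAP
    for j in range(1, m + 1):
        S[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + sim(a[i - 1], b[j - 1]),  # pair the two tokens
                S[i - 1][j] + GAP,                          # skip a token in a
                S[i][j - 1] + GAP,                          # skip a token in b
            )
    return S[n][m]


left = [("the", "DT"), ("red", "JJ"), ("car", "NN")]
print(align_score(left, [("the", "DT"), ("blue", "JJ"), ("truck", "NN")]))  # 4
print(align_score(left, [("a", "DT"), ("dog", "NN")]))  # 1
```

In a parser, one would compute this score for each candidate pair of conjunct boundaries and prefer the highest-scoring pair.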
Thematic Representation of Short Text Messages with Latent Topics: Application in the Twitter context
What is the merit of your output?
Example: input and output
Input: just a tweet
Output: the tweet plus related information that helps readers understand it
How do you process the Wikipedia articles? -> (out of time)
e.g. In 1954, the NBA had no health benefits, no pension plan, and no minimum salary, and the average player's salary was $8,000 a season.
Bursty Topics in Time Series Japanese / Chinese News Streams and their Cross-Lingual Alignment
I would like to see a recall evaluation.
-> Planned as future work.
We think the birthday topic is an important event shared between Japan and China.
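The notes do not record which burst-detection method the authors used. As a stand-in, the sketch below uses a simple moving-window heuristic (a day is bursty if its count exceeds the mean plus `k` standard deviations of the preceding `window` days). This is a swapped-in technique, not Kleinberg's burst model and not necessarily the authors' approach; `bursty_days`, `window`, and `k` are illustrative names.

```python
from statistics import mean, pstdev
from typing import List


def bursty_days(counts: List[int], window: int = 7, k: float = 2.0) -> List[int]:
    """Return indices whose count exceeds mean + k * std of the preceding `window` days."""
    bursts = []
    for t in range(window, len(counts)):
        history = counts[t - window:t]
        # With a flat history (std == 0) the threshold is the mean,
        # so any increase counts as a burst.
        threshold = mean(history) + k * pstdev(history)
        if counts[t] > threshold:
            bursts.append(t)
    return bursts


# Hypothetical daily document counts for one topic.
print(bursty_days([3, 4, 3, 5, 4, 3, 4, 20, 6, 4]))  # [7]
```

Running this separately on each language's stream yields burst days that could then be compared across streams.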
Using Heterogeneous Features for Scientific Citation Classification
Researchers are faced with ever-increasing literature in all fields
-> Help researchers to more efficiently distill knowledge from scientific citation networks
Design of a Web-scale Japanese Corpus
NICT: Japanese Syntactic Dependency Database Version 1.1.
- 480 million syntactic dependency relations in 600 million pages and 43 billion sentences
Kyoto University: Kyoto-U Case Frames (Version 1.0) in 2009
Tsukuba-U: Tsukuba Web Corpus
NDL: Web Archive Project
Yata: Japanese Web Corpus 2010
Heritrix Crawler (Version 3.1)
- Developed by the Internet Archive (United States)
- Used by national libraries (e.g. NDL in Japan)
NWC (Nihongo Web Corpus) Toolkit
Masuoka-Takubo POS target
Kokugo-ken Short Unit & Kokugo-ken Long Unit (chunker: CRF++)
I hope this corpus will be released publicly.
One issue is the copyright of the original web pages.
Extraction of Drug Information using Clue Words from Japanese Blogs
Extraction of medical information from patient's blogs
* To get answers to these questions
* To help decision making
Okusuri110ban (NGO website with drug information)
- 12,170 illness-related nouns retrieved automatically
Does recall increase if you collect more examples? -> We think precision is more important than recall, because we want this system to be used by participants. So a small, correct dataset is more valuable than a big, noisy one.
Their approach also handles negative expressions.
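Clue-word extraction can be sketched as: keep a sentence when it mentions a known drug together with a clue word, and flag (rather than discard) sentences containing a negation, so that "it never worked" is still captured. The drug list, clue words, and negation cues below are hypothetical English stand-ins; the talk used its own Japanese clue words, and real Japanese blog text would need a morphological analyzer instead of the regex tokenization used here.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class DrugMention:
    sentence: str
    drug: str
    clue: str
    negated: bool


# Hypothetical lexicons for illustration only.
DRUGS = ["aspirin", "loxonin"]
CLUES = ["side effect", "worked", "took"]
NEGATIONS = {"not", "no", "never", "didn't"}


def extract(sentences: List[str]) -> List[DrugMention]:
    """Return sentences that contain a drug name and a clue word."""
    results = []
    for s in sentences:
        low = s.lower()
        drug = next((d for d in DRUGS if d in low), None)
        clue = next((c for c in CLUES if c in low), None)
        if drug and clue:
            # Word-level negation check, so "no" does not match inside "know".
            words = re.findall(r"[a-z']+", low)
            negated = any(w in NEGATIONS for w in words)
            results.append(DrugMention(s, drug, clue, negated))
    return results


for m in extract([
    "I took Loxonin for my headache.",
    "Aspirin never worked for my back pain.",
    "The weather was nice today.",
]):
    print(m.drug, m.clue, m.negated)
# loxonin took False
# aspirin worked True
```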
Dependency-Based Method for Extracting Causes of Emotions
Challenge: recognition of implicit emotions from text
Novel method for extracting emotion causes from sentences
How can I use this technology in my everyday life? -> Analysis of blogs, news, and forums (for example, emotion analysis might be useful for marketing)
In other languages? -> A similar system has been developed.
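One way a dependency-based method might locate an emotion cause is to start from the emotion word, look at the clauses attached to it (or to its head), and return the subtree that contains a causal marker such as "because". The sketch below is a hypothetical illustration of that idea, not the authors' algorithm. It assumes a toy parse given as a list of head indices (-1 for the root), and the emotion words and cause markers are invented examples.

```python
from typing import List, Optional, Set


def subtree(heads: List[int], root: int) -> List[int]:
    """Indices of `root` and all its descendants, in sentence order."""
    nodes = {root}
    changed = True
    while changed:
        changed = False
        for i, h in enumerate(heads):
            if h in nodes and i not in nodes:
                nodes.add(i)
                changed = True
    return sorted(nodes)


def emotion_cause(words: List[str], heads: List[int],
                  emotion_words: Set[str], cause_markers: Set[str]) -> Optional[str]:
    """Return the first dependent clause of an emotion word that contains a cause marker."""
    for e, w in enumerate(words):
        if w.lower() not in emotion_words:
            continue
        # Attach point: the emotion word itself if it is the root, else its head.
        anchor = e if heads[e] < 0 else heads[e]
        for child in (i for i, h in enumerate(heads) if h == anchor and i != e):
            span = subtree(heads, child)
            if any(words[i].lower() in cause_markers for i in span):
                return " ".join(words[i] for i in span)
    return None


# Hypothetical hand-made parse of "I was happy because I passed the exam".
words = ["I", "was", "happy", "because", "I", "passed", "the", "exam"]
heads = [2, 2, -1, 5, 5, 2, 7, 5]
print(emotion_cause(words, heads, {"happy"}, {"because", "since"}))
# because I passed the exam
```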