History
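- Symbolic NLP (1950s–early 1990s)
  - Based on explicit rules and logical reasoning
  - Highly interpretable; can capture complex linguistic logic
  - Hard to write a complete set of rules; copes poorly with flexible, varied expressions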
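- Statistical NLP (1990s–2010s)
  - Relies on large amounts of data and probabilistic computation
  - Well suited to processing large-scale data
  - Needs a lot of training data; results cannot always be explained clearly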
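- Neural NLP (present)
  - Loosely modeled on how the human brain works; “learns” language automatically through layer-by-layer computation
  - Well suited to complex tasks such as translation, chat, and article generation
  - Computationally expensive, slow to train, and hard to interpret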
Common tasks
- Text and speech processing
  - Optical character recognition (OCR)
  - Speech recognition
  - Speech segmentation
  - Text-to-speech
  - Word segmentation (tokenization)
- Morphological analysis
  - Lemmatization
  - Morphological segmentation
  - Part-of-speech tagging
  - Stemming
- Syntactic analysis
  - Grammar induction
  - Sentence breaking (also known as “sentence boundary disambiguation”)
  - Parsing
- Lexical semantics (of individual words in context)
  - Lexical semantics
  - Distributional semantics
  - Named entity recognition (NER)
  - Sentiment analysis (see also Multimodal sentiment analysis)
  - Terminology extraction
  - Word-sense disambiguation (WSD)
  - Entity linking
- Relational semantics (semantics of individual sentences)
  - Relationship extraction
  - Semantic parsing
  - Semantic role labelling (see also implicit semantic role labelling below)
- Discourse (semantics beyond individual sentences)
  - Coreference resolution
  - Discourse analysis
  - Implicit semantic role labelling
  - Recognizing textual entailment
  - Topic segmentation and recognition
  - Argument mining
- Higher-level NLP applications
  - Automatic summarization (text summarization)
  - Grammatical error correction
  - Logic translation
  - Machine translation (MT)
  - Natural-language understanding (NLU)
  - Natural-language generation (NLG)
  - Book generation
  - Document AI
  - Dialogue management
  - Question answering
  - Text-to-image generation
  - Text-to-scene generation
  - Text-to-video
  - Large language models (LLMs)
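
To make a few of the low-level tasks above concrete, here is a minimal sketch of sentence breaking, word segmentation (tokenization), and stemming, using only the Python standard library. The function names `split_sentences`, `tokenize`, and `naive_stem` are hypothetical, and the rules are deliberately naive assumptions for illustration. They are not how production tokenizers or the Porter stemmer work.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive sentence breaking: split on whitespace that follows ., ! or ?.
    # Real systems must disambiguate cases such as "Dr." or "e.g.".
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence: str) -> list[str]:
    # Naive word segmentation: runs of word characters, or single
    # punctuation marks, each become one token.
    return re.findall(r"\w+|[^\w\s]", sentence)


def naive_stem(word: str) -> str:
    # Toy suffix stripping (an assumption for illustration, not Porter):
    # drop the first matching suffix if at least 3 characters remain.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    text = "Cats chased mice. Dogs were barking!"
    for sentence in split_sentences(text):
        tokens = tokenize(sentence)
        print([naive_stem(t.lower()) for t in tokens])
    # prints ['cat', 'chas', 'mice', '.']
    # prints ['dog', 'were', 'bark', '!']
```

Note that `chased` becomes `chas`, not `chase`. Stemming only chops suffixes and can produce non-words, whereas lemmatization maps a word to its dictionary form.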
References