Semantics - Arabic Part of Speech Tagger


Part of speech tagging is the process of selecting the most likely sequence of syntactic categories for the words in a sentence. It determines grammatical characteristics of the words, such as part of speech, grammatical number, gender, person, etc. For Arabic, this task is not trivial, because most words are ambiguous when written without vowels.

For each word, we want at a minimum to identify its main lexical category (noun, verb, etc.) and its inflectional features (plural, past tense, etc.), if any. We might also identify some quasi-semantic features (such as proper noun) or even specify a word sense relative to some lexicon.

Paper available

This article is best read alongside its accompanying paper.

- Dependency Grammar
- Catenation
- Sentence Structure
- PoS Tagging


Kalmasoft PoS Tagger addresses most of the problems related to Arabic corpus tagging. It is a context-sensitive, rule-based solution built on a comprehensive, hand-crafted set of syntactic rules for Arabic datasets. The output is structured XML or JSON; SQL database and CSV output are available as alternatives.
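To illustrate what a context-sensitive rule can look like, here is a minimal sketch in Python. It is not Kalmasoft's rule set: the toy lexicon, the single rule and the function name tag are assumptions made for illustration. The unvowelled form ktb is ambiguous between a verb ("he wrote") and a noun ("books"); the rule resolves it to a noun when it directly follows a preposition. The codes V, N, Pr and X follow the tag legend given on this page.

```python
from typing import Dict, List, Set

# Toy lexicon (hypothetical): candidate categories for each unvowelled token.
LEXICON: Dict[str, Set[str]] = {
    "fy": {"Pr"},       # "in"
    "ktb": {"V", "N"},  # "he wrote" or "books"
    "AlHrb": {"N"},     # "the war"
}


def tag(tokens: List[str]) -> List[Set[str]]:
    """Return the remaining candidate categories for each token."""
    # Unknown tokens get X ("No Solution").
    candidates = [set(LEXICON.get(t, {"X"})) for t in tokens]
    for i in range(1, len(tokens)):
        # Hypothetical rule: a token directly after a preposition is a noun.
        if candidates[i - 1] == {"Pr"} and "N" in candidates[i]:
            candidates[i] = {"N"}
    return candidates


print(tag(["fy", "ktb"]))
```

Without the preceding preposition, tag(["ktb"]) keeps both candidates. A real rule set would need many such rules and an ordering between them.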

Kalmasoft PoS Tagger is designed to prepare annotated Arabic corpora. A tagged corpus is more useful than an untagged one because it carries more information than the raw text alone. Once a corpus is tagged, it can be used to extract information, which can then be used to create dictionaries and grammars of a language from real language data. Tagged corpora are also useful for detailed quantitative analysis of text.
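As a small example of quantitative analysis on tagged text, the sketch below counts tag frequencies with Python's standard library. The (token, tag) pairs are the first seven rows of the sample output shown further down this page; holding them in a Python list is an assumption made for illustration, not the tagger's actual output format.

```python
from collections import Counter

# (transliterated token, tag) pairs taken from the sample output on this page.
tagged = [
    ("toddt", "VPIA"),
    ("wtnwot", "VPIA"),
    ("Al!zmAt", "NNG"),
    ("Alty", "PL"),
    ("KlfthA", "VPIA"),
    ("AlHrb", "NNN"),
    ("fy", "PP"),
]

# Count how often each tag occurs.
counts = Counter(tag for _, tag in tagged)

# Print tags from most to least frequent, with relative frequency.
for tag, n in counts.most_common():
    print(f"{tag}\t{n}\t{n / len(tagged):.2f}")
```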

The system's output (the processed corpus) is therefore intended for machines rather than human readers, although a view interface exists for testing purposes and works well for short texts. Output can also be saved as an HTML or TXT file.


A screenshot of the MAPSSeman PoS Tagger interface. The technical specifications can be viewed, and an evaluation copy can be downloaded.

Arabic Named Entity Recognition.

Visualisation of parsed catena.

V: verb             A: adjective            C: conjunction
N: noun             Pr: preposition         a: adverb
d: demonstrative    r: relative             F: foreign word
O: ordinal number   E: verbal noun
R: pronoun          T: typographic error    X: No Solution

P: perfective       S: singular             F: feminine
I: imperfective     D: dual                 1: first person
M: imperative       P: plural               2: second person
E: emphatic         M: masculine            3: third person
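The person, number and gender codes in the legend above can be decoded mechanically. The following Python sketch assumes that a feature triplet is written in the order person, number, gender (as in 3SF or 3PM in the sample output on this page) and that the bullet character marks an unspecified feature. The function name decode_features is hypothetical. Codes that the legend does not list are reported as unlisted rather than guessed.

```python
from typing import Dict, Optional

# Code tables copied from the legend on this page.
PERSON = {"1": "first person", "2": "second person", "3": "third person"}
NUMBER = {"S": "singular", "D": "dual", "P": "plural"}
GENDER = {"F": "feminine", "M": "masculine"}

PLACEHOLDER = "•"  # assumed to mark an unspecified feature


def _lookup(table: Dict[str, str], ch: str) -> Optional[str]:
    """Map one code character to its label; None if unspecified."""
    if ch == PLACEHOLDER:
        return None
    return table.get(ch, f"unlisted code {ch!r}")


def decode_features(code: str) -> Dict[str, Optional[str]]:
    """Decode a three-character person/number/gender triplet (assumed order)."""
    if len(code) != 3:
        raise ValueError("expected exactly three characters, e.g. '3SF'")
    person, number, gender = code
    return {
        "person": _lookup(PERSON, person),
        "number": _lookup(NUMBER, number),
        "gender": _lookup(GENDER, gender),
    }


print(decode_features("3SF"))
```

The full Kalmasoft tagset documentation remains the authority on field order and on codes not shown in the legend.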

See the full documentation of the Kalmasoft tagset.

تعددت وتنوعت الأزمات التي خلفتها الحرب في اليمن وأزمة الانقطاع الكامل لخدمة الكهرباء ضاعفت من معاناة سكان هذه البلاد ودفعتهم نحو مصادر الطاقة البديلة للتخفيف من آثار تلك الأزمة
toddt wtnwot Al!zmAt Alty KlfthA AlHrb fy Alymn w!zm: AlAnqTAo AlkAml lKdm: AlkhrbA' DAoft mn moAnA: skAn hch AlblAd wdfothm nHw mSAdr AlTAq: Albdyl: lltKfyf mn |xAr tlk Al!zm:
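The Latin-script line above is a letter-for-letter transliteration of the Arabic sentence. The Python sketch below reproduces it with a mapping inferred only from this one sample pair, so it is an assumption rather than the documented Kalmasoft scheme. Letters that do not occur in the sample are not mapped and pass through unchanged. The name transliterate is hypothetical.

```python
# Letter mapping inferred from the sample sentence and its transliteration.
SAMPLE_MAP = {
    "\u0621": "'",  # hamza
    "\u0622": "|",  # alef with madda
    "\u0623": "!",  # alef with hamza above
    "\u0627": "A",  # alef
    "\u0628": "b",  # beh
    "\u0629": ":",  # teh marbuta
    "\u062a": "t",  # teh
    "\u062b": "x",  # theh
    "\u062d": "H",  # hah
    "\u062e": "K",  # khah
    "\u062f": "d",  # dal
    "\u0630": "c",  # thal
    "\u0631": "r",  # reh
    "\u0632": "z",  # zain
    "\u0633": "s",  # seen
    "\u0635": "S",  # sad
    "\u0636": "D",  # dad
    "\u0637": "T",  # tah
    "\u0639": "o",  # ain
    "\u0641": "f",  # feh
    "\u0642": "q",  # qaf
    "\u0643": "k",  # kaf
    "\u0644": "l",  # lam
    "\u0645": "m",  # meem
    "\u0646": "n",  # noon
    "\u0647": "h",  # heh
    "\u0648": "w",  # waw
    "\u064a": "y",  # yeh
}


def transliterate(text: str) -> str:
    """Replace each mapped Arabic letter; leave everything else as is."""
    return "".join(SAMPLE_MAP.get(ch, ch) for ch in text)


# Token 6 of the sample (alef, lam, hah, reh, beh).
print(transliterate("\u0627\u0644\u062d\u0631\u0628"))  # AlHrb
```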

ID Token KATS Syntax Arguments Prefix Suffix Gloss*
1 تعددت toddt VPIA 3PF•••
2 وتنوعت wtnwot VPIA 053PF••• PC
3 الأزمات Al!zmAt NNG ••••PF PD
4 التي Alty PL
5 خلفتها KlfthA VPIA 023SF3SF
6 الحرب AlHrb NNN ••••S• PD
7 في fy PP
8 اليمن Alymn NN•G
9 وأزمة w!zm: NF•G ••••SF PC
10 الانقطاع AlAnqTAo NF•G 07•SM PD
11 الكامل AlkAml NA•G ••••SM PD
12 لخدمة lKdm: NF•G ••••SF
13 الكهرباء AlkhrbA' N••G PD
14 ضاعفت DAoft VPIA 033SF•••
15 من mn PP
16 معاناة moAnA: NF•G 03•SF
17 سكان skAn NQ•G ••••BM
18 هذه hch PD
19 البلاد AlblAd N••G PD
20 ودفعتهم wdfothm VPIA 013SF3PM PC
21 نحو nHw NV
22 مصادر mSAdr NF•A ••••PM
23 الطاقة AlTAq: N••G ••••SF PD
24 البديلة Albdyl: NA•G ••••SF PD
25 للتخفيف lltKfyf NF•G PP
26 من mn PP
27 آثار |xAr NF•G ••••B•
28 تلك tlk PD
29 الأزمة Al!zm: NF•G ••••SF
(*) These are for reference only; the actual module outputs a simplified gloss (stems only).

(**) A larger XML output sample is also available.


Category Software | Reference MSLTAG | Family MAPSEMANL | Last updated 19/12/2019