Arabic part of speech tagger

12/5/2023

The study described in this paper belongs to the area of computational linguistics. Computational linguistics is a field of artificial intelligence that deals with the logical modeling of natural language from a computational perspective. It unites two areas that appear quite different: computer science and natural language. It might be considered a synonym for the automatic processing of natural language, since its main task is the construction of computer programs that process words and texts in natural language. Many areas can properly be included within the discipline, and one of them is part-of-speech tagging (POS tagging).

POS tagging is the process of automatically assigning the proper grammatical tag to each word of a written text according to how it appears in the text. In other words, the task is to attach appropriate grammatical or morpho-syntactic category labels to each word, token, symbol, abbreviation and even punctuation mark in a corpus. It is usually the first step in linguistic analysis.

In this study, the features that proved most effective for reaching the highest accuracy are a combination of w0 (the current word), p0 (the POS of the current word), p-3 (the POS of the word three positions before), p-2 (the POS of the word two positions before) and p-1 (the POS of the word before).
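The feature combination named above (w0, p0, p-1, p-2, p-3) can be pictured as a small context window around each token. The following is only a minimal sketch under stated assumptions: the function name `context_features`, the `<PAD>` placeholder and the English toy sentence with its tags are hypothetical and are not taken from the study or from the Quranic Arabic Corpus tagset. The text does not say how p0 is obtained at prediction time, so the sketch simply reads it from a supplied tag list.

```python
from typing import Dict, List

PAD = "<PAD>"  # hypothetical placeholder for positions before the sentence start


def context_features(words: List[str], tags: List[str], i: int) -> Dict[str, str]:
    """Build the feature dict for token i: w0, p0 and the three preceding POS tags."""

    def tag_at(j: int) -> str:
        # Positions outside the sentence get the padding symbol.
        return tags[j] if 0 <= j < len(tags) else PAD

    return {
        "w0": words[i],        # the current word
        "p0": tag_at(i),       # POS of the current word (assumed to be available)
        "p-1": tag_at(i - 1),  # POS of the word before
        "p-2": tag_at(i - 2),  # POS of the word two positions before
        "p-3": tag_at(i - 3),  # POS of the word three positions before
    }


# Toy example with made-up tags, for illustration only.
words = ["the", "old", "man", "sleeps"]
tags = ["DET", "ADJ", "NOUN", "VERB"]
features = context_features(words, tags, 3)
print(features)
```

A dictionary of this shape could then be converted to a numeric vector before being passed to a classifier; that encoding step is left out here because the text does not describe it.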

Different POS tagging techniques have been developed and tested in the literature. However, some POS tagging models are known to perform poorly on Quranic Arabic because of the complexity of the text. This complexity presents several challenges for POS tagging, such as high ambiguity, data sparseness and a large number of unknown words. With this in mind, the main problem is to find out how existing, efficient methods perform on Arabic and how the Quranic corpus can be used to produce an efficient framework for Arabic POS tagging.

We propose an experimental classifier-combination framework for Arabic POS tagging, selecting two diverse probabilistic classifiers that have been used in numerous works on non-Arabic languages, namely K-Nearest Neighbour (KNN) and Naive Bayes (NB). Majority voting is used as the combination strategy to exploit the strengths of each classifier. In addition, an in-depth study has been conducted on a large list of features to identify the effective ones and to investigate their role in improving the performance of POS taggers for Quranic Arabic. Hence, this study aims to integrate different feature sets and tagging algorithms efficiently in order to produce a more accurate POS tagging procedure.

The data used in this study is the Arabic Quranic Corpus, an annotated linguistic resource consisting of 77,430 words, with Arabic grammar, syntax and morphology given for each word in the Holy Quran. The highest accuracy achieved is 98.32%, which may be a significant improvement over the state of the art for Arabic Quranic text.
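The majority-voting combination mentioned above can be sketched in a few lines. This is an illustrative sketch, not the study's implementation: the function `majority_vote`, the tag names and the example predictions are hypothetical. One assumption deserves attention: with only two voters, every disagreement is a tie, and the text does not say how ties are resolved, so the sketch gives ties to the classifier listed first.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-token tag predictions from several classifiers.

    predictions[k][i] is the tag that classifier k assigns to token i.
    Ties are resolved in favour of the earliest-listed classifier
    (an assumption made for this sketch).
    """
    combined = []
    for token_votes in zip(*predictions):
        counts = Counter(token_votes)
        best = max(counts.values())
        # First tag, in classifier order, that reaches the top vote count.
        winner = next(tag for tag in token_votes if counts[tag] == best)
        combined.append(winner)
    return combined


# Made-up predictions for a three-token sentence, for illustration only.
knn_tags = ["N", "V", "ADJ"]
nb_tags = ["N", "N", "ADJ"]
combined = majority_vote([knn_tags, nb_tags])
print(combined)
```

Because the function accepts any number of prediction lists, further classifiers could be added as extra voters without changing the code.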

Part-of-speech (POS) tagging is an important preprocessing step in many natural language processing applications, such as text summarization, question answering and information retrieval. It is the process of assigning every word in a given context to its appropriate part of speech.


