Application of Machine Learning Algorithms in Text Mining with a Sentiment Analysis Approach Journal Article

Writers: سمیع زاده، رضا؛ محمودی سعید آباد، الناز

Journal of Information Technology Management, Summer 1397, No. 35. Ranking: B (Ministry of Science/ISC). 22 pages, from 309 to 330.

Keywords: text mining, neural networks, sentiment analysis, support vector machine, Naïve Bayes

Abstract:

Classifying online texts and comments from social media users into positive and negative sentiment is of high importance in text-mining research. In this research, we applied supervised classification methods to classify Persian texts in cyberspace by sentiment. The result is a system that can decide whether a comment published in cyberspace, for example on a social network, is positive or negative. Comments published on Persian movie and movie-review websites from 1392 to 1395 form the data set. Part of the data is used for training and the rest for testing. Before the algorithms were run, the texts were pre-processed by tokenizing, removing stop words, and extracting n-grams. Naïve Bayes, neural networks (NN) and support vector machines (SVM) were used for text classification. Out-of-sample tests showed no evidence that the accuracy of the SVM approach is statistically higher than that of Naïve Bayes, or that the accuracy of Naïve Bayes is statistically higher than that of the NN approach. However, the accuracy of classification with the SVM approach is statistically higher than that of the NN approach at the 5% significance level.
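The pre-processing steps named in the abstract (tokenizing, stop-word removal, n-grams) and a Naïve Bayes classifier can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the stop-word list, the toy English training sentences, and the names `features` and `NaiveBayes` are all hypothetical, and a real Persian pipeline would need a Persian-aware tokenizer and stop-word list.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list for illustration; the study's list is not given.
STOP_WORDS = {"the", "a", "is", "was", "and", "this"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]

def ngrams(tokens, n):
    """Return all n-grams over the tokens as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def features(text, max_n=2):
    """Tokenize, remove stop words, and emit unigrams up to max_n-grams."""
    tokens = remove_stop_words(tokenize(text))
    feats = []
    for n in range(1, max_n + 1):
        feats.extend(ngrams(tokens, n))
    return feats

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = features(text)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        feats = features(text)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)  # log prior
            for f in feats:
                if f in self.vocab:  # skip features never seen in training
                    score += math.log((counts[f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy training data (assumed, English for readability).
train_texts = [
    "a great movie and a wonderful story",
    "wonderful acting, great ending",
    "this was a boring movie",
    "boring story and terrible acting",
]
train_labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("great wonderful ending"))
```

Working in log-probabilities avoids numeric underflow on longer texts, and add-one smoothing keeps a feature seen in only one class from driving the other class's probability to zero.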

Machine summary:

Many studies have been carried out on extracting knowledge from textual data, and all of them have tried to use sentiment classification methods to sort the texts published by cyberspace users into distinct groups with well-defined characteristics, based on the sentiment expressed in them.

Sentiment analysis is carried out (for example, through the one- to five-star ratings that users record on various websites to show their level of satisfaction), and at first glance analysing the texts that users publish may not seem necessary for this purpose. Text-mining methods, however, pave the way for sentiment analysis at a specific level: the analysis of the texts can show, for example, which features of a product its users are satisfied with and which feature has failed to deliver enough satisfaction.

In most studies, the reported accuracy is the ratio of the number of texts whose sentiment was correctly identified by the algorithm in question to the total number of comments analysed (Moraes, Valiati and Neto, 2012).

Naïve Bayes results (accuracy = 56.63%)

                      True negative   True positive   Class precision
Predicted negative    616             483             56.05%
Predicted positive    354             477             57.40%
Class recall          63.51%          49.69%

As Table 3 shows, the accuracy of the Naïve Bayes model is estimated at 56.63 percent. Of the 970 negative comments, the algorithm correctly labelled 616 as negative, so it identified negative comments correctly 63.51 percent of the time.
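The accuracy, class precision and class recall reported for Naïve Bayes all follow from the four counts of the confusion matrix, and the arithmetic can be checked with a short Python sketch. The function name `confusion_metrics` is hypothetical. The false-negative count of 483 is treated as an assumption here: it is the value that reproduces the reported 56.05% class precision, 49.69% class recall and 56.63% accuracy.

```python
def confusion_metrics(tn, fn, fp, tp):
    """Compute accuracy and per-class precision/recall for a binary task.

    tn: negative comments predicted negative
    fn: positive comments predicted negative
    fp: negative comments predicted positive
    tp: positive comments predicted positive
    """
    total = tn + fn + fp + tp
    return {
        "accuracy": (tn + tp) / total,
        "precision_negative": tn / (tn + fn),
        "precision_positive": tp / (tp + fp),
        "recall_negative": tn / (tn + fp),
        "recall_positive": tp / (tp + fn),
    }

# Counts for Naive Bayes; fn=483 is an assumed value (see the note above).
metrics = confusion_metrics(tn=616, fn=483, fp=354, tp=477)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Accuracy here is exactly the ratio described in the text: correctly classified comments (616 + 477) divided by all comments analysed (1930).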