Abstract:
Lightning is one of the most severe weather hazards and causes extensive economic, social and environmental damage every year. Because of the spatial and temporal extent of weather systems, predicting lightning by physical or dynamical means is very difficult; timely prediction and identification of the best data mining model are therefore effective in reducing harm and damage. In the present study, data from 1390 to 1396 (Solar Hijri) from the Rasht meteorological station were used. The dependent variable is the occurrence or non-occurrence of lightning over the seven years, and the independent variables are the factors affecting lightning: temperature, relative humidity, cloudiness, wind speed, wind direction, air pressure and lightning on the previous day. After preprocessing and processing the data, data mining models including the CART, CHAID and C5 trees, the multilayer perceptron neural network, the radial basis function and the support vector machine were applied in SPSS Modeler version 20. The results of the models were compared using comparative criteria and the ROC curve. According to the results, the probability of future lightning occurrence is highest in the months of Ordibehesht, Khordad and Tir; occurrence decreases from spring to winter and reaches its minimum in winter. Among the predictive models, the CHAID tree, with a specificity of 0.794 and the lowest false positive rate of 0.205, and the support vector machine, with a correct prediction rate of 0.773, an error rate of 0.475 and a precision of 0.855, perform best compared with the other models.
Lightning is one of the most severe weather hazards and causes significant economic, social and environmental damage each year. Predicting lightning by physical or dynamical means is very difficult because of the spatial and temporal extent of weather systems. Timely forecasting of lightning and identification of the best data mining model are therefore effective in reducing damage. In this research, data for 2012-2018 from the Rasht meteorological station were used. The dependent variable was the occurrence or non-occurrence of lightning during the 7 years, and the independent variables were factors affecting lightning: temperature, relative humidity, cloudiness, wind speed, wind direction, air pressure and the previous day's lightning. After preprocessing and processing the data, data mining models including Classification & Regression Tree (CART), Chi-squared Automatic Interaction Detector (CHAID), Induction of Decision Trees (C5), the Radial Basis Function (RBF) and Multi Layer Perceptron (MLP) neural networks, and Support Vector Machine (SVM) were run in SPSS Modeler version 20. The results of the models were compared using comparative criteria and the receiver operating characteristic (ROC) curve. According to the results, the probability of lightning occurrence is higher in May, June and July than in other months, and occurrence decreases from spring to winter, reaching its minimum in winter. The CHAID tree, with a specificity of 0.794 and the lowest false positive rate of 0.205, and the SVM model, with a correct prediction rate of 0.773, an error rate of 0.475 and a precision of 0.855, performed best compared with the other models.
Extended Abstract
1-Introduction
Lightning is the ionization of the atmosphere caused by an increased potential difference between cloud and earth, followed by a rapid electrical discharge accompanied by light and sound waves.
An increase in lightning intensity leads to thunderstorms, heavy rain, floods and tornadoes. Rasht is the largest rice-growing city in the country and produces 11% of the country's required rice. In recent years, lightning-related incidents such as lodging of rice stalks and the risk of paddy disease, roads blocked by floods, traffic congestion, damage to buildings, collapsed bridges and deaths from electric shock have doubled the importance of predicting lightning. The main purpose of this study was to use ground records of the occurrence and non-occurrence of lightning (binary data), together with related meteorological parameters (temperature, relative humidity, cloudiness, wind speed, wind direction, air pressure and the previous day's lightning), to estimate the probability of future lightning occurrence using data mining (tree and neural network models), and to evaluate the models and determine the optimal one in order to reduce future damage.
2-Materials and Methods
In this study, binary lightning data and atmospheric parameters (temperature, relative humidity, cloudiness, wind speed, wind direction, air pressure and the previous day's lightning) were obtained from the Rasht Meteorological Station for the years 2012-2018. The data were then normalized to the range zero to one according to Eq. (1), and the data classes were balanced using the random undersampling (RUS) and random oversampling (ROS) algorithms in RapidMiner software.
$X_n = \frac{X - X_{min}}{X_{max} - X_{min}}$    Eq. (1)
Here X is the raw value of a variable, and X_min and X_max are its minimum and maximum. Variable transformation, with determination of statistical properties and correlations, was performed in SPSS software to reduce errors. Finally, SPSS Modeler software was used to predict the future occurrence and non-occurrence of lightning with the CART, CHAID and C5 trees and the Multi Layer Perceptron (MLP), Radial Basis Function (RBF) and Support Vector Machine (SVM) models. The training set contains 70% of the data and the testing set contains the remaining 30%.
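The preprocessing described above (min-max normalization as in Eq. (1), class balancing by RUS or ROS, and a 70/30 split) can be sketched in plain Python. This is only an illustrative sketch, not the RapidMiner or SPSS workflow used in the study: the function names, the toy temperature values and the fixed random seed are all assumptions, and the source does not state whether balancing was applied before or after the split (the sketch balances first, following the order in which the steps are listed).

```python
import random


def min_max_normalize(values):
    """Scale numbers to [0, 1]: Xn = (X - Xmin) / (Xmax - Xmin), as in Eq. (1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: Eq. (1) is undefined, so return zeros (an assumption).
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def _group_by_label(rows, labels):
    """Collect rows into a dict keyed by class label."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups


def random_oversample(rows, labels, rng):
    """ROS: duplicate random rows of smaller classes up to the largest class size."""
    groups = _group_by_label(rows, labels)
    target = max(len(g) for g in groups.values())
    out_rows, out_labels = [], []
    for label, group in groups.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_rows.extend(group + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def random_undersample(rows, labels, rng):
    """RUS: keep a random subset of larger classes down to the smallest class size."""
    groups = _group_by_label(rows, labels)
    target = min(len(g) for g in groups.values())
    out_rows, out_labels = [], []
    for label, group in groups.items():
        out_rows.extend(rng.sample(group, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


def train_test_split(rows, labels, train_fraction, rng):
    """Shuffle row indices and split them into training and testing parts."""
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    cut = round(train_fraction * len(indices))
    train, test = indices[:cut], indices[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


# Toy data (invented for illustration): one feature, 3 lightning days out of 10.
rng = random.Random(0)
temperature = [12.0, 18.5, 25.0, 30.0, 9.0, 21.0, 27.5, 15.0, 19.0, 23.0]
lightning = [0, 0, 1, 1, 0, 0, 1, 0, 0, 0]

rows = [[t] for t in min_max_normalize(temperature)]
balanced_rows, balanced_labels = random_oversample(rows, lightning, rng)
x_train, y_train, x_test, y_test = train_test_split(
    balanced_rows, balanced_labels, 0.7, rng)
print(len(x_train), len(x_test))
```

A common alternative is to split first and balance only the training portion, so that duplicated rows cannot appear in both the training and testing sets.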
Then, based on Eqs. (2)-(9), the model outputs were evaluated with the confusion matrix, comparative criteria and the ROC curve.
$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$    Eq. (2)
$\mathrm{Precision} = \frac{TP}{TP + FP}$    Eq. (3)
$\mathrm{Sensitivity} = \frac{TP}{TP + FN}$    Eq. (4)
$\mathrm{Harmonic\ Mean} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Sensitivity}}{\mathrm{Precision} + \mathrm{Sensitivity}}$    Eq. (5)
$\mathrm{Specificity} = \frac{TN}{TN + FP}$    Eq. (6)
$\mathrm{False\ Positive\ Rate} = \frac{FP}{FP + TN}$    Eq. (7)
$\mathrm{False\ Negative\ Rate} = \frac{FN}{FN + TP}$    Eq. (8)
$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum (P - O)^2}$    Eq. (9)
where O is the observed value, P is the predicted value, N is the number of data points, and TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives.
3-Results and Discussion
Lightning is one of the most important environmental hazards, and data mining is a suitable method for predicting it. The results show that prediction with data mining techniques is possible and effective. Based on the results, the probability of lightning occurrence is highest in spring (May and June) and summer (July); it then follows a decreasing trend and reaches its minimum in winter. The models also indicate that future lightning occurrence is more probable than non-occurrence. Among the three trees (CART, CHAID and C5), CART and C5 had less satisfactory indices and did not reach the highest accuracy or precision in predicting future lightning. The CHAID tree, by contrast, predicted correctly in 0.76 of cases with a precision of 0.85 and predicted a lightning occurrence rate of 0.54, which is close to the observed value of 0.62. Among the neural network models, the Support Vector Machine (SVM) model performed best, with an accuracy of 0.77, a precision of 0.85 and a predicted probability of lightning occurrence of 0.60, outperforming the Radial Basis Function (RBF) and Multi-Layer Perceptron (MLP) models.
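The evaluation measures in Eqs. (2)-(9) can be computed directly from the four confusion-matrix counts. The following is a minimal sketch in plain Python, assuming binary labels coded as 1 (lightning) and 0 (no lightning); the function names and the toy observed and predicted vectors are invented for illustration and are not the study's data.

```python
import math


def confusion_counts(observed, predicted):
    """Count true/false positives and negatives for 0/1 labels."""
    tp = tn = fp = fn = 0
    for o, p in zip(observed, predicted):
        if o == 1 and p == 1:
            tp += 1
        elif o == 0 and p == 0:
            tn += 1
        elif o == 0 and p == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Eqs. (2)-(8); assumes every denominator is non-zero."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "harmonic_mean": 2 * precision * sensitivity / (precision + sensitivity),
        "specificity": tn / (tn + fp),
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
    }


def rmse(observed, predicted):
    """Eq. (9): root mean square error between predicted and observed values."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)


# Toy example (invented): 10 days, 1 = lightning, 0 = no lightning.
observed = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]

counts = confusion_counts(observed, predicted)
for name, value in classification_metrics(*counts).items():
    print(f"{name}: {value:.3f}")
print(f"rmse: {rmse(observed, predicted):.3f}")
```

By construction, specificity and false positive rate sum to one, which agrees with the reported CHAID values (0.794 + 0.205, up to rounding). If P and O are both hard 0/1 labels, the RMSE reduces to the square root of the misclassification rate.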
According to the classification results and the area under the ROC curve (AUC), the CHAID tree, with a value of 0.829 among the trees, and the Support Vector Machine model, with a value of 0.853, are superior. The numerical results and the similarity of the predictions to the observed values show that trees and artificial neural networks are effective in predicting the probability of future lightning occurrence, and that the CHAID tree and the Support Vector Machine model perform best compared with the other models.
4-Conclusion
According to the model outputs, the probability of lightning occurring in Rasht city is very high. The models show that the probability of lightning in April follows the same trend, but lightning peaks in spring (May and June) because of unstable weather conditions, and in summer (July) it is higher than in autumn and winter. Occurrence decreases from spring to winter and reaches its minimum in winter. Comparing the CHAID tree with the Support Vector Machine model, the Support Vector Machine model was identified as the optimal model for predicting future lightning by a slight margin in the utility indices (accuracy = 0.773, precision = 0.855, harmonic mean = 0.813, root mean square error = 0.475, false negative rate = 0.198). Given its reliable outputs, with maximum accuracy and precision and the lowest prediction error, the Support Vector Machine model performs well and can be used to forecast the probability of lightning occurrence in Rasht city. According to the models, the parameters affecting lightning occurrence, in order of importance, are the previous day's lightning, temperature, air pressure, relative humidity and cloudiness; the other parameters are less important.
Using data mining techniques to predict the probability of future lightning occurrence with the Support Vector Machine model, the most accurate and precise of the models evaluated, supports more accurate meteorological forecasting and more effective action to reduce future damage.
Machine summary:
The final results showed that the prediction model has acceptable accuracy. However, according to the statistical analyses, improving the prediction of lightning and thunderstorms and reducing the error rate of the prediction model depend on the number and influence of the variables entered into the model and on more observations in regions with a higher lightning frequency; with more variables, the prediction can be made separately for different seasons and with greater accuracy (Rajeevan et al., 2012). In this regard, to analyze the trend of lightning occurrence, meteorological data from the seventeen synoptic stations with the longest records and two nonparametric tests were used. The results showed that the western half of Iran is not a homogeneous region in terms of the number of days with lightning, that the annual number of occurrences decreases from north to south, and that there is an increasing trend at the seasonal and annual scales (Rasouli and Javan, 1391). In addition, using thunderstorm-related codes, the frequency and the temporal and spatial distribution of thunderstorms in northwestern Iran from 1951 to 2002 were analyzed, and spring and summer were found to have the maximum thunderstorm precipitation (Rasouli et al., 1386).
Over time, studies conducted worldwide to predict the probability of lightning occurrence have all been based on using and evaluating non-hybrid and hybrid models to increase prediction accuracy, taking into account the influence of climatic and geographic parameters on lightning occurrence. Domestic studies, by contrast, are based only on thunderstorms or on determining the frequency or distribution of lightning, and little attention has been paid to predicting the probability of lightning occurrence or to applying data mining techniques to prediction.