Abstract:
Due to the huge amount of data published on the Web, web search has become more difficult, and it is sometimes hard to get the expected results, especially when users are uncertain about their information needs. Several efforts have been made to support exploratory search on the web by using query expansion, faceted search, or supplementary information extracted from external knowledge resources. However, these solutions have not been well explored for general web search in an open-domain setting. In addition, they mostly focus on supporting search over content expressed in English and other Latin-script languages. In this research, we propose a fully automated approach that aims to support exploratory search over Arabic web content. It exploits the Arabic version of Wikipedia to extract complementary information that supports visual representation and deeper exploration of the search engine's results. Key Wikipedia entities are extracted from the text snippets produced by the search engine in response to the user's query. Entities are then filtered and ranked by using a novel ranking algorithm that extends the conventional PageRank algorithm. Finally, a graph is built and presented to the user to visually represent highly ranked topics and their relationships. The proposed approach was realized by developing ArabXplore, a system that integrates with the web browser to support the web search process by executing our approach at query time. It was evaluated on a dataset of 100 Arabic search queries covering different domains, and the results were rated by human subjects. The underlying ranking algorithm was also compared with the conventional PageRank.
Machine-generated summary:
Many approaches have been proposed to support exploratory search on the web. Several use structured knowledge resources such as Linked Open Data (LOD) (Jacksi, Dimililer, & Zeebaree, 2015; Marie & Gandon, 2014) or ontologies (Dimitrova, Lau, Thakker, Yang-Turner, & Despotakis, 2013; Tvarožek, 2011) to enable the user to explore the topic of interest in depth.
Jiang & Li, 2016; Raza, Mokhtar, & Ahmad, 2018; Zhou, Wu, Zhao, Lawless, & Liu, 2017), and 3) knowledge base techniques, which exploit external knowledge sources to extract terms related to the query terms (Agarwalla, Parikh, & Sai, 2018; Jabri, Dahbi, Gadi, & Bassir, 2018; Xiong & Callan, 2015).
Commonly used knowledge sources include Wikipedia, Linked Open Data (Dahir, Khalifi, & El Qadi, 2019; Raza, Mokhtar, Ahmad, Pasha, & Pasha, 2019), WordNet (Abbache, Meziane, Belalem, & Belkredim, 2018; Lu, Sun, Wang, Lo, & Duan, 2015), and domain ontologies (Alromima, Moawad, Elgohary, & Aref, 2016; Raza, Mokhtar, Noraziah, et al., 2018; Yunzhi, Huijuan, Shapiro, Travillian, & Lanjuan, 2016).
Faceting has been applied in many domain-specific systems, including e-commerce (Jumlesha, Sree, Likitha, & Goud, 2018; Vandic, Aanen, Frasincar, & Kaymak, 2017) and digital libraries (Aletras, Baldwin, Lau, & Stevenson, 2014; Gaona-García, Martin-Moncunill, & Montenegro-Marin, 2017), to enable users to navigate a multi-dimensional information space.
Text snippets in the search results are mapped to Wikipedia entities, which are then filtered and ranked to retain the entities that are most relevant to the user's query.
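
To make the ranking step concrete, the following minimal sketch runs the conventional PageRank algorithm, the baseline the abstract says the proposed method extends, over a small graph of entities. It does not reproduce the authors' extended algorithm, whose details are not given here. The function name `pagerank`, the damping factor 0.85, the iteration count, and the example entity links are all assumptions chosen for illustration.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=50):
    """Conventional PageRank by power iteration over a directed graph.

    `edges` is a list of (source, target) pairs. This is the textbook
    baseline only, not the extended ranking algorithm of the paper.
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread uniformly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1.0 - damping + damping * dangling) / n for node in nodes
        }
        # Each node passes an equal share of its rank to its targets.
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical links between entities found in search snippets.
edges = [
    ("Cairo", "Egypt"),
    ("Nile", "Egypt"),
    ("Egypt", "Nile"),
    ("Pyramids", "Egypt"),
    ("Pyramids", "Cairo"),
]
ranks = pagerank(edges)
top = sorted(ranks, key=ranks.get, reverse=True)
print(top)
```

In this toy graph the entity with the most incoming links ranks first. A query-aware extension would presumably also weigh each entity's relevance to the query, but how the paper does so is not stated in this summary.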