Character n-gram application for automatic new topic identification

Please use this identifier to cite or link to this item: http://hdl.handle.net/11452/33097

Title:	Character n-gram application for automatic new topic identification
Authors:	Uludağ University / Faculty of Engineering / Department of Industrial Engineering. ORCID: 0000-0003-0159-8529. Çağlar, Burcu Gençosman; Özmutlu, Hüseyin Cenk; Özmutlu, Seda. Researcher IDs: AAH-4480-2021; ABH-5209-2020; AAG-8600-2021. Scopus author IDs: 56263661900; 6603061328; 6603660605
Keywords:	Content-ignorant algorithms; The Levenshtein edit-distance; New topic identification; The character n-gram method; Pre-processed spelling correction methods; Neural-network applications; Web; Categorization; Computer science; Information science & library science; Behavioral research; Search engines; Errors; Internet; Edit distance; Topic identification; Internet-based applications; Spelling correction; Minimizing the number of; Search engine performance; N-gram methods; Network methodologies; Algorithms
Issue Date:	26-Jun-2014
Publisher:	Elsevier
Citation:	Çağlar, B. G. et al. (2014). "Character n-gram application for automatic new topic identification". Information Processing and Management, 50(6), 821-856.
Abstract:	The widespread availability of the Internet and the variety of Internet-based applications have resulted in a significant increase in the number of web pages. Determining the behaviors of search engine users has become a critical step in enhancing search engine performance. Search engine user behaviors can be determined by content-based or content-ignorant algorithms. Although many content-ignorant studies have been performed to automatically identify new topics, previous results have demonstrated that spelling errors can cause significant errors in topic shift estimates. In this study, we focused on minimizing the number of wrong estimates that were based on spelling errors. We developed a new hybrid algorithm combining character n-gram and neural network methodologies, and compared the experimental results with results from previous studies. For the FAST and Excite datasets, the proposed algorithm improved topic shift estimates by 6.987% and 2.639%, respectively. Moreover, we analyzed the performance of the character n-gram method from several perspectives, including a comparison with the Levenshtein edit-distance method. The experimental results demonstrated that the character n-gram method outperformed the Levenshtein edit-distance method in terms of topic identification.
URI:	https://doi.org/10.1016/j.ipm.2014.06.005 https://www.sciencedirect.com/science/article/pii/S0306457314000521 http://hdl.handle.net/11452/33097
ISSN:	0306-4573 1873-5371
Appears in Collections:	Scopus Web of Science
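
The abstract compares a character n-gram method with the Levenshtein edit distance as ways to keep misspelled queries from being mistaken for topic shifts. The record does not describe the authors' algorithm, so the following is only a minimal sketch of the two general techniques, not the paper's method. The function names, the bigram size (n=2), the space padding, and the Dice coefficient are all assumptions chosen for illustration.

```python
def char_ngrams(text, n=2):
    """Return the set of character n-grams of a lowercased, space-padded string.

    The padding and the default n=2 are illustrative assumptions.
    """
    padded = f" {text.lower().strip()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def dice_similarity(a, b, n=2):
    """Dice coefficient over character n-gram sets (1.0 means identical sets)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def levenshtein(a, b):
    """Edit distance via dynamic programming (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


if __name__ == "__main__":
    # Hypothetical consecutive queries: a misspelling versus a real topic change.
    for first, second in [("search engine", "serach engine"),
                          ("search engine", "weather forecast")]:
        print(first, "|", second,
              "| dice:", round(dice_similarity(first, second), 3),
              "| edit distance:", levenshtein(first, second))
```

Under these assumptions, a misspelled repeat of a query shares most of its bigrams with the original and scores far higher than an unrelated query, which is the kind of signal a topic-shift estimator could use. How such a score would feed into a neural network is not specified in this record and is not attempted here.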

Files in This Item:

There are no files associated with this item.


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

Bursa Uludağ University Open Access System

The open access system that hosts the research outputs of Bursa Uludağ University.