Character n-gram application for automatic new topic identification

To cite or link to this item, use this identifier: http://hdl.handle.net/11452/33097

Full metadata record

Dublin Core Field	Value	Language
dc.date.accessioned	2023-06-21T10:13:19Z	-
dc.date.available	2023-06-21T10:13:19Z	-
dc.date.issued	2014-06-26	-
dc.identifier.citation	Çağlar, B. G. et al. (2014). "Character n-gram application for automatic new topic identification". Information Processing and Management, 50(6), 821-856.	en_US
dc.identifier.issn	0306-4573	-
dc.identifier.issn	1873-5371	-
dc.identifier.uri	https://doi.org/10.1016/j.ipm.2014.06.005	-
dc.identifier.uri	https://www.sciencedirect.com/science/article/pii/S0306457314000521	-
dc.identifier.uri	http://hdl.handle.net/11452/33097	-
dc.description.abstract	The widespread availability of the Internet and the variety of Internet-based applications have resulted in a significant increase in the number of web pages. Determining the behaviors of search engine users has become a critical step in enhancing search engine performance. Search engine user behaviors can be determined by content-based or content-ignorant algorithms. Although many content-ignorant studies have been performed to automatically identify new topics, previous results have demonstrated that spelling errors can cause significant errors in topic shift estimates. In this study, we focused on minimizing the number of wrong estimates that were based on spelling errors. We developed a new hybrid algorithm combining character n-gram and neural network methodologies, and compared the experimental results with results from previous studies. For the FAST and Excite datasets, the proposed algorithm improved topic shift estimates by 6.987% and 2.639%, respectively. Moreover, we analyzed the performance of the character n-gram method from several angles, including a comparison with the Levenshtein edit-distance method. The experimental results demonstrated that the character n-gram method outperformed the Levenshtein edit-distance method in terms of topic identification.	en_US
dc.language.iso	en	en_US
dc.publisher	Elsevier	en_US
dc.rights	info:eu-repo/semantics/closedAccess	en_US
dc.subject	Content-ignorant algorithms	en_US
dc.subject	The Levenshtein edit-distance	en_US
dc.subject	New topic identification	en_US
dc.subject	The character n-gram method	en_US
dc.subject	Pre-processed spelling correction methods	en_US
dc.subject	Neural-network applications	en_US
dc.subject	Web	en_US
dc.subject	Categorization	en_US
dc.subject	Computer science	en_US
dc.subject	Information science & library science	en_US
dc.subject	Behavioral research	en_US
dc.subject	Search engines	en_US
dc.subject	Errors	en_US
dc.subject	Internet	en_US
dc.subject	Edit distance	en_US
dc.subject	Topic identification	en_US
dc.subject	Internet-based applications	en_US
dc.subject	Spelling correction	en_US
dc.subject	Minimizing the number of	en_US
dc.subject	Search engine performance	en_US
dc.subject	N-gram methods	en_US
dc.subject	Network methodologies	en_US
dc.subject	Algorithms	en_US
dc.title	Character n-gram application for automatic new topic identification	en_US
dc.type	Article	en_US
dc.identifier.wos	000342546900001	tr_TR
dc.identifier.scopus	2-s2.0-84905168720	tr_TR
dc.relation.publicationcategory	Article - International Refereed Journal	tr_TR
dc.contributor.department	Uludağ University/Faculty of Engineering/Department of Industrial Engineering.	tr_TR
dc.contributor.orcid	0000-0003-0159-8529	tr_TR
dc.identifier.startpage	821	tr_TR
dc.identifier.endpage	856	tr_TR
dc.identifier.volume	50	tr_TR
dc.identifier.issue	6	tr_TR
dc.relation.journal	Information Processing and Management	en_US
dc.contributor.buuauthor	Çağlar, Burcu Gençosman	-
dc.contributor.buuauthor	Özmutlu, Hüseyin Cenk	-
dc.contributor.buuauthor	Özmutlu, Seda	-
dc.contributor.researcherid	AAH-4480-2021	tr_TR
dc.contributor.researcherid	ABH-5209-2020	tr_TR
dc.contributor.researcherid	AAG-8600-2021	tr_TR
dc.subject.wos	Computer science, information systems	en_US
dc.subject.wos	Information science & library science	en_US
dc.indexed.wos	SCIE	en_US
dc.indexed.wos	SSCI	en_US
dc.indexed.scopus	Scopus	en_US
dc.wos.quartile	Q2	en_US
dc.contributor.scopusid	56263661900	tr_TR
dc.contributor.scopusid	6603061328	tr_TR
dc.contributor.scopusid	6603660605	tr_TR
dc.subject.scopus	Query Reformulation; Image Indexing; Information Retrieval	en_US
Appears in Collections:	Scopus Web of Science
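
The abstract contrasts character n-gram matching with Levenshtein edit distance as ways to keep misspelled query terms from being read as topic shifts. The record does not give the paper's exact formulation, so the following is only a minimal sketch under stated assumptions: similarity is taken to be the Dice coefficient over character bigrams, and the function names (char_ngrams, ngram_similarity, levenshtein) are hypothetical and chosen here for illustration. The neural network component of the hybrid algorithm is not shown.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 2) -> Counter:
    """Return the multiset of character n-grams in `text` (lowercased)."""
    s = text.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient over character n-gram multisets, in [0, 1].

    Assumption: the paper's actual similarity measure may differ.
    """
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    total = sum(ga.values()) + sum(gb.values())
    if total == 0:
        # Both strings are shorter than n; fall back to exact comparison.
        return 1.0 if a.lower() == b.lower() else 0.0
    overlap = sum((ga & gb).values())  # multiset intersection
    return 2 * overlap / total


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # A misspelled query term compared with its intended form.
    print(ngram_similarity("search", "serach"))
    print(levenshtein("search", "serach"))
```

In a content-ignorant setting, a score like this could be thresholded to decide whether two consecutive queries share a term despite a typo. Any such threshold would be a tuning choice, not a value taken from this record.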

Files in this item:

There are no files associated with this item.

All items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

Bursa Uludağ University Open Access System

The open access system that hosts the research output of Bursa Uludağ University.