Please use this identifier to cite or link to this item:
http://hdl.handle.net/11452/33097
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.date.accessioned | 2023-06-21T10:13:19Z | - |
dc.date.available | 2023-06-21T10:13:19Z | - |
dc.date.issued | 2014-06-26 | - |
dc.identifier.citation | Çağlar, B. G. et al. (2014). "Character n-gram application for automatic new topic identification". Information Processing and Management, 50(6), 821-856. | en_US |
dc.identifier.issn | 0306-4573 | - |
dc.identifier.issn | 1873-5371 | - |
dc.identifier.uri | https://doi.org/10.1016/j.ipm.2014.06.005 | - |
dc.identifier.uri | https://www.sciencedirect.com/science/article/pii/S0306457314000521 | - |
dc.identifier.uri | http://hdl.handle.net/11452/33097 | - |
dc.description.abstract | The widespread availability of the Internet and the variety of Internet-based applications have resulted in a significant increase in the number of web pages. Determining the behaviors of search engine users has become a critical step in enhancing search engine performance. Search engine user behaviors can be determined by content-based or content-ignorant algorithms. Although many content-ignorant studies have been performed to automatically identify new topics, previous results have demonstrated that spelling errors can cause significant errors in topic shift estimates. In this study, we focused on minimizing the number of wrong estimates that were based on spelling errors. We developed a new hybrid algorithm combining character n-gram and neural network methodologies, and compared the experimental results with results from previous studies. For the FAST and Excite datasets, the proposed algorithm improved topic shift estimates by 6.987% and 2.639%, respectively. Moreover, we analyzed the performance of the character n-gram method from several perspectives, including a comparison with the Levenshtein edit-distance method. The experimental results demonstrated that the character n-gram method outperformed the Levenshtein edit-distance method in terms of topic identification. | en_US |
dc.language.iso | en | en_US |
dc.publisher | Elsevier | en_US |
dc.rights | info:eu-repo/semantics/closedAccess | en_US |
dc.subject | Content-ignorant algorithms | en_US |
dc.subject | The Levenshtein edit-distance | en_US |
dc.subject | New topic identification | en_US |
dc.subject | The character n-gram method | en_US |
dc.subject | Pre-processed spelling correction methods | en_US |
dc.subject | Neural-network applications | en_US |
dc.subject | Web | en_US |
dc.subject | Categorization | en_US |
dc.subject | Computer science | en_US |
dc.subject | Information science & library science | en_US |
dc.subject | Behavioral research | en_US |
dc.subject | Search engines | en_US |
dc.subject | Errors | en_US |
dc.subject | Internet | en_US |
dc.subject | Edit distance | en_US |
dc.subject | Topic identification | en_US |
dc.subject | Internet-based applications | en_US |
dc.subject | Spelling correction | en_US |
dc.subject | Minimizing the number of | en_US |
dc.subject | Search engine performance | en_US |
dc.subject | N-gram methods | en_US |
dc.subject | Network methodologies | en_US |
dc.subject | Algorithms | en_US |
dc.title | Character n-gram application for automatic new topic identification | en_US |
dc.type | Article | en_US |
dc.identifier.wos | 000342546900001 | tr_TR |
dc.identifier.scopus | 2-s2.0-84905168720 | tr_TR |
dc.relation.publicationcategory | Article - International Refereed Journal | tr_TR |
dc.contributor.department | Uludağ University/Faculty of Engineering/Department of Industrial Engineering. | tr_TR |
dc.contributor.orcid | 0000-0003-0159-8529 | tr_TR |
dc.identifier.startpage | 821 | tr_TR |
dc.identifier.endpage | 856 | tr_TR |
dc.identifier.volume | 50 | tr_TR |
dc.identifier.issue | 6 | tr_TR |
dc.relation.journal | Information Processing and Management | en_US |
dc.contributor.buuauthor | Çağlar, Burcu Gençosman | - |
dc.contributor.buuauthor | Özmutlu, Hüseyin Cenk | - |
dc.contributor.buuauthor | Özmutlu, Seda | - |
dc.contributor.researcherid | AAH-4480-2021 | tr_TR |
dc.contributor.researcherid | ABH-5209-2020 | tr_TR |
dc.contributor.researcherid | AAG-8600-2021 | tr_TR |
dc.subject.wos | Computer science, information systems | en_US |
dc.subject.wos | Information science & library science | en_US |
dc.indexed.wos | SCIE | en_US |
dc.indexed.wos | SSCI | en_US |
dc.indexed.scopus | Scopus | en_US |
dc.wos.quartile | Q2 | en_US |
dc.contributor.scopusid | 56263661900 | tr_TR |
dc.contributor.scopusid | 6603061328 | tr_TR |
dc.contributor.scopusid | 6603660605 | tr_TR |
dc.subject.scopus | Query Reformulation; Image Indexing; Information Retrieval | en_US |
Appears in Collections: | Scopus; Web of Science |