Please use this identifier to cite or link to this item: http://hdl.handle.net/11452/33097
Full metadata record
DC Field | Value | Language
dc.date.accessioned | 2023-06-21T10:13:19Z | -
dc.date.available | 2023-06-21T10:13:19Z | -
dc.date.issued | 2014-06-26 | -
dc.identifier.citation | Çağlar, B. G. et al. (2014). "Character n-gram application for automatic new topic identification". Information Processing and Management, 50(6), 821-856. | en_US
dc.identifier.issn | 0306-4573 | -
dc.identifier.issn | 1873-5371 | -
dc.identifier.uri | https://doi.org/10.1016/j.ipm.2014.06.005 | -
dc.identifier.uri | https://www.sciencedirect.com/science/article/pii/S0306457314000521 | -
dc.identifier.uri | http://hdl.handle.net/11452/33097 | -
dc.description.abstract | The widespread availability of the Internet and the variety of Internet-based applications have resulted in a significant increase in the number of web pages. Determining the behaviors of search engine users has become a critical step in enhancing search engine performance. Search engine user behaviors can be determined by content-based or content-ignorant algorithms. Although many content-ignorant studies have been performed to automatically identify new topics, previous results have demonstrated that spelling errors can cause significant errors in topic shift estimates. In this study, we focused on minimizing the number of incorrect estimates caused by spelling errors. We developed a new hybrid algorithm combining character n-gram and neural network methodologies, and compared the experimental results with results from previous studies. For the FAST and Excite datasets, the proposed algorithm improved topic shift estimates by 6.987% and 2.639%, respectively. Moreover, we analyzed the performance of the character n-gram method from several perspectives, including a comparison with the Levenshtein edit-distance method. The experimental results demonstrated that the character n-gram method outperformed the Levenshtein edit-distance method in terms of topic identification. | en_US
dc.language.iso | en | en_US
dc.publisher | Elsevier | en_US
dc.rights | info:eu-repo/semantics/closedAccess | en_US
dc.subject | Content-ignorant algorithms | en_US
dc.subject | The Levenshtein edit-distance | en_US
dc.subject | New topic identification | en_US
dc.subject | The character n-gram method | en_US
dc.subject | Pre-processed spelling correction methods | en_US
dc.subject | Neural-network applications | en_US
dc.subject | Web | en_US
dc.subject | Categorization | en_US
dc.subject | Computer science | en_US
dc.subject | Information science & library science | en_US
dc.subject | Behavioral research | en_US
dc.subject | Search engines | en_US
dc.subject | Errors | en_US
dc.subject | Internet | en_US
dc.subject | Edit distance | en_US
dc.subject | Topic identification | en_US
dc.subject | Internet-based applications | en_US
dc.subject | Spelling correction | en_US
dc.subject | Minimizing the number of | en_US
dc.subject | Search engine performance | en_US
dc.subject | N-gram methods | en_US
dc.subject | Network methodologies | en_US
dc.subject | Algorithms | en_US
dc.title | Character n-gram application for automatic new topic identification | en_US
dc.type | Article | en_US
dc.identifier.wos | 000342546900001 | tr_TR
dc.identifier.scopus | 2-s2.0-84905168720 | tr_TR
dc.relation.publicationcategory | Article - International Refereed Journal | tr_TR
dc.contributor.department | Uludağ University/Faculty of Engineering/Department of Industrial Engineering. | tr_TR
dc.contributor.orcid | 0000-0003-0159-8529 | tr_TR
dc.identifier.startpage | 821 | tr_TR
dc.identifier.endpage | 856 | tr_TR
dc.identifier.volume | 50 | tr_TR
dc.identifier.issue | 6 | tr_TR
dc.relation.journal | Information Processing and Management | en_US
dc.contributor.buuauthor | Çağlar, Burcu Gençosman | -
dc.contributor.buuauthor | Özmutlu, Hüseyin Cenk | -
dc.contributor.buuauthor | Özmutlu, Seda | -
dc.contributor.researcherid | AAH-4480-2021 | tr_TR
dc.contributor.researcherid | ABH-5209-2020 | tr_TR
dc.contributor.researcherid | AAG-8600-2021 | tr_TR
dc.subject.wos | Computer science, information systems | en_US
dc.subject.wos | Information science & library science | en_US
dc.indexed.wos | SCIE | en_US
dc.indexed.wos | SSCI | en_US
dc.indexed.scopus | Scopus | en_US
dc.wos.quartile | Q2 | en_US
dc.contributor.scopusid | 56263661900 | tr_TR
dc.contributor.scopusid | 6603061328 | tr_TR
dc.contributor.scopusid | 6603660605 | tr_TR
dc.subject.scopus | Query Reformulation; Image Indexing; Information Retrieval | en_US
Appears in Collections: Scopus
Web of Science

Files in This Item:
There are no files associated with this item.

