Görsel soru cevaplama probleminde bağlamsal vektörlerin performans analizi

Hakdağlı, Özlem

Please use this identifier to cite or link to this item: http://hdl.handle.net/11452/27070

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Bilgin, Metin	-
dc.contributor.author	Hakdağlı, Özlem	-
dc.date.accessioned	2022-06-13T06:09:43Z	-
dc.date.available	2022-06-13T06:09:43Z	-
dc.date.issued	2022-05-13	-
dc.identifier.citation	Hakdağlı, Ö. (2022). Görsel soru cevaplama probleminde bağlamsal vektörlerin performans analizi. Yayınlanmamış yüksek lisans tezi. Bursa Uludağ Üniversitesi Fen Bilimleri Enstitüsü.	tr_TR
dc.identifier.uri	http://hdl.handle.net/11452/27070	-
dc.description.abstract	Görsel soru cevaplama (GSC) çalışmaları, görsel imgeleri anlamlandırmanın yanında tutarlılık sağlamayı hedeflemektedir. GSC problemi, görsel bir imge ile bu imgeye sorulan soru arasındaki bağlantıyı ele almaktadır. Ele alınan bağlantının yorumlanması ve çözümlenmesi, sorulan soruya beklenen cevabın görsel içerisinden elde edilmesini sağlar. Çözümleme işlemini gerçekleştirmek için görsel imgelerin matematiksel düzlemde temsil edilmesi gereklidir. Bu temsiller vektör olarak adlandırılır. Görsel vektörlerin elde edilmesi aşamasında, bu çalışmada ImageNet verisi ile eğitilmiş olan Xception ve Inception-Resnet-V2 modelleri kullanılmıştır. Modeller, derin konvolüsyonel ağlar ve tekrarlayan katman yapısı sayesinde görsel veriden yüksek doğruluk ile vektör temsili elde etmektedir. Görsel vektör temsili, GSC problemi için tek başına yeterli değildir. Görsele sorulan sorunun da matematiksel düzlemde temsili gerekmektedir. Metinsel verilerin temsili, diğer adı ile kelime gömmeleri, ön eğitimli modeller olan Word2Vec, Kelime Temsili için Global Vektörler (Global Vectors for Word Representation, GloVe) ve FastText yöntemleri ile anlamsal bağlamdan bağımsız şekilde elde edilmektedir. Transformatörlerden Çift Yönlü Kodlayıcı Temsilleri (Bi-directional Encoder Representations from Transformers, BERT), inşa edilmiş olduğu çok başlı ilgi yapısı ile kelimelerin arasındaki alt bağlamı öğrenmekte ve temsil etmektedir. Bu çalışma ile sorulan sorunun anlamsal bütünlüğünü güçlendirmek için BERT bağlamsal vektörleri uyarlanmıştır. Çalışmanın sonuçları değerlendirildiğinde BERT yönteminin Word2Vec, GloVe ve FastText yöntemlerinden daha yüksek doğruluk oranlarına ulaştığı görülmüştür. Böylelikle, literatüre yeni girmiş olan BERT bağlamsal vektörleri yönteminin GSC problemindeki başarısı gösterilmiştir.	tr_TR
dc.description.abstract	Visual question answering (VQA) studies aim to make sense of visual images while also providing consistency. The VQA problem concerns the connection between an image and a question asked about that image. Interpreting and analysing this connection allows the expected answer to the question to be obtained from the image. To perform this analysis, images must be represented mathematically; these representations are called vectors. In this study, visual vectors were obtained with the Xception and Inception-Resnet-V2 models trained on ImageNet data. Thanks to their deep convolutional networks and residual layer structure, these models extract vector representations from visual data with high accuracy. A visual vector representation alone is not sufficient for the VQA problem: the question asked about the image also needs a mathematical representation. Representations of textual data, also known as word embeddings, can be obtained independently of semantic context with the pre-trained models Word2Vec, Global Vectors for Word Representation (GloVe) and FastText. Bi-directional Encoder Representations from Transformers (BERT), by contrast, learns and represents the sub-context between words through the multi-headed attention structure on which it is built. In this study, BERT contextual vectors were adapted to strengthen the semantic integrity of the question. The results show that the BERT method achieved higher accuracy rates than the Word2Vec, GloVe and FastText methods. This demonstrates the success of the BERT contextual vectors method, a recent addition to the literature, on the VQA problem.	en_US
dc.format.extent	VIII, 60 sayfa	tr_TR
dc.language.iso	tr	tr_TR
dc.publisher	Bursa Uludağ Üniversitesi	tr_TR
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.rights	Atıf 4.0 Uluslararası	tr_TR
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/	*
dc.subject	Görsel soru cevaplama	tr_TR
dc.subject	Derin öğrenme	tr_TR
dc.subject	Doğal dil işleme	tr_TR
dc.subject	Kelime gömmeleri	tr_TR
dc.subject	Bağlamsal kelime vektörleri	tr_TR
dc.subject	Visual question answering	en_US
dc.subject	Deep learning	en_US
dc.subject	Natural language processing	en_US
dc.subject	Word embedding	en_US
dc.subject	Contextual word vectors	en_US
dc.title	Görsel soru cevaplama probleminde bağlamsal vektörlerin performans analizi	tr_TR
dc.title.alternative	Performance analysis of contextual vectors in visual question answering problem	en_US
dc.type	masterThesis	en_US
dc.relation.publicationcategory	Tez	tr_TR
dc.contributor.department	Bursa Uludağ Üniversitesi/Fen Bilimleri Enstitüsü/Bilgisayar Mühendisliği Anabilim Dalı.	tr_TR
dc.contributor.orcid	0000-0002-3637-4309	tr_TR
Appears in Collections:	Fen Bilimleri Yüksek Lisans Tezleri / Master Degree

Files in This Item:

File	Description	Size	Format
Özlem_Hakdağlı.pdf		1.84 MB	Adobe PDF

This item is licensed under a Creative Commons License
