Word Embedding in FinTech Literature

Project type

Personal Project

Date

Jul-Sep 2023

Location

Nottingham, UK

Skills

- Python (sklearn, NLTK, gensim, matplotlib): PDF preprocessing, NLP modelling, and data analysis
- Microsoft Excel: generating plots
- Microsoft PowerPoint: presentation
- draw.io: generating plots

This is the summer project for the MSc dissertation
"Comparative Analysis of Word Embedding Methods for Citation Sentence Matching in FinTech Literature".

- Aim: finding the best text embedding method for fintech literature
- Data collection: collecting 3,500 fintech-related scientific articles
- Data structuring: converting PDFs to plain texts and matching citation sentences to the reference article
- Data preprocessing: handling special characters (e.g. ligatures), tokenising, removing stop words, and lemmatising to construct a custom fintech literature dataset
- Evaluation: comparing 8 text embedding methods, using the similarity each one assigns between a citation sentence and its reference article as the measure of its performance
- Methods: TF-IDF, LSA, word2vec, GloVe, FastText, ELMo, USE, and BERT
- Output: a 10,000-word dissertation and a 15-minute presentation
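
The preprocessing and similarity steps listed above can be illustrated with a minimal sketch for the simplest of the eight methods, TF-IDF. This is not the project's actual code: the function names `preprocess` and `rank_references` and the toy sentences are hypothetical, scikit-learn's built-in English stop-word list stands in for NLTK's, and lemmatisation is omitted to keep the example free of extra downloads.

```python
import re
import unicodedata

from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def preprocess(text):
    """Normalise ligatures, lowercase, tokenise, and drop stop words."""
    # NFKC maps compatibility characters such as the "ﬁ" ligature to "fi".
    text = unicodedata.normalize("NFKC", text).lower()
    # Simple tokenisation: keep runs of ASCII letters only.
    tokens = re.findall(r"[a-z]+", text)
    return " ".join(t for t in tokens if t not in ENGLISH_STOP_WORDS)


def rank_references(citation_sentence, reference_texts):
    """Return reference indices sorted from most to least similar."""
    docs = [preprocess(citation_sentence)]
    docs += [preprocess(r) for r in reference_texts]
    tfidf = TfidfVectorizer().fit_transform(docs)
    # Cosine similarity between the citation (row 0) and every reference.
    scores = cosine_similarity(tfidf[0], tfidf[1:]).ravel()
    return sorted(range(len(reference_texts)),
                  key=lambda i: scores[i], reverse=True)


# Hypothetical toy data, not taken from the project's dataset.
citation = "Mobile payments improve ﬁnancial inclusion in emerging markets."
references = [
    "We study how mobile payments affect financial inclusion across emerging markets.",
    "This paper surveys blockchain consensus protocols and their energy use.",
]
print(rank_references(citation, references))
```

An embedding method performs well under this kind of evaluation when the true reference article ranks at or near the top for its citation sentence; swapping the vectoriser for another embedding model leaves the ranking logic unchanged.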