Word Embedding in FinTech Literature

Project type

Personal Project

Date

Jul-Sep 2023

Location

Nottingham, UK

Skills

- Python (sklearn, NLTK, gensim, matplotlib): main code for PDF preprocessing, NLP modelling, and data analysis
- Microsoft Excel: generating plots
- Microsoft PowerPoint: presentation
- draw.io: generating plots

This is a summer project for the MSc dissertation,
"Comparative Analysis of Word Embedding Methods for Citation Sentence Matching in FinTech Literature"

- Aim: finding the best text embedding method for fintech literature
- Data collection: collecting 3,500 fintech-related scientific articles
- Data structuring: converting PDFs to plain texts and matching citation sentences to the reference article
- Data preprocessing: handling special characters (e.g. ligatures), tokenising, removing stop words, and lemmatising to construct our own fintech literature dataset
- Evaluation: comparing 8 text embedding methods by how well each scores the similarity between a citation sentence and its reference article
- Methods compared: TF-IDF, LSA, word2vec, GloVe, FastText, ELMo, USE, and BERT
- Deliverables: writing a 10,000-word dissertation and giving a 15-minute presentation
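
To make the preprocessing and matching steps concrete, here is a minimal sketch of the simplest of the eight methods, TF-IDF with cosine similarity. It is an illustration only, not the dissertation code: it uses only the Python standard library rather than the sklearn/NLTK/gensim stack listed under Skills, it skips lemmatisation, and the stop-word list, the function names (preprocess, tfidf_vectors, cosine, rank_references) and the example sentences are all hypothetical.

```python
import math
import re
import unicodedata
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the list used in the project).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "for", "is", "are", "on", "with"}


def preprocess(text):
    """Normalise ligatures, lowercase, tokenise, and drop stop words."""
    # NFKC maps compatibility ligatures such as U+FB01 ("fi" ligature) to plain letters.
    text = unicodedata.normalize("NFKC", text).lower()
    tokens = re.findall(r"[a-z]+", text)
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per tokenised document."""
    n = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF, so terms present in every document keep a non-zero weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_references(citation, references):
    """Rank candidate reference texts by similarity to a citation sentence."""
    docs = [preprocess(citation)] + [preprocess(r) for r in references]
    vectors = tfidf_vectors(docs)
    query, candidates = vectors[0], vectors[1:]
    scores = [cosine(query, c) for c in candidates]
    ranking = sorted(range(len(references)), key=lambda i: scores[i], reverse=True)
    return ranking, scores


# Hypothetical example: the citation sentence contains a ligature (U+FB01).
citation = "Mobile payment adoption improves \ufb01nancial inclusion."
references = [
    "We study how mobile payment adoption affects financial inclusion in emerging markets.",
    "Blockchain consensus protocols and their energy consumption.",
]
ranking, scores = rank_references(citation, references)
print(ranking, scores)
```

In this sketch a method performs well when the true reference article is ranked first for its citation sentence. The other seven methods would replace tfidf_vectors with their own embeddings while keeping the same similarity comparison.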