Yi Yang, R. Harald Baayen
2025
Language and Cognition Vol. 17
10.1017/langcog.2024.47
Abstract
This paper presents a cross-language study of lexical semantics within the framework of distributional semantics. We used a wide range of predefined semantic categories in Mandarin and English and compared the clusterings of these categories using FastText word embeddings. Three techniques of dimensionality reduction were applied to mapping 300-dimensional FastText vectors into two-dimensional planes: multidimensional scaling, principal components analysis, and t-distributed stochastic ...
This paper explores the semantic structures of Mandarin and English lexicons using distributional semantics. It investigates similarities and differences in how semantic categories cluster in both languages, utilizing FastText word embeddings and dimensionality reduction techniques.
The study used FastText word embeddings for Mandarin and English, applying multidimensional scaling, principal components analysis, and t-distributed stochastic neighbor embedding for dimensionality reduction. Procrustes analysis was used to align and compare the semantic spaces.
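The pipeline described above (reduce high-dimensional vectors to a plane, then align two planes with Procrustes analysis) can be illustrated with a minimal sketch. It is not the authors' implementation: random vectors stand in for the 300-dimensional FastText embeddings, only the principal components step is shown (multidimensional scaling and t-SNE are omitted), and the function names `pca_2d` and `procrustes_align` are hypothetical, chosen for this example.

```python
import numpy as np


def pca_2d(vectors):
    """Project row vectors onto their first two principal components."""
    centered = vectors - vectors.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


def procrustes_align(reference, target):
    """Align `target` to `reference` by translation, scaling, and an
    orthogonal transformation (rotation or reflection).

    Returns the standardized reference, the aligned target, and the
    disparity (sum of squared differences after alignment).
    """
    # Remove translation and overall size from both configurations.
    ref = reference - reference.mean(axis=0)
    tgt = target - target.mean(axis=0)
    ref = ref / np.linalg.norm(ref)
    tgt = tgt / np.linalg.norm(tgt)
    # Orthogonal Procrustes solution from the SVD of the cross-product matrix.
    u, s, vt = np.linalg.svd(tgt.T @ ref)
    rotation = u @ vt
    scale = s.sum()
    aligned = scale * (tgt @ rotation)
    disparity = float(np.sum((ref - aligned) ** 2))
    return ref, aligned, disparity


# Assumption: synthetic data in place of real FastText vectors for 50 words.
rng = np.random.default_rng(0)
embeddings_a = rng.normal(size=(50, 300))
plane_a = pca_2d(embeddings_a)

# A second plane that is a rotated, rescaled, shifted copy of the first,
# plus a noisy variant standing in for a genuinely different semantic space.
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta), np.cos(theta)]])
plane_b = 2.5 * plane_a @ rot + np.array([3.0, -1.0])
plane_c = plane_b + rng.normal(scale=2.0, size=plane_b.shape)

_, _, disparity_same = procrustes_align(plane_a, plane_b)
_, _, disparity_noisy = procrustes_align(plane_a, plane_c)
print(f"disparity (same shape):  {disparity_same:.6f}")
print(f"disparity (noisy shape): {disparity_noisy:.6f}")
```

In this sketch a disparity near zero means the two planar configurations differ only by translation, scale, and rotation or reflection; larger values indicate structural differences that alignment cannot remove, which is the kind of difference a cross-language comparison would look for.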
Of the three techniques, t-SNE provided the clearest clustering of semantic categories. Both languages showed similar differentiation between nouns, verbs, and adjectives, as well as between concrete and abstract words. Procrustes analysis revealed subtle differences in the structure of the two semantic lexicons.