
Cited 22 times in Web of Science; cited 33 times in Scopus

TechWord: Development of a technology lexical database for structuring textual technology information based on natural language processing

Authors
Jang, Hyejin; Jeong, Yujin; Yoon, Byungun
Issue Date
Feb-2021
Publisher
PERGAMON-ELSEVIER SCIENCE LTD
Keywords
Patent mining; Natural language processing; Text mining; Lexical analysis; WordNet
Citation
EXPERT SYSTEMS WITH APPLICATIONS, v.164
Indexed
SCIE
SCOPUS
Journal Title
EXPERT SYSTEMS WITH APPLICATIONS
Volume
164
URI
https://scholarworks.dongguk.edu/handle/sw.dongguk/19530
DOI
10.1016/j.eswa.2020.114042
ISSN
0957-4174
1873-6793
Abstract
Text mining of technological documents such as patents plays an important role in technology intelligence research, which supports technology R&D planning. WordNet, an English lexical database, is widely used for text pre-processing tasks such as word lemmatization and synonym search. However, technological vocabulary is complex and domain-specific, and WordNet reflects technological features only to a limited extent, which restricts its usefulness for analyzing technological information. Thus, to improve text mining performance on technological information, this study proposes a methodology for designing a TechWord-based lexical database built on the lexical characteristics that distinguish technological words from general words. To do this, we define TechWord, a unit of technology lexical information, and construct TechSynset, a synonym set between TechWords. First, dependency parsing between words is used to structure each TechWord, a unit word that describes a technology, and to identify nouns and verbs. The importance of connectivity is then assessed by analyzing network centrality indices computed from the dependency relations between words. Subsequently, to find synonyms suited to the target technology domain, a TechSynset is constructed from synset information, with an additional analysis that calculates cosine similarity between word embedding vectors. To apply the proposed methodology to actual technology-related information, we collect patent data in the automotive field and present the resulting TechWords and TechSynsets. This study improves text mining of technological information by structuring the word-to-word link information in technological documents through an automated process.
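As a rough illustration of two steps the abstract names (centrality on dependency relations, and cosine-similarity filtering of synonym candidates), the following minimal sketch may help. It is not the authors' implementation. The dependency edges, the toy embedding vectors, the function names, the 0.8 threshold, and the choice of degree centrality are all assumptions made here for illustration; the abstract does not specify which centrality index or threshold the study uses.

```python
import math
from collections import defaultdict

# Hypothetical (head, dependent) pairs, as a dependency parser might emit
# for a phrase like "the controller adjusts the brake pressure".
EDGES = [
    ("adjusts", "controller"),
    ("adjusts", "pressure"),
    ("pressure", "brake"),
]

# Made-up 3-dimensional embeddings; real word vectors would come from a
# trained embedding model.
VECTORS = {
    "brake": [0.9, 0.1, 0.0],
    "decelerator": [0.8, 0.2, 0.1],
    "pause": [0.1, 0.9, 0.3],
}


def degree_centrality(edges):
    """Normalized degree centrality on the undirected dependency graph.

    Degree centrality is one possible index, chosen here as an assumption.
    """
    neighbors = defaultdict(set)
    for head, dep in edges:
        neighbors[head].add(dep)
        neighbors[dep].add(head)
    denom = max(len(neighbors) - 1, 1)  # avoid division by zero
    return {word: len(adj) / denom for word, adj in neighbors.items()}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def filter_synonyms(word, candidates, vectors, threshold=0.8):
    """Keep candidates whose embedding is close to the target word's.

    The candidates stand in for synset members from a general lexical
    database; the threshold is an arbitrary illustrative value.
    """
    return [c for c in candidates
            if cosine(vectors[word], vectors[c]) >= threshold]


if __name__ == "__main__":
    print(degree_centrality(EDGES))
    print(filter_synonyms("brake", ["decelerator", "pause"], VECTORS))
```

In this toy setup, a general-purpose candidate such as "pause" is dropped for "brake" because its vector points in a different direction, which is the kind of domain-specific filtering the abstract describes.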
Appears in Collections
College of Engineering > Department of Industrial and Systems Engineering > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Yoon, Byung Un
College of Engineering (Department of Industrial and Systems Engineering)