Vol.17 No.1&2 March 1, 2018
Research Articles
A Framework for Product Description Classification in E-commerce
(pp001-027)
Damir Vandic, Flavius Frasincar, and Uzay Kaymak
We propose the Hierarchical Product Classification (HPC) framework for
classifying products using a hierarchical product taxonomy. The framework
uses a classification system with multiple classification nodes, each
residing on a different level of the taxonomy. The innovative part of the
framework stems from the definition of classification recipes that can be
used to construct high-quality classifier nodes, using the product
descriptions in the best possible way. These classifier recipes are
specifically tailored for the e-commerce domain. Their use enables
flexible classifiers that adjust to the depth-specific characteristics of
product taxonomies. Furthermore, to gain insight into which components
are required to perform high-quality product classification, we evaluate
several feature selection methods and classification techniques in the
context of our framework. Based on 3000 product descriptions obtained
from Amazon.com, HPC achieves an overall accuracy of 76.80% for product
classification. Using 110 categories from CircuitCity.com and Amazon.com,
we obtain a precision of 93.61% for mapping the categories to the
taxonomy of shopping.com.
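To illustrate the general idea of one classifier node per taxonomy level, here is a minimal sketch of top-down hierarchical classification, assuming a plain bag-of-words nearest-centroid classifier at every node. The taxonomy, the training descriptions, and all function and class names are hypothetical; the HPC classifier recipes, feature selection methods, and classification techniques evaluated in the paper are not reproduced here.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lower-cased token counts for a product description."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class CentroidNode:
    """Hypothetical classifier node: chooses among the children of one taxonomy node."""

    def __init__(self):
        self.centroids = {}

    def fit(self, examples):
        # examples: (description, child_category) pairs; one summed count vector per child.
        for text, label in examples:
            self.centroids.setdefault(label, Counter()).update(bag_of_words(text))
        return self

    def predict(self, text):
        vec = bag_of_words(text)
        return max(self.centroids, key=lambda label: cosine(vec, self.centroids[label]))


def train_hierarchy(labelled):
    """labelled: (description, path) pairs, where path lists categories from root to leaf."""
    grouped = {}
    for text, path in labelled:
        for depth in range(len(path)):
            # The node identified by path[:depth] learns to choose path[depth].
            grouped.setdefault(path[:depth], []).append((text, path[depth]))
    return {parent: CentroidNode().fit(examples) for parent, examples in grouped.items()}


def classify(nodes, text):
    """Walk the taxonomy top-down, asking one classifier node per level."""
    path = ()
    while path in nodes:
        path = path + (nodes[path].predict(text),)
    return path


# Toy, invented training data for a two-level taxonomy.
train = [
    ("apple iphone smartphone 64gb", ("Electronics", "Phones")),
    ("samsung galaxy android smartphone", ("Electronics", "Phones")),
    ("sony 4k led television 55 inch", ("Electronics", "TVs")),
    ("lg oled television hdr", ("Electronics", "TVs")),
    ("cotton crew neck t-shirt", ("Clothing", "Shirts")),
    ("denim slim fit jeans", ("Clothing", "Pants")),
]
nodes = train_hierarchy(train)
print(classify(nodes, "android smartphone with 128gb"))  # ('Electronics', 'Phones')
```

Because each node only separates the children of its own parent, each level can in principle use a different feature set or classifier, which is the kind of depth-specific flexibility the abstract refers to.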
Text-Mining and Pattern-Matching based Prediction Models for Detecting Vulnerable Files in Web Applications
(pp028-044)
Mukesh Kumar Gupta, Mahesh Chandra Govil, and Girdhari Singh
The proliferation of technology has empowered web applications. At the
same time, the presence of Cross-Site Scripting (XSS) vulnerabilities in
web applications has become a major concern. Despite the many current
detection and prevention approaches, attackers continuously exploit XSS
vulnerabilities and cause significant harm to web users. In this paper,
we formulate the detection of XSS vulnerabilities as a classification
problem based on prediction models. A novel approach based on
text-mining and pattern-matching techniques is proposed to extract a set
of features from source code files. The extracted features are used to
build prediction models that can discriminate vulnerable code files from
benign ones. The efficiency of the developed models is evaluated on a
publicly available dataset of 9408 PHP source code files labeled as safe
or unsafe. The experimental results demonstrate the superiority of the
proposed approach over existing ones.
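As a rough illustration of turning source files into features for a prediction model, the sketch below counts a few pattern matches per PHP file and classifies with a nearest-centroid rule. Only a pattern-matching step is shown; the regular expressions, the three-feature vector, the classifier choice, and the toy training files are all assumptions for illustration, not the text-mining features or models evaluated in the paper.

```python
import math
import re

# Hypothetical PHP patterns: user-controlled sources, output sinks, and sanitizers.
PATTERNS = {
    "source": re.compile(r"\$_(?:GET|POST|REQUEST|COOKIE)\b"),
    "sink": re.compile(r"\b(?:echo|print|printf)\b"),
    "sanitizer": re.compile(r"\b(?:htmlspecialchars|htmlentities|strip_tags)\s*\("),
}


def extract_features(source_code):
    """Count matches of each pattern, returned as a fixed-order feature vector."""
    return [len(PATTERNS[name].findall(source_code)) for name in ("source", "sink", "sanitizer")]


def fit_centroids(samples):
    """samples: (source_code, label) pairs with label 'safe' or 'unsafe'."""
    sums, counts = {}, {}
    for code, label in samples:
        vec = extract_features(code)
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    # Mean feature vector per class.
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, source_code):
    """Assign the label of the nearest class centroid (Euclidean distance)."""
    vec = extract_features(source_code)
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


# Invented toy training files, not taken from the paper's dataset.
training = [
    ('<?php echo $_GET["name"]; ?>', "unsafe"),
    ('<?php print $_POST["q"]; ?>', "unsafe"),
    ('<?php echo htmlspecialchars($_GET["name"]); ?>', "safe"),
    ('<?php echo "Hello"; ?>', "safe"),
]
centroids = fit_centroids(training)
print(predict(centroids, '<?php echo $_REQUEST["id"]; ?>'))  # unsafe
```

A real model would need far richer features (for example, data-flow information and token statistics) and a stronger learner; the point here is only the file-to-feature-vector-to-label pipeline.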
A Quantitative Analysis of the Use of Microdata for Semantic Annotations on Educational Resources
(pp045-072)
Rosa Del Carmen Navarrete Rueda and Sergio Lujan
A current trend in the Semantic Web is the use of embedded markup formats
aimed at semantically enriching web content by making it more
understandable to search engines and other applications. The deployment
of Microdata as a markup format has increased thanks to the widespread
adoption of a controlled vocabulary provided by Schema.org. Recently,
Schema.org adopted a set of properties from the Learning Resource
Metadata Initiative (LRMI) specification, which describes educational
resources. These properties, together with the Schema.org properties
related to accessibility and to the license of resources, would enable
search engines to provide more relevant results when searching for
educational resources for all users, including users with disabilities.
To obtain a reliable evaluation of the use of Microdata properties
related to the LRMI specification, accessibility, and the license of
resources, this research conducted a quantitative analysis of the
deployment of these properties in large-scale web corpora covering two
consecutive years. The corpora contain hundreds of millions of web pages.
The results further our understanding of this deployment and highlight
the pending issues and challenges concerning the use of such properties.
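A quantitative analysis of this kind ultimately reduces to counting which Microdata properties appear on which pages. The following minimal sketch uses Python's standard `html.parser` to collect `itemprop` values and count how many pages use at least one property from each group. The property groups are an illustrative subset of Schema.org properties chosen for this sketch, not the exact list analysed in the paper, and the sample pages are invented.

```python
from collections import Counter
from html.parser import HTMLParser

# Illustrative subsets of Schema.org properties (an assumption, not the paper's list).
LRMI = {"learningResourceType", "educationalUse", "typicalAgeRange",
        "interactivityType", "timeRequired", "educationalAlignment"}
ACCESSIBILITY = {"accessibilityFeature", "accessibilityHazard", "accessMode"}
LICENSE = {"license"}


class ItempropCollector(HTMLParser):
    """Collects the values of every itemprop attribute on a page."""

    def __init__(self):
        super().__init__()
        self.props = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "itemprop" and value:
                # An itemprop attribute may list several space-separated property names.
                self.props.extend(value.split())


def count_properties(pages):
    """For each property group, count the pages that use at least one of its properties."""
    groups = {"lrmi": LRMI, "accessibility": ACCESSIBILITY, "license": LICENSE}
    page_counts = Counter()
    for html in pages:
        parser = ItempropCollector()
        parser.feed(html)
        parser.close()
        used = set(parser.props)
        for group, props in groups.items():
            if used & props:
                page_counts[group] += 1
    return page_counts


# Invented sample pages.
pages = [
    '<div itemscope itemtype="https://schema.org/CreativeWork">'
    '<span itemprop="learningResourceType">Lesson</span>'
    '<meta itemprop="typicalAgeRange" content="10-12">'
    '<link itemprop="license" href="https://creativecommons.org/licenses/by/4.0/">'
    '</div>',
    '<div itemscope itemtype="https://schema.org/Book">'
    '<span itemprop="name">Intro</span>'
    '<meta itemprop="accessMode" content="textual">'
    '<meta itemprop="license" content="https://creativecommons.org/licenses/by/4.0/">'
    '</div>',
    '<p>No microdata here.</p>',
]
print(count_properties(pages))
```

At the scale of web corpora with hundreds of millions of pages, the same per-page logic would typically run over pre-extracted structured data rather than raw HTML parsed on a single machine.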
Semantic Emotion-Topic Model Based Social Emotion Mining
(pp073-092)
Ruirong Xue, Xiangfeng Luo, Qichen Ma, and Shengwei Gu
With the rapid growth in the number of social media users, more and more
short texts with emotion labels appear, containing users' rich emotions
and opinions about social events or enterprise products. Social emotion
mining on social media corpora can help governments and enterprises make
decisions. Emotion mining models include statistical and graph-based
approaches. Of these, statistical approaches are more popular, e.g., the
Latent Dirichlet Allocation (LDA)-based Emotion Topic Model. However,
they suffer from low retrieval performance, such as poor accuracy and
poor interpretability, because they consider only the bag of words or
the emotion labels in a social media corpus. In this paper, we propose
an LDA-based Semantic Emotion-Topic Model (SETM) that combines emotion
labels and inter-word relations to enhance the retrieval performance of
social emotion mining. The influence of four factors on the performance
of SETM is considered, i.e., association relations, computing time,
topic number, and semantic interpretability. Experimental results show
that the accuracy of our proposed model is 0.750, compared with 0.606,
0.663, and 0.680 for the Emotion Topic Model (ETM), the Multi-label
Supervised Topic Model (MSTM), and the Sentiment Latent Topic Model
(SLTM), respectively. In addition, limiting word frequency reduces the
computing time of our model by 87.81%, with an accuracy of 0.703
compared with 0.501, 0.648, and 0.642 for the above baseline methods.
Thus, the proposed model has broad prospects in the social emotion
mining area.
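For readers unfamiliar with emotion-topic models, a common way for an LDA-style model in which emotions generate topics and topics generate words to score a new short text is P(e | d) ∝ P(e) ∏_w Σ_z P(z | e) P(w | z). The sketch below implements only that scoring step under this assumption, using invented toy parameters. It omits training entirely and does not model the inter-word relations that distinguish SETM.

```python
import math


def emotion_posterior(words, prior, topic_given_emotion, word_given_topic, smoothing=1e-6):
    """P(e | d) proportional to P(e) * prod_w sum_z P(z | e) * P(w | z), in log space.

    `smoothing` is an assumed floor probability for words unseen under a topic.
    """
    log_scores = {}
    for emotion, p_e in prior.items():
        log_p = math.log(p_e)
        for w in words:
            p_w = sum(p_z * word_given_topic[z].get(w, smoothing)
                      for z, p_z in topic_given_emotion[emotion].items())
            log_p += math.log(p_w)
        log_scores[emotion] = log_p
    # Normalise with the log-sum-exp trick to avoid underflow on longer texts.
    top = max(log_scores.values())
    total = sum(math.exp(s - top) for s in log_scores.values())
    return {e: math.exp(s - top) / total for e, s in log_scores.items()}


# Invented toy parameters that a trained model would normally supply.
prior = {"joy": 0.5, "anger": 0.5}
topic_given_emotion = {
    "joy": {"sports": 0.8, "prices": 0.2},
    "anger": {"sports": 0.1, "prices": 0.9},
}
word_given_topic = {
    "sports": {"win": 0.5, "team": 0.5},
    "prices": {"rise": 0.5, "tax": 0.5},
}
posterior = emotion_posterior(["tax", "rise"], prior, topic_given_emotion, word_given_topic)
print(posterior)  # "anger" receives most of the probability mass
```

Working in log space matters here because the product over words shrinks quickly even for short posts.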
Unsupervised Keyword Extraction from Microblog Posts via Hashtags
(pp093-120)
Lin Li, Jinghang Liu, Yueqing Sun, Guangdong Xu, Jingling Yuan, and Luo Zhong
Nowadays, huge amounts of text are being generated for social networking
purposes on the Web. Keyword extraction from such texts, like microblog
posts, benefits many applications such as advertising, search, and
content filtering. Unlike traditional web pages, a microblog post usually
has special social features, such as a hashtag, that are topical in
nature and generated by users. Extracting keywords related to hashtags
can reflect the intents of users and thus provides a better
understanding of post content. In this paper, we propose a novel
unsupervised keyword extraction approach for microblog posts that treats
hashtags as topical indicators. Our approach consists of two
hashtag-enhanced algorithms. One is a topic model algorithm that infers
topic distributions biased toward hashtags on a collection of microblog
posts. The words are ranked by their average topic probabilities. Our
topic model algorithm can not only find the topics of a collection but
also extract hashtag-related keywords. The other is a random-walk-based
algorithm. It first builds a weighted word-post graph by taking into
account the posts themselves. Then, a hashtag-biased random walk is
applied on this graph, which guides the algorithm to extract keywords
according to hashtag topics. Finally, the ranking score of a word is
determined by its stationary probability after a number of iterations.
We evaluate our proposed approach on a collection of real Chinese
microblog posts. Experiments show that our approach is more effective in
terms of precision than traditional approaches that do not consider
hashtags. Combining the two algorithms performs even better than either
algorithm alone.
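The random-walk step described above can be sketched as a random walk with restart on a bipartite word-post graph, where the restart mass goes only to hashtag words, so that words close to the hashtags in the graph accumulate more stationary probability. This is a minimal sketch, not the paper's algorithm: the term-frequency edge weights, the damping factor of 0.85, the uniform restart over hashtag words, and the fixed 50 iterations are all assumptions, and the toy posts are invented.

```python
from collections import Counter, defaultdict


def hashtag_biased_walk(posts, hashtags, damping=0.85, iterations=50):
    """Rank words by the stationary probability of a hashtag-biased random walk.

    posts: list of token lists; hashtags: set of hashtag words used as restart targets.
    damping and iterations are assumed values, not taken from the paper.
    """
    # Bipartite graph: word <-> post edges, weighted by term frequency (an assumption).
    edges = defaultdict(Counter)
    for i, tokens in enumerate(posts):
        post = ("post", i)
        for word, count in Counter(tokens).items():
            edges[("word", word)][post] += count
            edges[post][("word", word)] += count
    nodes = list(edges)
    targets = [n for n in nodes if n[0] == "word" and n[1] in hashtags]
    if not targets:
        raise ValueError("no hashtag word occurs in the posts")
    # Restart (teleport) distribution: uniform over hashtag words, zero elsewhere.
    restart = {n: (1.0 / len(targets) if n in targets else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1 - damping) * restart[n] for n in nodes}
        for n in nodes:
            total = sum(edges[n].values())
            for m, weight in edges[n].items():
                # Each node spreads its probability to neighbours in proportion to edge weight.
                nxt[m] += damping * rank[n] * weight / total
        rank = nxt
    scores = {n[1]: p for n, p in rank.items() if n[0] == "word"}
    return sorted(scores, key=scores.get, reverse=True)


posts = [
    ["worldcup", "final", "goal"],
    ["worldcup", "goal", "messi"],
    ["weather", "rain", "today"],
]
ranking = hashtag_biased_walk(posts, hashtags={"worldcup"})
print(ranking)
```

In this toy input, "goal" ranks right after the hashtag because it co-occurs with it in two posts, while words from the unrelated post receive no probability at all, since the walk never restarts into that part of the graph.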
A Graph Based Technique of Process Partitioning
(pp121-140)
Gang Xue, Jing Liu, Liwen Wu, and Shaowen Yao
Web services are an important technology for constructing distributed
applications. To provide more complex functionalities, services can be
reused through service composition. A service composition can be
designed and implemented using a centralized or decentralized strategy.
When examining decentralized service compositions, several researchers
found that this kind of composition has its own advantages. These
findings promote the development of approaches for designing,
implementing, and applying decentralized service compositions. Process
partitioning concerns dividing a process into a collection of smaller
parts. The technique is applicable to partitioning a process in a
centralized service composition, and the result can support the
construction of a decentralized service composition. This paper presents
a technique of process partitioning that can be used for constructing
decentralized service compositions. It provides a graph-transformation-
based approach to reorganizing a process represented as a process
structure graph. Compared to existing approaches, the technique can
partition both well-structured and unstructured processes. Issues
concerning decentralized service compositions and performance tests of
service compositions are also discussed in this paper. Experimental
results show that, compared with the centralized service composition,
the decentralized service composition can have a lower average response
time and a higher throughput in a runtime environment.
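To make the idea of partitioning concrete, here is a minimal sketch of one generic way to decentralize a centralized process. Each activity in a control-flow graph is assigned to a hypothetical service host, the graph is split into one fragment per host, and every edge that crosses hosts is replaced by a send/receive pair. This is a simplified illustration only, not the paper's graph transformation on process structure graphs, and it does not handle gateways or unstructured control flow. The activity names, host names, and the `send[...]`/`receive[...]` naming scheme are all invented.

```python
from collections import defaultdict


def partition_process(edges, host_of):
    """Split a process graph into per-host fragments.

    edges: list of (source_activity, target_activity) control-flow edges.
    host_of: mapping activity -> host (a hypothetical assignment).
    Returns {host: list of edges}; each cross-host edge becomes a send/receive pair.
    """
    fragments = defaultdict(list)
    for src, dst in edges:
        a, b = host_of[src], host_of[dst]
        if a == b:
            # Local control flow stays inside the host's fragment.
            fragments[a].append((src, dst))
        else:
            # Replace the cross-host edge with message passing between the two fragments.
            send = f"send[{src}->{dst}]"
            receive = f"receive[{src}->{dst}]"
            fragments[a].append((src, send))
            fragments[b].append((receive, dst))
    return dict(fragments)


# Invented sequential order-handling process spread over three hosts.
edges = [("take_order", "check_stock"), ("check_stock", "charge_card"), ("charge_card", "ship")]
host_of = {"take_order": "shop", "check_stock": "warehouse", "charge_card": "bank", "ship": "warehouse"}
fragments = partition_process(edges, host_of)
for host, fragment in fragments.items():
    print(host, fragment)
```

Each fragment can then run on its own host and exchange messages directly with the others, which is the general property that lets decentralized compositions avoid routing every interaction through a single central coordinator.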
Back to JWE Online Front Page