Vol. 14, No. 5 & 6, November 1, 2015
Engineering the Web for Users, Developers and the
Crowd
Editorial
(pp361-362)
Sven Casteleyn, Gustavo Rossi, and Marco
Winckler
Patterns in Eyetracking Scanpaths and the Affecting Factors
(pp363-385)
Sukru Eraslan and Yeliz Yesilada
Web pages are typically decorated with different kinds of visual
elements that help sighted people complete their tasks. Unfortunately,
people accessing web pages in constrained environments, such as visually
disabled people and small-screen device users, cannot benefit from them. In our
previous work, we showed that tracking the eye movements of sighted users
provides a good understanding of how people use these visual elements. We
also showed that reengineering web pages using these visual elements
can improve people's experience in constrained environments. However,
in order to reengineer web pages based on eyetracking, we first need to
aggregate, analyse and understand how the eyetracking data of a group of
people can be combined into a common scanpath (that is, a common
eye-movement sequence) in terms of visual elements. This paper presents an algorithm
that aims to achieve this. This algorithm was developed iteratively and
experimentally evaluated with an eyetracking study. This study shows
that the proposed algorithm is able to identify patterns in eyetracking
scanpaths and works well with different numbers of participants. We
then extended our experiments to investigate the effects of the task,
gender and familiarity factors on common scanpaths. The results suggest
that these factors can cause some differences in common scanpaths. This
study also suggests that this algorithm can be improved by considering
different techniques for pre-processing the data, by addressing the
drawbacks of using the hierarchical structure and by taking into account
the underlying cognitive processes.
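As a rough illustration of the aggregation step described above, a common scanpath over visual elements (areas of interest, AoIs) can be approximated by iteratively reducing participants' scanpaths with the longest common subsequence; this minimal Python sketch is not the authors' algorithm, and the AoI labels are hypothetical.

    def lcs(a, b):
        """Longest common subsequence of two AoI sequences."""
        m, n = len(a), len(b)
        dp = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                if a[i - 1] == b[j - 1]:
                    dp[i][j] = dp[i - 1][j - 1] + 1
                else:
                    dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
        out, i, j = [], m, n          # walk back to recover one LCS
        while i > 0 and j > 0:
            if a[i - 1] == b[j - 1]:
                out.append(a[i - 1]); i -= 1; j -= 1
            elif dp[i - 1][j] >= dp[i][j - 1]:
                i -= 1
            else:
                j -= 1
        return out[::-1]

    def common_scanpath(scanpaths):
        """Fold all participants' AoI sequences into one common sequence."""
        common = scanpaths[0]
        for sp in scanpaths[1:]:
            common = lcs(common, sp)
        return common

    # Three hypothetical participants looking at AoIs A-E on the same page.
    print(common_scanpath([list("ABCDE"), list("ABDCE"), list("ACBDE")]))
    # -> ['A', 'B', 'E'] with this reduction order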
From TMR to Turtle: Predicting Result Relevance
from Mouse Cursor Interactions in Web Search
(pp386-413)
Maximilian Speicher, Sebastian Nuck, Lars
Wesemann, Andreas Both, and Martin Gaedke
The prime aspect of quality for search-driven web applications is to
provide users with the best possible results for a given query. Thus, it
is necessary to predict the relevance of results a priori.
Current solutions mostly rely on clicks on results for these
predictions, but research has shown that it is highly beneficial to also
consider additional features of user interaction. Nowadays, such
interactions are produced in steadily growing amounts by internet users.
Processing these amounts calls for streaming-based approaches and
incrementally updatable relevance models. We present
StreamMyRelevance! --- a novel streaming-based system for ensuring
quality of ranking in search engines. Our approach provides a complete
pipeline from collecting interactions in real time to processing them
incrementally on the server side. We conducted a large-scale evaluation
with real-world data from the hotel search domain. Results show that our
system yields predictions as good as those of competing state-of-the-art
systems but, by design of the underlying framework, with higher efficiency,
robustness, and scalability. Additionally, our system has been
transferred into a real-world industry context. A modified solution
called Turtle has been integrated into a new search engine for
general web search. To obtain high-quality judgments for learning
relevance models, it has been augmented with a novel crowdsourcing tool.
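To make the notion of an incrementally updatable relevance model concrete, the following minimal sketch (not the StreamMyRelevance! or Turtle implementation) updates an online logistic-regression model from a stream of per-result cursor-interaction features; the feature names are assumptions.

    import math

    class IncrementalRelevanceModel:
        """Online logistic regression over cursor-interaction features."""
        def __init__(self, n_features, learning_rate=0.05):
            self.w = [0.0] * n_features
            self.b = 0.0
            self.lr = learning_rate

        def predict(self, x):
            # Probability that the result is relevant for its query.
            z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
            return 1.0 / (1.0 + math.exp(-z))

        def update(self, x, label):
            # One incremental gradient step per observed interaction (label 1/0).
            err = self.predict(x) - label
            self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
            self.b -= self.lr * err

    # Hypothetical features per result: [hover seconds, cursor travel / 1000 px, clicked].
    model = IncrementalRelevanceModel(n_features=3)
    for features, relevant in [([2.1, 0.3, 1], 1), ([0.2, 0.1, 0], 0)]:
        model.update(features, relevant)
    print(round(model.predict([1.5, 0.2, 1]), 3))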
Identifying Web Performance Degradations through
Synthetic and Real-User Monitoring
(pp414-442)
Jurgen Cito, Devan Gotowka, Philipp Leitner,
Ryan Pelette, Dritan Suljoti, and Schahram Dustdar
The large scale of the Internet has offered unique economic
opportunities that, in turn, introduce overwhelming challenges for
development and operations: providing reliable and fast online services
that meet high performance demands. In
this paper, we investigate how performance engineers can identify three
different classes of externally visible performance problems (global
delays, partial delays, periodic delays) from concrete traces. We
develop a simulation model based on a taxonomy of root causes in server
performance degradation. Within an experimental setup, we obtain results
through synthetic monitoring of a target Web service, and observe
changes in Web performance over time through exploratory visual analysis
and changepoint detection. We extend our analysis and apply our methods
to real-user monitoring (RUM) data. In a use case study, we discuss how
our underlying model can be applied to real performance data gathered
from a multinational, high-traffic website in the financial sector.
Finally, we interpret our findings and discuss various challenges and
pitfalls.
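As a minimal illustration of changepoint detection on a response-time trace (not the method used in the paper), the sketch below finds a single mean-shift changepoint by choosing the split that minimises the within-segment squared error; the trace is synthetic.

    def single_changepoint(times):
        """Index where a single mean shift best splits the series, else None."""
        def sse(xs):
            if not xs:
                return 0.0
            mean = sum(xs) / len(xs)
            return sum((x - mean) ** 2 for x in xs)
        best_split, best_cost = None, sse(times)   # baseline: no split
        for k in range(1, len(times)):
            cost = sse(times[:k]) + sse(times[k:])
            if cost < best_cost:
                best_split, best_cost = k, cost
        return best_split

    # Synthetic response times (ms): a global delay starts at sample 6.
    trace = [120, 130, 125, 118, 122, 127, 310, 305, 298, 315, 309]
    print(single_changepoint(trace))  # -> 6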
Designing Complex Crowdsourcing Applications Covering Multiple
Platforms and Tasks
(pp443-473)
Alessandro Bozzon, Marco Brambilla, Stefano
Ceri, Andrea Mauri, and Riccardo Volonterio
A number of emerging crowd-based applications cover
very different scenarios, including opinion mining, multimedia data
annotation, localised information gathering, marketing campaigns, expert
response gathering, and so on. In most of these scenarios, applications
can be decomposed into tasks that collectively produce their results;
task interactions give rise to arbitrarily complex workflows. In this
paper we propose methods and tools for designing crowd-based workflows
as interacting tasks. We describe the modelling concepts that are useful
in this framework, including typical workflow patterns, whose function
is to decompose a cognitively complex task into simple interacting tasks
for cooperative solving. We then discuss how workflows and patterns are
managed by CrowdSearcher, a system for designing, deploying and
monitoring applications on top of crowd-based systems, including social
networks and crowdsourcing platforms. Tasks performed by humans consist
of simple operations which apply to homogeneous objects; the complexity
of aggregating and interpreting task results is embodied within the
framework. We show our approach at work on a validation scenario and we
report quantitative findings, which highlight the effect of workflow
design on the final results.
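A minimal, hypothetical sketch of the decompose/execute/aggregate idea (not CrowdSearcher's actual API): a complex annotation task is split into per-object micro-tasks, and the crowd's answers are aggregated by majority vote inside the framework.

    from collections import Counter

    def decompose(objects):
        # One simple micro-task per homogeneous object (e.g. "label this image").
        return [{"object": o, "answers": []} for o in objects]

    def execute(task, worker_answers):
        # Collect answers from several (hypothetical) crowd workers.
        task["answers"].extend(worker_answers)

    def aggregate(tasks):
        # Majority vote per object; the framework owns this aggregation step.
        return {t["object"]: Counter(t["answers"]).most_common(1)[0][0]
                for t in tasks}

    tasks = decompose(["image-1", "image-2"])
    execute(tasks[0], ["cat", "cat", "dog"])
    execute(tasks[1], ["dog", "dog"])
    print(aggregate(tasks))  # {'image-1': 'cat', 'image-2': 'dog'}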
Other Research Articles
Web Browsing Automation for Applications Quality
Control
(pp474-502)
Boni Garcia and Juan Carlos Duenas
Context: Quality control comprises the set of activities aimed at
evaluating whether software meets its specification and delivers the
functionality expected by consumers. These activities are often
omitted from the development process and, as a result, the final software
product usually lacks quality.
Objective: We propose a set of techniques to automate the quality
control of web applications from the client side, guiding the process
by functional and non-functional requirements (performance, security,
compatibility, usability and accessibility).
Method: The first step to achieve automation is to define the
structure of the web navigation. Existing software artifacts from the
analysis and design phases are reused. Then, the independent paths of
navigation are found, and each path is traversed automatically using
real browsers while different kinds of assessments are carried out.
Results: The processes and methods proposed in this paper have been
implemented by means of a reference architecture and open source tools.
A laboratory experiment and an industrial case study have been performed
in order to validate the proposal.
Conclusion: The definition of navigation paths is a rich approach to
model web applications. Grey-box (combined black-box and white-box) methods
have proved very valuable for web assessment. The Chinese Postman
Problem (CPP) provides an optimal way to find the independent paths in a web
navigation modeled as a directed graph.
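For illustration only: the paper solves the Chinese Postman Problem over the navigation graph, but the underlying idea of deriving replayable navigation paths from a directed graph can be sketched with a simple depth-first enumeration of maximal simple paths; the page names are hypothetical and the CPP optimisation is not shown.

    def navigation_paths(graph, start):
        """Enumerate maximal simple paths from the entry page."""
        paths, stack = [], [(start, [start])]
        while stack:
            node, path = stack.pop()
            successors = [n for n in graph.get(node, []) if n not in path]
            if not successors:
                paths.append(path)          # a maximal path, ready to replay
            for nxt in successors:
                stack.append((nxt, path + [nxt]))
        return paths

    # Hypothetical navigation model of a small web application.
    nav = {"login": ["home"], "home": ["search", "profile"], "search": ["home"]}
    for p in navigation_paths(nav, "login"):
        print(" -> ".join(p))   # each path would then be driven in a real browser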
Modified PageRank for Concept Based Search
(pp503-524)
G. Pavai, E. Umamaheswari, and T.V. Geetha
The traditional PageRank algorithm computes a weight for each
hyperlinked document, which indicates the importance of a page, based on
its in-links and out-links. This is an off-line and
query-independent process, which suits a keyword-based search strategy.
However, owing to problems such as polysemy and synonymy inherent in
keyword-based search, new search methodologies such as concept-based
search and semantic-web-based search have been developed. Concept-based
search engines generally adopt content-based ranking by imparting
semantics to web pages. While this approach is better than keyword-based
ranking strategies, it does not consider the physical link structure
between documents, which is the basis of the successful PageRank
algorithm. Hence, we attempt to combine the power of link structures with
content information to suit concept-based search engines. Our main
contributions are two modifications to the traditional PageRank
algorithm, both designed specifically for concept-based search engines.
Inspired by the topic-sensitive PageRank
algorithm, we compute multiple PageRanks for each document, rather than
the single value per document produced by the traditional implementation
of the PageRank algorithm. We have compared our methodologies with an existing
concept-based search engine's ranking methodology, and found that our
modifications considerably improve the ranking of conceptual search
results. Furthermore, we performed a statistical significance test and
found that our Version-2 modification to the PageRank algorithm is
statistically significant in its P@5 performance compared to the
baseline.
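For readers unfamiliar with the baseline, the following is a minimal sketch of the traditional PageRank power iteration that the abstract builds on; the paper's modification, which keeps a separate rank vector per concept, is not reproduced here, and the link graph is invented.

    def pagerank(links, damping=0.85, iterations=50):
        """Traditional PageRank by power iteration over a link graph."""
        pages = list(links)
        rank = {p: 1.0 / len(pages) for p in pages}
        for _ in range(iterations):
            new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
            for p, outs in links.items():
                if outs:
                    share = damping * rank[p] / len(outs)
                    for q in outs:
                        new_rank[q] += share
                else:                        # dangling page: spread rank evenly
                    for q in pages:
                        new_rank[q] += damping * rank[p] / len(pages)
            rank = new_rank
        return rank

    links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    print({p: round(r, 3) for p, r in pagerank(links).items()})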
Bayesian Based Type Discrimination of Web Events
(pp525-544)
Qichen Ma, Xiangfeng Luo, Junyu Xuan, and
Huimin Liu
A large number of web events emerge on the web and attract people's
attention every day, and it is of great interest and practical
significance to distinguish the different types of these web events. For
example, emergent web events should receive more attention from
government departments, to save lives and reduce damage, or from news
websites, to increase their hit rates with limited resources. However,
how to efficiently distinguish the types of web events remains a
challenging issue, as little effort has been devoted to it in the
community. In this paper, we
examine this problem thoroughly and then propose an innovative
Bayesian-based model to distinguish the different types of web events.
Specifically, all web events are first assumed to fall into three types,
whose formal definitions are given by considering their properties. To
sufficiently describe and distinguish the three types of web events, a
set of specially designed features is then extracted from the volume and
the content of web events. Finally, a Bayesian-based
model is proposed based on the designed features. The experimental
results demonstrate the capability of the proposed model to distinguish
the types of web events, and comparisons with other state-of-the-art
classifiers also show its efficiency.
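A minimal naive Bayes sketch (not the paper's exact model) over hypothetical discrete features of a web event illustrates the type-discrimination idea; the feature names and the three type labels are invented for the example.

    import math
    from collections import Counter, defaultdict

    def train(examples):
        """examples: list of (feature_dict, event_type) pairs."""
        priors = Counter(t for _, t in examples)
        cond = defaultdict(Counter)           # (type, feature) -> value counts
        for feats, t in examples:
            for f, v in feats.items():
                cond[(t, f)][v] += 1
        return priors, cond, len(examples)

    def classify(feats, priors, cond, n):
        best_type, best_score = None, float("-inf")
        for t, count in priors.items():
            score = math.log(count / n)
            for f, v in feats.items():
                counts = cond[(t, f)]
                # add-one smoothing so unseen feature values do not zero the score
                score += math.log((counts[v] + 1) / (sum(counts.values()) + len(counts) + 1))
            if score > best_score:
                best_type, best_score = t, score
        return best_type

    # Hypothetical features and type labels, purely for illustration.
    data = [({"trend": "burst", "topic": "disaster"}, "emergent"),
            ({"trend": "burst", "topic": "politics"}, "emergent"),
            ({"trend": "steady", "topic": "sports"}, "periodic"),
            ({"trend": "decay", "topic": "gossip"}, "transient")]
    print(classify({"trend": "burst", "topic": "disaster"}, *train(data)))  # -> emergent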