A recommender system based on tag and time information for social tagging systems
Social tagging has recently become increasingly prevalent on the Internet, providing an effective way for users to organize, manage, share, and search for various kinds of resources. These tagging systems offer a wealth of useful information: tags express a user's preference for a particular resource, while timestamps indicate how a user's interests drift over time. With the explosion of information, it is necessary to recommend resources that a user might like. Since collaborative filtering (CF) aims to provide personalized services, how to integrate tag and time information into CF to provide better personalized recommendations for social tagging systems becomes a challenging task.
In this paper, we investigate the importance and usefulness of tag and time information when predicting users' preferences and examine how to exploit such information to build an effective resource-recommendation model. We design a recommender system to realize our computational approach. Using a real-world dataset, we show empirically that tag and time information can express users' tastes well, and that better performance can be achieved when such information is integrated into CF.
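The abstract does not specify how tag and time information enter the CF computation, so the sketch below is only one plausible reading, not the authors' method: it builds per-user tag profiles with an exponentially decaying time weight (the `half_life` parameter and all function names are assumptions) and then performs standard user-based CF over those profiles.

```python
import math
from collections import defaultdict

def tag_time_profiles(events, now, half_life=30.0):
    """Per-user tag profiles; newer taggings weigh more (assumed decay scheme)."""
    profiles = defaultdict(lambda: defaultdict(float))
    for user, resource, tag, t in events:          # t and now measured in days
        weight = 0.5 ** ((now - t) / half_life)    # exponential interest decay
        profiles[user][tag] += weight
    return profiles

def cosine(p, q):
    shared = set(p) & set(q)
    num = sum(p[t] * q[t] for t in shared)
    den = (math.sqrt(sum(v * v for v in p.values()))
           * math.sqrt(sum(v * v for v in q.values())))
    return num / den if den else 0.0

def recommend(events, target, now, k=10):
    """User-based CF: score resources used by the k most similar tag profiles."""
    profiles = tag_time_profiles(events, now)
    sims = sorted(((cosine(profiles[target], profiles[u]), u)
                   for u in profiles if u != target), reverse=True)[:k]
    seen = {r for u, r, _, _ in events if u == target}
    scores = defaultdict(float)
    for s, u in sims:
        for _, r, _, _ in (e for e in events if e[0] == u):
            if r not in seen:
                scores[r] += s
    return sorted(scores, key=scores.get, reverse=True)
```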
A profile of information systems research published in Expert Systems with Applications from 1995 to 2008
Expert Systems with Applications (ESWA) is regarded as one of the leading journals in the information systems field. This paper profiles research published in ESWA from 1995 to 2008. Based on a multidimensional analysis, we identify the most productive authors and universities, the number of research papers per geographic region, and the issues and methodologies most employed by the most highly published authors. Our results indicate that (1) ESWA is clearly an internationalized journal, (2) the most employed methodologies are fuzzy expert systems and knowledge-based systems, and (3) the leading highly published authors consistently work across diverse methodologies and applications. Furthermore, implications for researchers, journal editors, universities, and research institutions are presented.
High-speed ant colony optimization CMOS chip
Ant colony optimization (ACO) is an optimization technique inspired by the study of ant colony behavior. This paper presents the design and CMOS implementation of an ACO-based algorithm for solving the traveling salesman problem (TSP). To implement the ACO algorithm in CMOS, we present a new algorithm that is based on the original ACO but is suitable for CMOS implementation. Briefly, the pheromone matrix is mapped onto the chip area, and ants move up and down through the pheromone matrix making their decisions; finally, the ants select a global path. Previous research used only pheromone values, whereas in this paper the selection of the next city is based on both heuristic and pheromone values. In the problem definition, we represent the heuristic values as a matrix. Previous designs could not be applied to a wide range of optimization problems, but our chip takes the heuristic values as initial inputs, and these initial values can be changed according to the optimization problem at hand; this capability increases the flexibility of the ACO chip. Simple circuits are used in the blocks of our chip to increase its speed of convergence, and a linear feedback shift register (LFSR) circuit serves as the random number generator. The chip is capable of solving large TSP instances. It is simulated with HSPICE, and the simulation results show the good performance of the final chip.
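The abstract's key algorithmic point is that next-city selection combines pheromone and heuristic values. A minimal software sketch of that selection rule follows, using the standard ACO transition probabilities; the parameters alpha, beta, rho, and Q are textbook assumptions, and the hardware mapping itself is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)  # software stand-in for the chip's LFSR

def aco_tsp(dist, n_ants=20, n_iters=100, alpha=1.0, beta=2.0, rho=0.5, Q=1.0):
    """Classic ACO for the TSP: next-city choice uses pheromone AND heuristic values."""
    n = len(dist)
    eta = 1.0 / (dist + np.eye(n))      # heuristic value matrix (1/distance)
    tau = np.ones((n, n))               # pheromone matrix
    best_tour, best_len = None, np.inf
    for _ in range(n_iters):
        tours = []
        for _ in range(n_ants):
            tour = [rng.integers(n)]
            while len(tour) < n:
                i = tour[-1]
                mask = np.ones(n, bool); mask[tour] = False
                p = (tau[i] ** alpha) * (eta[i] ** beta) * mask
                tour.append(rng.choice(n, p=p / p.sum()))
            length = sum(dist[tour[k], tour[(k + 1) % n]] for k in range(n))
            tours.append((length, tour))
            if length < best_len:
                best_len, best_tour = length, tour
        tau *= (1 - rho)                # pheromone evaporation
        for length, tour in tours:      # deposit proportional to tour quality
            for k in range(n):
                tau[tour[k], tour[(k + 1) % n]] += Q / length
    return best_tour, best_len
```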
A novel prediction model based on the hierarchical characteristics of web sites
The Internet has developed rapidly over the past ten years, and the amount of information on web sites has been increasing fast as well. Predicting web users' behavior has become a crucial issue, with goals such as increasing users' browsing speed, decreasing their latency as much as possible, and reducing the load on web servers. In this paper, we propose an efficient prediction model, the two-level prediction model (TLPM), which exploits the natural hierarchical properties of web log data. TLPM can decrease the size of the candidate set of web pages and increase prediction speed while maintaining adequate accuracy. The experimental results prove that TLPM can greatly enhance prediction performance as the number of web pages increases.
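The abstract does not spell out what the two levels are, so the following sketch is only one plausible interpretation: level one predicts the next site section (the top of the URL hierarchy) from transition counts, and level two predicts a page restricted to that section, which is what shrinks the candidate set. All names here are illustrative assumptions.

```python
from collections import defaultdict

class TwoLevelPredictor:
    """Illustrative two-level scheme: section first, then page within the section."""
    def __init__(self):
        self.section_trans = defaultdict(lambda: defaultdict(int))
        self.page_trans = defaultdict(lambda: defaultdict(int))

    @staticmethod
    def section(page):
        return page.strip("/").split("/")[0]  # top level of the URL path

    def train(self, sessions):
        for session in sessions:
            for cur, nxt in zip(session, session[1:]):
                self.section_trans[self.section(cur)][self.section(nxt)] += 1
                self.page_trans[cur][nxt] += 1

    def predict(self, page):
        # Level 1: most likely next section (small candidate space).
        secs = self.section_trans[self.section(page)]
        if not secs:
            return None
        best_sec = max(secs, key=secs.get)
        # Level 2: most frequent next page, restricted to that section.
        candidates = {p: c for p, c in self.page_trans[page].items()
                      if self.section(p) == best_sec}
        return max(candidates, key=candidates.get) if candidates else None

sessions = [["news/a", "news/b", "sports/x"], ["news/a", "news/b", "news/c"]]
model = TwoLevelPredictor(); model.train(sessions)
print(model.predict("news/b"))  # -> "news/c"
```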
A Web page classification system based on a genetic algorithm using tagged-terms as features
The incredible increase in the amount of information on the World Wide Web has given birth to topic-specific crawling of the Web. During a focused crawling process, an automatic Web page classification mechanism is needed to determine whether the page under consideration is on topic. In this study, we develop a genetic algorithm (GA) based automatic Web page classification system that uses both HTML tags and the terms belonging to each tag as classification features, and that learns an optimal classifier from the positive and negative Web pages in the training dataset. Our system classifies Web pages by simply computing the similarity between the learned classifier and a new Web page. Existing GA-based classifiers use only HTML tags or only terms as features; in this study both are taken together, and optimal weights for the features are learned by our GA. We found that using both HTML tags and the terms in each tag as separate features improves classification accuracy, and that the composition of the training dataset matters: when the number of negative documents exceeds the number of positive documents, the classification accuracy of our system increases up to 95% and surpasses the well-known naïve Bayes and k-nearest-neighbor classifiers.
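The abstract gives the idea (GA-learned weights over tag-term features, classification by similarity) but not the operators or fitness function. The sketch below fills those in with textbook choices (truncation selection, uniform crossover, Gaussian mutation, training-accuracy fitness); every such choice is an assumption, not the authors' design. Pages are assumed to be dicts mapping (tag, term) features to term frequencies, and `features` is a list of those feature ids.

```python
import random

random.seed(0)

def similarity(weights, page):
    """Weighted dot product over (tag, term) features present in the page."""
    return sum(w * page.get(f, 0.0) for f, w in weights.items())

def fitness(weights, positives, negatives, threshold=0.5):
    """Assumed fitness: accuracy of thresholded similarity on the training pages."""
    hits = sum(similarity(weights, p) > threshold for p in positives)
    hits += sum(similarity(weights, n) <= threshold for n in negatives)
    return hits / (len(positives) + len(negatives))

def evolve(features, positives, negatives, pop=30, gens=50):
    population = [{f: random.random() for f in features} for _ in range(pop)]
    for _ in range(gens):
        scored = sorted(population, key=lambda w: fitness(w, positives, negatives),
                        reverse=True)
        parents = scored[:pop // 2]                   # truncation selection
        children = []
        while len(children) < pop - len(parents):
            a, b = random.sample(parents, 2)
            child = {f: random.choice((a[f], b[f])) for f in features}  # uniform crossover
            if random.random() < 0.2:                 # Gaussian mutation on one weight
                f = random.choice(features)
                child[f] = min(1.0, max(0.0, child[f] + random.gauss(0, 0.1)))
            children.append(child)
        population = parents + children
    return max(population, key=lambda w: fitness(w, positives, negatives))
```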
Using chi-square statistics to measure similarities for text categorization
In this paper, we propose using chi-square statistics to measure similarities and chi-square tests to determine the homogeneity of two random samples of term vectors for text categorization. We first study the properties of chi-square tests for text categorization. One advantage of the chi-square test is that its significance level approximates the miss rate, which provides a foundation for a theoretical performance (i.e., miss rate) guarantee. A classifier using cosine similarity with TF * IDF generally performs reasonably well in text categorization; however, its performance may fluctuate even near the optimal threshold value. To overcome this limitation, we propose the combined use of chi-square statistics and cosine similarities. Extensive experimental results verify the properties of chi-square tests and the performance of the combined approach.
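The homogeneity test itself is standard; a minimal sketch follows, using scipy's chi2_contingency for the computation. The rule for combining the test with cosine similarity is not specified in the abstract, so only the test is shown, and the example documents are invented for illustration.

```python
import numpy as np
from scipy.stats import chi2_contingency

def homogeneity_test(counts_a, counts_b):
    """Chi-square test of homogeneity for two term-frequency vectors.

    Rows of the contingency table are the two samples, columns are terms.
    Terms absent from both samples are dropped to keep expected counts valid.
    """
    a, b = np.asarray(counts_a, float), np.asarray(counts_b, float)
    keep = (a + b) > 0
    table = np.vstack([a[keep], b[keep]])
    chi2, p_value, dof, _ = chi2_contingency(table)
    return chi2, p_value

# Two documents over the same vocabulary; a small p-value rejects homogeneity,
# i.e. the documents are unlikely to come from the same category.
doc1 = [10, 3, 0, 7, 2]
doc2 = [9, 4, 1, 6, 3]
print(homogeneity_test(doc1, doc2))
```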
Simulated annealing with adaptive neighborhood: A case study in off-line robot path planning
Simulated annealing (SA) is an optimization technique that can handle cost functions with varying degrees of nonlinearity, discontinuity, and stochasticity, as well as arbitrary boundary conditions and constraints imposed on those cost functions. Here, the SA technique is applied to the problem of robot path planning. Three representations of the path are considered: a polyline, a Bézier curve, and a spline-interpolated curve. In the proposed SA algorithm, the sensitivity of each continuous parameter is evaluated at each iteration, increasing the number of accepted solutions, and each parameter's sensitivity is associated with its probability distribution when generating the next candidate.
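The exact sensitivity measure is not given in the abstract, so the sketch below substitutes a common adaptive-neighborhood heuristic: per-waypoint step sizes that grow on acceptance and shrink on rejection, applied to the polyline representation. The adaptation rule, the cost function, and all parameter values are assumptions.

```python
import math, random

random.seed(0)

def cost(path):
    """Assumed cost: path length plus a penalty for entering a circular obstacle."""
    length = sum(math.dist(p, q) for p, q in zip(path, path[1:]))
    penalty = sum(100.0 for x, y in path if (x - 5) ** 2 + (y - 5) ** 2 < 4)
    return length + penalty

def sa_polyline(start, goal, n_pts=8, iters=5000, T0=10.0, alpha=0.999):
    # Inner waypoints are the continuous parameters; endpoints stay fixed.
    path = [start] + [(random.uniform(0, 10), random.uniform(0, 10))
                      for _ in range(n_pts)] + [goal]
    sigma = [1.0] * n_pts            # per-waypoint neighborhood size
    best, T = cost(path), T0
    for _ in range(iters):
        i = random.randrange(n_pts)  # perturb one inner waypoint
        x, y = path[i + 1]
        cand = path[:]
        cand[i + 1] = (x + random.gauss(0, sigma[i]), y + random.gauss(0, sigma[i]))
        delta = cost(cand) - cost(path)
        if delta < 0 or random.random() < math.exp(-delta / T):
            path, best = cand, min(best, cost(cand))
            sigma[i] = min(2.0, sigma[i] * 1.05)   # widen a productive neighborhood
        else:
            sigma[i] = max(1e-3, sigma[i] * 0.95)  # shrink an unproductive one
        T *= alpha                   # geometric cooling schedule
    return path, best
```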
Identifying new business areas using patent information: A DEA and text mining approach
From a resource-based point of view, a firm's technological capabilities can serve as underlying sources for identifying new businesses. However, current methods are insufficient to systematically and clearly support firms in finding new business areas based on their technological strengths. This research proposes a systematic approach to identifying new business areas grounded in the relative technological strength of firms. Patent information is useful as a measure of a firm's technological resources, and data envelopment analysis (DEA) is beneficial for obtaining the weighted value of patents according to their quality. With these weighted patent qualities, a firm can evaluate its relative technological strength at the industry and product level across potential business areas. To compute technological strength by product, this research applies a text mining method, by which knowledge is discovered from unstructured data, to the patent documents. This paper shows the usefulness of the newly proposed framework with a case study.
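Neither the DEA model nor the text mining step is detailed in the abstract, so the sketch below only illustrates the final combination: it takes DEA-derived patent quality scores as given and approximates patent-to-product relevance by keyword counts. Every name, formula, and data value here is an assumption made for illustration.

```python
from collections import Counter

def strength_by_product(patents, product_keywords):
    """Assumed aggregation: strength(product) = sum over patents of
    DEA quality weight * keyword-overlap relevance to the product area."""
    scores = {}
    for product, keywords in product_keywords.items():
        total = 0.0
        for text, quality in patents:                 # quality: DEA-weighted value
            terms = Counter(text.lower().split())     # crude text mining stand-in
            relevance = sum(terms[k] for k in keywords)
            total += quality * relevance
        scores[product] = total
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

patents = [("battery electrode coating process", 0.9),
           ("wireless charging circuit for battery", 0.6)]
areas = {"energy storage": ["battery", "electrode"],
         "power electronics": ["charging", "circuit"]}
print(strength_by_product(patents, areas))  # ranked candidate business areas
```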
Communities and dynamical processes in a complex software network
Complex technological networks are a growing challenge to support and maintain as their number of elements grows and their interdependencies become more involved. On the other hand, for networks that grow in a decentralized manner, it is possible to observe certain patterns in their overall structure that can be exploited for a more tractable analysis. One example of such a pattern is the spontaneous formation of communities or modules. An important question regarding the detection of communities is whether they are really representative of any internal network feature. In this work, we explore the community structure of a real complex software network and correlate this modularity information with the internal dynamical processes the network is designed to support. Our results show a remarkable dependence between community structure and internal dynamical processes, supporting the view that a community division of this complex network helps in assessing the underlying dynamical structure, and is thus a useful tool for achieving a simpler representation of the network's complexity.
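The abstract names no specific community detection algorithm; as an illustration of the kind of analysis involved, here is a short sketch using networkx's greedy modularity maximization on a toy call graph. Both the graph and the choice of algorithm are assumptions.

```python
import networkx as nx
from networkx.algorithms import community

# Toy software dependency graph: nodes are modules, edges are call relations.
G = nx.Graph([("ui", "core"), ("ui", "widgets"), ("widgets", "core"),
              ("core", "db"), ("db", "cache"), ("cache", "core"),
              ("net", "proto"), ("proto", "codec"), ("net", "codec"),
              ("core", "net")])

# Detect communities by greedy modularity maximization.
parts = community.greedy_modularity_communities(G)
Q = community.modularity(G, parts)

for i, part in enumerate(parts):
    print(f"community {i}: {sorted(part)}")
print(f"modularity Q = {Q:.3f}")
# A high Q suggests the module division reflects real structure, which the
# paper then correlates with the network's internal dynamical processes.
```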