Author(s)
De Silva, Anthony Mihirana
Keywords
Feature Generation
Time-series Prediction
Context-free Grammar
Grammatical Evolution
Feature Selection
Machine Learning
Online Access
http://hdl.handle.net/2123/10278
Abstract
The application of machine learning techniques to time-series prediction continues to attract considerable attention because the prediction problems are difficult, a difficulty compounded by the non-linear and non-stationary nature of real-world time-series. The performance of machine learning techniques depends, among other things, on suitable engineering of features. This thesis proposes a systematic way of generating suitable features using context-free grammars. It exploits the notion of grammar families as a compact representation from which a broad class of features can be generated. Implementation issues and ways to overcome them are explained in detail. A number of feature selection criteria are investigated, and a hybrid feature generation and selection algorithm using grammatical evolution is proposed. The proposed approaches are demonstrated by predicting the closing price of major stock market indices, peak electricity load, and net hourly foreign exchange client trade volume. Features widely used in practice and in previous work for electricity and financial time-series are explored and serve as a basis for comparison with the features generated and selected by the proposed framework. Other model-based approaches and naive approaches are also used as benchmarks. It is shown that the generated features can improve results while requiring no domain-specific knowledge. The proposed method is also used to determine suitable features for predicting foreign exchange client trade volume, a previously unexplored problem, and the capability of the approach to engineer appropriate features automatically is highlighted. The proposed method can be applied to a wide range of machine learning architectures and applications to represent complex feature dependencies explicitly when the learning algorithm cannot capture them by itself.
Access is restricted to staff and students of the University of Sydney. UniKey credentials are required.
Non-university access may be obtained by visiting the University of Sydney Library.
Date
2014-04-04
Type
Masters Thesis
Identifier
oai:ses.library.usyd.edu.au:2123/10278
http://hdl.handle.net/2123/10278