GDF: A tool for function estimation through Ioannis G. Tsoulos (1), Dimitris Gavrilis(2), Evangelos Dermatas(2) (1)Department of Computer Science, University of Ioannina, P.O.
(2)Department of Electrical & Computer Egineering, University of Abstract
This article introduces a tool for data fitting that is based on genetic programming and especially on the grammatical evolution technique. The user needs to input a series of points and the accompanied dimensionality n and the tool will produce via the genetic programming paradigm a function f : Rn → R that is optimal in the least squares sense. The tool is entirely written in ANSI C++ and it can be installed in every UNIX PACS:02.30.Mv;02.60.-x;02.60.Ed ;07.05.Mh;
Corresponding author. Email: PROGRAM SUMMARY
Program available from: CPC Program Library, Queen’s University of Belfast, Computer for which the program is designed and others on which it has been tested : The tool is designed to be portable in all systems running the GNU Installation: University of Ioannina and University of Patras, Greece.
Programming language used : GNU-C++.
Memory required to execute with typical data: 200KB.
Has the code been vectorised or parallelized? : No.
No. of bytes in distributed program,including test data etc.: 30 Kbytes.
Distribution format : gzipped tar file.
Keywords: Function approximation, stochastic methods, genetic programming, Method of solution: Genetic programming.
The problem of function estimation consists of finding a function that will best approximate a set of n-dimensional points given their output. Function es- timation finds many applications in physics, chemistry, signal processing etc.
and it can be formulated as follows: Given M points and associated values (xi, yi) , i = 1, . . . , M with xi ∈ Rn estimate a function f : Rn → R that minimizes the least squares “Error” Through the years many methods have been proposed for this problem, such as spline based [2, 3] or neural network based [4, 5]. Although these techniques have been applied successfully to many data fitting problems, they produce functional forms which consist of applications of specific functions such as polynomials or sigmoidal functions. The proposed programming tool takes as input the points (xi, yi) and creates a functional form that minimizes the quantity in equation 1 through the procedure of Grammatical Evolution [1]. Grammatical Evolution is an evolutionary processes that can create programs in an arbitrary language. The production is performed using a mapping process governed by a grammar expressed in Backus Naur Form. Grammatical evolution has been applied successfully to problems such as symbolic regression [7], discovery of trigonometric identities [8], robot control [9], caching algorithms [10], financial prediction [11] etc. The rest of this article is organized as follows: in section 2 the contents of the distribution are presented, in section 3 the installation steps for any UNIX programming environment are expressed in detail, in section 4 the main parts of the distribution such as the underlying algorithm, the grammar specifications and the gdf program are thoroughly analyzed and finally in section 5 some conclusions from the application of the tool are listed.
The package is distributed in a tar.gz file named GDF.tar.gz and under UNIX systems the user must issue the following commands to extract the associated These steps create a directory named GDF with the following contents: 1. bin: It is a directory which is initially empty. After the compilation
of the package it will contain the executable gdf and the text file named grammar.txt. The first file is the created programming tool and the second one is an auxiliary file that contains the grammar of the tool, expressed 2. doc: This directory contains the documentation of the package (this file)
in different formats: A LYX file, A LATEX file, a PostScript file and a pdf 3. include: The directory which contains the header files for all the classes
4. src: The directory with the source files of the package.
5. Makefile: It is the file that will be read by the make utility in order to
built the tool. There is no need for the user to modify this file.
The following steps are required in order to build the tool: 1. Uncompress the tool as described in the previous section.
After the compilation the binary file gdf will be placed in the bin subdirectory accompanied with the text file grammar.txt The tool is entirely written in GNU C++ version 3.2.3, but it can be installed in systems with different ANSI C++ compiler. The only modification required is to replace the line in the file Makefile under the src subdirectory with the following one where mycpp is the name of the corresponding ANSI C++ compiler in the Program details
The underlying algorithm
The programming tool is based on the following stochastic algorithm: Initialization Step:
1. The program reads the data to be fitted from a text file.
2. The program reads the used grammar from a text file.
3. Every chromosome in the genetic population is initialized. The initializa- tion is performed by a randomly selection of a number in the range [0,255] for every element of each chromosome.
4. The values for the parameters selection rate and mutation rate are
selected. The selection rate denotes the fraction of the number of chro-
mosomes that will go through unchanged to the next generation. That means that the probability for crossover is set to 1 - selection rate. The
values for these parameters are mutually independent and they must be 5. Set k = 0, where k is the amount of the generations.
6. Set the value for the parameter maxK, where maxK is the maximum Evolution Step:
1. For every chromosome in the population, a function is created through the process of Grammatical Evolution.
2. The fitness of each member of the population is evaluated.
3. The chromosomes are sorted in descending order according to their fitness 4. A bunch of (1-selection rate)×population size new chromosomes is cre- ated. Every new chromosome is formed from two selected individuals (parents) of the current population with one - point crossover. In that procedure the chromosomes are cut at a randomly chosen point and their right-hand-side subchromosomes are exchanged, as shown in figure 1. For every new chromosome the selection of every parent is performed through (a) A group of K ≥ 2 randomly selected chromosomes is created.
(b) The chromosome with the best fitness value in the group is selected, 5. The mutation procedure is applied to each member of the population with probability equal to mutation rate.
6. Set k = k + 1
7. If k > maxK or the best fitness value has fallen below a predefined thresh- old, then the Evolution Step is terminated.
Grammar specification
The file that contains the grammar specification must be determined by the user with the -g option from the command line. The grammar must be specified in any simple text (ASCII) file with the format shown in figure 2.
The start symbol (<S>) is required and must be specified in the above form. The start symbol can give only one non-terminal symbol (e.g. <expr>).
The available non-terminal symbols are <expr>, <func>, <op>. The available terminal symbols are +,-,*,/,(,). The numbers are represented as lists of digits (including “.”) and can be specified by <terminal>. The subrules for the Figure 3: The rules of the symbol <terminal> <digit>::= 0 | 1 | 2 |3 |4 |5 |6 |7 |8 |9 <terminal> symbol are fixed in the code and they can not changed by the user. These rules are shown in figure 3. The symbol d in the rule for <xxlist> denotes the dimensionality of the objective function. The available functions are: sin, cos, log, exp, log10, tan, abs, sqrt, int, atan, acos, asin. If a non- teminal specification has more that one rules, those rules can be specified with a “|” instead of typing the entire left hand (e.g. in the second rule of <expr> , “<expr> ::=” is replaced by “|”). In this way, the user can easily alter the program parameters by specifying a different grammar. If, for example, it is known that log or log10 cannot exist in the desired output, the user can remove them from the grammar specification.
The main program gdf
The created executable gdf takes the following series of parameters in the com- 1. -h: The program prints a help screen to the user with a description for
each command line parameter and it terminates.
2. -g grammar file: The parameter grammar files determines a file with a
valid grammar for the tool. The user must have read access to the specified file. The default value for this parameter is grammar.txt, which is the default grammar and it is copied after the installation in the subdirectory 3. -p problem file: The parameter problem file determines a file containing
the points where the data fitting procedure will be applied. The user must have read access to the specified file and the contents of the file must conform to the format of the figure 4. The integer number D in the file determines the dimensionality of the specific problem, the number M determines the amount of points in the file and each consecutive line defines a point where the data fitting procedure will be applied. This parameter is the only one required from the program 4. -t test file: The parameter test file determines a file in the same format
as the problem file, where the produced function will be tested after the termination of the genetic algorithm. The user must have read access to xM1 xM2 . . . xMD yM the specified file and the dimension in the file test file must be the same 5. -c count: The parameter count specifies the number of chromosomes in
the genetic population. The default value for this parameter is 500.
6. -l length: The parameter length specifies the length of each chromosome
in the genetic population. The default value for this parameter is 100. The standard GE approach uses variable - length chrosomes, but the tool GDF uses chromosomes with static length in order to prevent it from creating 7. -s srate: The parameter srate specifies the value for the parameter selec-
tion rate of the genetic algorithm. The default value for this parameter
8. -m mrate: The parameter mrate determines the value for the parame-
ter mutation rate of the genetic algorithm. The default value for this
9. -n generations: The integer parameter generations determines the max-
imum number of the generations allowed for the genetic algorithm. The default value for this parameter is 2000.
10. -r seed: The parameter seed specifies the seed for the random number
generator. The default value for this parameter is 1.
In each generation the program prints in the screen the following quantities: 1. The number of generations passed.
3. The fitness value of the best discovered function.
Test runs
The performance of the proposed tool was measured by using 5 different datasets: one for the continuous function f (x) = x sin(x2) and 4 real life problems.
The function f (x) = x sin x2
The tool was tested on this function using a dataset with 100 random points from the function in the range [-2,2]. The tool was issued with the following where is the file containing the points for the data fitting. The last 10 lines from the output of the above program are the following: generation=156 f(x)=sqrt(sqrt((log(2.72))))*sin(abs(exp(log(((x1))*x1))))*x1 fitness=-1.260575367e-06 generation=157 f(x)=sqrt(sqrt((log(2.72))))*sin(abs(exp(log(((x1))*x1))))*x1 fitness=-1.260575367e-06 generation=158 f(x)=sqrt(sqrt((log(2.72))))*sin(abs(exp(log(((x1))*x1))))*x1 fitness=-1.260575367e-06 generation=159 f(x)=sqrt(sqrt((log(2.72))))*sin(abs(exp(log(((x1))*x1))))*x1 fitness=-1.260575367e-06 generation=160 f(x)=sqrt(sqrt((log(2.72))))*sin(abs(exp(log(((x1))*x1))))*x1 fitness=-1.260575367e-06 generation=161 f(x)=sqrt(sqrt((log(2.72))))*sin(abs(exp(log(((x1))*x1))))*x1 fitness=-1.260575367e-06 generation=162 f(x)=sqrt(sin((log(4.72))))*sin(abs(exp(log(((x1))*x1))))*x1 fitness=-4.10576003e-07 generation=163 f(x)=sqrt(sin((log(4.72))))*sin(abs(exp(log(((x1))*x1))))*x1 fitness=-4.10576003e-07 generation=164 f(x)=sqrt(sin((log(4.72))))*sin(abs(exp(log(((x1))*x1))))*x1 fitness=-4.10576003e-07 generation=165 f(x)=sqrt(sin((log(4.82))))*sin(abs(exp(log(((x1))*x1))))*x1 fitness=-4.832536538e-11 The Ailerons problem
This problem has 40 attributes and consists of 7150 points. This data set ad- dresses a control problem, namely flying a F16 aircraft. The attributes describe the status of the aeroplane, while the goal is to predict the control action on the ailerons of the aircraft. The original owner of the database is Rui Camacho (rca- The program gdf were trained with 200 points from the dataset and the resulting expression was tested over the rest 6950 points.
Ten independent experiments were conducted with different random seeds each time and in all cases the absolute value of the fitness was below 3 × 106. The 1187 − x8 756.92x3 with fitness 1.65 × 106 and test error per point 1.07 × 107. The resulting function depends only on 5 from the 40 attributes .
The Elevators problem
The original source of this problem is the experiments of Rui Camacho (rca- The problem has 18 attributes and this data set is also obtained from the task of controlling a F16 aircraft, although the target variable and attributes are different from the ailerons domain. In this case the goal variable is related to an action taken on the elevators of the aircraft. From this dataset 200 points were used for training and 8452 for testing. The best 9.8x13 − x18 + 9.91x10 exp (x4) with fitness value 1.64 × 103 and mean test error 3.41 × 105.
The Pyrimidines problem and it is a problem of 27 attributes and 74 number of patterns. The task consists of Learning Quantitative Structure Activity Relationships (QSARs).
The Inhibition of Dihydrofolate Reductase by Pyrimidines.The data are de- scribed in: King, Ross .D., Muggleton, Steven., Lewis, Richard. and Sternberg, Michael.J.E. Drug Design by machine learning: the use of inductive logic pro- gramming to model the structure-activity relationships of trimethoprim analo- gies binding to dihydrofolate reductase. From the above dataset 50 patterns were used for training and 24 for testing. The best discovered function was: (x) = cos (cos ( x20)) cos (x22 log (sin (exp (x6))))+ with fitness value 1.33 × 101and mean test error 7.25 × 103.
The Basketball problem
The source of this dataset is from Smoothing Methods in Statistics available which is a problem of four attributes and it tries to identify the points scored per minute from the attributes “assists per minute”, “player height”,”time played” and “player age”. From the 96 available patterns 60 were used for training and 36 for testing. The best discovered function was: 33.10.245 cos(x3x2) with fitness value 3.78 × 101 and mean test error 6.99 × 103.
The introduced tool is a program aimed to fit a function in a series of points of an arbitrary dimension. The applied function is created through the evolutionary process of Grammatical Evolution and as a consequence there is no guarantee that the goal will be achieved. However, the user can apply the tool even in cases where the existence of an analytical solution is difficult to be found. Also, the tool is provided with the ability of changing the underlying grammar according References
[1] M. O’Neill and C. Ryan, “Grammatical Evolution,” IEEE Trans. Evolu- tionary Computation, Vol. 5, pp. 349-358, 2001.
[2] De Boor C., “A practical guide to splines”, Springer - Verlang, New York, [3] Kincaid D., and Cheney W., “Numerical Analysis”, Brooks/Cole Publish- [4] Hornik K., Stinchcombe M., and White H., Neural Networks 2 (1989) 359.
[5] Cybenko G., “Approximation by superpositions of a sigmoidal function”, Mathematics of Control Signals and Systems 2 (1989) 303-314.
[6] J. R. Koza, Genetic Programming: On the programming of Computer by Means of Natural Selection. MIT Press: Cambridge, MA, 1992.
[7] M. O’Neill and C. Ryan, Grammatical Evolution: Evolutionary Automatic Programming in a Arbitrary Language, volume 4 of Genetic programming.
[8] C. Ryan, M. O’Neill, and J.J. Collins, “Grammatical Evolution: Solving Trigonometric Identities,” In proceedings of Mendel 1998: 4th International Mendel Conference on Genetic Algorithms, Optimization Problems, Fuzzy Logic, Neural Networks, Rough Sets., Brno, Czech Republic, June 24-26 1998. Technical University of Brno, Faculty of Mechanical Engineering, pp.
[9] Collins J. and Ryan C., “Automatic Generation of Robot Behaviors using Grammatical Evolution,” In Proc. of AROB 2000, the Fifth International Symposium on Artificial Life and Robotics.
[10] M. O’Neill and C. Ryan, “Automatic generation of caching algorithms,” In Kaisa Miettinen, Marko M. Mkel, Pekka Neittaanmki, and Jacques Peri- aux (eds.), Evolutionary Algorithms in Engineering and Computer Science, Jyvskyl, Finland, 30 May - 3 June 1999, John Wiley & Sons, pp. 127-134, [11] A. Brabazon and M. O’Neill, “A grammar model for foreign-exchange trad- ing,” In H. R. Arabnia et al., editor, Proceedings of the International con- ference on Artificial Intelligence, volume II, CSREA Press, 23-26 June 2003,


Lmx grease

LMX GREASE Lithium complex grease DESCRIPTION LMX Grease is a lithium-complex thickened mineral oil based grease of NLGI No.2 consistency having extreme pressure properties andinhibited against oxidation and corrosion. APPLICATIONS Lithium complex greases have similar basic properties to conventionalSuch performance characteristics mean that lithium complex greaseslithium soap grease


Was ist Viagra und wann wird es angewendet?Viagra ist der erste Vertreter einer neuen Medikamentengruppe mit der Bezeichnung Phosphodiesterase-Typ 5-Inhibitoren. Es wirkt, indem es bei sexuel er Erregung die Entspannung der Blutgefässe in Ihrem Penis unterstuetzt. Dadurch kann Blut leichter in den Penis fliessen und Sie erreichen auf natuerliche Weise eine Erektion. Sie sol en Viagra nicht einn

Copyright © 2014 Articles Finder