GDF: A tool for function estimation through
Ioannis G. Tsoulos (1)∗, Dimitris Gavrilis(2), Evangelos Dermatas(2)
(1)Department of Computer Science, University of Ioannina, P.O.
(2)Department of Electrical & Computer Egineering, University of
Abstract
This article introduces a tool for data fitting that is based on genetic
programming and especially on the grammatical evolution technique. The
user needs to input a series of points and the accompanied dimensionality
n and the tool will produce via the genetic programming paradigm a
function f : Rn → R that is optimal in the least squares sense. The tool
is entirely written in ANSI C++ and it can be installed in every UNIX
PACS:02.30.Mv;02.60.-x;02.60.Ed ;07.05.Mh; ∗Corresponding author. Email: sheridan@cs.uoi.gr
PROGRAM SUMMARY Program available from: CPC Program Library, Queen’s University of Belfast,
Computer for which the program is designed and others on which it has beentested : The tool is designed to be portable in all systems running the GNU
Installation: University of Ioannina and University of Patras, Greece. Programming language used : GNU-C++. Memory required to execute with typical data: 200KB. Has the code been vectorised or parallelized? : No. No. of bytes in distributed program,including test data etc.: 30 Kbytes. Distribution format : gzipped tar file. Keywords: Function approximation, stochastic methods, genetic programming,
Method of solution: Genetic programming. LONG WRITE UP Introduction
The problem of function estimation consists of finding a function that will best
approximate a set of n-dimensional points given their output. Function es-
timation finds many applications in physics, chemistry, signal processing etc.
and it can be formulated as follows: Given M points and associated values
(xi, yi) , i = 1, . . . , M with xi ∈ Rn estimate a function f : Rn → R thatminimizes the least squares “Error”
Through the years many methods have been proposed for this problem, such as
spline based [2, 3] or neural network based [4, 5]. Although these techniques have
been applied successfully to many data fitting problems, they produce functional
forms which consist of applications of specific functions such as polynomials
or sigmoidal functions. The proposed programming tool takes as input the
points (xi, yi) and creates a functional form that minimizes the quantity in
equation 1 through the procedure of Grammatical Evolution [1]. Grammatical
Evolution is an evolutionary processes that can create programs in an arbitrary
language. The production is performed using a mapping process governed by
a grammar expressed in Backus Naur Form. Grammatical evolution has been
applied successfully to problems such as symbolic regression [7], discovery of
trigonometric identities [8], robot control [9], caching algorithms [10], financial
prediction [11] etc. The rest of this article is organized as follows: in section 2
the contents of the distribution are presented, in section 3 the installation steps
for any UNIX programming environment are expressed in detail, in section 4 the
main parts of the distribution such as the underlying algorithm, the grammar
specifications and the gdf program are thoroughly analyzed and finally in section
5 some conclusions from the application of the tool are listed. Distribution
The package is distributed in a tar.gz file named GDF.tar.gz and under UNIX
systems the user must issue the following commands to extract the associated
These steps create a directory named GDF with the following contents:
1. bin: It is a directory which is initially empty. After the compilation
of the package it will contain the executable gdf and the text file named
grammar.txt. The first file is the created programming tool and the second
one is an auxiliary file that contains the grammar of the tool, expressed
2. doc: This directory contains the documentation of the package (this file)
in different formats: A LYX file, A LATEX file, a PostScript file and a pdf
3. include: The directory which contains the header files for all the classes
4. src: The directory with the source files of the package.
5. Makefile: It is the file that will be read by the make utility in order to
built the tool. There is no need for the user to modify this file. Installation
The following steps are required in order to build the tool:
1. Uncompress the tool as described in the previous section.
After the compilation the binary file gdf will be placed in the bin subdirectory
accompanied with the text file grammar.txt The tool is entirely written in GNU
C++ version 3.2.3, but it can be installed in systems with different ANSI C++
compiler. The only modification required is to replace the line
in the file Makefile under the src subdirectory with the following one
where mycpp is the name of the corresponding ANSI C++ compiler in the
Program details The underlying algorithm
The programming tool is based on the following stochastic algorithm:
Initialization Step:
1. The program reads the data to be fitted from a text file.
2. The program reads the used grammar from a text file.
3. Every chromosome in the genetic population is initialized. The initializa-
tion is performed by a randomly selection of a number in the range [0,255]
for every element of each chromosome.
4. The values for the parameters selection rate and mutation rate are
selected. The selection rate denotes the fraction of the number of chro-
mosomes that will go through unchanged to the next generation. That
means that the probability for crossover is set to 1 - selection rate. The
values for these parameters are mutually independent and they must be
5. Set k = 0, where k is the amount of the generations.
6. Set the value for the parameter maxK, where maxK is the maximum
Evolution Step:
1. For every chromosome in the population, a function is created through
the process of Grammatical Evolution.
2. The fitness of each member of the population is evaluated.
3. The chromosomes are sorted in descending order according to their fitness
4. A bunch of (1-selection rate)×population size new chromosomes is cre-
ated. Every new chromosome is formed from two selected individuals
(parents) of the current population with one - point crossover. In that
procedure the chromosomes are cut at a randomly chosen point and their
right-hand-side subchromosomes are exchanged, as shown in figure 1. For
every new chromosome the selection of every parent is performed through
(a) A group of K ≥ 2 randomly selected chromosomes is created.
(b) The chromosome with the best fitness value in the group is selected,
5. The mutation procedure is applied to each member of the population with
probability equal to mutation rate.
6. Set k = k + 1
7. If k > maxK or the best fitness value has fallen below a predefined thresh-
old, then the Evolution Step is terminated. Grammar specification
The file that contains the grammar specification must be determined by the user
with the -g option from the command line. The grammar must be specified in
any simple text (ASCII) file with the format shown in figure 2.
The start symbol (<S>) is required and must be specified in the above
form. The start symbol can give only one non-terminal symbol (e.g. <expr>).
The available non-terminal symbols are <expr>, <func>, <op>. The available
terminal symbols are +,-,*,/,(,). The numbers are represented as lists of digits
(including “.”) and can be specified by <terminal>. The subrules for the
Figure 3: The rules of the symbol <terminal>
<digit>::= 0 | 1 | 2 |3 |4 |5 |6 |7 |8 |9
<terminal> symbol are fixed in the code and they can not changed by the
user. These rules are shown in figure 3. The symbol d in the rule for <xxlist>
denotes the dimensionality of the objective function. The available functions
are: sin, cos, log, exp, log10, tan, abs, sqrt, int, atan, acos, asin. If a non-
teminal specification has more that one rules, those rules can be specified with
a “|” instead of typing the entire left hand (e.g. in the second rule of <expr>
, “<expr> ::=” is replaced by “|”). In this way, the user can easily alter the
program parameters by specifying a different grammar. If, for example, it is
known that log or log10 cannot exist in the desired output, the user can remove
them from the grammar specification. The main program gdf
The created executable gdf takes the following series of parameters in the com-
1. -h: The program prints a help screen to the user with a description for
each command line parameter and it terminates.
2. -g grammar file: The parameter grammar files determines a file with a
valid grammar for the tool. The user must have read access to the specified
file. The default value for this parameter is grammar.txt, which is the
default grammar and it is copied after the installation in the subdirectory
3. -p problem file: The parameter problem file determines a file containing
the points where the data fitting procedure will be applied. The user
must have read access to the specified file and the contents of the file
must conform to the format of the figure 4. The integer number D in
the file determines the dimensionality of the specific problem, the number
M determines the amount of points in the file and each consecutive line
defines a point where the data fitting procedure will be applied. This
parameter is the only one required from the program
4. -t test file: The parameter test file determines a file in the same format
as the problem file, where the produced function will be tested after the
termination of the genetic algorithm. The user must have read access to
xM1 xM2 . . . xMD yM
the specified file and the dimension in the file test file must be the same
5. -c count: The parameter count specifies the number of chromosomes in
the genetic population. The default value for this parameter is 500.
6. -l length: The parameter length specifies the length of each chromosome
in the genetic population. The default value for this parameter is 100. The
standard GE approach uses variable - length chrosomes, but the tool GDF
uses chromosomes with static length in order to prevent it from creating
7. -s srate: The parameter srate specifies the value for the parameter selec- tion rate of the genetic algorithm. The default value for this parameter
8. -m mrate: The parameter mrate determines the value for the parame-
ter mutation rate of the genetic algorithm. The default value for this
9. -n generations: The integer parameter generations determines the max-
imum number of the generations allowed for the genetic algorithm. The
default value for this parameter is 2000.
10. -r seed: The parameter seed specifies the seed for the random number
generator. The default value for this parameter is 1.
In each generation the program prints in the screen the following quantities:
1. The number of generations passed.
3. The fitness value of the best discovered function. Test runs
The performance of the proposed tool was measured by using 5 different datasets:
one for the continuous function f (x) = x sin(x2) and 4 real life problems. The function f (x) = x sin x2
The tool was tested on this function using a dataset with 100 random points
from the function in the range [-2,2]. The tool was issued with the following
where xsinxx.data is the file containing the points for the data fitting. The last
10 lines from the output of the above program are the following:
generation=156 f(x)=sqrt(sqrt((log(2.72))))*sin(abs(exp(log(((x1))*x1))))*x1 fitness=-1.260575367e-06
generation=157 f(x)=sqrt(sqrt((log(2.72))))*sin(abs(exp(log(((x1))*x1))))*x1 fitness=-1.260575367e-06
generation=158 f(x)=sqrt(sqrt((log(2.72))))*sin(abs(exp(log(((x1))*x1))))*x1 fitness=-1.260575367e-06
generation=159 f(x)=sqrt(sqrt((log(2.72))))*sin(abs(exp(log(((x1))*x1))))*x1 fitness=-1.260575367e-06
generation=160 f(x)=sqrt(sqrt((log(2.72))))*sin(abs(exp(log(((x1))*x1))))*x1 fitness=-1.260575367e-06
generation=161 f(x)=sqrt(sqrt((log(2.72))))*sin(abs(exp(log(((x1))*x1))))*x1 fitness=-1.260575367e-06
generation=162 f(x)=sqrt(sin((log(4.72))))*sin(abs(exp(log(((x1))*x1))))*x1 fitness=-4.10576003e-07
generation=163 f(x)=sqrt(sin((log(4.72))))*sin(abs(exp(log(((x1))*x1))))*x1 fitness=-4.10576003e-07
generation=164 f(x)=sqrt(sin((log(4.72))))*sin(abs(exp(log(((x1))*x1))))*x1 fitness=-4.10576003e-07
generation=165 f(x)=sqrt(sin((log(4.82))))*sin(abs(exp(log(((x1))*x1))))*x1 fitness=-4.832536538e-11
The Ailerons problem
This problem has 40 attributes and consists of 7150 points. This data set ad-
dresses a control problem, namely flying a F16 aircraft. The attributes describe
the status of the aeroplane, while the goal is to predict the control action on the
ailerons of the aircraft. The original owner of the database is Rui Camacho (rca-
macho@garfield.fe.up.pt). The program gdf were trained with 200 points from
the dataset and the resulting expression was tested over the rest 6950 points.
Ten independent experiments were conducted with different random seeds each
time and in all cases the absolute value of the fitness was below 3 × 10−6. The
1187 − x8 − 756.92x3
with fitness −1.65 × 10−6 and test error per point 1.07 × 10−7. The resulting
function depends only on 5 from the 40 attributes . The Elevators problem
The original source of this problem is the experiments of Rui Camacho (rca-
macho@garfield.fe.up.pt). The problem has 18 attributes and this data set is
also obtained from the task of controlling a F16 aircraft, although the target
variable and attributes are different from the ailerons domain. In this case the
goal variable is related to an action taken on the elevators of the aircraft. From
this dataset 200 points were used for training and 8452 for testing. The best
− 9.8x13 − x18 +
− 9.91x10 exp (x4)
with fitness value −1.64 × 10−3 and mean test error 3.41 × 10−5. The Pyrimidines problem
http://www.ncc.up.pt/~ltorgo/Regression/DataSets.html
and it is a problem of 27 attributes and 74 number of patterns. The task
consists of Learning Quantitative Structure Activity Relationships (QSARs).
The Inhibition of Dihydrofolate Reductase by Pyrimidines.The data are de-
scribed in: King, Ross .D., Muggleton, Steven., Lewis, Richard. and Sternberg,
Michael.J.E. Drug Design by machine learning: the use of inductive logic pro-
gramming to model the structure-activity relationships of trimethoprim analo-
gies binding to dihydrofolate reductase. From the above dataset 50 patterns
were used for training and 24 for testing. The best discovered function was:
(x) = cos (cos ( x20)) cos (x22 − log (sin (exp (x6))))+
with fitness value −1.33 × 10−1and mean test error 7.25 × 10−3. The Basketball problem
The source of this dataset is from Smoothing Methods in Statistics available
which is a problem of four attributes and it tries to identify the points scored per
minute from the attributes “assists per minute”, “player height”,”time played”
and “player age”. From the 96 available patterns 60 were used for training and
36 for testing. The best discovered function was:
33.1−0.245 cos(x3x2)
with fitness value −3.78 × 10−1 and mean test error 6.99 × 10−3. Conclusions
The introduced tool is a program aimed to fit a function in a series of points of an
arbitrary dimension. The applied function is created through the evolutionary
process of Grammatical Evolution and as a consequence there is no guarantee
that the goal will be achieved. However, the user can apply the tool even in cases
where the existence of an analytical solution is difficult to be found. Also, the
tool is provided with the ability of changing the underlying grammar according
References
[1] M. O’Neill and C. Ryan, “Grammatical Evolution,” IEEE Trans. Evolu-
tionary Computation, Vol. 5, pp. 349-358, 2001.
[2] De Boor C., “A practical guide to splines”, Springer - Verlang, New York,
[3] Kincaid D., and Cheney W., “Numerical Analysis”, Brooks/Cole Publish-
[4] Hornik K., Stinchcombe M., and White H., Neural Networks 2 (1989) 359.
[5] Cybenko G., “Approximation by superpositions of a sigmoidal function”,
Mathematics of Control Signals and Systems 2 (1989) 303-314.
[6] J. R. Koza, Genetic Programming: On the programming of Computer by
Means of Natural Selection. MIT Press: Cambridge, MA, 1992.
[7] M. O’Neill and C. Ryan, Grammatical Evolution: Evolutionary Automatic
Programming in a Arbitrary Language, volume 4 of Genetic programming.
[8] C. Ryan, M. O’Neill, and J.J. Collins, “Grammatical Evolution: Solving
Trigonometric Identities,” In proceedings of Mendel 1998: 4th International
Mendel Conference on Genetic Algorithms, Optimization Problems, Fuzzy
Logic, Neural Networks, Rough Sets., Brno, Czech Republic, June 24-26
1998. Technical University of Brno, Faculty of Mechanical Engineering, pp.
[9] Collins J. and Ryan C., “Automatic Generation of Robot Behaviors using
Grammatical Evolution,” In Proc. of AROB 2000, the Fifth International
Symposium on Artificial Life and Robotics.
[10] M. O’Neill and C. Ryan, “Automatic generation of caching algorithms,” In
Kaisa Miettinen, Marko M. Mkel, Pekka Neittaanmki, and Jacques Peri-
aux (eds.), Evolutionary Algorithms in Engineering and Computer Science,
Jyvskyl, Finland, 30 May - 3 June 1999, John Wiley & Sons, pp. 127-134,
[11] A. Brabazon and M. O’Neill, “A grammar model for foreign-exchange trad-
ing,” In H. R. Arabnia et al., editor, Proceedings of the International con-
ference on Artificial Intelligence, volume II, CSREA Press, 23-26 June 2003,
LMX GREASE Lithium complex grease DESCRIPTION LMX Grease is a lithium-complex thickened mineral oil based grease of NLGI No.2 consistency having extreme pressure properties andinhibited against oxidation and corrosion. APPLICATIONS Lithium complex greases have similar basic properties to conventionalSuch performance characteristics mean that lithium complex greaseslithium soap grease
Was ist Viagra und wann wird es angewendet?Viagra ist der erste Vertreter einer neuen Medikamentengruppe mit der Bezeichnung Phosphodiesterase-Typ 5-Inhibitoren. Es wirkt, indem es bei sexuel er Erregung die Entspannung der Blutgefässe in Ihrem Penis unterstuetzt. Dadurch kann Blut leichter in den Penis fliessen und Sie erreichen auf natuerliche Weise eine Erektion. Sie sol en Viagra nicht einn