GDF: A tool for function estimation through grammatical evolution
Ioannis G. Tsoulos(1)∗, Dimitris Gavrilis(2), Evangelos Dermatas(2)
(1)Department of Computer Science, University of Ioannina, P.O.
(2)Department of Electrical & Computer Engineering, University of Patras
This article introduces a tool for data fitting that is based on genetic programming and, in particular, on the grammatical evolution technique. The user supplies a series of points together with their dimensionality, and the tool produces, via the genetic programming paradigm, a function $f: \mathbb{R}^n \rightarrow \mathbb{R}$ that is optimal in the least squares sense. The tool is written entirely in ANSI C++ and can be installed on any UNIX system.
Corresponding author. Email: firstname.lastname@example.org
Program available from: CPC Program Library, Queen's University of Belfast.
Computer for which the program is designed and others on which it has been tested: The tool is designed to be portable to all systems running the GNU C++ compiler.
Installation: University of Ioannina and University of Patras, Greece.
Programming language used: GNU C++ (ANSI C++).
Memory required to execute with typical data
Has the code been vectorised or parallelized?
No. of bytes in distributed program, including test data etc.: 30 Kbytes.
Distribution format: gzipped tar file.
Keywords: Function approximation, stochastic methods, genetic programming.
Method of solution: Genetic programming.
LONG WRITE UP
The problem of function estimation consists of finding a function that best approximates a set of n-dimensional points, given their outputs. Function estimation finds many applications in physics, chemistry, signal processing, etc., and it can be formulated as follows: given M points and associated values $(x_i, y_i)$, $i = 1, \ldots, M$, with $x_i \in \mathbb{R}^n$, estimate a function $f: \mathbb{R}^n \rightarrow \mathbb{R}$ that minimizes the least squares error

$$E(f) = \sum_{i=1}^{M} \left( f(x_i) - y_i \right)^2 \qquad (1)$$
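As a concrete reading of equation (1), the following C++ sketch computes the least squares error of a candidate function over a set of points. It is illustrative only: the names `Model` and `leastSquaresError` are hypothetical and not part of the GDF sources, and the points are assumed to be stored as parallel arrays of inputs $x_i$ and outputs $y_i$.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical type for a candidate model f: R^n -> R, taking the coordinates of one point.
using Model = std::function<double(const std::vector<double>&)>;

// Least squares error of equation (1): E(f) = sum_{i=1}^{M} (f(x_i) - y_i)^2,
// where xs[i] holds the point x_i and ys[i] its associated value y_i.
double leastSquaresError(const Model& f, const std::vector<std::vector<double>>& xs,
                         const std::vector<double>& ys) {
    assert(xs.size() == ys.size());  // exactly one value y_i per point x_i
    double error = 0.0;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        const double residual = f(xs[i]) - ys[i];  // f(x_i) - y_i
        error += residual * residual;
    }
    return error;
}
```

For instance, for the points (1, 2) and (2, 3), the model f(x) = x gives E = 2, while f(x) = x + 1 fits both points exactly and gives E = 0.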
Over the years many methods have been proposed for this problem, such as spline-based [2, 3] or neural-network-based [4, 5] methods. Although these techniques have been applied successfully to many data fitting problems, they produce functional forms that are built from specific function families, such as polynomials or sigmoidal functions. The proposed programming tool takes as input the points $(x_i, y_i)$ and creates a functional form that minimizes the quantity in equation (1) through the procedure of Grammatical Evolution. Grammatical Evolution is an evolutionary process that can create programs in an arbitrary language. The programs are produced by a mapping process governed by a grammar expressed in Backus-Naur Form (BNF). Grammatical Evolution has been applied successfully to problems such as symbolic regression, discovery of trigonometric identities [8], robot control [9], caching algorithms [10] and financial prediction [11]. The rest of this article is organized as follows: section 2 presents the contents of the distribution; section 3 describes in detail the installation steps for any UNIX programming environment; section 4 analyzes the main parts of the distribution, namely the underlying algorithm, the grammar specification and the gdf program; finally, section 5 lists some conclusions from the application of the tool.
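To make the grammar-driven mapping more concrete, here is a minimal C++ sketch of the genotype-to-phenotype mapping commonly used in Grammatical Evolution: the leftmost non-terminal is expanded repeatedly, and each integer codon selects a production through the rule codon mod (number of productions). It follows the general scheme of O'Neill and Ryan [1], not GDF's own source code; the names `Grammar`, `Production` and `mapChromosome`, as well as the toy grammar in the usage note, are hypothetical, and the sketch omits the codon wrapping that some GE implementations use.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// A production is a sequence of symbols; symbols written as <...> are non-terminals.
using Production = std::vector<std::string>;
// Hypothetical grammar type: each non-terminal maps to its alternative productions.
using Grammar = std::map<std::string, std::vector<Production>>;

bool isNonTerminal(const std::string& symbol) {
    return symbol.size() > 2 && symbol.front() == '<' && symbol.back() == '>';
}

// Expands the leftmost non-terminal with production (codon mod number_of_productions).
// Codons are assumed to lie in [0,255]. Returns std::nullopt when the codons run out
// before every non-terminal has been expanded (no wrapping in this sketch).
std::optional<std::string> mapChromosome(const Grammar& grammar, const std::string& start,
                                         const std::vector<int>& codons) {
    std::vector<std::string> form{start};  // current sentential form
    std::size_t next = 0;                  // index of the next unused codon
    std::size_t pos = 0;                   // every symbol left of pos is a terminal
    while (pos < form.size()) {
        if (!isNonTerminal(form[pos])) { ++pos; continue; }
        const auto rule = grammar.find(form[pos]);
        assert(rule != grammar.end() && !rule->second.empty());
        if (next == codons.size()) return std::nullopt;  // chromosome exhausted
        assert(codons[next] >= 0);
        const std::vector<Production>& alternatives = rule->second;
        const Production& chosen =
            alternatives[static_cast<std::size_t>(codons[next++]) % alternatives.size()];
        // Replace the non-terminal by the chosen production; pos is not advanced,
        // because the first inserted symbol may itself be a non-terminal.
        const auto where = form.erase(form.begin() + static_cast<std::ptrdiff_t>(pos));
        form.insert(where, chosen.begin(), chosen.end());
    }
    std::string program;
    for (const std::string& symbol : form) program += symbol;
    return program;
}
```

With the toy grammar `<expr> ::= <expr><op><expr> | <var>`, `<op> ::= + | *`, `<var> ::= x1 | x2` and start symbol `<expr>`, the codons 0, 1, 0, 1, 1, 1 map to the expression `x1*x2`, whereas the single codon 0 does not suffice to complete an expression and the mapping fails.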
The package is distributed as a tar.gz file named GDF.tar.gz; under UNIX systems the user must issue the following commands to extract the associated files:
These steps create a directory named GDF with the following contents:
bin: A directory that is initially empty. After the compilation of the package it will contain the executable gdf and the text file grammar.txt. The first file is the programming tool itself and the second is an auxiliary file that contains the grammar of the tool, expressed in BNF.
A directory containing the documentation of the package (this file) in different formats: a LyX file, a LaTeX file, a PostScript file and a PDF file.
A directory containing the header files for all the classes.
src: The directory with the source files of the package.
Makefile: The file read by the make utility in order to build the tool. There is no need for the user to modify this file.
The following steps are required in order to build the tool:
1. Uncompress the tool as described in the previous section.
After the compilation, the binary file gdf is placed in the bin subdirectory together with the text file grammar.txt. The tool is written entirely in GNU C++ version 3.2.3, but it can be installed on systems with a different ANSI C++ compiler. The only modification required is to replace the line in the file Makefile under the src subdirectory that names the compiler with a line naming mycpp instead, where mycpp is the name of the corresponding ANSI C++ compiler on the target system.
The underlying algorithm
The programming tool is based on the following stochastic algorithm:
1. The program reads the data to be ﬁtted from a text ﬁle.
2. The program reads the used grammar from a text ﬁle.
3. Every chromosome in the genetic population is initialized by randomly selecting a number in the range [0,255] for every element of the chromosome.
4. The values of the parameters selection rate and mutation rate are selected. The selection rate denotes the fraction of the chromosomes that pass unchanged to the next generation; consequently, the probability of crossover is 1 − selection rate. The values of these parameters are mutually independent and they must be in the range [0,1].
5. Set k = 0, where k is the generation counter.
6. Set the value of the parameter maxK, where maxK is the maximum number of generations allowed.
Evolution Step:
1. For every chromosome in the population, a function is created through
the process of Grammatical Evolution.
2. The ﬁtness of each member of the population is evaluated.
3. The chromosomes are sorted in descending order according to their fitness values.
4. A set of (1 − selection rate) × population size new chromosomes is created. Every new chromosome is formed from two selected individuals (parents) of the current population by one-point crossover. In this procedure the parent chromosomes are cut at a randomly chosen point and their right-hand-side subchromosomes are exchanged, as shown in figure 1. For every new chromosome, each parent is selected through tournament selection:
(a) A group of K ≥ 2 randomly selected chromosomes is created.
(b) The chromosome with the best fitness value in the group is selected.
5. The mutation procedure is applied to each member of the population with probability equal to the mutation rate.
6. Set k = k + 1.
7. If k > maxK or the best fitness value has fallen below a predefined threshold, the Evolution Step is terminated; otherwise, it is repeated from sub-step 1.
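The initialization and the genetic operators of the Evolution Step can be sketched in C++ as follows. This is an illustration under stated assumptions, not the GDF implementation: all function names are hypothetical, fitness evaluation and sorting (sub-steps 1 to 3) are assumed to have been done by the caller, higher fitness is treated as better (matching the descending sort), and because the text does not say what mutation does to a chromosome, the sketch assumes that one random codon is replaced by a new value in [0,255].

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

using Chromosome = std::vector<int>;  // fixed-length list of codons in [0,255]

// Step 3: every element of a chromosome is a random number in [0,255].
Chromosome randomChromosome(std::size_t length, std::mt19937& rng) {
    std::uniform_int_distribution<int> codon(0, 255);
    Chromosome c(length);
    for (int& v : c) v = codon(rng);
    return c;
}

// Tournament selection: draw K >= 2 random members, return the index of the fittest.
std::size_t tournament(const std::vector<double>& fitness, std::size_t K, std::mt19937& rng) {
    assert(K >= 2 && !fitness.empty());
    std::uniform_int_distribution<std::size_t> pick(0, fitness.size() - 1);
    std::size_t best = pick(rng);
    for (std::size_t i = 1; i < K; ++i) {
        const std::size_t candidate = pick(rng);
        if (fitness[candidate] > fitness[best]) best = candidate;
    }
    return best;
}

// One-point crossover: cut both parents at the same random point and swap the tails.
std::pair<Chromosome, Chromosome> onePointCrossover(const Chromosome& a, const Chromosome& b,
                                                    std::mt19937& rng) {
    assert(a.size() == b.size() && a.size() >= 2);
    std::uniform_int_distribution<std::ptrdiff_t> cutPoint(
        1, static_cast<std::ptrdiff_t>(a.size()) - 1);
    const std::ptrdiff_t cut = cutPoint(rng);
    Chromosome c1(a.begin(), a.begin() + cut);
    Chromosome c2(b.begin(), b.begin() + cut);
    c1.insert(c1.end(), b.begin() + cut, b.end());
    c2.insert(c2.end(), a.begin() + cut, a.end());
    return {c1, c2};
}

// Assumed mutation: with probability mrate, one random codon gets a new value in [0,255].
void mutate(Chromosome& c, double mrate, std::mt19937& rng) {
    std::bernoulli_distribution happens(mrate);
    if (c.empty() || !happens(rng)) return;
    std::uniform_int_distribution<std::size_t> position(0, c.size() - 1);
    std::uniform_int_distribution<int> codon(0, 255);
    c[position(rng)] = codon(rng);
}

// Sub-steps 4 and 5. The population must already be sorted by descending fitness,
// so its first srate * size members pass unchanged to the next generation.
void nextGeneration(std::vector<Chromosome>& population, const std::vector<double>& fitness,
                    double srate, double mrate, std::size_t K, std::mt19937& rng) {
    assert(population.size() == fitness.size());
    const auto keep = static_cast<std::size_t>(srate * static_cast<double>(population.size()));
    std::vector<Chromosome> next(population.begin(),
                                 population.begin() + static_cast<std::ptrdiff_t>(keep));
    while (next.size() < population.size()) {
        const Chromosome& parentA = population[tournament(fitness, K, rng)];
        const Chromosome& parentB = population[tournament(fitness, K, rng)];
        auto children = onePointCrossover(parentA, parentB, rng);
        next.push_back(std::move(children.first));
        if (next.size() < population.size()) next.push_back(std::move(children.second));
    }
    for (Chromosome& c : next) mutate(c, mrate, rng);  // sub-step 5 covers every member
    population = std::move(next);
}
```

Because the chromosomes have a fixed length, one-point crossover always yields children of the same length as their parents, so no length bookkeeping is needed.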
The file that contains the grammar specification is specified by the user with the -g command-line option. The grammar must be given in a plain text (ASCII) file with the format shown in figure 2.
The start symbol is required and must be specified in the form shown in figure 2. The start symbol can produce only one non-terminal symbol.
The available terminal symbols are +, -, *, /, ( and ). Numbers are represented as lists of digits (including ".") and are produced by a dedicated non-terminal symbol, whose subrules are fixed in the code and cannot be changed by the user. These rules are shown in figure 3.

Figure 3: The rules of the number symbol.
<digit>::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

The symbol d in the grammar denotes the dimensionality of the objective function. The available functions are: sin, cos, log, exp, log10, tan, abs, sqrt, int, atan, acos, asin. If a non-terminal has more than one rule, the additional rules can be written starting with "|" instead of repeating the entire left-hand side (e.g., in the second rule of a non-terminal, the left-hand side and "::=" are replaced by "|"). In this way, the user can easily alter the behavior of the program by specifying a different grammar. If, for example, it is known that log or log10 cannot appear in the desired output, the user can remove them from the grammar specification.
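The rule format described above can be read with a small parser. Since figure 2 is not reproduced here, this C++ sketch assumes only what the text states: each rule has the form `LHS::=RHS`, a line starting with "|" adds an alternative to the preceding left-hand side, and, as in figure 3, several alternatives on one line are separated by "|". The names `RuleTable`, `parseGrammar` and the sample start symbol `<S>` in the usage note are hypothetical; this is not GDF's grammar reader.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Each non-terminal maps to its alternatives, kept as raw right-hand-side strings.
using RuleTable = std::map<std::string, std::vector<std::string>>;

// Remove leading and trailing blanks.
std::string trim(const std::string& s) {
    const auto b = s.find_first_not_of(" \t\r");
    if (b == std::string::npos) return "";
    const auto e = s.find_last_not_of(" \t\r");
    return s.substr(b, e - b + 1);
}

// Split a right-hand side on '|' and append each non-empty alternative.
void addAlternatives(std::vector<std::string>& rules, const std::string& rhs) {
    std::stringstream ss(rhs);
    std::string alt;
    while (std::getline(ss, alt, '|')) {
        alt = trim(alt);
        if (!alt.empty()) rules.push_back(alt);
    }
}

RuleTable parseGrammar(std::istream& in) {
    RuleTable table;
    std::string line, current;  // current: the left-hand side a leading '|' continues
    while (std::getline(in, line)) {
        line = trim(line);
        if (line.empty()) continue;
        if (line[0] == '|') {
            if (current.empty()) throw std::runtime_error("'|' line without a preceding rule");
            addAlternatives(table[current], line.substr(1));
            continue;
        }
        const auto sep = line.find("::=");
        if (sep == std::string::npos) throw std::runtime_error("missing ::= in: " + line);
        current = trim(line.substr(0, sep));
        if (current.empty()) throw std::runtime_error("empty left-hand side in: " + line);
        addAlternatives(table[current], line.substr(sep + 3));
    }
    return table;
}
```

For example, the input lines `<expr>::=(<expr><op><expr>)`, `|sin(<expr>)` and `|x1` give `<expr>` three alternatives. Splitting on "|" everywhere is safe under the stated assumption, because "|" is not among the terminal symbols of the grammar.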
The main program gdf
The created executable gdf accepts the following command-line parameters:
The help option prints a help screen with a description of each command-line parameter, after which the program terminates.
grammar file: The parameter grammar file specifies a file containing a valid grammar for the tool. The user must have read access to the specified file. The default value of this parameter is grammar.txt, which contains the default grammar and is copied into the subdirectory bin during installation.
problem file: The parameter problem file specifies a file containing the points to which the data fitting procedure will be applied. The user must have read access to the specified file, and its contents must conform to the format of figure 4. The integer D in the file determines the dimensionality of the problem, the integer M determines the number of points in the file, and each subsequent line defines a point (its D coordinates followed by its output value) to which the data fitting procedure will be applied. This is the only parameter that the program requires.

Figure 4: The format of the problem file.
D
M
$x_{11}\; x_{12}\; \ldots\; x_{1D}\; y_{1}$
$x_{21}\; x_{22}\; \ldots\; x_{2D}\; y_{2}$
$\ldots$
$x_{M1}\; x_{M2}\; \ldots\; x_{MD}\; y_{M}$

test file: The parameter test file specifies a file, in the same format as the problem file, on which the produced function is tested after the termination of the genetic algorithm. The user must have read access to the specified file, and the dimension in the test file must be the same as in the problem file.
count: The parameter count speciﬁes the number of chromosomes in
the genetic population. The default value for this parameter is 500.
length: The parameter length specifies the length of each chromosome in the genetic population. The default value of this parameter is 100. The standard GE approach uses variable-length chromosomes, but GDF uses chromosomes of fixed length in order to prevent it from creating
srate: The parameter srate specifies the value of the selection rate of the genetic algorithm. The default value of this parameter is …
mrate: The parameter mrate specifies the value of the mutation rate of the genetic algorithm. The default value of this parameter is …
generations: The integer parameter generations determines the max-
imum number of the generations allowed for the genetic algorithm. The
default value for this parameter is 2000.
seed: The parameter seed speciﬁes the seed for the random number
generator. The default value for this parameter is 1.
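The problem and test file format can be read with a short routine like the following C++ sketch. It is not GDF's reader: `Dataset`, `readDataset` and `checkCompatible` are hypothetical names, and because it reads whitespace-separated tokens, it does not depend on the exact line layout of figure 4, only on the order D, M, then for each point its D coordinates followed by its output value.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <vector>

struct Dataset {
    int dimension = 0;                   // D
    std::vector<std::vector<double>> x;  // M points with D coordinates each
    std::vector<double> y;               // M output values
};

// Reads D, M and then M records "x_1 ... x_D y" from a problem or test file.
Dataset readDataset(std::istream& in) {
    Dataset d;
    int m = 0;
    if (!(in >> d.dimension >> m) || d.dimension <= 0 || m < 0)
        throw std::runtime_error("bad header: expected D and M");
    d.x.assign(static_cast<std::size_t>(m),
               std::vector<double>(static_cast<std::size_t>(d.dimension)));
    d.y.assign(static_cast<std::size_t>(m), 0.0);
    for (std::size_t i = 0; i < d.x.size(); ++i) {
        for (double& coordinate : d.x[i])
            if (!(in >> coordinate)) throw std::runtime_error("truncated point data");
        if (!(in >> d.y[i])) throw std::runtime_error("missing output value");
    }
    return d;
}

// The test file must have the same dimension D as the problem file.
void checkCompatible(const Dataset& problem, const Dataset& test) {
    if (problem.dimension != test.dimension)
        throw std::runtime_error("problem and test files differ in dimension");
}
```

Throwing on a short or malformed file is a design choice of this sketch; the text only states the required format, not how the program reacts to violations.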
In each generation the program prints on the screen the following quantities:
1. The number of generations passed.
2. The best discovered function.
3. The fitness value of the best discovered function.
The performance of the proposed tool was measured using 5 different datasets: one for the continuous function $f(x) = x\sin(x^2)$ and 4 real-life problems.
The function $f(x) = x\sin(x^2)$
The tool was tested on this function using a dataset of 100 random points from the function in the range [-2,2]. The tool was run with the following command, where xsinxx.data is the file containing the points for the data fitting. The last 10 lines of the program output were the following:
generation=156 f(x)=sqrt(sqrt((log(2.72))))*sin(abs(exp(log(((x1))*x1))))*x1 fitness=-1.260575367e-06
generation=157 f(x)=sqrt(sqrt((log(2.72))))*sin(abs(exp(log(((x1))*x1))))*x1 fitness=-1.260575367e-06
generation=158 f(x)=sqrt(sqrt((log(2.72))))*sin(abs(exp(log(((x1))*x1))))*x1 fitness=-1.260575367e-06
generation=159 f(x)=sqrt(sqrt((log(2.72))))*sin(abs(exp(log(((x1))*x1))))*x1 fitness=-1.260575367e-06
generation=160 f(x)=sqrt(sqrt((log(2.72))))*sin(abs(exp(log(((x1))*x1))))*x1 fitness=-1.260575367e-06
generation=161 f(x)=sqrt(sqrt((log(2.72))))*sin(abs(exp(log(((x1))*x1))))*x1 fitness=-1.260575367e-06
generation=162 f(x)=sqrt(sin((log(4.72))))*sin(abs(exp(log(((x1))*x1))))*x1 fitness=-4.10576003e-07
generation=163 f(x)=sqrt(sin((log(4.72))))*sin(abs(exp(log(((x1))*x1))))*x1 fitness=-4.10576003e-07
generation=164 f(x)=sqrt(sin((log(4.72))))*sin(abs(exp(log(((x1))*x1))))*x1 fitness=-4.10576003e-07
generation=165 f(x)=sqrt(sin((log(4.82))))*sin(abs(exp(log(((x1))*x1))))*x1 fitness=-4.832536538e-11
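The last expression can be checked by hand. For $x_1 \neq 0$, $\exp(\log(x_1 x_1)) = x_1^2$, and abs leaves the non-negative $x_1^2$ unchanged, so the expression reduces to $C\, x_1 \sin(x_1^2)$ with $C = \sqrt{\sin(\log 4.82)}$. Because $\log 4.82 \approx 1.5728$ lies close to $\pi/2 \approx 1.5708$, $C$ is within about $10^{-6}$ of 1. The following C++ sketch transcribes the printed expression and compares it with the target on a grid over [-2,2]; it assumes that log is the natural logarithm and that the tool's abs acts as `std::fabs` on real numbers, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Expression printed at generation 165, transcribed into C++ (x1 is the input variable).
double discovered(double x1) {
    return std::sqrt(std::sin(std::log(4.82))) *
           std::sin(std::fabs(std::exp(std::log(x1 * x1)))) * x1;
}

// Target function of the experiment: f(x) = x sin(x^2).
double target(double x) { return x * std::sin(x * x); }

// Largest absolute deviation between the two on a uniform grid over [-2, 2].
// At x = 0, log(0) = -inf and exp(-inf) = 0 under IEEE arithmetic, so both sides are 0.
double maxDeviation(int steps) {
    assert(steps > 0);
    double worst = 0.0;
    for (int i = 0; i <= steps; ++i) {
        const double x = -2.0 + 4.0 * i / steps;
        worst = std::fmax(worst, std::fabs(discovered(x) - target(x)));
    }
    return worst;
}
```

Since $|x \sin(x^2)| \le 2$ on [-2,2], the deviation is bounded by $2|C - 1|$, which is of the order of $10^{-6}$.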
The Ailerons problem
This problem has 40 attributes and consists of 7150 points. The dataset addresses a control problem, namely flying an F16 aircraft. The attributes describe the status of the aeroplane, while the goal is to predict the control action on the ailerons of the aircraft. The original owner of the database is Rui Camacho (rcamacho@garfield.fe.up.pt). The program gdf was trained with 200 points from the dataset, and the resulting expression was tested on the remaining 6950 points. Ten independent experiments were conducted with a different random seed each time, and in all cases the absolute value of the fitness was below 3 × …. The best discovered function had fitness −… and test error per point …, and it depends on only 5 of the 40 attributes.
The Elevators problem
The original source of this problem is the experiments of Rui Camacho (rcamacho@garfield.fe.up.pt). The problem has 18 attributes, and this dataset is also obtained from the task of controlling an F16 aircraft, although the target variable and the attributes differ from those of the Ailerons domain. In this case the goal variable is related to an action taken on the elevators of the aircraft. From this dataset, 200 points were used for training and 8452 for testing. The best discovered function had fitness value −… and mean test error ….
The Pyrimidines problem
This is a problem with 27 attributes and 74 patterns. The task consists of learning Quantitative Structure Activity Relationships (QSARs) for the inhibition of dihydrofolate reductase by pyrimidines. The data are described in: Ross D. King, Steven Muggleton, Richard Lewis and Michael J. E. Sternberg, "Drug design by machine learning: the use of inductive logic programming to model the structure-activity relationships of trimethoprim analogues binding to dihydrofolate reductase." From this dataset, 50 patterns were used for training and 24 for testing. The best discovered function, which includes the term $\cos(\cos(x_{20}))$ together with cosine and $\log(\sin(\exp(\cdot)))$ terms, had fitness value −… and mean test error ….
The Basketball problem
The source of this dataset is the book Smoothing Methods in Statistics. It is a problem with four attributes, in which the points scored per minute are predicted from the attributes "assists per minute", "player height", "time played" and "player age". Of the 96 available patterns, 60 were used for training and 36 for testing. The best discovered function had fitness value −… and mean test error ….
The introduced tool is a program that fits a function to a series of points of arbitrary dimension. The fitted function is created through the evolutionary process of Grammatical Evolution and, as a consequence, there is no guarantee that the goal will be achieved. However, the user can apply the tool even in cases where an analytical solution is difficult to find. Also, the tool provides the ability to change the underlying grammar according to the needs of the problem at hand.
[1] M. O'Neill and C. Ryan, "Grammatical Evolution," IEEE Trans. Evolutionary Computation, Vol. 5, pp. 349-358, 2001.
[2] C. de Boor, "A Practical Guide to Splines," Springer-Verlag, New York, 1978.
[3] D. Kincaid and W. Cheney, "Numerical Analysis," Brooks/Cole Publishing.
[4] K. Hornik, M. Stinchcombe and H. White, Neural Networks 2 (1989) 359.
[5] G. Cybenko, "Approximation by superpositions of a sigmoidal function," Mathematics of Control, Signals and Systems 2 (1989) 303-314.
[6] J. R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection, MIT Press, Cambridge, MA, 1992.
[7] M. O'Neill and C. Ryan, Grammatical Evolution: Evolutionary Automatic Programming in an Arbitrary Language, volume 4 of Genetic Programming, Kluwer Academic Publishers, 2003.
[8] C. Ryan, M. O'Neill and J. J. Collins, "Grammatical Evolution: Solving Trigonometric Identities," in Proceedings of Mendel 1998: 4th International Mendel Conference on Genetic Algorithms, Optimization Problems, Fuzzy Logic, Neural Networks, Rough Sets, Brno, Czech Republic, June 24-26, 1998, Technical University of Brno, Faculty of Mechanical Engineering.
[9] J. Collins and C. Ryan, "Automatic Generation of Robot Behaviors using Grammatical Evolution," in Proc. of AROB 2000, the Fifth International Symposium on Artificial Life and Robotics.
[10] M. O'Neill and C. Ryan, "Automatic generation of caching algorithms," in K. Miettinen, M. M. Mäkelä, P. Neittaanmäki and J. Periaux (eds.), Evolutionary Algorithms in Engineering and Computer Science, Jyväskylä, Finland, 30 May - 3 June 1999, John Wiley & Sons, pp. 127-134.
[11] A. Brabazon and M. O'Neill, "A grammar model for foreign-exchange trading," in H. R. Arabnia et al. (eds.), Proceedings of the International Conference on Artificial Intelligence, volume II, CSREA Press, 23-26 June 2003.