We start by setting up the environment.
%pylab inline
The data is from Chapter 4 of MLS. The variable x corresponds to the weight of various animals measured in grams; the variable y corresponds to the heart rate of the animal in beats per minute.
x=[4,25,200,300,2000,5000,30000,50000,70000,450000,500000,3000000]
y=[660,670,420,300,205,120,85,70,72,38,40,48]
plot(x,y,'o')
The data is very much clustered in one corner of the coordinate system. This strongly suggests choosing some logarithmic scale. Let's first try a logarithmic scale on the y-axis only. If y depended on x exponentially, we would see a straight line.
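To see why a straight line on a semilogarithmic plot points to an exponential dependence, here is a minimal sketch using synthetic data. The names t and y_syn and the constants 3 and -0.5 are made up for illustration and are not part of the data set above; the sketch also uses explicit NumPy imports instead of the pylab namespace.

```python
import numpy as np

# Synthetic exponential data (illustrative values, not from the data set above).
t = np.linspace(0.0, 10.0, 20)
y_syn = 3.0 * np.exp(-0.5 * t)

# Taking the logarithm of y turns the exponential into a straight line in t:
#   log(y) = log(3) - 0.5 * t
slope, intercept = np.polyfit(t, np.log(y_syn), 1)

# The fitted slope and exp(intercept) recover the decay rate and the prefactor.
print(slope, np.exp(intercept))
```

If the heart rate data followed such a law, the semilogarithmic plot would show the points along a line.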
semilogy(x,y,'o')
Not much better. So let's test for an allometric relationship, which would show up as a straight line in a doubly logarithmic plot.
loglog(x,y,'o')
This rather strongly indicates an allometric relationship. So let's fit a line on the log-log scale.
l1=polyfit(log(x),log(y),1)
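An allometric relationship has the form y = a * x**b, so that log(y) = log(a) + b * log(x). The following sketch, written with explicit NumPy imports and the hypothetical names a and b, shows how the two coefficients returned by polyfit map onto this law: the first is the exponent b, the second is log(a).

```python
import numpy as np

x = np.array([4, 25, 200, 300, 2000, 5000, 30000, 50000,
              70000, 450000, 500000, 3000000], dtype=float)
y = np.array([660, 670, 420, 300, 205, 120, 85, 70, 72, 38, 40, 48],
             dtype=float)

# polyfit returns the highest-degree coefficient first: [slope, intercept].
slope, intercept = np.polyfit(np.log(x), np.log(y), 1)

b = slope              # exponent of the allometric law
a = np.exp(intercept)  # prefactor, since intercept = log(a)

print("y is approximately", a, "* x **", b)
```

A negative exponent means that the heart rate decreases as the weight increases.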
The first and last data points look like outliers, so we might want to exclude them when fitting:
l2=polyfit(log(x[1:-1]),log(y[1:-1]),1)
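As a quick check of what the slice [1:-1] does, the sketch below (with the hypothetical name x_trimmed) confirms that it drops exactly the first and the last entry and keeps everything in between.

```python
x = [4, 25, 200, 300, 2000, 5000, 30000, 50000,
     70000, 450000, 500000, 3000000]

# Start at index 1 (skip the first entry) and stop before index -1
# (skip the last entry).
x_trimmed = x[1:-1]

print(len(x), len(x_trimmed))
print(x_trimmed[0], x_trimmed[-1])
```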
Let's now define the fitting functions for both cases and plot them together with the data:
def f1(x):
    return exp(l1[1]) * x**l1[0]
def f2(x):
    return exp(l2[1]) * x**l2[0]
loglog(x, y, 'o',
       x, f1(x),
       x, f2(x))
legend(('Data','Allometric Fit','Allometric Fit Excluding Endpoints'))
xlabel('Weight [g]')
ylabel('Heart rate [bpm]')
We can also test the goodness of fit using the correlation coefficient. Note that corrcoef, which pylab takes from NumPy, always returns a correlation matrix containing all pairwise correlation coefficients between the data vectors given.
corrcoef(x,y)
So the value we are interested in is the off-diagonal element, which has index [0,1]. The matrix is always symmetric, so the element with index [1,0] holds the same value. The 1 on the diagonal simply reflects that each data vector is perfectly correlated with itself.
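The structure of the correlation matrix can be verified directly. This sketch uses explicit NumPy imports and the hypothetical names R and r, and it checks the matrix for the doubly logarithmic data.

```python
import numpy as np

x = np.array([4, 25, 200, 300, 2000, 5000, 30000, 50000,
              70000, 450000, 500000, 3000000], dtype=float)
y = np.array([660, 670, 420, 300, 205, 120, 85, 70, 72, 38, 40, 48],
             dtype=float)

# Two input vectors give a 2x2 matrix of pairwise correlation coefficients.
R = np.corrcoef(np.log(x), np.log(y))

print(R.shape)                        # shape of the matrix
print(np.allclose(R, R.T))            # symmetric
print(np.allclose(np.diag(R), 1.0))   # ones on the diagonal

# The coefficient between the two vectors is either off-diagonal element.
r = R[0, 1]
print(r)
```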
corrcoef(x,y)[0,1]
This shows that the linear correlation is rather weak, in agreement with what we see in the first plot above. Similarly, the correlation
corrcoef(x,log(y))[0,1]
is still very weak, as seen in the second plot above. Testing for the correlation of an allometric fit,
corrcoef(log(x),log(y))[0,1]
shows a rather strong correlation. Excluding the first and last point,
corrcoef(log(x[1:-1]),log(y[1:-1]))[0,1]
is even better. This result says that
corrcoef(log(x[1:-1]),log(y[1:-1]))[0,1]**2 * 100
percent of the variance on the doubly logarithmic scale is explained by the fit.
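The statement about explained variance can be checked numerically: for a least-squares line with intercept, the squared correlation coefficient equals one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses explicit NumPy imports; the names lx, ly, ss_res, ss_tot and r_squared_from_fit are chosen here for illustration.

```python
import numpy as np

x = np.array([4, 25, 200, 300, 2000, 5000, 30000, 50000,
              70000, 450000, 500000, 3000000], dtype=float)
y = np.array([660, 670, 420, 300, 205, 120, 85, 70, 72, 38, 40, 48],
             dtype=float)

# Doubly logarithmic data, excluding the first and last point.
lx = np.log(x[1:-1])
ly = np.log(y[1:-1])

# Least-squares line on the log-log scale.
slope, intercept = np.polyfit(lx, ly, 1)
residuals = ly - (slope * lx + intercept)

# Fraction of the variance of log(y) explained by the fitted line.
ss_res = np.sum(residuals**2)
ss_tot = np.sum((ly - ly.mean())**2)
r_squared_from_fit = 1.0 - ss_res / ss_tot

# Squared correlation coefficient for comparison.
r = np.corrcoef(lx, ly)[0, 1]
print(r**2, r_squared_from_fit)
```

Both printed numbers agree up to rounding, which is why the squared correlation coefficient can be read as the fraction of variance explained.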