Machine Learning with Python - Linear Regression

Thursday, October 27, 2011

Hi all,

I decided to start a new series of posts now focusing on general machine learning with several snippets for anyone to use with real problems or real datasets.  Since I am studying machine learning again with a great course online offered this semester by Stanford University, one of  the best ways to review the content learned is to write some notes about what I learned. The best part is that it will include examples with Python, Numpy and Scipy. I expect you enjoy all those posts!

Linear Regression

In this post I will implement the linear regression and get to see it work on data.  Linear Regression is the oldest and most widely used predictive model in the field of machine learning. The goal is to  minimize the sum of the squared errros to fit a straight line to a set of data points.  (You can find further information at Wikipedia).

The linear regression model fits a linear function to a set of data points. The form of the function is:

Y = β0 + β1*X1 + β2*X2 + … + βn*Xn

Where Y is the target variable,  and X1X2, ... Xare the predictor variables and  β1β2, … βare the coefficients that multiply the predictor variables.  βis constant. 

For example, suppose you are the CEO of a big company of shoes franchise and are considering different cities for opening a new store. The chain already has stores in various cities and you have data for profits and populations from the cities.  You would like to use this data to help you select which city to expand next. You could use linear regression for evaluating the parameters of a function that predicts profits for the new store.

The final function would be:

                                                         Y =   -3.63029144  + 1.16636235 * X1

There are two main approaches for linear regression: with one variable and with multiple variables. Let's see both!

Linear regression with one variable

Considering our last example, we have a file that contains the dataset of our linear regression problem. The first column is the population of the city and the second column is the profit of having a store in that city. A negative value for profit indicates a loss.

Before starting, it is useful to understand the data by visualizing it.  We will use the scatter plot to visualize the data, since it has only two properties to plot (profit and population). Many other problems in real life are multi-dimensional and can't be plotted on 2-d plot.

If you run this code above (you must have the Matplotlib package installed in order to present the plots), you will see the scatter plot of the data as shown at Figure 1.


Now you must fit the linear regression parameters to our dataset using gradient descent. The objective of linear regression is to minimize the cost function:

where the hypothesis H0 is given by the linear model:

The parameters of your model are the θ values. These are the values you will adjust to minimize cost J(θ). One way to do it is to use the batch gradient descent algorithm. In batch gradient, each iteration performs the update:

With each step of gradient  descent, your parameters θ, come close to the optimal values that will achieve the lowest cost J(θ).

For our initial inputs we start with our initial fitting parameters θ, our data and add another dimmension to our data  to accommodate the θo intercept term. As also our learning rate alpha to 0.01.

As you perform gradient descent to learn minimize the cost function J(θ), it is helpful to monitor the convergence by computing the cost. The function cost is show below:

A good way to verify that gradient descent is working correctly is to look at the value of J(θ) and check that it is decreasing with each step. It should converge to a steady valeu by the end of the algorithm.

Your final values for θ will be used to make predictions on profits in areas of 35.000 and 70.000 people.  For that we will use some matrix algebra functions with the packages Scipy and Numpy,  powerful Python packages for scientific computing.

Our final values as shown below:

                                                         Y =   -3.63029144  + 1.16636235 * X1

Now  you can use this function to predict your profits!  If you use this function with our data we will come with plot:

Another interesting plot is the contour plots, it will give you how J(θ) varies with changes in θo and  θ1.  The cost function J(θ) is bowl-shaped and has a global mininum as you can see in the figure below.

This minimum is the optimal point for θo and θi, and each step of gradient descent moves closer to this point.

All the code is shown here.

Linear regression with multiple variables

Ok, but when you have multiple variables ? How do we work with them using linear regression ? That comes the linear regression with multiple variables. Let's see an example:

Suppose you are selling your house and you want to know what a good market price would be. One way to do this is to first collect information on recent houses sold and make a model of housing prices.

Our training set of housing prices in Recife, Pernambuco, Brazil are formed by three columns  (three variables). The first column is the size of the house (in square feet), the second column is the number of bedrooms, and the third column is the price of the house.

But before going directly to the linear regression it is important to analyze our data. By looking at the values, note that house sizes are about 1000 times the number of bedrooms. When features differ by orders of magnitude, it is important to perfom a feature scaling that can make gradient descent converge much more quickly.

The basic steps are:

  • Subtract the mean value of each feature from the dataset.
  • After subtracting the mean, additionally scale (divide) the feature values by their respective “standard deviations.”

The standard deviation is a way of measuring how much variation there is in the range of values of a particular feature (most data points will lie within ±2 standard deviations of the mean); this is an alternative to taking the range of values (max-min).

Now that you have your data scaled, you can implement the gradient descent and the cost function.

Previously, you implemented gradient descent on a univariate regression problem. The only difference now is that there is one more feature in the matrix X. The hypothesis function and the batch gradient descent update rule remain unchanged.

In the multivariate case, the cost function can also be written in the following vectorized form:


After running our code, it will come with following function:

             215810.61679138,   61446.18781361,   20070.13313796

The gradient descent will run until convergence to find the final values of θ.  Next, we will this value of θ to predict the price of a house with 1650 square feet and 3 bedrooms.


Predicted price of a 1650 sq-ft, 3 br house: 183865.197988

If you plot the convergence plot of the gradient descent you may see that convergence will decrease as the number of iterations grows.

The code for linear regression with multi variables is available here.

Extra Notes

The Scipy package comes with several tools for helping you in this task, even with a module that has a linear regression implemented for you to use!

The module is scipy.stats.linregress  and implements several other techniques for updating the theta parameters.  Check more about it here.


The goal of regression is to determine the values of the ß parameters that minimize the sum of the squared residual values (difference betwen predicted and the observed) for the set of observations. Since linear regression is restricted to fiting linear (straight line/plane) functions to data, it's not adequate to real-world data as more general techniques such as neural networks which can model non-linear functions.  But linear regression has some interesting advantages:

  • Linear regression is the most widely used method, and it is well understood.
  • Training a linear regression model is usually much faster than methods such as neural networks.
  • Linear regression models are simple and require minimum memory to implement, so they work well on embedded controllers that have limited memory space.
  • By examining the magnitude and sign of the regression coefficients (β) you can infer how predictor variables affect the target outcome.
  • It's is one of the simplest algorithms and available in several packages, even Microsoft Excel!

I hope you enjoyed this simple post, and in the next one I will explore another field of machine learning with Python! You can download the code at this link.

Marcel Caraciolo


  1. Congratulations on Machine Learning in Python! achieves Machine Learning in JavaScript by asking questions for the human user to answer.

  2. Enjoy reading your post. Great article, thank you very much! Really nice and impressive blog i found today... Thx for sharing this

  3. This comes in handy for me! Thank you

  4. thanks so much man - I am really struggling with the stanford course and have been wishing it was in python! this is awesome...

  5. Your code is a bit confusing in that you do not use at all your compute_cost function!

    The result of the cost function is stored in the variable J_history which is never used again.

    Furthermore your gradient descent method never checks on a minimal error but just iterates a fixed number of times towards a minimum. This can also lead to bad results.

  6. great tutorial. your code helps my comprehension of the process. do you have any examples of this executed using pandas?

  7. hi,
    it[:,1] = X does not seem to work ...
    A new column was not added to the array.

  8. Nice work. Thanks!

    1. Excellent material can be found in

  9. I'm pretty sure gradient descent isn't actually linear regression, its a more general solver thats actually more advanced and used with non-linear data. Linear regression will fit only the simplest models but its FAST. Gradient descent is far slower.

  10. your are clear the error for python programming language.your site is very useful clear the error for programming.Thank you for sharing your paragraph.Best Python training institute in Chennai

  11. I have read you article very useful information for python training.Thank you for sharing you article.Best Python Training in Chennai

  12. Why would you not include the datatypes of the inputs in your comments instead of some useless phrase lilke "Comput cost for linear regression"...

  13. How can I use that code in mongodb ? Sry I am quite new, I have a MongoDB Database with a collection of "documents". How can I run this code against my collection. The Objects are only filled with 2 attributes and the attributes are numeric. I want to run a linear regression over these :) Thanks

  14. Welcome to Wiztech Automation - Embedded System Training in Chennai. We have knowledgeable Team for Embedded Courses handling and we also are after Job Placements offer provide once your Successful Completion of Course. We are Providing on Microcontrollers such as 8051, PIC, AVR, ARM7, ARM9, ARM11 and RTOS. Free Accommodation, Individual Focus, Best Lab facilities, 100% Practical Training and Job opportunities.

    Embedded System Training in chennai
    Embedded System Training Institute in chennai
    Embedded Training in chennai
    Embedded Course in chennai
    Best Embedded System Training in chennai
    Best Embedded System Training Institute in chennai
    Best Embedded System Training Institutes in chennai
    Embedded Training Institute in chennai
    Embedded System Course in chennai
    Best Embedded System Training in chennai

  15. Embedded system training: Wiztech Automation Provides Excellent training in embedded system training in Chennai - IEEE Projects - Mechanical projects in Chennai. Wiztech provide 100% practical training, Individual focus, Free Accommodation, Placement for top companies. The study also includes standard microcontrollers such as Intel 8051, PIC, AVR, ARM, ARMCotex, Arduino, etc.

    Embedded system training in chennai
    Embedded Course training in chennai
    Matlab training in chennai
    Android training in chennai
    LabVIEW training in chennai
    Robotics training in chennai
    Oracle training in chennai
    Final year projects in chennai
    Mechanical projects in chennai
    ece projects in chennai

  16. WIZTECH Automation, Anna Nagar, Chennai, has earned reputation offering the best automation training in Chennai in the field of industrial automation. Flexible timings, hands-on-experience, 100% practical. The candidates are given enhanced job oriented practical training in all major brands of PLCs (AB, Keyence, ABB, GE-FANUC, OMRON, DELTA, SIEMENS, MITSUBISHI, SCHNEIDER, and MESSUNG)

    PLC training in chennai
    Automation training in chennai
    Best plc training in chennai
    PLC SCADA training in chennai
    Process automation training in chennai
    Final year eee projects in chennai
    VLSI training in chennai

  17. This comment has been removed by the author.

  18. Wiztech Automation is the Leading Best IEEE Final year project Centre in Chennai and the final year students are provided complete guidance and support in their final year projects. The IEEE projects in Chennai that Wiztech Automation offers guidance and support for include complete range of system domains – such as PLC projects, embedded projects, VLSI projects, software projects, IT projects, Civil projects. Students looking for specific projects pertaining to departments of ECE, EEE, E&I, Mechanical, Mechatronics, bio-medical, IT, Computer, Civil projects in B.E, M.E, B.Tech, M.Tech, B.SC., and M.Sc Electronics, could also get turnkey solutions at Wiztech Automation Solutions to turn out successful project outcomes and models. Since the students at Wiztech Automation gain thorough theoretical and practical knowledge and skills as they pursue their final year projects and develop 2015 and 2016 Latest IEEE Projects portraying them well.

    Final year projects in chennai
    Mechanical projects in chennai
    ece projects in chennai
    Final year eee projects in chennai
    VLSI project center in chennai
    Industrial projects in chennai
    Fianl year CSE projects in chennai

  19. This can be a single case in which your Uk teacher had been right. But if your article is actually riddled together with punctuation blunders in addition to grammatical mishaps, you'll almost instantly get rid of reliability.nurse personal statement

  20. Hi admin thanks for sharing informative article on hadoop technology. In coming years, hadoop and big data handling is going to be future of computing world. This field offer huge career prospects for talented professionals. Thus, taking Hadoop & Spark Training in Hyderabad will help you to enter big data hadoop & spark technology.

  21. Could you share your ex1data1.txt data ? Thanks

    1. got the file @ Coursera Data science tutorial by andrew ng ..excecise week 2 assignment

  22. Excellent material can be found in

  23. Paris airport transfer - Parisairportransfer is very common in Paris that provides facilities to both the businessmen and the tourists. We provide airport transfers from London to any airport in London and also cruise transfer services at very affordable price to our valuable clients.

    Paris taxi
    Paris airport shuttle
    paris hotel transfer
    paris airport transfer
    paris shuttle
    paris car service
    paris airport service
    disneyland paris transfer
    paris airport transportation
    beauvais airport transfer
    taxi beauvais airport
    taxi cdg airport
    taxi orly airport

  24. Keep on posting these types of articles. I like your blog design as well. Cheers!!!MATLAB training in noida

  25. Useful Information
    one and only affiliate agency in south INDIA, earn money online from affiliate network in india

  26. Java Training Institute in Noida - Croma Campus imparts the most effective JAVA Training in Noida which is based on the principle write once and run anywhere which means that the code which runs on one platform does not need to be complied again to run on the other.


  27. I actually enjoyed reading through this posting.Many thanks.

    Function Point Estimation Training

  28. This comment has been removed by the author.

  29. Informatica training institutes in noida - Croma campus offers best Informatica Training in noida with most experienced professionals. Our Instructors are working in Informatica and joint technologies for more years in MNC’s. We aware of industry needs and we are offering Informatica Training in noida.

  30. Informatica training institutes in noida - Croma campus offers best Informatica Training in noida with most experienced professionals. Our Instructors are working in Informatica and joint technologies for more years in MNC’s. We aware of industry needs and we are offering Informatica Training in noida.