Machine Learning with Python - Logistic Regression

Sunday, November 6, 2011

 Hi all,

I decided to start a new series of posts focusing on general machine learning, with several snippets that anyone can use on real problems or real datasets. Since I am studying machine learning again through a great online course offered this semester by Stanford University, one of the best ways to review the content is to write some notes about what I learned. The best part is that the notes will include examples with Python, NumPy and SciPy. I hope you enjoy all these posts!

The series:

In this post I will cover logistic regression and regularization.

Logistic Regression

Logistic regression is a type of regression that predicts the probability of occurrence of an event by fitting data to a logistic function. Like many forms of regression analysis, it makes use of several predictor variables that may be either numerical or categorical. For instance, the probability that a person has a heart attack within a specified time period might be predicted from knowledge of the person's age, sex and body mass index. This regression is widely used in scenarios such as predicting a customer's propensity to purchase a product or cancel a subscription in marketing applications, among many others.

Visualizing the Data

Let's explain logistic regression by example. Suppose you are the administrator of a university department and you want to determine each applicant's chance of admission based on their results on two exams. You have historical data from previous applicants that you can use as a training set for logistic regression. For each training example, you have the applicant's scores on two exams and the admission decision. We will use logistic regression to build a model that estimates the probability of admission based on the scores from those two exams.

Let's first visualize our data on a 2-dimensional plot as shown below. As you can see, the axes are the two exam scores, and the positive and negative examples are shown with different markers.

Sample training visualization

The code

Cost Function and Gradient

The logistic regression hypothesis is defined as:

$$h_\theta(x) = g(\theta^T x)$$

where the function g is the sigmoid function. It is defined as:

$$g(z) = \frac{1}{1 + e^{-z}}$$

The sigmoid function maps any real number to a value between 0 and 1. For large positive values of z the sigmoid is close to 1, while for large negative values it is close to 0.

Sigmoid Logistic Function 

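The cost function for logistic regression is given below:

$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left[ -y^{(i)} \log\left(h_\theta(x^{(i)})\right) - \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$

and the gradient of the cost is a vector of the same length as theta, where the j-th element is defined as follows:

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$

You may note that this gradient looks identical to the linear regression gradient. The difference is that linear and logistic regression have different definitions of h(x).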

Let's see the code:
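The original listing is not reproduced in this copy of the post, so here is a minimal sketch of the sigmoid, the cost and the gradient. It assumes NumPy arrays, labels y in {0, 1}, and a matrix X whose first column is all ones (the intercept term). The name compute_grad and the eps parameter are illustrative choices. Clipping h is an added safeguard against taking log(0), not part of the formulas.

```python
import numpy as np


def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))


def compute_cost(theta, X, y, eps=1e-12):
    """Average negative log-likelihood J(theta) for labels y in {0, 1}."""
    h = sigmoid(X.dot(theta))
    # Assumption: keep h strictly inside (0, 1) so log() never receives zero.
    h = np.clip(h, eps, 1.0 - eps)
    return float(np.mean(-y * np.log(h) - (1.0 - y) * np.log(1.0 - h)))


def compute_grad(theta, X, y):
    """Gradient of J: element j is the mean of (h - y) * x_j over the examples."""
    h = sigmoid(X.dot(theta))
    return X.T.dot(h - y) / y.size
```

With theta set to all zeros every prediction is 0.5, so the cost equals log(2), roughly 0.693, whatever the data. That makes a handy sanity check before running the optimizer.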

Now, to find the minimum of this cost function, we will use a SciPy built-in function called fmin_bfgs. It searches for the parameters theta that minimize the logistic regression cost function for a fixed dataset (of X and y values).
The parameters are:
  • The initial values of the parameters you are trying to optimize;
  • A function that, when given the training set and a particular theta, computes the logistic regression cost and gradient with respect to theta for the dataset (X,y).
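As a rough sketch of how such a call can look, the snippet below runs scipy.optimize.fmin_bfgs on synthetic data. The data, the seed and the names cost, grad and true_theta are assumptions made for illustration; they are not the exam dataset from this post. Here the cost and the gradient are passed as two separate functions (f and fprime), and the dataset goes in through args.

```python
import numpy as np
from scipy.optimize import fmin_bfgs


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def cost(theta, X, y):
    # Clipping is an added safeguard so log() never receives zero.
    h = np.clip(sigmoid(X.dot(theta)), 1e-12, 1.0 - 1e-12)
    return np.mean(-y * np.log(h) - (1.0 - y) * np.log(1.0 - h))


def grad(theta, X, y):
    return X.T.dot(sigmoid(X.dot(theta)) - y) / y.size


# Hypothetical stand-in for the exam data: two features plus an intercept column.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 2))
X = np.column_stack([np.ones(200), features])
true_theta = np.array([-0.5, 2.0, -1.0])
y = (rng.random(200) < sigmoid(X.dot(true_theta))).astype(float)

initial_theta = np.zeros(X.shape[1])
# Pass the functions themselves, not the result of calling them:
# fmin_bfgs calls cost(theta, X, y) and grad(theta, X, y) internally.
theta = fmin_bfgs(cost, initial_theta, fprime=grad, args=(X, y), disp=False)
```

Note that theta has one entry per column of X, including the column of ones, which is why two exam scores lead to three parameters. The optimizer works with a flat 1-D theta, so the cost and gradient should accept and return 1-D arrays.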

The final theta value will then be used to plot the decision boundary on the training data, resulting in a figure similar to the figure below.
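One way to get the line for such a plot, sketched under the assumption of two features and an intercept: the boundary is the set of points where theta0 + theta1*x1 + theta2*x2 = 0, which is where the predicted probability equals 0.5. The function name boundary_x2 and the theta values below are hypothetical, chosen only to show the arithmetic.

```python
import numpy as np


def boundary_x2(theta, x1):
    """Second coordinate of the line theta0 + theta1*x1 + theta2*x2 = 0."""
    theta0, theta1, theta2 = theta
    return -(theta0 + theta1 * np.asarray(x1, dtype=float)) / theta2


# Hypothetical parameters, not the ones learned from the exam data.
theta = np.array([-3.0, 1.0, 2.0])
x1 = np.array([0.0, 4.0])      # two points are enough to draw a straight line
x2 = boundary_x2(theta, x1)    # array([ 1.5, -0.5])
```

The pairs (x1, x2) can then be handed to any plotting library and drawn on top of the scatter plot of the training data.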

Evaluating logistic regression

Now that you have learned the parameters of the model, you can use the model to predict whether a particular student will be admitted. For a student with an Exam 1 score of 45 and an Exam 2 score of 85, you should see an admission probability of 0.776.

But you can go further and evaluate the quality of the parameters we have found by checking how well the learned model predicts on our training set. Using a threshold of 0.5 on the output of the sigmoid function, we predict:

$$y = 1 \text{ if } h_\theta(x) \geq 0.5, \qquad y = 0 \text{ if } h_\theta(x) < 0.5$$

where 1 represents admitted and 0 not admitted.

Going to the code, we can calculate the training accuracy of our classifier, that is, the percentage of examples it got correct.  Source code.

89%, not bad, huh?!
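A minimal sketch of the thresholding and accuracy computation follows. The theta and the four examples are made up for illustration, so the printed number is not the 89% from the exam data. The threshold argument is an added convenience for experimenting with cut-offs other than 0.5.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def predict(theta, X, threshold=0.5):
    """Return 1 where the estimated probability reaches the threshold, else 0."""
    return (sigmoid(X.dot(theta)) >= threshold).astype(int)


def accuracy(theta, X, y):
    """Percentage of examples whose predicted label matches y."""
    return 100.0 * np.mean(predict(theta, X) == y)


# Hypothetical parameters and examples (intercept column first).
theta = np.array([-1.0, 2.0, 0.5])
X = np.array([[1.0, 1.0, 0.0],    # theta.x =  1.0 -> predicted 1
              [1.0, 0.0, 0.0],    # theta.x = -1.0 -> predicted 0
              [1.0, 0.0, 4.0],    # theta.x =  1.0 -> predicted 1
              [1.0, -1.0, 1.0]])  # theta.x = -2.5 -> predicted 0
y = np.array([1, 0, 0, 0])
print(accuracy(theta, X, y))  # 75.0
```

Keep in mind that accuracy measured on the training set tends to be optimistic. Tuning the threshold against the same data you report on makes it more optimistic still.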

Regularized logistic regression

But what happens when your data cannot be separated into positive and negative examples by a straight line through the plot? Since plain logistic regression can only find a linear decision boundary, we will have to fit the data in a better way. Let's go through an example.

Suppose you are the product manager of a factory and you have the results of two different tests for some microchips. From these two tests you would like to determine whether the microchips should be accepted or rejected. We have a dataset of test results on past microchips, from which we can build a logistic regression model.

Visualizing the data

Let's visualize our data. As you can see in the figure below, the axes are the two test scores, and the positive (y = 1, accepted) and negative (y = 0, rejected) examples are shown with different markers.
Microchip training set

You may see that a model built for this task may predict all the training data perfectly, and sometimes that might cause trouble. Just because the model can perfectly reconstruct the training set does not mean that it has everything figured out. This is known as overfitting. You can imagine that if you were relying on this model to make important decisions, it would be desirable to have at least some regularization in there. Regularization is a powerful strategy to combat the overfitting problem. We will see it in action in the next sections.

Feature mapping

One way to fit the data better is to create more features from each data point. We will map the features into all polynomial terms of x1 and x2 up to the sixth power.

As a result of this mapping, our vector of two features (the scores on two QA tests) has been transformed into a 28-dimensional vector. A logistic regression classifier trained on this higher-dimensional feature vector will have a more complex decision boundary, which will appear nonlinear when drawn in our 2D plot.
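The mapping can be sketched as follows, assuming the terms are ordered by total degree. The name map_feature and the sample values are illustrative. The first generated column is the constant 1, so it doubles as the intercept term.

```python
import numpy as np


def map_feature(x1, x2, degree=6):
    """Return columns x1^(i-j) * x2^j for every 0 <= j <= i <= degree."""
    x1 = np.asarray(x1, dtype=float).ravel()
    x2 = np.asarray(x2, dtype=float).ravel()
    columns = []
    for i in range(degree + 1):
        for j in range(i + 1):
            columns.append((x1 ** (i - j)) * (x2 ** j))
    return np.column_stack(columns)


mapped = map_feature([0.5, -1.0], [2.0, 3.0])
print(mapped.shape)  # (2, 28)
```

The count comes from 1 + 2 + 3 + 4 + 5 + 6 + 7 = 28 terms: one term of degree 0, two of degree 1, and so on up to seven terms of degree 6.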

Although the feature mapping allows us to build a more expressive classifier, it is also more susceptible to overfitting. This is where regularized logistic regression comes in, to fit the data while avoiding the overfitting problem.

Source code.

Cost function and gradient

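The regularized cost function in logistic regression is:

$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left[ -y^{(i)} \log\left(h_\theta(x^{(i)})\right) - \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$$

Note that you should not regularize the parameter theta_0, so the final summation is for j = 1 to n, not j = 0 to n.  The gradient of the cost function is a vector where the j-th element is defined as follows:

$$\frac{\partial J(\theta)}{\partial \theta_0} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_0^{(i)} \quad \text{for } j = 0$$

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} + \frac{\lambda}{m} \theta_j \quad \text{for } j \geq 1$$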

Now let's learn the optimal parameters theta.  With these new functions and the same SciPy optimization function as before, we will be able to learn the parameters theta.

All the code is provided here (code)
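For readers without the linked code at hand, here is a minimal sketch of a regularized cost and gradient passed to scipy.optimize.fmin_bfgs. Everything about the data is an assumption: it is synthetic, with points inside a circle labeled positive, and it uses only squared features instead of the full sixth-degree mapping. The names cost_reg, grad_reg and the value lam = 1.0 are illustrative.

```python
import numpy as np
from scipy.optimize import fmin_bfgs


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def cost_reg(theta, X, y, lam):
    """Logistic cost plus (lam / 2m) * sum(theta_j^2) for j >= 1."""
    m = y.size
    # Clipping is an added safeguard so log() never receives zero.
    h = np.clip(sigmoid(X.dot(theta)), 1e-12, 1.0 - 1e-12)
    data_term = np.mean(-y * np.log(h) - (1.0 - y) * np.log(1.0 - h))
    return data_term + lam / (2.0 * m) * np.sum(theta[1:] ** 2)


def grad_reg(theta, X, y, lam):
    """Gradient of cost_reg; the intercept theta[0] is not penalized."""
    m = y.size
    g = X.T.dot(sigmoid(X.dot(theta)) - y) / m
    g[1:] += lam / m * theta[1:]
    return g


# Hypothetical data: no straight line separates inside from outside a circle.
rng = np.random.default_rng(1)
pts = rng.uniform(-1.0, 1.0, size=(150, 2))
y = (np.sum(pts ** 2, axis=1) < 0.5).astype(float)
X = np.column_stack([np.ones(150), pts, pts ** 2])

theta0 = np.zeros(X.shape[1])
theta = fmin_bfgs(cost_reg, theta0, fprime=grad_reg, args=(X, y, 1.0), disp=False)
```

Both functions treat theta as a flat 1-D array and slice it with theta[1:], because the optimizer always passes a 1-D vector. Indexing it as a 2-D column raises an IndexError.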

Plotting the decision boundary

Let's visualize the model learned by the classifier. The plot will display the non-linear decision boundary that separates the positive and negative examples. 

Decision Boundary

As you can see, our model successfully fit our data, with an accuracy of 83.05%.



Scikit-learn is an amazing tool for machine learning, providing several modules for working with classification, regression and clustering problems. It uses Python, NumPy and SciPy, and it is open source!

If you want to use logistic regression or linear regression, you should consider scikit-learn. It has several examples and several types of regularization strategies to work with.  Take a look at this link and see for yourself!  I recommend it!
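For a taste of what that looks like, here is a short sketch using scikit-learn's LogisticRegression on synthetic data. The data and seed are assumptions for illustration. The estimator adds the intercept itself, so no column of ones is needed, and it applies L2 regularization by default.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data standing in for two scores per example.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# C is the inverse of the regularization strength: smaller C, stronger penalty.
model = LogisticRegression(C=1.0)
model.fit(X, y)

probabilities = model.predict_proba(X)[:, 1]  # estimated P(y = 1 | x)
train_accuracy = model.score(X, y)            # fraction classified correctly
```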


Logistic regression has several advantages over linear regression. In particular, it is more robust and does not assume a linear relationship between the predictors and the outcome, since it can handle nonlinear effects. However, it requires much more data to achieve stable, meaningful results.  There are other machine learning techniques for handling nonlinear problems, and we will see them in the next posts.   I hope you enjoyed this article!

All source from this article here.


Marcel Caraciolo


  1. wow..i am also doing the stanford exercises in python..i was stuck with the optimization part..thank you very much for the article..

  2. the source code posted is giving errors..

    Warning: divide by zero encountered in log
    Warning: overflow encountered in power
    Warning: overflow encountered in power

    please check out! i am trying to debug it

  3. Hi Anonymous:
    Did you try feature scaling (mean normalization)?
    Here is some quick code for that (N.B. I'm using other data, so the axes may have to change):
    import numpy

    def normalize(X):
        # Mean normalization: center each column, then divide by its range.
        mu = numpy.mean(X, axis=0)
        Smin = numpy.amin(X, axis=0)
        Smax = numpy.amax(X, axis=0)
        x = (X - mu) / (Smax - Smin)
        return x

  4. Hi Marcel

    Great work posting this.

    I am getting similar warnings as anonymous using the above code (primarily when I expand the number of thetas to be estimated). The code works just fine for about 10 parameters or so.

    The following warning also appears:
    Warning: overflow encountered in double_scalars

    More concerning I suppose, the value returned maximum likelihood from fmin_bfgs (fmin_l_bfgs_b in my case) is nan and the following error occurs ABNORMAL_TERMINATION_IN_LNSRCH.

    Also, my features have already all be scaled as well.

    Any thoughts on what could possibly be occurring?

  5. fantastic presentation of Logistic Regression..

  6. Can you stop linking to this image (the bandwidth on my server is not for free)


  8. This is a great tutorial, but I am confused with the first example. Why is the theta vector of length 3? Shouldn't it be of length 2? The theta vector you are trying to optimize is the slope and y-intercept, correct?

    Thanks for the help.

  9. How would I use fmin bfgs if I'm training for a Neural Network? The cost function over there has more than 1 theta. How would I provide a list of thetas to "decorated cost" function. I tried doing it but, I get errors in scipy optimize (the thetas don't change; program crashes after a couple of iterations.

  10. Hi. Nice post. I am wondering if it is possible to tweak a little bit of LogisticRegression in scikit-learn to get a "Regressor" rather that a "Classifier" like LogisticRegression? I went through all the codes. It seems that one of the main base class BaseLibLinear can only train different set of coefficients for different y. I really appreciate if you happy to get an answer. thanks.

  12. Like several of the commenters, I get:

    Warning: divide by zero encountered in log

    because the elements of theta very quickly get big enough that sigmoid returns 1.0.

    Has anyone gotten the basic logistic regression code to actually work (without the regularization)?

  13. I figured it out. Line 26 of compute_cost():

    return - 1 * J.sum()

    This negates the entire cost function, which makes it difficult for LBFGS to minimize it. (:

    This explains why the thetas go through the roof.

  14. I seem to be having an issue with the code. Downloaded from GitHub and run it. I would assume that in that the output from decorated_cost() function would be the theta values defining our boundary. In fact, the code hard codes those theta values rather than using the model output. If you use what is returned by decorated_cost(), it is not accurate. How did you generate the hard coded values? Am I missing something?

  19. Hi all I solved the issue related to logistic regression, for a simple misunderstood I replaced the cost_function with wrong J , since the f_min receives only a single value and also the negative value which was wrong from the problem (minimization).

  20. Hello Marcel, I can not make either one work.
    the shows the "RuntimeWarning: overflow encountered in exp"
    for the, I changed the maxfun to maxiter
    but it still shows thetaR = theta[1:, 0]
    IndexError: too many indices"

    anybody got it work? Can I have the code please?

  22. Found your article and is very intersting after some effort to understand logistic regression. I notice that if the h[it] in predict function is changed from 0.5 to 0.2 or 0.3, the test accuracy result is sky rocketing to 0.92! Can you explain why ? How can we understand if that is a correct result or not ?
    Thanks for any feedback.

  23. In first example, what did you use to draw decision boundary?

  24. In theory example 1 should yield better accuracy if we added more features the same way it's done in example 2. After adding additional features for some reason minimizing function doesn't want to converge and stays at 60% any ideas why?

  29. Hi, While running
    scipy.optimize.fmin_bfgs(costFunction(theta, X, y), initial_theta,maxiter = 10),
    it throws error - 'tuple' object is not callable.

    My cost function is:

    def costFunction(theta, X, y):
        J = 0
        grad = zeros((size(theta)))

        z = sigmoid(dot(X, theta))
        cost = -(y*log(z)) - (1-y)*log(1 - z)

        J = sum(cost)/m

        grad = dot(X.T, (z - y))/m

        return grad,J

    Can anyone help me with this?

  30. hi,
    I am trying to do event recommendation by tags of events and i want to use logistic regresion as an algorith of the system. But logistic regression using vectors (x,y), but i could not transform tags to vectors. Does anyone can help me ?

