## Thursday, October 27, 2011

Hi all,

I decided to start a new series of posts focusing on general machine learning, with several snippets anyone can use on real problems and real datasets. I am studying machine learning again through a great online course offered this semester by Stanford University, and one of the best ways to review what I have learned is to write some notes about it. The best part is that the notes will include examples in Python, NumPy and SciPy. I hope you enjoy these posts!

Linear Regression

In this post I will implement linear regression and see it work on data. Linear regression is one of the oldest and most widely used predictive models in machine learning. Its goal is to fit a straight line to a set of data points by minimizing the sum of the squared errors. (You can find further information at Wikipedia.)

The linear regression model fits a linear function to a set of data points. The form of the function is:

Y = β0 + β1*X1 + β2*X2 + … + βn*Xn

where Y is the target variable, X1, X2, …, Xn are the predictor variables, β1, β2, …, βn are the coefficients that multiply the predictor variables, and β0 is a constant (the intercept).

For example, suppose you are the CEO of a large shoe-store franchise and are considering different cities for opening a new store. The chain already has stores in various cities, and you have data on profits and populations for those cities. You would like to use this data to help you choose which city to expand to next. You could use linear regression to estimate the parameters of a function that predicts the profit of a new store from the population of its city.

The final function would be:

Y =   -3.63029144  + 1.16636235 * X1

There are two main cases of linear regression: with one variable and with multiple variables. Let's look at both!

Linear regression with one variable

Returning to our example, we have a file that contains the dataset for our linear regression problem. The first column is the population of a city and the second column is the profit of a store in that city. A negative profit indicates a loss.

Before starting, it is useful to understand the data by visualizing it. Since this dataset has only two properties to plot (population and profit), we can use a scatter plot. Many other real-life problems are multi-dimensional and can't be plotted on a 2-D plot.

If you run the code above (you must have the Matplotlib package installed to display the plots), you will see the scatter plot of the data shown in Figure 1.
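A minimal sketch of loading and plotting the data, assuming a comma-separated text file with population in the first column and profit in the second. The file name `ex1data1.txt` and the helper names `load_data` and `plot_data` are illustrative assumptions. Matplotlib is imported inside `plot_data` only, so the loader works even when Matplotlib is not installed.

```python
import io

import numpy as np


def load_data(source):
    """Load a two-column, comma-separated file: population, profit."""
    data = np.loadtxt(source, delimiter=",")
    x = data[:, 0]  # first column: population of the city
    y = data[:, 1]  # second column: profit (negative values are losses)
    return x, y


def plot_data(x, y):
    """Scatter plot of profit against population (requires Matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily: optional dependency

    plt.scatter(x, y, marker="x", c="red")
    plt.xlabel("Population of city")
    plt.ylabel("Profit")
    plt.show()


if __name__ == "__main__":
    # Made-up rows for illustration; with the real file use load_data("ex1data1.txt").
    x, y = load_data(io.StringIO("1.0,2.0\n2.0,4.1\n3.0,-0.5\n"))
    print(x.shape, y.shape)
```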

Now you must fit the linear regression parameters to our dataset using gradient descent. The objective of linear regression is to minimize the cost function:

$$ J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 $$

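where m is the number of training examples, and the hypothesis hθ(x) is given by the linear model:

$$ h_\theta(x) = \theta^T x = \theta_0 + \theta_1 x_1 $$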

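The parameters of your model are the θ values. These are the values you will adjust to minimize the cost J(θ). One way to do this is to use the batch gradient descent algorithm. In batch gradient descent, each iteration performs the update

$$ \theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} $$

simultaneously for all j, where α is the learning rate.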

With each step of gradient descent, your parameters θ come closer to the optimal values that achieve the lowest cost J(θ).

To start, we initialize the fitting parameters θ, add another dimension to our data (a column of ones) to accommodate the θ0 intercept term, and set the learning rate α to 0.01.
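The setup and update rule above can be sketched as follows. This is an illustrative version, not the post's original code. The zero initialization of θ and the default of 1,500 iterations are assumptions. The inner loop mirrors the per-parameter update formula: every θj is computed from the same `errors` vector, so the update is simultaneous.

```python
import numpy as np


def add_intercept(x):
    """Prepend a column of ones so that theta[0] acts as the intercept theta_0."""
    return np.column_stack([np.ones(len(x)), x])


def gradient_descent(X, y, theta, alpha=0.01, num_iters=1500):
    """Batch gradient descent: every iteration uses all m training examples."""
    m = len(y)
    theta = np.array(theta, dtype=float)  # float copy; the caller's array is untouched
    for _ in range(num_iters):
        errors = X @ theta - y  # h_theta(x^(i)) - y^(i) for every example
        # All theta_j use the same `errors`, so this is a simultaneous update.
        for j in range(len(theta)):
            theta[j] -= alpha * (errors * X[:, j]).sum() / m
    return theta
```

Typical use with the helper from the text's setup: `theta = gradient_descent(add_intercept(x), y, np.zeros(2), alpha=0.01)`.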

As you perform gradient descent to minimize the cost function J(θ), it is helpful to monitor convergence by computing the cost. The cost function is shown below:
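Here is a minimal NumPy sketch of that cost computation. It assumes `X` already includes the column of ones, so `X @ theta` evaluates the hypothesis for every example at once. The function name `compute_cost` is illustrative.

```python
import numpy as np


def compute_cost(X, y, theta):
    """J(theta) = 1/(2m) * sum_i (h_theta(x^(i)) - y^(i))^2."""
    m = len(y)
    errors = X @ theta - y  # prediction minus target for each example
    return (errors ** 2).sum() / (2 * m)
```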

A good way to verify that gradient descent is working correctly is to look at the value of J(θ) and check that it decreases with each step. It should converge to a steady value by the end of the algorithm.

Your final values of θ will be used to predict profits in areas of 35,000 and 70,000 people. For that we will use some matrix algebra functions from SciPy and NumPy, powerful Python packages for scientific computing.
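A sketch of those two predictions using the parameters reported in this post. The post does not state the units of the data. This example assumes that both population and profit are stored in units of 10,000, so a city of 35,000 people is entered as 3.5. If your data uses raw counts, set `UNIT = 1`.

```python
import numpy as np

# Parameters reported in this post: Y = -3.63029144 + 1.16636235 * X1
THETA = np.array([-3.63029144, 1.16636235])

# Assumption: population and profit are both measured in units of 10,000.
UNIT = 10_000


def predict_profit(population, theta=THETA, unit=UNIT):
    """Predicted profit for a city, converted back from the assumed units."""
    x = np.array([1.0, population / unit])  # the leading 1 multiplies theta_0
    return float(x @ theta) * unit


for people in (35_000, 70_000):
    print(f"Population {people:,}: predicted profit {predict_profit(people):,.2f}")
```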

Our final values are shown below:

Y =   -3.63029144  + 1.16636235 * X1

Now you can use this function to predict your profits! If we plot this function over our data, we get the following plot:

Another interesting plot is the contour plot, which shows how J(θ) varies with changes in θ0 and θ1. The cost function J(θ) is bowl-shaped and has a global minimum, as you can see in the figure below.

This minimum is the optimal point for θ0 and θ1, and each step of gradient descent moves closer to this point.
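A contour plot like this one can be produced by evaluating J(θ) on a grid of (θ0, θ1) values. The sketch below is illustrative. The function names, the grid, and the logarithmically spaced contour levels are assumptions. `plt.contour(x, y, Z)` expects `Z` indexed as `[y, x]`, which is why the grid is transposed before plotting.

```python
import numpy as np


def cost_surface(X, y, theta0_vals, theta1_vals):
    """Evaluate J(theta) on a grid: J[i, j] uses theta0_vals[i] and theta1_vals[j]."""
    m = len(y)
    J = np.empty((len(theta0_vals), len(theta1_vals)))
    for i, t0 in enumerate(theta0_vals):
        for j, t1 in enumerate(theta1_vals):
            r = X @ np.array([t0, t1]) - y
            J[i, j] = (r @ r) / (2 * m)
    return J


def plot_cost_contours(theta0_vals, theta1_vals, J):
    """Contour plot of J (requires Matplotlib, imported lazily)."""
    import matplotlib.pyplot as plt

    # contour(x, y, Z) wants Z[y_index, x_index], hence the transpose.
    plt.contour(theta0_vals, theta1_vals, J.T, levels=np.logspace(-2, 3, 20))
    plt.xlabel("theta_0")
    plt.ylabel("theta_1")
    plt.show()
```

The grid point with the smallest J is the discrete approximation of the minimum that gradient descent moves toward.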

All the code is shown here.

Linear regression with multiple variables

OK, but what if you have multiple variables? How do we handle them with linear regression? That is where linear regression with multiple variables comes in. Let's see an example:

Suppose you are selling your house and you want to know what a good market price would be. One way to do this is to first collect information on recent houses sold and make a model of housing prices.

Our training set of housing prices in Recife, Pernambuco, Brazil has three columns (three variables). The first column is the size of the house (in square feet), the second column is the number of bedrooms, and the third column is the price of the house.

But before going directly to linear regression, it is important to analyze our data. Looking at the values, note that house sizes are about 1000 times the number of bedrooms. When features differ by orders of magnitude, performing feature scaling first can make gradient descent converge much more quickly.

The basic steps are:

• Subtract the mean value of each feature from the dataset.
• After subtracting the mean, additionally scale (divide) the feature values by their respective “standard deviations.”

The standard deviation is a way of measuring how much variation there is in the range of values of a particular feature (most data points will lie within ±2 standard deviations of the mean); this is an alternative to taking the range of values (max-min).
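The two scaling steps above can be sketched in NumPy as follows. The function name `feature_normalize` is illustrative. `np.std` defaults to the population standard deviation (`ddof=0`). Passing `ddof=1` gives the sample version, and either works for scaling as long as the same values are reused later. The mean and standard deviation are returned because any new example must be scaled with them before making a prediction.

```python
import numpy as np


def feature_normalize(X):
    """Scale each column of X to zero mean and unit standard deviation."""
    mu = X.mean(axis=0)     # step 1: per-feature mean
    sigma = X.std(axis=0)   # step 2: per-feature standard deviation (ddof=0)
    return (X - mu) / sigma, mu, sigma
```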

Now that your data is scaled, you can implement gradient descent and the cost function.

Previously, you implemented gradient descent on a univariate regression problem. The only difference now is that there is one more feature in the matrix X. The hypothesis function and the batch gradient descent update rule remain unchanged.

In the multivariate case, the cost function can also be written in the following vectorized form:

$$ J(\theta) = \frac{1}{2m} (X\theta - \vec{y})^T (X\theta - \vec{y}) $$
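As a quick sanity check, the vectorized form and the summation form give the same number. The random data below is only for illustration. The vectorized version replaces the Python loop with a single dot product of the residual vector with itself.

```python
import numpy as np


def cost_loop(X, y, theta):
    """Summation form: 1/(2m) * sum_i (x^(i) . theta - y^(i))^2."""
    m = len(y)
    return sum((X[i] @ theta - y[i]) ** 2 for i in range(m)) / (2 * m)


def cost_vectorized(X, y, theta):
    """Vectorized form: 1/(2m) * (X theta - y)^T (X theta - y)."""
    m = len(y)
    r = X @ theta - y
    return (r @ r) / (2 * m)


rng = np.random.default_rng(0)
X = np.column_stack([np.ones(5), rng.normal(size=(5, 2))])  # ones column + 2 features
y = rng.normal(size=5)
theta = rng.normal(size=3)
print(np.isclose(cost_loop(X, y, theta), cost_vectorized(X, y, theta)))  # True
```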

After running our code, we obtain the following values of θ:

θ = [215810.61679138, 61446.18781361, 20070.13313796]

Gradient descent runs until convergence to find the final values of θ. Next, we will use this value of θ to predict the price of a house with 1650 square feet and 3 bedrooms.

$$ \theta := \theta - \alpha \frac{1}{m} X^T (X\theta - \vec{y}) $$
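The vectorized update rule translates almost directly into NumPy. The sketch below is illustrative. It assumes `X` already holds the column of ones and the normalized features. It also records J(θ) after every iteration so the convergence can be plotted. The learning rate and the number of iterations are left for the caller to choose.

```python
import numpy as np


def gradient_descent_multi(X, y, theta, alpha, num_iters):
    """Repeat theta := theta - alpha * (1/m) * X^T (X theta - y)."""
    m = len(y)
    theta = np.asarray(theta, dtype=float).copy()
    J_history = np.empty(num_iters)
    for k in range(num_iters):
        residual = X @ theta - y
        theta -= alpha * (X.T @ residual) / m  # all theta_j updated at once
        r = X @ theta - y
        J_history[k] = (r @ r) / (2 * m)       # cost after this iteration
    return theta, J_history
```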

Predicted price of a 1650 sq-ft, 3 br house: 183865.197988
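One detail matters when predicting: the new house must be scaled with the mean and standard deviation of the *training* data before θ is applied. Because θ was learned on normalized features, it is not valid for raw square feet and bedroom counts. The sketch below uses hypothetical training data, not the Recife dataset, in which price is an exact linear function of size and bedrooms so the result can be checked. To keep the example short, θ is obtained with NumPy's closed-form least squares (`np.linalg.lstsq`) instead of gradient descent. This is a stand-in only; a converged gradient descent reaches the same θ.

```python
import numpy as np


def predict_price(size_sqft, bedrooms, theta, mu, sigma):
    """Normalize the new example with the training mean/std, then apply theta."""
    features = (np.array([size_sqft, bedrooms], dtype=float) - mu) / sigma
    return np.concatenate(([1.0], features)) @ theta


# Hypothetical training data: price = 50,000 + 100 * size + 10,000 * bedrooms.
sizes = np.array([850.0, 1200.0, 1600.0, 2100.0, 3000.0])
bedrooms = np.array([2.0, 3.0, 3.0, 4.0, 5.0])
prices = 50_000 + 100 * sizes + 10_000 * bedrooms

raw = np.column_stack([sizes, bedrooms])
mu, sigma = raw.mean(axis=0), raw.std(axis=0)
X = np.column_stack([np.ones(len(prices)), (raw - mu) / sigma])

# Closed-form least squares as a stand-in for the gradient descent result.
theta, *_ = np.linalg.lstsq(X, prices, rcond=None)
print(predict_price(1650, 3, theta, mu, sigma))
```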

If you plot the convergence of gradient descent, you will see that the cost J(θ) decreases as the number of iterations grows.
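Instead of always running a fixed number of iterations, gradient descent can also stop once J(θ) stops improving. The sketch below shows this variant. The tolerance `tol`, the cap `max_iters`, and the function names are illustrative choices, not values from the post. The recorded `J_history` is the series you would plot to check convergence.

```python
import numpy as np


def gradient_descent_until_converged(X, y, alpha=0.01, tol=1e-9, max_iters=100_000):
    """Batch gradient descent that stops when J(theta) improves by less than tol."""
    m = len(y)
    theta = np.zeros(X.shape[1])
    r = X @ theta - y
    J_history = [(r @ r) / (2 * m)]
    for _ in range(max_iters):
        theta -= alpha * (X.T @ r) / m
        r = X @ theta - y
        J_history.append((r @ r) / (2 * m))
        if J_history[-2] - J_history[-1] < tol:  # the cost has levelled off
            break
    return theta, J_history


def plot_convergence(J_history):
    """Cost versus iteration (requires Matplotlib, imported lazily)."""
    import matplotlib.pyplot as plt

    plt.plot(J_history)
    plt.xlabel("Number of iterations")
    plt.ylabel("Cost J(theta)")
    plt.show()
```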

The code for linear regression with multiple variables is available here.

Extra Notes

The SciPy package comes with several tools that can help with this task, including a ready-made linear regression function!

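The function is scipy.stats.linregress. It computes a least-squares regression line for two sets of measurements in closed form, without gradient descent, and also returns the correlation coefficient, the p-value and the standard error. Note that it handles only one predictor variable. Check more about it here.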
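A short comparison on synthetic one-variable data (not the post's dataset) shows that `scipy.stats.linregress` and NumPy's `np.linalg.lstsq` on a design matrix with a column of ones return the same least-squares intercept and slope. Both solve the problem directly rather than iteratively, which is why no learning rate or iteration count is involved.

```python
import numpy as np
from scipy import stats

# Synthetic data for illustration: y = -3 + 1.2 x plus Gaussian noise.
rng = np.random.default_rng(42)
x = rng.uniform(5, 25, size=100)
y = -3.0 + 1.2 * x + rng.normal(scale=2.0, size=100)

# SciPy: closed-form simple linear regression (one predictor only).
result = stats.linregress(x, y)
print(result.intercept, result.slope, result.rvalue ** 2)

# NumPy: the same fit as a least-squares problem with an intercept column.
X = np.column_stack([np.ones_like(x), x])
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta)
```

For several predictors, such as the housing example, `np.linalg.lstsq` with the full design matrix is the natural closed-form counterpart, because `linregress` accepts only one `x` variable.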

Conclusions

The goal of regression is to determine the values of the β parameters that minimize the sum of the squared residuals (the differences between predicted and observed values) for the set of observations. Since linear regression is restricted to fitting linear (straight line/plane) functions to data, it is less flexible on real-world data than more general techniques such as neural networks, which can model non-linear functions. But linear regression has some interesting advantages:

• Linear regression is one of the most widely used methods, and it is well understood.
• Training a linear regression model is usually much faster than methods such as neural networks.
• Linear regression models are simple and require little memory to implement, so they work well on embedded controllers with limited memory space.
• By examining the magnitude and sign of the regression coefficients (β) you can infer how predictor variables affect the target outcome.
• It is one of the simplest algorithms and is available in many packages, even Microsoft Excel!

I hope you enjoyed this simple post, and in the next one I will explore another field of machine learning with Python! You can download the code at this link.

Marcel Caraciolo

1. Very good, keep it up.

Cheers.

1. excellent post dude.... Keep posting.. Wishing you all the very best

2. Excellent, Marcel, good job though!

3. Congratulations on Machine Learning in Python! http://www.scn.org/~mentifex/AiMind.html achieves Machine Learning in JavaScript by asking questions for the human user to answer.

4. Enjoy reading your post. Great article, thank you very much! Really nice and impressive blog i found today... Thx for sharing this

5. This comes in handy for me! Thank you

6. thanks so much man - I am really struggling with the stanford course and have been wishing it was in python! this is awesome...

7. is there a video tuorial on this?

8. Your code is a bit confusing in that you do not use at all your compute_cost function!

The result of the cost function is stored in the variable J_history which is never used again.

Furthermore your gradient descent method never checks on a minimal error but just iterates a fixed number of times towards a minimum. This can also lead to bad results.

9. great tutorial. your code helps my comprehension of the process. do you have any examples of this executed using pandas?

10. hi,
it[:,1] = X does not seem to work ...
A new column was not added to the array.

11. Nice work. Thanks!

12. I'm pretty sure gradient descent isn't actually linear regression, its a more general solver thats actually more advanced and used with non-linear data. Linear regression will fit only the simplest models but its FAST. Gradient descent is far slower.

1. Both are different thing.Gradient descent is used for minimising error I.e. getting better theta values. It's also used in neural networks and many other learning algorithms.

15. Why would you not include the datatypes of the inputs in your comments instead of some useless phrase lilke "Comput cost for linear regression"...

16. How can I use that code in mongodb ? Sry I am quite new, I have a MongoDB Database with a collection of "documents". How can I run this code against my collection. The Objects are only filled with 2 attributes and the attributes are numeric. I want to run a linear regression over these :) Thanks

25. Could you share your ex1data1.txt data ? Thanks

1. got the file @ Coursera Data science tutorial by andrew ng ..excecise week 2 assignment

31. Thanku for sharing this excellent posts..

53. Where can I find dataset for univariate linear regression?

54. very nice article for kickstarting machine learning .thanks

74. This is very interesting but the code has syntax errors. Please fix it. Thanks a lot.
