Predict Apparent Temperature Using the Weather in Szeged Dataset
The Weather in Szeged dataset contains hourly/daily weather summaries for Szeged, Hungary, including temperature, pressure, and other measurements. The goal is to predict the Apparent Temperature from the other input data. The dataset is available on Kaggle.
The dataset has 12 columns:
‘Formatted Date’, ‘Summary’, ‘Precip Type’, ‘Temperature (C)’, ‘Apparent Temperature (C)’, ‘Humidity’, ‘Wind Speed (km/h)’, ‘Wind Bearing (degrees)’, ‘Visibility (km)’, ‘Loud Cover’, ‘Pressure (millibars)’, ‘Daily Summary’
First, we check whether there are any null values and remove them.

Here we can see that the ‘Precip Type’ column has null values, so we remove that column.
Next, we visualize the data with box plots to analyze the values further.
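A minimal sketch of the null check and column drop, assuming pandas and a tiny made-up frame in place of the real Kaggle CSV (the values are illustrative, not real measurements):

```python
import pandas as pd

# Tiny made-up frame standing in for the Szeged data (illustrative values only).
df = pd.DataFrame({
    "Precip Type": ["rain", None, "snow", None],
    "Temperature (C)": [9.5, 9.4, -1.2, 8.3],
    "Humidity": [0.89, 0.86, 0.92, 0.83],
})

# Count missing values in each column.
null_counts = df.isnull().sum()
print(null_counts)

# Drop every column that has at least one missing value.
cols_with_nulls = null_counts[null_counts > 0].index.tolist()
df_clean = df.drop(columns=cols_with_nulls)
print(df_clean.columns.tolist())  # ['Temperature (C)', 'Humidity']
```

Dropping the column keeps every row; `df.dropna(subset=["Precip Type"])` would instead keep the column and drop only the rows with missing values.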

Here we can see that there are outliers to handle, so we use the Interquartile Range (IQR) method. We only handle outliers in the ‘Humidity’ and ‘Pressure (millibars)’ columns because the values in the other columns are normally distributed.
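The post does not say whether outliers are removed or capped, so the sketch below assumes they are removed, using the conventional fences of 1.5 * IQR below Q1 and above Q3. The helper `iqr_filter` and the sample readings are hypothetical:

```python
import pandas as pd

def iqr_filter(df: pd.DataFrame, columns: list, k: float = 1.5) -> pd.DataFrame:
    """Keep rows whose values in `columns` lie inside [Q1 - k*IQR, Q3 + k*IQR]."""
    mask = pd.Series(True, index=df.index)
    for col in columns:
        q1, q3 = df[col].quantile(0.25), df[col].quantile(0.75)
        iqr = q3 - q1
        # between() is inclusive on both ends by default
        mask &= df[col].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask]

# Made-up readings; the last row holds zeros far from the rest of the data.
df = pd.DataFrame({
    "Humidity": [0.80, 0.82, 0.85, 0.83, 0.81, 0.0],
    "Pressure (millibars)": [1015.0, 1016.0, 1014.0, 1015.5, 1016.5, 0.0],
})
filtered = iqr_filter(df, ["Humidity", "Pressure (millibars)"])
print(len(df), "->", len(filtered))  # 6 -> 5
```

If capping is preferred over removal, `df[col].clip(lower, upper)` with the same fences keeps every row and pulls extreme values back to the fence.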

Next, we take a look at the histograms of our data.

After that, we use a standard scaler to standardize the dataset so that every column has zero mean and unit variance.
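Standardization computes z = (x - mean) / std for each column. A NumPy sketch of that formula follows; the helper name `standardize` is hypothetical, and it mirrors (as an assumption, not the author's code) what scikit-learn's `StandardScaler` does, including its use of the population standard deviation and leaving zero-variance columns unscaled:

```python
import numpy as np

def standardize(X: np.ndarray):
    """Return (Z, mean, std) where Z = (X - mean) / std, column by column."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)      # population std (ddof=0), as StandardScaler uses
    std[std == 0] = 1.0      # leave constant columns centered but unscaled
    return (X - mean) / std, mean, std

X = np.array([[1.0, 1010.0, 5.0],
              [2.0, 1015.0, 5.0],
              [3.0, 1020.0, 5.0]])
Z, mean, std = standardize(X)
print(Z.mean(axis=0))  # approximately [0, 0, 0]
print(Z.std(axis=0))   # approximately [1, 1, 0]
```

Fitting `mean` and `std` on the training rows only, then reusing them on the test rows, keeps test information out of the scaling step.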

We use a correlation matrix to check the correlation between columns.
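A correlation matrix like this one can be computed with pandas' `DataFrame.corr`, which uses Pearson correlation by default. The data below is synthetic (an apparent temperature built as a noisy copy of temperature, plus an unrelated humidity column), chosen only to show what strong and weak correlations look like:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
temp = rng.normal(12, 8, size=200)
df = pd.DataFrame({
    "Temperature (C)": temp,
    # Synthetic: apparent temperature tracks temperature closely.
    "Apparent Temperature (C)": temp - 1 + rng.normal(0, 1, size=200),
    # Synthetic: humidity drawn independently of temperature.
    "Humidity": rng.uniform(0.3, 1.0, size=200),
})

corr = df.corr()  # Pearson correlation by default
print(corr.round(2))
```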

Since this is a linear model and the columns are correlated with one another (correlated inputs make a linear model's coefficients unstable), we use Principal Component Analysis (PCA) to reduce the number of columns so that the linear model trains better.

Here we can see that the 7 columns have been reduced to 3.
Now we use these 3 columns to train the linear regression model.
We use a train/test split with 20% of the data held out for testing.
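PCA can be written as a singular value decomposition (SVD) of the centered data matrix. The sketch below uses plain NumPy on synthetic data in which 7 correlated columns are driven by 3 hidden factors; that setup is an assumption chosen to mirror the 7-to-3 reduction, not the real features. The helper `pca` is hypothetical; scikit-learn's `PCA(n_components=3)` also centers the data and is SVD-based, though the signs of its components may differ:

```python
import numpy as np

def pca(X: np.ndarray, n_components: int):
    """Project centered X onto its top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    scores = Xc @ components.T
    explained_variance_ratio = (S**2 / np.sum(S**2))[:n_components]
    return scores, components, explained_variance_ratio

rng = np.random.default_rng(42)
latent = rng.normal(size=(500, 3))   # 3 hidden factors
mixing = rng.normal(size=(3, 7))     # spread across 7 correlated columns
X = latent @ mixing + 0.05 * rng.normal(size=(500, 7))

scores, components, ratio = pca(X, n_components=3)
print(scores.shape)  # (500, 3)
print(ratio.sum())   # close to 1: 3 components capture almost all variance
```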
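An 80/20 split followed by an ordinary least-squares fit can be sketched in NumPy as below. The helper names `train_test_split_idx` and `fit_linear_regression` and the synthetic data are hypothetical; scikit-learn's `train_test_split(..., test_size=0.2)` and `LinearRegression` cover the same steps:

```python
import numpy as np

def train_test_split_idx(n: int, test_size: float = 0.2, seed: int = 0):
    """Shuffle row indices and hold out `test_size` of them for testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_test = int(round(n * test_size))
    return idx[n_test:], idx[:n_test]

def fit_linear_regression(X: np.ndarray, y: np.ndarray):
    """Ordinary least squares with an intercept; returns (coef, intercept)."""
    A = np.column_stack([np.ones(len(X)), X])  # leading column of ones models the intercept
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w[1:], w[0]

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))  # stand-in for the three PCA components
y = X @ np.array([2.0, -1.0, 0.5]) + 3.0 + 0.1 * rng.normal(size=1000)

train_idx, test_idx = train_test_split_idx(len(X), test_size=0.2)
coef, intercept = fit_linear_regression(X[train_idx], y[train_idx])
print(len(train_idx), len(test_idx))  # 800 200
print(coef, intercept)
```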
After training, we get the following Mean Squared Error (MSE):

The R² score is:

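For reference, both metrics can be written out by hand. These functions are our own illustrative re-implementations of what `sklearn.metrics.mean_squared_error` and `sklearn.metrics.r2_score` compute for 1-D targets; scikit-learn's `LinearRegression.score` also reports R²:

```python
import numpy as np

def mean_squared_error(y_true, y_pred) -> float:
    """Average of the squared residuals."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))

def r2_score(y_true, y_pred) -> float:
    """1 - SS_res / SS_tot: share of the variance in y_true explained by the model."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(mean_squared_error(y_true, y_pred))  # 0.375
print(r2_score(y_true, y_pred))            # about 0.9486
```

MSE is in squared target units (here, squared degrees Celsius), while R² is unitless, with 1.0 meaning a perfect fit.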
The coefficients and intercept of our linear model:

The score of our linear model:

Model visualization (2D):

Model visualization (3D):
