*Science*


## August 02, 2018

### Data Challenge

Writing about web page https://www.kaggle.com/competitions

Today is the day to start a Data Challenge! Over five days we are going to work through an Introduction to Data Science in Python, a Regression Challenge in R and, finally, an introduction to Matlab on Friday. I hope this will be very enjoyable!

**Challenge in Python: Data cleaning**

This is presented as an introduction to Python. The first challenge is to explore the dataset. It is fairly easy, but, as you know, the easier things are the best place to start learning and understanding; after all, Leonardo da Vinci had to learn to draw before he could paint the Mona Lisa.

The dataset I have chosen is the Adverse events dataset; many others are available on Kaggle, for example here: https://www.kaggle.com/rtatman/fun-beginner-friendly-datasets/

The solution consists of loading the data and calling describe() on it.
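A minimal sketch of those two steps with pandas. The Adverse events file itself is not reproduced here, so a small hypothetical CSV stands in for it; the column names and values are invented for illustration. With the real dataset you would pass its file path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for the Adverse events CSV (made-up columns and values).
csv_text = """age,weight_kg,outcome
34,70.5,recovered
51,82.0,hospitalised
28,64.3,recovered
45,90.1,recovered
"""

# Step 1: load the data.
df = pd.read_csv(io.StringIO(csv_text))

# Step 2: summarise it. With mixed column types, describe() reports only the
# numeric columns: count, mean, std, min, quartiles and max.
summary = df.describe()
print(summary)
```

Note that the text column `outcome` does not appear in `summary`, which is exactly the limitation discussed next.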

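Note that, by default, describe() summarises only the numeric (continuous) columns of a DataFrame with mixed column types. If we are instead interested in, for example, categorical variables, we can use count() to count the non-missing entries in each column, or value_counts() to tally how often each category occurs.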
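A small sketch of the categorical side, again on a hypothetical toy table (column names and values are invented). Besides count() and value_counts(), describe() itself can summarise non-numeric columns when passed `include="all"`, in which case it reports count, unique, top (the most frequent value) and freq for them.

```python
import io

import pandas as pd

# Hypothetical toy data; the last row has a missing value in "serious".
csv_text = """age,outcome,serious
34,recovered,no
51,hospitalised,yes
28,recovered,no
45,recovered,
"""
df = pd.read_csv(io.StringIO(csv_text))

# count(): number of non-missing entries per column.
non_missing = df.count()

# value_counts(): frequency of each category in a single column.
outcome_freq = df["outcome"].value_counts()

# describe(include="all"): numeric stats for "age" plus
# count/unique/top/freq for the text columns.
summary_all = df.describe(include="all")

print(non_missing)
print(outcome_freq)
print(summary_all)
```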

Here you can find my solution!

https://github.com/csetraynor/PythonChallenge

**Challenge in R: Regression modelling**

Regression models an output variable (y) as a function of input variables (x). There are many different ways to model regression, and an important family of models is the so-called "generalised linear models" (GLMs).
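For reference, a generalised linear model is usually written as follows (the symbols μ, g and β are standard textbook notation added here, not taken from the original challenge): a link function g connects the expected output to a linear combination of the inputs.

```latex
g(\mu) = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p,
\qquad \mu = \mathbb{E}[\, y \mid x \,]
```

Choosing the link g and the distribution of y gives the different kinds of regression: the identity link with normally distributed y gives linear regression, the logit link g(μ) = log(μ/(1 − μ)) with a binary y gives logistic regression, and the log link g(μ) = log μ with count data gives Poisson regression.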

Three kinds of regression are:

- Linear: prediction of a continuous variable.

- Logistic: prediction of a categorical variable, for example a binary outcome (0 or 1).

- Poisson: prediction of a count variable.
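All three kinds share the same fitting machinery: R's `glm()` fits them by iteratively reweighted least squares. To keep one language across these examples, here is a minimal Python/NumPy sketch (not the author's R solution) that fits all three with one Newton loop, which coincides with iteratively reweighted least squares because each family uses its canonical link. The function `fit_glm`, the `FAMILIES` table and the toy data are hypothetical illustrations.

```python
import numpy as np

# Per family: (link g, inverse link g^-1, d(mu)/d(eta) written as a function of mu).
FAMILIES = {
    "linear": (lambda mu: mu, lambda eta: eta, lambda mu: np.ones_like(mu)),
    "logistic": (
        lambda mu: np.log(mu / (1.0 - mu)),
        lambda eta: 1.0 / (1.0 + np.exp(-eta)),
        lambda mu: mu * (1.0 - mu),
    ),
    "poisson": (np.log, np.exp, lambda mu: mu),
}


def fit_glm(x, y, family, n_iter=100, tol=1e-10):
    """Fit intercept + slope(s) of a canonical-link GLM by Newton's method."""
    link, inv_link, dmu_deta = FAMILIES[family]
    X = np.column_stack([np.ones(len(y)), x])  # prepend an intercept column
    beta = np.zeros(X.shape[1])
    beta[0] = link(np.mean(y))                 # start from the overall mean
    for _ in range(n_iter):
        mu = inv_link(X @ beta)                # fitted means
        w = dmu_deta(mu)                       # working weights
        grad = X.T @ (y - mu)                  # score vector
        hess = X.T @ (X * w[:, None])          # Fisher information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


x = np.array([0, 1, 2, 3, 4, 5], dtype=float)
y_cont = 1.0 + 2.0 * x                               # continuous outcome
y_bin = np.array([0, 0, 1, 0, 1, 1], dtype=float)    # binary outcome
y_count = np.array([1, 2, 2, 4, 5, 8], dtype=float)  # count outcome

beta_lin = fit_glm(x, y_cont, "linear")
beta_logit = fit_glm(x, y_bin, "logistic")
beta_pois = fit_glm(x, y_count, "poisson")
print(beta_lin, beta_logit, beta_pois)
```

For the linear case the loop reduces to ordinary least squares and recovers the exact line (intercept 1, slope 2); for the other two, the coefficients are on the logit and log scale respectively. In practice you would use a tested implementation such as R's `glm()` rather than this sketch.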

**GitHub repo**

This links to the repository tracking my progress on this challenge, where I will upload all of its problems.

https://github.com/csetraynor/DataChallenge