wyu-du/DataScience-learning

My coursework for the Applied Data Science with Python specialization on Coursera

data-science python3

DataScience-learning

Course 1: Introduction to Data Science in Python

Python

basic functions, types, strings and dates
read and write csv files
map, lambda, list
numpy

Pandas

data structure: series and dataframe
dataframe manipulation: load, index, query and deal with missing values
merge, group by and scale the data
pivot tables and time series
assignment reflections:
- sort a dictionary by value: get the (key, value) pairs with .items(), then pass them to sorted() with a key that selects the value, e.g. sorted(d.items(), key=lambda kv: kv[1])
- different ways of indexing: df['column_name'] gets a column, df.loc['index_name'] gets a row by label, df.iloc[a, b] gets the value of a cell by integer position
- execute a function by row: df.apply(function, axis=1)
- put the data on a categorical scale: pd.cut() to bin numeric values, pd.Categorical() to define (optionally ordered) categories
- convert a float to a string with a thousands separator: format(number, ',')
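
The reflections above can be sketched roughly as follows. This is a minimal illustration, not the assignment code: the names scores, df, sizes and all the values are made up for the example.

```python
import pandas as pd

# Hypothetical data, for illustration only.
scores = {"alice": 82.5, "bob": 91.0, "carol": 67.0}

# Sort a dictionary by value: items() yields (key, value) pairs,
# and the key function picks the value out of each pair.
by_value = sorted(scores.items(), key=lambda kv: kv[1])

df = pd.DataFrame(
    {"score": [82.5, 91.0, 67.0], "bonus": [1.0, 2.0, 3.0]},
    index=["alice", "bob", "carol"],
)

column = df["score"]   # a column, returned as a Series
row = df.loc["bob"]    # a row, selected by index label
cell = df.iloc[0, 1]   # a single cell, selected by integer position

# Execute a function once per row (axis=1).
df["total"] = df.apply(lambda r: r["score"] + r["bonus"], axis=1)

# Bin numeric values into labelled, ordered categories.
df["grade"] = pd.cut(
    df["score"], bins=[0, 70, 85, 100], labels=["low", "mid", "high"]
)

# Define an ordered categorical scale explicitly.
sizes = pd.Categorical(
    ["small", "large", "medium"],
    categories=["small", "medium", "large"],
    ordered=True,
)

# Format a float with a thousands separator.
formatted = format(1234567.5, ",")
```

Note that pd.cut() uses right-closed bins by default, so a score of exactly 70 would fall in the "low" bin here.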

Statistical Analysis

distributions
hypothesis testing
assignment reflections:
- change the name of a column: use .T to transpose the dataframe, then use .rename() to rename the row; df.rename(columns={'old': 'new'}) does the same without transposing
- change a dataframe to a list of tuples: [tuple(x) for x in df.values]
- get the rows of a dataframe whose index is not in tuple_list: df.loc[~df.index.isin(tuple_list)].dropna() (the trailing .dropna() also drops rows with missing values)
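
A small sketch of these three reflections, under the assumption that the dataframe is indexed by a two-level index so that tuple_list holds index tuples. The column names, values and the exclude list are hypothetical.

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame(
    {
        "state": ["MI", "TX", "CO"],
        "town": ["Ann Arbor", "Austin", "Boulder"],
        "price": [210.0, None, 350.0],
    }
)

# Rename a column directly...
renamed = df.rename(columns={"price": "median_price"})
# ...or via the transpose route described above (same resulting labels).
renamed_via_t = df.T.rename(index={"price": "median_price"}).T

# Turn each row of the selected columns into a tuple.
rows = [tuple(x) for x in df[["state", "town"]].values]

# Keep only rows whose (state, town) index is NOT in the list,
# then drop rows that still contain missing values.
indexed = df.set_index(["state", "town"])
exclude = [("MI", "Ann Arbor")]
remaining = indexed.loc[~indexed.index.isin(exclude)].dropna()
```

Here the Austin row passes the isin() filter but is removed by dropna() because its price is missing, so only the Boulder row remains.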

Course reflections

pandas is a powerful tool that can often handle complex problems in one line of code
data pre-processing is extremely important: if it is not handled properly, the rest of the work will make no sense
checking the work multiple times is necessary in a data analysis project, because small mistakes may totally change the outcome

Languages

Python 100.0%

Created September 29, 2017

Updated March 5, 2022