This repo is required for Asac labs, class 2.
The Pandas library is one of the most preferred tools for data scientists to do data manipulation and analysis, next to matplotlib for data visualization and NumPy, the fundamental library for scientific computing in Python on which Pandas was built.
The fast, flexible, and expressive Pandas data structures are designed to make real-world data analysis significantly easier, but this might not be immediately the case for those who are just getting started, precisely because there is so much functionality built into the package that the options can be overwhelming.
data = pd.read_csv('my_file.csv')
data.to_csv('my_new_file.csv', index=False)
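The read and write calls above can be tried without any files on disk. The sketch below is a minimal illustration that assumes a small in-memory CSV; the `csv_text` contents and the `buffer` name are made up for the example and stand in for `my_file.csv` and `my_new_file.csv`.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for 'my_file.csv'.
csv_text = "column_1,year_born,city\nfrench,1990,Paris\nenglish,1985,London\n"

# read_csv accepts any file-like object, not only a path.
data = pd.read_csv(io.StringIO(csv_text))

# Writing with index=False keeps the row index out of the output.
buffer = io.StringIO()
data.to_csv(buffer, index=False)
round_trip = buffer.getvalue()
```

Leaving the index out on write means the file can be read back later without picking up an extra unnamed column.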
data.shape
OR data.describe()
data.head(3)
OR data.loc[8]
OR data.loc[8, 'column_1']
OR data.loc[range(4,6)]
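As a rough illustration of what each selection call above returns, the sketch below builds a hypothetical ten-row frame (the column values are invented) and applies the same calls to it. It assumes the default integer index, so the label `8` is also the ninth row.

```python
import pandas as pd

# Hypothetical frame with the default 0..9 integer index.
data = pd.DataFrame({
    "column_1": [f"value_{i}" for i in range(10)],
    "column_2": list(range(10)),
})

first_three = data.head(3)       # DataFrame with the first three rows
row = data.loc[8]                # Series holding the row labelled 8
cell = data.loc[8, "column_1"]   # single value at row 8, column 'column_1'
rows = data.loc[range(4, 6)]     # DataFrame with the rows labelled 4 and 5
```

Note that `loc` selects by label, not by position, so the results would differ on a frame whose index is not the default range.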
data[data['column_1']=='french']
data[(data['column_1']=='french') & (data['year_born']==1990)]
data[(data['column_1']=='french') & (data['year_born']==1990) & ~(data['city']=='London')]
data[data['column_1'].isin(['french', 'english'])]
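The filters above all rely on boolean masks combined with `&` (and) and `~` (not). The following sketch, using an invented four-row frame, names each mask separately so the combinations are easier to follow; the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical sample data.
data = pd.DataFrame({
    "column_1": ["french", "english", "french", "german"],
    "year_born": [1990, 1990, 1985, 1990],
    "city": ["Paris", "London", "London", "Berlin"],
})

# Each comparison yields a boolean Series aligned with the rows.
is_french = data["column_1"] == "french"
born_1990 = data["year_born"] == 1990
not_london = ~(data["city"] == "London")

french_1990 = data[is_french & born_1990]
french_not_london = data[is_french & not_london]
french_or_english = data[data["column_1"].isin(["french", "english"])]
```

The parentheses around each comparison in the one-line versions are needed because `&` binds more tightly than `==` in Python.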
Basic plotting
data['column_numerical'].plot()
data['column_numerical'].hist()
%matplotlib inline
data.loc[8, 'column_1'] = 'english'
data.loc[data['column_1']=='french', 'column_1'] = 'French'
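The two assignments above update a single cell and then every row matching a condition. A minimal sketch on an invented three-row frame (row label `1` is used here instead of `8` only because the sample is short):

```python
import pandas as pd

# Hypothetical sample data.
data = pd.DataFrame({"column_1": ["french", "english", "french"]})

# Update one cell by row label and column name.
data.loc[1, "column_1"] = "English"

# Update every row where the condition holds.
data.loc[data["column_1"] == "french", "column_1"] = "French"
```

Assigning through `loc` in a single step modifies the frame in place and avoids the chained-assignment warning that `data[mask]['column_1'] = ...` can trigger.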
data['column_1'].value_counts()
data['column_1'].map(len)
data['column_1'].map(len).map(lambda x: x/100).plot()
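To make the counting and mapping calls above concrete, here is a small sketch on invented data, with the plotting step left out so it runs without matplotlib.

```python
import pandas as pd

# Hypothetical sample data.
data = pd.DataFrame({"column_1": ["french", "english", "french"]})

# How many times each distinct value occurs.
counts = data["column_1"].value_counts()

# map applies a function to every element of the column.
lengths = data["column_1"].map(len)

# map calls can be chained; here each length is divided by 100.
scaled = lengths.map(lambda x: x / 100)
```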
data.apply(sum)
data.corr()
data.corr().applymap(lambda x: int(x*100)/100)
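The `int(x*100)/100` expression above truncates each correlation to two decimals. If rounding is acceptable, `DataFrame.round` is a possible alternative; the sketch below shows it on an invented numeric frame and is a suggestion, not part of the original cheat sheet.

```python
import pandas as pd

# Hypothetical numeric data: b rises with a, c falls with a.
data = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],
    "c": [4, 3, 2, 1],
})

# Pairwise Pearson correlation between the numeric columns.
corr = data.corr()

# Round every entry to two decimals (rounds, does not truncate).
rounded = corr.round(2)
```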
pd.plotting.scatter_matrix(data, figsize=(12,8))
data.merge(other_data, on=['column_1', 'column_2', 'column_3'])
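A minimal sketch of the merge above, using two invented frames and two key columns instead of three to keep it short. With no `how` argument, `merge` performs an inner join, so only key combinations present in both frames survive.

```python
import pandas as pd

# Hypothetical frames sharing the key columns column_1 and column_2.
data = pd.DataFrame({
    "column_1": ["a", "b", "c"],
    "column_2": [1, 2, 3],
    "left_value": [10, 20, 30],
})
other_data = pd.DataFrame({
    "column_1": ["a", "b", "d"],
    "column_2": [1, 2, 4],
    "right_value": [100, 200, 400],
})

# Inner join on both keys: rows ('c', 3) and ('d', 4) have no match.
merged = data.merge(other_data, on=["column_1", "column_2"])
```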
data.groupby('column_1')['column_2'].apply(sum).reset_index()
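The group-and-sum pattern above can be sketched on invented data as follows. This version calls the built-in `.sum()` aggregation rather than `.apply(sum)`; for numeric columns the result should be the same, and the built-in is usually faster.

```python
import pandas as pd

# Hypothetical sample data.
data = pd.DataFrame({
    "column_1": ["french", "english", "french"],
    "column_2": [1, 2, 3],
})

# Sum column_2 within each column_1 group, then turn the group
# labels back into an ordinary column with reset_index.
totals = data.groupby("column_1")["column_2"].sum().reset_index()
```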
dictionary = {}
for i,row in data.iterrows():