
5 Techniques for Using the Python pandas Library

Updated: 13 November 2017, 17:30. Source: 乐鱼播客

Python is fast becoming the preferred programming language of data scientists, and for good reason: it offers a much broader programming-language ecosystem, along with scientific computing libraries that have real computational depth and good performance.


Among the scientific computing libraries available for Python, the pandas module is the tool best suited to data science work. This article focuses on five data-processing techniques in pandas.


First, import the relevant modules and load the dataset into the Python environment:


import pandas as pd
import numpy as np

# Use the Loan_ID column as the row index
data = pd.read_csv("***.csv", index_col="Loan_ID")
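Because the real file name is elided above, that line cannot be run as is. As a minimal sketch, assuming a tiny hypothetical CSV held in memory (the rows are invented, and only some of the columns used later are included), the following shows what `index_col="Loan_ID"` does:

```python
import io

import pandas as pd

# Hypothetical stand-in for the article's CSV file; these rows are invented.
csv_text = """Loan_ID,Gender,Married,Self_Employed,LoanAmount,Credit_History,Loan_Status
LP001,Male,No,No,,1.0,Y
LP002,Male,Yes,No,128.0,1.0,N
LP003,Female,No,Yes,66.0,0.0,N
"""

# index_col="Loan_ID" turns the Loan_ID column into the row index
# instead of keeping it as an ordinary data column.
data = pd.read_csv(io.StringIO(csv_text), index_col="Loan_ID")
print(data.index.tolist())
print(data.columns.tolist())
```

The empty `LoanAmount` field in the first row is read as `NaN`, which is the kind of gap the later sections deal with.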


1. The apply function


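apply is one of the most commonly used functions for manipulating data and creating new variables. It passes each row or each column of a DataFrame to a given function and returns the corresponding results. The function passed to apply can be a built-in one or a user-defined one. For example, the function below counts the missing values in each column: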


def num_missing(x):
    # Count the missing (NaN/None) entries in one column or row
    return sum(x.isnull())

# Applying per column:
print("Missing values per column:")
print(data.apply(num_missing, axis=0))
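The same function can also be applied row by row with `axis=1`. A minimal sketch on a small hypothetical DataFrame (all values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; None and np.nan both count as missing.
data = pd.DataFrame(
    {
        "Gender": ["Male", None, "Female", "Male"],
        "Married": ["No", "Yes", None, None],
        "LoanAmount": [np.nan, 128.0, 66.0, np.nan],
    },
    index=pd.Index(["LP001", "LP002", "LP003", "LP004"], name="Loan_ID"),
)


def num_missing(x):
    # x is a whole column (axis=0) or a whole row (axis=1), passed in as a Series
    return x.isnull().sum()


per_column = data.apply(num_missing, axis=0)  # one count per column
per_row = data.apply(num_missing, axis=1)  # one count per Loan_ID
print(per_column)
print(per_row)
```

With `axis=0` the result is indexed by column name; with `axis=1` it is indexed by `Loan_ID`, which makes it easy to spot rows with several gaps.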


2¡¢Ìȱʧֵ


The fillna() function fills in missing values in a single step. It can replace a column's missing data with that column's mean, mode, or median. Below, the missing data in the “Gender”, “Married”, and “Self_Employed” columns is filled with each column's mode.


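# Fill each column's missing values with that column's mode (most frequent value).
# pandas' Series.mode() is used instead of scipy.stats.mode, which may reject string columns in recent SciPy releases.
for col in ["Gender", "Married", "Self_Employed"]:
    data[col] = data[col].fillna(data[col].mode()[0])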
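For numeric columns, the mean or median is usually a more natural fill value than the mode. A sketch on a hypothetical `LoanAmount` series (invented values) showing that the two choices can give different results:

```python
import numpy as np
import pandas as pd

# Hypothetical LoanAmount values with two gaps.
loan_amount = pd.Series([90.0, np.nan, 120.0, 300.0, np.nan], name="LoanAmount")

# mean() and median() skip NaN by default, so both use only 90, 120 and 300.
filled_mean = loan_amount.fillna(loan_amount.mean())  # gaps become 170.0
filled_median = loan_amount.fillna(loan_amount.median())  # gaps become 120.0
print(filled_mean.tolist())
print(filled_median.tolist())
```

The single large value (300) pulls the mean up, while the median stays at a typical value; which one suits a given column is a judgment call.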


3. Pivot tables


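pandas can build MS Excel-style pivot tables. In this dataset, for example, the key column “LoanAmount” has missing values. We can replace them with the mean amount of each group formed by “Gender”, “Married”, and “Self_Employed”. The mean “LoanAmount” of each group can be computed with a pivot table over these three columns.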
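The code for this step is missing from the page. A minimal sketch, assuming a small hypothetical DataFrame in place of the real dataset, builds the group means with `pivot_table`; the name `impute_grps` matches the variable used in section 4:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the article's loan dataset.
data = pd.DataFrame(
    {
        "Gender": ["Male", "Male", "Female", "Male", "Female"],
        "Married": ["Yes", "Yes", "No", "Yes", "No"],
        "Self_Employed": ["No", "No", "No", "No", "No"],
        "LoanAmount": [100.0, 140.0, 60.0, np.nan, np.nan],
    }
)

# Mean LoanAmount for every (Gender, Married, Self_Employed) combination;
# NaN entries are skipped when each group's mean is computed.
impute_grps = data.pivot_table(
    values=["LoanAmount"],
    index=["Gender", "Married", "Self_Employed"],
    aggfunc="mean",
)
print(impute_grps)
```

Passing `aggfunc="mean"` as a string (rather than `np.mean`) avoids a deprecation warning in recent pandas versions.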


4. Multi-indexing


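If you look closely at the output computed in #3, you will notice something unusual: each index entry is a combination of three values. This is called a multi-index (a MultiIndex in pandas), and it helps operations such as group lookups run quickly.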
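To illustrate how such an index is queried, here is a sketch that builds a hypothetical three-level index by hand with `pd.MultiIndex.from_tuples` (the group means are invented):

```python
import pandas as pd

# Hypothetical group means with a three-level index, the same shape of
# index that pivot_table produces when given three index columns.
index = pd.MultiIndex.from_tuples(
    [("Female", "No", "No"), ("Male", "No", "Yes"), ("Male", "Yes", "No")],
    names=["Gender", "Married", "Self_Employed"],
)
impute_grps = pd.DataFrame({"LoanAmount": [60.0, 150.0, 120.0]}, index=index)

# A full tuple selects exactly one row.
full_key_value = impute_grps.loc[("Male", "Yes", "No"), "LoanAmount"]

# A partial key selects every row under that first-level label.
male_rows = impute_grps.loc["Male"]
print(full_key_value)
print(male_rows)
```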


Continuing from the example in #3, we now know the mean for each group but have not yet filled in the missing data. The filling can be done by combining several of the techniques covered so far:


# Fill each missing LoanAmount with the mean of its (Gender, Married, Self_Employed) group
for i, row in data.loc[data['LoanAmount'].isnull(), :].iterrows():
    ind = tuple([row['Gender'], row['Married'], row['Self_Employed']])
    data.loc[i, 'LoanAmount'] = impute_grps.loc[ind].values[0]

# Now check the number of missing values again to confirm:
print(data.apply(num_missing, axis=0))
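The loop above fills one row at a time. A vectorized alternative (a different technique from the one used here, shown only as a sketch on hypothetical data) uses `groupby(...).transform("mean")` to compute every row's group mean in one pass:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the article's loan dataset.
data = pd.DataFrame(
    {
        "Gender": ["Male", "Male", "Female", "Male", "Female"],
        "Married": ["Yes", "Yes", "No", "Yes", "No"],
        "Self_Employed": ["No", "No", "No", "No", "No"],
        "LoanAmount": [100.0, 140.0, 60.0, np.nan, np.nan],
    }
)

group_cols = ["Gender", "Married", "Self_Employed"]
# transform("mean") returns, for every row, the mean LoanAmount of that row's
# group, aligned to the original index, so fillna can use it directly.
group_means = data.groupby(group_cols)["LoanAmount"].transform("mean")
data["LoanAmount"] = data["LoanAmount"].fillna(group_means)
print(data["LoanAmount"].tolist())
```

Because the group means are aligned row by row, no explicit tuple lookup into a pivot table is needed.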


5. The crosstab function


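This function gives an initial impression (an intuitive view) of the data, which can be used to check some basic hypotheses. In this example, “Credit_History” is expected to have a significant effect on the loan status. This hypothesis can be checked with the cross-tabulation produced by the following code: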


pd.crosstab(data["Credit_History"],data["Loan_Status"],margins=True)
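Raw counts can be hard to compare when the groups differ in size. A sketch on invented data, using crosstab's `normalize="index"` option to turn each row into proportions:

```python
import pandas as pd

# Hypothetical toy data; the real values come from the article's dataset.
data = pd.DataFrame(
    {
        "Credit_History": [1.0, 1.0, 1.0, 1.0, 0.0, 0.0],
        "Loan_Status": ["Y", "Y", "Y", "N", "N", "N"],
    }
)

# Counts per (Credit_History, Loan_Status) pair, plus row/column totals ("All").
counts = pd.crosstab(data["Credit_History"], data["Loan_Status"], margins=True)

# normalize="index" divides each row by its total, giving the share of N/Y
# within each Credit_History value.
shares = pd.crosstab(data["Credit_History"], data["Loan_Status"], normalize="index")
print(counts)
print(shares)
```

Reading the proportions row by row makes it easier to see whether the approval rate really differs between credit-history groups.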



Copyright of this article belongs to 乐鱼播客 AI + Python Academy. Reposting is welcome; please credit the author and source. Thank you!
Author: 乐鱼播客 AI + Python Academy
First published at: http://python.itcast.cn/