    5 Tips for Using the Python Library pandas

    Updated: 2017-12-26 15:20 Source: 乐鱼播客

    Python is rapidly becoming the programming language of choice for data scientists, and for good reason: it offers a broad programming ecosystem together with scientific computing libraries that are both capable and performant.

    Among Python's scientific computing libraries, pandas is the one best suited to data science work. This article introduces five techniques for processing data with pandas in Python.

    First, import the relevant modules and load the dataset into the Python environment:

    import pandas as pd

    import numpy as np

    data = pd.read_csv("***.csv", index_col="Loan_ID")

    1. The apply function

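    apply is one of the most commonly used functions for manipulating data and creating new variables. It passes each row or each column of a DataFrame to a given function and returns the corresponding results. The function passed to apply can be either built-in or user-defined.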

    def num_missing(x):
        # Count the missing values in a Series (one column or one row)
        return sum(x.isnull())

    # Apply per column:
    print("Missing values per column:")
    print(data.apply(num_missing, axis=0))
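    The same helper can also be applied per row by passing axis=1. The sketch below uses a small made-up DataFrame (the toy values and Loan_ID labels are assumptions for illustration, not the article's dataset) so that it runs without the CSV file:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the loan dataset (not from the article).
toy = pd.DataFrame(
    {
        "Gender": ["Male", None, "Female"],
        "LoanAmount": [120.0, np.nan, np.nan],
    },
    index=pd.Index(["LP001", "LP002", "LP003"], name="Loan_ID"),
)


def num_missing(x):
    """Count the missing entries in a Series (one column or one row)."""
    return x.isnull().sum()


# axis=0 hands each column to the function; axis=1 hands each row.
missing_per_column = toy.apply(num_missing, axis=0)
missing_per_row = toy.apply(num_missing, axis=1)

print(missing_per_column)
print(missing_per_row)
```

    With axis=0 the result is indexed by column name, and with axis=1 it is indexed by the DataFrame's row labels.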

    2. Filling missing values

    fillna() fills missing values in a single step. It can replace the missing data in a column with that column's mean, mode, or median. Below, the missing values in the "Gender", "Married", and "Self_Employed" columns are filled with each column's own mode.

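    from scipy.stats import mode

    # Find the most frequent value (the mode) of the "Gender" column.
    # Newer SciPy releases may reject non-numeric data here; data['Gender'].mode() is the pandas equivalent.
    mode(data['Gender'])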
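    The fill step itself might look like the sketch below. It swaps in pandas' own Series.mode() for scipy.stats.mode, since Series.mode() handles string columns directly. The toy DataFrame is an assumption for illustration and only the column names come from the article:

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the article.
toy = pd.DataFrame(
    {
        "Gender": ["Male", "Male", None, "Female"],
        "Married": ["Yes", None, "Yes", "No"],
        "Self_Employed": ["No", "No", "No", None],
    }
)

for col in ["Gender", "Married", "Self_Employed"]:
    # Series.mode() ignores missing values and returns a Series,
    # so take the first entry as the fill value.
    most_common = toy[col].mode()[0]
    toy[col] = toy[col].fillna(most_common)

print(toy)
```

    Assigning the result back to the column avoids relying on inplace=True, whose behavior on a selected column depends on the pandas version.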

    3. Pivot tables

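    pandas can build pivot tables in the style of MS Excel. In this example, the key column "LoanAmount" contains missing values. We can replace them with the mean amount of each group formed by "Gender", "Married", and "Self_Employed". The per-group means of "LoanAmount" can be computed with a pivot table; the code in step 4 refers to that result as impute_grps.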
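    A minimal sketch of that computation, assuming DataFrame.pivot_table with a mean aggregation is what produces impute_grps (the name appears in the step 4 code; the toy data here is made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column names come from the article.
toy = pd.DataFrame(
    {
        "Gender": ["Male", "Male", "Female", "Female", "Male"],
        "Married": ["Yes", "Yes", "No", "No", "Yes"],
        "Self_Employed": ["No", "No", "No", "No", "No"],
        "LoanAmount": [100.0, 140.0, 80.0, np.nan, np.nan],
    }
)

# Mean LoanAmount per (Gender, Married, Self_Employed) group.
# Missing amounts are skipped when the mean is computed.
impute_grps = toy.pivot_table(
    values=["LoanAmount"],
    index=["Gender", "Married", "Self_Employed"],
    aggfunc="mean",
)

print(impute_grps)
```

    The three grouping columns become the row labels of the result, which is what step 4 builds on.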

    4. Compound index (MultiIndex)

    If you look closely at the output of step 3, you will notice an unusual property: each index entry is a combination of three values. This is called a compound index (a MultiIndex in pandas), and it helps operations run quickly.

    Continuing the example from step 3, we know the value for each group but have not yet filled in the missing data. The imputation can be done by combining several of the techniques covered so far.

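    # Iterate only over the rows whose LoanAmount is missing
    for i, row in data.loc[data['LoanAmount'].isnull(), :].iterrows():
        # Build the (Gender, Married, Self_Employed) key into the compound index
        ind = tuple([row['Gender'], row['Married'], row['Self_Employed']])
        data.loc[i, 'LoanAmount'] = impute_grps.loc[ind].values[0]

    # Now check the number of missing values again to confirm:
    print(data.apply(num_missing, axis=0))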
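    For readers without the dataset, the loop can be tried end to end on made-up data. This is only a sketch: the toy values are assumptions, and impute_grps is rebuilt here with pivot_table so that the block is self-contained:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column names come from the article.
toy = pd.DataFrame(
    {
        "Gender": ["Male", "Male", "Female", "Female", "Male"],
        "Married": ["Yes", "Yes", "No", "No", "Yes"],
        "Self_Employed": ["No", "No", "No", "No", "No"],
        "LoanAmount": [100.0, 140.0, 80.0, np.nan, np.nan],
    }
)
keys = ["Gender", "Married", "Self_Employed"]
impute_grps = toy.pivot_table(values=["LoanAmount"], index=keys, aggfunc="mean")

# Each row label of impute_grps is a 3-tuple, one value per grouping column.
print(impute_grps.index.tolist())

for i, row in toy.loc[toy["LoanAmount"].isnull(), :].iterrows():
    ind = tuple(row[k] for k in keys)  # key into the compound index
    toy.loc[i, "LoanAmount"] = impute_grps.loc[ind, "LoanAmount"]

print(toy)
```

    Looking a value up with a full tuple of labels returns exactly one group mean, which is why the key has to be built in the same column order as the index.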

    5. The crosstab function

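    This function gives a first impression (an at-a-glance view) of the data, which helps validate some basic hypotheses. In this example, "Credit_History" is expected to significantly affect loan status. That hypothesis can be checked with the cross-tabulation produced by the following code: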

    pd.crosstab(data["Credit_History"],data["Loan_Status"],margins=True)
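    To see what the call returns without the dataset, here is a sketch on made-up values (the toy rows are assumptions, not the article's data). The second call, with normalize="index", is an addition that turns the counts into row-wise shares, which can make the comparison easier to read:

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the article.
toy = pd.DataFrame(
    {
        "Credit_History": [1.0, 1.0, 1.0, 0.0, 0.0, 1.0],
        "Loan_Status": ["Y", "Y", "N", "N", "N", "Y"],
    }
)

# Raw counts, with an extra "All" row and column holding the totals.
counts = pd.crosstab(toy["Credit_History"], toy["Loan_Status"], margins=True)

# Share of each loan status within each credit-history group.
shares = pd.crosstab(toy["Credit_History"], toy["Loan_Status"], normalize="index")

print(counts)
print(shares)
```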

    Copyright of this article belongs to the 乐鱼播客 AI + Python Academy. Reposting is welcome; please credit the author and source. Thank you!
    Author: 乐鱼播客 AI + Python Academy
    First published at: http://python.itcast.cn/