Updated: 2021-01-18 16:02  Source: 乐鱼电竞

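Collected data almost always has flaws, such as missing values, extreme values, and inconsistent formats. Before analysis, the data therefore needs preprocessing: cleaning, merging, reshaping, and transformation. Pandas provides many functions and methods for data preprocessing, for example to replace abnormal data, merge data, and reshape data.
Data cleaning is complex and tedious work, yet it is also the most important step in the whole data analysis process. Its purpose is to improve data quality by removing dirty data (here, data that has no practical meaning for the analysis, has an invalid format, or falls outside the specified range), so that the original data is complete, unique, authoritative, valid, and consistent. Common data cleaning operations in Pandas include handling null and missing values, handling duplicate values, handling outliers, and unifying data formats.
A null value generally means that the data is unknown, not applicable, or will be added later. A missing value means that the value of one or more attributes in the data set is incomplete. Missing values arise mainly from two causes, mechanical and human: mechanical causes are machine failures that prevent data from being collected or stored, and human causes are subjective mistakes or deliberate concealment.
Null values are generally represented by None and missing values by NaN. Pandas provides functions for checking and handling both: isnull() and notnull() determine whether a data set contains null or missing values, and the dropna() and fillna() methods delete or fill them. Each is introduced below.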
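1. The isnull() function
The syntax of the isnull() function is as follows:
pandas.isnull(obj)
The function takes a single parameter, obj, the object to check. For a scalar it returns a single Boolean; for an array-like object such as a Series or DataFrame it returns an object of the same shape containing Booleans. NaN and None map to True, and everything else maps to False.
The following example shows how to check for missing or null values with isnull():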
In [1]: import numpy as np
        import pandas as pd
        series_obj = pd.Series([1, None, np.nan])
        pd.isnull(series_obj)  # check for null or missing values
Out[1]:
0    False
1     True
2     True
dtype: bool
The example first creates a Series object containing the three values 1, None, and NaN, then calls isnull() to check it. Null or missing values map to True and all other values map to False. The output shows that the first value is valid and the last two are null or missing.
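The same check also works on a whole DataFrame. As a minimal sketch (the data below is hypothetical and invented for illustration only), summing the Boolean result counts the missing entries in each column:

```python
import numpy as np
import pandas as pd

# Hypothetical data, invented for illustration only.
df = pd.DataFrame({"A": [1.0, np.nan, 3.0],
                   "B": ["x", None, "z"]})

mask = pd.isnull(df)                    # Boolean DataFrame with the same shape as df
missing_per_column = mask.sum()         # True counts as 1, so this counts gaps per column
has_missing = bool(mask.values.any())   # True if any value at all is missing

print(missing_per_column)
print(has_missing)
```

Counting per column first is a cheap way to decide whether deleting or filling is the more sensible next step.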
2. The notnull() function
notnull() checks for the same thing as isnull(), namely whether the data contains null or missing values, but returns the opposite result: where a null or missing value is found, notnull() returns False and isnull() returns True.
Changing the isnull() call in the code above to notnull() gives the following:
In [2]: import numpy as np
        import pandas as pd
        series_obj = pd.Series([1, None, np.nan])
        pd.notnull(series_obj)  # check for values that are NOT null or missing
Out[2]:
0     True
1    False
2    False
dtype: bool
Here notnull() maps every null or missing value to False and everything else to True. In the output, index 0 is True, so that value is present; indexes 1 and 2 are False, so those values are null or missing.
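Because notnull() marks the valid entries with True, its result can serve directly as a filter mask. The following is a small sketch of that idea; the extra value 4 is added here only for illustration:

```python
import numpy as np
import pandas as pd

series_obj = pd.Series([1, None, np.nan, 4])

# notnull() yields True for valid entries, so indexing with it keeps only those.
valid_only = series_obj[pd.notnull(series_obj)]

print(valid_only)
```

The original index labels are preserved, so the surviving values can still be traced back to their positions in the source data.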
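3. The dropna() method
The dropna() method deletes rows or columns that contain null or missing values. Its syntax is as follows:
dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
Some of the parameters have the following meanings:
(1) axis: whether to filter rows or columns. Possible values:
    0 or 'index': delete rows that contain missing values (the default).
    1 or 'columns': delete columns that contain missing values.
(2) how: the filtering criterion. Possible values:
    'any': the default. Delete the row or column if it contains any NaN value.
    'all': delete the row or column only if all of its values are NaN.
(3) thresh: the minimum number of valid values required. If 2 is passed, a row or column is kept only if it has at least two non-NaN values.
(4) subset: the labels along the other axis in which to look for NaN values, for example a list of column labels when rows are being dropped.
(5) inplace: whether to operate on the original data. True modifies the original data directly; False leaves the original unchanged and returns a new object.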
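The effect of the axis, how, and thresh parameters described above can be sketched on a small frame. The data is hypothetical and chosen only so that each call keeps a different set of rows or columns:

```python
import numpy as np
import pandas as pd

# Hypothetical data, invented for illustration only.
df = pd.DataFrame({"A": [1.0, np.nan, np.nan, 4.0],
                   "B": [1.0, 2.0, np.nan, 4.0],
                   "C": [np.nan, np.nan, np.nan, 4.0]})

rows_any = df.dropna()                     # default: drop rows with at least one gap
rows_all = df.dropna(how="all")            # drop only rows in which every value is missing
rows_thresh = df.dropna(thresh=2)          # keep rows with at least two valid values
cols_thresh = df.dropna(axis=1, thresh=3)  # keep columns with at least three valid values

print(rows_any, rows_all, rows_thresh, cols_thresh, sep="\n\n")
```

Here how and thresh are passed in separate calls; recent pandas versions do not accept both in the same call.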
¼ÙÉ裬ÏÖÔÚÓÐÒ»ÕŹØÓÚÊé¼®ÐÅÏ¢µÄ±í¸ñ£¬ËüÀïÃæÓÐÀà±ð¡¢ÊéÃûºÍ×÷ÕßÈýÁÐÊý¾Ý¡£ÆäÖУ¬ÔÚË÷ÒýΪ0µÄÒ»ÐÐÖÐÊéÃûΪNaN£¬±íÃ÷¸ÃλÖõÄÊý¾ÝÊÇȱʧֵ£¬Ë÷ÒýΪ1µÄÒ»ÐÐÖÐ×÷ÕßΪNone£¬±íÃ÷¸ÃλÖõÄÊý¾ÝÊÇ¿ÕÖµ¡£Èç¹ûɾ³ýÕâЩ¿ÕÖµºÍȱʧֵ£¬ÄÇôɾ³ýǰºóµÄЧ¹ûÈçͼ1Ëùʾ¡£

ͼ1 ɾ³ý¿ÕÖµ/ȱʧֵǰºóµÄ±í¸ñ
½ÓÏÂÀ´£¬Í¨¹ýÒ»¸öʾÀýÀ´ÑÝʾÈçºÎʹÓÃdropna()·½·¨É¾³ý¿ÕÖµºÍȱʧֵ£¬¾ßÌå´úÂëÈçÏ¡£
In [3]: import pandas as pd
        import numpy as np
        # Columns: 类别 = category, 书名 = title, 作者 = author
        df_obj = pd.DataFrame({"类别": ['小说', '散文随笔', '青春文学', '传记'],
                               "书名": [np.nan, '《皮囊》', '《旅程结束时》', '《老舍自传》'],
                               "作者": ["老舍", None, "张其鑫", "老舍"]})
        df_obj
Out[3]:
     类别       书名    作者
0    小说      NaN    老舍
1  散文随笔     《皮囊》  None
2  青春文学  《旅程结束时》   张其鑫
3    传记   《老舍自传》    老舍
In [4]: df_obj.dropna()  # delete rows that contain null or missing values
Out[4]:
     类别       书名   作者
2  青春文学  《旅程结束时》  张其鑫
3    传记   《老舍自传》   老舍
The code first creates a DataFrame object containing null and missing values, then calls dropna() on it to filter those values out, keeping only the complete rows.
The output shows that every row containing a null or missing value has been deleted.
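The subset and inplace parameters can be tried on the same book table. This is an illustrative sketch rather than part of the original example: it assumes that only a missing title (书名) should disqualify a row.

```python
import numpy as np
import pandas as pd

# Columns: 类别 = category, 书名 = title, 作者 = author
df_obj = pd.DataFrame({"类别": ["小说", "散文随笔", "青春文学", "传记"],
                       "书名": [np.nan, "《皮囊》", "《旅程结束时》", "《老舍自传》"],
                       "作者": ["老舍", None, "张其鑫", "老舍"]})

# Only the title column is inspected; the row with a missing author is kept.
titled = df_obj.dropna(subset=["书名"])

# inplace=True modifies df_obj itself and returns None.
result = df_obj.dropna(inplace=True)

print(titled)
print(result)
print(df_obj)
```

Since the in-place call returns None, assigning its result back to the same variable would discard the data; use either inplace=True or assignment, not both.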
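4. Filling null/missing values
There are many ways to fill missing and null values, such as manual entry, filling with a special value, and hot-deck imputation. The fillna() method in Pandas fills null or missing values. Its syntax is as follows:
fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None, **kwargs)
Some of the parameters have the following meanings:
(1) value: the value used for filling.
(2) method: the fill method. The default is None; the following values are also supported:
    pad/ffill: propagate the last valid value forward, that is, replace a missing value with the value before it.
    backfill/bfill: propagate the next valid value backward, that is, replace a missing value with the value after it.
(3) limit: the maximum number of consecutive values to fill. The default is None.
Note:
The method parameter cannot be used together with the value parameter.
The fill value passed to fillna() can be a scalar or a dictionary, or a Series or DataFrame object.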
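As the note above says, the fill value may also be a Series. One common use, sketched here on hypothetical data, is filling each column with its own mean; the Series returned by mean() is indexed by column label, and fillna() matches on those labels:

```python
import numpy as np
import pandas as pd

# Hypothetical data, invented for illustration only.
df = pd.DataFrame({"A": [1.0, np.nan, 3.0],
                   "B": [np.nan, 4.0, 8.0]})

col_means = df.mean()          # Series indexed by column label; NaN is skipped by default
filled = df.fillna(col_means)  # each column's gaps receive that column's mean

print(filled)
```

Whether the mean is an appropriate fill value depends on the data; it is used here only to show the label matching.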
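Suppose a table contains some missing values. Figure 2 shows the table before and after the missing values are replaced with the constant 66.0.

Figure 2: Example of filling missing values
The code for replacing missing values with a constant is as follows: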
In [5]: import pandas as pd
        import numpy as np
        df_obj = pd.DataFrame({'A': [1, 2, 3, np.nan],
                               'B': [np.nan, 4, np.nan, 6],
                               'C': ['a', 7, 8, 9],
                               'D': [np.nan, 2, 3, np.nan]})
        df_obj
Out[5]:
     A    B  C    D
0  1.0  NaN  a  NaN
1  2.0  4.0  7  2.0
2  3.0  NaN  8  3.0
3  NaN  6.0  9  NaN
In [6]: df_obj.fillna(66.0)  # replace missing values with the number 66.0
Out[6]:
      A     B  C     D
0   1.0  66.0  a  66.0
1   2.0   4.0  7   2.0
2   3.0  66.0  8   3.0
3  66.0   6.0  9  66.0
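Comparing the two results shows that when a single valid value is used as the fill value, every null or missing value in the object is replaced.
To fill different columns with different values, for example 4.0 for the missing data in column A and 5.0 for the missing data in column B, see the before-and-after effect in Figure 3.

Figure 3: Filling specified columns
Passing a dictionary as the value parameter of fillna(), with column labels as keys and fill values as values, replaces the missing values in just the specified columns: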
In [7]: import pandas as pd
        import numpy as np
        df_obj = pd.DataFrame({'A': [1, 2, 3, np.nan],
                               'B': [np.nan, 4, np.nan, 6],
                               'C': ['a', 7, 8, 9],
                               'D': [np.nan, 2, 3, np.nan]})
        df_obj
Out[7]:
     A    B  C    D
0  1.0  NaN  a  NaN
1  2.0  4.0  7  2.0
2  3.0  NaN  8  3.0
3  NaN  6.0  9  NaN
In [8]: df_obj.fillna({'A': 4.0, 'B': 5.0})  # fill only columns A and B
Out[8]:
     A    B  C    D
0  1.0  5.0  a  NaN
1  2.0  4.0  7  2.0
2  3.0  5.0  8  3.0
3  4.0  6.0  9  NaN
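To replace missing values with neighbouring data, for example filling the gaps in columns A to D from top to bottom so that each missing value is replaced by the value that precedes it in the same column, see the before-and-after effect in Figure 4.

Figure 4: Example of forward filling
Passing 'ffill' as the method parameter of fillna() forward-fills the missing data. (pandas 2.1 and later deprecate the method parameter in favour of the ffill() and bfill() methods.) The example code is as follows: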
In [9]: import pandas as pd
        import numpy as np
        df = pd.DataFrame({'A': [1, 2, 3, None],
                           'B': [np.nan, 4, None, 6],
                           'C': ['a', 7, 8, 9],
                           'D': [None, 2, 3, np.nan]})
        df
Out[9]:
     A    B  C    D
0  1.0  NaN  a  NaN
1  2.0  4.0  7  2.0
2  3.0  NaN  8  3.0
3  NaN  6.0  9  NaN
In [10]: df.fillna(method='ffill')  # forward-fill null or missing values
Out[10]:
     A    B  C    D
0  1.0  NaN  a  NaN
1  2.0  4.0  7  2.0
2  3.0  4.0  8  3.0
3  3.0  6.0  9  3.0
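Backward filling and the limit parameter follow the same pattern. The sketch below uses hypothetical data and the Series.ffill() and Series.bfill() methods, which behave like fillna() with method='ffill' and method='bfill':

```python
import numpy as np
import pandas as pd

# Hypothetical data, invented for illustration only.
s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0])

forward = s.ffill()                 # 1.0 is carried forward into all three gaps
forward_limited = s.ffill(limit=1)  # fill at most one consecutive gap, leave the rest
backward = s.bfill()                # 5.0 is carried backward into all three gaps

print(forward, forward_limited, backward, sep="\n\n")
```

A limit is useful when a long run of gaps should stay visible instead of being papered over with a stale value.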