
Language Model BERT: An Introduction to the BERT Algorithm

Updated: 2020-09-07 13:59  Source: 乐鱼电竞


This article gives NLP enthusiasts a detailed walkthrough of a famous language model, BERT. It is organized into 4 parts that go progressively deeper.

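1. Introduction to BERT

BERT is a pre-trained model proposed by Google AI in October 2018.

BERT stands for Bidirectional Encoder Representations from Transformers. On SQuAD1.1, a top-tier machine reading comprehension benchmark, BERT posted striking results: it surpassed human performance on both evaluation metrics. It also set SOTA results on 11 different NLP tasks, including pushing the GLUE benchmark to 80.4% (7.6% absolute improvement) and MultiNLI accuracy to 86.7% (5.6% absolute improvement), making it a milestone model in the history of NLP.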


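2. BERT's model architecture

Overall architecture: in the figure below, the leftmost diagram is BERT's architecture. It shows clearly that BERT is built by stacking and connecting Transformer Encoder blocks, which makes it a typical bidirectional encoder model.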

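[Figure BERT01: model architecture diagrams, with BERT on the far left]

[Figure BERT02: BERT input representation]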


3.1 Key points of BERT's training process

1) Four key terms: Pre-trained, Deep, Bidirectional Transformer, Language Understanding

a. Pre-trained: first of all, this is a pre-trained language model, so any developer can build on it directly!

The two biggest highlights of the whole BERT model both lie in its pre-training tasks.

b. Deep

BERT_BASE: Layer = 12, Hidden = 768, Head = 12, Total Parameters = 110M

BERT_LARGE: Layer = 24, Hidden = 1024, Head = 16, Total Parameters = 340M

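Compare this with the Transformer: Layer = 6, Hidden = 2048, Head = 8, a shallow and wide model. This suggests that a deep and narrow model like BERT works better (broadly consistent with the general conclusion in the CV field). Note that in the original Transformer paper 2048 is the inner size of the feed-forward layer in the base model, whose model dimension is 512, so the comparison is only a rough one.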
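The parameter totals quoted above can be roughly reproduced with a little arithmetic. The sketch below is a back-of-the-envelope count, not the official accounting: the vocabulary size (30522), the maximum sequence length (512), the feed-forward inner size (4 x hidden), and the pooler layer are assumptions that do not appear in this article. The number of heads does not enter the count, because the heads simply split the hidden dimension.

```python
def bert_param_count(layers, hidden, vocab_size=30522, max_positions=512,
                     segment_types=2, ffn_multiplier=4):
    """Approximate number of weights and biases in a BERT-style encoder.

    vocab_size, max_positions and ffn_multiplier are assumed values,
    not figures taken from the article.
    """
    layer_norm = 2 * hidden  # one scale and one shift vector

    # Word, position and segment embedding tables, followed by a LayerNorm.
    embeddings = (vocab_size + max_positions + segment_types) * hidden + layer_norm

    # Self-attention: query, key, value and output projections (weights + biases).
    attention = 4 * (hidden * hidden + hidden) + layer_norm

    # Position-wise feed-forward network: hidden -> inner -> hidden.
    inner = ffn_multiplier * hidden
    feed_forward = (hidden * inner + inner) + (inner * hidden + hidden) + layer_norm

    # Dense layer applied to the first token's output (assumed pooler).
    pooler = hidden * hidden + hidden

    return embeddings + layers * (attention + feed_forward) + pooler


base = bert_param_count(layers=12, hidden=768)
large = bert_param_count(layers=24, hidden=1024)
print(f"BERT_BASE  ~ {base / 1e6:.0f}M parameters")
print(f"BERT_LARGE ~ {large / 1e6:.0f}M parameters")
```

Under these assumptions the totals land close to the rounded 110M and 340M figures, and most of the growth from BASE to LARGE comes from the per-layer term, which scales with layers x hidden squared.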

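c. Bidirectional Transformer: one of BERT's innovations is that it is a bidirectional Transformer network.

BERT takes the Encoder module straight from the Transformer architecture and drops the Decoder module. That automatically gives it bidirectional encoding and strong feature extraction.

d. Language Understanding: the emphasis is on understanding language, not just on generating it (Language Generation).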


3.2 BERT's input representation has 3 components (see the second figure above):

Word embedding tensor: word embeddings

Segment embedding tensor: segment embeddings

Position encoding tensor: position embeddings

The final embedding vector is simply the element-wise sum of these 3 vectors.
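The summation described above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions: the vocabulary, the table sizes, the random initial values, and the helper names `make_table` and `embed` are all made up for the example. A real model learns these tables during training and uses a much larger hidden size (768 for BERT_BASE).

```python
import random

random.seed(0)

HIDDEN = 8  # toy size for readability
VOCAB = {"[CLS]": 0, "[SEP]": 1, "my": 2, "dog": 3, "is": 4,
         "cute": 5, "he": 6, "likes": 7, "play": 8}


def make_table(rows, dim):
    """Build a lookup table of small random vectors (stand-in for learned weights)."""
    return [[random.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]


word_table = make_table(len(VOCAB), HIDDEN)  # one row per vocabulary entry
segment_table = make_table(2, HIDDEN)        # sentence A = 0, sentence B = 1
position_table = make_table(16, HIDDEN)      # one row per position in the sequence


def embed(tokens, segment_ids):
    """Return one vector per token: word + segment + position, added element-wise."""
    vectors = []
    for position, (token, segment) in enumerate(zip(tokens, segment_ids)):
        word_vec = word_table[VOCAB[token]]
        segment_vec = segment_table[segment]
        position_vec = position_table[position]
        vectors.append([w + s + p for w, s, p in zip(word_vec, segment_vec, position_vec)])
    return vectors


tokens = ["[CLS]", "my", "dog", "is", "cute", "[SEP]", "he", "likes", "play", "[SEP]"]
segment_ids = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
vectors = embed(tokens, segment_ids)
print(len(vectors), len(vectors[0]))  # one HIDDEN-sized vector per token
```

Because the three tables share the same width, the sum keeps the hidden size unchanged while folding in which word it is, which sentence it belongs to, and where it sits in the sequence.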


3.3 BERT's pre-training introduces two core tasks (these two tasks are also the two biggest innovations of the original BERT paper)

a. Masked LM (language model training with masks)

a.1 From the original training text, 15% of the tokens are randomly selected as candidates for masking.

a.2 The data generator does not turn all of the selected tokens into [MASK]. Instead, each one gets one of the following 3 treatments:

a.2.1 With 80% probability, replace the token with the [MASK] tag, e.g. my dog is hairy -> my dog is [MASK]

a.2.2 With 10% probability, replace the token with a random word, e.g. my dog is hairy -> my dog is apple

a.2.3 With 10% probability, keep the token unchanged, e.g. my dog is hairy -> my dog is hairy

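a.3 During training, the Transformer Encoder does not know which words it will be asked to predict, which words are still the originals, which have been hidden behind [MASK], and which have been swapped for other words. This high degree of uncertainty is exactly what forces the model to learn a distributed, contextual representation of every token and to do its best to learn what the original language looks like! At the same time, because only 15% of the tokens in the original text take part in the masking operation, the expressiveness and the rules of the original language are not destroyed!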
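The 15% selection and the 80/10/10 split in steps a.1 and a.2 can be sketched as follows. This is a minimal sketch under stated assumptions, not the data generator from the original code base: the function name `mask_tokens`, the toy vocabulary, and the choice to select each token independently with probability 0.15 (rather than selecting an exact 15% of positions) are simplifications made for the example.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, select_prob=0.15, rng=None):
    """Corrupt a token sequence for Masked LM training.

    Returns (corrupted, labels). labels[i] holds the original token at every
    selected position and None everywhere else.
    """
    rng = rng or random.Random()
    corrupted = list(tokens)
    labels = [None] * len(tokens)
    for i, token in enumerate(tokens):
        if rng.random() >= select_prob:
            continue  # about 85% of tokens are never touched
        labels[i] = token  # the model must recover the original token here
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK               # 80%: replace with [MASK]
        elif roll < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with a random word
        # remaining 10%: leave the token as it is
    return corrupted, labels


vocab = ["my", "dog", "is", "hairy", "apple", "cat", "runs"]
tokens = "my dog is hairy".split() * 5
corrupted, labels = mask_tokens(tokens, vocab, rng=random.Random(0))
print(corrupted)
print(labels)
```

Only positions where `labels` is not None would contribute to the training loss, which is why the model cannot tell from the input alone whether an ordinary-looking word is original, replaced, or about to be predicted.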

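b. Next Sentence Prediction (predicting the next sentence)

b.1 The goal is to serve NLP tasks such as question answering, inference, and sentence-level topic relations.

b.2 All sentences in the training data take part in this task. Each training example is a sentence pair (A, B):

·In 50% of the pairs, B is the sentence that actually follows A in the original text. (Labeled IsNext, a positive example)

·In 50% of the pairs, B is a sentence drawn at random from the original text. (Labeled NotNext, a negative example)

b.3 On this task, the BERT model reaches 97-98% accuracy on the test set.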


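3.4 Fine-tuning a BERT-based model

You only need to plug the task-specific inputs and outputs into BERT, and the Transformer's powerful attention mechanism can then model many downstream tasks (sentence-pair relation classification, single-text topic classification, question answering (QA), and single-sentence tagging such as named entity recognition).

Some practical experience with fine-tuning:

batch size: 16, 32

epochs: 3, 4

learning rate: 2e-5, 5e-5

added fully connected layers: layers: 1-3, hidden_size: 64, 128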
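One way to put these values to use is to treat them as a small search grid and try every combination. The sketch below only enumerates the configurations. It reads "16, 32" and similar entries as alternative values to try and "1-3" as 1, 2, or 3 layers, which is an interpretation of the list above. The key names and the `iter_configs` helper are made up for the example, and the training call itself is left out.

```python
from itertools import product

# Candidate values taken from the list above; the key names are illustrative.
search_space = {
    "batch_size": [16, 32],
    "epochs": [3, 4],
    "learning_rate": [2e-5, 5e-5],
    "head_layers": [1, 2, 3],        # fully connected layers added on top of BERT
    "head_hidden_size": [64, 128],
}


def iter_configs(space):
    """Yield one dict per combination of values in the search space."""
    keys = list(space)
    for values in product(*(space[key] for key in keys)):
        yield dict(zip(keys, values))


configs = list(iter_configs(search_space))
print(len(configs), "configurations to try")
print(configs[0])
```

Each fine-tuning run is short (3 to 4 epochs), so sweeping a grid of this size is usually feasible on a single task, though the cost grows with the dataset size.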

[Figure: BERT03]


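4. Strengths and weaknesses of the BERT model itself


Strengths: BERT is built on top of the Transformer and has strong language representation and feature extraction abilities. It reached state of the art on 11 NLP benchmark tasks. It also showed once again that bidirectional language models are more powerful.

Weaknesses:
1) Poor reproducibility: the pre-training is basically impossible to redo yourself, so in practice you just take the released model and use it as is!
2) During training only 15% of the tokens in each batch take part in prediction, so the model converges slowly and needs strong computing power behind it!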

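Further thoughts:

1) Deep learning is representation learning

·Across the 11 NLP tasks it took on, BERT's basic approach is the same: a bidirectional Transformer extracts the features, and a single fully connected linear layer is added on top of the network for fine-tuning. Yet even with such foolproof assembly, on NER (named entity recognition), a famously hard NLP task, it drops the CRF layer altogether and still clearly outperforms the BiLSTM + CRF combination. How do you argue with that?

2) Scale matters

Neither Masked LM nor Next Sentence Prediction is a brand-new concept. Both had been proposed earlier in other models, but limits on data scale and computing power kept the world from seeing their potential, so those papers drew little attention. Once the ideas were in Google's hands, with money no object, the result was a paper that made waves!

3) Follow-up research has shown what BERT learns at its different layers.

·The lower layers capture phrase-structure information.

·Word-level and character-level features show up in layers 3-4, syntactic features in layers 6-9, and sentence-level semantic features in layers 10-12.

·Subject-verb agreement features show up in layers 8-9 (a kind of syntactic information).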



