
Two Ways to Create an RDD [Big Data Development]

Updated: 2022-03-07 18:36  Source: 乐鱼电竞

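The entry point for Spark RDD programming is the SparkContext object, whatever the programming language. Subsequent API calls and computations can only run on top of a SparkContext once it has been built. In essence, the main job of SparkContext from a programming point of view is to create the first RDD.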

An RDD can be created in 2 ways: by parallelizing a collection (turning a local object into a distributed RDD) or by reading an external data source (reading a file).

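1. Creating by parallelization

Creating by parallelization means converting a local collection into a distributed RDD. This step is where distribution begins: the local collection becomes a distributed collection.

The API is as follows: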

rdd = sparkcontext.parallelize(arg1, arg2)
# arg1: any collection object, for example a list
# arg2: number of partitions
Complete code:
# coding: utf8

from pyspark import SparkConf, SparkContext

if __name__ == '__main__':
    # 0. Build the Spark execution environment
    conf = SparkConf().setAppName("create rdd").\
        setMaster("local[*]")
    sc = SparkContext(conf=conf)

    # The parallelize method of sc converts a local collection into an RDD and returns it
    data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    rdd = sc.parallelize(data, numSlices=3)

    print(rdd.collect())
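
To make the meaning of the partition count more concrete, here is a minimal plain-Python sketch of cutting a local list into contiguous, near-equal chunks. It is a model only, not Spark code: `slice_collection` is a hypothetical helper, and the assumption that elements are split contiguously reflects what `rdd.glom().collect()` typically shows for a list, while the real slicing is internal to Spark.

```python
def slice_collection(data, num_slices):
    """Split a list into num_slices contiguous chunks of near-equal size.

    Hypothetical helper: a model of partitioning, not Spark's implementation.
    """
    if num_slices < 1:
        raise ValueError("num_slices must be at least 1")
    n = len(data)
    # Chunk i covers indices [i*n // num_slices, (i+1)*n // num_slices).
    return [
        data[(i * n) // num_slices:((i + 1) * n) // num_slices]
        for i in range(num_slices)
    ]


partitions = slice_collection([1, 2, 3, 4, 5, 6, 7, 8, 9], 3)
print(partitions)       # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(len(partitions))  # 3, the partition count in this model
```

In this model, asking for more chunks than there are elements simply leaves some chunks empty.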

getNumPartitions API: returns the number of partitions of an RDD as an int.

Usage:
rdd.getNumPartitions()

2. Creating by reading a file

textFile API

This API can read local data as well as HDFS data.

Usage:

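sparkcontext.textFile(arg1, arg2)
# arg1: required; file path. Local files and HDFS are supported, as are some other protocols such as S3.
# arg2: optional; the minimum number of partitions.
# Note: arg2 is not binding. Spark makes its own decision: within the range Spark allows, arg2 takes effect; outside that range, it is ignored.
Complete code: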
from pyspark import SparkConf, SparkContext

if __name__ == '__main__':
    # 0. Build the Spark execution environment
    conf = SparkConf().setAppName("create rdd").\
        setMaster("local[*]")
    sc = SparkContext(conf=conf)

    # Read a file with the textFile API
    rdd = sc.textFile(".…/data/words.txt", 1000)
    print(rdd.getNumPartitions())

    rdd2 = sc.textFile("hdfs://node1:8020/input/words.txt", 1000)
    # A minimum of 1000 partitions was requested, but only 85 were actually opened.
    # Spark did not honor the request for at least 1000; it just opened as many as it could.
    print(rdd2.getNumPartitions())

    print(rdd2.collect())

Note: unless there is a very specific reason to, we generally do not set the partition parameter for textFile.
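
One way to see why the minimum partition count is only a hint is to model how a file is cut into input splits. The sketch below is a simplified plain-Python model of the split-size rule in Hadoop's FileInputFormat, which textFile relies on; it assumes a single uncompressed file. `estimate_splits`, the 128 MB block size, and the file sizes are illustrative assumptions, not values taken from the example above.

```python
def estimate_splits(file_size, min_partitions,
                    block_size=128 * 1024 * 1024, min_size=1):
    """Estimate the number of input splits for one file of file_size bytes.

    Simplified model (assumption): one uncompressed, non-empty file.
    """
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    # Target bytes per split if the request were followed exactly.
    goal_size = file_size // max(min_partitions, 1)
    # A split is never smaller than min_size nor larger than one block.
    split_size = max(min_size, min(goal_size, block_size))
    splits = 0
    remaining = file_size
    # Cut full splits while more than 1.1 split sizes remain.
    while remaining / split_size > 1.1:
        remaining -= split_size
        splits += 1
    # Whatever is left becomes the final split.
    if remaining > 0:
        splits += 1
    return splits


# A hypothetical 85-byte file cannot yield more splits than it has bytes.
print(estimate_splits(85, 1000))               # 85
# A hypothetical 300 MB file: asking for 2 still yields 3, capped by block size.
print(estimate_splits(300 * 1024 * 1024, 2))   # 3
```

Under these assumptions the requested value only sets a target split size, so the count actually produced can end up below or above the number asked for.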

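Among the file-reading APIs there is one dedicated to small files, wholeTextFiles, which is suited to reading a batch of small files.
Usage:

sparkcontext.wholeTextFiles(arg1, arg2)
# arg1: required; file path. Local files and HDFS are supported, as are some other protocols such as S3.
# arg2: optional; the minimum number of partitions.
# Note: arg2 is not binding; with this API the number of partitions can be at most the number of files.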

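This API leans toward reading data with few partitions. Because it is dedicated to small files, the data in each file is small, and having many partitions makes a shuffle more likely. So read the data with as few partitions as possible.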
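
The difference in record shape is easy to overlook: wholeTextFiles yields one (path, content) pair per file, whereas textFile yields one record per line. The plain-Python sketch below imitates that shape on a temporary directory. It is an analogy rather than Spark code, and `read_whole_files` and the file names are hypothetical.

```python
import os
import tempfile


def read_whole_files(directory):
    """Return one (path, content) pair per file, sorted by path.

    Hypothetical helper that mimics the record shape of wholeTextFiles.
    """
    records = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            with open(path, encoding="utf-8") as f:
                records.append((path, f.read()))
    return records


with tempfile.TemporaryDirectory() as tmp:
    # Three tiny files stand in for a batch of small files.
    for i in range(3):
        with open(os.path.join(tmp, f"part{i}.txt"), "w", encoding="utf-8") as f:
            f.write(f"hello {i}\nspark {i}\n")

    records = read_whole_files(tmp)
    line_count = sum(len(content.splitlines()) for _, content in records)

print(len(records))  # 3 records, one per file
print(line_count)    # 6 lines, the record count a line-based read would give
```

Because each file is a single indivisible record in this shape, there is nothing to gain from more partitions than files.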





