Spark Streaming with Python



This content originally appeared on Level Up Coding - Medium and was authored by Amit Kumar Manjhi

Photo by Michael Dziedzic on Unsplash

What is Spark Streaming?

  • Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams.
  • Data can be ingested from many sources like Kafka, Flume, Kinesis or TCP sockets, and can be processed using complex algorithms expressed with high-level functions like map, reduce, join and window.
credit: spark.apache.org
  • Internally, Spark Streaming receives live input data streams and divides the data into batches, which are then processed by the Spark engine to generate the final stream of results in batches.
Spark Streaming flow (credit: spark.apache.org)
  • Let's understand Spark Streaming through one simple example. We will build a very simple application that connects to a local stream of data (an open terminal) through a socket connection. It will then count the words in each line that we type in.
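To make the micro-batching idea concrete before the example, here is a minimal pure-Python sketch (no Spark involved) that groups timestamped input lines into fixed-length batch intervals, roughly what a DStream does before handing each batch to the Spark engine. The `(timestamp, line)` input format and the `split_into_batches` helper are hypothetical and only model the concept; this is not how Spark implements it internally.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def split_into_batches(
    events: Iterable[Tuple[float, str]], batch_seconds: float
) -> Dict[int, List[str]]:
    """Group (timestamp, line) pairs into batches of `batch_seconds` each.

    Hypothetical helper that models micro-batching; not Spark's internals.
    """
    batches: Dict[int, List[str]] = defaultdict(list)
    for timestamp, line in events:
        # Batch k covers the half-open interval [k * batch_seconds, (k + 1) * batch_seconds).
        batch_index = int(timestamp // batch_seconds)
        batches[batch_index].append(line)
    return dict(batches)


if __name__ == "__main__":
    stream = [(0.2, "hello spark"), (0.9, "hello world"), (1.4, "spark streaming")]
    for index, lines in sorted(split_into_batches(stream, batch_seconds=1.0).items()):
        # In Spark, each batch would be processed as one small job.
        print(index, lines)
```

With a 1-second interval, the first two lines land in batch 0 and the third in batch 1. Intervals with no input simply do not appear in this sketch, whereas Spark Streaming still produces an (empty) batch for every interval.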

The steps for streaming will be:

  • Create a SparkContext
  • Create a StreamingContext
  • Create a Socket Text Stream
  • Read in the lines as a “DStream”
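`socketTextStream` connects to a host and port as a TCP client and turns every newline-terminated, UTF-8 line it reads into one record of the DStream. In this article the server side is an open terminal; a common way to get one is netcat, e.g. `nc -lk 9999`. If netcat is not available, the following pure-Python stand-in serves a fixed list of lines to the first client that connects. `start_line_server`, the default host and the port 9999 are assumptions for illustration only.

```python
import socket
import threading
from typing import List, Tuple


def start_line_server(
    lines: List[str], host: str = "localhost", port: int = 9999
) -> Tuple[int, threading.Thread]:
    """Serve `lines` (newline-terminated, UTF-8) to the first client that connects.

    Hypothetical stand-in for a terminal running `nc -lk 9999`.
    Returns the bound port (useful when port=0 picks a free one) and the server thread.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)  # listening before returning, so a client can connect right away
    bound_port = server.getsockname()[1]

    def serve() -> None:
        with server:
            conn, _ = server.accept()
            with conn:
                for line in lines:
                    # One record per line: the receiver splits the byte stream on "\n".
                    conn.sendall((line + "\n").encode("utf-8"))

    thread = threading.Thread(target=serve, daemon=True)
    thread.start()
    return bound_port, thread
```

For example, `port, thread = start_line_server(["hello spark", "hello world"])` followed by `ssc.socketTextStream("localhost", port)` in the Spark application would deliver those two lines as records. Unlike an interactive terminal, this sketch closes the connection after sending, so it suits quick experiments rather than typing lines by hand.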

The steps for working with the data:

  • Split the input lines into a list of words
  • Map each word to a tuple
  • Then group (reduce) the tuples by the word (key) and sum up the second element (the number one)
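These three data steps correspond to the `flatMap`, `map` and `reduceByKey` transformations used in the usual PySpark word-count pipeline. As a Spark-free illustration of what happens to a single batch, here is a pure-Python sketch; `count_words_in_batch` is a hypothetical name, and it splits on any whitespace with `str.split()`, whereas Spark examples often use `line.split(" ")`.

```python
from functools import reduce
from itertools import groupby
from typing import Dict, List, Tuple


def count_words_in_batch(lines: List[str]) -> Dict[str, int]:
    """Count words in one batch of lines, mirroring the DStream steps."""
    # Step 1 (like flatMap): split every line into words and flatten the result.
    words = [word for line in lines for word in line.split()]
    # Step 2 (like map): turn each word into a (word, 1) tuple.
    pairs: List[Tuple[str, int]] = [(word, 1) for word in words]
    # Step 3 (like reduceByKey): group tuples by word (key) and sum the 1s.
    pairs.sort(key=lambda pair: pair[0])  # groupby needs equal keys to be adjacent
    return {
        word: reduce(lambda a, b: a + b, (count for _, count in group))
        for word, group in groupby(pairs, key=lambda pair: pair[0])
    }
```

For instance, `count_words_in_batch(["hello spark", "hello world"])` returns `{'hello': 2, 'spark': 1, 'world': 1}`. The lambdas here play the same role as the ones passed to Spark's transformations.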

Note: RDD syntax relies heavily on lambda expressions.

  • Here is the complete code with output 👇🏻

Output of the above example:

Spark-Streaming
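For readers who want to try the application, the following is a minimal sketch of such a program (not the author's original code), written against the PySpark DStream API: `SparkContext`, `StreamingContext`, `socketTextStream`, `flatMap`, `map`, `reduceByKey` and `pprint`. The host `localhost`, port `9999`, the 1-second batch interval and the app name `NetworkWordCount` are assumptions, not values taken from the article.

```python
from typing import Any


def count_words(lines: Any) -> Any:
    """Apply the word-count steps to a stream of text lines.

    Works on any object exposing flatMap/map/reduceByKey (a DStream or an RDD).
    """
    return (
        lines.flatMap(lambda line: line.split(" "))  # split each line into words
        .map(lambda word: (word, 1))  # map each word to a (word, 1) tuple
        .reduceByKey(lambda a, b: a + b)  # sum the 1s per word within each batch
    )


def main() -> None:
    # Imported here so count_words can be reused without pyspark on the path.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    # 1. SparkContext: "local[2]" leaves one thread for the socket receiver and
    #    one for processing; with a single thread no batch would be processed.
    sc = SparkContext("local[2]", "NetworkWordCount")
    # 2. StreamingContext with an (assumed) batch interval of 1 second.
    ssc = StreamingContext(sc, 1)
    # 3-4. Socket text stream: each line received on localhost:9999 becomes
    #      one record of the "lines" DStream.
    lines = ssc.socketTextStream("localhost", 9999)

    count_words(lines).pprint()  # print the first elements of each batch's counts

    ssc.start()  # start receiving and processing
    ssc.awaitTermination()  # run until stopped (e.g. Ctrl+C)
```

To run it, add a call to `main()` at the bottom of the file (typically under an `if __name__ == "__main__":` guard), start the terminal source with `nc -lk 9999`, launch the script with `spark-submit`, and type lines into the netcat terminal. Note that `reduceByKey` on a DStream counts words per batch, not cumulatively across the whole stream.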

Note: My Python version is 3.8.12, Spark version is 3.0.1 and Java version is “1.8.0_25”. You might face errors due to a version mismatch.
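To compare your own environment with the versions in the note above, a small helper like the following can be used. It is a sketch: `collect_versions` is a hypothetical name, it reports `pyspark.__version__` only if pyspark is importable from the current interpreter, and it reads the first line that `java -version` prints (to stderr). Spark may pick Java from `JAVA_HOME` rather than `PATH`, so treat the Java result as a hint.

```python
import subprocess
import sys
from typing import Dict, Optional


def collect_versions() -> Dict[str, Optional[str]]:
    """Report the Python, PySpark and Java versions visible to this interpreter."""
    versions: Dict[str, Optional[str]] = {"python": sys.version.split()[0]}
    try:
        import pyspark

        versions["pyspark"] = pyspark.__version__
    except ImportError:
        versions["pyspark"] = None  # pyspark is not importable here
    try:
        # `java -version` writes its banner to stderr, not stdout.
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True, check=False
        )
        banner = result.stderr.strip().splitlines()
        versions["java"] = banner[0] if banner else None
    except OSError:
        versions["java"] = None  # no usable `java` executable on PATH
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
```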

Thank you for reading. I appreciate your honest feedback!



Spark Streaming with Python was originally published in Level Up Coding on Medium, where people are continuing the conversation by highlighting and responding to this story.




APA

Amit Kumar Manjhi | Sciencx (2022-01-06T14:47:15+00:00) Spark Streaming with Python. Retrieved from https://www.scien.cx/2022/01/06/spark-streaming-with-python/
