The Data Science and Engineering with Spark XSeries, created in partnership with Databricks, will teach students how to perform data science and data engineering at scale using Spark, a cluster computing system well-suited for large-scale machine learning tasks. It will also present an integrated view of data processing by highlighting the various components of data analysis pipelines, including exploratory data analysis, feature extraction, supervised learning, and model evaluation. Students will gain hands-on experience building and debugging Spark applications. The series also covers the internals of Spark and of distributed machine learning algorithms, giving students intuition for working with big data and for writing code that runs in a distributed environment.
This XSeries requires a programming background and experience with Python (or the ability to learn it quickly). All exercises will use PySpark (the Python API for Spark), but previous experience with Spark or distributed computing is NOT required. Familiarity with basic machine learning concepts and exposure to algorithms, probability, linear algebra, and calculus are prerequisites for two of the courses in this series.
What You’ll Learn
How to use Spark and its libraries to solve big data problems
How to approach large scale data science and engineering problems
Spark’s APIs, architecture and many internal details
The trade-offs between communication and computation in a distributed environment
Use cases for Spark
Courses in this XSeries Program
This series is ideally taken in sequence, but each course can be taken individually.
Learn the fundamentals and architecture of Spark, the leading cluster-computing framework among professionals.
Starts on April 14, 2016
Learn how to apply data science techniques using parallel programming in Spark to explore big data.
Starts on May 19, 2016
Learn the underlying principles required to develop scalable machine learning pipelines and gain hands-on experience using Spark.
Learn how to develop and deploy distributed machine learning pipelines and gain the expertise to write efficient, scalable code in Spark.
Starts in August 2016
Learn common Spark use cases and take a deeper dive into Spark’s architecture and APIs.
Starts in October 2016
Instructors
Anthony D. Joseph
Professor in Electrical Engineering and Computer Science, University of California, Berkeley
Ameet Talwalkar
Assistant Professor of Computer Science, University of California, Los Angeles
Jon Bates
Spark Instructor, Databricks
In-depth Knowledge… Certified.
An XSeries is a group of courses that add up to a rich understanding of an area of study. As you learn, you prove you have the knowledge with the Verified Certificates you earn in each course. Once you pass the entire series, you receive a personalized XSeries Certificate: a shareable credential showing that you put in the work and understand the material.