Here’s how I would learn data engineering in 2025:

1. The basics:
- learn SQL: SELECT, FROM, WHERE, GROUP BY, JOIN, HAVING, etc.
- learn Python
  - data structures: objects, arrays, tuples, namedtuples
  - algorithms: recursion, loops

2. Intermediate
- learn distributed compute: pick up PySpark or...

29,164 views • 11 months ago • via X (Twitter)
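To make the "basics" step in the post above a bit more concrete, here is a minimal sketch that exercises the listed SQL clauses (SELECT, FROM, WHERE, GROUP BY, JOIN, HAVING) from Python, with a namedtuple for the result rows. It uses Python's built-in `sqlite3` module. The `customers` and `orders` tables, the sample data, and the helper names are all made up for illustration and do not come from the original post.

```python
import sqlite3
from collections import namedtuple

# Hypothetical result-row type, introduced only for this sketch.
CustomerTotal = namedtuple("CustomerTotal", ["name", "order_count", "total"])


def build_demo_db():
    """Create an in-memory database with made-up customers and orders."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
        INSERT INTO orders VALUES (1, 1, 50.0), (2, 1, 70.0), (3, 2, 30.0), (4, 3, 0.0);
        """
    )
    return conn


def top_customers(conn, min_total):
    """Return customers whose summed order amounts are at least min_total."""
    query = """
        SELECT c.name, COUNT(o.id) AS order_count, SUM(o.amount) AS total
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        WHERE o.amount > 0              -- row-level filter, applied before grouping
        GROUP BY c.name
        HAVING SUM(o.amount) >= ?       -- group-level filter, applied after grouping
        ORDER BY total DESC
    """
    return [CustomerTotal(*row) for row in conn.execute(query, (min_total,))]


if __name__ == "__main__":
    for row in top_customers(build_demo_db(), 30):
        print(row.name, row.order_count, row.total)
```

The point worth noticing is the split between WHERE and HAVING: WHERE drops individual rows before aggregation, while HAVING drops whole groups afterwards.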

9 Comments

vingança da rainha anne • 11 months ago

Roadmap

Ayobami.Ola • 11 months ago

Roadmap

W0lfgeng • 11 months ago

Roadmap

Martin Shein • 11 months ago

Roadmap

asa of tech • 11 months ago

Roadmap

4Eyed Ìfẹ́luyì Asojú • 11 months ago

Roadmap

esp • 11 months ago

Roadmap

Azningnam • 11 months ago

Roadmap

Kshiteej Pitta • 11 months ago

Roadmap

Related Videos

We just launched a major new Data Engineering Professional Certificate on Coursera! Data underlies all modern AI systems, and engineers who know how to build systems to store and serve it are in high demand. If you're interested in learning this skill, please check out this 4-course sequence, which is designed to make you job-ready to be a Data Engineer.

This is a new specialization taught by Joe Reis, the co-author of the best-selling book “Fundamentals of Data Engineering,” in collaboration with AWS. (Disclosure, I serve on Amazon's board.)

For many AI systems, data engineering is 80% of the work, and modeling is 20%. But people’s attention on these two topics is often flipped. This makes the job of the data engineer particularly important.

In this professional certificate, you'll learn foundational data engineering skills while implementing modern data architectures using open-source tools:
- Learn the key steps of the data lifecycle, to generate, ingest, store, transform, and serve data.
- Learn to align with organizational goals to design the data pipeline right for your business' needs.
- Understand how to make necessary trade-offs between speed, scalability, security, and cost.

Joe has distilled into this specialization decades of experience helping startups and large companies with data infrastructure. He is also joined by 17 other industry leaders in the data field, who will help you learn in-demand skills for the growing field of data engineering.

Please sign up here:

Andrew Ng

118,937 views • 1 year ago

Building Data Pipelines has levels to it:

Level 0
Understand the basic flow: Extract → Transform → Load (ETL) or ELT. This is the foundation.
- Extract: Pull data from sources (APIs, DBs, files)
- Transform: Clean, filter, join, or enrich the data
- Load: Store into a warehouse or lake for analysis
You’re not a data engineer until you’ve scheduled a job to pull CSVs off an SFTP server at 3AM!

Level 1
Master the tools:
- Airflow for orchestration
- dbt for transformations
- Spark or PySpark for big data
- Snowflake, BigQuery, Redshift for warehouses
- Kafka or Kinesis for streaming
Understand when to batch vs stream. Most companies think they need real-time data. They usually don’t.

Level 2
Handle complexity with modular design:
- DAGs should be atomic, idempotent, and parameterized
- Use task dependencies and sensors wisely
- Break transformations into layers (staging → clean → marts)
- Design for failure recovery. If a step fails, how do you re-run it? From scratch or just that part?
Learn how to backfill without breaking the world.

Level 3
Data quality and observability:
- Add tests for nulls, duplicates, and business logic
- Use tools like Great Expectations, Monte Carlo, or built-in dbt tests
- Track lineage so you know what downstream will break if upstream changes
Know the difference between:
- a late-arriving dimension
- a broken SCD2
- and a pipeline silently dropping rows
At this level, you understand that reliability > cleverness.

Level 4
Build for scale and maintainability:
- Version control your pipeline configs
- Use feature flags to toggle behavior in prod
- Push vs pull architecture
- Decouple compute and storage (e.g. Iceberg and Delta Lake)
- Data mesh, data contracts, streaming joins, and CDC are words you throw around because you know how and when to use them.

What else belongs in the journey to mastering data pipelines?

Zach Wilson

16,688 views • 1 year ago
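As a rough illustration of the advice in Zach Wilson's post above (extract, transform, load; idempotent and parameterized steps; tests for nulls and duplicates), here is a small sketch using only Python's built-in `sqlite3` module. It is not taken from the post: the `daily_sales` table, the hard-coded sample data, and every function name are hypothetical. The idempotency technique shown is "delete the partition, then insert it" inside one transaction, which is one common option among several.

```python
import sqlite3


def make_db():
    """Create an in-memory stand-in for a warehouse table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE daily_sales (run_date TEXT, product TEXT, amount REAL)")
    return conn


def extract(run_date):
    """Stand-in for pulling one day of data from a source."""
    # Hard-coded sample rows; a real job would read an API, database, or file.
    sample = {
        "2025-01-01": [("a", 10.0), ("b", 5.5), ("c", None)],
        "2025-01-02": [("a", 7.0)],
    }
    return sample.get(run_date, [])


def transform(rows):
    """Drop rows with a missing amount and round the rest."""
    return [(product, round(amount, 2)) for product, amount in rows if amount is not None]


def load(conn, run_date, rows):
    """Idempotent load: replace the whole run_date partition in one transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("DELETE FROM daily_sales WHERE run_date = ?", (run_date,))
        conn.executemany(
            "INSERT INTO daily_sales (run_date, product, amount) VALUES (?, ?, ?)",
            [(run_date, product, amount) for product, amount in rows],
        )


def check_quality(conn, run_date):
    """Return (null_count, duplicated_product_count) for one partition."""
    nulls = conn.execute(
        "SELECT COUNT(*) FROM daily_sales WHERE run_date = ? AND amount IS NULL",
        (run_date,),
    ).fetchone()[0]
    dupes = conn.execute(
        """
        SELECT COUNT(*) FROM (
            SELECT product FROM daily_sales
            WHERE run_date = ?
            GROUP BY product
            HAVING COUNT(*) > 1
        )
        """,
        (run_date,),
    ).fetchone()[0]
    return nulls, dupes


def run_pipeline(conn, run_date):
    """Run extract, transform, load, then the quality checks for one date."""
    load(conn, run_date, transform(extract(run_date)))
    return check_quality(conn, run_date)


if __name__ == "__main__":
    conn = make_db()
    for date in ["2025-01-01", "2025-01-01", "2025-01-02"]:  # first date re-run on purpose
        print(date, run_pipeline(conn, date))
```

Because `run_date` is a parameter and each run replaces only its own partition, re-running a failed day or backfilling a range of days should not duplicate rows or touch other dates.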