Getting a big tech data engineer job in 2016:
- do you know SQL?
- yes
- here’s $500k

Getting a big tech data engineer job in 2024:
- do you know Spark, Kafka, Iceberg?
- yes
- did you shake hands with Bill Inmon when he invented the...

43,547 views • 1 year ago • via X (Twitter)

5 comments

DataProfessorX • 1 year ago

People still underestimate the AI effect. In a couple more years, it will become clearer.

MicroSectors • 3 years ago

Are you a sophisticated investor that is bullish on big tech stocks? Learn more about $BULZ here:

XyzzyPoof • 1 year ago

4) the y401B Visa workers are paid 50% or less of what you are

Zach Morris Wilson • 1 year ago

@SeamansRC No they aren’t 😂

silverlightwa • 1 year ago

Damn you are still grifting?

Related videos

We just launched a major new Data Engineering Professional Certificate on Coursera! Data underlies all modern AI systems, and engineers who know how to build systems to store and serve it are in high demand. If you're interested in learning this skill, please check out this 4-course sequence, which is designed to make you job-ready to be a Data Engineer.

This is a new specialization taught by Joe Reis, the co-author of the best-selling book “Fundamentals of Data Engineering,” in collaboration with AWS. (Disclosure: I serve on Amazon's board.)

For many AI systems, data engineering is 80% of the work, and modeling is 20%. But people’s attention on these two topics is often flipped. This makes the job of the data engineer particularly important.

In this professional certificate, you'll learn foundational data engineering skills while implementing modern data architectures using open-source tools:
- Learn the key steps of the data lifecycle, to generate, ingest, store, transform, and serve data.
- Learn to align with organizational goals to design the data pipeline right for your business' needs.
- Understand how to make necessary trade-offs between speed, scalability, security, and cost.

Joe has distilled into this specialization decades of experience helping startups and large companies with data infrastructure. He is also joined by 17 other industry leaders in the data field, who will help you learn in-demand skills for the growing field of data engineering.

Please sign up here:

Andrew Ng

118,937 views • 1 year ago

Building Data Pipelines has levels to it:

level 0
Understand the basic flow: Extract → Transform → Load (ETL) or ELT. This is the foundation.
- Extract: Pull data from sources (APIs, DBs, files)
- Transform: Clean, filter, join, or enrich the data
- Load: Store into a warehouse or lake for analysis
You’re not a data engineer until you’ve scheduled a job to pull CSVs off an SFTP server at 3AM!

level 1
Master the tools:
- Airflow for orchestration
- dbt for transformations
- Spark or PySpark for big data
- Snowflake, BigQuery, Redshift for warehouses
- Kafka or Kinesis for streaming
Understand when to batch vs stream. Most companies think they need real-time data. They usually don’t.

level 2
Handle complexity with modular design:
- DAGs should be atomic, idempotent, and parameterized
- Use task dependencies and sensors wisely
- Break transformations into layers (staging → clean → marts)
- Design for failure recovery. If a step fails, how do you re-run it? From scratch or just that part?
Learn how to backfill without breaking the world.

level 3
Data quality and observability:
- Add tests for nulls, duplicates, and business logic
- Use tools like Great Expectations, Monte Carlo, or built-in dbt tests
- Track lineage so you know what downstream will break if upstream changes
Know the difference between:
- a late-arriving dimension
- a broken SCD2
- and a pipeline silently dropping rows
At this level, you understand that reliability > cleverness.

level 4
Build for scale and maintainability:
- Version control your pipeline configs
- Use feature flags to toggle behavior in prod
- Push vs pull architecture
- Decouple compute and storage (e.g. Iceberg and Delta Lake)
- Data mesh, data contracts, streaming joins, and CDC are words you throw around because you know how and when to use them.

What else belongs in the journey to mastering data pipelines?

Zach Wilson

16,688 views • 1 year ago
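
To make a few of the points in the post above concrete (the extract, transform, load flow from level 0, an idempotent and parameterized run from level 2, and null/duplicate tests from level 3), here is a minimal sketch. Everything specific in it is an assumption for illustration and not from the post: the `orders` table, the sample rows, the function names, and the use of Python's built-in `sqlite3` as a stand-in for a real warehouse.

```python
import sqlite3


def extract(run_date):
    # Hypothetical source rows; a real job would pull from an API, DB, or file.
    return [
        {"order_id": 1, "amount": "10.50", "order_date": run_date},
        {"order_id": 2, "amount": "7.25", "order_date": run_date},
        {"order_id": 2, "amount": "7.25", "order_date": run_date},  # duplicate
        {"order_id": 3, "amount": None, "order_date": run_date},  # null amount
    ]


def transform(rows):
    # Drop rows with a missing amount and keep the first row per order_id.
    seen = set()
    clean = []
    for row in rows:
        if row["amount"] is None or row["order_id"] in seen:
            continue
        seen.add(row["order_id"])
        clean.append((row["order_id"], float(row["amount"]), row["order_date"]))
    return clean


def load(conn, rows, run_date):
    # Idempotent load: replace the whole partition for run_date in one
    # transaction, so re-running the same date never duplicates rows.
    with conn:
        conn.execute("DELETE FROM orders WHERE order_date = ?", (run_date,))
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)


def check_quality(conn):
    # Two simple tests: null amounts and duplicated order_ids.
    nulls = conn.execute(
        "SELECT COUNT(*) FROM orders WHERE amount IS NULL"
    ).fetchone()[0]
    dupes = conn.execute(
        "SELECT COUNT(*) FROM ("
        "SELECT order_id FROM orders GROUP BY order_id HAVING COUNT(*) > 1)"
    ).fetchone()[0]
    return {"null_amounts": nulls, "duplicate_order_ids": dupes}


def run_pipeline(conn, run_date):
    # Parameterized by run_date, which is what makes backfills possible.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER, amount REAL, order_date TEXT)"
    )
    load(conn, transform(extract(run_date)), run_date)
    return check_quality(conn)


conn = sqlite3.connect(":memory:")
run_pipeline(conn, "2024-01-01")
report = run_pipeline(conn, "2024-01-01")  # re-run the same date
row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(row_count, report)
```

The delete-then-insert per date is only one way to get idempotency; a merge/upsert keyed on `order_id` would be another, and which fits better depends on the warehouse and on how late-arriving data should be handled.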