Here’s how I would learn data engineering in 2025:

1. The basics
- Learn SQL: SELECT, FROM, WHERE, GROUP BY, JOIN, HAVING, etc.
- Learn Python (data structures: objects, arrays, tuples, namedtuples; algorithms: recursion, loops)

2. Intermediate
- Learn distributed compute: pick up PySpark, Snowflake, or BigQuery
- Learn data lake architecture: pick up Iceberg or Delta Lake
- Learn job orchestration: pick up Airflow or Mage
- Learn data quality: pick up Great Expectations

3. Advanced
- Learn the data modeling techniques: One Big Table vs. Kimball vs. Inmon vs. Data Vault
- Learn machine learning features and vector databases: pick up Pinecone and how to fine-tune LLMs with high-quality data

My newsletter has a deeper roadmap here:

Zach Wilson

50,747 subscribers

29,164 views • 11 months ago • via X (Twitter)
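
A minimal sketch of the "basics" step in the roadmap above, combining the listed SQL clauses (SELECT, FROM, WHERE, GROUP BY, JOIN, HAVING) with a Python namedtuple. It uses the standard-library sqlite3 module so it runs without a warehouse account. The tables, columns, and sample rows are hypothetical and only there to make the query executable.

```python
import sqlite3
from collections import namedtuple

# Hypothetical result shape; the field names are illustrative only.
CustomerTotal = namedtuple("CustomerTotal", ["name", "total"])


def top_customers(min_total):
    """Return paid-order totals per customer, keeping totals >= min_total."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical sample schema and data.
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (customer_id INTEGER, amount REAL, status TEXT);
        INSERT INTO customers VALUES (1, 'ada'), (2, 'bob'), (3, 'cy');
        INSERT INTO orders VALUES
            (1, 40.0, 'paid'), (1, 70.0, 'paid'),
            (2, 15.0, 'paid'), (2, 500.0, 'refunded'),
            (3, 90.0, 'paid');
        """
    )
    rows = conn.execute(
        """
        SELECT c.name, SUM(o.amount) AS total      -- aggregate per group
        FROM orders AS o
        JOIN customers AS c ON c.id = o.customer_id
        WHERE o.status = 'paid'                    -- row filter, before grouping
        GROUP BY c.name
        HAVING SUM(o.amount) >= ?                  -- group filter, after grouping
        ORDER BY total DESC
        """,
        (min_total,),
    ).fetchall()
    conn.close()
    return [CustomerTotal(name, total) for name, total in rows]


if __name__ == "__main__":
    for row in top_customers(50):
        print(row.name, row.total)
```

WHERE drops the refunded order before grouping, while HAVING drops whole groups afterwards, which is the distinction that most often trips up beginners.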

9 Comments

vingança da rainha anne • 11 months ago
Roadmap

Ayobami.Ola • 11 months ago
Roadmap

W0lfgeng • 11 months ago
Roadmap

Martin Shein • 11 months ago
Roadmap

asa of tech • 11 months ago
Roadmap

4Eyed Ìfẹ́luyì Asojú • 11 months ago
Roadmap

esp • 11 months ago
Roadmap

Azningnam • 11 months ago
Roadmap

Kshiteej Pitta • 11 months ago
Roadmap

Related Videos

Here's how I would learn data engineering basics in 2025:
- Find a data source you care about (examples: gaming APIs, stock market, web scraping, etc.)
- Use Python to interact with and ingest your source. Initially, just write the data to a CSV.
- Set up an account with Snowflake or Google BigQuery.
- Update your Python script to load a table in Snowflake/BigQuery.
- Schedule your script with cron in the cloud with a service like Heroku.
- Build aggregations and visualizations on top of your ingested data.

The only thing this misses is data quality and complex job orchestration, which you can learn later!

How would you learn data engineering nowadays?

Zach Wilson

20,363 views • 11 months ago

Is your JSON data getting hung up with trailing commas or incorrect data types? Just ask GitHub Copilot Chat what’s wrong and how to fix it 🛠️ Learn more in the Copilot Chat Cookbook.

GitHub

29,212 views • 1 year ago
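
The two problems named in the post above, trailing commas and incorrect data types, can also be caught without an assistant. The sketch below uses Python's standard json module. The `check_payload` helper and its `expected_types` mapping are hypothetical names introduced for illustration and are not part of any GitHub tooling.

```python
import json


def check_payload(text, expected_types):
    """Return a list of problems found in a JSON object given as text.

    expected_types maps a key to the Python type its value should have.
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        # Standard JSON does not allow trailing commas, so they end up here.
        return [f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"]

    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    problems = []
    for key, expected in expected_types.items():
        if key not in data:
            problems.append(f"missing key: {key}")
        elif not isinstance(data[key], expected):
            actual = type(data[key]).__name__
            problems.append(f"{key}: expected {expected.__name__}, got {actual}")
    return problems


if __name__ == "__main__":
    print(check_payload('{"id": 1,}', {"id": int}))    # trailing comma
    print(check_payload('{"id": "1"}', {"id": int}))   # wrong type
```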

Effective Table Data Extraction from PDF without LLM

Sparrow Parse helps to read tabular data from PDFs, relying on various libraries, such as Unstructured or PyMuPDF4LLM. This allows us to avoid data hallucination errors often produced by LLMs when processing complex data structures. Learn more: ✅ ✅ Katana

Andrej Baranovskij

27,886 views • 2 years ago

Sharing our latest short course: Building and Evaluating Data Agents, created in collaboration with Snowflake and taught by Anupam Datta and Josh Reini. A data agent extracts data from sources such as files or databases, analyzes it, provides insights, and visualizes its findings. But most data agents struggle with reliability or can't handle multi-step reasoning. In this course, you'll learn to build, trace, and evaluate a multi-agent workflow that plans tasks, pulls context from structured and unstructured data, performs web search, and summarizes or visualizes the final results. Learn more and enroll for free!

DeepLearning.AI

40,745 views • 8 months ago

We just launched a major new Data Engineering Professional Certificate on Coursera! Data underlies all modern AI systems, and engineers who know how to build systems to store and serve it are in high demand. If you're interested in learning this skill, please check out this 4-course sequence, which is designed to make you job-ready to be a Data Engineer.

This is a new specialization taught by Joe Reis, the co-author of the best-selling book “Fundamentals of Data Engineering,” in collaboration with AWS. (Disclosure: I serve on Amazon's board.)

For many AI systems, data engineering is 80% of the work, and modeling is 20%. But people’s attention on these two topics is often flipped. This makes the job of the data engineer particularly important.

In this professional certificate, you'll learn foundational data engineering skills while implementing modern data architectures using open-source tools:
- Learn the key steps of the data lifecycle, to generate, ingest, store, transform, and serve data.
- Learn to align with organizational goals to design the data pipeline right for your business' needs.
- Understand how to make necessary trade-offs between speed, scalability, security, and cost.

Joe has distilled into this specialization decades of experience helping startups and large companies with data infrastructure. He is also joined by 17 other industry leaders in the data field, who will help you learn in-demand skills for the growing field of data engineering. Please sign up here:

Andrew Ng

118,937 views • 1 year ago

Pick up the phone, you could learn something new

Niamocha🐻✨

67,907 views • 1 year ago

Shut the fuck up, watch, learn and maybe you might pick up a thing or two. 145kilos x 4 at 87kilos bw.

blaakbaki

351,749 views • 2 months ago

Announcing The Open Data Market by Stork, where onchain innovation is no longer suppressed by access to data. P.S. Learn how to participate below 👀

Stork

45,039 views • 1 year ago

Someone made the most addictive game to learn how real data centers work.

mitsuri

20,294 views • 2 months ago

Starting today, the FBI will publish reported crime data to the Crime Data Explorer every month. See the newest data and learn how this timelier data can help law enforcement crush violent crime:

FBI

22,485 views • 9 months ago

Learn to train an LLM with distributed data while ensuring privacy using federated learning in a new two-part short course, Intro to Federated Learning and Federated Fine-tuning of LLMs with Private Data, created with Flower and taught by Daniel J. Beutel and nic lane.

Federated learning allows a single model to be trained across multiple devices, such as phones, or multiple organizations, such as hospitals, without the need to share data to a central server. This two-part course gives you an introduction to federated learning, and then teaches you how to fine-tune your large language model with distributed data using Flower Lab’s open source federated learning framework.

You’ll learn:
- How to use federated learning to train a variety of models, ranging from speech and vision models to LLMs, across distributed data while offering data privacy options to users and organizations.
- Privacy Enhancing Technologies like differential privacy (DP), which obscures individual data by adding calibrated noise to query results.
- Two variants of differential privacy, Central and Local, and how to choose depending on your use case.
- How to measure and decrease bandwidth usage to make federated learning more practical and efficient with techniques like using pre-trained models and Parameter-Efficient Fine-Tuning.
- How federated LLM fine-tuning reduces the risk of leaking training data.

Sign up here!

Andrew Ng

64,538 views • 1 year ago
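
To make "adding calibrated noise to query results" from the post above concrete, here is a minimal sketch of the Laplace mechanism for a counting query in the central setting, where a trusted aggregator adds the noise. This is a textbook illustration and not code from the course or from Flower. The function names and the sample data are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise.

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon, rng):
    """Return a noisy count of the values matching predicate.

    A counting query has sensitivity 1 (adding or removing one person
    changes the count by at most 1), so the noise scale is 1 / epsilon.
    Smaller epsilon means more noise and stronger privacy.
    """
    true_count = sum(1 for v in values if predicate(v))
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    ages = [23, 35, 41, 52, 67]  # hypothetical records
    print(private_count(ages, lambda age: age > 40, epsilon=0.5, rng=rng))
```

Each call returns a different noisy answer. Averaged over many calls the noise cancels out, which is why repeated queries consume additional privacy budget in practice.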

🚀 Framer Forms are here! Enjoy 10+ input types, custom states, and secure data handling. Send data via email, Google Sheets, or custom Webhook. Learn more below.

Framer

100,853 views • 1 year ago

Learn why oracles need low latency data 🔮

Pyth Network 🔮

24,981 views • 2 years ago

Announcing the expansion of Microsoft Sentinel with a modern data lake and dynamic threat intel capabilities. Learn how it: 💰Optimizes costs 📈Simplifies data management ⚡Accelerates AI adoption

Microsoft Security

28,901 views • 10 months ago

Dear Data Analyst, learn how to display percentages on a column chart in Power BI 🙉

DAX formula used:

%MaritalStatus =
DIVIDE(
    COUNTROWS('customer data'),
    CALCULATE(
        COUNTROWS('customer data'),
        ALL('customer data'[Marital_Status])
    )
)

Zehida

22,557 views • 1 year ago
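
The DAX measure above divides the row count for the current marital status by the row count with the marital-status filter removed. The same arithmetic can be sketched in plain Python to check the numbers you expect on the chart. The `share_by_category` name and the sample values are hypothetical, and this sketch ignores any other filters a real report might apply.

```python
from collections import Counter


def share_by_category(values):
    """Return each category's share of all rows, as a fraction of 1."""
    counts = Counter(values)          # numerator: rows per category
    total = sum(counts.values())      # denominator: rows across all categories
    if total == 0:
        return {}                     # nothing to divide by when there are no rows
    return {category: count / total for category, count in counts.items()}


if __name__ == "__main__":
    marital_status = ["Married", "Single", "Married", "Divorced"]  # hypothetical rows
    for category, share in share_by_category(marital_status).items():
        print(f"{category}: {share:.0%}")
```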

DATA AVAILABILITY SAMPLING EXPLAINED! Learn about:
- Modular blockchains
- Probabilistic sampling techniques
- Erasure encoding

Full vid below👇

gogoDiego

93,106 views • 2 years ago

AI Czar David Sacks says American companies will learn efficiency techniques from China's DeepSeek AI model, but big AI data centers are still needed and scaling the biggest data centers is still an advantage

Tsarathustra

360,750 views • 1 year ago

Building Data Pipelines has levels to it:

Level 0: Understand the basic flow
Extract → Transform → Load (ETL) or ELT. This is the foundation.
- Extract: Pull data from sources (APIs, DBs, files)
- Transform: Clean, filter, join, or enrich the data
- Load: Store into a warehouse or lake for analysis
You’re not a data engineer until you’ve scheduled a job to pull CSVs off an SFTP server at 3AM!

Level 1: Master the tools
- Airflow for orchestration
- dbt for transformations
- Spark or PySpark for big data
- Snowflake, BigQuery, Redshift for warehouses
- Kafka or Kinesis for streaming
Understand when to batch vs stream. Most companies think they need real-time data. They usually don’t.

Level 2: Handle complexity with modular design
- DAGs should be atomic, idempotent, and parameterized
- Use task dependencies and sensors wisely
- Break transformations into layers (staging → clean → marts)
- Design for failure recovery. If a step fails, how do you re-run it? From scratch or just that part?
Learn how to backfill without breaking the world.

Level 3: Data quality and observability
- Add tests for nulls, duplicates, and business logic
- Use tools like Great Expectations, Monte Carlo, or built-in dbt tests
- Track lineage so you know what downstream will break if upstream changes
Know the difference between:
- a late-arriving dimension
- a broken SCD2
- and a pipeline silently dropping rows
At this level, you understand that reliability > cleverness.

Level 4: Build for scale and maintainability
- Version control your pipeline configs
- Use feature flags to toggle behavior in prod
- Push vs pull architecture
- Decouple compute and storage (e.g. Iceberg and Delta Lake)
- Data mesh, data contracts, streaming joins, and CDC are words you throw around because you know how and when to use them.

What else belongs in the journey to mastering data pipelines?

Zach Wilson

16,688 views • 1 year ago
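
Two ideas from the levels above, an idempotent and parameterized load step (level 2) and tests for nulls and duplicates (level 3), can be sketched with the standard-library sqlite3 module. This is one common pattern (delete the partition, then insert it, in one transaction), not the only way to get idempotency. The `daily_events` table, its columns, and the function names are hypothetical.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical target table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE daily_events (event_date TEXT, user_id INTEGER, clicks INTEGER)"
    )
    return conn


def load_partition(conn, run_date, rows):
    """Replace the partition for run_date with rows of (user_id, clicks).

    Re-running for the same run_date gives the same final state, so a
    retry or a backfill cannot double-count.
    """
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("DELETE FROM daily_events WHERE event_date = ?", (run_date,))
        conn.executemany(
            "INSERT INTO daily_events (event_date, user_id, clicks) VALUES (?, ?, ?)",
            [(run_date, user_id, clicks) for user_id, clicks in rows],
        )


def quality_problems(conn, run_date):
    """Return human-readable problems found in one partition."""
    problems = []
    nulls = conn.execute(
        "SELECT COUNT(*) FROM daily_events WHERE event_date = ? AND user_id IS NULL",
        (run_date,),
    ).fetchone()[0]
    if nulls:
        problems.append(f"{nulls} row(s) with NULL user_id")
    dupes = conn.execute(
        """
        SELECT COUNT(*) FROM (
            SELECT user_id FROM daily_events
            WHERE event_date = ? AND user_id IS NOT NULL
            GROUP BY user_id HAVING COUNT(*) > 1
        )
        """,
        (run_date,),
    ).fetchone()[0]
    if dupes:
        problems.append(f"{dupes} user_id value(s) duplicated")
    return problems


if __name__ == "__main__":
    conn = make_db()
    load_partition(conn, "2025-01-01", [(1, 3), (2, 5)])
    load_partition(conn, "2025-01-01", [(1, 3), (2, 5)])  # safe re-run
    print(conn.execute("SELECT COUNT(*) FROM daily_events").fetchone()[0])
    print(quality_problems(conn, "2025-01-01"))
```

Because the run date is a parameter, a backfill is just a loop over dates that calls `load_partition` for each one.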

Make data work smarter. 💡💪 Storage Browser for #AmazonS3 in Salesforce Data Cloud gives teams direct access to customer data, right where they need it. Learn more ➡️

Amazon Web Services

13,077 views • 1 year ago