Here's how I would learn data engineering basics in 2025:
- Find a data source you care about (examples: gaming APIs, the stock market, web scraping, etc.)
- Use Python to interact with and ingest your source. Initially, just write the data to a CSV.
- Set up an account with Snowflake...

Zach Wilson

50,936 subscribers

20,368 views • 1 year ago • via X (Twitter)
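The first two steps above (pick a source, ingest it with Python, land it in a CSV) can be sketched in a few lines. This is a minimal sketch, not Zach Wilson's code: it assumes the source has already been fetched and parsed into a list of flat dicts (for example, JSON from an API), and the helper `write_records_to_csv` and the sample ticker data are hypothetical. The Snowflake step is not covered.

```python
import csv
import io
import json


def write_records_to_csv(records, out):
    """Write a list of flat dicts to a CSV file-like object.

    The header is the union of all keys in first-seen order, so records
    with missing fields still line up (missing values are written as "").
    Returns the number of data rows written.
    """
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return len(records)


# Hypothetical payload standing in for a parsed API response.
payload = json.loads(
    '[{"ticker": "ABC", "price": 10.5}, {"ticker": "XYZ", "price": 7.25, "volume": 300}]'
)

buffer = io.StringIO()
rows_written = write_records_to_csv(payload, buffer)
print(buffer.getvalue())
```

For a real file, open it with `open("prices.csv", "w", newline="")` and pass that handle instead of the `StringIO` buffer; the `csv` module documentation recommends `newline=""` so line endings are not doubled on Windows.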

Comments: 5

Nina Taft • 1 year ago

Great advice! I would also encourage folks who are just starting out and setting up their own infra to set up billing alerts to a) avoid unexpected bills and b) get a legit business sense of the price of data manipulation.

prodigal son • 1 year ago

Going to incorporate this with my cloud dev studies to add some level of functionality to my pipelines

Row Skilli-Vitzky • 1 year ago

What about doing all this with Airbyte, Snowflake and dbtFusion for ingestion, storage and orchestration respectively? I’ve been curious to know your thoughts on the future of loading data with Python, especially as these tools get cheaper.

Fran • 1 year ago

Gonna try this

Related videos

Here’s how I would learn data engineering in 2025:
1. The basics:
- Learn SQL: SELECT, FROM, WHERE, GROUP BY, JOIN, HAVING, etc.
- Learn Python: data structures (objects, arrays, tuples, namedtuples) and algorithms (recursion, loops)
2. Intermediate:
- Learn distributed compute: pick up PySpark, Snowflake, or BigQuery
- Learn data lake architecture: pick up Iceberg or Delta Lake
- Learn job orchestration: pick up Airflow or Mage
- Learn data quality: pick up Great Expectations
3. Advanced:
- Learn data modeling techniques: one big table vs. Kimball vs. Inmon vs. data vault
- Learn machine learning features and vector databases: pick up Pinecone and how to fine-tune LLMs with high-quality data
My newsletter has a deeper roadmap here:

Zach Wilson

29,420 views • 1 year ago
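The SQL keywords in step 1 of the roadmap fit together in a single query. The sketch below uses Python's built-in `sqlite3` module and a small, made-up `players`/`matches` schema (both hypothetical, not from the post) to show JOIN, WHERE, GROUP BY and HAVING working together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE players (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE matches (player_id INTEGER, score INTEGER, mode TEXT);
    INSERT INTO players VALUES (1, 'ana'), (2, 'bo'), (3, 'cy');
    INSERT INTO matches VALUES
        (1, 30, 'ranked'), (1, 50, 'ranked'), (1, 99, 'casual'),
        (2, 20, 'ranked'),
        (3, 70, 'ranked'), (3, 20, 'ranked');
    """
)

# Average ranked score per player, keeping only players with 2+ ranked matches.
QUERY = """
SELECT p.name, COUNT(*) AS games, AVG(m.score) AS avg_score
FROM matches AS m
JOIN players AS p ON p.id = m.player_id
WHERE m.mode = 'ranked'          -- filters rows before grouping
GROUP BY p.name
HAVING COUNT(*) >= 2             -- filters groups after aggregation
ORDER BY avg_score DESC
"""
rows = conn.execute(QUERY).fetchall()
print(rows)
```

The split between WHERE and HAVING is the part that usually trips people up: WHERE drops individual rows (here, casual matches) before aggregation, while HAVING drops whole groups (here, players with only one ranked match) after it.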

We just launched a major new Data Engineering Professional Certificate on Coursera! Data underlies all modern AI systems, and engineers who know how to build systems to store and serve it are in high demand. If you're interested in learning this skill, please check out this 4-course sequence, which is designed to make you job-ready as a Data Engineer. This is a new specialization taught by Joe Reis, co-author of the best-selling book “Fundamentals of Data Engineering,” in collaboration with AWS. (Disclosure: I serve on Amazon's board.)
For many AI systems, data engineering is 80% of the work and modeling is 20%, but people’s attention on these two topics is often flipped. This makes the job of the data engineer particularly important. In this professional certificate, you'll learn foundational data engineering skills while implementing modern data architectures using open-source tools:
- Learn the key steps of the data lifecycle: generate, ingest, store, transform, and serve data.
- Learn to align with organizational goals to design the data pipeline right for your business's needs.
- Understand how to make necessary trade-offs between speed, scalability, security, and cost.
Joe has distilled into this specialization decades of experience helping startups and large companies with data infrastructure. He is also joined by 17 other industry leaders in the data field, who will help you learn in-demand skills for the growing field of data engineering. Please sign up here:

Andrew Ng

118,937 views • 1 year ago

Enrollment is now open for the Data Engineering Professional Certificate! Data engineers are the architects of modern organizations, ensuring data is reliable, accessible, and ready for analytics and machine learning. This professional certificate is tailored to equip you with the critical skills, through frameworks and hands-on practice, to excel in this role.
Taught by industry expert Joe Reis, co-author of the best-selling book "Fundamentals of Data Engineering," along with 17 guest instructors from the data field, you will gain expertise to start and further your career in the high-demand field of data engineering.
Key focus areas:
🗂️ Data Engineering Lifecycle: Learn the important stages of building an efficient data pipeline that creates business value.
📥 Data Ingestion: Learn how to efficiently gather data from various sources.
💾 Data Storage: Master the techniques for storing data securely and cost-effectively.
🔄 Data Transformation: Understand how to clean, organize, and prepare data for analysis and machine learning.
🏗️ Data Architecture Design: Build robust architectures that support scalable, efficient data workflows.
📊 Serving Data: Ensure that data is available to stakeholders when and where they need it to drive business decisions.
Enroll now!

DeepLearning.AI

20,833 views • 1 year ago

Import large data sets in seconds! With Appwrite’s new CSV imports feature, you can import documents into your Appwrite collections using a simple CSV file. And the best part, you don’t even need a custom script. Check out this video to learn how it works.

Appwrite

44,969 views • 1 year ago

Today, we’re excited to announce and launch Reflex Cloud on Product Hunt! Reflex is an open-source framework for building and deploying data and AI web apps in pure Python. Frontend and Backend in Pure Python: No JavaScript required! With Reflex Cloud, you can now deploy, manage, and scale your Python apps with just a single command! If you're a Python developer, an upvote or a share would mean a lot to us :)💪❤️

Reflex

55,024 views • 1 year ago

I got a smart meter recently and saw you can download an HDF file with the data, so I had the idea of writing a script that could parse it and show the data in a useful manner. However, I discovered that someone has already done this and done a really good job of it. The video explains how it works. In simple terms, it uses the ESB smart meter data, shows a breakdown of how much electricity you are using and when, and also recommends plans and estimates what each one would cost based on your data. The tool is available at:
It's easier to read on a laptop or tablet, or else turn your phone to landscape.

Carlow Weather

298,521 views • 2 years ago
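The idea described in the post (parse the smart meter export, break usage down by time of day, and price it under different plans) can be sketched in plain Python. This is not the tool the post refers to. The column names `timestamp` and `kwh`, the 23:00 to 08:00 night window, and the per-kWh prices are all hypothetical placeholders; check the actual layout of your HDF export before adapting it.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Hypothetical export: one row per interval, ISO timestamp plus kWh used.
# A real ESB HDF file has its own columns; map them before using this.
SAMPLE = """timestamp,kwh
2024-01-01T01:00:00,0.5
2024-01-01T01:30:00,0.5
2024-01-01T18:00:00,1.2
2024-01-01T18:30:00,0.8
"""

# Hypothetical plans: (day rate, night rate) in currency units per kWh.
PLANS = {"flat": (0.30, 0.30), "day/night": (0.35, 0.18)}


def usage_by_hour(text):
    """Sum kWh per hour of day (0-23) from CSV text."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(text)):
        hour = datetime.fromisoformat(row["timestamp"]).hour
        totals[hour] += float(row["kwh"])
    return dict(totals)


def is_night(hour):
    """Assumed night window: 23:00 to 08:00."""
    return hour >= 23 or hour < 8


def estimate_costs(hourly, plans):
    """Price the same hourly usage under each plan, rounded to 2 decimals."""
    costs = {}
    for name, (day_rate, night_rate) in plans.items():
        total = sum(
            kwh * (night_rate if is_night(hour) else day_rate)
            for hour, kwh in hourly.items()
        )
        costs[name] = round(total, 2)
    return costs


hourly = usage_by_hour(SAMPLE)
costs = estimate_costs(hourly, PLANS)
cheapest = min(costs, key=costs.get)
print(hourly, costs, cheapest)
```

Aggregating to hour of day first keeps the plan comparison cheap: each plan is priced against at most 24 totals rather than every half-hourly reading.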

There is a game called "Data Center" on Steam which lets you build and manage your own data center. This is lowkey genius, the best way to educate people on a new trade. Hyperscalers should learn a thing or two from "edutainment".

P.M

7,333,046 views • 5 months ago

I made an AI data science team with Python. I'm excited to share my new project, the AI Data Science Team, a Python package for creating AI co-pilots and agents to assist in common data science tasks. I walk you through the GitHub page, showcasing the progress and potential of this project. I'll explain how these agents can streamline tasks like code generation and modeling pipeline creation, making data science projects more efficient.
- Introduction to AI Data Science Team Project
- Integrating Human-in-the-Loop
- Example: Accessing Agents for Feature Engineering
- Setting up Feature Engineering Agent
Code on GitHub:
Python AI-Tips Newsletter (stay updated as I build this project in the open):

Matt Dancho (Business Science)

11,799 views • 1 year ago

I'm excited to introduce my FREE AI Pandas Data Analyst Copilot, which created a data analysis report with dozens of charts from my questions in under 30 seconds. Today, I'll share with you how to automate data analysis with my Pandas AI Agent + Copilot, which is available on GitHub. I'll guide you through setting up the Copilot app, creating dozens of data analysis charts from any CSV or Excel file, and interacting with your data live. This AI is a BIG help!
Table of Contents:
00:00 Introduction to the App
02:24 Setting Up the App
04:25 Using the App
08:58 Understanding the Python Code
GitHub to AI Data Science Team (app is in the apps folder):
Get the Code and Future Updates by Joining my Python AI/ML Tips Newsletter:
===
Want to learn how to build AI projects companies actually want? (live Python Code) On Wednesday, April 9th, I'm sharing one of my best AI Projects: Time Series Forecasting with AI
Register here (500 Seats):

Matt Dancho (Business Science)

12,736 views • 1 year ago

How to Create a Spinning Globe 🌍 with Your Own Data Layers (No Coding Required)
Learn how to create a spinning 3D globe map without writing any code using a simple web-based tool. In this step-by-step geospatial tutorial, you'll see how to build an interactive spinning globe, overlay raster and vector data layers, and visualize datasets directly in your browser or in a Jupyter Notebook environment.
Web-based demo:
Jupyter notebook demo:
Video tutorial:
#geospatial #opensource #python #jupyter #maplibre

Qiusheng Wu

12,704 views • 4 months ago

Making Data and AI Lovable. Lovable now integrates with Databricks, providing a natural language interface that allows anyone, regardless of technical skills, to build live data apps that can read and write data stored in Databricks. Bridge the gap between complex data engineering and beautiful, functional front-ends.

Databricks

34,315 views • 3 months ago

New JavaScript short course: Build a full-stack web application that uses RAG in JavaScript
RAG Web Apps with LlamaIndex, taught by Laurie Voss, VP of Developer Relations at LlamaIndex 🦙 and npm co-founder.
- Build a RAG application for querying your own data
- Develop tools to interact with multiple data sources using an agent that intelligently selects the right tool for your queries
- Create a full-stack web app that can chat with your data
- Dig further into production-ready techniques, like how to persist your data so you aren’t constantly reindexing, and try the create-llama command line tool from LlamaIndex
You can sign up here:

Andrew Ng

218,284 views • 2 years ago

Is your JSON data getting hung up with trailing commas or incorrect data types? Just ask GitHub Copilot Chat what’s wrong and how to fix it 🛠️ Learn more in the Copilot Chat Cookbook.

GitHub

29,212 views • 1 year ago
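For the trailing-comma case, Python's standard `json` module shows both the failure and one possible fix. This is a plain-Python sketch, not what Copilot Chat produces; the sample document and the expected-types mapping are hypothetical, and the regex repair is deliberately naive.

```python
import json
import re

# Hypothetical input with two trailing commas.
broken = '{"name": "sensor-1", "readings": [1, 2, 3,], "active": true,}'

error_message = None
try:
    json.loads(broken)
except json.JSONDecodeError as err:
    # msg, lineno and colno describe where the parser gave up.
    error_message = f"{err.msg} (line {err.lineno}, column {err.colno})"
    print("invalid JSON:", error_message)

# Naive repair: remove a comma followed only by whitespace and ] or }.
# Caveat: the regex does not understand string literals, so it would also
# change a value such as "a,]" inside quotes.
TRAILING_COMMA = re.compile(r",\s*([\]}])")
fixed = TRAILING_COMMA.sub(r"\1", broken)
data = json.loads(fixed)

# Simple check for the "incorrect data types" case (hypothetical schema).
EXPECTED_TYPES = {"name": str, "readings": list, "active": bool}
type_errors = [
    key for key, expected in EXPECTED_TYPES.items()
    if not isinstance(data.get(key), expected)
]
print(data, type_errors)
```

If the input may contain commas and brackets inside string values, a tolerant parser or a proper schema validator is a safer choice than the regex.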

Your agents can't keep up with real-time data, especially when it's scattered across dozens of sources. Most teams waste weeks building custom connectors for every database, API, and data warehouse. Then they build ETL pipelines to sync everything. By the time your agent retrieves the data, it's already outdated.
Picture this: your Postgres database updated 5 minutes ago. Your MongoDB collection changed 2 minutes ago. Your agent is still pulling from yesterday's snapshot. This is why most production RAG systems fail.
There's a better approach: MindsDB is an open-source AI platform with a federated data engine that lets you query multiple data sources in real time using SQL, without moving any data.
Here's what makes it different:
↳ Your data stays in place. No ETL pipelines or data duplication
↳ Query Postgres, MongoDB, REST APIs, and more using consistent SQL
↳ JOIN across different sources in real time with a unified interface
↳ Works with both structured and unstructured data
And here's the best part: you don't even need to write SQL. Just describe what you want in plain English, and MindsDB converts it to SQL automatically. The system does all the heavy lifting.
The breakthrough for AI agents is simple: when data updates at the source, your agent gets fresh results immediately. No sync delays. No stale embeddings. No custom code for each integration.
You can literally write a SQL query that joins a Postgres table with a MongoDB collection and gets live results. This is what production AI applications need but rarely get.
In this video, I give you a complete walkthrough of what we just discussed and how to actually do it. Make sure you watch this till the end. I've shared the link to MindsDB's GitHub repo in the next tweet!

Akshay 🚀

65,672 views • 8 months ago
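The core idea of a federated query, one SQL statement joining data that lives in separate stores without copying it first, has a small local analogue in SQLite's `ATTACH DATABASE`. The sketch below is only that analogy: it does not use MindsDB, Postgres, or MongoDB, and the `orders`/`customers` tables are hypothetical. It also shows that a new row in one source is visible to the next join immediately, because nothing was duplicated.

```python
import sqlite3

# Two separate in-memory databases stand in for two independent sources.
conn = sqlite3.connect(":memory:")                  # "main": orders data
conn.execute("ATTACH DATABASE ':memory:' AS crm")   # "crm": customer data

conn.executescript(
    """
    CREATE TABLE main.orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT, tier TEXT);
    INSERT INTO main.orders VALUES (1, 10, 25.0), (2, 10, 15.0), (3, 11, 30.0);
    INSERT INTO crm.customers VALUES (10, 'acme', 'gold'), (11, 'globex', 'silver');
    """
)

# One SQL statement joins tables from both databases at query time.
QUERY = """
SELECT c.name, c.tier, SUM(o.amount) AS total
FROM main.orders AS o
JOIN crm.customers AS c ON c.id = o.customer_id
GROUP BY c.id, c.name, c.tier
ORDER BY total DESC
"""
before = conn.execute(QUERY).fetchall()

# A change at the "source" shows up in the very next join; nothing was copied.
conn.execute("INSERT INTO main.orders VALUES (4, 11, 20.0)")
after = conn.execute(QUERY).fetchall()
print(before, after)
```

A real federated engine does much more (pushing filters down to each source, translating SQL for non-SQL stores), but the user-facing shape is the same: schema-qualified names from different sources in one JOIN.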

Knowledge graphs for representing information are unbeatable. After this, you will never build a RAG system without knowledge graphs.
It will take you five lines of code to build a knowledge graph with your data. I recorded a video to show you how you can do this.
I used Cognee, an open-source library that outperforms any basic vector search approach in terms of retrieval relevance. They are collaborating with me on this post.
Cognee is:
• Easy to use
• Reduces hallucinations
• Open-source
Here is a link to the repository:
They also offer a comprehensive platform and UI with Python notebooks you can utilize to manage your data. Here is the link:

Santiago

125,928 views • 10 months ago

Urban planners spend hours wrangling GIS data for stakeholder presentations, analysis, and reviews. Unify your city planning data in Google Earth with new SHP imports. Securely combine zoning data, environmental constraints, and property boundaries as performant, cloud-native data layers to create a single source of truth for your entire team. Upload your SHP files to Google Earth today.

Google Earth

20,526 views • 2 months ago

Imitation learning is great, but needs us to have (near) optimal data. We throw away most other data (failures, evaluation data, suboptimal data, undirected play data), even though this data can be really useful and way cheaper!
In our new work, RISE, we show a simple way to *use all of this non-optimal data to robustify imitation learning* with minimal requirements beyond BC.
Key idea: use non-expert data to learn how to *recover* back to expert data with a minimal-frills offline RL method that works under sparse data coverage. This allows usage of *all* available data, not just expert data. Never throw your data away!
Paper:
Website:
A 🧵 (1/10)

Abhishek Gupta

20,612 views • 8 months ago

Major program launch: Data Analytics Professional Certificate! This large, five-course sequence takes you all the way to being job-ready as a data analyst, and shows how to use Generative AI as a thought partner to enhance your work in this role. Offered by on Coursera, this is taught by Sean Barnes, Ph.D., a Data Science & Engineering Leader at Netflix.
Analyzing data remains one of the most important skills given where the world is going with AI. This comprehensive certificate takes you all the way to being job-ready. Each course comes with practical projects demonstrated in real-world contexts, such as analyzing sales data for a Korean bakery, video game sales trends across different regions, or identifying factors impacting customer retention for a communications company. You'll also work on estimating fire distribution for forest fire prevention, analyzing how a diamond's properties affect its market value, and developing predictive models for retail sales analysis, carbon emissions, and coral reef conservation.
Here's some of what you'll learn:
- How to define data and categorize it into its many types, such as discrete & continuous numerical, structured & unstructured, time series, and categorical, and what insights can be derived from each type.
- How to differentiate between data-related job roles and their responsibilities, and how data flows through an organization from the moment of capture to decision-making.
- How to perform data processing functions and apply conditional formatting in spreadsheets to extract business value from your data, using statistical calculations and best practices for visualizing and interpreting data.
- How to use LLMs for stakeholder analysis, data exploration, and data visualization.
- Best practices for using LLMs as a thought partner in data analysis work.
By the end of this professional certificate program, you will have learned core statistical concepts, analysis techniques, and visualization methodologies that will serve as the foundation for working as a data analyst.
The world needs more data analysts, especially ones who know how to use modern generative AI. With data science roles projected to grow 36% by 2033, the skills taught in this program create new professional opportunities in data. Sign up here!

Andrew Ng

84,686 views • 1 year ago