Submodular optimization for token/sentence selection from long contexts. Here's an interesting experiment: we first used jina-embeddings-v4's multi-vector feature to extract token-level embeddings from a passage, then applied submodular optimization to cherry-pick the tokens that provide the best coverage, and finally called the tokenizer to convert the selections back to strings at their...
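The selection step described in the post can be sketched roughly as follows. This is a minimal illustration, not the code from the video: it assumes a facility-location objective (the post does not say which submodular function was used), it uses hand-made 2-D toy vectors in place of real jina-embeddings-v4 multi-vector output, and the names `cosine`, `greedy_facility_location`, `tokens`, and `vectors` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def greedy_facility_location(vectors, k):
    """Greedily pick up to k indices that maximize the (assumed) coverage
    objective f(S) = sum_i max_{j in S} sim(i, j)."""
    n = len(vectors)
    # Clip negative similarities to 0 so the objective stays monotone.
    sim = [[max(cosine(vectors[i], vectors[j]), 0.0) for j in range(n)]
           for i in range(n)]
    best = [0.0] * n  # best[i]: how well item i is covered by the current selection
    selected = []
    for _ in range(min(k, n)):
        gains = {}
        for j in range(n):
            if j in selected:
                continue
            # Marginal gain of adding j: total improvement in coverage.
            gains[j] = sum(max(sim[i][j] - best[i], 0.0) for i in range(n))
        j_star = max(gains, key=gains.get)
        selected.append(j_star)
        best = [max(best[i], sim[i][j_star]) for i in range(n)]
    return selected


# Toy stand-ins for token strings and their token-level embeddings (assumption:
# real vectors would come from the embedding model's multi-vector output).
tokens = ["late", "chunking", "embeds", "chunks", "after", "encoding"]
vectors = [
    [1.0, 0.0], [0.9, 0.1], [0.0, 1.0],
    [0.1, 0.9], [0.95, 0.05], [0.05, 0.95],
]

selected = greedy_facility_location(vectors, k=2)
# Restore the original order before mapping indices back to strings.
kept = [tokens[i] for i in sorted(selected)]
print(kept)
```

With two clusters of near-duplicate vectors, the greedy loop picks one representative from each cluster, which is the "best coverage" behavior the post refers to. For a monotone submodular objective like this one, greedy selection carries the standard (1 - 1/e) approximation guarantee.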

Jina AI

17,297 subscribers

13,173 views • 11 months ago • via X (Twitter)

3 comments

Jina AI • 11 months ago

Try it on Google Colab's L4 GPU for free: This could be an interesting approach for extracting information from long documents, saving tokens for LLMs, etc. Check out our recent blog posts and learn more about submodular optimization.

Richard Collins, The Internet Foundation • 11 months ago

Can you scale to replace Google? Put your AI on it and put in the numbers. Back of the envelope or "in a spreadsheet" is better than "in your head somewhere as an idea only". It might be easier than you think now. If the whole Internet is coded as it goes in, not scraped and indexed and tokenized later - completely separated from the authors, without their permission or help. Check my writing on "global open tokens" where all tokens are linked to the real things in the world - not arbitrary strings of characters in one language. Using universal (global) tokens means "the sun", "the earth", "water" and those are independent of human language so ties things together. Yes, choose the things that matter, keep it lean and sufficient and sustainable, not shotgun or brute force, only for people with big computers. For all humans, not just a few. Richard Collins, The Internet Foundation

Franck Lebeau • 11 months ago

Interesting how "Late chunking" is condensed into "lateing" (tokens "late" + "##ing"). As I understand it, this means that the semantics of "chunking" (tokens "chunk" + "##ing") is mainly carried by the contextualized embedding of the "##ing" token.
