Submodular optimization for token/sentence selection from long contexts. Here's an interesting experiment: we first used jina-embeddings-v4's multi-vector feature to extract token-level embeddings from a passage, then applied submodular optimization to cherry-pick the tokens that provide the best coverage, and finally called the tokenizer to convert the selections back to the strings at their...
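The selection step described in the post can be sketched roughly as follows. This is a minimal illustration, not the code from the post: it assumes a facility-location objective (the post does not say which submodular function was used), cosine similarity clipped at zero, and a plain greedy maximizer. The name `greedy_facility_location` and the toy embeddings are hypothetical; in the actual experiment the rows would be token-level vectors from jina-embeddings-v4.

```python
import numpy as np

def greedy_facility_location(embeddings, k):
    """Greedily pick up to k rows maximizing f(S) = sum_i max_{j in S} sim(i, j)."""
    # Cosine similarity, clipped at zero (an assumption) so that f is
    # monotone submodular and starts from f(empty set) = 0.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sim = np.clip(unit @ unit.T, 0.0, None)

    n = sim.shape[0]
    best_cover = np.zeros(n)            # how well each token is covered so far
    chosen = np.zeros(n, dtype=bool)    # mask of already selected tokens
    for _ in range(min(k, n)):
        # Marginal gain of candidate j: sum_i max(sim[i, j] - best_cover[i], 0)
        gains = np.maximum(sim - best_cover[:, None], 0.0).sum(axis=0)
        gains[chosen] = -np.inf
        j = int(np.argmax(gains))
        chosen[j] = True
        best_cover = np.maximum(best_cover, sim[:, j])
    # Return indices in passage order so they can be mapped back to strings.
    return [int(i) for i in np.flatnonzero(chosen)]

# Toy stand-in for token-level embeddings (hypothetical values).
tokens = ["late", "chunk", "##ing", "works"]
emb = np.array([[1.0, 0.1], [0.2, 1.0], [0.3, 0.9], [0.9, 0.3]])
picked = greedy_facility_location(emb, k=2)
print([tokens[i] for i in picked])
```

Greedy selection is a common choice here because, for monotone submodular objectives, it carries a (1 - 1/e) approximation guarantee while only needing the pairwise similarity matrix.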

Jina AI

17,314 subscribers

13,177 views • 1 year ago • via X (Twitter)

3 Comments

Jina AI • 1 year ago

Try it on Google Colab's L4 GPU for free: This could be an interesting approach for extracting information from long documents, saving tokens for LLMs, etc. Check out our recent blog posts and learn more about submodular optimization.

Richard Collins, The Internet Foundation • 1 year ago

Can you scale to replace Google? Put your AI on it and put in the numbers. Back of the envelope or "in a spreadsheet" is better than "in your head somewhere as an idea only". It might be easier than you think now. If the whole Internet is coded as it goes in, not scraped and indexed and tokenized later - completely separated from the authors, without their permission or help. Check my writing on "global open tokens" where all tokens are linked to the real things in the world - not arbitrary strings of characters in one language. Using universal (global) tokens means "the sun", "the earth", "water" and those are independent of human language so ties things together. Yes, choose the things that matter, keep it lean and sufficient and sustainable, not shotgun or brute force, only for people with big computers. For all humans, not just a few. Richard Collins, The Internet Foundation

Franck Lebeau • 1 year ago

Interesting how "Late chunking" is condensed into "lateing" (tokens "late" + "##ing"). As I understand it, this means that the semantics of "chunking" (tokens "chunk" + "##ing") are mainly carried by the contextualized embedding of the "##ing" token.
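A small sketch of how dropping a subword token can produce a string like "lateing" when the selection is converted back to text. It assumes the "##" continuation-prefix notation used in the comment above (the WordPiece convention); the model's actual tokenizer may mark word boundaries differently, and `join_wordpieces` is a hypothetical helper, not a library function.

```python
def join_wordpieces(tokens):
    """Merge tokens into words: a '##' prefix glues a piece onto the previous word."""
    words = []
    for tok in tokens:
        if tok.startswith("##"):
            piece = tok[2:]
            if words:
                words[-1] += piece    # continuation: attach to the previous word
            else:
                words.append(piece)   # no previous word to attach to
        else:
            words.append(tok)
    return " ".join(words)

tokens = ["late", "chunk", "##ing"]
selected = [0, 2]   # hypothetical selection that keeps "late" and "##ing" only

print(join_wordpieces(tokens))                         # late chunking
print(join_wordpieces([tokens[i] for i in selected]))  # lateing
```

Because "##ing" attaches to whatever word precedes it in the selection, removing "chunk" fuses the suffix onto "late".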
