[Graph Convolutional Network] by hand ✍️

Graph Convolutional Networks (GCNs), introduced by Thomas Kipf and Max Welling in 2017, have emerged as a powerful tool for analyzing and interpreting graph-structured data. This exercise demonstrates how a GCN works in a simple application: binary classification.

-- Goal --
Predict whether a node in a graph is X.

-- Architecture --
🟪 Graph Convolutional Network (GCN)
1. GCN1(4,3)
2. GCN2(3,3)
🟦 Fully Connected Network (FCN)
1. Linear1(3,5)
2. ReLU
3. Linear2(5,1)
4. Sigmoid

Simplifications:
• Adjacency matrices are not normalized.
• ReLU is applied to messages directly.

-- Walkthrough --
[1] Given
↳ A graph with five nodes A, B, C, D, E
[2] 🟩 Adjacency Matrix: Neighbors
↳ Add 1 for each edge to a neighbor
↳ Repeat in both directions (e.g., A->C, C->A)
↳ Repeat for both GCN layers
[3] 🟩 Adjacency Matrix: Self
↳ Add 1's for each self loop
↳ Equivalent to adding the identity matrix
↳ Repeat for both GCN layers
[4] 🟪 GCN1: Messages
↳ Multiply the node embeddings 🟨 with weights and biases
↳ Apply ReLU (negatives → 0)
↳ The result is one message per node
[5] 🟪 GCN1: Pooling
↳ Multiply the messages with the adjacency matrix
↳ The purpose is to pool messages from each node's neighbors as well as from the node itself.
↳ The result is a new feature per node
[6] 🟪 GCN1: Visualize
↳ For node 1, visualize how messages are pooled to obtain a new feature, for better understanding
↳ [3,0,1] + [1,0,0] = [4,0,1]
[7] 🟪 GCN2: Messages
↳ Multiply the node features with weights and biases
↳ Apply ReLU (negatives → 0)
↳ The result is one message per node
[8] 🟪 GCN2: Pooling
↳ Multiply the messages with the adjacency matrix
↳ The result is a new feature per node
[9] 🟪 GCN2: Visualize
↳ For node 3, visualize how messages are pooled to obtain a new feature, for better understanding
↳ [1,2,4] + [1,3,5] + [0,0,1] = [2,5,10]
[10] 🟦 FCN: Linear 1 + ReLU
↳ Multiply node features with weights and biases
↳ Apply ReLU (negatives → 0)
↳ The result is a new feature per node
↳ Unlike in GCN layers, no messages from other nodes are included.
[11] 🟦 FCN: Linear 2
↳ Multiply node features with weights and biases
[12] 🟦 FCN: Sigmoid
↳ Apply the Sigmoid activation function
↳ The purpose is to obtain a probability value for each node
↳ One way to calculate Sigmoid by hand ✍️ is to use the approximation below:
• ≥ 3 → 1
• 0 → 0.5
• ≤ -3 → 0

-- Outputs --
A: 0 (Very unlikely)
B: 1 (Very likely)
C: 1 (Very likely)
D: 1 (Very likely)
E: 0.5 (Neutral)
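The forward pass walked through above can be sketched in NumPy. This sketch makes several assumptions:
• Nodes are stored as rows. The post's diagram may lay them out as columns.
• The graph on A..E is made up, and the weights are random, so the numbers will not match the hand calculation.
• The helper names (`pool`, `gcn_layer`, `approx_sigmoid`) are hypothetical.

As in the post's simplifications, the adjacency matrix is not normalized and ReLU is applied to the messages before pooling. Kipf and Welling's original layer normalizes the adjacency matrix and applies the activation after aggregation. The linear ramp in `approx_sigmoid` is one assumption consistent with the three anchor points in step [12].

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def approx_sigmoid(x):
    # Hand rule from step [12]: >= 3 -> 1, 0 -> 0.5, <= -3 -> 0 (linear in between is our assumption)
    return np.clip(x / 6.0 + 0.5, 0.0, 1.0)

def pool(messages, A):
    # [3] add self loops (identity matrix); the adjacency is NOT normalized, per the post
    A_hat = A + np.eye(A.shape[0])
    # [5] row i = node i's own message + the sum of its neighbors' messages
    return A_hat @ messages

def gcn_layer(H, A, W, b):
    # [4] one message per node: weights and biases, then ReLU applied directly to messages
    messages = relu(H @ W + b)
    return pool(messages, A)

def forward(X, A, params):
    H = gcn_layer(X, A, *params["gcn1"])      # GCN1(4,3)
    H = gcn_layer(H, A, *params["gcn2"])      # GCN2(3,3)
    W1, b1 = params["lin1"]
    W2, b2 = params["lin2"]
    H = relu(H @ W1 + b1)                     # Linear1(3,5) + ReLU: no messages from other nodes
    return sigmoid(H @ W2 + b2).ravel()       # Linear2(5,1) + Sigmoid: one probability per node

# Hypothetical undirected graph on A..E (NOT the graph drawn in the post)
edges = [(0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
A = np.zeros((5, 5))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0                   # [2] add each edge in both directions

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))                   # 5 nodes, 4 input features each
params = {
    "gcn1": (rng.normal(size=(4, 3)), rng.normal(size=3)),
    "gcn2": (rng.normal(size=(3, 3)), rng.normal(size=3)),
    "lin1": (rng.normal(size=(3, 5)), rng.normal(size=5)),
    "lin2": (rng.normal(size=(5, 1)), rng.normal(size=1)),
}
probs = forward(X, A, params)

# Step [6]-style check: a node with one neighbor pools [3,0,1] + [1,0,0]
print(pool(np.array([[3.0, 0.0, 1.0], [1.0, 0.0, 0.0]]), np.array([[0.0, 1.0], [1.0, 0.0]]))[0])
# prints [4. 0. 1.]
```

Keeping `pool` separate from `gcn_layer` makes it easy to reproduce the "Visualize" steps on their own.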

Tom Yeh

51,129 subscribers

46,499 views • 1 year ago • via X (Twitter)

Related videos

[VAE] by Hand ✍️

A Variational Autoencoder (VAE) learns the structure (mean and variance) of hidden features and generates new data from the learned structure. In contrast, GANs only learn to generate new data that fools a discriminator; they do not necessarily learn the underlying structure of the data.

The International Conference on Learning Representations (ICLR) this year announced its first-ever "Test of Time Award", recognizing the VAE paper published 10 years ago.

This exercise demonstrates how to calculate a VAE by hand.

[1] Given:
↳ Three training examples X1, X2, X3
↳ Copy the training examples to the bottom
↳ The purpose is to train the network to reconstruct the training examples.
↳ Since each target is the training example itself, we use the Greek word "auto", which means "self." This crucial step is what makes an autoencoder "auto."
[2] Encoder: Layer 1 + ReLU
↳ Multiply inputs with weights and biases
↳ Apply ReLU, crossing out negative values (-1 → 0)
[3] Encoder: Mean and Variance
↳ Multiply features with two sets of weights and biases
↳ 🟩 The first set predicts the means (𝜇) of the latent distributions
↳ 🟪 The second set predicts the standard deviations (𝜎) of the latent distributions
[4] Reparameterization Trick: Random Offset
↳ Sample epsilon ε from the standard normal distribution (mean = 0, variance = 1).
↳ The purpose is to randomly pick an offset away from the mean.
↳ Multiply the standard deviation values with the epsilon values.
↳ The purpose is to scale the offset by the standard deviation.
[5] Reparameterization Trick: Mean + Offset
↳ Add the sampled offset to the predicted mean
↳ The results are new parameters or features 🟨 used as inputs to the Decoder.
[6] Decoder: Layer 1 + ReLU
↳ Multiply input features with weights and biases
↳ Apply ReLU, crossing out negative values. Here, -4 is crossed out.
[7] Decoder: Layer 2
↳ Multiply features with weights and biases
↳ The output is the Decoder's attempt to reconstruct the input data X from the reparameterized distributions described by 𝜇 and 𝜎.

[8]-[10] KL Divergence Loss
[8] Loss Gradient: Mean 𝜇
↳ We want 𝜇 to approach 0.
↳ A lot of math, via SGVB (Stochastic Gradient Variational Bayes), simplifies the loss gradient to simply 𝜇.
[9,10] Loss Gradient: Stdev 𝜎
↳ We want 𝜎 to approach 1.
↳ A lot of math simplifies the calculation to 𝜎 - (1/𝜎).
[11] Reconstruction Loss
↳ We want the reconstructed data Y (dark 🟧) to be the same as the input data X.
↳ Some math involving Mean Squared Error simplifies the calculation to Y - X.
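The steps above can be sketched in NumPy, including the three gradients quoted in [8]-[11].

The formulas assume the standard closed-form KL divergence between N(𝜇, 𝜎²) and N(0, 1), and a reconstruction loss of 0.5·‖Y − X‖². With those choices, the gradients are exactly 𝜇, 𝜎 − 1/𝜎, and Y − X.

Some parts of the sketch are our own assumptions, not taken from the post:
• The layer sizes and random weights.
• Omitting the biases.
• Producing 𝜎 through `exp`. The post's diagram predicts 𝜎 directly; using `exp` simply keeps it positive.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def reparameterize(mu, sigma, rng):
    eps = rng.standard_normal(mu.shape)     # [4] epsilon ~ N(0, 1)
    return mu + sigma * eps                 # [4]-[5] offset scaled by sigma, added to mu

def kl_divergence(mu, sigma):
    # KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions
    return 0.5 * np.sum(mu**2 + sigma**2 - 1.0 - 2.0 * np.log(sigma))

def kl_grads(mu, sigma):
    return mu, sigma - 1.0 / sigma          # [8] dKL/dmu ; [9,10] dKL/dsigma

def reconstruction_grad(Y, X):
    return Y - X                            # [11] gradient of 0.5 * ||Y - X||^2 w.r.t. Y

# Hypothetical sizes and random weights (not the post's numbers):
# 4-d input, 3 hidden units, 2-d latent
rng = np.random.default_rng(0)
X = np.array([1.0, 0.0, 1.0, 1.0])
W_enc = rng.normal(size=(3, 4))
W_mu, W_sig = rng.normal(size=(2, 3)), rng.normal(size=(2, 3))
W_dec1, W_dec2 = rng.normal(size=(3, 2)), rng.normal(size=(4, 3))

h = relu(W_enc @ X)                         # [2] encoder layer 1 + ReLU (biases omitted)
mu = W_mu @ h                               # [3] means
sigma = np.exp(W_sig @ h)                   # [3] std devs; exp keeps sigma > 0 (our choice)
z = reparameterize(mu, sigma, rng)          # [4]-[5] reparameterization trick
Y = W_dec2 @ relu(W_dec1 @ z)               # [6]-[7] decoder output

g_mu, g_sigma = kl_grads(mu, sigma)         # KL gradients pull mu toward 0 and sigma toward 1
g_Y = reconstruction_grad(Y, X)             # reconstruction gradient pushes Y toward X
```

Both KL gradients vanish exactly at 𝜇 = 0 and 𝜎 = 1, which is the "approach 0" and "approach 1" intuition in steps [8] and [9,10].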

Tom Yeh

48,356 views • 2 years ago

[LSTM] by Hand ✍️

LSTMs were the most effective architecture for processing long sequences of data, until our world was taken over by Transformers. LSTMs belong to the broader family of recurrent neural networks (RNNs), which process data sequentially in a recurrent manner. Transformers, on the other hand, abandon recurrence and use self-attention instead to process data concurrently, in parallel.

Recently, there has been renewed interest in recurrence as people realized that self-attention doesn't scale to extremely long sequences, such as hundreds of thousands of tokens. Mamba is a good example of bringing back recurrence. All of a sudden, it is cool to study LSTMs.

How do LSTMs work?

[1] Given
↳ 🟨 Input sequence X1, X2, X3 (d = 3)
↳ 🟩 Hidden state h (d = 2)
↳ 🟦 Memory C (d = 2)
↳ Weight matrices Wf, Wc, Wi, Wo

Process t = 1
[2] Initialize
↳ Randomly set the previous hidden state h0 to [1, 1] and the memory cells C0 to [0.3, -0.5]
[3] Linear Transform
↳ Multiply the four weight matrices with the concatenation of the current input (X1) and the previous hidden state (h0).
↳ The results are feature values, each a linear combination of the current input and hidden state.
[4] Non-linear Transform
↳ Apply sigmoid σ to obtain gate values (between 0 and 1).
• Forget gate (f1): [-4, -6] → [0, 0]
• Input gate (i1): [6, 4] → [1, 1]
• Output gate (o1): [4, -5] → [1, 0]
↳ Apply tanh to obtain candidate memory values (between -1 and 1)
• Candidate memory (C’1): [1, -6] → [0.8, -1]
[5] Update Memory
↳ Forget (C0 .* f1): Element-wise multiply the current memory with the forget gate values.
↳ Input (C’1 .* i1): Element-wise multiply the "candidate" memory with the input gate values.
↳ Update the memory to C1 by adding the two terms above: C0 .* f1 + C’1 .* i1 = C1
[6] Candidate Output
↳ Apply tanh to the new memory C1 to obtain the candidate output o’1.
[0.8, -1] → [0.7, -0.8]
[7] Update Hidden State
↳ Output (o’1 .* o1 → h1): Element-wise multiply the candidate output with the output gate.
↳ The result is the updated hidden state h1
↳ It is also the first output.

Process t = 2
[8] Initialize
↳ Copy the previous hidden state h1 and memory C1
[9] Linear Transform
↳ Repeat [3]
[10] Update Memory (C2)
↳ Repeat [4] and [5]
[11] Update Hidden State (h2)
↳ Repeat [6] and [7]

Process t = 3
[12] Initialize
↳ Copy the previous hidden state h2 and memory C2
[13] Linear Transform
↳ Repeat [3]
[14] Update Memory (C3)
↳ Repeat [4] and [5]
[15] Update Hidden State (h3)
↳ Repeat [6] and [7]
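One LSTM time step (steps [3]-[7]) can be written as a NumPy sketch. The first call below feeds in the pre-activations listed in step [4]. It recovers values close to the hand-rounded C1 = [0.8, -1] and h1 = [0.7, 0]. The hand version rounds σ and tanh aggressively, so exact values differ slightly.

The following are assumptions for illustration only:
• The row-wise stacking order of the weights (Wf, Wi, Wo, Wc).
• The random weights used for the 3-step loop.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gates_to_state(zf, zi, zo, zc, C_prev):
    """Steps [4]-[7], given the four pre-activations from step [3]."""
    f, i, o = sigmoid(zf), sigmoid(zi), sigmoid(zo)   # [4] gate values in (0, 1)
    C_cand = np.tanh(zc)                               # [4] candidate memory in (-1, 1)
    C = C_prev * f + C_cand * i                        # [5] forget term + input term (input gate i)
    h = np.tanh(C) * o                                 # [6]-[7] candidate output times output gate
    return h, C

def lstm_step(x, h_prev, C_prev, W, b):
    """One time step. W stacks Wf, Wi, Wo, Wc row-wise: shape (4*d_h, d_x + d_h)."""
    z = W @ np.concatenate([x, h_prev]) + b            # [3] linear transform of [x ; h_prev]
    zf, zi, zo, zc = np.split(z, 4)
    return gates_to_state(zf, zi, zo, zc, C_prev)

# Reproduce t = 1 from the pre-activations listed in step [4]
h1, C1 = gates_to_state(np.array([-4.0, -6.0]), np.array([6.0, 4.0]),
                        np.array([4.0, -5.0]), np.array([1.0, -6.0]),
                        C_prev=np.array([0.3, -0.5]))
print(np.round(C1, 2), np.round(h1, 2))   # close to the hand-rounded [0.8, -1] and [0.7, 0]

# Run a 3-step sequence with hypothetical random weights (d_x = 3, d_h = 2)
rng = np.random.default_rng(0)
W, b = rng.normal(size=(8, 5)), np.zeros(8)
h, C = np.array([1.0, 1.0]), np.array([0.3, -0.5])    # [2] h0, C0
for x in rng.normal(size=(3, 3)):                     # X1, X2, X3
    h, C = lstm_step(x, h, C, W, b)                   # [8]-[15]: carry h and C forward
```

Splitting the gate math out of `lstm_step` mirrors the post's structure: [3] is the only place the weights appear. Every later time step just repeats [3]-[7] with the carried-over h and C.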

Tom Yeh

72,891 views • 2 years ago

[Backpropagation] by Hand ✍️

[1] Forward Pass
↳ Given a multilayer perceptron (3 layers), an input vector X, predictions Y^{Pred} = [0.5, 0.5, 0], and the ground-truth label Y^{Target} = [0, 1, 0].
[2] Backpropagation
↳ Insert cells to hold our calculations.
[3] Layer 3 - Softmax (blue)
↳ Calculate ∂L / ∂z3 directly using the simple equation: Y^{Pred} - Y^{Target} = [0.5, -0.5, 0].
↳ This simple equation is the benefit of using Softmax and Cross Entropy Loss together.
[4] Layer 3 - Weights (orange) & Biases (black)
↳ Calculate ∂L / ∂W3 and ∂L / ∂b3 by multiplying ∂L / ∂z3 and [ a2 | 1 ].
[5] Layer 2 - Activations (green)
↳ Calculate ∂L / ∂a2 by multiplying ∂L / ∂z3 and W3.
[6] Layer 2 - ReLU (blue)
↳ Calculate ∂L / ∂z2 by multiplying ∂L / ∂a2 with 1 for positive values and 0 otherwise.
[7] Layer 2 - Weights (orange) & Biases (black)
↳ Calculate ∂L / ∂W2 and ∂L / ∂b2 by multiplying ∂L / ∂z2 and [ a1 | 1 ].
[8] Layer 1 - Activations (green)
↳ Calculate ∂L / ∂a1 by multiplying ∂L / ∂z2 and W2.
[9] Layer 1 - ReLU (blue)
↳ Calculate ∂L / ∂z1 by multiplying ∂L / ∂a1 with 1 for positive values and 0 otherwise.
[10] Layer 1 - Weights (orange) & Biases (black)
↳ Calculate ∂L / ∂W1 and ∂L / ∂b1 by multiplying ∂L / ∂z1 and [ x | 1 ].
[11] Gradient Descent
↳ Update weights and biases (typically a learning rate is applied here).

💡 Matrix Multiplication is All You Need: Just like in the forward pass, backpropagation is all about matrix multiplications. You can definitely do everything by hand, as I demonstrated in this exercise, albeit slowly and imperfectly. This is why GPUs' ability to multiply matrices efficiently plays such an important role in the evolution of deep learning. This is why NVIDIA is now close to $1 trillion in valuation.

💡 Exploding Gradients: We can already see the gradients getting larger as we backpropagate toward the earlier layers, even in this simple 3-layer network. This motivates methods like the skip connections in ResNet to handle exploding (or vanishing) gradients.

I did the calculations entirely by hand. Please let me know if you spot any errors or have any questions!
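Steps [3]-[11] can be sketched in NumPy, in the same order as the post: softmax plus cross-entropy at the top, then alternating weight, activation, and ReLU gradients. Each "multiply by [ a | 1 ]" becomes an outer product for the weights, with the bias gradient equal to the pre-activation gradient itself.

The layer widths, random weights, and learning rate are hypothetical. The numerical check, a central difference, is only there to confirm the hand-derived rules.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    e = np.exp(z - z.max())                   # subtract max for numerical stability
    return e / e.sum()

def forward(x, params):
    W1, b1, W2, b2, W3, b3 = params
    z1 = W1 @ x + b1
    a1 = relu(z1)
    z2 = W2 @ a1 + b2
    a2 = relu(z2)
    return z1, a1, z2, a2, softmax(W3 @ a2 + b3)

def backward(x, y_target, params):
    W1, b1, W2, b2, W3, b3 = params
    z1, a1, z2, a2, y_pred = forward(x, params)
    dz3 = y_pred - y_target                   # [3] softmax + cross-entropy
    dW3, db3 = np.outer(dz3, a2), dz3         # [4] dz3 times [a2 | 1]
    da2 = W3.T @ dz3                          # [5]
    dz2 = da2 * (z2 > 0)                      # [6] ReLU: 1 if positive, else 0
    dW2, db2 = np.outer(dz2, a1), dz2         # [7]
    da1 = W2.T @ dz2                          # [8]
    dz1 = da1 * (z1 > 0)                      # [9]
    dW1, db1 = np.outer(dz1, x), dz1          # [10]
    return [dW1, db1, dW2, db2, dW3, db3]

def sgd_step(params, grads, lr=0.1):
    # [11] gradient descent; lr = 0.1 is our arbitrary choice
    return [p - lr * g for p, g in zip(params, grads)]

rng = np.random.default_rng(0)
sizes = [3, 4, 4, 3]                          # hypothetical layer widths
params = []
for d_in, d_out in zip(sizes[:-1], sizes[1:]):
    params += [rng.normal(size=(d_out, d_in)), rng.normal(size=d_out)]
x = rng.normal(size=3)
y_target = np.array([0.0, 1.0, 0.0])

def loss(params):
    return -np.log(forward(x, params)[-1] @ y_target)   # cross-entropy with a one-hot target

def numeric_grad(params, k, idx, eps=1e-6):
    """Central-difference estimate of dL/dparams[k][idx], used only to check backward()."""
    plus = [p.copy() for p in params]
    plus[k][idx] += eps
    minus = [p.copy() for p in params]
    minus[k][idx] -= eps
    return (loss(plus) - loss(minus)) / (2 * eps)

grads = backward(x, y_target, params)
new_params = sgd_step(params, grads)
```

Every line of `backward` is either an element-wise mask or a matrix/outer product. That is the "Matrix Multiplication is All You Need" point in code form.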

Tom Yeh

64,645 views • 2 years ago

We are excited to unveil the latest version of the AIOZ Node: The Version 4.0 update! This update includes a new user interface and brings substantial functional improvements, enhancing your overall experience for increased productivity and efficiency. More information below: The standout feature of AIOZ Node v4.0 is the introduction of the Transcoding functionality, which is currently available in beta. This functionality enables your node to participate in video transcoding, which converts video files into different formats for various digital devices and media platforms. By enabling transcoding, your node can contribute more significantly to the AIOZ Network, expanding the network's capabilities and potential $AIOZ token rewards. While the transcoding functionality is currently in beta, the upcoming AIOZ W3Stream integration, a DePIN Video Infrastructure due for release in Q3 2024, will unlock the full potential of your node and enable seamless video transcoding tasks. To get started with AIOZ Node v4.0, you simply need to visit our official website to download the latest version of the AIOZ Node: This download process is very straightforward, and with a one-click installation process, you can set up AIOZ Node v4.0 to start running on your device within a few minutes. If you are already running an AIOZ Node on your device, the version 4.0 update will be applied automatically, ensuring you have the latest features and improvements without hassle! With the Node v4.0 update running on your device, you can proceed to familiarize yourself with the new layout, check out the performance improvements, and start transcoding to see how it enhances your contributions to the network! Learn More: $AIOZ

AIOZ Network

20,428 views • 2 years ago

Vector Database by Hand ✍️

Vector databases are revolutionizing how we search and analyze complex data. They have become the backbone of Retrieval-Augmented Generation (#RAG).

How do vector databases work?

[1] Given
↳ A dataset of three sentences, each with 3 words (or tokens)
↳ In practice, a dataset may contain millions or billions of sentences. The maximum number of tokens may be in the tens of thousands (e.g., 32,768 for mistral-7b).

Process "how are you"
[2] 🟨 Word Embeddings
↳ For each word, look up the corresponding word embedding vector from a table of 22 vectors, where 22 is the vocabulary size.
↳ In practice, the vocabulary size can be tens of thousands. The word embedding dimensions are in the thousands (e.g., 1024, 4096).
[3] 🟩 Encoding
↳ Feed the sequence of word embeddings to an encoder to obtain a sequence of feature vectors, one per word.
↳ Here, the encoder is a simple one-layer perceptron (linear layer + ReLU).
↳ In practice, the encoder is a transformer or one of its many variants.
[4] 🟩 Mean Pooling
↳ Merge the sequence of feature vectors into a single vector using "mean pooling", which averages across the columns.
↳ The result is a single vector. We often call it a "text embedding" or "sentence embedding."
↳ Other pooling techniques are possible, such as using the CLS token. But mean pooling is the most common.
[5] 🟦 Indexing
↳ Reduce the dimensions of the text embedding vector with a projection matrix. The reduction rate is 50% (4 → 2).
↳ In practice, the values in this projection matrix are much more random.
↳ The purpose is similar to that of hashing: to obtain a short representation that allows faster comparison and retrieval.
↳ The resulting dimension-reduced index vector is saved in the vector storage.
[6] Process "who are you"
↳ Repeat [2]-[5]
[7] Process "who am I"
↳ Repeat [2]-[5]

Now we have indexed our dataset in the vector database.

[8] 🟥 Query: "am I you"
↳ Repeat [2]-[5]
↳ The result is a 2-d query vector.
[9] 🟥 Dot Products
↳ Take the dot product between the query vector and each database vector. They are all 2-d.
↳ The purpose is to use the dot product to estimate similarity.
↳ By transposing the query vector, this step becomes a matrix multiplication.
[10] 🟥 Nearest Neighbor
↳ Find the largest dot product by linear scan.
↳ The sentence with the highest dot product is "who am I".
↳ In practice, because scanning billions of vectors is slow, we use an Approximate Nearest Neighbor (ANN) algorithm such as Hierarchical Navigable Small World (HNSW) graphs.
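Steps [2]-[10] can be sketched end to end in NumPy. The sketch has these limitations:
• The 6-word vocabulary, the 4-d embeddings, and all weight values are made up, so the nearest sentence may differ from the post's "who am I".
• Words are stored as rows, so mean pooling averages over axis 0 (the post averages across columns).
• Retrieval is an exact linear scan, not an ANN index.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary (hypothetical; the post uses a 22-word table)
vocab = {w: i for i, w in enumerate(["how", "are", "you", "who", "am", "I"])}
E = rng.normal(size=(len(vocab), 4))     # [2] word-embedding table, 4-d vectors
W_enc = rng.normal(size=(4, 4))          # [3] one-layer encoder weights
b_enc = np.zeros(4)
P = rng.normal(size=(4, 2))              # [5] random projection, 4 -> 2 (50% reduction)

def embed_sentence(sentence):
    X = E[[vocab[w] for w in sentence.split()]]   # [2] one embedding per word (rows)
    H = np.maximum(X @ W_enc + b_enc, 0.0)        # [3] linear layer + ReLU, per word
    return H.mean(axis=0)                         # [4] mean pooling -> sentence embedding

def index_vector(sentence):
    return embed_sentence(sentence) @ P           # [5] dimension-reduced index vector

def nearest(query_vec, db):
    scores = db @ query_vec                       # [9] all dot products as one matmul
    return int(np.argmax(scores)), scores         # [10] exact nearest neighbor by linear scan

sentences = ["how are you", "who are you", "who am I"]
db = np.stack([index_vector(s) for s in sentences])    # [6]-[7] index the dataset
best, scores = nearest(index_vector("am I you"), db)   # [8]-[10] answer the query
print(sentences[best], np.round(scores, 2))
```

Stacking the database vectors into one matrix is what turns step [9] into a single matrix-vector product.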

Tom Yeh

191,994 views • 2 years ago

Transformer by hand ✍️ ~ 6-step walkthrough below

Open the hood of a transformer and the parts list is overwhelming: embeddings, positional encoding, attention weighting, self-attention, cross-attention, multi-head attention, layer norm, skip connections, softmax, linear, Nx, shifted right, query, key, value, masking.

Which of those actually make the car run? Two of them: attention weighting and the feed-forward network (FFN). Everything else is an enhancement to make it run faster and longer, which is how we got from a car to a truck, and to the word "large" in large language model.

So I drew and calculated those two parts entirely by hand.

Goal: push five features through one transformer block, filling in every cell yourself.

1. Given
Five positions of input features, arriving from the previous block.

2. Attention matrix
Let us feed all five features to a query-key module (QK) and read back an attention weight matrix, A. The details of that module are a post of their own.

3. Attention weighting
We multiply the input features by A to get the attention-weighted features, Z. Still five positions. The effect is to combine features *across positions*, horizontally: X1 becomes X1 + X2, X2 becomes X2 + X3, and so on.

4. First layer
Let us feed all five weighted features into the first layer of the FFN. Multiply by the weights and biases. This time the combining happens *across feature dimensions*, vertically, and each feature grows from 3 numbers to 4. Note that every position goes through the same weight matrix. That is what "position-wise" means.

5. ReLU
We cross out the negatives. They become zeros.

6. Second layer
Let us bring it back down: 4 dimensions to 3. The output feeds the next block, which has a completely separate set of parameters, and the whole thing runs again.

You have just calculated a transformer block by hand. ✍️

The takeaway: the two parts are doing two different jobs, and neither one alone is enough. Attention mixes *across positions*, so a feature can see its neighbours. The FFN mixes *across feature dimensions*, so each position can think about itself. Horizontal, then vertical. Then that pattern repeats N times, each block with its own separate set of weights. That is the Nx from the list up top, and that is what makes the transformer run.

💾 Save this post!
#AIbyHand #Transformers #DeepLearning
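A minimal NumPy sketch of the six steps. Positions are stored as columns. The sketch assumes a toy attention matrix A that reproduces the "X1 becomes X1 + X2, X2 becomes X2 + X3" pattern (the last position keeps only X5). In a real transformer, A comes from the QK module and is softmax-normalized; that module is skipped here, as in the post. All weights are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 5                                  # 3 features per position, 5 positions
X = rng.normal(size=(d, n))                  # [1] one column per position

# [2] Toy attention matrix: column j mixes positions j and j+1,
#     so Z1 = X1 + X2, Z2 = X2 + X3, ..., Z5 = X5
A = np.eye(n) + np.eye(n, k=-1)

Z = X @ A                                    # [3] mix across positions (horizontal)

W1, b1 = rng.normal(size=(4, d)), rng.normal(size=(4, 1))   # [4] 3 -> 4
W2, b2 = rng.normal(size=(d, 4)), rng.normal(size=(d, 1))   # [6] 4 -> 3

H = np.maximum(W1 @ Z + b1, 0.0)             # [4]-[5] mix across features (vertical) + ReLU
out = W2 @ H + b2                            # [6] back down to 3 features per position

# "Position-wise": applying the FFN to each column separately gives the same result
out_per_position = np.column_stack(
    [W2 @ np.maximum(W1 @ Z[:, j] + b1[:, 0], 0.0) + b2[:, 0] for j in range(n)]
)
```

The two matrix products sit on opposite sides of the features. `X @ A` combines columns (positions), while `W1 @ Z` combines rows (feature dimensions). That is the "horizontal, then vertical" takeaway.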

Tom Yeh

23,948 views • 6 days ago

Microsoft made 100B-parameter models run on a single CPU.

bitnet.cpp: the official inference framework for 1-bit LLMs.

The math behind 1-bit LLMs is what makes them revolutionary.

Traditional LLMs use 16-bit floating-point weights. Every parameter is a number like 0.0023847 or -1.4729. When you run inference, you multiply these floats together. Billions of times. That's why you need GPUs: they're optimized for floating-point matrix multiplication.

BitNet b1.58 uses ternary weights: {-1, 0, 1}. That's not a simplification. That's a fundamental change in the math.

When your weights are only -1, 0, or 1:
→ Multiply by 1 = keep the value
→ Multiply by -1 = flip the sign
→ Multiply by 0 = skip entirely

Matrix multiplication becomes addition and subtraction. The weights need no floating-point multiplications. No GPU required.

This is why bitnet.cpp achieves:
→ 2.37x to 6.17x speedup on x86 CPUs
→ 1.37x to 5.07x speedup on ARM CPUs
→ 71.9% to 82.2% energy reduction on x86
→ 55.4% to 70.0% energy reduction on ARM

The speedups scale with model size. Larger models see bigger gains because there are more operations to simplify.

A 100B-parameter model running at human reading speed (5-7 tokens/second) on a single CPU. That's not optimization. That's a different paradigm.

Why 1.58 bits? Because log₂(3) ≈ 1.58. Three possible values = 1.58 bits of information per weight.

The key insight: these models aren't quantized after training. They're trained from scratch with ternary weights. The model learns to work within the constraint. The BitNet b1.58 paper reports that, from about 3B parameters up, it matches a full-precision (FP16) LLaMA baseline in perplexity and end-task performance.
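The "multiply becomes add/subtract" argument can be illustrated with a small NumPy sketch. This is not bitnet.cpp's kernel. The quantizer follows the absmean scheme described in the BitNet b1.58 paper: divide by the mean absolute weight, round, and clip to [-1, 1]. The 8-bit integer activations, the matrix sizes, and the function names are assumptions for illustration.

```python
import numpy as np

def absmean_ternary(W, eps=1e-5):
    """Quantize W to {-1, 0, 1} using the mean absolute value as the scale."""
    gamma = np.mean(np.abs(W))
    Wt = np.clip(np.round(W / (gamma + eps)), -1, 1).astype(np.int8)
    return Wt, gamma

def ternary_matvec(Wt, x):
    """Compute Wt @ x with additions and subtractions only (no multiplications)."""
    out = np.zeros(Wt.shape[0], dtype=x.dtype)
    for r in range(Wt.shape[0]):
        # weight +1: keep x, weight -1: flip sign, weight 0: skip
        out[r] = x[Wt[r] == 1].sum() - x[Wt[r] == -1].sum()
    return out

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))                          # hypothetical full-precision weights
x = rng.integers(-128, 128, size=8).astype(np.int64) # hypothetical 8-bit integer activations
Wt, gamma = absmean_ternary(W)
y = ternary_matvec(Wt, x)       # integer result; gamma * y approximates W @ x in this sketch
print(Wt, y, np.log2(3))        # log2(3) is the ~1.58 bits per ternary weight
```

The Python loop is for readability only. The point is that `ternary_matvec` never multiplies a weight by an activation, yet returns exactly the same integers as a regular matrix product with the ternary matrix.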

Tech with Mak

23,036 views • 3 months ago

ReLU vs Leaky ReLU 👉
= ReLU =
ReLU is the default activation in modern deep learning: cheap to compute, and stable enough to train networks hundreds of layers deep. To see what it does, picture five boba tea shops on the same block, 𝚊, 𝚋, 𝚌, 𝚍, 𝚎, each running its own books. Each value is a shop's monthly profit: receipts minus rent, ingredients, and wages. When profit is positive, the shop stays open and the owner pockets every dollar. When profit turns negative, the shop runs out of cash and shutters: the lights go off, and the books are wiped to zero. ReLU is exactly that rule, applied one shop at a time.
Read the diagram left to right. The first column is the raw value x, each shop's profit at month's end. The second column is the gate: 1 if the shop is open (x > 0), 0 if it has shuttered. The last column is the ReLU output, y = x · gate = max(0, x): open shops pass their profit through untouched, while shuttered ones are zeroed out. Five rows means five parallel shops on the same block, each evaluated independently. That's why ReLU is called an element-wise activation: every neuron decides its own fate.
= Leaky ReLU =
Plain ReLU wipes negative values to zero. That's clean, but a shop that shutters can never recover, since both its output and its gradient stay pinned at zero. This is the dying ReLU problem, and in deep networks it can quietly kill a meaningful fraction of the units.
Leaky ReLU is the one-line fix: instead of shuttering, the shop files for Chapter 11 protection and keeps the lights on at reduced capacity. Its debt is restructured down to a small fraction α (0.1 in this diagram; PyTorch's default is 0.01), the rest is forgiven, and the shop is wounded, not killed. A small negative signal still flows through, so the gradient survives, and the shop can crawl back to life if a TikTok goes viral.
Read the diagram left to right. The first column is the raw value x, each shop's profit at month's end. The second column is the leakage α, the fraction of the loss held over after restructuring (default 0.1, editable). The third column is the gate: 1 for shops still in the black, α for those operating under bankruptcy protection. The last column is the Leaky ReLU output: y = x · gate. Profitable shops pass through untouched; struggling ones shrink by a factor of α but keep their sign. Five rows means five parallel shops, each evaluated independently. Like ReLU, this is an element-wise activation: every neuron's fate is decided on its own merits. #aibyhahd
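A minimal Python sketch of the two diagrams, assuming hypothetical profit values for the five shops: each activation computes a per-element gate and multiplies, y = x · gate. The gradient helpers (also illustrative) show why ReLU units can die, with a gradient of 0 for non-positive inputs, while Leaky ReLU units keep a gradient of α.

```python
def relu(xs):
    """Element-wise ReLU: gate is 1 for x > 0 (open shop), 0 otherwise (shuttered)."""
    gates = [1 if x > 0 else 0 for x in xs]
    return [x * g for x, g in zip(xs, gates)]

def leaky_relu(xs, alpha=0.1):
    """Element-wise Leaky ReLU: gate is 1 for x > 0, alpha otherwise."""
    gates = [1 if x > 0 else alpha for x in xs]
    return [x * g for x, g in zip(xs, gates)]

def relu_grad(xs):
    """Per-element derivative of ReLU (taking 0 at x == 0): dead units get 0."""
    return [1 if x > 0 else 0 for x in xs]

def leaky_relu_grad(xs, alpha=0.1):
    """Per-element derivative of Leaky ReLU: never exactly zero."""
    return [1 if x > 0 else alpha for x in xs]

profits = [3, -2, 0, 5, -4]          # hypothetical profits for shops a, b, c, d, e
print(relu(profits))                 # [3, 0, 0, 5, 0]
print(leaky_relu(profits))           # [3, -0.2, 0.0, 5, -0.4]
print(relu_grad(profits))            # [1, 0, 0, 1, 0]
print(leaky_relu_grad(profits))      # [1, 0.1, 0.1, 1, 0.1]
```

Because each output depends only on its own input, both functions are plain per-element maps, which is exactly the "element-wise" property the diagrams describe.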

Tom Yeh

32,165 views • 2 months ago

Softmax vs Sigmoid ✍️ Interact 👉
= Softmax =
Softmax is how deep networks turn raw scores into a probability distribution: it is the final layer of most multi-class classifiers and the core of every attention head in a transformer. To see what it does, picture five boba tea shops on the same block, all competing for your dollar. Five candidates: a, b, c, d, e, with different chains, different brewing styles, different pearls. A boba reviewer hands you a 𝘤𝘩𝘦𝘸𝘪𝘯𝘦𝘴𝘴 𝘴𝘤𝘰𝘳𝘦 for each: higher means perfectly chewy "QQ" pearls with the right bite (ask a Taiwanese friend to find out what QQ means). Negative scores are real: mushy bobas, overcooked pearls, a batch left sitting too long.
How do you turn five chewiness scores into an allocation that adds up to a whole dollar? You could spend everything at the chewiest shop, but that ignores how good the runners-up are. Softmax is the smooth alternative.
Read the diagram left to right. First, exponentiate each score to get e^{x}. This does two things: it turns negative chewiness into small positive numbers, and it stretches the gaps between scores exponentially. Then sum all five into a single total Z. Finally, divide each e^{x} by Z to get a probability. The five probabilities add up to one, so you can read them as shares of your dollar. The chewiest shop gets the biggest slice, but never the whole dollar. That's the point of softmax: it ranks confidently while still leaving room for the others.
= Sigmoid =
Sigmoid squashes any real number into a probability between 0 and 1. It is the classic activation for binary classification, and still the gating function inside LSTMs and GRUs. Same boba block as the previous Softmax example, narrowed to just two contenders: a hot new shop `a` with chewiness score x, and your usual go-to `b`, whose score is pinned at zero (the neutral baseline you've come to expect). Sigmoid is just softmax with two players, one of them pinned to zero.
Read the diagram left to right. First, exponentiate each score; for the usual shop `b`, whose score is zero, this is just e^0 = 1 (the constant baseline). Then sum the two into a total Z = e^{x} + 1. Finally, divide each e^{x} by Z to get a probability. For the new shop this gives e^{x} / (e^{x} + 1) = 1 / (1 + e^{-x}), the familiar sigmoid formula. The two probabilities add up to one: the new shop wins more of your dollar when its pearls get chewier, and your usual keeps the rest. That's the point of sigmoid: it turns a single chewiness score into a clean 0-to-1 chance that you'll try the new place over your usual.
--- AI Math, Algorithms, Architectures by hand ✍️ Subscribe to my 60K+ reader newsletter 👉
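A small Python sketch of both diagrams, using hypothetical chewiness scores. One step here is not in the diagram: `softmax` subtracts the largest score before exponentiating, a common guard against overflow that cancels out when dividing by Z. `sigmoid_as_softmax` checks the claim that sigmoid is a two-player softmax with one score pinned to zero.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)                            # shift by the max for numerical stability;
    exps = [math.exp(s - m) for s in scores]   # the shift cancels out in e^x / Z
    z = sum(exps)                              # the total Z
    return [e / z for e in exps]

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_as_softmax(x):
    """Sigmoid as a two-player softmax: shop a scores x, shop b is pinned at 0."""
    return softmax([x, 0.0])[0]

chewiness = [2.0, 1.0, 0.0, -1.0, 0.5]   # hypothetical scores for shops a..e
probs = softmax(chewiness)
print([round(p, 3) for p in probs])      # every share is positive; shop a gets the most
print(sum(probs))                        # ≈ 1.0, up to floating-point rounding

for x in (-2.0, 0.0, 3.0):
    print(x, sigmoid(x), sigmoid_as_softmax(x))   # the two columns agree
```

The two sigmoid columns agree because e^{x} / (e^{x} + e^{0}) simplifies to 1 / (1 + e^{-x}).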

Tom Yeh

73,787 views • 2 months ago

Stateless History Node is almost like a regular Ethereum node, but it doesn't store state and it doesn't have EVM execution. It's used only for syncing events, so it is faster and gives you FREE INDEXING. You don't have to pay 6 figures for RPC anymore! Just spin up a Stateless History Node, plug rindexer or Ponder into it, and enjoy free (AND FAST!!) indexing!
This node is syncing >1000 blocks per second on my local PC (less than 6 hours for the whole of Ethereum), and it should use less than 200GB, which means you can host it on a Mac mini, Hetzner or whatever. You could filter further by using block ranges, bloom filters, etc., but I haven't developed this yet. What you see is a proof of concept.
It works via the native devp2p 'eth' protocol, but with EIP-4444 and The Purge we would also have to support era1 archives and the Portal Network. But so far it works: there are plenty of peers serving historical receipts, and they serve them FAST! If you run a Stateless History Node, you can also serve the blocks and receipts, so that could help preserve archival data too.
For now there is no data validation yet (and not even data storage; this is a very early PoC), but we can verify the validity of the chain by simultaneously running a lightweight CL node (or not lightweight, if you're extremely paranoid), and then verify the hashes of receipts and blocks against their parents, maintaining full integrity and zero trust. It's also written in Rust, btw.
So I guess, at least for Ethereum Mainnet, the era of RPCs pumping moneybags is over: there's finally a local, trustless and free indexing alternative available.
Sadly, this won't work for Optimism / Base, because despite introducing P2P after Bedrock, they haven't enabled receipts transfer in the protocol (or at least I couldn't find it). Arbitrum is even sadder: I don't believe there is a P2P layer at all, so you just have to run your own node, hold state and execute blocks to get events.
There is hope: Paradigm recently released Ress (stateless execution), but it requires nodes to support witness preparation and exchange. This could work for L2s, because the main blocker for local RPCs right now is the huge state (a VPS with TBs of storage costs a lot), and the second blocker is that EVM forks make it hard to run a node: it needs to be maintained, upgraded, etc. Ress at least solves the state part. Anyway, I will keep working on this and release an MVP version with an RPC endpoint and data storage soon. Follow the updates!
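The verification idea described above (check that each block links to its parent and that the receipts match what the headers commit to) can be sketched as a toy in Python. This is not the author's Rust code: SHA-256 stands in for Keccak-256 over RLP-encoded data, a hash of joined strings stands in for the receipts Merkle-Patricia trie, and every name is hypothetical. The trusted head hash plays the role of what a consensus-layer node would supply.

```python
import hashlib
from dataclasses import dataclass

def toy_hash(data: str) -> str:
    # Stand-in hash: real Ethereum uses Keccak-256 over RLP-encoded data.
    return hashlib.sha256(data.encode()).hexdigest()

def compute_receipts_root(receipts: list[str]) -> str:
    # Stand-in commitment: real Ethereum uses a Merkle-Patricia trie root.
    return toy_hash("|".join(receipts))

@dataclass(frozen=True)
class Header:
    number: int
    parent_hash: str
    receipts_root: str

    def block_hash(self) -> str:
        return toy_hash(f"{self.number}|{self.parent_hash}|{self.receipts_root}")

def build_chain(receipts_per_block: list[list[str]]) -> list[Header]:
    """Build a toy chain where each header commits to its parent and its receipts."""
    headers, parent = [], "00" * 32
    for number, receipts in enumerate(receipts_per_block):
        header = Header(number, parent, compute_receipts_root(receipts))
        headers.append(header)
        parent = header.block_hash()
    return headers

def verify_chain(headers, receipts_per_block, trusted_head_hash) -> bool:
    """Check headers link back from a trusted head and receipts match their roots."""
    if not headers or len(headers) != len(receipts_per_block):
        return False
    if headers[-1].block_hash() != trusted_head_hash:
        return False
    for parent, child in zip(headers, headers[1:]):
        if child.parent_hash != parent.block_hash():
            return False
    return all(h.receipts_root == compute_receipts_root(r)
               for h, r in zip(headers, receipts_per_block))

receipts = [["tx0:ok"], ["tx1:ok", "tx2:fail"], ["tx3:ok"]]
headers = build_chain(receipts)
trusted = headers[-1].block_hash()   # in the post's design, vouched for by a CL node
print(verify_chain(headers, receipts, trusted))   # True

tampered = [["tx0:ok"], ["tx1:ok", "tx2:ok"], ["tx3:ok"]]
print(verify_chain(headers, tampered, trusted))   # False
```

Anchoring only the head hash is enough here: the parent-hash links carry that trust back through every earlier header, and each header's receipts root then pins down its receipts.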

Convergence Boy

28,877 views • 6 months ago

iAGENT NODECITY: Season 1 Farming Launches in 24 Hours
Step into NodeCity, where the glow of neon dances upon the slick streets and towering skyscrapers cast shadows against the skyline.
So, what exactly is NodeCity? It's the dynamic dashboard of iAGENT, your portal to connect with your Agents, dive into vibrant discussions, and reap the rewards. These rewards come in the form of:
- early access to the Genesis Node WL.
- a slice of the $AGNT airdrop.
In NodeCity, points are the currency of choice, earned through two thrilling avenues:
- Node Credit (NC): Share your invite link.
- Node Points (NP): Conquer daily challenges.
These challenges aren't your ordinary tasks. Picture them as verbal duels, each day presenting a fresh opportunity for witty exchanges. And who are the players? Four distinct characters, each offering five days of engaging interactions.
Earn the most points in Week 1 due to the Multiplier
Here's the kicker: the first week starts off gently, with a few simple prompts (share, like, RT and follow). But it's also the most lucrative, offering early-bird advantages and a bountiful points harvest.
NodeCity Farming Season 1 will grace us with its presence for just over three weeks from its launch. Even if you arrive a little late (5 days in), fear not: there will still be time to immerse yourself in the NodeCity experience and seize its riches!
'Cause at iAgent, we intend to “Make Farming Fun Again.” 🟢

iAgent

30,890 views • 2 years ago

OpenClaw Releases iOS and Android Companion Node Apps That Connect a Phone to a Self-Hosted AI Agent Gateway
Most "AI assistant" apps are a chatbot in a sandbox, calling someone else's API. OpenClaw's iOS and Android apps draw a very clear line away from that model. They're companion nodes, not standalone apps.
Each phone pairs with a self-hosted OpenClaw Gateway over a WebSocket (default port 18789) with role: "node". The Gateway (the single control plane for sessions, routing, channels, and events) runs on macOS, Linux, or Windows (WSL2). The phone gives the agent a body: camera, location, voice, notifications, and a live Canvas.
Here's what's actually interesting:
→ The assistant runs on your machine: chat messages land on the Gateway, never on the phone
→ Nodes expose a command surface (canvas.*, camera.*, device.*, notifications.*, system.*) through node.invoke
→ Privacy-heavy commands like camera.snap and screen.record stay off until you allowlist them via gateway.nodes.allowCommands
→ Camera and screen capture run foreground-only; pairing needs explicit approval (openclaw devices approve)
→ Both store listings declare no data collection; ws:// is LAN-only, and remote access needs a wss:// TLS endpoint via Tailscale
Full analysis:
Android app:
iOS App:
OpenClaw🦞
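A hypothetical Python sketch of the two access rules listed above: privacy-heavy commands stay blocked unless allowlisted, and plain ws:// is accepted only for LAN addresses while remote connections need wss://. This is not OpenClaw's implementation. The function names, the choice to treat all other commands as allowed, and the use of private/loopback IP ranges as the meaning of "LAN" are assumptions for illustration; only `camera.snap`, `screen.record` and the idea of `gateway.nodes.allowCommands` come from the post.

```python
import ipaddress
from urllib.parse import urlparse

# Commands the post names as privacy-heavy (off until allowlisted).
PRIVACY_HEAVY = {"camera.snap", "screen.record"}

def command_allowed(command: str, allow_commands: set[str]) -> bool:
    """Toy policy: privacy-heavy commands need an explicit allowlist entry.
    Assumption: every other command is allowed in this sketch."""
    if command in PRIVACY_HEAVY:
        return command in allow_commands
    return True

def gateway_url_ok(url: str) -> bool:
    """Toy rule: plain ws:// only to private/loopback IPs; remote needs wss://."""
    parsed = urlparse(url)
    if parsed.scheme == "wss":
        return True
    if parsed.scheme != "ws" or parsed.hostname is None:
        return False
    try:
        addr = ipaddress.ip_address(parsed.hostname)
    except ValueError:
        return False   # hostnames are not resolved in this sketch, so reject them
    return addr.is_private or addr.is_loopback

allow = {"camera.snap"}   # hypothetical contents of gateway.nodes.allowCommands
print(command_allowed("camera.snap", allow))     # True: explicitly allowlisted
print(command_allowed("screen.record", allow))   # False: privacy-heavy, not listed
print(command_allowed("device.status", allow))   # True ("device.status" is a made-up name)

print(gateway_url_ok("ws://192.168.1.20:18789"))    # True: LAN address, default port
print(gateway_url_ok("ws://8.8.8.8:18789"))         # False: public IP over plain ws://
print(gateway_url_ok("wss://gateway.example.com"))  # True: TLS endpoint
```

Defaulting privacy-heavy commands to "off" means a missing or empty allowlist fails safe, which matches the post's description of opt-in access.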

Marktechpost AI

38,657 views • 23 days ago

A Look Into OpSec Cloudverse 👇
OpSec Cloudverse is a comprehensive platform designed to bring the formidable capabilities of blockchain technology to your fingertips. It's the engine that powers seamless access to and management of blockchain nodes, enabling users to harness the full spectrum of web3 functionalities with unprecedented ease.
How Cloudverse Works
At its core, Cloudverse acts as a control panel for various blockchain-related services. It offers a user-friendly dashboard that allows you to remotely access, monitor, and manage nodes or servers, which are the fundamental building blocks of blockchain networks. These nodes operate round the clock, validating transactions, securing the network, and ensuring that your digital assets are always under your command.
Node Operation Simplified
Running nodes is often a technical and complex task, but Cloudverse streamlines the process. With its innovative 'one-click' setup, you can deploy a node in seconds, bypassing the intricate configuration typically required. This not only opens the door for non-technical users to contribute to blockchain networks but also significantly reduces the time and effort for developers and miners.
Potential Rewards for Users
Participation in the Cloudverse ecosystem is incentivized. As you contribute to the network by hosting nodes or validating transactions, you can earn rewards. These rewards vary based on the blockchain you support and the demand for transaction processing on the network. While specific figures can fluctuate, the potential for earning is tied directly to the vitality and activity of the blockchain ecosystem you choose to support.
Beyond Nodes: A Diverse Web3 Toolkit
Cloudverse also serves as a versatile suite of tools for traders, gamers, and developers. Whether it's hosting decentralized applications (DApps), enjoying high-speed gaming servers, or trading with the utmost performance, Cloudverse equips you with the resources to excel.
Continuous Evolution
The OpSec Cloudverse is a living ecosystem, continuously expanding with new features and products. The roadmap is dynamic, with regular updates enhancing your experience and capabilities within the digital realm.
Your Role in the Digital Future
By engaging with Cloudverse, you're not only utilising a service. You are actively participating in the foundation of a decentralized, secure digital landscape. It's a call to action for those who envision a future where digital trust is paramount, and everyone has a role to play in safeguarding the integrity of our online world.
Join OpSec Cloudverse, and be at the forefront of the blockchain revolution. Let's build a stronger, more secure digital future, together!
- End of the post, Beware of fake accounts -

OpSec

31,170 views • 2 years ago

Dagknight technical progress
As will be described in a yet-unpublished post by Michael Sutton, the dagknight effort is split into a v0 devnet, a v1 testnet and a v2 mainnet candidate. I've been testing the current v0-based implementation in a small devnet with the help of some testers who run nodes and miners with me.
The DK work can be thought of as split into two parts: (1) implementing the actual protocol and (2) wiring it up and using it. The testing and development over the last month has focused on (2).
Obviously, DK is a consensus change for selecting parents. What's not so obvious is that such a change affects DAA, coinbase, IBD, pruning and a lot more. Each of these areas is very sensitive and requires proper understanding to wire correctly.
An important consideration, and a difference from GD (GHOSTDAG), is that DK does not focus on maximizing a property like blue work. So, to maintain the topological properties of blue work, an independent (free) GD implementation is kept running specifically for maintaining blue work. This allows us to keep using that property for topology. Coloring and blue score use the megachain induced by DK.
As of this posting, the wiring around DK is in a working state, but it still needs to be reviewed. Next efforts will focus on protocol-specific components, particularly Tie-Breaking and incremental UMC.
Attached are some captures from the internal devnet. The dense DAG image is what happens when something related to the DAA or a similar consensus parameter causes a node to insist on its own POV. The video is a recent snippet of the KGI running on the devnet, showing (perhaps not obviously) DK at work.
The current “dagknight” branch is now posted on the main repo. A topic in the Public R&D has been opened for Dagknight development.

coderofstuff

52,857 views • 4 months ago

Full Fine-tuning vs. Freezing Layers. Interact 👉
== Full Fine-tuning ==
A real network has many layers: three in this example, and many more, holding billions of parameters, in a production model. What does fine-tuning look like when you update all of them? That's full fine-tuning: continue training every weight in the pretrained network on your new task. Every layer's W gets its own ΔW. Nothing is frozen; every parameter is in play.
Think of an MLP as a chain of prerequisites leading to an advanced course. Layer 1 might be Linear Algebra, layer 2 Probability, layer 3 Advanced Machine Learning, each one building on what came before. Fine-tuning is what happens during graduate study: the foundations are already there from undergrad, so you're not re-learning. Full fine-tuning is reviewing every prerequisite to see what new topics have appeared and what discoveries the field has made since the last time you sat through them. Effective, but exhausting.
This diagram shows the same three-layer MLP twice, side by side. On the left, the pretrained network runs on input X: three weight matrices W₁, W₂, W₃, each followed by a ReLU activation.
Full fine-tuning gives the model the most freedom to specialize. Every parameter can move, and every parameter that can move must be stored.
But not every prerequisite needs revisiting. The further back you go in the chain, the less the material has changed since pretraining: the linear-algebra basics under your computer-vision course are largely the same as they ever were. The next page does exactly that: freeze the prerequisites that haven't moved, and only refresh the advanced one closest to your specialization.
== Freezing Layers ==
Full fine-tuning reviewed every prerequisite (Linear Algebra, Probability, Advanced ML) to refresh each subject with the latest topics. Effective, but exhausting.
Then you realize something. The prerequisites haven't actually changed that much. Linear Algebra is still Linear Algebra; the matrix decompositions you learned still hold. Probability is still Probability; the distributions and Bayes' rule haven't moved. Almost all the new material, the new ideas and recent discoveries, lives in the advanced layer at the top.
That's freezing layers: keep the prerequisite layers fixed at their pretrained state, and only update the advanced one. In the diagram below, W₁ and W₂, the foundational prerequisites, stay frozen. Only W₃, the layer closest to your task-specific output, gets a ΔW.
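A minimal pure-Python sketch contrasting the two strategies, assuming hypothetical layer shapes since the diagram's sizes aren't given in the text. It counts how many weights receive a ΔW under each strategy and applies one gradient step that skips frozen layers. In a framework such as PyTorch, the same effect is usually achieved by turning off gradients for the frozen parameters (for example with `requires_grad_(False)`).

```python
# Hypothetical layer shapes (inputs x outputs) for the three-layer MLP.
SHAPES = {"W1": (4, 8), "W2": (8, 8), "W3": (8, 2)}

def num_params(shapes, layers):
    """Total number of weights in the given layers."""
    return sum(r * c for name, (r, c) in shapes.items() if name in layers)

def sgd_step(weights, grads, trainable, lr=0.1):
    """One gradient step W <- W - lr * dW, applied only to trainable layers."""
    new = {}
    for name, W in weights.items():
        if name in trainable:
            new[name] = [[w - lr * g for w, g in zip(w_row, g_row)]
                         for w_row, g_row in zip(W, grads[name])]
        else:
            new[name] = W   # frozen: keep the pretrained weights as-is
    return new

full = {"W1", "W2", "W3"}      # full fine-tuning: every layer gets a ΔW
top_only = {"W3"}              # freezing layers: only W3 gets a ΔW
print(num_params(SHAPES, full))       # 112 = 32 + 64 + 16
print(num_params(SHAPES, top_only))   # 16

# Tiny 1x2 "layers" to show which weights move after one step.
weights = {"W1": [[1.0, 2.0]], "W2": [[3.0, 4.0]], "W3": [[5.0, 6.0]]}
grads = {name: [[1.0, 1.0]] for name in weights}
after = sgd_step(weights, grads, top_only)
print(after)   # W1 and W2 unchanged; W3 becomes ≈ [[4.9, 5.9]]
```

Only the trainable layers need a stored ΔW (and, in practice, optimizer state), which is why freezing the prerequisite layers shrinks both compute and storage.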

Tom Yeh

27,587 views • 3 months ago

🍿Ready for #SonicMovie3? We are! HTC VIVE is celebrating the blue blur's return to movie theatres with a special prize giveaway on Instagram:
🔹 1 grand prize winner will get a VIVE Focus Vision AND a VIVE Ultimate Tracker 3+1 Kit and two Fandango Movie Tickets* to see Sonic 3 🕶️
🔹 24 lucky winners will each receive a pair of Fandango Movie Tickets* for Sonic 3. 🎟️ (up to $15 each)
To enter for a chance to win:
1️⃣ Go to Instagram
2️⃣ Follow HTC VIVE
3️⃣ Like one of our giveaway posts on Instagram
4️⃣ Tag a friend in the comments of the giveaway post.
But you gotta go fast. Our giveaway will close on December 30, 2024, at 11:59 p.m. PT. Winners will be announced on December 31, 2024. And don’t forget to see the new Sonic the Hedgehog only in theatres December 20! #vivexsonic
This giveaway's prizes are available only to participants located in the U.S. who are 18 years or older. No purchase necessary. Multiple entries are prohibited. See the link in the HTC VIVE Instagram bio for full details. Terms & Conditions apply: *Each Fandango Promotional Code (“Code”) is good towards the purchase of one movie ticket (up to $15 total ticket price and associated fees and charges) to see Sonic the Hedgehog 3 at Fandango partner theaters in the US. Valid only for purchases at or via the Fandango app. Code is void if not redeemed by 1/31/2025 or when Sonic the Hedgehog 3 is no longer in theaters, whichever comes first. Not for resale; void if sold or exchanged. Offer valid for one-time use only. Limit 2 Codes per person. The redemption of the Code is subject to Fandango’s Terms and Policies at All rights reserved. FANDANGO and the Fandango Logo are registered trademarks of Fandango Media, LLC. HTC, VIVE, VIVERSE, VIVEPORT and all other HTC product and service names are trademarks or registered trademarks of HTC Corporation. ©2024 Paramount Pictures and Sega of America, Inc.

HTC VIVE

16,817 views • 1 year ago

A summary of my fireside chat with vitalik.eth at the Home Staking Summit in Singapore last week, if you weren't able to attend. *Disclaimer: This also includes some of my prior understanding and interpretation.

Part 1 of 3: The importance of solo stakers

Solo stakers serve as both the first and last line of defence for the Ethereum network.
First line = providing censorship resistance.
Last line = acting as a quorum-blocking set that prevents 67% finalisation of the wrong chain.

As a censorship-resistant set, solo stakers are typically uncorrelated and unaligned with any organisation, making them a difficult target for regulatory capture or coercion. This means that certain organisations can try to impose censorship on Ethereum, but they will likely only go as far as delaying these transactions. Full censorship will be extremely difficult because solo stakers exist on Ethereum (amongst other mechanisms). IMO this is a huge part of why ETH is the credibly neutral block space that serious money wants to settle on!😎

As a quorum-blocking set: Ethereum's social layer can organise a recovery from a 51% capture, where the chain splits in two, but finalising the wrong thing (i.e., a 67% capture) would be very bad, as the blockchain's past and future could be changed without direct slashing risks.

One way to prevent finalisation capture is to increase the quorum threshold, e.g., from 67% to 75% or even 85%. At 85%, the current numbers of identified solo stakers plus unidentified stakers from hildobby's dashboard would be sufficient.

But on the other hand, would this decrease the cost of an attack that prevents finalisation, e.g., from >33% to just >15% of staked ETH? Not necessarily, as there are cheaper backdoor ways to mount this attack even today, e.g., bribing core devs or key employees of large node operators, or even outright buying out the large node operators. Hence, the 33% of staked ETH here represents only the highest cost of an attack to hold the chain hostage.

If so, then we are currently overpaying to prevent a >33% front-door attack, which means there is room to decrease that budget in favour of better security against ">66%" attacks. The analogy Vitalik uses: if we imagine a house with 10cm-thick bulletproof glass windows but a crappy wooden door, then we should reallocate resources to strengthening the weakest link.

The other, more ambitious but realistic, method is to increase the number of solo stakers such that we become the quorum-blocking set at the current finalisation threshold. How do we achieve this? Stay tuned for Parts 2 and 3 for my discussion with Vitalik on Orbit SSF and Rainbow Staking!
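The trade-off between the quorum threshold and the two kinds of attack comes down to simple arithmetic. Below is a minimal sketch, assuming a simplified model of Casper FFG in which finality requires at least `threshold` of total staked ETH to vote (2/3 today). It ignores the inactivity leak, which gradually drains non-voting stake so that a finality stall is not permanent. `Fraction` keeps the boundary cases exact, and the function names are illustrative only.

```python
from fractions import Fraction


def blocking_share(threshold):
    # A coalition holding MORE than this share can stall finality
    # simply by withholding its votes.
    return 1 - threshold


def blocks_finality(share, threshold):
    # If the coalition abstains, only 1 - share of the stake is left to vote;
    # finality stalls when that remainder falls short of the threshold.
    return 1 - share < threshold


def can_finalise_alone(share, threshold):
    # A coalition can finalise a (possibly wrong) chain by itself once it
    # holds at least the threshold.
    return share >= threshold


for t in (Fraction(2, 3), Fraction(3, 4), Fraction(85, 100)):
    print(
        f"quorum {float(t):.0%}: finalising alone needs >= {float(t):.0%}, "
        f"blocking needs > {float(blocking_share(t)):.0%} of stake"
    )
```

The same `blocks_finality` check describes both sides of the post: an attacker stalling finality, and solo stakers acting as a quorum-blocking set that refuses to finalise a wrong chain. Raising the threshold from 2/3 to 85% raises the share needed to finalise alone while lowering the blocking share from just over 1/3 to just over 15%.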

Samuel Chong

560,959 views • 1 year ago

HOW TO DODGE EVERY SKILLSHOT IN LEAGUE OF LEGENDS SO YOU GET ACCUSED OF SCRIPTING

- Script in your mind
- Draw out how far, wide, and fast an ability is relative to your character

That's all the easy stuff I have been preaching already; you can find it in my free Discord for improvement. However, one thing League coaches fail to explain is the human aspect of it. In every game of League of Legends you play, every single person is constantly building a profile of how the others operate, and both sides are constantly trying to mind f*ck each other to land and dodge skillshots.

I have broken it down into three layers of dodging:
- layer 0: no dodge (unconscious)
- layer 1: dodge (conscious)
- layer 2: no dodge (conscious)

Notice how, in the clip from a Challenger game below, Olaf shoots a layer 0 skillshot, but because I am playing at layer 1, I dodge his axe. The Thresh hook gets a little deeper, so bear with me. Because I built the profile that I will dodge an ability in that moment, he thinks I won't dodge this time and shoots his hook at layer 2, expecting me to play at layer 2 as well. However, I know that he knows I will likely not juke and will walk straight, so I make the conscious choice to dodge AGAIN, playing at layer 1, and the hook misses. Of course, he could have been accounting for my tumble, but the point still stands.

There are many deeper things to consider, like zoning abilities and the environment, but you generally want to play at layer 1 until you gain more data in a game to adapt. One thing that has always stayed true throughout my 13 years of playing League is that in teamfights that have gone on for a while, people tend to panic and default to layer 0 when shooting abilities. So if you're able to operate at layer 1 as a teamfight progresses, you will likely dodge that one final skillshot that wins you the game.

study the saskio way

Tony Chau

185,544 views • 8 months ago