Why Language?

2023-04-16T15:37:00+08:00

Right now, everyone in AI, if not the whole tech industry, is excited about GPT (Generative Pre-trained Transformer). I have been using it for various things: generating code snippets in languages, libraries, and frameworks that I'm not familiar with (mostly written in Rust or Python), searching for new information, researching a travel itinerary, or understanding a new body of knowledge. I build and fly FPV quadcopters in my free time, so I searched for the Friis Transmission Equation to estimate the range of radio signals. Given the right prompts, GPT-3.5, or one of the GPT variants at you.com, can do the calculation with the right formula.
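
As a rough illustration of that kind of calculation, here is a minimal Python sketch of the Friis transmission equation in its decibel form, solved for distance. It assumes free-space, line-of-sight propagation and ignores obstacles, multipath, and antenna misalignment, so real-world range will usually be shorter. The function names and the example numbers (a 25 mW, roughly 14 dBm, transmitter at 5.8 GHz, 2 dBi antennas, and a -85 dBm receiver sensitivity) are hypothetical values chosen for illustration.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def friis_received_dbm(tx_dbm, tx_gain_dbi, rx_gain_dbi, freq_hz, distance_m):
    """Received power (dBm) from the Friis equation, free space only."""
    wavelength = SPEED_OF_LIGHT / freq_hz
    # Free-space path loss in dB: 20 * log10(4 * pi * d / wavelength)
    path_loss_db = 20 * math.log10(4 * math.pi * distance_m / wavelength)
    return tx_dbm + tx_gain_dbi + rx_gain_dbi - path_loss_db


def friis_max_range_m(tx_dbm, tx_gain_dbi, rx_gain_dbi, freq_hz, rx_sensitivity_dbm):
    """Distance (m) at which received power drops to the receiver sensitivity."""
    wavelength = SPEED_OF_LIGHT / freq_hz
    # Link budget: how much path loss the link can tolerate, in dB
    link_budget_db = tx_dbm + tx_gain_dbi + rx_gain_dbi - rx_sensitivity_dbm
    return wavelength / (4 * math.pi) * 10 ** (link_budget_db / 20)


# Hypothetical example values (assumptions, not measurements)
range_m = friis_max_range_m(
    tx_dbm=14.0, tx_gain_dbi=2.0, rx_gain_dbi=2.0,
    freq_hz=5.8e9, rx_sensitivity_dbm=-85.0,
)
print(f"Estimated free-space range: {range_m:.0f} m")
```

Because path loss grows with 20 * log10(distance), every doubling of distance costs about 6 dB of received power, which is why a small gain in antenna or sensitivity translates into a noticeable change in range.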

For about a year, I've been training and evaluating YOLO on a custom dataset for object detection. YOLO is probably one of the most commonly used, tested, and widely deployed neural network architectures at present. GPT, however, is a brand new way of processing inputs and generating outputs. I watched some videos discussing Transformer layers and the self-attention mechanism, and finally came across the original paper, along with the code used for training and evaluating the models.

To summarize the paper, a Transformer layer is powerful because it can process a sequence of inputs in parallel, whereas a recurrent layer has to process them one step at a time. In terms of algorithmic complexity, the following table taken from the paper describes this clearly:

Layer Type	Complexity per Layer	Sequential Operations	Maximum Path Length
Self-Attention	O(n^2 * d)	O(1)	O(1)
Recurrent	O(n * d^2)	O(n)	O(n)
Convolutional	O(k * n * d^2)	O(1)	O(log_k(n))
Self-Attention (restricted)	O(r * n * d)	O(1)	O(n/r)

Where n is the sequence length, d is the representation dimension, k is the kernel size of convolutions, and r is the size of the neighborhood in restricted self-attention.

The benefit of this approach is explained in Section 4, Why Self-Attention, quoted here from the paper:

"Learning long-range dependencies is a key challenge in many sequence transduction tasks. One key factor affecting the ability to learn such dependencies is the length of the paths forward and backward signals have to traverse in the network. The shorter these paths between any combination of positions in the input and output sequences, the easier it is to learn long-range dependencies [12]. Hence we also compare the maximum path length between any two input and output positions in networks composed of the different layer types.

"As noted in Table 1, a self-attention layer connects all positions with a constant number of sequentially executed operations, whereas a recurrent layer requires O(n) sequential operations. In terms of computational complexity, self-attention layers are faster than recurrent layers when the sequence length n is smaller than the representation dimensionality d, which is most often the case with sentence representations used by state-of-the-art models in machine translations, such as word-piece [38] and byte-pair [31] representations. To improve computational performance for tasks involving very long sequences, self-attention could be restricted to considering only a neighborhood of size r in the input sequence centered around the respective output position. This would increase the maximum path length to O(n/r). We plan to investigate this approach further in future work.

"A single convolutional layer with kernel width k < n does not connect all pairs of input and output positions. Doing so requires a stack of O(n/k) convolutional layers in the case of contiguous kernels, or O(log_k(n)) in the case of dilated convolutions [18], increasing the length of the longest paths between any two positions in the network. Convolutional layers are generally more expensive than recurrent layers, by a factor of k. Separable convolutions [6], however, decrease the complexity considerably, to O(k * n * d + n * d^2). Even with k = n, however, the complexity of a separable convolution is equal to the combination of a self-attention layer and a point-wise feed-forward layer, the approach we take in our model.

"As side benefit, self-attention could yield more interpretable models. We inspect attention distributions from our models and present and discuss examples in the appendix. Not only do individual attention heads clearly learn to perform different tasks, many appear to exhibit behavior related to the syntactic and semantic structure of the sentences."

To illustrate this with a less accurate but easier to understand analogy:

I will need to take at least 100 milliseconds to process a word, or faster if I have learned and understood the meaning of a word. Longer in the order of seconds or minutes if I never heard the word before and need to look up the word in the dictionary, books, or the internet.

To process the previous paragraph, I will need to process 55 words.

>>> words = "I will need to take at least 100 milliseconds to process a word, or faster if I have learned and understood the meaning of a word. Longer in the order of seconds or minutes if I never heard the word before and need to look up the word in the dictionary, books, or the internet."
>>> len(words.split(' '))
55

In total that's about 5500 milliseconds, or 5.5 seconds.
That could be faster if I already understood most of the words written above, but it would be significantly slower if the paragraph were written in a different language.

A Transformer can process the sequence of words in parallel, with a constant number of sequential operations, O(1). I still can't imagine how it's capable of doing so, but I trust the authors of the paper. The per-layer cost of self-attention, O(n^2 * d), does grow quickly with the sequence length n, but restricted self-attention can reduce it to O(r * n * d) at the price of a longer maximum path length of O(n/r), which is still impressive.

Back to the topic of this post, Why Language? This is something I learn about and reflect on from time to time. We have become more intelligent and more capable of acquiring new knowledge and understanding, and of interacting with others (humans and machines alike), as we have developed better languages. Of course, we also have physical gestures, touch, and vision to further improve the process of learning and interaction. However, the signals produced by such experiences have to be interpreted, processed, and understood as well. We have constructed a representation in language form that describes those processes as best as we can.

In most cases we don't actually have a precise language to describe what we experience, feel, and perceive as a whole. Humans do not think in the statistical properties of a signal. We don't measure the amplitude, variance, or mean of our vision and stress signals. Even if we are presented with these numbers, how fast can we process them to decide on and commit to a specific action? It's significantly slower.

To a Transformer, these signals could be represented as input sequences. Because it needs only a constant number of sequential operations regardless of sequence length, it can correlate multiple sequences at once, and it is very consistent at doing so.
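
The per-layer complexity terms from the table quoted earlier can be compared numerically. The sketch below is a rough illustration in Python, not code from the paper: the function names are hypothetical, constant factors are ignored, n = 55 is the word count from the analogy, and d = 512, k = 3, and r = 8 are assumed values picked only to make the comparison concrete.

```python
def per_layer_operations(n, d, k=3, r=8):
    """Order-of-growth terms from the table, ignoring constant factors."""
    return {
        "self-attention": n * n * d,
        "recurrent": n * d * d,
        "convolutional": k * n * d * d,
        "self-attention (restricted)": r * n * d,
    }


def sequential_operations(n):
    """Number of steps that must run one after another, per the table."""
    return {
        "self-attention": 1,
        "recurrent": n,  # one step per position in the sequence
        "convolutional": 1,
        "self-attention (restricted)": 1,
    }


n, d = 55, 512  # assumed: 55 words, 512-dimensional representation
ops = per_layer_operations(n, d)
steps = sequential_operations(n)
for layer, count in ops.items():
    print(f"{layer:28s} ~{count:>12,} operations, {steps[layer]} sequential step(s)")
```

With n smaller than d, the self-attention term comes out smaller than the recurrent one, which matches the remark in the quoted section. The relationship flips once n grows beyond d, which is where the restricted variant becomes interesting.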
This opens up a whole other topic involving the human condition, our intelligence, and existential risks, which I have explored a little and reflected on through various discussions surrounding GPT on YouTube and the Internet. But I have not decided that it's a topic worthy of my full attention, as everyone will pay attention to it eventually, once we live in a very strange society 😄

A recommended conversation on this topic is with Eliezer Yudkowsky, who believes that our ability to interpret an AGI (Artificial General Intelligence) or a GPT improves much more slowly than its capability to improve further. There's more on the Lex Fridman channel:

[Embedded video]

About Me

2022-09-03T15:04:33+08:00

A self-taught software engineer with an educational background in computer and control systems.

I am working as a Senior Software Engineer at Screening Eagle Singapore.

I have been working on the iOS platform since 2008. At that time, I was fascinated and intrigued by iOS when Apple released the first SDK to developers all over the world, with the iPhone being hackable and having the same base system as OS X.

Since then, I have worked mostly on mobile application development on iOS, and will continue to do so for the native / web platform. In general, I'm looking forward to the potential of the Swift development ecosystem as a full stack platform, in addition to macOS / iOS. However, throughout the years I have also dedicated some time to engineering work that is not necessarily specific to Apple's platforms.

On the frontend side, though, there are basically only 3 choices available for making a career out of it: the browser, which we call the web, and Android or iOS on a mobile device or smartphone. Apart from those, it's likely that a VR headset such as the Oculus Rift will become the next frontend, as human interactions may no longer require physical presence.
People will need a better interface to interact with the outside world and with each other.

As an analog to VR, an FPV headset is a cheaper and more practical system that has been used by hobbyists and professionals. This is especially apparent in drone racing and cinematic FPV, and we can easily see its application in robotics or any remote sensing application where you need a live video feed of a remote operation. AR, which is now popular on smartphones, will have a better form as a headset. If we couple a digital FPV headset with an augmented reality layer, we will have a much better experience. A company like Orqa refers to this system as Remote Reality.

It has been a long time

2021-05-31T00:09:39+08:00

It has been a long time since I wrote my last post. I would say I have accomplished the minimum prerequisite of what I wanted to do with FPV, with either micro or mini quads. I have built my own quads with components I selected myself. They are not ideal or the best they can be, as I only have a few months of experience building and flying DIY quads. But they are not bad either.

I can't believe COVID isn't over yet, but that doesn't stop me from building and flying. However, the pace of building and flying has plateaued for now, until I can find the right moment again after I finish the most important step in my life, which is to buy my own property: my first house. I won't talk about that process here, as it would make this post very long.

What have I learned so far?

I have built two racing quads with a 5-inch frame configuration, 6S (six Lithium Polymer battery cells) powered motors, an ESC (Electronic Speed Controller), and an FC (Flight Controller). I could write up the details of my builds, but I have already described one of them in my DVR recording here:

[Embedded video]

My first 5-inch build was an HGLRC Wind 5 Lite frame with an HGLRC tower stack: F722 FC + 45A 4-in-1 ESC.
However, due to my blunder, the FC, the radio receiver (Archer RS), and the VTX were fried just before I was about to fly it. I had tested it with a smoke stopper and a multimeter before plugging it directly into the battery. But, possibly due to messy wiring (as the stack is very tight), there was a short circuit. I wasn't actually sure what the cause was.

My second build is a Five33 Switchback frame with the same HGLRC electronics, as recorded in the video above.

Lessons learned:

1. In a tower stack configuration, make sure to leave enough space around the ESC <-> FC JST connector. If you like to lay out the motor wires on top of the ESC, make sure they don't occupy the space of the JST connector. Otherwise, it may cause signal connection issues between the FC & ESC.
2. My recommendation for motor wiring is to use something like racewire, which connects the ESC motor pads to each of the motors. A WS2812 LED board can be used for the same purpose as well. This makes motor replacement easier and the overall wiring much cleaner. In this case you only need to re-solder the racewire / LED board pads near the end of each arm.
3. Do some initial pre-configuration of your ESC firmware if you're running 6S. This is to prevent the voltage / current spikes that are likely to happen on a higher powered build, in addition to soldering a low ESR capacitor at the battery lead. On BLHeli_32 firmware you might need to reduce Rampup Power and set Demag Compensation to High. Check out Mini Quad Test Bench for a detailed explanation of BLHeli_32 configuration.
4. Always do a continuity check on your battery lead, and between the (+) of your battery lead and each of the motor pins of your 4-in-1 ESC, then do the same for the (-) lead. This must be done before and after soldering all the motors and XT60 / XT30 connectors. I have shorted one motor pin on a brand new T-Motor Ultra F55A mini 4-in-1 ESC.
I would not have known the root cause if I hadn't done this continuity check before soldering.
5. A lot of Betaflight filtering guides tend to advise reducing filtering on 5-inch quads by moving the D-term and Gyro low-pass filter sliders to 1.5 to reduce delay. Don't do this if your motors are warm even at a slider value of 1.3. I have blown one FET in an ESC of my HGLRC Wind 5 quad, because it slightly hit a tree branch and one motor got stuck. In this case the motor heats up, and without enough filtering, it may fry one of the ESCs. I replaced it with the T-Motor F55A ESC, but one of the FETs broke again :(

For some reason, I encountered these issues only on a high-powered 6S build. I didn't have these weird issues on a 4S build. This shows the extra care required when you're building, configuring, or flying a higher voltage build such as 6S and above.

I did, however, encounter a video electrical noise issue on a 4S-powered All-In-One board, where one board has the FC + ESC, but that happened in a different component. It's a common issue: the closer the video signal is to the source of noise in the power supply, the more likely it will be disturbed. I cleaned up the wiring and installed a 25V 1000uF Panasonic FM low ESR capacitor. It was not a severe problem that required me to replace my FC / ESC / motor.

In conclusion, a lot of these annoying issues can be prevented with cleaner wiring, better solder joints, and more experience in troubleshooting the electronics of our own builds.

Micro or Mini quad?

2020-10-26T00:37:57+08:00

I have only been flying quadcopters with 3-inch propellers so far, as that was the first size that caught my eye: it has the right size and the right power & speed for my circumstances. A 5-inch propeller size would theoretically fly better and further, and it's the most common size used by racers and professional FPV pilots.
But have we ever wondered why there are more and more micro-to-mini sizes?

This is made more apparent by micro long range explorers such as the Flywoo Explorer and GEPRC Crocodile Baby.

To clarify the terms for quadcopter sizes: mini quad mostly refers to quadcopter frames with around 200mm - 300mm of diagonal wheelbase, while a micro is mostly 100mm or less, though the term is sometimes used for sizes between 100mm and 200mm. For most people, though, it's much simpler to refer to the ones equipped with 5-inch propellers as mini quads, and to use micro for anything from under 100mm up to 3-inch.

I find it a little inconsistent and potentially confusing to refer to Brushless Whoop or Tiny Whoop propeller sizes in millimeters while referring to Toothpick or larger propeller sizes in inches. Then, we always refer to frame sizes in millimeters :). But as time goes by, everyone in this industry has gotten used to these inconsistencies. I assume it's because the manufacturers have to cater both to the US market with Imperial units and to those outside the US who use metric units. It's easy to get the specifications mixed up.

The main reason so far is clearly described by BetaFPV:

"BETAFPV is one of the premiere drone companies in the world, fueling FPV racing and freestyle communities worldwide with cutting edge products and gear. We know all about the quaddiction, which is a fun way of saying we’re hooked on drones too. Our products empower newbies and pros to build and rip drones with little effort or cost. Our primary focus is micro quadcopters (called whoops) and the accessories you need to make yours fly well and look good. We love sharing the joy this hobby brings with others. Micro drones are the best way to begin. Micros are not heavily regulated or restricted worldwide. They can be flown almost anywhere. We sell hundreds of high quality products with customers all over the world."

But, from my experience, how does someone actually scale up from flying a tiny whoop to a powerful racing drone used in an actual race or for any other purpose? That's where 3-inch quads come in, and also the recently popular 4-inch long range explorers. A simulator such as DRL or VelociDrone can only simulate so much; it will not be able to mimic the actual experience of flying FPV with all the equipment and environment. Flying indoors might be an experience that's closer to a simulator, but flying outdoors takes more risk and environment sensing. It can be a similar experience, though, if you can successfully simulate the same quad with the same characteristics.

Another reason for flying 3-inch is simply that it's difficult to find a place that's legal to fly, where there's no crowd, and that's reasonably far away from people and cars. If you live near the wilderness, you're lucky and you can fly your 5-inch or even 7-inch with more freedom. But that's not the case for most people. That's the drawback of city life: people have become less adventurous and less curious, although economic and education levels are higher. It's not simply a matter of choice; it's also because there's no choice but to survive and adapt to the existing environment.

But, to be more honest, 3-inch or even smaller quads are more compact, lightweight, and portable. They're just easier to handle, and I don't need large capacity batteries to have fun with them. I do need more battery capacity if I'm flying one of those 4-inch long range explorers, though.

Freestyle or Racing?

2020-04-26T22:15:46+08:00

When I started the hobby, I was aware that there are two ways of flying FPV: racing and freestyle. Drone racing has been going on for a while, with professional competitions all over the world such as DRL and MultiGP.
But freestyle doesn't seem to have a specific venue because, as the name suggests, it's free flying with style.

I would like to explore a bit how someone starting out with FPV approaches these two styles of flying. In racing, there are specific tracks, obstacles, goals, and a time tracker. It's structured, and the goal is obvious: to be the fastest to finish the track. But does that mean being the fastest is all it takes? It depends on what it means to be fast. I have been using the DRL (Drone Racing League) simulator and have participated in the 2020 Tryout tracks, a course in Drone Park, and as someone who is just starting, it's extremely hard to get to the top. Even on a simulator, not the real race.

How difficult is it to race at Drone Park?

[Embedded video]

As you can see from the video, the track is the most challenging I've ever seen compared to DRL tracks in previous years, as it involves multi-directional full-throttle maneuvers, power loops, corkscrews, and more. A slight interruption to your flow ruins everything and will definitely hurt your lap time. I could finish the track, but in about 2-3 times the time it takes the #1 on the leaderboard. Also, it's very difficult to do an inverted power loop while staying on track at full throttle. Alpha Pilot? Let's just say robotics and AI will need to learn animal instinct before they can compete with the best FPV pilots.

In comparison, flying freestyle is about learning tricks and maneuvers in various interesting locations and spots. The goal is mostly about how smoothly each maneuver transitions into the next. What makes a good style is also subjective, but in general, if you have very good and precise control of a racing quad, you can transition to freestyle easily, because the dynamics are similar; only the goals are different.

I find that practicing racing skills is hard but more motivating than non-targeted practice of freestyle tricks. This is especially true in a simulator (DRL or VelociDrone), where you don't really have real-world immersion. In the real world, it's practically easier to fly freestyle in most cases. Thus, I tend to view freestyle as relaxation or exploration flying that you couldn't do with racing. It's likely that the better someone is at racing, the better they could be at freestyle.

Ruminations
