When exactly should I manipulate bits?



It depends on what you do. If your program has huge bool arrays and needs to do operations on consecutive bools that can be expressed as bitwise operations on integers, then you may very well get performance out of doing something like that. If all you really do is flip single bools between true and false, then no.


>If your program has huge bool arrays and needs to do operations on consecutive bools that can be expressed as bitwise operations on integers, then you may very well get performance out of doing something like that.

Yes! This is very similar to what I have going on. I guess since performance is my number one priority now, I'll refactor the parts of code that require individual bit manipulation. Thanks.


If you can, try to measure both solutions. That will be the best confirmation that one strategy is superior to another.


Yeah, you'll want to use `uint32_t` and similar fixed-width types in your declarations. I prefer unsigned integers because sometimes I need to use them in cases where I need their actual values, and if the most significant bit is 1 in a signed integer, boom, it's less than 0.


If you want to use integers, be aware that they don't necessarily have the same size on different systems. If you want to write functions like FlipNthBit(bitVector, n), make them inline. And rewrite a small part of the application first and do some performance testing with typical workloads, because it is very likely that you won't get a positive effect. It really depends on how the application gets and processes its data and how well the cache can be used. Good luck :)


A good rule of thumb: don't refactor for performance until you've isolated the bottlenecks in your code and actually know where you have performance issues. A lot of the time you'll be surprised where they are.


I suppose what I'd do is one of the three following things:

1. `std::bitset`
2. `std::vector<bool>`
3. `struct { bool a : 1 = false; ... };`

Option 1 is great for fixed sizes. If you know how many bits you'll need, this is the way to go. Pair this with an enum (not enum class) so you can access the bits using named indices.

Option 2 is needed for dynamic approaches. Pair this with an enum (not enum class) so you can access the bits using named indices.

Option 3 is great because you don't need to index anything, C++20 allows you to set default values for each bit, and you don't need an enum since you can just name the bit-field entries something meaningful.

As an honorable mention, there is always the route of using an integer and bit masking/shifting. This works well up to a certain point, but becomes convoluted once you max out a 64-bit integer. Options 1-3 scale arbitrarily. The honorable mention might be used if you ever need to write the bit field out to a file.


Awesome, these are very good to know. I am going to play around with all of the ones you mentioned, thank you!


`std::vector<bool>` is generally implemented such that each element is a bit (vs. an actual bool), so it's very space efficient.


Yes! But check your implementation before making this assumption. The standard makes no guarantee that it's implemented like this, despite making provisions to allow it. There is a lot to be said about `std::vector<bool>`.


I should also point out that you are flirting with something called data-oriented design: the idea that you lay out your data in a contiguous, cache-friendly manner and in a way that makes sense for the use case. Packing bools into a contiguous array is one thing, but if you want more optimizations like this, you should definitely familiarize yourself with the principles of DOD. I like to give ECS as an example of an extensible DOD pattern since it uses the principles so nicely.




At a minimum you'll get an 8x reduction in memory. You can probably also rearrange things and write functions that operate on whole bytes at a time instead of on individual bits.


> 8x reduction in memory.

...and bandwidth, which is usually at least as important.


Much better cache utilization would be the goal here.


Amdahl's law suggests you find the bottlenecks in memory and performance via analysis before you go off changing code (or hardware). An 8x reduction in bool memory may not really amount to much.


I think you should not; bit manipulation is meant to... manipulate bits. Of course bit manipulation *can* be used for ganging bools together, but I would only use it if it gives a significant performance boost for your use case, and even then I would wrap it in a class to make it "invisible" to the rest of the code.


Can you give us anything more to go on? When you boil it down, all programming is bit manipulation.


It's a trading app; I have been assigned to make the code more efficient (less memory usage and faster in general). I know that bit manipulation on individual bytes could speed it up, but I am just wondering if it is worth refactoring the code base, since it *could* help. Also, since a bool is 1 byte, isn't it taking up an entire byte, when a single bit is sufficient to tell if it is `true` or `false` (`0` or `1`)?


I strongly subscribe to the philosophy that you don't have a performance problem until you measure it, and once you've identified a problem exists, you should drill down to ensure you're fixing the true bottlenecks before investing all of the engineering effort. In most applications (including most non-alpha/ML trading logic), bitwise operations aren't likely to address the real bottlenecks. In case you haven't, I definitely recommend digging a little deeper with sample-based profiling, orchestrated tracing, or OS-based analysis. If you're on Linux, I highly recommend poking around your application with eBPF http://www.brendangregg.com/index.html https://ebpf.io . Good luck!


Ah Interesting, I didn't know about eBPF, it seems very useful. I'll give it a try, thanks man!


No problem! If you choose to take eBPF for a spin, I just realized it can be a lot to wrap your head around at first. The "sample" utilities distributed with the BCC or BPFTrace projects are honestly a great place to start, and you can construct additional tools if you think they would be useful. https://github.com/iovisor/bcc#tools


> Also, since a bool is 1 byte

You really can't get memory in sizes less than 1 byte, so your `bool b;` will take up more than one bit. This is another reason why accumulating them into an integer and doing bitwise integer operations on them may be faster: that way you can fit 8 "bools" into a byte. That is what `std::vector<bool>` *potentially* does (and we would all be better off if it didn't, and that fancy stuff were its own type).


>This is another reason why potentially accumulating them into an integer and doing bitwise integer operations on them may be faster. That way you can fit 8 "bools" into a byte.

Ahhh, I see. I am going to try and play around with it and see how big the performance jump is. Thank you!


> Also, since a bool is 1 byte, isn't it taking up an entire byte, when a single bit is sufficient enough to tell if it is true or false (0 or 1)?

Classic time/space/complexity tradeoff. You can't address a bit. Your load/store operations work on (at least) bytes. You don't have 1-bit registers or operations. So to get at a packed bit you have to do three or four different operations to mask/shift the bit you want into where you want it. For a lot of cases, working with bytes is much more practical than working with packed bits. The language gives you everything you need to build packed bitwise operations if you need them, but the language would have to be massively more complex to provide a 1-bit type, because it doesn't map directly to what the hardware is doing.


I wrote high speed trading software for almost a decade. Video games before that.

> I know that bit manipulation on individual bytes could speed it up, but I am just wondering if it is worth refactoring the code base, since it could help.

> Also, since a bool is 1 byte, isn't it taking up an entire byte, when a single bit is sufficient enough to tell if it is true or false (0 or 1)?

Again, what bits? What manipulation? You're being too vague; I can't tell you anything. Are you trying to ask how to optimize code in general? There's a lot to factor in. Are you actually slow? That's a big one, because although you can measure and get some numbers you can probably improve, there are lots of optimizations that will make that piece faster but have zero net gain in performance. Where are you slow? How are you slow? Is it a software problem? Are there other, cheaper solutions? And once you determine a solution, how are you going to protect that investment?

I worked with a company who wrote their trading app in Java. It was fast enough, frankly, because the problem wasn't the software; we were IO bound, which is typical. There's no point in making the software faster if you can't shove the results out the door any sooner or get them there any faster. $5k -> $15k per network card? What does that gain us, 600 ns? Absolutely, we'll take it. That's way cheaper than 6 months of developer time. We even used microwave antennas out the windows because the LOS was a shorter distance to the exchange across the street than running the fiber down the building, across the street, and back up. And those 600 ns were guaranteed. I could write a faster algorithm, but then the next code change could consume that performance gain, and then some. So again, how do you protect an extremely costly performance improvement? You've got to measure and actually attack the problem where it matters most, where it's actually significant.

And you need to really think outside the box about your needs and assumptions. We didn't log along the critical path. Everything the trade server did could be deduced from the network traffic, so we had passive taps on the lines that did packet capture. Logging is pointless and hella expensive, a total waste that is better spent elsewhere, along non-critical parts of the code. Think about that: the 80/20 rule says 20% of the code does 80% of the work, so 80% of your *code paths* CAN have logging, 20% MUST NOT.

Shared resources are always a mistake. It's often both faster and cheaper to copy data per thread than to try to share among them. One thread per socket DOES NOT SCALE in performance. You `select` on the main thread and dispatch messages to a work queue.

Look long and hard at your data structures. One company I worked for marshaled between the FIX buffer off the wire and a struct they used in memory. The struct was larger than the FIX buffer, and marshaling took time. I kept the data in the buffer, performed a linear search, and extracted the values to their proper types as they were needed. The way they worked, I even wrote back into the buffer. I also sorted the buffer so the last accessed field was first in the buffer - if it was used before, chances are it'll be used again. The most frequently used fields stayed at the top, the unused fields stayed at the bottom. My linear search was orders of magnitude faster than any other general purpose search, including binary search. I mean, why start in the middle of the buffer when chances are the field you want is actually in the front most of the time?

General purpose databases are too slow. Relational databases? FORGET IT. You need something in memory, and you need it to be custom tailored to your needs. We had started out using a multi-index map, but child and other order relationships were slow, so I wrote a custom container that stored related orders spatially. That meant related orders were "close", both in lookup and in physical memory (I tried to keep related orders on the same or next page of memory).

The most important thing is the data. The data is everything. How you lay it out in memory, and how you access it, matters most. I think you should do a little research on Data Oriented Design. OOP becomes a transparent view on top, but then again, you or your colleagues who wrote the trade server you're inheriting probably have an archaic idea of what objects and encapsulation are. But all told, if your data is laid out correctly, you can write the most inefficient algorithms and still be fast. BUBBLE SORT can be *unbeatably fast* in the right context, the most correct solution in some cases. If my code is slow, I always look at the data and how it's used first. Changing that tends to guide my hand at changing the algorithms. Remember, you're almost always IO bound at some level. Typically you're slow due to cache inefficiencies, leading to misses and fetching latencies.


Thank you, this was a really good read and great information for me.


>Also, since a bool is 1 byte, isn't it taking up an entire byte, when a single bit is sufficient enough to tell if it is ...

But in order to make any relevant performance difference, you'd need a program that handles a heavy amount of booleans, like a container you want to iterate over, so that caching can pull more bools in... But that's really stretching the extent of manual optimization you can do. Whatever the software, it's far more likely that there are hundreds of things to improve before looking at the 7 bits wasted in a boolean. Also, don't always assume that all the booleans in your codebase only contain 0 or 1...


What operation on the bool are you doing? If it's just set (a flag, for instance), then in most cases you are better off using a char or an int. Why? Because an AND or an OR requires the prior value. That means going out to memory. Memory reads cause the processor to stall; memory writes do not (assuming the write queue doesn't get full). So if you're going to be setting a crapload of flags and then reading them later, you may not want to do bit manipulation. As well, if your access is mostly random, then you don't have temporal locality, which hurts cache reuse, so you won't buy anything from having packed data. There are a lot of things you need to consider before you come to the best strategy. That said... don't optimize ahead of time. Unless you know a priori that the particular piece of code is on the critical path, don't worry about it and just do what's the most maintainable.


Are you having trouble wrapping your head around bit manipulation in every case, with this as just one example, or do you generally understand the value of bit manipulation but are unsure if it's a benefit in this case? Here's the annoying thing about C++ that I love so much: nothing is sure until tested. /u/IyeOnline phrased their answer very specifically. They listed a very specific case where you are likely to get performance benefits, but even then, you *may* get a performance benefit. Your specific use case might have that access pattern and still see a performance penalty. You haven't given enough information for anyone to know for sure. The only way to know is to test it.


Back in the day we were programming an image accelerator co-processor. It was an SoC and had a bunch of hardware devices that were controlled by registers. Each bit in those registers meant something, so we had some bit shifting and setting macros that we used to set or clear those bits. In summary, you may need to use these techniques when writing drivers.


One reason to use bit manipulation might be to check multiple "bools" at the same time. For instance, if you have an integer `foo = 0b11010100`, and each bit corresponds to some boolean, you can use bitmasking to determine if multiple bits are set. For example, if bit 7 (the MSB) was "X active" and bit 6 (the next MSB) was "Y active", some function could check both "bools" at the same time, like `if ((foo & 0xC0) == 0xC0)` (note the parentheses; `&` binds more loosely than `==`): if it's true, then both "X" and "Y" are active (because `0xC0 == 0b11000000`). One place I've often used bit manipulation is to construct packets of specialized data. We might have a header byte in a packet with 4 bits representing an opcode and 4 bits representing the packet length. Often things like this are necessary in environments where speed and bandwidth aren't a luxury you have; some military communications standards I've seen are stuck with 64-byte packets at a max of 32 Hz, giving an absolute max data rate of ~2 KB/s. In cases like that, every bit counts, and data needs to be packed efficiently.


Thank you, this is very useful info.


Let me give you a real world example of where using bit manipulation helped me optimize my SQL database's memory. I was working on an admin system for a website. In it, admins maintain different parts of the content, like uploading and deleting files, managing promo codes, making other admins, etc. So basically I needed a permissions system (an edit_promo permission, an edit_admins permission, a manipulate_files permission, and so on). I initially thought of storing these permissions as an array in an SQL table, like a CSV, which would cost a lot of memory if there's a lot of data. So I thought about storing all the permissions in one composed integer. The idea was to use each bit of the integer to store a particular permission: 0001 would be edit_promo, 0010 would be edit_admins, 0100 would be manipulate_files. Say there's an admin with both the edit_promo and edit_admins permissions; his permission value would be 0011. As you can see, we can use a bitwise OR to compose multiple permissions. Now that I can compose permissions, I need to check whether an admin has a certain permission. Say there's an admin with permission 0110; to check if he has the edit_admins permission, we do an AND operation, 0110 & 0010, which gives 0010, which converted to bool is true. Isn't that powerful, and a great optimization!? Let me know if you are interested in seeing that project file.


Ah great man, that honestly seems so much more efficient. After a couple hours of testing, I think I have found the bottleneck/performance hit in my code base. I am going to try to optimize the code and do something similar to what you did. Thanks for the example.


You should be manipulating bits if your analysis and profiling reveals that your current boolean manipulation is too slow or takes up too much memory. It can definitely make a difference, but you need to profile since each program is different.


Consider the difference between setting 32 individual bool variables to true and `uint32_t x = 0xFFFFFFFF;`. It's not just a storage thing; you can feasibly get a ~32x speedup as well. That said, look into bitset.


Oh wow. I am reading up on `std::bitset` right now, is that what you are referring to?


No, they're suggesting you use `int` values like a `std::bitset<32>`.


Got it! Thanks




Ah got it, thanks.