C++ notes

To keep gdb from stopping on SIGPIPE while debugging a client/server, use (gdb) handle SIGPIPE nostop

GENERAL

functions defined inside a class declaration (in a header file) are implicitly inline, i.e. the compiler may substitute their body wherever a call appears
static const integral class members can be declared without a separate definition (the value is set directly where the member is declared); in this case the compiler substitutes every occurrence of the member with the actual value and no memory is allocated, as long as the member's address is never taken
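nested classes can access private members of the outer class as if they were friends (since C++11); the outer class gets no special access to the nested one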
a custom new (with extra args) must always be accompanied by a delete with the same list of extra args. Otherwise the C++ runtime cannot undo new if the ctor throws an exception
applying delete never calls “placement delete”
zero length arrays are not legal in C++
ellipsis function: a function that accepts any arguments, e.g. foo(...)
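local classes (defined inside a function) are visible only within that function, so nothing outside it can derive from them
the compiler does not generate code for unused member functions of a class template; it only checks their syntax
if statements (branches) can stall the CPU pipeline when mispredicted, i.e. they may affect performance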
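
The note about custom new and its matching delete can be illustrated with a minimal sketch. Arena, Widget and both helper functions are hypothetical names made up for this example; the counters exist only to make the behaviour observable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <stdexcept>

// Hypothetical bookkeeping object passed as the extra argument to new.
struct Arena {
    int allocations = 0;
    int placement_deletes = 0;
};

struct Widget {
    explicit Widget(bool fail) {
        if (fail) throw std::runtime_error("ctor failed");
    }

    // Custom new with an extra argument.
    static void* operator new(std::size_t size, Arena& arena) {
        void* p = std::malloc(size);
        if (!p) throw std::bad_alloc();
        ++arena.allocations;
        return p;
    }

    // Matching "placement delete": called by the runtime only when the
    // ctor throws after the operator new above has succeeded.
    static void operator delete(void* p, Arena& arena) noexcept {
        ++arena.placement_deletes;
        std::free(p);
    }

    // Ordinary delete: the one a plain `delete w;` expression uses.
    static void operator delete(void* p) noexcept { std::free(p); }
};

// True when a throwing ctor caused the placement delete to run.
inline bool throwing_ctor_is_undone() {
    Arena arena;
    try {
        Widget* w = new (arena) Widget(true);
        delete w;  // not reached in this scenario
    } catch (const std::runtime_error&) {
    }
    return arena.allocations == 1 && arena.placement_deletes == 1;
}

// True when a successful new/delete pair never touches placement delete.
inline bool plain_delete_skips_placement_delete() {
    Arena arena;
    Widget* w = new (arena) Widget(false);
    delete w;
    return arena.allocations == 1 && arena.placement_deletes == 0;
}
```

Calling both helpers, e.g. under assert, should confirm that the two-argument delete is reached only through the failing ctor.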

C++ Core Guidelines

for cheap-to-copy objects (primitives, string_view, iterators, small objects ~16B) prefer pass by value
prefer return values to output parameters
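
A small sketch of both guidelines together, assuming C++17 for std::string_view. Split and split_once are hypothetical names invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical result type: returned by value instead of using output parameters.
struct Split {
    std::string_view head;
    std::string_view tail;
};

// string_view and char are cheap to copy, so both are taken by value.
inline Split split_once(std::string_view text, char sep) {
    const std::size_t pos = text.find(sep);
    if (pos == std::string_view::npos) {
        return {text, std::string_view{}};  // no separator: everything is the head
    }
    return {text.substr(0, pos), text.substr(pos + 1)};
}
```

The caller reads the result as split_once("key=value", '=').head without having to declare variables for the function to fill in.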

TEMPLATE META PROGRAMMING

if the expression being evaluated is a compile-time constant, the compiler can perform the check instead of runtime code. The idea is to pass the compiler a language construct that is legal for a non-zero expression (e.g. sizeof(A) <= sizeof(B)) and illegal for an expression that evaluates to zero
assume
```
template<bool> struct CompileTimeError;
template<> struct CompileTimeError<true>{};  
```
if you try to instantiate CompileTimeError<false> the compiler reports an error about an undefined specialization (incomplete type)
if the compiler sees a specialized template it uses its implementation everywhere the specialization is applicable. Otherwise it uses the generic implementation
partial specialization is not applicable to function templates (in any C++ standard). The closest alternative for non-member functions within a namespace is overloading, which works only on function args (not on the return value)
specializations of a template are different types in C++. Hence Type<true> and Type<false> are different. This makes compile-time dispatching possible
member function templates cannot be virtual
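
A hedged usage sketch of the CompileTimeError trick from the notes above. checked_bit_copy is a hypothetical helper invented here; since C++11 static_assert is the usual way to express the same check.

```cpp
#include <cassert>
#include <cstring>

template <bool> struct CompileTimeError;
template <> struct CompileTimeError<true> {};

// Hypothetical helper: copies the bytes of `from` into a value-initialized To.
// It compiles only when From fits into To.
template <class To, class From>
To checked_bit_copy(const From& from) {
    // For sizeof(From) > sizeof(To) this names CompileTimeError<false>,
    // an incomplete type, so the definition below fails to compile.
    CompileTimeError<(sizeof(From) <= sizeof(To))> size_check;
    (void)size_check;  // silence "unused variable" warnings

    To to{};
    std::memcpy(&to, &from, sizeof(From));
    return to;
}
```

A call such as checked_bit_copy<char>(3.14) should be rejected at compile time, while checked_bit_copy<int>(42) compiles and returns 42.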

TEMPLATE TYPE DEDUCTION

Suppose signature: foo(T&); It is safe to pass a const param, as the deduced type will be const ParamType&, i.e. if const int x = 123; then foo(const int& x)
A universal reference (forwarding reference), i.e. foo(T&&), can be deduced to be either an lvalue reference or an rvalue reference, depending on whether an lvalue or an rvalue is passed
Suppose foo(T), where T is a ParamType; the actual parameter will always be passed by value, i.e. copied, and its top-level const is dropped
to constrain templates, i.e. disable them for the compiler when certain conditions hold, one can use (in template expressions) std::enable_if; std::is_same; std::is_base_of; std::is_integral; std::is_constructible etc. See Effective Modern C++ Item 27
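
The deduction rules above can be observed with type traits. This is only a sketch; all function names below are hypothetical.

```cpp
#include <cassert>
#include <type_traits>

// foo(T&): passing a const int deduces T = const int.
template <class T>
bool ref_param_deduces_const(T&) { return std::is_const<T>::value; }

// foo(T&&): lvalue arguments deduce T as an lvalue reference, rvalues as a plain type.
template <class T>
bool fwd_param_deduces_lvalue_ref(T&&) { return std::is_lvalue_reference<T>::value; }

// foo(T): the argument is copied and top-level const is dropped.
template <class T>
bool value_param_deduces_const(T) { return std::is_const<T>::value; }

// Constrained template: enabled only for integral types via std::enable_if.
template <class T,
          class = typename std::enable_if<std::is_integral<T>::value>::type>
bool accepts_only_integers(T) { return true; }

// Fallback picked when the constrained template is disabled.
inline bool accepts_only_integers(...) { return false; }
```

For example, with const int cx = 123, ref_param_deduces_const(cx) reports true while value_param_deduces_const(cx) reports false.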

OBJECT INITIALIZATION

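initialization with curly braces, Foo f{a, b}, prefers a ctor taking std::initializer_list if one exists; empty braces Foo f{} call the default ctor
a {} initializer cannot be perfect-forwarded; a forwarding function has to commit to parentheses or braces itself, e.g. new Foo(std::forward<Ts>(params)...)
auto foo = {10, 20} creates a std::initializer_list<int>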
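
std::vector is a convenient way to see the initializer_list preference in action. The helper names in this sketch are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <type_traits>
#include <vector>

// Braces pick the initializer_list ctor: two elements, 10 and 20.
inline std::size_t size_with_braces() {
    std::vector<int> v{10, 20};
    return v.size();
}

// Parentheses pick the (count, value) ctor: ten elements equal to 20.
inline std::size_t size_with_parens() {
    std::vector<int> v(10, 20);
    return v.size();
}

// Empty braces call the default ctor, not the initializer_list one.
inline std::size_t size_with_empty_braces() {
    std::vector<int> v{};
    return v.size();
}

// auto with a braced list deduces std::initializer_list<int>.
inline bool auto_braces_is_initializer_list() {
    auto foo = {10, 20};
    return std::is_same<decltype(foo), std::initializer_list<int>>::value &&
           foo.size() == 2;
}
```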

LVALUE&RVALUE

lvalue – persistent value; rvalue – temporary value, e.g. foo(new Bar) – here the result of new is an rvalue
when a parameter is taken by value, lvalue arguments are copied into it (slow); rvalue arguments are moved (fast), provided the type has a move ctor
std::move casts an lvalue to an rvalue; it does not move anything by itself
to overload a method that takes a universal reference one can use the tag dispatching technique. See Effective Modern C++ Item 27
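
A minimal sketch of how overload resolution tells lvalues from rvalues, and what std::move changes. Kind and classify are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <utility>

enum class Kind { Lvalue, Rvalue };

// Chosen for named (persistent) objects.
inline Kind classify(const std::string&) { return Kind::Lvalue; }

// Chosen for temporaries and for anything cast with std::move.
inline Kind classify(std::string&&) { return Kind::Rvalue; }
```

Given std::string s, classify(s) selects the first overload, while classify(std::string("tmp")) and classify(std::move(s)) select the second. Nothing is actually moved here, since neither overload takes ownership of the string.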

TAG DISPATCHING

A keystone of tag dispatch is the existence of a single (unoverloaded) function as the client API. This single function dispatches the work to be done to the implementation functions.
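
A sketch in the spirit of this description: describe is the single client-facing function, and the two describe_impl overloads do the work. All names are hypothetical, and the "#" prefix is just an arbitrary way to mark the integral branch.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Implementation for non-integral arguments: forward into a std::string.
template <class T>
std::string describe_impl(T&& name, std::false_type) {
    return std::string(std::forward<T>(name));
}

// Implementation for integral arguments: treat the value as an index.
inline std::string describe_impl(int idx, std::true_type) {
    return "#" + std::to_string(idx);
}

// The single (unoverloaded) client API. The tag, std::true_type or
// std::false_type, is computed at compile time and selects the implementation.
template <class T>
std::string describe(T&& arg) {
    return describe_impl(
        std::forward<T>(arg),
        std::is_integral<typename std::remove_reference<T>::type>());
}
```

Without the tag, a short argument would bind to the universal-reference overload rather than to the int one; with it, describe(short(3)) is routed to the integral implementation.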

SMART POINTERS

a shared pointer is twice the size of a raw pointer (pointer to the Widget + pointer to the control block). The control block may be quite big if a custom deleter is used
make_shared performs a single memory allocation (Widget + control block)
using new instead of calling make_shared/make_unique may lead to a resource leak (before C++17): processWidget(std::shared_ptr<Bar>(new Bar), doSmth()) may be evaluated as new Bar; doSmth() [throws an exception]; shared_ptr ctor [is never executed]
avoid passing shared_ptr by reference: prefer a pointer or a const ref to the object itself.
pass shared_ptr by value if the called function shares or takes ownership
the control block contains a weak count (in addition to the ref count and other data) and thus cannot be freed while the weak count is greater than 0
when the pimpl idiom is applied with unique_ptr, the dtor, copy and move operations must be declared in the header and defined in the source file (a simple Widget::~Widget() = default; will do for the dtor)
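
The parameter-passing advice can be sketched as follows. Widget, read_id and Registry are hypothetical names for the example.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Widget {
    int id = 0;
};

// Only reads the object: takes a const ref to the Widget, no smart pointer involved.
inline int read_id(const Widget& w) { return w.id; }

// Shares ownership: takes the shared_ptr by value and keeps it.
struct Registry {
    std::shared_ptr<Widget> kept;
    void keep(std::shared_ptr<Widget> w) { kept = std::move(w); }
};
```

After auto w = std::make_shared<Widget>() and registry.keep(w), the use count is 2. A std::weak_ptr taken from w expires once both owners are reset, even though the control block itself lives until the weak_ptr is gone.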

MOVING AND COPYING

std::move for rvalue references; std::forward for universal references
std::move used in a function call (foo(std::move(smth))) makes the compiler treat a local variable as if it were a temporary unnamed argument, so it becomes eligible for moving
RVO – return value optimization, i.e. if 1) a function creates a local object and 2) returns this object (whose type matches the return type), the object may be constructed directly in the caller's storage for the return value, i.e. no copying occurs on the return statement
perfect forwarding fails in the following cases: type deduction fails or is impossible; {} initialization; 0 or NULL used as a null pointer; forwarding names of overloaded functions; forwarding bitfields
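
A sketch of perfect forwarding that counts copies and moves. Tracker and make_boxed are hypothetical names; make_boxed is a simplified stand-in for std::make_unique and is not meant to replace it.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical type that records how it was constructed.
struct Tracker {
    int copies = 0;
    int moves = 0;
    Tracker() = default;
    Tracker(const Tracker& other) : copies(other.copies + 1), moves(other.moves) {}
    Tracker(Tracker&& other) noexcept : copies(other.copies), moves(other.moves + 1) {}
};

// Forwards each argument with its original value category:
// lvalues reach T's ctor as lvalues, rvalues as rvalues.
template <class T, class... Args>
std::unique_ptr<T> make_boxed(Args&&... args) {
    return std::unique_ptr<T>(new T(std::forward<Args>(args)...));
}
```

Passing a named Tracker should produce exactly one copy, and passing std::move of it exactly one move. Note the parentheses after new T: a braced initializer list could not be forwarded through args.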

CONCURRENCY

with the default launch policy std::async guarantees neither that the task runs concurrently nor that it runs at all: it may be deferred, in which case it is executed only when get or wait is called on the resulting future
volatile in C++ means special memory (memory-mapped IO, for instance) and disallows optimizations on this memory, e.g. reordering, deleting dead stores etc.; it provides no atomicity or thread synchronization
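
The deferred case can be made visible by requesting std::launch::deferred explicitly. This is a sketch; the function name is hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <future>

// True when a deferred task stays idle until get() is called.
inline bool deferred_task_is_lazy() {
    bool ran = false;
    auto fut = std::async(std::launch::deferred, [&ran] {
        ran = true;
        return 42;
    });

    const bool ran_before_get = ran;  // still false: nothing has executed yet
    // wait_for reports "deferred" immediately, without running the task.
    const bool is_deferred =
        fut.wait_for(std::chrono::seconds(0)) == std::future_status::deferred;

    const int value = fut.get();      // runs the task on the calling thread
    return !ran_before_get && is_deferred && ran && value == 42;
}
```

With the default policy the same wait_for check is a way to find out whether the task was deferred. Passing std::launch::async instead requests asynchronous execution.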

CUDA

device function limitations:
1. address cannot be taken
2. no recursion
3. no static variables
4. no varargs
variable specification limits:
1. no extern
2. __constant__ variables can be written only from the CPU, through special API functions
3. __shared__ variables may not be initialized in their declaration
data types on GPU: 1/2/3/4-dimensional vectors of (u)char, (u)int, (u)short, (u)long, long long, float and double
starting from the GeForce 8 series, NVIDIA GPUs natively support bit operations
max length 128 bit
double[][] – OK; double[][][] – NOK
dim3 – a uint3 with a ctor that sets unspecified components to 1, i.e. dim3(5) == [5,1,1]
factory for types: make_{type} e.g. make_int2(1,7)

kernel has access to:

dim3 gridDim; 
uint3 blockIdx;
dim3 blockDim;
uint3 threadIdx;
int warpSize;

CUDA initializes unspecified dim3 components to 1 by default

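how to run a kernel with nx threads in total:

float* data; // array ptr (device memory)
dim3 threads(256); // threads per block
dim3 blocks((nx + 255) / 256); // number of blocks, rounded up
// e.g. nx = 2560 defines a set of 10 blocks with length = 256
// if nx is not a multiple of 256, the kernel must skip indices >= nx
kernel<<<blocks, threads>>>(data);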

GAME DESIGN

prefer event driven architecture in any case
split the logic, application and view layers: the view presents logic state changes to users (a user can also be an AI); the application layer speaks with the hardware; the logic accumulates game state
low-end video cards tend to have DirectX drivers
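Ogg is an open format; FMOD is a quite capable framework for playing sounds
the standard general-purpose memory manager is often not efficient enough for games
the order in which n-dim arrays are accessed may affect performance. It depends on how arrays are stored in RAM (C++ arrays are row-major, so iterating over the last index in the innermost loop is cache-friendly)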
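the CPU reads and writes aligned data much faster, i.e. an int stored at 0x04 is much better than one at 0x0400…0002. An address is aligned when it is divisible by the alignment, e.g. 32 % 4 == 0, so address 32 sits on a 4-byte (and also an 8-byte) boundary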
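always align data members: the performance gain comes for free
do not rely on the compiler's automatic padding alone: it can waste a lot of memory. Order members manually (largest first), or use #pragma pack() to control packing, keeping in mind that packed members may end up misaligned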
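
A sketch of how member order changes the padding the compiler inserts. Naive, Reordered and payload_bytes are hypothetical names; exact sizes are implementation-defined, though on a typical 64-bit platform the two structs come out as 24 and 16 bytes.

```cpp
#include <cassert>
#include <cstddef>

// Members in "natural" order: padding is needed before and after the double.
struct Naive {
    char tag;
    double value;
    char flag;
};

// Same members, largest first: both chars share the tail of the struct.
struct Reordered {
    double value;
    char tag;
    char flag;
};

// Bytes actually needed by the data, without any padding.
inline std::size_t payload_bytes() {
    return sizeof(double) + 2 * sizeof(char);
}
```

Comparing sizeof(Naive) and sizeof(Reordered) against payload_bytes() shows how much of each struct is padding, without resorting to #pragma pack.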