Showing posts with label C. Show all posts

Thursday, 3 May 2018

Is low level programming still relevant these days?

The levels of abstraction have made the application programming much easier and faster. But everything comes at a price.

This is a new type of article here. I hope to write more like it, describing programming problems and the practices that solve them.




Unlimited Resources Myth


The overhead of various frameworks and libraries sometimes (or should I say always) makes memory usage and CPU utilization less efficient. Fast hardware and loads of RAM lead to sloppy coding practices too, luring the developers into the mirage of endless resources.

Sure thing, most developers would place a check now and then (e.g. making sure malloc does not return NULL, although most of the time this will not happen). But what happens when memory is fragmented? Large data structures will make fragmentation even worse, and allocating more memory will become slower and slower.
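As a minimal sketch of the kind of check mentioned above, here is a small allocation wrapper. The name alloc_array is hypothetical and only serves as an illustration; it is not part of the example program linked later in this article.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: allocate `count` zeroed elements of `elem_size` bytes.
   Returns NULL (after reporting the failure) when the request cannot be met. */
static void *alloc_array(size_t count, size_t elem_size) {
  void *p = calloc(count, elem_size);
  if (p == NULL) {
    fprintf(stderr, "allocation of %zu x %zu bytes failed\n", count, elem_size);
  }
  return p;
}
```

The caller still has to test the returned pointer and release it with free() when done; the wrapper only makes sure a failure never goes unnoticed.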

And yes, eventually the available RAM will shrink.



Basic Data Structure


Consider this data structure:

typedef struct{
  int id; //MAX id < 10000
  int array1[10000]; //MAX element value <= 4
  int array2[10000]; //MAX element value <= 4
} my_struct_type;

my_struct_type my_array[10000]; //the typedef name is used without the struct keyword

On a typical 64-bit machine an int is 4 bytes, so the structure will take 4 (id) + 4 x 10000 x 2 (array1,2) = 80004 bytes, and an array of 10000 such structures will occupy about 800MB (double that, about 1.6GB, where the fields are 8 bytes wide). This is a not-so-high level language. Just imagine what size the equivalent structure will be in Java or C#.
 


Improved Data Structure


Let's take a closer look at the struct. The arrays hold very small values, so it is natural to change the structure to use the smallest possible data type for the array elements:


#include <stdint.h> //uint8_t

typedef struct{
  int id; //MAX id <= 10000
  uint8_t array1[10000]; //MAX element value <= 4
  uint8_t array2[10000]; //MAX element value <= 4
} my_struct_type;

Now the size of the structure is 4 (id) + 1 x 10000 x 2 (array1,2) = 20004 bytes (with a 4-byte int). And the total size is 200MB. That is much more manageable.

But wait! What will happen if we need more structures, e.g. 100K? This will lead to 2GB - oops, too much again (to be honest, 200MB is too much anyway).

Most developers (I hope) know that a byte consists of 8 bits. From the data structure comments we can assume that an array element will not occupy more than 2 bits (strictly, 2 bits hold the values 0 to 3, so this assumes the value 4 itself never occurs). So why don't we use bitfields instead of a full uint8_t? Well, it's better not to: the layout of bitfields is compiler dependent and they are not that efficient (more on that later).

Theoretically we can reduce the data struct footprint to a quarter of its size. This is possible using bitmasks and bitwise operators. Let's do it step by step.




Optimized Data Structure


Start with the observation that array1 and array2 are of the same size. So we can put each element of array2 into the unused part of the matching array1 element:

typedef struct{
  int id; //MAX id <= 10000
  uint8_t array12[10000]; //two arrays combined
} my_struct_type;

So now the array element in its binary form will look like array12[0] = 0b0000BBAA, where AA is the array1 element and BB is the array2 element. So the size is halved. But there is still enough room to insert more data. Why don't we store one more element of each of array1 and array2 in the same element of array12, like this: array12[0] = 0bB1A1B0A0.

This way our struct becomes:

typedef struct{
  int id; //MAX id <= 10000
  uint8_t array12[5000]; //four values in one element
} my_struct_type;
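To check the sizes discussed so far on your own machine, here is a minimal sketch that declares the three layouts side by side. The names basic_t, bytes_t, packed_t, N_VALUES and total_bytes are hypothetical and used only for this illustration; the exact numbers depend on the size of int and on padding.

```c
#include <stddef.h>
#include <stdint.h>

#define N_VALUES 10000  /* elements per logical array, as in the article */

/* Original layout: one int per element. */
typedef struct {
  int id;
  int array1[N_VALUES];
  int array2[N_VALUES];
} basic_t;

/* Improved layout: one byte per element. */
typedef struct {
  int id;
  uint8_t array1[N_VALUES];
  uint8_t array2[N_VALUES];
} bytes_t;

/* Optimized layout: four 2-bit values per byte,
   so 2 * N_VALUES values fit into N_VALUES / 2 bytes. */
typedef struct {
  int id;
  uint8_t array12[N_VALUES / 2];
} packed_t;

/* Total bytes needed for `count` structures of the given size. */
static size_t total_bytes(size_t struct_size, size_t count) {
  return struct_size * count;
}
```

Printing sizeof(basic_t), sizeof(bytes_t) and sizeof(packed_t) with the %zu format, together with total_bytes(sizeof(packed_t), 10000), shows the footprint of each variant for your compiler.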

Now the memory footprint will be roughly 50MB. And this is not the only benefit of the current data structure. But before discussing it, let's consider the possible overhead of bit manipulation.

Data Packing Algorithm


To store the value val in the array's element el we need to perform the following operations:

1. Put the value in the correct binary position (val_pos is 0 for an array1 value and 2 for an array2 value):

val = val << val_pos;

2. Depending on the index value, shift the value to the left half of the byte:

val = val << (index % 2 ? 4 : 0);

The modulus can be replaced with the bitwise index & 1, and that bit can be shifted left by 2 to produce the shift amount (0 or 4), which gets rid of the conditional jump:

val = val << ((index & 1 ) << 2);

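3. Insert the value into the element (this assumes the target bits are still zero; to overwrite an existing value, clear them with a mask first):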

el |= val;

There are 5 bitwise operators. Bitwise operators are the simplest and most "inexpensive" ones for the CPU. This will take about the same time as (or only slightly more than) storing (copying) a value into an array.
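The steps above can be put together as a minimal sketch. The helper names slot_shift, packed_set and packed_get are hypothetical and not taken from the example program; the sketch assumes values in the range 0 to 3 and the 0bB1A1B0A0 byte layout described earlier. Unlike the plain el |= val, the setter clears the old bits first so a slot can be overwritten.

```c
#include <assert.h>
#include <stdint.h>

#define N_VALUES 10000  /* logical elements per array */

typedef struct {
  int id;
  uint8_t array12[N_VALUES / 2];  /* each byte: 0bB1A1B0A0 */
} my_struct_type;

/* Bit offset of a slot: val_pos is 0 for array1, 2 for array2;
   odd indexes live in the upper half of the byte. */
static unsigned slot_shift(unsigned index, unsigned val_pos) {
  return val_pos + ((index & 1u) << 2);
}

/* Store a 2-bit value, clearing whatever was in the slot before. */
static void packed_set(my_struct_type *s, unsigned index,
                       unsigned val_pos, uint8_t val) {
  assert(index < N_VALUES && val <= 3);
  unsigned shift = slot_shift(index, val_pos);
  uint8_t *el = &s->array12[index >> 1];      /* two indexes per byte */
  *el = (uint8_t)(*el & ~(3u << shift));      /* clear the old 2 bits */
  *el = (uint8_t)(*el | (val << shift));      /* insert the new value */
}

/* Read a 2-bit value back. */
static uint8_t packed_get(const my_struct_type *s, unsigned index,
                          unsigned val_pos) {
  assert(index < N_VALUES);
  return (uint8_t)((s->array12[index >> 1] >> slot_shift(index, val_pos)) & 3u);
}
```

For example, packed_set(&s, 1, 2, 3) stores 3 as element 1 of the logical array2, and packed_get(&s, 1, 2) reads it back.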

Faster Memory Allocation


As I mentioned earlier, the reduced memory footprint is not the only good thing about the new data structure. For the computer's memory management engine it is much easier to allocate lots of small memory chunks than lots of large memory chunks. By reducing the structure's memory size we increase the processing speed (possibly compensating for the additional CPU cycles taken by bitwise operators).

Sporadic nature of new data insertion


Another thing worth noting is that the addition of new elements does not happen all at once, but occurs more or less sporadically. The creation of a new element will be unnoticeable from the CPU's point of view, overshadowed by more CPU intensive operations. As for the occupied memory, there is no way around it, and at some stage the program will have to struggle with subsequent memory allocation due to RAM fragmentation (if using dynamic memory allocation, of course).



Example program


A working example can be found here: https://github.com/droukin-jobs/packer - a proof of concept that bitwise data placement is not much slower than the traditional approach and much more memory efficient.



Conclusion


Even with "unlimited" memory and CPU resources it is still important to keep track of your resource usage. How many times have you had to sit and wait until some program loads a new screen or processes a new request? (A good example here is SolidWorks - the behemoth is so slow even on the fastest systems despite being a well known and respected brand.) When programming it is good to create robust and easy to understand code, but after all testing is done, why not optimize some obvious parts - if you know how?

Wednesday, 18 October 2017

Digital DI Box 05 - Proof of concept


I finally made it to the proof of concept stage, where I can actually see if the Digital DI box is possible at all.

Test setup

Since I don't have (yet) another TQFP-44 socket to properly test the balancing/unbalancing of audio signal, I had to drive the inverted audio through the same MCU.


Audio IN (orig) ----> ADC0 -> Invert -> DAC0 -> Audio OUT (inverted)
                        |
                        +---------+
                                  |
                     Shift to adjust for time delay
                                  |
                                  v

Audio IN (inverted) -> ADC1 ---> Subtract -> DAC1 -> Audio OUT (final)

The time delay shift is tricky to measure precisely: there is no guarantee that the inverted signal is exactly one measuring cycle away from the original.
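The signal flow in the diagram can be sketched in plain C as below. This is only an illustration under stated assumptions, not the firmware used in the test: it assumes 12-bit unsigned samples centred on mid-scale, a delay of exactly one sample, and the hypothetical names ADC_MAX, ADC_MID, DELAY_LEN, clamp_adc, invert_sample, delay_push and combine.

```c
#include <stdint.h>

#define ADC_MAX   4095  /* assumed 12-bit converter range */
#define ADC_MID   2048  /* assumed zero-signal (mid-scale) level */
#define DELAY_LEN 1     /* assumed delay in samples; the real value needs measuring */

/* Keep a result inside the converter range. */
static uint16_t clamp_adc(int32_t v) {
  if (v < 0) return 0;
  if (v > ADC_MAX) return ADC_MAX;
  return (uint16_t)v;
}

/* "Invert" step: mirror the sample around mid-scale. */
static uint16_t invert_sample(uint16_t s) {
  return clamp_adc(2 * ADC_MID - (int32_t)s);
}

/* "Shift" step: a small ring buffer that returns the original
   sample from DELAY_LEN samples ago. */
typedef struct {
  uint16_t buf[DELAY_LEN];
  unsigned pos;
} delay_line_t;

static uint16_t delay_push(delay_line_t *d, uint16_t s) {
  uint16_t old = d->buf[d->pos];
  d->buf[d->pos] = s;
  d->pos = (d->pos + 1) % DELAY_LEN;
  return old;
}

/* "Subtract" step: (original - inverted) / 2, re-centred on mid-scale. */
static uint16_t combine(uint16_t orig, uint16_t inverted) {
  return clamp_adc(ADC_MID + ((int32_t)orig - (int32_t)inverted) / 2);
}
```

If the delay is wrong by even one sample, combine() subtracts samples taken at different moments, which is one way the distortion described in the results can arise.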

Results

The resulting signal was difficult to obtain without some tweaking of the ADC data.

First of all there is a DC bias imposed by the ADC itself. It varies over time, depending on many factors such as original audio volume, temperature, etc. The DC bias makes it hard to subtract the inverted signal from the original without overflowing, so it had to be adjusted manually. NB: use a low pass filter to capture the DC component on a separate ADC channel to remove it.
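The note above describes capturing the DC component with a hardware low pass filter on a separate ADC channel. A software variant of the same idea, shown here only as a sketch, is to track the DC level with a slow exponential average and subtract it from every sample. The names dc_tracker_t, dc_remove and DC_SHIFT are hypothetical, and the time constant of 256 samples is an arbitrary assumption.

```c
#include <stdint.h>

#define DC_SHIFT 8  /* assumed smoothing: time constant of 2^8 = 256 samples */

/* acc holds the DC estimate multiplied by 2^DC_SHIFT. */
typedef struct {
  int32_t acc;
} dc_tracker_t;

/* Update the slow average with the new sample and return the
   sample with the estimated DC level removed (signed result). */
static int16_t dc_remove(dc_tracker_t *t, uint16_t sample) {
  t->acc += (int32_t)sample - (t->acc >> DC_SHIFT);
  return (int16_t)((int32_t)sample - (t->acc >> DC_SHIFT));
}
```

The tracker starts at zero, so the first few hundred outputs still contain most of the bias; after that the output is centred on zero and follows slow drifts of the bias automatically.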

Secondly, the time delay between the inverted and original signals caused distortions in the output. An accurate measurement of the delay would help, although I am not sure how to perform it reliably. I may need to run a few tests with a predefined waveform on the DAC to understand the timing of the conversion.

Overall, if the above two problems get solved, I can postulate that the Digital DI Box is possible to implement with minimal external components. I will wait for the socket; with two MCUs I should be able to test balancing/unbalancing without (significant) time delay.

Monday, 16 October 2017

Digital DI Box 04 - direct ADC to DAC transfer


Direct transfer

After a lot of experimenting with DMA and timers I decided to strive for the best possible data transfer from ADC to DAC. For this I put the ADC into high priority interrupt mode and the rest at medium to low priority. To monitor the data I had to sacrifice continuous flow, because the computer's USB serial port was unable to cope with such loads. As a result the output quality is very good. I put a simple low pass RC filter in place to get rid of the high frequency noise caused by the ADC. Later I will fine tune it and install a low pass filter as well.

Communication via interrupts

Serial comms are now done through interrupts instead of polling. Sampling starts on receiving the respective command on the serial line, and all data goes into a uint16_t[512] array. Once the data is ready I get it in parts of 16 samples using a simple protocol. This causes some chirping sounds during playback, but this is only for data analysis, so it should be fine.

Sunday, 8 October 2017

Digital DI Box 03 - continuous data flow


Downsampling for better dataflow

Previously I concluded I will have to sample at 10KHz in order to keep up with the data flow. Right now I set the sampling rate at 7us - approximately 15KHz. The sound is not perfect (coming out of a 10bit DAC), mainly due to DAC noise and occasional ADC errors. And of course the low sample rate makes higher frequencies sound poor. Here is the graph of a YouTube video passed through the ATXmega controller and received in digital form at the same computer.

The red line is the actual signal, the blue line is the same signal averaged over the 5 past values. The graph is autoscaled after reading roughly 250 samples from the serial port every 100ms.
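The averaging used for the blue line can be sketched as a running average over the last 5 samples. This is an illustration in C with hypothetical names (avg5_t, avg_push, AVG_LEN), not the code that drew the graph.

```c
#include <stdint.h>

#define AVG_LEN 5  /* number of past values in the average */

typedef struct {
  uint16_t win[AVG_LEN];  /* the most recent samples */
  uint32_t sum;           /* sum of the samples currently in win */
  unsigned pos;           /* where the next sample is written */
  unsigned count;         /* samples seen so far, capped at AVG_LEN */
} avg5_t;

/* Add a sample and return the average of the last (up to) AVG_LEN samples. */
static uint16_t avg_push(avg5_t *a, uint16_t s) {
  if (a->count == AVG_LEN) {
    a->sum -= a->win[a->pos];  /* drop the oldest sample */
  } else {
    a->count++;
  }
  a->win[a->pos] = s;
  a->sum += s;
  a->pos = (a->pos + 1) % AVG_LEN;
  return (uint16_t)(a->sum / a->count);
}
```

Keeping a running sum means each new sample costs one subtraction, one addition and one division, regardless of the window length.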

Time problem

As usual the datasheet was near useless with the timer setup. I had to guess the timer register TCC0, and I reckon there will be more guessing on what is required to get the timer going. If I had a debugger I would probably be able to figure it out. Oh well, I may need to google again.

Digital DI Box 02 - ADC


ATXmega ADC

ATXmega32A4U has a 12bit ADC with the ability to perform up to 1M conversions per second. Since the audio signal is signed, I will only get 11 bits, but with such a fast rate I can oversample (I hope). Below is the initial buffered sample (approximately 1000 readings), drawn in a canvas and captured via the serial port at 2 Mbaud.

Due to the amount of data being transferred I could not send the bulk of the data and perform the conversion at the same time. My calculations show that sending a reasonable amount of data, enough to put it into a more or less pleasant visible form, will require at least 10us of CPU time. That means I will have to scan the ADC channel once in 10us. The ATXmega datasheet is quiet about the ADC maximum clock (I assumed 2us, but at what CPU clock?). Anyway, 2us is supposed to give me almost 50KHz sampling rate. 10us for sending data over the serial line will reduce the sampling rate down to 10KHz, unless I can oversample. This will have to wait: I need to get the continuous flow of data first, both for analysis purposes and to create a cool waveform on the computer screen :).

Thursday, 5 October 2017

Digital DI Box 01 - Intro


What is DI Box

In audio processing a DI Box converts an unbalanced signal into a balanced one, to improve audio transfer over longer distances. Normally this is done with a high quality transformer which inverts the incoming audio and sends it alongside the non-inverted original. Devices of this kind are quite expensive.

Why Digital?

Since ICs these days are much cheaper than many analogue parts, I will attempt to create a Digital DI box. I will convert the unbalanced audio signal into bits, invert it digitally and convert it back to analogue. (Sending pure digital data would require complex protocols and proprietary hardware, which defeats the purpose of the experiment.) At the other end I will digitize both streams and combine them back into unbalanced audio.

Plan

So far I will need the following:
  1. Practice with ADC on a relatively fast MC - e.g. ATXmega32A4U
  2. Establish high speed communication with the MC to ensure the data gets converted sufficiently
  3. Invert the digital audio and convert it to analogue
  4. Receive two audio streams simultaneously converting to digital
  5. Combine two signals into one and convert to analogue

Sunday, 17 September 2017

Power Control - Code


relay1.c file

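// Clock speed. It must be known before <util/delay.h> is included;
// the Makefile also passes -DF_CPU, so only define it when missing.
#ifndef F_CPU
#define F_CPU 8000000UL
#endif
#include <avr/io.h>
#include <util/delay.h>
#include <stdio.h>      // sprintf
#include <stdlib.h>     // itoa
#define BAUD 9600
#define MYUBRR (F_CPU/16/BAUD-1)
#define CMDERR 0x02
volatile unsigned char c;
volatile unsigned char i;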

void blink(unsigned char led){
        if (led) PORTB |= 0x02;
        else PORTB &= ~0x02;  
}

void USART_Init( unsigned int ubrr)
{
        /* Set baud rate. MYUBRR uses the normal speed (divide by 16) formula,
           so U2X (bit 1 of UCSRA on the ATmega8) stays cleared. */
        UCSRA &= ~(1<<U2X);
        UBRRH = ( unsigned char)(ubrr>>8);
        UBRRL = ( unsigned char)ubrr;
        /*Enable receiver and transmitter */
        UCSRB = (1<<RXEN)|(1<<TXEN);
        /* Set frame format: 8data, 2stop bit */
        UCSRC = (1<<URSEL)|(1<<USBS)|(3<<UCSZ0);
}
void USART_Transmit(  unsigned char data )
{
        /* Wait for empty transmit buffer */
        while ( !( UCSRA & (1<<UDRE)));
        /* Put data into buffer, sends the data */
        UDR = data;
}

void send_string(const char * s){
        unsigned int i=0;
        while(s[i]!='\0'){
                USART_Transmit(s[i]);
                i++;
        }
        USART_Transmit('\n');
        _delay_ms(100);
}
void send_number(unsigned char n, unsigned char radix){
        char s[9];      // up to 8 binary digits plus the terminating '\0'
        itoa(n,s,radix);
        send_string(s);
}

unsigned char USART_Receive( void ){
        /* Wait for data to be received */
        while ( !(UCSRA & (1<<RXC)) )
                ;
        /* Get and return received data from buffer */
        return  UDR;
}


int main(void)
{
        /* PD2..PD7 drive the relays; PD0/PD1 are left to the USART */
        DDRD |= ((1<<DDD2)|(1<<DDD3)|(1<<DDD4)|(1<<DDD5)|(1<<DDD6)|(1<<DDD7));
        USART_Init(MYUBRR);
        send_string("\n\nReady...");
        short n=0;
        short relay=0;
        char buffer[20];
        char message[20];
        while(1){
                c=0;
                /* read one line, keeping the last byte free for '\0' */
                while(c != '\n' && c != '\r' && n<19){
                        c=USART_Receive();
                        buffer[n]=c;
                        n++;
                }

                buffer[n]='\0';
                send_string("received:");
                send_string(buffer);
                for(i=0;i<n;i++){
                        if(buffer[i] == 'R' || buffer[i] == 'r'){
                                relay = buffer[i+1] - '0';
                                relay += 1;     //adjust for unused tx/rx port d
                                if (relay < 2 || relay > 7) continue; //not a relay pin, ignore
                                if (buffer[i] == 'r'){
                                        PORTD |= 1<<relay;
                                        sprintf(message,"portd %d on",relay);
                                        send_string(message);
                                        send_string("relay off");
                                }else{
                                        PORTD &= ~(1<<relay);
                                        sprintf(message,"portd %d off",relay);
                                        send_string(message);
                                        send_string("relay on");
                                }
                        }
                }
                n=0;
        }

}

Makefile

# sample makefile for programming AVR microcontrollers
# be sure to fill up the blank entries before you run this!

# the C source file, without extension
SOURCE = relay1

# device name
MCU = atmega8

# CPU speed, needed by <util/delay.h>
F_CPU = 8000000

# tools
CC = avr-gcc
SIZE = avr-size
OBJ = avr-objcopy
OBJD = avr-objdump
AVRDUDE = avrdude
RM = rm -f

# avr-gcc options
CC_OPT = s
CC_WARN = all
CC_LST = -Wa,-adhlns

# avrdude options
AVRDUDE_WRITE_FLASH = -U flash:w:$(SOURCE).hex
AVRDUDE_PROGRAMER = usbasp
AVRDUDE_PORT = 

# some strings for the UI
STR_BEGIN = Starting Build...
STR_CLEAN = Starting Clean...
STR_PROGRAM = Downloading Code...
STR_END = Done.

# general targets
build:
        $(CC) -mmcu=$(MCU) -W$(CC_WARN) -DF_CPU=$(F_CPU) -O$(CC_OPT) $(CC_LST)=$(SOURCE).lst $(SOURCE).c -o $(SOURCE).elf
        $(OBJD) -S $(SOURCE).elf > $(SOURCE).lss
        $(OBJ) -O ihex $(SOURCE).elf $(SOURCE).hex
size:
        $(SIZE) --mcu=$(MCU) --format=avr $(SOURCE).elf
clean_files:
        $(RM) $(SOURCE).elf
        $(RM) $(SOURCE).lst
        $(RM) $(SOURCE).lss
        $(RM) $(SOURCE).hex
download:
#       $(AVRDUDE) -c $(AVRDUDE_PROGRAMER) -p $(MCU) -P $(AVRDUDE_PORT) $(AVRDUDE_WRITE_FLASH) 
        $(AVRDUDE) -c $(AVRDUDE_PROGRAMER) -p $(MCU) -B 20 $(AVRDUDE_WRITE_FLASH) 

begin_all:
        @echo $(STR_BEGIN)
begin_clean:
        @echo $(STR_CLEAN)
begin_program:
        @echo $(STR_PROGRAM)
end:
        @echo $(STR_END)

# WINAVR targets
all: begin_all build size begin_program download end
clean: begin_clean clean_files end
program: begin_program download end
