DGL • View topic - Mini Collector

Mini Collector

Moderator: DGL-Team


yunharla

Post subject: Mini Collector

Posted: Tue Oct 07, 2014 20:47

DGL Member

Joined: Mon Nov 08, 2010 18:41
Posts: 769
Programming language: Yesterday

Yesterday I started replacing the Boehm garbage collector in my engine. The reason was that
I also wanted to use it in a few Visual Studio projects. Well, anyone who wants to try out
or extend my approach... have fun; for me it is good enough for now.

The whole thing is basically a better version of "alloca". The function ref_alloc allocates memory that is at least
"size" bytes long and returns a pointer to the start of that memory. The memory is freed automatically
once the pointer is no longer visible from the stack or from one of the registered roots.
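
To make "visible from the stack or a registered root" concrete, here is a minimal, portable sketch of a conservative scan. It is not the code from this post: it only walks a memory range word by word and reports whether any word equals a given address, which is the basic test such a collector applies to the stack and to each root. The names range_references and demo_root_scan are hypothetical, and the scan assumes a pointer-aligned range.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if any pointer-sized word in [start, end) holds the value `target`.
   Hypothetical helper; steps at pointer-size stride over an aligned range. */
static int range_references(const char *start, const char *end, const void *target) {
    assert(start <= end);
    for (const char *p = start; (size_t)(end - p) >= sizeof(void *); p += sizeof(void *)) {
        void *word;
        memcpy(&word, p, sizeof word); /* read the word without aliasing problems */
        if (word == target) {
            return 1;
        }
    }
    return 0;
}

/* Demo: a "root" array holds one pointer to `obj`, then drops it. */
static int demo_root_scan(void) {
    static int obj;
    void *root[4] = { NULL, &obj, NULL, NULL };
    int before = range_references((char *)root, (char *)(root + 4), &obj);
    root[1] = NULL; /* the last reference disappears */
    int after = range_references((char *)root, (char *)(root + 4), &obj);
    return before == 1 && after == 0;
}
```

A scan like this cannot tell a real pointer from an integer that happens to have the same bit pattern, so a conservative collector may keep some dead memory alive, but it should never free memory that is still referenced from the scanned ranges.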

Code:

 
//This is free and unencumbered software released into the public domain.
//
//Anyone is free to copy, modify, publish, use, compile, sell, or
//distribute this software, either in source code form or as a compiled
//binary, for any purpose, commercial or non-commercial, and by any
//means.
//
//In jurisdictions that recognize copyright laws, the author or authors
//of this software dedicate any and all copyright interest in the
//software to the public domain. We make this dedication for the benefit
//of the public at large and to the detriment of our heirs and
//successors. We intend this dedication to be an overt act of
//relinquishment in perpetuity of all present and future rights to this
//software under copyright law.
//
//THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
//EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
//MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
//IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
//OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
//ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
//OTHER DEALINGS IN THE SOFTWARE.
//
//For more information, please refer to <http://unlicense.org/>
 
 
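#include <Windows.h>
#include <stdbool.h> //bool, when compiled as C
#include <stdlib.h>  //malloc, free, atexit
#include <string.h>  //memset, memcpy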
 
typedef unsigned char byte;
 
 
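//byte-wise pointer arithmetic; arguments are parenthesized so expressions can be passed safely
#define PTR_RSHIFT(ptr,offset) ((void*)(((byte*)(ptr)) + (offset)))
#define PTR_LSHIFT(ptr,offset) ((void*)(((byte*)(ptr)) - (offset)))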
 
 
struct mem_block_s;
struct mem_zone_s;
struct gc_list_s;
 
//the following 2 data structures provide a 2 dimensional 
//dynamic ring-buffer for memory allocations. each allocation
//will search a zone that has a free block of required size, if
//no block was found a new zone is added to the zone-ring. If 
//the block can store more than the required size, it may
//be split by the allocator.
 
typedef struct mem_block_s {
    bool isfree;
    size_t size;
    struct mem_block_s * prev;
    struct mem_block_s * next;
    struct mem_zone_s * zone;
} mem_block_t;
 
typedef struct mem_zone_s {
    size_t size;
    size_t used;
    struct mem_zone_s * prev;
    struct mem_zone_s * next;
    mem_block_t list;
    mem_block_t * active;
} mem_zone_t;
 
//list structure used by the garbage collector
 
typedef struct gc_list_s {
    void * head; //current element
    size_t size; //size of the current element
    struct gc_list_s * next; //next element (if any)
} gc_list_t;
 
 
static char * stack_base = NULL; //lowest address of the stack
static char * stack_end = NULL; //highest address of the stack
static HANDLE gc_thread = NULL; //garbage collector thread
static HANDLE gc_lock = NULL; //garbage collector mutex
static gc_list_t * list_gc_mem = NULL; //list of all existing memory
static gc_list_t * list_found_mem = NULL; //list of memory marked by the GC
static gc_list_t * list_gc_roots = NULL; //list of additional GC roots
static mem_zone_t * active_zone = NULL; //zone with highest priority
static size_t mem_used = 0; //total amount of memory used by the application
static size_t mem_size = 0; //total amount of memory allocated by the zones
static size_t mem_count = 0; //total number of living allocations
//readonly public interface :-)
const size_t * m_used = &mem_used;
const size_t * m_size = &mem_size;
const size_t * m_count = &mem_count;
 
 
//some shortcuts for mutex handling
#define lock(handle) WaitForSingleObject(handle, INFINITE)
#define unlock(handle) ReleaseMutex(handle)
 
 
void zone_create(size_t size) {
    mem_zone_t * zone = NULL;
    if (size < 0xFFFFFF) {
        size = 0xFFFFFF;
    }
    size += sizeof(mem_block_t);
    zone = (mem_zone_t*)malloc(size + sizeof(mem_zone_t));
    mem_size += size;
    //initialize zone data by creating a new
    //ring-buffer of blocks
    mem_block_t * block;
    zone->list.next = zone->list.prev = block = (mem_block_t*)PTR_RSHIFT(zone, sizeof(mem_zone_t));
    zone->list.isfree = false;
    zone->list.size = 0;
    zone->active = block;
    zone->size = size;
    zone->used = sizeof(mem_block_t);
    block->prev = block->next = &(zone->list);
    block->isfree = true;
    block->size = size - sizeof(mem_zone_t);
    block->zone = zone->list.zone = zone;
    //add zone to the ring-buffer
    if (active_zone) {
        zone->next = active_zone;
        zone->prev = active_zone->prev;
        active_zone->prev = zone;
        zone->prev->next = zone;
    } else {
        zone->prev = zone->next = zone;
    }
    //give the new zone the highest priority as it is completely empty
    active_zone = zone;
}
 
 
void mem_free(void * ptr) {
    if (ptr) {
        mem_block_t * block;
        mem_zone_t * zone;
        block = (mem_block_t*)PTR_LSHIFT(ptr, sizeof(mem_block_t));
        zone = block->zone;
        if (block->isfree) { //nothing to do here
            return;
        }
        //update counters
        zone->used -= block->size;
        mem_count--;
        mem_used -= block->size;
        memset(ptr, 0xFF, block->size - sizeof(mem_block_t));
        block->isfree = true;
        //merge nodes so they form a larger block of memory
        while (block->prev->isfree) {
            block = block->prev;
            block->size += block->next->size;
            block->next = block->next->next; //unlink the absorbed block
            block->next->prev = block; //and point its successor back at the merged block
        }
        while (block->next->isfree) {
            block->size += block->next->size;
            block->next = block->next->next; //unlink the absorbed block
            block->next->prev = block; //and point its successor back at the merged block
        }
        //give the node the highest priority
        zone->active = block;
    }
}
 
void * mem_alloc(size_t size) {
    if (size) {
        mem_zone_t * zone = active_zone;
        mem_block_t * start;
        mem_block_t * pout;
        size += sizeof(mem_block_t);
        size = (size + 7) & ~7; //align size
        do {
            //search a zone that fits the required size
            pout = NULL;
            start = NULL;
            if (zone->size > (size + zone->used)) {
                pout = zone->active;
                start = pout->prev;
                do {
                    //search a block that fits the size and is free
                    if (start == pout) {
                        pout = NULL;
                        break;
                    }
                    if (pout->isfree && pout->size >= size) {
                        break;
                    }
                    pout = pout->next;
                } while (pout->size < size || !pout->isfree);
                if (pout) {
                    //we have a result so lets mark it as non-free
                    pout->isfree = false;
                    break;
                }
            }
            if (zone->next == active_zone) {
                zone_create(size);
            }
            zone = zone->next;
        } while (true);
        //split node if enough memory is left
        if (pout->size > (size + 128)) {
            mem_block_t * tmp = (mem_block_t*)PTR_RSHIFT(pout, size);
            tmp->size = pout->size - size;
            tmp->isfree = true;
            tmp->prev = pout;
            tmp->zone = pout->zone;
            tmp->next = pout->next;
            tmp->next->prev = tmp;
            pout->next = tmp;
            pout->size = size;
        }
        //update counters
        zone->used += pout->size;
        mem_used += pout->size;
        mem_count++;
        zone->active = pout->next;
        return PTR_RSHIFT(pout, sizeof(mem_block_t));
 
    }
    return NULL;
}
 
//checks if a list of memory is referenced 
//by the given range of memory.
gc_list_t * gc_mark(gc_list_t * check, char * start, char * end) {
    char * i = start;
    char ** p = NULL;
    gc_list_t * not_found = check; //initially all pointers are unmarked
    gc_list_t * cmem = NULL;
    gc_list_t * tmem = NULL;
 
    end -= sizeof(char*) - 1;
    check = list_found_mem;
 
    //walk through each possible pointer value of the range
    for (; i < end && not_found; i += 1) {
        p = (char**)i;
        if (*p) {
            cmem = not_found;
            not_found = NULL;
            while (cmem) { //update our lists
                tmem = cmem;
                cmem = cmem->next;
                if (*p == tmem->head) { //mark pointer
                    tmem->next = list_found_mem;
                    list_found_mem = tmem;
                } else { //leave pointer unmarked
                    tmem->next = not_found;
                    not_found = tmem;
                }
            }
        }
    }
    //check marked pointer for references to unmarked pointers
    if (list_found_mem != check) {
        cmem = list_found_mem;
 
        while (cmem && not_found) {
            not_found = gc_mark(not_found, (char*)cmem->head, (char*)cmem->head + cmem->size);
            cmem = cmem->next;
        }
    }
    //return unmarked pointers
    return not_found;
}
 
//main function of the gc thread
void gc_collect(void) {
    while (true) {
        lock(gc_lock);
        gc_list_t * tmp = list_gc_mem;
        gc_list_t * not_found = NULL;
        list_gc_mem = NULL;
        list_found_mem = NULL;
        not_found = gc_mark(tmp, stack_base, stack_end);
        tmp = list_gc_roots;
        while (tmp) {
            not_found = gc_mark(not_found, (char*)tmp->head, (char*)tmp->head + tmp->size);
            tmp = tmp->next;
        }
        while (list_found_mem) {
            tmp = list_found_mem;
            list_found_mem = list_found_mem->next;
            tmp->next = list_gc_mem;
            list_gc_mem = tmp;
        }
        while (not_found) {
            void * tmp = not_found;
            not_found = not_found->next;
            mem_free(tmp);
        }
 
        unlock(gc_lock);
        Sleep(2);
    }
}
 
//called when the main thread exits 
void mem_finish(void) {
    mem_zone_t * zone = active_zone;
    TerminateThread(gc_thread, 0);//stop gc thread
    //clear all allocated zones
    do {
        void * tmp = zone;
        zone = zone->next;
        free(tmp);
    } while (zone != active_zone);
}
 
 
//called when you use ref_alloc for the first time.
void mem_init(void) {
    //get addresses of the stack
    NT_TIB* tib = NULL;
#if _WIN64 
    unsigned __int64 tib_p = __readgsqword(0x30);
#else 
    unsigned __int32 tib_p = __readfsdword(0x18);
#endif
    memcpy(&tib, &tib_p, sizeof(tib_p));
    if (tib->StackBase < tib->StackLimit) {
        stack_base = (char*)tib->StackBase;
        stack_end = (char*)tib->StackLimit;
    } else {
        stack_base = (char*)tib->StackLimit;
        stack_end = (char*)tib->StackBase;
    }
    //create first zone with default size
    zone_create(0);
    //create GC mutex and thread
    gc_lock = CreateMutex(NULL, false, NULL);
    gc_thread = CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)& gc_collect, NULL, 0, 0);
 
}
 
void ref_push_root(void *ptr, size_t size) {
    lock(gc_lock);
    gc_list_t * item = (gc_list_t *)malloc(sizeof(gc_list_t));
    item->head = ptr;
    item->size = size;
    item->next = list_gc_roots;
    list_gc_roots = item;
    unlock(gc_lock);
}
 
 
//retrieves memory from a zone and updates the GC list
void * ref_alloc(size_t size) {
    if (stack_base == NULL) {
        mem_init();
        atexit(mem_finish);
    }
    if (size) {
        lock(gc_lock);
        gc_list_t * pout = (gc_list_t *)mem_alloc(size + sizeof(gc_list_t));
        if (pout) {
            pout->head = PTR_RSHIFT(pout, sizeof(gc_list_t));
            pout->size = size;
            memset(pout->head, 0, size);
            pout->next = list_gc_mem;
            list_gc_mem = pout;
            unlock(gc_lock);
            return pout->head;
        }
        unlock(gc_lock);
    }
    return NULL;
}
 
 


TAK2004

Post subject: Re: Mini Collector

Posted: Tue Oct 14, 2014 10:12

DGL Member

Joined: Tue May 18, 2004 16:45
Posts: 2621
Location: Berlin
Programming language: Go, C/C++

Looks interesting. In

Code:

//readonly public interface :-)
const size_t * m_used = &mem_used;
const size_t * m_size = &mem_size;
const size_t * m_count = &mem_count;

I would add a second const, after the *, so that neither the pointer nor the value can be modified.
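
For reference, the two const positions mean different things. The following small sketch shows both forms side by side; apart from the declaration style quoted above, all names in it (mem_used_demo, m_used_weak, m_used_strict, demo_bump) are hypothetical.

```c
#include <stddef.h>

static size_t mem_used_demo = 0;

/* pointer to const data: the value cannot be changed through this pointer,
   but the pointer itself could still be re-seated to another variable */
const size_t * m_used_weak = &mem_used_demo;

/* const pointer to const data: neither the pointer nor the value
   can be changed through this name */
const size_t * const m_used_strict = &mem_used_demo;

/* the owning module can still update the counter directly;
   readers observe the new value through the read-only pointer */
static size_t demo_bump(size_t n) {
    mem_used_demo += n;
    return *m_used_strict;
}
```

With the second form, an assignment such as `m_used_strict = NULL;` or `*m_used_strict = 0;` is rejected by the compiler, while the module that owns the counter keeps full write access.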

I don't like the mutex. This can surely be solved with atomic pointers instead, so that no high latencies occur when you allocate or free memory at the very moment the GC thread is cleaning up and holding the lock.
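
One possible reading of the atomic-pointer idea is a lock-free push onto the allocation list, sketched here with C11 <stdatomic.h>. This is an assumption about what was meant, not code from this thread: node_t, list_head, push_node and detach_all are hypothetical names, a C11 compiler with atomics support is required, and the allocator itself (the zone and block lists) would still need its own synchronization.

```c
#include <stdatomic.h>
#include <stddef.h>

typedef struct node_s {
    void *head;          /* start of the user memory */
    size_t size;         /* size of the user memory */
    struct node_s *next; /* next element (if any) */
} node_t;

static _Atomic(node_t *) list_head = NULL;

/* Push without a mutex: retry the compare-exchange until the head
   did not change between reading it and replacing it. */
static void push_node(node_t *item) {
    node_t *old = atomic_load_explicit(&list_head, memory_order_relaxed);
    do {
        item->next = old; /* `old` is refreshed by a failed compare-exchange */
    } while (!atomic_compare_exchange_weak_explicit(
                 &list_head, &old, item,
                 memory_order_release, memory_order_relaxed));
}

/* Collector side: take the whole list in one step, leaving it empty. */
static node_t *detach_all(void) {
    return atomic_exchange_explicit(&list_head, NULL, memory_order_acquire);
}
```

The collector can then walk the detached list privately and push the survivors back afterwards, so allocating threads only ever contend on a single compare-exchange instead of waiting for a whole collection cycle.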

In itself it looks very interesting, but unfortunately I can't think of a use case on my side to play around with it :\

_________________
"Those who give up freedom to gain security will lose both in the end"
Benjamin Franklin

Projects: https://github.com/tak2004


yunharla

Post subject: Re: Mini Collector

Posted: Tue Oct 14, 2014 19:20

DGL Member

Joined: Mon Nov 08, 2010 18:41
Posts: 769
Programming language: Yesterday

Hmm, yes; performance-wise I wouldn't change much there.
Here is the whole thing once more, plus the changes from my engine.

Code:

 
//This is free and unencumbered software released into the public domain.
//
//Anyone is free to copy, modify, publish, use, compile, sell, or
//distribute this software, either in source code form or as a compiled
//binary, for any purpose, commercial or non-commercial, and by any
//means.
//
//In jurisdictions that recognize copyright laws, the author or authors
//of this software dedicate any and all copyright interest in the
//software to the public domain. We make this dedication for the benefit
//of the public at large and to the detriment of our heirs and
//successors. We intend this dedication to be an overt act of
//relinquishment in perpetuity of all present and future rights to this
//software under copyright law.
//
//THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
//EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
//MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
//IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
//OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
//ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
//OTHER DEALINGS IN THE SOFTWARE.
//
//For more information, please refer to <http://unlicense.org/>
 
#include "runtime.h"
#include <Windows.h>
#include <omp.h>
 
#define BT_MEM_MAGIC "BTGCMEM"
#define BT_MEM_MAGIC_LEN 7
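//wrapped in do/while so the macro behaves as a single statement;
//the last byte terminates the tag
#define BT_MEM_SET_MAGIC(magic) \
    do {                        \
        (magic)[0] = 'B';       \
        (magic)[1] = 'T';       \
        (magic)[2] = 'G';       \
        (magic)[3] = 'C';       \
        (magic)[4] = 'M';       \
        (magic)[5] = 'E';       \
        (magic)[6] = 'M';       \
        (magic)[7] = '\0';      \
    } while (0)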
 
struct mem_block_s;
struct mem_zone_s;
struct gc_list_s;
 
//the following 2 data structures provide a 2 dimensional 
//dynamic ring-buffer for memory allocations. each allocation
//will search a zone that has a free block of required size, if
//no block was found a new zone is added to the zone-ring. If 
//the block can store more than the required size, it may
//be split by the allocator.
 
typedef struct mem_block_s {
    bool isfree;
    size_t size;
    void * self;
    f_thiscall finalizer; //finalizer that is to be called upon deallocation
    struct mem_block_s * prev;
    struct mem_block_s * next;
    struct mem_zone_s * zone;
} mem_block_t;
 
typedef struct mem_zone_s {
    size_t size;
    size_t used;
    struct mem_zone_s * prev;
    struct mem_zone_s * next;
    mem_block_t list;
    mem_block_t * active;
} mem_zone_t;
 
//list structure used by the garbage collector
 
typedef struct gc_list_s {
    void * head; //current element
    size_t size; //size of the current element
    bool marked; //was this item found by the GC?
    char magic[BT_MEM_MAGIC_LEN+1]; //magic number of this pointer
    struct gc_list_s * next; //next element (if any)
} gc_list_t;
 
 
static char * stack_base = NULL; //lowest address of the stack
static char * stack_end = NULL; //highest address of the stack
static HANDLE gc_thread = NULL; //garbage collector thread
static HANDLE gc_lock = NULL; //garbage collector mutex
static gc_list_t * list_gc_mem = NULL; //list of all existing memory
static gc_list_t * list_found_mem = NULL; //list of memory marked by the GC
static mem_zone_t * active_zone = NULL; //zone with highest priority
static size_t mem_used = 0; //total amount of memory used by the application
static size_t mem_size = 0; //total amount of memory allocated by the zones
static size_t mem_count = 0; //total number of living allocations
static size_t num_gc_threads = 0; //total number of available worker threads
static gc_list_t * class_finalizer = NULL;
static volatile bool mem_done = false; //set on shutdown so the GC thread leaves its loop
//readonly public interface :-)
const size_t * const m_used = &mem_used;
const size_t * const m_size = &mem_size;
const size_t * const m_count = &mem_count;
 
 
const int _fltused = 0; //required by the crt
 
 
//some shortcuts for mutex handling
#define lock(handle) WaitForSingleObject(handle, INFINITE)
#define unlock(handle) ReleaseMutex(handle)
 
bool zone_create(size_t size) {
    mem_zone_t * zone = NULL;
    if (size < 0xFFFFFF) {
        size = 0xFFFFFF;
    }
    size += sizeof(mem_block_t);
    zone = (mem_zone_t*)HeapAlloc(GetProcessHeap(), 0,size + sizeof(mem_zone_t));
    if (zone == NULL) {
        return false;
    }
    mem_size += size;
    //initialize zone data by creating a new
    //ring-buffer of blocks
    mem_block_t * block;
    zone->list.next = zone->list.prev = block = (mem_block_t*)PTR_RSHIFT(zone, sizeof(mem_zone_t));
    zone->list.isfree = false;
    zone->list.size = 0;
    zone->active = block;
    zone->size = size;
    zone->used = sizeof(mem_block_t);
    block->prev = block->next = &(zone->list);
    block->isfree = true;
    block->size = size - sizeof(mem_zone_t);
    block->zone = zone->list.zone = zone;
    //add zone to the ring-buffer
    if (active_zone) {
        zone->next = active_zone;
        zone->prev = active_zone->prev;
        active_zone->prev = zone;
        zone->prev->next = zone;
    } else {
        zone->prev = zone->next = zone;
    }
    //give the new zone the highest priority as it is completely empty
    active_zone = zone;
    return true;
}
 
void mem_fin(void * ptr, f_thiscall finalizer) {
    if (ptr) {
        gc_list_t * gc = PTR_LSHIFT(ptr, sizeof(gc_list_t));
        if (gc->head == ptr && mem_cmp(gc->magic, BT_MEM_MAGIC, BT_MEM_MAGIC_LEN) == 0) {
            mem_fin(gc, finalizer);
        } else {
            mem_block_t * block = PTR_LSHIFT(ptr, sizeof(mem_block_t));
            block->finalizer = finalizer;
        }
    }
}
 
void mem_free(void * ptr) {
    if (ptr) {
        gc_list_t * gc = PTR_LSHIFT(ptr, sizeof(gc_list_t));
        mem_block_t * block;
        mem_zone_t * zone;
 
        if (gc->head == ptr && mem_cmp(gc->magic, BT_MEM_MAGIC, BT_MEM_MAGIC_LEN) == 0) {
            mem_free(gc);
            return;
        }
        block = (mem_block_t*)PTR_LSHIFT(ptr, sizeof(mem_block_t));
        zone = block->zone;
        if (block->isfree) { //nothing to do here
            return;
        }
        if (block->finalizer && block->self) {
            block->finalizer(block->self);
            block->finalizer = NULL;
        }
        //update counters
        zone->used -= block->size;
        mem_count--;
        mem_used -= block->size;
        mem_set(ptr, 0xFF, block->size - sizeof(mem_block_t));
        block->isfree = true;
        //merge nodes so they form a larger block of memory
        while (block->prev->isfree) {
            block = block->prev;
            block->size += block->next->size;
            block->next = block->next->next; //unlink the absorbed block
            block->next->prev = block; //and point its successor back at the merged block
        }
        while (block->next->isfree) {
            block->size += block->next->size;
            block->next = block->next->next; //unlink the absorbed block
            block->next->prev = block; //and point its successor back at the merged block
        }
        //give the node the highest priority
        zone->active = block;
    }
}
 
void * mem_alloc(size_t size) {
    if (size) {
        mem_zone_t * zone = active_zone;
        mem_block_t * start;
        mem_block_t * pout;
        size += sizeof(mem_block_t);
        size = (size + 7) & ~7; //align size
 
        do {
            //search a zone that fits the required size
            pout = NULL;
            start = NULL;
            if (zone->size > (size + zone->used)) {
                pout = zone->active;
                start = pout->prev;
                do {
                    //search a block that fits the size and is free
                    if (start == pout) {
                        pout = NULL;
                        break;
                    }
                    if (pout->isfree && pout->size >= size) {
                        break;
                    }
                    pout = pout->next;
                } while (pout->size < size || !pout->isfree);
                if (pout) {
                    //we have a result so lets mark it as non-free
                    pout->isfree = false;
                    break;
                }
            }
            if (zone->next == active_zone) {
                if (!zone_create(size)) {
                    return NULL;
                }
            }
            zone = zone->next;
        } while (true);
        //split node if enough memory is left
        if (pout->size > (size + 128)) {
            mem_block_t * tmp = (mem_block_t*)PTR_RSHIFT(pout, size);
            tmp->size = pout->size - size;
            tmp->isfree = true;
            tmp->prev = pout;
            tmp->zone = pout->zone;
            tmp->next = pout->next;
            tmp->next->prev = tmp;
            pout->next = tmp;
            pout->size = size;
        }
        //update counters
        zone->used += pout->size;
        mem_used += pout->size;
        mem_count++;
        zone->active = pout->next;
        pout->finalizer = NULL;
        pout->self = PTR_RSHIFT(pout, sizeof(mem_block_t));
        return pout->self;
 
    }
    return NULL;
}
 
//checks if a list of memory is referenced 
//by the given range of memory.
 
gc_list_t * gc_mark(gc_list_t * check, char * start, char * end) {
    INT64 i = (INT64)(INT_PTR)start; //convert via INT_PTR; subtracting NULL from a pointer is not valid
    INT64 s = i;
    INT64 e = (INT64)(INT_PTR)end - (INT64)(sizeof(char*) - 1);
    gc_list_t * imem = NULL;
    gc_list_t * tmp = NULL;
    gc_list_t * not_found = NULL;
    bool has_found = false;
    if (check) {
        //check all content for a pointer in our list
#pragma omp parallel for
        for (i = s; i < e; i += sizeof(void*)) {
            void * p = *((void**)i);
            if (p) {
                gc_list_t * cmem = check;
                while (cmem) {
                    if (p >= cmem->head && p <= PTR_RSHIFT(cmem->head, cmem->size)) {
                        cmem->marked = true;
                        has_found = true;
                    }
                    cmem = cmem->next;
                }
            }
        }
    }
    if (has_found) {
        imem = check;
        while (imem) {
            tmp = imem;
            imem = imem->next;
            if (tmp->marked) {
                tmp->next = list_found_mem;
                list_found_mem = tmp;
            } else {
                tmp->next = not_found;
                not_found = tmp;
            }
        }
        imem = list_found_mem;
        while (imem && not_found) {
            //scan the whole payload: the range ends at head + size, not at header + size
            not_found = gc_mark(not_found, (char*)imem->head, (char*)PTR_RSHIFT(imem->head, imem->size));
            imem = imem->next;
        }
    } else {
        return check;
    }
    return not_found;
}
 
//main function of the gc thread
 
void gc_collect(void) {
    while (!mem_done) {
        lock(gc_lock);
        gc_list_t * tmp = list_gc_mem;
        gc_list_t * not_found = NULL;
        list_gc_mem = NULL;
        list_found_mem = NULL;
        bool not_full = false;
        if (not_full) {
            unlock(gc_lock);
        }
        not_found = gc_mark(tmp, stack_base, stack_end);
        while (tmp) {
            not_found = gc_mark(not_found, (char*)tmp->head, (char*)tmp->head + tmp->size);
            tmp = tmp->next;
        }
        if (not_full) {
            lock(gc_lock);
        }
        while (list_found_mem) {
            tmp = list_found_mem;
            tmp->marked = false;
            list_found_mem = list_found_mem->next;
            tmp->next = list_gc_mem;
            list_gc_mem = tmp;
        }
        while (not_found) {
            void * tmp = not_found;
            not_found = not_found->next;
            mem_free(tmp);
        }
 
        unlock(gc_lock);
        Sleep(2);
    }
}
 
//called when the main thread exits 
 
void mem_finish(void) {
    mem_zone_t * zone = active_zone;
    mem_done = true;
    lock(gc_thread);
    while (list_gc_mem) {
        gc_list_t * next = list_gc_mem->next; //read before mem_free overwrites the header with 0xFF
        mem_free(list_gc_mem);
        list_gc_mem = next;
    }
    do {
        void * tmp = zone;
        zone = zone->next;
        HeapFree(GetProcessHeap(),0,tmp);
    } while (zone != active_zone);
}
 
 
//called when you use ref_alloc for the first time.
 
void mem_init(void) {
    //get addresses of the stack
    NT_TIB* tib = NULL;
#if _WIN64 
    tib = (NT_TIB*)(__readgsqword(0x30));
#else 
    tib = (NT_TIB*)(__readfsdword(0x18));
#endif
    if (tib->StackBase < tib->StackLimit) {
        stack_base = (char*)tib->StackBase;
        stack_end = (char*)tib->StackLimit;
    } else {
        stack_base = (char*)tib->StackLimit;
        stack_end = (char*)tib->StackBase;
    }
    num_gc_threads = 1;
#ifdef _OPENMP
    int iCPU = omp_get_num_procs();
    if (iCPU > 1) {
        num_gc_threads = iCPU - 1;
    } 
    omp_set_num_threads(num_gc_threads);
#endif
    //create first zone with default size
    zone_create(0);
    //create GC mutex and thread
    gc_lock = CreateMutex(NULL, false, NULL);
    gc_thread = CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)& gc_collect, NULL, 0, 0);
    Sleep(100); //give some time for startup
}
 
//retrieves memory from a zone and updates the GC list
 
void * mem_allocs(size_t size) {
    if (size) {
        lock(gc_lock);
        gc_list_t * pout = (gc_list_t *)mem_alloc(size + sizeof(gc_list_t) + 8);
        if (pout) {
            pout->head = PTR_RSHIFT(pout, sizeof(gc_list_t));
            pout->size = size;
            mem_set(pout->head, 0, size + 8);
            BT_MEM_SET_MAGIC(pout->magic);
            ((mem_block_t*)PTR_LSHIFT(pout, sizeof(mem_block_t)))->self = pout->head;
            pout->marked = false;
            pout->next = list_gc_mem;
            list_gc_mem = pout;
            unlock(gc_lock);
            return pout->head;
        }
        unlock(gc_lock);
    }
    return NULL;
}
 
size_t mem_len(void* ptr) {
    if (ptr) {
        gc_list_t * l = (gc_list_t*)PTR_LSHIFT(ptr, sizeof(gc_list_t));
        return l->size;
    }
    return 0;
}
 
 
 
void before_exit(f_callback fun) {
    //allocate the whole list node, not just the size of a pointer
    gc_list_t * tmp = (gc_list_t *)HeapAlloc(GetProcessHeap(), 0, sizeof(gc_list_t));
    if (tmp == NULL) {
        return;
    }
    tmp->head = (void*)fun;
    tmp->next = class_finalizer;
    class_finalizer = tmp;
}
 
bool main(char * args);
 
 
 
void __cdecl mainCRTStartup() {
    int mainret;
    timeBeginPeriod(1);
    mem_init();
    mainret = main(GetCommandLine());
    mem_finish();
    gc_list_t * tmp = NULL;
    f_callback fun = NULL;
    while (class_finalizer) {
        tmp = class_finalizer;
        ((f_callback)(tmp->head))();
        class_finalizer = class_finalizer->next;
        HeapFree(GetProcessHeap(), 0, tmp);
    }
    timeEndPeriod(1);
    ExitProcess(mainret);
}
 
void __cdecl WinMainCRTStartup() {
    mainCRTStartup();
}
 


yunharla

Post subject: Re: Mini Collector

Posted: Sat Dec 27, 2014 14:56

DGL Member

Joined: Mon Nov 08, 2010 18:41
Posts: 769
Programming language: Yesterday

Fresh out of the oven (hence no comments):

Code:

 
 
struct memroot_t {
    thread_id_t tid;
    char * min;
    char * max;
    memroot_t * rnext;
};
 
struct memhdr_t : memroot_t {
    bool isused;
    bool mark;
    memhdr_t * next;
    size_t size;
};
 
static int num_roots = 0;
static memroot_t * roots = NULL;
static memhdr_t * pool = NULL;
static lock_t gclock;
 
void GCCollect_IMP(void) {
    if (pool && num_roots) {
        memhdr_t * cur = pool;
        char * ps = pool->min;
        char * pe = ps + MEMORY_POOL_SIZE;
        memroot_t * root = NULL;
        memroot_t * troot = NULL;
        int mc = 0;
        int tc = 0;
        int r = 0;
        while (cur) {
            if (cur->isused) {
                cur->mark = true;
                cur->isused = false;
                ++mc;
            }
            cur = cur->next;
        }
        for (; r < num_roots; r++) {
            roots[r].rnext = root;
            root = &(roots[r]);
        }
        while (root && mc) {
            char * c = root->min;
            char * e = root->max;
            troot = root;
            while (c < e && mc) {
                char * ptr = *((char**)c++);
                if (ptr) {
                    if (ptr >= ps && ptr <= pe) {
                        cur = pool;
                        while (cur) {
                            if (cur->mark) {
                                if (ptr >= cur->min && ptr <= cur->max) {
                                    cur->isused = true;
                                    cur->mark = false;
                                    --mc;
                                    tc++;
                                    cur->rnext = root;
                                    root = cur;
                                }
                            }
                            cur = cur->next;
                        }
                    }
                }
            }
            if (troot == root) {
                root = root->rnext;
            }
        }
    }
}
 
 
void GCCollect(void) {
    gclock.lock();
    GCCollect_IMP();
    gclock.unlock();
}
 
void GCRegisterRoot(thread_id_t id, char * min, char * max) {
    gclock.lock();
    for (int i = 0; i < num_roots; i++) {
        if (memcmp(&(roots[i].tid), &id, sizeof(thread_id_t)) == 0) {
            roots[i].min = min;
            roots[i].max = max - sizeof(char*);
            gclock.unlock();
            return;
        }
    }
    if (num_roots % 10 == 0) {
        int nr = (num_roots / 10 + 1) * 10 + 1;
        memroot_t * tmp = roots;
        roots = (memroot_t*)calloc(nr, sizeof(memroot_t));
        if (tmp) {
            memcpy(roots, tmp, sizeof(memroot_t)*num_roots);
            free(tmp);
        }
    }
    roots[num_roots].min = min;
    roots[num_roots].max = max - sizeof(char*);
    memcpy(&(roots[num_roots].tid), &id, sizeof(thread_id_t));
    roots[num_roots++].rnext = NULL;
    gclock.unlock();
}
 
void GCUnregisterRoot(thread_id_t id) {
    gclock.lock();
    for (int i = 0; i < num_roots; i++) {
        if (memcmp(&(roots[i].tid), &id, sizeof(thread_id_t))) {
            continue;
        }
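        //close the gap, stopping at the last valid entry
        for (int j = i; j < num_roots - 1; j++) {
            memcpy(&(roots[j]), &(roots[j + 1]), sizeof(memroot_t));
        }
        num_roots--; //only shrink the list when the id was actually found
        break;
    }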
    gclock.unlock();
}
 
void * GCAlloc(size_t size) {
    if (pool) {
        if (size) {
            int tries = 0;
            size += (sizeof(char*)) - (size % sizeof(char*));
            gclock.lock();
            for (; tries < 2; ++tries) {
                memhdr_t * cur = pool;
                while (cur) {
                    if (!cur->isused) {
                        cur->mark = false;
                        while (cur->next && !cur->next->isused) {
                            cur->size += sizeof(memhdr_t);
                            cur->size += cur->next->size;
                            cur->next = cur->next->next;
                        }
                        if (cur->size > size) {
                            size_t left = cur->size - size;
                            if (left > sizeof(memhdr_t)) {
                                memhdr_t * split = (memhdr_t*)(cur->min + size);
                                split->next = cur->next;
                                split->size = left - sizeof(memhdr_t);
                                split->isused = false;
                                split->min = ((char*)split) + sizeof(memhdr_t);
                                split->mark = false;
                                split->max = split->min + split->size - sizeof(char*);
                                cur->size = size;
                                cur->next = split;
                                cur->max = cur->min + cur->size - sizeof(char*);
                            }
                            cur->isused = true;
                            gclock.unlock();
                            return (void*)cur->min;
                        }
                    }
                    cur = cur->next;
                }
                GCCollect_IMP();
            }
            gclock.unlock();
        }
    } else {
        pool = (memhdr_t*)calloc(MEMORY_POOL_SIZE,1);
        if (pool) {
            pool->size = MEMORY_POOL_SIZE - sizeof(memhdr_t);
            pool->min = ((char*)pool) + sizeof(memhdr_t);
            pool->isused = false;
            pool->max = pool->min + pool->size;
            return GCAlloc(size);
        }
    }
    return NULL;
}
 
void GCFree(void * ptr) {
    if (ptr) {
        gclock.lock();
        memhdr_t * hdr = (memhdr_t*)((char*)ptr - sizeof(memhdr_t));
        hdr->isused = false;
        gclock.unlock();
    }
}
 

By the way, I implemented the locks with InterlockedCompareExchange.
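
For readers without the Win32 headers at hand, here is a minimal sketch of what such a compare-and-swap lock can look like. It assumes std::atomic as a portable stand-in for InterlockedCompareExchange, and the name spin_lock is a placeholder, not the lock_t from the listings, so treat it as an illustration rather than the original implementation.

```cpp
#include <atomic>
#include <thread>

// Hypothetical spin lock; 0 means free, 1 means held.
struct spin_lock {
    std::atomic<long> state{0};

    // Single attempt: swap 0 -> 1 and report whether that succeeded.
    bool trylock() {
        long expected = 0;
        return state.compare_exchange_strong(expected, 1,
                                             std::memory_order_acquire);
    }

    // Spin until the swap succeeds, yielding between attempts.
    void lock() {
        while (!trylock()) {
            std::this_thread::yield();
        }
    }

    // Release the lock by storing 0 again.
    void unlock() {
        state.store(0, std::memory_order_release);
    }
};
```

The three members mirror the lock(), trylock() and unlock() calls used in the collector code; whether the original lock yields or busy-waits is not stated in the thread.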

In my code the thread id is defined as follows:

Code:

 
struct thread_id_t {
#ifdef _WIN64
    long long tid;
    long long fid;
#else 
    long tid;
    long fid;
#endif
};
 
NT_TIB * tib = NULL;
#if _WIN64 
    tib = (NT_TIB*)(__readgsqword(0x30));
#else 
    tib = (NT_TIB*)(__readfsdword(0x18));
#endif
    if (tib->StackBase < tib->StackLimit) {
        self->stack_min = (char*)tib->StackBase;
        self->stack_max = (char*)tib->StackLimit;
    } else {
        self->stack_max = (char*)tib->StackBase;
        self->stack_min = (char*)tib->StackLimit;
    }
#if _WIN64 
    self->id.tid = (long long) GetCurrentThreadId();
    self->id.fid = (long long) tib->FiberData; //fiber id goes into fid, not tid
#else
    self->id.tid = (long) GetCurrentThreadId();
    self->id.fid = (long) tib->FiberData; //fiber id goes into fid, not tid
#endif
 

_________________
My homepage


TAK2004

Post subject: Re: Mini Collector

Posted: Sat Dec 27, 2014 18:56

DGL Member

Joined: Tue May 18, 2004 16:45
Posts: 2621
Location: Berlin
Programming language: Go, C/C++

Looks much cleaner than before.

_________________
"Those who give up freedom to gain security will end up losing both"
Benjamin Franklin

Projects: https://github.com/tak2004


yunharla

Post subject: Re: Mini Collector

Posted: Sun Dec 28, 2014 13:20

DGL Member

Joined: Mon Nov 08, 2010 18:41
Posts: 769
Programming language: Yesterday

There is still quite a bit missing from what I had planned, though.


TAK2004

Post subject: Re: Mini Collector

Posted: Sun Dec 28, 2014 14:36

DGL Member

Joined: Tue May 18, 2004 16:45
Posts: 2621
Location: Berlin
Programming language: Go, C/C++

That's normal.
When I think about everything I still want to do on the Radon Framework, Radon Converter and Zero Prime, I'd really have to hire a few more programmers.

You could, for example, optimize performance, memory usage or readability, and there are few limits there ^^
You could also run PVS-Studio over it.


yunharla

Post subject: Re: Mini Collector

Posted: Sun Dec 28, 2014 20:15

DGL Member

Joined: Mon Nov 08, 2010 18:41
Posts: 769
Programming language: Yesterday

Well, I'm currently working on a non-blocking variant. I think that's a must-have when it comes to multithreading.


yunharla

Post subject: Re: Mini Collector

Posted: Sun Dec 28, 2014 23:00

DGL Member

Joined: Mon Nov 08, 2010 18:41
Posts: 769
Programming language: Yesterday

OK, I've now given each node its own lock. The results are really much more dramatic now. In my small test benchmark, the previous version manages about 20,000 to 30,000 objects per second on my machine. This version still manages just as many... with one thread. With several threads, however, it can climb to as much as 150,000.

Code:

 
//BroodTech ZweigEngine
//created by Alex 'Yunharla' Schmidt
//see http://www.broodtech.de
 
#include "shared.h"
 
struct memrng_t {
    char * min;
    char * max;
    memrng_t * gcnext;
};
 
struct memroot_t : memrng_t {
    thread_id_t tid;
};
 
struct memhdr_t : memrng_t, public lock_t {
    bool isused;
    bool mark;
    size_t size;
    lock_t plock;
    memhdr_t * hskip;
    memhdr_t * hnext; 
};
 
lock_t rlock = lock_t();
static int num_roots = 0;
static memroot_t *  roots = NULL;
static memhdr_t * pool = NULL;
static char * ps = NULL;
static char * pe = NULL;
int num_objects = 0;
 
void GCInit(void) {
    pool = (memhdr_t*)calloc(MEMORY_POOL_SIZE, 1);
    pool->min = ((char*)pool) + sizeof(memhdr_t);
    pool->size = MEMORY_POOL_SIZE - sizeof(memhdr_t);
    pool->max = pool->min + pool->size - sizeof(char*);
    ps = pool->min;
    pe = pool->max;
    pool->unlock();
}
 
void GCShutdown(void) {
    free(pool);
    pool = NULL;
    ps = pe = NULL;
}
 
void GCRegisterRoot(thread_id_t id, char * min, char * max) {
    rlock.lock();
    for (int i = 0; i < num_roots; i++) {
        if (memcmp(&(roots[i].tid), &id, sizeof(thread_id_t)) == 0) {
            roots[i].min = min;
            roots[i].max = max - sizeof(char*);
            rlock.unlock();
            return;
        }
    }
    if (num_roots % 10 == 0) {
        if (num_roots == 0) {
            GCInit();
        }
        int nr = (num_roots / 10 + 1) * 10 + 1;
        memroot_t * tmp = roots;
        roots = (memroot_t*)calloc(nr, sizeof(memroot_t));
        if (tmp) {
            memcpy(roots, tmp, sizeof(memroot_t)*num_roots);
            free(tmp);
        }
    }
    roots[num_roots].min = min;
    roots[num_roots].max = max - sizeof(char*);
    memcpy(&(roots[num_roots].tid), &id, sizeof(thread_id_t));
    roots[num_roots++].gcnext = NULL;
    rlock.unlock();
}
 
void GCUnregisterRoot(thread_id_t id) {
    rlock.lock();
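    if (num_roots == 1) {
        GCShutdown();
        free(roots);
        roots = NULL;
        num_roots = 0; //otherwise the next GCRegisterRoot would skip GCInit
    } else {
        for (int i = 0; i < num_roots; i++) {
            if (memcmp(&(roots[i].tid), &id, sizeof(thread_id_t))) {
                continue;
            }
            //close the gap, stopping at the last valid entry
            for (int j = i; j < num_roots - 1; j++) {
                memcpy(&(roots[j]), &(roots[j + 1]), sizeof(memroot_t));
            }
            num_roots--; //only shrink the list when the id was actually found
            break;
        }
    }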
    rlock.unlock();
}
 
void GCCollect(void) {
    rlock.lock();
    if (pool && num_roots) {
        memhdr_t * cur = pool;
        memhdr_t * prev = NULL;
        memrng_t * root = NULL;
        memrng_t * troot = NULL;
        int mc = 0;
        int tc = 0;
        while (cur) {
            if (cur->isused) {
                cur->lock();
                if (prev) {
                    prev->hskip = cur;
                }
                prev = cur;
                cur->mark = true;
                cur->isused = false;
                ++mc;
            }
            cur = cur->hnext;
        }
        for (int r = 0; r < num_roots; r++) {
            roots[r].gcnext = root;
            root = &(roots[r]);
        }
        while (root && mc) {
            char * c = root->min;
            char * e = root->max;
            troot = root;
            while (c < e && mc) {
                char * ptr = *((char**)c++);
                if (ptr) {
                    if (ptr >= ps && ptr <= pe) {
                        cur = pool;
                        prev = NULL;
                        while (cur) {
                            if (cur->mark) {
                                if (ptr >= cur->min && ptr <= cur->max) {
                                    cur->isused = true;
                                    cur->mark = false;
                                    cur->unlock();
                                    --mc;
                                    tc++;
                                    if (prev) {
                                        prev->hskip = cur->hskip;
                                    }
                                    cur->gcnext = root;
                                    root = cur;
                                } else {
                                    prev = cur;
                                }
                            }
                            cur = cur->hskip;
                        }
                    }
                }
            }
            if (troot == root) {
                root = root->gcnext;
            }
        }
 
        num_objects = tc;
        cur = pool;
        while (cur) {
            if (cur->mark) {
                cur->mark = false;
                cur->unlock();
            }
            cur = cur->hskip;
        }
    }
    rlock.unlock();
}
 
void * GCAlloc(size_t size) {
    if (size) {
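        bool first = true;
        memhdr_t * cur = NULL;
        //pad to the pointer size once; after the label the request would grow again on retry
        size += (sizeof(char*)) - (size % sizeof(char*));
    second:
        cur = pool; //restart at the head of the pool on every attempt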
        while (cur) {
            if (cur->isused) {
                cur = cur->hnext;
                continue;
            } else if(cur->trylock()) {
                while (cur->hnext) {
                    if (cur->hnext->isused) {
                        break;
                    } else if (cur->hnext->trylock()) {
                        cur->size += sizeof(memhdr_t);
                        cur->size += cur->hnext->size;
                        cur->hnext = cur->hnext->hnext;
                    } else {
                        break;
                    }
                }
                if (cur->size > size) {
                    size_t left = cur->size - size;
                    if (left > sizeof(memhdr_t)) {
                        memhdr_t * split = (memhdr_t*)(cur->min + size);
                        split->unlock();
                        split->lock();
                        split->hnext = cur->hnext;
                        split->size = left - sizeof(memhdr_t);
                        split->isused = false;
                        split->min = ((char*)split) + sizeof(memhdr_t);
                        split->mark = false;
                        split->max = split->min + split->size - sizeof(char*);
                        cur->size = size;
                        cur->hnext = split;
                        cur->max = cur->min + cur->size - sizeof(char*);
                        split->unlock();
                    }
                    cur->isused = true;
                    cur->unlock();
                    num_objects++;
                    return (void*)cur->min;
                }
                cur->unlock();
            }
            cur = cur->hnext;
        }
        if (first) {
            first = false;
            GCCollect();
            goto second;
        }
    }
    return NULL;
}
 
void GCFree(void * ptr) {
    if (ptr) {
        memhdr_t * hdr = (memhdr_t*)((char*)ptr - sizeof(memhdr_t));
        hdr->isused = false;
    }
}
 
void * operator new(size_t size){
    return GCAlloc(size);
}
 
void * operator new[](size_t size) {
    return GCAlloc(size);
}
 
void operator delete(void * ptr) {
    GCFree(ptr);
}
 
void operator delete[](void * ptr) {
    GCFree(ptr);
}
 
 


TAK2004

Post subject: Re: Mini Collector

Posted: Mon Dec 29, 2014 01:08

DGL Member

Joined: Tue May 18, 2004 16:45
Posts: 2621
Location: Berlin
Programming language: Go, C/C++

You could also replace the memcpy and memcmp functions with the variants from Agner Fog's asmlib.
It uses the highest SIMD level available on the respective CPU to speed this up.


yunharla

Post subject: Re: Mini Collector

Posted: Mon Dec 29, 2014 14:43

DGL Member

Joined: Mon Nov 08, 2010 18:41
Posts: 769
Programming language: Yesterday

Fix the bugs first, right?

[edit]
OK, everything is running stably again. I was able to speed up the scan considerably once more. On average it now stays at about 100,000 (large) objects per second. Depending on how full the GC is and which settings you use, a lot more is possible.

With the following settings the BTGE runs at about 50,000 objects per second; before it was 10,000 (Boehm only 2,000):
#define MEMORY_POOL_SIZE (1024 * 1024 * 1024) //defines the length of each memory pool
#define NUM_SCAN_BUFFER 8
#define NUM_OBJECT_SLOWDOWN 400

So, a success across the board (at least for me).

Code:

 
 
 
//This is free and unencumbered software released into the public domain.
//
//Anyone is free to copy, modify, publish, use, compile, sell, or
//distribute this software, either in source code form or as a compiled
//binary, for any purpose, commercial or non - commercial, and by any
//means.
//
//In jurisdictions that recognize copyright laws, the author or authors
//of this software dedicate any and all copyright interest in the
//software to the public domain.We make this dedication for the benefit
//of the public at large and to the detriment of our heirs and
//successors.We intend this dedication to be an overt act of
//relinquishment in perpetuity of all present and future rights to this
//software under copyright law.
//
//THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
//EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
//MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
//IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
//OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
//ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
//OTHER DEALINGS IN THE SOFTWARE.
//
//For more information, please refer to <http://unlicense.org/>
 
#include "shared.h"
//a simple linked list used by the garbage collector to build and
//handle searches for references. Each reference found by the GC, will
//be added to the list so we can search for children.
struct memrng_t {
    char * min;
    char * max;
    memrng_t * gcnext;
};
 
struct memroot_t : memrng_t {
    thread_id_t tid;
};
 
//the following defines a ring-buffer for memory allocations. Each
//node contains a lock so we can use it for "none-blocking" garbage
//collection.
struct memhdr_t : memrng_t {
    size_t size; //includes header and fragments
    bool isunused;
    memhdr_t * next;
    memhdr_t * skip;
    lock_t plock;
};
 
static lock_t rlock = lock_t();
volatile int num_roots = 0;
static memroot_t *  roots = NULL;
static memhdr_t * pool = NULL;
static memhdr_t * pcur = NULL;
static char * ps = NULL;
static char * pe = NULL;
volatile int num_objects = 0;
volatile int num_deallocs = 0;
 
void GCCollect(void) {
    if (rlock.trylock()) {
        memhdr_t * cur = pool;
        memhdr_t * prev = NULL;
        memhdr_t * fnd[NUM_SCAN_BUFFER + 1];
        char * min[NUM_SCAN_BUFFER + 1];
        char * max[NUM_SCAN_BUFFER + 1];
        memrng_t * check = NULL;
        char * ptr = NULL;
        int mc = 0;
        int tc = 0;
        bool found = false;
        memset(max, 0, sizeof(max));
        memset(min, 0xFF, sizeof(min)); //memset only uses the low byte of its value
        memset(fnd, 0, sizeof(fnd));
        size_t l = (pe - ps) / NUM_SCAN_BUFFER + 1;
        size_t h = 0;
        //walk through each node and mark used nodes
        //for garbage collection. Each node will be
        //locked. It becomes unlocked when the next
        //is locked or when the operation is completed.
        do {
            cur->plock.lock();
            if (prev) {
                prev->plock.unlock();
                prev = NULL;
            }
            if (cur->isunused) {
                prev = cur;
            } else {
                h = ((size_t)(cur->min - ps)) / l; //bucket index relative to the pool start
                if (cur->min < min[h]) {
                    min[h] = cur->min;
                }
                if (cur->max > max[h]) {
                    max[h] = cur->max;
                }
                cur->skip = fnd[h];
                fnd[h] = cur;
                cur->isunused = true;
                ++mc;
            }
            cur = cur->next;
        } while (cur != pool);
 
        if (prev) {
            prev->plock.unlock();
            prev = NULL;
        }
        //The following searches each node of a list for
        //references to previously allocated memory. Each
        //list item represents a range for memory from which
        //each byte is searched for references to a block of
        //memory.
        for (int r = 0; r < num_roots; r++) {
            roots[r].gcnext = check;
            check = &(roots[r]);
        }
        tc = mc;
        while (check && tc) {
            char * c = check->min;
            char * e = check->max;
            while (c < e && tc) {
                ptr = *((char**)c++);
                if (ptr < ps || ptr > pe) {
                    continue; //cannot point into the pool
                }
                h = ((size_t)(ptr - ps)) / l;
                found = false;
                //a block is filed under the bucket of its first byte, so
                //the owner of ptr sits in bucket h or in a lower one
                for (int i = (int)h; i >= 0 && !found; i--) {
                    if (ptr >= min[i] && ptr <= max[i]) {
                        cur = fnd[i];
                        prev = NULL;
                        while (cur) {
                            //compare possible pointer with the values
                            //of the current node
                            if (ptr >= cur->min && ptr <= cur->max) {
                                cur->isunused = false;
                                --tc;
                                if (prev) {
                                    prev->skip = cur->skip;
                                } else {
                                    fnd[i] = cur->skip;
                                }
                                cur->gcnext = check->gcnext;
                                check->gcnext = cur;
                                cur->plock.unlock();
                                //there cannot be multiple references from a single
                                //pointer so lets skip the rest
                                found = true;
                                break;
                            } else {
                                prev = cur;
                            }
                            cur = cur->skip;
                        }
                    }
                }
            }
            if (check == check->gcnext) {
                system("pause");
            }
            check = check->gcnext;
        }
 
        num_objects = mc - tc;
        mc = 0;
        for (int i = 0; i < NUM_SCAN_BUFFER; i++) {
            while (fnd[i]) {
                prev = fnd[i];
                fnd[i] = fnd[i]->skip;
                prev->plock.unlock();
                mc++;
            }
        }
        num_deallocs = mc;
    } else {
        rlock.lock(); //just wait when gc is in progress
    }
    rlock.unlock();
}
 
void * GCAlloc(size_t size) {
    if (size) {
        bool firstry = true;
        if (num_objects > NUM_OBJECT_SLOWDOWN) { //slow down allocator when dealing with many objects
            Sleep(num_objects / NUM_OBJECT_SLOWDOWN);
        }
        size += sizeof(memhdr_t); //add block header size
        size += sizeof(char*);
        size = (size + (sizeof(char*) - 1)) & ~(sizeof(char*) - 1); //align
    secondtry:
        //you might want to use pool instead of pcur
        //when dealing with many small objects as it runs
        //a bit smoother in this case. 
        memhdr_t * start = pcur;
        memhdr_t * cur = start;
        memhdr_t * prev = NULL;
        do {
            if (cur->plock.trylock()) {
                if (cur->isunused) {
                    //merge coherent blocks
                    while (cur < cur->next && cur->next->plock.trylock()) {
 
                        if (cur->next->isunused) {
                            cur->size += cur->next->size;
                            cur->next = cur->next->next;
                        } else {
                            cur->next->plock.unlock();
                            break;
                        }
                    }
                    if (cur->size >= size) {
                        size_t left = cur->size - size;
                        //split the block if its big enough to hold another value
                        if (left > (sizeof(memhdr_t) + 64)) {
                            memhdr_t * split = (memhdr_t*)(((char*)cur) + size);
                            split->size = left;
                            split->isunused = true;
                            split->next = cur->next;
                            split->min = ((char*)split) + sizeof(memhdr_t);
                            split->max = ((char*)split) + split->size - sizeof(char*);
                            split->plock.unlock();
                            cur->size = size;
                            cur->max = ((char*)cur) + cur->size - sizeof(char*);
                            cur->next = split;
                            pcur = split;
                        }
                        cur->isunused = false;
                        cur->plock.unlock();
                        num_objects++;
                        return (void*)cur->min;
                    }
                }
                prev = cur;
                cur = cur->next;
                prev->plock.unlock();
            } else {
                cur = cur->next;
            }
        } while (cur != start);
        if (firstry) {
            firstry = false;
            GCCollect();
            goto secondtry;
        }
    }
    return NULL;
}
 
 
void GCInit(void) {
    pool = (memhdr_t*)calloc(MEMORY_POOL_SIZE + sizeof(memhdr_t) + 1, 1);
    if (pool) {
        pool->size = MEMORY_POOL_SIZE;
        pool->next = pool;
        pool->isunused = true;
        ps = pool->min = ((char*)pool) + sizeof(memhdr_t);
        pe = pool->max = pool->min + MEMORY_POOL_SIZE - sizeof(char*);
        pool->plock.unlock();
    }
    pcur = pool;
    num_objects = 0;
}
 
void GCShutdown(void) {
    free(pool);
    pool = NULL;
    pcur = NULL;
    ps = pe = NULL;
    num_objects = 0;
}
 
void GCRegisterRoot(thread_id_t id, char * min, char * max) {
    rlock.lock();
    for (int i = 0; i < num_roots; i++) {
        if (memcmp(&(roots[i].tid), &id, sizeof(thread_id_t)) == 0) {
            roots[i].min = min;
            roots[i].max = max - sizeof(char*);
            rlock.unlock();
            return;
        }
    }
    if (num_roots % 10 == 0) {
        if (num_roots == 0) {
            GCInit();
        }
        int nr = (num_roots / 10 + 1) * 10 + 1;
        memroot_t * tmp = roots;
        roots = (memroot_t*)calloc(nr, sizeof(memroot_t));
        if (tmp) {
            memcpy(roots, tmp, sizeof(memroot_t)*num_roots);
            free(tmp);
        }
    }
    roots[num_roots].min = min;
    roots[num_roots].max = max - sizeof(char*);
    memcpy(&(roots[num_roots].tid), &id, sizeof(thread_id_t));
    roots[num_roots++].gcnext = NULL;
    rlock.unlock();
}
 
void GCUnregisterRoot(thread_id_t id) {
    rlock.lock();
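    if (num_roots == 1) {
        GCShutdown();
        free(roots);
        roots = NULL;
        num_roots = 0; //otherwise the next GCRegisterRoot would skip GCInit
    } else {
        for (int i = 0; i < num_roots; i++) {
            if (memcmp(&(roots[i].tid), &id, sizeof(thread_id_t))) {
                continue;
            }
            //close the gap, stopping at the last valid entry
            for (int j = i; j < num_roots - 1; j++) {
                memcpy(&(roots[j]), &(roots[j + 1]), sizeof(memroot_t));
            }
            num_roots--; //only shrink the list when the id was actually found
            break;
        }
    }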
    rlock.unlock();
}
 
void GCFree(void * ptr) {
    if (ptr) {
        memhdr_t * hdr = (memhdr_t*)(((char*)ptr) - sizeof(memhdr_t));
        if (rlock.trylock()) {
            if (hdr->min == ptr && hdr->plock.trylock()) {
                hdr->isunused = true; //mark the block as free again
                hdr->plock.unlock();
            }
            rlock.unlock(); //always release, even if the block was not touched
        }
    }
}
 
void * operator new(size_t size){
    return GCAlloc(size);
}
 
void * operator new[](size_t size) {
    return GCAlloc(size);
}
 
void operator delete(void * ptr) {
    GCFree(ptr);
}
 
void operator delete[](void * ptr) {
    GCFree(ptr);
}
 
 


OpenglerF

Post subject: Re: Mini Collector

Posted: Tue Dec 30, 2014 15:33

DGL Member

Joined: Thu Dec 29, 2011 19:40
Posts: 421
Location: Germany, Bavaria
Programming language: C++, C, D, C# VB.Net

If I were you, I'd simply profile it.
Also, my guess is that it would take significant pressure off the cache if you stopped using a linked list altogether, or alternatively grouped a fixed number of objects into each node.

If I were you, I'd also look into using atomics instead of "volatile". That "volatile" works for this is not guaranteed by the standard. Unless you want to support very old compiler versions, atomics are a safer and also considerably more flexible alternative. I'd also look at hoisting the "volatile" or atomic accesses out of loops and caching them in a local variable beforehand (for example in "for (int r = 0; r < num_roots; r++)"). The compiler cannot do that itself, because "volatile" forces it to reload the variable every time instead of keeping it in a register.
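
As a rough illustration of that suggestion, the sketch below replaces a volatile counter with std::atomic<int> and loads it once before the loop. The names root_count and total_root_bytes are made up for the example and do not appear in the collector code.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical counter standing in for the volatile num_roots.
static std::atomic<int> root_count{0};

// Sums the sizes of all registered ranges. The atomic is read once into a
// local, so the loop condition does not reload it on every iteration.
std::size_t total_root_bytes(const std::size_t* sizes) {
    const int n = root_count.load(std::memory_order_acquire);
    std::size_t total = 0;
    for (int r = 0; r < n; ++r) {
        total += sizes[r];
    }
    return total;
}
```

Caching the count is only safe while nothing else changes the list during the loop, which in the listings would mean holding rlock for the duration.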

Nach oben

yunharla

Post subject: Re: Mini Collector

Posted: Tue Dec 30, 2014 19:58

DGL Member

Joined: Mon Nov 08, 2010 18:41
Posts: 769
Programming language: Yesterday

In my experience so far, that depends on the particular application: on how heavily the individual threads fragment everything, how large your objects are, and so on.

You'd probably have to test the whole thing for a while with real applications. What did help a lot for all of my test subj... er, friends, though, is the following addition:

Code:

 
cur = (memhdr_t*)(ptr - sizeof(memhdr_t));
if (cur->min == ptr && cur->isunused) {
    --tc;
    cur->isunused = false;
    cur->gcnext = check->gcnext;
    check->gcnext = cur;
    cur->plock.unlock();
    break;
}
 

Most people don't move the pointer once it has been created, so no range checks over the list are needed here ... it might even be better to build a kind of perfect hash that gets by without any lists, arrays and the like.
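
The same fast path can be written as a standalone helper. This is only a sketch under a few assumptions: block_header is a cut-down placeholder for memhdr_t, header_if_exact is a made-up name, and the bounds check plus the memcpy read are additions so that an arbitrary scanned value is never dereferenced outside the pool or at a misaligned address.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical, reduced block header: only what the fast path needs.
struct block_header {
    char* min;      // first payload byte of this block
    bool  isunused; // set while marking, cleared once a reference is found
};

// Returns the header directly in front of ptr if ptr is exactly the payload
// start of a block, otherwise nullptr. pool_begin/pool_end delimit the pool.
block_header* header_if_exact(char* ptr, char* pool_begin, char* pool_end) {
    const std::uintptr_t p  = reinterpret_cast<std::uintptr_t>(ptr);
    const std::uintptr_t lo = reinterpret_cast<std::uintptr_t>(pool_begin)
                              + sizeof(block_header);
    const std::uintptr_t hi = reinterpret_cast<std::uintptr_t>(pool_end);
    if (p < lo || p >= hi) {
        return nullptr; // a header in front of ptr would lie outside the pool
    }
    char* candidate = ptr - sizeof(block_header);
    char* stored_min = nullptr;
    // Read the min field bytewise; candidate may not be suitably aligned.
    std::memcpy(&stored_min, candidate + offsetof(block_header, min),
                sizeof(stored_min));
    return stored_min == ptr ? reinterpret_cast<block_header*>(candidate)
                             : nullptr;
}
```

If the helper returns nullptr, the scan would fall back to the bucketed range check, so interior pointers are still found.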

By the way, you don't need to worry about the registered roots. No bottleneck is likely to appear there any time soon ... at least not if someone still has 3 brain cells.


TAK2004

Post subject: Re: Mini Collector

Posted: Tue Dec 30, 2014 20:47

DGL Member

Joined: Tue May 18, 2004 16:45
Posts: 2621
Location: Berlin
Programming language: Go, C/C++

I know that newer CPUs (Intel i5/i7, AMD K10) can resolve indirect pointers.
That was an optimization to speed up object-oriented languages such as C#, C++ and Java, because their objects usually involve a pointer (to the object) to a pointer (vtable/RTTI). As a result these CPUs are a few percent faster with any kind of software.
I haven't looked into it any further than knowing that it exists, but perhaps it also kicks in for linked lists.
In that case you could save yourself an optimization.

I can only recommend running AMD CodeXL on an AMD system.
They have added a lot for Intel CPUs, but on AMD you also get the cache efficiency for the individual levels as well as further low-level profiling information.
At work I got an AMD system specifically for GPU and CPU profiling and debugging, while the others have an Intel & NV system.

As for GC, I'd perhaps take a look at Java; JRockit, for example, is said to have a very good GC. Maybe they have papers on their experience. JRockit Mission Control, for example, shows me that they are well aware of what you can profile and optimize in a GC.

It would be useful to run it in a release build against malloc/new/new[].
If they reach the same performance, I wouldn't worry about it any further.
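
A comparison like that could start from a small timing helper such as the one below. It is only a sketch: time_alloc_free is a made-up name, and the workload (allocate, touch one byte, free) is an assumption that will not reflect how a real application fragments the pool.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdlib>

// Times `count` allocate/free pairs of `size` bytes through the given
// function pointers and returns the elapsed time in seconds.
double time_alloc_free(void* (*alloc_fn)(std::size_t), void (*free_fn)(void*),
                       std::size_t count, std::size_t size) {
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < count; ++i) {
        void* p = alloc_fn(size);
        if (p) {
            // Touch the block so the pair is less likely to be optimized away.
            *static_cast<volatile char*>(p) = 1;
        }
        free_fn(p);
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Calling it once with std::malloc and std::free and once with GCAlloc and GCFree, in a release build and with the same count and size, gives two numbers that can be compared directly.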


OpenglerF

Post subject: Re: Mini Collector

Posted: Tue Dec 30, 2014 22:13

DGL Member

Joined: Thu Dec 29, 2011 19:40
Posts: 421
Location: Germany, Bavaria
Programming language: C++, C, D, C# VB.Net

To be honest, I rather doubt that the optimization in newer processors helps here. Such optimizations may soften the problem slightly, but they will hardly eliminate it. For one thing, linked lists are not just double pointers but "endless pointers"; for another, the scattering and waste of cache is also a problem.

Once again I'll point to the Clang developer's talk: https://www.youtube.com/watch?v=fHNmRkzxHWs&t=34m41s
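
To make the cache argument a bit more concrete, here is a sketch of a node that holds several entries at once, so a traversal follows one pointer per group instead of one per object. All names (chunk_node, range_entry, kEntriesPerNode, find_range) and the group size of 16 are assumptions for the example; whether it is actually faster would have to be measured.

```cpp
#include <cstddef>

// Hypothetical group size; the best value depends on the cache line size.
constexpr std::size_t kEntriesPerNode = 16;

// One tracked memory range, inclusive on both ends.
struct range_entry {
    char* min;
    char* max;
};

// Several entries sit next to each other in one node.
struct chunk_node {
    range_entry entries[kEntriesPerNode];
    std::size_t used = 0;       // number of valid entries
    chunk_node* next = nullptr; // next group, or nullptr at the end
};

// Returns the entry whose range contains ptr, or nullptr if none does.
const range_entry* find_range(const chunk_node* head, const char* ptr) {
    for (const chunk_node* n = head; n; n = n->next) {
        for (std::size_t i = 0; i < n->used; ++i) {
            if (ptr >= n->entries[i].min && ptr <= n->entries[i].max) {
                return &n->entries[i];
            }
        }
    }
    return nullptr;
}
```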
