Matthias Bitzer

A Bridge from Kurt Gödel to Zen Buddhism

2024-12-01T00:00:00+00:00

This article explores the fascinating philosophical connections between Kurt Gödel’s mathematical discoveries and Zen Buddhist thought.

Introduction

Kurt Gödel’s incompleteness theorems, published in 1931, fundamentally changed our understanding of the limits of formal systems and mathematical reasoning. These insights from mathematical logic find an unexpected resonance in the ancient wisdom of Zen Buddhism.

Gödel’s Incompleteness

Gödel proved that no consistent formal system that can express basic arithmetic, and whose axioms can be effectively listed, can also be complete. There will always be statements about the natural numbers that are true but cannot be proven within the system itself. This revolutionary insight revealed inherent limitations in our formal approaches to truth.
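
For readers who want the claim in symbols, the first theorem can be sketched as follows. The notation is an assumption introduced here, not taken from Gödel's paper: T stands for a consistent, effectively axiomatized theory containing basic arithmetic, Prov_T for its arithmetized provability predicate, and G_T for the self-referential sentence constructed for T.

```latex
% Requires the amssymb package for \ulcorner, \urcorner and \nvdash.
% G_T "says of itself" that it is not provable in T:
\[
  T \vdash G_T \leftrightarrow \neg\,\mathrm{Prov}_T(\ulcorner G_T \urcorner)
  \qquad\Longrightarrow\qquad
  T \nvdash G_T .
\]
% If T is moreover \omega-consistent (or if Rosser's variant of the sentence is used):
\[
  T \nvdash \neg G_T .
\]
```

Together the two lines say that T decides neither G_T nor its negation, which is what "incomplete" means here.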

The Zen Perspective

Zen Buddhism has long taught that ultimate truth cannot be captured in words or concepts. The famous Zen saying goes: “The finger pointing at the moon is not the moon.” Language and logic, while useful tools, cannot fully encompass reality.

The Bridge

Both Gödel and Zen point to something beyond the reach of formal systems:

Self-reference paradoxes: Gödel used self-reference to construct unprovable truths. Zen koans often employ paradox to push the mind beyond conceptual thinking.
Limits of language: Gödel showed mathematical limitations; Zen emphasizes the limitations of all conceptual frameworks.
Direct experience: Where formal systems fail, both traditions suggest a more direct approach to understanding.

Conclusion

The bridge between Gödel and Zen reminds us that the deepest truths may lie beyond our formal systems of thought, a humbling insight for mathematicians and meditators alike.

How to use data version control (dvc) in a machine learning project

2019-07-01T00:00:00+00:00

Data Version Control (DVC) is an essential tool for machine learning projects. This guide shows you how to integrate it into your workflow.

Introduction

Managing data and model versions in machine learning projects can be challenging. Unlike code, datasets can be large and frequently updated. DVC (Data Version Control) solves this by providing Git-like version control for data.

Why DVC?

Version control for large files: Track datasets without bloating your Git repository
Reproducibility: Ensure experiments can be reproduced with exact data versions
Pipeline management: Define and track your ML pipelines
Storage agnostic: Works with S3, GCS, Azure, SSH, and more

Getting Started

Installation

pip install "dvc[s3]"  # the s3 extra adds support for the S3 remote configured later

Initialize DVC in your project

cd your-ml-project
git init
dvc init

Add your data

dvc add data/training_data.csv
git add data/training_data.csv.dvc data/.gitignore
git commit -m "Add training data"
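
DVC identifies a tracked file by a hash of its contents and commits a small pointer file to Git in place of the data. The sketch below illustrates that idea only; it is not DVC's implementation. The helper names `file_md5` and `pointer_text` are hypothetical, and the field layout is loosely modeled on a .dvc file.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pointer_text(path: Path) -> str:
    """Build a small YAML-like pointer for the file (illustrative layout only)."""
    return (
        "outs:\n"
        f"- md5: {file_md5(path)}\n"
        f"  size: {path.stat().st_size}\n"
        f"  path: {path.name}\n"
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "training_data.csv"
        data.write_text("x,y\n1,2\n")
        print(pointer_text(data))
```

Because the pointer depends only on the file's contents, two commits that reference the same hash refer to the same data, regardless of when or where the file was added.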

Setting up Remote Storage

dvc remote add -d myremote s3://mybucket/dvc-storage
git add .dvc/config
git commit -m "Configure DVC remote"

Working with DVC

Push data to remote

dvc push

Pull data from remote

dvc pull

Track changes

When your data changes:

dvc add data/training_data.csv
git add data/training_data.csv.dvc
git commit -m "Update training data"
dvc push
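
Once data changes are committed this way, a specific version can also be read from Python through the `dvc.api` module. The following is a minimal sketch, assuming it runs inside the DVC project and that the tracked file is CSV with a header row. `parse_rows` and `load_training_rows` are hypothetical helper names.

```python
from __future__ import annotations

import csv
import io


def parse_rows(csv_text: str) -> list[dict[str, str]]:
    """Parse CSV text into a list of row dictionaries keyed by the header."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def load_training_rows(rev: str | None = None) -> list[dict[str, str]]:
    """Read data/training_data.csv as of Git revision `rev` (None = current workspace)."""
    import dvc.api  # imported lazily so parse_rows works without DVC installed

    text = dvc.api.read("data/training_data.csv", rev=rev)
    return parse_rows(text)
```

For example, `load_training_rows("HEAD~1")` would return the rows as they were one commit earlier, provided that version is available in the local cache or the remote.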

DVC Pipelines

Define reproducible ML pipelines in dvc.yaml:

stages:
  prepare:
    cmd: python src/prepare.py
    deps:
      - src/prepare.py
      - data/raw
    outs:
      - data/prepared
  
  train:
    cmd: python src/train.py
    deps:
      - src/train.py
      - data/prepared
    outs:
      - models/model.pkl
    metrics:
      - metrics.json:
          cache: false

Run the pipeline:

dvc repro
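
dvc repro compares the current state of each stage's dependencies with what was recorded in dvc.lock and reruns only the stages that changed. The sketch below shows that staleness check in a simplified form. It is an illustration rather than DVC's code: it hashes plain files only (no directories), uses SHA-256, and stores the hashes in a hypothetical JSON lock file.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def fingerprint(paths):
    """Map each dependency path to the SHA-256 of its current contents."""
    return {str(p): hashlib.sha256(Path(p).read_bytes()).hexdigest() for p in paths}


def is_stale(deps, lock_file):
    """A stage is stale if no lock exists or any dependency hash differs from it."""
    lock_file = Path(lock_file)
    if not lock_file.exists():
        return True
    return json.loads(lock_file.read_text()) != fingerprint(deps)


def run_if_stale(deps, lock_file, action):
    """Run `action` only when the stage is stale, then record the new hashes."""
    if not is_stale(deps, lock_file):
        return False
    action()
    Path(lock_file).write_text(json.dumps(fingerprint(deps)))
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "raw.txt"
        raw.write_text("v1")
        lock = Path(tmp) / "prepare.lock"
        print(run_if_stale([raw], lock, lambda: print("running prepare")))
        print(run_if_stale([raw], lock, lambda: print("running prepare")))
```

The second call skips the action because nothing changed, which is the behavior that makes repeated pipeline runs cheap.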

Best Practices

Always commit .dvc files alongside code changes
Use meaningful commit messages that describe data changes
Set up CI/CD to automatically run dvc repro
Document your data sources and preprocessing steps

Conclusion

DVC bridges the gap between data science and software engineering best practices. By integrating it into your workflow, you’ll achieve better reproducibility and collaboration in your ML projects.

How to extend Python with C/C++ Code

2019-01-01T00:00:00+00:00

Python’s simplicity comes at a performance cost. Learn how to extend Python with C/C++ for critical code paths.

Introduction

Python is beloved for its readability and ease of use, but the overhead of the CPython interpreter limits performance in tight loops. When you need maximum speed for compute-intensive operations, extending Python with C or C++ is a powerful solution.

When to Use C Extensions

CPU-bound computations: Mathematical operations, algorithms
Interfacing with C libraries: Using existing C/C++ code
Performance-critical paths: When Python becomes a bottleneck
Memory-intensive operations: More control over memory management
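
Before porting anything to C, it helps to confirm where the time actually goes. The sketch below uses the standard-library profiler for that. `slow_sum_of_squares` is a hypothetical stand-in for a hot code path, and `profile_call` is a small helper written for this example.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately naive pure-Python loop standing in for a hot code path."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


if __name__ == "__main__":
    value, report = profile_call(slow_sum_of_squares, 200_000)
    print(value)
    print(report)
```

Functions that dominate the cumulative-time column are the candidates worth moving into an extension. Everything else is usually better left in Python.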

Method 1: Python C API

The traditional approach using Python’s C API:

Example: A simple add function

#define PY_SSIZE_T_CLEAN
#include <Python.h>

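static PyObject* add(PyObject* self, PyObject* args) {
    int a, b;
    /* Parse two C ints from the argument tuple. On failure an exception
       is already set, so returning NULL propagates it to Python. */
    if (!PyArg_ParseTuple(args, "ii", &a, &b)) {
        return NULL;
    }
    /* Widen before adding so the sum cannot overflow int. */
    return PyLong_FromLongLong((long long)a + (long long)b);
}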

static PyMethodDef methods[] = {
    {"add", add, METH_VARARGS, "Add two integers"},
    {NULL, NULL, 0, NULL}
};

static struct PyModuleDef module = {
    PyModuleDef_HEAD_INIT,
    "mymodule",
    NULL,
    -1,
    methods
};

PyMODINIT_FUNC PyInit_mymodule(void) {
    return PyModule_Create(&module);
}

Build with setup.py

from setuptools import setup, Extension

module = Extension('mymodule', sources=['mymodule.c'])

setup(
    name='mymodule',
    ext_modules=[module]
)

python setup.py build_ext --inplace

Method 2: Cython

Cython provides a more Pythonic approach:

cython_example.pyx

def add(int a, int b):
    return a + b

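def fast_sum(double[:] arr):
    # Typed memoryview: accepts any 1-D buffer of doubles, such as a NumPy array
    cdef double total = 0
    cdef Py_ssize_t i  # Py_ssize_t matches the type of arr.shape[0]
    for i in range(arr.shape[0]):
        total += arr[i]
    return total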

setup.py for Cython

from setuptools import setup
from Cython.Build import cythonize

setup(
    ext_modules=cythonize("cython_example.pyx")
)

Method 3: ctypes

For interfacing with existing shared libraries:

import ctypes

# Load the library
lib = ctypes.CDLL('./mylib.so')

# Define argument and return types
lib.add.argtypes = [ctypes.c_int, ctypes.c_int]
lib.add.restype = ctypes.c_int

# Call the function
result = lib.add(5, 3)
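
The snippet above depends on a compiled mylib.so. For a version that runs without building anything, the same pattern can be applied to the C standard library, which is already loaded into the Python process. This is a sketch assuming a POSIX system (Linux or macOS); on Windows the library lookup differs.

```python
import ctypes
import ctypes.util

# find_library may return None; on POSIX, CDLL(None) then exposes the symbols
# already loaded into the current process, which include the C library.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare signatures explicitly so ctypes converts arguments and results correctly.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int

length = libc.strlen(b"hello")   # length of a NUL-terminated byte string
magnitude = libc.abs(-7)         # absolute value of a C int

if __name__ == "__main__":
    print(length, magnitude)
```

Setting argtypes and restype matters: without them ctypes assumes int for the return value, which silently truncates pointers and size_t results on 64-bit platforms.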

Method 4: pybind11

Modern C++ binding with pybind11:

#include <pybind11/pybind11.h>

int add(int a, int b) {
    return a + b;
}

PYBIND11_MODULE(mymodule, m) {
    m.def("add", &add, "Add two integers");
}

Performance Comparison

Method	Ease of Use	Performance	Use Case
Python C API	Low	Highest	Full control
Cython	Medium	High	Numeric code
ctypes	High	Medium	Existing libs
pybind11	High	High	C++ integration

Conclusion

Choose the right tool based on your needs:

pybind11: Best for new C++ code
Cython: Great for optimizing Python code
ctypes: Quick integration with existing libraries
C API: Maximum control and performance

10 important papers to get started with machine learning

2018-09-01T00:00:00+00:00

Starting your journey in machine learning? These 10 papers provide essential foundations for understanding the field.

Introduction

The machine learning field has produced thousands of papers, but certain works stand out as foundational. This curated list helps newcomers build a solid theoretical foundation.

The Papers

1. Backpropagation (1986)

“Learning representations by back-propagating errors”
Rumelhart, Hinton, Williams

The paper that popularized backpropagation for training multi-layer neural networks. Understanding backpropagation is essential for any deep learning work.

2. Support Vector Machines (1995)

“Support-Vector Networks”
Cortes, Vapnik

Introduced the soft-margin support vector machine, showing how to find maximum-margin decision boundaries even when the classes are not perfectly separable. Still relevant for understanding kernel methods and margin-based classification.

3. Random Forests (2001)

“Random Forests”
Leo Breiman

An ensemble of decision trees that remains one of the most robust off-the-shelf methods. Essential for understanding bagging and feature importance.

4. Dropout (2014)

“Dropout: A Simple Way to Prevent Neural Networks from Overfitting”
Srivastava, Hinton, et al.

A remarkably simple yet effective regularization technique that’s now standard in deep learning.
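
The core of the technique fits in a few lines. The sketch below implements the "inverted" variant used by most frameworks, which rescales the surviving activations during training. The paper instead scales the weights at test time, so this is a common reformulation rather than the authors' exact procedure. The function name and parameters are chosen for this example.

```python
import random


def dropout(values, p_drop, training=True, rng=random):
    """Inverted dropout: zero each value with probability p_drop, rescale survivors."""
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    if not training or p_drop == 0.0:
        return list(values)  # identity at inference time
    keep = 1.0 - p_drop
    # Dividing by keep preserves the expected value of each activation.
    return [v / keep if rng.random() < keep else 0.0 for v in values]


if __name__ == "__main__":
    print(dropout([1.0, 2.0, 3.0, 4.0], p_drop=0.5, rng=random.Random(0)))
```

Because the rescaling happens during training, the network can be used unchanged at inference time by passing `training=False`.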

5. Adam Optimizer (2014)

“Adam: A Method for Stochastic Optimization”
Kingma, Ba

A common default optimizer for deep learning projects. Understanding adaptive learning rates is crucial.
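
To make "adaptive learning rates" concrete, here is a minimal sketch of a single Adam update for one scalar parameter, using the default hyperparameters suggested in the paper. The function names and the quadratic test problem are assumptions made for this example.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; returns (param, m, v). t starts at 1."""
    m = beta1 * m + (1.0 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1.0 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1.0 - beta1 ** t)                # bias correction for the zero init
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


def minimize_quadratic(steps=2000, lr=0.05):
    """Minimize f(x) = (x - 3)^2 starting from x = 0 using adam_step."""
    x, m, v = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        grad = 2.0 * (x - 3.0)
        x, m, v = adam_step(x, grad, m, v, t, lr=lr)
    return x


if __name__ == "__main__":
    print(minimize_quadratic())
```

Dividing by the square root of the second-moment estimate is what makes the step size adaptive: on the first step the update has magnitude close to lr regardless of how large the gradient is.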

6. Batch Normalization (2015)

“Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift”
Ioffe, Szegedy

Stabilized and sped up the training of deep networks by normalizing layer inputs over each mini-batch.
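
The normalization step itself is short. The sketch below covers the training-time computation for a single feature across one mini-batch. It leaves out the running statistics used at inference time, and the function name and defaults are chosen for this example.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a mini-batch, then scale by gamma and shift by beta."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n   # biased (population) variance
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]


if __name__ == "__main__":
    print(batch_norm([1.0, 2.0, 3.0, 4.0]))
```

In a real layer, gamma and beta are learned parameters, so the network can undo the normalization where that helps.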

7. ResNet (2015)

“Deep Residual Learning for Image Recognition”
He, Zhang, Ren, Sun

Introduced skip connections, enabling training of networks with hundreds of layers. Won the ILSVRC 2015 image classification task.

8. Attention Is All You Need (2017)

“Attention Is All You Need”
Vaswani et al.

Introduced the Transformer architecture, revolutionizing NLP and beyond. Foundation for BERT, GPT, and modern LLMs.
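
The operation at the heart of the architecture is scaled dot-product attention, softmax(QK^T / sqrt(d_k))V. The sketch below spells it out on plain Python lists for a single head, with no masking, batching, or learned projections. It illustrates the formula and is not meant as an efficient implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention on lists of row vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Each output row is a weighted average of the value rows
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs


if __name__ == "__main__":
    q = [[1.0, 0.0]]
    k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[10.0, 0.0], [0.0, 10.0]]
    print(attention(q, k, v))
```

Each output row is a convex combination of the value rows, weighted by how closely the query matches each key.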

9. BERT (2018)

“BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”
Devlin et al.

Demonstrated the power of pre-training and transfer learning in NLP.

10. An Image is Worth 16x16 Words (2020)

“An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale”
Dosovitskiy et al.

Vision Transformers (ViT) brought transformer success to computer vision.

How to Read Papers

First pass: Read abstract, introduction, and conclusion
Second pass: Understand the methodology and key figures
Third pass: Implement or reproduce the results

Additional Resources

ArXiv: Latest preprints
Papers With Code: Papers with implementations
Connected Papers: Visualize paper relationships

Conclusion

These papers span more than three decades of progress, from 1986 to 2020. Reading them provides both historical context and practical knowledge that remains relevant today.