Databases are bloat; it's better to use a plain text file and the standard Unix tools to search for information inside it.

---

Here is a **practical, realistic guide** to using **a plain text file as a "database"** for a school, using only **grep, awk, sed, sort, cut** and other Unix tools. The key is a **well-defined format**, **one record per line**, and **consistently separated fields**.

---

# ✅ **1. Pick a simple, stable, easy-to-parse format**

The best format for a "plain-text database" under Unix is:

### **Format: TSV (tab-separated values)**

Reasons:

* It doesn't clash with commas inside free text.
* `awk -F '\t'` reads it perfectly.
* `grep` works just as well.
* It's easy to edit in Vim.

---

# ✅ **2. A universal structure for any record type**

Use a **fixed "schema"** (fields separated by real TABs; shown here with spaces for readability):

```
TYPE    ID    DATE    FIELD1    FIELD2    FIELD3    EXTRA
```

### Recommended fields:

* `TYPE`: the kind of record → `DOCENTE` (teacher), `ALUMNO` (student), `ASISTENCIA` (attendance), `LICENCIA` (leave), `AMON` (disciplinary warning), `TUTOR` (guardian), etc.
* `ID`: unique identifier → national ID, file number, or code.
* `DATE`: standard format → `2025-03-04`
* `FIELD1..FIELDn`: depends on the type.
* `EXTRA`: optional free text.

---

# ✅ **3. Real examples**

### **Teachers**

```
DOCENTE    20300300    2025-03-01    ingreso     Juan Perez    Matemática
DOCENTE    20300300    2025-03-15    licencia    enfermedad    2-dias
```

### **Students**

```
ALUMNO    55022110    2025-03-01    ingreso    Lopez    Maria    2B
TUTOR     55022110    2025-03-01    tutor      Carlos Lopez     padre
```

### **Attendance**

```
ASIST    55022110    2025-03-04    presente    2B
ASIST    55022110    2025-03-05    ausente     2B
```

### **Disciplinary warnings**

(The course goes in field 6, so the per-course query below works.)

```
AMON    55022110    2025-03-05    leve     celular     2B    profesorX
AMON    55022110    2025-03-10    grave    conducta    2B    preceptorY
```

---

# ✅ **4. Quick queries with Unix**

### ■ All records for one student:

```
grep -P "\t55022110\t" base2025.tsv
```

### ■ Attendance records only:

```
grep "^ASIST" base2025.tsv
```

### ■ One student's attendance:

```
awk -F '\t' '$1=="ASIST" && $2=="55022110"' base2025.tsv
```

### ■ Number of absences (`c+0` prints 0 instead of an empty line when there are none):

```
awk -F '\t' '$1=="ASIST" && $2=="55022110" && $4=="ausente" {c++} END {print c+0}' base2025.tsv
```

### ■ Warnings for a course:

```
awk -F '\t' '$1=="AMON" && $6=="2B"' base2025.tsv
```

---

# ✅ **5. How to add a record from the shell**

Write yourself a script:

### `add.sh`

```bash
#!/bin/sh
# printf '%b' expands the \t escapes; `echo -e` is not portable under /bin/sh.
printf '%b\n' "$*" >> base2025.tsv
```

Usage:

```bash
./add.sh 'ASIST\t55022110\t2025-03-06\tpresente\t2B'
```

---

# ✅ **6. How to validate data**

You can add checks with awk:

```
awk -F '\t' 'NF < 4 { print "Invalid record:", $0 }' base2025.tsv
```

---

# ✅ **7. Organization by year**

Recommended layout:

```
/datos-colegio/
    base2025.tsv
    base2026.tsv
    base2027.tsv
```

One file per year works well.

---

# ✅ **8. Optional: split into "virtual tables"**

If you want to divide things a bit:

```
docentes2025.tsv
alumnos2025.tsv
asistencias2025.tsv
amonestaciones2025.tsv
tutores2025.tsv
```

But the single-file idea is entirely valid — all you need is the `TYPE` field. (The sketch below shows one way to derive the per-type files from the master file.)
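If you ever want to materialize those per-type files, a minimal sketch: the output names here are derived mechanically from `TYPE` (so `ASIST` becomes `asists2025.tsv`, not `asistencias2025.tsv`; rename to taste):

```sh
# Split base2025.tsv into one file per record type.
# awk truncates each output file on its first write in the run,
# then keeps it open and appends subsequent matching records.
awk -F '\t' '{
    out = tolower($1) "s2025.tsv"   # e.g. DOCENTE -> docentes2025.tsv
    print > out
}' base2025.tsv
```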
---

# ⚡ Want me to generate a complete *schema* and a list of useful commands for each data type? I can put together a "system manual" with everything predefined.

---

You can absolutely use a **plain text file as a database** in Unix. This is classic Unix philosophy: *“Text is the universal interface.”*

Below are **practical patterns** depending on how you want to store and query data — all without jq, without SQLite, without JSON — just text + grep/awk/sed.

---

# ✅ 1. Choose a Format (custom plain text)

The simplest and most robust **flat-file database** formats are:

### **A) KEY=VALUE (one record per line)**

```
id=1 name="John Doe" age=30
id=2 name="Alice" age=25
```

### **B) CSV (comma separated)**

```
1,John Doe,30
2,Alice,25
```

### **C) Pipe/Tab separated**

```
1|John Doe|30
2|Alice|25
```

### **D) One file per record (directory database)**

```
db/
  1.txt
  2.txt
```

Each file contains arbitrary text.

---

# ✅ 2. How to Query the “Database”

## 🔍 Search rows by field

### **KEY=VALUE format**

Find all entries where name contains "Alice":

```sh
grep 'name="Alice"' db.txt
```

Find id=2 (the trailing space keeps `id=2` from also matching `id=20`):

```sh
grep '^id=2 ' db.txt
```

### CSV example

Find records whose first column is "2":

```sh
awk -F, '$1 == 2' db.csv
```

---

# 🔄 3. Insert new record

Append:

```sh
echo 'id=3 name="Bob" age=50' >> db.txt
```

---

# ✏ 4. Update a record

Replace age for id=1:

```sh
sed -i 's/^id=1 .*/id=1 name="John Doe" age=31/' db.txt
```

---

# ❌ 5. Delete a record

Delete id=2:

```sh
sed -i '/^id=2 /d' db.txt
```

---

# 🔥 6. Example Flat-File Database Implementation

### **db.sh**

```sh
#!/bin/sh
DB="db.txt"

add()        { echo "id=$1 name=\"$2\" age=$3" >> "$DB"; }
get()        { grep "^id=$1 " "$DB"; }
delete()     { sed -i "/^id=$1 /d" "$DB"; }
update_age() { sed -i "s/^id=$1 .*/id=$1 name=\"$2\" age=$3/" "$DB"; }

# Dispatch on the first argument so `./db.sh add 1 "John Doe" 30` works.
cmd="$1"; shift
case "$cmd" in
  add|get|delete|update_age) "$cmd" "$@" ;;
  *) echo "usage: $0 {add|get|delete|update_age} [args...]" >&2; exit 1 ;;
esac
```

Usage:

```sh
./db.sh add 1 "John Doe" 30
./db.sh add 2 "Alice" 25
./db.sh get 2
./db.sh update_age 1 "John Doe" 31
./db.sh delete 2
```

---

# 📂 7. Alternative: directory per record

```
db/
  1
  2
```

Create:

```sh
echo "name=Alice" > db/2
echo "age=25" >> db/2
```

Search:

```sh
grep -R "Alice" db/
```

---

# ⭐ When is this useful?

✔ shell scripts
✔ embedded devices
✔ tiny systems
✔ config management
✔ low data volume (<50k records)
✔ no dependencies
✔ grep/awk/sed are extremely fast

---

# ⚠ When it falls apart

❌ hundreds of thousands of rows
❌ concurrent writes
❌ complex queries (joins, ordering, types)
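One gap in the patterns above: they match whole KEY=VALUE lines but never pull a single field out of one. A minimal `sed` sketch, assuming the exact layout from the examples (`age=` present and numeric):

```sh
# Print just the age of the record with id=1.
grep '^id=1 ' db.txt | sed -n 's/.*age=\([0-9]*\).*/\1/p'
```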
---

# If you want, I can:

* design a full mini-database system in shell
* give you CRUD helpers
* make an “SQL-like” interface in bash
* design indexing for fast lookups (using grep caches)

Just tell me your ideal workflow.

---

Nice — you want a *practical*, file-based “database” using only Unix tools. Below is a focused, pragmatic guide + ready-to-run shell toolkit you can copy-paste and use. It covers data layout choices, common operations (CRUD + queries), performance & safety tips, and a simple script (`plaindb.sh`) that implements insert/find/update/delete on newline-delimited JSON (NDJSON). I thought deeper about scale, concurrency, indexing, and schema evolution so you don't get burned later.

# Why plain files (quick pros/cons)

**Pros**

* Simple, transparent, easy to version with git.
* No DB server, low dependency surface.
* Fast for many read-heavy workloads using grep/rg/awk/jq.

**Cons**

* No transactions unless you implement locking/atomic writes.
* Harder to scale and do complex joins/queries compared to a DB.
* Requires careful handling for concurrency, schema evolution, and backups.

# Recommended file formats

1. **NDJSON (newline-delimited JSON)** — each line is a JSON object. Best for structured records, works great with `jq`.
   * Example: `users.ndjson`
   * `{"id":1,"name":"A","email":"a@x.com"}`
2. **CSV/TSV** — great for tabular data and fast parsing with `awk`, `cut`.
   * Use `\t` (TSV) to avoid comma-escaping headaches.
3. **Flat key-value** — `key:value` per line for tiny lookup tables.
4. **Inverted index files** — precomputed index files to support fast full-text lookups.

# Tools to use (short)

* `grep` / `rg` (ripgrep) — text search
* `awk` — field processing, aggregations, joins-ish
* `sed` — in-place edits / transforms
* `jq` — JSON querying and mutation
* `cut`, `sort`, `uniq`, `join` — selection and joins on sorted files
* `flock` — file locking for safe concurrent writes
* `mv`/`cp` technique — atomic replace (`mv tmp file`)
* `gzip`/`xz` — compress cold data
* `git` — version your files for history & lightweight rollback
* `fzf` — interactive selection

---

# Patterns & examples

## 1) NDJSON: append, find, update, delete

File: `users.ndjson`

```json
{"id":1,"name":"Alice","email":"alice@example.com"}
{"id":2,"name":"Bob","email":"bob@example.com"}
```

### Insert (append safely)

```sh
# create a new record and append atomically
record='{"id":3,"name":"Carol","email":"carol@example.com"}'
printf '%s\n' "$record" >> users.ndjson
# better: use flock for multi-writer safety (see script below)
```

### Find records

* Full-text with grep:

```sh
grep -i 'alice' users.ndjson
```

* Field-level with jq:

```sh
jq -c 'select(.email=="alice@example.com")' users.ndjson
```

### Select columns (project)

```sh
jq -r '.id, .name' users.ndjson            # prints each field on its own line (not ideal)
jq -r '[.id, .name] | @tsv' users.ndjson   # id<TAB>name, one record per line
```

### Update a record (idempotent pattern)

Can't modify in-place reliably — create a new file then move. Note `-c`: it keeps the output one object per line, i.e. valid NDJSON:

```sh
jq -c 'if .id==2 then .email="bob@new.com" else . end' users.ndjson > users.ndjson.tmp
mv users.ndjson.tmp users.ndjson
```

If multiple writers exist, use `flock` to protect the critical section (script later).

### Delete

```sh
jq -c 'select(.id != 2)' users.ndjson > users.ndjson.tmp && mv users.ndjson.tmp users.ndjson
```

## 2) CSV/TSV with awk

File: `products.tsv` (header: `id`, `name`, `price`; tab-separated, shown with spaces):

```
id    name      price
1     Widget    12.50
2     Gizmo     9.99
```

### Find where price > 10

```sh
awk -F'\t' 'NR==1{print; next} $3+0 > 10' products.tsv
```

### Group by and count

Count products by name prefix:

```sh
awk -F'\t' 'NR>1 {prefix=substr($2,1,3); counts[prefix]++} END{for (k in counts) print k, counts[k]}' products.tsv | sort -k2 -n -r
```

### Join two files (both sorted by key)

`join` requires files sorted on the join field:

```sh
# assume customers.tsv and orders.tsv keyed by customer_id in column 1
join -t$'\t' -1 1 -2 1 <(sort -k1,1 customers.tsv) <(sort -k1,1 orders.tsv)
```

## 3) Indexing for speed (simple inverted index)

If you have lots of text and `grep` becomes slow / you need targeted search, create an index mapping token → list of record IDs.

Example pipeline to build a basic inverted index (tokenized lowercased words). Note the plain `sort -u`: deduplicating on the whole word+id line, not just the word, so a word keeps all of its ids:

```sh
# input: docs.ndjson, each line {"id":123, "text":"..."}
jq -r '[.id, .text] | @tsv' docs.ndjson \
 | tr '[:upper:]' '[:lower:]' \
 | awk -F'\t' '{id=$1; text=$2; gsub(/[^a-z0-9]+/," ",text); split(text, a, " "); for(i in a) if(length(a[i])>1) print a[i] "\t" id}' \
 | sort -u \
 | awk -F'\t' '{word=$1; id=$2; ids[word]=ids[word]?ids[word] "," id : id} END{for (w in ids) print w "\t" ids[w]}' \
 > index.tsv
```

Lookup:

```sh
grep -P '^keyword\t' index.tsv
# parse the list of ids, then pull records from the ndjson with jq or grep -Ff
```
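The "pull the records" step mentioned above, as a minimal sketch. It rescans `docs.ndjson` once per id, which is fine for small files (all names come from the pipeline above):

```sh
# Fetch all records whose ids appear in the index entry for "widget".
ids=$(grep -P '^widget\t' index.tsv | cut -f2)
printf '%s\n' "$ids" | tr ',' '\n' | while read -r id; do
    [ -n "$id" ] && jq -c --argjson id "$id" 'select(.id == $id)' docs.ndjson
done
```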
## 4) Transactions & concurrency (flock + atomic move)

Example pattern for safe writes:

```sh
(
  flock -x 200 || exit 1
  # make changes in a tmp file
  jq '...update...' users.ndjson > users.ndjson.tmp
  mv users.ndjson.tmp users.ndjson
) 200>users.ndjson.lock
```

* `flock -x` grabs an exclusive lock on file descriptor 200.
* Always write to a `.tmp` and `mv` to replace atomically.

## 5) Backups & snapshots

* Periodic snapshots: `cp users.ndjson users.$(date +%Y%m%d%H%M).ndjson`
* Lightweight: commit changes to git (`git add *.ndjson && git commit -m "snapshot"`).
* For big files, use incremental rsync to a remote.

## 6) Validation & schema evolution

* Keep a small schema file `schema.json` that documents required fields and types.
* Validate new records with a `jq` filter before appending:

```sh
jq -e '(.id|type=="number") and (.email|test("@"))' <<<"$candidate"
```

* For migration: write a one-shot `jq` or `awk` migration script that produces a new file, test it, then replace.

---

# Ready-to-use toolkit: `plaindb.sh`

Copy this script and put it in your `$PATH` (`chmod +x plaindb.sh`). It implements simple CRUD on an NDJSON file with locking. Data values are passed to `bash -c` as positional arguments rather than interpolated into the command string, so quotes inside JSON or jq programs can't break the quoting:

```sh
#!/usr/bin/env bash
# plaindb.sh - minimal NDJSON "db" operations: insert, find, update, delete, list
# Usage: plaindb.sh DB COMMAND [args...]
# Requires: jq, flock, mktemp

DB="$1"; shift
CMD="$1"; shift
LOCK="${DB}.lock"

usage(){
cat <<EOF
Usage: plaindb.sh DB COMMAND [args...]
Commands:
  insert JSON     Append a JSON record (string)
  find FILTER     Print matching records (jq filter)
  list            Print all records
  update PROG     Apply a jq program to all records
                  (e.g. 'if .id==2 then .name="X" else . end')
  delete COND     Delete records matching a condition (e.g. '.id==2')
EOF
}

if [ -z "$DB" ] || [ -z "$CMD" ]; then usage; exit 1; fi
mkdir -p "$(dirname "$DB")" 2>/dev/null || true

with_lock() {
  # usage: with_lock CMD [args...] -- run CMD under an exclusive lock
  exec 200>"$LOCK"
  flock -x 200
  "$@"
  flock -u 200
}

case "$CMD" in
  insert)
    json="$1"
    if ! printf '%s' "$json" | jq -e . >/dev/null 2>&1; then
      echo "invalid json" >&2; exit 2
    fi
    # append under lock
    with_lock bash -c 'printf "%s\n" "$1" >> "$2"' _ "$json" "$DB"
    ;;
  find)
    filter="$1"
    if [ -z "$filter" ]; then filter='.'; fi
    # reads need no write lock
    jq -c "$filter" "$DB"
    ;;
  list)
    jq -c '.' "$DB"
    ;;
  update)
    prog="$1"
    tmp="$(mktemp "${DB}.tmp.XXXX")"
    with_lock bash -c 'jq -c "$1" "$2" > "$3" && mv "$3" "$2"' _ "$prog" "$DB" "$tmp"
    ;;
  delete)
    cond="$1"
    tmp="$(mktemp "${DB}.tmp.XXXX")"
    with_lock bash -c 'jq -c "select( ($1) | not )" "$2" > "$3" && mv "$3" "$2"' _ "$cond" "$DB" "$tmp"
    ;;
  *) usage; exit 1
esac
```

Examples:

```sh
# insert
./plaindb.sh users.ndjson insert '{"id":1,"name":"Alice"}'
# find by id
./plaindb.sh users.ndjson find 'select(.id==1)'
# update
./plaindb.sh users.ndjson update 'if .id==1 then .email="a@x.com" else . end'
# delete
./plaindb.sh users.ndjson delete '.id==1'
```
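One helper the script leaves out is id allocation. A minimal sketch of the "atomic counters" tip that follows: a `flock`-protected counter file (the `next_id` and lock file names are illustrative):

```sh
#!/bin/sh
# next_id.sh - print a fresh numeric id, incrementing a counter file atomically.
COUNTER="next_id"
(
  flock -x 9 || exit 1
  n=$(cat "$COUNTER" 2>/dev/null || echo 0)   # missing file counts as 0
  n=$((n + 1))
  printf '%s\n' "$n" > "$COUNTER"             # persist the new value
  printf '%s\n' "$n"                          # hand it to the caller
) 9>"${COUNTER}.lock"
```

Usage: `id=$(./next_id.sh)`, then splice `$id` into the record you append.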
---

# Advanced tips (thinking deeper)

* **Search performance**: `rg` (ripgrep) is much faster than `grep`. For enormous files, split data by sharding keys (year, prefix), or maintain indexes.
* **Atomic counters**: maintain a small file `next_id` and `flock` it to safely increment IDs (sketched above).
* **Multi-field joins**: sort files on join keys and use `join`. For complex joins, it's often simpler to load small sets into memory with `awk` associative arrays.
* **Partial reads**: store large blobs separately (e.g., `blobs/<id>.dat`) and keep metadata in the main NDJSON.
* **Full-text search**: if you need advanced text queries (stemming, ranking), either maintain an index or use `sqlite FTS` (that's a DB — but it's a single file).
* **Monitoring & metrics**: keep a `metrics.log` file with line-oriented events (timestamped) and roll it daily.
* **Immutable append-only log**: prefer append-only files and periodic compaction — this makes crash recovery trivial. A compaction script reads the file, removes tombstones, writes a compacted file, and swaps it in atomically.
* **Encryption**: use `gpg --encrypt` for at-rest protection, `gpg --decrypt` when reading; or use `age`.
* **Testing & CI**: add schema tests that `jq`-validate every new commit to the repo.

---

# Quick cheatsheet

* `grep -i foo file` — fast text search
* `jq -c 'select(.field=="x")' file` — filter NDJSON
* `awk -F'\t' '$3>10' file.tsv` — numeric filter on column 3
* `join -t$'\t' <(sort -k1,1 a.tsv) <(sort -k1,1 b.tsv)` — join two files on column 1
* `flock -x 200; mv tmp file;` — atomic replacement with lock
* `git add *.ndjson && git commit -m "snapshot"` — snapshot your DB state

---

If you want, I can:

* produce a slightly fancier CLI (with `getopts`, `--where`, `--select`) to make ad-hoc queries easier,
* show how to build a tiny inverted index for full-text search and a lookup tool that returns top-N results,
* or tailor the scripts for CSV/TSV only (no jq).

Which of those should I give you next?

---

If you want to use **jq** as your “database engine,” the **best possible format** is:

# ✅ NDJSON (Newline-Delimited JSON)

One JSON object **per line**:

`users.ndjson`:

```json
{"id":1, "name":"Alice", "email":"alice@example.com"}
{"id":2, "name":"Bob", "email":"bob@example.com"}
{"id":3, "name":"Carol", "email":"carol@example.com"}
```

No commas between records, no enclosing array — **just one JSON object per line**.

This is the format used by big-data tools (Elasticsearch, Firehose, Hadoop, etc.).

---

# ✅ 1. FIND (query with jq)

List *all* records:

```sh
jq -c . users.ndjson
```

Find by name:

```sh
jq -c 'select(.name == "Alice")' users.ndjson
```

Case-insensitive:

```sh
jq -c 'select(.name | ascii_downcase == "alice")' users.ndjson
```

Find by ID:

```sh
jq -c 'select(.id == 2)' users.ndjson
```

Search email ending in `.com`:

```sh
jq -c 'select(.email | endswith(".com"))' users.ndjson
```

---

# ✅ 2. INSERT (append)

Append a new object:

```sh
echo '{"id":4, "name":"David", "email":"david@example.com"}' >> users.ndjson
```

---

# ✅ 3. DELETE a record

Delete ID=3:

```sh
jq -c 'select(.id != 3)' users.ndjson \
  > tmp && mv tmp users.ndjson
```

Delete every Bob:

```sh
jq -c 'select(.name != "Bob")' users.ndjson \
  > tmp && mv tmp users.ndjson
```

---

# ✅ 4. UPDATE a record

Example: change Bob’s email:

```sh
jq -c '
  if .id == 2
  then .email = "bob@newmail.com"
  else .
  end
' users.ndjson > tmp && mv tmp users.ndjson
```

Update multiple fields:

```sh
jq -c '
  if .id == 3
  then .name = "Carolina" | .email = "caro@example.com"
  else .
  end
' users.ndjson > tmp && mv tmp users.ndjson
```

---

# ✅ 5. SORT the "database"

Sort by ID (fragile: relies on `id` being the first key in every object):

```sh
jq -c . users.ndjson | sort -n -t: -k2
```

Better way (jq loads the whole file, sorts, outputs NDJSON):

```sh
jq -c -s 'sort_by(.id)[]' users.ndjson > tmp && mv tmp users.ndjson
```

(`-s` = "slurp" = load the entire file)

Sort by name:

```sh
jq -c -s 'sort_by(.name)[]' users.ndjson > tmp && mv tmp users.ndjson
```
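Slurping also enables aggregation. A minimal sketch counting records per email domain (assumes every record has an `email` field, as in the examples):

```sh
# Count users per email domain, output as domain<TAB>count.
jq -r -s 'group_by(.email | split("@")[1])
          | map([(.[0].email | split("@")[1]), length])
          | .[] | @tsv' users.ndjson
```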
end " "$DB" > tmp && mv tmp "$DB" ;; *) echo "Commands: find / insert / delete / update" ;; esac ``` Example: ```sh ./jsondb.sh users.ndjson find '.id == 1' ./jsondb.sh users.ndjson insert '{"id":4,"name":"D","email":"d@mail"}' ./jsondb.sh users.ndjson delete '.id == 2' ./jsondb.sh users.ndjson update '.id == 3' '.name = "Carol X"' ``` --- # Want more? I can generate: * a full **query language** (where, select, limit, order by) * automatic **primary key indexing** * multi-table JSON structure * a “transaction log” version * CLI with subcommands and help text Just tell me what you'd like to build next.