Changes from 4 commits
Commits
27 commits
d20d386
Updated doc comments above DRef, poolget, poolset
jlim5634 Apr 11, 2026
08a9f30
updated index.md and make.jl for MemPool
jlim5634 Apr 11, 2026
3a0c9c8
updated PR and fixed Julian's comments
jlim5634 Apr 12, 2026
4f7f172
Remove assets folder from PR
jlim5634 Apr 12, 2026
3beb1c5
Update docs/src/index.md
jlim5634 Apr 24, 2026
70f8b26
Update docs/src/index.md
jlim5634 Apr 24, 2026
5ca4a83
Update docs/src/index.md
jlim5634 Apr 24, 2026
7b35686
Update docs/src/index.md
jlim5634 Apr 24, 2026
0b04553
Update docs/src/index.md
jlim5634 Apr 24, 2026
c410bbc
Update docs/make.jl
jlim5634 Apr 24, 2026
dc40742
Update docs/make.jl
jlim5634 Apr 24, 2026
1543138
Update docs/src/index.md
jlim5634 Apr 24, 2026
7693c35
Update docs/src/index.md
jlim5634 Apr 24, 2026
f47fcee
Update docs/src/index.md
jlim5634 Apr 24, 2026
c6924fc
Update docs/src/index.md
jlim5634 Apr 24, 2026
f309923
Update docs/src/index.md
jlim5634 Apr 24, 2026
41ef1fa
Update docs/src/index.md
jlim5634 Apr 24, 2026
64cc98b
Update docs/src/index.md
jlim5634 Apr 24, 2026
69d23fe
Update docs/src/index.md
jlim5634 Apr 24, 2026
0605254
Update docs/src/index.md
jlim5634 Apr 24, 2026
aeed2d8
Update docs/src/index.md
jlim5634 Apr 24, 2026
8613fb9
Update docs/src/index.md
jlim5634 Apr 24, 2026
477211f
Update docs/src/index.md
jlim5634 Apr 24, 2026
9998c3a
Update index.md
jlim5634 Apr 24, 2026
dbc6e45
Fixed changes that were copied from Dagger
jlim5634 Apr 24, 2026
99fd558
Merge branch 'setup-mempool-docs' of https://github.com/jlim5634/MemP…
jlim5634 Apr 24, 2026
6b61b0b
Update docs/make.jl
jlim5634 Apr 24, 2026
17 changes: 17 additions & 0 deletions docs/Project.toml
@@ -0,0 +1,17 @@
[deps]
Dagger = "d58978e5-989f-55fb-8d15-ea34adc7bf54"
DaggerWebDash = "cfc5aa84-1a2a-41ab-b391-ede92ecae40c"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
GraphViz = "f526b714-d49f-11e8-06ff-31ed36ee7ee0"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
TimespanLogging = "a526e669-04d3-4846-9525-c66122c55f63"

[compat]
Documenter = "1"
julia = "1.11"

[sources]
Dagger = {path = ".."}
DaggerWebDash = {path = "../lib/DaggerWebDash"}
TimespanLogging = {path = "../lib/TimespanLogging"}
25 changes: 25 additions & 0 deletions docs/make.jl
@@ -0,0 +1,25 @@
using MemPool
using Documenter
import Documenter.Remotes: GitHub

makedocs(;
    modules = [MemPool],
    authors = "JuliaParallel and contributors",
    repo = GitHub("JuliaParallel", "MemPool.jl"),
    sitename = "MemPool.jl",
    format = Documenter.HTML(;
        prettyurls = get(ENV, "CI", "false") == "true",
        canonical = "https://JuliaParallel.github.io/MemPool.jl",
        assets = String["assets/favicon.ico"],
    ),
    pages = [
        "Home" => "index.md",
        "API Reference" => "api.md",
    ],
    warnonly = [:missing_docs]
)

deploydocs(;
    repo = "github.com/JuliaParallel/MemPool.jl",
    devbranch = "main", # Or "master", check your repo's default branch
)
Empty file added docs/src/api.md
Empty file.
100 changes: 100 additions & 0 deletions docs/src/index.md
@@ -0,0 +1,100 @@
# MemPool: A framework for out-of-core and parallel execution

MemPool.jl acts as a gatekeeper over its internal table of "chunks". If there is enough
space for a DAG's data, it tells the Dagger scheduler that the DAG is good to run, and then
handles the transfer of that data from where it currently lives (RAM or disk) into the
target GPU's memory. If the pool is full, it tells the scheduler that capacity has been reached.


Note: when using MemPool with multiple workers, make sure that the workers are
initialized *before* importing MemPool. This ensures the package is loaded on all nodes:
```julia-repl
julia> using Distributed

julia> addprocs(2)

julia> using MemPool
```

-----

## Quickstart: Data Management

For more details: [Data Management](@ref)

The core of MemPool revolves around the `DRef` (Distributed Reference). A `DRef` is a pointer
to data that might live in local RAM, remote RAM, or on disk.

### Creating and retrieving data

Use `poolset` to register data with the pool and `poolget` to retrieve the actual value:
```julia
using MemPool

A = rand(1000, 1000)
ref = poolset(A)

A_retrieved = poolget(ref)
```
Here, `poolset(A)` registers the large array `A` with the pool and returns a `DRef`.
You can then drop `A` from local scope and retrieve the data later by calling
`poolget(ref)` on that `DRef`.


### Manual Worker Assignment

You can force data to be stored on a specific worker by passing a worker ID to `poolset`:

```julia
ref_w2 = poolset(rand(500), 2)
```
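
As a rough end-to-end sketch of the above, assuming a fresh session in which `addprocs(2)` yields workers 2 and 3 (the worker IDs are an assumption about your session, not something MemPool guarantees), the `owner` field of the returned `DRef` can be used to check where the data ended up:

```julia
using Distributed

addprocs(2)                 # add workers first; assumed here to get IDs 2 and 3
@everywhere using MemPool   # load MemPool on every process

# Store a vector on worker 2 and keep only the handle locally
ref_w2 = poolset(rand(500), 2)

# `owner` is the DRef field recording which worker holds the data
@show ref_w2.owner

# Fetch the value back to the calling process
data = poolget(ref_w2)
@show length(data)
```

Only the handle lives on the calling process until `poolget` is called, so this pattern is useful when the data should stay close to the worker that will consume it.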

## Quickstart: Out-of-Core Configuration

When the `membound` memory limit is reached, MemPool will trigger a GC sweep or move data to disk under `diskpath`.

### Enabling the Disk Cache
```julia
# 1. Define the configuration
cfg = MemPool.DiskCacheConfig(
    toggle = true,
    membound = 4 * 1024^3,            # 4 GiB RAM limit
    diskpath = "/tmp/mempool_cache",  # Disk storage location
    allocator_type = "LRU"            # Least Recently Used eviction
)

# 2. Apply the configuration
MemPool.setup_global_device!(cfg)
```

### Memory Reservation Logic

MemPool includes an `ensure_memory_reserved` mechanism. When `poolset` is called, the system checks whether the OS is running
low on memory. If so, it will:
1. Trigger a local GC.
2. If memory is still tight, trigger a full `GC.gc(true)`.
3. Finally, trigger a cluster-wide GC (`@everywhere GC.gc(true)`).
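
The escalation above can be sketched in plain Julia. This is an illustrative sketch only, not MemPool's implementation: `escalate_gc`, `low_memory`, and the 5% threshold are hypothetical names and values, the "local GC" step is assumed to mean an incremental `GC.gc(false)`, and the cluster-wide step uses `remotecall_wait` in place of `@everywhere` so that it can live inside a function.

```julia
using Distributed

# Hypothetical helper (not part of MemPool): `is_tight` is any zero-argument
# predicate that returns `true` while memory is still scarce.
function escalate_gc(is_tight)
    is_tight() || return :none         # enough memory, nothing to do
    GC.gc(false)                       # 1. incremental collection on this process
    is_tight() || return :incremental
    GC.gc(true)                        # 2. full collection on this process
    is_tight() || return :full
    for p in procs()                   # 3. full collection on every process
        remotecall_wait(GC.gc, p, true)
    end
    return :cluster
end

# Example predicate: treat memory as tight when under 5% of RAM is free.
# The threshold is arbitrary and chosen only for illustration.
low_memory() = Sys.free_memory() < 0.05 * Sys.total_memory()

escalate_gc(low_memory)
```

The returned symbol records how far the escalation had to go, which makes it easy to log how often the more expensive cluster-wide step is reached.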


## Quickstart: Persistence & Migration

### Migrating Data Between Workers

You can move data from one worker to another without breaking existing references:

```julia
# Move data from current owner to worker 3
new_ref = MemPool.migrate!(ref, 3)
```

### Managed File I/O

Treat files as managed `DRef` objects to avoid loading massive datasets into RAM all at once:

```julia
# Create a lazy reference to a serialized Julia file
f = MemPool.File("large_dataset.jls")

# Data is only loaded when explicitly requested
data = poolget(f)
```
19 changes: 19 additions & 0 deletions src/datastore.jl
@@ -6,6 +6,13 @@ else
import Distributed: ClusterSerializer, worker_id_from_socket
end

"""
    DRef(owner::Int, id::Int, size::UInt)

A Distributed Reference (`DRef`) is a handle to data stored in MemPool.
It tracks which worker (`owner`) holds the data and the unique `id` assigned to that data.
`size` stores an approximation of the object's in-memory size in bytes.
"""
mutable struct DRef
owner::Int
id::Int
@@ -451,6 +458,12 @@ function ensure_memory_reserved(size::Integer=0; max_sweeps::Integer=MEM_RESERVE
end
end

"""
    poolset(x, [pid]; kwargs...) -> DRef

Stores the value `x` in the memory pool on worker `pid` (defaults to `myid()`)
and returns a `DRef` handle that can later be used to access the value.
"""
function poolset(@nospecialize(x), pid=myid(); size=approx_size(x),
retain=false, restore=false,
device=GLOBAL_DEVICE[], leaf_device=initial_leaf_device(device),
@@ -523,6 +536,12 @@ function forwardkeyerror(f)
end
end

"""
    poolget(ref::DRef)

Retrieves the value referenced by `ref`. If the data is remote or
on disk, MemPool handles the retrieval automatically.
"""
function poolget(ref::DRef)
DEBUG_REFCOUNTING[] && _enqueue_work(Core.print, "?? (", ref.owner, ", ", ref.id, ") at ", myid(), "\n")
return access_ref(identity, ref)