Parallel Computing
Most modern computers possess more than one CPU, and several computers
can be combined in a cluster. Harnessing the power of these
multiple CPUs allows many computations to be completed more quickly.
There are two major factors that influence performance: the speed of the
CPUs themselves, and the speed of their access to memory. In a cluster,
it’s fairly obvious that a given CPU will have fastest access to the RAM
within the same computer (node). Perhaps more surprisingly, similar
issues are very relevant on a typical multicore laptop, due to
differences in the speed of main memory and the
cache. Consequently, a
good multiprocessing environment should allow control over the
“ownership” of a chunk of memory by a particular CPU. Julia provides a
multiprocessing environment based on message passing to allow programs
to run on multiple processors in separate memory domains at once.
Julia’s implementation of message passing is different from other
environments such as MPI. Communication in Julia is generally
“one-sided”, meaning that the programmer needs to explicitly manage only
one processor in a two-processor operation. Furthermore, these
operations typically do not look like “message send” and “message
receive” but rather resemble higher-level operations like calls to user
functions.
Parallel programming in Julia is built on two primitives:
remote
references and
remote calls. A remote reference is an object that can
be used from any processor to refer to an object stored on a particular
processor. A remote call is a request by one processor to call a certain
function on certain arguments on another (possibly the same) processor.
A remote call returns a remote reference to its result. Remote calls
return immediately; the processor that made the call proceeds to its
next operation while the remote call happens somewhere else. You can
wait for a remote call to finish by calling
wait on its remote
reference, and you can obtain the full value of the result using
fetch.
Let’s try this out. Starting Julia with julia -p n provides n
processors on the local machine. Generally it makes sense for n to
equal the number of CPU cores on the machine.
$ ./julia -p 2
julia> r = remote_call(2, rand, 2, 2)
RemoteRef(2,1,0)
julia> s = remote_call(2, +, 1, r)
RemoteRef(2,1,1)
julia> fetch(r)
0.10824216411304866 0.13798233877923116
0.12376292706355074 0.18750497916607167
julia> fetch(s)
1.10824216411304866 1.13798233877923116
1.12376292706355074 1.18750497916607167
The first argument to
remote_call is the index of the processor that
will do the work. Most parallel programming in Julia does not reference
specific processors or the number of processors available, but
remote_call is considered a low-level interface providing finer
control. The second argument to
remote_call is the function to call,
and the remaining arguments will be passed to this function. As you can
see, in the first line we asked processor 2 to construct a 2-by-2 random
matrix, and in the second line we asked it to add 1 to it. The result of
both calculations is available in the two remote references,
r and
s.
Occasionally you might want a remotely-computed value immediately. This
typically happens when you read from a remote object to obtain data
needed by the next local operation. The function
remote_call_fetch
exists for this purpose. It is equivalent to
fetch(remote_call(...))
but is more efficient.
julia> remote_call_fetch(2, ref, r, 1, 1)
0.10824216411304866
Remember that
ref(r,1,1) is
equivalent to
r[1,1], so this call fetches the first element of the remote
reference
r.
The syntax of
remote_call is not especially convenient. The macro
@spawn makes things easier. It operates on an expression rather than
a function, and picks where to do the operation for you:
julia> r = @spawn rand(2,2)
RemoteRef(1,1,0)
julia> s = @spawn 1+fetch(r)
RemoteRef(1,1,1)
julia> fetch(s)
1.10824216411304866 1.13798233877923116
1.12376292706355074 1.18750497916607167
Note that we used
1+fetch(r) instead of
1+r. This is because we
do not know where the code will run, so in general a
fetch might be
required to move
r to the processor doing the addition. In this
case,
@spawn is smart enough to perform the computation on the
processor that owns
r, so the
fetch will be a no-op.
(It is worth noting that
@spawn is not built-in but defined in Julia
as a
macro. It is possible to define your
own such constructs.)
One important point is that your code must be available on any processor
that runs it. For example, type the following into the Julia prompt:
julia> function rand2(dims...)
         return 2*rand(dims...)
       end
julia> rand2(2,2)
2x2 Float64 Array:
0.153756 0.368514
1.15119 0.918912
julia> @spawn rand2(2,2)
RemoteRef(1,1,1)
julia> @spawn rand2(2,2)
RemoteRef(2,1,2)
julia> exception on 2: in anonymous: rand2 not defined
Processor 1 knew about the function
rand2, but processor 2 did not.
To make your code available to all processors, there are two primary
methods. First, the
load function will automatically load a source
file on all currently available processors. In a cluster, the contents
of the file (and any files loaded recursively) will be sent over the
network.
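For example, if the rand2 function above were saved in a source file (here given the hypothetical name rand2.jl), it could be made available everywhere like this:
load("rand2.jl")       # hypothetical file containing the rand2 definition
r = @spawn rand2(2,2)  # now succeeds on any processor
fetch(r)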
Alternatively, on startup every Julia process automatically loads a file
called startup.jl (if it exists) from the same directory as the Julia
executable. If you regularly work with certain source files,
it makes sense to load them from this file. Julia also loads the file
.juliarc.jl in the user’s home directory.
Data Movement
Sending messages and moving data constitute most of the overhead in a
parallel program. Reducing the number of messages and the amount of data
sent is critical to achieving performance and scalability. To this end,
it is important to understand the data movement performed by Julia’s
various parallel programming constructs.
fetch can be considered an explicit data movement operation, since
it directly asks that an object be moved to the local machine.
@spawn (and a few related constructs) also moves data, but this is
not as obvious, hence it can be called an implicit data movement
operation. Consider these two approaches to constructing and squaring a
random matrix:
# method 1
A = rand(1000,1000)
Bref = @spawn A^2
...
fetch(Bref)
# method 2
Bref = @spawn rand(1000,1000)^2
...
fetch(Bref)
The difference seems trivial, but in fact is quite significant due to
the behavior of
@spawn. In the first method, a random matrix is
constructed locally, then sent to another processor where it is squared.
In the second method, a random matrix is both constructed and squared on
another processor. Therefore the second method sends much less data than
the first.
In this toy example, the two methods are easy to distinguish and choose
from. However, in a real program, designing data movement might require
more thought and very likely some measurement. For example, if the first
processor needs matrix
A then the first method might be better. Or,
if computing
A is expensive and only the current processor has it,
then moving it to another processor might be unavoidable. Or, if the
current processor has very little to do between the
@spawn and
fetch(Bref) then it might be better to eliminate the parallelism
altogether. Or imagine
rand(1000,1000) is replaced with a more
expensive operation. Then it might make sense to add another
@spawn
statement just for this step.
Parallel Map and Loops
Fortunately, many useful parallel computations do not require data
movement. A common example is a Monte Carlo simulation, where multiple
processors can handle independent simulation trials simultaneously. We
can use
@spawn to flip coins on two processors:
function count_heads(n)
    c::Int = 0
    for i=1:n
        c += randbit()
    end
    c
end
a = @spawn count_heads(100000000)
b = @spawn count_heads(100000000)
fetch(a)+fetch(b)
The function
count_heads simply adds together
n random bits.
Then we perform some trials on two processors, and add together the
results. (Remember that count_heads must be defined in a file and
loaded so that it is available to both processors.)
This example, as simple as it is, demonstrates a powerful and often-used
parallel programming pattern. Many iterations run independently over
several processors, and then their results are combined using some
function. The combination process is called a
reduction, since it is
generally tensor-rank-reducing: a vector of numbers is reduced to a
single number, or a matrix is reduced to a single row or column, etc. In
code, this typically looks like the pattern
x = f(x,v[i]), where
x is the accumulator,
f is the reduction function, and the
v[i] are the elements being reduced. It is desirable for
f to be
associative, so that it does not matter what order the operations are
performed in.
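In serial form the pattern is just an accumulation loop. Here is a minimal sketch, where f and v stand for any reduction function and collection:
x = v[1]
for i = 2:length(v)
    x = f(x, v[i])
end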
Notice that our use of this pattern with
count_heads can be
generalized. We used two explicit
@spawn statements, which limits
the parallelism to two processors. To run on any number of processors,
we can use a
parallel for loop, which can be written in Julia like
this:
nheads = @parallel (+) for i=1:200000000
    randbit()
end
This construct implements the pattern of assigning iterations to
multiple processors, and combining them with a specified reduction (in
this case
(+)). The result of each iteration is taken as the value
of the last expression inside the loop. The whole parallel loop
expression itself evaluates to the final answer.
Note that although parallel for loops look like serial for loops, their
behavior is dramatically different. In particular, the iterations do not
happen in a specified order, and writes to variables or arrays will not
be globally visible since iterations run on different processors. Any
variables used inside the parallel loop will be copied and broadcast to
each processor.
For example, the following code will not work as intended:
a = zeros(100000)
@parallel for i=1:100000
    a[i] = i
end
Notice that the reduction operator can be omitted if it is not needed.
However, this code will not initialize all of
a, since each
processor will have a separate copy of it. Parallel for loops like these
must be avoided. Fortunately, distributed arrays can be used to get
around this limitation, as we will see in the next section.
Using “outside” variables in parallel loops is perfectly reasonable if
the variables are read-only:
a = randn(1000)
@parallel (+) for i=1:100000
    f(a[randi(end)])
end
Here each iteration applies
f to a randomly-chosen sample from a
vector
a shared by all processors.
In some cases no reduction operator is needed, and we merely wish to
apply a function to all integers in some range (or, more generally, to
all elements in some collection). This is another useful operation
called
parallel map, implemented in Julia as the
pmap function.
For example, we could compute the singular values of several large
random matrices in parallel as follows:
M = {rand(1000,1000) for i=1:10}
pmap(svd, M)
Julia’s
pmap is designed for the case where each function call does
a large amount of work. In contrast,
@parallel for can handle
situations where each iteration is tiny, perhaps merely summing two
numbers.
Distributed Arrays
Large computations are often organized around large arrays of data. In
these cases, a particularly natural way to obtain parallelism is to
distribute arrays among several processors. This combines the memory
resources of multiple machines, allowing use of arrays too large to fit
on one machine. Each processor operates on the part of the array it
owns, providing a ready answer to the question of how a program should
be divided among machines.
A distributed array (or, more generally, a
global object) is logically
a single array, but pieces of it are stored on different processors.
This means whole-array operations such as matrix multiply, scalar*array
multiplication, etc. use the same syntax as with local arrays, and the
parallelism is invisible. In some cases it is possible to obtain useful
parallelism just by changing a local array to a distributed array.
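For instance, the following sketch obtains parallelism simply by constructing a matrix with the distributed constructor drand (described below) instead of rand:
A = drand(1000,1000)   # a distributed, rather than local, random matrix
B = A * A              # whole-array syntax is unchanged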
Julia distributed arrays are implemented by the
DArray type. A
DArray has an element type and dimensions just like an
Array,
but it also needs an additional property: the dimension along which data
is distributed. There are many possible ways to distribute data among
processors, but at this time Julia keeps things simple and only allows
distributing along a single dimension. For example, if a 2-d
DArray
is distributed in dimension 1, it means each processor holds a certain
range of rows. If it is distributed in dimension 2, each processor holds
a certain range of columns.
Common kinds of arrays can be constructed with functions beginning with
d:
dzeros(100,100,10)
dones(100,100,10)
drand(100,100,10)
drandn(100,100,10)
dcell(100,100,10)
dfill(x, 100,100,10)
In the last case, each element will be initialized to the specified
value
x. These functions automatically pick a distributed dimension
for you. To specify the distributed dimension, other forms are
available:
drand((100,100,10), 3)
dzeros(Int64, (100,100), 2)
dzeros((100,100), 2, [7, 8])
In the
drand call, we specified that the array should be distributed
across dimension 3. In the first
dzeros call, we specified an
element type as well as the distributed dimension. In the second
dzeros call, we also specified which processors should be used to
store the data. When dividing data among a large number of processors,
one often sees diminishing returns in performance. Placing
DArrays
on a subset of processors allows multiple
DArray computations to
happen at once, with a higher ratio of work to communication on each
processor.
distribute(a::Array, dim) can be used to convert a local array to a
distributed array, optionally specifying the distributed dimension.
localize(a::DArray) can be used to obtain the locally-stored portion
of a
DArray.
owner(a::DArray, index) gives the id of the
processor storing the given index in the distributed dimension.
myindexes(a::DArray) gives a tuple of the indexes owned by the local
processor.
convert(Array, a::DArray) brings all the data to one
node.
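A brief sketch of these functions in use (assuming the session was started with more than one processor):
a = rand(100,100)
d = distribute(a, 2)   # distribute the columns of a
localize(d)            # the locally-stored piece
owner(d, 1)            # id of the processor owning index 1
myindexes(d)           # tuple of indexes owned locally
convert(Array, d)      # gather all the data back to one node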
A
DArray can be stored on a subset of the available processors.
Three properties fully describe the distribution of
DArray d.
d.pmap[i] gives the processor id that owns piece number
i of the
array. Piece
i consists of indexes
d.dist[i] through
d.dist[i+1]-1.
distdim(d) gives the distributed dimension. For
convenience,
d.localpiece gives the number of the piece owned by the
local processor (this could also be determined by searching
d.pmap).
The array
d.pmap is also available as
procs(d).
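For example, the following sketch inspects these properties; the exact values depend on the piece sizes chosen:
d = dzeros((100,100), 2, [7, 8])  # columns split between processors 7 and 8
procs(d)       # same as d.pmap: the ids [7, 8]
distdim(d)     # 2
d.dist         # e.g. [1, 51, 101]; piece i spans d.dist[i] through d.dist[i+1]-1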
Indexing a
DArray (square brackets) gathers all of the referenced
data to a local
Array object.
Indexing a
DArray with the
sub function creates a “virtual”
sub-array that leaves all of the data in place. This should be used
where possible, especially for indexing operations that refer to large
pieces of the original array.
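For example, a sketch of taking a virtual slice (assuming the two-range indexing form; no data is moved):
S = sub(d, 1:100, 1:10)   # refers to the first 10 columns in place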
sub itself, naturally, does no communication and so is very
efficient. However, this does not mean it should be viewed as an
optimization in all cases. Many situations require explicitly moving
data to the local processor in order to do a fast serial operation. For
example, functions like matrix multiply perform many accesses to their
input data, so it is better to have all the data available locally up
front.
Constructing Distributed Arrays
The primitive
DArray constructor is the function
darray, which
has the following somewhat elaborate signature:
darray(init, type, dims, distdim, procs, dist)
init is a function of three arguments that will run on each
processor, and should return an
Array holding the local data for the
current processor. Its arguments are
(T,d,da) where
T is the
element type,
d is the dimensions of the needed local piece, and
da is the new
DArray being constructed (though, of course, it is
not fully initialized).
type is the element type.
dims is the dimensions of the entire
DArray.
distdim is the dimension to distribute in.
procs is a vector of processor ids to use.
dist is a vector giving the first index of each contiguous
distributed piece, such that the nth piece consists of indexes
dist[n] through
dist[n+1]-1. If you have a vector
v of the
sizes of the pieces,
dist can be computed as
cumsum([1,v]).
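For example, a quick sketch for three pieces of sizes 3, 3, and 4:
v = [3, 3, 4]
dist = cumsum([1, v])   # gives [1, 4, 7, 11]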
The last three arguments are optional, and defaults will be used if they
are omitted. The first argument, the
init function, can also be
omitted, in which case an uninitialized
DArray is constructed.
As an example, here is how to turn the local array constructor
rand
into a distributed array constructor:
drand(args...) = darray((T,d,da)->rand(d), Float64, args...)
In this case the
init function only needs to call
rand with the
dimensions of the local piece it is creating.
drand accepts the same
trailing arguments as
darray.
darray also has definitions that
allow functions like
drand to accept the same arguments as their
local counterparts, so calls like
drand(m,n) will also work.
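For example, both of the following forms should work (a sketch):
d1 = drand(200, 200)       # same arguments as rand
d2 = drand((200,200), 1)   # explicitly distribute across dimension 1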
The
changedist function, which changes the distribution of a
DArray, can be implemented with one call to
darray where the
init function uses indexing to gather data from the existing array:
function changedist(A::DArray, to_dist)
    return darray((T,sz,da)->A[myindexes(da)...],
                  eltype(A), size(A), to_dist, procs(A))
end
It is particularly easy to construct a
DArray where each block is a
function of a block in an existing
DArray. This is done with the
form
darray(f, A). For example, the unary minus function can be
implemented as:
-(A::DArray) = darray(-, A)
Distributed Array Computations
Whole-array operations (e.g. elementwise operators) are a convenient way
to use distributed arrays, but they are not always sufficient. To handle
more complex problems, tasks can be spawned to operate on parts of a
DArray and write the results to another
DArray. For example,
here is how you could apply a function
f to each 2-d slice of a 3-d
DArray:
function compute_something(A::DArray)
    B = darray(eltype(A), size(A), 3)
    for i = 1:size(A,3)
        @spawnat owner(B,i) B[:,:,i] = f(A[:,:,i])
    end
    B
end
We used
@spawnat to place each operation near the memory it writes
to.
This code works in some sense, but trouble stems from the fact that it
performs writes asynchronously. In other words, we don’t know when the
result data will be written to the array and become ready for further
processing. This is known as a “race condition”, one of the famous
pitfalls of parallel programming. Some form of synchronization is
necessary to wait for the result. As we saw above,
@spawn returns a
remote reference that can be used to wait for its computation. We could
use that feature to wait for specific blocks of work to complete:
function compute_something(A::DArray)
    B = darray(eltype(A), size(A), 3)
    deps = cell(size(A,3))
    for i = 1:size(A,3)
        deps[i] = @spawnat owner(B,i) B[:,:,i] = f(A[:,:,i])
    end
    (B, deps)
end
Now a function that needs to access slice
i can perform
wait(deps[i]) first to make sure the data is available.
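For example (process here is a hypothetical function that consumes the finished slice):
(B, deps) = compute_something(A)
wait(deps[1])       # block until slice 1 has been written
process(B[:,:,1])   # hypothetical further processing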
Another option is to use a
@sync block, as follows:
function compute_something(A::DArray)
    B = darray(eltype(A), size(A), 3)
    @sync begin
        for i = 1:size(A,3)
            @spawnat owner(B,i) B[:,:,i] = f(A[:,:,i])
        end
    end
    B
end
@sync waits for all spawns performed within it to complete. This
makes our
compute_something function easy to use, at the price of
giving up some parallelism (since calls to it cannot overlap with
subsequent operations).
Still another option is to use the initial, un-synchronized version of
the code, and place a
@sync block around a larger set of operations
in the function calling this one.
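For example, a sketch of synchronizing at a higher level:
@sync begin
    B = compute_something(A)   # the un-synchronized version above
    # ... other operations that may also spawn work ...
end
# at this point all writes spawned inside compute_something are complete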
Synchronization With Remote References
Scheduling
Julia’s parallel programming platform uses
Tasks (aka Coroutines) to switch among
multiple computations. Whenever code performs a communication operation
like
fetch or
wait, the current task is suspended and a
scheduler picks another task to run. A task is restarted when the event
it is waiting for completes.
For many problems, it is not necessary to think about tasks directly.
However, they can be used to wait for multiple events at the same time,
which provides for
dynamic scheduling. In dynamic scheduling, a
program decides what to compute or where to compute it based on when
other jobs finish. This is needed for unpredictable or unbalanced
workloads, where we want to assign more work to processors only when
they finish their current tasks.
As an example, consider computing the singular values of matrices of
different sizes:
M = {rand(800,800), rand(600,600), rand(800,800), rand(600,600)}
pmap(svd, M)
If one processor handles both 800x800 matrices and another handles both
600x600 matrices, we will not get as much scalability as we could. The
solution is to make a local task to “feed” work to each processor when
it completes its current task. This can be seen in the implementation of
pmap:
function pmap(f, lst)
    np = nprocs()  # determine the number of processors available
    n = length(lst)
    results = cell(n)
    i = 1
    # function to produce the next work item from the queue.
    # in this case it's just an index.
    next_idx() = (idx=i; i+=1; idx)
    @sync begin
        for p=1:np
            @spawnlocal begin
                while true
                    idx = next_idx()
                    if idx > n
                        break
                    end
                    results[idx] = remote_call_fetch(p, f, lst[idx])
                end
            end
        end
    end
    results
end
@spawnlocal is similar to
@spawn, but only runs tasks on the
local processor. We use it to create a “feeder” task for each processor.
Each task picks the next index that needs to be computed, then waits for
its processor to finish, then repeats until we run out of indexes. A
@sync block is used to wait for all the local tasks to complete, at
which point the whole operation is done. Notice that all the feeder
tasks are able to share state via
next_idx() since they all run on
the same processor. However, no locking is required, since the tasks
are scheduled cooperatively and not preemptively. This means context
switches only occur at well-defined points (during the
fetch
operation).
Adding Processors