Tuesday, June 19, 2012

Getting Started with Julia programming language

Getting Started

The latest version of Julia can be downloaded and installed by following the instructions on the main GitHub page. The easiest way to learn and experiment with Julia is by starting an interactive session (also known as a read-eval-print loop or “REPL”):

$ julia
               _
   _       _ _(_)_     |
  (_)     | (_) (_)    |  A fresh approach to technical computing.
   _ _   _| |_  __ _   |
  | | | | | | |/ _` |  |  Version 0 (pre-release)
  | | |_| | | | (_| |  |  Commit 61847c5aa7 (2011-08-20 06:11:31)*
 _/ |\__'_|_|_|\__'_|  |
|__/                   |

julia> 1 + 2
3

julia> ans
3

julia> load("file.jl")

To exit the interactive session, type ^D (the control key together with the d key). When run in interactive mode, julia displays a banner and prompts the user for input. Once the user has entered a complete expression, such as 1 + 2, and pressed enter, the interactive session evaluates the expression and shows its value. If an expression is entered into an interactive session with a trailing semicolon, its value is not shown. The variable ans is bound to the value of the last evaluated expression, whether it is shown or not. The load function reads and evaluates the contents of the given file.
To run code in a file non-interactively, you can give it as the first argument to the julia command:

$ julia script.jl arg1 arg2...

As the example implies, the following command-line arguments to julia are taken as command-line arguments to the program script.jl, passed in the global constant ARGS. ARGS is also set when script code is given using the -e option on the command line (see the julia help output below). For example, to just print the arguments given to a script, you could do this:

$ julia -e 'for x in ARGS; println(x); end' foo bar
foo
bar

Or you could put that code into a script and run it:

$ echo 'for x in ARGS; println(x); end' > script.jl
$ julia script.jl foo bar
foo
bar
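
As a slightly longer sketch of the same idea, the script below numbers each argument before printing it. The helper name numbered is hypothetical, and the syntax assumes a current Julia 1.x release rather than the pre-release shown in the banner above.

```julia
# Hypothetical helper: turn a list of arguments into numbered lines.
function numbered(args)
    lines = String[]
    for (i, x) in enumerate(args)
        push!(lines, string(i, ": ", x))
    end
    return lines
end

# ARGS holds the command-line arguments that follow the script name.
for line in numbered(ARGS)
    println(line)
end
```

Saved as script.jl and run as julia script.jl foo bar, this would print "1: foo" and "2: bar" on separate lines.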

There are various ways to run Julia code and provide options, similar to those available for the perl and ruby programs:

julia [options] [program] [args...]
 -q --quiet               Quiet startup without banner
 -H --home=<dir>          Load files relative to <dir>
 -T --tab=<size>          Set REPL tab width to <size>

 -e --eval=<expr>         Evaluate <expr> and don't print
 -E --print=<expr>        Evaluate and print <expr>
 -P --post-boot=<expr>    Evaluate <expr> right after boot
 -L --load=file           Load <file> right after boot
 -J --sysimage=file       Start up with the given system image file

 -p n                     Run n local processes

 -h --help                Print this message

Example Code

At this point it is useful to take a look at some example programs.

Major Differences From MATLAB®

Julia’s syntax is intended to be familiar to users of MATLAB®. However, Julia is in no way a MATLAB® clone: there are major syntactic and functional differences. The following are the most significant differences that may trip up Julia users accustomed to MATLAB®:

Arrays are indexed with square brackets, A[i,j].
Multiple values are returned and assigned with parentheses, return (a, b) and (a, b) = f(x).
Values are passed and assigned by reference. If a function modifies an array, the changes will be visible in the caller.
Use n for nx1: The number of arguments to an array constructor equals the number of dimensions of the result. In particular, rand(n) makes a 1-dimensional array.
Concatenating scalars and arrays with the syntax [x,y,z] concatenates in the first dimension (“vertically”). For the second dimension (“horizontally”), use spaces as in [x y z]. To construct block matrices (concatenating in the first two dimensions), the syntax [a b; c d] is used to avoid confusion.
Colons a:b and a:b:c construct Range objects. To construct a full vector, use linspace, or “concatenate” the range by enclosing it in brackets, [a:b].
Functions return values using the return keyword, instead of by listing their names in the function definition (see The “return” Keyword for details).
A file may contain any number of functions, and all definitions will be externally visible when the file is loaded.
Reductions such as sum, prod, and max are performed over every element of an array when called with a single argument as in sum(A).
Functions such as sort that operate column-wise by default (sort(A) is equivalent to sort(A,1)) do not have special behavior for 1xN arrays; the argument is returned unmodified since it still performs sort(A,1). To sort a 1xN matrix like a vector, use sort(A,2).
Parentheses must be used to call a function with zero arguments, as in tic() and toc().
Do not use semicolons to end statements. The results of statements are not automatically printed (except at the interactive prompt), and lines of code do not need to end with semicolons. The function println can be used to print a value followed by a newline.
If A and B are arrays, A == B doesn’t return an array of booleans. Use A .== B instead. Likewise for the other boolean operators, <, >, !=, etc.
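
Several of the points above can be seen together in a short sketch. It assumes a current Julia 1.x release, where the reductions are spelled minimum and maximum; the names set_first! and min_and_max are made up for illustration.

```julia
# Square brackets index arrays; spaces and semicolons build a block matrix.
A = [1 2; 3 4]
a21 = A[2, 1]              # element in row 2, column 1

# Commas build a 1-dimensional array; spaces build a 1xN matrix.
v = [1, 2, 3]              # size (3,)
h = [1 2 3]                # size (1, 3)

# One argument to an array constructor gives one dimension.
r = rand(4)                # size (4,)

# Arrays are passed by reference, so mutation is visible to the caller.
function set_first!(a)
    a[1] = 99
    return a
end
set_first!(v)              # v is now [99, 2, 3]

# Multiple return values use parentheses on both sides.
min_and_max(xs) = (minimum(xs), maximum(xs))
(lo, hi) = min_and_max([4, 1, 7])

# Reductions with one argument cover every element.
total = sum(A)             # 1 + 2 + 3 + 4

# Elementwise comparison needs the dotted operator.
same = A .== [1 0; 3 0]
```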

Potential Features in Julia programming language

Potential Features

Julia is still a very young programming language, and there are many features that have been discussed and planned to various extents, but not yet implemented. This page documents some of these potential future features, but is likely to be out of date. See the mailing list and GitHub issues for the latest discussion on potential features.

Local goto (https://github.com/JuliaLang/julia/issues/101)
Namespaces/modules
Abstract multiple inheritance
Enhanced C struct and array compatibility
Const fields in composite types
Multiline quotes using """ ... """ (https://github.com/JuliaLang/julia/issues/70)
Multiline comments using ### ... ### (https://github.com/JuliaLang/julia/issues/69)
Symbolic optimization
Collaborative cloud-computing architecture

Performance Tips in Julia programming language

Performance Tips

In the following sections, we briefly go through a few techniques that can help make your Julia code run as fast as possible.

Avoid global variables

A global variable might have its value, and therefore its type, change at any point. This makes it difficult for the compiler to optimize code using global variables. Variables should be local, or passed as arguments to functions, whenever possible.
We find that global names are frequently constants, and declaring them as such greatly improves performance:

const DEFAULT_VAL = 0

Uses of non-constant globals can be optimized by annotating their types at the point of use:

global x
y = f(x::Int + 1)
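
As a minimal sketch of this advice, the function below receives everything it needs as arguments, and the one global it is called with is declared const. The names SCALE and scaled_sum are hypothetical.

```julia
const SCALE = 2.0          # constant global: its type can never change

# Preferred: take the data as arguments instead of reading globals.
function scaled_sum(xs, scale)
    total = 0.0
    for x in xs
        total += x * scale
    end
    return total
end

result = scaled_sum([1.0, 2.0, 3.0], SCALE)
```

Because xs and scale are arguments, the compiler can specialize the loop for their concrete types.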

Type declarations

In many languages with optional type declarations, adding declarations is the principal way to make code run faster. In Julia, the compiler generally knows the types of all function arguments and local variables. However, there are a few specific instances where declarations are helpful.

Declare specific types for fields of composite types

Given a user-defined type like the following:

type Foo
    field
end

the compiler will not generally know the type of foo.field, since it might be modified at any time to refer to a value of a different type. It will help to declare the most specific type possible, such as field::Float64 or field::Array{Int64,1}.
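
A small sketch of a typed field follows. It assumes a current Julia 1.x release, where a mutable composite type is declared with mutable struct rather than the type keyword used above; TypedFoo is a hypothetical name.

```julia
# Assumption: current Julia spells a mutable composite type `mutable struct`.
mutable struct TypedFoo
    field::Float64         # the field can only ever hold a Float64
end

foo = TypedFoo(1.5)
foo.field = 2              # converted to Float64 when stored
```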

Annotate values taken from untyped locations

It is often convenient to work with data structures that may contain values of any type, such as the original Foo type above, or cell arrays (arrays of type Array{Any}). But, if you’re using one of these structures and happen to know the type of an element, it helps to share this knowledge with the compiler:

function foo(a::Array{Any,1})
    x = a[1]::Int32
    b = x+1
    ...
end

Here, we happened to know that the first element of a would be an Int32. Making an annotation like this has the added benefit that it will raise a run-time error if the value is not of the expected type, potentially catching certain bugs earlier.
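
The run-time check can be seen in a runnable variant of the example. This sketch uses Int instead of Int32 so that plain integer literals match, and assumes current Julia 1.x syntax; first_plus_one is a hypothetical name.

```julia
function first_plus_one(a::Array{Any,1})
    x = a[1]::Int          # assert the element type at the point of use
    return x + 1
end

ok = first_plus_one(Any[1, "two"])   # fine: the first element is an Int

# A wrong guess fails immediately with a TypeError.
failed = try
    first_plus_one(Any["one", 2])
    false
catch err
    err isa TypeError
end
```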

Break functions into multiple definitions

Writing a function as many small definitions allows the compiler to directly call the most applicable code, or even inline it.
Here is an example of a “compound function” that should really be written as multiple definitions:

function norm(A)
    if isa(A, Vector)
        return sqrt(real(dot(A,A)))
    elseif isa(A, Matrix)
        return max(svd(A)[2])
    else
        error("norm: invalid argument")
    end
end

This can be written more concisely and efficiently as:

norm(A::Vector) = sqrt(real(dot(A,A)))
norm(A::Matrix) = max(svd(A)[2])
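
The same pattern in a self-contained form, as a sketch assuming current Julia 1.x type names (describe is a hypothetical function, not part of the standard library):

```julia
# One small definition per argument type instead of an if/elseif chain.
describe(x::Integer) = "integer"
describe(x::AbstractFloat) = "float"
describe(x::AbstractString) = "string"

a = describe(3)
b = describe(2.5)
c = describe("hi")

# No explicit error branch is needed: an unsupported type has no method.
unsupported = try
    describe(:symbol)
    false
catch err
    err isa MethodError
end
```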

Write “type-stable” functions

When possible, it helps to ensure that a function always returns a value of the same type. Consider the following definition:

pos(x) = x < 0 ? 0 : x

Although this seems innocent enough, the problem is that 0 is an integer (of type Int) and x might be of any type. Thus, depending on the value of x, this function might return a value of either of two types. This behavior is allowed, and may be desirable in some cases. But it can easily be fixed as follows:

pos(x) = x < 0 ? zero(x) : x

There is also a one function, and a more general oftype(x,y) function, which returns y converted to the type of x. The first argument to any of these functions can be either a value or a type.
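
A brief sketch of how these functions behave, assuming a current Julia 1.x release:

```julia
pos(x) = x < 0 ? zero(x) : x

p1 = pos(-2.5)             # a Float64 zero, matching the argument type
p2 = pos(-3)               # an Int zero
p3 = pos(4.5)              # positive values pass through unchanged

u = one(Float64)           # the first argument can be a type...
z = zero(7)                # ...or a value
c = oftype(1.0, 2)         # 2 converted to the type of 1.0
```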

Avoid changing the type of a variable

An analogous “type-stability” problem exists for variables used repeatedly within a function:

function foo()
    x = 1
    for i = 1:10
        x = x/bar()
    end
    return x
end

Local variable x starts as an integer, and after one loop iteration becomes a floating-point number (the result of the / operator). This makes it more difficult for the compiler to optimize the body of the loop. There are several possible fixes:

Initialize x with x = 1.0
Declare the type of x: x::Float64 = 1
Use an explicit conversion: x = one(T), where T is the intended type of x (here a floating-point type)
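
For instance, the first fix might look like the sketch below. The divisor 2 stands in for the unspecified bar(), and halve_repeatedly is a hypothetical name.

```julia
function halve_repeatedly(n)
    x = 1.0                # start as Float64 so the type never changes
    for i = 1:n
        x = x / 2
    end
    return x
end

result = halve_repeatedly(3)
```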

Separate kernel functions

Many functions follow a pattern of performing some set-up work, and then running many iterations to perform a core computation. Where possible, it is a good idea to put these core computations in separate functions. For example, the following contrived function returns an array of a randomly-chosen type:

function strange_twos(n)
    a = Array(randbool() ? Int64 : Float64, n)
    for i = 1:n
        a[i] = 2
    end
    return a
end

This should be written as:

function fill_twos!(a)
    for i=1:numel(a)
        a[i] = 2
    end
end

function strange_twos(n)
    a = Array(randbool() ? Int64 : Float64, n)
    fill_twos!(a)
    return a
end

Julia’s compiler specializes code for argument types at function boundaries, so in the original implementation it does not know the type of a during the loop (since it is chosen randomly). Therefore the second version is generally faster since the inner loop can be recompiled as part of fill_twos! for different types of a.
The second form is also often better style and can lead to more code reuse.
This pattern is used in several places in the standard library. For example, see _jl_hvcat_fill in abstractarray.jl (https://github.com/JuliaLang/julia/blob/master/base/abstractarray.jl), or the fill! function, which we could have used instead of writing our own fill_twos!.
Functions like strange_twos occur when dealing with data of uncertain type, for example data loaded from an input file that might contain either integers, floats, strings, or something else.
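
For readers on a current Julia 1.x release, the same kernel pattern can be sketched as follows. This assumes the modern spellings rand(Bool), Vector{T}(undef, n) and eachindex in place of the pre-release randbool, Array(T, n) and numel used above.

```julia
# Kernel: compiled separately for each concrete array type it receives.
function fill_twos!(a)
    for i in eachindex(a)
        a[i] = 2
    end
    return a
end

# Set-up: the element type is only known at run time.
function strange_twos(n)
    T = rand(Bool) ? Int64 : Float64
    a = Vector{T}(undef, n)
    fill_twos!(a)
    return a
end

a = strange_twos(4)
```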

Tweaks

These are some minor points that might help in tight inner loops.

Use size(A,n) when possible instead of size(A).
Avoid unnecessary arrays. For example, instead of sum([x,y,z]) use x+y+z.
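
Both tweaks are shown in the sketch below, which assumes current Julia 1.x syntax; column_sums and sum_three are hypothetical names.

```julia
# Ask for one dimension at a time instead of the whole size tuple.
function column_sums(A)
    sums = zeros(eltype(A), size(A, 2))
    for j = 1:size(A, 2), i = 1:size(A, 1)
        sums[j] += A[i, j]
    end
    return sums
end

# Add the values directly instead of building a temporary array.
sum_three(x, y, z) = x + y + z

cs = column_sums([1 2; 3 4])
s = sum_three(1, 2, 3)
```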

A Data Scientist's blog
