Tuesday, June 12, 2012

Julia Programming Language: Why We Created Julia

Why We Created Julia

In short, because we are greedy.
We are power Matlab users. Some of us are Lisp hackers. Some are Pythonistas, others Rubyists, still others Perl hackers. There are those of us who used Mathematica before we could grow facial hair. There are those who still can’t grow facial hair. We’ve generated more R plots than any sane person should. C is our desert island programming language.
We love all of these languages; they are wonderful and powerful. For the work we do — scientific computing, machine learning, data mining, large-scale linear algebra, distributed and parallel computing — each one is perfect for some aspects of the work and terrible for others. Each one is a trade-off.
We are greedy: we want more.
We want a language that’s open source, with a liberal license. We want the speed of C with the dynamism of Ruby. We want a language that’s homoiconic, with true macros like Lisp, but with obvious, familiar mathematical notation like Matlab. We want something as usable for general programming as Python, as easy for statistics as R, as natural for string processing as Perl, as powerful for linear algebra as Matlab, as good at gluing programs together as the shell. Something that is dirt simple to learn, yet keeps the most serious hackers happy. We want it interactive and we want it compiled.
(Did we mention it should be as fast as C?)
While we’re being demanding, we want something that provides the distributed power of Hadoop — without the kilobytes of boilerplate Java and XML; without being forced to sift through gigabytes of log files on hundreds of machines to find our bugs. We want the power without the layers of impenetrable complexity. We want to write simple scalar loops that compile down to tight machine code using just the registers on a single CPU. We want to write A*B and launch a thousand computations on a thousand machines, calculating a vast matrix product together.
We never want to mention types when we don’t feel like it. But when we need polymorphic functions, we want to use generic programming to write an algorithm just once and apply it to an infinite lattice of types; we want to use multiple dispatch to efficiently pick the best method for all of a function’s arguments, from dozens of method definitions, providing common functionality across drastically different types. Despite all this power, we want the language to be simple and clean.
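As a rough illustration (the describe and midpoint functions below are invented for this example, not built into Julia), multiple dispatch and generic programming look like this:

# One generic function, several methods; Julia picks the most specific
# method for the actual argument types at the call site.
describe(x::Number) = "a number: $x"
describe(x::String) = "a string of length $(length(x))"
describe(x, y)      = "two things: " * describe(x) * " and " * describe(y)

describe(1.5)        # dispatches on Number
describe("hello")    # dispatches on String
describe(1, "two")   # dispatches on the combination of both arguments

# Generic programming: written once, usable for any types supporting + and /.
midpoint(a, b) = (a + b) / 2
midpoint(1, 3)             # integers
midpoint(1.0, 2.5)         # floating point
midpoint([1, 2], [3, 4])   # vectors, via the same single definition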
All this doesn’t seem like too much to ask for, does it?
Even though we recognize that we are inexcusably greedy, we still want to have it all. About two and a half years ago, we set out to create the language of our greed. It’s not complete, but it’s time for a 1.0 release — the language we’ve created is called Julia. It already delivers on 90% of our ungracious demands, and now it needs the ungracious demands of others to shape it further. So, if you are also a greedy, unreasonable, demanding programmer, we want you to give it a try.

What do scientists want in a programming language? New Julia language seeks to be the C for scientists

What I liked best about this interview was the exploration of the (implicit) question, “What do scientists want in a programming language?” It sounds like the answers are (explicitly) performance and broad applicability for the applications that scientists care about (e.g., numeric processing, linear algebra, and statistics), but also (implicitly) ease of reading/writing. On the Julia Language page, there is discussion about how much easier it is to read and write Julia compared to equivalent C++ code. The language looks surprisingly Python-like.
And I have to admit that what most interested me on the Julia Language page were the benchmark numbers for JavaScript. Really? It’s that fast now?!?
InfoWorld: When you say technical computing, to what type of applications are you specifically referring?
Karpinski: It’s a broad category, but it’s pretty much anything that involves a lot of number-crunching. In my own background, I’ve done a lot of linear algebra but a fair amount of statistics as well. The tool of choice for linear algebra tends to be Matlab. The tool of choice for statistics tends to be R, and I’ve used both of those a great deal. But they’re not really interchangeable. If you want to do statistics in Matlab, it’s frustrating. If you want to do linear algebra in R, it’s frustrating.
InfoWorld: So you developed Julia with the intent to make it easier to build technical applications?
Karpinski: Yes. The idea is that it should be extremely high productivity. To that end, it’s a dynamic language, so it’s relatively easy to program, and it’s got a very simple programming model. But it has extremely high performance, which cuts out [the need for] a third language [C], which is often [used] to get performance in any of these other languages. I should also mention NumPy, which is a contender for these areas. For Matlab, R, and NumPy, for all of these options, you need to at some point drop down into C to get performance. One of our goals explicitly is to have sufficiently good performance in Julia that you’d never have to drop down into C.

Julia Programming Language: The Manual

Julia Programming Language: How to Install

README.md

               _
   _       _ _(_)_     |
  (_)     | (_) (_)    |   A fresh approach to technical computing
   _ _   _| |_  __ _   |
  | | | | | | |/ _` |  |          http://julialang.org
  | | |_| | | | (_| |  |       julia-dev@googlegroups.com
 _/ |\__'_|_|_|\__'_|  |           #julia on freenode
|__/                   |

The Julia Language

Julia is a high-level, high-performance dynamic language for technical computing. The main homepage for Julia can be found at julialang.org. This is the GitHub repository of Julia source code, including instructions for compiling and installing Julia, below.


Currently Supported Platforms

  • GNU/Linux: x86/64 (64-bit); x86 (32-bit).
  • Darwin/OS X: x86/64 (64-bit); x86 (32-bit).
  • FreeBSD: x86/64 (64-bit); x86 (32-bit).

Source Download & Compilation

First, acquire the source code by cloning the git repository:
git clone git://github.com/JuliaLang/julia.git
Next, enter the julia/ directory and run make to build the julia executable. To perform a parallel build, use make -j N, where N is the maximum number of concurrent processes. The first time Julia is compiled, the build automatically downloads and builds its external dependencies. This takes a while, but only has to be done once. Building Julia requires 1.5 GiB of disk space and approximately 700 MiB of virtual memory.
Note: the build process will not work if any of the build directory's parent directories have spaces in their names (this is due to a limitation in GNU make).
Once it is built, you can either run the julia executable using its full path in the directory created above, or add that directory to your executable path so that you can run the julia program from anywhere:
export PATH="$(pwd)/julia:$PATH"
Now you should be able to run julia like this:
julia
If everything works correctly, you will see a Julia banner and an interactive prompt into which you can enter expressions for evaluation. You can read about getting started in the manual.
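For example, a first session might look something like this (a rough sketch; the exact output formatting depends on the version):

julia> 1 + 2
3

julia> sqrt(2.0)
1.4142135623730951

julia> x = [1, 2, 3]; sum(x .^ 2)
14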

Platform-Specific Notes

Linux

On some Linux distributions you may need to change how the readline library is linked. If you get a build error involving readline, try changing the value of USE_SYSTEM_READLINE in Make.inc to 1.
On Ubuntu systems, you may also need to install the package libncurses5-dev.

OS X

You may need to install gfortran. Either download and install gfortran from hpc.sf.net, or 64-bit gfortran from gcc.gnu.org.
If you get link errors mentioning gfortran, it might help to put /usr/local/gfortran/lib at the beginning of the DYLD_LIBRARY_PATH environment variable.

FreeBSD

The prerequisites can be installed from ports like this:
cd /usr/ports/devel/gmake
make install

cd /usr/ports/ftp/curl
make install

cd /usr/ports/devel/libunwind
make install

cd /usr/ports/lang/gcc45
make install
ln -s /usr/local/bin/gfortran45 /usr/local/bin/gfortran
Other versions of gcc are also available, but gfortran45 is currently the one used by all the ports that depend on Fortran.
Use the gmake command on FreeBSD instead of make.

MKL

To use the Intel MKL BLAS & LAPACK libraries, edit the following settings in Make.inc:
USE_MKL = 1
MKLLIB = /path/to/mkl/lib/arch
MKLLIB points to the directory containing libmkl_rt.so. MKL version 10.3 or later is required. To rebuild a pre-built Julia source install with MKL support, delete the OpenBLAS, ARPACK, and SuiteSparse dependencies from deps/, then run make cleanall testall.

Required Build Tools & External Libraries

Building Julia requires that the following software be installed:
  • GNU make — building dependencies.
  • gcc, g++, gfortran — compiling and linking C, C++ and Fortran code.
  • git — contributions and version control.
  • perl — preprocessing of header files of libraries.
  • wget or curl — to automatically download external libraries (Linux defaults to wget, OS X and FreeBSD to curl).
  • m4 — needed to build GMP.
Julia uses the following external libraries, which are automatically downloaded (or in a few cases, included in the Julia source repository) and then compiled from source the first time you run make:
  • LLVM — compiler infrastructure. Currently, julia requires LLVM 3.0.
  • FemtoLisp — packaged with julia source, and used to implement the compiler front-end.
  • GNU readline — library allowing shell-like line editing in the terminal, with history and familiar key bindings.
  • fdlibm — a portable implementation of much of the system-dependent libm math library's functionality.
  • MT — a fast Mersenne Twister pseudorandom number generator library.
  • OpenBLAS — a fast, open, and maintained basic linear algebra subprograms (BLAS) library, based on Kazushige Goto's famous GotoBLAS.
  • LAPACK — a library of linear algebra routines for solving systems of simultaneous linear equations, least-squares solutions of linear systems of equations, eigenvalue problems, and singular value problems.
  • MKL (optional) – OpenBLAS & LAPACK may be replaced by Intel's MKL library.
  • AMOS — subroutines for computing Bessel and Airy functions.
  • SuiteSparse — a library of linear algebra routines for sparse matrices.
  • ARPACK — a collection of subroutines designed to solve large, sparse eigenvalue problems.
  • FFTW — library for computing fast Fourier transforms very quickly and efficiently.
  • PCRE — Perl-compatible regular expressions library.
  • GMP — the GNU multiple precision arithmetic library, needed for bigint support.
  • D3 — JavaScript visualization library.
  • double-conversion — efficient number-to-text conversion.
  • GLPK - linear programming.
  • Rmath - basic RNGs and distributions.

Directories

base/          source code for Julia's standard library
contrib/       emacs, vim and textmate support for Julia
deps/          external dependencies
examples/      example Julia programs
extras/        useful optional libraries
lib/           shared libraries loaded by Julia's standard libraries
src/           source for Julia language core
test/          unit and functional test cases
ui/            source for various front ends

Binary Installation

Because of the rapid pace of development at this point, we recommend installing the latest Julia from source, but platform-specific tarballs with pre-compiled binaries are also available for download. To install from a binary tarball, download the appropriate one and untar it somewhere. For example, if you are on an OS X (Darwin) x86/64 system, do the following:
wget https://github.com/downloads/JuliaLang/julia/julia-c4865bd18d-Darwin-i386.tar.gz
tar zxvf julia-c4865bd18d-Darwin-i386.tar.gz
You can either run the julia executable using its full path in the directory created above, or add that directory to your executable path so that you can run the julia program from anywhere:
export PATH="$(pwd)/julia:$PATH"
Now you should be able to run julia like this:
julia
If everything works correctly, you will see a Julia banner and an interactive prompt into which you can enter expressions for evaluation. You can read about getting started in the manual.
An Arch Linux package is also available.

Editor & Terminal Setup

Currently, julia editing mode support is available for Emacs, Vim, and TextMate.
Adjusting your terminal bindings is optional; everything will work fine without these key bindings. For the best interactive session experience, however, make sure that your terminal emulator (Terminal, iTerm, xterm, etc.) sends the ^H sequence for Backspace (delete key) and that the Shift-Enter key combination sends a \n newline character to distinguish it from just pressing Enter, which sends a \r carriage return character. These bindings allow custom readline handlers to trap and correctly deal with these key sequences; other programs will continue to behave normally with these bindings. The first binding makes backspacing through text in the interactive session behave more intuitively. The second binding allows Shift-Enter to insert a newline without evaluating the current expression, even when the current expression is complete. (Pressing an unmodified Enter inserts a newline if the current expression is incomplete, evaluates the expression if it is complete, or shows an error if the syntax is irrecoverably invalid.)
On Linux systems, the Shift-Enter binding can be set by placing the following line in the file .xmodmaprc in your home directory:
keysym Return = Return Linefeed

Web REPL

Julia has a web REPL with very preliminary graphics capabilities. The web REPL is currently a showcase to try out new ideas. The web REPL is social - multiple people signing in with a common session name can collaborate within a session.
  1. Do make -C deps install-lighttpd to download and build the webserver.
  2. Start the web REPL service with ./usr/bin/launch-julia-webserver.
  3. Point your browser to http://localhost:2000/.
  4. Try plot(cumsum(randn(1000))) and other things.

Try it Online

Forio.com is generously hosting and maintaining an instance of Julia's web REPL here: julia.forio.com.

Julia Programming Language: An introduction

The Julia Language
 
Julia is a high-level, high-performance dynamic programming language for technical computing, with syntax that is familiar to users of other technical computing environments. It provides a sophisticated compiler, distributed parallel execution, numerical accuracy, and an extensive mathematical function library. The library, mostly written in Julia itself, also integrates mature, best-of-breed C and Fortran libraries for linear algebra, random number generation, FFTs, and string processing. More libraries continue to be added over time. Julia programs are organized around defining functions, and overloading them for different combinations of argument types (which can also be user-defined). For a more in-depth discussion of the rationale and advantages of Julia over other systems, see the following highlights or read the introduction in the online manual.
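For instance, a user-defined type and a function overloaded for it might look like the following sketch (illustrative only; the names Point and norm2 are made up for the example, and the type keyword is the composite-type syntax of this era):

# A user-defined composite type...
type Point
    x::Float64
    y::Float64
end

# ...and one function name with methods for different argument types.
norm2(v::Vector) = sqrt(sum(v .^ 2))      # method for plain vectors
norm2(p::Point)  = sqrt(p.x^2 + p.y^2)    # method for the user-defined type

norm2([3.0, 4.0])        # 5.0
norm2(Point(3.0, 4.0))   # 5.0, selected by dispatch on the argument type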

High-Performance JIT Compiler

Julia’s LLVM-based just-in-time (JIT) compiler, combined with the language’s design, allows it to approach and often match the performance of C/C++. To get a sense of relative performance of Julia compared to other languages that can or could be used for numerical and scientific computing, we’ve written a small set of micro-benchmarks in a variety of languages. The source code for the various implementations can be found here: C++, Julia, Python, Matlab/Octave, R, and JavaScript. We encourage you to skim the code to get a sense for how easy or difficult numerical programming in each language is. The following micro-benchmark results are from a MacBook Pro with a 2.53GHz Intel Core 2 Duo CPU and 8GB of 1066MHz DDR3 RAM:

                 Julia      Python    Matlab     Octave     R         JavaScript
                 3f670da0   2.7.1     R2011a     3.4        2.14.2    V8 3.6.6.11
fib              1.97       31.47     1336.37    2383.80    225.23    1.55
parse_int        1.44       16.50     815.19     6454.50    337.52    2.17
quicksort        1.49       55.84     132.71     3127.50    713.77    4.11
mandel           5.55       31.15     65.44      824.68     156.68    5.67
pi_sum           0.74       18.03     1.08       328.33     164.69    0.75
rand_mat_stat    3.37       39.34     11.64      54.54      22.07     8.12
rand_mat_mul     1.00       1.18      0.70       1.65       8.64      41.79
Figure: benchmark times relative to C++ (smaller is better).
C++ compiled by GCC 4.2.1, taking best timing from all optimization levels (-O0 through -O3).
The Python implementations of rand_mat_stat and rand_mat_mul use NumPy (v1.5.1) functions; the rest are pure Python implementations.
Julia beats other high-level systems on most micro-benchmarks, with a few exceptions for Matlab and JavaScript. Relative performance between languages on other systems is similar. Matlab’s ability to beat both C and Julia on random matrix multiplication comes from its use of the proprietary Intel Math Kernel Library, which has extremely optimized code for matrix multiplication on Intel platforms. Users who have a licensed copy of MKL can use it with Julia, but the default BLAS is a high quality open source implementation (see the GitHub page for more details).
These benchmarks, while not comprehensive, do test compiler performance on a range of common code patterns, such as function calls, string parsing, sorting, numerical loops, random number generation, and array operations. Julia is strong in an area that high-level languages have traditionally been weak: scalar arithmetic loops, such as that found in the pi summation benchmark. Matlab’s JIT for floating-point arithmetic does well here too, as does the V8 JavaScript engine. V8 is impressive in that it can provide such a dynamic language with C-like performance in so many circumstances. JavaScript, however, is unable to utilize technical computing libraries such as LAPACK, resulting in poor performance on benchmarks like matrix multiplication. In contrast with both Matlab and JavaScript, Julia has a more comprehensive approach to eliminating overhead that allows it to consistently optimize all kinds of code for arbitrary user-defined data types, not just certain special cases.
To give a quick taste of what Julia looks like, here is the code used in the Mandelbrot and random matrix statistics benchmarks:
function mandel(z)
    c = z
    maxiter = 80
    for n = 1:maxiter
        if abs(z) > 2
            return n-1
        end
        z = z^2 + c
    end
    return maxiter
end

function randmatstat(t)
    n = 5
    v = zeros(t)
    w = zeros(t)
    for i = 1:t
        a = randn(n,n)
        b = randn(n,n)
        c = randn(n,n)
        d = randn(n,n)
        P = [a b c d]
        Q = [a b; c d]
        v[i] = trace((P.'*P)^4)
        w[i] = trace((Q.'*Q)^4)
    end
    std(v)/mean(v), std(w)/mean(w)
end
As you can see, the code is quite clear, and should feel familiar to anyone who has programmed in other mathematical languages. Although C++ beats Julia in the random matrix statistics benchmark by a significant factor, consider how much simpler this code is than the C++ implementation. There are more compiler optimizations planned that we hope will close this performance gap in the future. By design, Julia allows you to range from low-level loop and vector code, up to a high-level programming style, sacrificing some performance, but gaining the ability to express complex algorithms easily. This continuous spectrum of programming levels is a hallmark of the Julia approach to programming and is very much an intentional feature of the language.
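As a small sketch of that spectrum (the sumsq functions below are our own illustration, not part of the benchmark suite), the same computation can be written as an explicit scalar loop or as a one-line vectorized expression:

# Low-level style: an explicit scalar loop that compiles to a tight machine-code loop.
function sumsq_loop(x)
    s = 0.0
    for i = 1:length(x)
        s += x[i] * x[i]
    end
    return s
end

# High-level style: concise vector notation, at the cost of a temporary array.
sumsq_vec(x) = sum(x .^ 2)

x = rand(1000)
sumsq_loop(x)   # same result either way
sumsq_vec(x)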

Designed for Parallelism & Cloud Computing

Julia does not impose any particular style of parallelism on the user. Instead, it provides a number of key building blocks for distributed computation, making it flexible enough to support a number of styles of parallelism, and allowing users to add more. The following simple example demonstrates how to count the number of heads in a large number of coin tosses in parallel.
nheads = @parallel (+) for i=1:100000000
  randbit()
end
This computation is automatically distributed across all available compute nodes, and the result, reduced by summation (+), is returned at the calling node.
Although it is in the early stages, Julia already supports a fully remote cloud computing mode. Here is a screenshot of a web-based interactive Julia session, plotting an oscillating function and a Gaussian random walk:

You can try Julia in the web REPL yourself at julia.forio.com (EC2 instance and maintenance graciously provided by Forio). There will eventually be full support for cloud-based operation, including data management, code editing and sharing, execution, debugging, collaboration, analysis, data exploration, and visualization. The goal is to allow people who work with big data to stop worrying about administering machines and managing data and get straight to the real problem.

Free, Open Source & Library-Friendly

The core of the Julia implementation is licensed under the MIT license. Various libraries used by the Julia environment include their own licenses such as the GPL, LGPL, and BSD (therefore the environment, which consists of the language, user interfaces, and libraries, is under the GPL). The language can be built as a shared library, so users can combine Julia with their own C/Fortran code or proprietary third-party libraries. Furthermore, Julia makes it simple to call external functions in C and Fortran shared libraries, without writing any wrapper code or even recompiling existing code. You can try calling external library functions directly from Julia’s interactive prompt, getting immediate feedback. See LICENSE for the full terms of Julia’s licensing.
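To illustrate the point about calling external C functions, here is a minimal sketch based on the ccall examples in the manual of this period (using the Uint8 spelling of that era):

# Call C's clock() and getenv() directly from the prompt -- no wrapper code, no recompilation.
t = ccall(:clock, Int32, ())

path = ccall(:getenv, Ptr{Uint8}, (Ptr{Uint8},), "SHELL")
bytestring(path)    # e.g. "/bin/bash"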

Julia Programming Language: Downloads



Download Packages

  1. 6 downloads

    julia-4cdb14a8b9-Darwin-x86_64.tar.gz — Darwin binaries built on OS X 10.7.3. (Make sure you have gfortran installed).

    16.4MB · Uploaded
  2. 1,126 downloads

    julia-package.zip — Windows build of julia. Unzip, and run julia.bat.

    26.8MB · Uploaded
  3. 120 downloads

    julia-33a3780d30-Linux-i686-glibc211.tar.gz — Linux binaries built for i686, glibc 2.11

    22.5MB · Uploaded
  4. 148 downloads

    julia-a1fcc12042-Linux-x86_64-glibc25.tar.gz — Linux binaries built for x86_64, glibc 2.5

    23.2MB · Uploaded
  5. 229 downloads

    julia_0be57b740a_i386.deb — Debian package built on Ubuntu 11.10 (32-bit)

    21.7MB · Uploaded
  6. 91 downloads

    julia-0dc37c6eca-Linux-x86_64-glibc215.tar.gz — Linux binaries built for x86_64, glibc 2.14 or 2.15

    23.4MB · Uploaded
  7. 190 downloads

    julia-0900cdf990-Linux-x86_64-glibc213.tar.gz — Linux binaries built for x86_64, glibc 2.13

    31.4MB · Uploaded
  8. 80 downloads

    julia-212eb34efd-Linux-x86_64-glibc211.tar.gz — Linux binaries built for x86_64, glibc 2.11

    29.8MB · Uploaded

Julia Programming Language: An R programmer looks at Julia

An R programmer looks at Julia (r-bloggers.com)

40 points by TalGalili 64 days ago | comments


chimeracoder 64 days ago | link

I have come to love R (for what I use it for), but reading this makes me realize how unusual my R workflow must be, because most of the 'advantages' of Julia over R don't really come up in my daily workflow anymore - it seems that's likely because I've adapted to the shortcomings of R and have twisted other tools to my needs. I'll add Julia to my list of languages to check out in more detail, because perhaps Julia could replace my need for this rather esoteric workflow that I've developed out of sheer necessity.
I use Python (NumPy/SciPy) for most of the data preprocessing, and perhaps that's why. I used to do this in R, and I realized that it's just a lot easier to get done in Python (and it ends up being faster anyway). The problem is that Python/NumPy/SciPy still doesn't lend itself quite as well as R does to certain aspects of the statistician's use case. It's possible that things have changed since the last time I evaluated the two, but I still find it easier to prototype various models in R, even if I do all of the prerequisite data munging in a different environment.
I understand that R, like Perl, is 'blessed' (pun intended) with two different, incompatible type systems - in fact, this is the reason I avoid using R's type system, and whenever I'm advising newcomers, I always recommend the same. I don't write statistical packages, so this doesn't come up, but when I find myself needing to write a method in R, I ask myself if this would actually be done more easily another way instead. Generally, I find the answer is 'yes, yes it would'.
I really do think the problem is the type system. The kind of type system that lends itself well to data manipulation is not the same type system that lends itself well to model manipulation - when I think about it, I've unconsciously segregated my workflow into two parts, doing everything naturally done with Python's type system in Python, and likewise for R. Maybe that's just the way that I happen to approach data manipulation, but I think it's non-coincidental. R's relative homoiconicity (compared to Python) makes it really nice for some things, but there are other warts with its typing that are just too annoying to work around, when a python shell is just a few keystrokes away.
I guess the answer is (as always!) to use a purely homoiconic Lisp dialect, so you get the best of both worlds but that's asking a lot of statisticians.
I really have come to love R for what it does do, though. Of all the statistical software packages I've seen (comparable: SAS, SPSS, Stata, MATLAB), it's far and away the best (and the GNU license makes it very, very attractive to broke students looking to avoid the still-absurdly-priced student licenses for the alternatives). That said, I still sigh every time I realize that I'm essentially gluing together two separate runtime environments for something that should really be easily integrated. I do what I do now because it ends up being faster than using either Python or R for everything, but it still strikes me as weird that a language so perfect for munging data (Python) can still be so awkward for analyzing it, and vice versa.
-----
necubi 64 days ago | link

I find that I do the same thing, except with Ruby for data processing instead of Python. It may be that I just don't know R all that well, but there are so many tasks that are incredibly awkward in R, often requiring a third-party library like plyr, yet easily expressed in a language with more "normal" semantics.
An example, from this week: I have a bunch of CSV data files from various trials of an experiment. I want to combine them into one data frame with a new column that includes an id for the trial. This took me about a half hour to figure out in R, and five minutes to write in Ruby.
I think the main problem with R is that there's a different way to do everything. It feels like a language that was not so much designed as gradually evolved. In a functional-ish language like Ruby or Python you have a few workhorse data manipulation tools: map, fold, etc. But in R everything is different depending on whether you're dealing with row vectors, column vectors, data frame, or arrays. It makes it hard to generalize over slightly different problems to find common solutions.
Julia looks really awesome, though, and I'm excited to see something that might be able to replace R and bring all of this comfortably into one language.
-----
chimeracoder 64 days ago | link

> I think the main problem with R is that there's a different way to do everything. It feels like a language that was not so much designed as gradually evolved.
I don't know how much you know about the history of R, but you're spot on about that.
-----
StefanKarpinski 64 days ago | link

> I guess the answer is (as always!) to use a purely homoiconic Lisp dialect, so you get the best of both worlds but that's asking a lot of statisticians.
For what it's worth, Julia is homoiconic and "underneath" the Matlab-like syntactic exterior, quite a lot like Scheme.
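For instance, quoting an expression turns Julia code into an ordinary data structure that can be inspected and evaluated (a small illustrative sketch; the exact display of the parts varies by version):

ex = :(a + b * 2)   # quote the expression instead of evaluating it
ex.head             # :call  -- the expression is just a data structure
ex.args             # its pieces: the symbol +, :a, and :(b * 2)
a = 1; b = 3
eval(ex)            # 7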
-----
drunkpotato 64 days ago | link

That is very interesting. I use Python to pre-process data for Matlab, and have been giving serious thought lately to learning R for its free license and easy(?) integration with Hadoop. Can you briefly comment on the advantages of R over Matlab aside from licensing?
-----
chimeracoder 64 days ago | link

If you're already used to Matlab, then you may not find my comments as relevant. If you were already proficient in both, then they're both interchangeable for many tasks (which is in fact why I always recommend learning R over learning Matlab).
Licensing isn't just a minor thing - getting Matlab to run on non-Debian Linux is a painful ordeal. I never actually got it working, because I never bothered to debug its cryptic error messages, and since it's distributed as a precompiled binary, I wasn't going to sit around trying to patch it. A corollary is that R is easier to integrate into other toolkits, and there are a ridiculous number of freely available R libraries that make your life easier.
My issues with Matlab may be things that someone familiar with the language would care less about. That said, I find Matlab to be incredibly, incredibly irritating, and I think that's because its design is tailored towards people with minimal experience with other programming languages (like research scientists), whereas R's design is simply based off of S - so I find it violates the Principle of Least Surprise less. Matlab is not like Lisp or Haskell (where the journey of understanding the language is valuable in itself) - it's really just a means to an end (number crunching), so the POLS is especially important.
R, unlike Matlab, imposes almost no restrictions on the structure of a program. The way I see it, Matlab makes Java's broken one-class-per-file model even worse, by imposing more filesystem-level restrictions on my program.
R, unlike Matlab, uses a type system that's more familiar to someone used to programming with multiple datatypes, as opposed to someone used to thinking in terms of strictly numerical structures. I never got the hang of when I should index with () or {} or [] in Matlab ... I'd have to look it up to tell you. R, on the other hand, is more like Python in this regard - even if it's not quite as clean as Python, it makes basic things like importing/manipulating CSVs much easier than Python (or even Excel, which is designed around that exact purpose).
R, unlike Matlab, returns the last value computed, not the last values with the same local names as the return value names.
R, unlike Matlab, uses a more intuitive (to me) definition of dimensions (and of row- vs. column-vectors). I spent 80% of my time in Matlab figuring out how to get dimensions to match in a robust manner, and I've never had to do that in R.
You get the idea - my frustrations with the language itself are mostly with the fact that it's so unlike most other languages, and it's too much of a hassle to learn. My frustrations with the language environment are that the free alternative (R) is much easier to work with, and much more cross-platform.
-----
drunkpotato 63 days ago | link

Thank you, that's a good list! Dealing with cell arrays and text vs. numerics is why I do my pre-processing in Python. Matlab's job is to read in the data, run it through various algorithms, collect accuracy statistics, and show plots.
For us the issue is not so much Matlab as a programming language, but rather availability of new algorithms and ease of parallel processing. The licensing issues involved in getting the parallel toolbox running on multiple workstations seem like a headache, which is part of what is motivating us to look at R.
-----
TalGalili 64 days ago | link

One thing is probably the huge number of statistical packages it has (see: http://cran.r-project.org/web/views/), including (static) graphics (e.g. ggplot2, lattice, and just the base graphics).
-----
timClicks 64 days ago | link

What do you think of pandas? It gives you an R-esque feel within Python.
-----
chimeracoder 64 days ago | link

I haven't used it - I'll check it out. Though the more I think about it, the more I think that my issues stem from the fact that I need two fundamentally different ways of looking at types in each part of the workflow, so it may be difficult to simulate that within Python - we'll see.
-----
TalGalili 64 days ago | link

Hi chimeracoder, I am very curious to better understand how you find Python better for the data pre-processing stage. I use only R, and would love to know what "I am missing" here. Any simple example will also make it easier to understand. Thanks!
-----
probably 64 days ago | link

Well explained by chimeracoder. Data-table centric operations are much more natural in R, while sequential objects (lists, tuples, and strings) are quickly manipulated in Python (there are more string/regex methods there).
I am a heavy Python user, but when I use NumPy/SciPy I don't feel like I'm using Python much anymore, so at that point I either switch to R (or Fortran)... though I'm quite optimistic that at some point the pandas DataFrame can become my default storage structure from which I can parse out R tasks through RPy, SQLite, HDF5, or possibly Redis.
matplotlib is very verbose though; I almost prefer Matlab's graphics model... though less so than R's basic and lattice graphics.
-----
drucken 64 days ago | link

Have you evaluated any of the other LLVM languages, such as Clay or Rust?
-----
chimeracoder 64 days ago | link

One of my friends seems to like Rust, though I haven't actually used either myself.
-----
keithflower 64 days ago | link

I tend to use Python and NumPy/SciPy for as much as I can, then reach for RPy: http://rpy.sourceforge.net/
-----
chimeracoder 64 days ago | link

Haha, that's really interesting - I can't believe I've never seen that. I'll see if this may solve my problems....
-----
dkarl 64 days ago | link

If you read the original blog post, you won't get a headache from the hundreds of missing spaces in the r-bloggers version: http://dmbates.blogspot.com/2012/04/r-programmer-looks-at-ju...
-----
haberman 64 days ago | link

I've written a few hundred lines of R sporadically over the last several years. The absolute worst thing about it in my opinion is the type system. It does not matter how many times I use R, I cannot for the life of me remember or understand the difference between vectors, arrays, lists, data frames, and matrices. A list is sort of like a mix between an array and a map, a matrix is sorta like a 2d vector but can have row/column names, an array is like a matrix but different, and a data frame is like a heterogeneous matrix. And converting between them is always tricky.
As much as R may be capable of, I just can't get past how inconsistent and complicated its basic types are.
-----
chubot 64 days ago | link

The terminology is weird. I'm not an R expert, but here's how I think of it:
vector: this one is clear based on the name; it's a homogeneous sequence (with very aggressive type conversion). A sequence of strings, a sequence of numerics, etc. One thing worth knowing is that there are no atomic types, so c(1) == 1. That is, the value 1 is identical to the singleton vector containing 1. Also the empty vector c() is identical to NULL! is.null(c()) == TRUE. Weird.
list: the name is confusing, but I think of it basically like a dict in Python. And the syntax is the same: list(a=1, b=2) vs dict(a=1, b=2). I think you can use it like a sequence as you are saying, but I never use them that way. Lists are for ad hoc composite types -- if I want to return 2 values from a function, I return a list() of them. I think you can convert lists to environments easily, or they are the same -- also similar to Python's dicts.
data frame: This is the core type AFAICT, it is basically a collection of named column vectors of the same length. e.g. data.frame(name=c("a", "b", "c"), value=c(1,2,3)). This seems pretty intuitive. A row has different types (like a DB relation) but the columns have the same type since a column is vector.
matrix: I don't use these too much, but it basically seems like a homogeneous type like vector, except you specify the dimensions.
array: I don't use this, but the R documentation says "A 2-dimensional array is the same thing as a matrix". So I think I am confused and what I typed above is an "array", and matrix is the special 2D case. Yes, the names are bad. I think of a matrix as having an arbitrary number of dimensions (e.g. in Matlab).
I think where it gets confusing is that there are all these arbitrary conversions. And you can use things in more than the prescribed ways, so you might stumble across code that uses them wrong. But after a fair amount of R programming, this is my mental model, whether right or wrong :)
I think a lot of the mess comes from the fact that dealing with real data is just messy. R takes the mess and makes the common case convenient, and people like that. But it's like Perl in that it's a "Do what I mean" language and tries to guess a lot, rather than "Do what I say" like Python. And when it's guessing your intent wrong it can leave you very frustrated, as with Perl.
-----
TalGalili 64 days ago | link

Hi chubot, two things:
1) A data.frame is in fact a list of vectors of the same length "compacted" together.
2) I find the types very "sensible" for a person doing statistics. But I guess (almost) everything makes sense once you get used to it...
-----
eddie_the_head 64 days ago | link

I wonder if anyone has written up a comparison of R and J.
-----
TalGalili 63 days ago | link

There is now: http://www.r-bloggers.com/comparing-julia-and-r%E2%80%99s-vo...
-----
eddie_the_head 62 days ago | link

That's comparing R and Julia, not R and J.