How to use Julia in Quarto


Jihong Zhang


March 15, 2024

1 Previous posts

A previous post illustrated how to implement a gradient descent algorithm in Julia. What it did not cover is how to perform a data analysis using Julia in Quarto. This post walks through that workflow step by step.

2 Initial Setup

For background, refer to this, JuliaHub, and Patrick Altmeyer’s post. The first step is to install the following components:

  1. IJulia
  2. Revise.jl
  3. Jupyter Cache

using Pkg
using Conda
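
The two using lines above only load the package manager and the Conda wrapper. A minimal sketch of the install calls that could follow is shown below. The package names passed to Pkg.add and Conda.add are assumptions inferred from the component list, and the sketch assumes Conda.jl is the tool that manages the Python-side jupyter-cache package.

```julia
using Pkg

# Julia-side components from the list above
Pkg.add("IJulia")   # Jupyter kernel for Julia
Pkg.add("Revise")   # picks up code changes without restarting the session
Pkg.add("Conda")    # assumption: Conda.jl manages the Python-side tools

using Conda
# Python-side component; "jupyter-cache" is assumed to be the conda package name
Conda.add("jupyter-cache")
```

These calls download packages, so they only need to be run once per environment rather than in every document.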

Second, when you create a new Quarto document, make sure the YAML header contains the jupyter key. For example, the YAML header of this post is:

title: 'How to use Julia in Quarto'
author: 'Jihong Zhang'
date: 'Mar 10 2024'
categories:
  - Julia
  - Quarto
format:
  html:
    code-summary: 'Code'
    code-fold: false
    code-line-numbers: false
jupyter: julia-1.6
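
The value of the jupyter key has to match the name of an installed kernel. As a rough check, the sketch below derives the name that a default IJulia install would register for a given Julia version. It assumes the kernel was installed under IJulia's default display name "Julia"; kernel_name is a hypothetical helper, not part of IJulia.

```julia
# Hypothetical helper: kernel name a default IJulia install is assumed to
# register, built from the major and minor version numbers.
kernel_name(v::VersionNumber = VERSION) = "julia-$(v.major).$(v.minor)"

# line to compare against the YAML header of the document
println("jupyter: ", kernel_name())
```

If the printed name differs from the one in the header, the header should be updated to the name of a kernel that is actually installed.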

After installation, you should be able to run Julia code in Quarto, for example:

print("Hello World!")
Hello World!

3 Import dataset

# import packages
using DataFrames
using CSV
# load in the diamonds.csv
diamonds = DataFrame(CSV.File("diamonds.csv"))
first(diamonds, 7)
7×10 DataFrame
 Row  carat    cut        color    clarity  depth    table    price  x        y        z
      Float64  String15   String1  String7  Float64  Float64  Int64  Float64  Float64  Float64
   1  0.23     Ideal      E        SI2      61.5     55.0     326    3.95     3.98     2.43
   2  0.21     Premium    E        SI1      59.8     61.0     326    3.89     3.84     2.31
   3  0.23     Good       E        VS1      56.9     65.0     327    4.05     4.07     2.31
   4  0.29     Premium    I        VS2      62.4     58.0     334    4.2      4.23     2.63
   5  0.31     Good       J        SI2      63.3     58.0     335    4.34     4.35     2.75
   6  0.24     Very Good  J        VVS2     62.8     57.0     336    3.94     3.96     2.48
   7  0.24     Very Good  I        VVS1     62.3     57.0     336    3.95     3.98     2.47
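
As an alternative to wrapping CSV.File in DataFrame, CSV.read can build the data frame directly. The sketch below is a self-contained illustration: it parses three rows copied from the output above out of an in-memory string instead of diamonds.csv, and the names csv_text and small are hypothetical.

```julia
using CSV
using DataFrames

# three rows copied from the printed output above, held in memory
csv_text = """
carat,cut,color,clarity,depth,table,price,x,y,z
0.23,Ideal,E,SI2,61.5,55.0,326,3.95,3.98,2.43
0.21,Premium,E,SI1,59.8,61.0,326,3.89,3.84,2.31
0.23,Good,E,VS1,56.9,65.0,327,4.05,4.07,2.31
"""

# CSV.read(source, sink) parses the source and materializes it as the sink type
small = CSV.read(IOBuffer(csv_text), DataFrame)

# quick checks on shape and column names
println(size(small))
println(names(small))
```

With a file on disk, the same call would take the path in place of the IOBuffer.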

4 Basic Statistical Modeling

Following the previous post, we can easily fit a linear regression using the GLM package:

using GLM
lm_fit = lm(@formula(price ~ depth), diamonds)
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}

price ~ 1 + depth

               Coef.  Std. Error      t  Pr(>|t|)  Lower 95%   Upper 95%
(Intercept)  5763.67    740.556    7.78    <1e-14  4312.17    7215.16
depth         -29.65     11.9897  -2.47    0.0134   -53.1499    -6.15005
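
The fitted model can also be queried programmatically instead of being read off the printed table. Below is a small sketch on made-up data; the toy data frame and the exact line y = 2 + 3x are illustrative assumptions, not the diamonds data.

```julia
using DataFrames
using GLM

# toy data lying exactly on the line y = 2 + 3x
toy = DataFrame(x = [1.0, 2.0, 3.0, 4.0, 5.0])
toy.y = 2 .+ 3 .* toy.x

# same call pattern as the diamonds model above
toy_fit = lm(@formula(y ~ x), toy)

println(coef(toy_fit))   # intercept and slope estimates
println(r2(toy_fit))     # coefficient of determination
```

The same accessors can be applied to lm_fit to pull out the intercept and the depth coefficient as plain numbers.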

Let’s try a more advanced measurement model, factor analysis:

using MultivariateStats
# only use the first 300 cases and three variables
Xtr = diamonds[1:300 , [:x, :y, :z]]
# with each observation in a column
Xtr = Matrix(Xtr)' # note: the data matrix has size (d, n), which is the transpose of the data matrix in R
# train a one-factor model
M = fit(FactorAnalysis, Xtr; maxoutdim=1, method=:em)
Factor Analysis(indim = 3, outdim = 1)
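
To inspect the estimates, the fitted object can be queried for its loadings. The sketch below assumes a MultivariateStats version that exports loadings for FactorAnalysis models. It uses simulated one-factor data so that it runs without diamonds.csv; all names and the generating loadings are made up for illustration.

```julia
using MultivariateStats
using Random

Random.seed!(1)

# simulate n observations of 3 indicators driven by one latent factor
n = 300
true_loadings = reshape([0.8, 0.7, 0.5], 3, 1)   # made-up generating values
f = randn(1, n)                                  # latent factor scores, size (1, n)
Xsim = true_loadings * f .+ 0.3 .* randn(3, n)   # data of size (d, n)

# same call pattern as above: one factor, EM estimation
Msim = fit(FactorAnalysis, Xsim; maxoutdim = 1, method = :em)

# d × 1 matrix of estimated loadings (the sign is not identified)
Lhat = loadings(Msim)
println(size(Lhat))
```

Calling loadings(M) on the diamonds model would return the corresponding 3×1 matrix for x, y, and z.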

Refer to this doc for more details on parameter estimation in factor analysis.

3×1 Matrix{Float64}:

Let’s quickly compare these results with lavaan in R:

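library(lavaan)
# assumption: the same diamonds.csv used in the Julia section is read here
diamonds = read.csv("diamonds.csv")
X = diamonds[1:300, c('x', 'y', 'z')]
fa_model = "
F1 =~ x + y + z
"
fit = cfa(fa_model, data = X, std.lv = TRUE)
coef(fit)[1:3] # factor loadings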
    F1=~x     F1=~y     F1=~z 
0.7802245 0.7673664 0.4752576 
