<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://analogensemble.ddns.net/AnalogsEnsemble/feed.xml" rel="self" type="application/atom+xml" /><link href="https://analogensemble.ddns.net/AnalogsEnsemble/" rel="alternate" type="text/html" /><updated>2026-05-22T17:30:25+00:00</updated><id>https://analogensemble.ddns.net/AnalogsEnsemble/feed.xml</id><title type="html">Parallel Analog Ensemble</title><subtitle>An integrated package for parallel ensemble forecasts and more, implemented in R and C++.</subtitle><author><name>Weiming Hu</name></author><entry><title type="html">How to format observations for AnEn</title><link href="https://analogensemble.ddns.net/AnalogsEnsemble/2019/11/18/format-obs.html" rel="alternate" type="text/html" title="How to format observations for AnEn" /><published>2019-11-18T00:00:00+00:00</published><updated>2019-11-18T00:00:00+00:00</updated><id>https://analogensemble.ddns.net/AnalogsEnsemble/2019/11/18/format-obs</id><content type="html" xml:base="https://analogensemble.ddns.net/AnalogsEnsemble/2019/11/18/format-obs.html"><![CDATA[<!-- vim-markdown-toc GitLab -->

<ul>
  <li><a href="#introduction">Introduction</a></li>
  <li><a href="#access">Access</a></li>
</ul>

<!-- vim-markdown-toc -->

<h2 id="introduction">Introduction</h2>

<p>This short tutorial walks you through the steps of converting observations stored in a CSV file into an R list that has the variables required by <code class="language-plaintext highlighter-rouge">RAnEn</code>.</p>

<p>It is recommended to use <a href="https://mybinder.org/v2/gh/Weiming-Hu/AnalogsEnsemble/master?urlpath=rstudio">binder</a>, where the <code class="language-plaintext highlighter-rouge">.Rmd</code> files guide you through the script line by line.</p>

<p>You will learn the following:</p>

<ul>
  <li>Formatting observations for <code class="language-plaintext highlighter-rouge">RAnEn</code></li>
  <li><code class="language-plaintext highlighter-rouge">RAnEn::writeNetCDF</code></li>
  <li><code class="language-plaintext highlighter-rouge">RAnEn::readObservations</code></li>
</ul>

<h2 id="access">Access</h2>

<p>This tutorial can be accessed on binder. Please click <a href="https://mybinder.org/v2/gh/Weiming-Hu/AnalogsEnsemble/master?urlpath=rstudio">here</a> to start an interactive session and go over the tutorial under <code class="language-plaintext highlighter-rouge">RAnalogs/examples</code>. Or you can download the repository and use the <a href="https://github.com/Weiming-Hu/AnalogsEnsemble/blob/master/RAnalogs/examples/demo-5_observation-conversion.Rmd">R markdown file</a> directly.</p>]]></content><author><name>Weiming Hu</name></author><category term="tutorial" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">Running Large Scale Analog Ensemble on Cheyenne</title><link href="https://analogensemble.ddns.net/AnalogsEnsemble/2019/04/15/large-simulation-on-cheyenne.html" rel="alternate" type="text/html" title="Running Large Scale Analog Ensemble on Cheyenne" /><published>2019-04-15T00:00:00+00:00</published><updated>2019-04-15T00:00:00+00:00</updated><id>https://analogensemble.ddns.net/AnalogsEnsemble/2019/04/15/large-simulation-on-cheyenne</id><content type="html" xml:base="https://analogensemble.ddns.net/AnalogsEnsemble/2019/04/15/large-simulation-on-cheyenne.html"><![CDATA[<!-- vim-markdown-toc GitLab -->

<ul>
  <li><a href="#background">Background</a></li>
  <li><a href="#a-brief-introduction-to-the-problem">A Brief Introduction to the Problem</a></li>
  <li><a href="#workflow">Workflow</a></li>
</ul>

<!-- vim-markdown-toc -->

<h2 id="background">Background</h2>

<p>This showcase was originally created for the 2019 Software Engineering Assembly (now Improving Scientific Software) conference. The presentation covered running large-scale Analog Ensemble (AnEn) on the NCAR Cheyenne supercomputer. The hands-on workshop first worked through some basic examples of <code class="language-plaintext highlighter-rouge">RAnEn</code> locally on a desktop and then showcased the workflow of running large-scale AnEn on Cheyenne.</p>

<p>Some helpful links are provided below:</p>

<ul>
  <li><a href="https://sea.ucar.edu/event/uncertainty-quantification-analog-ensemble-scale">The presentation information page</a></li>
  <li><a href="https://sea.ucar.edu/event/parallel-analog-ensemble-forecasts-ensemble-toolkit-hpc">The workshop information page</a></li>
  <li><a href="https://prosecco.geog.psu.edu/docs/SEA2019/">The slides for the presentation and the workshop</a></li>
</ul>

<p>This post summarizes the second part of the workshop, <em>Analog Ensemble at Scale</em>.</p>

<h2 id="a-brief-introduction-to-the-problem">A Brief Introduction to the Problem</h2>

<p>The goal is to generate AnEn for wind speed for the month of July 2018, using one year of search data from 2017. Because the North American Mesoscale (NAM) model is used, we are dealing with about 838 GB of model data, including forecasts and analysis. In total, there are 262,792 grid points in the model domain. This large domain is decomposed row-wise into 50 chunks so that AnEn can be generated for each chunk of the domain in parallel.</p>
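
<p>A row-wise decomposition of this kind can be illustrated with a few lines of bash. This is only a sketch: the function <code class="language-plaintext highlighter-rouge">split_rows</code> is hypothetical, and the layout of 428 rows with 614 points each (614 * 428 = 262,792) is an assumption about the grid, not something taken from the workshop scripts. For each chunk it prints the chunk index, the first row, and the number of rows.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/bin/bash

# Hypothetical helper: split a number of grid rows into
# near-equal contiguous chunks.
split_rows() {
    local rows=$1 chunks=$2
    local base=$((rows / chunks)) extra=$((rows % chunks))
    local start=0 i=0 count

    while [ "$i" -lt "$chunks" ]; do
        # The first $extra chunks take one additional row.
        count=$base
        if [ "$i" -lt "$extra" ]; then
            count=$((base + 1))
        fi

        echo "$i $start $count"
        start=$((start + count))
        i=$((i + 1))
    done
}

# Assumed grid layout: 428 rows split into 50 chunks.
split_rows 428 50
</code></pre></div></div>

<p>With these assumed numbers, the first 28 chunks hold 9 rows each and the remaining 22 chunks hold 8 rows each, so no chunk differs from another by more than one row.</p>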

<p>Please find the scripts used in this post <a href="https://github.com/Weiming-Hu/AnalogsEnsemble/tree/gh-pages/assets/posts/2019-04-15-large-simulation-on-cheyenne">here</a>.</p>

<p>Please refer to the <a href="https://weiming-hu.github.io/AnalogsEnsemble/2019/02/17/build-on-cheyenne.html">help page on building AnEn on Cheyenne</a> if you would like to check out the tools used in this tutorial.</p>

<h2 id="workflow">Workflow</h2>

<p>In this workshop, 3 steps are involved for AnEn generation and visualization after you have built/accessed the tools and collected data:</p>

<ul>
  <li>Step 1: Generate AnEn for each domain chunk. Each domain chunk is associated with a job and a set of configuration files. Each configuration file tells <code class="language-plaintext highlighter-rouge">analogGenerator</code> which part of the data file should be read. A general configuration file is also used to specify the common parameters that are shared across all domain chunks, such as weights and the observation ID.</li>
  <li>Step 2: Reshape AnEn results from all days per chunk to all chunks per day. Verification and visualization are generally more convenient when all grid points are included in the same file. This is achieved by reorganizing the data files from separate chunks into files that each cover the complete domain for one day.</li>
  <li>Step 3: Visualize AnEn results. At this point, each NetCDF file should have a daily forecast for the entire model domain which is easy to visualize. An R script is prepared to generate the figures.</li>
</ul>

<p>The video below shows what the entire workflow looks like. Click <a href="https://youtu.be/GfOmE9zeBLs">here</a> if the video does not show up correctly.</p>

<iframe width="560" height="315" src="https://www.youtube.com/embed/GfOmE9zeBLs" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe>

<p>Please contact <a href="mailto:weiming@psu.edu">Weiming Hu</a> if you would like a copy of the script.</p>

<p>Thanks.</p>]]></content><author><name>Weiming Hu</name></author><category term="showcase" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">How to Automate Data Preprocessing for AnEn Computation on Cheyenne</title><link href="https://analogensemble.ddns.net/AnalogsEnsemble/2019/03/01/using-gribConverter-on-cheyenne.html" rel="alternate" type="text/html" title="How to Automate Data Preprocessing for AnEn Computation on Cheyenne" /><published>2019-03-01T00:00:00+00:00</published><updated>2019-03-01T00:00:00+00:00</updated><id>https://analogensemble.ddns.net/AnalogsEnsemble/2019/03/01/using-gribConverter-on-cheyenne</id><content type="html" xml:base="https://analogensemble.ddns.net/AnalogsEnsemble/2019/03/01/using-gribConverter-on-cheyenne.html"><![CDATA[<!-- vim-markdown-toc GitLab -->

<ul>
  <li><a href="#introduction">Introduction</a></li>
  <li><a href="#data-preparation-and-goals">Data Preparation and Goals</a></li>
  <li><a href="#scripts">Scripts</a></li>
  <li><a href="#results">Results</a></li>
</ul>

<!-- vim-markdown-toc -->

<h2 id="introduction">Introduction</h2>

<p>This tutorial shows how to use the data preprocessing tools (<code class="language-plaintext highlighter-rouge">gribConverter</code>, <code class="language-plaintext highlighter-rouge">windFieldCalculator</code>) in the AnEn package to reformat the data into the form that can be directly used by a number of computation tools (<code class="language-plaintext highlighter-rouge">similarityCalculator</code>, <code class="language-plaintext highlighter-rouge">analogGenerator</code>, <code class="language-plaintext highlighter-rouge">RAnEn</code>) to generate analog ensembles. In addition, this tutorial shows how to automate and parallelize the process on the Cheyenne supercomputer.</p>

<p>This tutorial assumes basic knowledge of bash scripting and that the <code class="language-plaintext highlighter-rouge">AnEn</code> package has already been successfully installed. More information on how to install <code class="language-plaintext highlighter-rouge">AnEn</code> on Cheyenne can be found <a href="https://weiming-hu.github.io/AnalogsEnsemble/2019/02/17/build-on-cheyenne.html">here</a>.</p>

<p>This tutorial also assumes that you have already built the AnEn tools. Instructions for building the tools can be found <a href="https://weiming-hu.github.io/AnalogsEnsemble/2019/02/17/build-on-cheyenne.html">here</a>.</p>

<h2 id="data-preparation-and-goals">Data Preparation and Goals</h2>

<p>A large collection (~5.4 TB) of data from the <a href="https://www.ncdc.noaa.gov/data-access/model-data/model-datasets/north-american-mesoscale-forecast-system-nam">North American Mesoscale Forecast model</a> has been downloaded for the period from October 2008 to July 2018. The original files are <code class="language-plaintext highlighter-rouge">.g2.tar</code> files; an example file name is <code class="language-plaintext highlighter-rouge">nam_218_2008102900.g2.tar</code>. Files have already been arranged into one folder per <code class="language-plaintext highlighter-rouge">YearMonth</code> as follows:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt; ls
200810  200907  201004  201101  201110  201207  201304  201401  201410  201507  201604  201701  201710  201807
200811  200908  201005  201102  201111  201208  201305  201402  201411  201508  201605  201702  201711  
200812  200909  201006  201103  201112  201209  201306  201403  201412  201509  201606  201703  201712  
200901  200910  201007  201104  201201  201210  201307  201404  201501  201510  201607  201704  201801  
200902  200911  201008  201105  201202  201211  201308  201405  201502  201511  201608  201705  201802
200903  200912  201009  201106  201203  201212  201309  201406  201503  201512  201609  201706  201803
200904  201001  201010  201107  201204  201301  201310  201407  201504  201601  201610  201707  201804
200905  201002  201011  201108  201205  201302  201311  201408  201505  201602  201611  201708  201805
200906  201003  201012  201109  201206  201303  201312  201409  201506  201603  201612  201709  201806
</code></pre></div></div>
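
<p>The folder name for a given tarball can be derived from its file name. The snippet below is a minimal sketch of that mapping; the function <code class="language-plaintext highlighter-rouge">year_month</code> is hypothetical, and it assumes every file follows the pattern <code class="language-plaintext highlighter-rouge">nam_218_YYYYMMDDHH.g2.tar</code> seen in the example above.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/bin/bash

# Hypothetical helper: derive the YearMonth folder name from a NAM tarball name.
# Assumes names follow the pattern nam_218_YYYYMMDDHH.g2.tar.
year_month() {
    local stamp=${1#nam_218_}   # drop the prefix, e.g. 2008102900.g2.tar
    stamp=${stamp%%.*}          # drop the extensions, e.g. 2008102900
    echo "${stamp:0:6}"         # keep YYYYMM, e.g. 200810
}

year_month nam_218_2008102900.g2.tar
</code></pre></div></div>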

<p>Original NAM forecast files are organized by day, cycle time, and lead time. Each file is a compilation of parameters at all available locations/grid points. However, AnEn requires the data in a <a href="https://weiming-hu.github.io/AnalogsEnsemble/2019/01/16/NetCDF-File-Types.html#forecasts">different format</a>. This format requires each file to include parameter, grid point, time, and lead time information. Our goal is to convert the model output to this format.</p>

<p>Since the total file size exceeds 5 TB, it is better practice to avoid one huge file and to break the data into chunks instead. Therefore, the files are grouped by month.</p>

<h2 id="scripts">Scripts</h2>

<p>I have prepared two scripts. The first is a PBS job script that does the following tasks:</p>

<ul>
  <li>Identifies which folders are currently being processed by searching for a lock file and which folders have already been processed by searching for the expected output data file;</li>
  <li>Selects <em>only one</em> folder that has not yet been processed;</li>
  <li>Untars tarballs in a temporary folder;</li>
  <li>Converts submessages to independent messages using <code class="language-plaintext highlighter-rouge">grib_copy</code> (<a href="https://confluence.ecmwf.int/pages/viewpage.action?pageId=52462916#GRIBtoolsexamples-grib_copyexamples">reference: grib_copy example 4</a>);</li>
  <li>Converts grb2 files to NetCDF files;</li>
  <li>Computes and adds wind direction and speed fields to the NetCDF file;</li>
  <li>Exits normally.</li>
</ul>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/bin/bash

# The name of the task
#PBS -N process_each_month

# The project account
#PBS -A MY.PROJECT.ACCOUNT

# The time resources requested
#PBS -l walltime=10:00:00

# The queue type
#PBS -q regular

# Combine standard output and errors
#PBS -j oe                           

# The computing resources requested
#PBS -l select=1:ncpus=1:mem=109GB:ompthreads=1

# I would like to receive an email when tasks
# (a)bort, (b)egin, and (e)nd.
#
#PBS -m abe

# And this is the email
#PBS -M my.email@server.com

# These are the available folders. The folder names are also going to be the names of NetCDF files.
declare -a arr=("200810" "200811" "200812" "200901" "200902" "200903" "200904" "200905" "200906" "200907" "200908" "200909" "200910" "200911" "200912" "201001" "201002" "201003" "201004" "201005" "201006" "201007" "201008" "201009" "201010" "201011" "201012" "201101" "201102" "201103" "201104" "201105" "201106" "201107" "201108" "201109" "201110" "201111" "201112" "201201" "201202" "201203" "201204" "201205" "201206" "201207" "201208" "201209" "201210" "201211" "201212" "201301" "201302" "201303" "201304" "201305" "201306" "201307" "201308" "201309" "201310" "201311" "201312" "201401" "201402" "201403" "201404" "201405" "201406" "201407" "201408" "201409" "201410" "201411" "201412" "201501" "201502" "201503" "201504" "201505" "201506" "201507" "201508" "201509" "201510" "201511" "201512" "201601" "201602" "201603" "201604" "201605" "201606" "201607" "201608" "201609" "201610" "201611" "201612" "201701" "201702" "201703" "201704" "201705" "201706" "201707" "201708" "201709" "201710" "201711" "201712" "201801" "201802" "201803" "201804" "201805" "201806" "201807")

# Define the configuration file for gribConverter.
# The file can be found at 
# https://github.com/Weiming-Hu/AnalogsEnsemble/blob/master/apps/app_gribConverter/example/commonConfig.cfg
#
converterConfig=/glade/u/home/wuh20/scratch/data/forecasts/forecasts.cfg

# Define the output destination
destDir=/glade/u/home/wuh20/flash/forecasts_new/

# Define the lock file name
lockFile=.lock

for month in "${arr[@]}"; do
    # This is the data folder
    monthDir=/glade/u/home/wuh20/scratch/data/forecasts/$month
    
    # Whether this folder has already been processed
    if [ -f $destDir/$month\.nc  ]; then
        echo Month $month has been processed. Skip this month.
        continue
    fi
    
    # Check whether this directory exists
    if [ ! -d $monthDir  ]; then
        echo Directory not found: $monthDir
        exit 1
    fi
    
    cd $monthDir
    
    # Lock this directory
    if [ -f $lockFile  ]; then
        echo Directory $monthDir is in process. Skip this directory.
        continue
    else
        echo Lock directory $monthDir
        touch $lockFile
    fi
    
    # Create a folder to store the original files
    if [ ! -d original-extract-files ]; then
        echo Create folder for original extract files ...
        mkdir original-extract-files
    fi
    
    # Unpack tar files
    echo Extracting from tar files ...
    if [ -f log_extract  ]; then
        rm log_extract
    fi

    for tarFile in *.g2.tar; do
        tar --skip-old-files -xvf $tarFile -C original-extract-files &gt;&gt; log_extract
    done
    
    echo flattening messages with submessages ...
    for file in `ls original-extract-files`; do
        if [ ! -f $file ]; then
            /glade/u/home/wuh20/github/AnalogsEnsemble/dependency/install/bin/grib_copy original-extract-files/$file $file
        fi
    done
    
    # Convert grb2 files
    echo Converting grb2 files ...
    if [ ! -f $month-original.nc ]; then
        /glade/u/home/wuh20/github/AnalogsEnsemble/output/bin/gribConverter -c $converterConfig --folder ./ -o $month-original.nc -v 3 &gt; log_converter
    fi

    # Add wind fields
    echo Adding wind fields ...
    if [ ! -f $month\.nc ]; then
        /glade/u/home/wuh20/github/AnalogsEnsemble/output/bin/windFieldCalculator --file-in $month-original.nc --file-type Forecasts --file-out $month\.nc -U 1000IsobaricInhPaU -V 1000IsobaricInhPaV --dir-name 1000IsobaricInhPaDir --speed-name 1000IsobaricInhPaSpeed -v 3 &gt; log_wind
    fi

    # Move the data file elsewhere
    echo Moving data to $destDir
    mv $month\.nc $destDir
    
    # Cleaning
    echo Cleaning ...
    rm -rf original-extract-files
    rm $month-original.nc
    rm *.grb2
    
    echo Releasing the folder lock
    rm $lockFile
   
    # Each job only processes one month
    echo Finished processing month $month
    exit 0
done
</code></pre></div></div>
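
<p>The wind speed and direction fields added near the end of the script are derived from the U and V wind components. The arithmetic can be sketched as follows. The function <code class="language-plaintext highlighter-rouge">wind_from_uv</code> is hypothetical, and the meteorological convention used here (direction in degrees that the wind blows from, measured clockwise from north) is an assumption, because the convention used by <code class="language-plaintext highlighter-rouge">windFieldCalculator</code> is not stated in this post.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/bin/bash

# Hypothetical helper: print wind speed and direction for one U/V pair.
# Assumed convention: direction is where the wind blows from,
# in degrees clockwise from north.
wind_from_uv() {
    awk -v u="$1" -v v="$2" 'BEGIN {
        pi = atan2(0, -1)
        speed = sqrt(u * u + v * v)
        dir = atan2(-u, -v) * 180 / pi   # range (-180, 180]
        dir = (dir + 360) % 360          # wrap into [0, 360)
        printf "%.1f %.1f\n", speed, dir
    }'
}

# A wind blowing toward the west (U = -10, V = 0) comes from the east.
wind_from_uv -10 0
</code></pre></div></div>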

<p>The first PBS script defines essentially all the tasks that need to be done. However, the tasks for each month are entirely independent of each other and can be fully parallelized. Therefore, I decided that each job processes only one folder instead of continuing to the next available folder, to avoid confusion between jobs. The following script simply deals with batch submitting the jobs to the Cheyenne scheduler.</p>

<p>There is another problem: if two jobs start simultaneously, there is a slight possibility that they will pick the same folder, because the folder lock mechanism based on file creation might not work in that case. A simple workaround is to submit a new job only when there are no queued jobs, meaning that all previously submitted jobs have already started.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/bin/bash

# Define the total number of jobs to create.
totalJobs=118

# Define the counter start.
submittedJobs=0

while true; do
    # Count the queued jobs by searching the queue status for the queued (Q) state in the regular queue
    number=`qstat | grep "Q regular" | wc -l`
    
    echo The number of queued jobs: $number
    echo The number of submitted jobs: $submittedJobs
    if (( number == 0  )); then
        echo There are no queued jobs. Submit a new one.
        qsub batch_process.pbs
        submittedJobs=$((submittedJobs + 1))
        if (( submittedJobs == totalJobs  )); then
            echo $submittedJobs jobs submitted. Done!
            exit 0
        fi
    fi
    sleep 10
done
</code></pre></div></div>
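
<p>The submission script avoids the race by making sure jobs do not start at the same time. Another common approach, sketched here as an alternative that is not part of the original scripts, is to take the lock with <code class="language-plaintext highlighter-rouge">mkdir</code>, which either creates the directory or fails in a single step. The function <code class="language-plaintext highlighter-rouge">try_lock</code> and the name <code class="language-plaintext highlighter-rouge">.lock.d</code> are hypothetical, and the sketch assumes that the file system honors this atomicity.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/bin/bash

# Hypothetical helper: atomically claim a folder by creating a lock directory.
# mkdir fails if the directory already exists, so only one caller succeeds.
try_lock() {
    mkdir "$1/.lock.d"
}

# Usage sketch inside the month loop (monthDir as in the PBS script):
#
#   if try_lock "$monthDir"; then
#       ... process the folder, then release with: rmdir "$monthDir/.lock.d"
#   else
#       continue
#   fi
</code></pre></div></div>

<p>Unlike the check with <code class="language-plaintext highlighter-rouge">-f</code> followed by <code class="language-plaintext highlighter-rouge">touch</code>, there is no gap between testing for the lock and creating it, so two jobs cannot both claim the same folder.</p>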

<h2 id="results">Results</h2>

<p>Once the scripts have completed, we have the following files in the destination folder:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt; ls
200810.nc  200910.nc  201010.nc  201110.nc  201210.nc  201310.nc  201410.nc  201510.nc	201610.nc  201710.nc
200811.nc  200911.nc  201011.nc  201111.nc  201211.nc  201311.nc  201411.nc  201511.nc	201611.nc  201711.nc
200812.nc  200912.nc  201012.nc  201112.nc  201212.nc  201312.nc  201412.nc  201512.nc	201612.nc  201712.nc
200901.nc  201001.nc  201101.nc  201201.nc  201301.nc  201401.nc  201501.nc  201601.nc	201701.nc  201801.nc
200902.nc  201002.nc  201102.nc  201202.nc  201302.nc  201402.nc  201502.nc  201602.nc	201702.nc  201802.nc
200903.nc  201003.nc  201103.nc  201203.nc  201303.nc  201403.nc  201503.nc  201603.nc	201703.nc  201803.nc
200904.nc  201004.nc  201104.nc  201204.nc  201304.nc  201404.nc  201504.nc  201604.nc	201704.nc  201804.nc
200905.nc  201005.nc  201105.nc  201205.nc  201305.nc  201405.nc  201505.nc  201605.nc	201705.nc  201805.nc
200906.nc  201006.nc  201106.nc  201206.nc  201306.nc  201406.nc  201506.nc  201606.nc	201706.nc  201806.nc
200907.nc  201007.nc  201107.nc  201207.nc  201307.nc  201407.nc  201507.nc  201607.nc	201707.nc  201807.nc
200908.nc  201008.nc  201108.nc  201208.nc  201308.nc  201408.nc  201508.nc  201608.nc	201708.nc
200909.nc  201009.nc  201109.nc  201209.nc  201309.nc  201409.nc  201509.nc  201609.nc	201709.nc
</code></pre></div></div>

<p>And each file has the correct format for AnEn computation.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt; ncdump -h 201801.nc 
netcdf \201801 {
dimensions:
	num_parameters = 17 ;
	num_chars = 50 ;
	num_stations = 262792 ;
	num_times = 31 ;
	num_flts = 53 ;
variables:
	char ParameterNames(num_parameters, num_chars) ;
	double ParameterWeights(num_parameters) ;
	char ParameterCirculars(num_parameters, num_chars) ;
	char StationNames(num_stations, num_chars) ;
	double Xs(num_stations) ;
	double Ys(num_stations) ;
	double Times(num_times) ;
	double FLTs(num_flts) ;
	double Data(num_flts, num_times, num_stations, num_parameters) ;
}
</code></pre></div></div>]]></content><author><name>Weiming Hu</name></author><category term="tutorial" /><category term="gribConverter" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">Building AnEn on NCAR Cheyenne</title><link href="https://analogensemble.ddns.net/AnalogsEnsemble/2019/02/17/build-on-cheyenne.html" rel="alternate" type="text/html" title="Building AnEn on NCAR Cheyenne" /><published>2019-02-17T00:00:00+00:00</published><updated>2019-02-17T00:00:00+00:00</updated><id>https://analogensemble.ddns.net/AnalogsEnsemble/2019/02/17/build-on-cheyenne</id><content type="html" xml:base="https://analogensemble.ddns.net/AnalogsEnsemble/2019/02/17/build-on-cheyenne.html"><![CDATA[<!-- vim-markdown-toc GitLab -->

<ul>
  <li><a href="#introduction">Introduction</a></li>
  <li><a href="#building-anen">Building AnEn</a></li>
</ul>

<!-- vim-markdown-toc -->

<h2 id="introduction">Introduction</h2>

<p>This short tutorial walks you through the steps of building the AnEn C++ program on <a href="https://www2.cisl.ucar.edu/resources/computational-systems/cheyenne/cheyenne">NCAR Cheyenne Supercomputers</a>.</p>

<h2 id="building-anen">Building AnEn</h2>

<p>Several things to note before we carry on:</p>

<ul>
  <li>Most of the dependencies are already available on Cheyenne, so I’m going to load them directly. <code class="language-plaintext highlighter-rouge">Boost</code>, however, is not available, so I tell <code class="language-plaintext highlighter-rouge">cmake</code> to build it for me.</li>
  <li>I will be installing <code class="language-plaintext highlighter-rouge">PAnEn</code> into a user space folder after a successful build. You can change the location with the argument <code class="language-plaintext highlighter-rouge">CMAKE_INSTALL_PREFIX</code>.</li>
  <li>Notice the argument <code class="language-plaintext highlighter-rouge">CMAKE_INSTALL_RPATH</code>. This is needed because the module libraries are not in the system path. By default, <code class="language-plaintext highlighter-rouge">cmake</code> removes the build-time run path when installing programs, so we need to specify the run-time path that tells the installed executables where to look for libraries.</li>
</ul>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Download the source files
wget https://github.com/Weiming-Hu/AnalogsEnsemble/archive/master.zip

# Unzip the archive
unzip master.zip

# Go to the source folder
cd AnalogsEnsemble-master

# Clean modules
module purge

# Load required modules
module load gnu/9.1.0 netcdf/4.7.3 ncarenv/1.3 cmake/3.16.4 eccodes/2.12.5

# Carry out an out-of-tree build
mkdir build
cd build

# Generate build system
cmake -DCMAKE_INSTALL_PREFIX=../../release -DBUILD_BOOST=ON -DCMAKE_PREFIX_PATH="$NCAR_ROOT_ECCODES;$NETCDF" -DCMAKE_INSTALL_RPATH="$NCAR_ROOT_ECCODES/lib;$NETCDF/lib" ..

# Build
make -j 16

# Test
make test

# Install
make install

# Show help message
cd ../../release/bin
./anen
</code></pre></div></div>

<p>If you log out and log back in, you need to at least load the GNU module for <code class="language-plaintext highlighter-rouge">anen</code> to work.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>module load gnu/9.1.0
</code></pre></div></div>

<p>If you encountered any problems, please open a ticket <a href="https://github.com/Weiming-Hu/AnalogsEnsemble/issues">here</a>.</p>]]></content><author><name>Weiming Hu</name></author><category term="tutorial" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">Operational Search with RAnEn</title><link href="https://analogensemble.ddns.net/AnalogsEnsemble/2019/02/12/operational-search.html" rel="alternate" type="text/html" title="Operational Search with RAnEn" /><published>2019-02-12T00:00:00+00:00</published><updated>2019-02-12T00:00:00+00:00</updated><id>https://analogensemble.ddns.net/AnalogsEnsemble/2019/02/12/operational-search</id><content type="html" xml:base="https://analogensemble.ddns.net/AnalogsEnsemble/2019/02/12/operational-search.html"><![CDATA[<!-- vim-markdown-toc GitLab -->

<ul>
  <li><a href="#introduction">Introduction</a></li>
  <li><a href="#access">Access</a></li>
</ul>

<!-- vim-markdown-toc -->

<h2 id="introduction">Introduction</h2>

<p>The prediction accuracy of the Analog Ensemble depends on the quality of the analogs. Presumably, better analogs will generate better predictions. In operational mode, it is likely that the historical forecasts from the recent past are the most similar to the current forecast. Therefore, in operational mode, each day's forecast is added to the historical repository as that day passes.</p>

<p>This article shows an example of how to use <code class="language-plaintext highlighter-rouge">RAnEn</code> with an operational search. It is strongly suggested that you go over <a href="https://weiming-hu.github.io/AnalogsEnsemble/2018/11/04/demo-1-RAnEn-basics.html">demo 1</a> prior to this tutorial.</p>

<h2 id="access">Access</h2>

<p>This tutorial can be accessed on binder. Please click <a href="https://mybinder.org/v2/gh/Weiming-Hu/AnalogsEnsemble/master?urlpath=rstudio">here</a> to start an interactive session and go over the tutorial under <code class="language-plaintext highlighter-rouge">RAnalogs/examples</code>. Or you can download the repository and use the <a href="https://github.com/Weiming-Hu/AnalogsEnsemble/blob/master/RAnalogs/examples/demo-3_operational-search.Rmd">R markdown file</a>.</p>]]></content><author><name>Weiming Hu</name></author><category term="tutorial" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">NetCDF File Types and Variables for Analog Ensemble Applications</title><link href="https://analogensemble.ddns.net/AnalogsEnsemble/2019/01/16/NetCDF-File-Types.html" rel="alternate" type="text/html" title="NetCDF File Types and Variables for Analog Ensemble Applications" /><published>2019-01-16T00:00:00+00:00</published><updated>2019-01-16T00:00:00+00:00</updated><id>https://analogensemble.ddns.net/AnalogsEnsemble/2019/01/16/NetCDF-File-Types</id><content type="html" xml:base="https://analogensemble.ddns.net/AnalogsEnsemble/2019/01/16/NetCDF-File-Types.html"><![CDATA[<hr />

<p><em>Updates on 2021/12/16</em></p>

<ol>
  <li>I used <code class="language-plaintext highlighter-rouge">R</code> to generate the file format messages below. If you are using <code class="language-plaintext highlighter-rouge">ncdump</code> or <code class="language-plaintext highlighter-rouge">python</code>, you should reverse the dimension order. For example, <code class="language-plaintext highlighter-rouge">Data</code> would be <code class="language-plaintext highlighter-rouge">[num_flts, num_times, num_stations, num_parameters]</code>.</li>
  <li>For character-related variables, like <code class="language-plaintext highlighter-rouge">ParameterNames</code> and <code class="language-plaintext highlighter-rouge">StationNames</code>, there are two storage options. They can be stored as a character matrix as shown below, or they can be stored as a string vector. In that case, the format would be <code class="language-plaintext highlighter-rouge">string StationNames(num_stations)</code>.</li>
</ol>
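
<p>The reversal described in the first note is purely a matter of listing the same dimensions in the opposite order. As a small illustration, the hypothetical helper <code class="language-plaintext highlighter-rouge">reverse_dims</code> below turns a comma-separated dimension list as printed by <code class="language-plaintext highlighter-rouge">R</code> into the order that <code class="language-plaintext highlighter-rouge">ncdump</code> would show. It is only a sketch for reading the listings and is not part of the AnEn tools.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/bin/bash

# Hypothetical helper: reverse a comma-separated list of dimension names.
reverse_dims() {
    local IFS=,
    local -a dims=($1)               # split the argument on commas
    local out="" i=$(( ${#dims[@]} - 1 ))

    while [ "$i" -ge 0 ]; do
        out="$out${out:+,}${dims[$i]}"
        i=$((i - 1))
    done

    echo "$out"
}

# Dimension order of Data as printed by R
reverse_dims num_parameters,num_stations,num_times,num_flts
</code></pre></div></div>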

<hr />

<h2 id="introduction">Introduction</h2>

<!-- vim-markdown-toc GitLab -->

<ul>
  <li><a href="#introduction">Introduction</a></li>
  <li><a href="#file-types">File Types</a>
    <ul>
      <li><a href="#forecasts">Forecasts</a></li>
      <li><a href="#observations">Observations</a></li>
      <li><a href="#analogs">Analogs</a></li>
      <li><a href="#similarity">Similarity</a></li>
      <li><a href="#standarddeviation">StandardDeviation</a></li>
      <li><a href="#matrix">Matrix</a></li>
    </ul>
  </li>
  <li><a href="#references">References</a></li>
</ul>

<!-- vim-markdown-toc -->

<p>Under the <a href="https://github.com/Weiming-Hu/AnalogsEnsemble/tree/master/apps">apps</a> directory, there are several C++ programs that implement different phases of generating analog ensembles, including calculating standard deviations, calculating similarity metrics, and selecting analog forecasts, as well as some other programs for data pre-processing. Currently, all input and output files are in NetCDF format. This article documents the variables and dimensions expected in each file type, for example, Forecasts, Observations, Similarity, and so on.</p>

<h2 id="file-types">File Types</h2>

<p>The defined <a href="https://weiming-hu.github.io/AnalogsEnsemble/CXX/class_an_en_i_o.html#addbfb455f641a394c14907163874d8fe">file types</a> include:</p>

<ul>
  <li>Forecasts</li>
  <li>Observations</li>
  <li>Analogs</li>
  <li>Similarity</li>
  <li>StandardDeviation</li>
  <li>Matrix</li>
</ul>

<p>Each file type is associated with a list of expected dimensions and a list of expected variables. Those variables and dimensions are required to ensure the correctness and performance of the C++ programs. Some variables can also be very helpful during visualization.</p>

<h3 id="forecasts"><a href="https://weiming-hu.github.io/AnalogsEnsemble/CXX/class_forecasts.html">Forecasts</a></h3>

<p>An example <code class="language-plaintext highlighter-rouge">Forecasts</code> file includes the following content:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>9 variables (excluding dimension variables):
   char ParameterNames[num_chars,num_parameters]   (Contiguous storage)  
   double ParameterWeights[num_parameters]   (Contiguous storage)  
   char ParameterCirculars[num_chars,num_parameters]   (Contiguous storage)  
   char StationNames[num_chars,num_stations]   (Contiguous storage)  
   double Xs[num_stations]   (Contiguous storage)  
   double Ys[num_stations]   (Contiguous storage)  
   double Times[num_times]   (Contiguous storage)  
   double FLTs[num_flts]   (Contiguous storage)  
   double Data[num_parameters,num_stations,num_times,num_flts]   (Contiguous storage)  

5 dimensions:
   num_parameters  Size:17
   num_chars  Size:50
   num_stations  Size:262792
   num_times  Size:31
   num_flts  Size:53
</code></pre></div></div>

<ul>
  <li><strong>ParameterNames</strong> are the names of the parameters in the forecasts.</li>
  <li><strong>ParameterWeights</strong> are the corresponding weights for each parameter in the forecasts, used when computing forecast similarity.</li>
  <li><strong>ParameterCirculars</strong> are the names of the circular parameters.</li>
  <li><strong>StationNames</strong> are the names of the forecast stations or grid points.</li>
  <li><strong>Xs</strong> are the x coordinates of the forecast stations or grid points.</li>
  <li><strong>Ys</strong> are the y coordinates of the forecast stations or grid points.</li>
  <li><strong>Times</strong> are the time representation of forecasts. It is the number of seconds since the origin, <a href="https://weiming-hu.github.io/AnalogsEnsemble/CXX/classanen_time_1_1_times.html#a7e08602fb0628df1c5f1cccbb98baeb1">1970-01-01 00:00:00 UTC</a> by default.</li>
  <li><strong>FLTs</strong> are the time representation of forecast lead times. It is the number of seconds since the initialization of the forecast model.</li>
  <li><strong>Data</strong> is a 4-dimensional array that stores the actual forecast values.</li>
</ul>
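
<p>As a minimal sketch of how <strong>Times</strong> and <strong>FLTs</strong> combine, the snippet below converts one pair of values to a calendar date. It assumes that each entry in <strong>Times</strong> marks a model initialization, so that adding a lead time gives the moment the forecast value is valid for. The function name and the sample numbers are made up for illustration and are not part of the package.</p>

```python
from datetime import datetime, timedelta, timezone

# Default origin stated above: 1970-01-01 00:00:00 UTC
ORIGIN = datetime(1970, 1, 1, tzinfo=timezone.utc)


def valid_time(time_seconds, flt_seconds):
    """Return the UTC date-time a forecast value is valid for.

    Assumption: time_seconds is a model initialization time taken from
    Times, and flt_seconds is a lead time taken from FLTs.
    """
    return ORIGIN + timedelta(seconds=time_seconds + flt_seconds)


# Hypothetical values: a forecast initialized at 2019-01-01 00:00 UTC
# with a lead time of 6 hours.
init = 1546300800  # seconds since the origin
flt = 6 * 3600     # seconds since model initialization
print(valid_time(init, flt).isoformat())
```

<p>If a file uses a different origin, only <code class="language-plaintext highlighter-rouge">ORIGIN</code> needs to change.</p>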

<h3 id="observations"><a href="https://weiming-hu.github.io/AnalogsEnsemble/CXX/class_observations.html">Observations</a></h3>

<p>An example <code class="language-plaintext highlighter-rouge">Observations</code> file looks very similar to a <code class="language-plaintext highlighter-rouge">Forecasts</code> file, except that the variable <strong>Data</strong> is a 3-dimensional array without the forecast lead time dimension, and there is no <strong>FLTs</strong> variable.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>8 variables (excluding dimension variables):
   char ParameterNames[num_chars,num_parameters]   (Contiguous storage)  
   double ParameterWeights[num_parameters]   (Contiguous storage)  
   char ParameterCirculars[num_chars,num_parameters]   (Contiguous storage)  
   char StationNames[num_chars,num_stations]   (Contiguous storage)  
   double Xs[num_stations]   (Contiguous storage)  
   double Ys[num_stations]   (Contiguous storage)  
   double Times[num_times]   (Contiguous storage)  
   double Data[num_parameters,num_stations,num_times]   (Contiguous storage)  

4 dimensions:
   num_parameters  Size:15
   num_chars  Size:50
   num_stations  Size:262792
   num_times  Size:496
</code></pre></div></div>

<h3 id="analogs"><a href="https://weiming-hu.github.io/AnalogsEnsemble/CXX/class_analogs.html">Analogs</a></h3>

<p>An example <code class="language-plaintext highlighter-rouge">Analogs</code> file includes the following content:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>10 variables (excluding dimension variables):
    double Analogs[num_stations,num_times,num_flts,num_members,num_cols]   (Contiguous storage)  
    char StationNames[num_chars,num_stations]   (Contiguous storage)  
    double Xs[num_stations]   (Contiguous storage)  
    double Ys[num_stations]   (Contiguous storage)  
    double Times[num_times]   (Contiguous storage)  
    double FLTs[num_flts]   (Contiguous storage)  
    char MemberStationNames[num_chars,member_num_stations]   (Contiguous storage)  
    double MemberXs[member_num_stations]   (Contiguous storage)  
    double MemberYs[member_num_stations]   (Contiguous storage)  
    double MemberTimes[member_num_times]   (Contiguous storage)  

8 dimensions:
    num_stations  Size:10
    num_times  Size:100
    num_flts  Size:10
    num_members  Size:5
    num_cols  Size:3
    num_chars  Size:50
    member_num_stations  Size:10
    member_num_times  Size:1000
</code></pre></div></div>

<ul>
  <li><strong>Analogs</strong> is a 5-dimensional array that stores analog forecasts. More information about analogs can be found <a href="https://weiming-hu.github.io/AnalogsEnsemble/CXX/class_analogs.html">here</a>.</li>
  <li><strong>FLTs</strong> are the forecast lead times of the analog forecasts. Each is the number of seconds since the initialization of the forecast model.</li>
  <li><strong>StationNames</strong> are the names of stations for analog forecasts.</li>
  <li><strong>Xs</strong> are the x coordinates of stations for analog forecasts.</li>
  <li><strong>Ys</strong> are the y coordinates of stations for analog forecasts.</li>
  <li><strong>Times</strong> is the time representation of the analog forecasts. It is the number of seconds since the origin, <a href="https://weiming-hu.github.io/AnalogsEnsemble/CXX/classanen_time_1_1_times.html#a7e08602fb0628df1c5f1cccbb98baeb1">1970-01-01 00:00:00 UTC</a> by default.</li>
  <li><strong>MemberStationNames</strong> are the names of stations for analog members. This can be used together with the search station index in the fifth dimension to get the exact details of search station used.</li>
  <li><strong>MemberXs</strong> are the x coordinates of stations for analog members. This can be used together with the search station index in the fifth dimension to get the exact details of search station used.</li>
  <li><strong>MemberYs</strong> are the y coordinates of stations for analog members. This can be used together with the search station index in the fifth dimension to get the exact details of search station used.</li>
  <li><strong>MemberTimes</strong> is the time representation of the search times. This can be used together with the search time index in the fifth dimension to know what historical time this member belongs to.</li>
</ul>
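
<p>The lookup described for the <strong>Member*</strong> variables can be sketched as follows. This is only an illustration under loud assumptions: the three entries along the fifth dimension are taken to be the analog value, the search station index, and the search time index, in that order, and the indices are taken to be zero-based. Neither is stated in this article, so please check the class documentation before relying on it. The function name and sample data are hypothetical.</p>

```python
def describe_member(member, member_station_names, member_times):
    """Resolve the indices stored with one analog member.

    Assumptions (not confirmed by the article): member holds
    (value, station index, time index) and indices are zero-based.
    Indices are stored as doubles in the file, hence the int() casts.
    """
    value, station_index, time_index = member
    return {
        "value": value,
        "station": member_station_names[int(station_index)],
        "time": member_times[int(time_index)],
    }


# Hypothetical content read from an Analogs file
member_station_names = ["station_A", "station_B"]
member_times = [1546300800.0, 1546387200.0, 1546473600.0]
one_member = (280.5, 1.0, 2.0)  # Analogs[station, time, flt, member, :]

print(describe_member(one_member, member_station_names, member_times))
```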

<h3 id="similarity"><a href="https://weiming-hu.github.io/AnalogsEnsemble/CXX/class_similarity_matrices.html">Similarity</a></h3>

<p>An example <code class="language-plaintext highlighter-rouge">Similarity</code> file includes the following content:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>13 variables (excluding dimension variables):
    double SimilarityMatrices[num_cols,num_entries,num_flts,num_times,num_stations]   (Contiguous storage)  
    char ParameterNames[num_chars,num_parameters]   (Contiguous storage)  
    double ParameterWeights[num_parameters]   (Contiguous storage)  
    char ParameterCirculars[num_chars,num_parameters]   (Contiguous storage)  
    char StationNames[num_chars,num_stations]   (Contiguous storage)  
    double Xs[num_stations]   (Contiguous storage)  
    double Ys[num_stations]   (Contiguous storage)  
    double Times[num_times]   (Contiguous storage)  
    double FLTs[num_flts]   (Contiguous storage)  
    char SearchStationNames[num_chars,search_num_stations]   (Contiguous storage)  
    double SearchXs[search_num_stations]   (Contiguous storage)  
    double SearchYs[search_num_stations]   (Contiguous storage)  
    double SearchTimes[search_num_times]   (Contiguous storage)  

9 dimensions:
    num_stations  Size:10
    num_times  Size:100
    num_flts  Size:10
    num_entries  Size:100
    num_cols  Size:3
    num_parameters  Size:10
    num_chars  Size:50
    search_num_stations  Size:10
    search_num_times  Size:100
</code></pre></div></div>

<ul>
  <li><strong>SimilarityMatrices</strong> is a 5-dimensional array that stores similarity metric values.</li>
  <li><strong>ParameterNames</strong> are names of parameters used to calculate the similarity.</li>
  <li><strong>ParameterWeights</strong> are weights of parameters used to calculate the similarity.</li>
  <li><strong>ParameterCirculars</strong> are names of circular parameters.</li>
  <li><strong>StationNames</strong> are names of stations or grid points for which similarity is generated.</li>
  <li><strong>Xs</strong> are x coordinates of stations or grid points for which similarity is generated.</li>
  <li><strong>Ys</strong> are y coordinates of stations or grid points for which similarity is generated.</li>
  <li><strong>Times</strong> is the time representation of the similarity. It is the number of seconds since the origin, <a href="https://weiming-hu.github.io/AnalogsEnsemble/CXX/classanen_time_1_1_times.html#a7e08602fb0628df1c5f1cccbb98baeb1">1970-01-01 00:00:00 UTC</a> by default.</li>
  <li><strong>FLTs</strong> are the forecast lead times of the similarity. Each is the number of seconds since the initialization of the forecast model.</li>
  <li><strong>SearchTimes</strong> are times for the complete search period. This can be used together with the search time index in the fifth dimension to know what historical forecast this similarity is generated from.</li>
  <li><strong>SearchStationNames</strong> are station names for the complete search data. This can be used together with the search station index in the fifth dimension to know what station/grid point is used to generate similarity.</li>
  <li><strong>SearchXs</strong> are x coordinates for the complete search stations. This can be used together with the search station index in the fifth dimension to know what station/grid point is used to generate similarity.</li>
  <li><strong>SearchYs</strong> are y coordinates for the complete search stations. This can be used together with the search station index in the fifth dimension to know what station/grid point is used to generate similarity.</li>
</ul>

<h3 id="standarddeviation"><a href="https://weiming-hu.github.io/AnalogsEnsemble/CXX/class_standard_deviation.html">StandardDeviation</a></h3>

<p>An example <code class="language-plaintext highlighter-rouge">StandardDeviation</code> file includes the following content:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>8 variables (excluding dimension variables):
    double StandardDeviation[num_parameters,num_stations,num_flts]   (Contiguous storage)  
    char ParameterNames[num_chars,num_parameters]   (Contiguous storage)  
    double ParameterWeights[num_parameters]   (Contiguous storage)  
    char ParameterCirculars[num_chars,num_parameters]   (Contiguous storage)  
    char StationNames[num_chars,num_stations]   (Contiguous storage)  
    double Xs[num_stations]   (Contiguous storage)  
    double Ys[num_stations]   (Contiguous storage)  
    double FLTs[num_flts]   (Contiguous storage)  

4 dimensions:
    num_parameters  Size:10
    num_stations  Size:10
    num_flts  Size:10
    num_chars  Size:50
</code></pre></div></div>

<ul>
  <li><strong>StandardDeviation</strong> is a 3-dimensional array that stores standard deviation values.</li>
  <li><strong>ParameterNames</strong> are the names of parameters.</li>
  <li><strong>ParameterWeights</strong> are the weights of parameters.</li>
  <li><strong>ParameterCirculars</strong> are the names of circular parameters.</li>
  <li><strong>StationNames</strong> are the names of stations or grid points.</li>
  <li><strong>Xs</strong> are the x coordinates of stations or grid points.</li>
  <li><strong>Ys</strong> are the y coordinates of stations or grid points.</li>
  <li><strong>FLTs</strong> are the forecast lead times.</li>
</ul>

<h3 id="matrix">Matrix</h3>

<p>File type <code class="language-plaintext highlighter-rouge">Matrix</code> is designed for the time mapping matrix, which links forecast times and forecast lead times to observation times. It is usually stored as a text file.</p>
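
<p>To make the idea of a time mapping concrete, here is a rough sketch. It assumes that an observation matches a forecast when its time equals the forecast time plus the lead time, and it stores the zero-based position of that observation, or <code class="language-plaintext highlighter-rouge">None</code> when there is no match. The actual layout of a <code class="language-plaintext highlighter-rouge">Matrix</code> file is not described here, so the function name and the output shape are purely illustrative.</p>

```python
def build_time_mapping(forecast_times, flts, observation_times):
    """Map every (forecast time, lead time) pair to an observation index.

    Returns one row per forecast time and one column per lead time.
    A cell is None when no observation exists at time + lead time.
    """
    obs_index = {t: i for i, t in enumerate(observation_times)}
    return [[obs_index.get(t + f) for f in flts] for t in forecast_times]


# Hypothetical times, all in seconds since the origin
forecast_times = [0, 86400]
flts = [0, 3600]
observation_times = [0, 3600, 86400]

for row in build_time_mapping(forecast_times, flts, observation_times):
    print(row)
```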

<h2 id="references">References</h2>

<p>All the example output is generated using R package <a href="https://cran.r-project.org/web/packages/ncdf4/index.html">ncdf4</a>.</p>]]></content><author><name>Weiming Hu</name></author><category term="document" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">Profile AnEn</title><link href="https://analogensemble.ddns.net/AnalogsEnsemble/2019/01/08/Profile-AnEn.html" rel="alternate" type="text/html" title="Profile AnEn" /><published>2019-01-08T00:00:00+00:00</published><updated>2019-01-08T00:00:00+00:00</updated><id>https://analogensemble.ddns.net/AnalogsEnsemble/2019/01/08/Profile-AnEn</id><content type="html" xml:base="https://analogensemble.ddns.net/AnalogsEnsemble/2019/01/08/Profile-AnEn.html"><![CDATA[<!-- vim-markdown-toc GFM -->

<ul>
  <li><a href="#introduction">Introduction</a></li>
  <li><a href="#result-preview">Result Preview</a></li>
  <li><a href="#preparation-and-clarification">Preparation and Clarification</a></li>
  <li><a href="#profiling-with-tau">Profiling with TAU</a>
    <ul>
      <li><a href="#build-with-tau">Build with <code class="language-plaintext highlighter-rouge">TAU</code></a></li>
      <li><a href="#profiling">Profiling</a></li>
      <li><a href="#visualization">Visualization</a></li>
    </ul>
  </li>
  <li><a href="#profiling-with-gprof">Profiling with <code class="language-plaintext highlighter-rouge">gprof</code></a>
    <ul>
      <li><a href="#build-with-gprof">Build with <code class="language-plaintext highlighter-rouge">gprof</code></a></li>
      <li><a href="#profiling-1">Profiling</a></li>
      <li><a href="#visualization-1">Visualization</a></li>
    </ul>
  </li>
  <li><a href="#profiling-with-valgrind">Profiling with <code class="language-plaintext highlighter-rouge">valgrind</code></a>
    <ul>
      <li><a href="#build">Build</a></li>
      <li><a href="#profiling-2">Profiling</a></li>
      <li><a href="#visualization-2">Visualization</a></li>
    </ul>
  </li>
  <li><a href="#sequel-on-tau-installation">Sequel on TAU Installation</a></li>
</ul>

<!-- vim-markdown-toc -->

<h2 id="introduction">Introduction</h2>

<p>This article documents how to profile the programs that implement the weather forecast technique <a href="https://weiming-hu.github.io/AnalogsEnsemble/"><code class="language-plaintext highlighter-rouge">Analog Ensemble</code></a>.</p>

<h2 id="result-preview">Result Preview</h2>

<p>These figures are generated using the TAU profiler and its visualization tool <code class="language-plaintext highlighter-rouge">paraprof</code>.</p>

<p><img src="https://github.com/Weiming-Hu/AnalogsEnsemble/raw/gh-pages/assets/posts/2019-01-08-Profile-AnEn/tau-breakdown-by-thread.png" alt="time-breakdown" />
<img src="https://github.com/Weiming-Hu/AnalogsEnsemble/raw/gh-pages/assets/posts/2019-01-08-Profile-AnEn/tau-3D.png" alt="time-3D" />
<img src="https://github.com/Weiming-Hu/AnalogsEnsemble/raw/gh-pages/assets/posts/2019-01-08-Profile-AnEn/tau-threads.png" alt="time-threads" /></p>

<p>The following figure is generated from <code class="language-plaintext highlighter-rouge">gprof</code>.</p>

<p><img src="https://github.com/Weiming-Hu/AnalogsEnsemble/raw/gh-pages/assets/posts/2019-01-08-Profile-AnEn/gprof.png" alt="time-dot-graph" /></p>

<h2 id="preparation-and-clarification">Preparation and Clarification</h2>

<p>Please note a couple of placeholders in this tutorial. It is recommended to use the absolute full path to replace them.</p>

<ul>
  <li>[Allocation Name] is the project name you are attached to. It shows up every time you log onto ICS.</li>
  <li>[Analog Ensemble Source Dir] is the root directory of Analog Ensemble programs. You can download it from <a href="https://github.com/Weiming-Hu/AnalogsEnsemble">GitHub</a>.</li>
  <li>[TAU Source Dir] is the folder that all TAU source files are extracted to. You can download TAU <a href="https://www.cs.uoregon.edu/research/tau/downloads.php">here</a>.</li>
  <li>[Profile Data Dir] is the folder with profile data and a configuration file. Please generate the profile data using the R script <a href="https://github.com/Weiming-Hu/AnalogsEnsemble/raw/gh-pages/assets/posts/2019-01-08-Profile-AnEn/generateAnEnInput.R">generateAnEnInput.R</a> by running <code class="language-plaintext highlighter-rouge">Rscript generateAnEnInput.R</code> in a console. The R package <code class="language-plaintext highlighter-rouge">ncdf4</code> is required. The configuration file is <a href="https://github.com/Weiming-Hu/AnalogsEnsemble/raw/gh-pages/assets/posts/2019-01-08-Profile-AnEn/config.cfg">config.cfg</a>.</li>
</ul>

<h2 id="profiling-with-tau">Profiling with TAU</h2>

<h3 id="build-with-tau">Build with <code class="language-plaintext highlighter-rouge">TAU</code></h3>

<p>As with <code class="language-plaintext highlighter-rouge">gprof</code>, we need a dedicated build, this time using the <code class="language-plaintext highlighter-rouge">tau</code> compilers. Please install <code class="language-plaintext highlighter-rouge">tau</code> first. Here, I assume that <code class="language-plaintext highlighter-rouge">tau</code> is already available. If you are wondering how to install <code class="language-plaintext highlighter-rouge">TAU</code>, please jump to the last section.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Build AnEn programs
cd [Analog Ensemble Source Dir]
mkdir build &amp;&amp; cd build

# Generate the make system. We are installing to a specific location to avoid any program clashing
CC=taucc CXX=taucxx cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=../release_tau ..

# Sometimes, TAU might not be able to find some packages. So you might need to add -DCMAKE_PREFIX_PATH to guide tau compilers
CC=taucc CXX=taucxx cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=../release_tau -DCMAKE_PREFIX_PATH=/usr/lib/x86_64-linux-gnu .. 

# Build
make -j 2

# Test
make test

# Install
make install
</code></pre></div></div>

<h3 id="profiling">Profiling</h3>

<p>To collect profile data, run the program normally. Use exactly the same command as in the <code class="language-plaintext highlighter-rouge">gprof</code> section, changing only the path to the executable.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cd [Profile Data Dir]
OMP_NUM_THREADS=1 [Analog Ensemble Source Dir]/release_tau/bin/anen_grib -c config.cfg
</code></pre></div></div>

<h3 id="visualization">Visualization</h3>

<p>Profile files have names like <code class="language-plaintext highlighter-rouge">profile.0.0.*</code>. We can use the following tools to visualize the results.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># For text visualization
pprof

# For graphic visualization
paraprof
</code></pre></div></div>

<h2 id="profiling-with-gprof">Profiling with <code class="language-plaintext highlighter-rouge">gprof</code></h2>

<p><em>Please note that <code class="language-plaintext highlighter-rouge">gprof</code> might have the highest sampling error among the three solutions here.</em></p>

<h3 id="build-with-gprof">Build with <code class="language-plaintext highlighter-rouge">gprof</code></h3>

<p>To profile the program with <code class="language-plaintext highlighter-rouge">gprof</code>, we only need to build the program with the extra flag <code class="language-plaintext highlighter-rouge">-pg</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Go to our root directory and carry out an out-of-tree build
cd [Analog Ensemble Source Dir]
mkdir build &amp;&amp; cd build

# Generate the make system. We are installing to a specific location to avoid any program clashing
cmake -DCMAKE_CXX_FLAGS='-pg' -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=../release_gprof ..

# Build
make -j 2

# Test
make test

# Install
make install
</code></pre></div></div>

<p>Let’s check that the program was built successfully.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Change the directory to the installation folder
cd ../release_gprof/bin
./anen_grib

# The following file should be automatically generated.
file gmon.out
</code></pre></div></div>

<h3 id="profiling-1">Profiling</h3>

<p>Run the program normally.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cd [Profile Data Dir]
OMP_NUM_THREADS=1 [Analog Ensemble Source Dir]/release_gprof/bin/anen_grib -c config.cfg
</code></pre></div></div>

<p>This should generate a <code class="language-plaintext highlighter-rouge">gmon.out</code> file.</p>

<h3 id="visualization-1">Visualization</h3>

<p>To visualize the <code class="language-plaintext highlighter-rouge">gprof</code> output, we can convert the text file to a dot graph and then an image. I’m using the <a href="https://github.com/jrfonseca/gprof2dot">gprof2dot</a> program which is written in python.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Install the graphviz if you do not have it
sudo apt install graphviz

virtualenv env -p python3
source env/bin/activate
pip install gprof2dot

# -w for wrapping function names
# -s for stripping detailed function information to reduce text
gprof [Analog Ensemble Source Dir]/release_gprof/bin/anen_grib gmon.out | gprof2dot -w -s | dot -Tpng -Gdpi=500 -o profile-gprof.png
</code></pre></div></div>

<h2 id="profiling-with-valgrind">Profiling with <code class="language-plaintext highlighter-rouge">valgrind</code></h2>

<p><code class="language-plaintext highlighter-rouge">valgrind</code> is very accurate because it runs your program in a virtual environment. <em>But it does introduce a lot of overhead (10x to 80x slower)</em>.</p>

<p>Check if you have already installed the profiler tools. To install them, you can use <code class="language-plaintext highlighter-rouge">sudo apt install kcachegrind valgrind</code>.</p>

<h3 id="build">Build</h3>

<p>No extra configurations are needed. Just build the program as you normally would.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Go to our root directory and carry out an out-of-tree build
cd [Analog Ensemble Source Dir]
mkdir build &amp;&amp; cd build

# Generate the make system. We are installing to a specific location to avoid any program clashing
cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=../release_valgrind ..

# Build
make -j 2

# Test
make test

# Install
make install
</code></pre></div></div>

<h3 id="profiling-2">Profiling</h3>

<p>Run the executable with <code class="language-plaintext highlighter-rouge">valgrind</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cd [Profile Data Dir]
export OMP_NUM_THREADS=1
time valgrind --tool=callgrind [Analog Ensemble Source Dir]/release_valgrind/bin/anen_grib -c config.cfg
</code></pre></div></div>

<h3 id="visualization-2">Visualization</h3>

<p>Some profile data files with names like <code class="language-plaintext highlighter-rouge">callgrind.out.*</code> should have been generated. Use <code class="language-plaintext highlighter-rouge">kcachegrind</code> to visualize them. If there are several of them, which usually means the command was run multiple times, choose the latest one.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kcachegrind [callgrind.out.* profile data file]
</code></pre></div></div>
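
<p>If many runs have piled up, picking the latest file by hand gets tedious. The helper below is a small sketch that returns the most recently modified <code class="language-plaintext highlighter-rouge">callgrind.out.*</code> file in a directory. The function name is made up for this example, and "latest" is assumed to mean the newest modification time.</p>

```python
import glob
import os


def latest_callgrind_file(directory="."):
    """Return the most recently modified callgrind.out.* file, or None."""
    candidates = glob.glob(os.path.join(directory, "callgrind.out.*"))
    return max(candidates, key=os.path.getmtime, default=None)


# Prints None when the current directory has no callgrind output
print(latest_callgrind_file())
```

<p>The returned path can then be passed to <code class="language-plaintext highlighter-rouge">kcachegrind</code>.</p>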

<h2 id="sequel-on-tau-installation">Sequel on TAU Installation</h2>

<p>I found <a href="https://www.cs.uoregon.edu/research/tau/home.php">TAU</a> profiler to be very powerful and convenient to use. It is a piece of software from the University of Oregon. The <a href="http://www.paratools.com/tau">video</a> walks you through the installation and I followed it. There might be typos so be careful when reading and watching.</p>

<p>For <code class="language-plaintext highlighter-rouge">TAU_OPTIONS</code>, you can find the references <a href="https://www.alcf.anl.gov/user-guides/tuning-and-analysis-utilities-tau">here</a>. At this point, I have successfully built <code class="language-plaintext highlighter-rouge">TAU</code> with the visualizer <code class="language-plaintext highlighter-rouge">paraprof</code>.</p>]]></content><author><name>Weiming Hu</name></author><category term="tutorial" /><category term="profiling" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">The Analog Ensemble Technique Explained</title><link href="https://analogensemble.ddns.net/AnalogsEnsemble/2018/12/14/AnEn-explained.html" rel="alternate" type="text/html" title="The Analog Ensemble Technique Explained" /><published>2018-12-14T00:00:00+00:00</published><updated>2018-12-14T00:00:00+00:00</updated><id>https://analogensemble.ddns.net/AnalogsEnsemble/2018/12/14/AnEn-explained</id><content type="html" xml:base="https://analogensemble.ddns.net/AnalogsEnsemble/2018/12/14/AnEn-explained.html"><![CDATA[<h2 id="schematic-diagram">Schematic Diagram</h2>

<p>The following schematic diagram shows the four steps to generate a four-member ensemble forecast.</p>

<p><img src="https://github.com/Weiming-Hu/AnalogsEnsemble/raw/gh-pages/assets/posts/2018-12-14-AnEn-explained/AnEn-schema.png" alt="AnEn-scheme" /></p>

<ul>
  <li>Step 1: The process starts with a current deterministic multivariate prediction and a set of historical predictions from a deterministic weather model. The multivariate prediction includes surface temperature, humidity, wind speed, and so on. The observations corresponding to each historical forecast are also collected.</li>
  <li>Step 2: A number of historical predictions are identified based on their similarity to the current multivariate prediction. This similarity is also time-dependent, meaning that, instead of point-to-point comparison, it also compares the trend of each weather variable within a short time range.</li>
  <li>Step 3: The corresponding observations associated with the identified historical predictions are selected.</li>
  <li>Step 4: These <strong>observations</strong> become ensemble members in the final forecast.</li>
</ul>
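
<p>The four steps can be sketched in a few lines of code. This is a simplified illustration, not the package's implementation: the similarity is written as a weighted Euclidean distance normalized by a standard deviation and computed over a short window of lead times, in the spirit of the metric described in the references below. All function names, argument names, and sample numbers are assumptions made for this example.</p>

```python
import math


def similarity(current, past, t, weights, sds, half_window=1):
    """Distance between two multivariate forecasts around lead time t.

    current and past map a parameter name to a list of values, one per
    lead time. Smaller results mean more similar forecasts.
    """
    total = 0.0
    for name, weight in weights.items():
        first = max(0, t - half_window)
        last = min(len(current[name]) - 1, t + half_window)
        squared = sum((current[name][j] - past[name][j]) ** 2
                      for j in range(first, last + 1))
        total += weight / sds[name] * math.sqrt(squared)
    return total


def analog_ensemble(current, history, observations, t, weights, sds, members):
    """Steps 2 to 4: rank past forecasts, then return their observations."""
    ranked = sorted(
        range(len(history)),
        key=lambda day: similarity(current, history[day], t, weights, sds),
    )
    return [observations[day][t] for day in ranked[:members]]


# Step 1: a current forecast, past forecasts, and matching observations
current = {"T": [10.0, 12.0, 14.0]}
history = [
    {"T": [10.0, 12.1, 14.0]},
    {"T": [5.0, 6.0, 7.0]},
    {"T": [9.5, 12.0, 14.5]},
    {"T": [20.0, 22.0, 24.0]},
]
observations = [
    [10.2, 12.5, 14.1],
    [5.1, 6.2, 7.3],
    [9.9, 11.8, 14.2],
    [19.0, 21.0, 23.0],
]

ensemble = analog_ensemble(current, history, observations, t=1,
                           weights={"T": 1.0}, sds={"T": 2.0}, members=2)
print(ensemble)
```

<p>Note that the returned members are observations, not past forecasts, which is the key point of Step 4.</p>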

<h2 id="simplified-example-for-temperature-forecasts">Simplified Example for Temperature Forecasts</h2>

<p>Please navigate through the following slides to see the example.</p>

<iframe src="https://onedrive.live.com/embed?resid=BCFC2A6DB1591BCA%212248&amp;authkey=%21AHAAoXCKyl1NiTs&amp;em=2&amp;wdAr=1.7777777777777777" width="610px" height="367px" frameborder="0">This is an embedded <a target="_blank" href="https://office.com">Microsoft Office</a> presentation, powered by <a target="_blank" href="https://office.com/webapps">Office Online</a>.</iframe>

<p><em>Animation credited to <a href="http://geoinf.psu.edu/people.shtml">Laura Clemente-Harding</a> and <a href="http://geoinf.psu.edu/people.shtml">Guido Cervone</a></em></p>

<ul>
  <li>Step 1: A deterministic model has been running for a week and a new prediction is generated from the model. Red dots are temperature observations, and black dots are model predictions.</li>
  <li>Step 2: By comparing current and historical model predictions (black dots), the most similar past forecasts are identified.</li>
  <li>Step 3: The observations associated with the identified past predictions are selected.</li>
  <li>Step 4: These <strong>observations</strong> become ensemble members in the final forecast.</li>
</ul>

<p>Of course, in reality, the similarity metric is a time-dependent and multivariate metric.</p>

<h2 id="references">References</h2>

<ul>
  <li><a href="https://weiming-hu.github.io/AnalogsEnsemble/">Analog Ensemble Package</a></li>
  <li><a href="https://ral.ucar.edu/sites/default/files/public/images/events/WISE_documentation_20170725_Final.pdf">A Beginners Introduction to the Analog Ensemble Technique</a></li>
  <li><a href="https://www.schweizerbart.de/papers/metz/detail/24/84737/Predictor_weighting_strategies_for_probabilistic_wind_power_forecasting_with_an_analog_ensemble">Predictor-weighting strategies for probabilistic wind power forecasting with an analog ensemble</a></li>
  <li><a href="https://www.sciencedirect.com/science/article/pii/S0960148117301386">Short-term photovoltaic power forecasting using Artificial Neural Networks and an Analog Ensemble</a></li>
  <li><a href="https://journals.ametsoc.org/doi/10.1175/MWR-D-12-00281.1">Probabilistic Weather Prediction with an Analog Ensemble</a></li>
</ul>]]></content><author><name>Weiming Hu</name></author><category term="document" /><summary type="html"><![CDATA[Schematic Diagram]]></summary></entry><entry><title type="html">Search Space Extension with RAnEn</title><link href="https://analogensemble.ddns.net/AnalogsEnsemble/2018/11/24/search-extension.html" rel="alternate" type="text/html" title="Search Space Extension with RAnEn" /><published>2018-11-24T00:00:00+00:00</published><updated>2018-11-24T00:00:00+00:00</updated><id>https://analogensemble.ddns.net/AnalogsEnsemble/2018/11/24/search-extension</id><content type="html" xml:base="https://analogensemble.ddns.net/AnalogsEnsemble/2018/11/24/search-extension.html"><![CDATA[<!-- vim-markdown-toc GitLab -->

<ul>
  <li><a href="#introduction">Introduction</a></li>
  <li><a href="#access">Access</a></li>
</ul>

<!-- vim-markdown-toc -->

<h2 id="introduction">Introduction</h2>

<p>This article demonstrates how to use the search space functionality within the <code class="language-plaintext highlighter-rouge">RAnEn</code> package. If you haven’t done so, please read <a href="https://weiming-hu.github.io/AnalogsEnsemble/2018/11/04/demo-1-RAnEn-basics.html">the instructions for basic usage of <code class="language-plaintext highlighter-rouge">RAnEn</code></a> first. This article skips the part that has been covered in the previous article.</p>

<p>The classic <code class="language-plaintext highlighter-rouge">AnEn</code> technique searches for the most similar historical forecasts at its current location. Therefore, only forecasts from the current station/grid point will be traversed and compared. This search style is referred to as the <em>Independent Search (IS)</em>. Another possible search style is extended search, which is referred to as <em>Search Space Extension (SSE)</em>. It simply indicates that forecasts at nearby stations/grid points are included in the search process. As a result, the search space is significantly larger when using the search space extension.</p>

<p>There are currently two ways to define which nearby locations are included in the search. Users can set the number of nearest neighbors to be included and/or a distance threshold. The two constraints can be used together.</p>
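
<p>To illustrate how the two constraints interact, here is a small sketch that is independent of <code class="language-plaintext highlighter-rouge">RAnEn</code>. It assumes plain Euclidean distance in the units of the station coordinates, and the argument names <code class="language-plaintext highlighter-rouge">num_nearest</code> and <code class="language-plaintext highlighter-rouge">max_distance</code> are hypothetical, not the package's configuration names.</p>

```python
import math


def select_search_stations(target, stations, num_nearest=None, max_distance=None):
    """Return indices of the stations to search, nearest first.

    target is an (x, y) pair and stations is a list of (x, y) pairs.
    Either constraint may be omitted; when both are given, a station
    must satisfy both to be kept.
    """
    distances = sorted(
        (math.hypot(x - target[0], y - target[1]), index)
        for index, (x, y) in enumerate(stations)
    )
    if max_distance is not None:
        distances = [(d, i) for d, i in distances if max_distance >= d]
    if num_nearest is not None:
        distances = distances[:num_nearest]
    return [i for _, i in distances]


# Hypothetical station coordinates
stations = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]

print(select_search_stations((0.0, 0.0), stations, num_nearest=3))
print(select_search_stations((0.0, 0.0), stations, num_nearest=3, max_distance=1.5))
```

<p>With only the neighbor count, three stations are kept. Adding the distance threshold removes the third one because it lies too far away.</p>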

<p>You will learn how to use these functions:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">generateAnalogs</code></li>
</ul>

<h2 id="access">Access</h2>

<p>This tutorial can be accessed on binder. Please click <a href="https://mybinder.org/v2/gh/Weiming-Hu/AnalogsEnsemble/master?urlpath=rstudio">here</a> to start an interactive session and go over the tutorial under <code class="language-plaintext highlighter-rouge">RAnalogs/examples</code>. Or you can download the repository and use the <a href="https://github.com/Weiming-Hu/AnalogsEnsemble/blob/master/RAnalogs/examples/demo-2_search-extension.Rmd">R markdown file</a>.</p>]]></content><author><name>Weiming Hu</name></author><category term="tutorial" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">Basics of RAnEn</title><link href="https://analogensemble.ddns.net/AnalogsEnsemble/2018/11/04/demo-1-RAnEn-basics.html" rel="alternate" type="text/html" title="Basics of RAnEn" /><published>2018-11-04T00:00:00+00:00</published><updated>2018-11-04T00:00:00+00:00</updated><id>https://analogensemble.ddns.net/AnalogsEnsemble/2018/11/04/demo-1-RAnEn-basics</id><content type="html" xml:base="https://analogensemble.ddns.net/AnalogsEnsemble/2018/11/04/demo-1-RAnEn-basics.html"><![CDATA[<!-- vim-markdown-toc GitLab -->

<ul>
  <li><a href="#introduction">Introduction</a></li>
  <li><a href="#access">Access</a></li>
</ul>

<!-- vim-markdown-toc -->

<h2 id="introduction">Introduction</h2>

<p>This article walks you through the basic usage of the <code class="language-plaintext highlighter-rouge">RAnEn</code> library. This exercise uses short-term surface temperature forecasts as an example. It is recommended to use <a href="https://mybinder.org/v2/gh/Weiming-Hu/AnalogsEnsemble/master?urlpath=rstudio">binder</a>; the corresponding <code class="language-plaintext highlighter-rouge">.Rmd</code> file will guide you through the script line by line.</p>

<p>You will learn how to use these functions:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">generateConfiguration</code></li>
  <li><code class="language-plaintext highlighter-rouge">generateAnalogs</code></li>
  <li><code class="language-plaintext highlighter-rouge">verify*</code> functions</li>
</ul>

<h2 id="access">Access</h2>

<p>This tutorial can be accessed on binder. Please click <a href="https://mybinder.org/v2/gh/Weiming-Hu/AnalogsEnsemble/master?urlpath=rstudio">here</a> to start an interactive session and go over the tutorial under <code class="language-plaintext highlighter-rouge">RAnalogs/examples</code>. Or you can download the repository and use the <a href="https://github.com/Weiming-Hu/AnalogsEnsemble/blob/master/RAnalogs/examples/demo-1_AnEn-basics.Rmd">R markdown file</a>.</p>]]></content><author><name>Weiming Hu</name></author><category term="tutorial" /><summary type="html"><![CDATA[]]></summary></entry></feed>