Intro
OCI Quick Start repositories on GitHub are collections of Terraform scripts and configurations provided by Oracle. These repositories are designed to help organizations quickly deploy common infrastructure setups on the OCI Platform. Each Quick Start focuses on a specific use case or workload and simplifies provisioning on OCI using Terraform, acting as a sort of IaC-based reference architecture.
Today, we will code review one of those reference architectures, which is a Fortinet firewall solution deployed in OCI.
Note: This article won’t discuss the architecture, but will rather address its Terraform code flaws and fixes.
Why Some Errors Never Get to Your OCI Resource Manager Stack
Certain Terraform errors may not reach your RM stack due to its design. For instance, RM allows the hardcoding of specific variables, like availability domains, directly in its interface. This sidesteps the need for these variables to be checked by native conditions in the TF code.
Moreover, RM reads these variables from the schema.yaml file, which alters the behavior compared to local Terraform CLI execution. This approach can result in certain errors being handled or bypassed within the RM environment, creating a distinction from standard Terraform workflows.
The Stack: FortiGate HA Cluster using DRG – Reference Architecture
The stack is a result of the collaboration of both Oracle and Fortinet. This architecture is based on a Hub & Spoke topology, using FortiGate firewall from OCI Marketplace. I actually deployed it while working on one of my projects.
For details of the architecture, see Set up a hub-and-spoke network topology.
The Repository
You will find this Terraform configuration under the main OCI-Fortinet GitHub repository, but not in the root directory. The folder in question is drg-ha-use-case under: oracle-quickstart/oci-fortinet/use-cases/drg-ha-use-case.
The Errors
At the time of writing, the errors were still not fixed despite my opening issues and sharing the fix. You can see that the last commit dates back 2 years. You will need to clone the repo and navigate to the drg-ha-use-case subdirectory:
$ git clone https://github.com/oracle-quickstart/oci-fortinet.git
$ cd oci-fortinet/use-cases/drg-ha-use-case
$ terraform init
1. Data Source Error in Regions with Unique AD
You will face this issue in a region with only one availability domain (e.g., ca-toronto-1), as the availability domain data source will make the Terraform execution plan fail.
CAUSE: See issue #8
- In the above error, Terraform complains about the availability domain data source having only one element. This impacts two of the oci_core_instance resource blocks (2 web VMs, 2 DB VMs).
  - File: compute.tf
  - Lines: 235 & 276
Problem:
In single AD regions, the data source block returns 1 element, so the only valid index into it is 0.
- File: data_source.tf
- Lines: 8-10
This configuration hasn’t been tested in single AD regions.
$ vi data_source.tf
# ------ Get list of availability domains
data "oci_identity_availability_domains" "ADs" {
  compartment_id = var.tenancy_ocid
}
…
Reason:
In Terraform, count.index always starts at 0. If you have a resource with a count of 4, the count.index object will be 0, 1, 2, and 3.
Let’s take, for example, the “web-vms” oci_core_instance block in compute.tf at line 235.
If we run the condition:
- The variable availability_domain_name is empty.
- The ads data source length is 1 element.
That means that the AD name will be equal to the ads data source collection element with an index value of [0+1] = 1.
data.oci_identity_availability_domains.ADs.availability_domains[1] doesn’t exist, as the list only contains 1 element.
Solution
Complete the full availability domain conditional expression on line 235 and line 276 (web-vms/db-vms). Add the case where the data source ads.availability_domains has 1 element (the region has one AD only).
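The completed conditional described above could look roughly like the following sketch. It is not the repository's exact code: the variables ad_names and vm_count are hypothetical stand-ins (ad_names replaces the list of names returned by the availability domain data source) so that the logic can be evaluated without the OCI provider, and i plays the role of count.index.

```hcl
# Hypothetical stand-ins so the sketch can be evaluated on its own.
variable "availability_domain_name" {
  type    = string
  default = "" # empty means "pick an AD from the list"
}

variable "ad_names" {
  type    = list(string)
  default = ["AD-1"] # simulates a single-AD region
}

variable "vm_count" {
  type    = number
  default = 2
}

locals {
  # One AD name per VM; i stands in for count.index.
  vm_ads = [
    for i in range(var.vm_count) : (
      var.availability_domain_name != "" ? var.availability_domain_name : (
        # Added case: a single-AD region always uses the only element.
        length(var.ad_names) == 1 ? var.ad_names[0] :
        # Original multi-AD branch, kept unchanged here.
        var.ad_names[i + 1]
      )
    )
  ]
}

output "vm_ads" {
  value = local.vm_ads
}
```

With the defaults above, both VMs should resolve to the single AD, because only the selected branch of a conditional reports errors. The last branch still carries the count.index+1 problem for multi-AD regions.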
Bad Logic
Seeking the name of the count.index+1 availability domain is still wrong when the region has more than 1 AD. For example, say you want to create 3 VMs and your region has 2 Availability Domains:
- The first iteration [0] will set count.index+1 = 1 (2nd data source element = AD2).
- Then the second iteration sets count.index+1 = 2 (3rd data source element = AD3).
The 2nd and 3rd iterations will always fail because there are only 2 ADs (index list [0,1]).
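One possible way to avoid running past the end of the list, offered here as an assumption rather than the fix submitted to the repository, is to wrap the index with Terraform's built-in element() function, which takes the index modulo the list length. The AD names and the VM count below are hypothetical values chosen to match the 3 VMs / 2 ADs example.

```hcl
locals {
  # Hypothetical stand-in for the AD names returned by the data source.
  ad_names = ["AD-1", "AD-2"]
  vm_count = 3

  # element() wraps the index (index modulo list length), so it never
  # runs past the end of the list. i plays the role of count.index.
  vm_ads = [for i in range(local.vm_count) : element(local.ad_names, i)]
}

output "vm_ads" {
  value = local.vm_ads # ["AD-1", "AD-2", "AD-1"]
}
```

Inside a resource that uses count, the equivalent would be to pass count.index (not count.index+1) to element() on the data source's availability_domains list and read its name attribute. This spreads the VMs across the available ADs in round-robin order, and it also covers the single AD case without a separate branch.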
2. Wrong Compartment Argument in the Security List Data Sources
Another issue you will run into is a failure to deploy subnets due to the data source collection being empty (no element).
Solution
This was a silly mistake, but it took me a day to figure out while delving through a pile of new Terraform files. All you need to do is replace the compute compartment variable with var.network_compartment_ocid.
Edit network.tf lines 64-74:
# ------ Get the Allow All Security Lists for Subnets in Firewall VCN
data "oci_core_security_lists" "allow_all_security" {
  compartment_id = var.network_compartment_ocid # CORRECT Compartment
  vcn_id         = local.use_existing_network ? var.vcn_id : oci_core_vcn.hub.0.id
  …
}
3. More Code Inconsistencies
I wasn’t done debugging as I found other misplaced compartment variables in some VNIC attachments data sources.
- File: datasource.tf
- Lines: 103-115 & 118-130
You need to replace them with var.compute_compartment_ocid.
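As an illustration of the corrected pattern, here is a minimal sketch of a VNIC attachments data source that looks in the compute compartment. The data source label example and the instance_id variable are hypothetical placeholders, not names from the repository; only var.compute_compartment_ocid comes from the stack.

```hcl
terraform {
  required_providers {
    oci = {
      source = "oracle/oci"
    }
  }
}

variable "compute_compartment_ocid" {
  type = string
}

# Hypothetical input: OCID of the instance whose VNIC attachments are listed.
variable "instance_id" {
  type = string
}

# VNIC attachments are listed per compartment, so the lookup has to use
# the compartment that holds the instance (compute), not the network one.
data "oci_core_vnic_attachments" "example" {
  compartment_id = var.compute_compartment_ocid
  instance_id    = var.instance_id
}
```

If the wrong compartment is passed, the lookup does not fail outright. It simply returns an empty collection, which is why this kind of mistake only surfaces later, when something indexes into the result.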
Conclusion & Recommendations
This type of undetected code issue is why I never trust the first deployment in Resource Manager. To avoid problems in the future, especially if you decide to migrate out of RM at some point, I suggest the following workflow:
- Run locally and validate any code bugs.
- Run on Resource Manager.
- Store to a git repo (blueprint with eventual versioning).
I hope this was helpful as the issues I opened have remained unsolved for over a year in their GitHub repo.