Panorama180 Movement demo : Algorithm

Reproduction of movable space using "Spatial cache" and "Spatial interpolation".
Developer : ft-lab.
Date : 04/25/2019 - 04/30/2019


This document was translated into English with Google Translate.

Introduction

When a 180 degree panorama-3D image such as VR180 is used, the player can look around from the camera position in VR but cannot move. This is 3DoF operation.
The goal here is to allow movement in a specific direction in the VR space using such panoramic images, performing the calculation with as little load as possible and enabling limited 6DoF operation.

In this document, a hemispherical 180 degree panorama with stereo parallax is called a "180 degree panorama-3D".


The coordinate system is described using Unity's left-handed system (+X to the right of the screen, +Y toward the top of the screen, +Z forward into the screen).

Overview

Multiple 180 degree panorama-3D images and depth images are prepared and connected smoothly.
Each 180 degree panorama-3D image holds a position and a front direction (rotation angle around the Y axis) in 3D space.
These multiple resources are called the "Spatial cache" here.


A panoramic image interpolated from the "Spatial cache" information for the current position in the scene is displayed in real time using the GPU.
This interpolation is called "Spatial interpolation" here.
The following sections describe "Spatial cache" and "Spatial interpolation".

Spatial cache

"180 degree panorama-3D images" and "180 degree panorama-3D depth images" are sampled at multiple positions in 3D space.
In this demo, the sampling was performed in a space expressed with 3DCG.
The camera is moved linearly at regular intervals, and the following resources are held for each sample (a rough sketch of the data per sample is shown below).
For panoramic viewing in VR, a resolution of 4096 x 2048 pixels or more is desirable for the "180 degree panorama-3D image (RGB)".
Panorama RGB images are saved in JPEG format.
The "180 degree panorama-3D image (Depth)" does not need to be as large as the RGB image.
Here the depth is held at a resolution of 1024 x 512 pixels, one quarter the size of the RGB image in each dimension.
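As an illustration only (the class and field names below are assumptions, not part of the demo's source), one "Spatial cache" sample could be held in Unity roughly as follows.

// Illustrative container for one "Spatial cache" sample.
[System.Serializable]
public class SpatialCacheSample {
  public Texture2D rgbTexture;    // 180 degree panorama-3D image (RGB), e.g. 4096 x 2048 pixels.
  public Texture2D depthTexture;  // 180 degree panorama-3D image (Depth), e.g. 1024 x 512 pixels.
  public Vector3 position;        // Camera center position at sampling time (c1, c2, c3, ...).
  public float rotationY;         // Camera rotation angle around the Y axis (r1, r2, r3, ...).
}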

Store the depth

Each pixel of the depth image is converted to a real number from 0.0 to 1.0 and stored using the following formula.
Let zDist be the Z distance from the camera in view coordinates, nearPlane the distance to the camera's near clip plane, and farPlane the distance to the far clip plane.
depth = (zDist < farPlane) ? ((zDist - nearPlane) / zDist) : 1.0;
This mapping is non-linear, and the stored depth image becomes mostly white overall.
Because of this bias, saving the file in the OpenEXR (exr) format with the compression option reduces the file size.
Also, since the depth value is used for depth determination, anti-aliasing between pixels must not be applied.
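As a sketch of both directions of this conversion (a hedged example; the demo's own code may differ), the formula above and the inverse used later during spatial interpolation can be written as follows.

// Store: Z distance in view coordinates -> depth value in 0.0-1.0 (the formula above).
float DistToDepth (float zDist, float nearPlane, float farPlane) {
  return (zDist < farPlane) ? ((zDist - nearPlane) / zDist) : 1.0f;
}

// Restore: depth value in 0.0-1.0 -> Z distance in view coordinates (used by the shader later).
float DepthToDist (float depth, float nearPlane, float farPlane) {
  return (depth >= 0.99999f) ? farPlane : (nearPlane / (1.0f - depth));
}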

Camera center position and rotation

The center position and rotation of the camera at the time each 180 degree panorama-3D image (RGB/Depth) was created are held.
The figure below is viewed from directly above:
c1/c2/c3 are the camera center positions, and r1/r2/r3 are the camera rotation angles.

The gaze direction of the camera is expressed as a rotation around the Y axis and stays horizontal (parallel to the XZ plane).
The rotation is kept the same (r1 = r2 = r3) when the images are saved as 180 degree panoramas.
The "Spatial cache" processing above is pre-processing that stores these resources.

Spatial interpolation

In the VR scene, the +Z direction is forward and the initial rotation of the camera is (0, 0, 0).
Using the "Spatial cache" as a resource, the equirectangular 180 degree panorama is UV-mapped onto the hemispherical part of a sphere as the background image.

Background images for positions that do not coincide with a camera position in the "Spatial cache" are interpolated and reproduced in real time using a shader.
Note, however, that movement is constrained to a straight line.
The camera positions sampled in the "Spatial cache" are c1/c2/c3.

Convert camera position of "Spatial cache"

The camera position information sampled in the "Spatial cache" is converted to positions in the VR scene.

Let r1 be the Y rotation angle of the "Spatial cache" camera.
Let StPos be the position of the first camera in VR space, and place c1 at StPos.
Rotate c1/c2/c3 around c1 about the Y axis by an angle of -r1.
// Matrix rotated by -r1 degrees around Y axis.
Matrix4x4 rotYMat = Matrix4x4.Rotate(Quaternion.Euler(0.0f, -r1, 0.0f));

// Convert camera position.
Vector3 basePos = c1;
c1 = rotYMat.MultiplyVector(c1 - basePos) + StPos;
c2 = rotYMat.MultiplyVector(c2 - basePos) + StPos;
c3 = rotYMat.MultiplyVector(c3 - basePos) + StPos;
With this conversion, the camera positions sampled in the "Spatial cache" and the orientation of the 180 degree panorama-3D images are converted into those of the "Spatial interpolation" scene.

The +Z direction of the panoramic image is the front.

Calculate the foot of the perpendicular on the line corresponding to the VR-HMD position

Let P be the world coordinate position of the camera (VR-HMD) in the VR space.
Whether P is within a distance r of the line segments c1-c2-c3 is determined from the perpendicular distance from P to each segment.

If P is farther than r, the background update is skipped.
If P is within r, determine whether the foot of the perpendicular (P') lies on c1-c2 or on c2-c3.
If P' lies between c1 and c2, the RGB texture and depth texture of the 180 degree panorama-3D images at c1 and c2 are passed to the shader.

With c1 to c2 parameterized from 0.0 to 1.0, let B be the blend value at P'.
When B is 0.0, the panoramic image of c1 is used as is; when B is 1.0, the panoramic image of c2 is used as is.
Otherwise (0.0 < B < 1.0), interpolation is performed pixel by pixel, which is the most common case. A minimal sketch of this segment test follows.
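The following is a C# sketch of the segment test (the function name is illustrative, and clamping B to 0.0-1.0 is an assumption about how the end points are handled, not taken from the demo's source).

// Project the VR-HMD position P onto the segment c1-c2 and compute the blend value B.
// Returns false when P is farther than r from the segment, in which case the background update is skipped.
bool TryGetBlendOnSegment (Vector3 P, Vector3 c1, Vector3 c2, float r, out float B) {
  Vector3 seg = c2 - c1;
  float t = Vector3.Dot(P - c1, seg) / seg.sqrMagnitude;  // Parameter along c1-c2.
  B = Mathf.Clamp01(t);                                   // Blend value at the foot of the perpendicular P'.
  Vector3 pDash = c1 + seg * B;                           // P' on the segment.
  return Vector3.Distance(P, pDash) <= r;
}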

Preprocessing transformation of depth texture

Depth textures are used at a coarse resolution.

Reducing the resolution saves texture resources and suppresses flicker when the VR-HMD position moves.
In addition, blurring helps stabilize the spatial interpolation during movement.

This conversion is performed as preprocessing for each texture when the depth texture is imported.
At this stage, anti-aliasing between pixels must not be applied.
The pixel values of the depth texture are used to back-calculate the Z distance in view coordinates,
and interpolating depth pixel values would produce distances that do not exist.
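For example, in Unity the following texture setting (an assumption about the setup, not necessarily what the demo does) keeps the depth pixels un-interpolated when the shader samples them.

// depthTexture is the imported depth texture (Texture2D).
// Mip maps should also be disabled in the import settings so that reduced levels are not blended.
depthTexture.filterMode = FilterMode.Point;  // Sample depth pixels without bilinear interpolation.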

When simply blending by the blend value

The result is not smooth if the two panoramas are simply blended according to the value of B (with c1 to c2 parameterized as 0.0-1.0).

The calculation is as follows.
float4 col1 = tex2D(_Tex1, uv);     // Get the specified UV color with texture _Tex1 with camera c1.
float4 col2 = tex2D(_Tex2, uv);     // Get the specified UV color with texture _Tex2 with camera c2.
float4 col = lerp(col1, col2, B);   // Synthesized by blend value B (0.0-1.0).
lerp performs the linear interpolation "col = col1 * (1.0 - B) + col2 * B;".

Interpolate space considering depth

When the equirectangular panorama image UV-mapped onto the sphere is drawn as the background, the fragment shader processes every pixel.
For each pixel, the viewing direction of the 180 degree panorama can be calculated in the shader from the UV value.
float3 calcVDir (float2 _uv) {
  float theta = UNITY_PI2 * (_uv.x - 0.5);
  float phi   = UNITY_PI * (_uv.y - 0.5);
  float sinP = sin(phi);
  float cosP = cos(phi);
  float sinT = sin(theta);
  float cosT = cos(theta);
  float3 vDir = float3(cosP * sinT, sinP, cosP * cosT);
  return vDir;
}
UNITY_PI is the constant PI (3.141592...), and UNITY_PI2 is (UNITY_PI * 2.0).
The function calculates the direction vector from polar coordinates. For example, uv = (0.5, 0.5) gives theta = 0 and phi = 0, so vDir = (0, 0, 1), the front (+Z) direction of the panorama.

Rays are cast from cameras c1 and c2 using the direction vector vDir calculated from the UV value, and the intersection positions in world coordinates are calculated with reference to the depth.
"Casting a ray" here does not mean ray tracing;
it is the process of calculating a UV value from a direction vector and looking up the depth value at that position.

To calculate the UV value from a world position, the following function is used.
It is the inverse of the UV-to-direction calculation shown above.
float2 calcWPosToUV (float3 wPos, float3 centerPos) {
  float3 vDir = normalize(wPos - centerPos);
  float sinP = vDir.y;
  float phi = asin(sinP);    // from -PI/2 to +PI/2 (-90 to +90 degrees).
  float cosP = cos(phi);
  if (abs(cosP) < 1e-5) cosP = 1e-5;
  float sinT = vDir.x / cosP;
  float cosT = vDir.z / cosP;
  sinT = max(sinT, -1.0);
  sinT = min(sinT,  1.0);
  cosT = max(cosT, -1.0);
  cosT = min(cosT,  1.0);
  float a_s = asin(sinT);
  float a_c = acos(cosT);
  float theta = (a_s >= 0.0) ? a_c : (UNITY_PI2 - a_c);
			
  float2 uv = float2((theta / UNITY_PI2) + 0.5, (phi / UNITY_PI) + 0.5);
  if (uv.x < 0.0) uv.x += 1.0;
  if (uv.x > 1.0) uv.x -= 1.0;
  return uv;
}
Depth values can be obtained using UV from the depth texture.
float depth1 = tex2D(_TexDepth1, uv).r;
float depth2 = tex2D(_TexDepth2, uv).r;
_TexDepth1 is the Depth texture at the camera position of c1.
_TexDepth2 is the Depth texture at the camera position of c2.
This is the same procedure as getting colors from the RGB textures, except that linear interpolation between pixels must not be performed when acquiring depth values.
Each pixel of the depth texture holds a value between 0.0 and 1.0, which is converted back to the Z distance in view coordinates by the following formula.
float zDist1 = (depth1 >= 0.99999) ? farPlane : (nearPlane / (1.0 - depth1));
float zDist2 = (depth2 >= 0.99999) ? farPlane : (nearPlane / (1.0 - depth2));
The world coordinate position is calculated from the Z distance.
float3 wPos1 = (vDir * zDist1) + c1;
float3 wPos2 = (vDir * zDist2) + c2;
The following function calculates the collision position in world coordinates from a UV value, taking into account the depth seen from the specified camera.

/**
 * Calculate colliding world coordinate position from UV position and direction vector.
 * @param[in] depthTex  Depth texture.
 * @param[in] uv        UV value.
 * @param[in] cPos      Center of the camera in world coordinates.
 * @param[in] vDir      Direction vector.
 */
float3 calcUVToWorldPos (sampler2D depthTex, float2 uv, float3 cPos, float3 vDir) {
  float depth = tex2D(depthTex, uv).r;
			
  // Convert from depth value to distance from camera.
  depth = (depth >= 0.99999) ? farPlane : (nearPlane / (1.0 - depth));
  depth = min(depth, farPlane);

  // World coordinate position where it collided.
  return (vDir * depth) + cPos;
}
If wPos1 and wPos2 are connected by a straight line (a linear transition), the intersection position at blend value B can be calculated as follows.
float3 wPosC = lerp(wPos1, wPos2, B);

At this stage it is not yet known whether wPosC is correct.
To check this, we test whether wPosC lies on the line extended in the vDir direction from the position of B and is visible from camera c1 or c2.
float3 wPosC0 = lerp(c1, c2, B);  // World position of B on c1-c2 straight line.

// Convert wPosC in world coordinates to UV value with camera of c1.
float2 newUV1 = calcWPosToUV(wPosC, c1);

// Convert wPosC in world coordinates to UV value with camera of c2.
float2 newUV2 = calcWPosToUV(wPosC, c2);

// Calculate each world coordinate position from UV value.
float3 wPosA = calcUVToWorldPos(_TexDepth1, newUV1, c1, normalize(wPosC - c1));
float3 wPosB = calcUVToWorldPos(_TexDepth2, newUV2, c2, normalize(wPosC - c2));

float angle1 = dot(normalize(wPosA - wPosC0), vDir);
float angle2 = dot(normalize(wPosB - wPosC0), vDir);
If angle1 and angle2 are both very close to 1.0,
wPosC is a collision point that lies on the vDir line and is visible from both camera positions c1 and c2.
In other words, the pixel colors can be obtained from the UV values newUV1 and newUV2 calculated here.


// Let RGB texture in c1 be _Tex1, RGB texture in c2 be _Tex2.
// Get pixel color from calculated UV.					
float4 col1 = tex2D(_Tex1, newUV1);
float4 col2 = tex2D(_Tex2, newUV2);
float4 col = float4(0, 0, 0, 1);
if (angle1 > 0.99999 && angle2 > 0.99999) {
  col = lerp(col1, col2, B);
}

The transition from wPos1 to wPos2 is not always linear.
If the geometry is uneven, as shown below,
wPosA and wPosB calculated with the depth taken into account will not lie on the straight line extended in the vDir direction from wPosC0.


There are also cases where only wPosA, as seen from camera c1, lies on the straight line extended in the vDir direction from wPosC0 (or vice versa).

In this case there is sampling information visible from camera c1 but none visible from camera c2,
so the pixel color is taken from the RGB texture of the c1 camera.
// Get pixel color from calculated UV.						
float4 col1 = tex2D(_Tex1, newUV1);
float4 col2 = tex2D(_Tex2, newUV2);
float4 col = float4(0, 0, 0, 1);
if (angle1 > 0.99999 && angle2 <= 0.99999) {
  col = col1;
} else if (angle2 > 0.99999 && angle1 <= 0.99999) {
  col = col2;
}
In summary, the judgment of whether the estimated intersection position wPosC is correct falls into the following four patterns,
where wPos1 is the collision from camera c1 in the vDir direction and wPos2 is the collision from camera c2 in the vDir direction.
These patterns work well when wPos1 to wPos2 transition linearly,
and most of the pixel-by-pixel spatial interpolation can infer the correct interpolated pixel color in this way.
The figure below shows the error pixels in red.


The pixels that resulted in an error can be adjusted a little further.
From the Z distances calculated from the depth values used for wPos1 and wPos2, take the smaller distance (minDepth) and the larger distance (maxDepth),
and use them to calculate two more candidate positions, wPosC1 and wPosC2, on the line in the vDir direction from wPosC0.

Then check whether wPosC1 and wPosC2 are visible from the cameras c1 and c2, in the same way as wPosC was checked.
In the case of the figure above, wPosC2 is closer to the desired value than wPosC.
By checking the estimated values in the order wPosC, then wPosC1, then wPosC2,
the number of error locations (red pixels) can be reduced, as sketched below.
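A rough sketch of this additional check, written with Unity's Vector3 types for readability (in the demo this per-pixel logic would run in the fragment shader; the variable names follow the text above and this is one reading of that text, not the demo's exact code):

// zDist1/zDist2 are the Z distances used when calculating wPos1 and wPos2.
float minDepth = Mathf.Min(zDist1, zDist2);
float maxDepth = Mathf.Max(zDist1, zDist2);

// Two additional candidate positions on the line in the vDir direction from wPosC0.
Vector3 wPosC1 = wPosC0 + vDir * minDepth;
Vector3 wPosC2 = wPosC0 + vDir * maxDepth;

// Each candidate is then validated in the same way as wPosC:
// convert it to UV as seen from c1 and c2, look up the depth there,
// and adopt the candidate whose recomputed collision lies on the vDir line from wPosC0.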


Since only a few error pixels remain,
simply blending the two pixel colors obtained from the calculated UV values with the value of B gives the result below.

The above confirms that, given RGB/Depth textures from two cameras, spatial interpolation can be performed to a reasonable degree.
If the camera positions of the panoramic images sampled in the "Spatial cache" are far apart (that is, when the number of samples is insufficient), errors increase.

If the pixel-to-pixel transition as the camera moves from c1 to c2 is not linear, another option would be to create a rough look-up table (texture) in preprocessing and reference it.
Also, although this explanation uses linear interpolation between two cameras, interpolating from more than two cameras could increase the freedom of movement.

Source code

The following repository is available as a Unity project.
https://github.com/ft-lab/Unity_Panorama180Movement

Assets used when creating the resources

The captured images on this page use the following assets from the Unity Asset Store.
"Asia-Pacific Common Residential Theme Pack" was used as the rendering scene for the panorama images.
The 180 degree panorama images were generated with "Panorama180 Render" (ver.1.0.2).