Rendering 3D Graphics: Mathematical Preliminaries
In this series of articles, we'll walk through a basic, didactic software rendering scheme for three-dimensional graphics. We assume familiarity with high school geometry and vectors. In this first part, we will explore the mathematical preliminaries. Afterwards, we will implement the renderer using JavaScript and the HTML5 canvas.

### Why do distant objects appear small?

This is a fundamental question. An object very far away from us appears small, and it appears larger as it approaches us. Why is that, and can we specify how its observed size depends on its distance from us?

Imagine yourself in a one-dimensional city with two buildings of equal height at different distances from you (Fig. 1).

![Illustration of perspective](/img/3d-graphics/building_perspective1.png)

Fig. 1: two buildings, with the same height, at different distances from the observer $O$. The angle $A_1OB_1$ of the near building is larger than the angle $A_2OB_2$ of the far building.

Look at the angle $A_iOB_i$ formed by the top of a building, the observer, and the bottom of the building. The angle is smaller when the building is farther away. This observation is important, but an element is still missing: we have no notion of "observed height" in the field of vision. We get one if we put a screen somewhere between the observer and the object being observed (Fig. 2).

![Illustration of perspective with screen](/img/3d-graphics/building_perspective2.png)

Fig. 2: defining "observed height" by the position where the dashed lines intersect the screen $S$.

Now we can pose the question more precisely: where does a line (i.e. a ray of light) between the observer and the observed object intersect the screen? One simple way of answering the question is to use similar triangles.
Let $h$ be the height of the building, $d_0$ the distance between the observer and the screen, $d$ the distance between the screen and the base of the building, and $y$ the observed height of the building on the screen (Fig. 3).

![Solving for observed height using similar triangles](/img/3d-graphics/building_perspective3.png)

Fig. 3: solving for observed height using similar triangles.

By similar triangles, we have $$\frac{y}{h}=\frac{d_0}{d_0+d},$$ and therefore $$\textrm{observed height}=y=d_0\frac{h}{d_0+d}.$$

This result shows that the observed height is proportional to the actual height and inversely proportional to $d_0+d$. Note that $d_0+d$ is not the straight-line distance between observer and object (blue dashed line in Fig. 3). Rather, it is the object's distance "inward", along the observer's line of sight (the horizontal axis). Likewise, $h$ is the distance of the object away from the line of sight, i.e. perpendicular to the horizontal axis. Put another way, $$\textrm{observed size}=d_0\cdot\frac{\textrm{distance away from line of sight}}{\textrm{distance along line of sight}}. \label{screen_scaling}$$ We are free to choose the parameter $d_0$, and it affects the way sizes are perceived. We will explore the meaning of $d_0$ later.

### Generalizing

Now that we understand a bit about perspective, let's leave flatland and consider the problem in three dimensions. I will refer to the observer-screen combination as the "camera". The camera has several important parameters, such as its location, the direction it's pointing, the direction that the observer considers "up", the internal distance $d_0$, its width and height, etc. In what follows, we will denote the position of the observer by ${\bf p}$ and the orientation vector of the camera by ${\bf\hat{c}}$. This is a unit normal to the screen, pointing away from the observer. We denote the unit vector in the "up" direction by ${\bf\hat{u}}$.
You should convince yourself that the position and orientation of the camera are not sufficient to uniquely specify a viewing situation; the up vector is also needed. However, given ${\bf\hat{c}}$ and ${\bf\hat{u}}$ (which we take to be perpendicular), we can uniquely define a "right" unit vector, ${\bf\hat{g}}={\bf\hat{c}}\times{\bf\hat{u}}$, via the right-hand rule. The "camera", just described, is depicted in Fig. 4.

![Illustration of camera](/img/3d-graphics/camera_in_3d_1.png)

Fig. 4: illustration of the "camera". The observer is at the tip of the pyramid and the screen is the blue rectangle. The camera's orthonormal basis $\{{\bf\hat{c}},{\bf\hat{u}},{\bf\hat{g}}\}$ is also illustrated.

Suppose we are looking at a tree and we want to determine where the leaf at ${\bf r}$ should appear on the screen (Fig. 5).

![Camera pointed at a tree](/img/3d-graphics/camera_in_3d_2.png)

Fig. 5: the camera pointed at a tree, with some relevant vectors marked. The origin ${\bf O}$ is arbitrarily chosen.

With a bit of thought, we see that this maps directly onto the problem that we solved in one dimension. We must determine the distance along the camera axis between the observer and the leaf (this was $d_0+d$) as well as the distance of the leaf off the camera axis (this was $h$). The distance off the axis is unambiguously defined as the minimal distance between the leaf and the camera axis. Equivalently, the dashed red line segment in Fig. 5 meets the camera axis at a right angle. Thus the distance along the camera axis is the component of ${\bf r}-{\bf p}$ along ${\bf\hat{c}}$, i.e. the projection $$\textrm{distance along camera axis}=({\bf r}-{\bf p})\cdot{\bf\hat{c}}.$$ The vector pointing from the camera axis to the leaf (the dashed red line segment) is then whatever is left over, so $$\textrm{vector from camera axis to object}=({\bf r}-{\bf p})-\left(\left({\bf r}-{\bf p}\right)\cdot{\bf\hat{c}}\right){\bf\hat{c}},$$ which is as computationally unwieldy as it looks.
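To make the decomposition concrete, here is a minimal JavaScript sketch, chosen because the series implements the renderer in JavaScript. It assumes vectors are plain three-element arrays and that ${\bf\hat{c}}$ and ${\bf\hat{u}}$ are orthonormal. The helper names `sub`, `dot`, `scale`, `cross`, and `decompose` are hypothetical, not part of any library. The camera and leaf positions are arbitrary illustrative values.

```javascript
// Minimal vector helpers; vectors are plain [x, y, z] arrays.
const sub = (a, b) => [a[0] - b[0], a[1] - b[1], a[2] - b[2]];
const dot = (a, b) => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
const scale = (a, s) => [a[0] * s, a[1] * s, a[2] * s];
const cross = (a, b) => [
  a[1] * b[2] - a[2] * b[1],
  a[2] * b[0] - a[0] * b[2],
  a[0] * b[1] - a[1] * b[0],
];

// Split r - p into its distance along the unit camera axis c and the
// vector from the axis to the point (the dashed red segment in Fig. 5).
function decompose(r, p, c) {
  const rel = sub(r, p);                     // r - p
  const along = dot(rel, c);                 // (r - p) · c
  const offAxis = sub(rel, scale(c, along)); // (r - p) - ((r - p) · c) c
  return { along, offAxis };
}

// Hypothetical camera at the origin looking along +x, with +z as "up".
const p = [0, 0, 0];
const c = [1, 0, 0];
const u = [0, 0, 1];
const g = cross(c, u); // [0, -1, 0]: with this choice of axes, "right" is -y

// Hypothetical leaf at r = (10, -2, 3).
const { along, offAxis } = decompose([10, -2, 3], p, c);
// along = 10, offAxis = [0, -2, 3], and dot(offAxis, c) = 0
```

The off-axis vector has no component along ${\bf\hat{c}}$, as the construction requires. Here ${\bf\hat{g}}$ points along $-y$ only because of the particular axes chosen for the example.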
Fortunately, we care less about the length of that vector than about how it breaks down into components on the screen.

![Coordinates in a plane parallel to the screen](/img/3d-graphics/leaf_coordinates.png)

Fig. 6: illustrating Cartesian coordinates in a plane parallel to the screen. The red dashed line segment is the same as in Fig. 5, connecting the camera axis to the leaf.

In Fig. 6, we consider a plane parallel to the screen which contains the leaf (the green dot). We want to compute the leaf's Cartesian coordinates in this plane. We take the origin $(0,0)$ to be the point where the camera axis crosses the plane, with right and up as the positive directions. From those coordinates, it will be a simple matter to scale them to screen coordinates. But these coordinates are just projections. The $y$ coordinate is the component of the red vector in the ${\bf\hat{u}}$ direction: $$y=\left[({\bf r}-{\bf p})-\left(\left({\bf r}-{\bf p}\right)\cdot{\bf\hat{c}}\right){\bf\hat{c}}\right]\cdot{\bf\hat{u}}=({\bf r}-{\bf p})\cdot{\bf\hat{u}}, \label{leaf_y}$$ where we make use of the fact that ${\bf\hat{c}}\cdot{\bf\hat{u}}=0$. Likewise, the $x$ coordinate is the component along ${\bf\hat{g}}$, and because ${\bf\hat{c}}\cdot{\bf\hat{g}}=0$, we have $$x=({\bf r}-{\bf p})\cdot{\bf\hat{g}}. \label{leaf_x}$$

Finally, we can write down the screen coordinates of the leaf. Again, we take the center of the screen to be the origin of our coordinates, with up and to the right as the positive directions. By Eqns. (\ref{screen_scaling}), (\ref{leaf_y}) and (\ref{leaf_x}), we have $$\textrm{screen x coordinate}=d_0\frac{({\bf r}-{\bf p})\cdot{\bf\hat{g}}}{({\bf r}-{\bf p})\cdot{\bf\hat{c}}}$$ and $$\textrm{screen y coordinate}=d_0\frac{({\bf r}-{\bf p})\cdot{\bf\hat{u}}}{({\bf r}-{\bf p})\cdot{\bf\hat{c}}}.$$

### Summing up

We now have a straightforward way of implementing three-dimensional rendering. We imagine our scene taking place in $\mathbb{R}^3$.
There's a camera with some parameters, most importantly $$\begin{eqnarray*} {\bf\hat{c}} &=& \textrm{unit vector along camera's axis,}\\ {\bf\hat{u}} &=& \textrm{unit vector along camera's up direction,}\\ {\bf\hat{g}} &=& \textrm{unit vector along camera's right direction,}\\ {\bf p} &=& \textrm{position of observer,}\\ d_0 &=& \textrm{distance between observer and screen}. \end{eqnarray*}$$ Then an object at ${\bf r}$ will appear on the screen at the coordinates $$(x,y)=d_0\left(\frac{({\bf r}-{\bf p})\cdot{\bf\hat{g}}}{({\bf r}-{\bf p})\cdot{\bf\hat{c}}},\frac{({\bf r}-{\bf p})\cdot{\bf\hat{u}}}{({\bf r}-{\bf p})\cdot{\bf\hat{c}}}\right),$$ taking the center of the screen to be the origin, with positive $x$ to the right and positive $y$ up. The formula only applies to objects in front of the observer, i.e. those with $({\bf r}-{\bf p})\cdot{\bf\hat{c}}>0$.

In [the next part of this series](./rendering-3d-in-under-100-lines-js), we will implement this renderer using HTML5 canvas.
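The summary formula translates almost line for line into code. The following is a minimal sketch under our own assumptions, not the implementation from the next part. Vectors are plain arrays, and the camera is a hypothetical object `{ p, c, u, d0 }` whose `c` and `u` are orthonormal. `project` returns `null` when $({\bf r}-{\bf p})\cdot{\bf\hat{c}}\le 0$, because the division would fail or the point lies behind the observer.

```javascript
// Minimal vector helpers; vectors are plain [x, y, z] arrays.
const sub = (a, b) => [a[0] - b[0], a[1] - b[1], a[2] - b[2]];
const dot = (a, b) => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
const cross = (a, b) => [
  a[1] * b[2] - a[2] * b[1],
  a[2] * b[0] - a[0] * b[2],
  a[0] * b[1] - a[1] * b[0],
];

// Project the world-space point r to screen coordinates (origin at the
// screen center, +x right, +y up), following
//   (x, y) = d0 * ((r - p)·g, (r - p)·u) / ((r - p)·c).
// `camera` is a hypothetical { p, c, u, d0 } object; c and u are assumed
// to be orthonormal unit vectors.
function project(r, camera) {
  const { p, c, u, d0 } = camera;
  const g = cross(c, u);     // "right" vector, g = c × u
  const rel = sub(r, p);     // r - p
  const depth = dot(rel, c); // distance along the camera axis
  // Points in the observer's plane or behind it cannot be projected.
  if (depth <= 0) return null;
  return [(d0 * dot(rel, g)) / depth, (d0 * dot(rel, u)) / depth];
}

// Hypothetical camera at the origin, looking along +x, with +z up.
const camera = { p: [0, 0, 0], c: [1, 0, 0], u: [0, 0, 1], d0: 1 };

const near = project([10, -2, 3], camera);  // approximately [0.2, 0.3]
const far = project([20, -2, 3], camera);   // approximately [0.1, 0.15]
const behind = project([-5, 0, 0], camera); // null
```

Moving the point twice as far along the camera axis halves both screen coordinates. This matches the inverse-distance behavior derived for the two buildings.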