Einstein’s equation $$R_{\mu\nu}-\frac{1}{2}Rg_{\mu\nu}=8\pi GT_{\mu\nu},$$ together with the geodesic equation, takes the place of (1) the inverse square gravity law and (2) Newton’s second law. Einstein’s equation relates the curvature of spacetime (LHS) to the energy and momentum of matter (RHS).
Free particles move along geodesics in spacetime.
“Special relativity: the theory of spacetime in the absence of gravity/curvature”.
A particle’s worldline is its four-dimensional trajectory $x^\mu(\tau)$.
Simultaneity: in Newtonian physics, there is a notion of “all space at a given time”, a cross-section in time. In relativity, there is not.
An interval between two spacetime events is timelike if the spatial distance could have been traversed in that time by something travelling at speeds less than $c$ (inside light cone). An interval is spacelike if the distance is too great to have been travelled by light in that time (outside light cone). An interval is null if light could travel between the events in that time (on light cone).
The interval is defined as (with $c=1$) $$(\Delta s)^2=-(\Delta t)^2+(\Delta x)^2+(\Delta y)^2+(\Delta z)^2$$ which is negative for timelike intervals. The proper time is defined as $$(\Delta\tau)^2=-(\Delta s)^2.$$ Claim: “the proper time between two events measures the time elapsed as seen by an observer moving on a straight path between the events.” If the two events have the same spatial coordinates, then $\Delta\tau=\Delta t$. Under more general circumstances, the proper time will not coincide with the coordinate time.
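A minimal sketch of the definitions above, assuming $c=1$ and the sign convention used in these notes. The function names (`interval_squared`, `classify`, `proper_time`) and the tolerance `tol` are my own choices for illustration, not anything from the book.

```python
import math

def interval_squared(dt, dx=0.0, dy=0.0, dz=0.0):
    """(Delta s)^2 = -(dt)^2 + (dx)^2 + (dy)^2 + (dz)^2, with c = 1."""
    return -dt**2 + dx**2 + dy**2 + dz**2

def classify(dt, dx=0.0, dy=0.0, dz=0.0, tol=1e-12):
    """Label a separation as timelike, spacelike or null."""
    s2 = interval_squared(dt, dx, dy, dz)
    if s2 < -tol:
        return "timelike"   # inside the light cone
    if s2 > tol:
        return "spacelike"  # outside the light cone
    return "null"           # on the light cone

def proper_time(dt, dx=0.0, dy=0.0, dz=0.0):
    """Delta tau = sqrt(-(Delta s)^2); only defined when (Delta s)^2 <= 0."""
    s2 = interval_squared(dt, dx, dy, dz)
    if s2 > 0:
        raise ValueError("spacelike separation: no proper time")
    return math.sqrt(-s2)

same_place = proper_time(3.0)        # no spatial separation: equals dt
moving = proper_time(5.0, dx=3.0)    # sqrt(25 - 9)
```

With no spatial separation the proper time equals the coordinate time, while for $\Delta t=5$, $\Delta x=3$ it is $4$, shorter than the coordinate time.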
The metric is therefore chosen to be $$\eta_{\mu\nu}={\rm diag}(-1,1,1,1).$$
Nice discussion of the twin paradox in terms of proper time. The proper time is a property of the entire worldline, not just a function of its endpoints. A trajectory with no spatial variation gives maximal proper time: it never benefits from subtracting any $(\Delta {\bf x})^2$. The twin that stays home therefore experiences a longer proper time although the two worldlines coincide at the endpoints.
The infinitesimal interval is given by $ds^2=\eta_{\mu\nu}dx^\mu dx^\nu$. As usual, the notation $ds^2$ really refers to the “square of the infinitesimal $ds$” as opposed to $d(s^2)$. To integrate the interval over a path, parametrize the path as $x^\mu(\lambda)$, and then write $$\Delta s=\int ds=\int d\lambda\sqrt{\eta_{\mu\nu}\frac{dx^\mu}{d\lambda}\frac{dx^\nu}{d\lambda}}.$$ When the infinitesimal intervals are spacelike, the integrand is real. When the infinitesimal intervals are timelike, we can write $$\Delta\tau=\int d\lambda\sqrt{-\eta_{\mu\nu}\frac{dx^\mu}{d\lambda}\frac{dx^\nu}{d\lambda}}.$$ When the infinitesimal intervals are null, the integral is zero.
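The proper-time integral can be approximated numerically by chopping the path into small straight segments. This is only a sketch under my own assumptions: the helper `proper_time_along`, the two example worldlines, and the traveller's speed $v=0.6$ are all made up for illustration, with the path parametrized by coordinate time, $\lambda=t$.

```python
import math

def proper_time_along(path, lam0, lam1, steps=10000):
    """Approximate the integral of sqrt(-eta_{mu nu} dx^mu dx^nu) along path.

    path(lam) must return a tuple (t, x, y, z); every small segment is
    assumed to be timelike or null.
    """
    total = 0.0
    prev = path(lam0)
    for i in range(1, steps + 1):
        lam = lam0 + (lam1 - lam0) * i / steps
        cur = path(lam)
        dt, dx, dy, dz = (c - p for c, p in zip(cur, prev))
        ds2 = -dt * dt + dx * dx + dy * dy + dz * dz
        if ds2 > 0:
            raise ValueError("segment is spacelike")
        total += math.sqrt(-ds2)
        prev = cur
    return total

def stay_home(lam):
    # Sits at x = 0 while coordinate time runs from 0 to 10.
    return (lam, 0.0, 0.0, 0.0)

def traveller(lam, v=0.6):
    # Moves out at speed v until t = 5, then back at speed v.
    x = v * lam if lam <= 5.0 else v * (10.0 - lam)
    return (lam, x, 0.0, 0.0)

tau_home = proper_time_along(stay_home, 0.0, 10.0)
tau_travel = proper_time_along(traveller, 0.0, 10.0)
```

Both worldlines start and end at the same events, but the sums come out to approximately $10$ for the stay-at-home path and $10\sqrt{1-0.6^2}=8$ for the traveller, in line with the twin-paradox discussion.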
The definition of an inertial frame is somewhat ill-motivated (like much of this section) and hard to follow. My understanding: we’ll call a coordinate system inertial according to the following: a beam of light is sent from point $A$ to point $B$ at time $t_1$ and reflected from $B$ back to $A$. The light reaches $B$ at time $t_2$ and arrives back at $A$ at time $t_3$. The times $t_1$ and $t_3$ are measured at $A$ while the time $t_2$ is measured at $B$. If $t_2=\frac{1}{2}(t_1+t_3)$, then the frame is called inertial.
Hobson defines an inertial frame as one in which Newton’s first law holds: a free particle is at rest or moves with a constant velocity. It’s not clear to me how these two definitions are equivalent.
A Lorentz transformation is a change of coordinates which leaves the interval $(\Delta s)^2$ between two events invariant.
Carroll writes a translation as $$x^\mu\to x^{\mu'}=\delta_\mu^{\mu'}(x^\mu+a^\mu),$$ with a prime on the index, so as to emphasize that the geometric quantity $x$ has not changed. Rather, our coordinates have changed. Why is the lower index of $\delta$ not pushed to the right?
What is the general criterion for a matrix $\Lambda$, with $$x^{\mu'}={\Lambda^{\mu'}}_\nu x^\nu,$$ to not modify the interval? We must have $$(\Delta s)^2=(\Delta x)^T\cdot\eta\cdot(\Delta x)\stackrel{!}{=}(\Delta x')^T\cdot\eta\cdot(\Delta x')=(\Delta x)^T\Lambda^T\cdot\eta\cdot\Lambda(\Delta x)$$ for all $\Delta x$. Then the criterion on $\Lambda$ is that conjugation of $\eta$ by it does not change $\eta$. With indices: $$\eta_{\rho\sigma}={\Lambda^{\mu'}}_\rho{\Lambda^{\nu'}}_\sigma\eta_{\mu'\nu'}$$ is the condition on $\Lambda$ for it to be a Lorentz transformation.
The above condition for Lorentz transformations is analogous to rotation matrices (in $O(n)$) which preserve length: $I=R^T IR$ with $R\in O(n)$. The Lorentz group can be referred to as $O(3,1)$ reflecting the signature of the metric $\eta$.
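The matrix form of the condition, $\Lambda^T\eta\Lambda=\eta$, is easy to test numerically. A rough sketch in plain Python; the helpers (`matmul`, `transpose`, `is_lorentz`, `boost_x`, `rotation_z`) and the tolerance are my own, and the index order is $(t,x,y,z)$.

```python
import math

ETA = [[-1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1]]

def matmul(A, B):
    """Plain nested-list matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def is_lorentz(L, tol=1e-12):
    """Check Lambda^T eta Lambda = eta entry by entry."""
    M = matmul(transpose(L), matmul(ETA, L))
    return all(abs(M[i][j] - ETA[i][j]) <= tol
               for i in range(4) for j in range(4))

def boost_x(phi):
    """Boost along x with rapidity phi."""
    ch, sh = math.cosh(phi), math.sinh(phi)
    return [[ch, -sh, 0, 0], [-sh, ch, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def rotation_z(theta):
    """Rotation in the xy-plane; the time component is untouched."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1, 0, 0, 0], [0, c, -s, 0], [0, s, c, 0], [0, 0, 0, 1]]

boost_ok = is_lorentz(boost_x(0.7))
rotation_ok = is_lorentz(rotation_z(1.1))
product_ok = is_lorentz(matmul(boost_x(0.7), rotation_z(1.1)))
```

Boosts, rotations, and their products all pass the check, whereas something like a rescaling of the time axis does not.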
From $$-1=\eta_{00}={\Lambda^{\mu'}}_0{\Lambda^{\nu'}}_0\eta_{\mu'\nu'}=-({\Lambda^{0'}}_0)^2+({\Lambda^{1'}}_0)^2+({\Lambda^{2'}}_0)^2+({\Lambda^{3'}}_0)^2,$$ we see that $$|{\Lambda^{0'}}_0|\ge 1.$$ Those $\Lambda$ with ${\Lambda^{0'}}_0\le-1$ feature a reversal in time. Why? $x^{0'}={\Lambda^{0'}}_\nu x^\nu$, so at fixed spatial coordinates the new time coordinate is related to the old by $t'=A-Bt$ with $B=|{\Lambda^{0'}}_0|\ge 1$. The Lorentz transformations $\Lambda$ with $\det\Lambda=1$ (proper) and ${\Lambda^{0'}}_0\ge 1$ (orthochronous) form the “proper orthochronous Lorentz group”.
Why is the “proper orthochronous Lorentz group” a group? Follow the argument given here: Put $\Lambda$ in block form $$\Lambda=\begin{pmatrix}a&b^T\\c&R\end{pmatrix},$$ where $a={\Lambda^{0'}}_0$ is $1\times1$, $b,c\in\mathbb{R}^3$ and $R\in{\rm Mat}_{3\times3}(\mathbb{R})$. Now we have $\Lambda^T\eta\Lambda=\eta$ and also $\Lambda\eta\Lambda^T=\eta$: since $\eta^2=I$, the first equation gives $\Lambda^{-1}=\eta\Lambda^T\eta$, so $\Lambda\eta\Lambda^T\eta=I$. Computing the top-left element in either case, we find that $b^Tb=c^Tc=a^2-1$.
Now compute a product $\Lambda=\Lambda_1\Lambda_2$ with $\Lambda_1,\Lambda_2$ proper orthochronous Lorentz transformations. The top-left component is ${\Lambda^0}_0=a_1 a_2+b_1^T c_2$, and we want to show that ${\Lambda^0}_0\ge 1$. By the Cauchy-Schwarz inequality, we have $${\Lambda^0}_0-a_1a_2=b_1^Tc_2\ge -|b_1||c_2|=-\sqrt{b_1\cdot b_1}\sqrt{c_2\cdot c_2}=-\sqrt{a_1^2-1}\sqrt{a_2^2-1}.$$ Considering the function $f(a_1,a_2)=a_1 a_2-\sqrt{a_1^2-1}\sqrt{a_2^2-1}$, we find that it is stationary ($\partial f/\partial a_1=\partial f/\partial a_2=0$) where $a_1=a_2$ (restricting to the orthochronous region, $a_1,a_2\ge 1$). This is a minimum: writing $a_i=\cosh\alpha_i$ with $\alpha_i\ge 0$ gives $f=\cosh(\alpha_1-\alpha_2)$. Then $f(a_1,a_2)\ge f(a_1,a_1)=a_1^2-(a_1^2-1)=1$. We have ${\Lambda^0}_0\ge f(a_1,a_2)\ge 1$, as desired. The product is also proper, since $\det\Lambda=\det\Lambda_1\det\Lambda_2=1$.
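A quick numerical sanity check of the bound $f(a_1,a_2)\ge 1$ on the orthochronous region. This is just a sketch: the grid of sample values is arbitrary, and the comparison with $\cosh(\alpha_1-\alpha_2)$ uses the substitution $a_i=\cosh\alpha_i$, $\alpha_i\ge 0$, which is one way to see where the bound comes from.

```python
import math

def f(a1, a2):
    """Lower bound on the top-left entry of a product, for a1, a2 >= 1."""
    return a1 * a2 - math.sqrt(a1 * a1 - 1.0) * math.sqrt(a2 * a2 - 1.0)

# Sample a1, a2 from 1 to 11 in steps of 0.25 and record the smallest value.
grid = [1.0 + 0.25 * k for k in range(41)]
min_f = min(f(a1, a2) for a1 in grid for a2 in grid)

# With a_i = cosh(alpha_i), f should agree with cosh(alpha_1 - alpha_2).
alpha1, alpha2 = 0.3, 1.2
lhs = f(math.cosh(alpha1), math.cosh(alpha2))
rhs = math.cosh(alpha1 - alpha2)
```

Up to floating-point rounding, `min_f` sits at $1$ (attained on the diagonal $a_1=a_2$) and `lhs` matches `rhs`.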
A boost along $x$ looks like $${\Lambda^{\mu'}}_\nu=\begin{pmatrix}\cosh\phi&-\sinh\phi\\-\sinh\phi&\cosh\phi\end{pmatrix},$$ and its effect is the coordinate change $$t'=t\cosh\phi-x\sinh\phi,$$ $$x'=-t\sinh\phi+x\cosh\phi.$$ Viewed from the original frame $(t,x)$, the point $x'=0$ is moving: $x=(\tanh\phi)t$.
In the $xt$-plane, the boosted axes $x'=0$ and $t'=0$ are tilted inwards towards the line $x=t$, which is also the line $x'=t'$. I guess the coincidence of $x=t$ and $x'=t'$ reflects the fact that light travels at the same speed in both frames.
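The boost formulas can be checked directly. A small sketch, with a made-up helper `boost` acting on $(t,x)$ only and an arbitrarily chosen velocity $v=0.6$, so that $\phi=\tanh^{-1}v$.

```python
import math

def boost(t, x, phi):
    """Apply the boost along x with rapidity phi to the event (t, x)."""
    t_new = t * math.cosh(phi) - x * math.sinh(phi)
    x_new = -t * math.sinh(phi) + x * math.cosh(phi)
    return t_new, x_new

v = 0.6
phi = math.atanh(v)  # tanh(phi) = v

# An event on the worldline x = v t should land on x' = 0.
t_origin, x_origin = boost(1.0, v * 1.0, phi)

# An event on the light ray x = t should land on x' = t'.
t_light, x_light = boost(2.0, 2.0, phi)

# The interval of an arbitrary event should be unchanged.
t_evt, x_evt = boost(3.0, 1.0, phi)
interval_before = -3.0**2 + 1.0**2
interval_after = -t_evt**2 + x_evt**2
```

The event on $x=vt$ maps to $x'=0$ (with $t'=0.8$, the proper time along that worldline), the light ray stays on $x'=t'$, and the interval is the same before and after, up to rounding.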
Carroll describes his approach to SR as an “abstract approach”: in truth it is just lacking motivation and justification. I guess that makes it abstract.
A vector $A$ can be expanded as $A=A^\mu e_{(\mu)}$ in a basis $e_{(\mu)}$. The values $A^\mu$ are the components relative to the basis, but the vector $A$ has a geometric meaning independent of coordinates. Carroll’s notation $e_{(\mu)}$, with the parentheses, serves to indicate that this is the $\mu$-th element of a list of vectors, as opposed to the $\mu$-th component of a vector $e$.
Consider a trajectory $x^\mu(\lambda)$; the tangent vector $V(\lambda)$ has coordinates $V^\mu=\frac{dx^\mu}{d\lambda}$. He claims that, under a Lorentz transformation $\Lambda$, we have $V^\mu\to V^{\mu'}={\Lambda^{\mu'}}_\nu V^\nu$. The components of $x$ transform that way, and since $\Lambda$ is constant along the path, differentiating $x^{\mu'}={\Lambda^{\mu'}}_\nu x^\nu$ with respect to $\lambda$ gives the claim. The vector $V$ is unchanged by the Lorentz transformation; only its components transform.
Under Lorentz transform $\Lambda$, a set of basis vectors transform as $$e_{(\nu')}={\Lambda^\mu}_{\nu'}e_{(\mu)}$$ where ${\Lambda^\mu}_{\nu'}$ is the inverse of ${\Lambda^{\nu'}}_\mu$ in the sense that $${\Lambda^\mu}_{\nu'}{\Lambda^{\nu'}}_\rho=\delta^\mu_\rho.$$ Thus, the basis vectors transform according to the inverse of the transform. This is worked out in the book. The takeaway point is that things with lower indices transform as the inverse, so that an upper-lower contraction will be Lorentz-invariant.
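To see the takeaway in numbers, here is a sketch in the $(t,x)$ plane only. The helper names and the sample components are my own, and the example relies on the fact that the inverse of a boost with rapidity $\phi$ is the boost with rapidity $-\phi$.

```python
import math

def boost_matrix(phi):
    """2x2 boost acting on (t, x) components."""
    ch, sh = math.cosh(phi), math.sinh(phi)
    return [[ch, -sh], [-sh, ch]]

def transform_upper(L, V):
    """V^{mu'} = Lambda^{mu'}_nu V^nu, with L[m][n] = Lambda^{mu'}_nu."""
    return [sum(L[m][n] * V[n] for n in range(2)) for m in range(2)]

def transform_lower(L_inv, w):
    """w_{mu'} = Lambda^nu_{mu'} w_nu, with L_inv[n][m] = Lambda^nu_{mu'}."""
    return [sum(L_inv[n][m] * w[n] for n in range(2)) for m in range(2)]

def contract(w, V):
    """Upper-lower contraction w_mu V^mu."""
    return sum(wi * vi for wi, vi in zip(w, V))

phi = 0.9
L = boost_matrix(phi)
L_inv = boost_matrix(-phi)  # inverse boost

V = [2.0, 0.5]    # upper-index components (arbitrary sample values)
w = [-1.0, 3.0]   # lower-index components (arbitrary sample values)

before = contract(w, V)
after = contract(transform_lower(L_inv, w), transform_upper(L, V))
```

The individual components change under the boost, but `before` and `after` agree up to rounding, which is the Lorentz invariance of an upper-lower contraction.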
I am switching books at this point. I find the discussions in Carroll to be largely devoid of content. Everything is taken on faith or defined with no motivation; there are no justifications given for anything. This is a book written by somebody who does not have a clear understanding of the subject.