# Processing and memory requirements for a 3G WCDMA basestation baseband solution

Daniel Wiklund

Dept. of Electrical Engineering, Linköping University, S-581 83 Linköping, Sweden, danwi@isy.liu.se

## ABSTRACT

The WCDMA standard is the main system used for third generation mobile communications in Europe. The basestation is the central node in the radio access network. The cost of a basestation is a very important factor in the deployment of the 3G network. This cost can be lowered by merging functionality of the basestation into fewer components. In order to get an appropriate system level model of the 3G WCDMA basestation baseband part, a processing task survey has been done. This survey has been conducted through analysis of the standard documents and published research papers. The results from the survey show that the baseband part of a 128 channel basestation may be possible to implement on a single chip.

# 1. INTRODUCTION

The main standard for third generation mobile communications in Europe is wideband CDMA (WCDMA). The radio network for WCDMA uses cells that are served by basestations. A basestation is connected to a radio network controller (RNC). Each RNC is connected to several basestations. The RNCs are responsible for higher level control, such as handover between basestations.

The central component in the radio network is the basestation. The basestation acts as the cell controller and as the link between the terminals (user equipment, UE) and the network. The architecture of and requirements on the terminals have been investigated by many researchers. This paper, however, focuses on the processing tasks in the baseband part of the basestation.

To build a good system model of the baseband processing in the basestation, a survey of the processing flows and the tasks involved is necessary. The starting point for this survey is the 25 series of the Third Generation Partnership Project (3GPP) technical specifications [1, 2, 3, 4, 5, 6]. Another very good reference on the 3G WCDMA system is the book edited by Holma and Toskala [7].

The radio traffic uses orthogonal code sequences to separate the simultaneously transmitted channels that share the common RF spectrum. The spreading is done using a combination of a scrambling code and a channelization code. The channelization codes allow multiple physical channels to be multiplexed on a single radio channel. The scrambling code in turn separates different radio channels that share the same spectrum. All UEs in a cell use different scrambling codes so that they can be separated at the basestation. The basestation in turn uses only a single scrambling code so that the UEs can separate it from other basestations.
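The separation by channelization codes can be illustrated with a minimal sketch. It builds OVSF codes by the usual doubling rule, spreads two channels, sums them, and recovers each by correlation. The spreading factor of four, the symbol values, and the function names are assumptions made for the illustration, and the scrambling step is left out.

```python
def ovsf_codes(sf):
    """Return the sf OVSF codes of length sf (sf must be a power of two)."""
    if sf < 1 or sf & (sf - 1):
        raise ValueError("spreading factor must be a power of two")
    codes = [[1]]
    while len(codes[0]) < sf:
        # Each parent code c spawns two children: (c, c) and (c, -c).
        codes = [child for c in codes
                 for child in (c + c, c + [-x for x in c])]
    return codes


def spread(symbols, code):
    """Multiply every symbol by the whole code, giving one chip per code element."""
    return [s * chip for s in symbols for chip in code]


def despread(chips, code):
    """Correlate each code-length block of chips with the code."""
    n = len(code)
    return [sum(x * c for x, c in zip(chips[i:i + n], code)) // n
            for i in range(0, len(chips), n)]


codes = ovsf_codes(4)
a = [1, -1, 1]     # symbols on channel A (hypothetical data)
b = [-1, -1, 1]    # symbols on channel B (hypothetical data)

# Both channels occupy the same chips once they are summed.
tx = [x + y for x, y in zip(spread(a, codes[1]), spread(b, codes[2]))]

print(despread(tx, codes[1]))   # recovers a
print(despread(tx, codes[2]))   # recovers b
```

Because the two codes are orthogonal, the correlation with one code cancels the contribution of the other channel completely in this noise-free setting.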

This paper is a summary of a processing task survey. It also investigates the feasibility of a one-chip baseband solution for a WCDMA basestation. Section 2 gives a short overview of the WCDMA standard. Section 3 discusses the uplink processing and section 4 the downlink. Some implementation aspects are discussed in section 5. Finally, section 6 draws some conclusions from the survey.

# 2. WCDMA OVERVIEW

The WCDMA standard is very complex and consists of thousands of pages. The main specification highlights in the baseband part are:

- Radio channel spacing is 5 MHz.
- Duplexing mode is either FDD or TDD. This paper will only discuss the FDD mode.
- A radio frame is 10 ms, corresponding to 38400 chips. Each frame is divided into 15 slots (2560 chips).
- Chip rate is 3.84 Mchip/s.
- Spreading is done through a combination of channelization and scrambling codes. The channelization codes are from the orthogonal variable spreading factor (OVSF) family. The scrambling codes are truncated long Gold codes.
- All UEs use different scrambling codes for separation in the uplink.
- A basestation uses a single scrambling code for the downlink.
- Fast power control is employed at radio frame slot level (i.e. 1500 Hz). This keeps the power received at the basestation similar for all the UEs in a cell. The benefit is that the near-far problem experienced when sharing spectrum is mitigated.

Fig. 1. WCDMA uplink reception.

Daniel Wiklund is a Ph.D. student at the division of Computer Engineering, Dept of Electrical Engineering, Linköping University. This work is funded by the STRINGENT electronics center.
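The frame, slot, and power control figures in the list above all follow from the chip rate and the frame length. The short calculation below is only a consistency check of those numbers; the constant names are chosen for the illustration.

```python
CHIP_RATE = 3_840_000     # chips per second
FRAME_MS = 10             # radio frame length in milliseconds
SLOTS_PER_FRAME = 15

chips_per_frame = CHIP_RATE * FRAME_MS // 1000
chips_per_slot = chips_per_frame // SLOTS_PER_FRAME
slot_rate_hz = SLOTS_PER_FRAME * 1000 // FRAME_MS   # power control update rate

print(chips_per_frame, chips_per_slot, slot_rate_hz)   # 38400 2560 1500
```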

The specifications for the uplink (i.e. UE to basestation) and the downlink differ on some points. For example, the uplink uses BPSK or QPSK modulation while the downlink uses QPSK or 16QAM. The result is a possible doubling of the number of bits transmitted in the downlink compared to the uplink.

# 3. WCDMA UPLINK

The worst-case data rate in the WCDMA uplink is 2048 kbit/s. This corresponds to three data channels and one control channel code-multiplexed in the uplink. The data processing flow for the uplink is shown in Fig. 1. Table 1 shows a summary of the processing tasks and an estimate of the processing operations and memory consumption of each task. The WCDMA standard contains an overview of the processing tasks and bit rates for the uplink in TS25.104 [2].

## 3.1. Uplink reception tasks - Chip rate

**Table 1**. Processing requirements summary for the WCDMA uplink

| Task                   | Ops (op/s) | Mem (bit) | Channels |
|------------------------|------------|-----------|----------|
| Multipath search       | 1500 M     | 1000      | 1        |
| Rake                   | 220 M      | 100       | 3        |
| Chip rate total        | 1720 M     | 1100      |          |
| 2nd deinterleave       | 0          | 120 k     | 1        |
| Rate matching          | 1 M        | 125 k     | 1        |
| Radio frame reassembly | 0          | 125 k     | 1        |
| 1st deinterleave       | 0          | 1000 k    | 1        |
| Turbo                  | 1104 M     | 1000 k    | 1        |
| CRC16                  | 128 k      | 330 k     | 40       |
| Viterbi                | 1 M        | 1 k       | 1        |
| CRC12                  | 200        | 1 k       | 1        |
| Data rate total        | 1106 M     | 2702 k    |          |

The incoming code-multiplexed signal is at the chip rate of 3.84 Mchip/s. The pilots in the uplink are used in the multipath search filter to get the coefficients for the Rake receiver. This multipath filter can be implemented as an adaptive FIR filter. If the signal is oversampled four times and the equivalent FIR filter has 100 taps, the total processing requirements will be around 1.5 GMAC/s.
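The 1.5 GMAC/s estimate for the multipath search filter can be reproduced directly, assuming one multiply-accumulate per tap for every input sample. The constant names are chosen for the illustration.

```python
CHIP_RATE = 3.84e6    # chips per second
OVERSAMPLING = 4      # samples per chip
FIR_TAPS = 100        # taps in the equivalent FIR filter

# One multiply-accumulate per tap for every incoming sample.
macs_per_second = CHIP_RATE * OVERSAMPLING * FIR_TAPS

print(f"{macs_per_second / 1e9:.3f} GMAC/s")   # 1.536 GMAC/s
```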

After the incoming multipaths have been identified, the code demultiplexing and despreading are done in the Rake receivers. The only way to achieve high throughput in the multipath combining is to use an ASIC implementation of the Rake and multipath detection subsystem.

The different subchannels are separated, despread, and demapped after the Rake [5]. Both the multipath search (channel estimation) filter and the Rake processing must be done in dedicated hardware [8].

Only the Rake receivers and multipath combining are dependent on the actual data rate in the channel. The multipath search filter is only dependent on the pilot channel(s) and is thus completely isolated from the data rate.

## 3.2. Uplink reception tasks - Data rate

The data from the Rake receiver has to be stored in order to do the 2nd deinterleaving. This deinterleaving has a window of one 10 ms frame. If the deinterleaving is done with dedicated address generators, it does not consume any operations except for memory writes and reads. The dedicated traffic channel (DTCH) and the dedicated control channel (DCCH) can be separated after the 2nd deinterleaving.
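The idea of deinterleaving through address generation alone can be sketched for a generic block interleaver: data is written row by row, the columns are permuted, and the result is read column by column. The column permutation below is hypothetical and much shorter than the pattern defined in 25.212 [4]; the function names are also assumptions made for the illustration.

```python
def interleave_addresses(n, column_order):
    """Yield source addresses in the order a block interleaver reads them.

    Data is written row by row into a matrix with len(column_order)
    columns and read out column by column in the permuted column order.
    """
    cols = len(column_order)
    rows = -(-n // cols)              # ceiling division
    for col in column_order:
        for row in range(rows):
            addr = row * cols + col
            if addr < n:              # skip the padding in the last row
                yield addr


def interleave(data, column_order):
    return [data[a] for a in interleave_addresses(len(data), column_order)]


def deinterleave(received, column_order):
    out = [None] * len(received)
    addresses = interleave_addresses(len(received), column_order)
    # One memory read and one memory write per element; the data itself
    # is never touched by an arithmetic operation.
    for value, addr in zip(received, addresses):
        out[addr] = value
    return out


COLUMN_ORDER = [0, 2, 1, 3]           # hypothetical permutation
frame = list(range(10))
mixed = interleave(frame, COLUMN_ORDER)
print(mixed)
print(deinterleave(mixed, COLUMN_ORDER) == frame)
```

In hardware the generator loop corresponds to a pair of counters and a small permutation table, which is why the task is counted as zero operations in Table 1.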

The next step is the rate matching. Depending on the load conditions, the rate matching is either puncturing or repetition [4]. The restoration of received data in the case of repetition is trivial. Retrieval of the data in the case of puncturing requires some processing.
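One common way to undo the two rate matching modes on soft values is sketched below. Punctured positions are filled with a neutral value of zero so that the decoder treats them as unknown, and repeated bits are combined by summation. The patterns are hypothetical inputs here; the actual pattern is determined by the algorithm in 25.212 [4], which is not reproduced.

```python
def undo_puncturing(received, kept):
    """Re-insert erasures (0.0) at the positions that were punctured.

    `kept` holds one boolean per coded bit before puncturing;
    True means the bit was transmitted.
    """
    it = iter(received)
    return [next(it) if k else 0.0 for k in kept]


def undo_repetition(received, counts):
    """Combine repeated soft values; counts[i] is how often coded bit i was sent."""
    out, pos = [], 0
    for c in counts:
        out.append(sum(received[pos:pos + c]))
        pos += c
    return out


# Hypothetical soft values and patterns.
print(undo_puncturing([0.9, -1.1, 0.7], [True, False, True, True]))
print(undo_repetition([1.0, 0.5, -2.0, -1.0, -0.5], [2, 1, 2]))
```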

The radio frame reassembly step is a trivial concatenation operation. This step is done using eight 10 ms frames. The 1st deinterleaving can be done in a fashion similar to the 2nd deinterleaving.

The DCCH has a bitrate of 2.4 kbit/s and uses rate 1/3 convolutional coding that is decoded with the Viterbi algorithm. Each bit entering the Viterbi decoder is expensive to process, but the low data rate gives an overall low processing rate. The CRC is best done in dedicated hardware, but since the data rate is low, it can just as well be implemented in a general purpose processor.

The DTCH bitrate is 2048 kbit/s coded with rate 1/3 Turbo coding. This decoding consumes significant processing power. The figure given in Table 1 is based on a general purpose DSP processor [9, 10]. A custom solution can improve on this considerably but will still require a large number of operations.

The data rate processing will scale roughly linearly with the data rate. This is because the largest part of the data rate processing is due to the Turbo decoding. Only the control part (i.e. Viterbi and CRC12) is reasonably independent of the data rate.

## 3.3. Memory costs

The memory consumption in the multipath search and Rake is very small. This is because these stages must run at the sample rate, where no buffering can be accepted.

The main memory costs involved in the uplink reception are found in the deinterleaving, rate matching, and reassembly stages. Each stage will require twice the amount of storage compared to the block size of the processed data. One buffer is used for saving the results from the previous processing stage while the other buffer is used for the processing in this stage.

The highest memory requirements are in the 1st deinterleave and Turbo stages. This is because these stages work on eight radio frames at a time.

The memory requirements will scale roughly linearly with the data rate in the uplink.

# 4. WCDMA DOWNLINK

The downlink is capable of transmitting 2304 kbit/s with three channels multiplexed using OVSF codes and 16QAM modulation. The spreading factor for this highest data rate is four.

The WCDMA standard contains an overview of the processing tasks and bit rates for the downlink in TS25.101 [1]. The overview of the downlink is limited to the 384 kbit/s case.

Fig. 2. WCDMA downlink transmission.

**Table 2**. Processing requirements summary for the WCDMA downlink

| Task                     | Ops (op/s) | Mem (bit) | Channels |
|--------------------------|------------|-----------|----------|
| CRC12                    | 210        | 1 k       | 1        |
| Viterbi                  | 2.8 k      | 1 k       | 1        |
| CRC16                    | 144 k      | 47 k      | 3        |
| Turbo                    | 2314 k     | 140 k     | 1        |
| Rate matching            | 1000 k     | 110 k     | 1        |
| 1st interleave           | 0          | 110 k     | 1        |
| Radio frame segmentation | 0          | 110 k     | 1        |
| 2nd interleave           | 0          | 110 k     | 1        |
| Data rate total          | 3461 k     | 630 k     |          |
| Spreading/mapping        | 150 M      | 0         | 1        |
| Chip rate total          | 150 M      | 0         |          |

## 4.1. Downlink transmission tasks - Data rate

The first step in the downlink transmission is to add the CRC values to the data and control channels. As with the uplink channel, CRC12 is used for control and CRC16 is used for data. The CRC calculations have exactly the same complexity for sender and receiver, so the values from the uplink can be reused with adjustments for the rate difference.
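That the CRC costs the same at both ends of the link can be seen in a minimal bit-serial sketch, where one routine both generates the parity bits and checks a received block. The generator polynomials are assumed to be D^16 + D^12 + D^5 + 1 and D^12 + D^11 + D^3 + D^2 + D + 1 and should be checked against 25.212 [4]; the order in which that document attaches the parity bits is not reproduced, and the transport block is hypothetical.

```python
def crc_remainder(bits, poly, width):
    """Return the CRC register after shifting in `bits` (a list of 0/1).

    `poly` holds the generator polynomial without its leading term.
    The register starts at zero and no final XOR is applied.
    """
    reg = 0
    mask = (1 << width) - 1
    for bit in bits:
        feedback = ((reg >> (width - 1)) & 1) ^ bit
        reg = (reg << 1) & mask
        if feedback:
            reg ^= poly
    return reg


def attach_crc(bits, poly, width):
    """Append the parity bits, most significant first."""
    crc = crc_remainder(bits, poly, width)
    parity = [(crc >> i) & 1 for i in range(width - 1, -1, -1)]
    return bits + parity


CRC16_POLY = 0x1021   # D^16 + D^12 + D^5 + 1 (assumed)
CRC12_POLY = 0x80F    # D^12 + D^11 + D^3 + D^2 + D + 1 (assumed)

data = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]   # hypothetical transport block
coded = attach_crc(data, CRC16_POLY, 16)

# The receiver runs the same routine over data plus parity;
# a zero remainder means that no error was detected.
print(crc_remainder(coded, CRC16_POLY, 16) == 0)
```

Each input bit costs one shift and at most one XOR, which is why the operation counts for the CRCs in the two tables differ only through the bit rates.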

Both Turbo and Viterbi coding are trivial tasks with appropriate hardware (as defined in section 4.2.3 of the 25.212 document [4]). If the Viterbi encoder is implemented in a general purpose DSP processor it can be compared to three parallel eight-tap FIR filters. The Turbo encoder is comparable to two parallel three-tap IIR filters.
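The comparison with parallel FIR filters can be made concrete with a sketch of a rate 1/3 convolutional encoder, where each output stream is a binary FIR filter over the same shift register. The generator polynomials 557, 663, and 711 (octal) and the constraint length of nine, i.e. eight delay elements, are assumptions here and should be checked against 25.212 [4]. Tail bits are not handled.

```python
def taps_from_octal(octal_text, constraint_length):
    """Convert an octal generator into a tap list, undelayed input first."""
    value = int(octal_text, 8)
    return [(value >> (constraint_length - 1 - i)) & 1
            for i in range(constraint_length)]


def conv_encode(bits, generators):
    """Encode `bits`; every generator acts as a binary FIR filter."""
    k = len(generators[0])
    window = [0] * k                  # window[0] is the newest input bit
    out = []
    for bit in bits:
        window = [bit] + window[:-1]
        for taps in generators:
            # Modulo-2 sum of the tapped register positions.
            out.append(sum(t & w for t, w in zip(taps, window)) % 2)
    return out


# Assumed generators for the rate 1/3 code.
GENERATORS = [taps_from_octal(g, 9) for g in ("557", "663", "711")]

# An impulse at the input reads out the taps of each filter in turn.
impulse_response = conv_encode([1] + [0] * 8, GENERATORS)
print(impulse_response[0::3])
```

Three output bits are produced per input bit, which is the rate 1/3 expansion that the rate matching and interleaving stages then operate on.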

Rate matching is comparable in complexity to the uplink reception rate matching. Interleaving and segmentation are trivial tasks for the transmission as well. These can be done with configurable address generation hardware just as for the reception case above.

## 4.2. Downlink transmission tasks - Chip rate

Mapping is a simple serial to 2/4-bit parallel conversion for each downlink channel. Half of the parallelized bits are used as the in-phase value and the other half are used for the quadrature value. 2-bit parallelization is used for QPSK and 4-bit parallelization for 16QAM. Each channel is then spread using the channelization and scrambling codes.
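The mapping step can be sketched as follows. The bit-to-amplitude assignment (0 maps to a positive level, 1 to a negative level, with amplitudes 1 and 3 for 16QAM) and the choice of alternating bits between the in-phase and quadrature branches are assumptions made for the illustration, not the constellation defined in the standard.

```python
def map_symbols(bits, bits_per_symbol):
    """Serial-to-parallel conversion followed by I/Q mapping.

    bits_per_symbol is 2 for QPSK or 4 for 16QAM.
    """
    if bits_per_symbol not in (2, 4):
        raise ValueError("only QPSK (2) and 16QAM (4) are supported")
    if len(bits) % bits_per_symbol:
        raise ValueError("bit count must be a multiple of bits_per_symbol")

    def level(group):
        # First bit selects the sign, an optional second bit the magnitude.
        sign = -1 if group[0] else 1
        magnitude = 3 if len(group) == 2 and group[1] else 1
        return sign * magnitude

    symbols = []
    for i in range(0, len(bits), bits_per_symbol):
        word = bits[i:i + bits_per_symbol]
        # Half of the parallel bits form I, the other half form Q.
        symbols.append(complex(level(word[0::2]), level(word[1::2])))
    return symbols


print(map_symbols([0, 0, 1, 0, 0, 1, 1, 1], 2))
print(map_symbols([0, 0, 0, 0, 1, 0, 1, 1], 4))
```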

The final step is to sum all the channels (including the dedicated downlink control channels) using a weighted sum.

The chip rate processing requires $3N_{ch}$ complex multiplications and $N_{ch} - 1$ complex additions per chip. $N_{ch}$ is the total number of physical channels and will typically be 10 for the downlink. This number can be somewhat lower if the processing is optimized.
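With the typical value of ten physical channels, the operation count per chip leads to the chip rate figure in Table 2. The calculation below is only a check of that arithmetic, and the function name is chosen for the illustration.

```python
CHIP_RATE = 3_840_000   # chips per second


def chip_rate_ops(n_ch):
    """Complex operations per second for spreading and summing n_ch channels."""
    multiplications = 3 * n_ch
    additions = n_ch - 1
    return (multiplications + additions) * CHIP_RATE


ops = chip_rate_ops(10)
print(f"{ops / 1e6:.1f} Mop/s")   # 149.8 Mop/s, i.e. around 150 M
```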

## 4.3. Memory costs

The memory costs in the downlink processing are significantly lower than for the uplink. This is mainly due to the smaller window that the tasks operate on. The downlink DTCH processing is restricted to one radio frame of 10 ms.

# 5. IMPLEMENTATION ASPECTS

The uplink reception is by far the most processing intensive part. Dedicated hardware is necessary to meet the processing requirements. This means that multipath search, Rake, Turbo decoding, and CRC16 should be done in dedicated (configurable) hardware because of the heavy computing demands. Special configurable address generators should be used to ease the deinterleaving and radio frame reassembly tasks. All other uplink tasks can be performed in general purpose DSPs.

Though the downlink processing is much less intense, the CRCs, Turbo, and Viterbi coding should preferably be done in configurable dedicated hardware. Interleaving and segmentation can be supported by configurable address generators as in the uplink. The most efficient way of implementing the chip rate processing is to use dedicated hardware, but the requirements are sufficiently low to allow an implementation based on a general purpose DSP.

The memory requirements are in the order of 3 Mbit per channel for full speed reception. It must be considered a rare situation that all UEs will transmit at full rate simultaneously so the real figure will be significantly less on average.

# 6. CONCLUSIONS

The processing and memory requirements for a 3G WCDMA basestation have been analyzed under the assumption that full speed transmission occurs. This has shown that the total processing is in the order of 3 Gop/s per channel for reception and 155 Mop/s for transmission. Assuming 128 channels this will give around 380 Gop/s for the basestation.
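The aggregate processing figure can be retraced from the totals in Tables 1 and 2. The sketch below uses the unrounded table values, which is one way to arrive at the quoted number of around 380 Gop/s; the constant names are chosen for the illustration.

```python
# Totals from Table 1 and Table 2, in kop/s per channel.
UPLINK_CHIP_KOPS = 1_720_000
UPLINK_DATA_KOPS = 1_106_000
DOWNLINK_DATA_KOPS = 3_461
DOWNLINK_CHIP_KOPS = 150_000
CHANNELS = 128

rx_kops = UPLINK_CHIP_KOPS + UPLINK_DATA_KOPS        # reception
tx_kops = DOWNLINK_DATA_KOPS + DOWNLINK_CHIP_KOPS    # transmission
total_gops = (rx_kops + tx_kops) * CHANNELS / 1e6

print(f"reception    {rx_kops / 1e6:.2f} Gop/s per channel")
print(f"transmission {tx_kops / 1e3:.1f} Mop/s per channel")
print(f"{CHANNELS} channels {total_gops:.0f} Gop/s")
```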

At the same time, a total of 2.7 Mbit memory is required per uplink channel and 0.63 Mbit for the downlink. This sums up to around 45 MBytes for 128 channels.

As stated earlier, the situation where all uplinks run at maximum rate must be considered rare. If the data rate is only half, the processing requirements for 128 channels drop to around 290 Gop/s and the memory consumption to 22 MBytes.

With these figures it is reasonable to assume that the baseband part of the basestation can be implemented in a single chip. This of course assumes that other issues, e.g. power distribution and dissipation, can be handled.

# 7. REFERENCES

- [1] 3GPP, TS25.101, v6.0.0 edition, Dec. 2003.
- [2] 3GPP, TS25.104, v6.0.0 edition, Dec. 2003.
- [3] 3GPP, TS25.211, v6.0.0 edition, Dec. 2003.
- [4] 3GPP, TS25.212, v6.0.0 edition, Dec. 2003.
- [5] 3GPP, TS25.213, v6.0.0 edition, Dec. 2003.
- [6] 3GPP, TS25.214, v6.0.0 edition, Dec. 2003.
- [7] Harri Holma and Antti Toskala, Eds., WCDMA for UMTS: Radio access for third generation mobile communications, Wiley, 2000, ISBN 0-471-72051-8.
- [8] Lasse Harju, Mika Kuulusa, and Jari Nurmi, "Flexible implementations of a WCDMA RAKE receiver," in *IEEE Workshop on Signal Processing Systems (SIPS)*, 2001.
- [9] Xiao-Jun Zeng and Zhi-Liang Hong, "Design and implementation of a turbo decoder for 3G W-CDMA systems," *IEEE Transactions on Consumer Electronics*, vol. 48, no. 2, pp. 284–291, 2002.
- [10] F. Kienle, H. Michel, F. Gilbert, and N. Wehn, "Efficient MAP-algorithm implementation on programmable architectures," in *Advances in Radio Science*, 2003, number 1, pp. 259–263.