Oct 12, 2012
Eugeny Brychkov (RU)
The dream of every computer geek is technical and technological excellence: more memory, a larger hard disk, and higher speed. We stated on the GR8BIT specifications page that GR8BIT is expected to work at 4-20MHz; here we will see whether this promise is kept, and whether it is achievable.
Warning: an accelerated GR8BIT will not add value if you use the machine for gaming. All MSX games work fine at the standard 3.58MHz clock speed. Moreover, above a certain clock frequency an accelerated GR8BIT is not guaranteed to display graphics 100% correctly.
An accelerated GR8BIT adds real value in terms of computing power. The goal is not to make GR8BIT a supercomputer; this document describes a technological experiment intended to show the flexibility of the MSX architecture and to provide another piece of useful experience.
Special thanks to Alwin Henseler (NL) for sharing his experience with his Super-Turbo, for his advice on circuit design and for reviewing this KB article.
When we talk about the computer's speed, we need to consider the local and global clocks used within it. The clock signal is fed to the CPU, to the GR8BUS and all devices attached to it, to the slowdown circuit and to the slot board (and thus to devices in slots 1 and 2). By default there is one global clock signal, supplied by the VDP to the GR8BUS and distributed throughout the system (CPU board's SJ1 and SJ7 are closed, and video board's SJ1 is closed). It runs at 3.58MHz; it clocks the slowdown circuit (wait state counting and M1 wait state processing) and serves as the reference frequency for the PSG on the I/O board, the SCC in the slot board or converter board, and other devices.
If we change the GR8BUS clock speed, it will not affect the PSG, as the I/O board is designed to hold a dedicated clock generation circuit, but other devices will be affected.
The CPU board is also designed to hold a dedicated clock generation circuit, which can be used to supply the global clock signal to the GR8BUS (if CPU board's IC1 and related components are installed, and video board's SJ1 is open). You can vary the system speed by changing the value of crystal XTAL1. However, increasing the speed shortens the clock period relative to the timing specifications of individual chips, and those chips may stop operating properly.
Let's consider the most vital case: the video display processor (VDP), which is required for proper system operation and for the dialogue between you and your GR8BIT.
Figure 1. Example of the waveforms (scales are not kept)
The VDP is driven by the IORQ signal (fig. 1). Each access cycle has a total length (taccess_cycle), consisting of an active phase (tactive) and an inactive phase (tinactive). To simplify the situation, let's assume that two consecutive IORQ cycles both address the VDP.
During the active phase, the VDP samples data from the data bus or puts its own data onto the data bus. The VDP works at a specific speed, and if tactive is shorter than the VDP can handle, it will not sample correct data from the data bus (on a CPU write) and will be too late supplying its data to the CPU (on a CPU read). According to the VDP datasheet, the minimum for this time is 186 nanoseconds and the typical value is 700ns (page 121).
The timing of the inactive phase is also vitally important, because the VDP processes information during this period. It may need to read data from the video memory or perform a hardware-accelerated operation. The same datasheet shows that the minimum inactive time varies between 2µs and 8µs (depending on the VDP command phase and the operation requested). At a CPU clock speed of 3.58MHz, 8µs equals about 28.6 T-cycles. For comparison, one NOP instruction takes 4 T-cycles, and two EX (SP),HL instructions take 38 T-cycles.
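To see how the same 2µs-8µs window grows in T-cycles as the clock goes up, here is a minimal Python sketch. The function name is ours, and the 10MHz and 12MHz values are simply the crystal values discussed later in this article, used here for illustration only.

```python
def t_cycles(duration_us: float, clock_mhz: float) -> float:
    """Number of CPU T-cycles that elapse during duration_us at clock_mhz.

    One T-cycle lasts 1/clock_mhz microseconds, so the count is simply
    duration multiplied by frequency.
    """
    return duration_us * clock_mhz


# 3.58 MHz is the standard MSX clock; 10 and 12 MHz are illustrative values.
for clock_mhz in (3.58, 10.0, 12.0):
    shortest = t_cycles(2.0, clock_mhz)
    longest = t_cycles(8.0, clock_mhz)
    print(f"{clock_mhz:5.2f} MHz: 2 us = {shortest:5.1f} T-cycles, "
          f"8 us = {longest:5.1f} T-cycles")
```

The faster the clock, the more T-cycles the program has to spend between two VDP accesses to stay within the datasheet limits.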
Therefore, increasing the GR8BIT system clock frequency shortens the T-cycle, and with it the effective time available to the VDP for data sampling and processing; at some critical value the VDP stops working correctly. At first this shows up as small artifacts on the screen, and as the speed increases further the machine stops working properly.
With GR8BIT you can lengthen the active phase of the control signal using the slowdown circuit, but it is limited to 15 additional wait states and cannot lengthen the inactive phase.
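As a rough illustration of what the 15 wait-state limit means, the sketch below estimates how many extra wait states would be needed to keep the active phase above a target length. It rests on assumptions that are not stated in this article: that each wait state adds exactly one T-cycle to the active phase, and that without extra wait states IORQ is active for about 2.5 T-cycles (`base_cycles`, our estimate for a Z80 I/O cycle with its one automatic wait state). Treat the numbers as an estimate only, not a measurement.

```python
import math

MAX_WAIT_STATES = 15  # limit of the slowdown circuit, as stated in the text


def wait_states_needed(target_ns, clock_mhz, base_cycles=2.5):
    """Smallest number of extra wait states giving an active phase >= target_ns.

    base_cycles is an ASSUMPTION (length of the active phase without
    extra wait states, in T-cycles). Returns None if more than
    MAX_WAIT_STATES would be required.
    """
    t_cycle_ns = 1000.0 / clock_mhz
    missing_ns = target_ns - base_cycles * t_cycle_ns
    if missing_ns <= 0:
        return 0
    needed = math.ceil(missing_ns / t_cycle_ns)
    return needed if needed <= MAX_WAIT_STATES else None


# 186 ns is the datasheet minimum, 700 ns the typical value quoted above.
for clock_mhz in (10.0, 12.0):
    print(clock_mhz, "MHz:",
          wait_states_needed(186, clock_mhz), "wait states for 186 ns,",
          wait_states_needed(700, clock_mhz), "wait states for 700 ns")
```

Note that this only addresses the active phase; no wait-state setting helps with the inactive phase.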
This option allows you to slightly increase the processing speed of the machine. No major hardware or design modifications are required. Here are the steps to achieve the result:
The drawback of this implementation is that the global clock frequency changes, which changes the pitch of PSG and SCC music.
Your homework: try different crystals to find the crystal value at which you notice artifacts on the screen or GR8BIT fails to start, while varying the number of wait states inserted using the main board's SW1 switch. Also make sure that this switch has position #5 turned off (for the modified slowdown circuit: turned on) and position #6 turned on (so that additional wait states also apply to PPI and RTC access cycles).
This option has a right to exist; however, after implementing it and testing the system we decided to discard it completely, because it was almost impossible to play games that make heavy use of VDP and PSG resources: gameplay became jerky, including video effects and music/audio effects.
The idea is that the system clock stops for a predefined number of clock cycles when specific conditions are met. We used IORQ (I/O port access) deactivation as the condition, in order to lengthen the inactive period (the active period can still be varied using the slowdown circuit of the main board).
If you would like to know more about this option, or are interested in the schematic, please let us know. It requires 3 additional chips to be installed on the CPU board, and some other very minor modifications.
This option is more complicated to implement, but it significantly increases the system's computing power without the adverse effects exhibited by option #2. In our tests the maximum CPU clock frequency we achieved with proper operation of the system (and of the VDP in particular) was 12MHz. With a 14.318MHz crystal the system did not start at all, even at the maximum wait-state and brake settings available in this design version. With the 10MHz Z80 CPU supplied with your GR8BIT you can use a 10MHz crystal; with a 20MHz CPU we recommend a 12MHz crystal.
After the modification you will have the GR8BIT storage subsystem and most Konami games working (with proper sound). In gaming mode (with massive VDP access) the average CPU speed is braked to 3.58MHz; in computational mode (when the CPU does not perform port I/O) the CPU runs at full speed, set by the crystal on the CPU board.
We used a simple condition to trigger CPU speed braking: activation of the IORQ line. As soon as it activates, the system starts taking its clock signal from another, "low speed" clock source, which ensures proper operation of external and internal devices such as the VDP (ports 98h-9bh), RTC (ports 0b4h-0b5h), PPI (ports 0a8h-0abh), GR8BIT storage subsystem (ports 0a4h-0a5h) and many others. As you will see from the explanation below, you are free to choose other (additional) signals to trigger braking, and you have some options for tuning how the braking system works and for how long the CPU clock is braked.
Let's look at fig. 2. There are six input signals for the braking circuit in this implementation -
and one output signal, CPUCLK, which clocks the CPU (and some other circuits if you choose so - read on).
Here is a description of the functions of the logic elements in the schematic:
Figure 2. Schematic of the braking circuit. Input signals are on the left (2+4), and output is on the right (1)
After reset, the exact behavior of IC5/IC4 depends on how IC4 initializes. If its output connected to BRAKE initializes high, IC5B's output is set to 1, the counter receives no clock signal, and the system starts at the CLKHI clock. If the output connected to BRAKE initializes low, the system starts at the CLKLO clock, and the selected clock is passed through IC5B to the counter. As soon as the counter reaches the state in which the output connected to BRAKE becomes high, BRAKE deactivates, the clock is no longer supplied to the counter, and the system runs at CLKHI speed. Thus the system reaches the stable state of running at CLKHI speed within at most 255 of IC4A's clock cycles (a full loop of 256 states minus one).
The state changes as soon as IORQ (or any other triggering ACTx signal) goes low. This immediately clears the counter by setting the CRES signal high, which also resets the BRAKE signal low, so the system switches to CLKLO from then on. This "counter reset" state lasts until the respective triggering signal is released (becomes inactive) and the CRES signal goes low. From this moment the counter starts to count (as the chosen clock signal is passed to IC4A through IC5B).
If another triggering signal activation occurs, the IC4 counter is reset again and counting restarts.
If no further triggering signal occurs before the IC4 counter reaches the value defined by the connection of the BRAKE line, the BRAKE signal goes high and the clock switching circuit starts supplying CLKHI to the CPUCLK line. At the same time IC5B's input 9 becomes low and disables further counting (IC5B will always output a high level to the IC4A clock input). Therefore, when the counter reaches the predefined number of cycles, it stops and the system switches to high speed. This state is stable, and changes only when any of the triggering signals activates again.
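The behavior of the brake counter described above can be sketched as a small behavioral model. This is not a gate-level simulation: the class and function names are ours, the trigger is sampled once per counter clock (in the real circuit the reset is immediate), and the model simply starts in the braked state.

```python
class BrakeCounter:
    """Behavioral model of the IC4/IC5 brake counter (simplified)."""

    def __init__(self, brake_count: int = 128):
        self.brake_count = brake_count  # counter value wired to BRAKE
        self.count = 0
        self.brake_high = False         # BRAKE high means "run at CLKHI"

    def tick(self, trigger_active: bool) -> bool:
        """Advance by one counter clock; return True if BRAKE is high."""
        if trigger_active:
            # IORQ (or another ACTx) active: clear counter, force CLKLO.
            self.count = 0
            self.brake_high = False
        elif not self.brake_high:
            # Trigger released: count until the chosen value is reached.
            self.count += 1
            if self.count >= self.brake_count:
                self.brake_high = True  # counting stops, CLKHI selected
        return self.brake_high


def ticks_until_full_speed(brake: BrakeCounter) -> int:
    """Count counter clocks without any trigger until BRAKE goes high."""
    ticks = 0
    while True:
        ticks += 1
        if brake.tick(trigger_active=False):
            return ticks


brake = BrakeCounter(brake_count=128)
brake.tick(trigger_active=True)       # one I/O access brakes the CPU
print(ticks_until_full_speed(brake))  # clocks spent at CLKLO afterwards
```

Each new I/O access restarts the count, which is why a program that accesses ports frequently effectively stays at the low clock.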
Now let's consider how the clock switching circuit works. It has three inputs (CLKLO, CLKHI and BRAKE) and one output, CPUCLK, which is expected to equal CLKHI when BRAKE is high and CLKLO when BRAKE is low.
After reset, BRAKE stays high for a long time (relative to CLKHI and CLKLO), so whatever frequency is present on IC6's pin 7 (STAGE), IC3A is in the high state and IC3B is in the high state too (look carefully and you will see that IC3A and IC3B are "daisy-chained": the positive output of IC3A serves as the data input of IC3B, and both are clocked by the same signal, STAGE). Therefore, IC6 inputs A and B are both high, selecting input 1C3 (CLKHI) for the STAGE output and 2C3 (also CLKHI) for the CPUCLK output. The circuit is in a stable state.
As soon as BRAKE goes active (low), on the next positive edge of CLKHI (which passes through IC6's input pin 1C3 and out through output 1Y as the STAGE signal), IC3A latches a low level. IC3B remains high, and IC6's "BA" input combination becomes 01 (binary), selecting 1C1 (CLKLO) for the STAGE signal and 2C1 for CPUCLK. Note that at the moment when STAGE goes high and latches the low level into IC3A, CPUCLK is equal to STAGE and is therefore also high. So when this "BA" transition occurs, CPUCLK is sourced from input 2C1, which is at a high level, and CPUCLK does not go low when CLKHI or CLKLO goes low.
At this point STAGE is equal to CLKLO. On the next positive edge of this signal, IC3A does not change state (it is already low, as BRAKE is active), but IC3B changes state to low, latching it from IC3A's output. Now the "BA" combination of the IC6 select inputs is 00: 1C0 is selected for the STAGE signal (no change, still CLKLO), and 2C0 is selected for the CPUCLK signal (which becomes equal to CLKLO, which has just transitioned from low to high and is at a high level).
For all further cycles, until BRAKE goes inactive, both STAGE and CPUCLK are equal to CLKLO, supplying the low frequency to the CPU. As we can see, the transition is seamless, with at most a short pause while CPUCLK is held high as the low level propagates from IC3A's output to IC3B's output.
When BRAKE goes inactive, on the next positive edge of STAGE (which equals CLKLO) IC3A latches a high level, and "BA" of IC6 becomes 10, switching STAGE to 1C2 (CLKHI) and CPUCLK to 2C2 (high level). The same trick works here: CPUCLK has just gone high before switching to the 2C2 input, and remains high while "BA" is equal to 10. On the next positive edge of STAGE (=CLKHI), IC3B latches the high level from the output of IC3A, and "BA" becomes 11, switching STAGE to 1C3 (no change, stays CLKHI) and CPUCLK to 2C3 (from the high level to CLKHI's high phase). CLKHI then continues to "clock out" through the CPUCLK line to the CPU, making it run at full speed. As you can see, the transition between CLKLO and CLKHI is again seamless, with at most a short pause at the high level.
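The four-step switching sequence can be summarized as a tiny state machine. The sketch below is an abstraction, not a timing simulation: it only tracks what happens on each positive edge of STAGE. The assignment of IC3A's output to select input B and IC3B's output to select input A is inferred from the "BA" values quoted in the text, not read off the schematic, and all names are ours.

```python
# (B, A) select value -> (source of STAGE, source of CPUCLK),
# following the 1C0..1C3 / 2C0..2C3 inputs described in the text.
MUX = {
    (0, 0): ("CLKLO", "CLKLO"),
    (0, 1): ("CLKLO", "HIGH"),
    (1, 0): ("CLKHI", "HIGH"),
    (1, 1): ("CLKHI", "CLKHI"),
}


def on_stage_rising_edge(ic3a: int, ic3b: int, brake: int):
    """Both flip-flops latch together: IC3A takes BRAKE, IC3B takes old IC3A."""
    return brake, ic3a


def trace(brake_levels, ic3a: int = 1, ic3b: int = 1):
    """Return ((B, A), (STAGE source, CPUCLK source)) after each STAGE edge.

    Starts from the stable post-reset state (both flip-flops high).
    ASSUMPTION: B is driven by IC3A and A by IC3B.
    """
    states = []
    for brake in brake_levels:
        ic3a, ic3b = on_stage_rising_edge(ic3a, ic3b, brake)
        states.append(((ic3a, ic3b), MUX[(ic3a, ic3b)]))
    return states


# BRAKE goes low for two STAGE edges, then high again for two edges.
for select, (stage_src, cpuclk_src) in trace([0, 0, 1, 1]):
    print(f"BA={select[0]}{select[1]}  STAGE={stage_src}  CPUCLK={cpuclk_src}")
```

The intermediate states 01 and 10 are the ones in which CPUCLK is parked at a constant high level, which is what keeps short pulses off the CPU clock line during the switch.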
Now that you understand how the depicted circuit works, let's see how it fits into your GR8BIT.
Now your GR8BIT has two clock systems running simultaneously: CLKLO, which is generated by the VDP and has the default frequency of 3.58MHz, and CLKHI, which runs according to the crystal you installed as XTAL1 on the CPU board. See fig. 6 and fig. 7 for images of our test CPU board.
Figure 3. Soldering short air-wires (view from the solder side)
Figure 4. Soldering remaining short air-wires and connecting with IC1 (view from the solder side)
Figure 5. Soldering long air-wires and connecting to CPU (view from the solder side). Dashed lines show the chosen configuration: the red dashed line feeds CLKLO to IC4, and the pink dashed line selects 128 braking cycles. The braking period in this case is 128/3.58MHz≈35.8µs. The red right-most air-wire is connected to IORQ. Note that resistor R2 is removed and soldered into the R8 position instead
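The braking period in fig. 5 is simply the number of braking cycles divided by the frequency of the clock feeding the counter. The following sketch repeats that arithmetic for a few cycle counts. Only the 128-cycle, 3.58MHz configuration comes from this article; the other cycle counts are hypothetical examples, assuming the BRAKE line can be connected to other power-of-two outputs of the counter.

```python
def brake_time_us(cycles: int, counter_clock_mhz: float) -> float:
    """Braking period in microseconds for a given count and counter clock."""
    return cycles / counter_clock_mhz


CLKLO_MHZ = 3.58  # default VDP-generated clock, as in the text

# 128 is the configuration shown in fig. 5; the others are hypothetical.
for cycles in (16, 32, 64, 128):
    print(f"{cycles:3d} cycles at {CLKLO_MHZ} MHz: "
          f"{brake_time_us(cycles, CLKLO_MHZ):5.1f} us")
```

A shorter braking period returns the CPU to full speed sooner after each I/O access, but leaves the VDP less time in its inactive phase, so the setting is a trade-off to be found by experiment.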
There are several issues in this implementation which may, under some circumstances, cause incompatibility with the genuine MSX standard:
Figure 6. Our acceleration test CPU board, component side
Figure 7. Our acceleration test CPU board, solder side (crystal is soldered from solder side because its pads at the component side were damaged due to frequent soldering/de-soldering - see fig. 6)
Figure 8. Running CHKSPD on the accelerated GR8BIT: as CHKSPD is not designed for variable-speed designs, it displays slightly wrong results. The installed XTAL1 is 12MHz, not 13MHz.
Figure 9. The frequency at which GR8BIT runs at the MSX-DOS command prompt
End of KB0009 "Accelerating your GR8BIT".
© 2011-2024 Eugeny Brychkov
GR8BIT™, GR8BOOK™, GR8BUS™, GR8BOX™, GR8STORM™ and GR8NET™ are trademarks
copyright © 2010-2024 Eugeny Brychkov.