Author Topic: A question about ethernet in PCLOS 2009  (Read 874 times)

Offline rastus

  • Sr. Member
  • ****
  • Posts: 256
A question about ethernet in PCLOS 2009
« on: September 04, 2009, 06:19:38 AM »
I'm posting this in the advanced users section rather than in Networks as I've actually solved the problem, but cannot fully understand how, and I would be very interested (in a geeky, nerdy sort of way) to get my head around what has happened.

I will explain.

I had run into an intermittent problem with the ethernet connection to the router. It had plagued me for ages, started after the Big Update, went away, then came back.

At random times, my internet connection would go to sleep. Typically, I'd have a few tabs open in Firefox, perhaps reading a webpage, then either click to open a new tab or reload an existing one with a fresh page. At that point I would get a "Waiting for google" type message, followed by a "Cannot find the server" error. At first I thought this was a problem with my ISP. However, pinging the router would not get a response either. It would reconnect (reluctantly) if I restarted the ethernet connection, rebooted or reset the router, or reconfigured the ethernet connection in PCC (which restarts the connection anyway). Sometimes, if the page was left loading, it would reconnect by itself after a time, as if the problem was merely a slow connection.

The computer is dual boot with XP, and Windows connects and works flawlessly with the same hardware.

The network is as follows:

Netgear DG834v2  ADSL modem / router / firewall / wireless connection point.

DG834 wired to Abit AV8 based desktop computer (dual boot freshly installed 2009.2 and Windows XP Pro). The computer has a VIA Velocity Gigabit onboard LAN connection. Also wired are a HP OfficeJet 6300 AIO and a Linksys NSLU2 NAS storage link. The Linksys NSLU2 is not used constantly, as it is used for backups only.

The wireless connection side consists of another desktop computer running Windows XP Pro, a laptop running Windows 2000, and a Nintendo Wii. All of these work. Netgear hardware components are used throughout on both the wired and wireless.

The LAN is set up as follows:

Netgear (gateway): 192.168.0.1
Wired desktop computer (the problem): 192.168.0.2
Wireless desktop computer: 192.168.0.3
Wireless laptop: 192.168.0.4
Wired Linksys NSLU2: 192.168.0.5
Wired HP6300 printer: 192.168.0.6
Wireless Nintendo Wii: 192.168.0.7

The 192.168.0.xxx IP addresses are Netgear's default; the router comes out of the box with 192.168.0.1.

DNS servers are manually assigned OpenDNS server addresses. The router works as a DHCP server with reserved IP addresses, so it allocates the same address to the same MAC each time.

When I ran Wireshark to capture the packets during the disconnects, there were continual ARP "Who has 192.168.0.1? Tell 192.168.0.2" requests before the router finally replied. Sometimes the router wouldn't reply at all, and the only solution was to reboot the router or reboot the computer. I have run "arp -vn" from a terminal:

[root@localhost ~]# arp -vn
Address                  HWtype  HWaddress           Flags Mask            Iface
192.168.0.1              ether   00:0f:b5:1b:3c:4a   C                     eth0
Entries: 1      Skipped: 0      Found: 1

However, running this command during a disconnect period returned a load of "?"s, so it looks to me as if the ARP cache had either been corrupted or deleted, or PCLOS was trying to refresh it by broadcasting to the router and not getting a response.
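
A rough way to script that check: the sketch below parses the kernel's ARP table as exposed in /proc/net/arp and lists entries that have no resolved MAC address. It assumes the usual six-column layout of that file and that the 0x2 flag marks a completed entry; the function names are made up for illustration.

```python
from pathlib import Path

ATF_COM = 0x02  # flag bit set once an entry has a resolved MAC address


def parse_arp_table(text):
    """Parse text in /proc/net/arp layout into a list of dicts."""
    entries = []
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 6:
            continue  # ignore blank or malformed lines
        ip, _hw_type, flags, mac, _mask, iface = fields[:6]
        entries.append({
            "ip": ip,
            "mac": mac,
            "iface": iface,
            "complete": bool(int(flags, 16) & ATF_COM),
        })
    return entries


def unresolved(entries):
    """Return the IP addresses whose entries have no resolved MAC."""
    return [e["ip"] for e in entries if not e["complete"]]


if __name__ == "__main__":
    proc = Path("/proc/net/arp")
    if proc.exists():
        table = parse_arp_table(proc.read_text())
        for entry in table:
            state = "ok" if entry["complete"] else "INCOMPLETE"
            print(entry["ip"], entry["mac"], entry["iface"], state)
```

Run in a loop during a dropout, this would show whether the entry for 192.168.0.1 is missing entirely or present but incomplete.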

I tried to view this graphically by running the hardinfo GUI tool to view the cache (by clicking refresh), but the cache remained visible. Either the GUI has limitations or I just wasn't quick enough clicking the refresh and the ARP cache had rebuilt by the time I clicked.  I trust the terminal command and Wireshark and they say the cache info wasn't there.

I finally fixed this by downgrading the router firmware from 3.0.38 back to 3.0.31. I have scoured Google and the various Linux forums (including, obviously, here), also Netgear's forums and FAQs to find somebody else who had problems with this particular firmware and any Linux install. Guess what? Like Tigger, I'm the only one.

I am guessing that PCLOS, or at least the 2009 version, periodically clears its ARP cache, perhaps after a timed period of network inactivity, and the router is discarding the ARP packets. Windows doesn't have a problem with the router firmware, perhaps because it doesn't renew its cache. ARP cache corruption in Windows does cause connection issues occasionally.

Sorry that this has turned into a very long post, but the $64000 question: Does anybody know if PCLOS 2009 does indeed renew the ARP cache periodically? And... has anybody else experienced ethernet problems with PCLOS and a Netgear router? I'm trying to work out if PCLOS is doing the wrong thing, or if the Netgear firmware is faulty. If PCLOS is the problem, there should be other people here with similar issues. Equally, if Netgear's firmware is the problem, there should be complaints about the router. It's not like the DG834 is an unusual machine.
« Last Edit: September 04, 2009, 06:25:41 AM by rastus »

smcs_steve

  • Guest
Re: A question about ethernet in PCLOS 2009
« Reply #1 on: September 05, 2009, 04:36:39 AM »
Hello rastus,
Your experience sounds a little like the drama we have at times using Billion ADSL modem routers on the local Telstra exchanges. We get a 20 Mb/s sync up but a download of only 4.5 Mb/s, and a very ragged connection at that! To resolve the problem we manually set the MTU in the modem to something like 1492. Linux then gets upwards of 15 Mb/s (Windows with a bloated antivirus is another matter, ~ 11 Mb/s). My own Dlink G604T is set for MTU=1400.
Did you see any change in MTU, MRU values when you changed firmware?
>Steve
« Last Edit: September 05, 2009, 06:02:12 AM by smcs_steve »

Offline rastus

  • Sr. Member
  • ****
  • Posts: 256
Re: A question about ethernet in PCLOS 2009
« Reply #2 on: September 05, 2009, 12:16:38 PM »
Hi Steve

Yes, I tried altering the MTU to various settings between 1400 and 1500. It made no difference.
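
For reference, a quick sketch of the arithmetic behind those MTU values. It assumes the ADSL link uses PPPoE, which nobody in this thread has confirmed for either router, and it assumes plain IPv4 and TCP headers with no options.

```python
ETHERNET_MTU = 1500   # standard ethernet payload size in bytes
PPPOE_OVERHEAD = 8    # assumption: 6-byte PPPoE header + 2-byte PPP protocol field
IPV4_HEADER = 20      # IPv4 header without options
TCP_HEADER = 20       # TCP header without options


def tcp_mss(mtu):
    """Largest TCP payload that fits in one packet at the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


if __name__ == "__main__":
    print("MTU over PPPoE:", ETHERNET_MTU - PPPOE_OVERHEAD)
    for mtu in (1400, 1492, 1500):
        print("MTU", mtu, "-> TCP MSS", tcp_mss(mtu))
```

That is where the 1492 figure usually comes from. It affects throughput and large packets rather than ARP, which fits with the MTU changes making no difference here.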

I found this: http://linux-ip.net/html/ether-arp.html

which states:
Quote
Entries in the ARP cache are periodically and automatically verified unless continually used. Along with net/ipv4/neigh/$DEV/gc_stale_time, there are a number of other parameters in net/ipv4/neigh/$DEV which control the expiration of entries in the ARP cache.


So the ARP cache renewal period is controlled by kernel parameters. I am surmising that the default ARP renewal time differs between kernel versions, which would explain the problem occurring after the big update, going away again, then returning, as I think I updated the kernel twice. It doesn't explain why the router refuses to reply to the broadcast with one firmware version but responds with another, though.
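
The parameters mentioned in that quote can be read on a running system under /proc/sys/net/ipv4/neigh/, so the values could be compared from one kernel to the next. A minimal sketch, assuming the interface is eth0 as in the arp output above; the function name and the choice of three parameters are just for illustration.

```python
from pathlib import Path

NEIGH_ROOT = Path("/proc/sys/net/ipv4/neigh")
PARAMS = ("gc_stale_time", "base_reachable_time", "delay_first_probe_time")


def read_neigh_params(dev="eth0", root=NEIGH_ROOT, names=PARAMS):
    """Read per-device ARP (neighbour) timers; None if a value is unavailable."""
    values = {}
    for name in names:
        path = Path(root) / dev / name
        try:
            values[name] = int(path.read_text().split()[0])
        except (OSError, ValueError, IndexError):
            values[name] = None  # file missing, unreadable, or not a number
    return values


if __name__ == "__main__":
    for name, value in read_neigh_params("eth0").items():
        print(f"{name} = {value}")
```

Running this under each kernel and comparing the output would show whether the timers really changed between updates.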
« Last Edit: September 05, 2009, 04:23:41 PM by rastus »